User Guide
Synthetic dataset generation in ASID
To fit a generative model with ASID, import the GenerativeModel class.
from asid.automl_small.gm import GenerativeModel
The GenerativeModel class offers several modes of generative model estimation. First, a specific type of generative algorithm can be chosen; for example, the scikit-learn implementation of kernel density estimation (KDE) can be fitted.
from sklearn.datasets import load_iris
X = load_iris().data
genmod = GenerativeModel(gen_model_type="sklearn_kde")
genmod.fit(X)
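As a rough intuition for what sampling from a fitted Gaussian KDE involves, each synthetic point is a randomly chosen training point plus Gaussian noise scaled by the bandwidth. The sketch below illustrates this idea in pure Python; it is not scikit-learn's or ASID's actual implementation, and the data values are made up.

```python
import random

def kde_sample(train_points, bandwidth, n, seed=0):
    """Sketch of Gaussian KDE sampling: pick a random training point
    and perturb it with Gaussian noise of scale `bandwidth`."""
    rng = random.Random(seed)
    return [rng.choice(train_points) + rng.gauss(0.0, bandwidth)
            for _ in range(n)]

# Illustrative 1-D "training data"; real ASID inputs are multivariate arrays.
samples = kde_sample([1.0, 2.0, 5.0], bandwidth=0.2, n=4)
```

Each generated value lies close to one of the training points, which is why KDE-based samplers reproduce the shape of the training distribution.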
The "optimize" option searches through a number of generative algorithms and returns the one that is optimal in terms of overfitting. The num_syn_samples parameter controls the number of synthetic samples used to evaluate overfitting, and the hyperopt_time parameter sets the Hyperopt time budget for generative model hyper-parameter optimization.
genmod = GenerativeModel(gen_model_type="optimize", num_syn_samples=10, hyperopt_time=10)
genmod.fit(X)
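Conceptually, the optimize mode behaves like a model-selection loop: each candidate algorithm is fitted, several synthetic datasets are drawn from it, and the candidate with the best average overfitting score wins. The sketch below shows that selection logic only; the candidate names and score values are illustrative, not ASID's internal results.

```python
# Hypothetical overfitting scores, one per synthetic dataset drawn from each
# candidate (lower is better); num_syn_samples would control the list length.
scores = {
    "sklearn_kde":     [0.12, 0.10, 0.14],
    "stats_kde_cv_ml": [0.09, 0.08, 0.11],
    "gmm":             [0.15, 0.13, 0.16],
}

# Average the per-sample scores and pick the least-overfitting candidate.
mean_scores = {name: sum(v) / len(v) for name, v in scores.items()}
best = min(mean_scores, key=mean_scores.get)
```

Raising num_syn_samples makes the averaged score less noisy at the cost of extra sampling time.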
After the model is fitted, synthetic datasets of the required size can be generated, and the similarity of the model's samples to the training dataset can be evaluated with one of the implemented indicators.
genmod.sample(1000)
genmod.score(X, "ks_test")
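The "ks_test" indicator presumably relies on the Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical distribution functions of two samples. The function below is a minimal pure-Python sketch of the two-sample statistic itself, not ASID's implementation (which may, for example, aggregate per-feature tests).

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    cdf = lambda s, x: sum(v <= x for v in s) / len(s)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)

# Identical samples give 0 (perfect similarity); disjoint samples give 1.
same = ks_statistic([1, 2, 3], [1, 2, 3])
far = ks_statistic([1, 2, 3], [10, 20, 30])
```

A small statistic therefore indicates that the synthetic sample is distributionally close to the training data.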
Imbalanced learning in ASID
AutoBalanceBoost
Import the AutoBalanceBoost class from the automl_imbalanced module.
from asid.automl_imbalanced.abb import AutoBalanceBoost
AutoBalanceBoost is an easy-to-use tool that yields a high-quality model without time-consuming hyper-parameter tuning.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
X, Y = make_classification(n_classes=4, n_features=6, n_redundant=2, n_repeated=0, n_informative=4,
n_clusters_per_class=2, flip_y=0.05, n_samples=700, random_state=45,
weights=(0.7, 0.2, 0.05, 0.05))
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
clf = AutoBalanceBoost()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
score = f1_score(y_test, pred, average="macro")
In addition, the feature importances of AutoBalanceBoost can be calculated.
feat_imp = clf.feature_importances()
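A common follow-up is to rank the features by importance to see which ones drive the predictions. The sketch below assumes feature_importances() returns one importance value per feature, as importance arrays conventionally do; the values here are made up for illustration.

```python
# Hypothetical importances for the 6 features of the example dataset.
feat_imp = [0.05, 0.30, 0.10, 0.25, 0.20, 0.10]

# Pair each feature index with its importance and sort, most important first.
ranked = sorted(enumerate(feat_imp), key=lambda p: p[1], reverse=True)
for idx, imp in ranked:
    print(f"feature_{idx}: {imp:.2f}")
```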
Choosing an imbalanced learning classifier
Import the ImbalancedLearningClassifier class from the automl_imbalanced module.
from asid.automl_imbalanced.ilc import ImbalancedLearningClassifier
ImbalancedLearningClassifier searches through combinations of state-of-the-art classifiers and balancing procedures, compares their results with AutoBalanceBoost and chooses the best classifier. Users can control the number of splits (split_num) used to evaluate classifier performance, the Hyperopt time budget (hyperopt_time) for balancing algorithm hyper-parameter optimization, and the classification score metric (eval_metric).
clf = ImbalancedLearningClassifier(split_num=50, hyperopt_time=10, eval_metric="f1_macro")
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
score = f1_score(y_test, pred, average="macro")
Leaderboard statistics are also available once ImbalancedLearningClassifier is fitted. They include lists sorted by the following indicators: "Mean score", "Mean rank", "Share of experiments with the first place, %", and "Average difference with the leader, %".
clf.leaderboard()
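To make the leaderboard indicators concrete, the sketch below computes "Mean score" and "Mean rank" from per-split scores, which is one plausible way such statistics are derived; the candidate names, score values, and exact aggregation are illustrative assumptions, not ASID's internal bookkeeping.

```python
# Hypothetical per-split eval_metric scores for three candidates
# (higher is better); split_num would control the list length.
results = {
    "AutoBalanceBoost":   [0.81, 0.79, 0.83],
    "candidate_pipe_a":   [0.78, 0.80, 0.77],
    "candidate_pipe_b":   [0.76, 0.75, 0.78],
}

n_splits = 3

# "Mean score": average score over the evaluation splits.
mean_score = {m: sum(s) / n_splits for m, s in results.items()}

# "Mean rank": rank the candidates within each split (1 = best),
# then average the ranks across splits.
mean_rank = {m: 0.0 for m in results}
for i in range(n_splits):
    order = sorted(results, key=lambda m: results[m][i], reverse=True)
    for rank, m in enumerate(order, start=1):
        mean_rank[m] += rank / n_splits
```

Mean rank can disagree with mean score when a candidate wins narrowly on most splits but loses badly on one, which is why the leaderboard reports both.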