User Guide

Synthetic dataset generation in ASID

To fit a generative model with ASID, import the GenerativeModel class.

from asid.automl_small.gm import GenerativeModel

Several modes of generative model estimation are available in the GenerativeModel class. First, a specific type of generative algorithm can be chosen. For example, the scikit-learn implementation of kernel density estimation (KDE) can be fitted.

from sklearn.datasets import load_iris

X = load_iris().data
genmod = GenerativeModel(gen_model_type="sklearn_kde")  # choose the scikit-learn KDE algorithm
genmod.fit(X)

The "optimize" option searches through a number of generative algorithms and returns the one that is optimal in terms of overfitting. It is also possible to control the number of synthetic samples used to evaluate overfitting with the num_syn_samples parameter, and the Hyperopt time budget for generative model hyper-parameter optimization with the hyperopt_time parameter.

# Search over the available generative algorithms and pick the least overfitted one
genmod = GenerativeModel(gen_model_type="optimize", num_syn_samples=10, hyperopt_time=10)
genmod.fit(X)

After the model is fitted, synthetic datasets of the required size can be generated, and the similarity of the model’s samples to the train dataset can be evaluated with one of the implemented indicators.

genmod.sample(1000)         # generate 1000 synthetic observations
genmod.score(X, "ks_test")  # similarity with the train data (Kolmogorov-Smirnov test)
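
For instance, the sampled array can be inspected and persisted for later use. The pandas usage below is an illustrative sketch rather than part of the ASID API; it assumes that sample returns an array with one column per original feature.

import pandas as pd
from sklearn.datasets import load_iris

X_syn = genmod.sample(1000)  # assumed: ndarray of synthetic observations
df = pd.DataFrame(X_syn, columns=load_iris().feature_names)
df.to_csv("iris_synthetic.csv", index=False)  # persist the synthetic dataset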

Imbalanced learning in ASID

AutoBalanceBoost

Import the AutoBalanceBoost class from the automl_imbalanced module.

from asid.automl_imbalanced.abb import AutoBalanceBoost

AutoBalanceBoost is an easy-to-use tool that yields a high-quality model without time-consuming hyper-parameter tuning.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Generate an imbalanced 4-class dataset (class shares 70% / 20% / 5% / 5%)
X, Y = make_classification(n_classes=4, n_features=6, n_redundant=2, n_repeated=0, n_informative=4,
                           n_clusters_per_class=2, flip_y=0.05, n_samples=700, random_state=45,
                           weights=(0.7, 0.2, 0.05, 0.05))
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
clf = AutoBalanceBoost()  # default ensemble, no manual tuning required
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
score = f1_score(y_test, pred, average="macro")  # macro F1 suits imbalanced classes
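
On an imbalanced problem like this one, per-class metrics are often more informative than a single aggregate score. A standard scikit-learn classification report can complement the macro F1 value (an illustrative addition, not part of the ASID API):

from sklearn.metrics import classification_report

print(f"Macro F1: {score:.3f}")
print(classification_report(y_test, pred))  # per-class precision, recall and F1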

In addition, the feature importances of AutoBalanceBoost can be calculated.

feat_imp = clf.feature_importances()
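
A quick way to inspect the result is a bar plot. The sketch below assumes that feat_imp is array-like with one importance value per input feature; the matplotlib code is purely illustrative:

import numpy as np
import matplotlib.pyplot as plt

imp = np.asarray(feat_imp)     # assumed: one importance value per feature
order = np.argsort(imp)[::-1]  # most important features first
plt.bar(range(len(imp)), imp[order])
plt.xticks(range(len(imp)), order)
plt.xlabel("Feature index")
plt.ylabel("Importance")
plt.title("AutoBalanceBoost feature importances")
plt.show()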

Choosing an imbalanced learning classifier

Import the ImbalancedLearningClassifier class from the automl_imbalanced module.

from asid.automl_imbalanced.ilc import ImbalancedLearningClassifier

ImbalancedLearningClassifier searches through combinations of state-of-the-art classifiers and balancing procedures, compares their results with AutoBalanceBoost, and chooses the best classifier. Users can control the number of splits (split_num) used to evaluate classifier performance, the Hyperopt time budget (hyperopt_time) for balancing algorithm hyper-parameter optimization, and the classification score metric (eval_metric).

clf = ImbalancedLearningClassifier(split_num=50, hyperopt_time=10, eval_metric="f1_macro")
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
score = f1_score(y_test, pred, average="macro")

Leaderboard statistics are also available once ImbalancedLearningClassifier is fitted. They include lists sorted in accordance with the following indicators: “Mean score”, “Mean rank”, “Share of experiments with the first place, %”, “Average difference with the leader, %”.

clf.leaderboard()
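
The exact return format of leaderboard() is defined by the library; the sketch below assumes it returns a mapping keyed by the indicator names above (an assumption; consult the ASID API reference for the actual structure):

lb = clf.leaderboard()
# Assumption: a dict keyed by indicator name, each value a sorted ranking.
for indicator, ranking in lb.items():
    print(indicator, ranking)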