User Guide
================================

Synthetic dataset generation in ASID
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To fit a generative model with ASID, import the ``GenerativeModel`` class.

.. code-block:: python

    from asid.automl_small.gm import GenerativeModel

Several modes of generative model estimation are available in the ``GenerativeModel`` class. First, a specific type of generative algorithm can be chosen. For example, a scikit-learn implementation of KDE can be fitted.

.. code-block:: python

    from sklearn.datasets import load_iris

    X = load_iris().data
    genmod = GenerativeModel(gen_model_type="sklearn_kde")
    genmod.fit(X)

The ``optimize`` option searches through a number of generative algorithms and returns the optimal one in terms of overfitting. It is also possible to control the number of synthetic samples used to evaluate overfitting with the ``num_syn_samples`` parameter, and the Hyperopt time budget for generative model hyper-parameter optimization with the ``hyperopt_time`` parameter.

.. code-block:: python

    genmod = GenerativeModel(gen_model_type="optimize", num_syn_samples=10,
                             hyperopt_time=10)
    genmod.fit(X)

After the model is fitted, it is possible to generate synthetic datasets of the required size or to evaluate the similarity of the model's samples to the training dataset with one of the implemented indicators.

.. code-block:: python

    genmod.sample(1000)
    genmod.score(X, "ks_test")

Imbalanced learning in ASID
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

AutoBalanceBoost
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Import the ``AutoBalanceBoost`` class from the ``automl_imbalanced`` module.

.. code-block:: python

    from asid.automl_imbalanced.abb import AutoBalanceBoost

AutoBalanceBoost is an easy-to-use tool that yields a high-quality model without time-consuming hyper-parameter tuning.

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score

    X, Y = make_classification(n_classes=4, n_features=6, n_redundant=2,
                               n_repeated=0, n_informative=4,
                               n_clusters_per_class=2, flip_y=0.05,
                               n_samples=700, random_state=45,
                               weights=(0.7, 0.2, 0.05, 0.05))
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2,
                                                        random_state=42)
    clf = AutoBalanceBoost()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    score = f1_score(y_test, pred, average="macro")

In addition, the feature importances of the fitted AutoBalanceBoost model can also be calculated.

.. code-block:: python

    feat_imp = clf.feature_importances()
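For a quick visual check, the importances can be plotted, for example with matplotlib. This is a minimal sketch, assuming ``feat_imp`` from the snippet above is a one-dimensional array-like with one score per feature.

.. code-block:: python

    import matplotlib.pyplot as plt
    import numpy as np

    # Assumes feat_imp is a 1-D array-like with one importance per feature
    feat_imp = np.asarray(feat_imp)
    plt.bar(np.arange(feat_imp.shape[0]), feat_imp)
    plt.xlabel("Feature index")
    plt.ylabel("Importance")
    plt.title("AutoBalanceBoost feature importances")
    plt.show()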
Choosing an imbalanced learning classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Import the ``ImbalancedLearningClassifier`` class from the ``automl_imbalanced`` module.

.. code-block:: python

    from asid.automl_imbalanced.ilc import ImbalancedLearningClassifier

``ImbalancedLearningClassifier`` looks through combinations of state-of-the-art classifiers and balancing procedures, compares their results with AutoBalanceBoost and chooses the best classifier. Users can control the number of splits (``split_num``) used to evaluate classifier performance, the Hyperopt time (``hyperopt_time``) for balancing-algorithm hyper-parameter optimization and the classification score metric (``eval_metric``).

.. code-block:: python

    clf = ImbalancedLearningClassifier(split_num=50, hyperopt_time=10,
                                       eval_metric="f1_macro")
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    score = f1_score(y_test, pred, average="macro")

Leaderboard statistics are also available once ``ImbalancedLearningClassifier`` is fitted. They include lists sorted by the following indicators: "Mean score", "Mean rank", "Share of experiments with the first place, %" and "Average difference with the leader, %".

.. code-block:: python

    clf.leaderboard()
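To see how the selected classifier fares against a standalone AutoBalanceBoost, both can be scored on the same held-out split. This is a minimal usage sketch reusing the data split from the AutoBalanceBoost example above; the smaller ``split_num`` here is only to keep the run short.

.. code-block:: python

    from sklearn.metrics import f1_score

    from asid.automl_imbalanced.abb import AutoBalanceBoost
    from asid.automl_imbalanced.ilc import ImbalancedLearningClassifier

    # Fit both tools on the same training split
    abb = AutoBalanceBoost()
    abb.fit(X_train, y_train)

    ilc = ImbalancedLearningClassifier(split_num=5, hyperopt_time=10,
                                       eval_metric="f1_macro")
    ilc.fit(X_train, y_train)

    # Compare macro F1 scores on the held-out split
    abb_f1 = f1_score(y_test, abb.predict(X_test), average="macro")
    ilc_f1 = f1_score(y_test, ilc.predict(X_test), average="macro")
    print(f"AutoBalanceBoost macro F1:             {abb_f1:.3f}")
    print(f"ImbalancedLearningClassifier macro F1: {ilc_f1:.3f}")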