Models

Collection of wrappers for machine learning models

LogExpModel

class ml_investment.models.LogExpModel(base_model)

Bases: object

Model wrapper that fits the base model on the log of the target and applies exp to its predictions. May be useful for some target distributions, for example strongly right-skewed targets.

Parameters

base_model – model implementing the fit(X, y) and predict(X)/predict_proba(X) interfaces
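A minimal sketch of the log/exp idea, assuming the base model is trained on `np.log(y)` and its output is mapped back with `np.exp`. `LogExpSketch` and the toy `MeanRegressor` are hypothetical names for illustration, not the library's code. Because `np.log` is undefined for non-positive values, this sketch assumes a strictly positive target.

```python
import numpy as np
import pandas as pd


class MeanRegressor:
    """Hypothetical toy base model: always predicts the training mean of y."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


class LogExpSketch:
    """Hypothetical re-implementation of the log/exp wrapper idea."""

    def __init__(self, base_model):
        self.base_model = base_model

    def fit(self, X: pd.DataFrame, y):
        # Train the base model on log(y); assumes every y is strictly positive.
        self.base_model.fit(X, np.log(np.asarray(y, dtype=float)))
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        # Map the base model's output back to the original scale.
        return np.exp(self.base_model.predict(X))


X = pd.DataFrame({"f": [0, 1, 2]})
model = LogExpSketch(MeanRegressor()).fit(X, [1.0, 10.0, 100.0])
print(model.predict(X))
```

With a constant-mean base model, the wrapper returns the geometric mean of the targets (10 for [1, 10, 100]) instead of the arithmetic mean (37), which illustrates how the log transform damps the influence of very large target values.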

fit(X: pandas.core.frame.DataFrame, y)

Interface for model training

Parameters
  • X – pd.DataFrame containing features

  • y – target data

predict(X)

Interface for prediction

Parameters

X – pd.DataFrame containing features

EnsembleModel

class ml_investment.models.EnsembleModel(base_models: List, bagging_fraction: float = 0.8, model_cnt: int = 20)

Bases: object

Class for training an ensemble of base models, each fitted on a random subsample of the training data.

Parameters
  • base_models – list of models implementing the fit(X, y) and predict(X)/predict_proba(X) interfaces

  • bagging_fraction – fraction of the training data randomly subsampled to train each model

  • model_cnt – total number of models in the resulting ensemble
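A hedged sketch of the bagging scheme described above. It assumes each of the `model_cnt` members is a fresh copy of one of the `base_models` (cycled in order here; the library may choose them differently), fitted on a random subsample without replacement containing `bagging_fraction` of the rows, and that `predict` averages the members' outputs. `EnsembleSketch`, `MeanRegressor` and the `seed` argument are hypothetical.

```python
import copy

import numpy as np
import pandas as pd


class MeanRegressor:
    """Hypothetical toy base model: always predicts the training mean of y."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


class EnsembleSketch:
    """Hypothetical bagging ensemble; not the library's implementation."""

    def __init__(self, base_models, bagging_fraction=0.8, model_cnt=20, seed=0):
        self.base_models = base_models
        self.bagging_fraction = bagging_fraction
        self.model_cnt = model_cnt
        self.rng = np.random.default_rng(seed)  # hypothetical: reproducible sampling

    def fit(self, X: pd.DataFrame, y: pd.Series):
        n_rows = int(len(X) * self.bagging_fraction)
        self.models_ = []
        for i in range(self.model_cnt):
            # Cycle through the base models and fit a fresh copy of each one.
            model = copy.deepcopy(self.base_models[i % len(self.base_models)])
            # Random subsample of the rows, drawn without replacement.
            idx = self.rng.choice(len(X), size=n_rows, replace=False)
            model.fit(X.iloc[idx], y.iloc[idx])
            self.models_.append(model)
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        # Average the predictions of all ensemble members.
        return np.mean([m.predict(X) for m in self.models_], axis=0)


X = pd.DataFrame({"f": np.arange(10)})
y = pd.Series(np.arange(10, dtype=float))
ensemble = EnsembleSketch([MeanRegressor()], bagging_fraction=0.5, model_cnt=4).fit(X, y)
print(ensemble.predict(X))
```

Averaging several models trained on different subsamples is the standard way bagging reduces the variance of a single model's predictions.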

fit(X: pandas.core.frame.DataFrame, y: pandas.core.series.Series)

Interface for model training

Parameters
  • X – pd.DataFrame containing features

  • y – target data

predict(X)

Interface for prediction

Parameters

X – pd.DataFrame containing features

GroupedOOFModel

class ml_investment.models.GroupedOOFModel(base_model, group_column: str, fold_cnt: int = 5)

Bases: object

Model wrapper that encapsulates out-of-fold (OOF) separation within data groups. Samples from the same group are never in the training and validation folds at the same time.

Parameters
  • base_model – model implementing the fit(X, y) and predict(X)/predict_proba(X) interfaces

  • group_column – name of the column used to group training data. X in fit(X, y) and predict(X) should contain this column. All samples with the same group value are placed in the same fold.

  • fold_cnt – number of folds for training
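A sketch of grouped out-of-fold training, assuming the distinct groups are shuffled and split into `fold_cnt` disjoint sets, one model is trained per fold on all other groups, and the group column is dropped before the data reaches the base model. Routing unseen groups to the average of all fold models is an assumption of this sketch, not documented behavior. `GroupedOOFSketch`, `MeanRegressor` and `seed` are hypothetical names.

```python
import copy

import numpy as np
import pandas as pd


class MeanRegressor:
    """Hypothetical toy base model: always predicts the training mean of y."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


class GroupedOOFSketch:
    """Hypothetical grouped out-of-fold wrapper; not the library's implementation."""

    def __init__(self, base_model, group_column, fold_cnt=5, seed=0):
        self.base_model = base_model
        self.group_column = group_column
        self.fold_cnt = fold_cnt
        self.seed = seed  # hypothetical: controls the group shuffle

    def fit(self, X: pd.DataFrame, y):
        # Shuffle the distinct groups and split them into fold_cnt disjoint sets.
        groups = np.array(X[self.group_column].unique())
        np.random.default_rng(self.seed).shuffle(groups)
        self.group_fold_ = {
            g: fold
            for fold, chunk in enumerate(np.array_split(groups, self.fold_cnt))
            for g in chunk
        }
        fold_of_row = X[self.group_column].map(self.group_fold_).to_numpy()
        # Assumption: the group column is not passed to the base model as a feature.
        features = X.drop(columns=[self.group_column])
        y = np.asarray(y)
        self.models_ = []
        for fold in range(self.fold_cnt):
            # Train on every group except the ones assigned to this fold.
            train = fold_of_row != fold
            model = copy.deepcopy(self.base_model)
            model.fit(features[train], y[train])
            self.models_.append(model)
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        features = X.drop(columns=[self.group_column])
        # Default for unseen groups (assumption): average of all fold models.
        result = np.mean([m.predict(features) for m in self.models_], axis=0)
        folds = X[self.group_column].map(self.group_fold_)
        for fold, model in enumerate(self.models_):
            mask = (folds == fold).to_numpy()
            if mask.any():
                # Known groups: use the one model that never saw them in training.
                result[mask] = model.predict(features[mask])
        return result


X = pd.DataFrame({"grp": list("aabbccdd"), "f": range(8)})
y = pd.Series([0, 0, 10, 10, 20, 20, 30, 30], dtype=float)
print(GroupedOOFSketch(MeanRegressor(), "grp", fold_cnt=4).fit(X, y).predict(X))
```

With four groups and `fold_cnt=4`, each group lands in its own fold, so the prediction for group `a` is the mean target of groups `b`, `c` and `d` only (20 here), never leaking group `a`'s own targets.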

fit(X: pandas.core.frame.DataFrame, y: pandas.core.series.Series)

Interface for model training

Parameters
  • X – pd.DataFrame containing features and self.group_column

  • y – target data

predict(X: pandas.core.frame.DataFrame) → numpy.array

Interface for prediction

Parameters

X – pd.DataFrame containing features and self.group_column

TimeSeriesOOFModel

class ml_investment.models.TimeSeriesOOFModel(base_model, time_column: str, fold_cnt: int = 5)

Bases: object

Model wrapper that encapsulates out-of-fold (OOF) time-series separation.

Parameters
  • base_model – model implementing the fit(X, y) and predict(X)/predict_proba(X) interfaces

  • time_column – name of the column used to separate training data by time. X in fit(X, y) and predict(X) should contain this column. Samples from the future are never used to train a model that predicts the past.

  • fold_cnt – number of folds for training
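A sketch of time-series out-of-fold training as an expanding window, assuming the sorted distinct values of `time_column` are split into `fold_cnt` consecutive chunks and the model for each chunk after the first is trained only on strictly earlier rows. Rows in the first chunk have no past data and get `NaN` here, and times after the training range reuse the last model; both choices are assumptions of this sketch, as are the names `TimeSeriesOOFSketch` and `MeanRegressor`.

```python
import copy

import numpy as np
import pandas as pd


class MeanRegressor:
    """Hypothetical toy base model: always predicts the training mean of y."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


class TimeSeriesOOFSketch:
    """Hypothetical expanding-window OOF wrapper; not the library's implementation."""

    def __init__(self, base_model, time_column, fold_cnt=5):
        self.base_model = base_model
        self.time_column = time_column
        self.fold_cnt = fold_cnt

    def fit(self, X: pd.DataFrame, y):
        # Split the sorted distinct time values into consecutive chunks.
        times = np.sort(X[self.time_column].unique())
        chunks = np.array_split(times, self.fold_cnt)
        # Start time of every non-empty chunk except the first one.
        self.boundaries_ = np.array([c[0] for c in chunks[1:] if len(c)])
        features = X.drop(columns=[self.time_column])
        y = np.asarray(y)
        self.models_ = []
        for boundary in self.boundaries_:
            # Train only on rows strictly earlier than the chunk start.
            past = (X[self.time_column] < boundary).to_numpy()
            model = copy.deepcopy(self.base_model)
            model.fit(features[past], y[past])
            self.models_.append(model)
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        features = X.drop(columns=[self.time_column])
        times = X[self.time_column].to_numpy()
        # Index of the latest boundary <= t; -1 means no past data is available.
        model_idx = np.searchsorted(self.boundaries_, times, side="right") - 1
        result = np.full(len(X), np.nan)
        for i, model in enumerate(self.models_):
            mask = model_idx == i
            if mask.any():
                result[mask] = model.predict(features[mask])
        return result


X = pd.DataFrame({"t": [1, 1, 2, 2, 3, 3, 4, 4], "f": range(8)})
y = pd.Series([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
print(TimeSeriesOOFSketch(MeanRegressor(), "t", fold_cnt=4).fit(X, y).predict(X))
```

The model that predicts time `t` was trained only on rows with time below a boundary that is itself at most `t`, so no prediction can depend on its own or later data.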

fit(X: pandas.core.frame.DataFrame, y)

Interface for model training

Parameters
  • X – pd.DataFrame containing features and self.time_column

  • y – target data

predict(X: pandas.core.frame.DataFrame) → numpy.array

Interface for prediction

Parameters

X – pd.DataFrame containing features and self.time_column