Pipelines

Collection of pipelines

Pipeline

class ml_investment.pipelines.Pipeline(data: Dict, feature, target, model, out_name=None)

Bases: object

Encapsulates feature and target calculation, model training and validation during the fit phase, and feature calculation and model prediction during the execute phase. Supports multiple targets with different models and metrics.

Parameters
  • data – dict containing the fields needed for feature and target calculation. Its values should be objects implementing the load(index) -> pd.DataFrame interface.

  • feature – feature calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface.

  • target – target calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface, OR a List of such target calculators.

  • model – object implementing the fit(X, y) and predict(X) interfaces; if target is a List, a copy of the model is used for each target. OR a List of such objects (its length should equal the length of target).

  • out_name – str column name of the result in the pd.DataFrame returned by execute(), OR List[str] (its length should equal the length of target), OR None (in which case ['y_0', 'y_1', ...] is used).
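The interfaces listed above are duck-typed, so any object with the right methods should fit. The following is a minimal sketch of what such objects could look like. QuarterlyLoader, RevenueFeatures, NextRevenueTarget and MeanModel are hypothetical names invented for illustration (they are not part of ml_investment), the data is made up, and the exact DataFrame layout that Pipeline expects is an assumption.

```python
from typing import Dict, List

import numpy as np
import pandas as pd


class QuarterlyLoader:
    """Hypothetical data source implementing load(index) -> pd.DataFrame."""

    def __init__(self, table: pd.DataFrame):
        self._table = table

    def load(self, index: List[str]) -> pd.DataFrame:
        # Return only the rows that belong to the requested tickers.
        rows = self._table[self._table["ticker"].isin(index)]
        return rows.reset_index(drop=True)


class RevenueFeatures:
    """Hypothetical feature calculator: calculate(data, index) -> pd.DataFrame."""

    def calculate(self, data: Dict, index: List[str]) -> pd.DataFrame:
        df = data["quarterly"].load(index)
        return df[["ticker", "revenue"]].set_index("ticker")


class NextRevenueTarget:
    """Hypothetical target calculator with the same calculate() interface."""

    def calculate(self, data: Dict, index: List[str]) -> pd.DataFrame:
        df = data["quarterly"].load(index)
        df = df[["ticker", "next_revenue"]].rename(columns={"next_revenue": "y"})
        return df.set_index("ticker")


class MeanModel:
    """Toy model implementing fit(X, y) and predict(X)."""

    def fit(self, X, y):
        # Remember the mean of the training target.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Predict the remembered mean for every row of X.
        return np.full(len(X), self.mean_)


# Made-up numbers, for illustration only.
table = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "TSLA"],
    "revenue": [10.0, 20.0, 30.0],
    "next_revenue": [12.0, 22.0, 32.0],
})
data = {"quarterly": QuarterlyLoader(table)}

X = RevenueFeatures().calculate(data, ["AAPL", "MSFT"])
y = NextRevenueTarget().calculate(data, ["AAPL", "MSFT"])
model = MeanModel().fit(X, y["y"])
predictions = model.predict(X)
```

Objects shaped like these could then be passed as the data, feature, target and model arguments of Pipeline.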

execute(index)

Interface for executing pipeline for tickers. Features will be based on data from data_loader

Parameters

index – identifiers to execute the pipeline for (e.g. a list of tickers to predict for)

Returns

result values in columns named according to the out_name parameter of __init__()

Return type

pd.DataFrame
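To make the multi-target naming concrete, here is a rough sketch of how one model copy per target and the default 'y_0', 'y_1', ... column names could fit together. It does not reproduce the library's internals: ConstantModel and default_out_names are hypothetical helpers, and the use of copy.deepcopy is an assumption about how the copying might be done.

```python
import copy

import numpy as np
import pandas as pd


class ConstantModel:
    """Toy model: predicts the mean of the target seen during fit()."""

    def fit(self, X, y):
        self.value_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.value_)


def default_out_names(n_targets):
    # Mirrors the documented default for out_name=None: 'y_0', 'y_1', ...
    return [f"y_{k}" for k in range(n_targets)]


X = pd.DataFrame({"revenue": [1.0, 2.0, 3.0]}, index=["AAPL", "MSFT", "TSLA"])
targets = [
    pd.Series([10.0, 20.0, 30.0], index=X.index),
    pd.Series([0.0, 1.0, 2.0], index=X.index),
]

# One independent copy of the single model per target (deepcopy is an assumption).
base_model = ConstantModel()
models = [copy.deepcopy(base_model).fit(X, y) for y in targets]

# One output column per target, named with the default scheme.
out_name = default_out_names(len(targets))
result = pd.DataFrame(
    {name: m.predict(X) for name, m in zip(out_name, models)},
    index=X.index,
)
```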

export_core(path=None)

Interface for saving the pipeline core

Parameters

path – str with path to store pipeline core OR None (path will be generated automatically)

fit(index: typing.List[str], metric=None, target_filter_foo=<function nan_mask>)

Interface to fit pipeline model for tickers. Features and target will be based on data from data_loader

Parameters
  • index – identifiers to fit the pipeline for (e.g. a list of tickers to fit the model on)

  • metric – function implementing the foo(gt, y) -> float interface; if target is a List, the same metric is used for each target. OR a List of such functions (its length should equal the length of target).

  • target_filter_foo – function for filtering samples according to their target values. It should implement the foo(arr) -> np.array[bool] interface, and the length of the resulting array should equal the length of arr. OR a List of such functions (its length should equal the length of target).
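As an illustration of the two callable shapes described above, the sketch below defines a metric and a target filter using only NumPy. The names mean_abs_error and keep_finite are hypothetical, and treating True as "keep this sample" is an assumption; the behaviour of the library's default nan_mask is not shown in this section.

```python
import numpy as np


def mean_abs_error(gt, y) -> float:
    """Metric with the foo(gt, y) -> float shape."""
    gt = np.asarray(gt, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(gt - y)))


def keep_finite(arr) -> np.ndarray:
    """Filter with the foo(arr) -> np.array[bool] shape.

    Returns one boolean per element of arr; True marks samples
    to keep (assumed convention).
    """
    return np.isfinite(np.asarray(arr, dtype=float))


target = np.array([1.0, np.nan, 3.0, 5.0])
prediction = np.array([1.5, 2.0, 2.0, 5.0])

# Drop samples whose target is missing, then score the rest.
mask = keep_finite(target)
score = mean_abs_error(target[mask], prediction[mask])
```

Functions with these signatures could be passed as metric and target_filter_foo, either singly or as lists with one entry per target.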

load_core(path)

Interface for loading pipeline core

Parameters

path – str with path to load pipeline core from

MergePipeline

class ml_investment.pipelines.MergePipeline(pipeline_list: List, execute_merge_on)

Bases: object

Class combining a list of pipelines into a single pipeline.

Parameters
  • pipeline_list – list of objects implementing the fit(index) and execute(index) -> pd.DataFrame interfaces. Order matters: results are merged from left to right during execute().

  • execute_merge_on – column names on which to merge the pipelines' results.
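The left-to-right merge order can be pictured as a fold over the individual results. The sketch below uses pandas.merge with functools.reduce; merge_left_to_right is a hypothetical helper, the frames are made up, and the how="left" join type is an assumption, since this section does not state which join MergePipeline uses.

```python
from functools import reduce

import pandas as pd


def merge_left_to_right(frames, on, how="left"):
    # Fold the list from left to right: ((f0 merge f1) merge f2) ...
    # The join type `how` is an assumption for this sketch.
    return reduce(lambda left, right: pd.merge(left, right, on=on, how=how), frames)


# Made-up results of three pipelines' execute() calls.
prices = pd.DataFrame({"ticker": ["AAPL", "MSFT"], "price": [150.0, 300.0]})
fair_value = pd.DataFrame({"ticker": ["AAPL", "MSFT"], "y_0": [160.0, 290.0]})
risk = pd.DataFrame({"ticker": ["MSFT"], "risk": [0.2]})

merged = merge_left_to_right([prices, fair_value, risk], on=["ticker"])
```

With a left join, the first pipeline in the list determines which rows survive, which is one reason the order of pipeline_list can matter.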

execute(index, batch_size=None) -> pandas.core.frame.DataFrame

Interface for executing pipeline for tickers. Features will be based on data from data_loader

Parameters
  • index – identifiers to execute the pipelines for, e.g. a list of company tickers

  • batch_size – size of the batches into which execution is split (may be useful for lowering memory usage), OR None (to execute on the full index at once)

Returns

combined pipelines execute result

Return type

pd.DataFrame
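The batch_size behaviour described above can be sketched as slicing the index into chunks, executing each chunk, and concatenating the results. This is an illustration under assumptions, not the library's implementation: execute_in_batches and fake_execute are hypothetical, and the use of pd.concat is a guess at how the pieces might be recombined.

```python
import pandas as pd


def execute_in_batches(execute, index, batch_size=None):
    # batch_size=None means a single full-size call.
    if batch_size is None:
        return execute(index)
    # Otherwise run the callable on consecutive slices of the index.
    parts = [
        execute(index[start:start + batch_size])
        for start in range(0, len(index), batch_size)
    ]
    return pd.concat(parts, ignore_index=True)


calls = []


def fake_execute(tickers):
    # Stand-in for a pipeline's execute(); records what it was called with.
    calls.append(list(tickers))
    return pd.DataFrame({
        "ticker": list(tickers),
        "y_0": [float(len(t)) for t in tickers],
    })


tickers = ["AAPL", "MSFT", "TSLA", "NVDA", "F"]
batched = execute_in_batches(fake_execute, tickers, batch_size=2)
full = execute_in_batches(fake_execute, tickers)
```

Only one batch of intermediate data needs to be held at a time, which is where the memory saving would come from.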

fit(index)

Interface for training all pipelines

Parameters

index – identifiers to fit the pipelines on, e.g. a list of company tickers

LoadingPipeline

class ml_investment.pipelines.LoadingPipeline(data_loader, columns: List[str])

Bases: object

Wrapper that exposes a data loader through the execute(index) -> pd.DataFrame interface

Parameters
  • data_loader – object implementing the load(index) -> pd.DataFrame interface

  • columns – names of the columns to load

execute(index)

Interface for executing the pipeline (loading data) for tickers.

Parameters

index – identifiers to load data for, e.g. a list of tickers

Returns

the loaded data

Return type

pd.DataFrame
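A wrapper of this kind can be pictured roughly as follows. This is a sketch under assumptions rather than the library's code: LoadingPipelineSketch and InMemoryLoader are hypothetical names, the column selection via DataFrame indexing is a guess, and treating fit() as a no-op is an assumption because a loader has nothing to train.

```python
import pandas as pd


class InMemoryLoader:
    """Hypothetical loader implementing load(index) -> pd.DataFrame."""

    def __init__(self, table):
        self._table = table

    def load(self, index):
        # Keep only the rows for the requested tickers.
        rows = self._table[self._table["ticker"].isin(index)]
        return rows.reset_index(drop=True)


class LoadingPipelineSketch:
    """Illustrative stand-in exposing a loader through execute(index)."""

    def __init__(self, data_loader, columns):
        self.data_loader = data_loader
        self.columns = columns

    def fit(self, index):
        # Assumed no-op: loading data requires no training.
        return None

    def execute(self, index):
        # Load the data and keep only the requested columns.
        return self.data_loader.load(index)[self.columns]


# Made-up numbers, for illustration only.
table = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "TSLA"],
    "price": [150.0, 300.0, 700.0],
    "volume": [1000, 2000, 3000],
})
pipeline = LoadingPipelineSketch(InMemoryLoader(table), columns=["ticker", "price"])
loaded = pipeline.execute(["AAPL", "TSLA"])
```

Because the wrapper has both fit(index) and execute(index), an object like this satisfies the interface that MergePipeline requires of the entries in pipeline_list.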

fit(index)