Pipelines
Collection of pipelines
Pipeline
- class ml_investment.pipelines.Pipeline(data: Dict, feature, target, model, out_name=None)
Bases: object
Class encapsulating feature and target calculation, model training and validation during the fit phase, and feature calculation and model prediction during the execute phase. Supports multiple targets with different models and metrics.
- Parameters
data – dict containing the fields needed for features and targets. The dict should contain classes implementing the load(index) -> pd.DataFrame interface.
feature – feature calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface.
target – target calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface, OR a List of such target calculators.
model – class implementing the fit(X, y) and predict(X) interfaces (a copy of the model is used for every single target if target is a List), OR a List of such classes (the length of this list should equal the length of target).
out_name – str column name of the result in the pd.DataFrame returned by execute(), OR List[str] (the length of this list should equal the length of target), OR None (['y_0', 'y_1', ...] is used in this case).
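The interfaces listed above are duck-typed, so any objects with the right methods should fit. The following is a minimal sketch of such objects, not taken from the library: the class names `DictLoader`, `RevenueFeatures`, `NextRevenueTarget` and `MeanModel`, the `ticker`-indexed frames, the `y` target column and the toy numbers are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


class DictLoader:
    """Hypothetical loader implementing load(index) -> pd.DataFrame."""

    def __init__(self, frame: pd.DataFrame):
        self._frame = frame

    def load(self, index):
        # Return only the rows for the requested tickers.
        return self._frame[self._frame["ticker"].isin(index)].reset_index(drop=True)


class RevenueFeatures:
    """Hypothetical feature calculator: calculate(data, index) -> pd.DataFrame."""

    def calculate(self, data, index):
        df = data["quarterly"].load(index)
        return df.set_index("ticker")[["revenue"]]


class NextRevenueTarget:
    """Hypothetical target calculator with the same calculate() signature."""

    def calculate(self, data, index):
        df = data["quarterly"].load(index)
        # Assumption: the target is exposed in a column named "y".
        return df.set_index("ticker")[["next_revenue"]].rename(columns={"next_revenue": "y"})


class MeanModel:
    """Toy model implementing fit(X, y) and predict(X)."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Made-up data, for illustration only.
frame = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "revenue": [10.0, 20.0, 30.0],
    "next_revenue": [12.0, 22.0, 32.0],
})
data = {"quarterly": DictLoader(frame)}
feature, target, model = RevenueFeatures(), NextRevenueTarget(), MeanModel()

# Exercise the interfaces directly, without the library.
X = feature.calculate(data, ["AAA", "BBB"])
y = target.calculate(data, ["AAA", "BBB"])
pred = model.fit(X, y["y"]).predict(X)

# With the library installed, these objects would be passed as (not run here):
# pipeline = Pipeline(data=data, feature=feature, target=target, model=model, out_name="y_pred")
```

The sketch only checks that each object satisfies its documented signature; the index layout the real calculators are expected to return may differ.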
- execute(index)
Interface for executing the pipeline for tickers. Features will be based on data from data_loader.
- Parameters
index – execute identification (i.e. list of tickers to predict for)
- Returns
result values in columns named as the out_name param in __init__()
- Return type
pd.DataFrame
- export_core(path=None)
Interface for saving the pipeline core.
- Parameters
path – str with the path to store the pipeline core, OR None (the path will be generated automatically)
- fit(index: typing.List[str], metric=None, target_filter_foo=<function nan_mask>)
Interface to fit the pipeline model for tickers. Features and target will be based on data from data_loader.
- Parameters
index – fit identification (i.e. list of tickers to fit the model for)
metric – function implementing the foo(gt, y) -> float interface (the same metric is used for every single target if target is a List), OR a List of such functions (the length of this list should equal the length of target)
target_filter_foo – function for filtering samples according to target values. It should implement the foo(arr) -> np.array[bool] interface, and the length of the resulting array should equal the length of arr. OR a List of such functions (the length of this list should equal the length of target)
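As an illustration of the two callable shapes described for fit(), here is a small sketch of a metric and a target filter. The names `mean_abs_error` and `finite_mask` are hypothetical, and the convention that True marks a sample to keep is an assumption; the behaviour of the default `nan_mask` is not specified in this section.

```python
import numpy as np


def mean_abs_error(gt, y) -> float:
    """Metric with the documented foo(gt, y) -> float shape."""
    gt, y = np.asarray(gt, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(np.abs(gt - y)))


def finite_mask(arr) -> np.ndarray:
    """Filter with the documented foo(arr) -> np.array[bool] shape.

    Assumption: True marks a sample to keep.
    """
    return np.isfinite(np.asarray(arr, dtype=float))


# Made-up target values and predictions.
targets = np.array([1.0, np.nan, 3.0, np.inf])
predictions = np.array([1.5, 2.0, 2.0, 4.0])

mask = finite_mask(targets)  # same length as targets
score = mean_abs_error(targets[mask], predictions[mask])
```

Callables like these could then be passed as `metric=mean_abs_error` and `target_filter_foo=finite_mask`, or as lists with one entry per target.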
MergePipeline
- class ml_investment.pipelines.MergePipeline(pipeline_list: List, execute_merge_on)
Bases: object
Class combining a list of pipelines into a single pipeline.
- Parameters
pipeline_list – list of classes implementing the fit(index) and execute(index) -> pd.DataFrame() interfaces. Order is important: merging results during execute() is done from left to right.
execute_merge_on – column names to merge the pipelines' results on.
- execute(index, batch_size=None) -> pandas.core.frame.DataFrame
Interface for executing the pipeline for tickers. Features will be based on data from data_loader.
- Parameters
index – identifiers for executing the pipelines, i.e. a list of company tickers
batch_size – size of the batches the execution is split into (may be useful for lower memory usage), OR None (for full-size execution)
- Returns
combined execute result of the pipelines
- Return type
pd.DataFrame
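The left-to-right merge and the optional batching can be pictured with the sketch below. It is not the library's implementation: `ConstantPipeline` and `merged_execute` are hypothetical names, and the choice of a left join and of concatenating per-batch results are assumptions.

```python
from functools import reduce

import pandas as pd


class ConstantPipeline:
    """Hypothetical stand-in with the documented execute(index) -> pd.DataFrame."""

    def __init__(self, column, value):
        self.column, self.value = column, value

    def execute(self, index):
        return pd.DataFrame({"ticker": list(index), self.column: self.value})


def merged_execute(pipelines, index, merge_on, batch_size=None):
    """Run every pipeline per batch, merge left to right, then stack the batches."""
    index = list(index)
    step = batch_size or len(index) or 1
    # batch_size=None yields a single full-size batch.
    batches = [index[i:i + step] for i in range(0, len(index), step)] or [index]
    frames = []
    for batch in batches:
        results = [p.execute(batch) for p in pipelines]
        # Assumption: a left join, so rows follow the first pipeline's result.
        merged = reduce(lambda left, right: left.merge(right, on=merge_on, how="left"), results)
        frames.append(merged)
    return pd.concat(frames, ignore_index=True)


pipelines = [ConstantPipeline("y_0", 1.0), ConstantPipeline("y_1", 2.0)]
result = merged_execute(pipelines, ["AAA", "BBB", "CCC"], merge_on=["ticker"], batch_size=2)
```

Processing two tickers at a time keeps only one batch of intermediate frames in memory, which is the kind of saving the batch_size parameter is described as offering.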
LoadingPipeline
- class ml_investment.pipelines.LoadingPipeline(data_loader, columns: List[str])
Bases: object
Wrapper for data loaders, exposing loaded data through the execute(index) -> pd.DataFrame interface.
- Parameters
data_loader – class implementing the load(index) -> pd.DataFrame interface
columns – column names for loading
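A wrapper of this kind can be sketched in a few lines. The classes `FrameLoader` and `LoadingPipelineSketch` below are hypothetical and only mirror the documented signatures; the no-op fit() is an assumption, added so the object would also satisfy the fit(index) interface that MergePipeline asks of its members.

```python
import pandas as pd


class FrameLoader:
    """Hypothetical loader implementing load(index) -> pd.DataFrame."""

    def __init__(self, frame):
        self._frame = frame

    def load(self, index):
        return self._frame[self._frame["ticker"].isin(index)].reset_index(drop=True)


class LoadingPipelineSketch:
    """Illustrative wrapper; not the library's implementation."""

    def __init__(self, data_loader, columns):
        self.data_loader = data_loader
        self.columns = columns

    def fit(self, index):
        # Assumption: nothing to train, the wrapper only forwards loaded data.
        return None

    def execute(self, index):
        # Load rows for the requested index and keep the configured columns.
        return self.data_loader.load(index)[self.columns]


# Made-up data, for illustration only.
frame = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "sector": ["tech", "energy"],
    "price": [1.0, 2.0],
})
loaded = LoadingPipelineSketch(FrameLoader(frame), ["ticker", "sector"]).execute(["BBB"])
```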