Pipelines
Collection of pipelines
Pipeline
- class ml_investment.pipelines.Pipeline(data: Dict, feature, target, model, out_name=None)
Bases: object
Encapsulates feature and target calculation, model training, and validation during the fit phase, and feature calculation and model prediction during the execute phase. Supports multiple targets with different models and metrics.
- Parameters
data – dict containing the fields needed to calculate features and targets. Each value should be an object implementing the load(index) -> pd.DataFrame interface.
feature – feature calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface.
target – target calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface, OR a List of such target calculators.
model – object implementing the fit(X, y) and predict(X) interfaces (a copy of the model is used for every single target if target is a List), OR a List of such objects (the length of this list should equal the length of target).
out_name – str column name of the result in the pd.DataFrame returned by execute(), OR List[str] (the length of this list should equal the length of target), OR None (['y_0', 'y_1', ...] will be used in this case).
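The duck-typed interfaces listed above can be sketched as follows. This is a minimal illustration, not code from ml_investment: the class names (ToyLoader, ToyFeatures, ToyTarget, MeanModel), the "quarterly" key and the column names are all hypothetical, and the sketch only exercises the method shapes the parameters describe rather than constructing a real Pipeline.

```python
from typing import Dict, List

import pandas as pd


class ToyLoader:
    """Hypothetical data source exposing load(index) -> pd.DataFrame."""

    def load(self, index: List[str]) -> pd.DataFrame:
        # Fake "revenue" derived from the ticker length, for illustration only.
        rows = [{"ticker": t, "revenue": float(len(t))} for t in index]
        return pd.DataFrame(rows)


class ToyFeatures:
    """Hypothetical feature calculator: calculate(data, index) -> pd.DataFrame."""

    def calculate(self, data: Dict, index: List[str]) -> pd.DataFrame:
        df = data["quarterly"].load(index)
        return df.set_index("ticker")[["revenue"]]


class ToyTarget:
    """Hypothetical target calculator with the same calculate() shape."""

    def calculate(self, data: Dict, index: List[str]) -> pd.DataFrame:
        df = data["quarterly"].load(index)
        doubled = df.set_index("ticker")[["revenue"]] * 2.0
        return doubled.rename(columns={"revenue": "y"})


class MeanModel:
    """Hypothetical model with fit(X, y) and predict(X): predicts the mean of y."""

    def fit(self, X, y):
        self.mean_ = float(sum(y)) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


data = {"quarterly": ToyLoader()}
index = ["AAPL", "TSLA", "F"]

X = ToyFeatures().calculate(data, index)
y = ToyTarget().calculate(data, index)["y"]
pred = MeanModel().fit(X, y).predict(X)
```

Objects shaped like these would be the kind of values passed as data, feature, target and model; any scikit-learn style estimator already satisfies the fit(X, y) / predict(X) part.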
- execute(index)
Interface for executing the pipeline for tickers. Features will be based on data from data_loader.
- Parameters
index – execution identifiers (e.g. a list of tickers to run model prediction for)
- Returns
result values in columns named according to the out_name param of __init__()
- Return type
pd.DataFrame
- export_core(path=None)
Interface for saving the pipeline core.
- Parameters
path – str with the path to store the pipeline core, OR None (the path will be generated automatically)
- fit(index: typing.List[str], metric=None, target_filter_foo=<function nan_mask>)
Interface for fitting the pipeline model for tickers. Features and targets will be based on data from data_loader.
- Parameters
index – fit identifiers (e.g. a list of tickers to fit the model for)
metric – function implementing the foo(gt, y) -> float interface (the same metric is used for every single target if target is a List), OR a List of such functions (the length of this list should equal the length of target).
target_filter_foo – function for filtering samples according to target values. It should implement the foo(arr) -> np.array[bool] interface, and the length of the resulting array should equal the length of arr. OR a List of such functions (the length of this list should equal the length of target).
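A metric and a target filter with the shapes described above might look like the sketch below. Both function names are hypothetical, the filter is only a guess at what a NaN-based mask such as the default nan_mask could do, and the assumption that True marks a sample to keep is not confirmed by the source.

```python
import numpy as np


def keep_non_nan(arr) -> np.ndarray:
    """Hypothetical target filter: foo(arr) -> np.array[bool].

    Assumes True means "keep this sample"; returns one flag per element.
    """
    return ~np.isnan(np.asarray(arr, dtype=float))


def mean_abs_error(gt, y) -> float:
    """Hypothetical metric: foo(gt, y) -> float (mean absolute error)."""
    gt = np.asarray(gt, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(gt - y)))


target = np.array([1.0, np.nan, 3.0, 5.0])
pred = np.array([1.5, 2.0, 2.0, 5.0])

mask = keep_non_nan(target)                       # same length as target
score = mean_abs_error(target[mask], pred[mask])  # scored on kept samples only
```

Functions of this shape could be passed as metric and target_filter_foo, either singly or as lists with one entry per target.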
MergePipeline
- class ml_investment.pipelines.MergePipeline(pipeline_list: List, execute_merge_on)
Bases: object
Class combining a list of pipelines into a single pipeline.
- Parameters
pipeline_list – list of objects implementing the fit(index) and execute(index) -> pd.DataFrame interfaces. Order is important: results are merged from left to right during execute().
execute_merge_on – column names on which to merge the pipelines' results.
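The left-to-right merge on execute_merge_on can be pictured with plain pandas, as in the sketch below. The helper name merge_left_to_right, the sample frames and the choice of a left join are assumptions made for illustration; the source does not state which join type MergePipeline uses.

```python
from functools import reduce

import pandas as pd

# Stand-ins for the DataFrames returned by two pipelines' execute() calls.
predictions = pd.DataFrame({"ticker": ["AAPL", "TSLA"], "y_0": [1.0, 2.0]})
sectors = pd.DataFrame({"ticker": ["TSLA", "AAPL"], "sector": ["Auto", "Tech"]})


def merge_left_to_right(frames, on):
    """Fold the frames into one, merging each onto the accumulated result.

    how="left" is an assumption: rows of the first frame are preserved.
    """
    return reduce(lambda acc, df: pd.merge(acc, df, on=on, how="left"), frames)


merged = merge_left_to_right([predictions, sectors], on=["ticker"])
```

Because the fold starts from the first frame, reordering the list changes which rows and which row order survive, which is one way to read the note that order is important.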
- execute(index, batch_size=None) -> pandas.core.frame.DataFrame
Interface for executing the pipeline for tickers. Features will be based on data from data_loader.
- Parameters
index – identifiers for executing the pipelines (e.g. a list of company tickers)
batch_size – size of the batches into which execution is split (may be useful for lowering memory usage), OR None (to execute on the full index at once)
- Returns
combined result of executing the pipelines
- Return type
pd.DataFrame
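The effect of batch_size on the index can be sketched as a simple chunking helper. The function iter_batches is hypothetical and only shows how an index might be split; how the library combines the per-batch results is not described in the source.

```python
from typing import Iterator, List, Optional


def iter_batches(index: List[str], batch_size: Optional[int]) -> Iterator[List[str]]:
    """Yield consecutive slices of index, or the whole index if batch_size is None."""
    if batch_size is None:
        yield list(index)
        return
    for start in range(0, len(index), batch_size):
        yield index[start:start + batch_size]


tickers = ["AAPL", "TSLA", "NVDA", "MSFT", "AMZN"]
batches = list(iter_batches(tickers, batch_size=2))
full = list(iter_batches(tickers, batch_size=None))
```

Smaller batches mean fewer rows are held in memory per step, at the cost of more calls.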
LoadingPipeline
- class ml_investment.pipelines.LoadingPipeline(data_loader, columns: List[str])
Bases: object
Wrapper that exposes a data loader through the execute(index) -> pd.DataFrame interface.
- Parameters
data_loader – object implementing the load(index) -> pd.DataFrame interface
columns – column names to load
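The wrapping idea can be illustrated with a small stand-in. LoadingSketch and ToyBaseLoader are hypothetical classes written for this example, and selecting the columns after calling load() is an assumption about the behaviour, not the library's actual implementation.

```python
from typing import List

import pandas as pd


class ToyBaseLoader:
    """Hypothetical loader exposing load(index) -> pd.DataFrame."""

    def load(self, index: List[str]) -> pd.DataFrame:
        return pd.DataFrame({
            "ticker": list(index),
            "sector": ["Tech"] * len(index),
            "extra": list(range(len(index))),
        })


class LoadingSketch:
    """Adapts a loader's load() to the execute(index) -> pd.DataFrame shape."""

    def __init__(self, data_loader, columns: List[str]):
        self.data_loader = data_loader
        self.columns = columns

    def execute(self, index: List[str]) -> pd.DataFrame:
        # Assumed behaviour: load everything, then keep the requested columns.
        return self.data_loader.load(index)[self.columns]


result = LoadingSketch(ToyBaseLoader(), ["ticker", "sector"]).execute(["AAPL", "TSLA"])
```

An adapter of this shape lets raw loaded columns sit next to model outputs when several execute() results are merged on a shared key such as the ticker column.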