Features
Collection of feature calculators
QuarterlyFeatures
- class ml_investment.features.QuarterlyFeatures(data_key: str, columns: typing.List[str], quarter_counts: typing.List[int] = [2, 4, 10], max_back_quarter: int = 10, min_back_quarter: int = 0, stats: typing.Dict[str, typing.Callable] = {'max': <function amax>, 'mean': <function mean>, 'median': <function median>, 'min': <function amin>, 'std': <function std>}, calc_stats_on_diffs: bool = True, data_preprocessing: typing.Optional[typing.Callable] = None, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for quarterly statistics. Returns features for company quarter slices.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
columns – column names for feature calculation (such as revenue, debt, etc.)
quarter_counts – list of quarter counts over which statistics are calculated. E.g. if quarter_counts = [2], statistics are calculated on the current and previous quarters
max_back_quarter – upper bound on company slices back in time. If max_back_quarter = 1, features are calculated only for the current company quarter. If max_back_quarter is larger than the total number of quarters for the company, features are calculated for all quarters
min_back_quarter – lower bound on company slices back in time. If min_back_quarter = 0 (default), features are calculated for all quarters. If min_back_quarter = 2, the current and previous quarter slices are not used for feature calculation
stats – aggregation functions for feature calculation, given as Dict[str, Callable]. The keys of this dict are used as feature name prefixes. The values should implement the foo(x: List) -> float interface
calc_stats_on_diffs – whether to calculate statistics on series diffs (np.diff(series))
data_preprocessing – function implementing the foo(x) -> x_ interface. It is applied before feature calculation
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict with a field named by the data_key param of __init__(). This field should contain an object implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
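The interplay of quarter_counts, stats, calc_stats_on_diffs and the two back-quarter bounds can be illustrated with a dependency-free sketch. It is not the library's implementation: the helper names, the feature-name layout and the assumption that a series is ordered from the most recent quarter backwards are all hypothetical and chosen only for illustration.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical quarterly revenue for one company, most recent quarter first
# (the ordering is an assumption of this sketch).
revenue = [120.0, 110.0, 100.0, 95.0, 90.0]

# Same shape as the `stats` parameter: name -> aggregation function.
STATS: Dict[str, Callable[[List[float]], float]] = {
    "max": max,
    "mean": statistics.mean,
    "min": min,
}


def diffs(series: List[float]) -> List[float]:
    """Consecutive differences, equivalent to np.diff(series)."""
    return [b - a for a, b in zip(series, series[1:])]


def window_stats(series, quarter_counts, stats, calc_stats_on_diffs=True,
                 column="revenue"):
    """Aggregate the first `count` quarters of `series` for every count."""
    feats = {}
    for count in quarter_counts:
        window = series[:count]
        for name, func in stats.items():
            # Illustrative naming only: the stat name is used as a prefix.
            feats[f"{name}_{column}_q{count}"] = func(window)
            if calc_stats_on_diffs and len(window) > 1:
                feats[f"{name}_{column}_diff_q{count}"] = func(diffs(window))
    return feats


def quarter_slices(series, max_back_quarter=10, min_back_quarter=0):
    """One slice per back-quarter offset within the two bounds."""
    upper = min(max_back_quarter, len(series))
    return [series[back:] for back in range(min_back_quarter, upper)]


features = window_stats(revenue, quarter_counts=[2, 4], stats=STATS)
only_current = quarter_slices(revenue, max_back_quarter=1)
skip_two = quarter_slices(revenue, min_back_quarter=2)
```

Here only_current holds a single slice for the current quarter, while skip_two drops the slices that start at the current and previous quarters, mirroring the parameter descriptions above.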
QuarterlyDiffFeatures
- class ml_investment.features.QuarterlyDiffFeatures(data_key: str, columns: List[str], compare_quarter_idxs: List[int] = [1, 4], max_back_quarter: int = 10, min_back_quarter: int = 0, norm: bool = True, data_preprocessing: Optional[Callable] = None, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator that evaluates the progress of company indicators (revenue, debt, etc.) from one quarter to another. Returns features for company quarter slices.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
columns – column names for feature calculation (such as revenue, debt, etc.)
compare_quarter_idxs – list of back-quarter indexes for progress calculation. E.g. if compare_quarter_idxs = [1], the current quarter is compared with the previous quarter. If compare_quarter_idxs = [4], the current quarter is compared with the same quarter of the previous year
max_back_quarter – upper bound on company slices back in time. If max_back_quarter = 1, features are calculated only for the current company quarter. If max_back_quarter is larger than the total number of quarters for the company, features are calculated for all quarters
min_back_quarter – lower bound on company slices back in time. If min_back_quarter = 0 (default), features are calculated for all quarters. If min_back_quarter = 2, the current and previous quarter slices are not used for feature calculation
norm – whether to normalize to the compared quarter
data_preprocessing – function implementing the foo(x) -> x_ interface. It is applied before feature calculation
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict with a field named by the data_key param of __init__(). This field should contain an object implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
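As a rough illustration of compare_quarter_idxs and norm, the sketch below compares the current quarter with earlier ones. The function name, the feature keys, the most-recent-first ordering and the choice to normalize by the absolute value of the compared quarter are assumptions made for this example; the library may compute the comparison differently.

```python
import math
from typing import Dict, List


def compare_quarters(series: List[float], compare_quarter_idxs: List[int],
                     norm: bool = True) -> Dict[str, float]:
    """Compare series[0] (current quarter) with series[idx] for each idx.

    `series` is assumed to be ordered from the most recent quarter backwards.
    """
    feats = {}
    for idx in compare_quarter_idxs:
        key = f"compare{idx}"  # illustrative feature name
        if idx >= len(series):
            # Not enough history to look that far back.
            feats[key] = math.nan
            continue
        change = series[0] - series[idx]
        if norm:
            # Assumed normalization: relative to the compared quarter.
            base = abs(series[idx])
            change = change / base if base else math.nan
        feats[key] = change
    return feats


# Hypothetical revenue, most recent quarter first.
revenue = [120.0, 110.0, 100.0, 95.0, 96.0]
progress = compare_quarters(revenue, compare_quarter_idxs=[1, 4])
raw_progress = compare_quarters(revenue, compare_quarter_idxs=[1, 4], norm=False)
too_far = compare_quarters(revenue, compare_quarter_idxs=[7])
```

With norm enabled the values read as relative growth (quarter-over-quarter for index 1, year-over-year for index 4); with norm disabled they are plain differences in the column's own units.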
BaseCompanyFeatures
- class ml_investment.features.BaseCompanyFeatures(data_key: str, cat_columns: List[str], verbose: bool = False)
Bases: object
Feature calculator for base company information (sector, industry, etc.). Encodes categorical columns via hash-based label encoding. Returns features for the current company state.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
cat_columns – column names of categorical features for encoding
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict with a field named by the data_key param of __init__(). This field should contain an object implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker']. Each row contains features for the ticker company
- Return type
pd.DataFrame
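Hash-based label encoding maps each category string to an integer without fitting a vocabulary, so the same sector or industry always receives the same code. The sketch below shows the general idea only: the hash function (SHA-256), the bucket count and the helper names are assumptions, not the scheme used by BaseCompanyFeatures.

```python
import hashlib
from typing import Dict, List


def hash_label(value: str, n_buckets: int = 2 ** 32) -> int:
    """Map a category string to a stable integer code."""
    # A cryptographic digest is used because Python's built-in hash()
    # of str is salted per process and therefore not reproducible.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def encode_company(row: Dict[str, str], cat_columns: List[str]) -> Dict[str, int]:
    """Encode only the requested categorical columns of one company row."""
    return {col: hash_label(str(row[col])) for col in cat_columns}


# Hypothetical base company information.
companies = {
    "AAPL": {"sector": "Technology", "industry": "Consumer Electronics"},
    "MSFT": {"sector": "Technology", "industry": "Software"},
}
encoded = {
    ticker: encode_company(row, cat_columns=["sector", "industry"])
    for ticker, row in companies.items()
}
```

Because no mapping is stored, unseen categories can be encoded at prediction time; the trade-off is that two different categories may, rarely, collide on the same code.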
DailyAggQuarterFeatures
- class ml_investment.features.DailyAggQuarterFeatures(daily_data_key: str, quarterly_data_key: str, columns: typing.List[str], agg_day_counts: typing.List[typing.Union[int, numpy.timedelta64]] = [100, 200], max_back_quarter: int = 10, min_back_quarter: int = 0, daily_index=None, stats: typing.Dict[str, typing.Callable] = {'max': <function amax>, 'mean': <function mean>, 'median': <function median>, 'min': <function amin>, 'std': <function std>}, norm: bool = True, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for daily-based statistics for quarter slices. Returns features for company quarter slices.
- Parameters
daily_data_key – key of the dataloader in the data argument of calculate() used for loading daily data
quarterly_data_key – key of the dataloader in the data argument of calculate() used for loading quarterly data
columns – column names for feature calculation (such as marketcap, pe)
agg_day_counts – list of day counts to calculate statistics on. E.g. if agg_day_counts = [100, 200], statistics are calculated separately over the last 100 and the last 200 days
max_back_quarter – upper bound on company slices back in time. If max_back_quarter = 1, features are calculated only for the current company quarter. If max_back_quarter is larger than the total number of quarters for the company, features are calculated for all quarters
min_back_quarter – lower bound on company slices back in time. If min_back_quarter = 0 (default), features are calculated for all quarters. If min_back_quarter = 2, the current and previous quarter slices are not used for feature calculation
daily_index – indexes for the data[daily_data_key] dataloader. If None, the index is the same as for data[quarterly_data_key]. E.g. to calculate commodity features, daily_index may be a list of the commodity codes of interest. To calculate e.g. daily price features, daily_index should be None
stats – aggregation functions for feature calculation, given as Dict[str, Callable]. The keys of this dict are used as feature name prefixes. The values should implement the foo(x: List) -> float interface
norm – whether to normalize daily stats
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict with fields named by the daily_data_key and quarterly_data_key params of __init__(). These fields should contain objects implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
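The idea of aggregating daily values as of a quarter date can be sketched without the library. Several details below are assumptions rather than documented behaviour: an integer entry of agg_day_counts is treated as a number of most recent daily rows, normalization divides by the latest value in the window, and the helper and feature names are made up for the example.

```python
import statistics
from datetime import date, timedelta

STATS = {"max": max, "mean": statistics.mean, "min": min}


def daily_window_stats(daily, quarter_date, agg_day_counts, stats, norm=True):
    """Aggregate the last N daily values known at `quarter_date`.

    `daily` is a list of (date, value) pairs in any order.
    """
    # Keep only rows available at the quarter date, most recent first.
    known = sorted(
        (row for row in daily if row[0] <= quarter_date),
        key=lambda row: row[0],
        reverse=True,
    )
    values = [value for _, value in known]
    feats = {}
    for count in agg_day_counts:
        window = values[:count]
        if not window:
            continue
        # Assumed normalization: scale by the most recent value in the window.
        scale = abs(window[0]) if norm and window[0] else 1.0
        for name, func in stats.items():
            feats[f"{name}_days{count}"] = func(window) / scale
    return feats


# Hypothetical daily market capitalization: 100, 101, ..., 109.
start = date(2021, 1, 1)
daily = [(start + timedelta(days=i), 100.0 + i) for i in range(10)]
quarter_date = date(2021, 1, 6)  # only the first six rows are visible

raw = daily_window_stats(daily, quarter_date, [3, 100], STATS, norm=False)
scaled = daily_window_stats(daily, quarter_date, [3, 100], STATS, norm=True)
```

Filtering by the quarter date before aggregating keeps later daily values out of the features for that quarter, which is the point of computing daily statistics per quarter slice.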
RelativeGroupFeatures
- class ml_investment.features.RelativeGroupFeatures(feature_calculator, group_data_key: str, group_col: str, relation_foo=<function RelativeGroupFeatures.<lambda>>, keep_group_feats=False, verbose: bool = False)
Bases: object
Feature calculator for features relative to a group median. E.g. it calculates revenue growth relative to the sector/industry median.
- Parameters
feature_calculator – feature calculator whose calculate() results are related to the group median
group_data_key – key of the dataloader in the data argument of calculate() used for loading data that contains group_col
group_col – column name of the groups within which median values are calculated
relation_foo – function implementing the foo(x, y) -> z interface. E.g. if foo = lambda x, y: x - y, the resulting features are calculated as the difference between the current company features and the group median features
keep_group_feats – whether to return the group median features
verbose – whether to show progress
- calculate(data, index)
Interface to calculate features for tickers based on data
- Parameters
data – dict with a field named by group_data_key and the keys required by feature_calculator. These fields should contain objects implementing the load(index) -> pd.DataFrame interface
index – index needed for feature_calculator.calculate()
- Returns
resulting features with the same index as returned by feature_calculator.calculate()
- Return type
pd.DataFrame
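A small sketch may help show what "relative to the group median" means in practice. It works on plain dicts rather than DataFrames, and the function name and data layout are hypothetical; only the relation lambda x, y: x - y comes from the parameter description above.

```python
import statistics
from collections import defaultdict


def relative_to_group(features, groups, relation_foo=lambda x, y: x - y):
    """Relate each company's features to the median of its group.

    features: {ticker: {feature_name: value}}
    groups:   {ticker: group label, e.g. sector}
    """
    # Collect the feature rows of every group.
    rows_by_group = defaultdict(list)
    for ticker, feats in features.items():
        rows_by_group[groups[ticker]].append(feats)

    # Median of every feature inside every group.
    medians = {
        group: {
            name: statistics.median(row[name] for row in rows)
            for name in rows[0]
        }
        for group, rows in rows_by_group.items()
    }

    # Apply the relation between company value (x) and group median (y).
    return {
        ticker: {
            name: relation_foo(value, medians[groups[ticker]][name])
            for name, value in feats.items()
        }
        for ticker, feats in features.items()
    }


# Hypothetical revenue growth per company and its sector.
features = {
    "AAA": {"growth": 0.10},
    "BBB": {"growth": 0.30},
    "CCC": {"growth": 0.20},
    "DDD": {"growth": 0.05},
}
groups = {"AAA": "Tech", "BBB": "Tech", "CCC": "Tech", "DDD": "Energy"}

relative = relative_to_group(features, groups)
ratio = relative_to_group(features, groups, relation_foo=lambda x, y: x / y)
```

Swapping relation_foo for a ratio, as in the last line, expresses each company as a multiple of its group median instead of an absolute gap.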
FeatureMerger
- class ml_investment.features.FeatureMerger(fc1, fc2, on=typing.Union[str, typing.List[str]])
Bases: object
Feature calculator that combines two other feature calculators. The results are merged with a left join.
- Parameters
fc1 – first feature calculator, implementing the calculate(data: Dict, index) -> pd.DataFrame interface
fc2 – second feature calculator, implementing the calculate(data: Dict, index) -> pd.DataFrame interface
on – columns on which to merge the results of the two calculate() calls
- calculate(data: Dict, index) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict with the fields needed by fc1 and fc2. These fields should contain objects implementing the load(index) -> pd.DataFrame interface
index – indexes for the feature calculators. E.g. if the features describe companies, index may be a list of tickers, such as ['AAPL', 'TSLA']
- Returns
resulting merged features
- Return type
pd.DataFrame
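The effect of a left merge on the two calculators' outputs can be shown with a dependency-free sketch over lists of dicts. It is only an analogy for the DataFrame merge: the function name and sample rows are hypothetical, and the sketch assumes the right-hand rows have unique keys.

```python
from typing import Dict, List, Union


def left_merge(left_rows: List[Dict], right_rows: List[Dict],
               on: Union[str, List[str]]) -> List[Dict]:
    """Keep every left row and attach matching right columns (None if absent)."""
    keys = [on] if isinstance(on, str) else list(on)

    # Columns contributed by the right side, in first-seen order.
    right_cols = []
    for row in right_rows:
        for name in row:
            if name not in keys and name not in right_cols:
                right_cols.append(name)

    # Index right rows by their key tuple (assumes unique keys on the right).
    lookup = {tuple(row[k] for k in keys): row for row in right_rows}

    merged = []
    for row in left_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        combined = dict(row)
        for name in right_cols:
            combined[name] = match.get(name)
        merged.append(combined)
    return merged


# Hypothetical outputs of two feature calculators.
quarterly = [
    {"ticker": "AAPL", "date": "2021-03-31", "revenue_mean": 1.2},
    {"ticker": "AAPL", "date": "2020-12-31", "revenue_mean": 1.1},
    {"ticker": "TSLA", "date": "2021-03-31", "revenue_mean": 0.4},
]
base = [{"ticker": "AAPL", "sector": 17}]

merged = left_merge(quarterly, base, on="ticker")
```

Every quarterly row survives the merge; the TSLA row simply has no sector value because the second calculator returned nothing for it.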