Features
Collection of feature calculators
QuarterlyFeatures
- class ml_investment.features.QuarterlyFeatures(data_key: str, columns: typing.List[str], quarter_counts: typing.List[int] = [2, 4, 10], max_back_quarter: int = 10, min_back_quarter: int = 0, stats: typing.Dict[str, typing.Callable] = {'max': <function amax>, 'mean': <function mean>, 'median': <function median>, 'min': <function amin>, 'std': <function std>}, calc_stats_on_diffs: bool = True, data_preprocessing: typing.Optional[typing.Callable] = None, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for quarterly-based statistics. Returns features for company quarter slices.
- Parameters
data_key – key of the dataloader in the data argument passed to calculate()
columns – column names for feature calculation (e.g. revenue, debt)
quarter_counts – list of quarter counts for statistics calculation. E.g. if quarter_counts = [2], statistics are calculated on the current and previous quarters
max_back_quarter – upper bound of company slices in time. If max_back_quarter = 1, features are calculated only for the current company quarter. If max_back_quarter exceeds the total number of quarters available for a company, features are calculated for all quarters
min_back_quarter – lower bound of company slices in time. If min_back_quarter = 0 (default), features are calculated for all quarters. If min_back_quarter = 2, the current and previous quarter slices are not used for feature calculation
stats – aggregation functions for feature calculation, given as Dict[str, Callable]. The keys of this dict are used as feature name prefixes. The values should implement the foo(x: List) -> float interface
calc_stats_on_diffs – whether to calculate statistics on series diffs (np.diff(series))
data_preprocessing – function implementing the foo(x) -> x_ interface. It is applied before feature calculation
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict containing a field named by the data_key param of __init__(). This field should hold an object implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
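The way quarter_counts, stats and calc_stats_on_diffs combine can be illustrated with a small NumPy sketch. It is not the library's implementation: the helper quarter_stats, the feature-name format, the newest-first ordering of the series, and the choice to emit diff statistics alongside the plain ones are all assumptions made for illustration.

```python
import numpy as np

# Aggregation functions mirroring the documented `stats` default.
STATS = {"max": np.max, "mean": np.mean, "median": np.median,
         "min": np.min, "std": np.std}


def quarter_stats(series, quarter_counts=(2, 4), stats=STATS,
                  calc_stats_on_diffs=True):
    """Hypothetical helper: aggregate the most recent quarters of one column.

    `series` is assumed to be ordered from newest to oldest quarter.
    """
    values = np.asarray(series, dtype=float)
    features = {}
    for count in quarter_counts:
        window = values[:count]  # current quarter plus (count - 1) previous
        for name, func in stats.items():
            features[f"quarter{count}_{name}"] = float(func(window))
            if calc_stats_on_diffs and len(window) > 1:
                # Same statistic, computed on quarter-to-quarter differences.
                diffs = np.diff(window)
                features[f"quarter{count}_diff_{name}"] = float(func(diffs))
    return features


revenue = [120.0, 100.0, 90.0, 80.0]  # newest quarter first
feats = quarter_stats(revenue, quarter_counts=[2])
print(feats["quarter2_mean"])       # mean of the two latest quarters
print(feats["quarter2_diff_mean"])  # mean of np.diff over the same window
```

With quarter_counts = [2] only the current and previous quarters enter each statistic, which matches the parameter description above.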
QuarterlyDiffFeatures
- class ml_investment.features.QuarterlyDiffFeatures(data_key: str, columns: List[str], compare_quarter_idxs: List[int] = [1, 4], max_back_quarter: int = 10, min_back_quarter: int = 0, norm: bool = True, data_preprocessing: Optional[Callable] = None, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for evaluating quarter-to-quarter progress of company indicators (revenue, debt etc). Returns features for company quarter slices.
- Parameters
data_key – key of the dataloader in the data argument passed to calculate()
columns – column names for feature calculation (e.g. revenue, debt)
compare_quarter_idxs – list of back-quarter indexes for progress calculation. E.g. if compare_quarter_idxs = [1], the current quarter is compared with the previous quarter. If compare_quarter_idxs = [4], the current quarter is compared with the same quarter of the previous year
max_back_quarter – upper bound of company slices in time. If max_back_quarter = 1, features are calculated only for the current company quarter. If max_back_quarter exceeds the total number of quarters available for a company, features are calculated for all quarters
min_back_quarter – lower bound of company slices in time. If min_back_quarter = 0 (default), features are calculated for all quarters. If min_back_quarter = 2, the current and previous quarter slices are not used for feature calculation
norm – whether to normalize by the compared quarter
data_preprocessing – function implementing the foo(x) -> x_ interface. It is applied before feature calculation
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict containing a field named by the data_key param of __init__(). This field should hold an object implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
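A rough sketch of what comparing the current quarter with earlier ones could look like is given below. The helper quarter_diff, the feature names, the newest-first ordering, and the normalization formula (difference divided by the magnitude of the compared quarter) are assumptions, not the library's documented behaviour.

```python
import numpy as np


def quarter_diff(series, compare_quarter_idxs=(1, 4), norm=True):
    """Hypothetical helper: compare the current quarter with earlier ones.

    `series` is assumed to be ordered from newest to oldest quarter.
    """
    values = np.asarray(series, dtype=float)
    features = {}
    for idx in compare_quarter_idxs:
        if idx >= len(values):
            features[f"compare{idx}"] = float("nan")  # not enough history
            continue
        diff = values[0] - values[idx]
        if norm:
            # Assumed normalization: scale by the compared quarter's magnitude.
            diff = diff / abs(values[idx])
        features[f"compare{idx}"] = float(diff)
    return features


revenue = [120.0, 100.0, 90.0, 80.0, 60.0]  # newest quarter first
progress = quarter_diff(revenue)
print(progress)
```

Here compare1 measures growth against the previous quarter and compare4 against the same quarter one year earlier, mirroring the compare_quarter_idxs = [1, 4] default.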
BaseCompanyFeatures
- class ml_investment.features.BaseCompanyFeatures(data_key: str, cat_columns: List[str], verbose: bool = False)
Bases: object
Feature calculator for base company information (sector, industry etc). Encodes categorical columns via hashing label encoding. Returns features for the current company state.
- Parameters
data_key – key of the dataloader in the data argument passed to calculate()
cat_columns – column names of categorical features for encoding
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict containing a field named by the data_key param of __init__(). This field should hold an object implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker']. Each row contains features for the ticker company
- Return type
pd.DataFrame
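Hash-based label encoding of categorical columns can be sketched as follows. The exact hash used by the library is not stated in this page, so the SHA-256 based hash_encode function and its n_buckets parameter are hypothetical stand-ins that only demonstrate the idea: equal category strings always map to the same integer, without fitting an encoder first.

```python
import hashlib

import pandas as pd


def hash_encode(value, n_buckets=2**16):
    """Hypothetical encoder: map a category string to a stable integer."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


# Toy base data with one row per ticker (illustrative values only).
base = pd.DataFrame(
    {"ticker": ["AAPL", "TSLA", "MSFT"],
     "sector": ["Technology", "Consumer Cyclical", "Technology"]}
).set_index("ticker")

# Encode every categorical column value by value.
encoded = base[["sector"]].apply(lambda col: col.map(hash_encode))
print(encoded)
```

Because the mapping depends only on the string itself, the same sector receives the same code across separate calculate() calls and across datasets.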
DailyAggQuarterFeatures
- class ml_investment.features.DailyAggQuarterFeatures(daily_data_key: str, quarterly_data_key: str, columns: typing.List[str], agg_day_counts: typing.List[typing.Union[int, numpy.timedelta64]] = [100, 200], max_back_quarter: int = 10, min_back_quarter: int = 0, daily_index=None, stats: typing.Dict[str, typing.Callable] = {'max': <function amax>, 'mean': <function mean>, 'median': <function median>, 'min': <function amin>, 'std': <function std>}, norm: bool = True, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for daily-based statistics over quarter slices. Returns features for company quarter slices.
- Parameters
daily_data_key – key of the dataloader in the data argument passed to calculate(), used for loading daily data
quarterly_data_key – key of the dataloader in the data argument passed to calculate(), used for loading quarterly data
columns – column names for feature calculation (e.g. marketcap, pe)
agg_day_counts – list of day counts to calculate statistics on. E.g. if agg_day_counts = [100, 200], statistics are calculated over the last 100 and the last 200 days (separately)
max_back_quarter – upper bound of company slices in time. If max_back_quarter = 1, features are calculated only for the current company quarter. If max_back_quarter exceeds the total number of quarters available for a company, features are calculated for all quarters
min_back_quarter – lower bound of company slices in time. If min_back_quarter = 0 (default), features are calculated for all quarters. If min_back_quarter = 2, the current and previous quarter slices are not used for feature calculation
daily_index – index for the data[daily_data_key] dataloader. If None, the index is the same as for data[quarterly_data_key]. E.g. to calculate commodity features with this class, daily_index may be a list of the commodity codes of interest. To calculate e.g. daily price features, daily_index should be None
stats – aggregation functions for feature calculation, given as Dict[str, Callable]. The keys of this dict are used as feature name prefixes. The values should implement the foo(x: List) -> float interface
norm – whether to normalize daily stats
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict containing fields named by the daily_data_key and quarterly_data_key params of __init__(). These fields should hold objects implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
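The idea of aggregating daily values over the last N days before a quarter date can be sketched with pandas. This is an illustration under assumptions rather than the library's code: the helper daily_window_stats, its feature names, counting days as rows, and normalizing by the most recent value in the window are all hypothetical, and numpy.timedelta64 entries in agg_day_counts are not handled.

```python
import numpy as np
import pandas as pd


def daily_window_stats(daily, quarter_date, agg_day_counts=(2, 4),
                       stats=None, norm=True):
    """Hypothetical helper: aggregate the last N daily rows before a date."""
    stats = stats or {"mean": np.mean, "max": np.max}
    # Keep only rows dated on or before the quarter date, newest first.
    past = daily[daily.index <= quarter_date].sort_index(ascending=False)
    features = {}
    for days in agg_day_counts:
        window = past.iloc[:days].to_numpy(dtype=float)
        if len(window) == 0:
            continue  # no daily history before this quarter date
        if norm:
            # Assumed normalization: divide by the latest value in the window.
            window = window / window[0]
        for name, func in stats.items():
            features[f"days{days}_{name}"] = float(func(window))
    return features


dates = pd.date_range("2021-01-01", periods=6, freq="D")
marketcap = pd.Series([10.0, 12.0, 11.0, 14.0, 16.0, 20.0], index=dates)
daily_feats = daily_window_stats(marketcap, pd.Timestamp("2021-01-05"))
print(daily_feats)
```

The row for 2021-01-06 is excluded because it lies after the quarter date, so each quarter slice only sees daily data that was available at that time.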
RelativeGroupFeatures
- class ml_investment.features.RelativeGroupFeatures(feature_calculator, group_data_key: str, group_col: str, relation_foo=<function RelativeGroupFeatures.<lambda>>, keep_group_feats=False, verbose: bool = False)
Bases: object
Feature calculator for features relative to a group median. E.g. calculates revenue growth relative to the sector/industry median.
- Parameters
feature_calculator – feature calculator implementing the calculate(data, index) -> pd.DataFrame interface. Its features are compared with the group medians
group_data_key – key of the dataloader in the data argument passed to calculate(), used for loading data that contains group_col
group_col – column name of the groups within which median values are calculated
relation_foo – function implementing the foo(x, y) -> z interface. E.g. if foo = lambda x, y: x - y, the resulting features are the difference between the current company features and the group median features
keep_group_feats – whether to return the group median features
verbose – whether to show progress
- calculate(data, index)
Interface to calculate features for tickers based on data
- Parameters
data – dict containing the field named by group_data_key as well as the fields required by feature_calculator. These fields should hold objects implementing the load(index) -> pd.DataFrame interface
index – index required by feature_calculator.calculate()
- Returns
resulting features with the same index as returned by feature_calculator.calculate()
- Return type
pd.DataFrame
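The role of relation_foo can be shown with a pandas sketch that relates each company's features to the median of its group. The toy data, the variable names, and the use of groupby().transform("median") are assumptions chosen for illustration and do not reproduce the library's internals.

```python
import pandas as pd

# Toy features, as they might come out of a wrapped feature calculator.
features = pd.DataFrame(
    {"ticker": ["AAPL", "MSFT", "TSLA", "F"],
     "revenue_growth": [0.10, 0.20, 0.50, 0.10]}
).set_index("ticker")

# Group label per ticker, i.e. the values of a hypothetical `group_col`.
groups = pd.Series(["Technology", "Technology", "Auto", "Auto"],
                   index=features.index, name="sector")


def relation_foo(x, y):
    """Relation implementing foo(x, y) -> z: difference to the group median."""
    return x - y


# Median of each feature within the company's group, aligned row by row.
group_median = features.groupby(groups).transform("median")
relative = relation_foo(features, group_median)
print(relative)
```

Swapping relation_foo for a ratio such as x / y would express each feature as a multiple of the group median instead of a difference.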
FeatureMerger
- class ml_investment.features.FeatureMerger(fc1, fc2, on=typing.Union[str, typing.List[str]])
Bases: object
Feature calculator that combines two other feature calculators. The results are merged with a left join.
- Parameters
fc1 – first feature calculator, implementing the calculate(data: Dict, index) -> pd.DataFrame interface
fc2 – second feature calculator, implementing the calculate(data: Dict, index) -> pd.DataFrame interface
on – column(s) on which to merge the results of the two calculate() calls
- calculate(data: Dict, index) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict containing the fields required by fc1 and fc2. These fields should hold objects implementing the load(index) -> pd.DataFrame interface
index – index for the feature calculators. E.g. for company features, index may be a list of tickers such as ['AAPL', 'TSLA']
- Returns
resulting merged features
- Return type
pd.DataFrame
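The left merge of two calculators' outputs can be sketched with plain pandas. The two toy frames stand in for the results of fc1.calculate() and fc2.calculate(); their column names and values are hypothetical, and the reset_index/set_index handling is one possible way to merge on an index level, not necessarily what the class does internally.

```python
import pandas as pd

# Stand-in for fc1 output: one row per (ticker, date) quarter slice.
quarterly = pd.DataFrame(
    {"ticker": ["AAPL", "AAPL", "TSLA"],
     "date": ["2021-03-31", "2020-12-31", "2021-03-31"],
     "revenue_mean": [1.0, 0.9, 0.5]}
).set_index(["ticker", "date"])

# Stand-in for fc2 output: one row per ticker, with TSLA missing.
base = pd.DataFrame({"ticker": ["AAPL"], "sector": [7]}).set_index("ticker")

# Left merge on "ticker": every row of the first result is kept.
merged = pd.merge(quarterly.reset_index(), base.reset_index(),
                  on="ticker", how="left")
merged = merged.set_index(["ticker", "date"])
print(merged)
```

Rows from the first calculator that have no match in the second one are kept with missing values, which is the practical consequence of a left join.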