Welcome to ml_investment's documentation!
Installation
PyPI version
$ pip install ml-investment
Latest version from source
$ pip install git+https://github.com/fartuk/ml_investment
Configuration
You may use the config file ~/.ml_investment/config.json to change repository parameters, e.g. dataset download paths, model paths, etc.
Private information (e.g. API tokens for downloading private datasets) should be located at ~/.ml_investment/secrets.json
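As a minimal sketch of preparing such a file, the snippet below writes a config.json with two path overrides and reads it back. The keys yahoo_data_path and models_path are the ones mentioned in this documentation; the example values, the helper name write_config and the temporary directory standing in for ~/.ml_investment are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path


def write_config(config_dir: Path, params: dict) -> Path:
    """Write params to <config_dir>/config.json and return the file path."""
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps(params, indent=2))
    return config_path


# A temporary directory stands in for ~/.ml_investment so the sketch
# does not touch a real configuration.
demo_dir = Path(tempfile.mkdtemp()) / ".ml_investment"
demo_path = write_config(demo_dir, {
    "yahoo_data_path": "/data/yahoo",  # example value, not a library default
    "models_path": "/data/models",     # example value, not a library default
})
loaded = json.loads(demo_path.read_text())
```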
Quick Start
Use application model
There are several pre-defined fitted models at ml_investment.applications.
They encapsulate data and weights downloading, pipeline creation and model fitting, so you can use them without knowing their internal structure.
from ml_investment.applications.fair_marketcap_yahoo import FairMarketcapYahoo
fair_marketcap_yahoo = FairMarketcapYahoo()
fair_marketcap_yahoo.execute(['AAPL', 'FB', 'MSFT'])
ticker | date | fair_marketcap_yahoo |
---|---|---|
AAPL | 2020-12-31 | 5.173328e+11 |
FB | 2020-12-31 | 8.442045e+11 |
MSFT | 2020-12-31 | 4.501329e+11 |
Create your own pipeline
1. Download data
You may download the default datasets with the scripts in ml_investment.download_scripts
from ml_investment.download_scripts import download_yahoo
from ml_investment.utils import load_config
# Config located at ~/.ml_investment/config.json
config = load_config()
download_yahoo.main(config['yahoo_data_path'])
>>> 1365it [03:32, 6.42it/s]
>>> 1365it [01:49, 12.51it/s]
2. Create dict with dataloaders
You may choose from the default ml_investment.data_loaders or write your own. Each dataloader should have a load(index) interface.
from ml_investment.data_loaders.yahoo import YahooQuarterlyData, YahooBaseData
data = {}
data['quarterly'] = YahooQuarterlyData(config['yahoo_data_path'])
data['base'] = YahooBaseData(config['yahoo_data_path'])
3. Define and fit pipeline
You may specify all steps of pipeline creation. The base pipeline consists of the following steps:
- Create the data dict (done in the previous step)
- Define features. Features are values and characteristics that will be calculated for model training. Default feature calculators are located at ml_investment.features
- Define targets. A target is the final goal of the pipeline; it should represent some desired useful property. Default target calculators are located at ml_investment.targets
- Choose a model. The model is the machine learning algorithm, the core of the pipeline. It may also encapsulate validation and other logic. You may use wrappers from ml_investment.models
import lightgbm as lgbm
from ml_investment.utils import load_config, load_tickers
from ml_investment.features import QuarterlyFeatures, BaseCompanyFeatures,\
                                   FeatureMerger
from ml_investment.targets import BaseInfoTarget
from ml_investment.models import LogExpModel, GroupedOOFModel
from ml_investment.pipelines import Pipeline
from ml_investment.metrics import median_absolute_relative_error
fc1 = QuarterlyFeatures(data_key='quarterly',
                        columns=['netIncome',
                                 'cash',
                                 'totalAssets',
                                 'ebit'],
                        quarter_counts=[2, 4, 10],
                        max_back_quarter=1)
fc2 = BaseCompanyFeatures(data_key='base', cat_columns=['sector'])
feature = FeatureMerger(fc1, fc2, on='ticker')
target = BaseInfoTarget(data_key='base', col='enterpriseValue')
base_model = LogExpModel(lgbm.sklearn.LGBMRegressor())
model = GroupedOOFModel(base_model=base_model,
                        group_column='ticker',
                        fold_cnt=4)
pipeline = Pipeline(data=data,
                    feature=feature,
                    target=target,
                    model=model,
                    out_name='my_super_model')
tickers = load_tickers()['base_us_stocks']
pipeline.fit(tickers, metric=median_absolute_relative_error)
>>> {'metric_my_super_model': 0.40599471294301914}
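The metric passed to fit follows the foo(gt, y) -> float interface. As a rough illustration of what a median absolute relative error could look like, the sketch below assumes the definition median(|gt - y| / |gt|); the function name median_abs_rel_error_sketch is hypothetical and this is not taken from the library source.

```python
from statistics import median


def median_abs_rel_error_sketch(gt, y):
    """Median of |gt - y| / |gt| over all samples (assumed definition)."""
    return median(abs(g - p) / abs(g) for g, p in zip(gt, y))


# Relative errors are 0.1, 0.25 and 0.0, so the median is 0.1.
example_error = median_abs_rel_error_sketch([100.0, 200.0, 400.0],
                                            [110.0, 150.0, 400.0])
```

Under this reading, a value of about 0.41 would mean that half of the predictions are within roughly 41% of the true value.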
4. Run inference with your pipeline
Since ml_investment.models.GroupedOOFModel was used, there is no data leakage and you may use the pipeline on the same company tickers.
pipeline.execute(['AAPL', 'FB', 'MSFT'])
ticker | date | my_super_model |
---|---|---|
AAPL | 2020-12-31 | 8.170051e+11 |
FB | 2020-12-31 | 3.898840e+11 |
MSFT | 2020-12-31 | 3.540126e+11 |
Applications
Collection of pre-trained models
FairMarketcapYahoo
- ml_investment.applications.fair_marketcap_yahoo.FairMarketcapYahoo(pretrained=True) -> ml_investment.pipelines.Pipeline
The model estimates the fair company market capitalization for the last quarter. The pipeline uses features from BaseCompanyFeatures and QuarterlyFeatures and is trained to predict real market capitalizations (using QuarterlyTarget). Since some companies are overvalued and some are undervalued, the model makes an average "fair" prediction. yahoo is used for loading data.
- Parameters
pretrained – whether to use pretrained weights. If so, fair_marketcap_yahoo.pickle will be downloaded. The download directory can be changed via models_path in ~/.ml_investment/config.json
FairMarketcapSF1
- ml_investment.applications.fair_marketcap_sf1.FairMarketcapSF1(max_back_quarter: Optional[int] = None, min_back_quarter: Optional[int] = None, data_source: Optional[str] = None, pretrained: bool = True, verbose: Optional[bool] = None) -> ml_investment.pipelines.Pipeline
The model estimates the fair company market capitalization for the last several quarters. The pipeline uses features from BaseCompanyFeatures, QuarterlyFeatures, DailyAggQuarterFeatures and CommoditiesAggQuarterFeatures and is trained to predict real market capitalizations (using QuarterlyTarget). Since some companies are overvalued and some are undervalued, the model makes an average "fair" prediction. sf1 and quandl_commodities are used for loading data.
Note
The SF1 dataset is paid, so to use this model you need to subscribe and put your Quandl token into ~/.ml_investment/secrets.json under quandl_api_key
- Parameters
max_back_quarter – max quarter number which will be used in the model
min_back_quarter – min quarter number which will be used in the model
data_source – which data to use for the model. One of ['sf1', 'mongo']. If 'mongo', then data will be loaded from the database, with credentials specified in ~/.ml_investment/config.json. If 'sf1', then data will be loaded from the folder specified by sf1_data_path in ~/.ml_investment/secrets.json.
pretrained – whether to use pretrained weights. The download directory can be changed via models_path in ~/.ml_investment/config.json
verbose – whether to show progress
FairMarketcapDiffYahoo
- ml_investment.applications.fair_marketcap_diff_yahoo.FairMarketcapDiffYahoo(pretrained=True) -> ml_investment.pipelines.Pipeline
The model evaluates quarter-to-quarter (q2q) progress in company fundamentals. It uses QuarterlyDiffFeatures (q2q progress in results, e.g. a 30% revenue increase or a 15% decrease in debt), BaseCompanyFeatures and QuarterlyFeatures and tries to predict the smoothed real q2q market capitalization difference (DailySmoothedQuarterlyDiffTarget). The model prediction may therefore be interpreted as the "fair" market capitalization change implied by this q2q fundamental change. yahoo and daily_bars are used for loading data.
- Parameters
pretrained – whether to use pretrained weights. If so, fair_marketcap_diff_yahoo.pickle will be downloaded. The download directory can be changed via models_path in ~/.ml_investment/config.json
FairMarketcapDiffSF1
- ml_investment.applications.fair_marketcap_diff_sf1.FairMarketcapDiffSF1(max_back_quarter: Optional[int] = None, min_back_quarter: Optional[int] = None, data_source: Optional[str] = None, pretrained: bool = True, verbose: Optional[bool] = None) -> ml_investment.pipelines.Pipeline
The model evaluates quarter-to-quarter (q2q) progress in company fundamentals. It uses QuarterlyDiffFeatures (q2q progress in results, e.g. a 30% revenue increase or a 15% decrease in debt), BaseCompanyFeatures, QuarterlyFeatures and CommoditiesAggQuarterFeatures and tries to predict the real q2q market capitalization difference (QuarterlyDiffTarget). The model prediction may therefore be interpreted as the "fair" market capitalization change implied by this q2q fundamental change. sf1 is used for loading data.
Note
The SF1 dataset is paid, so to use this model you need to subscribe and put your Quandl token into ~/.ml_investment/secrets.json under quandl_api_key
- Parameters
max_back_quarter – max quarter number which will be used in the model
min_back_quarter – min quarter number which will be used in the model
data_source – which data to use for the model. One of ['sf1', 'mongo']. If 'mongo', then data will be loaded from the database, with credentials specified in ~/.ml_investment/config.json. If 'sf1', then data will be loaded from the folder specified by sf1_data_path in ~/.ml_investment/secrets.json.
pretrained – whether to use pretrained weights. The download directory can be changed via models_path in ~/.ml_investment/config.json
verbose – whether to show progress
MarketcapDownStdYahoo
- ml_investment.applications.marketcap_down_std_yahoo.MarketcapDownStdYahoo(pretrained=True) -> ml_investment.pipelines.Pipeline
The model predicts the future down-std value. The pipeline consists of time-series model training (TimeSeriesOOFModel) and validation on real marketcap down-std values (DailyAggTarget). The model prediction may be interpreted as "risk" for the next quarter. yahoo is used for loading data.
- Parameters
pretrained – whether to use pretrained weights. If so, marketcap_down_std_yahoo.pickle will be downloaded. The download directory can be changed via models_path in ~/.ml_investment/config.json
MarketcapDownStdSF1
- ml_investment.applications.marketcap_down_std_sf1.MarketcapDownStdSF1(max_back_quarter: Optional[int] = None, min_back_quarter: Optional[int] = None, data_source: Optional[str] = None, pretrained: bool = True, verbose: Optional[bool] = None) -> ml_investment.pipelines.Pipeline
The model predicts the future down-std value. The pipeline consists of time-series model training (TimeSeriesOOFModel) and validation on real marketcap down-std values (DailyAggTarget). The model prediction may be interpreted as "risk" for the next quarter. sf1 is used for loading data.
Note
The SF1 dataset is paid, so to use this model you need to subscribe and put your Quandl token into ~/.ml_investment/secrets.json under quandl_api_key
- Parameters
max_back_quarter – max quarter number which will be used in the model
min_back_quarter – min quarter number which will be used in the model
data_source – which data to use for the model. One of ['sf1', 'mongo']. If 'mongo', then data will be loaded from the database, with credentials specified in ~/.ml_investment/config.json. If 'sf1', then data will be loaded from the folder specified by sf1_data_path in ~/.ml_investment/secrets.json.
pretrained – whether to use pretrained weights. The download directory can be changed via models_path in ~/.ml_investment/config.json
verbose – whether to show progress
Features
Collection of feature calculators
QuarterlyFeatures
- class ml_investment.features.QuarterlyFeatures(data_key: str, columns: typing.List[str], quarter_counts: typing.List[int] = [2, 4, 10], max_back_quarter: int = 10, min_back_quarter: int = 0, stats: typing.Dict[str, typing.Callable] = {'max': <function amax>, 'mean': <function mean>, 'median': <function median>, 'min': <function amin>, 'std': <function std>}, calc_stats_on_diffs: bool = True, data_preprocessing: typing.Optional[typing.Callable] = None, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for quarterly-based statistics. Returns features for company quarter slices.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
columns – column names for feature calculation (like revenue, debt etc)
quarter_counts – list of numbers of quarters for statistics calculation. E.g. if quarter_counts = [2], then statistics will be calculated on the current and previous quarter
max_back_quarter – max bound of company slices in time. If max_back_quarter = 1, then features will be calculated only for the current company quarter. If max_back_quarter is larger than the total number of quarters for the company, then features will be calculated for all quarters
min_back_quarter – min bound of company slices in time. If min_back_quarter = 0 (default), then features will be calculated for all quarters. If min_back_quarter = 2, then the current and previous quarter slices will not be used for feature calculation
stats – aggregation functions for feature calculation. Should be a Dict[str, Callable]. Keys of this dict will be used as feature name prefixes. Values of this dict should implement the foo(x: List) -> float interface
calc_stats_on_diffs – whether to calculate statistics on series diffs (np.diff(series))
data_preprocessing – function implementing the foo(x) -> x_ interface. It will be applied before feature calculation.
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
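To make the roles of quarter_counts, stats and calc_stats_on_diffs more concrete, here is a simplified pure-Python sketch for a single column. It is not the library implementation: the helper names diff and window_stats, the newest-first ordering of the series and the "<stat>_<count>" feature naming are assumptions chosen for illustration.

```python
from statistics import mean


def diff(series):
    """Consecutive differences, like np.diff for a 1-D list."""
    return [b - a for a, b in zip(series, series[1:])]


def window_stats(values_newest_first, quarter_counts, stats, on_diffs=False):
    """Apply each stat to the most recent `count` quarters for every count."""
    features = {}
    for count in quarter_counts:
        window = values_newest_first[:count]
        if on_diffs:
            window = diff(window)
        for name, func in stats.items():
            # The stat name is used as a prefix of the feature name.
            features[f"{name}_{count}"] = func(window)
    return features


revenue = [120.0, 100.0, 90.0, 80.0]  # current quarter first (assumed order)
feats = window_stats(revenue, quarter_counts=[2, 4],
                     stats={"max": max, "mean": mean})
```

With quarter_counts=[2, 4], each statistic is computed once over the two most recent quarters and once over the four most recent quarters.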
QuarterlyDiffFeatures
- class ml_investment.features.QuarterlyDiffFeatures(data_key: str, columns: List[str], compare_quarter_idxs: List[int] = [1, 4], max_back_quarter: int = 10, min_back_quarter: int = 0, norm: bool = True, data_preprocessing: Optional[Callable] = None, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for evaluating the quarter-to-another-quarter progress of company indicators (revenue, debt etc). Returns features for company quarter slices.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
columns – column names for feature calculation (like revenue, debt etc)
compare_quarter_idxs – list of back quarter indexes for progress calculation. E.g. if compare_quarter_idxs = [1], then the current quarter will be compared with the previous quarter. If compare_quarter_idxs = [4], then the current quarter will be compared with the same quarter of the previous year.
max_back_quarter – max bound of company slices in time. If max_back_quarter = 1, then features will be calculated only for the current company quarter. If max_back_quarter is larger than the total number of quarters for the company, then features will be calculated for all quarters
min_back_quarter – min bound of company slices in time. If min_back_quarter = 0 (default), then features will be calculated for all quarters. If min_back_quarter = 2, then the current and previous quarter slices will not be used for feature calculation
norm – whether to normalize to the compared quarter
data_preprocessing – function implementing the foo(x) -> x_ interface. It will be applied before feature calculation.
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
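The effect of compare_quarter_idxs and norm can be sketched for one column as follows. This is an illustration, not the library code: the function name quarter_progress, the newest-first ordering and the choice to normalize by the absolute value of the compared quarter are assumptions.

```python
def quarter_progress(values_newest_first, compare_quarter_idxs, norm=True):
    """Change of the current quarter relative to each earlier quarter."""
    current = values_newest_first[0]
    progress = {}
    for idx in compare_quarter_idxs:
        if idx >= len(values_newest_first):
            progress[idx] = None  # not enough history for this comparison
            continue
        previous = values_newest_first[idx]
        change = current - previous
        if norm:
            # Assumed normalization: divide by the magnitude of the compared quarter.
            change = change / abs(previous) if previous != 0 else None
        progress[idx] = change
    return progress


revenue = [130.0, 100.0, 95.0, 90.0, 80.0]  # current quarter first (assumed order)
result = quarter_progress(revenue, compare_quarter_idxs=[1, 4])
```

Here index 1 yields the 30% increase over the previous quarter, and index 4 compares against the same quarter one year earlier.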
BaseCompanyFeatures
- class ml_investment.features.BaseCompanyFeatures(data_key: str, cat_columns: List[str], verbose: bool = False)
Bases: object
Feature calculator for getting base company information (sector, industry etc). Encodes categorical columns via hashing label encoding. Returns features for the current company state.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
cat_columns – column names of categorical features for encoding
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker']. Each row contains features for the ticker company
- Return type
pd.DataFrame
DailyAggQuarterFeatures
- class ml_investment.features.DailyAggQuarterFeatures(daily_data_key: str, quarterly_data_key: str, columns: typing.List[str], agg_day_counts: typing.List[typing.Union[int, numpy.timedelta64]] = [100, 200], max_back_quarter: int = 10, min_back_quarter: int = 0, daily_index=None, stats: typing.Dict[str, typing.Callable] = {'max': <function amax>, 'mean': <function mean>, 'median': <function median>, 'min': <function amin>, 'std': <function std>}, norm: bool = True, n_jobs: int = 2, verbose: bool = False)
Bases: object
Feature calculator for daily-based statistics for quarter slices. Returns features for company quarter slices.
- Parameters
daily_data_key – key of the dataloader in the data argument of calculate() for daily data loading
quarterly_data_key – key of the dataloader in the data argument of calculate() for quarterly data loading
columns – column names for feature calculation (like marketcap, pe)
agg_day_counts – list of day counts to calculate statistics on. E.g. if agg_day_counts = [100, 200], statistics will be calculated based on the last 100 and the last 200 days (separately).
max_back_quarter – max bound of company slices in time. If max_back_quarter = 1, then features will be calculated only for the current company quarter. If max_back_quarter is larger than the total number of quarters for the company, then features will be calculated for all quarters
min_back_quarter – min bound of company slices in time. If min_back_quarter = 0 (default), then features will be calculated for all quarters. If min_back_quarter = 2, then the current and previous quarter slices will not be used for feature calculation
daily_index – indexes for the data[daily_data_key] dataloader. If None, then the index will be the same as for data[quarterly_data_key]. E.g. if you want to use this class for calculating commodities features, daily_index may be a list of the commodity codes of interest. If you want to use it e.g. for calculating daily price features, daily_index should be None
stats – aggregation functions for feature calculation. Should be a Dict[str, Callable]. Keys of this dict will be used as feature name prefixes. Values of this dict should implement the foo(x: List) -> float interface
norm – whether to normalize daily stats
n_jobs – number of threads for calculation
verbose – whether to show progress
- calculate(data: Dict, index: List[str]) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict having fields named as the values of the daily_data_key and quarterly_data_key params of __init__(). These fields should contain classes implementing the load(index) -> pd.DataFrame interface
index – list of tickers to calculate features for, e.g. ['AAPL', 'TSLA']
- Returns
resulting features with index ['ticker', 'date']. Each row contains features for the ticker company at the date quarter
- Return type
pd.DataFrame
RelativeGroupFeatures
- class ml_investment.features.RelativeGroupFeatures(feature_calculator, group_data_key: str, group_col: str, relation_foo=<function RelativeGroupFeatures.<lambda>>, keep_group_feats=False, verbose: bool = False)
Bases: object
Feature calculator for features relative to some group median. E.g. calculate revenue growth relative to the median in the sector/industry.
- Parameters
feature_calculator – feature calculator whose calculate() results will be related to the group medians
group_data_key – key of the dataloader in the data argument of calculate() for loading data having group_col
group_col – column name for groups in which median values will be calculated
relation_foo – function implementing the foo(x, y) -> z interface. E.g. if foo = lambda x, y: x - y, then the resulting features will be calculated as the difference between the current company features and the group median features.
keep_group_feats – whether to return group median features
verbose – whether to show progress
- calculate(data, index)
Interface to calculate features for tickers based on data
- Parameters
data – dict having fields named as the value of group_data_key and the keys necessary for feature_calculator. These fields should contain classes implementing the load(index) -> pd.DataFrame interface
index – index needed for feature_calculator.calculate()
- Returns
resulting features with the same index as in feature_calculator.calculate().
- Return type
pd.DataFrame
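The idea of relating a company feature to its group median through relation_foo can be sketched in a few lines. The function name relative_to_group, the dict-based inputs and the sample numbers are hypothetical; the difference lambda is only the example relation given above, not necessarily the library default.

```python
from collections import defaultdict
from statistics import median


def relative_to_group(values, groups, relation_foo=lambda x, y: x - y):
    """Relate each company's value to the median of its group."""
    by_group = defaultdict(list)
    for ticker, value in values.items():
        by_group[groups[ticker]].append(value)
    group_median = {group: median(vals) for group, vals in by_group.items()}
    return {ticker: relation_foo(value, group_median[groups[ticker]])
            for ticker, value in values.items()}


growth = {"AAA": 0.30, "BBB": 0.10, "CCC": 0.20, "DDD": 0.05}  # made-up numbers
sector = {"AAA": "tech", "BBB": "tech", "CCC": "tech", "DDD": "energy"}
relative = relative_to_group(growth, sector)
```

A company that is the only member of its group always gets a zero difference, because it is its own median.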
FeatureMerger
- class ml_investment.features.FeatureMerger(fc1, fc2, on=typing.Union[str, typing.List[str]])
Bases: object
Feature calculator that combines two other feature calculators. The results are combined with a left merge.
- Parameters
fc1 – first feature calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface
fc2 – second feature calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface
on – columns on which to merge the results of the executed calculate methods
- calculate(data: Dict, index) -> pandas.core.frame.DataFrame
Interface to calculate features for tickers based on data
- Parameters
data – dict having the fields needed for fc1 and fc2. These fields should contain classes implementing the load(index) -> pd.DataFrame interface
index – index for the feature calculators. E.g. if the features are about companies, the index may be a list of tickers, like ['AAPL', 'TSLA']
- Returns
resulting merged features
- Return type
pd.DataFrame
Targets
Collection of target calculators
QuarterlyTarget
- class ml_investment.targets.QuarterlyTarget(data_key: str, col: str, quarter_shift: int = 0, n_jobs: int = 2)
Bases: object
Calculator of a target represented as a column in quarter-based data. Works with quarterly slices of a company.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
col – column name for target calculation (like marketcap, revenue)
quarter_shift – number of quarters to shift. E.g. if quarter_shift = 0, then the value for the current quarter will be returned. If quarter_shift = 1, then the value for the next quarter will be returned. If quarter_shift = -1, then the value for the previous quarter will be returned.
- calculate(data: Dict, index: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Interface to calculate targets for dates and tickers in the index parameter based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – pd.DataFrame containing information on tickers and dates to calculate targets for. Should have columns: ["ticker", "date"]
- Returns
targets having a 'y' column. The index of this dataframe has the same values as the index param. Each row contains the target for the ticker company at the date quarter
- Return type
pd.DataFrame
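The meaning of quarter_shift can be illustrated on a plain chronological list. The helper name shifted_target and the choice to return None when the shifted quarter does not exist are assumptions for this sketch; the library may handle missing quarters differently.

```python
def shifted_target(values_oldest_first, position, quarter_shift=0):
    """Value `quarter_shift` quarters after `position`, or None if out of range."""
    target_position = position + quarter_shift
    if 0 <= target_position < len(values_oldest_first):
        return values_oldest_first[target_position]
    return None


marketcap = [1.0e9, 1.1e9, 1.3e9]  # chronological order, made-up numbers
current_value = shifted_target(marketcap, position=1, quarter_shift=0)
next_value = shifted_target(marketcap, position=1, quarter_shift=1)
previous_value = shifted_target(marketcap, position=1, quarter_shift=-1)
```

A positive shift looks into the future relative to the feature date, which is what makes it usable as a prediction target.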
QuarterlyDiffTarget
- class ml_investment.targets.QuarterlyDiffTarget(data_key: str, col: str, norm: bool = True, n_jobs: int = 2)
Bases: object
Calculator of a target represented as the difference between column values in the current and previous quarter. Works with quarterly slices of a company.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
col – column name for target calculation (like marketcap, revenue)
norm – whether to normalize the difference to the previous quarter
n_jobs – number of threads for calculation
- calculate(data: Dict, index: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Interface to calculate targets for dates and tickers in the index parameter based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – pd.DataFrame containing information on tickers and dates to calculate targets for. Should have columns: ["ticker", "date"]
- Returns
targets having a 'y' column. The index of this dataframe has the same values as the index param. Each row contains the target for the ticker company at the date quarter
- Return type
pd.DataFrame
QuarterlyBinDiffTarget
- class ml_investment.targets.QuarterlyBinDiffTarget(data_key: str, col: str, n_jobs: int = 2)
Bases: object
Calculator of a target represented as the binary difference between column values in the current and previous quarter. Works with quarterly slices of a company.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
col – column name for target calculation (like marketcap, revenue)
n_jobs – number of threads for calculation
- calculate(data: Dict, index: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Interface to calculate targets for dates and tickers in the index parameter based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – pd.DataFrame containing information on tickers and dates to calculate targets for. Should have columns: ["ticker", "date"]
- Returns
targets having a 'y' column. The index of this dataframe has the same values as the index param. Each row contains the target for the ticker company at the date quarter
- Return type
pd.DataFrame
DailyAggTarget
- class ml_investment.targets.DailyAggTarget(data_key: str, col: str, horizon: int = 100, foo: typing.Callable = <function mean>, n_jobs: int = 2)
Bases: object
Calculator of a target represented as an aggregation function of daily values. Works with daily slices of a company.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
col – column name for target calculation (like marketcap, pe)
horizon – number of days for target calculation. If horizon > 0, then values will be taken from the future of the current date. If horizon < 0, then values will be taken from the past of the current date
foo – function performing the target aggregation
n_jobs – number of threads for calculation
- calculate(data: Dict, index: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Interface to calculate targets for dates and tickers in the index parameter based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – pd.DataFrame containing information on tickers and dates to calculate targets for. Should have columns: ["ticker", "date"]
- Returns
targets having a 'y' column. The index of this dataframe has the same values as the index param. Each row contains the target for the ticker company at the date day
- Return type
pd.DataFrame
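The sign convention of horizon can be sketched with a list of daily values. This is a simplified illustration: the name daily_agg_target, the use of list positions instead of dates and the decision to exclude the current day from the window are assumptions, not confirmed library behavior.

```python
from statistics import mean


def daily_agg_target(daily_values, position, horizon=100, foo=mean):
    """Aggregate `abs(horizon)` daily values after (or before) `position`."""
    if horizon > 0:
        # Values from the future of the current day.
        window = daily_values[position + 1: position + 1 + horizon]
    else:
        # Values from the past of the current day.
        window = daily_values[max(0, position + horizon): position]
    return foo(window) if window else None


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # chronological order, made-up numbers
future_mean = daily_agg_target(prices, position=2, horizon=2)
past_mean = daily_agg_target(prices, position=2, horizon=-2)
```

Passing a different foo, for example a downside standard deviation, changes the kind of target without changing the windowing.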
DailySmoothedQuarterlyDiffTarget
- class ml_investment.targets.DailySmoothedQuarterlyDiffTarget(daily_data_key: str, quarterly_data_key: str, col: str, smooth_horizon: int = 30, norm: bool = True, n_jobs: int = 2)
Bases: object
Calculator of a target represented as the difference between the current and last quarter smoothed daily column values. Works with company quarter slices.
- Parameters
daily_data_key – key of the dataloader in the data argument of calculate() for daily data loading
quarterly_data_key – key of the dataloader in the data argument of calculate() for quarterly data loading
col – column name for target calculation (like marketcap, pe)
smooth_horizon – number of days for target calculation. If smooth_horizon > 0, then values for smoothing will be taken from the future of the quarter date. If smooth_horizon < 0, then values for smoothing will be taken from the past of the quarter date
norm – whether to normalize the result
n_jobs – number of threads for calculation
- calculate(data: Dict, index: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Interface to calculate targets for dates and tickers in the index parameter based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – pd.DataFrame containing information on tickers and dates to calculate targets for. Should have columns: ["ticker", "date"]
- Returns
targets having a 'y' column. The index of this dataframe has the same values as the index param. Each row contains the target for the ticker company at the date quarter
- Return type
pd.DataFrame
ReportGapTarget
- class ml_investment.targets.ReportGapTarget(data_key: str, col: str, smooth_horizon: int = 1, norm: bool = True, n_jobs: int = 2)
Bases: object
Calculator of a target represented as the smoothed gap at some date (e.g. the report date). Works with daily slices of a company.
- Parameters
data_key – key of the dataloader in the data argument of calculate()
col – column name for target calculation (like marketcap, pe)
smooth_horizon – number of days for column smoothing
norm – whether to normalize the gap value
n_jobs – number of threads for calculation
- calculate(data: Dict, index: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Interface to calculate targets for dates and tickers in the index parameter based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – pd.DataFrame containing information on tickers and dates to calculate targets for. Should have columns: ["ticker", "date"]
- Returns
targets having a 'y' column. The index of this dataframe has the same values as the index param. Each row contains the target for the ticker company at the date time
- Return type
pd.DataFrame
BaseInfoTarget
- class ml_investment.targets.BaseInfoTarget(data_key: str, col: str)
Bases: object
Calculator of a target represented by base company information
- Parameters
data_key – key of the dataloader in the data argument of calculate()
col – column name for target calculation (like sector, industry)
- calculate(data, index: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Interface to calculate targets for tickers in the index parameter based on data
- Parameters
data – dict having a field named as the value of the data_key param of __init__(). This field should contain a class implementing the load(index) -> pd.DataFrame interface
index – pd.DataFrame containing information on tickers to calculate targets for. Should have columns: ["ticker"]
- Returns
targets having a 'y' column. The index of this dataframe has the same values as the index param. Each row contains the target for the ticker company
- Return type
pd.DataFrame
Models
Collection of wrappers for machine learning models
LogExpModel
- class ml_investment.models.LogExpModel(base_model)
Bases: object
Model wrapper that fits on the log of the target and applies exp to the produced predictions. May be useful for some target distributions.
- Parameters
base_model – class implementing the fit(X, y), predict(X) / predict_proba(X) interfaces
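The log/exp wrapping idea can be shown with a toy base model. Both classes below are illustrative stand-ins written for this sketch (LogExpSketch is not the library class, and MeanModel is a made-up base model); they only assume the fit(X, y) / predict(X) interface described above.

```python
import math


class MeanModel:
    """Toy base model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LogExpSketch:
    """Fit the base model on log(y) and exponentiate its predictions."""

    def __init__(self, base_model):
        self.base_model = base_model

    def fit(self, X, y):
        self.base_model.fit(X, [math.log(value) for value in y])
        return self

    def predict(self, X):
        return [math.exp(pred) for pred in self.base_model.predict(X)]


model = LogExpSketch(MeanModel()).fit([[0], [1]], [1e9, 1e11])
prediction = model.predict([[2]])[0]
```

With targets of 1e9 and 1e11, the wrapped toy model predicts about 1e10 (the geometric mean) instead of the arithmetic mean 5.05e10, which shows why the transform can help for targets spanning several orders of magnitude, such as market capitalizations.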
EnsembleModel
- class ml_investment.models.EnsembleModel(base_models: List, bagging_fraction: float = 0.8, model_cnt: int = 20)
Bases: object
Class for training an ensemble of base models.
- Parameters
base_models – list of classes implementing the fit(X, y), predict(X) / predict_proba(X) interfaces
bagging_fraction – fraction of the data randomly subsampled for training each model
model_cnt – total number of models in the resulting ensemble
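A bagging ensemble along these lines could be sketched as below. The helper names fit_bagged and predict_bagged, the toy MeanModel, cycling through base_models, sampling without replacement and averaging the predictions are all assumptions made for illustration; the library may differ in each of these choices.

```python
import copy
import random


class MeanModel:
    """Toy base model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_bagged(base_models, X, y, bagging_fraction=0.8, model_cnt=20, seed=0):
    """Train model_cnt copies of the base models on random subsamples."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(X) * bagging_fraction))
    fitted = []
    for i in range(model_cnt):
        idxs = rng.sample(range(len(X)), sample_size)  # without replacement (assumed)
        model = copy.deepcopy(base_models[i % len(base_models)])
        model.fit([X[j] for j in idxs], [y[j] for j in idxs])
        fitted.append(model)
    return fitted


def predict_bagged(fitted, X):
    """Average the predictions of all fitted models."""
    all_preds = [model.predict(X) for model in fitted]
    return [sum(col) / len(col) for col in zip(*all_preds)]


X = [[i] for i in range(10)]
y = [float(i) for i in range(10)]
ensemble = fit_bagged([MeanModel()], X, y)
preds = predict_bagged(ensemble, [[0], [1]])
```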
GroupedOOFModel
- class ml_investment.models.GroupedOOFModel(base_model, group_column: str, fold_cnt: int = 5)
Bases: object
Model wrapper that encapsulates out-of-fold separation within data groups. Samples of one group cannot be in the training and validation folds at the same time.
- Parameters
base_model – model implementing the fit(X, y), predict(X) / predict_proba(X) interfaces
group_column – name of the column for grouping training data. X in fit(X, y) and predict(X) should contain this column. Samples with the same group value will be placed in only one training fold.
fold_cnt – number of folds for training
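Group-wise out-of-fold splitting can be sketched as follows. The function names assign_group_folds and oof_splits and the round-robin assignment of groups to folds are assumptions for this illustration; the point is only that every group lands entirely in one fold.

```python
def assign_group_folds(groups, fold_cnt=5):
    """Map every distinct group value to a single fold index."""
    unique_groups = sorted(set(groups))
    return {group: i % fold_cnt for i, group in enumerate(unique_groups)}


def oof_splits(groups, fold_cnt=5):
    """Yield (train_idxs, val_idxs) so that no group is split across both."""
    fold_of_group = assign_group_folds(groups, fold_cnt)
    for fold in range(fold_cnt):
        val = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train, val


tickers = ["AAPL", "AAPL", "FB", "FB", "MSFT", "MSFT"]
splits = list(oof_splits(tickers, fold_cnt=3))
```

Because a ticker's rows are predicted only by a model that never saw that ticker during training, predictions on training tickers are free of this kind of leakage.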
TimeSeriesOOFModel
- class ml_investment.models.TimeSeriesOOFModel(base_model, time_column: str, fold_cnt: int = 5)
Bases: object
Model wrapper that encapsulates out-of-fold time-series separation.
- Parameters
base_model – model implementing the fit(X, y), predict(X) / predict_proba(X) interfaces
time_column – name of the column for separating training data. X in fit(X, y) and predict(X) should contain this column. Samples from the future will not be used for training models that predict the past.
fold_cnt – number of folds for training
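A time-ordered split in this spirit could look like the sketch below. The function name time_series_splits and the expanding-window scheme with equally sized folds are assumptions; samples that do not fit into a full fold are simply left out in this simplified version.

```python
def time_series_splits(times, fold_cnt=5):
    """Yield (train_idxs, val_idxs) where training data precedes validation data."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    fold_size = len(order) // (fold_cnt + 1)
    for fold in range(1, fold_cnt + 1):
        train = order[: fold * fold_size]
        val = order[fold * fold_size: (fold + 1) * fold_size]
        if train and val:
            yield train, val


# ISO-formatted date strings sort chronologically.
dates = ["2020-03-31", "2019-12-31", "2020-09-30",
         "2020-06-30", "2020-12-31", "2021-03-31"]
splits = list(time_series_splits(dates, fold_cnt=2))
```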
Pipelines
Collection of pipelines.
Pipeline
- class ml_investment.pipelines.Pipeline(data: Dict, feature, target, model, out_name=None)
Bases: object
Class encapsulating feature and target calculation, model training and validation during the fit phase, and feature calculation and model prediction during the execute phase. Supports multiple targets with different models and metrics.
- Parameters
data – dict containing the fields needed for feature and target calculation. Its values should be classes implementing the load(index) -> pd.DataFrame interface
feature – feature calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface
target – target calculator implementing the calculate(data: Dict, index) -> pd.DataFrame interface OR List of such target calculators
model – class implementing the fit(X, y) and predict(X) interfaces (a copy of the model will be used for every single target if target is a List) OR List of such classes (the length of this list should be equal to the length of target)
out_name – str column name of the result in the pd.DataFrame returned by execute() OR List[str] (the length of this list should be equal to the length of target) OR None (the list ['y_0', 'y_1', ...] will be used in this case)
- execute(index)
Interface for executing the pipeline for tickers. Features will be based on data from data_loader.
- Parameters
index – execute identification (e.g. list of tickers to predict for)
- Returns
result values in columns named as the out_name param in __init__()
- Return type
pd.DataFrame
- export_core(path=None)
Interface for saving the pipeline's core.
- Parameters
path – str with the path to store the pipeline core OR None (the path will be generated automatically)
- fit(index: typing.List[str], metric=None, target_filter_foo=<function nan_mask>)
Interface for fitting the pipeline model for tickers. Features and targets will be based on data from data_loader.
- Parameters
index – fit identification (e.g. list of tickers to fit the model for)
metric – function implementing the foo(gt, y) -> float interface (the same metric will be used for every single target if target is a List) OR List of such functions (the length of this list should be equal to the length of target)
target_filter_foo – function for filtering samples according to target values. Should implement the foo(arr) -> np.array[bool] interface; the length of the resulting array should be equal to the length of arr. OR List of such functions (the length of this list should be equal to the length of target)
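As an illustration of the two callables accepted by fit(), the sketch below defines a metric with the foo(gt, y) -> float shape and a target filter with the foo(arr) -> mask shape. Both function names are hypothetical, plain lists stand in for numpy arrays, and whether the default nan_mask keeps or drops NaN targets is not stated in this section; the filter here assumes True means "keep the sample".

```python
import math
import statistics


def median_absolute_relative_error(gt, y):
    """Metric with the foo(gt, y) -> float shape."""
    return statistics.median(abs(p - t) / abs(t) for t, p in zip(gt, y))


def keep_not_nan(arr):
    """Hypothetical target filter: True for samples whose target is not NaN."""
    return [not math.isnan(value) for value in arr]


targets = [100.0, float('nan'), 200.0]
predictions = [110.0, 50.0, 180.0]

# Apply the filter first, then score only the samples that were kept.
mask = keep_not_nan(targets)
kept_targets = [t for t, keep in zip(targets, mask) if keep]
kept_predictions = [p for p, keep in zip(predictions, mask) if keep]
score = median_absolute_relative_error(kept_targets, kept_predictions)
```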
MergePipeline
- class ml_investment.pipelines.MergePipeline(pipeline_list: List, execute_merge_on)
Bases: object
Class combining a list of pipelines into a single pipeline.
- Parameters
pipeline_list – list of classes implementing the fit(index) and execute(index) -> pd.DataFrame() interfaces. Order is important: merging of results during execute() will be done from left to right.
execute_merge_on – column names to merge pipeline results on.
- execute(index, batch_size=None) -> pandas.core.frame.DataFrame
Interface for executing the pipeline for tickers. Features will be based on data from data_loader.
- Parameters
index – identifiers for executing pipelines, e.g. list of company tickers
batch_size – size of the batches the index is split into for execution (may be useful for lower memory usage) OR None (for full-size execution)
- Returns
combined execute result of the pipelines
- Return type
pd.DataFrame
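The left-to-right merge and the optional batching can be sketched without pandas, using lists of dict rows in place of dataframes. ConstantPipeline, merge_left and execute_merged are hypothetical names, and the use of a left join is an assumption; the section only states that results are merged from left to right on the execute_merge_on columns.

```python
def batches(index, batch_size=None):
    """Yield the whole index at once, or consecutive chunks of batch_size."""
    if batch_size is None:
        yield list(index)
        return
    for start in range(0, len(index), batch_size):
        yield list(index[start:start + batch_size])


def merge_left(left_rows, right_rows, on):
    """Left-join two lists of dict rows on the given key columns."""
    lookup = {tuple(row[c] for c in on): row for row in right_rows}
    return [{**row, **lookup.get(tuple(row[c] for c in on), {})}
            for row in left_rows]


class ConstantPipeline:
    """Toy pipeline exposing execute(index) and returning one row per ticker."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def execute(self, index):
        return [{'ticker': ticker, self.name: self.value} for ticker in index]


def execute_merged(pipelines, index, merge_on, batch_size=None):
    result = []
    for batch in batches(index, batch_size):
        rows = pipelines[0].execute(batch)
        for pipeline in pipelines[1:]:
            # Results are folded in from left to right.
            rows = merge_left(rows, pipeline.execute(batch), merge_on)
        result.extend(rows)
    return result


pipelines = [ConstantPipeline('fair_marketcap', 1.0),
             ConstantPipeline('sector_score', 2.0)]
tickers = ['AAPL', 'FB', 'MSFT']
merged = execute_merged(pipelines, tickers, ['ticker'], batch_size=2)
unbatched = execute_merged(pipelines, tickers, ['ticker'])
```

Batching changes only how much data is processed at once; the combined result is the same as for a single full-size execution.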
LoadingPipeline
- class ml_investment.pipelines.LoadingPipeline(data_loader, columns: List[str])
Bases: object
Wrapper that exposes a data loader through the execute(index) -> pd.DataFrame interface.
- Parameters
data_loader – class implementing the load(index) -> pd.DataFrame interface
columns – names of the columns to load
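The adapter role of this wrapper can be shown with a small duck-typed sketch. DictLoader and LoadingSketch are hypothetical names, the values are made up, and lists of dicts stand in for dataframes.

```python
class DictLoader:
    """Toy data loader with the load(index) interface."""

    def __init__(self, rows):
        self.rows = rows

    def load(self, index):
        return [row for row in self.rows if row['ticker'] in index]


class LoadingSketch:
    """Hypothetical adapter: load(index) exposed as execute(index)."""

    def __init__(self, data_loader, columns):
        self.data_loader = data_loader
        self.columns = columns

    def execute(self, index):
        # Keep only the requested columns of every loaded row.
        return [{column: row[column] for column in self.columns}
                for row in self.data_loader.load(index)]


loader = DictLoader([
    {'ticker': 'AAPL', 'sector': 'Technology', 'employees': 1},
    {'ticker': 'FB', 'sector': 'Communication', 'employees': 2},
])
pipeline = LoadingSketch(loader, ['ticker', 'sector'])
result = pipeline.execute(['AAPL'])
```

Exposing execute(index) is what lets a plain loader take part in a merged pipeline alongside model-based pipelines.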
Data loaders
Collection of data loaders and utils for them.
Yahoo
Loader for the dataset provided by Yahoo.
Data may be downloaded by the script ml_investment.download_scripts.download_yahoo.main()
- Expected dataset structure: path to the Yahoo data folder with structure
Yahoo
├── quarterly
│   ├── AAPL.csv
│   ├── FB.csv
│   └── …
└── base
    ├── AAPL.json
    ├── FB.json
    └── …
- class ml_investment.data_loaders.yahoo.YahooBaseData(data_path: str)
Bases: object
Loader for base information about a company (like sector, industry, etc.)
- Parameters
data_path – path to the yahoo dataset folder
- class ml_investment.data_loaders.yahoo.YahooQuarterlyData(data_path: str, quarter_count: Optional[int] = None)
Bases: object
Loader for quarterly fundamental information about companies (debt, revenue, etc.)
- Parameters
data_path – path to the yahoo dataset folder
quarter_count – maximum number of last quarters to return. The resulting number may be smaller due to the short history of some companies
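Given the folder layout above, the per-ticker file locations can be derived as in the sketch below. The helper yahoo_paths and the data/yahoo root are hypothetical; only the quarterly/<ticker>.csv and base/<ticker>.json layout is taken from the expected dataset structure.

```python
from pathlib import PurePosixPath


def yahoo_paths(data_path, ticker):
    """Build the expected file locations for one ticker (illustrative helper)."""
    root = PurePosixPath(data_path)
    return {
        'quarterly': root / 'quarterly' / f'{ticker}.csv',
        'base': root / 'base' / f'{ticker}.json',
    }


paths = yahoo_paths('data/yahoo', 'AAPL')
```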
SF1
Loaders for the dataset provided by https://www.quandl.com/databases/SF1/data. Data may be downloaded by the script ml_investment.download_scripts.download_sf1.main()
- Expected structure of dataset
SF1
├── core_fundamental
│   ├── AAPL.json
│   ├── FB.json
│   └── …
├── daily
│   ├── AAPL.json
│   ├── FB.json
│   └── …
└── tickers.zip
- class ml_investment.data_loaders.sf1.SF1BaseData(data_path: Optional[str] = None)
Bases: object
Loader for base information about a company (like sector, industry, etc.)
- Parameters
data_path – path to the sf1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json will be used.
- class ml_investment.data_loaders.sf1.SF1DailyData(data_path: Optional[str] = None, days_count: Optional[int] = None)
Bases: object
Loader for daily information about a company (marketcap, pe, etc.)
- Parameters
data_path – path to the sf1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json will be used.
days_count – maximum number of last days to return. The resulting number may be smaller due to the short history of some companies
- class ml_investment.data_loaders.sf1.SF1QuarterlyData(data_path: Optional[str] = None, quarter_count: Optional[int] = None, dimension: Optional[str] = 'ARQ')
Bases: object
Loader for quarterly fundamental information about companies (debt, revenue, etc.)
- Parameters
data_path – path to the sf1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json will be used.
quarter_count – maximum number of last quarters to return. The resulting number may be smaller due to the short history of some companies
dimension – one of ['MRY', 'MRT', 'MRQ', 'ARY', 'ART', 'ARQ']. SF1 dataset-based parameter
- class ml_investment.data_loaders.sf1.SF1SNP500Data(data_path: Optional[str] = None)
Bases: object
Loader for S&P500 historical constituents
- Parameters
data_path – path to the sf1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json will be used.
- existing_index()
- Returns
existing index values that can be passed to load
- Return type
List
- load(index: Optional[List[numpy.datetime64]] = None) -> pandas.core.frame.DataFrame
- Parameters
index – list of dates to load constituents for, e.g. [np.datetime64('2018-01-01'), np.datetime64('2018-05-10')]. If there is no such date, the nearest past date will be used. OR None (loading for all dates on which the constituents changed)
- Returns
constituents information
- Return type
pd.DataFrame
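The "nearest past date" rule for load() amounts to a lookup in a sorted list of change dates. The sketch below uses the standard bisect module and datetime.date instead of numpy.datetime64; the function name and the change dates are made up for illustration.

```python
from bisect import bisect_right
from datetime import date


def nearest_past(change_dates, query):
    """Return the latest change date that is not after query, else None."""
    position = bisect_right(change_dates, query)
    return change_dates[position - 1] if position else None


# Hypothetical, sorted dates on which the constituents changed.
change_dates = [date(2017, 12, 15), date(2018, 3, 19), date(2018, 6, 18)]

first = nearest_past(change_dates, date(2018, 1, 1))
second = nearest_past(change_dates, date(2018, 5, 10))
too_early = nearest_past(change_dates, date(2017, 1, 1))
```

A query that precedes every known change date has no past snapshot to fall back on; how the loader handles that case is not described here, so the sketch simply returns None.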
- ml_investment.data_loaders.sf1.translate_currency(df: pandas.core.frame.DataFrame, columns: Optional[List[str]] = None)
Translate the currency of columns to USD according to the exchange rate implied by the appropriate column pairs (like debtusd and debt)
- Parameters
df – quarterly-based data
columns – columns to translate the currency of
- Returns
result with the same columns and shape but with converted currency in columns
- Return type
pd.DataFrame
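The conversion idea can be sketched as follows, under a loudly stated assumption: the exchange rate for a row is taken as debtusd / debt and applied to the requested columns. The real function may derive the rate differently (for example from several column pairs), so translate_to_usd is a hypothetical illustration rather than the library's logic, and the numbers are made up.

```python
def translate_to_usd(rows, columns, rate_pair=('debtusd', 'debt')):
    """Convert the given columns to USD using the rate implied by rate_pair."""
    usd_column, local_column = rate_pair
    converted = []
    for row in rows:
        new_row = dict(row)
        if row[local_column]:
            # Assumed rate: USD value divided by the local-currency value.
            rate = row[usd_column] / row[local_column]
            for column in columns:
                new_row[column] = row[column] * rate
        converted.append(new_row)
    return converted


rows = [{'debt': 200.0, 'debtusd': 100.0, 'revenue': 50.0}]
converted = translate_to_usd(rows, ['revenue'])
```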
Quandl Commodities
Loader for commodities price information from https://blog.quandl.com/api-for-commodity-data.
Data may be downloaded by the script ml_investment.download_scripts.download_commodities.main()
- Expected dataset structure
commodities
├── LBMA_GOLD.json
├── CHRIS_CME_CL1.json
└── …
- class ml_investment.data_loaders.quandl_commodities.QuandlCommoditiesData(data_path: Optional[str] = None)
Bases: object
Loader for commodities price information.
- Parameters
data_path – path to the quandl_commodities dataset folder. If None, commodities_data_path from ~/.ml_investment/config.json will be used.
Daily Price Bars
Loader for daily price bar information.
Data may be downloaded by the script ml_investment.download_scripts.download_daily_bars.main()
- Expected dataset structure
daily_bars
├── AAPL.csv
├── TSLA.csv
└── …
- class ml_investment.data_loaders.daily_bars.DailyBarsData(data_path: Optional[str] = None, days_count: Optional[int] = None)
Bases: object
Loader for daily price bars.
- Parameters
data_path – path to the daily_bars dataset folder. If None, daily_bars_data_path from ~/.ml_investment/config.json will be used.
days_count – maximum number of last days to return. The resulting number may be smaller due to the short history of some companies
Data loading utils
📥 Downloading scripts
Collection of scripts for downloading data from different sources.
SF1
- ml_investment.download_scripts.download_sf1.main(data_path: str = '/home/docs/.ml_investment/data/sf1', verbose: bool = False)
Download quarterly fundamental data from https://www.quandl.com/databases/SF1/data
Note
SF1 is a paid dataset, so you need to subscribe and put your quandl token into ~/.ml_investment/secrets.json under the quandl_api_key key.
- Parameters
data_path – path to the folder in which downloaded data will be stored OR None (sf1_data_path from ~/.ml_investment/config.json will be used as the download path)
verbose – show progress or not
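A minimal sketch of storing and reading the token is given below. It assumes secrets.json is a flat JSON object with a quandl_api_key entry, which this section implies but does not spell out; load_quandl_token is a hypothetical helper, and a temporary directory is used so the real ~/.ml_investment/secrets.json is not touched.

```python
import json
import tempfile
from pathlib import Path


def load_quandl_token(secrets_path):
    """Read the quandl token from a secrets file (assumed flat JSON object)."""
    with open(secrets_path) as secrets_file:
        secrets = json.load(secrets_file)
    return secrets['quandl_api_key']


with tempfile.TemporaryDirectory() as tmp_dir:
    secrets_path = Path(tmp_dir) / 'secrets.json'
    # 'YOUR_TOKEN' is a placeholder, not a real key.
    secrets_path.write_text(json.dumps({'quandl_api_key': 'YOUR_TOKEN'}))
    token = load_quandl_token(secrets_path)
```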
Yahoo
- ml_investment.download_scripts.download_yahoo.main(data_path: Optional[str] = None)
Download quarterly and base data from https://finance.yahoo.com
- Parameters
data_path – path to the folder in which downloaded data will be stored OR None (yahoo_data_path from ~/.ml_investment/config.json will be used as the download path)
Daily price bars
- ml_investment.download_scripts.download_daily_bars.main(data_path: str = '/home/docs/.ml_investment/data/daily_bars', tickers: Optional[List] = ['OKTA', 'HYLN', 'RSTI', 'CHE', 'WHD', 'USPH', 'TRHC', 'FGEN', 'JD', 'BLNK', 'IRDM', 'FOCS', 'IBM', 'LANC', 'GLW', 'FITB', 'TPTX', 'EXPE', 'UHS', 'FCNCA', 'JBT', 'DRQ', 'RRBI', 'CHWY', 'DGX', 'VXRT', 'CCK', 'PHM', 'SJM', 'XNCR', 'DLB', 'BWA', 'SITE', 'LAD', 'MCHP', 'YUM', 'BOX', 'LHCG', 'BBIO', 'GPI', 'BMRN', 'PII', 'GDDY', 'MLM', 'WORK', 'INTC', 'CHGG', 'CWST', 'RACE', 'ASIX', 'NJR', 'AEE', 'DKS', 'SLP', 'ABMD', 'TE', 'COF', 'PBH', 'OSK', 'BR', 'COWN', 'PRSP', 'RGR', 'CRL', 'SLDB', 'LYB', 'IIVI', 'AYX', 'CSCO', 'ROK', 'WYNN', 'ARE', 'APEI', 'CLR', 'BECN', 'IR', 'EPAY', 'TREE', 'BLL', 'BDC', 'RCL', 'AFL', 'WWW', 'XPO', 'NYT', 'FORR', 'EMN', 'AES', 'PPL', 'ADPT', 'LMT', 'RGEN', 'IART', 'FDX', 'GE', 'OGE', 'SPCE', 'CMCO', 'QLYS', 'VIPS', 'MCD', 'ALXN', 'BLK', 'KLAC', 'AMWD', 'FUL', 'RAVN', 'TM', 'CDNA', 'SYF', 'LLY', 'INCY', 'MU', 'TTMI', 'FTV', 'CMA', 'EEFT', 'ATRA', 'ARCT', 'PB', 'YUMC', 'DASH', 'IBP', 'SI', 'MMM', 'CCOI', 'LRN', 'TT', 'BJRI', 'CARG', 'TREX', 'NVS', 'DKNG', 'TSS', 'ALLY', 'CVLT', 'EPAM', 'LDOS', 'NSC', 'EWBC', 'SCI', 'WKHS', 'GHC', 'EBAY', 'MO', 'MDGL', 'VFC', 'MA', 'FLOW', 'CACC', 'PPG', 'VALE', 'DRE', 'NP', 'AGIO', 'YEXT', 'OII', 'CFX', 'GRA', 'AWI', 'DOCU', 'PFE', 'A', 'AVGO', 'QTS', 'PM', 'OSUR', 'PATK', 'INSP', 'GEF', 'DAL', 'KMX', 'CIEN', 'GD', 'SF', 'AVLR', 'MED', 'MDLZ', 'ABT', 'GMS', 'DOV', 'BLKB', 'COKE', 'BLUE', 'CMS', 'VREX', 'MANT', 'ZEN', 'SBAC', 'DVN', 'HNP', 'PCG', 'CHTR', 'GTN', 'SRI', 'SXT', 'NET', 'ALRS', 'SYNH', 'SFM', 'JNJ', 'DG', 'RXN', 'SDGR', 'ALB', 'ITW', 'PRTK', 'BEN', 'PSX', 'RTX', 'SAVA', 'UNF', 'LSTR', 'AZPN', 'OHI', 'ALV', 'COUP', 'EIX', 'KEYS', 'PKG', 'WELL', 'ILMN', 'WH', 'PFGC', 'CVM', 'AIZ', 'CCXI', 'ANF', 'GT', 'WMB', 'WEC', 'AVNT', 'ROG', 'BKR', 'CRTX', 'GPC', 'CEA', 'ACH', 'NVDA', 'MORN', 'LNTH', 'PTC', 'CGNT', 'EAR', 'MYGN', 'PEGA', 'SAFM', 'HLI', 'SRE', 'STZ', 
'IOSP', 'NTGR', 'PAGS', 'GDOT', 'CNXC', 'XEC', 'Y', 'PNC', 'CABO', 'OLLI', 'J', 'TGT', 'TPH', 'NFE', 'DLTR', 'CW', 'VRNS', 'XRX', 'SIG', 'BDTX', 'CL', 'T', 'NVTA', 'SMTC', 'BBBY', 'CFG', 'VRSK', 'NARI', 'TW', 'DIS', 'TAP', 'QTWO', 'PLTR', 'CHNG', 'COLD', 'ABBV', 'JELD', 'UBER', 'CLSK', 'STE', 'ZUO', 'STLD', 'HAL', 'HQY', 'GS', 'FTDR', 'ABC', 'ARQT', 'AMT', 'WABC', 'SYNA', 'LKQ', 'LHX', 'GILD', 'POR', 'TPR', 'NTAP', 'CVS', 'TTWO', 'PGNY', 'HAS', 'HUBS', 'CBSH', 'LPSN', 'KEX', 'TWNK', 'ARCC', 'ALNY', 'TXT', 'AFG', 'ADSK', 'AVY', 'SWK', 'PRI', 'URI', 'AFMD', 'RS', 'PNFP', 'KOD', 'RIDE', 'REGI', 'MCO', 'CB', 'TSM', 'SRCL', 'FIS', 'BAH', 'TRMK', 'ZG', 'SCHW', 'MDB', 'VG', 'OI', 'SHAK', 'TRIT', 'TKR', 'CVET', 'TWLO', 'MOH', 'PTR', 'ALLK', 'THG', 'YNDX', 'NRG', 'ELAN', 'DT', 'VZIO', 'IVZ', 'AYI', 'NUS', 'SO', 'IP', 'FWRD', 'LEGH', 'ADS', 'VIRT', 'GATX', 'WSO', 'DPZ', 'AQUA', 'EPC', 'CDNS', 'L', 'CTB', 'SCSC', 'NBIX', 'NOV', 'FIZZ', 'GWRE', 'MAA', 'KRYS', 'AKAM', 'CAT', 'IPAR', 'HPE', 'TWTR', 'BDX', 'MD', 'TSN', 'CNC', 'ASGN', 'KWR', 'ENTG', 'MAN', 'ICUI', 'HPQ', 'CVNA', 'MTX', 'DDS', 'BILI', 'IDCC', 'SEE', 'HES', 'JBHT', 'H', 'SAP', 'TAK', 'WERN', 'ATEX', 'EXLS', 'BMY', 'MGY', 'FSLY', 'KMB', 'SFIX', 'APLT', 'CGNX', 'K', 'UAA', 'APH', 'REG', 'EGRX', 'WSM', 'WMT', 'PRTS', 'HHR', 'PINC', 'SWBI', 'TXN', 'SP', 'WBS', 'SWN', 'LFUS', 'MODV', 'MSI', 'AMZN', 'BFAM', 'FFIV', 'EMR', 'CNS', 'EXAS', 'ET', 'SSNC', 'ED', 'TFX', 'TNDM', 'MSFT', 'KHC', 'PTCT', 'PUMP', 'MOMO', 'SBH', 'KEP', 'CRVL', 'MSTR', 'MNRO', 'VNE', 'MGLN', 'SXI', 'WRLD', 'ARW', 'VZ', 'IGMS', 'V', 'HRC', 'ZBRA', 'SBGI', 'HAE', 'PH', 'KIDS', 'ATRO', 'CY', 'LW', 'MBT', 'NEE', 'NTLA', 'ROL', 'MGM', 'GCO', 'ALE', 'NPK', 'PKI', 'CDLX', 'APA', 'WLTW', 'IIPR', 'CORT', 'CHX', 'PFG', 'PAYC', 'MNST', 'ESS', 'AIG', 'CE', 'CMI', 'PRAX', 'KMPR', 'MMC', 'NRIX', 'JBSS', 'BTI', 'IPGP', 'TSCO', 'QNST', 'BHF', 'JKHY', 'FTI', 'ZTS', 'MYRG', 'ATVI', 'LCII', 'EVH', 'FLT', 'SWX', 'OGS', 'VTR', 'NCBS', 'IONS', 'HD', 'CSOD', 'CPB', 'DISH', 
'AAL', 'SAVE', 'PCAR', 'MRK', 'MKTX', 'DRI', 'CSX', 'LITE', 'KO', 'EDIT', 'HAIN', 'SMPL', 'PXD', 'AEIS', 'CVGW', 'CNMD', 'NEO', 'MPWR', 'CINF', 'SRDX', 'MTN', 'MRC', 'GH', 'W', 'BURL', 'VIAC', 'DOW', 'USM', 'MANH', 'FCN', 'RMD', 'LEG', 'EFX', 'ROKU', 'TRUP', 'IRBT', 'NWE', 'RAMP', 'PSA', 'MSCI', 'ANTM', 'HEI', 'BTAI', 'ACM', 'TTM', 'WDAY', 'GKOS', 'RVLV', 'GBCI', 'ALG', 'AAP', 'HA', 'UNVR', 'LASR', 'ALGT', 'HRB', 'SLAB', 'JCI', 'IBKR', 'AA', 'NEU', 'VNT', 'ICPT', 'AMP', 'AVAV', 'OMC', 'RSG', 'NTRA', 'APPN', 'BA', 'MANU', 'WTS', 'OFIX', 'RUN', 'PVH', 'NGVT', 'SKLZ', 'ZNH', 'CTLT', 'OMCL', 'EVER', 'SAIC', 'HCSG', 'BMI', 'AGCO', 'SLG', 'AJG', 'URBN', 'MTG', 'ONTO', 'ALXO', 'PRLB', 'SIVB', 'CHD', 'EQIX', 'UFPI', 'KMT', 'BOOT', 'MHO', 'ARVN', 'CCMP', 'MAR', 'AIN', 'CRMT', 'CHEF', 'HSY', 'WRB', 'SEIC', 'MATX', 'ARNC', 'BRO', 'MFGP', 'VIE', 'POOL', 'VRNT', 'MBUU', 'ATRC', 'ZS', 'AVT', 'PGR', 'FICO', 'HFC', 'RF', 'RE', 'NTUS', 'BLDR', 'IEX', 'PLD', 'BBY', 'ESE', 'AOUT', 'JWN', 'EYE', 'SR', 'INVH', 'INDB', 'AMN', 'ENV', 'ES', 'ACAD', 'TOL', 'CTSH', 'EHTH', 'GVA', 'TTEK', 'SNA', 'SNAP', 'EVRG', 'ENR', 'CTXS', 'WK', 'PBI', 'CLDR', 'AME', 'WBA', 'MSGE', 'JPM', 'PSTG', 'SYY', 'EGHT', 'LPX', 'NLOK', 'EME', 'HXL', 'TMHC', 'GWW', 'CLF', 'INFO', 'RPD', 'PCRX', 'NWSA', 'GDRX', 'SIGI', 'CHCO', 'RDFN', 'IRM', 'LGND', 'DNKN', 'FLIR', 'ENTA', 'CSWI', 'TCX', 'FLR', 'AMGN', 'FOXF', 'JOUT', 'HWM', 'JNPR', 'AMTI', 'KALU', 'ECL', 'LEA', 'BK', 'WM', 'BNGO', 'ACN', 'MUR', 'NOW', 'QS', 'XLNX', 'CTAS', 'F', 'SSTK', 'LRCX', 'PEAK', 'CBU', 'AMED', 'CLDT', 'ITRI', 'HSC', 'MTB', 'LUMN', 'VMI', 'HTHT', 'GTHX', 'PIPR', 'LSCC', 'ANDE', 'SPOT', 'SBUX', 'HEAR', 'BAC', 'BOKF', 'MXIM', 'TFC', 'FRHC', 'EAT', 'POSH', 'HBI', 'ABNB', 'STMP', 'AVNS', 'BH', 'ADUS', 'AWK', 'UCTT', 'DRNA', 'MSGN', 'MUSA', 'VRSN', 'D', 'WAL', 'UI', 'TDS', 'RYTM', 'HIBB', 'EQT', 'VAC', 'SHW', 'ITCI', 'DD', 'SRC', 'NXPI', 'RNG', 'ZYXI', 'MKSI', 'DCI', 'DLTH', 'SONO', 'OTIS', 'BPMC', 'FATE', 'CRWD', 'KEY', 'LIN', 'GEVO', 'AMG', 'BSX', 
'DHR', 'HURN', 'CROX', 'LB', 'UMBF', 'CHDN', 'XEL', 'IBN', 'ALSN', 'TER', 'DMTK', 'CFR', 'AXSM', 'CPNG', 'NLSN', 'WWD', 'YY', 'MAC', 'PNTG', 'PRU', 'APPH', 'ANIP', 'RETA', 'TGNA', 'CPS', 'STAG', 'EA', 'JJSF', 'RHI', 'ROP', 'UNP', 'WHR', 'SONY', 'PLCE', 'HUBG', 'CRUS', 'ZGNX', 'ETN', 'LEVI', 'ANET', 'TENB', 'ALGN', 'MTRN', 'WDFC', 'NTNX', 'SQ', 'RDS.A', 'SCCO', 'VCYT', 'UNH', 'AAPL', 'RGLD', 'TTCF', 'ZBH', 'VC', 'AON', 'UPWK', 'BAND', 'MEDP', 'LFC', 'SEDG', 'LUV', 'HALO', 'CAG', 'HGV', 'HUM', 'SSD', 'CI', 'TDG', 'CCI', 'CSGS', 'ULTA', 'RYN', 'EXR', 'JBL', 'SKM', 'ORLY', 'AAON', 'COO', 'GO', 'DAR', 'TNC', 'CDW', 'COP', 'BIG', 'UAL', 'AN', 'BFYT', 'CR', 'PRGS', 'VEEV', 'KIM', 'NOK', 'FISV', 'HII', 'ABG', 'NEOG', 'VRTS', 'SMG', 'FNF', 'COR', 'RGNX', 'ANSS', 'BXP', 'INSG', 'DNLI', 'HP', 'CF', 'SWCH', 'WING', 'FCFS', 'HLT', 'ALLO', 'NVCR', 'NWL', 'CLX', 'HRTX', 'AOS', 'COLM', 'TCS', 'PLUS', 'IRTC', 'VMC', 'LII', 'BZUN', 'HST', 'INGN', 'GL', 'CARS', 'TDOC', 'GTX', 'AZO', 'CHKP', 'CHL', 'CBRE', 'COG', 'PBF', 'R', 'OLED', 'RPM', 'PETQ', 'SJI', 'PRFT', 'BX', 'XOM', 'VTRS', 'ADM', 'KMI', 'FLWS', 'AIR', 'ADBE', 'GCP', 'MLAB', 'AERI', 'BL', 'MKC', 'SAM', 'STAA', 'APPF', 'MOV', 'GM', 'BRC', 'XYL', 'BWXT', 'WGO', 'WISH', 'FCX', 'QCOM', 'MVIS', 'HHC', 'MDRX', 'AIRC', 'TAL', 'WEX', 'INMD', 'MOS', 'ITGR', 'FOE', 'DXCM', 'MELI', 'NSP', 'GPN', 'ZUMZ', 'HSIC', 'DNOW', 'FTNT', 'CBRL', 'RGA', 'VNDA', 'TRV', 'HIG', 'ROCK', 'FELE', 'HCCI', 'TRIP', 'MDT', 'EXC', 'AEO', 'QRTEA', 'BILL', 'SWKS', 'DECK', 'UGI', 'CSL', 'CNST', 'XRAY', 'ENDP', 'GNL', 'PZZA', 'PRAA', 'TPX', 'HUBB', 'NUE', 'PYPL', 'OVV', 'MXL', 'PINS', 'VIR', 'KNX', 'RRC', 'ATRI', 'VLDR', 'CLH', 'JACK', 'KRTX', 'NFLX', 'SLB', 'MEI', 'GBT', 'DFS', 'LNT', 'NAVI', 'WAB', 'CSII', 'SHEN', 'MIDD', 'LAZR', 'BCO', 'BIDU', 'ROLL', 'UTHR', 'FSLR', 'MLHR', 'GOOGL', 'CRS', 'BOH', 'DVA', 'WAT', 'CME', 'KTB', 'AAN', 'BIIB', 'DISCA', 'BLD', 'NEWR', 'VEON', 'MET', 'SAIA', 'CRI', 'DE', 'ARMK', 'ALK', 'PCTY', 'SKX', 'ZION', 'FLS', 'JEF', 'UDR', 
'BABA', 'AMD', 'MS', 'IQV', 'HSKA', 'QRVO', 'USNA', 'KOPN', 'C', 'MAT', 'FRPH', 'MDLA', 'NKE', 'TMO', 'ENPH', 'CLOV', 'NVEE', 'BERY', 'HBAN', 'ORCL', 'ODFL', 'NVR', 'ECPG', 'ANAB', 'AIV', 'UFS', 'MLCO', 'SMAR', 'TXG', 'NEM', 'MTD', 'RARE', 'MASI', 'CAH', 'POLY', 'TDY', 'BKI', 'EL', 'DLR', 'WTTR', 'NKTR', 'QIWI', 'GTLS', 'KR', 'LGIH', 'MCRI', 'FRPT', 'CORR', 'FL', 'YETI', 'TNL', 'CHA', 'BKNG', 'PRG', 'LI', 'WOR', 'PLNT', 'COST', 'KSU', 'CDK', 'APD', 'SYK', 'PSN', 'TMX', 'WFC', 'AIT', 'NTCT', 'GSHD', 'FDS', 'TEL', 'SUPN', 'NTES', 'FIVN', 'ATUS', 'TJX', 'ALRM', 'BCPC', 'CPRT', 'BRK.B', 'QDEL', 'FANG', 'VCEL', 'DXC', 'PLAY', 'BYND', 'MRTX', 'LEN', 'AVP', 'FOXA', 'BUD', 'EXPO', 'ETRN', 'THO', 'ROST', 'AX', 'MINI', 'IPG', 'ARWR', 'AGRO', 'TMUS', 'AMCX', 'PWR', 'REZI', 'WWE', 'DSKY', 'ALTR', 'CNK', 'TDC', 'NTCO', 'RIG', 'PLAN', 'UPS', 'SNBR', 'HRL', 'CRSP', 'M', 'CSGP', 'FBHS', 'CENT', 'RBC', 'PTON', 'WB', 'LECO', 'AVB', 'THRM', 'EXP', 'SYKE', 'IT', 'REX', 'LTHM', 'WRK', 'VRTX', 'ADP', 'GPS', 'ON', 'MSA', 'OZON', 'STRA', 'INTU', 'PEG', 'CMP', 'CMG', 'CHK', 'EVBG', 'AWH', 'VUZI', 'LOPE', 'NCR', 'LULU', 'WLK', 'VNO', 'HTA', 'AMAT', 'ANIK', 'LPL', 'ZM', 'ECHO', 'CTVA', 'XLRN', 'AWR', 'JLL', 'PPC', 'CMC', 'USB', 'TNET', 'FMC', 'WDC', 'SPR', 'RRGB', 'BIO', 'TECH', 'MRNA', 'SHI', 'OXY', 'AJRD', 'ATNI', 'TPIC', 'MMS', 'TCBI', 'OSIS', 'DLX', 'CRM', 'GBX', 'REGN', 'CWT', 'ALL', 'UNM', 'SJW', 'PEN', 'HON', 'CVX', 'LOW', 'SON', 'SNX', 'VLO', 'KDP', 'DK', 'DELL', 'DHI', 'MRVL', 'COHR', 'CCL', 'O', 'PD', 'PEP', 'MMI', 'IFF', 'HCA', 'NTRS', 'APPS', 'CALM', 'SOHU', 'GNRC', 'CGEN', 'WY', 'NOC', 'WTFC', 'DIOD', 'MTCH', 'BBSI', 'VMW', 'CPRI', 'DCPH', 'MTH', 'NUVA', 'ISRG', 'SPG', 'ATR', 'DDOG', 'PBCT', 'STX', 'AMSF', 'GRMN', 'TTD', 'YELP', 'TYL', 'HOG', 'ATKR', 'PGTI', 'SNY', 'ESPR', 'MMSI', 'SIBN', 'PLXS', 'CMCSA', 'ICE', 'CREE', 'OKE', 'EVR', 'FAST', 'SPLK', 'TRU', 'DY', 'SPSC', 'ERIE', 'TXRH', 'NXST', 'ETR', 'BF.B', 'GGG', 'CLGX', 'EXPD', 'MAS', 'ALLE', 'LYV', 'OC', 'ADI', 'MTOR', 'RH', 
'NVRO', 'LIFE', 'IOVA', 'VPG', 'SBRA', 'CERN', 'NDSN', 'PG', 'VSAT', 'COTY', 'POWI', 'PRAH', 'TCRR', 'RJF', 'EW', 'FARO', 'NATI', 'ETSY', 'KBH', 'ARNA', 'EXEL', 'MHK', 'CASY', 'COUR', 'SWAV', 'RL', 'KRG', 'KFY', 'MPC', 'BRKS', 'NWLI', 'POST', 'EOG', 'ATGE', 'TROW', 'FNKO', 'SAIL', 'GSKY', 'VCRA', 'FORM', 'ANGI', 'NMIH', 'CHH', 'GMED', 'SBCF', 'TWOU', 'GRUB', 'IAC', 'JOBS', 'CEVA', 'CHRW', 'MKL', 'ASH', 'BAX', 'MCK', 'GOSS', 'MRO', 'EBS', 'IDXX', 'SNPS', 'MSGS', 'INGR', 'FTCH', 'FB', 'VICR', 'SWI', 'ACMR', 'ASO', 'DORM', 'LYFT', 'FND', 'CONE', 'SGEN', 'PODD', 'GIS', 'PANW', 'WSC', 'DBX', 'QUOT', 'UTL', 'STT', 'NDAQ', 'OIS', 'TTC', 'PDCO', 'BRKR', 'BC', 'AXP', 'LPLA', 'ZI', 'AXON', 'JCOM', 'ITT', 'FIVE', 'CARR', 'APLE', 'CVCO', 'WST', 'LH', 'WU', 'MSM', 'ELS', 'LNN', 'USFD', 'CARA', 'FOLD', 'AXGN', 'RDY', 'HOLX', 'COIN', 'APTV', 'CNXN', 'PFPT', 'TSLA', 'TRMB', 'THS', 'TCMD', 'ENS', 'CNP', 'SPGI', 'AVTR', 'LVS', 'PRTA', 'SAGE', 'VRTV', 'SRPT', 'WCC', 'BGS', 'KNSL', 'SPY', 'TLT', 'QQQ'], from_date: Optional[numpy.datetime64] = numpy.datetime64('2010-01-01'), to_date: Optional[numpy.datetime64] = numpy.datetime64('2022-02-01T10:45:10'), verbose: bool = False)[source]ο
Download daily price bars for base US stocks and indexes.
- Parameters
data_path – path to the folder in which downloaded data will be stored OR None (daily_bars_data_path from ~/.ml_investment/config.json will be used as the download path)
tickers – tickers to download daily bars for
from_date – start date for loading data
to_date – end date for loading data
verbose – show progress or not
Commodities
- ml_investment.download_scripts.download_commodities.main(data_path: str = '/home/docs/.ml_investment/data/commodities', verbose: bool = False)
Download commodities price history from https://blog.quandl.com/api-for-commodity-data
Note
To download this dataset you need to register at quandl and put your token into ~/.ml_investment/secrets.json
- Parameters
data_path – path to the folder in which downloaded data will be stored OR None (commodities_data_path from ~/.ml_investment/config.json will be used as the download path)
verbose – show progress or not
Backtest
Backtesting utils
Strategy
- class ml_investment.backtest.strategy.Strategy
Bases: object
Base class for strategy backtesting. It contains the overridable method step for defining a user strategy. This class encapsulates the backtesting and metrics calculation process and also contains information about orders.
- backtest(data_loader, date_col: str, price_col: str, return_col: str, return_format: str, step_dates: Optional[List[numpy.timedelta64]] = None, cash: float = 100000, comission: float = 0.00025, latency: numpy.timedelta64 = numpy.timedelta64(0, 'h'), allow_short: bool = False, metrics=None, preload: bool = False, verbose: bool = True)
Backtest the strategy on the provided data and other parameters. It will create and execute orders and calculate the resulting equity and metrics.
- Parameters
data_loader – class implementing the load(index) -> pd.DataFrame interface. index in this case is a list of tickers to load market data for.
date_col – name of the column containing date (time) information in the market data provided by data_loader.
price_col – name of the column containing price information in the market data provided by data_loader.
return_col – name of the column containing total return information in the data provided by data_loader. It may differ from price due to dividends, stock splits, etc.
return_format – format of the data in the return_col column. If return_format = 'ratio', the column should contain the ratio of the current adjusted price to the previous one, e.g. 1.2 means growth by 20% from the previous step. If return_format = 'price', the column should contain the adjusted price (price including dividends, etc.). If return_format = 'change', the column should contain the relative change between the current and previous step, e.g. 0.2 means growth by 20% from the previous step.
step_dates – dates on which all actions can be taken, including receiving new market prices and creating and executing orders. The step method will iterate over all those dates. If None, all possible dates provided by the date_col column in data_loader will be used; this is possible only if preload = True and data_loader has the existing_index(index) -> List interface.
cash – initial amount of cash
comission – commission charged for each trade (in percent of order value)
latency – time between the current step date and actual order posting. It emulates delays in the step logic and in the Internet connection with the exchange.
allow_short – allow short positions or not
preload – load all data provided by data_loader into RAM or not
verbose – show progress or not
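The three return_format values describe the same information in different forms. The sketch below converts each of them to per-step ratios; to_ratio is a hypothetical helper written for this example, and treating the first 'price' step as ratio 1.0 is an assumption.

```python
def to_ratio(values, return_format):
    """Convert a return column to per-step ratios (current / previous)."""
    if return_format == 'ratio':
        return list(values)
    if return_format == 'change':
        # A relative change of 0.2 corresponds to a ratio of 1.2.
        return [1.0 + value for value in values]
    if return_format == 'price':
        # Assumption: the first step has no previous price, so its ratio is 1.
        return [1.0] + [current / previous
                        for previous, current in zip(values, values[1:])]
    raise ValueError(f'unknown return_format: {return_format}')


from_price = to_ratio([100.0, 120.0, 90.0], 'price')
from_change = to_ratio([0.0, 0.2, -0.25], 'change')
from_ratio = to_ratio([1.0, 1.2, 0.75], 'ratio')
```

All three inputs above encode the same series: a 20% rise followed by a 25% fall.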
- post_order(ticker: str, direction: int, size: float, order_type: int = 0, lifetime: numpy.timedelta64 = numpy.timedelta64(300, 'D'), allow_partial: bool = True)
Post a new order to the backtest. It may be used inside the overridden step method of your strategy.
- Parameters
ticker – ticker of the company to post the order for
direction – one of Order.BUY (1), Order.SELL (-1)
size – size of the order in pieces
order_type – one of Order.MARKET (0), Order.LIMIT (1)
lifetime – amount of time before the order is closed if it can not be executed (e.g. if an unsatisfactory price lasts a long time)
allow_partial – whether the order may be executed with less than its full size
- post_order_value(ticker: str, direction: int, value: float, order_type: int = 0, lifetime: numpy.timedelta64 = numpy.timedelta64(300, 'D'), allow_partial: bool = True)
Post a new order by value (instead of size) to the backtest. It may be used inside the overridden step method of your strategy.
- Parameters
ticker – ticker of the company to post the order for
direction – one of Order.BUY (1), Order.SELL (-1)
value – value of the order in money
order_type – one of Order.MARKET (0), Order.LIMIT (1)
lifetime – amount of time before the order is closed if it can not be executed (e.g. if an unsatisfactory price lasts a long time)
allow_partial – whether the order may be executed with less than its full size
- post_portfolio_part(ticker: str, part: float, lifetime: numpy.timedelta64 = numpy.timedelta64(300, 'D'), allow_partial: bool = True)
Post an order to the backtest to reach the desired part in the portfolio. It will calculate the difference between the current and desired part to create the appropriate order. It may be used inside the overridden step method of your strategy.
- Parameters
ticker – ticker of the company to post the order for
part – desired part of the total equity, including other stocks and cash in the portfolio (value between 0 and 1)
lifetime – amount of time before the order is closed if it can not be executed (e.g. if an unsatisfactory price lasts a long time)
allow_partial – whether the order may be executed with less than its full size
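The "difference between current and desired part" step can be sketched as simple arithmetic. The function order_for_part is hypothetical; the BUY and SELL constants mirror the Order.BUY (1) and Order.SELL (-1) values listed for post_order, and commission and partial execution are ignored here.

```python
BUY, SELL = 1, -1  # same values as Order.BUY / Order.SELL in the docs


def order_for_part(equity, current_value, part):
    """Return (direction, value) needed to reach the desired portfolio part."""
    target_value = equity * part
    difference = target_value - current_value
    if difference == 0:
        return None  # already at the desired part, nothing to post
    return (BUY if difference > 0 else SELL, abs(difference))


# Total equity of 100000 with 10000 currently held in the ticker.
increase = order_for_part(100000.0, 10000.0, 0.25)
decrease = order_for_part(100000.0, 30000.0, 0.25)
unchanged = order_for_part(100000.0, 25000.0, 0.25)
```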
- post_portfolio_size(ticker: str, size: int, lifetime: numpy.timedelta64 = numpy.timedelta64(300, 'D'), allow_partial: bool = True)
Post an order to the backtest to reach the desired size in the portfolio. It will calculate the difference between the current and desired size to create the appropriate order. It may be used inside the overridden step method of your strategy.
- Parameters
ticker – ticker of the company to post the order for
size – desired size in the portfolio (in pieces)
lifetime – amount of time before the order is closed if it can not be executed (e.g. if an unsatisfactory price lasts a long time)
allow_partial – whether the order may be executed with less than its full size
- post_portfolio_value(ticker: str, value: float, lifetime: numpy.timedelta64 = numpy.timedelta64(300, 'D'), allow_partial: bool = True)
Post an order to the backtest to reach the desired value in the portfolio. It will calculate the difference between the current and desired value to create the appropriate order. It may be used inside the overridden step method of your strategy.
- Parameters
ticker – ticker of the company to post the order for
value – desired value in the portfolio (in money)
lifetime – amount of time before the order is closed if it can not be executed (e.g. if an unsatisfactory price lasts a long time)
allow_partial – whether the order may be executed with less than its full size