Data loaders

A collection of data loaders and related utilities.

Yahoo

Loader for the dataset provided by Yahoo. The data can be downloaded with the script main().

Expected dataset structure (path to the Yahoo data folder):

Yahoo
├── quarterly
│   ├── AAPL.csv
│   ├── FB.csv
│   └── …
└── base
    ├── AAPL.json
    ├── FB.json
    └── …
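
As a quick sanity check of this layout, the sketch below lists the tickers that have both a quarterly CSV and a base JSON file. It uses only the standard library; yahoo_tickers is a hypothetical helper written for illustration and is not part of ml_investment.

```python
from pathlib import Path
import tempfile


def yahoo_tickers(data_path):
    """Return tickers that have both a quarterly CSV and a base JSON file.

    Hypothetical helper: it only inspects file names, not file contents.
    """
    root = Path(data_path)
    quarterly = {p.stem for p in (root / "quarterly").glob("*.csv")}
    base = {p.stem for p in (root / "base").glob("*.json")}
    return sorted(quarterly & base)


# Build a throwaway folder with the layout above and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "Yahoo"
    (root / "quarterly").mkdir(parents=True)
    (root / "base").mkdir()
    for ticker in ("AAPL", "FB"):
        (root / "quarterly" / f"{ticker}.csv").write_text("")
        (root / "base" / f"{ticker}.json").write_text("{}")
    # TSLA has quarterly data but no base file, so it is excluded.
    (root / "quarterly" / "TSLA.csv").write_text("")
    found = yahoo_tickers(root)

print(found)
```

Whether the loaders themselves require both files for a ticker is not stated here, so treat the intersection as a conservative choice.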

class ml_investment.data_loaders.yahoo.YahooBaseData(data_path: str)

Bases: object

Loader for base information about a company (sector, industry, etc.).

Parameters: data_path – path to the Yahoo dataset folder

load(index: Optional[List[str]] = None) → pandas.core.frame.DataFrame

Parameters: index – list of tickers to load data for, or None to load all available tickers
Returns: base information about the companies
Return type: pd.DataFrame

class ml_investment.data_loaders.yahoo.YahooQuarterlyData(data_path: str, quarter_count: Optional[int] = None)

Bases: object

Loader for quarterly fundamental information about companies (debt, revenue, etc.).

Parameters

data_path – path to the Yahoo dataset folder
quarter_count – maximum number of most recent quarters to return. Fewer may be returned for companies with a short history

load(index: List[str]) → pandas.core.frame.DataFrame

Parameters: index – list of tickers to load data for
Returns: quarterly information about companies
Return type: pd.DataFrame
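
The effect of quarter_count can be illustrated with plain Python. This is a sketch of the documented semantics only (at most N most recent quarters per ticker, fewer when the history is short), not the loader's implementation. last_quarters, the sample values, and the NEWCO ticker are hypothetical.

```python
def last_quarters(records, quarter_count=None):
    """Keep at most `quarter_count` most recent records per ticker.

    `records` maps ticker -> list of (iso_date, value) tuples.
    ISO date strings sort chronologically, so a plain sort is enough.
    """
    result = {}
    for ticker, rows in records.items():
        ordered = sorted(rows, key=lambda row: row[0], reverse=True)  # newest first
        result[ticker] = ordered if quarter_count is None else ordered[:quarter_count]
    return result


history = {
    "AAPL": [("2020-03-31", 1.0), ("2020-06-30", 2.0), ("2020-09-30", 3.0)],
    "NEWCO": [("2020-09-30", 9.0)],  # hypothetical ticker with a short history
}
trimmed = last_quarters(history, quarter_count=2)
print(trimmed)
```

Here AAPL keeps its two newest quarters, while NEWCO returns its single available quarter. The row order of the real DataFrame is not specified in this section, so the newest-first ordering is an assumption of the sketch.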

SF1

Loaders for the dataset provided by https://www.quandl.com/databases/SF1/data. The data can be downloaded with the script main().

Expected dataset structure:

SF1
├── core_fundamental
│   ├── AAPL.json
│   ├── FB.json
│   └── …
├── daily
│   ├── AAPL.json
│   ├── FB.json
│   └── …
└── tickers.zip
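
Before constructing the SF1 loaders it can help to confirm that the folder has the three top-level entries shown above. The following is a minimal sketch using the standard library; missing_sf1_entries is a hypothetical helper, not a function of ml_investment.

```python
from pathlib import Path
import tempfile

# Top-level entries taken from the expected structure above.
EXPECTED_SF1_ENTRIES = ("core_fundamental", "daily", "tickers.zip")


def missing_sf1_entries(data_path):
    """Return the expected top-level entries that are absent from data_path."""
    root = Path(data_path)
    return [name for name in EXPECTED_SF1_ENTRIES if not (root / name).exists()]


# Demo: a folder that has both sub-folders but lacks tickers.zip.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "SF1"
    (root / "core_fundamental").mkdir(parents=True)
    (root / "daily").mkdir()
    missing = missing_sf1_entries(root)

print(missing)
```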

class ml_investment.data_loaders.sf1.SF1BaseData(data_path: Optional[str] = None)

Bases: object

Loader for base information about a company (sector, industry, etc.).

Parameters: data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used
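
The data_path fallback could be sketched as follows. This is an illustration rather than the library's code: resolve_data_path is a hypothetical helper, and it assumes config.json is a flat JSON object with sf1_data_path as a top-level key.

```python
import json
import tempfile
from pathlib import Path

DEFAULT_CONFIG = Path.home() / ".ml_investment" / "config.json"


def resolve_data_path(data_path=None, key="sf1_data_path", config_path=DEFAULT_CONFIG):
    """Return data_path if given, otherwise look `key` up in the config file."""
    if data_path is not None:
        return data_path
    with open(config_path) as f:
        config = json.load(f)
    return config[key]  # assumes a flat JSON object (not confirmed by the docs)


# Demo against a temporary config file, so the real one is never touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.json"
    cfg.write_text(json.dumps({"sf1_data_path": "/data/sf1"}))
    explicit = resolve_data_path("/my/sf1", config_path=cfg)
    fallback = resolve_data_path(None, config_path=cfg)

print(explicit, fallback)
```

The same pattern would apply to the other loaders that name a different config key, by passing that key instead.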

existing_index()

Returns: existing index values that can be passed to load
Return type: List

load(index: Optional[List[str]] = None) → pandas.core.frame.DataFrame

Parameters: index – list of tickers to load data for, e.g. ['AAPL', 'TSLA'], or None to load all available tickers
Returns: base information about the companies
Return type: pd.DataFrame

class ml_investment.data_loaders.sf1.SF1DailyData(data_path: Optional[str] = None, days_count: Optional[int] = None)

Bases: object

Loader for daily information about a company (marketcap, pe, etc.).

Parameters

data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used
days_count – maximum number of most recent days to return. Fewer may be returned for companies with a short history

existing_index()

Returns: existing index values that can be passed to load
Return type: List

load(index: List[str]) → pandas.core.frame.DataFrame

Parameters: index – list of tickers to load data for, e.g. ['AAPL', 'TSLA']
Returns: daily information about the companies
Return type: pd.DataFrame
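
Because existing_index() reports which values load() accepts, one common pattern is to filter a wish list before loading. The sketch below is duck-typed, so it should work with any object that exposes these two methods. StubLoader is a hypothetical stand-in used so the example runs without the dataset, and its load() returns a list rather than a pd.DataFrame.

```python
def split_by_availability(loader, wanted):
    """Split `wanted` into values the loader knows about and values it does not."""
    available = set(loader.existing_index())
    present = [item for item in wanted if item in available]
    missing = [item for item in wanted if item not in available]
    return present, missing


class StubLoader:
    """Hypothetical stand-in exposing existing_index() and load() like the loaders here."""

    def __init__(self, tickers):
        self._tickers = list(tickers)

    def existing_index(self):
        return list(self._tickers)

    def load(self, index):
        # The real loaders return a pd.DataFrame; a list keeps the sketch dependency-free.
        return [{"ticker": ticker} for ticker in index]


loader = StubLoader(["AAPL", "TSLA"])
present, missing = split_by_availability(loader, ["AAPL", "FB", "TSLA"])
rows = loader.load(present)
print(present, missing)
```

How the real loaders react to a ticker that is not in existing_index() is not documented in this section, which is the reason for filtering up front.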

class ml_investment.data_loaders.sf1.SF1QuarterlyData(data_path: Optional[str] = None, quarter_count: Optional[int] = None, dimension: Optional[str] = 'ARQ')

Bases: object

Loader for quarterly fundamental information about companies (debt, revenue, etc.).

Parameters

data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used
quarter_count – maximum number of most recent quarters to return. Fewer may be returned for companies with a short history
dimension – one of ['MRY', 'MRT', 'MRQ', 'ARY', 'ART', 'ARQ']; a parameter specific to the SF1 dataset

existing_index()

Returns: existing index values that can be passed to load
Return type: List

load(index: List[str]) → pandas.core.frame.DataFrame

Parameters: index – list of tickers to load data for, e.g. ['AAPL', 'TSLA']
Returns: quarterly information about the companies
Return type: pd.DataFrame
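
The dimension codes can be validated and decoded before they are passed to the loader. The sketch below assumes the usual SF1 reading of the codes (first two letters: AR for as reported, MR for most recent reported; last letter: Y annual, T trailing twelve months, Q quarterly). That interpretation comes from the SF1 dataset's own documentation rather than from this page, and describe_dimension is a hypothetical helper.

```python
VALID_DIMENSIONS = ["MRY", "MRT", "MRQ", "ARY", "ART", "ARQ"]

# Assumed meaning of the code parts (see the lead-in); not defined on this page.
REVISION = {"AR": "as reported", "MR": "most recent reported"}
PERIOD = {"Y": "annual", "T": "trailing twelve months", "Q": "quarterly"}


def describe_dimension(dimension="ARQ"):
    """Validate an SF1 dimension code and return a readable description."""
    if dimension not in VALID_DIMENSIONS:
        raise ValueError(f"dimension must be one of {VALID_DIMENSIONS}, got {dimension!r}")
    return f"{REVISION[dimension[:2]]}, {PERIOD[dimension[2]]}"


default_description = describe_dimension()
print(default_description)
```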

class ml_investment.data_loaders.sf1.SF1SNP500Data(data_path: Optional[str] = None)

Bases: object

Loader for historical S&P 500 constituents.

Parameters: data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used

existing_index()

Returns: existing index values that can be passed to load
Return type: List

load(index: Optional[List[numpy.datetime64]] = None) → pandas.core.frame.DataFrame

Parameters: index – list of dates to load constituents for, e.g. [np.datetime64('2018-01-01'), np.datetime64('2018-05-10')]. If a requested date is not present, the nearest past date is used. Pass None to load all dates on which the constituents changed
Returns: constituents information
Return type: pd.DataFrame
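
The "nearest past date" rule can be sketched with the standard bisect module. This illustrates the lookup rule only, under the assumption that the change dates are sorted in ascending order. It is not the loader's implementation: nearest_past_date is a hypothetical helper, the dates are made up, and datetime.date stands in for np.datetime64 to avoid extra dependencies.

```python
import bisect
from datetime import date


def nearest_past_date(change_dates, query):
    """Return the latest date in sorted `change_dates` that is <= query, or None."""
    pos = bisect.bisect_right(change_dates, query)
    return change_dates[pos - 1] if pos else None


# Made-up dates on which the constituents changed, sorted ascending.
changes = [date(2018, 1, 2), date(2018, 3, 19), date(2018, 6, 5)]

between = nearest_past_date(changes, date(2018, 5, 10))
exact = nearest_past_date(changes, date(2018, 3, 19))
too_early = nearest_past_date(changes, date(2018, 1, 1))
print(between, exact, too_early)
```

What the loader does when no earlier date exists is not stated in this section. The sketch returns None in that case.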

ml_investment.data_loaders.sf1.translate_currency(df: pandas.core.frame.DataFrame, columns: Optional[List[str]] = None)

Translate the currency of the given columns to USD according to the exchange-rate information in the corresponding columns (like debtusd-debt).

Parameters

df – quarterly-based data
columns – columns whose currency should be translated

Returns

result with the same columns and shape, but with the currency converted in the given columns

Return type

pd.DataFrame
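
One plausible reading of the debtusd-debt hint is that the ratio debtusd / debt gives the exchange rate for a row, which is then applied to the other columns. The sketch below illustrates that assumed reading on a single row held in a dict. It is not the implementation of translate_currency. translate_row, the rate_pair argument, and the sample values are all hypothetical.

```python
def translate_row(row, columns, rate_pair=("debtusd", "debt")):
    """Convert `columns` of one row to USD using the rate implied by rate_pair.

    Assumption: rate = row[usd_column] / row[local_column].
    """
    usd_col, local_col = rate_pair
    converted = dict(row)  # same keys as the input, so the "shape" is preserved
    if not row[local_col]:
        return converted  # no rate can be derived; leave the row untouched
    rate = row[usd_col] / row[local_col]
    for col in columns:
        converted[col] = row[col] * rate
    return converted


# Made-up row: 200 units of local-currency debt reported as 50 USD, so rate = 0.25.
row = {"debt": 200.0, "debtusd": 50.0, "revenue": 1000.0}
converted = translate_row(row, ["revenue"])
print(converted)
```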

Quandl Commodities

Loader for commodity price information from https://blog.quandl.com/api-for-commodity-data. The data can be downloaded with the script main().

Expected dataset structure:

commodities
├── LBMA_GOLD.json
├── CHRIS_CME_CL1.json
└── …

class ml_investment.data_loaders.quandl_commodities.QuandlCommoditiesData(data_path: Optional[str] = None)

Bases: object

Loader for commodity price information.

Parameters: data_path – path to the quandl_commodities dataset folder. If None, commodities_data_path from ~/.ml_investment/config.json is used

existing_index()

Returns: existing index values that can be passed to load
Return type: List

load(index: List[str]) → pandas.core.frame.DataFrame

Load time-series information about commodity prices.

Parameters: index – list of commodity codes to load data for, e.g. ['LBMA/GOLD', 'JOHNMATT/PALL']
Returns: time-series price information
Return type: pd.DataFrame
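
Comparing the code 'LBMA/GOLD' with the file LBMA_GOLD.json in the expected structure suggests that a code maps to a file name by replacing the slash with an underscore. The sketch below encodes that inferred convention. code_to_filename is a hypothetical helper, and both the mapping and the 'CHRIS/CME_CL1' code are assumptions drawn from the examples on this page.

```python
def code_to_filename(code):
    """Map a commodity code to the file name it would have under the inferred convention."""
    return code.replace("/", "_") + ".json"


gold_file = code_to_filename("LBMA/GOLD")
oil_file = code_to_filename("CHRIS/CME_CL1")  # assumed code for CHRIS_CME_CL1.json
print(gold_file, oil_file)
```

The reverse mapping is ambiguous, since a name such as CHRIS_CME_CL1 contains more than one underscore. This is one reason to prefer existing_index() over parsing file names.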

Daily Price Bars

Loader for daily price bars. The data can be downloaded with the script main().

Expected dataset structure:

daily_bars
├── AAPL.csv
├── TSLA.csv
└── …

class ml_investment.data_loaders.daily_bars.DailyBarsData(data_path: Optional[str] = None, days_count: Optional[int] = None)

Bases: object

Loader for daily price bars.

Parameters

data_path – path to the daily_bars dataset folder. If None, daily_bars_data_path from ~/.ml_investment/config.json is used
days_count – maximum number of most recent days to return. Fewer may be returned for companies with a short history

existing_index()

Returns: existing index values that can be passed to load
Return type: List

load(index: List[str]) → pandas.core.frame.DataFrame

Load daily price bars.

Parameters: index – list of tickers to load data for, e.g. ['AAPL', 'TSLA']
Returns: daily price bars
Return type: pd.DataFrame
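
The days_count limit can be illustrated on a single per-ticker CSV with the standard library. The sketch assumes the rows are stored oldest first and uses hypothetical column names (Date, Close), since the actual file format is not described in this section. last_days is a hypothetical helper, not the loader's code.

```python
import csv
import io
from collections import deque


def last_days(csv_file, days_count=None):
    """Return at most `days_count` trailing rows of a CSV file as dicts.

    deque(maxlen=N) keeps only the last N rows; maxlen=None keeps all of them.
    """
    reader = csv.DictReader(csv_file)
    return list(deque(reader, maxlen=days_count))


# In-memory stand-in for a file such as daily_bars/AAPL.csv (made-up values).
sample = io.StringIO(
    "Date,Close\n"
    "2021-01-04,10\n"
    "2021-01-05,11\n"
    "2021-01-06,12\n"
)
recent = last_days(sample, days_count=2)
print(recent)
```

If a file holds fewer rows than days_count, all of them are returned, which matches the note about short histories.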
