Data loaders

A collection of data loaders and utilities for them.

Yahoo

Loaders for the dataset provided by Yahoo. The data may be downloaded with the script main()

Expected dataset structure (data_path points to the Yahoo folder):
Yahoo
├── quarterly
│   ├── AAPL.csv
│   ├── FB.csv
│   └── …
└── base
    ├── AAPL.json
    ├── FB.json
    └── …
class ml_investment.data_loaders.yahoo.YahooBaseData(data_path: str)[source]

Bases: object

Loader for base information about a company (sector, industry, etc.)

Parameters

data_path – path to yahoo dataset folder

load(index: Optional[List[str]] = None) pandas.core.frame.DataFrame[source]
Parameters

index – list of tickers to load data for, or None to load all available tickers

Returns

base company information

Return type

pd.DataFrame

class ml_investment.data_loaders.yahoo.YahooQuarterlyData(data_path: str, quarter_count: Optional[int] = None)[source]

Bases: object

Loader for quarterly fundamental information about companies (debt, revenue, etc.)

Parameters
  • data_path – path to yahoo dataset folder

  • quarter_count – maximum number of most recent quarters to return. Fewer may be returned for companies with a short history

load(index: List[str]) pandas.core.frame.DataFrame[source]
Parameters

index – list of tickers to load data for

Returns

quarterly information about companies

Return type

pd.DataFrame
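The quarter_count behaviour described above (a cap on the number of most recent quarters, with fewer rows for short histories) can be sketched in plain Python. This is only an illustration, not the loader's implementation: the helper last_quarters, the 'date' field, the newest-first ordering and the NEWCO ticker are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def last_quarters(rows: List[dict], quarter_count: Optional[int] = None) -> List[dict]:
    """Return at most `quarter_count` most recent rows, newest first."""
    # ISO-formatted date strings sort chronologically, so a plain sort works.
    ordered = sorted(rows, key=lambda row: row["date"], reverse=True)
    if quarter_count is None:
        # No cap requested: keep the whole history.
        return ordered
    return ordered[:quarter_count]


# Hypothetical per-ticker histories; NEWCO stands for a company with one quarter.
history: Dict[str, List[dict]] = {
    "AAPL": [{"date": "2020-03-31"}, {"date": "2020-06-30"}, {"date": "2020-09-30"}],
    "NEWCO": [{"date": "2020-09-30"}],
}

result = {ticker: last_quarters(rows, quarter_count=2) for ticker, rows in history.items()}
```

With quarter_count=2, the three-quarter history is cut to its two latest rows, while the single-quarter history is returned unchanged.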

SF1

Loaders for the dataset provided by https://www.quandl.com/databases/SF1/data. The data may be downloaded with the script main()

Expected dataset structure:
SF1
├── core_fundamental
│   ├── AAPL.json
│   ├── FB.json
│   └── …
├── daily
│   ├── AAPL.json
│   ├── FB.json
│   └── …
└── tickers.zip
class ml_investment.data_loaders.sf1.SF1BaseData(data_path: Optional[str] = None)[source]

Bases: object

Loader for base information about a company (sector, industry, etc.)

Parameters

data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used

existing_index()[source]
Returns

existing index values that can be passed to load

Return type

List

load(index: Optional[List[str]] = None) pandas.core.frame.DataFrame[source]
Parameters

index – list of tickers to load data for, e.g. ['AAPL', 'TSLA'], or None to load all available tickers

Returns

base company information

Return type

pd.DataFrame
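The data_path fallback used by the SF1 loaders (an explicit path wins, otherwise a key from ~/.ml_investment/config.json is read) might look roughly like the sketch below. The helper resolve_data_path is hypothetical, and the assumption that the config file is a flat JSON object is not confirmed by this page; the demo writes a temporary config file so that it does not touch the real one.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Config location named in the documentation above.
DEFAULT_CONFIG_PATH = Path.home() / ".ml_investment" / "config.json"


def resolve_data_path(data_path: Optional[str], key: str,
                      config_path: Path = DEFAULT_CONFIG_PATH) -> str:
    """Return `data_path` if given, else the value stored under `key` in the config."""
    if data_path is not None:
        return data_path
    # Assumption: the config file is a flat JSON object of string values.
    with open(config_path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    return config[key]


# Demo against a temporary config file with a made-up path.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = Path(tmp) / "config.json"
    demo_config.write_text(json.dumps({"sf1_data_path": "/data/sf1"}), encoding="utf-8")
    explicit = resolve_data_path("/custom/sf1", "sf1_data_path", demo_config)
    fallback = resolve_data_path(None, "sf1_data_path", demo_config)
```

Here explicit keeps the path that was passed in, and fallback picks up the value stored in the config file.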

class ml_investment.data_loaders.sf1.SF1DailyData(data_path: Optional[str] = None, days_count: Optional[int] = None)[source]

Bases: object

Loader for daily information about a company (marketcap, pe, etc.)

Parameters
  • data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used

  • days_count – maximum number of most recent days to return. Fewer may be returned for companies with a short history

existing_index()[source]
Returns

existing index values that can be passed to load

Return type

List

load(index: List[str]) pandas.core.frame.DataFrame[source]
Parameters

index – list of tickers to load data for, e.g. ['AAPL', 'TSLA']

Returns

daily information about companies

Return type

pd.DataFrame

class ml_investment.data_loaders.sf1.SF1QuarterlyData(data_path: Optional[str] = None, quarter_count: Optional[int] = None, dimension: Optional[str] = 'ARQ')[source]

Bases: object

Loader for quarterly fundamental information about companies (debt, revenue, etc.)

Parameters
  • data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used

  • quarter_count – maximum number of most recent quarters to return. Fewer may be returned for companies with a short history

  • dimension – one of ['MRY', 'MRT', 'MRQ', 'ARY', 'ART', 'ARQ']. This parameter is specific to the SF1 dataset

existing_index()[source]
Returns

existing index values that can be passed to load

Return type

List

load(index: List[str]) pandas.core.frame.DataFrame[source]
Parameters

index – list of tickers to load data for, e.g. ['AAPL', 'TSLA']

Returns

quarterly information about companies

Return type

pd.DataFrame

class ml_investment.data_loaders.sf1.SF1SNP500Data(data_path: Optional[str] = None)[source]

Bases: object

S&P500 historical constituents

Parameters

data_path – path to the SF1 dataset folder. If None, sf1_data_path from ~/.ml_investment/config.json is used

existing_index()[source]
Returns

existing index values that can be passed to load

Return type

List

load(index: Optional[List[numpy.datetime64]] = None) pandas.core.frame.DataFrame[source]
Parameters

index – list of dates to load constituents for, e.g. [np.datetime64('2018-01-01'), np.datetime64('2018-05-10')]. If there is no entry for a date, the nearest past date is used. If None, constituents are loaded for all dates on which they changed

Returns

constituents information

Return type

pd.DataFrame
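The "nearest past date" rule for constituent lookups can be illustrated with a binary search over the sorted change dates. This is a minimal sketch and not the loader's code: it uses datetime.date instead of numpy.datetime64 to stay within the standard library, the tickers AAA, BBB and CCC and the change dates are made up, and returning None for a date that precedes all data is an assumption.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Optional

# Hypothetical constituent snapshots, keyed by the date the list changed.
changes: Dict[date, List[str]] = {
    date(2018, 1, 2): ["AAA", "BBB"],
    date(2018, 4, 3): ["AAA", "BBB", "CCC"],
}
change_dates = sorted(changes)


def constituents_on(query: date) -> Optional[List[str]]:
    """Return the snapshot in force on `query`, or None if it predates all data."""
    # bisect_right counts the change dates that are <= query.
    position = bisect_right(change_dates, query)
    if position == 0:
        return None
    return changes[change_dates[position - 1]]


may_snapshot = constituents_on(date(2018, 5, 10))
march_snapshot = constituents_on(date(2018, 3, 1))
```

A query that falls between two change dates resolves to the earlier one, and a query on an exact change date resolves to that date's snapshot.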

ml_investment.data_loaders.sf1.translate_currency(df: pandas.core.frame.DataFrame, columns: Optional[List[str]] = None)[source]

Translate the currency of the given columns to USD, using the exchange-rate information contained in the corresponding column pairs (such as debtusd and debt)

Parameters
  • df – quarterly data

  • columns – columns whose currency should be translated

Returns

data with the same columns and shape, with the requested columns converted to USD

Return type

pd.DataFrame
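One way to read the description of translate_currency is that a pair such as debtusd and debt implies an exchange rate, which is then applied to the other columns. The sketch below shows that idea on a single row held in a plain dict. It is an assumption-laden illustration, not the library's implementation: the helper to_usd, the choice of the debt pair as the rate source, and the handling of a missing or zero debt value are all hypothetical.

```python
from typing import Dict, List


def to_usd(row: Dict[str, float], columns: List[str]) -> Dict[str, float]:
    """Convert `columns` of one quarterly row to USD using the debtusd/debt ratio."""
    converted = dict(row)  # Copy so the input row is left untouched.
    if not row.get("debt"):
        # Assumption: without a usable debt value no rate can be derived,
        # so the row is returned unchanged.
        return converted
    rate = row["debtusd"] / row["debt"]
    for column in columns:
        converted[column] = row[column] * rate
    return converted


# Made-up figures: 200 units of local-currency debt are worth 100 USD.
row = {"debt": 200.0, "debtusd": 100.0, "revenue": 50.0}
usd_row = to_usd(row, ["revenue"])
```

The output keeps the same keys as the input; only the listed columns are rescaled by the implied rate.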

Quandl Commodities

Loader for commodity price information from https://blog.quandl.com/api-for-commodity-data. The data may be downloaded with the script main()

Expected dataset structure
commodities
├── LBMA_GOLD.json
├── CHRIS_CME_CL1.json
└── …
class ml_investment.data_loaders.quandl_commodities.QuandlCommoditiesData(data_path: Optional[str] = None)[source]

Bases: object

Loader for commodity price information.

Parameters

data_path – path to the quandl_commodities dataset folder. If None, commodities_data_path from ~/.ml_investment/config.json is used

existing_index()[source]
Returns

existing index values that can be passed to load

Return type

List

load(index: List[str]) pandas.core.frame.DataFrame[source]

Load time-series information about commodity prices

Parameters

index – list of commodity codes to load data for, e.g. ['LBMA/GOLD', 'JOHNMATT/PALL']

Returns

time series price information

Return type

pd.DataFrame
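The example codes ('LBMA/GOLD') and the example file names (LBMA_GOLD.json) suggest that a code maps to a file by replacing the slash with an underscore. The sketch below encodes that guess. The helper commodity_file is hypothetical and the mapping is inferred from the names on this page, not confirmed by the library.

```python
from pathlib import Path


def commodity_file(data_path: str, code: str) -> Path:
    """Map a code such as 'LBMA/GOLD' to '<data_path>/LBMA_GOLD.json'."""
    # Assumption: '/' in the code becomes '_' in the file name.
    return Path(data_path) / (code.replace("/", "_") + ".json")


paths = [commodity_file("commodities", code) for code in ["LBMA/GOLD", "CHRIS/CME_CL1"]]
```

Both resulting file names match the entries shown in the expected dataset structure above.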

Daily Price Bars

Loader for daily price bars. The data may be downloaded with the script main()

Expected dataset structure
daily_bars
├── AAPL.csv
├── TSLA.csv
└── …
class ml_investment.data_loaders.daily_bars.DailyBarsData(data_path: Optional[str] = None, days_count: Optional[int] = None)[source]

Bases: object

Loader for daily price bars.

Parameters
  • data_path – path to the daily_bars dataset folder. If None, daily_bars_data_path from ~/.ml_investment/config.json is used

  • days_count – maximum number of most recent days to return. Fewer may be returned for companies with a short history

existing_index()[source]
Returns

existing index values that can be passed to load

Return type

List

load(index: List[str]) pandas.core.frame.DataFrame[source]

Load daily price bars

Parameters

index – list of tickers to load data for, e.g. ['AAPL', 'TSLA']

Returns

daily price bars

Return type

pd.DataFrame
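Given the one-file-per-ticker layout shown above, the values returned by existing_index() could plausibly be derived from the file names in the dataset folder. The sketch below shows that idea with a temporary folder. The helper list_tickers and the CSV header written into the demo files are assumptions for illustration, and the real method may work differently.

```python
import tempfile
from pathlib import Path
from typing import List


def list_tickers(data_path: str, suffix: str = ".csv") -> List[str]:
    """Return sorted ticker names derived from the file names in `data_path`."""
    return sorted(
        path.stem
        for path in Path(data_path).iterdir()
        if path.is_file() and path.suffix == suffix
    )


# Demo: build a throwaway daily_bars-like folder and list its tickers.
with tempfile.TemporaryDirectory() as tmp:
    for ticker in ["TSLA", "AAPL"]:
        # The header row is made up; only the file name matters here.
        (Path(tmp) / f"{ticker}.csv").write_text("date,close\n", encoding="utf-8")
    (Path(tmp) / "notes.txt").write_text("ignored", encoding="utf-8")
    tickers = list_tickers(tmp)
```

Files with other extensions are skipped, so only the two CSV-backed tickers are listed.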

Data loading utils