Parser | KEP |
Data source | "Short-term Economic Indicators" (KEP) by Rosstat |
Parsing result | Annual, quarterly and monthly time series in CSV files |
Schedule | 2018 |
In this repo we publish a dataset of Russian macroeconomic time series as machine-readable CSV files. We have tracked monthly macroeconomic data releases (vintages) since April 2009. The original Rosstat files are in MS Word format. The parser does the following:
- download and unpack MS Word files from Rosstat
- extract tables from Word files and assign variable names
- create pandas dataframes with time series (at annual, quarterly and monthly frequency)
- save dataframes as CSV files at a stable URL
Stable URL:
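import pandas as pd
def get_dataframe_from_web(freq):
    # url_base: stable URL template with a '{}' placeholder for the file name
    url_base = ''
    filename = "df{}.csv".format(freq)
    url = url_base.format(filename)
    # the first column holds dates: parse it and use it as the index
    return pd.read_csv(url, converters={0: pd.to_datetime}, index_col=0)
# annual, quarterly and monthly dataframes
dfa, dfq, dfm = (get_dataframe_from_web(freq) for freq in 'aqm')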
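The `read_csv` arguments used for the stable URL can be tried without network access. The sketch below is an illustration only: it round-trips a small frame through CSV text, assuming the published files keep dates in the first column. The variable name `CPI_rog` and all values are made up.

```python
import io

import pandas as pd

# Hypothetical monthly frame; the column name and values are invented.
dfm_sample = pd.DataFrame(
    {"CPI_rog": [100.5, 100.25, 100.75]},
    index=pd.to_datetime(["2017-01-31", "2017-02-28", "2017-03-31"]),
)

# Write the frame as CSV text: dates go to the first column.
csv_text = dfm_sample.to_csv()

# Read it back with the same arguments as get_dataframe_from_web():
# column 0 is converted to timestamps and used as the index.
restored = pd.read_csv(
    io.StringIO(csv_text), converters={0: pd.to_datetime}, index_col=0
)
```

Reading from `io.StringIO` stands in for the URL here; `pd.read_csv` accepts either.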
Around the release dates in this schedule, I run the following on a Windows machine:
invoke add <year> <month>
and commit changes to this repo.
This command:
- downloads a rar file from Rosstat,
- unpacks MS Word files,
- dumps all tables from MS Word files to an interim CSV file,
- parses the interim CSV file into three dataframes by frequency,
- transforms some variables (e.g. deaccumulates government expenditures),
- validates the parsing result,
- saves dataframes as processed CSV files,
- saves a CSV file for the latest date (todo),
- saves an Excel file for the latest date (todo).
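The deaccumulation step can be sketched as follows. This is not the repo's actual code. It is a minimal illustration, assuming the source series is a year-to-date cumulative total that restarts each January and has no missing values. The function name `deaccumulate` and the numbers are hypothetical.

```python
import pandas as pd


def deaccumulate(series: pd.Series) -> pd.Series:
    """Convert year-to-date cumulative values to per-period values.

    Within each calendar year the first observation is kept as is and
    every later observation becomes its difference from the previous one.
    Assumes a datetime index sorted by date and no missing values.
    """
    diffs = series.groupby(series.index.year).diff()
    # diff() leaves NaN at the first period of each year;
    # the cumulative value there already equals the period value.
    return diffs.fillna(series)


# Made-up cumulative expenditures for parts of two years.
index = pd.to_datetime(
    ["2016-01-31", "2016-02-29", "2016-03-31", "2017-01-31", "2017-02-28"]
)
accumulated = pd.Series([100.0, 250.0, 450.0, 120.0, 300.0], index=index)
monthly = deaccumulate(accumulated)
```

Grouping by calendar year keeps the January value from being differenced against the previous December.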
Same job can be done by
Parser | mini-kep |
Job | Parse sections of Short-term Economic Indicators (KEP) monthly Rosstat publication |
Source URL | Rosstat KEP page |
Source type | MS Word |
Frequency | Monthly |
When released | Start of month as in schedule |
Code | |
Test health | |
Test coverage | |
Documentation | |
CSV endpoint | |
Transformation | Government revenue/expenses deaccumulated to monthly values |
Validation | Hardcoded checkpoints and consistency checks |
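As an illustration of the two kinds of validation named in the table, here is a minimal sketch. It assumes a checkpoint is a (variable, date, expected value) triple and that a consistency check compares quarterly totals of a flow variable with the annual figure. The function names, the variable name `INVESTMENT_bln_rub` and all values are hypothetical and do not come from the repo.

```python
import math

import pandas as pd


def failed_checkpoints(df, checkpoints):
    """Return checkpoints that are absent from df or differ from its values."""
    failed = []
    for name, date, expected in checkpoints:
        try:
            actual = df.at[pd.Timestamp(date), name]
        except KeyError:
            # unknown variable name or date
            failed.append((name, date, expected))
            continue
        if not math.isclose(actual, expected):
            failed.append((name, date, expected))
    return failed


def quarters_sum_to_year(dfq, dfa, name):
    """True if quarterly values of a flow variable add up to annual values."""
    q_totals = dfq[name].groupby(dfq.index.year).sum()
    a_totals = dfa[name].groupby(dfa.index.year).sum()
    return all(
        math.isclose(q_totals.loc[year], a_totals.loc[year])
        for year in a_totals.index
    )


# Made-up quarterly and annual frames for one variable.
dfq_sample = pd.DataFrame(
    {"INVESTMENT_bln_rub": [10.0, 20.0, 30.0, 40.0]},
    index=pd.to_datetime(["2016-03-31", "2016-06-30", "2016-09-30", "2016-12-31"]),
)
dfa_sample = pd.DataFrame(
    {"INVESTMENT_bln_rub": [100.0]}, index=pd.to_datetime(["2016-12-31"])
)

good = [("INVESTMENT_bln_rub", "2016-06-30", 20.0)]
bad = [("INVESTMENT_bln_rub", "2016-06-30", 21.0)]
errors_good = failed_checkpoints(dfq_sample, good)
errors_bad = failed_checkpoints(dfq_sample, bad)
consistent = quarters_sum_to_year(dfq_sample, dfa_sample, "INVESTMENT_bln_rub")
```

Returning the failed checkpoints rather than raising on the first mismatch makes it possible to report every discrepancy in a new release at once.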
Is all historic raw data available on the internet?
- Yes
- No (data prior to 2016-12 is in this repo only)
Is the scraper automated (can it download the required information from the internet without manual operations)?
- Yes
- No
We follow the cookiecutter-data-science template for directory structure.
Windows and MS Word are required to create interim text dumps from MS Word files. Once these text files are created, they can be parsed on a Linux machine.
This repo replaces a predecessor, data-rosstat-kep, which could not handle vintages of macroeconomic data.