russtat utilizes the power of Python to download and process the massive public data release by the Russian Statistics Office from the EMISS website. The original XML-formatted datasets (close to 7,000 in total) are parsed and saved as JSON files which can then be fed into a ready-made PostgreSQL database to give additional power. Alternativaly, you can utilize the JSON files with your own software, in any way you like!
russtat source code and documentation are hosted on Github
- parallel non-blocking data retrieval and processing using the multiprocessing library
- fool-proof XML parsing with default values for missing data, string trimming, type conversions and exception handling
- ready SQL script to create the PostgreSQL database from scratch
- database routine handling
- extensive API documentation with Doxygen
All source data are taken from the EMISS website at https://fedstat.ru/. No data is modified by this application.
The EMISS website publishes all materials under the Creative Common Attribution 3.0 license. The copyright owners are the Russian Federal State Statistic Service (ROSSTAT) and the Russian Ministry of Digital Development, Communications and Mass Media (MINKOMSV'AZ').
You must have the following applications / packages installed in your system:
- Python 3.7+ (the app was written and tested with Python 3.8)
- Python packages:
- pip
- psycopg2
- Git (should be pre-installed on most modern Linux and Mac systems, alternatively install from the git website)
To get the latest version from Github, run:
git clone https://github.com/S0mbre/russtat .
I recommend (as many do) installing packages into python's virtual environment using virtualenv or the inbuilt venv:
Create a new virtual environment (assuming your projects root folder is 'myprojects'):
Linux / Mac
cd myprojects
venv russtat
cd russtat
. ./bin/activate
Windows
cd myprojects
venv russtat
cd russtat
scripts\activate.bat
This step is, of course, optional. You can skip it if you don't want to use virtual environments for some reason or other.
Then just run:
cd russtat
python -m pip install -r requirements.txt
If you're using a virtual environment, you can deactivate it after closing the app with deactivate
.
Run python russtat.py
to start the application. Please modify the main()
function is that file first to suit your purpose!
See the documentation in russtat/doc/ref to find out more!