Primary language: Python. License: MIT.

Companies House filings fetcher

Downloads paper filings from the Companies House API.

Uses Scrapy to manage downloading account filings from the Companies House API.

NOTE: This project was created as a prototype.

Usage

scrapy crawl latest_paper_filing

Set up

Assumptions:

  • Conda package manager installed

Steps:

Clone the project:

    git clone git@github.com:ONSBigData/companies_house_filing_fetcher.git

Set up the Python environment:

    cd companies_house_filing_fetcher
    conda env create -f environment.yml
    conda activate pdf_downloader

Download the BasicCompanyDataAsOneFile CSV (published as a zip archive) from http://download.companieshouse.gov.uk/en_output.html.
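
To sanity-check the extracted file before crawling, the company numbers can be read with the standard library. This is a minimal sketch, not the project's own loader: the `CompanyNumber` column name is an assumption about the bulk file's header, and `iter_company_numbers` is a hypothetical helper.

```python
import csv
import io


def iter_company_numbers(csv_file):
    """Yield company numbers from an open BasicCompanyData CSV file."""
    reader = csv.DictReader(csv_file)
    # Header names may carry stray whitespace, so normalise them first.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    for row in reader:
        # "CompanyNumber" is an assumed column name; adjust to the real header.
        number = row["CompanyNumber"].strip()
        if number:
            yield number


# Tiny in-memory sample standing in for the real (much larger) file.
sample = io.StringIO(
    "CompanyName, CompanyNumber,CompanyCategory\n"
    "EXAMPLE ONE LTD,00000001,Private Limited Company\n"
    "EXAMPLE TWO LTD,SC000002,Private Limited Company\n"
)
company_numbers = list(iter_company_numbers(sample))
```

For the real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. The generator keeps memory use flat, which matters because the full file holds several million rows.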

Copy the config files to ~/config and edit their contents:

    mkdir -p ~/config
    cp ch_api_key.example.ini ~/config/ch_api_key_example.ini
    cp filing_fetcher_config.example.yml ~/config/filing_fetcher_config.yml
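
The API key is what authenticates each request: the Companies House API uses HTTP Basic authentication with the key as the username and an empty password. The sketch below shows how a key could be read from an ini file and turned into a header. The `[companies_house]` section and `api_key` option are hypothetical, so check `ch_api_key.example.ini` for the layout the project actually expects.

```python
import base64
import configparser

# Hypothetical file contents; the real ch_api_key.example.ini may differ.
EXAMPLE_INI = """
[companies_house]
api_key = my-secret-key
"""


def load_api_key(ini_text, section="companies_house", option="api_key"):
    """Read the API key from ini-formatted text (section/option names assumed)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.get(section, option)


def basic_auth_header(api_key):
    """Build a Basic auth header: API key as username, empty password."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


api_key = load_api_key(EXAMPLE_INI)
headers = basic_auth_header(api_key)
```

Keeping the key in `~/config`, outside the repository, means it cannot be committed by accident.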

Review config values in spiders/settings.py.
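
When reviewing the settings, throttling is the part most likely to need attention, because the Companies House API documents a rate limit of 600 requests per five minutes. The fragment below sketches values that stay under that limit. The setting names are standard Scrapy settings, but the values are illustrative assumptions, not the ones shipped in this project.

```python
# Illustrative throttling values for a Scrapy settings module (assumed, not the project's).

# Documented Companies House limit: 600 requests per five-minute window.
RATE_LIMIT_REQUESTS = 600
RATE_LIMIT_WINDOW_SECONDS = 5 * 60

# Space requests evenly so the limit is never exceeded: 300 s / 600 = 0.5 s.
DOWNLOAD_DELAY = RATE_LIMIT_WINDOW_SECONDS / RATE_LIMIT_REQUESTS

# Keep the delay fixed instead of letting Scrapy randomise it around that value.
RANDOMIZE_DOWNLOAD_DELAY = False

# One request at a time per domain, so the delay above is the effective rate.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy back off further if the server slows down.
AUTOTHROTTLE_ENABLED = True

# Retry on rate-limit and transient server errors.
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
```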

Run the downloader:

    scrapy crawl latest_paper_filing
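
As a rough illustration of what "latest paper filing" could mean, the sketch below picks the most recent paper-filed item from a filing-history response. It assumes the response exposes an `items` list whose entries carry `date` and `paper_filed` fields. `latest_paper_filing` is a hypothetical helper, not necessarily the logic the spider uses.

```python
def latest_paper_filing(filing_history):
    """Return the most recent item flagged as paper filed, or None."""
    paper_items = [
        item for item in filing_history.get("items", [])
        if item.get("paper_filed")
    ]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return max(paper_items, key=lambda item: item["date"], default=None)


# Hand-written sample shaped like a filing-history response (not real data).
sample_response = {
    "items": [
        {"date": "2019-03-01", "type": "AA", "paper_filed": True, "transaction_id": "A1"},
        {"date": "2020-03-01", "type": "AA", "transaction_id": "B2"},
        {"date": "2018-03-01", "type": "AA", "paper_filed": True, "transaction_id": "C3"},
    ]
}
chosen = latest_paper_filing(sample_response)
```

Here `chosen` is the 2019 item: the 2020 filing is newer but is not flagged as paper filed, so it is skipped.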