Repository for extracting data from S-1 and 10-K SEC forms.
- Clone or download the repo.
- In the project's root directory, install the dependencies.
# Use this if you use Poetry
poetry install
poetry shell
# Use this if you use pip
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install .
- Create the file `./sec_extract/keys.py`, replacing `your-api-key` with your sec-api key:
SEC_API_KEY = "your-api-key"
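The placeholder value is easy to leave in by accident, so a small check before running anything can help. The following is a minimal sketch, assuming `keys.py` defines `SEC_API_KEY` as shown above; `check_api_key` is a hypothetical helper and is not part of this repository.

```python
PLACEHOLDER = "your-api-key"  # the placeholder value from the instructions above


def check_api_key(key: str) -> str:
    """Return the stripped key, or raise ValueError if it is empty or still the placeholder.

    Hypothetical helper for illustration; not part of sec_extract.
    """
    cleaned = key.strip()
    if not cleaned or cleaned == PLACEHOLDER:
        raise ValueError("Set SEC_API_KEY in ./sec_extract/keys.py to your sec-api key")
    return cleaned


# Usage, assuming the keys.py file described above exists:
#   from sec_extract.keys import SEC_API_KEY
#   check_api_key(SEC_API_KEY)
```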
The `sec_extract.download` package contains the code for downloading the S-1 and 10-K forms.
To run:
- From the project root directory, run `python3 -m sec_extract.download`. This creates a new directory, `./target`, which contains the downloaded forms.
The `sec_extract.extract` package contains the code for extracting the business and management sections of the S-1 forms.
To run (only after running `download` first):
- From the project root directory, run `python3 -m sec_extract.extract`. The extracted sections will be in `./target`.
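Because extraction depends on the downloaded forms, the two steps have to run in order. As a rough sketch of one way to enforce that, the wrapper below runs both modules in sequence and stops if the first one fails. The names `STEPS`, `build_commands`, and `run_pipeline` are hypothetical and not part of this repository; only the two module names come from the commands above.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Module names taken from the commands above; download must run before extract.
STEPS = ("sec_extract.download", "sec_extract.extract")


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Build one `python -m <module>` command per step, in the required order."""
    return [[python, "-m", module] for module in STEPS]


def run_pipeline(run: Callable[[Sequence[str]], int] = subprocess.check_call) -> None:
    """Run each step in order.

    With the default runner, check_call raises CalledProcessError on a
    non-zero exit, so extract never starts if download fails.
    """
    for command in build_commands():
        run(command)


# Usage, from the project root directory:
#   run_pipeline()
```

Using `sys.executable` keeps both steps inside whichever virtual environment launched the wrapper.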
The output HTML documents can be rendered to PDFs using the included `convert_pdf` script.
This script requires Google Chrome and GNU Parallel.
On macOS, you will need a `google-chrome` script on your PATH with the following contents (or similar):
#!/bin/sh
exec /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome ${1+"$@"}
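The `convert_pdf` script itself is not reproduced here. To illustrate the general idea, the following Python sketch prints each HTML file to PDF with headless Chrome. It is not the included script: it swaps GNU Parallel for a thread pool, and it assumes the outputs end in `.html` under `./target` and that a `google-chrome` executable is on your PATH. The function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def chrome_pdf_command(html_path: Path, chrome: str = "google-chrome") -> List[str]:
    """Build a headless Chrome command that prints one HTML file to a PDF next to it."""
    html_abs = html_path.resolve()
    pdf_abs = html_abs.with_suffix(".pdf")
    return [
        chrome,
        "--headless",
        f"--print-to-pdf={pdf_abs}",
        html_abs.as_uri(),  # Chrome is given a file:// URL
    ]


def convert_all(target_dir: str = "./target", workers: int = 4) -> None:
    """Convert every .html file under target_dir, a few at a time.

    The thread pool stands in for GNU Parallel here.
    """
    # Assumption: the extracted documents end in .html somewhere under target_dir.
    html_files = sorted(Path(target_dir).rglob("*.html"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so any CalledProcessError is raised here.
        list(pool.map(lambda p: subprocess.check_call(chrome_pdf_command(p)), html_files))
```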
GNU Parallel is cited below:
Tange, O. (2022, March 22). GNU Parallel 20220322 ('Маріу́поль'). Zenodo. https://doi.org/10.5281/zenodo.6377950