Downloads Form 13 and Form 10-K data from the SEC EDGAR system and uploads it to cloud buckets for labs.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

neo4j-sec-edgar

This repository contains scripts that download SEC EDGAR data and format it for loading into Neo4j and for analytics. Specifically, the scripts cover:

  • form13: Obtain information on investment managers and the companies whose stock they purchase, using Form 13. An FAQ on Form 13 is available here.
  • form10-k: Obtain text from 10-K filings for a fraction of the above companies.

EDGAR uses HTTP for access. A writeup on that is here.
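
As a rough illustration of HTTP access to EDGAR, the sketch below builds (but does not send) a request for one of the quarterly form indexes published under `Archives/edgar/full-index`. The `USER_AGENT` value and the `edgar_index_request` helper are hypothetical names introduced here. The scripts in this repository may fetch data differently, for example through the `secedgar` package. The SEC asks automated clients to identify themselves in the `User-Agent` header, so replace the placeholder contact string with your own.

```python
from urllib.request import Request

# Hypothetical contact string; replace with your own name and email address.
USER_AGENT = "Example Lab admin@example.com"


def edgar_index_request(year: int, quarter: int) -> Request:
    """Build (but do not send) a request for a quarterly EDGAR form index."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    url = (
        "https://www.sec.gov/Archives/edgar/full-index/"
        f"{year}/QTR{quarter}/form.idx"
    )
    # The request carries an identifying User-Agent header.
    return Request(url, headers={"User-Agent": USER_AGENT})


req = edgar_index_request(2023, 1)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the download. Keep the request rate low to stay within the SEC's fair-access limits.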

Setup

To run the scripts you will need to install dependencies:

sudo apt update
sudo apt -y install python3 python3-dev
sudo apt -y install screen wget
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
sudo pip3 install --upgrade google-api-python-client
sudo pip3 install --upgrade pandas tqdm xmltodict beautifulsoup4 secedgar
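
To confirm the installation worked, a small check like the following can help. It is only a sketch: the `missing_modules` helper is a hypothetical name, and the list holds the import names that correspond to the packages installed above (`bs4` for `beautifulsoup4`, `googleapiclient` for `google-api-python-client`).

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names for the packages installed by the pip3 commands above.
REQUIRED_MODULES = [
    "googleapiclient",
    "pandas",
    "tqdm",
    "xmltodict",
    "bs4",
    "secedgar",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```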

Download

You can now run the scripts for each form.

AWS - Upload to S3

To do

Microsoft Azure - Upload to Blob Storage

To do

Google Cloud - Upload to Cloud Storage Bucket

Now that you have the Form 13 and Form 10-K data, you can push it to Google Cloud Storage.

To do so, first initialize the gcloud CLI, which authorizes your account and selects a default project:

gcloud init

Now copy the Form 13 data:

gsutil cp data/form13.csv gs://neo4j-datasets/hands-on-lab/form13-2023.csv
gsutil cp data/form13-2023-05-11.csv gs://neo4j-datasets/hands-on-lab/form13-2023-05-11.csv
# to do - https://github.com/neo4j-partners/neo4j-sec-edgar/issues/4

And copy Form 10-K:

gsutil cp data/form10k.zip gs://neo4j-datasets/hands-on-lab/form10k.zip
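
If you prefer to script the uploads from Python, the copy commands above can be generated rather than typed by hand. This is a minimal sketch, not part of the repository: the `gsutil_cp_command` helper is a hypothetical name, and the bucket prefix and file names are taken from the commands above. It only prints the commands, so it is safe to run without credentials.

```python
import shlex
from typing import List

# Bucket prefix copied from the gsutil commands above.
BUCKET_PREFIX = "gs://neo4j-datasets/hands-on-lab"


def gsutil_cp_command(local_path: str, object_name: str) -> List[str]:
    """Return the argument list for copying one local file into the bucket."""
    return ["gsutil", "cp", local_path, f"{BUCKET_PREFIX}/{object_name}"]


# (local file, destination object name) pairs from the commands above.
uploads = [
    ("data/form13.csv", "form13-2023.csv"),
    ("data/form10k.zip", "form10k.zip"),
]

for local_path, object_name in uploads:
    # Dry run: print each command instead of executing it.
    print(shlex.join(gsutil_cp_command(local_path, object_name)))
```

To perform the copies, pass each argument list to `subprocess.run(cmd, check=True)` after running `gcloud init`.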