neo4j-sec-edgar-form13
These scripts download SEC EDGAR data and load it into Neo4j. They operate specifically on SEC Form 13 filings. An FAQ on Form 13 is available here. EDGAR is accessed over HTTP; a writeup on that is here.
This dataset is used by two hands-on labs:
Setup
Install dependencies:
sudo apt update
sudo apt -y install python3 python3-dev
sudo apt -y install screen wget
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
sudo pip3 install --upgrade google-api-python-client
sudo pip3 install --upgrade pandas
sudo pip3 install --upgrade tqdm
Download
To start the downloader, run this:
cd download
screen -S edgar
python3 download.py
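Then press Ctrl-a followed by d to detach from the screen session. The download keeps running in the background; reattach later with screen -r edgar.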
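download.py itself is not reproduced here. As a rough illustration of what accessing EDGAR over HTTP can look like, the sketch below builds the URL of one of EDGAR's quarterly form indexes and a request that identifies the caller through the User-Agent header, which the SEC asks automated clients to set. The function names, the contact string, and the "13F" prefix filter are assumptions made for this example, not taken from the scripts in this repo.

```python
import urllib.request
from typing import List

# Base path of EDGAR's quarterly full-index archive on sec.gov.
EDGAR_FULL_INDEX = "https://www.sec.gov/Archives/edgar/full-index"


def form_index_url(year: int, quarter: int) -> str:
    """Return the URL of the form-type index for one calendar quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    return f"{EDGAR_FULL_INDEX}/{year}/QTR{quarter}/form.idx"


def build_request(url: str, contact: str) -> urllib.request.Request:
    """Build a GET request whose User-Agent identifies the caller."""
    return urllib.request.Request(url, headers={"User-Agent": contact})


def fetch_form_lines(year: int, quarter: int, contact: str,
                     form_prefix: str = "13F") -> List[str]:
    """Download one quarterly index and keep lines starting with form_prefix.

    form_prefix is a hypothetical filter chosen for illustration; form.idx
    lists the form type in the first column of each entry.
    """
    request = build_request(form_index_url(year, quarter), contact)
    with urllib.request.urlopen(request, timeout=60) as response:
        text = response.read().decode("latin-1")
    return [line for line in text.splitlines() if line.startswith(form_prefix)]
```

Calling fetch_form_lines(2021, 1, "Your Name you@example.com") would perform a real network request, so keep the request rate low when looping over many quarters.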
Featurize
Once you have a CSV for each date, combine and featurize them. This produces a single CSV.
cd featurize
python3 featurize.py
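featurize.py is not shown here, so the following is only a minimal sketch of the combining half of that step, assuming the per-date CSVs sit in one directory and share the same columns. The function name combine_csvs and the directory layout are hypothetical; any feature engineering the real script performs is not covered.

```python
import glob
import os

import pandas as pd


def combine_csvs(input_dir: str, output_path: str) -> pd.DataFrame:
    """Concatenate every CSV in input_dir and write the result to output_path."""
    # Sort the paths so the row order is deterministic across runs.
    paths = sorted(glob.glob(os.path.join(input_dir, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {input_dir}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers rows instead of repeating each file's index.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, index=False)
    return combined
```

Write the output outside input_dir; otherwise a second run would pick up the combined file as one of its inputs.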
Copy data to bucket
Initialize the gcloud CLI, which sets the active account and default project:
gcloud init
Now copy the data:
gsutil cp train.csv gs://neo4j-datasets/form13/
gsutil cp test.csv gs://neo4j-datasets/form13/
Combine train and test
If you want to combine the train and test datasets into a single file, run this in Python:
import pandas

# Load both splits, stack them row-wise, and write one combined file.
train = pandas.read_csv('train.csv')
test = pandas.read_csv('test.csv')
form13 = pandas.concat([train, test])
form13.to_csv('form13.csv', index=False)
Then copy it to a bucket with the command:
gsutil cp form13.csv gs://neo4j-datasets/form13/