SEC Filings Analyzer


Prerequisites

Install Node.js, which includes the Node Package Manager (npm).

Installation

Frontend

Clone the frontend repository to your local machine

git clone https://github.com/ashutoshc8101/sec-fillings-frontend.git
cd sec-fillings-frontend

Install dependencies using npm

npm install

Run the frontend locally using

npx ng serve

Backend

Python 3.7 or later is recommended. Python 2 is not supported.

Installation

git clone https://github.com/ParwaanVirk/SEC-filings-backend.git
cd SEC-filings-backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser

Scraping:

cd scraping
python scrape.py

Run backend server:

python manage.py runserver

Feeding companies to backend: This process is manual for now; automating it is planned.

Run the Django backend server using python manage.py runserver
Visit http://localhost:8000/admin/ and log in using the superuser credentials.
Once logged in, add companies with their CIK and ticker symbols.
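When collecting CIKs to enter, it can help to normalize them first. EDGAR's data.sec.gov endpoints identify a company by its CIK zero-padded to 10 digits; whether this backend expects the padded or unpadded form is not stated here, so treat the helper below as a sketch under that assumption. The function names are hypothetical and are not part of the backend.

```python
def normalize_cik(cik):
    """Return a CIK as a 10-digit, zero-padded string (the form EDGAR URLs use)."""
    digits = str(cik).strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def submissions_url(cik):
    """Build the EDGAR submissions URL for a company (hypothetical helper)."""
    return f"https://data.sec.gov/submissions/CIK{normalize_cik(cik)}.json"


# 320193 is Apple Inc.'s CIK, used here only as a sample value.
print(normalize_cik(320193))
print(submissions_url("320193"))
```

Rejecting non-numeric input early avoids storing a value in the admin panel that later fails to match any filing.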

Seeding: Note that feeding companies to the backend (see above) is a prerequisite for this step.

The database must be seeded from the CSV files before the application is used. Manual: send a GET request to the route /company/seeder/ on the running backend.
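A minimal sketch of triggering the seeder from Python, using only the standard library. It assumes the backend is running at http://localhost:8000 (the address used for the admin panel above) and that the route needs no authentication; neither is confirmed here. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://localhost:8000"  # assumption: default runserver address
SEEDER_PATH = "/company/seeder/"    # route named in this README


def seeder_url(base_url=BASE_URL):
    """Return the full URL of the seeder route."""
    return urljoin(base_url, SEEDER_PATH)


def trigger_seeding(base_url=BASE_URL, timeout=60):
    """Send a GET request to the seeder route and return the HTTP status code."""
    with urllib.request.urlopen(seeder_url(base_url), timeout=timeout) as response:
        return response.status


print(seeder_url())
# With the backend server running, call:
#     trigger_seeding()
```

The generous timeout is a guess: seeding reads every scraped CSV file, so the request may take a while to return.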

CRON Jobs: A cron job seeds the backend database from the scraped CSV filings.

To register the cron jobs, run

python manage.py crontab add
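The crontab add management command is the one provided by the django-crontab package. Assuming that is what the backend uses, jobs are declared in settings.py roughly as sketched below. The schedule and the dotted path company.cron.seed_from_csv are hypothetical placeholders, not values taken from the backend repository.

```python
# settings.py (fragment) -- illustrative values only

INSTALLED_APPS = [
    # ... the project's existing apps ...
    "django_crontab",  # registers the `crontab` management command
]

# Each entry is (cron schedule, dotted path to a callable).
CRONJOBS = [
    # Hypothetical: run the CSV seeder every day at 02:00.
    ("0 2 * * *", "company.cron.seed_from_csv"),
]
```

With django-crontab, python manage.py crontab show lists the registered jobs and python manage.py crontab remove unregisters them.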

Problem Statement

The SEC’s EDGAR database contains terabytes of documents and data, including press releases, annual corporate filings, executive employment agreements, and investment company holdings. While EDGAR has existed for over twenty years, scholars have had difficulty conducting or reproducing research based on EDGAR data. Researchers often spend a lot of time and money developing and redeveloping code to retrieve and parse EDGAR data with no common bottom-up framework.

Functionalities

Metrics of SaaS companies can be viewed on an interactive web dashboard.
Side-by-side comparison of two companies.
Rating of a company in terms of Profitability, Investability and Growth.
The most-viewed SaaS companies are listed on the search page for easier access.
SaaS companies can be marked as favourites.

Architecture Overview:

The source of metrics in our app is EDGAR. The EDGAR API is used to scrape the metrics.
The scraped forms (10-K, 10-Q, 8-K) are stored as CSV files.
A scheduled cron job reads these scraped CSV files, extracts the necessary metrics and seeds them into the backend database.
The Django backend reads the database and provides the necessary data to the frontend. It also powers the user authentication, search and favourites functionality.
The frontend, written in Angular, provides a fluid dashboard for easy viewing and comparison of SaaS metrics.
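The CSV-to-database step of this pipeline can be sketched as follows. This is an illustration only: the column names (ticker, form, period, metric, value), the sample rows and the in-memory dictionary standing in for the database are all assumptions, not the backend's actual schema.

```python
import csv
import io

# Hypothetical layout of a scraped CSV file.
SAMPLE_CSV = """ticker,form,period,metric,value
ACME,10-K,2021-12-31,Revenues,1200.5
ACME,10-Q,2022-03-31,Revenues,340.0
ACME,10-K,2021-12-31,NetIncomeLoss,-50.25
ACME,10-Q,2022-03-31,NetIncomeLoss,n/a
"""


def read_metrics(csv_file):
    """Yield (ticker, metric, period, value) tuples, skipping malformed rows."""
    for row in csv.DictReader(csv_file):
        try:
            yield row["ticker"], row["metric"], row["period"], float(row["value"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or non-numeric value


def seed(store, records):
    """Upsert records into a nested dict: store[ticker][metric][period] = value."""
    for ticker, metric, period, value in records:
        store.setdefault(ticker, {}).setdefault(metric, {})[period] = value
    return store


store = seed({}, read_metrics(io.StringIO(SAMPLE_CSV)))
print(store["ACME"]["Revenues"])
```

Keying each value by company, metric and period makes the step idempotent: re-running the job over the same files overwrites existing entries instead of duplicating them.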

ML Model Used:

Because the training dataset was small, we used simple machine learning regression models to fit the data. We experimented with three models: Ridge Regression, an SVM Regressor and Lasso Regression. Of these, the Lasso model, which uses L1 regularization, generalized best on the validation dataset. Two separate models were deployed, with each rating category treated as its own label in its own dataset:

Growth estimation model.
Profitability estimation model.

The lasso procedure encourages simple, sparse models (i.e., models with fewer non-zero parameters). It also helps reduce overfitting, which needed particular attention here given the small dataset. For these reasons the Lasso model was chosen, and its results are displayed on the dashboard.
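To illustrate why L1 regularization yields sparse models, here is a minimal pure-Python sketch of Lasso fitted by coordinate descent. It is not the project's training code. It assumes no intercept term and the objective (1/(2n))·||y − Xw||² + alpha·||w||₁, and the toy data is made up for the example.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 one coordinate at a time."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Toy data: y = 2.0 * x1 + 0.1 * x2, with orthogonal feature columns.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]

w_plain = lasso_coordinate_descent(X, y, alpha=0.0)
w_lasso = lasso_coordinate_descent(X, y, alpha=0.5)
print([round(v, 6) for v in w_plain])  # [2.0, 0.1]
print([round(v, 6) for v in w_lasso])  # [1.5, 0.0]
```

With alpha set to 0.5 the weak second feature is dropped entirely (its weight is exactly zero) and the strong feature's weight is shrunk from 2.0 to 1.5, which is the sparsity-plus-shrinkage behaviour described above.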