SEC Filings analyzer
Install Node.js, which includes the Node Package Manager (npm).
- Clone this repository to your local machine
git clone https://github.com/ashutoshc8101/sec-fillings-frontend.git
- Install dependencies using npm
npm install
- Run the frontend locally using
npx ng serve
Python >= 3.7 is recommended. Python 2 is not supported.
Installation
git clone https://github.com/ParwaanVirk/SEC-filings-backend.git
cd SEC-filings-backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
Scraping:
cd scraping
python scrape.py
Run backend server:
python manage.py runserver
Feeding companies to backend: This process is manual for now, but it is planned to be automated.
- Run the Django backend server using
python manage.py runserver
- Visit http://localhost:8000/admin/ and log in using the superuser credentials
- Once logged in, add companies with their CIK numbers and ticker symbols.
Seeding: Note that feeding companies to the backend is a prerequisite for this step.
The database should be seeded from the CSV files before the application is used.
Manual:
This can be done by sending a GET request to the route /company/seeder/.
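As a minimal sketch of the manual step, the request can be sent from Python's standard library. The base URL assumes the default `runserver` address used elsewhere in this README; `seeder_url` and `trigger_seeding` are hypothetical helper names, and the timeout is an arbitrary choice.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default runserver address
SEEDER_ROUTE = "/company/seeder/"   # route named in this README


def seeder_url(base_url: str = BASE_URL) -> str:
    """Build the full seeder URL from a base URL."""
    return base_url.rstrip("/") + SEEDER_ROUTE


def trigger_seeding(base_url: str = BASE_URL, timeout: float = 60.0) -> int:
    """Send the GET request and return the HTTP status code."""
    with urllib.request.urlopen(seeder_url(base_url), timeout=timeout) as response:
        return response.status


# Usage (requires the backend server to be running):
# status = trigger_seeding()
```

Opening the same URL in a browser while the server is running should have the same effect.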
CRON Jobs: A cron job is set up to seed the backend database from the scraped CSV filings.
Instructions for using cron jobs:
python manage.py crontab add
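The `crontab add` management command matches the one provided by the django-crontab package. Assuming that is the package in use here, jobs are declared in the project's settings through a `CRONJOBS` list, roughly as sketched below. The schedule and the dotted path `company.cron.seed_database` are hypothetical placeholders, not taken from this repository.

```python
# Fragment for settings.py, assuming the django-crontab package is used.
# "django_crontab" must also be listed in INSTALLED_APPS.
CRONJOBS = [
    # (cron schedule, dotted path to the function to run)
    # Hypothetical example: run a seeding function every day at 02:00.
    ("0 2 * * *", "company.cron.seed_database"),
]
```

With django-crontab, `python manage.py crontab show` lists the registered jobs and `python manage.py crontab remove` unregisters them.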
The SEC’s EDGAR database contains terabytes of documents and data, including press releases, annual corporate filings, executive employment agreements, and investment company holdings. While EDGAR has existed for over twenty years, scholars have had difficulty conducting or reproducing research based on EDGAR data. Researchers often spend a lot of time and money developing and redeveloping code to retrieve and parse EDGAR data with no common bottom-up framework.
- Metrics of SaaS companies can be viewed on an interactive web dashboard.
- Rating of a company in terms of Profitability, Investability and Growth.
- Most viewed SaaS companies are available on the search page for easier access.
- The source of metrics in our app is EDGAR. The EDGAR API is used to scrape metrics.
- The scraped forms (10-K, 10-Q, 8-K) are stored as CSV files.
- A scheduled cron job reads these scraped CSV files, obtains the necessary metrics and seeds them into the backend database.
- The Django backend reads the database and provides the necessary data to the frontend. It also powers the user authentication, search and favourites functionalities.
- The frontend, written in Angular, provides a fluid dashboard for easy viewing and comparison of SaaS metrics.
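The step that reads the scraped CSV files and picks out the needed metrics can be sketched as follows. This is only an illustration: the column names (`cik`, `form`, `metric`, `value`), the metric names and the sample rows are made-up placeholders, and the real scraped files may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout with placeholder values, for illustration only.
SAMPLE_CSV = """cik,form,metric,value
0000000001,10-K,Revenues,1000
0000000001,10-K,NetIncomeLoss,150
0000000001,10-Q,Revenues,300
0000000002,10-K,Revenues,500
0000000002,8-K,Unrelated,7
"""

WANTED_METRICS = {"Revenues", "NetIncomeLoss"}  # assumed metric names


def extract_metrics(csv_file, wanted=WANTED_METRICS):
    """Group the wanted metric values by (cik, form)."""
    records = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        if row["metric"] in wanted:
            key = (row["cik"], row["form"])
            records[key][row["metric"]] = float(row["value"])
    return dict(records)


metrics = extract_metrics(io.StringIO(SAMPLE_CSV))
for key, values in sorted(metrics.items()):
    print(key, values)
```

In the application, each extracted record would then be written to the backend database; that part depends on the project's models and is not shown here.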
Since we had a small training dataset, we used simple machine learning regression models to fit our data. Experiments were performed with three different models: Ridge Regression, SVM Regressor and Lasso Regression. Among these, the Lasso model, which uses L1 regularization, yielded the best generalization on the validation dataset. Two separate models were deployed, with each category treated as a separate label in its own dataset:
- Growth estimation model.
- Profitability estimation model.
The lasso procedure encourages simple and sparse models (i.e., models with fewer non-zero parameters). It also helps reduce overfitting of the model to the dataset, which needed particular attention in this case given the small training set. Thus the lasso model was chosen, and its results are displayed on the dashboard.
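To illustrate why L1 regularization produces sparse models, here is a small pure-Python sketch of lasso fitted by coordinate descent with soft-thresholding. It is not the project's training code (the library used is not stated here); the function names, the toy dataset and the `alpha` values are assumptions chosen for illustration, and the sketch fits no intercept.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iters):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for row, target in zip(X, y):
                prediction = sum(w * x for w, x in zip(weights, row))
                partial_residual = target - prediction + weights[j] * row[j]
                rho += row[j] * partial_residual
                z += row[j] ** 2
            rho /= n_samples
            z /= n_samples
            weights[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return weights


# Synthetic toy data: the target depends almost entirely on the first feature.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -2.0, -2.0]

dense = lasso_coordinate_descent(X, y, alpha=0.0)   # no penalty: least squares
sparse = lasso_coordinate_descent(X, y, alpha=0.1)  # L1 penalty
print("alpha=0.0:", dense)   # roughly [2.0, 0.05, 0.05]
print("alpha=0.1:", sparse)  # roughly [1.9, 0.0, 0.0]
```

Without the penalty, the two irrelevant features keep small non-zero weights that fit noise. With the penalty, they are set exactly to zero, which is the sparsity effect described above.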