INFO 7374: Product Grade Data Pipelines

Team: Shiqi Dai, Naveen Jami, Satwik Kashyap, Sindhu Raghavendra
Professor: Sri Krishnamurthy
- Web scraping and storing CSV files
- Mapping paragraphs from raw HTML data (tuples: Introduction mappings, Q&A mappings, Conclusion mappings)
- Data Model
- Data Storage
- Pre-processing
- Dockerizing the data scraping and data preprocessing steps
- Sentiment APIs module
- Predictions and Analysis
- Project Documentation Link (Google Doc)
- Codelabs Presentation
- System Overview
- Data Module Sequential Diagram
- Steps to run the application: docker-compose instructions using `docker-compose.yml` (MongoDB installation guide included)
Project documentation (Google Doc): https://docs.google.com/document/d/1rFcNPuP9XiATgN7kJyd_60TYVKW-msw6CV0yMsOO_qc/edit?usp=sharing

Codelabs presentation: https://codelabs-preview.appspot.com/?file_id=1rFcNPuP9XiATgN7kJyd_60TYVKW-msw6CV0yMsOO_qc#0
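The paragraph-mapping step listed above (splitting raw transcript paragraphs into Introduction and Q&A tuples) can be sketched roughly as below. The function name and the Q&A heading text are assumptions for illustration, not the project's actual code:

```python
def map_sections(paragraphs, qa_marker="Questions and Answers"):
    """Split a transcript's paragraph list into an (introduction, q_and_a) pair
    at the Q&A heading; the marker text is an assumed transcript convention."""
    for i, p in enumerate(paragraphs):
        if qa_marker.lower() in p.lower():
            return paragraphs[:i], paragraphs[i + 1:]
    return paragraphs, []  # no Q&A heading found: treat everything as introduction

intro, qa = map_sections([
    "Good morning, and welcome to the Q1 earnings call.",
    "Questions and Answers",
    "Analyst: How did margins evolve this quarter?",
])
print(len(intro), len(qa))  # 1 1
```

A Conclusion mapping could be built the same way, e.g. by also splitting on a closing-remarks heading.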
- Clone this repository:
  `git clone https://github.com/jaminaveen/DataPipelines_Earnings_Calls_Transcripts.git`
- Open the Docker Quickstart Terminal.
- Change directory to the cloned folder:
  `cd "<local path>/DataPipelines_Earnings_Calls_Transcripts"`
- Build the images with `docker-compose build` (use this when running the application for the first time).
- Run the services and the application with `docker-compose up` (note: on the first run, scraping takes about 30 minutes).
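The `docker-compose.yml` referenced above wires the pipeline containers to MongoDB. A minimal sketch of such a file follows; the service names and build context are assumptions, not the project's actual configuration:

```yaml
version: "3"
services:
  mongodb:            # data store for scraped transcripts
    image: mongo
    ports:
      - "27017:27017" # exposed so GUI clients on the host can connect
  pipeline:           # assumed name for the scraping + preprocessing service
    build: .
    depends_on:
      - mongodb
```

With a layout like this, `docker-compose build` rebuilds the `pipeline` image and `docker-compose up` starts both services together.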
- Optional: to inspect the data in a MongoDB GUI, install either MongoDB Compass Community or Robo 3T and use these connection settings: hostname `<docker-machine ip>`, port `27017`. To find the Docker machine IP, run `docker-machine ip`; the IP is also printed when the Docker Quickstart Terminal opens.
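Before pointing a GUI at `<docker-machine ip>:27017`, it can help to confirm the port is actually reachable. A minimal stdlib sketch (the helper name is hypothetical; the local listener merely stands in for the MongoDB container):

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Sanity check against a throwaway local listener (stand-in for the
# MongoDB container at <docker-machine ip>:27017).
server = socket.socket()
server.bind(("127.0.0.1", 0))  # OS picks a free port
server.listen(1)
port = server.getsockname()[1]
print(is_port_open("127.0.0.1", port))  # True
server.close()
```

In practice you would call `is_port_open` with the address printed by `docker-machine ip` and port `27017`.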