INFO 7374: Product Grade Data Pipelines

Team: Shiqi Dai, Naveen Jami, Satwik Kashyap, Sindhu Raghavendra
Professor: Sri Krishnamurthy
- Web scraping and storing CSV files
- Mapping paragraphs from raw HTML data (tuples: Introduction mappings, Q&A mappings, Conclusion mappings)
- Data Model
- Data Storage
- Pre-processing
- Dockerizing the data scraping and data preprocessing steps
- Sentiment APIs module
- Predictions and Analysis
- Project Documentation Link (Google Doc)
- Codelabs Presentation
- System Overview
- Data Module Sequential Diagram
- Steps to run the application: docker-compose instructions using `docker-compose.yml` (MongoDB installation guide included)
Project documentation (Google Doc): https://docs.google.com/document/d/1rFcNPuP9XiATgN7kJyd_60TYVKW-msw6CV0yMsOO_qc/edit?usp=sharing

Codelabs presentation: https://codelabs-preview.appspot.com/?file_id=1rFcNPuP9XiATgN7kJyd_60TYVKW-msw6CV0yMsOO_qc#0
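The paragraph-mapping step listed above (splitting raw transcript paragraphs into Introduction and Q&A tuples) can be sketched roughly as below. The function name and the Q&A heading text are assumptions for illustration, not the project's actual code:

```python
def map_sections(paragraphs, qa_marker="Questions and Answers"):
    """Split a transcript's paragraph list into an (introduction, q_and_a) pair
    at the Q&A heading; the marker text is an assumed transcript convention."""
    for i, p in enumerate(paragraphs):
        if qa_marker.lower() in p.lower():
            return paragraphs[:i], paragraphs[i + 1:]
    return paragraphs, []  # no Q&A heading found: treat everything as introduction

intro, qa = map_sections([
    "Good morning, and welcome to the Q1 earnings call.",
    "Questions and Answers",
    "Analyst: How did margins evolve this quarter?",
])
print(len(intro), len(qa))  # 1 1
```

A Conclusion mapping could be built the same way, e.g. by also splitting on a closing-remarks heading.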
- Clone this repository:
  `git clone https://github.com/jaminaveen/DataPipelines_Earnings_Calls_Transcripts.git`
- Open the Docker Quickstart Terminal.
- Change directory to the cloned folder:
  `cd "<local path>/DataPipelines_Earnings_Calls_Transcripts"`
- Build the images with `docker-compose build` (use this when running the application for the first time).
- Run the services and the application with `docker-compose up` (note: on the first run, scraping takes about 30 minutes).
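The `docker-compose.yml` referenced above wires the pipeline containers to MongoDB. A minimal sketch of such a file follows; the service names and build context are assumptions, not the project's actual configuration:

```yaml
version: "3"
services:
  mongodb:            # data store for scraped transcripts
    image: mongo
    ports:
      - "27017:27017" # exposed so GUI clients on the host can connect
  pipeline:           # assumed name for the scraping + preprocessing service
    build: .
    depends_on:
      - mongodb
```

With a layout like this, `docker-compose build` rebuilds the `pipeline` image and `docker-compose up` starts both services together.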
- Optional: to inspect the data in a MongoDB GUI, install either MongoDB Compass Community or Robo 3T and use these connection settings: hostname `<docker-machine ip>`, port `27017`. To find the Docker machine IP, run `docker-machine ip`; the IP is also printed when the Docker Quickstart Terminal opens.
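Before pointing a GUI at `<docker-machine ip>:27017`, it can help to confirm the port is actually reachable. A minimal stdlib sketch (the helper name is hypothetical; the local listener merely stands in for the MongoDB container):

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Sanity check against a throwaway local listener (stand-in for the
# MongoDB container at <docker-machine ip>:27017).
server = socket.socket()
server.bind(("127.0.0.1", 0))  # OS picks a free port
server.listen(1)
port = server.getsockname()[1]
print(is_port_open("127.0.0.1", port))  # True
server.close()
```

In practice you would call `is_port_open` with the address printed by `docker-machine ip` and port `27017`.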