DataPipelines_Earnings_Calls_Transcripts

Docker data pipeline for data collection, transformation, and storage in MongoDB

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0)


INFO 7374 Product Grade Data Pipelines
Shiqi Dai
Naveen Jami
Satwik Kashyap
Sindhu Raghavendra
Professor: Sri Krishnamurthy

Progress

  • Web scraping and storing the results as CSV
  • Mapping paragraphs from raw HTML data (tuples: introduction mappings, Q&A mappings, conclusion mappings)
  • Data model
  • Data storage
  • Pre-processing
  • Dockerizing the data scraping and data pre-processing steps
  • Sentiment APIs module
  • Predictions and Analysis
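The paragraph-mapping and CSV steps listed above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a transcript has already been scraped into a list of paragraph strings, and the marker phrases `QA_MARKER` and `CLOSING_MARKER`, as well as the function names, are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

# Hypothetical marker phrases; real transcripts may use different wording.
QA_MARKER = "question-and-answer session"
CLOSING_MARKER = "this concludes today's conference"

Mapping = Tuple[str, int, str]  # (section, paragraph index, text)


def map_paragraphs(paragraphs: Iterable[str]) -> List[Mapping]:
    """Label each paragraph as introduction, q_and_a, or conclusion."""
    section = "introduction"
    mappings: List[Mapping] = []
    for index, text in enumerate(paragraphs):
        lowered = text.lower()
        # Switch sections when a marker phrase appears; the paragraph that
        # contains the marker is assigned to the section it opens.
        if QA_MARKER in lowered:
            section = "q_and_a"
        elif CLOSING_MARKER in lowered:
            section = "conclusion"
        mappings.append((section, index, text.strip()))
    return mappings


def write_mappings_csv(mappings: List[Mapping], path: str) -> None:
    """Store the (section, index, text) tuples as a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["section", "paragraph_index", "text"])
        writer.writerows(mappings)
```

Tagging every paragraph with its section keeps the three groups (introduction, Q&A, conclusion) in one flat list that maps directly onto CSV rows; filtering by the first tuple element recovers each group.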

Contents

  • Project Documentation Link (Google Doc)
  • Codelabs Presentation
  • System Overview
  • Data Module Sequential Diagram
  • Steps to run the application: docker-compose instructions using docker-compose.yml

Project Documentation Link (Google Doc):

Includes a MongoDB installation guide.

https://docs.google.com/document/d/1rFcNPuP9XiATgN7kJyd_60TYVKW-msw6CV0yMsOO_qc/edit?usp=sharing

Codelabs Presentation:

https://codelabs-preview.appspot.com/?file_id=1rFcNPuP9XiATgN7kJyd_60TYVKW-msw6CV0yMsOO_qc#0

System Overview

Data Module Sequential Diagram

Steps to run the application: docker-compose instructions using docker-compose.yml

  1. Clone this repository

    git clone https://github.com/jaminaveen/DataPipelines_Earnings_Calls_Transcripts.git
    
  2. Open the Docker Quickstart Terminal

  3. Change into the cloned repository folder

    cd "<local path>/DataPipelines_Earnings_Calls_Transcripts"
    
  4. Build the images with Docker Compose (required the first time you run the application)

    docker-compose build
    
  5. Start the services and application (note: on the first run, scraping takes about 30 minutes)

    docker-compose up
    
  6. Optional: to inspect the data in a MongoDB GUI, install either MongoDB Compass Community or Robo 3T and use these connection settings:

    hostname - <docker-machine ip>
    port - 27017
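As an alternative to a GUI, a short Python script can confirm that MongoDB is reachable with the same host and port. This is a sketch, not part of the repository: it assumes the third-party `pymongo` driver is installed (`pip install pymongo`), and the helper names are hypothetical.

```python
from typing import List


def mongo_uri(host: str, port: int = 27017) -> str:
    """Build a MongoDB connection string from the docker-machine IP and port."""
    return f"mongodb://{host}:{port}/"


def list_databases(host: str, port: int = 27017, timeout_ms: int = 3000) -> List[str]:
    """Ping the MongoDB server and return the names of its databases.

    Requires pymongo; it is imported here so mongo_uri works without it.
    """
    from pymongo import MongoClient

    client = MongoClient(mongo_uri(host, port), serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")  # raises if the server cannot be reached
        return client.list_database_names()
    finally:
        client.close()
```

For example, `list_databases("192.168.99.100")` (substitute the address printed by `docker-machine ip`) should list the pipeline's database once scraping has written data; the short `serverSelectionTimeoutMS` makes an unreachable host fail quickly instead of hanging.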
    

Two ways to find the Docker machine IP:

  1. Use the command

    docker-machine ip
    
  2. The Docker machine IP is also shown when the Docker Quickstart Terminal first starts.
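The first method can also be scripted, for example to feed the address into the MongoDB connection settings. A minimal sketch, assuming the legacy `docker-machine` CLI is on the PATH; the function names are hypothetical, and the optional `machine` argument is simply passed through to `docker-machine ip`.

```python
import ipaddress
import subprocess
from typing import Optional


def parse_machine_ip(output: str) -> str:
    """Validate and return the IP address printed by `docker-machine ip`."""
    candidate = output.strip()
    ipaddress.ip_address(candidate)  # raises ValueError if this is not an IP address
    return candidate


def docker_machine_ip(machine: Optional[str] = None) -> str:
    """Run `docker-machine ip [machine]` and return the address it prints."""
    command = ["docker-machine", "ip"]
    if machine:
        command.append(machine)
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_machine_ip(result.stdout)
```

Separating parsing from the subprocess call keeps the validation testable without Docker installed.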