solver149's Stars
joyceannie/Reddit_Data_Pipeline
A data pipeline that extracts data from the Reddit API and a dashboard to analyse it. Data is pulled daily from the subreddit r/Python, uploaded to S3 buckets, and copied to Redshift; the dashboard is built with Google Data Studio.
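The daily extract-and-upload step described here can be sketched in plain Python. This is a minimal sketch, not the project's actual code: the flattened fields assume praw-style Reddit submission attributes, and the date-partitioned S3 key layout is hypothetical.

```python
from datetime import date, datetime, timezone

def to_record(post: dict) -> dict:
    """Flatten a Reddit submission (praw-style fields, assumed) into a row."""
    return {
        "id": post["id"],
        "title": post["title"],
        "score": post["score"],
        "num_comments": post["num_comments"],
        "created_utc": datetime.fromtimestamp(
            post["created_utc"], tz=timezone.utc
        ).isoformat(),
    }

def s3_key(run_date: date, subreddit: str = "Python") -> str:
    """Daily date-partitioned key layout (hypothetical naming)."""
    return f"reddit/{subreddit}/{run_date:%Y/%m/%d}/posts.json"
```

In the real pipeline the records would be serialised and pushed with an S3 client (e.g. boto3), and the Redshift COPY would read from the same key prefix.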
anirbanroydas/ci-testing-python
Sample microservice app in Python, tested with pytest, uber/doubles, and tox on CI servers such as Jenkins and Travis CI, using Docker + Docker Compose for the test environment.
judeleonard/Prescriber-ETL-data-pipeline
An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows from a SaaS application, orchestrated with Apache Airflow across various data warehouse technologies. Apache Superset connects to the DWH to generate BI dashboards for weekly reports.
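The weekly aggregation at the heart of such a pipeline can be sketched with a plain-Python stand-in (in PySpark the same shape is `df.groupBy(...).agg(...)`). The field names here are illustrative, not taken from the repo.

```python
from collections import defaultdict

def weekly_totals(rows):
    """Aggregate claim counts and drug cost per prescriber.
    rows: iterable of dicts with assumed field names."""
    totals = defaultdict(lambda: {"claims": 0, "cost": 0.0})
    for row in rows:
        agg = totals[row["prescriber_id"]]
        agg["claims"] += row["claim_count"]
        agg["cost"] += row["drug_cost"]
    return dict(totals)
```

PySpark distributes exactly this kind of keyed aggregation across partitions, which is what makes 25 million rows tractable.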
moj-analytical-services/etl-pipeline-example
An example of an ETL pipeline that lays out generic DE processes. This is now out of date but still provides useful information
NYPL/drb-etl-pipeline
Application for loading records from external sources into the DRB collection and providing access via API
gadsbytom/docker_sentiment_etl
Dockerized ETL pipeline, applying sentiment analysis to tweets, and storing results in Mongo & SQL databases.
aschleg/pethub
Scripts and ETL for building a pet- and animal-related database.
ekampf/PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
monte-carlo-data/data-observability-in-practice
Source code for the MC technical blog post "Data Observability in Practice Using SQL"
monte-carlo-data/data-downtime-challenge
uhussain/WebCrawlerForOnlineInflation
Price Crawler - Tracking Price Inflation
shafiab/HashtagCashtag
My Insight Data Engineering Fellowship project. A big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis, built with open-source tools: Apache Kafka for data ingestion, Apache Spark and Spark Streaming for batch and real-time processing, Apache Cassandra for storage, and Flask, Bootstrap, and Highcharts for the frontend.
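The defining step of a lambda architecture is the serving layer merging the precomputed batch view with the speed layer's recent counts. A minimal sketch of that merge (mention counts per symbol are a made-up example):

```python
def merged_view(batch_view: dict, realtime_view: dict) -> dict:
    """Serving-layer merge: batch-layer counts plus speed-layer counts
    for events that arrived after the last batch run."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```

In the project's terms, the batch view would come from Spark jobs over the Kafka-ingested history in Cassandra, and the realtime view from Spark Streaming.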
Tanay0510/Data-Modeling-with-Apache-Cassandra
Data modeling with Apache Cassandra and an ETL pipeline built with Python. The pipeline transfers data from a set of CSV files within a directory into a single streamlined CSV file, then models and inserts the data into Apache Cassandra tables.
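The consolidation step can be sketched with the standard-library `csv` module. This is a hedged sketch: the sources are shown as in-memory strings for self-containment (files in practice), and the column selection mimics keeping only the fields the Cassandra tables need.

```python
import csv
import io

def consolidate(csv_texts, fieldnames):
    """Merge rows from several CSV sources into one streamlined CSV
    containing only the requested columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            writer.writerow({k: row.get(k, "") for k in fieldnames})
    return out.getvalue()
```

The actual inserts into Cassandra would then be done row by row with the DataStax cassandra-driver, one table per query pattern, as Cassandra data modeling requires.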
Tanay0510/Data-Modeling-Postgres
Data modeling with Postgres and an ETL pipeline that transfers data from files in two local directories into Postgres tables using Python and SQL.
Tanay0510/Cloud-Data-Warehouse
Built an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables: data is loaded from S3 into staging tables on Redshift, then SQL statements create the analytics tables from those staging tables.
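The two-phase staging pattern can be sketched as SQL templates held in Python. The table names, bucket path, and columns below are hypothetical; the COPY options shown (IAM_ROLE, FORMAT AS JSON) are standard Redshift syntax.

```python
# Phase 1: bulk-load raw S3 data into a staging table (names illustrative).
COPY_STAGING = """
COPY staging_events
FROM 's3://my-bucket/log_data'
IAM_ROLE '{role_arn}'
FORMAT AS JSON 'auto';
"""

# Phase 2: build an analytics table from the staged rows.
INSERT_DIM = """
INSERT INTO dim_users (user_id, first_name, last_name)
SELECT DISTINCT user_id, first_name, last_name
FROM staging_events
WHERE user_id IS NOT NULL;
"""
```

Staging first keeps the load fast (COPY is Redshift's parallel bulk path) and lets the transform step be plain SQL run inside the warehouse.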
Tanay0510/Data-Pipeline-with-Airflow
Built data pipelines with Airflow. Created custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step.
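The data-quality-check step is usually a custom operator whose `execute()` runs a list of SQL checks against the warehouse. A sketch of just that core logic, kept framework-free so it is self-contained (in the real project it would live inside a subclass of Airflow's `BaseOperator`, and `query_runner` would be a hook; the signature here is an assumption):

```python
def run_checks(checks, query_runner):
    """checks: list of (sql, expected) pairs; query_runner executes a SQL
    statement and returns a scalar. Raises on the first failed check."""
    for sql, expected in checks:
        result = query_runner(sql)
        if result != expected:
            raise ValueError(
                f"Data quality check failed: {sql!r} returned {result}, "
                f"expected {expected}"
            )
```

Raising makes the Airflow task fail, which is exactly how a final-step check should surface bad loads.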
Tanay0510/Data-Lake-with-Spark
Load data from S3, process it into analytics tables using Spark, and load the results back into S3. Deployed this Spark process on a cluster using AWS EMR.
kpmooney/numerical_methods_youtube
mingbocui/Data-Engineer-Project
Data science project to get familiar with Airflow, Postgres, Cassandra, and PySpark.