solver149's Stars
joyceannie/Reddit_Data_Pipeline
A data pipeline that extracts data from the Reddit API and a dashboard to analyse it. Data is pulled daily from the subreddit r/Python, uploaded to S3 buckets, and copied to Redshift; the dashboard is built with Google Data Studio.
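The daily extract-and-upload step described here can be sketched in plain Python. This is a minimal sketch, not the project's actual code: the flattened fields assume praw-style Reddit submission attributes, and the date-partitioned S3 key layout is hypothetical.

```python
from datetime import date, datetime, timezone

def to_record(post: dict) -> dict:
    """Flatten a Reddit submission (praw-style fields, assumed) into a row."""
    return {
        "id": post["id"],
        "title": post["title"],
        "score": post["score"],
        "num_comments": post["num_comments"],
        "created_utc": datetime.fromtimestamp(
            post["created_utc"], tz=timezone.utc
        ).isoformat(),
    }

def s3_key(run_date: date, subreddit: str = "Python") -> str:
    """Daily date-partitioned key layout (hypothetical naming)."""
    return f"reddit/{subreddit}/{run_date:%Y/%m/%d}/posts.json"
```

In the real pipeline the records would be serialised and pushed with an S3 client (e.g. boto3), and the Redshift COPY would read from the same key prefix.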
anirbanroydas/ci-testing-python
Sample microservice app in Python, tested with pytest, uber/doubles, and tox on CI servers such as Jenkins and Travis CI, using Docker + Docker Compose for the test environment.
judeleonard/Prescriber-ETL-data-pipeline
An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows from a SaaS application, orchestrated with Apache Airflow across various data warehouse technologies. Apache Superset connects to the DWH to generate BI dashboards for weekly reports.
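The weekly aggregation at the heart of such a pipeline can be sketched with a plain-Python stand-in (in PySpark the same shape is `df.groupBy(...).agg(...)`). The field names here are illustrative, not taken from the repo.

```python
from collections import defaultdict

def weekly_totals(rows):
    """Aggregate claim counts and drug cost per prescriber.
    rows: iterable of dicts with assumed field names."""
    totals = defaultdict(lambda: {"claims": 0, "cost": 0.0})
    for row in rows:
        agg = totals[row["prescriber_id"]]
        agg["claims"] += row["claim_count"]
        agg["cost"] += row["drug_cost"]
    return dict(totals)
```

PySpark distributes exactly this kind of keyed aggregation across partitions, which is what makes 25 million rows tractable.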
moj-analytical-services/etl-pipeline-example
An example of an ETL pipeline that lays out generic DE processes. This is now out of date but still provides useful information
NYPL/drb-etl-pipeline
Application for loading records from external sources into the DRB collection and providing access via API
gadsbytom/docker_sentiment_etl
Dockerized ETL pipeline, applying sentiment analysis to tweets, and storing results in Mongo & SQL databases.
aschleg/pethub
Scripts and ETL for building a pet- and animal-related database.
ekampf/PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
monte-carlo-data/data-observability-in-practice
Source code for the MC technical blog post "Data Observability in Practice Using SQL"
monte-carlo-data/data-downtime-challenge
uhussain/WebCrawlerForOnlineInflation
Price Crawler - Tracking Price Inflation
shafiab/HashtagCashtag
My Insight Data Engineering Fellowship project. A big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis, built with open-source tools: Apache Kafka for data ingestion, Apache Spark and Spark Streaming for batch and real-time processing, Apache Cassandra for storage, and Flask, Bootstrap, and Highcharts for the frontend.
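The defining step of a lambda architecture is the serving layer merging the precomputed batch view with the speed layer's recent counts. A minimal sketch of that merge (mention counts per symbol are a made-up example):

```python
def merged_view(batch_view: dict, realtime_view: dict) -> dict:
    """Serving-layer merge: batch-layer counts plus speed-layer counts
    for events that arrived after the last batch run."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```

In the project's terms, the batch view would come from Spark jobs over the Kafka-ingested history in Cassandra, and the realtime view from Spark Streaming.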
Tanay0510/Data-Modeling-with-Apache-Cassandra
Data modeling with Apache Cassandra and an ETL pipeline built with Python. The pipeline transfers data from a set of CSV files within a directory into a single streamlined CSV file, then models and inserts the data into Apache Cassandra tables.
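The consolidation step can be sketched with the standard-library `csv` module. This is a hedged sketch: the sources are shown as in-memory strings for self-containment (files in practice), and the column selection mimics keeping only the fields the Cassandra tables need.

```python
import csv
import io

def consolidate(csv_texts, fieldnames):
    """Merge rows from several CSV sources into one streamlined CSV
    containing only the requested columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            writer.writerow({k: row.get(k, "") for k in fieldnames})
    return out.getvalue()
```

The actual inserts into Cassandra would then be done row by row with the DataStax cassandra-driver, one table per query pattern, as Cassandra data modeling requires.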
Tanay0510/Data-Modeling-Postgres
Data modeling with Postgres and an ETL pipeline that transfers data from files in two local directories into Postgres tables using Python and SQL.
Tanay0510/Cloud-Data-Warehouse
Built an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables: data is loaded from S3 into staging tables on Redshift, then SQL statements create the analytics tables from those staging tables.
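The two-phase staging pattern can be sketched as SQL templates held in Python. The table names, bucket path, and columns below are hypothetical; the COPY options shown (IAM_ROLE, FORMAT AS JSON) are standard Redshift syntax.

```python
# Phase 1: bulk-load raw S3 data into a staging table (names illustrative).
COPY_STAGING = """
COPY staging_events
FROM 's3://my-bucket/log_data'
IAM_ROLE '{role_arn}'
FORMAT AS JSON 'auto';
"""

# Phase 2: build an analytics table from the staged rows.
INSERT_DIM = """
INSERT INTO dim_users (user_id, first_name, last_name)
SELECT DISTINCT user_id, first_name, last_name
FROM staging_events
WHERE user_id IS NOT NULL;
"""
```

Staging first keeps the load fast (COPY is Redshift's parallel bulk path) and lets the transform step be plain SQL run inside the warehouse.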
Tanay0510/Data-Pipeline-with-Airflow
Built data pipelines with Airflow. Created custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step.
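The data-quality-check step is usually a custom operator whose `execute()` runs a list of SQL checks against the warehouse. A sketch of just that core logic, kept framework-free so it is self-contained (in the real project it would live inside a subclass of Airflow's `BaseOperator`, and `query_runner` would be a hook; the signature here is an assumption):

```python
def run_checks(checks, query_runner):
    """checks: list of (sql, expected) pairs; query_runner executes a SQL
    statement and returns a scalar. Raises on the first failed check."""
    for sql, expected in checks:
        result = query_runner(sql)
        if result != expected:
            raise ValueError(
                f"Data quality check failed: {sql!r} returned {result}, "
                f"expected {expected}"
            )
```

Raising makes the Airflow task fail, which is exactly how a final-step check should surface bad loads.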
Tanay0510/Data-Lake-with-Spark
Load data from S3, process it into analytics tables using Spark, and load the results back into S3. Deployed this Spark process on a cluster using AWS EMR.
kpmooney/numerical_methods_youtube
mingbocui/Data-Engineer-Project
Data science project to get familiar with Airflow, Postgres, Cassandra, and PySpark.