khaledshabasy

Computer Science Graduate and Data Engineering Enthusiast

Cairo, Egypt

Pinned Repositories

course-collaboration-travel-plans
Language: CSS
Data-Lake-Spark-EMR
[Sparkify] An ETL pipeline that extracts data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables, so the analytics team can continue finding insights into what songs their users are listening to.
Language: Python
Data-Modeling-Cassandra
[Sparkify] A non-relational database schema and ETL pipeline for data that resides in a directory of CSV logs on user activity for a music app, as well as metadata on the songs in the app.
Language: Jupyter Notebook
Data-Modeling-Postgres
[Sparkify] A database schema and ETL pipeline for data that resides in a directory of JSON logs on user activity for a music app, as well as a directory of JSON metadata on the songs in the app.
Language: Jupyter Notebook
Data-Modeling-Spark-udacity-capstone
An ETL pipeline that combines the I94 immigration, global land temperatures, and US demographics datasets into an analytics database on immigration events. The data model is built with pandas and PySpark to find patterns of immigration to the United States.
Language: Jupyter Notebook
Data-Pipelines-with-Airflow
[Sparkify] High-grade data pipelines that are dynamic, built from reusable tasks, can be monitored, and allow easy backfills. Data quality plays a big part when analyses run on top of the data warehouse, so tests are run against the datasets after the ETL steps complete to catch any discrepancies.
Language: Python
Data-Warehouse-AWS-Redshift
[Sparkify] An ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables, forming a data warehouse in which the analytics team can continue finding insights into what songs their users are listening to.
Language: Jupyter Notebook
github-test-repo
Language: CSS