Pinned Repositories
course-collaboration-travel-plans
Data-Lake-Spark-EMR
[Sparkify] An ETL pipeline that extracts Sparkify's data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables, so the analytics team can continue finding insights into what songs users are listening to.
Data-Modeling-Cassandra
[Sparkify] A non-relational database schema and ETL pipeline for data that resides in a directory of CSV logs of user activity for a music app, as well as metadata on the songs in the app.
Data-Modeling-Postgres
[Sparkify] A database schema and ETL pipeline for data that resides in a directory of JSON logs of user activity for a music app, as well as a directory of JSON metadata on the songs in the app.
Data-Modeling-Spark-udacity-capstone
An ETL pipeline for I94 immigration, global land temperature, and US demographics datasets that forms an analytics database on immigration events. The data model is built with pandas and PySpark to find patterns of immigration to the United States.
Data-Pipelines-with-Airflow
[Sparkify] High-grade data pipelines that are dynamic, built from reusable tasks, can be monitored, and allow easy backfills. Data quality plays a big part when analyses are executed on top of the data warehouse, so tests are run against the datasets after the ETL steps have executed to catch any discrepancies.
Data-Warehouse-AWS-Redshift
[Sparkify] An ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables that serve as a data warehouse, so the analytics team can continue finding insights into what songs users are listening to.
github-test-repo
khaledshabasy's Repositories
khaledshabasy/Data-Modeling-Spark-udacity-capstone
An ETL pipeline for I94 immigration, global land temperature, and US demographics datasets that forms an analytics database on immigration events. The data model is built with pandas and PySpark to find patterns of immigration to the United States.
khaledshabasy/course-collaboration-travel-plans
khaledshabasy/Data-Lake-Spark-EMR
[Sparkify] An ETL pipeline that extracts Sparkify's data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables, so the analytics team can continue finding insights into what songs users are listening to.
khaledshabasy/Data-Modeling-Cassandra
[Sparkify] A non-relational database schema and ETL pipeline for data that resides in a directory of CSV logs of user activity for a music app, as well as metadata on the songs in the app.
khaledshabasy/Data-Modeling-Postgres
[Sparkify] A database schema and ETL pipeline for data that resides in a directory of JSON logs of user activity for a music app, as well as a directory of JSON metadata on the songs in the app.
khaledshabasy/Data-Pipelines-with-Airflow
[Sparkify] High-grade data pipelines that are dynamic, built from reusable tasks, can be monitored, and allow easy backfills. Data quality plays a big part when analyses are executed on top of the data warehouse, so tests are run against the datasets after the ETL steps have executed to catch any discrepancies.
khaledshabasy/Data-Warehouse-AWS-Redshift
[Sparkify] An ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables that serve as a data warehouse, so the analytics team can continue finding insights into what songs users are listening to.
khaledshabasy/github-test-repo