Pinned Repositories
airflow-training
Introduction to data pipeline management with Airflow. Airflow schedules and maintains numerous ETL processes running on a large-scale Enterprise Data Warehouse.
Big_Data_Project
Fake News Detection: feature extraction using vectorization techniques such as Count Vectorizer, TF-IDF Vectorizer, and Hash Vectorizer, followed by an ensemble model that classifies whether the news is fake or not.
Cloudera_Material
Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
data-engineer-roadmap
Learnings from multiple companies in Silicon Valley: Netflix, Facebook, Google, and startups.
goodreads
:snake: Python wrapper for Goodreads API :books:
goodreads_etl_pipeline
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Hadoop-Books
This is my personal collection of free Hadoop books. Please feel free to share and learn.
Optimizing-Public-Transportation
A real-time event pipeline built around the Kafka ecosystem for the Chicago Transit Authority.
Spark_Packaged_project
This project contains PySpark jobs that create data pipelines and shows how to distribute the project package on a cluster.
Udacity-Data-Engineering-Projects
A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.
san089's Repositories
san089/Hadoop-Books
This is my personal collection of free Hadoop books. Please feel free to share and learn.
san089/beamer-themes
san089/scala-spark-4