Pinned Repositories
ApacheFlink-SalesAnalytics
This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.
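A minimal PyFlink Table API sketch of the kind of streaming aggregation such a sales-analytics job performs; the column names and in-memory source below are illustrative stand-ins, not taken from the repository:

```python
# Minimal PyFlink Table API sketch: aggregate sales per category.
# The columns (category, price) and the in-memory source are placeholders
# for the repository's actual streaming sources.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# In-memory sample data instead of the real sales stream.
sales = t_env.from_elements(
    [("electronics", 299.99), ("groceries", 14.50), ("electronics", 89.00)],
    ["category", "price"],
)
t_env.create_temporary_view("sales", sales)

# Total revenue per category -- the core of a sales-analytics job.
result = t_env.sql_query(
    "SELECT category, SUM(price) AS total_revenue FROM sales GROUP BY category"
)
result.execute().print()
```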
changecapture-e2e
This project shows how to capture changes from a Postgres database and stream them into Kafka.
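A hedged sketch of the consuming side of such a setup: reading Debezium-style change events from Kafka. The topic name and event layout are assumptions, not values from the repository.

```python
# Consume Debezium-style CDC events (e.g. produced by Debezium + Kafka Connect
# watching Postgres) from a Kafka topic. Topic and field names are placeholders.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "pg.public.customers",                      # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    if event is None:          # tombstone record that follows a delete
        continue
    payload = event.get("payload", event)
    op = payload.get("op")     # 'c' = create, 'u' = update, 'd' = delete
    print(op, payload.get("before"), "->", payload.get("after"))
```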
e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
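A minimal sketch of the ingestion leg of a pipeline like this: an Airflow DAG with a task that publishes records to Kafka. The topic, schedule, and payload are placeholders rather than the repository's actual values.

```python
# Airflow DAG with a single task that produces JSON records to Kafka.
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kafka import KafkaProducer  # pip install kafka-python


def stream_to_kafka():
    producer = KafkaProducer(
        bootstrap_servers="broker:29092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    # Placeholder payload; a real ingestion task would pull from an external source.
    producer.send("users_created", {"id": 1, "name": "example"})
    producer.flush()


with DAG(
    dag_id="kafka_ingestion_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="stream_to_kafka", python_callable=stream_to_kafka)
```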
FlinkCommerce
This repository contains an Apache Flink application for real-time sales analytics. Docker Compose is used to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.
FootballDataEngineering
An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow, and saves it to Azure Data Lake. Further processing takes place in Azure Data Factory, Azure Synapse, and Tableau.
modern-data-eng-dbt-databricks-azure
In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (dbt), with Azure as the cloud provider.
realtime-voting-data-engineering
This repository contains the code for a real-time election voting system built with Python, Kafka, Spark Streaming, Postgres, and Streamlit. Docker Compose is used to easily spin up the required services in Docker containers.
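A sketch of the Spark Streaming stage of such a system, assuming votes arrive on a Kafka topic as JSON with a "candidate" field; the topic name and schema are placeholders.

```python
# Spark Structured Streaming: count votes per candidate from a Kafka topic.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# The Kafka source requires the spark-sql-kafka package on the classpath.
spark = SparkSession.builder.appName("VoteCounter").getOrCreate()

vote_schema = StructType([StructField("candidate", StringType())])

vote_counts = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "votes_topic")
    .load()
    .select(from_json(col("value").cast("string"), vote_schema).alias("v"))
    .groupBy("v.candidate")
    .count()
)

# Console sink for demonstration; a real pipeline would write results onward
# (e.g. to Postgres for a Streamlit dashboard to read).
query = vote_counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```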
RealtimeStreamingEngineering
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, OpenAI's LLM, Kafka, and Elasticsearch. It covers each stage from data acquisition and processing, through sentiment analysis with ChatGPT, to publishing to a Kafka topic and loading into Elasticsearch.
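A hedged sketch of the sentiment-analysis step using the OpenAI chat completions API; the model name and prompt wording are assumptions, and the repository may structure this differently.

```python
# Classify the sentiment of a piece of text with the OpenAI chat completions API.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_sentiment(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "Reply with exactly one word: POSITIVE, NEGATIVE, or NEUTRAL.",
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()


print(classify_sentiment("The service was quick and the staff were friendly."))
```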
RedditDataEngineering
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
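A sketch of the extract-and-load steps of a Reddit ETL like this one, assuming PRAW for the Reddit API and boto3 for S3; credentials, subreddit, bucket, and file names are placeholders.

```python
# Extract top posts from a subreddit, stage them as CSV, and load to S3,
# where Glue/Athena/Redshift can pick them up downstream.
import csv

import boto3
import praw  # pip install praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="reddit-etl-sketch",
)

# Extract: top posts from a subreddit.
rows = [
    (post.id, post.title, post.score, post.num_comments)
    for post in reddit.subreddit("dataengineering").top(limit=100)
]

# Stage: write a flat CSV locally.
with open("reddit_posts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "title", "score", "num_comments"])
    writer.writerows(rows)

# Load: push the file to S3.
boto3.client("s3").upload_file("reddit_posts.csv", "my-raw-bucket", "raw/reddit_posts.csv")
```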
SparkingFlow
This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala, and Java as examples.
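A sketch of how such submissions might look with SparkSubmitOperator from the apache-airflow-providers-apache-spark package; the paths, class name, and connection id are placeholders.

```python
# Airflow DAG submitting a Python job and a Scala/Java jar to a Spark cluster.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="sparking_flow_sketch",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    python_job = SparkSubmitOperator(
        task_id="python_job",
        conn_id="spark_default",
        application="jobs/python/wordcount.py",
    )

    scala_job = SparkSubmitOperator(
        task_id="scala_job",
        conn_id="spark_default",
        application="jobs/scala/target/wordcount.jar",
        java_class="com.example.WordCount",
    )

    python_job >> scala_job
```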
airscholar's Repositories
airscholar/ANP-Web
Sample static ANP Website template
airscholar/ApelBackend
airscholar/AwesomeLogin
airscholar/calendar.js
Port of the Python calendar.py module to JavaScript
airscholar/CurrencyConverter
airscholar/Developers-connect
A MERN stack app that serves as a social media platform for developers
airscholar/DirectorXY
airscholar/Discuss
airscholar/Docker-api
airscholar/dogehouse
Taking voice conversations to the moon 🚀
airscholar/Elibar
airscholar/ElixCards
airscholar/Elixcon
airscholar/MyApp
airscholar/NestMicroService
airscholar/NestTaskManager
airscholar/node-redis
airscholar/Personify
airscholar/Phishing-Detector-App
airscholar/Phishing-Website-Detector
airscholar/Springboot-h2-starter
airscholar/StoryBooks
airscholar/swagger-file-db
airscholar/TastyRecipes
airscholar/travel-management-system
airscholar/tutorials
DevOps by Example
airscholar/Webscraper-Stackoverflow