airscholar

Building datamasterylab.com

@Orbit-Inc England, United Kingdom

Pinned Repositories

ApacheFlink-SalesAnalytics
This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.
Language:Java12 2 07
changecapture-e2e
This project shows how to capture changes from a Postgres database and stream them into Kafka.
Language:Python33 2 319
e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Language:Python230 4 8107
FlinkCommerce
This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.
Language:Java41 2 024
FootballDataEngineering
An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Factory, Azure Synapse and Tableau.
Language:Python19 2 019
modern-data-eng-dbt-databricks-azure
In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (DBT), with Azure as our cloud provider.
26 3 013
realtime-voting-data-engineering
This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgres and Streamlit, and uses Docker Compose to easily spin up the required services in Docker containers.
Language:Python34 2 121
RealtimeStreamingEngineering
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, production to a Kafka topic, and connection to Elasticsearch.
Language:Python32 3 124
RedditDataEngineering
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
Language:Python117 4 158
SparkingFlow
This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala and Java as examples.
Language:Java38 2 624

airscholar's Repositories

airscholar/e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Language:Python230 4 8107
airscholar/RedditDataEngineering
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
Language:Python117 4 158
airscholar/FlinkCommerce
This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.
Language:Java41 2 024
airscholar/SparkingFlow
This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala and Java as examples.
Language:Java38 2 624
airscholar/realtime-voting-data-engineering
This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgres and Streamlit, and uses Docker Compose to easily spin up the required services in Docker containers.
Language:Python34 2 121
airscholar/changecapture-e2e
This project shows how to capture changes from a Postgres database and stream them into Kafka.
Language:Python33 2 319
airscholar/RealtimeStreamingEngineering
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, production to a Kafka topic, and connection to Elasticsearch.
Language:Python32 3 124
airscholar/modern-data-eng-dbt-databricks-azure
In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (DBT), with Azure as our cloud provider.
26 3 013
airscholar/FootballDataEngineering
An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Factory, Azure Synapse and Tableau.
Language:Python19 2 019
airscholar/Kubernetes-For-DataEngineering
This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering environment using Kubernetes and Apache Airflow.
Language:Python18 2 013
airscholar/ApacheFlink-SalesAnalytics
This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.
Language:Java12 2 07
airscholar/cicd_for_data_engineering
This project showcases how to integrate DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD), with modern data engineering, using Terraform and Azure as the case study.
Language:HCL11 3 06
airscholar/YoutubeAnalytics
An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlDB. The processed analytics data is then sent to Telegram for real-time notifications.
Language:HTML11 2 04
airscholar/Japan-visa-data-engineering
This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark clusters are set up within a Docker container on Azure.
Language:HTML10 2 010
airscholar/AlphaTeam
Complex Network Analysis Using Machine Learning
Language:HTML7 2 367
airscholar/EMR-for-data-engineers
This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.
Language:Python7 2 08
airscholar/RealtimeAnomalyDetection
Language:Python7 2 11
airscholar/EcommerceKafka-Datagen
Language:Python5 2 0
airscholar/Face-Anonymizer
This repository contains different algorithms and methods to anonymize faces in images by blurring or pixelating them, using OpenCV and MTCNN in Python.
Language:Jupyter Notebook5 2 01
airscholar/dbt-bigquery-crash-course
A deep dive into the powerful combination of DBT and BigQuery, the game-changers in modern data engineering.
3 2 07
airscholar/CourseProject
Language:TypeScript2 2 00
airscholar/airscholar
1 3 02
airscholar/Background-removal
Language:Jupyter Notebook1 2 01
airscholar/DeepSeek-V3
Language:Python1 0 0
airscholar/qrcode-creator
Language:Jupyter Notebook1 2 0
airscholar/quix-test
Language:Python1 1 0
airscholar/docs.nestjs.com
The official documentation https://docs.nestjs.com 📕
Language:TypeScript1 0
airscholar/nest
A progressive Node.js framework for building efficient, scalable, and enterprise-grade server-side applications with TypeScript/JavaScript 🚀
Language:TypeScript
airscholar/sqlc
Generate type-safe code from SQL
Language:Go1 0
airscholar/typeorm
ORM for TypeScript and JavaScript (ES7, ES6, ES5). Supports MySQL, PostgreSQL, MariaDB, SQLite, MS SQL Server, Oracle, SAP Hana, WebSQL databases. Works in NodeJS, Browser, Ionic, Cordova and Electron platforms.
Language:TypeScript1 0