Pinned Repositories
ETL-Data-Pipeline-RDBMS-TO-HDFS-using-Airflow-Apache-Sqoop-Spark-Postgres-and-Hive
This project moves data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS).
Iceberg-Dbt-Trino-Hive-modern-open-source-data-stack
To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a music streaming platform, let’s delve into the detailed workflow and benefits of each component.
Kafka-pipeline
In the following post, we will learn how to build a data pipeline using a combination of open-source software (OSS), including Debezium, Apache Kafka, and Kafka Connect.
modern-data-pipeline
Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, dbt, Mage AI, and Dash.
Nifi-ETL-Data-Pipeline
This post demonstrates how to create a containerized data engineering environment using Docker Stacks.
projet_data
Using open-source technologies to implement a data pipeline.
railway-station-streaming
Scalable-RSS-Feed-Pipeline
In this article, we'll walk through how to build a scalable ETL pipeline using Apache Airflow, Kafka, Python, MongoDB, and Flask.
stream-ingestion-redpanda-minio
In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO, and Apache Spark.
Uber_projet
Unveiling the true cost of your ride-sharing and food delivery habits with an ELT data pipeline, PostgreSQL, dbt, and Power BI.
Stefen-Taime's Repositories
Stefen-Taime/Iceberg-Dbt-Trino-Hive-modern-open-source-data-stack
To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a music streaming platform, let’s delve into the detailed workflow and benefits of each component.
Stefen-Taime/modern-data-pipeline
Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, dbt, Mage AI, and Dash.
Stefen-Taime/stream-ingestion-redpanda-minio
In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO, and Apache Spark.
Stefen-Taime/Scalable-RSS-Feed-Pipeline
In this article, we'll walk through how to build a scalable ETL pipeline using Apache Airflow, Kafka, Python, MongoDB, and Flask.
Stefen-Taime/railway-station-streaming
Stefen-Taime/Free-Real-time-Flight-Status-Pipeline
A real-time flight status data pipeline built with Kafka, Schema Registry, Avro, GraphQL, Postgres, and React.
Stefen-Taime/docSearch
This project combines modern technologies and architectures to create a document search engine designed to handle complex data processing and retrieval tasks.
Stefen-Taime/ModernDataEngineerPipeline
Building a Robust Data Pipeline: Integrating Proxy Rotation, Kafka, MongoDB, Redis, Logstash, Elasticsearch, and MinIO for Efficient Web Scraping
Stefen-Taime/etl_onaws_deploy_with_terraform
The objective of this guide is to demonstrate how to automate the deployment of a data pipeline on AWS using Terraform. The pipeline will utilize AWS services such as Lambda, Glue, Crawler, Redshift, and
Stefen-Taime/investissement
Jenkins Delta pipeline
Stefen-Taime/open-source-data
This repository contains structured datasets in various categories
Stefen-Taime/build_api_auth2.0
Stefen-Taime/build_api_devops_pipeline
Stefen-Taime/datawarehouse
Stefen-Taime/eventmusic
EventMusic Producer is a Dockerized application that reads data and writes it to a Kafka topic, using Avro schemas for serialization. It integrates with Kafka and the Schema Registry to manage the flow of music event data.
Stefen-Taime/Gmail-to-MongoDB-Script
This script automates fetching emails from a user's Gmail account and storing them in a MongoDB database. The fetched emails are filtered by specific labels such as Promotions, Social, Updates, and Forums. The script is intended to run continuously, checking for new emails every minute.
Stefen-Taime/MongoElasticMigrator
This tool migrates data from MongoDB collections to Elasticsearch indices. It's built using Rust and supports configurable migrations.
Stefen-Taime/myUberEats_dataPipeline
Building a Modern Uber Eats Data Pipeline
Stefen-Taime/Real-Time-Data-Pipeline-Snake-Game
Dynamic Snake Game: Unleashing Real-Time Streaming Analytics with Redis, Kafka, Flink, ClickHouse & Chart.js in an Online Snake Game via Flask API
Stefen-Taime/Stefen-Taime
Config files for my GitHub profile.
Stefen-Taime/terraform_snowflake_devops
Develop a scalable and secure data infrastructure and integrate diverse data sources into Snowflake.
Stefen-Taime/-Google-Analytics-360
Welcome to the Google Analytics 360 Dataset Project! This repository is designed for anyone interested in working with realistic Google Analytics data. Whether you're a data scientist, a student, or a marketing analyst
Stefen-Taime/azurePipeline
Azure Data Pipeline
Stefen-Taime/dataops
Stefen-Taime/fake-server-data
Stefen-Taime/Lambda_Pipeline
Stefen-Taime/llm-rag-mtl-public-hospital
This project develops a Retrieval-Augmented Generation (RAG) model to answer questions using public data from Google reviews of hospitals in Montreal.
Stefen-Taime/openday
Stefen-Taime/Real-Time-Extraction-Transformation-and-Exposure-Architecture-for-Rail-Data
We are thrilled to announce our new PoC project, which aims to provide a complete real-time extraction, transformation, and exposure architecture for the new provincial transportation systems.
Stefen-Taime/realtime-race-mapper
In this rendition, Elastic and Kibana have been replaced with the powerful Splunk, MQTT has been swapped out for ActiveMQ, and instead of the traditional Kafka, we’ve integrated Confluent Cloud.