etl-job
There are 63 repositories under the etl-job topic.
AlexIoannides/pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
paillave/Etl.Net
Mass data processing with a complete ETL for .NET developers
jbogard/bulk-writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
cloudposse/terraform-aws-glue
Terraform modules for provisioning and managing AWS Glue resources
felipefrizzo/terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
ktnsh24/DataModelling
This repo guides you step by step through creating a star schema dimensional model.
nsphung/pyspark-template
A Python PySpark Project with Poetry
kishlayjeet/Twitter-Data-Pipeline-using-Airflow-and-AWS-S3
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
michaelbironneau/analyst
A declarative, SQL-like DSL for data integration tasks.
yennanliu/AirflowJob
Airflow POC demo: 1) env setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
Joshua-omolewa/Retailstore_ETL_pipeline_project
A data pipeline for a retail store, built with AWS services, that collects data from the store's transactional database (OLTP) in Snowflake and transforms the raw data (ETL process) using Apache Spark to meet business requirements. It also enables data analysts to create data visualizations using Superset. Airflow is used to orchestrate the pipeline.
TheCocoTeam/source-watcher-core
This is a PHP project which combines ETL with different strategies to extract data from multiple databases, files, and services, transform it and load it into multiple destinations.
2298-Software/Mambo
A simple in-memory, configuration driven, data processing pipeline for Apache Spark.
san089/airflow-training
Introduction to data pipeline management with Airflow. Airflow schedules and maintains numerous ETL processes running on a large-scale Enterprise Data Warehouse.
achugr/flink-comms-processing
Comms processing (ETL) with Apache Flink.
amantewary/Sentiment-Analysis-of-Tweets-Using-ETL-process-and-Elastic-Search
Sentiment Analysis of Tweets Using ETL process and Elastic Search
mdauthentic/ETLProject-Batch
An ETL pipeline where data is captured from REST APIs (Remotive, Adzuna & GitHub) and RSS feeds (StackOverflow). The data collected from the APIs is stored on local disk. The files are preprocessed, and the ETL jobs are written in Spark and scheduled in Prefect to run every week. Transformed data is moved to PostgreSQL.
san089/pyspark-example-project
Example project and best practices for Python-based Spark ETL jobs and applications.
ShihWen/tpe-mrt-traffic-etl
A data pipeline from source to data warehouse using Taipei Metro Hourly Traffic data
amrelauoty/Telecom-ETL-SSIS
Telecom ETL is an SSIS package that ingests its data from CSVs into a DB
Iuryck/Fundamentus_API
Code for an unofficial API for the Brazilian stock data website Fundamentus. Uses requests and bs4 for scraping.
Naz513/ETLCovid19Project
Event-Driven Python on AWS #CloudGuruChallenge
obaghirli/PyETL
Python 3.5 package for ETL jobs
Oguzozcn/Reddit-Data-Pipeline-using-Airflow-and-AWS-S3
This project involves using the Reddit API to extract data, processing it using EC2 instances, and storing the output in CSV format within an AWS S3 bucket, with Airflow managing the overall workflow orchestration.
arturogonzalezm/convert_json_to_parquet
ETL (Extract, Transform, Load) job using PySpark - submodule
calysteau/krawler-job03
Download Meteo France Radar imagery using Kalisio Krawler
EssenceSentry/data_download_pipelines
Utilities for declarative specification of data download pipelines for ETL jobs.
heliomarpm/SQLDataTransfer
A tool for copying SQL Server data, developed to assist in generating files and efficiently copying data between SQL Server databases.
JavadMalekzadeh/JavaEE-JBatch-ETL
A simple ETL batch process that extracts data (here, messages) stored in a table, transforms it into new objects, then inserts them into another table.
julientoucoula17/apache_airflow-with-Docker
Apache Airflow installation with Docker 🌬️
LaPetiteSouris/csvloader
Optimized CSV loader, which replaces a traditional ETL process to load huge CSV datasets into traditional databases
shreeyajoshi2013/AWS_Data_Engineering_YouTube_Data
Data pipeline using S3, Glue, Athena, Lambda and QuickSight to analyze a YouTube dataset
techysanjo/Extract-Transform-and-Load-ETL-Project
An Extract, Transform and Load (ETL) program that extracts datasets from various types of sources, applies various transformation techniques, and loads the results to various destination types.
wallib-bitcoin/wallet-bc-etl-synchronization
Implement a basic ETL to synchronize Lightning Network payments