extract-transform-load

There are 91 repositories under the extract-transform-load topic.

  • bonobo

    Extract Transform Load for Python 3.5+

    Language: Python · 1.6k
  • diffsync

    A utility library for comparing and synchronizing different datasets.

    Language: Python · 141
  • ETL_with_Python

    ETL with Python - Taught at DWH course 2017 (TAU)

    Language: Jupyter Notebook · 101
  • YaEtl

    Yet Another ETL in PHP

    Language: PHP · 63
  • docwire

    DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

    Language: C++ · 62
  • bonobo-sqlalchemy

    PREVIEW - SQL databases in Bonobo, using sqlalchemy

    Language: Python · 25
  • DIFS

    Data Importer For SharePoint & Office 365

  • Data-Pipeline-with-dbt-using-Airflow-on-GCP

    This project demonstrates how to build and automate an ETL pipeline using DAGs in Airflow and load the transformed data into BigQuery. The project uses Astro, dbt, GCP, Airflow, and Metabase.

    Language: Python · 19
  • metadata_extractors

    A Working Group on connecting and advancing interoperability of efforts on automated extraction of metadata from materials and chemical file formats

  • bonobo-docker

    PREVIEW - Run Bonobo data processing graphs in docker containers.

    Language: Python · 13
  • Business-Intelligence-and-Data-Warehousing

    Business Intelligence and Data Warehousing Project

    Language: TSQL · 10
  • hexbase

    open-source ETL pipeline for HEX cryptocurrency data

    Language: Python · 9
  • Full-Cycle-ETL-Analytics-with-Google-Analytics-and-Snowflake

    Explore the transformative power of data analytics in my portfolio, where Google Analytics and Snowflake converge to provide comprehensive insights. This project leverages advanced ETL techniques and real-time data integration to enhance user engagement and optimize content delivery effectively.

    Language: Jupyter Notebook · 7
  • metadata_extractors_registry

    Archive. See Datatractor Yard, below:

    Language: Python · 6
  • relational-database-design-and-test

    Designing and testing a relational database for The Happy Phone Company.

    Language: SQL · 6
  • 1C-ERP-OLAP

    OLAP ITL utilities for 1C:ERP Enterprise Management (1С:ERP Управление предприятием).

    Language: C# · 6
  • zipline-tardis-bundle

    A bundle for zipline-reloaded to allow data for crypto assets to be ingested from Tardis

    Language: Python · 4
  • melhordazona-web

    Web app using babashka/apache + ETL pipeline

    Language: Clojure · 4
  • syr_mads_ist722_data_warehouse

    Syracuse University, Masters of Applied Data Science - IST 722 Data Warehouse

    Language: TSQL · 4
  • xgeo

    Scriptable geospatial data processing engine

    Language: Go · 4
  • bonobo-selenium

    PRE-ALPHA - Write web crawlers using Bonobo

    Language: Python · 4
  • ETL-Chicago-Cafe-Permits

    This ETL (Extract, Transform, Load) project employs several Python libraries, including Airflow, Soda, Polars, YData Profiling, DuckDB, Requests, Loguru, and Google Cloud to streamline the extraction, transformation, and loading of CSV datasets from the U.S. government's data repository at https://catalog.data.gov.

    Language: HTML · 3
  • Mission-to-Mars

    Application of Python web scraping methodologies for performing data analytics and visualization as part of the Extract, Transform, and Load (ETL) process.

    Language: Jupyter Notebook · 3
  • The-Music-has-Changed-Extract-transform-load-

    We examine two data sets related to the music industry. We extract, transform, and load the data sets to create a database and identify insights and trends in the music industry.

    Language: Jupyter Notebook · 3
  • nyc-crash-mapper-etl-script

    Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.

    Language: Python · 3
  • Crowdfunding_ETL

    Extract, Transform, and Load (ETL) Project

    Language: Jupyter Notebook · 2
  • Airbnb-Analysis-with-Tableau

    Built an interactive Tableau dashboard to analyze the Airbnb data extracted from MongoDB Atlas. Developed a Streamlit application for trend analysis, pattern recognition, and data insights using EDA. Explored variations in price, location, property type, and seasons through dynamic plots and charts.

    Language: Jupyter Notebook · 2
  • ETL-Airline-Accounting-Data

    This is an Extract, Transform, Load (ETL) project of unstructured Airline Billing and Settlement Plans (BSP) data

    Language: Python · 2
  • metadata_extractors_api

    Archive of MaRDA Metadata Extractors Schema. See Datatractor Beam, below, for the current repository.

    Language: Python · 2
  • DSND-Term2-Disaster_Response_Pipeline

    Create a machine learning pipeline that categorizes disaster events.

    Language: Jupyter Notebook · 2
  • twiddlepy

    Python module for extracting, transforming and loading data

    Language: Python · 2
  • Data-cleansing

    A group of Python scripts that clean large data sets by removing duplicate data, putting data in correct formats, and removing redundant cells.

    Language: Python · 1
  • project-three

    SEC Finance Data Engineering - ETL process for SEC finance data of S&P 500 companies. Jupyter Notebooks run the ETL workflows. The final dataset is hosted in MongoDB Atlas (cloud). The API is written in Python with the PyMongo and Flask libraries. The dashboards with charts are hosted in MongoDB Atlas.

    Language: Jupyter Notebook · 1
  • Water-Quality-DW-on-SQL-Server

    This is an MSSQL data warehouse and ETL implementation on a specially formatted water quality dataset from DEFRA, UK.

    Language: Jupyter Notebook · 1
  • Seong_Portfolio

    Data Analytics Portfolio

  • Phonepe-Pulse-Data-Visualization-and-Exploration

    Developed a Streamlit application for analyzing transactions and user data from the Pulse dataset. Explored data insights on states, years, quarters, districts, transaction types, and brands through EDA. Visualized trends and patterns using plots and charts to optimize decision-making in the Fintech industry.

    Language: Python · 1