data-processing-pipelines

There are 17 repositories under the data-processing-pipelines topic.

NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
Language:Jupyter Notebook741 14 15896
westandskif/convtools
convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation
Language:Python40 2 99
graphbookai/graphbook
The framework for AI-driven data pipelines. Build interactive, highly efficient data pipelines with PyTorch. ⭐ Leave a star to support us!
Language:TypeScript24 2 551
kaburia/filter-stations
Making it easier to navigate and clean TAHMO weather station data for ML development
Language:Python16 2 102
tamasgal/thepipe
A simplistic, general purpose pipeline framework.
Language:Python14 4 62
Plato-solutions/artifician
Artifician is an event-driven framework designed to simplify and accelerate the process of preparing datasets for Artificial Intelligence models.
Language:Python10 2 00
99sbr/Predictive-Customer-Analytics
Understanding the customer life cycle; acquiring customer data; applying big data concepts to your customer relationships; finding high-propensity prospects; upselling by identifying related products and interests; generating customer loyalty by discovering response patterns; predicting customer lifetime value (CLV); identifying dissatisfied customers; uncovering attrition patterns; applying predictive analytics in multiple use cases; designing data processing pipelines; implementing continuous improvement.
Language:Jupyter Notebook4 2 01
chandnii7/Big-Data-Processing-Pipeline
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
Language:Python3 1 04
AIoT-Group-UoP/crossai
An open-source Python library for processing and developing End-to-End AI pipelines for Time Series Analysis
Language:Jupyter Notebook1 5 21
lhotanok/data-engineering
Homework assignments for MFF UK course NDBI046 - Introduction to Data Engineering
Language:TypeScript1 1 20
shuq007/datascience-notebooks
Notebooks from finance, general practice and Jovian courses on data analysis, ML and DL
Language:Jupyter Notebook1 1 00
adarshnitt/30-Day-of-ML
Dataset
Language:Jupyter Notebook0 2 00
smmiri/etl-visuals
Code for data flow between models, data post-processing, and visualization
Language:Jupyter Notebook0 2 00
softwaresalt/blog
Data Engineering & Software Blog
0 2 30
subhasisgorai/MyExperiments
Experimental libraries - Azure Storage, multithreaded Data Processing pipelines, and many more ...
Language:Java0 1 00
mehanix/dhrw
🎢 visually create data processing pipelines - python, rmq, react, meteorjs
Language:JavaScript1 0
SayamAlt/Bank-Customer-Churn-Prediction-using-PySpark
A machine learning model built with PySpark that classifies whether a bank customer will churn, with more than 86% accuracy on the test set.
Language:Jupyter Notebook1 0
