data-processing-pipelines

There are 17 repositories under the data-processing-pipelines topic.

  • NVIDIA/NeMo-Curator

    Scalable data preprocessing and curation toolkit for LLMs

    Language: Jupyter Notebook
  • westandskif/convtools

    convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation

    Language: Python
  • graphbookai/graphbook

    The framework for AI-driven data pipelines. Build interactive, highly efficient data pipelines with PyTorch. ⭐ Leave a star to support us!

    Language: TypeScript
  • kaburia/filter-stations

    Making it easier to navigate and clean TAHMO weather station data for ML development

    Language: Python
  • tamasgal/thepipe

    A simplistic, general purpose pipeline framework.

    Language: Python
  • Plato-solutions/artifician

    Artifician is an event-driven framework designed to simplify and accelerate the process of preparing datasets for Artificial Intelligence models.

    Language: Python
  • 99sbr/Predictive-Customer-Analytics

    Understanding the customer life cycle; acquiring customer data; applying big data concepts to your customer relationships; finding high-propensity prospects; upselling by identifying related products and interests; generating customer loyalty by discovering response patterns; predicting customer lifetime value (CLV); identifying dissatisfied customers; uncovering attrition patterns; applying predictive analytics in multiple use cases; designing data processing pipelines; implementing continuous improvement.

    Language: Jupyter Notebook
  • chandnii7/Big-Data-Processing-Pipeline

    A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.

    Language: Python
  • AIoT-Group-UoP/crossai

    An open-source Python library for developing end-to-end AI pipelines for time series analysis

    Language: Jupyter Notebook
  • lhotanok/data-engineering

    Homework assignments for MFF UK course NDBI046 - Introduction to Data Engineering

    Language: TypeScript
  • shuq007/datascience-notebooks

    Notebooks from finance, general practice and Jovian courses on data analysis, ML and DL

    Language: Jupyter Notebook
  • adarshnitt/30-Day-of-ML

    Dataset

    Language: Jupyter Notebook
  • smmiri/etl-visuals

    Code for data flow between models, data post-processing, and visualization

    Language: Jupyter Notebook
  • softwaresalt/blog

    Data Engineering & Software Blog

  • subhasisgorai/MyExperiments

    Experimental libraries - Azure Storage, multithreaded Data Processing pipelines, and many more ...

    Language: Java
  • mehanix/dhrw

    🎢 visually create data processing pipelines - python, rmq, react, meteorjs

    Language: JavaScript
  • SayamAlt/Bank-Customer-Churn-Prediction-using-PySpark

    A machine learning model built with PySpark that classifies whether a bank customer will churn, with more than 86% accuracy on the test set.

    Language: Jupyter Notebook