preprocessing
There are 1,533 repositories under the preprocessing topic.
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com
nidhaloff/igel
A delightful machine learning tool that allows you to train, test, and use models without writing code
OpenGene/fastp
An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
AxeldeRomblay/MLBox
MLBox is a powerful Automated Machine Learning Python library.
winedarksea/AutoTS
Automated Time Series Forecasting
NVIDIA-Merlin/NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.
KinWaiCheuk/nnAudio
Audio processing using PyTorch 1D convolutional networks
sunlabuiuc/PyHealth
A Deep Learning Python Toolkit for Healthcare Applications.
TheAlgorithms/R
Collection of various algorithms implemented in R.
pytorch/torcharrow
High performance model preprocessing library on PyTorch
R1j1t/contextualSpellCheck
✔️Contextual word checker for better suggestions (not actively maintained)
qd-cae/awesome-CAE
A curated list of awesome CAE frameworks, libraries and software.
msamogh/nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
MaxHalford/xam
Personal data science and machine learning toolbox
DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
advaitsave/Introduction-to-Time-Series-forecasting-Python
Introduction to time series preprocessing and forecasting in Python using AR, MA, ARMA, ARIMA, SARIMA, and Prophet models, with forecast evaluation.
ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
cylondata/cylon
Cylon is a fast, scalable, distributed-memory parallel runtime with a pandas-like DataFrame.
nlpcl-lab/ace2005-preprocessing
ACE 2005 corpus preprocessing for the event extraction task
ikegami-yukino/neologdn
Japanese text normalizer for mecab-neologd
Razor12911/xtool
Just some tool repackers like to use...
Deffro/text-preprocessing-techniques
16 Text Preprocessing Techniques in Python for Twitter Sentiment Analysis.
dunky11/voicesmith
[WIP] VoiceSmith makes training text to speech models easy.
quqixun/BrainPrep
Preprocessing pipeline for brain MR images using FSL and ANTs, including registration, skull-stripping, bias field correction, enhancement, and segmentation.
jbusecke/xMIP
Analysis-ready CMIP6 data in Python the easy way with Pangeo tools.
google/tensorflow-recorder
TFRecorder makes it easy to create TensorFlow records (TFRecords) from Pandas DataFrames and CSV files containing images or structured data.
basf/mamba-tabular
Mambular is a Python package that simplifies tabular deep learning by providing a suite of models for regression, classification, and distributional regression tasks. It includes models such as Mambular, TabM, FT-Transformer, TabulaRNN, TabTransformer, and tabular ResNets.
jaeho3690/LIDC-IDRI-Preprocessing
Preprocessing step for the LIDC-IDRI dataset
ropensci/MODIStsp
An "R" package for automatic download and preprocessing of MODIS Land Products Time Series
githubharald/DeslantImg
The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.
mlr-org/mlr3pipelines
Dataflow Programming for Machine Learning in R
autoreject/autoreject
Automated rejection and repair of bad trials/sensors in M/EEG
sappelhoff/pyprep
PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
chakki-works/chariot
Deliver the ready-to-train data to your NLP model.