dataset-filtering
There are 16 repositories under the dataset-filtering topic.
ynop/audiomate
Python library for handling audio datasets.
kostyaev/smart_categorizer
Trainable categorization tool
mattevans/distil
💧 In-memory dataset filtering
silenterus/deepspeech-cleaner
Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
JEF1056/clean-discord
Cleaning Discord data for NLP
h1alexbel/samples-filter
Command-line filter for GitHub repositories that contain "samples" instead of a real project, framework, or library
marizombie/headless_directory_viewer
🚀 Whenever you need to look through a huge pile of images and cannot use a file explorer, or you are working on a remote headless machine, you can use this tool. It also lets you move files from one folder to another, creating the destination if it does not exist. Work in progress.
moranyanuka/icc_code
[ACL 2024 (Findings)] ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
vedantbhoj/Security-based-Face-Recognition
A face recognition approach that explores information jointly in space, scale, and orientation.
jovicigor/DataPreprocessor
A simple library that wraps common data processing tasks into an easy-to-use preprocessing engine. The library currently supports transformation of CSV files loaded into a pandas DataFrame.
AyoKeito/image-comparison
Compare pictures, keep 2
effesessa/fast-spark-expression
Fast Spark Expression - Write column expressions quickly and easily like a string
geo-c/OCT-Core
Module of the Open City Toolkit to visualize the use of open datasets by applications
PranayMalhotra/Colleges-for-Maria
Data cleaning: a project that takes all colleges in the US and narrows them down to the suitable ones by slicing, dicing, and concatenating startup activity data and crime statistics.
tianhaoz95/capstone
A set of tools to generate and label datasets from academic papers