data-filtering

There are 115 repositories under the data-filtering topic.

  • weAIDB/awesome-data-llm

    Official Repository of "LLM × DATA" Survey Paper

    44040
  • p-lambda/dsir

    DSIR large-scale data selection framework for language model training

    Language:Python25921919
  • przemek83/volbx

    Graphical tool for data manipulation written in C++/Qt.

    Language:C++2498825
  • GUNDAM-Labet/GUNDAM

    GUNDAM is a data management system that prioritizes data using language models.

    Language:Python19021032
  • gookit/filter

    ⏳ Provides filtering, sanitizing, and conversion of Golang data.

    Language:Go1519412
  • heera/requent

    A GraphQL like interface to map a request to eloquent query with data transformation for Laravel.

    Language:PHP79306
  • Victorwz/MLM_Filter

    Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".

    Language:Python67211
  • jonnieZG/EWMA

    Exponentially Weighted Moving Average Filter

    Language:C++651814
  • lpreterite/datagent

    A tool for modular management of front-end requests.

    Language:JavaScript40552
  • zhuang-li/SCAR

    [ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models

    Language:Python36104
  • ai-forever/DataProcessingFramework

    Framework for processing and filtering datasets

    Language:Python27632
  • angelajw/QualtricsDataCleaning

    R Tutorial: useful R code for cleaning and filtering data from Qualtrics surveys, and for creating new variables in the dataframe. With step-by-step explanations.

    Language:R18208
  • kehvinbehvin/json-mcp-filter

    JSON MCP server to filter only relevant data for your LLM

    Language:JavaScript13105
  • noronhadaniel/ACS_2023

    This repository contains all (Python 3) code and libraries required for the 2022-2023 Notre Dame Rocketry Team (NDRT) Apogee Control System (ACS). It also contains sensor/actuator example code and flight data.

    Language:Python11201
  • giupardeb/EpiMethEx

    EpiMethEx (Epigenetic Methylation and Expression), an R package to perform a large-scale integrated analysis by cyclic correlation analyses between methylation and gene expression data.

    Language:R8210
  • RahulGoel2000/SBLS-Smartphone-Bot-Localization-system

    Data extraction from smartphones and GPS and Accelerometer data "fusion" with Kalman filter.

    Language:Java6100
  • levitation-opensource/DataAnonymiser

    Anonymises data inside text files and in sheet files. It recognises and removes various sorts of personally identifiable information (PII). Each removed part is replaced with a suitable generic text, depending on the type of removed data. Currently English and Russian languages are supported. Russian works both with Cyrillic and Latin characters.

    Language:Python4102
  • ryandkuster/ngsComposer

    Base-call error-filtering and read preprocessing pipeline for fastq libraries

    Language:Python4121
  • ajnanmvr/CDC-Connect

    CDC Connect is a cross-platform mobile application built in React Native using JavaScript. The app is designed for data collection with a focus on surveys.

    Language:JavaScript3201
  • azuregray/ExcelScope

    A multi-parameter sequential search utility for filtering through an input Excel Datasheet.

    Language:Python20
  • chaleaoch/lumi-filter

    A powerful and flexible data filtering library with unified interface for multiple data sources including Peewee ORM, Pydantic models, and Python iterables. Flask-friendly.

    Language:Python20
  • DevExpress-Examples/winforms-grid-make-auto-filter-row-insensitive-to-accents

    Make the data grid's Auto Filter Row insensitive to accents.

    Language:Visual Basic .NET25600
  • emre-tarhan/sql-desc-limit

    PHP | SQL - Fetching a desired number of records with DESC LIMIT

    Language:PHP2100
  • w2xim3/sqljson

    A powerful tool that allows users to query JSON data using SQL-like syntax. Effortlessly search, filter, and manipulate your JSON data with familiar SQL queries.

    Language:Python2100
  • averageencoreenjoer/processing-csv

    CSV Processing Tool is a Python CLI utility for filtering and aggregating data from CSV files. It allows you to quickly process large amounts of tabular information using the command line, without the need to use Excel or databases.

    Language:Python1
  • axah710/DSA

    This repository features Data Structures and Algorithms (DSA) practices in Dart, focusing on mastering fundamental programming concepts and problem-solving techniques.

    Language:Dart1100
  • emre-tarhan/sql-between-interval

    PHP - SQL | Using the BETWEEN and INTERVAL expressions

    Language:PHP1100
  • kgniewek/FileReader-DataProcessorPractise

    2021 Java practice project focused on file reading and data processing. It includes functions for custom exception handling, data conversion into objects, and basic filtering of records based on specific criteria. An exercise in Java fundamentals.

    Language:Java1100
  • kvvsatyaravi/ismart-data-visualizer

    demo version

    Language:Python1101
  • Leg0shii/FileArchiver

    FileArchiver is a robust tool designed to safely archive outdated data from very large datasets (Terabyte size) and efficiently filter geo-data for mapping purposes. Developed for Deutsche Bahn AG, it streamlines the management of extensive geographical data to optimize storage and enhance data processing efficiency.

    Language:Java1
  • mrhrifat/mw-react-test

    Filter and fetch data dynamically

    Language:JavaScript110
  • rachits999003/Data-Analysis-and-Analytic-tool

    A powerful, interactive desktop dashboard built with PyQt5, Matplotlib, Seaborn, Plotly, and scikit-learn. Designed for data wrangling, visualization, and machine learning—all in one elegant dark-themed GUI.

    Language:Python1
  • RobCyberLab/Ngram-Similarity-Engine

    🤖Ngram Similarity Engine📚

    Language:Python1
  • sethubolt7/CVE_CUSTOM_API

    This repository contains a backend using Spring Boot, JPA, and H2 to manage and display over 10,000 CVE records. It fetches CVE data from a public source, stores it in H2, and provides custom endpoints with filtering by year, metric score, and last modified date. Built with MVC architecture for structured data handling and web page integration.

    Language:Java10
  • utmhikari/daggre

    DAta-AGGREgator, a tool to handle data aggregation tasks

    Language:Go1100
  • vimalnathnambiar/exfilms

    A command-line interface tool to extract, filter, and standardise MS data.

    Language:JavaScript1170