datacleaning
There are 1638 repositories under the datacleaning topic.
OpenRefine/OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
great-expectations/great_expectations
Always know what to expect from your data.
sfu-db/dataprep
Open-source, low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
yobulkdev/yobulkdev
🔥 🔥 🔥 Open Source & AI-driven Data Onboarding Platform: Free flatfile.com alternative
DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
sharmaroshan/Twitter-Sentiment-Analysis
A natural language processing project that performs sentiment analysis by using machine learning classification models to separate positive tweets from negative ones. Covers classification, text mining, text analysis, data analysis, and data visualization.
DataKitchen/data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
imdevskp/covid_19_jhu_data_web_scrap_and_cleaning
This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/
prasanthg3/cleantext
An open-source Python package for cleaning raw text data
benchopt/benchmark_bilevel
Benchmark for bi-level optimization solvers
imdevskp/covid-19-india-data
Data and code for scraping and cleaning COVID-19 data for India from https://www.mohfw.gov.in/ and https://www.covid19india.org/
data-cleaning/validatedb
Validate data in a DB table using dbplyr
DemonDamon/tongdaxin-futures-data-clearing-database-operation
Deduplicates and cleans Tongdaxin (通达信) data and stores it in MongoDB for later research.
sayaliwalke30/Kaggle-Projects
This repo contains four different projects: machine learning models built for Kaggle competitions, along with exploratory data analysis, data cleaning, data visualization, data munging, feature selection, etc.
Anubhavchandil/RESEARCH-INTERN
Worked on a dataset of high-entropy alloys used to design materials for additive manufacturing. Responsible for performing data analysis and building machine learning models, including neural networks and gradient boosting, to make predictions useful for developing advanced materials.
hoshigan/Supply-Chain-Analytic---Just-In-Time-Company
The project provides a real-world dataset focusing on supply chain analytics
nirala96/Bangalore-House-Prediction-App
Predicts home prices of Bangalore. Used Flutter, Flask and Jupyter Notebook.
weismanm12/finances-database
Personal finance database creation, SQL analysis, and Power BI dashboard
theodi/OpenRefine-WS
Code to enable OpenRefine to run as an authenticated web service
ShrishtiHore/Weapons-Detection-in-Real-Time-Surveillance-Videos
This project aims to minimize police response time by detecting weapons in a live CCTV camera feed and alerting the police as soon as any weapon is detected. The project focuses primarily on guns. 🔫💣💻🎥
konchada2/Excel-Projects
Excel Based Projects
ahmadjaved97/ImageClusterViz
A tool for clustering images using deep learning features and visualizing the results in organized grids.
EastTower16/LLMDataDistill
Distill large-scale web page text
snesmaeili/PyZapline_plus
PyZaplinePlus is a Python adaptation of the Zapline-plus library, designed to automatically remove spectral peaks like line noise from EEG data while preserving the integrity of the non-noise spectrum and maintaining the data rank.
765276707/straws
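Straws is an open-source offline data synchronization middleware (ETL) that supports offline sync scenarios for MySQL, SQL Server, and others. It also supports scheduled synchronization (in three modes: full, incremental, and CDC) as well as data transformation and cleaning.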
mennamamdouh/Analysis-of-Video-Games
This repository is for a data analytics project using SQL. The project analyzes video game sales and user and critic reviews to extract insights.
ironmussa/Optimus-examples
Examples for Optimus, a data cleansing library for big data.
MiladNooraei/Quera-Football
Empowering football analytics through Transfermarkt data crawling, robust database design, and advanced analytics, yielding valuable insights and accurate predictions
rahmaahassan/Datacamp-Data-Analyst-Associate
Practical tasks for earning the Data Analyst Associate certification from DataCamp.
rojaAchary/Data_Preprocessing_Techniques
⚒️ Data preprocessing is the process of transforming raw data into an understandable format. It is also an important step in data mining, because raw data usually cannot be used directly. The quality of the data should be checked before applying machine learning or data mining algorithms.
ropensci/excluder
Checks for Exclusion Criteria in Online Data
charleswu52/BitcoinAnalysis
Bitcoin Transaction Data Analysis System.
Livingston-k/cleanPyData
cleanPyData is a Python package for data cleaning and preprocessing. It handles missing values, normalizes data, extracts features, and detects outliers, making your data ready for analysis or machine learning.
project1A1B/Big-Mart-Sales-Prediction
Building Big Mart Sales Prediction model