cleaning-data
There are 415 repositories under the cleaning-data topic.
pyjanitor-devs/pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Meteor-Community-Packages/meteor-simple-schema
Meteor integration package for simpl-schema
lemon234071/clean-dialog
A framework for cleaning Chinese dialog data
araafroyall/Cleaner-Royall
Most Advanced Cleaner for Android [Root]
prasanthg3/cleantext
An open-source package for Python to clean raw text data
nguyenthanhjt/google-data-analytics-professional-certificate-course
[Google Data Analytics Professional Certificate] learning resources
notesjor/corpusexplorer2.0
Corpus linguistics has never been so easy...
sbettid/GPSClean
An application to correct a GPS trace using machine learning techniques. To preview it, a small web interface, named GPSClean Web, is available
longNguyen010203/Youtube-Recommend-Master-ETL-Pipeline
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars and Docker. Data comes from Kaggle and the YouTube API
Manuscrit/Area-Under-the-Margin-Ranking
Implementation of the paper Identifying Mislabeled Data using the Area Under the Margin Ranking: https://arxiv.org/pdf/2001.10528v2.pdf
ELHoussineT/AutoDataCleaner
Simple and automatic data cleaning in one line of code! It performs one-hot encoding, casts date and time columns to the datetime dtype, detects binary columns, safely converts non-numeric columns to numeric dtypes, cleans dirty or empty values, normalizes values and removes unwanted columns, all in one line of code. Get your data ready for model training and fitting quickly.
MRYingLEE/Time-series-Preprocessing-Studio-in-Jupyter
Time-series Data Preprocessing Studio in Jupyter notebook.
SPARTANX21/SQL-Data-Analysis-Healthcare-Project
SQL - Healthcare Dataset Analysis
alifrmf/Customer-Segmentation-Using-Clustering-Algorithms
Customer Segmentation Using Unsupervised Machine Learning Algorithms
devildances/DataScience_in_small_notes
Some short notes from the author for anyone who wants to learn about the process a data scientist follows, from data collection through to making predictions with a trained model. These notes are based on what the author has learned and implemented. Enjoy it!
LieseB-1746743/data-cleaning
Data cleaning tool.
nikhiljsk/preprocess_nlp
A fast framework for pre-processing (text cleaning, vocabulary reduction, feature extraction and vectorization). Implemented with parallel processing using a custom number of processes.
NouranHany/EU-IT-Salary-Exploration
The aim of our project is to explore IT salaries in Europe and provide insights to two target audiences: employers who are establishing or already have an IT company, and individuals searching for jobs in the IT sector.
alifrmf/Personal-Bank-Loan-Modeling-Classification-Analysis
Supervised Machine Learning Analysis Using Classification Models
beery4010/Wrangle-and-Analyze-Data
Udacity Data Analyst Nanodegree - Project IV
EmadBeltaje/gaza_flutter_cleaner
Clean all your Flutter projects with one command and save disk space
kingabzpro/Annual-Recycled-Energy-Saved-in-Singapore
Learn how much energy Singapore saves per year by recycling plastics, paper, glass, and ferrous and non-ferrous metals
Salmankadiwal/Advanced-Excel-Formulas-and-Functions
This file contains advanced Excel formulas and functions, such as VLOOKUP with MATCH, INDEX and MATCH, OFFSET and COUNTA, logical operators, MAX with IF, formula formatting, text functions, date and time functions, and data cleaning, which I worked through to take my Excel skills to a higher level.
aflah02/cleansetext
This is a simple library to help you clean your textual data
codema-dev/cer-smart-meter-trials-2009-2011
A helper environment/library for cleaning and querying the CER Smart Meter Trials 2009-2011 datasets via pandas, Dask and Google Colaboratory
Photoroom/fast-dataset-cleaner
A simple tool for cleaning image datasets at a glance.
Jaswanth0608/Epilepsy_Disorder_Classification_using_EEG_Signals
Engineered an innovative project for epilepsy disorder classification by harnessing EEG signals, attaining an outstanding accuracy rate of 95% in identifying diverse seizure types. Implemented sophisticated signal processing techniques and advanced machine learning algorithms, enhancing the system's precision and efficiency in classification.
Maxence-Labesse/World-Happiness-Report
Data analysis and forecasting applied to World Happiness
Timorig/Airbus_fuel_leak_detection
This repository contains our work on fuel leak detection for the capstone project of our master's in Big Data and Business Analytics. Our group was composed of Pierre Bléthon, Alexi Mathay, Diego Garate, Alice Seynaeve and Timothé Rigaudeau.
engineering87/SharpSanitizer
A .NET library for sanitizing and validating object properties using customizable rules to ensure clean and secure data
LaurentVeyssier/Model-to-predict-Energy-consumption-City-of-Seattle
Use Seattle's public energy data and build a model predicting energy consumption
fezzibasma/Speed-Dating-Experiment
What attributes influence the selection of a romantic partner?
flytegg/vanisher
Discord bot to bulk remove someone's messages in a server (since Discord won't).
hellofromtheothersky/Laptop-price-analysis-and-prediction
Crawl data, process data, visualize, and create ML model for laptop price prediction
PragyanTiwari/Refining-Spotify-Dataset-with-LLAMA3-70B
Reclassifying the explicitness of Spotify tracks with LLAMA3-70B, built using LangChain
rafaellagp/Real-estate-price-prediction-FullProject
BeCode Project - Immo Eliza Full Project - Complete pipeline of Real Estate price prediction.