datacleaning

There are 980 repositories under the datacleaning topic.

  • OpenRefine/OpenRefine

    OpenRefine is a free, open source power tool for working with messy data and improving it

    Language:Java10.6k4723.1k1.9k
  • great-expectations/great_expectations

    Always know what to expect from your data.

    Language:Python9.6k831.8k1.5k
  • sfu-db/dataprep

    Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.

    Language:Python1.9k26413200
  • yobulkdev/yobulkdev

    🔥 🔥 🔥 Open Source & AI-driven Data Onboarding Platform: free flatfile.com alternative

    Language:JavaScript855126243
  • DataCanvasIO/HyperGBM

    A full pipeline AutoML tool for tabular data

    Language:Python324155545
  • sharmaroshan/Twitter-Sentiment-Analysis

    A natural language processing problem in which sentiment analysis is done by separating positive tweets from negative tweets, using machine learning models for classification, text mining, text analysis, data analysis, and data visualization

    Language:Jupyter Notebook20644124
  • imdevskp/covid_19_jhu_data_web_scrap_and_cleaning

    This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/

    Language:Jupyter Notebook9313493
  • prasanthg3/cleantext

    An open-source package for python to clean raw text data

    Language:Python672711
  • DataKitchen/data-observability-installer

    Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

    Language:Python513
  • mundipagg/amora-data-build-tool

    Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.

    Language:Python46474
  • imdevskp/covid-19-india-data

    Data and code for scraping and cleaning data on COVID-19 in India from https://www.mohfw.gov.in/ and https://www.covid19india.org/

    Language:Jupyter Notebook386881
  • data-cleaning/validatedb

    Validate on a table in a DB, using dbplyr

    Language:R32274
  • benchopt/benchmark_bilevel

    Benchmark for bi-level optimization solvers

    Language:Python30536
  • sayaliwalke30/Kaggle-Projects

    This repo contains 4 different projects. Built various machine learning models for Kaggle competitions. Also carried out Exploratory Data Analysis, Data Cleaning, Data Visualization, Data Munging, Feature Selection etc

    Language:Jupyter Notebook224018
  • DemonDamon/tongdaxin-futures-data-clearing-database-operation

    Deduplicates and cleans Tongdaxin (通达信) data and stores it in MongoDB for easier later research

    Language:Python214018
  • nirala96/Bangalore-House-Prediction-App

    Predicts home prices of Bangalore. Used Flutter, Flask and Jupyter Notebook.

    Language:Jupyter Notebook18100
  • theodi/OpenRefine-WS

    Code to enable OpenRefine to run as an authenticated web service

    Language:JavaScript162
  • EastTower16/LLMDataDistill

    distill large scale web page text

    Language:C++12111
  • ShrishtiHore/Weapons-Detection-in-Real-Time-Surveillance-Videos

    This project aims to minimize the police response time by detecting weapons through a live CCTV camera feed. So it alerts the police as soon as it detects any sort of weapons. In our project we are focusing on guns primarily. 🔫💣💻🎥

    Language:Jupyter Notebook12203
  • Anubhavchandil/RESEARCH-INTERN

    Worked on a dataset of high entropy alloys used to design materials for additive manufacturing. Responsible for performing data analysis and building machine learning models, including neural networks and gradient boosting, to make predictions useful for advanced material invention.

    Language:Jupyter Notebook10113
  • konchada2/Excel-Projects

    Excel Based Projects

  • 765276707/straws

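    Straws is an open-source offline data synchronization middleware (ETL). It covers offline synchronization scenarios for MySQL, SQL Server, and other databases, and supports scheduled synchronization (full, incremental, and CDC modes) as well as data transformation and cleaning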

    Language:Java9305
  • ironmussa/Optimus-examples

    Examples for Optimus, a data cleansing library for big data.

  • project1A1B/Big-Mart-Sales-Prediction

    Building Big Mart Sales Prediction model

    Language:Jupyter Notebook8102
  • rojaAchary/Data_Preprocessing_Techniques

    ⚒️ Data preprocessing is the process of transforming raw data into an understandable format. It is also an important step in data mining as we cannot work with raw data. The quality of the data should be checked before applying machine learning or data mining algorithms

    Language:Jupyter Notebook8102
  • ropensci/excluder

    Checks for Exclusion Criteria in Online Data

    Language:R83105
  • allenlsj/Spark-lean

    Spark-lean, an interactive PySpark-based Data Cleaning Library

    Language:Python7300
  • Livingston-k/cleanPyData

    cleanPyData is a Python package for data cleaning and preprocessing. It handles missing values, normalizes data, extracts features, and detects outliers, making your data ready for analysis or machine learning.

    Language:Python7
  • mennamamdouh/Analysis-of-Video-Games

    This repository is for a data analytics project using SQL. The project is about analyzing and getting insights about video games sales, and users and critics reviews.

    Language:SQL7102
  • Ronlee12355/kaggle-with-R

    All kaggle datasets and the R codes

    Language:HTML7106
  • kkverma/Twitter-Sentiment-Analysis

    A basic machine learning model built in a Python Jupyter notebook to classify a set of tweets into two categories: racist/sexist and non-racist/sexist.

    Language:Jupyter Notebook6102
  • rahmaahassan/Datacamp-Data-Analyst-Associate

    Practical tasks for earning the Data Analyst Associate certification by DataCamp.

    Language:Python6101
  • Mariyajoseph24/SugarFit_Google_Play_Store_Review_Analysis_and_Power_BI_Reporting

    "Utilized Python with Pandas, NumPy, and TensorFlow for data scraping and sentiment analysis in Microsoft Azure Data Studio. Employed MS Excel for data cleaning and exploration, with analysis done in PostgreSQL. Utilized Microsoft Power BI for visualization, deriving actionable insights."

    51
  • MiladNooraei/Quera-Football

    Empowering football analytics through Transfermarkt data crawling, robust database design, and advanced analytics, yielding valuable insights and accurate predictions

    Language:Jupyter Notebook5000
  • weismanm12/finances-database

    Personal finance database creation, SQL analysis, and Power BI dashboard

    Language:Jupyter Notebook50