data-wrangling
There are 1101 repositories under the data-wrangling topic.
OpenRefine/OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
TomWright/dasel
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
khanhnamle1994/cracking-the-data-science-interview
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
tirthajyoti/Data-science-best-resources
Carefully curated resource links for data science in one place
jqnatividad/qsv
CSVs sliced, diced & analyzed.
ContextLab/hypertools
A Python toolbox for gaining geometric insights into high-dimensional data
brimdata/zui
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
hi-primus/optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
data-forge/data-forge-ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
skrub-data/skrub
Prepping tables for machine learning
moderndive/ModernDive_book
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
microsoft/prose
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition
Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
stefmolin/Hands-On-Data-Analysis-with-Pandas
Materials for following along with Hands-On Data Analysis with Pandas.
Desbordante/desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.
stefmolin/pandas-workshop
An introductory workshop on pandas with notebooks and exercises for following along.
dbohdan/sqawk
Like awk but with SQL and table joins
georgevbsantiago/qsacnpj
Package that processes and organizes data from the Cadastro Nacional da Pessoa Jurídica (CNPJ), Brazil's national registry of legal entities
datacarpentry/R-ecology-lesson
Data Analysis and Visualization in R for Ecologists - the version at https://github.com/datacarpentry/R-ecology-lesson-alternative will be merged on 8th July 2024
shawnbrown/datatest
Tools for test-driven data wrangling and data validation.
kjam/data-cleaning-101
Data Cleaning Libraries with Python
tirthajyoti/Web-Database-Analytics
Web scraping and related analytics using Python tools
ajaymache/data-analysis-using-python
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
LibreCat/Catmandu
Catmandu - a data processing toolkit
swcarpentry/python-novice-gapminder
Plotting and Programming in Python
swcarpentry/r-novice-gapminder
R for Reproducible Scientific Analysis
datacarpentry/python-ecology-lesson
Data Analysis and Visualization in Python for Ecologists
swcarpentry/r-novice-inflammation
Programming with R
strengejacke/sjmisc
Data transformation and utility functions for R
MMBazel/Springboard-DataScienceTrack-Student
Springboard Program: Data Science Career Track - NLP
data-forge/data-forge-js
JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
dlab-berkeley/R-Fundamentals-Legacy
D-Lab's 12-hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.
BdR76/CSVLint
CSV Lint plug-in for Notepad++ for syntax highlighting, CSV validation, automatic column and datatype detection, fixed-width datasets, changing datetime formats and decimal separators, sorting data, counting unique values, converting to XML, JSON, SQL, etc. A plugin for data cleaning and working with messy data files.
sl-solution/InMemoryDatasets.jl
Multithreaded package for working with tabular data in Julia
TrainingByPackt/Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
datacarpentry/r-raster-vector-geospatial
Introduction to Geospatial Raster and Vector Data with R