fastdup-manage-clean-curate-blogpost

Find duplicates and anomalies in your dataset. Identify wrong or confusing labels. Uncover data leaks.

Primary language: Jupyter Notebook

fastdup: A Powerful Tool to Manage, Clean & Curate Visual Data at Scale on Your CPU - For Free.


Companion repo for the blogpost.

📑 Notebooks

clean_v1.ipynb - Demo notebook showing the functionality of fastdup using the V1 API.

clean.ipynb - Demo notebook showing the functionality of fastdup using the older V0.1 API.

train_clean.ipynb - Train a fastai model on the cleaned version of the data.

train.ipynb - Train a fastai model on the original data.
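
For readers who want a feel for the V1 API before opening clean_v1.ipynb, here is a minimal sketch. It is not taken from the notebooks: the function name `analyze_dataset` and the `fastdup_work_dir` output folder are hypothetical, and it assumes fastdup is installed and that the images are in `scene_classification`.

```python
from pathlib import Path


def analyze_dataset(input_dir="scene_classification", work_dir="fastdup_work_dir"):
    """Run fastdup on a folder of images and return its summary tables.

    `analyze_dataset` and the default `work_dir` are illustrative names,
    not part of fastdup or of this repository.
    """
    # Fail early, before importing fastdup, if the image folder is missing.
    if not Path(input_dir).is_dir():
        raise FileNotFoundError(f"image folder not found: {input_dir}")

    # Imported lazily so the function can be defined without fastdup installed.
    import fastdup

    # V1 API: create a fastdup object, then analyze the images on the CPU.
    fd = fastdup.create(work_dir=work_dir, input_dir=input_dir)
    fd.run()

    # Each call returns a pandas DataFrame describing one kind of issue.
    return {
        "invalid": fd.invalid_instances(),  # broken or unreadable images
        "similarity": fd.similarity(),      # pairs of similar images
        "outliers": fd.outliers(),          # images unlike the rest
    }
```

Calling `analyze_dataset()` from the repository root should return three tables to inspect. In a notebook, `fd.vis.duplicates_gallery()` and `fd.vis.outliers_gallery()` render the same findings as visual galleries.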

📂 Folders

scene_classification - Folder with raw uncleaned images.

scene_classification_clean - Folder with cleaned version of images and fastdup report files.

📞 Questions? Connect with me

If you have any questions or feedback, please don't hesitate to reach out to me. I'm active on the following platforms.

dnth

❤️ Support Me

I am thrilled to share my work with you and I hope you find it useful.

If you do, please consider supporting my efforts by making a donation and/or sharing this repository on your social media.

Your support will help me continue developing and maintaining this project, and create new ones.

Buy Me A Coffee