If you're into the data world, you've probably heard of the #dataCleaningChallenge organized by @PromiseNonso_ and @vicSomadina on Twitter.
Data cleaning is one of the most crucial steps in analyzing data. To make sure the data is accurate, complete, and dependable, you have to locate and fix errors, inconsistencies, and inaccuracies. Remember: garbage in, garbage out.
The original dataset is available on Kaggle, but as you know, Kaggle datasets are usually much cleaner than real-world datasets.
Datasets provided by stakeholders often arrive out of order: some fields contain line breaks, some don't hold the correct information and we have to derive it from other fields, and so on.
If you want to become a data analyst, practice is the best way to reach your goal. With this repository, I want you to practice beyond the easy part. What if the ages aren't in the CSV you're working on and you need to get them from another CSV? What if you don't have the full name but you need that information?
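To give you an idea of the kind of technique involved, the two questions above boil down to a lookup join plus a derived column. Here's a minimal sketch using only Python's standard `csv` module. It assumes both files share a hypothetical `id` column and that names are split into hypothetical `first_name` and `last_name` columns. The actual files in this repository may be laid out differently, so treat this as a starting point, not the solution.

```python
import csv
import io

# Hypothetical main file: ages are missing and names are split in two columns.
EMPLOYEES_CSV = """id,first_name,last_name
1,Ada,Lovelace
2,Alan,Turing
3,Grace,Hopper
"""

# Hypothetical second file that holds the ages, keyed by the same id.
AGES_CSV = """id,age
1,36
3,85
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def enrich(employees, ages):
    """Fill in age from a second table and derive a full_name column."""
    # Index the lookup table by id so each match is a single dict lookup.
    age_by_id = {row["id"]: row["age"] for row in ages}
    enriched = []
    for row in employees:
        new_row = dict(row)
        # Left join: keep every employee; age stays empty when there is no match.
        new_row["age"] = age_by_id.get(row["id"], "")
        # Derive the full name, trimming stray whitespace in either part.
        new_row["full_name"] = f'{row["first_name"].strip()} {row["last_name"].strip()}'
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    for row in enrich(load_rows(EMPLOYEES_CSV), load_rows(AGES_CSV)):
        print(row["id"], row["full_name"], row["age"] or "<missing>")
```

Keeping unmatched rows with an empty age, rather than dropping them, makes the gaps visible so you can decide how to handle them instead of silently losing records.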
PRACTICE is the key. So let's practice.
I've already given you some key points on what to do with the data in this repository, but there is much more to do.
Share your solutions with me on Twitter or LinkedIn so I can learn from you too.
Next week I'll be sharing my own solution.
🎉 HAPPY CODING!