codebasics/DataAnalysisProjects

Need advice on cleaning a dataset

Opened this issue · 1 comment

So I am working on my first data analysis project and am uncertain how much of the data I should clean. I have the raw data properly organized, but a lot of rows have blanks. None of the blanks affect any of the fields the project is concerned with, so I'm unsure whether or not to keep those rows. I currently have a cleaned dataset and a raw dataset. The cleaned dataset has no rows with blanks or invalid entries; the raw dataset keeps the rows with blanks (minus the invalid entries). A lot of entries get thrown out if I use the cleaned dataset, so I'm unsure what to do. Would greatly appreciate any advice.

"Good point! Since the blanks don’t affect your analysis fields, I think it’s fine to keep those rows. Keeping both raw and cleaned versions is a smart approach."
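
One way to act on this, as a rough sketch in Python using only the standard library: drop a row only when a field the analysis actually uses is blank, rather than when any field is blank. The column names, the sample data, and the helper `keep_usable_rows` are all made up for illustration, so swap in your own.

```python
import csv
import io

# Hypothetical sample data. Assumption: the analysis only reads
# "region" and "sales"; "age" and "city" are unused extras.
RAW_CSV = """region,sales,age,city
North,120,52,Chicago
North,100,34,
South,250,,Austin
,300,41,Boston
East,,29,Denver
West,175,,
"""

# Assumption: these are the only columns the analysis depends on.
ANALYSIS_FIELDS = ["region", "sales"]


def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or value.strip() == ""


def keep_usable_rows(rows, required_fields):
    """Keep rows whose required fields are all filled in.

    Blanks in any other column are ignored.
    """
    return [
        row for row in rows
        if not any(is_blank(row.get(field)) for field in required_fields)
    ]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Strict cleaning: a row is dropped if ANY column is blank.
strict = keep_usable_rows(rows, list(rows[0].keys()))

# Targeted cleaning: a row is dropped only if an analysis column is blank.
targeted = keep_usable_rows(rows, ANALYSIS_FIELDS)

print("raw rows:     ", len(rows))
print("strict rows:  ", len(strict))
print("targeted rows:", len(targeted))
```

Comparing the three counts shows how many rows strict cleaning discards that the analysis could still have used. If the data is already in pandas, `df.dropna(subset=[...])` with the analysis columns should do the same job, assuming the blanks were read in as NaN.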