[IronHack logo]

Guided Project: Demonstration of Data Cleaning and Manipulation with Pandas

Overview

The goal of this project is to combine everything you have learned about data wrangling, cleaning, and manipulation with Pandas so you can see how it all works together. For this project, you will start with the messy Shark Attack data set. You will need to import it, use your data wrangling skills to clean it up, prepare it for analysis, and then export it as a clean CSV file.

You will be working individually for this project, but we'll be guiding you along the process and helping you as you go. Show us what you've got!


Technical Requirements

The technical requirements for this project are as follows:

  • The data set we provide is significantly messy. Apply the different cleaning and manipulation techniques you have learned.
  • Import the data using Pandas.
  • Examine the data for potential issues.
  • Use at least 8 of the cleaning and manipulation methods you have learned on the data.
  • Produce a Jupyter Notebook that shows the steps you took and the code you used to clean and transform your data set.
  • Export a clean CSV version of your data using Pandas.
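The requirements above can be sketched end to end as follows. This is a minimal illustration, not the expected solution: it assumes a tiny inline sample with hypothetical columns (Date, Country, Activity, Age, Fatal (Y/N)) standing in for the real Shark Attack file, whose actual file name and columns may differ, and it writes to a hypothetical output file named shark_attacks_clean.csv. Along the way it uses several Pandas methods covered in the lessons (read_csv, info, isna, dropna, drop_duplicates, rename, the str accessor, map, to_numeric, fillna, to_datetime, to_csv).

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real Shark Attack file; the column
# names and values are illustrative assumptions, not the actual data set.
RAW_CSV = """Date,Country,Activity,Age,Fatal (Y/N)
2018-06-25, USA ,Surfing,57,N
2018-06-25, USA ,Surfing,57,N
2018-06-18,australia,Swimming,11,n
2018-06-09,USA,,unknown,Y
,,,,
"""

# 1. Import the data (with the real file you would pass its path to read_csv).
df = pd.read_csv(io.StringIO(RAW_CSV))

# 2. Examine the data for potential issues: dtypes, row counts, missing values.
df.info()
print(df.isna().sum())

# 3. Clean and manipulate.
df = df.dropna(how="all")                         # drop rows that are entirely empty
df = df.drop_duplicates()                         # drop exact duplicate rows
df = df.rename(columns={"Fatal (Y/N)": "fatal"})  # give the flag a simpler name
df.columns = df.columns.str.strip().str.lower()   # consistent lowercase headers
df["country"] = df["country"].str.strip().str.upper()  # " USA " / "australia" -> "USA" / "AUSTRALIA"
df["fatal"] = df["fatal"].str.strip().str.upper().map({"Y": True, "N": False})
df["age"] = pd.to_numeric(df["age"], errors="coerce")  # "unknown" becomes NaN
df["activity"] = df["activity"].fillna("Unknown")      # label missing activities
df["date"] = pd.to_datetime(df["date"])                # text dates -> datetime64
df = df.reset_index(drop=True)

# 4. Export a clean CSV without the index column.
df.to_csv("shark_attacks_clean.csv", index=False)
print(df)
```

Each numbered comment maps onto one requirement, which makes it easy to mirror the same structure in your notebook cells and README.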

Necessary Deliverables

The following deliverables should be pushed to your GitHub repo for this chapter.

  • A cleaned CSV data file containing the results of your data wrangling work.
  • A Jupyter Notebook (data-wrangling.ipynb) containing all Python code and commands used in the importing, cleaning, manipulation, and exporting of your data set.
  • A README.md file containing a detailed explanation of the process followed in the importing, cleaning, manipulation, and exporting of your data as well as your results, obstacles encountered, and lessons learned.

Suggested Ways to Get Started

  • Examine the data and try to understand what the fields mean before diving into data cleaning and manipulation methods.
  • Break the project down into different steps: use the topics covered in the lessons to form a checklist, add anything else you can think of that may be wrong with your data set, and then work through the checklist.
  • Use the tools in your tool kit: your knowledge of Python, data structures, Pandas, and data wrangling.
  • Work through the lessons in class and ask questions when you need to! Think about adding relevant code to your project each night, instead of, you know... procrastinating.
  • Commit early and commit often. Don’t be afraid of doing something incorrectly, because you can always roll back to a previous version.
  • Consult the provided documentation and resources to better understand the tools you are using and how to accomplish what you want.
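One way to turn the checklist into code is to write one small function per checklist item and chain them with DataFrame.pipe, so each step can be run and inspected in its own notebook cell. The helper names below (drop_empty_rows, normalize_column_names, strip_text, clean) and the sample data are hypothetical; this is a sketch of one possible structure, not a required one.

```python
import pandas as pd


def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Checklist item: remove rows in which every value is missing."""
    return df.dropna(how="all")


def normalize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Checklist item: trim, lowercase, and snake_case the column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"\W+", "_", regex=True)  # runs of non-word chars -> "_"
        .str.strip("_")                        # no leading/trailing underscores
    )
    return out


def strip_text(df: pd.DataFrame) -> pd.DataFrame:
    """Checklist item: trim stray whitespace from every string value."""
    out = df.copy()
    for col in out.columns:
        # Only touch actual strings; missing values and numbers pass through.
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Run the checklist steps in order, then drop exact duplicate rows."""
    return (
        df.pipe(drop_empty_rows)
        .pipe(normalize_column_names)
        .pipe(strip_text)
        .drop_duplicates()
        .reset_index(drop=True)
    )


# Usage on a tiny hypothetical sample (not the real data set).
raw = pd.DataFrame(
    {
        "Country ": [" USA", "USA ", None],
        "Fatal (Y/N)": ["N", "N", None],
    }
)
tidy = clean(raw)
print(tidy)
```

Here the two "USA" rows only become duplicates after whitespace is stripped, which is why strip_text runs before drop_duplicates. Keeping each step as a separate function makes it easy to add, reorder, or skip checklist items and to describe each one in your README.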

Useful Resources

Project Feedback + Evaluation

  • Technical Requirements: Did you deliver a project that met all the technical requirements? Given what the class has covered so far, did you build something that was reasonably complex?

  • Creativity: Did you add a personal spin or creative element into your project submission? Did you incorporate domain knowledge or a unique perspective into your analysis?

  • Code Quality: Did you follow code style guidance and best practices covered in class?

  • Total: Your instructors will give you a total score on your project of 0, 1, or 2:

    Score   Expectations
    0       Does not meet expectations
    1       Meets expectations, good job!
    2       Exceeds expectations, you wonderful creature, you!

This total is useful as an overall gauge of whether you met the project goals, but the more important feedback comes from the individual criteria described above, which can help you identify where to focus your efforts for the next project!