A simple example of Data Wrangling with Pandas, Regex & Matplotlib

Global-Shark-Attacks

Intro:

This repository shows a two-step process, data cleaning followed by data analysis, applied to the CSV file "Global Shark Attacks". The source file can be found here: https://www.kaggle.com/teajay/global-shark-attacks

Goals:

The main goal of this project is to answer the question: which are the most dangerous sharks?

Steps:

To answer that question, we followed these steps:

  1. INPUT (download original csv)
  2. src (create functions to apply)
  3. global-sharks-attack-cleaning.ipynb (data cleaning)
  4. OUTPUT (cleaned csv)
  5. global-sharks-attack-analysis.ipynb (analysis and conclusion)
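
As a rough illustration of the cleaning step, the sketch below pulls a species keyword out of free-text entries with a regular expression and normalizes a fatality flag. It is not the code from `src` or the cleaning notebook: the column names `Species` and `Fatal`, the keyword list, the helper `extract_species`, and the sample rows are all assumptions made up for this example.

```python
import re

import pandas as pd

# Hypothetical keyword list; the real project may match other species.
SPECIES_PATTERN = re.compile(r"\b(white|tiger|bull|zambesi)\b", re.IGNORECASE)


def extract_species(text):
    """Return the first species keyword found in a free-text cell, or None."""
    if not isinstance(text, str):
        return None  # missing values (None / NaN) carry no species
    match = SPECIES_PATTERN.search(text)
    return match.group(1).lower() if match else None


# Made-up rows standing in for the raw csv (column names are assumptions).
raw = pd.DataFrame({
    "Species": ["White shark, 4 m", "Tiger shark", None, "2 m shark", "ZAMBESI shark"],
    "Fatal": ["Y", "N", "N", " y ", "Y"],
})

cleaned = raw.assign(
    species=raw["Species"].map(extract_species),
    # Strip whitespace and unify case before comparing with "Y".
    fatal=raw["Fatal"].str.strip().str.upper().eq("Y"),
).dropna(subset=["species"])  # drop rows where no species was recognized

print(cleaned[["species", "fatal"]])
```

Rows whose species cannot be recognized are dropped here for simplicity; keeping them under an "unknown" label would be an equally reasonable choice.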

Conclusion:

Which sharks are the most dangerous? It depends. By total number of attacks and deaths, the white shark comes first. By probability of dying once you are attacked, the most dangerous is the zambesi, but it has only 29 recorded attacks (10 deaths, about 34%).

So is the white by far the most dangerous? No. The tiger shark's attacks are fatal 25% of the time, against 23% for the white, and since the tiger ranks 2nd in both total attacks and total deaths, it should also be considered one of the most dangerous sharks, just after the white.
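
The comparison above rests on three numbers per species: attacks, deaths, and deaths divided by attacks. A minimal sketch of that calculation with pandas follows. It is not the analysis notebook's code: the function `fatality_summary` and the column names `species` and `fatal` are hypothetical, and the only real figures used are the zambesi counts quoted above (29 attacks, 10 deaths).

```python
import pandas as pd


def fatality_summary(df):
    """Count attacks and deaths per species and derive the fatality rate."""
    summary = df.groupby("species")["fatal"].agg(attacks="count", deaths="sum")
    summary["fatality_rate"] = summary["deaths"] / summary["attacks"]
    # Most attacks first, so both ranking criteria are visible side by side.
    return summary.sort_values("attacks", ascending=False)


# Rebuild the zambesi figures quoted in the conclusion: 29 attacks, 10 fatal.
zambesi = pd.DataFrame({
    "species": ["zambesi"] * 29,
    "fatal": [True] * 10 + [False] * 19,
})

summary = fatality_summary(zambesi)
print(summary)
```

With a full cleaned dataset, sorting the same table by `fatality_rate` instead of `attacks` would give the second ranking discussed above, which is why a species with few attacks can top one list and not the other.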

Most dangerous sharks:

  1. White shark
  2. Tiger shark