shark5

Project Status

✔️ Complete

Table of Contents:

Objective

  • Apply different cleaning and manipulation techniques to generate a cleaner CSV version of the dataset and analyse the data. SharkAttackFile.

Business question

Is surfing a dangerous sport ?

Are these attacks fatal ?

What are the areas/countries that most attacks happened while surfing?

Motivation

  • Sharks in fact do not intend to attack humans, the chances of being attacked by a shark is 1 in 5 million, says Katherine Maslenikov, manager of the UW Fish Collection at the Burke Museum. When they bite humans, they are most likely trying to figure out what they are. So safe to say that we are not included in their diet.
  • Surfing is a very popular sport in water, we will check if the attacks are more likely to be linked to this activity.

Process

  1. Import database, analyse the shape,analyse the sample and store a backup;

  2. Clean the lines (drop if NaN mean>0.9);

  3. Clean the columns (drop if NaN mean>0.9);

  4. Drop duplicates;

  5. Drop unnecessary columns;

  6. Analyse the columns 'Activity', 'Fatal (Y/N)' 'Years' and 'Country':

    • Type of data:
      The column 'Activity' has over 1,5k categories, using regex we will reduce it to 13.
      Aplling the function we get:

      'Activity' 'Value counts'
      surfing 1394
      swimming 1159
      fishing 1137
      other 1101
      diving 603
      sailing 269
      walking 220
      bathing 192
      floating in water 102
      ep_boats 72
      kiting 43
      feeding 12
      Photoshooting 8
    • Cleaning columns 'Fatals':

    Original Transform
    N,M, n; N
    Y, y Y
    UNKNOWN, 2017, NaN UNKNOWN
  7. Analyse the column 'Years':
    dropped Years below 1900.

  8. Export the database to CSV (Exported_Files path):

Results

Most of the attacks are non lethal:

Qty Fatal Qty Non-Fatal
1389 4373

Using the library matplotlib:


Conclusion: Death by shark attack is uncommon .


Considering the attacks since 1900s, we get an average of 26 attacks every year in the world. From these "26" attacks, every year only 6 are fatal ones.

In the GSAF (Global Shark Attack Files) website, affirms that more people drown every year than are killed by sharks. According to CDC around 3960 people drown every year in USA which give us an average of 22 drowns every day. The fatal shark rate of Entire World is lower than drowns per year on USA.

Deadliest Activities

Analising the the activities that had death related we can see that people killed by shark while "surfing" are rare.

Activity death rate
ep_boats 51.85%
bathing 40.32%
swimming 31.97%
floating in water 23.73%
diving 19.72%
other 16.34%
sailing 16.06%
fishing 10.90%
walking 7.26%
surfing 7.24%

This result highlights that the shark attacks out of curiosity or unintentionally, more than by revenge.

Activity

Although surfing has a low death by shark attack rate, most of the attacks happened while people were surfing. However, we cannot determine if surfing is a dangerous activity by only looking at this variable. According to SMA Surfing has a very low injury rate (and that includes shark attacks ). Further more, as pointed before shark attacks are rare.

Using the library seaborn:


Again sharks are not likely to attack humans, a probable cause for the majority of shark attacks occur near the shore, in the surf zone and sandbars, because their natural preys live in these areas.

Countries

It is common to hear people saying "do not surf in Australia because it has a high shark attack rate". Diving into the data set we can prove that this is untrue. USA alone has almost twice as much attacks than Australia, in fact both countries (USA and Australia) holds over 60% of the total attacks in the world.

Country Value Count
USA 1906
AUSTRALIA 1084
SOUTH AFRICA 487
PAPUA NEW GUINEA 128
BRAZIL 100
BAHAMAS 95
NEW ZEALAND 89
MEXICO 70
REUNION 58
PHILIPPINES 55

Attacks by country while surfing. Attacks while surfing are common in USA but this data is not necessary worrisome (the avg of attacks every year still very low comparing to other accidents).
Using the library Matplotlib:


Having the countries and the locations from the attacks we can plot a map where the attacks happened while surfing.
Using the library folium:


Learning Process

Theory Applied

  • Pandas (Import and Export data)
  • Explore Analysis Data
  • Data Manipulation (Filtering)
  • Data Cleaning

Challenges

  • Understand the dataset;
  • Choose which columns explore;
  • Defining functions in other to clean the dataset;

New Learnings

  • Display information visually (matplotlib).

Improvements

  • There are some columns that are unexplored and could help adding further information about the attacks.

Author

Lucas Angulski