✔️ Complete
- Apply different cleaning and manipulation techniques to generate a cleaner CSV version of the dataset and analyse the data.
SharkAttackFile.
Is surfing a dangerous sport?
Are these attacks fatal?
In which areas/countries did most attacks happen while surfing?
- Sharks do not, in fact, intend to attack humans. The chance of being attacked by a shark is 1 in 5 million, says Katherine Maslenikov, manager of the UW Fish Collection at the Burke Museum. When sharks bite humans, they are most likely trying to figure out what we are, so it is safe to say that we are not part of their diet.
- Surfing is a very popular water sport, so we will check whether attacks are more likely to be linked to this activity.
- Import the database, analyse its shape, inspect a sample and store a backup;
- Clean the rows (drop if NaN mean > 0.9);
- Clean the columns (drop if NaN mean > 0.9);
- Drop duplicates;
- Drop unnecessary columns;
- Analyse the columns 'Activity', 'Fatal (Y/N)', 'Years' and 'Country':
- Type of data: the column 'Activity' has over 1.5k categories; using regex we reduce it to 13. Applying the function we get:

Activity | Value counts |
---|---|
surfing | 1394 |
swimming | 1159 |
fishing | 1137 |
other | 1101 |
diving | 603 |
sailing | 269 |
walking | 220 |
bathing | 192 |
floating in water | 102 |
ep_boats | 72 |
kiting | 43 |
feeding | 12 |
Photoshooting | 8 |

- Cleaning the column 'Fatals':

Original | Transform |
---|---|
N, M, n | N |
Y, y | Y |
UNKNOWN, 2017, NaN | UNKNOWN |

- Analyse the column 'Years': years below 1900 were dropped.
- Export the database to CSV (Exported_Files path).
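
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up sample, not the project's actual code: the column names `Year` and `Unnamed: 22`, the `ACTIVITY_PATTERNS` dictionary (only four of the 13 categories) and the helper names are all assumptions made for the example.

```python
import re

import numpy as np
import pandas as pd

# Made-up rows standing in for the real file; every value here is illustrative.
raw = pd.DataFrame(
    {
        "Year": [2015, 1850, 2015, 1999, np.nan, 2001],
        "Country": ["USA", "AUSTRALIA", "USA", "BRAZIL", np.nan, "USA"],
        "Activity": ["Surfing", "Swimming", "Surfing", "Spearfishing", np.nan, "Body surfing"],
        "Fatal (Y/N)": ["N", "Y", "N", " n", np.nan, "2017"],
        "Unnamed: 22": [np.nan] * 6,
    }
)

# Hypothetical regex patterns; anything that matches none of them becomes "other".
ACTIVITY_PATTERNS = {
    "surfing": r"surf",
    "swimming": r"swim",
    "fishing": r"fish",
    "diving": r"div",
}


def categorise_activity(value):
    """Map a free-text activity to one of a few coarse categories."""
    if pd.isna(value):
        return "other"
    text = str(value).lower()
    for label, pattern in ACTIVITY_PATTERNS.items():
        if re.search(pattern, text):
            return label
    return "other"


def normalise_fatal(value):
    """Collapse the raw fatality flags to Y, N or UNKNOWN."""
    if pd.isna(value):
        return "UNKNOWN"
    text = str(value).strip().upper()
    if text in {"N", "M"}:
        return "N"
    if text == "Y":
        return "Y"
    return "UNKNOWN"


def clean(df, threshold=0.9):
    # Drop rows, then columns, whose share of NaN values exceeds the threshold.
    df = df.loc[df.isna().mean(axis=1) <= threshold]
    df = df.loc[:, df.isna().mean(axis=0) <= threshold]
    df = df.drop_duplicates().copy()
    df["Activity"] = df["Activity"].map(categorise_activity)
    df["Fatal (Y/N)"] = df["Fatal (Y/N)"].map(normalise_fatal)
    # Keep only records from 1900 onwards.
    df = df[df["Year"] >= 1900]
    return df.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

The cleaned frame could then be written out with `cleaned.to_csv(path, index=False)`, where `path` points to a file inside the Exported_Files folder.
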
Most of the attacks are non-lethal:
Qty Fatal | Qty Non-Fatal |
---|---|
1389 | 4373 |
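
As a quick check of what these two counts imply, the fatal share can be computed directly. The sketch assumes the table covers only attacks whose outcome is known (Y or N), which the source does not state explicitly.

```python
# Counts taken from the table above.
fatal = 1389
non_fatal = 4373

total = fatal + non_fatal
fatal_share = fatal / total

print(f"{total} attacks, {fatal_share:.1%} of them fatal")
```

So roughly one attack in four in this data set was fatal.
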
Using the library matplotlib:
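
A bar chart of the two counts could be drawn along these lines. This is only a sketch: the colours, labels and title are assumptions, not the styling of the original figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Counts taken from the fatal / non-fatal table.
counts = {"Fatal": 1389, "Non-fatal": 4373}

fig, ax = plt.subplots(figsize=(5, 4))
bars = ax.bar(list(counts.keys()), list(counts.values()), color=["firebrick", "steelblue"])
ax.set_ylabel("Number of attacks")
ax.set_title("Shark attacks by outcome")
fig.tight_layout()
```

Calling `fig.savefig(...)` with a file name of your choice writes the chart to disk.
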
Conclusion: Death by shark attack is uncommon.
Considering the attacks since the 1900s, we get an average of 26 attacks per year worldwide. Of these 26 attacks, only 6 per year are fatal.
The GSAF (Global Shark Attack File) website states that more people drown every year than are killed by sharks. According to the CDC, around 3960 people drown every year in the USA, which gives an average of about 11 drownings every day. The number of fatal shark attacks per year in the entire world is far lower than the number of drownings per year in the USA alone.
Analysing the activities associated with deaths, we can see that people killed by sharks while "surfing" are rare.
Activity | death rate |
---|---|
ep_boats | 51.85% |
bathing | 40.32% |
swimming | 31.97% |
floating in water | 23.73% |
diving | 19.72% |
other | 16.34% |
sailing | 16.06% |
fishing | 10.90% |
walking | 7.26% |
surfing | 7.24% |
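
A per-activity death rate like the one above can be obtained with a group-by. The sketch below uses a small made-up sample and assumes the rate is the share of `Y` among records with a known outcome; the source does not say whether `UNKNOWN` rows were excluded.

```python
import pandas as pd

# Made-up records; the values are illustrative only.
sample = pd.DataFrame(
    {
        "Activity": ["surfing"] * 4 + ["swimming"] * 3 + ["diving"] * 2,
        "Fatal (Y/N)": ["N", "N", "N", "Y", "Y", "N", "UNKNOWN", "N", "N"],
    }
)

# Assumption: only rows with a known outcome count towards the rate.
known = sample[sample["Fatal (Y/N)"].isin(["Y", "N"])]

death_rate = (
    known["Fatal (Y/N)"]
    .eq("Y")                      # True where the attack was fatal
    .groupby(known["Activity"])   # one group per activity
    .mean()                       # share of fatal attacks in each group
    .mul(100)
    .round(2)
    .sort_values(ascending=False)
)
print(death_rate)
```
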
This result suggests that sharks attack out of curiosity or by accident rather than out of revenge.
Although surfing has a low shark attack death rate, most of the attacks happened while people were surfing. However, we cannot determine whether surfing is a dangerous activity by looking at this variable alone. According to SMA, surfing has a very low injury rate (and that includes shark attacks). Furthermore, as pointed out before, shark attacks are rare.
Using the library seaborn:
Again, sharks are not likely to attack humans. A probable reason why the majority of shark attacks occur near the shore, in the surf zone and around sandbars, is that their natural prey lives in these areas.
It is common to hear people say "do not surf in Australia because it has a high shark attack rate". Diving into the data set, we can see that the data does not support singling out Australia.
The USA alone has almost twice as many attacks as Australia; in fact, the two countries (USA and Australia) together account for over 60% of the total attacks in the world.
Country | Value Count |
---|---|
USA | 1906 |
AUSTRALIA | 1084 |
SOUTH AFRICA | 487 |
PAPUA NEW GUINEA | 128 |
BRAZIL | 100 |
BAHAMAS | 95 |
NEW ZEALAND | 89 |
MEXICO | 70 |
REUNION | 58 |
PHILIPPINES | 55 |
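
The counts in the table can be used to check how the USA compares with Australia. The sketch below rebuilds the table as a pandas Series; on the cleaned data a similar ranking would typically come from `value_counts()` on the 'Country' column. The share it prints is relative to the ten listed countries only, not to the worldwide total, which the table does not show.

```python
import pandas as pd

# Counts copied from the table above (top ten countries only).
top10 = pd.Series(
    {
        "USA": 1906,
        "AUSTRALIA": 1084,
        "SOUTH AFRICA": 487,
        "PAPUA NEW GUINEA": 128,
        "BRAZIL": 100,
        "BAHAMAS": 95,
        "NEW ZEALAND": 89,
        "MEXICO": 70,
        "REUNION": 58,
        "PHILIPPINES": 55,
    }
)

ratio = top10["USA"] / top10["AUSTRALIA"]
share_of_top10 = (top10["USA"] + top10["AUSTRALIA"]) / top10.sum() * 100

print(f"USA / Australia ratio: {ratio:.2f}")
print(f"USA + Australia share of the ten listed countries: {share_of_top10:.1f}%")
```
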
Attacks by country while surfing. Attacks while surfing are common in the USA, but this is not necessarily worrisome (the average number of attacks per year is still very low compared to other accidents).
Using the library Matplotlib:
With the countries and locations of the attacks, we can plot a map of where the attacks happened while surfing.
Using the library folium:
- Pandas (Import and Export data)
- Exploratory Data Analysis
- Data Manipulation (Filtering)
- Data Cleaning
- Understand the dataset;
- Choose which columns to explore;
- Defining functions in order to clean the dataset;
- Display information visually (matplotlib).
- Some columns remain unexplored and could help add further information about the attacks.
Lucas Angulski