river_bod

Analysis of public data on organic pollution of rivers in South Korea.

Overview

This is a quick Exploratory Data Analysis (EDA) of a public dataset of Biochemical Oxygen Demand (BOD) measurements from 7 sites somewhere in South Korea, covering the period from 1992 to 2016 :) The analysis aims to find the missing values in order to assess the reliability of the measurements, to compare the distribution of BOD values across the sites, and finally to identify the major trends over time.
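A minimal pandas sketch of the first two steps (missing values and per-site distribution). It assumes the BOD data is loaded as a wide table with one column per site and one row per monthly measurement, with missing readings stored as NaN. The function name `summarize_sites` is hypothetical and not part of the original analysis.

```python
import pandas as pd


def summarize_sites(bod: pd.DataFrame) -> pd.DataFrame:
    """Per-site summary of missing values and of the BOD distribution.

    Assumes `bod` has one column per measurement site and one row per
    monthly measurement, with missing readings stored as NaN.
    """
    missing = bod.isna()
    summary = pd.DataFrame({
        # Number of monthly readings missing at each site
        "n_missing": missing.sum(),
        # Share of missing readings (%), a rough reliability indicator
        "pct_missing": missing.mean() * 100,
    })
    # Distribution of the observed (non-missing) values at each site
    stats = bod.describe().T[["mean", "std", "min", "50%", "max"]]
    return summary.join(stats)
```

Sites with a high `pct_missing` are the ones whose distributions and trends deserve the most caution.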

Datasets

The analysis uses the following datasets:

river_metadata.csv: metadata about the measurement sites. This dataset consists of 4 columns:
- river_id: the river ID (obviously)
- river_name: this one isn't obvious at all, since the names wouldn't even display on my computer :(
- north: the 'N' coordinate (latitude) of the site in the format degrees.minutes.seconds
- east: the 'E' coordinate (longitude) of the site in the format degrees.minutes.seconds
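Since the coordinates are stored as degrees.minutes.seconds, they need converting to decimal degrees before they can be plotted on a map. A minimal sketch, assuming each value is a string with exactly three dot-separated fields (e.g. `37.30.36`); `dms_to_decimal` is a hypothetical helper name.

```python
def dms_to_decimal(value: str) -> float:
    """Convert a 'degrees.minutes.seconds' string to decimal degrees.

    Assumes exactly three dot-separated fields, e.g. '37.30.36'.
    """
    degrees, minutes, seconds = (float(p) for p in value.strip().split("."))
    # 1 degree = 60 minutes = 3600 seconds
    return degrees + minutes / 60 + seconds / 3600
```

With pandas this could be applied column-wise, e.g. `meta["north"].astype(str).map(dms_to_decimal)`.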
bod.csv: the BOD measurements for the period from 1992 to 2016. This dataset consists of 7 columns and 300 rows. Each value is a single monthly BOD measurement at a particular site, so the 300 rows cover 25 years × 12 months.
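A sketch of the time-trend step. It assumes (this is not confirmed by the dataset description) that the 300 rows are in chronological order starting in January 1992, that every column is a site, and that there is no separate date column. `add_monthly_index` and `annual_trend` are hypothetical helpers; the trend here is a plain least-squares line through the annual means, which is one simple choice among many.

```python
import numpy as np
import pandas as pd


def add_monthly_index(bod: pd.DataFrame, start: str = "1992-01") -> pd.DataFrame:
    """Attach a month-start DatetimeIndex.

    Assumes rows are chronological, one row per month, beginning at `start`.
    """
    out = bod.copy()
    out.index = pd.date_range(start=start, periods=len(out), freq="MS")
    return out


def annual_trend(bod: pd.DataFrame) -> pd.Series:
    """Least-squares slope of the annual mean BOD per site (units per year)."""
    # Average the monthly values of each calendar year (NaNs are skipped)
    annual = bod.groupby(bod.index.year).mean()
    slopes = {}
    for site in annual.columns:
        series = annual[site].dropna()
        if len(series) < 2:
            slopes[site] = np.nan  # not enough years to fit a line
            continue
        years = series.index.to_numpy(dtype=float)
        slope, _intercept = np.polyfit(years, series.to_numpy(), deg=1)
        slopes[site] = slope
    return pd.Series(slopes, name="bod_slope_per_year")
```

For example, `annual_trend(add_monthly_index(pd.read_csv("bod.csv")))` would return one slope per site if the assumptions above hold; a positive slope suggests BOD (organic pollution) rising over time.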
score.csv: a score and a category, whose meaning I don't understand :P This dataset consists of 3 columns:
- river_id: the same river IDs as in river_metadata.csv
- score: some number!
- category: a category based on the score, much like elementary school grades (excellent, good, fair)
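A sketch of joining the scores to the site metadata on the shared `river_id` column. It assumes each ID appears at most once in each file (`validate="one_to_one"` raises a `MergeError` otherwise); the outer join with `indicator=True` also reports IDs that appear in only one of the two files. `join_scores` is a hypothetical helper name.

```python
import pandas as pd


def join_scores(metadata: pd.DataFrame, scores: pd.DataFrame):
    """Merge score/category onto site metadata via river_id.

    Returns the merged table and the list of river IDs found in only one file.
    """
    merged = metadata.merge(
        scores, on="river_id", how="outer", indicator=True, validate="one_to_one"
    )
    # "_merge" is 'both', 'left_only' or 'right_only' for each row
    unmatched = merged.loc[merged["_merge"] != "both", "river_id"].tolist()
    return merged.drop(columns="_merge"), unmatched
```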