Machine learning is an important tool for automated predictive processing. This notebook covers the basics of machine learning, namely classification and naive probabilistic models, through a challenge problem: assess the predictive capacity of meteorological data with respect to a particular day of the week.
#### The data
Each file is a set of readings corresponding to a given day at a given weather station. The various field values are (in order):
- The name of the station
- The year in which the observation(s) took place
- The month in which the observation(s) took place
- The day of the month on which the observation(s) took place
- The aggregated precipitation observed (in mm)
- The minimum observed temperature (in degrees Celsius)
- The maximum observed temperature (in degrees Celsius)
- The average of the observed temperature readings (in degrees Celsius); if only one reading took place for a given day, this will be the same as the minimum and maximum temperature
- The maximum recorded wind reading (in metres per second)
- The minimum observed sky ceiling (in metres)
- The maximum observed sky ceiling (in metres)
- The average of the observed sky ceiling readings (in metres)
- The minimum observed visibility (in metres)
- The maximum observed visibility (in metres)
- The average of the observed visibility readings (in metres)
- The minimum observed dew point (in degrees Celsius)
- The maximum observed dew point (in degrees Celsius)
- The average of the observed dew point readings (in degrees Celsius)
- The minimum observed barometric pressure, normalised to sea level (in hectopascals)
- The maximum observed barometric pressure (in hectopascals)
- The average of the observed pressure readings (in hectopascals)
- The day of the week

Missing values are indicated with a question mark (`?`).
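
The field order and the `?` convention above can be turned into a small parser. The following is a minimal sketch, not the notebook's actual loading code. It assumes each record is a single comma-separated line, which the description does not state. The column names in `FIELDS`, the `parse_record` helper and the sample line are all hypothetical and made up for illustration. Missing values become `None`, the station name and day of the week stay as strings (their encoding is not specified), and every other field is read as a number.

```python
from typing import Dict, Optional, Union

# Hypothetical column names, in the same order as the field list above.
FIELDS = [
    "station", "year", "month", "day",
    "precipitation_mm",
    "temp_min_c", "temp_max_c", "temp_avg_c",
    "wind_max_ms",
    "ceiling_min_m", "ceiling_max_m", "ceiling_avg_m",
    "visibility_min_m", "visibility_max_m", "visibility_avg_m",
    "dew_point_min_c", "dew_point_max_c", "dew_point_avg_c",
    "pressure_min_hpa", "pressure_max_hpa", "pressure_avg_hpa",
    "day_of_week",
]

TEXT_FIELDS = {"station", "day_of_week"}  # kept as strings (encoding unknown)
INT_FIELDS = {"year", "month", "day"}     # calendar fields read as integers

Value = Optional[Union[str, int, float]]


def parse_record(line: str, delimiter: str = ",") -> Dict[str, Value]:
    """Parse one record; the comma delimiter is an assumption."""
    values = [v.strip() for v in line.rstrip("\n").split(delimiter)]
    if len(values) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(values)}"
        )
    record: Dict[str, Value] = {}
    for name, raw in zip(FIELDS, values):
        if raw == "?":
            record[name] = None        # missing value marker
        elif name in TEXT_FIELDS:
            record[name] = raw
        elif name in INT_FIELDS:
            record[name] = int(raw)
        else:
            record[name] = float(raw)  # all remaining fields are measurements
    return record


# Made-up sample line for illustration only; not taken from the real data.
SAMPLE_LINE = (
    "STATION_A,2010,6,15,0.0,12.5,21.0,16.8,7.2,?,?,?,"
    "8000,10000,9500,9.0,11.5,10.2,1012.3,1015.8,1014.0,Tuesday"
)

record = parse_record(SAMPLE_LINE)
print(record["station"], record["temp_max_c"], record["ceiling_min_m"])
```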
The IPython notebook covers this in more detail; you can see the final report here.