Dataset attributes analysis

Goals

1. Acquire two appropriate datasets from the UCI Machine Learning Repository and provide an exploratory analysis of their content.

2. Dataset requirements:

  • at least five numerical attributes
  • at least 1000 instances

3. Subtasks:

  • a scatter plot of EACH attribute pair, arranged as a matrix of plots, with a histogram of the corresponding attribute on the diagonal
  • a scatter plot of each attribute pair with the histograms of both attributes along the axes (write a function that generates the plot for a given attribute pair)
  • correlations (for EACH attribute pair, if appropriate)
  • covariances (for EACH attribute pair, if appropriate)

4. Elaborate on the results.
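The two plotting subtasks can be sketched in Python as follows. This assumes each dataset has already been loaded into a pandas DataFrame and that pandas and matplotlib are installed; the function names plot_pair_matrix and plot_pair_with_marginals are hypothetical. The matrix relies on pandas.plotting.scatter_matrix with diagonal="hist", while the pair plot builds a 2x2 matplotlib grid that puts the scatter plot in the large cell and the two histograms along its top and right edges.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix


def plot_pair_matrix(df: pd.DataFrame) -> np.ndarray:
    """Scatter plot of every numeric attribute pair, histograms on the diagonal."""
    numeric = df.select_dtypes(include="number")  # non-numeric columns are skipped
    n = numeric.shape[1]
    # Returns an n x n array of Axes; cell (i, i) holds the histogram of attribute i
    return scatter_matrix(numeric, diagonal="hist", alpha=0.3, figsize=(2 * n, 2 * n))


def plot_pair_with_marginals(df: pd.DataFrame, x: str, y: str, bins: int = 30) -> plt.Figure:
    """Scatter plot of attributes x and y with their histograms on the axes."""
    fig = plt.figure(figsize=(6, 6))
    grid = fig.add_gridspec(
        2, 2, width_ratios=(4, 1), height_ratios=(1, 4), wspace=0.05, hspace=0.05
    )
    ax = fig.add_subplot(grid[1, 0])                   # main scatter plot
    ax_top = fig.add_subplot(grid[0, 0], sharex=ax)    # histogram of x
    ax_right = fig.add_subplot(grid[1, 1], sharey=ax)  # histogram of y

    ax.scatter(df[x], df[y], s=5, alpha=0.5)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax_top.hist(df[x], bins=bins)
    ax_right.hist(df[y], bins=bins, orientation="horizontal")

    # Hide the duplicated tick labels on the shared axes
    ax_top.tick_params(axis="x", labelbottom=False)
    ax_right.tick_params(axis="y", labelleft=False)
    return fig
```

Calling plot_pair_with_marginals inside a loop over itertools.combinations(numeric_columns, 2), followed by fig.savefig(...) and plt.close(fig), covers every attribute pair without keeping all figures open at once.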
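To back the discussion of results with numbers, the correlation and covariance subtasks can be condensed into one table with a row per attribute pair, sorted by correlation strength. The sketch below assumes pandas and an already loaded DataFrame; pairwise_stats is a hypothetical name. Pearson correlation only captures linear association, so method="spearman" (a rank correlation) may be the more appropriate choice when a relationship is monotone but non-linear. The covariance column is always the sample covariance from DataFrame.cov, normalised by n - 1. Columns that are numeric codes for categories still pass the dtype filter and would need to be dropped by hand, which is one reasonable reading of "if appropriate".

```python
import pandas as pd


def pairwise_stats(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """One row per numeric attribute pair with its correlation and covariance."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr(method=method)  # "pearson", "spearman" or "kendall"
    cov = numeric.cov()                 # sample covariance, normalised by n - 1
    cols = list(numeric.columns)
    rows = [
        {"x": a, "y": b, "corr": corr.loc[a, b], "cov": cov.loc[a, b]}
        for i, a in enumerate(cols)
        for b in cols[i + 1:]           # upper triangle only: each pair once
    ]
    table = pd.DataFrame(rows, columns=["x", "y", "corr", "cov"])
    # Strongest relationships (positive or negative) first
    order = table["corr"].abs().sort_values(ascending=False).index
    return table.loc[order].reset_index(drop=True)
```

For example, pairwise_stats(df).head(10) lists the ten most strongly related attribute pairs of a dataset. Keep in mind that covariances depend on the units of the attributes, so only the correlations are directly comparable across pairs.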

Datasets

Two datasets were used:

  • UCI Air pollution
  • UCI Concrete
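Before the analysis, each loaded table can be checked against the dataset requirements listed under Goals. This is a minimal sketch, assuming pandas and that each dataset is already in a DataFrame; meets_requirements and the two constants are hypothetical names. It counts only columns with a numeric dtype, so numeric values that were read as text (for example because of a non-standard decimal separator) are not counted until they are converted.

```python
import pandas as pd

MIN_NUMERIC_ATTRIBUTES = 5  # "at least five numerical attributes"
MIN_INSTANCES = 1000        # "at least 1000 instances"


def meets_requirements(df: pd.DataFrame) -> bool:
    """Check a loaded dataset against the assignment's size requirements."""
    n_numeric = df.select_dtypes(include="number").shape[1]  # numeric columns only
    n_instances = len(df)                                     # one row per instance
    return n_numeric >= MIN_NUMERIC_ATTRIBUTES and n_instances >= MIN_INSTANCES
```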