
This analysis uses graphical, statistical, and regression methods.

Primary language: Python. License: ISC.

Anscombe's quartet

In brief

Francis Anscombe published four datasets in 1973 to demonstrate the value of plotting data before analyzing it statistically. Each dataset consists of 11 x-y pairs. The four datasets have nearly identical descriptive statistics but look very different when plotted.

This repository demonstrates three things: a scatter plot with a linear regression line, a grid of four scatter plots, and descriptive statistics.

Data

The data are defined inside the .py script.

anscombes_quartet_1. The data form a linear relationship with variation around the regression line.

anscombes_quartet_2. The data form a curved relationship.

anscombes_quartet_3. The data form a linear relationship with very little variation around the regression line, and one severe outlier.

anscombes_quartet_4. All x values but one are identical, so the points form a vertical line, plus one extreme outlier that alone determines the slope of the regression line.
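The descriptions of the third and fourth datasets can be checked numerically. The following is a minimal sketch, not the repository's script: the variable and function names (`X3`, `Y3`, `X4`, `Y4`, `largest_residual`) are illustrative assumptions, and the values are the ones published by Anscombe (1973).

```python
from statistics import mean

# Third and fourth datasets of Anscombe's quartet (Anscombe, 1973).
# The names X3, Y3, X4, Y4 are illustrative, not taken from the repository.
X3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def largest_residual(x, y):
    """Return the (x, y) point with the largest vertical distance
    from the least-squares line fitted to all points."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return max(zip(x, y), key=lambda p: abs(p[1] - (intercept + slope * p[0])))


# Dataset 3: the point that sits far off the otherwise tight line.
outlier_3 = largest_residual(X3, Y3)

# Dataset 4: the distinct x values, and how often the repeated one occurs.
distinct_x4 = sorted(set(X4))
repeats_x4 = X4.count(distinct_x4[0])

print(outlier_3, distinct_x4, repeats_x4)
```

In this sketch the outlier in dataset 3 is identified by its residual from a line fitted to all 11 points, which is a simple heuristic rather than a formal outlier test.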

Descriptive statistics

| Item | Value | Uncertainty |
| --- | --- | --- |
| Average of x | 9 | Exact |
| Sample variance of x | 11 | Exact |
| Average of y | 7.50 | To 2 decimal places |
| Sample variance of y | 4.125 | ± 0.003 |
| Correlation between x and y | 0.816 | To 3 decimal places |
| Linear regression line | y = 3.00 + 0.500 x | To 2 and 3 decimal places |
| Coefficient of determination of the linear regression | 0.67 | To 2 decimal places |

Visualizations

[Figure: Anscombe's quartet]
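A grid of four scatter plots, each with its regression line, could be produced along the following lines. This is a hedged sketch, not the repository's script: it assumes matplotlib is installed, and the names `QUARTET`, `fit_line`, and `plot_quartet`, as well as the output file name in the usage note, are illustrative.

```python
from statistics import mean

# Anscombe's quartet (Anscombe, 1973). Names here are illustrative only.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "anscombes_quartet_1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                                   7.24, 4.26, 10.84, 4.82, 5.68]),
    "anscombes_quartet_2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                                   6.13, 3.10, 9.13, 7.26, 4.74]),
    "anscombes_quartet_3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "anscombes_quartet_4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                             5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def plot_quartet(datasets, path):
    """Save a 2 x 2 grid of scatter plots, one per dataset, with fitted lines."""
    # Imported here so the rest of the module works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display required
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
    for ax, (name, (x, y)) in zip(axes.flat, datasets.items()):
        intercept, slope = fit_line(x, y)
        ends = [min(x), max(x)]
        ax.scatter(x, y)
        ax.plot(ends, [intercept + slope * xi for xi in ends])
        ax.set_title(name)
        ax.set_xlabel("x")
        ax.set_ylabel("y")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_quartet(QUARTET, "anscombes_quartet.svg")` would write the figure to a file. Sharing the axes across the four panels makes the differences in shape easy to compare, since all four fitted lines are nearly the same.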

References

Anscombe, F.J. 1973. "Graphs in Statistical Analysis." American Statistician 27(1): 17-21. JSTOR 2682899.

Wikipedia. 2017. "Anscombe's quartet." Last modified 2017-06-12. https://en.wikipedia.org/wiki/Anscombe%27s_quartet.