Francis Anscombe constructed four datasets in 1973 to demonstrate the importance of graphing data before analyzing it statistically. Each set of 11 x-y pairs has nearly identical descriptive statistics, yet the four sets look very different when plotted.
The purpose of this repository is to demonstrate a scatter plot with a linear regression line, four scatter plots in a grid, and descriptive statistics.
The data are defined inside the .py script.
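A minimal sketch of how the four datasets can be written down in Python. The values are Anscombe's published data. The variable names `X_123` and `QUARTET` and the dict-of-tuples layout are assumptions for illustration and may not match the repository's .py script.

```python
# Anscombe's quartet (Anscombe 1973): four datasets of 11 (x, y) pairs.
# Hypothetical layout: datasets 1-3 share the same x values; dataset 4 has its own.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

QUARTET = {
    "anscombes_quartet_1": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "anscombes_quartet_2": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "anscombes_quartet_3": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "anscombes_quartet_4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Sanity check: every dataset has 11 points.
for name, (xs, ys) in QUARTET.items():
    print(f"{name}: {len(xs)} points")
```

The shared x list is never mutated, so reusing it across datasets 1-3 is safe.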
- anscombes_quartet_1: The data form a linear relationship with scatter around the regression line.
- anscombes_quartet_2: The data form a curved (nonlinear) relationship, so a straight line is a poor fit.
- anscombes_quartet_3: The data form a nearly perfect linear relationship, except for one severe outlier that pulls the regression line away from the other points.
- anscombes_quartet_4: All x values except one are equal (x = 8), so the points form a vertical line; a single extreme outlier (x = 19) determines the regression line.
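The shapes described in the list can be seen even without a plotting library. The sketch below is a dependency-free text stand-in for the scatter plots, not the repository's plotting code. The helper `ascii_scatter` and its fixed axis ranges (x from 0 to 20, y from 0 to 14) are hypothetical choices made so that all four datasets are drawn on the same scale.

```python
# Anscombe's quartet (Anscombe 1973); layout is an illustrative assumption.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "anscombes_quartet_1": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "anscombes_quartet_2": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "anscombes_quartet_3": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "anscombes_quartet_4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=40, height=12, x_max=20.0, y_max=14.0):
    """Return a crude text scatter plot on fixed axes [0, x_max] x [0, y_max].

    '*' marks a cell with one point; '#' marks a cell with two or more points.
    """
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map data coordinates to grid cells (row 0 is the top of the plot),
        # clamping so out-of-range points land on the border instead of failing.
        col = min(max(round(x / x_max * (width - 1)), 0), width - 1)
        row = min(max(round((y_max - y) / y_max * (height - 1)), 0), height - 1)
        grid[row][col] = "*" if grid[row][col] == " " else "#"
    rows = ["|" + "".join(cells) for cells in grid]
    rows.append("+" + "-" * width)  # x axis
    return "\n".join(rows)


for name, (xs, ys) in QUARTET.items():
    print(name)
    print(ascii_scatter(xs, ys))
    print()
```

Using the same axes for every panel mirrors the point of the quartet: identical summary statistics, visibly different shapes. In dataset 4, for example, every mark falls in one of two columns.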
All four datasets share the following descriptive statistics.

| Property | Value | Accuracy |
|---|---|---|
| Average of x | 9 | Exact |
| Sample variance of x | 11 | Exact |
| Average of y | 7.50 | To 2 decimal places |
| Sample variance of y | 4.125 | ± 0.003 |
| Correlation between x and y | 0.816 | ± 0.001 |
| Linear regression line | y = 3.00 + 0.500x | To 2 and 3 decimal places, respectively |
| Coefficient of determination (R²) of the linear regression | 0.67 | To 2 decimal places |
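The statistics in the table can be checked with a short sketch in plain Python (no statistics library). The helper name `describe` is hypothetical. It uses sample variances (n - 1 denominator), the Pearson correlation, and an ordinary least-squares line y = intercept + slope * x; for simple linear regression with an intercept, R² equals the squared correlation.

```python
from math import sqrt

# Anscombe's quartet (Anscombe 1973); layout is an illustrative assumption.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "anscombes_quartet_1": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "anscombes_quartet_2": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "anscombes_quartet_3": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "anscombes_quartet_4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def describe(xs, ys):
    """Compute the descriptive statistics listed in the table for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # least-squares slope
    intercept = mean_y - slope * mean_x  # least-squares intercept
    r = sxy / sqrt(sxx * syy)            # Pearson correlation
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),          # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": r,
        "intercept": intercept,
        "slope": slope,
        "r_squared": r ** 2,             # equals R^2 for simple linear regression
    }


for name, (xs, ys) in QUARTET.items():
    s = describe(xs, ys)
    print(f"{name}: mean x={s['mean_x']:.2f}, var x={s['var_x']:.2f}, "
          f"mean y={s['mean_y']:.2f}, var y={s['var_y']:.3f}, r={s['r']:.3f}, "
          f"y={s['intercept']:.2f}+{s['slope']:.3f}x, R^2={s['r_squared']:.2f}")
```

Printed to three decimals, dataset 4's correlation comes out as 0.817 (it is about 0.8165), so the four correlations agree with 0.816 to within about ± 0.001 rather than all rounding to exactly 0.816.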
Anscombe, F.J. 1973. "Graphs in Statistical Analysis." American Statistician 27(1): 17-21. JSTOR 2682899.
Wikipedia. 2017. "Anscombe's quartet." Last modified 2017-06-12. https://en.wikipedia.org/wiki/Anscombe%27s_quartet.