dot plot, stem-leaf plot, box-whiskers plot, outliers by CrashCourse Statistics
EmbraceLife opened this issue · 0 comments
Plots, Outliers, and Justin Timberlake - Data visualization part 2
key words
dot plots, stem-leaf plot, box-whiskers plot, outliers, cumulative frequency plot
Video links
Key Questions
what other ways are there to represent data
how each visualization differs from the others in what it shows and when it is useful (brilliant and intuitive effort!)
Interesting points
dot plots
replace a histogram's bars with dots (each dot stands for an individual data point)
frequency = height of a bar = number of dots in the bar's position
see visualization of dot plots
but still missing details of individual data values
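A minimal sketch of the idea above, assuming a small made-up list of integers: each distinct value gets a row, and each data point becomes one `*`. The function name `dot_plot` and the data are only for illustration, and the plot is drawn sideways (dots run to the right instead of stacking upward) to keep the code short.

```python
from collections import Counter

def dot_plot(values):
    """Return a sideways text dot plot: one row per distinct value, one '*' per data point."""
    counts = Counter(values)  # frequency of each value = number of dots in its row
    rows = []
    for value in sorted(counts):
        rows.append(f"{value:>3} | " + "*" * counts[value])
    return "\n".join(rows)

ages = [3, 5, 5, 7, 7, 7, 9]  # made-up data, not from the video
print(dot_plot(ages))
```

The row for 7 carries three dots because 7 appears three times, which is exactly the bar height a histogram would show.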
Stem-Leaf Plot
replace bar positions with stems (the leading digits): 0s, 10s, 20s ... 80s
replace dots with leaves (the last digit of each value), ordered from 0 (low) to 9 (high)
both overall frequency and individual data values are portrayed on the graph
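As a rough illustration of the stem and leaf idea, here is a small sketch that assumes non-negative integers, uses the tens as the stem and the ones digit as the leaf. The function name `stem_leaf` and the sample values are made up for this example.

```python
from collections import defaultdict

def stem_leaf(values):
    """Return a text stem-and-leaf plot for a non-empty list of non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)  # stem = tens, leaf = last digit
    rows = []
    # Walk every stem in range so that empty stems still show up as gaps.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(rows)

values = [4, 12, 15, 15, 31, 38]  # made-up data
print(stem_leaf(values))
```

The length of each row gives the frequency for that stem, and the digits in the row let you read back every original value (for example, stem 1 with leaves 255 means 12, 15, 15).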
Box-Whiskers plot
use the median and the spread around it (the IQR) to summarize the whole dataset
central tendency
- median as middle point
- Q1 = first half's median
- Q3 = second half's median
- IQR = Q3 - Q1 = the range covered by the middle half of the data
lower fence
- from Q1, extend 1.5 IQR to the left (Q1 - 1.5 IQR)
upper fence
- from Q3, extend 1.5 IQR to the right (Q3 + 1.5 IQR)
minimum
- the actual smallest data point within the lower fence
maximum
- the actual largest data point within upper fence
outliers
- the actual data points smaller than lower fence
- the actual data points larger than upper fence
how densely distributed
- from Q1 to the median: the same number of data points, spread widely
- from the median to Q3: the same number of data points, spread more narrowly than on the left
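The pieces of a box-whiskers plot can be computed by hand. The sketch below assumes the "median of each half" definition of Q1 and Q3 used in these notes (other quartile conventions exist and give slightly different numbers) and the usual 1.5 IQR rule, with the fences measured from Q1 and Q3. The function name `box_summary` and the data are made up for illustration, and the input is assumed to have at least four values.

```python
from statistics import median

def box_summary(values):
    """Compute the numbers behind a box-whiskers plot (assumes at least 4 values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                # first half
    upper = data[half + 1:] if n % 2 else data[half:]  # second half (skip the median if n is odd)
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_min": inside[0],    # smallest actual point within the fences
        "whisker_max": inside[-1],   # largest actual point within the fences
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # made-up data
print(box_summary(data))
```

With this data Q1 is 3 and Q3 is 8, so the IQR is 5 and the fences sit at -4.5 and 15.5. The whiskers stop at 1 and 9 (real data points), and 40 is left over as an outlier.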
outliers are much less frequent, but do occur
when to throw away an outlier, and when not to
keep the ones you are interested in
- a few very high rents in the neighborhood you are interested in
throw away the ones you are not interested in
- a football star sneaks onto your local neighborhood team
- or a typo (turning 5 into 500) made when the dataset was collected
if not sure, use a pre-existing rule to decide
- data points beyond the lower or upper fence count as outliers and are removed
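To see what the fence rule does in the typo case, here is a small sketch with made-up scores in which one 5 was entered as 500. It assumes the same "median of each half" quartiles and 1.5 IQR fences as above. The function name `drop_outliers` is hypothetical, and dropping points this way is only appropriate when the outliers are not the thing you care about.

```python
from statistics import mean, median

def drop_outliers(values):
    """Keep only the values that lie within the 1.5 IQR fences (assumes at least 4 values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    margin = 1.5 * (q3 - q1)
    # Filter the original list so the surviving values keep their original order.
    return [x for x in values if q1 - margin <= x <= q3 + margin]

scores = [4, 5, 6, 5, 500, 7, 5, 6]  # made-up data; 500 stands in for a mistyped 5
kept = drop_outliers(scores)
print(kept)
print(mean(scores), mean(kept))
```

The single mistyped value drags the mean far away from the typical score, while the filtered list gives a mean close to the rest of the data.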
When not to use a box-whiskers plot
this one does not tell you anything useful or meaningful
cumulative frequency plot
imagine it yourself :)
Be critical of charts
stats visualizations are everywhere
visualizations are only as good as the data behind them
many of them could be mistaken or misleading
always ask questions about them