asjadnaqvi/stata-bumpline

Bumpline plot using binaries - the challenge


Dear Asjad,

I have worked through your examples and, apart from my minor comments, all is well.
Creating a bumpline plot like the one below, however, will require some serious work:
[Image: NSO21_Reden_Veranderen _v_Werkgever_Wissel intentie_20230209]
The current functionality is based on 'a numeric y variable and a numeric x variable. The x variable is usually a time variable'.
For my type of bump plot the 'x variable' is a categorical variable (age groups), so that agrees with your syntax.
The 'problem' is with the 'y variable': your code uses by(varname) to calculate the bumps of the plot, that is, the categories of by(varname) ranked relative to each other at each ordered position of the x variable.
This is where my 'problem' originates, because you calculate the relative rank order of the (summed) y variable within by(varname). That works fine, but only when the y variable is a single measurement.
That is not how the above plot was created. What we see on the y-axis is the relative rank of thirteen binary variables. To create this graph I first have to go through matrix calculations to collect the relative ranks of the binary variables, go back and forth to put the labels at the correct positions in the matrix, and save all of that into a result data set, before finally settling on a twoway command like:


local color "blue "0 162 232" "0 162 232" "0 162 232" teal blue orange orange teal purple orange gs7 gs8"
local mc "`color'"
local fc "blue "0 162 232" "0 162 232" "0 162 232" "255 255 0" "255 255 0" orange orange "255 255 0" "255 255 0" orange white gs8"
local lc "`color'"
local pms "o o o o o o o o o o o s s"
local msi "*2 *2 *2 *2 *2 *2 *2 *2 *2 *2 *2 *1.8 *1.8"
tw line sal uitd ontw arbv kreis bwsf mzing mflex intb mvert maut geen andr xage , lc(`lc') ///
	|| scatter sal uitd ontw arbv kreis bwsf mzing mflex intb mvert maut geen andr xage , ms(`pms') msiz(`msi') mc(`mc') mfc(`fc')  ///
	, legend(off) text(14 -1.5 "NSO21 `binary'", size(*.8)) ///
	plotreg(m(b+1 l+1 r+1)) xtit("") ytit("") xsize(12) ysize(8) ///
	graphreg(fc(white) lc(white) ilc(white) m(l-0 r-1 b-2 t-1))	///
	ysca(range(1(1)13) rev) xsca(range(2(1)9)) ///
	xlab(1 "20t25" 2 "25t30" 3 "30t35" 4 "35t40" 5 "40t45" 6 "45t50" 7 "50t55" 8 "55t60" 9 "60t65", labs(*.9) ) /// 11 "60t70"
	ylab(1 "Hoger salaris of loon " 6 "Betere arbeidsvoorwaarden " 11 "Meer autonomie " 2 "Meer uitdaging " 8 "Meer flexibiliteit " 7 "Meer vertrouwen " 5 "Betere werksfeer " 9 "Kortere reistijden " 3 "Meer ontwikkeling " 10 "Internationale loopbaan " 4 "Meer zingeving " 12 "Andere reden " 13 "Geen enkele reden ", labs(*.9) angle(none))

This is something of a 'coding nightmare' because I am drawing a bundle of line graphs whose properties are set manually (through locals) and, even worse, each of them has to be labelled manually as well. Moreover, to continue with another bump plot for a subset of the data (such as males or females) I have to tinker with all these options and labels all over again.
So you can imagine that this could be seriously improved.
But how?
My rather naive solution would be to allow multiple binary variables in your command, something like:
bumpline y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 x
which would have to rule out the use of by(varname). Or, should this totally mess up your code, you could create a special version, like:
bibumpline y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 , over(x)
or
bibumparea y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12 y13 , over(x)
where the x variable is set through over(varname) (although I expect Nick Cox will complain about the use of 'over' for this purpose).
Anyway, I assume that you will probably have very different ideas on how to accomplish such 'binary based' bump plots.
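Until such a command exists, the matrix step could perhaps be replaced by a collapse and reshape. The following is only a minimal sketch, not the package's method. It assumes one row per respondent, thirteen 0/1 indicators stored consecutively as y1 to y13, and an age-group variable x (the placeholder names from the proposed syntax above). The names item, itemlbl and rnk are made up for illustration, and the final bumpline call is untested against the package.

```stata
* Sketch only: y1-y13 and x are placeholder names, not real variables.
preserve

* Keep each variable label so it can label the lines after reshaping
forvalues i = 1/13 {
    local lab`i' : variable label y`i'
}

* Share of respondents per age group who selected each item
collapse (mean) y1-y13, by(x)

* One row per age group and item; item takes the values 1 to 13
reshape long y, i(x) j(item)

* Attach the stored variable labels as value labels of item
forvalues i = 1/13 {
    label define itemlbl `i' `"`lab`i''"', modify
}
label values item itemlbl

* Rank within age group: 1 = most frequently selected item
* (unique breaks ties arbitrarily)
bysort x: egen rnk = rank(-y), unique

* Data in this long form fit the y x, by() syntax quoted above, e.g.
* (assumption, untested):
* bumpline y x, by(item)

restore
```

Because the ranks and labels travel with the data in long form, a subset plot would only need an if condition before the collapse rather than a new set of hand-built matrices.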
How to carry over options, once set for a (first) plot, to the next (subset) plot is, if possible at all, a problem of its own.
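One plain-Stata way to reuse options is to store the shared ones in a local macro and loop over the subsets. This is a sketch under assumptions: female is a hypothetical 0/1 variable identifying the subset in the result data set, and the other variable names are taken from the twoway command shown earlier.

```stata
* Sketch only: female is a hypothetical subset indicator.
* Options shared by every plot are written once
local bumpopts legend(off) xtitle("") ytitle("") xsize(12) ysize(8) ///
    yscale(range(1 13) reverse) xscale(range(2 9))

* One named graph per subset, all drawn with the same options
foreach g in 0 1 {
    twoway line sal uitd ontw arbv kreis bwsf mzing mflex intb mvert maut geen andr xage ///
        if female == `g', `bumpopts' name(bump`g', replace)
}
```

The colour, marker and label locals from the original command could be appended to bumpopts in the same way, so that they are defined only once.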
Of course, I am more than willing to support you with test data and with the testing itself.

On a side note, I assume that you are aware of the 'origin' of the 'bump chart' itself: the Oxford University Boat Races, which date back to 1815, with the first chart from 1824!
There is actually a lot on this subject to be found (I have collected a good deal of it, should you be interested).
I did not try to create a true Oxford bump chart, as the data is rather terse, but I suspect something like a binary approach will be required for that as well.

Allowing bumplines over categorical variables is doable! I got a similar request from another user so will prioritize this.

Good to read that this is possible. But note that the above case is a little more 'complicated' because a 'collection' of binary variables is 'bumped' (so not the individual categories of a single categorical variable, which would be just as useful).