Ability to set the graph region size and use value labels for yaxis labels

Question

ericmelse opened this issue a year ago · 3 comments

Dear Hongyu and Yiqing,

While working on a rather simple example that visualizes the pattern of salary earned in various sectors by age group, reported as salary stanines, I get an interesting result, but I would also like to ask you for some options.
I have uploaded a zipped .dta and do-file so you can replicate my example:
Example_salary_by_market_sector_and_age.zip
The code used is:

use "PANELVIEW_Example_salary_by_market_sector_and_age.dta" , clear

panelview median , i(c_sector) t(age) type(treat) xlabdist(1) ylabdist(1) bytiming ///
	graphreg(m(l-17 r-15 t-0 b-4)) tit("", s(*.8) m(t-0 b+0))  ///
	xtit(, m(t-1)) ytit("market sector", m(l-2 r+3)) ///
	text(37.4 5 "Median salary stanine:" , s(3pt)) ///
	legend(pos(6) tit("", s(2pt)) symy(*.68) symx(*.54) bmargin(t-1.5))

which results in:

As such this is already very good. The plot seems to indicate that, within market sectors, the median salary generally tends to increase with age group, which should not be a surprise. However, we can observe exceptions, such as market sectors 10 and 8. Also of interest is where information is not available for sectors or age groups.
Having described this first impression of the panelview plot, you can perhaps understand that I would like to use the value labels of the variable c_sector as y-axis labels. Instead of the numeric values being reported as y-axis labels, I would like to be able to use the labels, i.e. the names of the sectors, like:

. fre c_sector, nomiss

c_sector -- Sectorial occupation of respondent
-----------------------------------------------------------
                         |      Freq.    Percent       Cum.
-------------------------+---------------------------------
 1  Agriculture          |         43       3.02       3.02
 2  Industry             |         48       3.37       6.38
 3  Energy               |         44       3.09       9.47
 4  Mining               |         28       1.96      11.43
 5  Building             |         45       3.16      14.59
 6  Media                |         41       2.88      17.46
 7  Other industry       |         47       3.30      20.76
 8  Retail               |         48       3.37      24.12
 9  Wholesale            |         47       3.30      27.42
 10 Catering industry    |         49       3.44      30.86
 11 Transport            |         49       3.44      34.29
 12 Other trade          |         47       3.30      37.59
 13 Education            |         46       3.23      40.81
 14 Government, national |         46       3.23      44.04
 15 Government, regional |         40       2.81      46.84
 16 Government, local    |         44       3.09      49.93
 17 Water authority      |         36       2.52      52.45
 18 Health care          |         49       3.44      55.89
 19 Other health care    |         42       2.95      58.84
 20 Judiciary            |         29       2.03      60.87
 21 Other overheid       |         47       3.30      64.17
 22 Bankers, insurers    |         49       3.44      67.60
 23 Brokerage            |         41       2.88      70.48
 24 Maintenance          |         45       3.16      73.63
 25 Legal                |         42       2.95      76.58
 26 ICT                  |         48       3.37      79.94
 27 Consulting           |         45       3.16      83.10
 28 Advertisement        |         44       3.09      86.19
 29 Marketing research   |         17       1.19      87.38
 30 Recreation           |         45       3.16      90.53
 31 Recruitment, HRM     |         41       2.88      93.41
 32 Other services       |         46       3.23      96.63
 33 Unknown              |         48       3.37     100.00
 Total                   |       1426     100.00           
-----------------------------------------------------------

But to make this work graphically, we also need the ability to set the size of the graph region, like:

panelview median , i(c_sector) t(age) type(treat) xlabdist(1) ylabdist(1) bytiming ///
	xsize(6) ysize(4) graphreg(m(l-17 r-15 t-0 b-4)) tit("", s(*.8) m(t-0 b+0))  ///
	xtit(, m(t-1)) ytit("market sector", m(l-2 r+3)) ///
	text(37.4 5 "Median salary stanine:" , s(3pt)) ///
	legend(pos(6) tit("", s(2pt)) symy(*.68) symx(*.54) bmargin(t-1.5))

That is currently not possible, and Stata reports an error message in the Results window:

option xsize() not allowed
r(198);
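
A possible interim workaround, sketched under the assumption that panelview leaves an ordinary Stata graph in memory, is to draw the plot without the size options and then redisplay it with Stata's built-in `graph display`, which accepts `xsize()` and `ysize()`. Whether the margins set in graphreg() still look right after resizing has not been checked here.

```stata
* Draw the plot first, without xsize()/ysize()
panelview median , i(c_sector) t(age) type(treat) xlabdist(1) ylabdist(1) bytiming

* Redisplay the graph currently in memory at 6 x 4 inches
* (assumes panelview's output is a standard graph that graph display can redraw)
graph display, xsize(6) ysize(4)
```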

Time permitting, maybe you can implement both options.

Best regards,
Eric Melse

Answer 1 · 2023-07-18T12:47:51.000Z

Dear Hongyu and Yiqing,

I have manually edited some of the labels to give you an impression of what I am looking for, using this code:

panelview median , i(c_sector) t(age) type(treat) xlabdist(1) ylabdist(1) bytiming ///
	graphreg(m(l-17 r-15 t-0 b-4)) tit("", s(*.8) m(t-0 b+0))  /// xsize(4) ysize(4) 
	xtit(, m(t-1)) ytit("market sector", m(l-2 r+3)) ///
	text(37.4 5 "Median salary stanine:" , s(3pt)) ///
	ylab(1 "Catering industry" 2 "Industry" 3 "Other services" 4 "Transport" 5 "Health care" 6 "Bankers, insurers" 7 "ICT" 8 "Unknown" 9 "Other industry" ) ///
	legend(pos(6) tit("", s(2pt)) symy(*.68) symx(*.54) bmargin(t-1.5))

which results in:

As such, because of the problematic graph region (x) size, the panelview plot is pushed to the right side of the graph.
I think there are ways to automate the use of the value labels, but to make suggestions I would need to know how the sorted order of the panelview plot is stored in the background. When a matrix is used, I know how value labels can be saved and retrieved and then stored with the result matrix for proper use, maybe even as a variable.
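
As a rough illustration of what such automation could look like, here is a minimal sketch that builds the ylab() argument from the value labels of c_sector. It assumes, without having verified it, that when bytiming is not used row k of the plot corresponds to the k-th sector code in ascending order; the macro names `sectors`, `ylabs`, `pos` and `txt` are only illustrative.

```stata
use "PANELVIEW_Example_salary_by_market_sector_and_age.dta" , clear

* Collect the distinct sector codes in ascending order
levelsof c_sector, local(sectors)

* Build a list of the form: 1 "Agriculture" 2 "Industry" ...
* ASSUMPTION: row position k on the y-axis is the k-th code in ascending order
local ylabs
local pos = 0
foreach s of local sectors {
	local ++pos
	local txt : label (c_sector) `s'
	local ylabs `"`ylabs' `pos' "`txt'""'
}

* Inspect the list before passing it on
display `"`ylabs'"'

panelview median , i(c_sector) t(age) type(treat) xlabdist(1) ylabdist(1) ///
	ylab(`ylabs')
```

With bytiming the rows are reordered, so the positions would have to follow that sorted order instead of the ascending codes.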

Answer 2 · 2023-07-18T18:46:51.000Z

Thanks, Eric. Hongyu, could you take a look?

-- Yiqing Xu Assistant Professor Department of Political Science Stanford University https://yiqingxu.org/

Answer 3 · 2023-07-18T21:19:43.000Z

Hi Eric,

In a binary treatment case with only a "control" group and a "treated" group, panelview with the option bytiming sorts the units as follows:

First, sort units by the timing of receiving the treatment.
Then, for units that share the same timing of receiving the treatment, sort by the total number of periods exposed to the treatment.
Lastly, for units that share both the same timing of being treated and the same total number of periods exposed to the treatment, order them by their given label value.

However, the example you provided is a non-binary treatment case, where the option bytiming seems less helpful, because the program then treats every non-zero treatment value as "treated". The order here is:

First, unit groups that are "treated" (non-missing) in the first period (age group 18) are placed in the first rows: rows 1 to 3. We take them as sharing the same timing of being "treated".
Next, we sort rows 1 to 3 by the total number of periods exposed to the treatment, i.e., the total number of periods without missing values, which is why rows 1 to 3 are arranged as: 1 missing value across all periods for row 1; 2 missing values for row 2; 4 missing values for row 3.
Then, for rows 4 to 6, the treatment begins in the second period (age group 19), but they all have no missing values across all periods, so we sort them by the natural order of their label values: 11, 18, 22.
Then rows 7 and 8 both have one missing value, so we order them by their label values: 26, 33.
The remaining rows all follow the same sorting rule.
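
The rule described above can be approximated outside panelview with a few lines of Stata. This is only a sketch of the three sort keys as described in this comment, not panelview's internal code; the variable names `first_age`, `n_periods`, `one` and `row` are made up for the illustration, and "treated" is taken to mean a non-missing value of median.

```stata
use "PANELVIEW_Example_salary_by_market_sector_and_age.dta" , clear

* Key 1: first period (age) in which the sector has a non-missing value
bysort c_sector: egen first_age = min(cond(!missing(median), age, .))

* Key 2: number of periods with a non-missing value ("exposure")
bysort c_sector: egen n_periods = total(!missing(median))

* Keep one row per sector
egen byte one = tag(c_sector)
keep if one
keep c_sector first_age n_periods

* Earliest timing first, then most exposure first, then sector code (key 3)
gsort first_age -n_periods c_sector
gen row = _n

list row c_sector first_age n_periods, noobs
```

If the listing matches the rows in the plot, the variable row gives the position at which each sector's label would have to be placed.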

If we do not use the option bytiming for non-binary treatment, then the code

use "PANELVIEW_Example_salary_by_market_sector_and_age.dta" , clear

panelview median , i(c_sector) t(age) type(treat) xlabdist(1) ylabdist(1) ///
	graphreg(m(l-17 r-15 t-0 b-4)) tit("", s(*.8) m(t-0 b+0))  ///
	xtit(, m(t-1)) ytit("market sector", m(l-2 r+3)) ///
	text(37.4 5 "Median salary stanine:" , s(3pt)) ///
	legend(pos(6) tit("", s(2pt)) symy(*.68) symx(*.54) bmargin(t-1.5))

gives us

where the order of the values on the y-axis seems easier to understand. Therefore, if the unit groups need to be sorted according to a certain rule, I would prefer to do that at the stage of defining the label values, and then apply panelview without bytiming so that the given order is used in a natural way.
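
A minimal sketch of that approach: build a new unit identifier whose numeric codes already follow the desired order, and copy the text labels across. The ordering rule used here (sectors with the most non-missing periods first) is just a hypothetical example, and `n_periods`, `neg_n`, `sector_ord` and `sector_ord_lbl` are illustrative names. Whether panelview then prints the labels rather than the codes is a separate question.

```stata
use "PANELVIEW_Example_salary_by_market_sector_and_age.dta" , clear

* Hypothetical rule: most observed periods first, ties broken by sector code
bysort c_sector: egen n_periods = total(!missing(median))
gen neg_n = -n_periods
egen sector_ord = group(neg_n c_sector)

* Copy each sector's text label to its new code
levelsof sector_ord, local(codes)
foreach k of local codes {
	quietly summarize c_sector if sector_ord == `k', meanonly
	local txt : label (c_sector) `=r(min)'
	label define sector_ord_lbl `k' `"`txt'"', add
}
label values sector_ord sector_ord_lbl

* Plot with the reordered identifier and without bytiming
panelview median , i(sector_ord) t(age) type(treat) xlabdist(1) ylabdist(1)
```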

Now, regarding showing the labels instead of the values: if I use the panelview example dataset "turnout.dta", the labels show up on the y-axis as desired, instead of the corresponding numeric values. For the example you shared here, I have tried several methods to find out why only the numeric values are shown, but I have not figured it out yet. I will keep trying and will let you know once I have the answer. Thank you for pointing this out :)