R function to calculate the ε-index of a researcher's relative citation performance
Prof Corey J. A. Bradshaw
Global Ecology, Flinders University, Adelaide, Australia
September 2021
Existing citation-based indices used to rank research performance do not permit a fair comparison of researchers among career stages or disciplines, nor do they treat women and men equally. We designed the ε-index, which is simple to calculate, based on open-access data, corrects for disciplinary variation, can be adjusted for career breaks, and sets a sample-specific threshold above and below which a researcher is deemed to be performing above or below expectation.
Code accompanies the article:
BRADSHAW, CJA, JM CHALKER, SA CRABTREE, BA EIJKELKAMP, JA LONG, JR SMITH, K TRINAJSTIC, V WEISBECKER. 2021. A fairer way to compare researchers at any career stage and in any discipline using open-access citation data. PLoS One 16(9): e0257141. doi:10.1371/journal.pone.0257141
--
DIRECTIONS
- Create a .csv file of exactly the same format as the example file in this repository ('datasample.csv'):
- COLUMN 1: personID — any character identification of an individual researcher (can be a name)
- COLUMN 2: gender — researcher's gender ("F" or "M")
- COLUMN 3: i10 — researcher's i10 index (# papers with ≥ 10 citations); must be > 0
- COLUMN 4: h — researcher's h-index
- COLUMN 5: maxcit — number of citations of researcher's most cited peer-reviewed paper
- COLUMN 6: firstyrpub — the year of the researcher's first published peer-reviewed paper
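As an illustration of the layout above, the following sketch builds a small input table in R and writes it to a .csv file. The three researchers, their values and the file name 'myresearchers.csv' are invented for illustration only; they are not taken from 'datasample.csv'.

```r
# Hypothetical example data: IDs and values are invented for illustration
my.dat <- data.frame(
  personID   = c("researcherA", "researcherB", "researcherC"),
  gender     = c("F", "M", "F"),
  i10        = c(25, 12, 48),       # papers with >= 10 citations (must be > 0)
  h          = c(18, 11, 30),       # h-index
  maxcit     = c(210, 95, 640),     # citations of the most cited paper
  firstyrpub = c(2008, 2014, 1999), # year of first peer-reviewed paper
  stringsAsFactors = FALSE
)

# Basic checks against the column rules listed above
stopifnot(all(my.dat$gender %in% c("F", "M")), all(my.dat$i10 > 0))

# Write the file in the same column order (file name is an assumption)
write.csv(my.dat, file = "myresearchers.csv", row.names = FALSE)
```

A file written this way can then be read back with 'read.csv()' in the same way as the sample file.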
- Import the sample .csv file, or your own file in the format described above (first set the working directory to the folder containing 'datasample.csv' using the 'setwd()' command):
setwd("/path") # where /path is the directory path on your machine
example.dat <- read.csv("datasample.csv", header=TRUE)
- Alternatively, you can automatically harvest the necessary citation data from Google Scholar using the function in 'get.profile.func.R', which produces output that can be passed directly to the function in 'epsilon.index.func.R':
i. Define a vector of Google Scholar IDs, 'ids' (each a 12-character user ID from scholar.google.com), e.g.,
ids <- c("1sO0O3wAAAAJ","ZBUju2QAAAAJ","oGAui-IAAAAJ","cpJnEYIAAAAJ","ptDEg44AAAAJ","PJYrOvQAAAAJ","4UxbBYIAAAAJ")
ii. Then define a 'genders' vector of the same length, e.g.,
genders <- c("M","M","F","M","M","F","F")
iii. Load the function in 'get.profile.func.R'
iv. Create the input data that 'epsilon.index.func' will use, e.g.,
example.dat <- getProfiledatFunc(ids, genders)
Note: The estimated year of first publication (Y1) can be wrong because the function does not differentiate between peer-reviewed and non-peer-reviewed entries in Google Scholar, nor can it exclude clearly erroneous entries in a researcher's publication history. We recommend manually checking the harvested year of first publication for every researcher in the sample. A case in point is id=ptDEg44AAAAJ, which returns Y1 = 1791, although the true year of first publication for this researcher is 1982.
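One way to carry out the manual check recommended above is to flag implausibly early years and then overwrite them with verified values. This is a sketch only: it assumes the harvested table uses the same 'personID' and 'firstyrpub' column names as the .csv format described earlier, the 1950 cut-off is an arbitrary illustration, and the small table below is a stand-in rather than real harvested output (only the ptDEg44AAAAJ values come from the note above).

```r
# Stand-in for harvested output; column names are assumed to match the .csv format
harvested <- data.frame(
  personID   = c("ptDEg44AAAAJ", "hypotheticalID1"),
  firstyrpub = c(1791, 2005),
  stringsAsFactors = FALSE
)

# Flag years earlier than an arbitrary cut-off for manual checking
cutoff.yr <- 1950
suspect <- harvested[harvested$firstyrpub < cutoff.yr, ]
print(suspect)

# After checking by hand, overwrite with the verified year
harvested$firstyrpub[harvested$personID == "ptDEg44AAAAJ"] <- 1982
```

A cut-off only catches gross errors; years that are plausible but still wrong need to be checked against each researcher's publication record.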
- Load the function ('epsilon.index.func') in R by submitting the entire function code (lines 20 to 212) to the R console.
- Run the function as follows:
epsilonIndexFunc(dat.samp=example.dat, bygender=c('no','yes'), sort.index=c('e', 'd', 'ep', 'dp'))
where 'bygender' indicates whether to calculate the gender-debiased index ('no' or 'yes'), and 'sort.index' sets the index used to sort the final results table (default = 'e')
Possible values of 'sort.index': 'e' = pooled; 'ep' = normalised; 'd' = gender-debiased; 'dp' = normalised gender-debiased
If there are too few individuals per gender to estimate a gender-specific index, we recommend selecting bygender='no' and not using or sorting based on the gender-debiased index (option 'd'). If the individuals in the sample are not all in the same approximate discipline, we recommend not using or sorting based on either of the two normalised indices (options 'ep' or 'dp').
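Before choosing 'bygender', it can help to count the researchers of each gender in the sample. The sketch below reuses the 'genders' vector from the Google Scholar example above; the minimum of five per gender is a hypothetical threshold chosen for illustration, because these directions do not state how many individuals are sufficient.

```r
# Gender vector from the harvesting example above
genders <- c("M", "M", "F", "M", "M", "F", "F")

# Count individuals per gender, keeping both levels even if one is absent
n.per.gender <- table(factor(genders, levels = c("F", "M")))
print(n.per.gender)

# Hypothetical minimum sample size per gender (an assumption, not from the source)
min.n <- 5
bygender.choice <- if (all(n.per.gender >= min.n)) "yes" else "no"
bygender.choice
```

With three women and four men, this illustrative rule selects 'no'.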
The output includes the following columns:
- person: researcher's ID (specified by user)
- gender: F=female; M=male
- yrs.publ: number of years since first peer-reviewed article
- gender.eindex: ε-index relative to others of the same gender in the sample
- expectation: whether above or below expectation based on chosen index (default is 'e' = pooled index)
- m-quotient: h-index ÷ yrs.publ
- h-index: h-index
- debiased.e.prime.index: scaled gender.eindex (gender ε′-index)
- gender.rank: rank from gender.eindex (1 = highest)
- rnk.debiased: gender-debiased rank (1 = highest)
- pooled.eindex: ε-index generated from the entire sample (not gender-specific)
- e.prime.index: scaled pooled.eindex (ε′-index)
- pooled.rnk: rank from pooled.eindex (1 = highest)
and
if sort.index = 'ep':
- eprime.rnk: rank from scaled pooled.eindex (ε′-index)
or if sort.index = 'dp':
- eprime.debiased.rnk: rank from scaled gender.eindex (gender ε′-index)
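The 'm-quotient' column listed above is the h-index divided by the number of years publishing. A sketch of that arithmetic follows, using invented values and assuming 'yrs.publ' is simply a reference year minus 'firstyrpub' (the function itself may count years differently):

```r
h          <- c(18, 11, 30)        # invented h-indices
firstyrpub <- c(2008, 2014, 1999)  # invented years of first publication
ref.yr     <- 2021                 # assumed reference year

yrs.publ   <- ref.yr - firstyrpub  # 13, 7 and 22 years
m.quotient <- h / yrs.publ
round(m.quotient, 2)               # 1.38 1.57 1.36
```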
- You can export the output to a file like this:
out <- epsilonIndexFunc(dat.samp=example.dat, sort.index=c('e', 'd', 'ep', 'dp'))
write.table(out, file="rank.output.csv", sep=",", dec=".", row.names=FALSE, col.names=TRUE)