Publication lag time analysis from PubMed data
Get a summary of publication lag times from a PubMed XML file.
- Run a PubMed search and download the results as XML. Example query: nat commun[ta] AND 2000 : 2018[pdat] AND journal article[pt]. Then select Send To: File: XML.
- Run pubmedlag.R, pointing R at your downloaded XML file.
- The script saves a version of the data as CSV, calculates the publication lag times, makes some plots and saves them as PNG files.
Three dates of interest are stored in a complete PubMed record.
- When the paper was received.
- When the paper was accepted.
- When the paper was published.
Note that these dates are not available for every record. This is especially true of older papers.
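As a rough illustration of where these dates live, here is a minimal sketch that reads the received and accepted dates from the History block of a single record. It assumes the xml2 package is installed; the record is made up, get_history_date is a hypothetical helper, and none of this is necessarily how pubmedlag.R does it. Which PubMed field should count as "published" is left open here.

```r
library(xml2)

# A made-up, heavily trimmed PubMed record with only the History block.
record <- read_xml('<PubmedArticle>
  <PubmedData>
    <History>
      <PubMedPubDate PubStatus="received">
        <Year>2017</Year><Month>1</Month><Day>5</Day>
      </PubMedPubDate>
      <PubMedPubDate PubStatus="accepted">
        <Year>2017</Year><Month>6</Month><Day>20</Day>
      </PubMedPubDate>
    </History>
  </PubmedData>
</PubmedArticle>')

# Hypothetical helper: return the History date with the given PubStatus,
# or NA if the record does not carry that date.
get_history_date <- function(node, status) {
  base <- sprintf(".//PubMedPubDate[@PubStatus='%s']", status)
  parts <- vapply(c("Year", "Month", "Day"), function(p) {
    xml_text(xml_find_first(node, paste0(base, "/", p)))
  }, character(1))
  if (anyNA(parts)) return(as.Date(NA))
  as.Date(paste(parts, collapse = "-"), format = "%Y-%m-%d")
}

received <- get_history_date(record, "received")
accepted <- get_history_date(record, "accepted")
revised  <- get_history_date(record, "revised")  # absent in this record, so NA
```

Returning NA for an absent date keeps incomplete records in the data, so they can be dropped or counted later rather than causing an error.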
Because we have three dates, we can calculate three different time periods.
- From received to accepted (recacc).
- From received to published (recpub).
- From accepted to published (accpub).
Most of the time the accpub time is short and constant for a journal, so the most interesting time is recacc.
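The three periods are simple date differences. The sketch below shows the calculation in base R on a made-up data frame; the column names mirror the labels above, but the data and the layout are assumptions for illustration and not the actual output of pubmedlag.R.

```r
# Made-up dates for three records; record "C" has no received date.
papers <- data.frame(
  id        = c("A", "B", "C"),
  received  = as.Date(c("2017-01-05", "2016-11-20", NA)),
  accepted  = as.Date(c("2017-06-20", "2017-02-01", "2017-03-10")),
  published = as.Date(c("2017-07-15", "2017-02-28", "2017-04-02"))
)

# Lag in whole days between two Date vectors; NA if either date is missing.
lag_days <- function(from, to) {
  as.numeric(difftime(to, from, units = "days"))
}

papers$recacc <- lag_days(papers$received, papers$accepted)
papers$recpub <- lag_days(papers$received, papers$published)
papers$accpub <- lag_days(papers$accepted, papers$published)

# Summarise, ignoring records where the lag could not be calculated.
median_recacc <- median(papers$recacc, na.rm = TRUE)
```

Records with a missing date get NA for the affected lags, so na.rm = TRUE is needed when summarising.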
The original pubmedXML.R is by christopherBelter. It has been modified to retrieve the necessary data.
For more on publication lag times:
- Check out Daniel Himmelstein's History of Delays.
- Posts at quantixed on lag times.