hugheylab/pmparser

Abstract(s) missing for publication

Closed this issue · 2 comments

Hello,

I wanted to retrieve the abstract for the publication with PMID 5912471 from PMDB (local/Google BigQuery), but no data was returned.

For this publication, abstracts in three languages are available online in PubMed:
[screenshot]

And the content can also be retrieved via the Entrez API:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=5912471&retmode=xml&rettype=abstract
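For anyone who wants to reproduce this check, here is a minimal Python sketch that calls the same efetch URL and lists whatever `Abstract` and `OtherAbstract` text the response contains. The function names `fetch_pubmed_xml` and `extract_abstracts` are made up for illustration and are not part of pmparser. It assumes the usual PubMed XML layout, where the abstract text sits in `AbstractText` elements.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_pubmed_xml(pmid):
    """Download the efetch XML record for one PMID (hypothetical helper)."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "id": pmid, "retmode": "xml", "rettype": "abstract"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{params}", timeout=30) as resp:
        return resp.read().decode("utf-8")


def extract_abstracts(xml_text):
    """Collect the text of every Abstract and OtherAbstract element.

    Returns a dict with one list of strings per tag; an empty list means
    the record has no element of that kind.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for tag in ("Abstract", "OtherAbstract"):
        texts = []
        for node in root.iter(tag):
            # An abstract may be split into several AbstractText sections.
            parts = ["".join(t.itertext()).strip() for t in node.iter("AbstractText")]
            texts.append(" ".join(p for p in parts if p))
        result[tag] = texts
    return result


# Usage (needs network access):
#   abstracts = extract_abstracts(fetch_pubmed_xml(5912471))
#   print(len(abstracts["Abstract"]), len(abstracts["OtherAbstract"]))
```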

When I search for the publication in PMDB via Google BigQuery, the metadata can be retrieved from the article table:
[screenshot]

But unfortunately nothing is found for this publication in "abstract" or "otherAbstract":
[screenshot]

Perhaps this is related to previous issue #71?

Is it a parsing error? I also thought that the abstract might simply not be in the XML files used for parsing the data, but since it's a Medline citation, I think it should be included.

In this case, it's not a parsing error. I checked the baseline XML file (pubmed23n0197.xml), and it does not contain any of the abstracts for PMID 5912471. I'm not sure why.
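A check like this can be repeated with a short script. The following is only a sketch, not how pmparser itself does it: `abstract_tags_for_pmid` is a hypothetical helper, and the file path is whatever local copy of the baseline file you have (plain `.xml` or gzipped `.xml.gz`). It streams through the file and reports which abstract-related elements exist for a given PMID.

```python
import gzip
import xml.etree.ElementTree as ET


def abstract_tags_for_pmid(path, pmid):
    """Return the sorted abstract-related tags found for `pmid` in a PubMed XML file.

    Returns [] if the citation is present but has no Abstract/OtherAbstract,
    and None if the PMID is not in the file at all.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as fh:
        # iterparse streams the file, so large baseline files are not
        # loaded into memory all at once.
        for _, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag != "PubmedArticle":
                continue
            if elem.findtext("MedlineCitation/PMID") == str(pmid):
                wanted = ("Abstract", "OtherAbstract")
                return sorted({n.tag for n in elem.iter() if n.tag in wanted})
            elem.clear()  # free the citation we just skipped
    return None


# Usage, assuming a local copy of the baseline file:
#   print(abstract_tags_for_pmid("pubmed23n0197.xml.gz", 5912471))
```

An empty list from this function means the citation was found without any abstract elements, which is the situation described above.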

Okay, thanks for the help! It is very useful that pmparser/PMDB keeps track of the parsed XML file for each PMID to be able to compare data with the original files.

I contacted NLM support to ask for clarification and figured it would be helpful to share the answer here for others as well:

[screenshot of the reply from NLM support]

Essentially, as of right now, the data provided in the XML files (FTP) is not exactly the same as the data provided by the E-utilities API or PubMed online, but that should change in the future.