neuroquery/pubget

pubget.download_pmcids broken

A-Telfer opened this issue · 2 comments

The pubget.download_pmcids function appears to be broken.

It's not throwing an error message, but every result is "The following PMCID is not available".

e.g.

import pubget
import pandas as pd

pmcids = [19233148, 24567909, 18550622]
article_sets, ret = pubget.download_pmcids(pmcids, 'temp')
pd.read_xml(article_sets / 'articleset_00000.xml')  # All results are "PMCID is not available"

I've also tried using an API key. The query downloader is working.

This was an error on my part: PubMed IDs (PMIDs) are not the same as PMCIDs. Using the PMCIDs worked.

(The PubMed page for an article also shows its PMCID.)
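
For anyone copying IDs by hand, here is a minimal sketch of turning PMCIDs written with the "PMC" prefix into the plain integers used in the snippet above. The helper name normalize_pmcid and the sample IDs are made up for illustration and are not part of pubget.

```python
import re

# Hypothetical helper, not part of pubget: accepts "PMC1234567",
# "pmc1234567", "1234567" or 1234567 and returns the integer 1234567.
_PMCID_PATTERN = re.compile(r"^\s*(?:PMC)?(\d+)\s*$", re.IGNORECASE)


def normalize_pmcid(value):
    """Return the numeric part of a PMCID as an int."""
    match = _PMCID_PATTERN.match(str(value))
    if match is None:
        raise ValueError(f"not a recognizable PMCID: {value!r}")
    return int(match.group(1))


# Illustrative placeholder IDs, not real articles.
pmcids = [normalize_pmcid(v) for v in ["PMC1234567", "pmc7654321", 42]]
print(pmcids)
```

Note that this only cleans up the format. A bare number gives no hint whether it is a PMID or a PMCID, so the helper cannot catch the mix-up described above.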

Great, I'm glad you fixed it!
After the next release of pubget (or with the development version now), you would get slightly more useful output, because pubget now first filters the list of PMCIDs to keep only those in the PMC Open Access subset. The relevant part of the log would look like this:

INFO	2022-12-22T14:47:22-0300	pubget._entrez	Posting 3 PMCIDs to Entrez.
INFO	2022-12-22T14:47:23-0300	pubget._entrez	Search returned 0 results
INFO	2022-12-22T14:47:23-0300	pubget._entrez	0 / 3 articles are in PMC Open Access.

and, instead of containing an XML file full of error messages, the articlesets directory would not contain any XML files.
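
A rough sketch of checking for that case in a script: list the XML files in the returned directory before trying to read one. The function name list_articleset_files is hypothetical, and the glob pattern is only inferred from the articleset_00000.xml file name earlier in this thread.

```python
import tempfile
from pathlib import Path


def list_articleset_files(articlesets_dir):
    """Return the articleset XML files in a directory, sorted by name.

    Assumption: files are named like articleset_00000.xml, as in the
    snippet from the original report.
    """
    return sorted(Path(articlesets_dir).glob("articleset_*.xml"))


# Small demo on a temporary directory standing in for the download output.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = list_articleset_files(tmp)
    (Path(tmp) / "articleset_00000.xml").write_text("<pmc-articleset/>")
    one_result = [p.name for p in list_articleset_files(tmp)]

if not empty_result:
    print("No article sets: none of the requested IDs were downloaded.")
print(one_result)
```

With the behavior described above, an empty list would be the signal that none of the requested IDs were available, so there is nothing to pass to pd.read_xml.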