scholarly.scholarly not found?
Closed this issue · 12 comments
Hello folks -- @perrette first off, very very glad to see this fantastic project, and very much considering replacing my workflow (that relies on one of the particularly outdated proprietary packages you have listed on your front page) entirely with this! Thanks kindly! Thing is, I have a local library of 80 GB of PDFs that's a good set of test cases here...
When I try `papers extract yanofsky_qc.pdf --scholar`, which should work, I get:
ModuleNotFoundError: No module named 'scholarly.scholarly'
This is with `pip install papers-cli`, which may be out of date...
Anybody else seeing this?
If I just run `papers extract yanofsky_qc.pdf`, this does return a correctly formatted bibtex entry, but happens to be the wrong one, hence my want to try Google scholar here...
Hi @boyanpenkov, thanks for the feedback. This package is not far from usable, but unfortunately it does require some more work to make it actually useful. And as you point out, it seems outdated w.r.t. some dependencies. I'll see whether I can at least fix those later today.
Super -- thanks kindly; would be glad to help out here, especially since I have a significant set of test cases to check against, so please let me know if there's any snippets you'd like me to run (completely serious!).
My workflow for the last 12 years has been:
-- dump PDFs in folder, read them in emacs
-- use "proprietary solution" to get PDF metadata, rename file appropriately and cp it to "organized" folder or subfolder
-- per PDF, add Bibtex metadata to library.bib
that my individual paper repos then depend on.
I started writing code to reproduce this workflow yesterday, and got as far as validating DOIs using https://github.com/MicheleCotrufo/pdf2doi and https://pypi.org/project/isbnlib/ before I realized the fuzzy matching here was the way to go, since the error rate is pretty high. I look forward to trying to reproduce this workflow using papers, and making contributions here...
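For anyone curious what "fuzzy matching" could look like in its simplest form, here is a minimal sketch using only the standard library's `difflib`. It is not how papers or pdf2doi actually score candidates; the function names, the sample titles, and the 0.8 threshold are all made up for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and collapse punctuation/whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two titles after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(query, candidates, threshold=0.8):
    """Return the candidate closest to `query`, or None if none reaches `threshold`.

    The threshold value is an arbitrary assumption, not a tuned number.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(query, c))
    return best if similarity(query, best) >= threshold else None


if __name__ == "__main__":
    # Hypothetical titles: one as extracted from a PDF, the others as returned by a search.
    extracted = "A Sample  Paper Title: With Punctuation."
    found = ["a sample paper title with punctuation", "Some Other Paper"]
    print(best_match(extracted, found))
```

The point of the threshold is that a "best" candidate can still be a wrong one, so a low score should be treated as "no match" rather than accepted.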
I was not aware of pdf2doi. Actually it would make sense to concentrate efforts in one project to extract the proper DOI, and then re-use it in projects like this one. But well, again it needs time.
More directly though, for the scholarly issue, you simply need to install the dependency: `pip install -U scholarly`
> I was not aware of pdf2doi. Actually it would make sense to concentrate efforts in one project to extract the proper DOI, and then re-use it in projects like this one. But well, again it needs time.
Yep, and I think papers got started first here, and the feature-set is closer to what I'm after...
> More directly though, for the scholarly issue, you simply need to install the dependency: `pip install -U scholarly`
Regrettably, I have to confirm I did have `scholarly` installed. On my system:
Python 3.8.16 (default, Mar 2 2023, 03:21:46)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scholarly
>>> import scholarly.scholarly
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'scholarly.scholarly'
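A quick way to see which layout an installed package has, without triggering the traceback, is to ask `importlib` whether the submodule can be located and which distribution version is installed. This is only a diagnostic sketch; the `probe` helper is a made-up name, and it assumes the PyPI distribution is also called `scholarly`.

```python
import importlib.util
from importlib import metadata


def probe(module_name, dist_name):
    """Report whether `module_name` can be located and which version of `dist_name` is installed."""
    try:
        found = importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when the parent package is missing or is not a package.
        found = False
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"module": module_name, "found": found, "version": version}


if __name__ == "__main__":
    # Assumption: the distribution name on PyPI matches the import name.
    print(probe("scholarly", "scholarly"))
    print(probe("scholarly.scholarly", "scholarly"))
```

If the first line reports `found: True` and the second `found: False`, the package is installed but no longer ships a `scholarly.scholarly` submodule, which would match the error above.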
Before I look into your issue in more detail, would you mind testing the current dev branch?
Do you know how to do that?
I think the latest version works OK (the tests run fine -- though I might not have tests for scholarly).
https://github.com/perrette/papers/archive/refs/heads/dev.zip
From the extracted directory, `pip install .` should work OK.
I'll check (and if necessary fix) in a few hours.
And later update on pypi.
Super -- thanks kindly! I pulled your archive down, and installed it. However, the traceback now reads:
(python311) → testing renamer/stage papers extract yanofsky_qc.pdf --scholar 20:09:24
Traceback (most recent call last):
File "/home/boyan/boyanshouse/miniconda3/envs/python311/bin/papers", line 5, in <module>
papers.bib.main()
File "/home/boyan/boyanshouse/miniconda3/envs/python311/lib/python3.11/site-packages/papers/bib.py", line 1388, in main
extractcmd(o)
File "/home/boyan/boyanshouse/miniconda3/envs/python311/lib/python3.11/site-packages/papers/bib.py", line 1313, in extractcmd
print(extract_pdf_metadata(o.pdf, search_doi=not o.fulltext, search_fulltext=True, scholar=o.scholar, minwords=o.word_count, max_query_words=o.word_count, image=o.image))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/boyan/boyanshouse/miniconda3/envs/python311/lib/python3.11/site-packages/papers/extract.py", line 206, in extract_pdf_metadata
return extract_txt_metadata(txt, search_doi, search_fulltext, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/boyan/boyanshouse/miniconda3/envs/python311/lib/python3.11/site-packages/papers/extract.py", line 193, in extract_txt_metadata
bibtex = fetch_bibtex_by_fulltext_scholar(query_txt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/boyan/boyanshouse/miniconda3/envs/python311/lib/python3.11/site-packages/papers/config.py", line 224, in decorated
res = cache[key] = fun(doi)
^^^^^^^^
File "/home/boyan/boyanshouse/miniconda3/envs/python311/lib/python3.11/site-packages/papers/extract.py", line 258, in fetch_bibtex_by_fulltext_scholar
score = _scholar_score(txt, res.bib)
^^^^^^^
AttributeError: 'dict' object has no attribute 'bib'
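The last frame suggests the search result arriving in `fetch_bibtex_by_fulltext_scholar` is a plain dict rather than an object with a `.bib` attribute. One possible workaround, sketched below, is a small accessor that tolerates both shapes. The `get_bib` helper is hypothetical and not part of papers, and whether the dict actually carries a `"bib"` key is an assumption that would need checking against the installed scholarly version.

```python
from types import SimpleNamespace


def get_bib(result):
    """Return the bibliographic record from a search result.

    Accepts either a dict with a "bib" key (assumed newer shape) or an
    object with a .bib attribute (the shape the failing code expects).
    """
    if isinstance(result, dict):
        return result["bib"]
    return result.bib


if __name__ == "__main__":
    # Stand-in results for illustration only, not real scholarly output.
    as_dict = {"bib": {"title": "Example title"}}
    as_object = SimpleNamespace(bib={"title": "Example title"})
    print(get_bib(as_dict) == get_bib(as_object))
```

With something like this, the failing line could read `score = _scholar_score(txt, get_bib(res))`, and a regression test could feed both shapes through it.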
Please note that this is on python 3.11 now, instead of the 3.8 I was testing on this morning (chardet would not play nice with that one...).
If this is getting annoying and you can tell me what your CONTRIBUTING.md looks like, I can try to debug...
Hi, unfortunately I don't have python 3.11 installed right now.
I just finished implementing other long-awaited changes and pushed a version 2 to PyPI. You might try that one with `pip install -U papers-cli`, though I don't think I did any work on scholarly, so I don't expect that will fix your issue.
And I do not have structured contribution guidelines to offer at this point, sorry. Others have just cloned and made a pull request. If you have specific questions I am glad to answer.
Here we have one of two situations:
- either the scholarly tests are just so poor that they don't flag your use case as causing an error (that's the most likely situation)
- or really something broke in 3.11 -- I'd be surprised, but who knows
If you'd like me to take a look, you can just send me your PDF and sum up the set of commands causing the issue.
Disconnecting for now. Not sure when I'll have time again...
In case you find what's wrong, it would be great to add a test, too.
In any case, `papers extract --scholar somepaper.pdf` definitely works for me.
For now I'll just classify this as not reproducible until further news.
I updated to v2.1 with better pip/pyproject.toml distribution.
Locally it also passes the tests with py311 (does not work with github CI + tox yet). I'm closing it for now. Please re-open if the issue persists.
Ok, after some more poking around, I do see that with a bunch of other pdfs, both with `--scholar` and without `--scholar` work, so the issue could be specific to the subset of files I had chosen. To confirm, this is on papers-cli 2.1.1, running under the 3.11 interpreter.
I think I'll clone and poke around, and open PRs as needed; re: CONTRIBUTING.md, if you end up wanting things like `flake8` or `black`, please let me know and I'll clean things up accordingly.
Again, thanks for your responsiveness here, and I look forward to seeing what's up!