scholarly-python-package/scholarly

search_pubs - StopIteration

MatteoRiva95 opened this issue · 4 comments

Describe the bug

After I run the code, I receive this error:


StopIteration Traceback (most recent call last)
in <cell line: 8>()
6
7 search_query = scholarly.search_pubs("Advances in the diagnosis and treatment of small bowel lesions with Crohn's disease using double-balloon endoscopy")
----> 8 scholarly.pprint(next(search_query))

/usr/local/lib/python3.10/dist-packages/scholarly/publication_parser.py in next(self)
91 return self.next()
92 else:
---> 93 raise StopIteration
94
95 # Pickle protocol

StopIteration:

To Reproduce

from scholarly import scholarly, ProxyGenerator

# Route requests through free proxies to reduce Google Scholar blocking
pg = ProxyGenerator()
success = pg.FreeProxies()
scholarly.use_proxy(pg)

search_query = scholarly.search_pubs("Advances in the diagnosis and treatment of small bowel lesions with Crohn's disease using double-balloon endoscopy")
scholarly.pprint(next(search_query))

Expected behavior

I would like to give scholarly a paper title and get back the URL of its PDF.
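
For reference, a minimal sketch of that use case (it assumes the returned publication dict carries an eprint_url key, which scholarly fills in only when Google Scholar shows a direct PDF link; this does not fix the StopIteration itself):

from scholarly import scholarly

title = ("Advances in the diagnosis and treatment of small bowel lesions "
         "with Crohn's disease using double-balloon endoscopy")

search_query = scholarly.search_pubs(title)
# next(iterator, default) returns None instead of raising StopIteration
pub = next(search_query, None)
if pub is None:
    print("No results (empty page or blocked request)")
else:
    # 'eprint_url' is only present when Scholar links a PDF directly
    print(pub.get('eprint_url', 'No PDF link available'))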

Desktop (please complete the following information):

  • Proxy service: FreeProxies
  • Python version: 3.10.12
  • Environment: Google Colab

Thank you in advance!

Any update? Still getting this issue @ipeirotis @papr @marcoscarpetta @guicho271828


You need a paid proxy for searching publications.
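
As a sketch, switching to one of the paid backends that ProxyGenerator supports, e.g. ScraperAPI (the key below is a placeholder):

from scholarly import scholarly, ProxyGenerator

pg = ProxyGenerator()
# ScraperAPI is one of ProxyGenerator's paid backends; pass your own key
success = pg.ScraperAPI("YOUR_SCRAPERAPI_KEY")
if not success:
    raise RuntimeError("Could not set up the ScraperAPI proxy")
scholarly.use_proxy(pg)

search_query = scholarly.search_pubs("massive MIMO")
scholarly.pprint(next(search_query))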

@Owaiskhan9654, that was not the issue that the OP @MatteoRiva95 posted.

The issue is the StopIteration exception, which I don't think has anything to do with proxies.

First, consider this code:

from scholarly import scholarly

search_phrase = "massive MIMO"
search_query = scholarly.search_pubs(search_phrase)
# start_index asks Google Scholar to start at a later results page
search_query2 = scholarly.search_pubs(search_phrase, start_index=970)

You will get:

search_query.total_results  --> 179000
search_query2.total_results --> 0   (it is 0 even with start_index=10)

This is issue 1.

Issue 2:
When you iterate over the results with next(), search_query2 raises that exact exception (StopIteration) after 10 results or so.
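
A quick way to see this, counting results until the iterator dies (a sketch based on the behavior above; the cutoff of roughly 10 corresponds to a single results page):

count = 0
try:
    while True:
        next(search_query2)
        count += 1
except StopIteration:
    # dies after about one page of results instead of exhausting total_results
    print(f"StopIteration after {count} results")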

What's going on? Any idea @ipeirotis?

This seems to be related to Google's anti-crawling mechanism. The URL we create appears to be flagged as unusual by Google, which returns an "error" page instead of results. I do not have a clear path to fixing this.