scholarly-python-package/scholarly

search_pubs - StopIteration

MatteoRiva95 opened this issue · 4 comments

Describe the bug

After I run the code, I receive this error:


StopIteration Traceback (most recent call last)
in <cell line: 8>()
6
7 search_query = scholarly.search_pubs("Advances in the diagnosis and treatment of small bowel lesions with Crohn's disease using double-balloon endoscopy")
----> 8 scholarly.pprint(next(search_query))

/usr/local/lib/python3.10/dist-packages/scholarly/publication_parser.py in next(self)
91 return self.next()
92 else:
---> 93 raise StopIteration
94
95 # Pickle protocol

StopIteration:

To Reproduce

from scholarly import scholarly, ProxyGenerator

# Route requests through free proxies to reduce Google Scholar blocking
pg = ProxyGenerator()
success = pg.FreeProxies()
scholarly.use_proxy(pg)

search_query = scholarly.search_pubs("Advances in the diagnosis and treatment of small bowel lesions with Crohn's disease using double-balloon endoscopy")
scholarly.pprint(next(search_query))

Expected behavior

I would like to give scholarly a paper title and get back the URL of its PDF.
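
For reference, a minimal sketch of that use case (it assumes the returned publication dict carries an eprint_url key, which scholarly fills in only when Google Scholar shows a direct PDF link; this does not fix the StopIteration itself):

from scholarly import scholarly

title = ("Advances in the diagnosis and treatment of small bowel lesions "
         "with Crohn's disease using double-balloon endoscopy")

search_query = scholarly.search_pubs(title)
# next(iterator, default) returns None instead of raising StopIteration
pub = next(search_query, None)
if pub is None:
    print("No results (empty page or blocked request)")
else:
    # 'eprint_url' is only present when Scholar links a PDF directly
    print(pub.get('eprint_url', 'No PDF link available'))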

Desktop (please complete the following information):

  • Proxy service: FreeProxies
  • Python version: 3.10.12
  • Environment: Google Colab

Thank you in advance!

Any update? Still getting this issue @ipeirotis @papr @marcoscarpetta @guicho271828


You need a paid proxy for searching publications.
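
As a sketch, switching to one of the paid backends that ProxyGenerator supports, e.g. ScraperAPI (the key below is a placeholder):

from scholarly import scholarly, ProxyGenerator

pg = ProxyGenerator()
# ScraperAPI is one of ProxyGenerator's paid backends; pass your own key
success = pg.ScraperAPI("YOUR_SCRAPERAPI_KEY")
if not success:
    raise RuntimeError("Could not set up the ScraperAPI proxy")
scholarly.use_proxy(pg)

search_query = scholarly.search_pubs("massive MIMO")
scholarly.pprint(next(search_query))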

@Owaiskhan9654, that was not the issue that the OP @MatteoRiva95 posted.

The issue is the StopIteration exception, which I don't think has anything to do with proxies.

First, consider this code:

from scholarly import scholarly

search_phrase = "massive MIMO"
search_query = scholarly.search_pubs(search_phrase)
# start_index asks Google Scholar to start at a later results page
search_query2 = scholarly.search_pubs(search_phrase, start_index=970)

You will get:

search_query.total_results  --> 179000
search_query2.total_results --> 0   (it is 0 even with start_index=10)

This is issue 1.

Issue 2:
When you iterate over the results with next(), search_query2 raises that exact exception (StopIteration) after 10 results or so.
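
A quick way to see this, counting results until the iterator dies (a sketch based on the behavior above; the cutoff of roughly 10 corresponds to a single results page):

count = 0
try:
    while True:
        next(search_query2)
        count += 1
except StopIteration:
    # dies after about one page of results instead of exhausting total_results
    print(f"StopIteration after {count} results")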

What's going on? Any idea @ipeirotis?

This seems to be related to Google's anti-crawling mechanism. The URL we create appears to be flagged as unusual by Google, which returns an "error" page instead of results. I do not have a clear path to fixing this.