search_pubs - StopIteration
MatteoRiva95 opened this issue · 4 comments
Describe the bug
After I run the code, I receive this error:
StopIteration Traceback (most recent call last)
in <cell line: 8>()
6
7 search_query = scholarly.search_pubs("Advances in the diagnosis and treatment of small bowel lesions with Crohn's disease using double-balloon endoscopy")
----> 8 scholarly.pprint(next(search_query))
/usr/local/lib/python3.10/dist-packages/scholarly/publication_parser.py in next(self)
91 return self.next()
92 else:
---> 93 raise StopIteration
94
95 # Pickle protocol
StopIteration:
To Reproduce
from scholarly import scholarly, ProxyGenerator
pg = ProxyGenerator()
success = pg.FreeProxies()
scholarly.use_proxy(pg)
search_query = scholarly.search_pubs("Advances in the diagnosis and treatment of small bowel lesions with Crohn's disease using double-balloon endoscopy")
scholarly.pprint(next(search_query))
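A small defensive wrapper avoids the uncaught StopIteration when the query comes back empty (which, as discussed below, happens when Google serves an error page instead of results). This is a generic sketch, not part of scholarly's API; the commented scholarly call assumes the same setup as the repro above.

```python
def first_result(query_iter):
    """Return the first item from an iterator, or None if it is empty.
    scholarly's publication iterator raises StopIteration when Google
    returns no parseable results page."""
    try:
        return next(query_iter)
    except StopIteration:
        return None

# With scholarly (requires network access and a working proxy):
# result = first_result(scholarly.search_pubs("double-balloon endoscopy"))
# if result is None:
#     print("No results (query may have been blocked by Google)")

# The helper behaves the same with any iterator:
first_result(iter([]))      # None instead of an exception
first_result(iter([1, 2]))  # 1
```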
Expected behavior
I would like to give scholarly a title and get back the URL of the PDF in return, please.
Desktop (please complete the following information):
- Proxy service: FreeProxies
- python version: 3.10.12
- Colab
Thank you in advance!
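For what the OP is after, a best-effort sketch: scholarly publication results are dictionaries that (to my knowledge) expose an `eprint_url` key when Google Scholar shows a direct full-text link, plus a `pub_url` for the publisher page. The `example_pub` dict below is an illustrative mock, not real output.

```python
def pdf_url_from_pub(pub):
    """Best-effort extraction of a full-text link from a scholarly
    publication dict: prefer 'eprint_url' (direct PDF, when present),
    fall back to 'pub_url' (publisher landing page)."""
    return pub.get("eprint_url") or pub.get("pub_url")

# Shape of a scholarly result, reduced to the relevant keys (mock data):
example_pub = {
    "bib": {"title": "Advances in the diagnosis and treatment ..."},
    "pub_url": "https://example.org/article",
    "eprint_url": "https://example.org/article.pdf",
}
pdf_url_from_pub(example_pub)  # 'https://example.org/article.pdf'
```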
Any update? Still getting this issue @ipeirotis @papr @marcoscarpetta @guicho271828
You need a paid proxy for searching publications.
@Owaiskhan9654, that was not the issue that OP @MatteoRiva95 posted.
The issue is the StopIteration exception, which I don't think has anything to do with proxies.
First, consider this code:
from scholarly import scholarly
search_phrase = "massive MIMO"
search_query = scholarly.search_pubs(search_phrase)
search_query2 = scholarly.search_pubs(search_phrase, start_index=970)
You will get:
search_query.total_results --> 179000
search_query2.total_results --> 0 (it is 0 even with start_index=10)
This is issue 1.
Issue 2:
When you iterate over the results using next(), search_query2 raises that exception (StopIteration) after 10 results or so.
What's going on? Any idea @ipeirotis?
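Since start_index appears to produce a URL that Google rejects, one workaround (my suggestion, not a scholarly feature) is to skip results client-side with itertools.islice over the working iterator. Each skipped result still costs a page fetch, so this is slow, but it avoids the flagged URL.

```python
from itertools import islice

def skip_and_take(results, skip, take):
    """Consume `skip` items from an iterator, then return the next `take`
    as a list. A client-side substitute for the broken start_index."""
    return list(islice(results, skip, skip + take))

# With scholarly (network access required):
# results = scholarly.search_pubs("massive MIMO")
# page = skip_and_take(results, 970, 10)

# Behaviour with an ordinary iterator:
skip_and_take(iter(range(100)), 20, 3)  # [20, 21, 22]
```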
This seems to be related to Google's anti-crawling mechanism. The URL we create appears to be flagged as unusual, and Google returns an "error" page instead of results. I do not have a clear path to fixing this.