scholarly-python-package/scholarly

Can the search_... functions return a count of the results?

dlebedinsky opened this issue · 2 comments

What feature would you like to request?
I would find it really helpful if I could programmatically get the number of Google Scholar results on a subject, given a keyword.

Is your feature request related to a problem? Please describe.
I am working on an inventory management project where I have to find the number of academic publications on a given item. The specific titles, authors, institutions are not immediately relevant. I came across scholarly, and it offers a very nice keyword search function, but it returns a generator of author objects with no option to just get the number of hits.

Describe the solution you'd like
Ideally, I would like to be able to run something like n_hits = scholarly.search_keyword('Haptics', num_authors=True), and have n_hits be an int corresponding to the number of authors in the generator.

Describe alternatives you've considered
Another good implementation would be something like:
search_query = scholarly.search_keyword('Haptics')
print(search_query.n_results)
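
Until something like that exists, one possible workaround is to consume the generator and count its items. The sketch below is only an illustration: count_results and count_keyword_authors are hypothetical helper names, not part of scholarly, and the default limit of 100 is an arbitrary assumption. Exhausting the generator presumably fetches every result page from Google Scholar, so a cap is likely advisable.

```python
from itertools import islice
from typing import Iterable, Optional


def count_results(results: Iterable, limit: Optional[int] = None) -> int:
    """Count items by consuming the iterable, stopping after `limit` items if given.

    Hypothetical helper, not part of scholarly.
    """
    capped = results if limit is None else islice(results, limit)
    return sum(1 for _ in capped)


def count_keyword_authors(keyword: str, limit: Optional[int] = 100) -> int:
    """Count the authors yielded by scholarly.search_keyword, up to `limit`.

    Hypothetical helper. scholarly is imported lazily so that count_results
    stays usable without the package installed.
    """
    from scholarly import scholarly

    return count_results(scholarly.search_keyword(keyword), limit)
```

Calling count_keyword_authors('Haptics') would then return at most 100. If the result equals the limit, the true number of authors is probably larger.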

Do you plan on contributing?
Your response below will clarify if this is something that the maintainers can expect you to work on or not.

  • Yes, I plan to contribute towards this feature in the next couple of days.

Additional context
In _navigator.py, there is a logger that keeps track of number of authors found, and when author pages end. If it could return this info somehow, that would be great too.

Some search_ functions already return objects that expose this count via the total_results attribute. I don't remember off the top of my head whether search_keyword does. See here for examples:

scholarly/test_module.py

Lines 735 to 750 in 9269ff3

def test_search_pubs_total_results(self):
    """
    As of September 16, 2021 there are 32 pubs that fit the search term:
    ["naive physics" stability "3d shape"], and 17'000 results that fit
    the search term ["WIEN2k Blaha"] and none for ["sdfsdf+24r+asdfasdf"].
    Check that the total results for that search term equals 32.
    """
    pubs = scholarly.search_pubs('"naive physics" stability "3d shape"')
    self.assertGreaterEqual(pubs.total_results, 32)
    pubs = scholarly.search_pubs('WIEN2k Blaha')
    self.assertGreaterEqual(pubs.total_results, 10000)
    pubs = scholarly.search_pubs('sdfsdf+24r+asdfasdf')
    self.assertEqual(pubs.total_results, 0)

I just tested it out, and search_keyword does not have total_results implemented. I think it's because search_keyword does not return a SearchScholarIterator, unlike search_pubs.
Based on the WIEN2k Blaha example, it looks like search_pubs will get all publications that have the search term in their title, not just exact matches, right? For now, I will use that as a substitute for search_keyword(...).total_results in my project. Thanks for pointing those tests out.
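
For anyone wanting the same substitute, here is a rough sketch of how it could be wrapped. publication_hit_count is a hypothetical helper name, and the search parameter exists only so the function can be exercised without network access. It assumes nothing beyond what the test above shows, namely that the object returned by search_pubs has a total_results attribute. If the attribute is missing, the helper returns None.

```python
from typing import Callable, Optional


def publication_hit_count(query: str, search: Optional[Callable] = None) -> Optional[int]:
    """Return the total_results reported for a publication search.

    Hypothetical helper, not part of scholarly. `search` defaults to
    scholarly.search_pubs and can be replaced by a stub for offline testing.
    """
    if search is None:
        # Imported lazily so the helper can be tested without scholarly installed.
        from scholarly import scholarly

        search = scholarly.search_pubs
    results = search(query)
    # Return None rather than raising if the attribute is not present.
    return getattr(results, "total_results", None)
```

publication_hit_count('Haptics') would issue one search request and return whatever count Google Scholar reports for that query, which is an estimate of matching publications and not a count of authors.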