Error when using a two-word search term in quotation marks, e.g. %22search+words%22

Question

Closed this issue 9 months ago · 1 comment

Error encountered:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-9d246cb1d241> in <cell line: 1>()
     14 
     15   # year , author , publication of the paper
---> 16   year , publication , author = get_author_year_publi_info(author_tag)
     17 
     18   # url of the paper

<ipython-input-7-88d165b15fb9> in get_author_year_publi_info(authors_tag)
     41   for i in range(len(authors_tag)):
     42       authortag_text = (authors_tag[i].text).split()
---> 43       year = int(re.search(r'\d+', authors_tag[i].text).group())
     44       years.append(year)
     45       publication.append(authortag_text[-1])

AttributeError: 'NoneType' object has no attribute 'group'

This happens when the search term is wrapped in quotation marks, as described in the title, in this cell:

for i in range(0, 250, 10):

  # get the URL for each page
  url = "https://scholar.google.com/scholar?start={}&q=%22search+words%22&hl=en&as_sdt=0,5".format(i)

My original search term has been replaced with "search words" to protect privacy.

Answer 1 · 2024-01-09T12:52:59.000Z

Hello,

Thank you for bringing this issue to our attention. The AttributeError occurs because the script calls .group() on the result of re.search without first checking that a match was found. When the regular expression r'\d+' finds no digits in authors_tag[i].text, re.search returns None, and None has no group attribute.
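The failure can be reproduced in isolation. The sketch below uses made-up author-line strings (not real Scholar output) and assumes only that some result entries contain no digits at all:

```python
import re

# Hypothetical author-line strings; the second one contains no digits.
with_year = "A Author, B Author - Journal of Examples, 2019 - example.com"
without_year = "A Author, B Author - example.com"

# A match object is returned when digits are present.
print(re.search(r'\d+', with_year).group())

# re.search returns None when the pattern matches nowhere in the string.
print(re.search(r'\d+', without_year))

# Calling .group() on that None raises the error from the traceback.
try:
    re.search(r'\d+', without_year).group()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```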

We have addressed this issue by adding a check to ensure a match is found before trying to access its group. Here's the updated portion of the code in the get_author_year_publi_info function:

year_match = re.search(r'\d+', authors_tag[i].text)
if year_match:
    year = int(year_match.group())
    years.append(year)
else:
    # Handling the case where no year is found
    continue

With this change, the script checks that year_match is not None before calling .group(). If no match is found, continue skips the rest of that loop iteration, so nothing is recorded for that entry. This prevents the AttributeError.

Please note that this modification can leave the years list, and any list appended to later in the same loop iteration (such as publication), shorter than lists built elsewhere in the script when some entries lack a detectable year. Make sure your data alignment and subsequent processing logic accommodate this potential discrepancy.
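If keeping every list the same length matters, one alternative to skipping the entry is to record a placeholder instead. The sketch below is an assumption about how that could look, not the repository's code: extract_years is a hypothetical helper, the input strings are invented, and None is an arbitrary choice of placeholder.

```python
import re

def extract_years(author_lines):
    """Return one entry per input line: an int, or None if no digits are found."""
    years = []
    for line in author_lines:
        match = re.search(r'\d+', line)
        # Append a placeholder rather than skipping, so indexes stay aligned
        # with any other per-result lists (titles, links, and so on).
        years.append(int(match.group()) if match else None)
    return years

# Hypothetical author lines; the second one has no year.
lines = [
    "A Author - Journal of Examples, 2019 - example.com",
    "B Author - example.com",
]
years = extract_years(lines)
print(years)
```

Downstream code then has to tolerate None values, for example by filtering them out before doing arithmetic on the years.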

I hope this resolves the issue. Please let us know if you encounter any more problems or have further questions.