Different results between the search engine scraper and google
jxpeng98 opened this issue · 2 comments
Hello,
Thanks for the outstanding library!
I recently faced an issue with different results when using the scraper.
The search query is "PROCTER & GAMBLE CO sustainability report".
From a Google web search, I get the following results:
However, when I use the scraper:
from search_engines import Google

engine = Google()  # the engine must be instantiated before searching
query = 'PROCTER & GAMBLE CO sustainability report'
results = engine.search(query, 1)  # fetch one page of results
links = results.links()
The output links are:
https://us.pg.com/
https://twitter.com/ProcterGamble?ref_src=twsrc^google|twcamp^serp|twgr^author
https://en.wikipedia.org/wiki/Procter_&_Gamble
https://www.pgcareers.com/
https://www.linkedin.com/company/procter-and-gamble
https://www.facebook.com/proctergamble/
https://pginvestor.com/
May I know why this happens? How can I get consistent results?
Many thanks!
I found the issue.
It is due to the & character in the query. If I change the query to "PROCTER and GAMBLE CO sustainability report", the output is:
https://us.pg.com/sustainability-reports/
https://www.pg.co.uk/environmental-sustainability/
https://www.sustainability-reports.com/company/procter-gamble-nederland-bv/
https://www.responsibilityreports.com/Company/procter-gamble-co
https://www.pginvestor.com/esg/esg-overview/
https://www.knowesg.com/esg-ratings/the-procter-and-gamble-company
https://assets.ctfassets.net/oggad6svuzkv/6BTnYGZ9raiy4is806wCkI/dfb3ae4d8c1304f24ece241f643aed7f/2010_Full_Sustainability_Report.pdf
Is there any way to solve this problem other than changing the character?
First of all, thanks for all the details. You're right, the & character changes the query from "PROCTER & GAMBLE CO sustainability report" to "PROCTER ", and so we get wrong results. I've added URL-encoding to the query, which should fix this issue.
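For anyone hitting the same problem, here is a minimal sketch of why an unencoded & truncates the query and how URL-encoding avoids it. It uses only Python's standard urllib.parse module. The helper names build_search_url and query_seen_by_server and the base URL are assumptions made for illustration; this is not the library's actual code.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

# Illustrative base URL only; no request is sent in this sketch.
BASE_URL = "https://www.google.com/search"


def build_search_url(query: str, encode: bool = True) -> str:
    """Build a search URL, optionally URL-encoding the query (hypothetical helper)."""
    q = quote_plus(query) if encode else query
    return f"{BASE_URL}?q={q}"


def query_seen_by_server(url: str) -> str:
    """Return the value of the q parameter as a standard query-string parser reads it."""
    params = parse_qs(urlsplit(url).query)
    return params.get("q", [""])[0]


query = "PROCTER & GAMBLE CO sustainability report"

# Without encoding, "&" is treated as a parameter separator,
# so everything after it is dropped from the q parameter.
raw_url = build_search_url(query, encode=False)
print(repr(query_seen_by_server(raw_url)))

# With encoding, "&" becomes "%26" and the full query survives.
encoded_url = build_search_url(query)
print(encoded_url)
print(repr(query_seen_by_server(encoded_url)))
```

If you are stuck on an older version of the library, replacing & with "and" (as above) avoids the problem, since the unencoded & is what splits the query string.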