tasos-py/Search-Engines-Scraper

Google text attribute is empty

sershev opened this issue · 2 comments

Many thanks again for your work; it works great. Recently I noticed that when searching with Google, the text field is empty for all results.

from search_engines import Google

se = Google()
res = se.search("news", 1)  # query, number of pages
res.results()

[{'host': 'bbc.com',
  'link': 'https://www.bbc.com/news/world',
  'title': 'World - BBC Newshttps://www.bbc.com › news › world',
  'text': ''},
 {'host': 'edition.cnn.com',
  'link': 'https://edition.cnn.com/world',
  'title': 'World news – breaking news, videos and headlines - CNNhttps://edition.cnn.com › world',
  'text': ''},
 {'host': 'theguardian.com',
  'link': 'https://www.theguardian.com/world',
  'title': 'Latest news from around the world | The Guardianhttps://www.theguardian.com › world',
  'text': ''},
 {'host': 'hindustantimes.com',
  'link': 'https://www.hindustantimes.com/world-news',
  'title': 'World News, Latest World News, Breaking News and ...https://www.hindustantimes.com › World News',
  'text': ''},
 {'host': 'reuters.com',
  'link': 'https://www.reuters.com/news/archive/worldNews',
  'title': 'World News Headlines | Reutershttps://www.reuters.com › news › archive › worldNews',
  'text': ''},
 {'host': 'abcnews.go.com',
  'link': 'https://abcnews.go.com/International/',
  'title': 'International News | Latest World News, Videos & Photos ...https://abcnews.go.com › International',
  'text': ''},
 {'host': 'news.sky.com',
  'link': 'https://news.sky.com/world',
  'title': 'World News - Breaking international news and headlines | Sky ...https://news.sky.com › world',
  'text': ''},
 {'host': 'nytimes.com',
  'link': 'https://www.nytimes.com/section/world',
  'title': 'World News - The New York Timeshttps://www.nytimes.com › section › world',
  'text': ''},

Thanks for letting me know. They changed their HTML structure and the CSS selector we had stopped working. It's fixed now.

Recently, Google's HTML structure seems to change from page to page and to keep evolving as more queries are submitted. More and more of the text fields come back empty as usage of the Google search engine increases. Currently, the span > span selector works decently, but at times entire batches of results have a completely empty text field. Are there any plans to fix this, or to explore the different HTML structures Google can generate?
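
One possible way to cope with several markup variants is to try a list of selectors in order and keep the first one that yields non-empty text. The following is only a minimal sketch of that idea, not the library's actual implementation. Only span > span comes from this thread; the other selectors, the FakeNode stand-in, and both function names are hypothetical and used purely for illustration. FakeNode exists just to keep the example free of third-party dependencies.

```python
class FakeNode:
    """Hypothetical stand-in for a parsed result element (not a real parser)."""

    def __init__(self, matches=None, text=""):
        self._matches = matches or {}  # maps selector string -> FakeNode
        self._text = text

    def select_one(self, selector):
        # Return the child registered for this selector, or None if absent.
        return self._matches.get(selector)

    def get_text(self, strip=False):
        return self._text.strip() if strip else self._text


# Candidate selectors, tried in order. Only "span > span" is taken from the
# thread; the two "snippet-variant" entries are placeholders (assumptions),
# not selectors verified against Google's markup.
SNIPPET_SELECTORS = [
    "span > span",
    "div.snippet-variant-a",
    "div.snippet-variant-b",
]


def first_nonempty_text(node, selectors=SNIPPET_SELECTORS):
    """Return text from the first selector that yields non-empty text, else ''."""
    for selector in selectors:
        match = node.select_one(selector)
        if match is None:
            continue  # this variant is not present in the current markup
        text = match.get_text(strip=True)
        if text:
            return text
    return ""


def empty_text_ratio(results):
    """Fraction of result dicts whose 'text' field is empty (0.0 if no results)."""
    if not results:
        return 0.0
    empty = sum(1 for r in results if not r.get("text"))
    return empty / len(results)
```

With BeautifulSoup, a result Tag could be passed in place of FakeNode, since Tag also provides select_one and get_text(strip=True). A helper like empty_text_ratio, applied to the list returned by res.results(), might help spot a batch where the selectors have stopped matching, for example by flagging any page whose ratio is close to 1.0.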