Iceloof/GoogleNews

get_news() does not take into account the date range

PashaM999 opened this issue · 7 comments

Hi, I have been looking through the code and found that the function get_news has the url generated as follows:

self.url = 'https://news.google.com/search?q={}+when:{}&hl={}'.format(key,self.__period,self.__lang.lower())

(line 259 in __init__.py)

This uses the self.__period variable, which only handles relative periods such as 7d or 1m. Google also supports search filters for specific dates, which could be implemented in your code as follows:

start = f'{self.__start[-4:]}-{self.__start[:2]}-{self.__start[3:5]}'
end = f'{self.__end[-4:]}-{self.__end[:2]}-{self.__end[3:5]}'
self.url = 'https://news.google.com/search?q={}+before:{}+after:{}&hl={}'.format(key,end, start, self.__lang.lower())

This is merely a suggestion, but I feel that if the __start and __end variables are set, you should prioritize them over the original solution.
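A minimal sketch of that prioritization, written as a standalone function rather than as a patch to the library. The name `build_search_url` and its parameters are hypothetical. It assumes dates arrive as MM/DD/YYYY strings, as in the snippet above, that `key` is already URL-encoded, and that the `when:` filter should only be used when no start/end pair is set.

```python
from datetime import datetime


def build_search_url(key, lang="en", period="", start="", end=""):
    """Build a Google News search URL (hypothetical helper, not library code)."""
    base = "https://news.google.com/search?q={}".format(key)
    if start and end:
        # Assumption: dates are MM/DD/YYYY strings; convert them to YYYY-MM-DD.
        after = datetime.strptime(start, "%m/%d/%Y").strftime("%Y-%m-%d")
        before = datetime.strptime(end, "%m/%d/%Y").strftime("%Y-%m-%d")
        # An explicit date range takes priority over the relative period.
        query = "{}+before:{}+after:{}".format(base, before, after)
    elif period:
        # Fall back to the relative period filter (e.g. 7d).
        query = "{}+when:{}".format(base, period)
    else:
        query = base
    return "{}&hl={}".format(query, lang.lower())
```

Parsing with `strptime` instead of slicing the string means a malformed date raises a `ValueError` rather than silently producing a bad URL.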

Hope that will be useful for someone :)

Time periods do not always work with Google; sometimes Google returns data outside the specified range. Anyway, a start/end filter could probably be added; I will update it in the next release.

I came across the same issue; PashaM999's solution seems to work.

zxdawn commented

Nice solution! Any plan to fix it?

It's from Google, we can't do anything about it.

zxdawn commented

Em ... I have tried to add before:{}+after:{} like @PashaM999 did and it works well for my case. Maybe we can add the function and mention that Google sometimes returns data out of a specific range in the README file?
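Since Google may still return items outside the requested range, one possible workaround is to filter the results on the client side. The sketch below is only an illustration: `filter_by_date` is a hypothetical helper, and it assumes each result is a dict whose `datetime` key holds a `datetime` object (entries without a usable timestamp are dropped).

```python
from datetime import datetime


def filter_by_date(results, start, end):
    """Keep only results whose 'datetime' value falls within [start, end]."""
    kept = []
    for item in results:
        published = item.get("datetime")
        # Assumption: a missing or non-datetime value means the date is unknown.
        if not isinstance(published, datetime):
            continue
        if start <= published <= end:
            kept.append(item)
    return kept
```

For example, `filter_by_date(results, datetime(2021, 1, 1), datetime(2021, 12, 31))` would keep only the items dated within 2021.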

I am trying this on version 1.6.12, after my code (shown below) was only returning recent news.

import pandas as pd
from GoogleNews import GoogleNews
#googlenews = GoogleNews()
googlenews = GoogleNews()
googlenews.clear()
googlenews = GoogleNews(start='01/01/2021',end='12/31/2021')

#googlenews = GoogleNews(lang='pt', region='BR')
googlenews.set_lang('pt')
googlenews.set_time_range('01/01/2021','12/31/2021')

I am now editing __init__.py (it seems the self.url line is now on line 273).
Should I just copy and paste @PashaM999's solution into __init__.py over line 273?
Where do I set the start and end? In my own code?

Is it possible to include @PashaM999's solution in future updates? That would be helpful.