GeneralMills/pytrends

Is it just me, or have 429 errors really increased these past few days?

igalci opened this issue · 23 comments

I used to get a 429 error whenever I requested more than about 6 items per hour.

But recently, and especially today, I can't make more than 1 request per hour without getting a 429. Is it just my IP acting up?

Experiencing the same issue with the node library from @pat310 lately.

Any workarounds?

Same! The only workaround I've found is to re-run the code multiple times on different machines, or to use Colab with a different account.

Same here, they are locking down folks.

How is there no reliable solution for this in 2022 😢 - I was getting the same errors on pat's Node.js project and was about to try this one, but then saw the same thing happening here.

Not facing the issue after using proxies and sleeping 60 seconds after each request.

Do you mind sharing where you're getting proxy from and how you're implementing them?

I’m using proxies from newipnow: 25 of them, with a random proxy chosen for each request.
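
For anyone asking how to wire that up, here is a minimal sketch, assuming the `proxies`, `retries`, and `backoff_factor` arguments of pytrends' `TrendReq` (the proxy addresses and keywords below are placeholders, not real endpoints):

```python
import time
from pytrends.request import TrendReq

# Placeholder proxy addresses - substitute the ones from your provider.
PROXIES = [
    'https://192.0.2.10:8080',
    'https://192.0.2.11:8080',
]

# pytrends accepts a list of HTTPS proxies and moves on to the next one
# when a request fails; retries/backoff_factor add automatic retrying.
pytrends = TrendReq(
    hl='en-US',
    tz=360,
    timeout=(10, 25),
    proxies=PROXIES,
    retries=2,
    backoff_factor=0.5,
)

keyword_batches = [['bitcoin'], ['ethereum']]  # illustrative queries
for batch in keyword_batches:
    pytrends.build_payload(batch, timeframe='today 3-m', geo='US')
    df = pytrends.interest_over_time()
    print(df.tail())
    time.sleep(60)  # wait a minute between requests, as suggested above
```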

Guys, I just figured out that if you downgrade the pytrends library to 4.7.2 or 4.7.3, it works. Also, collecting data for different geographical locations may stop the process, so use only one location at a time, with up to 5 keywords. For more than 5 keywords you need to apply normalization, which means using one shared keyword as a control in all sets of 5 keywords.

Alternatively, you may want to try R instead of Python: https://cran.r-project.org/web/packages/gtrendsR/gtrendsR.pdf
It had a release just this month, so it is more up to date and has been reliable to use.
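
To make the shared-keyword normalization from the Python suggestion concrete, here is a rough sketch; the keyword lists are illustrative, and rescaling each batch so the anchor peaks at 100 is just one way to do it:

```python
import time
import pandas as pd
from pytrends.request import TrendReq

# pip install pytrends==4.7.3   # the downgrade suggested above

ANCHOR = 'bitcoin'  # illustrative shared control keyword, present in every batch
batches = [
    [ANCHOR, 'ethereum', 'dogecoin', 'litecoin', 'solana'],
    [ANCHOR, 'cardano', 'polkadot', 'monero', 'ripple'],
]

pytrends = TrendReq(hl='en-US', tz=360)
frames = []
for batch in batches:
    pytrends.build_payload(batch, timeframe='today 12-m', geo='US')  # one geo at a time
    df = pytrends.interest_over_time().drop(columns='isPartial')
    # Rescale the batch so the anchor keyword peaks at 100, which puts
    # every batch on the anchor's scale and makes them comparable.
    scale = df[ANCHOR].max()
    frames.append(df / scale * 100 if scale else df)
    time.sleep(30)

combined = pd.concat(frames, axis=1)
combined = combined.loc[:, ~combined.columns.duplicated()]  # keep one anchor column
print(combined.head())
```

Because the anchor appears in every batch, dividing each batch by the anchor's peak gives all batches a common reference, which Google Trends' own per-batch scaling otherwise prevents.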

The downgrade worked for me! Thanks for the fix!

Update: now it doesn't work... It only took 12 hours for them to block it

It is still working for me! Try increasing the sleep time between requests for each subset of keywords (I use a random wait between 5 and 30 seconds). Also, don't use only one machine with the same IP address; alternate between your machine and Google Colab.
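
The randomized wait is only a couple of lines; a self-contained sketch (keywords are illustrative):

```python
import random
import time

from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)
keyword_batches = [['python'], ['pandas'], ['numpy']]  # illustrative subsets of keywords

for batch in keyword_batches:
    pytrends.build_payload(batch, timeframe='today 3-m')
    print(pytrends.interest_over_time().tail(1))
    time.sleep(random.uniform(5, 30))  # random 5-30 second pause between requests
```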

Both! I figured that even after downgrading, if I sent too many requests Google might still block my IP, so I had to change the IP. That can be done with proxies or a VPN; Google Colab also gives you a different range of IPs.

@ReemOmer I am curious, how do you use Google Colab to scrape? I don't believe they have an API... And I haven't found any guide like that....

@emlazzarin Honestly, I didn't open any of the main files in either version, so I don't know what the difference is.
@igalci I run the code the same way, either in a Jupyter notebook or as a regular Python file from the command line. You still use the pytrends library and call all its functions.

I have an extremely novice understanding of the inner workings of the package, but could this problem have something to do with cookies expiring on the trends.google.com site? I have previously been able to work around 429 errors with this solution, but now that doesn't work either. Scrolling through the request headers, I noticed a cookie expiration time that falls in the same minute the request was submitted.
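
For context, the cookie workaround being referred to is usually some variant of the following: fetch a fresh NID cookie from trends.google.com and attach it to every request. This assumes a pytrends version whose `TrendReq` accepts the `requests_args` parameter; the rest is illustrative:

```python
import requests
from pytrends.request import TrendReq

# Fetch a fresh cookie from Google Trends (the NID cookie the site sets on a plain visit).
session = requests.Session()
session.get('https://trends.google.com')
nid = session.cookies.get_dict().get('NID')

# Attach the cookie to every pytrends request; requests_args is only
# available in more recent pytrends releases.
pytrends = TrendReq(
    hl='en-US',
    tz=360,
    requests_args={'headers': {'Cookie': f'NID={nid}'}},
)
pytrends.build_payload(['python'], timeframe='today 3-m')
print(pytrends.interest_over_time().tail())
```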

Looks like no User-Agent is specified in the requests, which means they get blocked more often. I fixed it here: 18f230d
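
For anyone stuck on a release without that commit, a possible stopgap is to send a browser-like User-Agent yourself, again assuming a pytrends version that supports `requests_args` (the UA string is just an example):

```python
from pytrends.request import TrendReq

# Example desktop browser User-Agent string; any current browser UA should do.
UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36')

pytrends = TrendReq(
    hl='en-US',
    tz=360,
    requests_args={'headers': {'User-Agent': UA}},
)
pytrends.build_payload(['python'], timeframe='today 3-m')
print(pytrends.interest_over_time().tail())
```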

Fixed by #553.