Error 429: Too many requests
$ curl -I 'https://paulgo.io/search?q=test'
HTTP/2 429
content-security-policy: upgrade-insecure-requests; default-src 'none'; script-src 'self'; style-src 'self' 'unsafe-inline'; form-action 'self' https://github.com; font-src 'self'; frame-ancestors 'self'; base-uri 'self'; connect-src 'self' https://overpass-api.de; img-src 'self' data: https://*.tile.openstreetmap.org; frame-src https://www.youtube-nocookie.com https://player.vimeo.com https://www.dailymotion.com https://www.deezer.com https://www.mixcloud.com https://w.soundcloud.com https://embed.spotify.com
content-type: text/html; charset=utf-8
permissions-policy: accelerometer=(),ambient-light-sensor=(),autoplay=(),camera=(),encrypted-media=(),focus-without-user-activation=(),geolocation=(),gyroscope=(),magnetometer=(),microphone=(),midi=(),payment=(),picture-in-picture=(),speaker=(),sync-xhr=(),usb=(),vr=()
referrer-policy: no-referrer
server-timing: total;dur=1.773, render;dur=0
strict-transport-security: max-age=63113904; includeSubDomains; preload
vary: Accept-Encoding
x-content-type-options: nosniff
x-download-options: noopen
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
content-length: 17
date: Sat, 27 Aug 2022 10:07:59 GMT
Looks like the intended behaviour of the limiter plugin to me. https://github.com/searxng/searxng/blob/master/searx/plugins/limiter.py
Although my IP address is a public one (i.e. many other users share it), I don't believe a single person other than me actually uses PaulGo (or any other search engine besides Google or Bing). Whatever this rate limiter is doing is not right.
If it is using public IP databases, they're often wrong. For example, browserleaks.com reports this address as a generic tunnel or VPN when it is not (which, by the way, PaulGo shouldn't care about in the first place; it's the kind of thing sites like Wikipedia use to block editing).
OK, it appears PaulGo is trying to fingerprint my browser (under Cloudflare or something similar?). It falsely detected that I am using a Tor browser since my browser is sending `Accept:` as a header instead of a more standard header such as `Accept: */*`.
This appears to be the culprit: searxng/searxng@221740f#diff-0c2c4d4a0707a722cf7fc205d624aff721b0ee750a7c9c17cb934d9945f43624R70
They probably should have explicitly checked for the header before checking the type of encoding. Here's the full cURL command that I was using (or that was sent by the configured browser):
curl 'https://paulgo.io/search?q=test' \
-H 'authority: paulgo.io' \
-H 'accept: ' \
-H 'accept-language: en' \
-H 'cache-control: no-cache' \
-H 'pragma: no-cache' \
-H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="104"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36' \
--compressed
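The distinction drawn above, between a header that is absent and one that is present but empty, can be made explicit before any value is parsed. This is only a minimal sketch and not the actual limiter code: the helper name `classify_header` and the plain-dict representation of the request headers are assumptions for illustration.

```python
from typing import Mapping, Optional


def classify_header(headers: Mapping[str, str], name: str) -> str:
    """Return 'absent', 'empty' or 'present' for a header (hypothetical helper)."""
    # Header names are case-insensitive, so normalise the keys before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    value: Optional[str] = lowered.get(name.lower())
    if value is None:
        return "absent"
    if value.strip() == "":
        return "empty"
    return "present"


# A few headers modelled on the request in this comment (assumed subset).
request_headers = {"accept": "", "accept-language": "en"}
print(classify_header(request_headers, "Accept"))           # empty
print(classify_header(request_headers, "Accept-Encoding"))  # absent
print(classify_header(request_headers, "Accept-Language"))  # present
```

A check written this way could treat "empty" and "absent" differently instead of rejecting both, which is the behaviour being questioned here.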
Hi @MuntashirAkon ,
First of all, our limiter plugin (that is what is blocking you) is way simpler than you think... It does basic rate limiting with a sliding window, keyed on your user agent and your IP, which is probably not the problem...
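For readers unfamiliar with the term, a sliding-window rate limit keyed on IP and user agent could look roughly like the sketch below. It is not the plugin's code: the class name, the in-memory storage, and the limit of 3 requests per 10 seconds are all assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """In-memory sliding-window limiter keyed by (IP, User-Agent). Hypothetical."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, user_agent: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[(ip, user_agent)]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the caller would answer with HTTP 429
        hits.append(now)
        return True


# Assumed limit: 3 requests per 10 seconds; timestamps are passed in explicitly.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("203.0.113.7", "ExampleAgent/1.0", now=t) for t in (0, 1, 2, 3, 11)]
print(results)  # [True, True, True, False, True]
```

The fourth request is refused because three requests already fall inside the window; by the fifth, the two oldest have expired.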
On top of that, it parses the headers you send in your request. Here is the list of all headers that need to be sent (and normally a browser would send those headers with each request) -> https://github.com/searxng/searxng/blob/master/searx/plugins/limiter.py#L61-L76
For example, this is a request from Firefox on Linux:
curl 'https://paulgo.io/search' \
-X POST \
-H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0' \
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' \
-H 'Accept-Language: en-US,en;q=0.5' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'Origin: null' \
-H 'DNT: 1' \
-H 'Connection: keep-alive' \
-H 'Upgrade-Insecure-Requests: 1' \
-H 'Sec-Fetch-Dest: document' \
-H 'Sec-Fetch-Mode: navigate' \
-H 'Sec-Fetch-Site: same-origin' \
-H 'Sec-Fetch-User: ?1' \
-H 'Sec-GPC: 1' \
-H 'Pragma: no-cache' \
-H 'Cache-Control: no-cache' \
-H 'TE: trailers' \
--data-raw 'q=time&category_general=1&language=en-US&time_range=&safesearch=1&theme=simple'
So you can fix your issues by sending the `Accept-Language`, `Connection` and `Accept-Encoding` headers with your browser...
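To check a captured request against the three headers named here, something like the following sketch could be used on the client side. The function name `missing_or_blank` and the sample header dict are assumptions; the authoritative list of checks is the linked `limiter.py`, not this snippet.

```python
from typing import List, Mapping

# The three headers named in this comment; limiter.py remains the reference.
EXPECTED_HEADERS = ("Accept-Language", "Connection", "Accept-Encoding")


def missing_or_blank(headers: Mapping[str, str]) -> List[str]:
    """List expected headers that are absent or carry only whitespace."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return [
        name
        for name in EXPECTED_HEADERS
        if not lowered.get(name.lower(), "").strip()
    ]


# Assumed sample, loosely modelled on the request reported in this issue.
captured = {"accept": "", "accept-language": "en", "user-agent": "Mozilla/5.0"}
print(missing_or_blank(captured))  # ['Connection', 'Accept-Encoding']
```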
Since this is not really an issue with SearXNG nor my patch repo, but rather your browser config IMO -> I will close the issue for now, if you think that was in error, feel free to comment again :D ; Closing.
I do think this is a bug, and I believe that sufficient testing has not been carried out. Although I have specified the underlying problem in the comment above, which I found after a little testing, I will try to elaborate on it here:
The empty `Accept:` header, as specified in the comment, is indeed the issue. For example, the command above (copied from the unGoogled Chromium browser) returns a `429` code, as I have explained above:
Command 1
curl 'https://paulgo.io/search?q=test' \
-H 'authority: paulgo.io' \
-H 'accept: ' \
-H 'accept-language: en' \
-H 'cache-control: no-cache' \
-H 'pragma: no-cache' \
-H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="104"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36' \
-I \
--compressed
Output 1
HTTP/2 429
content-security-policy: upgrade-insecure-requests; default-src 'none'; script-src 'self'; style-src 'self' 'unsafe-inline'; form-action 'self' https://github.com; font-src 'self'; frame-ancestors 'self'; base-uri 'self'; connect-src 'self' https://overpass-api.de; img-src 'self' data: https://*.tile.openstreetmap.org; frame-src https://www.youtube-nocookie.com https://player.vimeo.com https://www.dailymotion.com https://www.deezer.com https://www.mixcloud.com https://w.soundcloud.com https://embed.spotify.com
content-type: text/html; charset=utf-8
permissions-policy: accelerometer=(),ambient-light-sensor=(),autoplay=(),camera=(),encrypted-media=(),focus-without-user-activation=(),geolocation=(),gyroscope=(),magnetometer=(),microphone=(),midi=(),payment=(),picture-in-picture=(),speaker=(),sync-xhr=(),usb=(),vr=()
referrer-policy: no-referrer
server-timing: total;dur=3.085, render;dur=0
strict-transport-security: max-age=63113904; includeSubDomains; preload
vary: Accept-Encoding
x-content-type-options: nosniff
x-download-options: noopen
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
content-length: 17
date: Mon, 29 Aug 2022 15:27:47 GMT
However, this issue goes away if `Accept:` (highlighted in command 1) is not present at all.
Command 2
curl 'https://paulgo.io/search?q=test' \
-H 'authority: paulgo.io' \
-H 'accept-language: en' \
-H 'cache-control: no-cache' \
-H 'pragma: no-cache' \
-H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="104"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36' \
-I \
--compressed
Output 2
HTTP/2 200
content-security-policy: upgrade-insecure-requests; default-src 'none'; script-src 'self'; style-src 'self' 'unsafe-inline'; form-action 'self' https://github.com; font-src 'self'; frame-ancestors 'self'; base-uri 'self'; connect-src 'self' https://overpass-api.de; img-src 'self' data: https://*.tile.openstreetmap.org; frame-src https://www.youtube-nocookie.com https://player.vimeo.com https://www.dailymotion.com https://www.deezer.com https://www.mixcloud.com https://w.soundcloud.com https://embed.spotify.com
content-type: text/html; charset=utf-8
permissions-policy: accelerometer=(),ambient-light-sensor=(),autoplay=(),camera=(),encrypted-media=(),focus-without-user-activation=(),geolocation=(),gyroscope=(),magnetometer=(),microphone=(),midi=(),payment=(),picture-in-picture=(),speaker=(),sync-xhr=(),usb=(),vr=()
referrer-policy: no-referrer
server-timing: total;dur=625.116, render;dur=24.782, total_0_wikipedia;dur=107.22, total_1_bing;dur=266.973, total_2_qwant;dur=413.882, total_3_google;dur=591.302, load_0_wikipedia;dur=106.326, load_1_bing;dur=255.132, load_2_qwant;dur=392.4, load_3_google;dur=563.537
strict-transport-security: max-age=63113904; includeSubDomains; preload
vary: Accept-Encoding
x-content-type-options: nosniff
x-download-options: noopen
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
content-length: 49951
date: Mon, 29 Aug 2022 15:31:02 GMT
I hope this gives you more insight regarding the issue.
Any updates on this?
Hi,
Sorry for not getting back to you for so long. I have not fixed the limiter issue yet, but I disabled the limiter as a test on my public instance. If my proxies are enough to keep my instance from being rate limited by the engines, I will leave the limiter disabled and maybe even enable the JSON API.
Thanks! I can confirm that it's working again.
Just a quick update: I enabled the limiter again, since my instance was getting blocked by major engines like Qwant and Google.
There are really only two ways to solve this issue: buy more VPS servers with public IPs to route the traffic through, or enable the limiter again. For now I have enabled the limiter, but in the future I want to deploy more servers to make it possible to use my instance without the limiter.