hnhx/librex

Text results not outputting.

Closed this issue · 10 comments

Server

Linode VPS box running Ubuntu 22.04

Issue

I tried running the same query as one of your example images ("gnu project"). The image on the right side showed up, the pagination showed up, but no listings.

Screenshot

librex_no_text_results

Error Logs

2023/05/17 14:14:12 [error] 6378#6378: *165 FastCGI sent in stderr: "PHP message: PHP Warning:  Undefined array key 0 in /sites/librex/engines/google/text.php on line 156PHP message: PHP Fatal error:  Uncaught TypeError: array_key_exists(): Argument #2 ($array) must be of type array, null given in /sites/librex/engines/google/text.php:157
Stack trace:
#0 /sites/librex/search.php(73): print_text_results()
#1 {main}
  thrown in /sites/librex/engines/google/text.php on line 157" while reading response header from upstream, client: 23.124.160.46, server: pengigo.com, request: "GET /search.php?p=10&q=gnu+project&t=0 HTTP/1.1", upstream: "fastcgi://unix:/run/php/php-fpm.sock:", host: "pengigo.com"

Hello, same problem under Docker on Synology.

Sorry for the inconvenience, problem solved: I had forgotten to set a parameter. Thank you for this very good software.

Update

After installing php-xml, curl, and maybe another package (can't quite remember), it still isn't working on my Linode server.

I downloaded the code from my server to my local box (Pop!_OS 22.04).
I downloaded the nginx config file to my local box and changed only the path and URL, nothing else.

Everything works as intended.

Is there a package needed for this to work that maybe I have on my local box but not installed on the server, and not listed in the Wiki-How to host LibreX?
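
One way to answer this kind of question is to compare the PHP modules each machine loads. The sketch below is only an illustration: missing_modules is a hypothetical helper, and the module names in the usage line (curl, dom, xml) are an assumption based on the packages mentioned in this thread, not an official requirements list. Keep in mind that php -m reports the CLI configuration, which can differ from what php-fpm loads.

```shell
# Hypothetical helper: reads a module list on stdin (one name per line,
# as printed by `php -m`) and prints every required module that is absent.
missing_modules() {
  loaded="$(cat)"
  for m in "$@"; do
    # -F fixed string, -x whole line, -i case-insensitive, -q quiet
    printf '%s\n' "$loaded" | grep -qixF "$m" || printf '%s\n' "$m"
  done
}

# Usage (run on both the server and the local box, then compare):
#   php -m | missing_modules curl dom xml
```

An empty result on both machines suggests the problem is not a missing PHP extension.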

Fijxu commented

Most likely (I'd say 90% sure) your instance is getting rate limited. This is a very common problem when you host services like this on well-known VPS providers such as AWS, Linode, or Vultr. Since Linode and Vultr are the most affordable ones, more people use them than other VPS providers, so it's normal for search engines to rate limit whole ranges of Linode or Vultr IP addresses.

How do I see if I'm getting rate limited?

You will need to set up a MITM proxy. The most common program for this is mitmproxy, a CLI MITM proxy that is easy to use. It will let us see the requests being made and the error message that Google or Qwant is returning.

After you install mitmproxy, launch it with the mitmproxy command.

Now open a new session, go to lines 137 and 138 of the config.php file, uncomment them, and on line 137 set "ip:port" to "127.0.0.1:8080". Leave line 138 untouched.

Now add this line; it can go below the CURLOPT_PROXYTYPE line (following the syntax, of course):

CURLOPT_SSL_VERIFYPEER => false,

It should look like this:

"curl_settings" => array(
      CURLOPT_PROXY => "127.0.0.1:8080",
      CURLOPT_PROXYTYPE => CURLPROXY_HTTP,
      // Debugging only: lets mitmproxy intercept HTTPS. Remove after testing.
      CURLOPT_SSL_VERIFYPEER => false,
      CURLOPT_RETURNTRANSFER => true,
      ...

Save the file, search something on the LibreX instance and see the responses on mitmproxy.

If the responses show that you are rate limited or that your requests are flagged as automated, there is nothing more to do on this server. Choose another VPS provider or self-host.

@Fijxu Thank you very much for your reply.

Per your suggestion, I installed mitmproxy, started it up, and ran a search (for "Laravel").

Is it normal for a 302 Redirect response from Google? Is that how they "rate limit"?

You can see the results via the attached images.

MITMPROXY

mitmproxy

SEARCH RESULTS

laravel_search

Fijxu commented

Is it normal for a 302 Redirect response from Google? Is that how they "rate limit"?

Yes, that is the redirect to the "unusual traffic" webpage with a captcha.
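
For a quick check without mitmproxy, the same signal can be read straight from curl. This is a rough sketch, not a definitive test: classify_response and check_google are hypothetical helpers, and treating a redirect whose target contains /sorry/ (or a 429 status) as "rate limited" is an assumption about how Google's "unusual traffic" page usually appears.

```shell
# Hypothetical helper: $1 = HTTP status code, $2 = redirect target (may be empty).
# Prints "rate-limited", "ok", or "unknown".
classify_response() {
  case "$1" in
    429) echo "rate-limited" ;;
    301|302|303|307|308)
      case "$2" in
        */sorry/*) echo "rate-limited" ;;  # assumed path of the captcha page
        *) echo "unknown" ;;
      esac ;;
    200) echo "ok" ;;
    *) echo "unknown" ;;
  esac
}

# Requests a search page without following redirects and classifies the result.
# $1 = query, $2 = optional proxy URL (e.g. socks5h://127.0.0.1:9050).
check_google() {
  # ${2:+-x "$2"} expands to `-x <proxy>` only when a proxy was given.
  out="$(curl -s -o /dev/null --max-time 20 ${2:+-x "$2"} \
        -G --data-urlencode "q=$1" \
        -w '%{http_code} %{redirect_url}' \
        https://www.google.com/search)" || return 2
  classify_response "${out%% *}" "${out#* }"
}

# Usage, run on the VPS itself:
#   check_google "gnu project"
```

Running it on both the VPS and a home connection makes the difference obvious if the VPS address range is blocked.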

Ignore any IP addresses; it's just a throwaway server from Linode.

The image says it all.

@Fijxu Once again, thank you very much for your reply.

I guess I can't use my Linode VPS for my own LibreX instance.
That sucks.

Oh well.
Thanks for your help.

As weird as it sounds, proxying through Tor will just work. You can open it up on a port somewhere and configure it in curl_settings.

@codedipper can you give a few more details on the setup process? Sorry, haven't dealt with tor much.

From what I was able to see in googling, I guess I need to install the tor package, make sure port 9050 is open in firewall, and set curl to run through localhost:9050 or something like that?

Is that anywhere close to what needs to happen?

Fijxu commented

Yes, install tor, start the tor service (to connect it to the tor network), and then just modify config.php like this:

"curl_settings" => array(
      CURLOPT_PROXY => "127.0.0.1:9050",
      CURLOPT_PROXYTYPE => CURLPROXY_SOCKS5,
      ...

To be sure that it works, do curl -x socks5://127.0.0.1:9050 ip.me

Don't open any firewall ports, they are not necessary. I didn't test this by the way.
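
If you want something a bit more explicit than eyeballing the IP, a small check along these lines may help. It is an untested sketch under assumptions: reports_tor and check_tor_proxy are hypothetical helpers, and it assumes the Tor Project's check endpoint at https://check.torproject.org/api/ip answers with JSON containing an "IsTor" field. The socks5h scheme asks curl to resolve DNS through the proxy as well.

```shell
# Hypothetical helper: succeeds when the given JSON body reports IsTor as true.
reports_tor() {
  printf '%s' "$1" | grep -Eq '"IsTor"[[:space:]]*:[[:space:]]*true'
}

# Fetches the check endpoint through the proxy and inspects the answer.
# $1 = proxy URL, defaulting to the local Tor SOCKS port used in this thread.
check_tor_proxy() {
  proxy="${1:-socks5h://127.0.0.1:9050}"
  body="$(curl -s --max-time 30 -x "$proxy" https://check.torproject.org/api/ip)" || return 2
  reports_tor "$body"
}

# Usage:
#   check_tor_proxy && echo "traffic exits through Tor" || echo "not using Tor"
```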

References to CURLOPT_PROXYTYPE & CURLPROXY_SOCKS5: https://curl.se/libcurl/c/CURLOPT_PROXYTYPE.html

@codedipper can you give a few more details on the setup process? Sorry, haven't dealt with tor much.

From what I was able to see in googling, I guess I need to install the tor package, make sure port 9050 is open in firewall, and set curl to run through localhost:9050 or something like that?

Is that anywhere close to what needs to happen?

Yes, but don't publicly expose any ports.
Exact configuration I use on my instance:
/etc/systemd/system/tor-proxy.service (make sure to systemctl enable!):

[Unit]
Description=Anonymizing overlay network for TCP
After=network-online.target nss-lookup.target
PartOf=tor.service
ReloadPropagatedFrom=tor.service

[Service]
Type=notify
NotifyAccess=all
PIDFile=/run/tor-proxy/tor.pid
PermissionsStartOnly=yes
ExecStartPre=/usr/bin/tor -f /etc/tor/torrc-proxy --verify-config
ExecStart=/usr/bin/tor -f /etc/tor/torrc-proxy
ExecReload=/bin/kill -HUP ${MAINPID}
KillSignal=SIGINT
TimeoutStartSec=300
TimeoutStopSec=60
Restart=on-failure
LimitNOFILE=65536

# Hardening
NoNewPrivileges=yes
PrivateTmp=yes
PrivateDevices=yes
ProtectHome=yes
ProtectSystem=full
ReadOnlyDirectories=/
# We would really like to restrict the next item to [..]/%i but we can't,
# as systemd does not support that yet.  See also #781730.
ReadWriteDirectories=-/var/lib/tor-proxy/
ReadWriteDirectories=-/run
CapabilityBoundingSet=CAP_SETUID CAP_SETGID CAP_NET_BIND_SERVICE CAP_DAC_READ_SEARCH

[Install]
WantedBy=multi-user.target

Make sure to create the directories and run these commands on them:

chown -R debian-tor: 
chmod -R u+rwX,og-rwx 

/etc/tor/torrc-proxy:

User debian-tor
DataDirectory /var/lib/tor-proxy
SOCKSPort 9050

librex/config.php:

        "curl_settings" => array(
            CURLOPT_PROXY => "127.0.0.1:9050",
            CURLOPT_PROXYTYPE => CURLPROXY_SOCKS5,
            ...

This will vary on different systems, blah blah blah, you know what you're doing.
I use multiple systemd services because I have multiple instances of Tor running for bridge and onion services, and DataDirectory because otherwise it will try to create one in /root/.tor.
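
For anyone following along, enabling the unit above might look like the sketch below. The run and enable_tor_proxy helpers are hypothetical; the unit name tor-proxy.service is taken from the file path given earlier. Setting DRY_RUN=1 prints the commands instead of executing them, which is handy for reviewing them first.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Reloads systemd so it sees the new unit, enables and starts it,
# then reports whether the service is active.
enable_tor_proxy() {
  run systemctl daemon-reload
  run systemctl enable --now tor-proxy.service
  run systemctl is-active tor-proxy.service
}

# Usage (as root):
#   DRY_RUN=1 enable_tor_proxy   # preview
#   enable_tor_proxy             # apply
```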