nginx/njs

PCRE2 Token Match Errors

Closed this issue · 11 comments

In the pre-release version of NJS (repo version), there are some errors appearing in logs when certain types of string and regex match occur.

I’ve started to notice this in the logs.
2023/10/12 15:36:25 [error] 2221197#2221197: *356823 js: Decoding
variable error: InternalError: pcre2_match() failed: bad offset value

Since that’s referring to an internal C code call to pcre2_match, I have no way to tell what it’s actually referring to at the C level.

I was able to find the log entry related to the problem, though. The
request URI was this:
/?s=%E5%A4%A7%E5%A5%96GOS%E8%80%81%E8%99%8E%E6%9C%BA%E9%80%81%E5%BD%A9%E9%87%91%somsething.com///

That’s a bogus request, obviously, but PCRE seems to have failed entirely when trying to decode decode that. Basically I’m passing it through decodeURIComponent first and then it does a bit more work.

I think that the code that it’s choking on is this:

// Collapse multiple slashes
if(decoded_uri.indexOf('//') != -1 || decoded_uri.indexOf('\\\\') != -1){
 decoded_uri = decoded_uri.replace(/([\/\\]){2,}/g, '$1');
 if(decoded_uri.indexOf('http') != -1){
 decoded_uri = decoded_uri.replace(/(https?):\//i, '$1://');
 }
}

As far as I can tell, the problem occurs with the first decoded_uri.replace call. The problem is reproduceable 100% of the time that I use that sort of URL.

Please let me know if there is anything further that I can provide to help identify this issue.

xeioex commented

Hi @lancedockins,

Could you please provide me with the full but shortest script that fails?
I am trying the following script, but it fails before the RegExp part.

var uri = '/?s=%E5%A4%A7%E5%A5%96GOS%E8%80%81%E8%99%8E%E6%9C%BA%E9%80%81%E5%BD%A9%E9%87%91%somsething.com///';
var decoded_uri = decodeURIComponent(uri);

// Collapse multiple slashes
if(decoded_uri.indexOf('//') != -1 || decoded_uri.indexOf('\\\\') != -1){
 decoded_uri = decoded_uri.replace(/([\/\\]){2,}/g, '$1');
 if(decoded_uri.indexOf('http') != -1){
 decoded_uri = decoded_uri.replace(/(https?):\//i, '$1://');
 }
}
./build/njs  t.js
Thrown:
URIError: malformed URI
    at decodeURIComponent (native)
    at main (t.js:2)
xeioex commented

@lancedockins,

Also, please ensure that you are using the latest revision, which includes f781317.

Sorry. The URL encoding seems to have partially gone away in the sample URL that I provided. Try this:

/?s=%E8%B6%A3%E6%8B%8D%E6%8D%95%E9%B1%BC%E6%B8%B8%E6%88%8F%E9%A6%96%E5%AD%98%E6%B4%BB%E5%8A%A8%20www.qpyl18.com%2F%2F%2F

Also, I should note that I've now also been able to tie this type of URL and code to a CPU and memory spike that has been crashing Nginx on the servers that experience these types of URLs.

I am now able to reproduce this problem (both the PCRE token match error and the CPU/memory spike + crash) pretty consistently. I tried to get debug output but it wasn't useful. It just told me that it dispatched the js_set function that runs the code above and the code works without issue until it gets to this same block of code that I provided above. Once it gets to that step, CPU and memory use spike and the server often runs out of memory even if it has many GB's free of memory. I've seen this happen with servers that have 32gb of RAM with around 27gb free. So whatever is going wrong in the PCRE or memory side of things in the code underneath this JS seems to be getting stuck in some way that exhausts the server resources.

Please let me know if you need anything else from me.

Oh... also, I should add that my test environment had every commit in the repo in it other than the one that you posted in the last 24 hours. It was a custom compile of Nginx.

OS is CentOS 9 Stream

Nginx build info is:
configure arguments: --with-cc-opt='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC -DTCP_FASTOPEN=23' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed' --with-openssl-opt='enable-tls1_3 enable-ec_nistp_64_gcc_128 no-nextprotoneg no-weak-ssl-ciphers no-ssl3 enable-ktls' --prefix=/usr/share/nginx --conf-path=/path/to/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --user=nobody --group=nobody --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-openssl=./openssl-3.1.3 --with-zlib=./zlib-ng --with-pcre=./pcre2-10.42 --with-http_realip_module --with-http_auth_request_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-file-aio --with-http_xslt_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --without-http_autoindex_module --without-http_ssi_module --without-http_scgi_module --without-http_uwsgi_module --add-module=./ngx_brotli --add-module=./ngx_http_geoip2_module-3.4 --add-module=./njs-master/nginx

I have been able to replicate the Nginx crash problem with String.replaceAll as well. As soon as I take the /g modifier off, the problem stops regardless of whether I use String.replace or String.replaceAll. Unfortunately, the need here is to replace all instances of the pattern... not just the first one. And since this seems to be able to crash Nginx anyway, it seems like a pretty significant problem

xeioex commented

Hi @lancedockins,

Thanks for reporting the problem, I was able to reproduce and fix it.
It is not related to the previous optimization that I made, but it is a new bug that became visible.

Excellent. If you need me to test it when it's out, I'm happy to. We have at least 3-4 servers that are actively suffering from this bug on a routine basis. One of them is experiencing an Nginx CPU/memory-related crash at least every 2 days if not a few times in a row when this occurs.

xeioex commented

Hi @lancedockins,

The patch also fixes the memory and cpu hog, that was the result of the bug while iterating over string with a global regexp.

I deployed this on a server that has routinely run into this issue and can confirm that this patch seems to solve the problem. If I notice any other variations on this problem, I'll let you know.

Many thanks for your help @xeioex

xeioex commented

@lancedockins,

Thank you for the feedback, it helps.

Feel free to report the memory consumption changes as well.