Internal server error for a NYTimes URL
Running the following command, I get HTTP status code 500 (Internal Server Error):
curl --verbose \
--header "Content-Type: text/plain" \
--data 'https://nyti.ms/1NuB0WJ' \
'https://translate.manubot.org/web'
Full output:
* Connected to translate.manubot.org (35.221.11.188) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=translate.manubot.org
* start date: Sep 23 15:02:01 2021 GMT
* expire date: Dec 22 15:02:00 2021 GMT
* subjectAltName: host "translate.manubot.org" matched cert's "translate.manubot.org"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
> POST /web HTTP/1.1
> Host: translate.manubot.org
> User-Agent: curl/7.74.0
> Accept: */*
> Content-Type: text/plain
> Content-Length: 23
>
* upload completely sent off: 23 out of 23 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Server: nginx/1.14.0 (Ubuntu)
< Date: Sat, 06 Nov 2021 21:00:58 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 21
< Connection: keep-alive
<
* Connection #0 to host translate.manubot.org left intact
translate.manubot.org is a public translation server instance we host that is up to date with 9831fc3.
I'll try to get the internal server logs when we have this error.
@dongbohu how do we see the logs for the translation-server process that is managed with supervisor?
@dhimmel: According to /etc/supervisor/conf.d/translation-server.conf:
- standard output is in /var/log/supervisor/translation-server.log
- standard error is in /var/log/supervisor/translation-server.err
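A minimal sketch for checking both logs while reproducing the error. It assumes the default paths from the supervisor config above. The helper name `show_translation_server_logs` and the `LOG_DIR` override are hypothetical and exist only so the paths can be changed.

```shell
# Hypothetical helper: print the last N lines (default 50) of translation-server's
# stdout and stderr logs. The default directory assumes the paths listed in
# /etc/supervisor/conf.d/translation-server.conf; set LOG_DIR if yours differ.
show_translation_server_logs() {
  log_dir="${LOG_DIR:-/var/log/supervisor}"
  lines="${1:-50}"
  for f in "$log_dir/translation-server.log" "$log_dir/translation-server.err"; do
    echo "==> $f <=="
    tail -n "$lines" "$f"
  done
}
```

For example, `sudo sh -c '. ./logs.sh; show_translation_server_logs 100'` (reading the logs may require root). If the supervisor program is really named `translation-server` (assumed here from the conf filename), `supervisorctl tail -f translation-server stderr` follows the stderr stream live.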
Works for us.
Now failing after a few tries from AWS, but still working for a local install. It's possible they're rate-limiting.
TypeError: Cannot read property 'replace' of null
at addHighwireMetadata (eval at <anonymous> (/var/task/src/translation/sandboxManager.js:70:4), <anonymous>:471:59)
at completeItem (eval at <anonymous> (/var/task/src/translation/sandboxManager.js:70:4), <anonymous>:214:2)
at eval (eval at <anonymous> (/var/task/src/translation/sandboxManager.js:70:4), <anonymous>:343:4)
at /var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:384:8
at Zotero.Translate.Import._runHandler (/var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1128:32)
at run (/var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:188:23)
at /var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:250:6
at new Promise (<anonymous>)
at Object._itemDone (/var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:248:11)
at Object._itemDone (/var/task/src/translation/sandboxManager.js:96:17)
I suspect they're serving something different in the cases where it's failing, but this looks like a bug in the Embedded Metadata translator: if I skip the call to addHighwireMetadata(), it succeeds. We'll investigate.
They're definitely also rate-limiting, though. After enough requests, the server starts getting a 403 from nytimes.com and then returns a 500. So it's possible that's all you're seeing.
OK, rate-limiting and page differences aside, there was a regression in the Embedded Metadata translator from a couple of weeks ago. I've pushed a fix, so if you were seeing the Cannot read property 'replace' of null error above, pull the latest translators and it should be fixed. If you're getting a 403, there's nothing we can do about that.
Thanks for reporting.
Thanks a lot @dstillman for zotero/translators@0d435d8! Our CI builds are now back to 🟢.
pull the latest translators and it should be fixed
I did this using git submodule update --remote --merge, which updated the following submodules beyond the commits currently specified by zotero/translation-server:
Submodule path 'modules/translate': merged in 'a9308c0e8632846ca2dc069a1b72db0a33f99ca6'
Submodule path 'modules/translators': merged in '0d435d8a952639d4e7489263b3a40c89377ecd31'
Submodule path 'modules/zotero-schema': merged in '97e0a8efa2cb2cf6c9853ceca334ec56180a9df0'
Is that okay, or is it best to just fast-forward modules/translators, since the other two should be updated in lock-step with zotero/translation-server?
BTW we actually still get some CI failures:
But since it only happened in the later jobs, I bet it's rate-limiting, as @dstillman mentioned. That's one reason we should look into caching.
If you're getting a 403, nothing we can do about that.
I'm not actually sure how to see that. In the stderr logs for translation-server, this is what the failure looks like:
2021-11-07 10:31:22,151:
InternalServerError: An error occurred retrieving the document
at Object.throw (/home/translate/translation-server/node_modules/koa/lib/context.js:97:11)
at module.exports.WebSession.handleURL (/home/translate/translation-server/src/webSession.js:219:19)
at <anonymous>
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
@dstillman, where'd you see the TypeError: Cannot read property 'replace' of null log?
I wouldn't use NYT for CI, since they rate-limit. Use something that will work reliably.
Is that okay, or is it best to just fast-forward modules/translators since the other two should be updated in lock-step with zotero/translation-server?
Yeah, definitely don't update the others. Just use git pull origin master in the translators submodule.
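A minimal sketch of that advice, assuming it's run against the root of a translation-server checkout. It pulls only modules/translators and leaves modules/translate and modules/zotero-schema at the commits translation-server pins. The function name `update_translators_only` is hypothetical.

```shell
# Hypothetical helper: update only the translators submodule, leaving
# modules/translate and modules/zotero-schema untouched.
# Argument: path to the translation-server checkout (defaults to the current dir).
update_translators_only() {
  repo_root="${1:-.}"
  # Same as running `git pull origin master` from inside modules/translators.
  git -C "$repo_root/modules/translators" pull origin master
}
```

Afterwards, `git status` in the parent repository shows modules/translators with new commits. Committing that change pins the newer translators commit. Leaving it uncommitted keeps the update local to this checkout.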
where'd you see the TypeError: Cannot read property 'replace' of null log?
It's just in the stdout from the server, which includes Zotero debug output (with lines beginning with, e.g., (3)(+0000010):).
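A minimal sketch for finding translator errors like the TypeError above, together with the stack lines that follow them, in the stdout log. It assumes the supervisor stdout path mentioned earlier in this thread. The function name and the default of 12 context lines are arbitrary choices for illustration.

```shell
# Hypothetical helper: print each line containing "TypeError" from
# translation-server's stdout log, with line numbers, plus the next N lines
# (default 12), which usually hold the stack trace.
find_translator_errors() {
  log_file="${1:-/var/log/supervisor/translation-server.log}"
  context="${2:-12}"
  grep -n -A "$context" 'TypeError' "$log_file"
}
```

This complements the stderr log. There, an upstream failure such as a 403 only shows up as "An error occurred retrieving the document".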
Hi, I'm also getting 500s or 403s from NYTimes with recent versions of this repo. If fixing this isn't a priority (which I'd understand), you might want to use a different URL in the README, since the current example is a NYTimes URL that doesn't work reliably.
Perhaps replace it with https://www.theverge.com/23727238/net-neutrality-history-fcc-legislation (though I can't vouch for the accuracy of this article on recent net-neutrality developments).