zotero/translation-server

Internal server error for a NYTimes URL

Closed this issue · 11 comments

Running the following command, I get status code 500 (Internal Server Error):

curl --verbose \
  --header "Content-Type: text/plain" \
  --data 'https://nyti.ms/1NuB0WJ' \
  'https://translate.manubot.org/web'

Full output:

* Connected to translate.manubot.org (35.221.11.188) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=translate.manubot.org
*  start date: Sep 23 15:02:01 2021 GMT
*  expire date: Dec 22 15:02:00 2021 GMT
*  subjectAltName: host "translate.manubot.org" matched cert's "translate.manubot.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
> POST /web HTTP/1.1
> Host: translate.manubot.org
> User-Agent: curl/7.74.0
> Accept: */*
> Content-Type: text/plain
> Content-Length: 23
> 
* upload completely sent off: 23 out of 23 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Server: nginx/1.14.0 (Ubuntu)
< Date: Sat, 06 Nov 2021 21:00:58 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 21
< Connection: keep-alive
< 
* Connection #0 to host translate.manubot.org left intact

translate.manubot.org is a public translation-server instance we host; it is up to date with commit 9831fc3.

I'll try to get the internal server logs for when this error occurs.

@dongbohu how do we see the logs for the translation-server process that is managed with supervisor?

@dhimmel: According to /etc/supervisor/conf.d/translation-server.conf:

  • standard output is in /var/log/supervisor/translation-server.log
  • standard error is in /var/log/supervisor/translation-server.err
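A minimal sketch for reading both files at once, assuming the paths above. show_translation_logs and the LOG_DIR override are hypothetical names, not part of translation-server, and the function has to run as a user that can read /var/log/supervisor. If the supervisor program is named translation-server (an assumption based on the .conf file name), supervisorctl tail translation-server stderr should show the same stderr output.

```shell
# Hypothetical helper: print the last N lines (default 50) of translation-server's
# stdout and stderr logs. File names come from the supervisor config quoted above;
# LOG_DIR is a hypothetical override so the helper can be pointed elsewhere.
LOG_DIR="${LOG_DIR:-/var/log/supervisor}"

show_translation_logs() {
  n="${1:-50}"
  for f in "$LOG_DIR/translation-server.log" "$LOG_DIR/translation-server.err"; do
    if [ -r "$f" ]; then
      printf '==> %s <==\n' "$f"
      tail -n "$n" "$f"
    else
      # Report missing or unreadable files on stderr instead of aborting.
      printf 'cannot read %s\n' "$f" >&2
    fi
  done
}
```

For example, show_translation_logs 100 prints the last 100 lines of each file.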

Now failing after a few tries from AWS, but still working for a local install. It's possible they're rate-limiting.

    TypeError: Cannot read property 'replace' of null
        at addHighwireMetadata (eval at <anonymous> (/var/task/src/translation/sandboxManager.js:70:4), <anonymous>:471:59)
        at completeItem (eval at <anonymous> (/var/task/src/translation/sandboxManager.js:70:4), <anonymous>:214:2)
        at eval (eval at <anonymous> (/var/task/src/translation/sandboxManager.js:70:4), <anonymous>:343:4)
        at /var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:384:8
        at Zotero.Translate.Import._runHandler (/var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1128:32)
        at run (/var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:188:23)
        at /var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:250:6
        at new Promise (<anonymous>)
        at Object._itemDone (/var/task/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:248:11)
        at Object._itemDone (/var/task/src/translation/sandboxManager.js:96:17)

I suspect they're serving something different in the cases where it's failing, but this looks like a bug in the Embedded Metadata translator: if I skip the call to addHighwireMetadata(), the translation succeeds. We'll investigate.

They're definitely also rate-limiting, though. After enough requests, the translation server starts getting a 403 from nytimes.com and then returns a 500, so it's possible that's all you're seeing.

OK, rate-limiting and page differences aside, there was a regression in the Embedded Metadata translator from a couple of weeks ago. I've pushed a fix, so if you were seeing the Cannot read property 'replace' of null error above, pull the latest translators and it should be fixed. If you're getting a 403, there's nothing we can do about that.

Thanks for reporting.

Thanks a lot @dstillman for zotero/translators@0d435d8! Our CI builds are now back to 🟢.

> pull the latest translators and it should be fixed

I did this with git submodule update --remote --merge, which advanced these submodules beyond the commits currently pinned by zotero/translation-server:

Submodule path 'modules/translate': merged in 'a9308c0e8632846ca2dc069a1b72db0a33f99ca6'
Submodule path 'modules/translators': merged in '0d435d8a952639d4e7489263b3a40c89377ecd31'
Submodule path 'modules/zotero-schema': merged in '97e0a8efa2cb2cf6c9853ceca334ec56180a9df0'

Is that okay, or is it better to fast-forward only modules/translators, since the other two should be updated in lockstep with zotero/translation-server?

BTW, we actually still get some CI failures.

But since it only happened in the later jobs, I bet it's rate-limiting, as @dstillman mentioned. That's one more reason to look into caching.
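A minimal sketch of what such caching could look like in CI, assuming responses are stored in a local directory keyed by the SHA-256 of the URL (GNU coreutils' sha256sum; on macOS, shasum -a 256 is the equivalent). cached_translate, fetch_translation, CACHE_DIR, and TRANSLATION_SERVER are hypothetical names; the curl call mirrors the request at the top of this issue. Only HTTP 200 responses are cached, so a 500 caused by an upstream 403 is never stored.

```shell
# Hypothetical CI helper: reuse earlier translation results instead of
# re-requesting rate-limited sites such as nytimes.com in every job.
CACHE_DIR="${CACHE_DIR:-.translation-cache}"
TRANSLATION_SERVER="${TRANSLATION_SERVER:-https://translate.manubot.org}"

# POST one URL to the server's /web endpoint; print the body and fail on non-200.
fetch_translation() {
  tmp_body=$(mktemp)
  status=$(curl --silent --show-error \
    --header "Content-Type: text/plain" \
    --data "$1" \
    --output "$tmp_body" --write-out '%{http_code}' \
    "$TRANSLATION_SERVER/web")
  if [ "$status" = "200" ]; then
    cat "$tmp_body"
    rm -f "$tmp_body"
    return 0
  fi
  rm -f "$tmp_body"
  printf 'translation-server returned HTTP %s for %s\n' "$status" "$1" >&2
  return 1
}

cached_translate() {
  mkdir -p "$CACHE_DIR"
  key=$(printf '%s' "$1" | sha256sum | cut -d ' ' -f 1)
  cache_file="$CACHE_DIR/$key.json"
  if [ -s "$cache_file" ]; then
    cat "$cache_file"   # cache hit: no request to the translation server
    return 0
  fi
  # Cache miss: write to a temp file first so failures leave no partial entry.
  if fetch_translation "$1" > "$cache_file.tmp"; then
    mv "$cache_file.tmp" "$cache_file"
    cat "$cache_file"
  else
    rm -f "$cache_file.tmp"
    return 1
  fi
}
```

In CI, persisting CACHE_DIR between runs (with whatever cache mechanism the CI provider offers) would mean each URL hits the live site roughly once.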

> If you're getting a 403, nothing we can do about that.

I'm not actually sure how to tell whether that's happening. In the stderr logs for translation-server, this is what the failure looks like:

2021-11-07 10:31:22,151: 
  InternalServerError: An error occurred retrieving the document
      at Object.throw (/home/translate/translation-server/node_modules/koa/lib/context.js:97:11)
      at module.exports.WebSession.handleURL (/home/translate/translation-server/src/webSession.js:219:19)
      at <anonymous>
      at process._tickDomainCallback (internal/process/next_tick.js:228:7)

@dstillman where'd you see the TypeError: Cannot read property 'replace' of null log?

I wouldn't use NYT for CI, since they rate-limit. Use something that will work reliably.

> Is that okay, or is it best to just fast-forward modules/translators since the other two should be updated in lock-step with zotero/translation-server?

Yeah, definitely don't update the others. Just use git pull origin master in the translators submodule.
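A minimal sketch of that advice: advance only modules/translators and leave modules/translate and modules/zotero-schema at the commits zotero/translation-server pins. The function name update_translators is hypothetical, and --ff-only is an extra safeguard added here (it refuses to create a merge commit) rather than part of the suggestion above.

```shell
# Hypothetical helper: fast-forward just the translators submodule to
# origin/master, leaving the other submodules untouched.
update_translators() {
  dir="${1:-modules/translators}"
  # --ff-only (our addition) aborts instead of creating a merge commit.
  git -C "$dir" pull --ff-only origin master || return 1
  git -C "$dir" log -1 --format='translators now at %h %s'
}
```

Afterwards, git status in the superproject should list modules/translators as modified (new commits), which can then be committed to pin the newer translators.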

> where'd you see the TypeError: Cannot read property 'replace' of null log?

It's just in the stdout from the server, which includes Zotero debug output (with lines beginning with, e.g., (3)(+0000010):).
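To find such errors after the fact, a sketch that pulls each TypeError line and the stack-frame lines right after it out of that stdout log. It assumes the stdout path from the supervisor config quoted earlier and that frames are indented lines starting with "at", as in the trace above; find_type_errors is a hypothetical name.

```shell
# Hypothetical helper: print each "TypeError:" line from translation-server's
# stdout log, plus the indented "at ..." frames that immediately follow it,
# each prefixed with its line number in the log.
find_type_errors() {
  log="${1:-/var/log/supervisor/translation-server.log}"
  awk '
    /TypeError:/                 { print NR ": " $0; in_trace = 1; next }
    in_trace && /^[ \t]+at /     { print NR ": " $0; next }
                                 { in_trace = 0 }
  ' "$log"
}
```

The line numbers make it easy to jump to the surrounding Zotero debug output (the (3)(+0000010): lines) with a pager.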

Hi, I'm also getting 500s or 403s from the NYTimes with recent versions of this repo. If fixing this isn't a priority (which I'd understand), you might want to use a different URL in the README, since the current example is an NYTimes URL that doesn't work reliably.
Perhaps use this one instead: https://www.theverge.com/23727238/net-neutrality-history-fcc-legislation (though I can't vouch for the accuracy of the article, which covers recent net-neutrality developments).