DOAJ/doaj

Internal Server Error on article upload

Closed this issue · 10 comments

From email 2018-01-11:

Two different users have told us they are facing internal server errors when trying to upload metadata. This has happened at around 12 midday today. We know for sure that the xmls they are trying to upload are correct as they tried to upload something that was already indexed to double test.
Can you see any sluggishness or shed some light?

'12 midday' is probably 1100 UTC, since the email got to me at 1138 UTC, and @ClaraDOAJ is probably not clairvoyant. Probably.

Another email from today at 12.07 pm CET.
There might be more. I am going through feedback emails now

Hello,
I get an "internal server error", when I try to upload our journal data in xml.
Can you let me know whether this is an internal server error from your side? I can provide the file as well.
Best,

Hello, these are the two accounts I have found so far
20759517
18410413

Here's the log entries that seem to correspond to the messages:

2018-01-12 12:24:51,170 - portality.core - ERROR - Exception on /publisher/uploadfile [POST]
Traceback (most recent call last):
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/flask_cors/extension.py", line 188, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/newrelic/hooks/framework_flask.py", line 108, in _nr_wrapper_Flask_handle_exception_
    return wrapped(*args, **kwargs)
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/newrelic/hooks/framework_flask.py", line 45, in _nr_wrapper_handler_
    return wrapped(*args, **kwargs)
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/flask_login.py", line 515, in decorated_view
    return fn(*args, **kwargs)
  File "/home/cloo/repl/production/doaj/src/doaj/portality/decorators.py", line 68, in decorated_view
    return fn(*args, **kwargs)
  File "/home/cloo/repl/production/doaj/src/doaj/portality/decorators.py", line 95, in decorated_view
    return fn(*args, **kwargs)
  File "/home/cloo/repl/production/doaj/src/doaj/portality/view/publisher.py", line 99, in upload_file
    IngestArticlesBackgroundTask.submit(job)
  File "/home/cloo/repl/production/doaj/src/doaj/portality/tasks/ingestarticles.py", line 363, in submit
    ingest_articles.schedule(args=(background_job.id,), delay=10)
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/huey/api.py", line 105, in schedule
    return self.enqueue(cmd)
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/huey/api.py", line 217, in enqueue
    self._enqueue(registry.get_message_for_task(task))
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/huey/api.py", line 162, in inner
    wrap_exception(exc_class)
  File "/home/cloo/repl/production/doaj/local/lib/python2.7/site-packages/huey/utils.py", line 14, in wrap_exception
    raise new_exc_class('%s: %s' % (exc_class.__name__, exc))
QueueWriteException: KeyError: 'is_async'

I can't, however, find the corresponding files on the server, or errored background tasks in the list. This may mean that the upload completed successfully despite the error, or it failed, didn't report correctly and cleaned up after itself.

@ClaraDOAJ please can you request the file from the publisher?

Sure, I have asked them, let's see what they say.

Thanks!

1515657279_doaj.xml.zip
Hello, here you have one of the xmls.

Thanks for the file Clara. Unfortunately, I couldn't reproduce it in my dev environment.

This is proving difficult to track down. There's lots of those exceptions in the logs, so either it's intermittent or lots of people see the internal server error and are ignoring it. However, the key it's failing on,is_async, doesn't appear in our or the task scheduler's codebase (hooray for open source!). What I can do for now is catch the error so it doesn't disrupt the site and log it in the hope of understanding it fully down the line.

I have 3 tasks:

  • Catch and log the QueueWriteException in ``publisher.py``` hidden from the user.
  • Determine whether the uploads are succeeding regardless.
  • Reproduce and fix the problem, which could be in redis, huey, the DOAJ, or some dependency.

hello

I got an internal server message today at 8.34 am CET by user 23079266

The exception handling fix has been released now, the users shouldn't be seeing internal server errors from now on. There's more work to do to to diagnose the issue.