Watts-Lab/surveyor

AWS MTurk Survey Site Load Issue

Opened this issue · 3 comments

Issue Description

  • On May 23rd, 2022, 1500 people were sent 3 surveys. Soon after, survey respondents complained of app server downtime and unresponsive pages.
  • The associated DocumentDB still logged their responses.
  • Based on @sumants-dev's initial analysis, the problem appears to be that DocumentDB couldn’t handle so many survey-result connections at once (1500*3).

Possible Solutions

  • Rate Management:
    • Establish ceilings for the number of surveys sent in a given stretch of time.
    • Force requests to occur no faster than a certain rate.
    • How/where to encode this (?)
  • DB Size: Add instances so that DocumentDB can better handle this.

Resources

  • DocumentDB configuration (?)
  • Code for deploying surveys (?)

TODO

  • Evaluate Rate Management solution
  • Evaluate DB scale solution
  • Document DocumentDB configuration/deployment
  • Document Survey Deployment code

Based on initial discussions, it looks like Rate Management might be the solution to start with. Ideally, though, we need to space requests out across an hour or more, not just a few minutes.
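
To make the "space requests out across an hour" idea concrete, here is a minimal sketch of a batch planner. It is not code from surveyor: `planBatches`, the `Batch` shape, and the batch size of 25 are all assumptions for illustration. How a batch of invitations translates into concurrent DocumentDB connections would still need to be measured.

```typescript
// Hypothetical helper (not part of surveyor): spread `total` invitations
// evenly over a time window instead of sending them all at once.
interface Batch {
  delayMs: number; // when to send, relative to the start of the window
  size: number;    // how many invitations go out in this batch
}

function planBatches(total: number, windowMs: number, maxPerBatch: number): Batch[] {
  if (total <= 0 || windowMs < 0 || maxPerBatch <= 0) {
    return [];
  }
  const batchCount = Math.ceil(total / maxPerBatch);
  // Dividing by batchCount keeps the last batch inside the window.
  const intervalMs = windowMs / batchCount;
  const batches: Batch[] = [];
  let remaining = total;
  for (let i = 0; i < batchCount; i++) {
    const size = Math.min(maxPerBatch, remaining);
    batches.push({ delayMs: Math.round(i * intervalMs), size });
    remaining -= size;
  }
  return batches;
}

// 1500 respondents over one hour, at most 25 per batch (25 is an assumed value).
const plan = planBatches(1500, 60 * 60 * 1000, 25);
console.log(`${plan.length} batches`, plan.slice(0, 2));
```

With these assumed numbers the plan comes out to 60 batches, one per minute. Each `delayMs` could be handed to whatever scheduler ends up sending the invitations.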

Question: How many requests can it currently handle?

Pasting Email from Sumant:

But I basically set up alerts for the mongo db for when the db connections go to the limit of 30. So if you go to the monitoring dashboard for the DocumentDB, the database connections are at the limit of 30 for when there were server issues.

The database is the DocumentDB.

The code base communicating with surveyor and the database is in surveyor. Check out the databases folder for the connection in surveyor. I think the fix is likely to remove all the awaits for inserting into the database and make the inserts non-blocking (use .then syntax). What’s happening is that the database can only do 30 connections at a time, and the driver will typically keep retrying until the insert is submitted. This is fine. However, because the code uses await, the user is left waiting for the eventual submission to go through. This can be done in the background asynchronously.
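
The background-insert idea from the email could be sketched roughly as follows. This is an illustration under assumptions, not the surveyor implementation: `BackgroundWriter`, `InsertFn`, and the cap of 20 are hypothetical names and values, and the `insert` function stands in for whatever driver call the databases folder actually uses. The sketch also caps in-flight writes below the 30-connection limit, which goes slightly beyond what the email proposes.

```typescript
// Hypothetical sketch: accept documents immediately and write them in the
// background, never running more than `maxConcurrent` inserts at once.
type InsertFn<T> = (doc: T) => Promise<void>;
type ErrorFn<T> = (err: unknown, doc: T) => void;

class BackgroundWriter<T> {
  private queue: T[] = [];
  private active = 0;
  private idleWaiters: Array<() => void> = [];
  private readonly insert: InsertFn<T>;
  private readonly maxConcurrent: number;
  private readonly onError: ErrorFn<T>;

  constructor(insert: InsertFn<T>, maxConcurrent: number, onError: ErrorFn<T> = () => {}) {
    this.insert = insert;
    this.maxConcurrent = maxConcurrent;
    this.onError = onError;
  }

  // Returns immediately, so the request handler does not wait for the write.
  submit(doc: T): void {
    this.queue.push(doc);
    this.drain();
  }

  // Resolves once the queue is empty and no write is in flight.
  whenIdle(): Promise<void> {
    if (this.active === 0 && this.queue.length === 0) {
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => this.idleWaiters.push(resolve));
  }

  private drain(): void {
    while (this.active < this.maxConcurrent && this.queue.length > 0) {
      const doc = this.queue.shift() as T;
      this.active++;
      const done = () => {
        this.active--;
        this.drain();
        if (this.active === 0 && this.queue.length === 0) {
          this.idleWaiters.splice(0).forEach((resolve) => resolve());
        }
      };
      // .then-style chaining: failures go to onError, and the slot is freed either way.
      this.insert(doc)
        .catch((err) => this.onError(err, doc))
        .then(done, done);
    }
  }
}

// Usage with a fake insert that takes about 5 ms (a stand-in for the real driver call).
let inFlight = 0;
let peak = 0;
const saved: number[] = [];
const fakeInsert = async (doc: number): Promise<void> => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  saved.push(doc);
  inFlight--;
};

const writer = new BackgroundWriter<number>(fakeInsert, 20); // 20 is an assumed cap below 30
for (let i = 0; i < 100; i++) {
  writer.submit(i);
}
writer.whenIdle().then(() => console.log(`saved ${saved.length}, peak in flight ${peak}`));
```

One trade-off to keep in mind: with background writes the respondent sees a response before the result is durably stored, so failed inserts need to be logged or retried somewhere (the `onError` hook here) rather than silently dropped.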

Here he references
a) database connection limit

Also: There was a discussion on our Slack about adding 'instances'. I'm not sure how that's different from the 'database connection limit', but that's something we can explore at our meeting tomorrow.
@rivera-lanasm can you comment on this?

@rivera-lanasm — I think we are done with this, right? Can it be closed?