NAICNO/Jobanalyzer

Basic DoS protections on the server


Once data from the NRIS systems are online, there's a risk that carelessly constructed queries will generate data sets that exhaust server memory: anything that asks for a month of data from 1300 nodes could easily bring the server down. We need a way to bring the service back up when that happens (#337), but there should probably also be safeguards against creating overly large result sets in the first place. The server is not built for that.

Obvious measures, though not all of them are easy:

  • upper limit on the number of hosts in a query
  • upper limit on the number of records read in a query
  • those limits need not be the same for every type of query, especially if some types of queries can be smarter about how they use memory ("profile" is hard and could be required to be more narrowly focused; "jobs" is easier)
  • memory budget per query
  • a limit on the number of queries in flight
  • nonlinear data structures can be used (trees or lists-of-chunks, not complete linear arrays)
  • filtering can be applied in a different order, winnowing the data set as it is read rather than after all of it has been read
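The first few limits (hosts per query, a different host cap per query type, and the number of queries in flight) could be enforced at admission time, before any data is read. The following is a minimal sketch in Python, chosen only for illustration; `QueryGate`, `QueryRejected`, the query kinds and all the numbers are hypothetical and not part of Jobanalyzer.

```python
import threading
from typing import Callable, Dict, Sequence, TypeVar

R = TypeVar("R")


class QueryRejected(Exception):
    """Raised when a query would exceed a server-side limit."""


class QueryGate:
    """Admission control for queries (hypothetical; not an existing Jobanalyzer API).

    Caps how many queries run at once and how many hosts a single query
    may name, with the host cap chosen per query kind.
    """

    def __init__(self, max_in_flight: int, host_limits: Dict[str, int],
                 default_max_hosts: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._host_limits = dict(host_limits)
        self._default_max_hosts = default_max_hosts

    def run(self, kind: str, hosts: Sequence[str],
            query_fn: Callable[[Sequence[str]], R]) -> R:
        # Per-kind host cap, e.g. tighter for expensive kinds like "profile".
        limit = self._host_limits.get(kind, self._default_max_hosts)
        if len(hosts) > limit:
            raise QueryRejected(f"{kind} query names {len(hosts)} hosts; limit is {limit}")
        # Fail fast rather than queueing: a busy server rejects new work
        # instead of accumulating a backlog of memory-hungry queries.
        if not self._slots.acquire(blocking=False):
            raise QueryRejected("too many queries in flight; try again later")
        try:
            return query_fn(hosts)
        finally:
            self._slots.release()
```

An HTTP front end would presumably translate `QueryRejected` into an error response (for example 429 or 503) rather than letting the query start.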
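A cap on records read, a rough per-query memory budget, and filtering as the data is read can be combined in one lazy pass. A minimal sketch, assuming records come from the store as an iterator rather than a fully loaded list; `filtered_read`, `QueryBudgetExceeded`, `max_read` and `max_kept` are hypothetical names, and counting kept records is only a crude stand-in for a real memory budget.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


class QueryBudgetExceeded(Exception):
    """Raised when a query reads or keeps more records than allowed."""


def filtered_read(records: Iterable[T], keep: Callable[[T], bool],
                  max_read: int, max_kept: int) -> Iterator[T]:
    """Lazily yield the records that pass `keep`, enforcing two budgets.

    max_read caps how many records are pulled from the store, whether or
    not they match; max_kept caps how many matching records are passed on
    (a hypothetical proxy for a per-query memory budget).
    """
    kept = 0
    for n_read, rec in enumerate(records, start=1):
        if n_read > max_read:
            raise QueryBudgetExceeded(f"read more than {max_read} records")
        if not keep(rec):
            continue  # dropped immediately; never accumulated in memory
        kept += 1
        if kept > max_kept:
            raise QueryBudgetExceeded(f"kept more than {max_kept} records")
        yield rec
```

Because the filter runs inside the read loop, non-matching records are discarded as soon as they are seen, and an oversized query fails partway through instead of after the whole month of data has been loaded.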
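A list-of-chunks structure stores a large result as many fixed-size blocks instead of one contiguous array, so growing it never reallocates and copies everything collected so far, and no single allocation has to be as large as the whole result. A minimal sketch in Python, with `ChunkedList` and the default chunk size as hypothetical choices. A Python list already holds only references, so the benefit here is illustrative; it matters more for arrays of values that copy on growth, such as Go slices.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class ChunkedList(Generic[T]):
    """Append-only sequence stored as fixed-size chunks (hypothetical helper)."""

    def __init__(self, chunk_size: int = 4096):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self._chunk_size = chunk_size
        self._chunks: List[List[T]] = []
        self._len = 0

    def append(self, item: T) -> None:
        # Start a new chunk when the last one is full; existing chunks
        # are never moved or copied.
        if not self._chunks or len(self._chunks[-1]) == self._chunk_size:
            self._chunks.append([])
        self._chunks[-1].append(item)
        self._len += 1

    def __len__(self) -> int:
        return self._len

    def __getitem__(self, i: int) -> T:
        if i < 0:
            i += self._len
        if not 0 <= i < self._len:
            raise IndexError("ChunkedList index out of range")
        # Every chunk except possibly the last is full, so the position
        # maps directly to (chunk, offset).
        return self._chunks[i // self._chunk_size][i % self._chunk_size]

    def __iter__(self) -> Iterator[T]:
        for chunk in self._chunks:
            yield from chunk
```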

Less obvious:

  • more indices, to help narrow queries further
  • data could be preprocessed in various ways to make it smaller and more compact (e.g. a "jobs" database would take a lot of pressure off the sample store)
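As an illustration of the kind of preprocessing a "jobs" database implies, per-sample records can be folded into one summary per job, so that job-level queries no longer need to touch the sample store. A minimal sketch; the `Sample` fields, `JobSummary` and `summarize` are hypothetical and much simpler than real Jobanalyzer data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; real sample data has different fields."""
    job_id: int
    host: str
    time: int        # seconds since the epoch
    cpu_pct: float
    mem_gb: float


@dataclass
class JobSummary:
    """One compact record per job, replacing many samples."""
    job_id: int
    start: int
    end: int
    hosts: set[str] = field(default_factory=set)
    peak_mem_gb: float = 0.0
    cpu_sum: float = 0.0
    samples: int = 0

    @property
    def mean_cpu_pct(self) -> float:
        # Averaged over all samples of the job, not per host.
        return self.cpu_sum / self.samples if self.samples else 0.0


def summarize(samples: Iterable[Sample]) -> Dict[int, JobSummary]:
    """Fold a stream of samples into per-job summaries in a single pass."""
    jobs: Dict[int, JobSummary] = {}
    for s in samples:
        j = jobs.get(s.job_id)
        if j is None:
            j = jobs[s.job_id] = JobSummary(s.job_id, s.time, s.time)
        j.start = min(j.start, s.time)
        j.end = max(j.end, s.time)
        j.hosts.add(s.host)
        j.peak_mem_gb = max(j.peak_mem_gb, s.mem_gb)
        j.cpu_sum += s.cpu_pct
        j.samples += 1
    return jobs
```

Because `summarize` consumes samples one at a time, it could run incrementally as data arrives, in which case the summaries stay small no matter how many samples a long job produces.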

Some of these overlap substantially with #517, and this ticket only makes sense when read together with that one.