Poor performance of LRS due to HTTP Basic Auth hashing

Question

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.

We are currently seeing poor performance when sending many concurrent requests to Ralph LRS.
When load testing Ralph LRS with 1000 concurrent users (each sending one request containing one xAPI statement), the average total response time skyrockets.

After adding a timing middleware, we found that a dummy request to Ralph LRS takes ~200 ms, most of which (~180 ms) is spent hashing the HTTP Basic Auth password to check the user's credentials.

When building Ralph LRS, we chose bcrypt for hashing and salting passwords. bcrypt seems to be the standard for HTTP Basic Auth. It is slow by design to make brute-force attacks expensive, but this adds a large overhead to every request.
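To illustrate why a deliberately slow password hash dominates request time, here is a minimal sketch that decodes an HTTP Basic Auth header, verifies the credentials and times the check. It assumes the standard library's `hashlib.pbkdf2_hmac` as a stand-in for bcrypt (so it runs without extra dependencies); the user store, `ITERATIONS` value and function names are hypothetical and are not Ralph's actual code.

```python
import base64
import hashlib
import hmac
import os
import time

# Hypothetical work factor: PBKDF2 stands in for bcrypt in this sketch.
# Higher values make each verification (and each brute-force guess) slower.
ITERATIONS = 200_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted, deliberately slow hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical in-memory user store: username -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"ralph": (_salt, hash_password("secret", _salt))}


def parse_basic_auth(header: str) -> tuple[str, str]:
    """Decode an 'Authorization: Basic <base64>' header value."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    # Split on the first ':' only; the password itself may contain colons.
    username, _, password = base64.b64decode(encoded).decode().partition(":")
    return username, password


def check_credentials(username: str, password: str) -> bool:
    """Re-hash the supplied password and compare it in constant time."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, expected = record
    return hmac.compare_digest(hash_password(password, salt), expected)


if __name__ == "__main__":
    header = "Basic " + base64.b64encode(b"ralph:secret").decode()
    start = time.perf_counter()
    ok = check_credentials(*parse_basic_auth(header))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"valid={ok}, hash check took {elapsed_ms:.1f} ms")
```

Every request repeats the full key derivation, so the per-request cost grows directly with the work factor; that is the overhead described above.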

Describe the solution you'd like

An OpenID Connect authentication method is currently under development (#262). It should greatly speed up each request, since it does not require hashing a password to check credentials.

Describe alternatives you've considered

Another solution, still under discussion on our side, would be to offer several HTTP Basic Auth backends with different hashing methods, so that developers can choose their own trade-off between performance cost and security level. It would also allow us to compare Ralph LRS with other open-source LRSs fairly.
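A pluggable-backend design could look roughly like the sketch below. It is an assumption, not Ralph's API: `PasswordHasher`, `Pbkdf2Hasher`, `BACKENDS` and `get_hasher` are hypothetical names. To keep it dependency-free, both backends use PBKDF2 with different work factors; a real implementation might instead offer distinct algorithms such as bcrypt or Argon2 behind the same interface.

```python
import hashlib
import hmac
import os
from typing import Protocol


class PasswordHasher(Protocol):
    """Interface a hypothetical HTTP Basic Auth backend would implement."""

    def hash(self, password: str) -> str: ...

    def verify(self, password: str, stored: str) -> bool: ...


class Pbkdf2Hasher:
    """Salted PBKDF2-SHA256; `iterations` sets the cost/security trade-off."""

    def __init__(self, iterations: int) -> None:
        self.iterations = iterations

    @staticmethod
    def _derive(password: str, salt: bytes, iterations: int) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

    def hash(self, password: str) -> str:
        salt = os.urandom(16)
        digest = self._derive(password, salt, self.iterations)
        # Store the cost with the hash so it can be verified after a config change.
        return f"{self.iterations}${salt.hex()}${digest.hex()}"

    def verify(self, password: str, stored: str) -> bool:
        iterations, salt_hex, digest_hex = stored.split("$")
        candidate = self._derive(password, bytes.fromhex(salt_hex), int(iterations))
        return hmac.compare_digest(candidate.hex(), digest_hex)


# Hypothetical registry, selected through configuration by the deployer.
BACKENDS: dict[str, PasswordHasher] = {
    "fast": Pbkdf2Hasher(iterations=1_000),
    "strong": Pbkdf2Hasher(iterations=600_000),
}


def get_hasher(name: str) -> PasswordHasher:
    """Return the configured backend, e.g. from an environment setting."""
    return BACKENDS[name]
```

Because each stored hash records its own cost, switching the configured backend does not invalidate existing credentials; they are simply verified with the parameters they were created with.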

Answer 1 · 2023-05-05T17:23:46.000Z

Couldn't this be addressed by wrapping the existing bcrypt checks in a function and then putting an lru_cache around that function? It's getting the same (correct) user/password over and over again, right?

(Brute force attacks would still be slow because they'd all be cache misses.)
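The suggestion can be sketched with `functools.lru_cache` from the standard library. As in the earlier sketches, PBKDF2 stands in for bcrypt, and `USERS`, `slow_check`, `cached_check` and `maxsize=1024` are hypothetical choices for illustration only.

```python
import functools
import hashlib
import hmac
import os

ITERATIONS = 200_000  # hypothetical stand-in for bcrypt's cost factor

_salt = os.urandom(16)
USERS = {  # hypothetical user store: username -> (salt, hash)
    "ralph": (_salt, hashlib.pbkdf2_hmac("sha256", b"secret", _salt, ITERATIONS)),
}


def slow_check(username: str, password: str) -> bool:
    """The expensive check, run on every request without caching."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


@functools.lru_cache(maxsize=1024)
def cached_check(username: str, password: str) -> bool:
    """Memoize results per (username, password) pair.

    Repeated requests with the same credentials hit the cache; a brute-force
    attacker tries a new password each time, so every guess is a cache miss
    and still pays the full hashing cost. Note that the raw arguments are the
    cache keys, so the plaintext password stays in process memory.
    """
    return slow_check(username, password)
```

`cached_check.cache_info()` reports hits and misses. `cached_check.cache_clear()` would need to be called whenever a user's password changes or is revoked; otherwise a stale cached result keeps accepting the old password.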

Answer 2 · 2023-05-09T08:54:25.000Z

Wouldn't this mean storing the password in plain text in the cache, which could be a potential security issue?

Answer 3 · 2023-05-09T14:54:37.000Z

It's @lru_cache, so it would only be stored in the memory of the Python process serving the requests and not some external cache that could be compromised. I suppose it does make it more vulnerable if someone can compromise the server and do a memory dump of the process... but if they got that far, they'd have the database credentials anyhow.

Answer 4 · 2023-05-10T09:41:51.000Z

That's a really good remark. I'll look into it, thank you for the idea!