Improve problem difficulty calculation
Shouldn't be too difficult now that we have the `problem_solved` table.
Not sure how we should get this to the client, though. We could calculate it as part of an `/api/problems` request, though that could get very slow. We could also store the value as a new column on the `problems` table (and only update it when a user solves that problem for the first time), though that would require a database change (which is gross).
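For reference, a rough sketch of what the on-request calculation could look like. This assumes a SQLite backend and a `problem_solved(problem_id, user_id)` schema, both of which are guesses:

```python
# Sketch: derive per-problem solve counts from problem_solved.
# The table/column names here are assumptions about the schema.
import sqlite3

def solve_counts(db_path="app.db"):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """
        SELECT p.id, COUNT(ps.user_id)
        FROM problems p
        LEFT JOIN problem_solved ps ON ps.problem_id = p.id
        GROUP BY p.id
        """
    ).fetchall()
    conn.close()
    return dict(rows)  # {problem_id: solver count, one row per user assumed}
```

Even with an index on `problem_id`, running this on every `/api/problems` request is the part that worries me.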
I actually think that we should remove the "Solved by" section from the problem page. Instead, we should replace it with two parts.
- The difficulty for problems should be dynamically calculated based on the data that we have available to us.
- There should be some sort of activity page where we can see what problems users have solved lately and other relevant activity information.
I say this because it seems to me that the whole purpose of the "Solved by" column was to serve one of these two objectives, but it wasn't very good at either of them.
- Agreed. I was actually thinking about dropping "Difficulty" since it's pretty arbitrary, but if we can come up with some better way to calculate it, I'm all for that and for ditching "Solved By".
- As for the activity page, what all do you think this would entail? A "Recent Submissions" list of all users would be pretty easy with the API as-is. Would the ranklist page go well with that?
To extend the idea underlying (2), I would love to create a Profile page for users, that lists problems they've solved, results in competitions, etc. If that's not an open issue, it should be.
- The current idea I have for fixing the difficulty issue is based on something I heard someone mention that Kattis does, though I'm not sure if it's actually what they do. For starters, I'd like to create an event loop that can be used for more than just this, but it certainly makes sense here. We could post something to the event loop that executes the difficulty calculator every so often (every hour?), calculating all of the problems' difficulties based on users' submissions and storing them in `problem_data`. Alternatively, we could spawn a thread to do this every time someone makes a submission and only update the problem that the submission was for, though I worry about performance impacts during competitions if we do it that way. There's a sketch at the end of this comment.
- I think a "Recent Submissions" list would make sense to add to the ranklist page (or maybe via a dropdown in the navbar?).
I really like the idea of a profile page for users. I don't think it's currently an issue, but it should be. That would actually allow for what we are looking for here.
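Here's a rough sketch of the hourly recalculation, minus the general-purpose event loop. `recalculate_all_difficulties` is a placeholder, not existing code:

```python
import threading

RECALC_INTERVAL = 60 * 60  # seconds; "every hour?" as suggested above

def recalculate_all_difficulties():
    # Placeholder: read users' submissions, compute each problem's
    # difficulty, and store the results in problem_data.
    pass

def difficulty_worker(stop_event: threading.Event):
    # Run the calculator, then sleep until the next tick (or until the
    # app asks us to stop).
    while not stop_event.is_set():
        recalculate_all_difficulties()
        stop_event.wait(RECALC_INTERVAL)

# To kick it off alongside the app:
# stop = threading.Event()
# threading.Thread(target=difficulty_worker, args=(stop,), daemon=True).start()
```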
- The event loop sounds interesting. My only hesitation is that it seems to disobey KISS. That being said, it would not have to actually be baked into the app: it could run as a separate process that happens to be kicked off by `run.py` (see the sketch below). It sounds like it would just be data crunching, so it would only have to access the database.
- I don't know how it would flesh out in the UI, but an easy start would be to limit the ranklist to the top 15-25 users. We could probably pull some fancy Bootstrap/Angular stops, though.
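Something like this in `run.py`, for instance; `difficulty_daemon.py` is a made-up filename:

```python
# Sketch: launch the stats cruncher as its own process so the web app
# never blocks on it; the daemon only needs database access.
import subprocess
import sys

def start_stats_daemon():
    return subprocess.Popen([sys.executable, "difficulty_daemon.py"])

if __name__ == "__main__":
    daemon = start_stats_daemon()
    try:
        # ...start the web app here, e.g. app.run()...
        pass
    finally:
        daemon.terminate()
```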
- Yeah, we'll have to play around with some stuff to find what we like best. I do like the site being "stateless" right now, but I think that in the future we'll likely have to move away from that. I can't say exactly why.
- https://angular-ui.github.io/bootstrap/#/pagination
I'm not sure what you mean by stateless, and I don't think adding a statistics daemon would break that. It would not be event-driven, if that's what you mean.
Before we let this go: do you have any idea how the difficulty of a problem should be assessed? We could use an Elo-based system like Kattis, though I'm not crazy about it, and we'd have to keep up with how many times a problem gets submitted to.
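For concreteness, the standard Elo update, treating each submission as a match between the user and the problem. The K-factor and the framing are illustrative guesses, not what Kattis actually does:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    # Probability that A "beats" B under Elo's logistic model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(user_rating: float, problem_rating: float,
           solved: bool, k: float = 32.0) -> tuple[float, float]:
    # solved=True means the user won the "match" against the problem.
    exp = expected_score(user_rating, problem_rating)
    score = 1.0 if solved else 0.0
    return (user_rating + k * (score - exp),
            problem_rating + k * (exp - score))
```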
I guess you're right that this wouldn't add any state to the site. I'm not sure why I was thinking that.
I'm not sure how we should assess difficulty; I don't think that the Elo rating system would be bad. We actually do have data about how many times a problem gets submitted to: the entire `submits` table. The thing we would have to do is count the `good`, `bad`, etc. submissions.
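A sketch of that counting step, again assuming SQLite and a `submits(problem_id, user_id, result)` shape with `'good'`/`'bad'` result values:

```python
import sqlite3
from collections import defaultdict

def submission_counts(db_path="app.db"):
    # Tally good/bad verdicts per problem straight from the submits table.
    conn = sqlite3.connect(db_path)
    counts = defaultdict(lambda: {"good": 0, "bad": 0})
    for pid, result, n in conn.execute(
        "SELECT problem_id, result, COUNT(*) FROM submits"
        " GROUP BY problem_id, result"
    ):
        if result in ("good", "bad"):
            counts[pid][result] = n
    conn.close()
    return counts
```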
My reservation about Elo is that our sample size seems really small (to be fair, I haven't read into Elo that much). A problem like Beautiful Mountains could be skewed because the only people who would submit to it would likely get it right. That would give it an incorrectly low rating. Am I wrong?
On the other hand, a system that weights by the number of users who have solved a problem would skew new problems as being much more difficult. Sigh.
If we start counting `good` and `bad` submissions, I advocate that we discount `good` submissions from a user who had already solved a problem.
That makes sense to me. Maybe we can take elements from that and devise our own. It's a difficult task, but I think that it's entirely doable.
I agree about discounting multiple `good` submissions. It doesn't make sense to make a problem look easy if someone submits 100 correct submissions to Beautiful Mountains.
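The discount could be as simple as only counting a user's first `good` per problem, e.g.:

```python
from collections import Counter

def effective_goods(submissions):
    # submissions: (user_id, problem_id, result) tuples in time order.
    # Only the first good submission per (user, problem) pair counts, so
    # 100 repeat ACs on one problem don't make it look easy.
    seen = set()
    goods = Counter()
    for user_id, problem_id, result in submissions:
        if result == "good" and (user_id, problem_id) not in seen:
            seen.add((user_id, problem_id))
            goods[problem_id] += 1
    return goods
```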
So some considerations when calculating the difficulty of a problem (in no particular order):
- Ranking of the user(s) who have solved the problem
- (Average) Number of incorrect submissions before first correct submission
- Number of users who have solved the problem
- ??? Percentage of original competition participants who did not solve the problem (original system)
The trouble with the old system is that it relies on data that may not always be available (or, TBH, convenient). If it gets used, it should be optional and weighted very little.
Did I miss anything?
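To make the list concrete, a toy scoring function over those factors. The weights and normalizations are made up purely for illustration:

```python
def difficulty(avg_solver_rank: float, avg_wrong_before_ac: float,
               solver_count: int, total_users: int) -> float:
    # Fewer solvers, stronger solvers, and more wrong tries before the
    # first AC all push the score up; result is clamped to a 1-10 scale.
    scarcity = 1.0 - solver_count / max(total_users, 1)
    struggle = min(avg_wrong_before_ac / 5.0, 1.0)
    strong_solvers = 1.0 - min(avg_solver_rank / 100.0, 1.0)  # rank 1 = best
    raw = 0.4 * scarcity + 0.4 * struggle + 0.2 * strong_solvers
    return round(1 + 9 * raw, 1)
```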
I don't think you missed anything. I actually want to just drop the rating from the old system. It was a good metric to start with, but it's not a particularly strong indicator: not awful, just not amazing. It's also kind of annoying to compute.
No complaints here. I guess a follow-up would be: do we want the new ratings to be how rankings are determined? Right now they are determined solely by the number of problems solved. As our problem repertoire grows, that may not be such a fair indicator.
I say yes, but maybe not initially. The switch shouldn't be hard after this gets sorted out.
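For what the switch might look like: rank by summed difficulty instead of raw solve count. `solved_by_user` and `difficulty_of` are assumed lookups, not existing code:

```python
def ranklist(users, solved_by_user, difficulty_of):
    # Score each user by the total difficulty of the problems they've
    # solved, then sort descending.
    scores = {
        u: sum(difficulty_of[p] for p in solved_by_user.get(u, []))
        for u in users
    }
    return sorted(users, key=lambda u: scores[u], reverse=True)
```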