Review logs that requires special treatment when calculating the memory states from review history
Closed this issue · 37 comments
Reviews of the same card in the same day
FSRS doesn't accept interval < 1, so we need to filter out those review logs with the same date and only keep the first one.
Reviews in filtered deck when reschedule cards based on my answers in this deck
is disabled.
These revlogs should be filtered out from review history.
These revlogs' review_kind
is REVLOG_CRAM
, so it's easy to filter out them.
Manual reset
There are two kinds of manual reset: Set Due Date
and Forget
. Their review_kind
is REVLOG_RESCHED
. But the fronter's ease (factor) is nonzero. The latter's ease (factor) is zero.
Set Due Date
These revlogs should be filtered out from review history, too. Because I think it's equal to CTRL+J (Reschedule repetition) in SuperMemo: https://supermemopedia.com/wiki/Ctrl%2BJ_vs._Ctrl%2BShift%2BR
Forget
We should remove the review logs in front of forget
and forget
itself. Then we need to treat the first review log after forget
as the first learning, whose rating will be used to initialize the memory state.
Incomplete revlogs
Due to some legacies, The manual reset were not recorded in the old version of Anki/AnkiDroid. So we need to deal with this case via another way. Normally, the previous revlog of a REVLOG_LRN
shouldn't be REVLOG_REV
or REVLOG_RELRN
. If so, we need to remove those revlogs. In a word, we should find out the the first REVLOG_LRN in the last successive REVLOG_LRNs
Besides, some users may use Transfer scheduling data from one card to another, which would generate more weird review history.
- open-spaced-repetition/fsrs4anki#136
- open-spaced-repetition/fsrs4anki-helper#49 (comment)
- open-spaced-repetition/fsrs4anki-helper#49 (comment)
- open-spaced-repetition/fsrs4anki#191 (comment)
- open-spaced-repetition/fsrs4anki-helper#56
We need to deal with it case by case. Another more convenient method is to skip these cards.
@user1823 and @Expertium, if I miss anything, please let me know.
I thought we were covering such cases in convert_to_fsrs_items() and the ported version in Anki. Is that not the case?
There are some difference between the filters in FSRS Optimizer (implemented in Pandas) and FSRS Helper (implemented in std lib). I cannot guarantee that convert_to_fsrs_items()
has handled all situations.
Well, perhaps we should :-) If we have one function that deals with all of this, we'll know our inputs are safe to feed into any of our API.
Yeah. And it's tricky, because I'm not very familiar with the legacies of the system of revlogs. I will collect some cases for test.
But I don't know the structure of input from Anki. Could you give me an example?
Yes. Is it possbile to output the review history of a card in its Rust structure? I can write an add-on to print the review logs of a card, but it's a list of object in Python rather than Rust.
There's partial info available via card stats (assuming card id 1):
mw.col.backend.card_stats(1).revlog
To see the full Rust structure, try the above after running Anki with the following patch:
diff --git a/rslib/src/stats/card.rs b/rslib/src/stats/card.rs
index d058c2233..7884a02c9 100644
--- a/rslib/src/stats/card.rs
+++ b/rslib/src/stats/card.rs
@@ -94,6 +94,7 @@ fn average_and_total_secs_strings(revlog: &[RevlogEntry]) -> (f32, f32) {
fn stats_revlog_entry(
entry: &RevlogEntry,
) -> anki_proto::stats::card_stats_response::StatsRevlogEntry {
+ dbg!(entry);
anki_proto::stats::card_stats_response::StatsRevlogEntry {
time: entry.id.as_secs().0,
review_kind: entry.review_kind.into(),
These exceptions might be better handled not in the fsrs-rs crate, but in the Anki crate, since they're Anki-specific. Anki's port of the conversion code is here: https://github.com/ankitects/anki/blob/3742fa9f0c3118de1c23bf27198975eea62e09bb/rslib/src/scheduler/fsrs/weights.rs#L98
I plan to write some temporary tests in fsrs-rs and remove them after you implement the conversion code in Anki.
That's fine if you find it easier, but it might be more efficient to add the tests directly to the anki crate if you're able to build it. Up to you though - either way works for me. :-)
You can refer to these tests.
Jarrett: There are some difference between the filters in FSRS Optimizer (implemented in Pandas) and FSRS Helper (implemented in std lib). I cannot guarantee that convert_to_fsrs_items() has handled all situations.
Damien: Well, perhaps we should :-) If we have one function that deals with all of this, we'll know our inputs are safe to feed into any of our API.
@dae, I think that there must be some differences between the optimizer and the re-scheduler. The reason:
- The output of the optimizer affects the scheduling of ALL the cards. So, we should remove the revlogs that can corrupt the final weights.
- The output of the re-scheduler only affects that particular card. So, we can make some clever approaches to ensure that most of the cards get rescheduled (and not skipped) and that too appropriately.
I made some comparisons of the approaches used in the optimizer and the rescheduler for dealing with the special cases. Here is the result of my analysis.
Reviews of the same card on the same day
Should be dealt similarly in the rescheduler and the optimizer - keep only the first review and filter out the rest
- Add-on code: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/a26bc62148f4c09db1c9dab508d8ea79413c0244/schedule/reschedule.py#L429-L430
- Optimizer code (this repo):
fsrs-rs/src/convertor_tests.rs
Lines 84 to 90 in 531566c
Reviews in filtered deck when reschedule cards based on my answers in this deck is disabled.
Should be dealt similarly in the rescheduler and the optimizer - Filter out these reviews
But, when reschedule cards based on my answers in this deck is enabled, these reviews should not be removed and treated as normal reviews.
- Add-on code: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/4ddd4be62ed8478499e43bf2cecddf8a0dcb2c43/utils.py#L271-L274
- Optimizer code (this repo): Couldn't find
Manual Scheduling
Should be dealt similarly in the rescheduler and the optimizer
- Forget card:
- Newer versions of Anki (manual revlog + Ease = 0): Remove all the reviews up to this rating
- Add-on code: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/a26bc62148f4c09db1c9dab508d8ea79413c0244/schedule/reschedule.py#L403-L413
- Optimizer code (this repo): Couldn't find specific code, but (partially) dealt with the following. Probably, specific code should be there to deal with cards "forgot" during a learning stage.
- Older versions of Anki (no revlog, LEARN ratings seen after REVIEW/RELEARN ratings): Remove all the reviews before the last continuous group of LEARN ratings
- Add-on code: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/a26bc62148f4c09db1c9dab508d8ea79413c0244/schedule/reschedule.py#L391-L401
- Optimizer code (this repo):
fsrs-rs/src/convertor_tests.rs
Lines 54 to 72 in 531566c
andfsrs-rs/src/convertor_tests.rs
Lines 100 to 116 in 531566c
I don't know the reason for this duplication.
- Newer versions of Anki (manual revlog + Ease = 0): Remove all the reviews up to this rating
- Set Due Date: Filter out this MANUAL rating
- Add-on code: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/a26bc62148f4c09db1c9dab508d8ea79413c0244/schedule/reschedule.py#L403-L413
Shouldn't this be review_kind instead of last_kind? - Optimizer code (this repo): Couldn't find
- Add-on code: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/a26bc62148f4c09db1c9dab508d8ea79413c0244/schedule/reschedule.py#L403-L413
First review type is not LEARN / Incomplete review history
Dealt differently in reschedule & optimize.
-
Optimizer: Remove all such reviews
fsrs-rs/src/convertor_tests.rs
Lines 74 to 78 in 531566c
-
Rescheduler: Reschedule the cards which were rated Again or Reset (forget) at some point in the "known" review history but skip (don't reschedule) the rest
https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/a26bc62148f4c09db1c9dab508d8ea79413c0244/schedule/reschedule.py#L382-L387
https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/4ddd4be62ed8478499e43bf2cecddf8a0dcb2c43/utils.py#L296-L315I think that the second condition could be removed if this code is moved further ahead because the reviews before the manual reset would have been removed till then.
Suspended cards
Dealt differently in reschedule and optimizer
- Rescheduler: Skip these cards
https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/4ddd4be62ed8478499e43bf2cecddf8a0dcb2c43/schedule/reschedule.py#L322-L324 - Optimizer: It is for the user to decide.
fsrs-rs/src/convertor_tests.rs
Lines 178 to 182 in 531566c
- Optimizer code (this repo): Couldn't find
Here:
Lines 61 to 66 in 4df08dd
I have refactored them in this branch: https://github.com/open-spaced-repetition/fsrs-rs/blob/Test/deal-with-revlog-from-Anki/src/temporary_test.rs
Thanks, that's helpful. I've just pushed a draft of Anki integration you can play with, and I'll try return to this tomorrow.
I think we may be ok now?
- Suspended cards are excluded from review already, so that's not a problem. Users can optionally exclude them from training.
- "Reviews in filtered deck when reschedule cards based on my answers in this deck is disabled.": ignored by the optimizer, and when reviewing the standard Anki review code will be used, and memory state not updated.
- "First review type is not LEARN / Incomplete review history": I think this is already handled by the code that looks for the last learning entry.
- The other cases are already covered.
First review type is not LEARN / Incomplete review history
According to open-spaced-repetition/fsrs4anki-helper#16 (comment), @user1823 proposed that it is better to skip only those cards whose first review kind isn't learning AND which don't have any again rating.
For example:
Do you mean the starting point should be min(last_learning_idx, first_again_idx)? For both training and review?
I think it only works for review. Maybe @user1823 has more ideas. And I think it's OK to skip them both in training and review, because these cases are very rare.
The main reason this might occur is for long-term / heavy Anki users, who have removed the oldest few years of their review history to keep their collection size down & stay inside the AnkiWeb sync limits. Quite rare, but might be best to deal with.
Yeah. For these cases, we should treat the first relearning review as the first learning review.
With the current version of FSRS (python optimizer and add-on), the approach is to
- FILTER OUT ALL the reviews of such cards when optimizing
- KEEP ALL the reviews of such cards if they contain any Again rating when rescheduling
The rationale:
- The review history is incomplete and it can corrupt the trained weights. So, it is best to skip these cards while optimizing.
- When rescheduling, only these cards are affected. So, the risk of including them is not very high. But, there are two types of cards to consider:
-
Cards that have no Again rating in the "known" review history. If these cards are rescheduled, it can lead to problems like this:
The interval decreased from 1.04 years to just 2 days. So, we should not reschedule such cards.
-
Cards that have one or more Again ratings in their "known" review history.
- If these cards are rescheduled without filtering out any reviews, I think that the memory states obtained are reasonable (based on personal experience). The helper add-on currently uses this approach and we can use it in Anki's implementation too.
- If these cards are rescheduled after filtering out the reviews before the last rating, I am not sure what the final result would be. But, it might be reasonable as well.
-
You might wonder: What about the cards that don't have any Again rating in the known review history? Will FSRS never work for them?
The answer is simple. Eventually, these cards will be forgotten. Then, they will get an Again rating and FSRS will start working for them. The purpose of not rescheduling them before they get an Again rating is to prevent weird intervals.
.filter(|entry| entry.review_kind != Filtered)
To filter out just the CRAM revlogs, doesn't the condition need to be the following?
entry.review_kind != Filtered || entry.ease_factor != 0
entry.review_kind != Filtered || entry.ease_factor != 0
The main reason this might occur is for long-term / heavy Anki users, who have removed the oldest few years of their review history to keep their collection size down & stay inside the AnkiWeb sync limits. Quite rare, but might be best to deal with.
@dae, is this case (incomplete review history) dealt with in the latest beta of Anki?
I suggest using this approach to deal with such cards. But obviously, if you have a better idea to deal with them, it would be fine too.
I've changed the code in beta 2 so that a lack of learning steps no longer causes the memory state to be as if the card is new. If you have further suggestions, I'm open to them.
a lack of learning steps no longer causes the memory state to be as if the card is new
If all such cards are rescheduled, it can lead to problem like this (where the interval decreased from 1.04 years to just 2 days).
Edit: I have also verified that this problem exists in Beta 2.
But, we also don't want to treat all such cards as new because then FSRS wouldn't work for them.
So, I suggest re-reading this comment of mine, where I have discussed the issue in a greater detail.
Once Anki no longer calculates the memory states for such cards, we will also need to figure out how to deal with such cards when they come up for review.
I didn't face this problem till now because I do my reviews on AnkiDroid where FSRS is not supported and SM-2 schedules the cards instead. But when, FSRS is the only scheduler available, we will need to make a special function for such cards. Possible options:
- Use SM-2
- Use the convert_states function of FSRS Scheduler (js)
https://github.com/open-spaced-repetition/fsrs4anki/blob/5a34201f4d5b3a9cc1a5da4511b6bc32b7c6e909/fsrs4anki_scheduler.js#L228-L241
https://github.com/open-spaced-repetition/fsrs4anki/blob/5a34201f4d5b3a9cc1a5da4511b6bc32b7c6e909/fsrs4anki_scheduler.js#L133-L144
For a textual description of this function, refer to https://github.com/open-spaced-repetition/fsrs4anki/wiki/How-does-the-scheduler-work%3F#calculate-memory-states
Re https://forums.ankiweb.net/t/anki-23-10-beta/34912/38 and comments above:
During inference, I think we may need to produce a memory state if the card has any reps on it, as it will be surprising to the user otherwise. If the user has cards with a truncated review history, or cards they used 'set due date' on to introduce without going through the new queue, excluding cards without learning steps would mean such cards could never have a memory state assigned, and the initial stability/difficulty would be used at each review based on the current rating, so intervals would never grow.
We could change the criteria from "must have learning" to "must have Again", but that would mean the ratings would not grow above the initial intervals until the user rates such cards Again, which will be confusing and look like a bug in FSRS/Anki. Wouldn't it be better to just use whatever review came first? In the set-due-date case that would correctly reflect what would have happened if the user had gone through the new queue, and in the truncated revlog case, the initial stability may be wrong, but is hopefully shifted back towards actual by the subsequent reviews in the history.
Regarding writing a separate function instead, that would be a bit tricky, as the current code assumes a missing memory state when a card is not new means the memory state needs to be calculated from the revlog. I think it might be simpler to try and handle such cards in the same way other cards are handled (i.e. giving them an approximate memory state). WDYT?
giving them an approximate memory state
Here is a method to convert Anki's interval and ease into FSRS's stability and difficulty: https://github.com/open-spaced-repetition/fsrs4anki/wiki/How-does-the-scheduler-work%3F#interval-to-stability
fsrs-js didn't have access to the revlog history, which we do here - do you still feel converting from SM-2 ease is safer than trying to infer the memory state from incomplete logs? No preference on my end, just curious.
There are too many complex cases for incomplete logs. It's hard to deal with those corner cases one by one. So I prefer to convert memory state from SM-2 ease and interval.
There are too many complex cases for incomplete logs. It's hard to deal with those corner cases one by one. So I prefer to convert memory state from SM-2 ease and interval.
I completely agree. With this, we can also get rid of the check for the presence of an Again
rating in the known review history.
To summarise, we are going to deal with such cards in the following way:
- Check if the first rating is of
Learn
type or not. - If the first rating is of
Learn
type, no issue. In this case, Anki can calculate the memory states the normal way. - If the first rating is not of
Learn
type, use SM-2 ease factor and interval at the first rating to estimate the memory states. Then, on subsequent ratings of the same card, update the memory states using the normal FSRS formulas.
@L-M-Sherlock, what about using this approach in the add-on too (for Anki versions up to 2.1.66)?
Another question that needs to be answered:
Which interval is used in the following line of code? The interval before the first rating? Or the interval obtained after the first rating?
Line 88 in d389762
In the first case, we will obtain the memory states before the first rating, and we will have to calculate the memory states after the first rating using the regular FSRS formulas.
In the second case, we will obtain the memory states after the first rating, and we will have to calculate the memory states after the subsequent ratings using the regular FSRS formulas.
In my opinion, the first option is better because the memory states will tend to be more accurate (the earlier we can calculate the memory states, the more accurate the results will be) and it can deal with cards having no review history at the time of review (explained in the next para).
Consider the following card. I reviewed it today and it had no review history before today. Assume that I had started using native FSRS yesterday. Now, if Anki doesn't use the first approach (calculating memory states from interval and ease before the first rating), it won't be able to schedule this card.
However, I think that the interval before the first review will be available only during review (due to technical limitations). So, we might want to use a hybrid approach.
- In the scheduler, calculate the memory states from interval and ease before the rating if the card doesn't have any review history (but it is not new).
- In the re-scheduler, calculate the memory states from interval and ease after the first rating.
The memory state is derived when saving weights or when reviewing a card without state, so it should work as you expect.
Sorry for asking too many questions, but I want to ensure that the built-in FSRS works perfectly fine when it is released as stable. Also, because I am not comfortable in reading Rust code (especially when it is scattered across several files and repos), I can only rely on basic experimentation and such question/answers to ensure that things work fine.
For the above question, your answer was so brief that I couldn't make out what it exactly means. So, let me make the question more concise so that you can easily answer it in a way that clears my doubt.
In the above comment, I asked about a review card with no review history at the time of review. Can such cards be assigned memory states with the current code? If not, I proposed a solution in the same comment.
For an example, consider the following card. What will happen when it comes for review on 12th October?
Cards without any review history don't get updated by the deck options, but when it comes time to review, the memory state is first derived from the ease factor and interval, and then that state is changed by the rating the user provided.