[Question] How to choose "Days to simulate"?

I don't know which repo is the appropriate place to ask, so I'll ask here.
As I mentioned in another issue, just by adjusting "Days to simulate", it's possible to make the simulator output any value of retention within the allowed range. This raises the question: how to choose the appropriate value of days to simulate? Right now I don't know whether 1 year is "better" (in some sense) than 10 years.
Also, I believe this deserves its own entry in the wiki. I'm sure a lot of users would like to know the inner workings of the simulator.

I agree this deserves documentation, perhaps a short note in a tooltip and a more detailed explanation in the Anki manual.

The one thing I like about FSRS is that it has a model of how many cards you actually know (and would correctly recall) at any given point in time, so I assume the optimizer tries to choose a retention that maximizes that number in the future, and the result depends on whether that number is measured after 1 year or after 10 years. To some extent this choice is probably personal. Also, the simulation probably assumes that, depending on the retention, you adapt the daily new-card count to reach the study goal. I don't know any of this for certain, but that would be my guess based on what seems to make sense. Anyway, none of this is obvious, but with documentation it should be fine.

The one thing I like about FSRS is that it has a model of how many cards you actually know (and would correctly recall) at any given point in time, so I assume the optimizer tries to choose a retention that maximizes that number in the future

Sherlock said that the simulator/retention optimizer maximizes the sum of all retrievabilities. So it's better to have 12 cards at 80% than 10 cards at 90%, because 12⋅0.8>10⋅0.9. But that also incurs additional time costs, and the cost cannot exceed the time limit per day (30 minutes by default). That much I know from Sherlock and from reading the code.
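To make that objective concrete, here is a minimal Rust sketch of "maximize the sum of retrievabilities, subject to a daily time limit". The `Outcome` struct, both function names, and the minutes-per-day figures are assumptions made up for illustration; this is not the fsrs-rs implementation.

```rust
/// One candidate schedule's simulated result (illustrative struct, not from fsrs-rs).
struct Outcome {
    /// Retrievability of each card at the end of the simulation.
    retrievabilities: Vec<f64>,
    /// Average study time per day that this schedule needs (made-up numbers in the examples).
    minutes_per_day: f64,
}

/// Sum of retrievabilities: the expected number of cards that would be recalled.
fn expected_known(retrievabilities: &[f64]) -> f64 {
    retrievabilities.iter().sum()
}

/// Index of the outcome with the largest expected knowledge among those
/// whose daily cost fits within the limit, or None if nothing fits.
fn best_within_limit(outcomes: &[Outcome], limit_minutes: f64) -> Option<usize> {
    outcomes
        .iter()
        .enumerate()
        .filter(|(_, o)| o.minutes_per_day <= limit_minutes)
        .max_by(|(_, a), (_, b)| {
            expected_known(&a.retrievabilities)
                .total_cmp(&expected_known(&b.retrievabilities))
        })
        .map(|(i, _)| i)
}
```

With 12 cards at 0.8 and 10 cards at 0.9, `expected_known` gives roughly 9.6 versus 9.0, so the first schedule wins as long as its daily cost stays within the limit.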
The problem is that even knowing all of that, it's still not clear whether I should choose 1 year or 5 years or whatever. Initially, I assumed that the value of optimal retention generated by the simulator converges to something (see chart below, it's roughly what I expected).

That assumption was wrong.

The problem is that even knowing all of that, it's still not clear whether I should choose 1 year or 5 years or whatever. Initially, I assumed that the value of optimal retention generated by the simulator converges to something (see chart below).

If you are preparing for an exam, set "Days to simulate" to the number of days before your deadline. If you are a language learner, 3-5 years would be fine. If you are a lifetime learner, 10 years is the maximum value.
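The rule of thumb above can be written down as a small lookup. This is only a sketch: the `Goal` enum, the `suggested_days` function, and the 365-day year are assumptions for illustration, not part of the simulator.

```rust
/// Learner goals named in the advice above (illustrative enum).
enum Goal {
    Exam { days_until_deadline: u32 },
    LanguageLearning,
    Lifetime,
}

/// Assumed year length for converting years to days.
const DAYS_PER_YEAR: u32 = 365;

/// Inclusive (min, max) range of values to enter as "Days to simulate".
fn suggested_days(goal: &Goal) -> (u32, u32) {
    match goal {
        // Exam: simulate exactly the time left before the deadline.
        Goal::Exam { days_until_deadline } => (*days_until_deadline, *days_until_deadline),
        // Language learner: 3 to 5 years.
        Goal::LanguageLearning => (3 * DAYS_PER_YEAR, 5 * DAYS_PER_YEAR),
        // Lifetime learner: 10 years, the maximum mentioned above.
        Goal::Lifetime => (10 * DAYS_PER_YEAR, 10 * DAYS_PER_YEAR),
    }
}
```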

You can also use this simulator: https://huggingface.co/spaces/open-spaced-repetition/fsrs4anki_simulator

I think you should make a new wiki entry for the simulator, or maybe edit the README.

The problem is that even knowing all of that, it's still not clear whether I should choose 1 year or 5 years or whatever. Initially, I assumed that the value of optimal retention generated by the simulator converges to something (see chart below).

If you are preparing for an exam, set "Days to simulate" to the number of days before your deadline. If you are a language learner, 3-5 years would be fine. If you are a lifetime learner, 10 years is the maximum value.

You can also use this simulator: https://huggingface.co/spaces/open-spaced-repetition/fsrs4anki_simulator

I am still having a hard time using the optimal-retention mechanism in practice. My main concerns are:

1. For non-deadline learning, such as language learning as a hobby, it is not clear what time period to choose. On my deck I can get any result in the 0.83-0.95 range, depending on which period in the 3-10 year range I choose. I could decide that "learning a language is a life-long adventure", choose 10 years, and get 0.95 proposed, but my gut feeling is that with such a high target retention I will have a very large number of daily reviews and have a hard time adding any new material.
2. For deadline-based learning (for example, studying for an exam), it will produce a correct result only at the beginning of the learning period. As I understand it, the simulator assumes that all cards are in the new state at the beginning of the period. But after some time spent learning the deck, some cards are already learned and mature, so the optimizer starts its calculation from a distinctly different situation.

I will have a very large number of daily reviews and have a hard time adding any new material.

The simulator assumes that you spend the same amount of time in Anki every day. You can set a review limit for that. The simulator also assumes that you finish your reviews before you learn any new cards.
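A rough sketch of what those two assumptions imply for a single simulated day: a fixed time budget is spent on due reviews first, and only the time left over goes to new cards. The function name, parameters, and per-card costs are hypothetical and chosen for illustration; the real simulator may account for time differently.

```rust
/// Returns (reviews_done, new_cards_learned) for one simulated day.
/// All parameters are illustrative assumptions, not simulator settings.
fn plan_day(
    budget_secs: f64,     // fixed daily study time
    due_reviews: usize,   // reviews scheduled for today
    secs_per_review: f64, // assumed average cost of one review
    new_available: usize, // new cards still waiting to be introduced
    secs_per_new: f64,    // assumed average cost of learning one new card
) -> (usize, usize) {
    // Reviews come first, capped by what the budget can pay for.
    let affordable_reviews = (budget_secs / secs_per_review).floor() as usize;
    let reviews_done = due_reviews.min(affordable_reviews);

    // New cards only get whatever time is left after the reviews.
    let remaining = budget_secs - reviews_done as f64 * secs_per_review;
    let affordable_new = (remaining / secs_per_new).floor() as usize;
    (reviews_done, new_available.min(affordable_new))
}
```

With a 1800-second budget, 100 due reviews at 10 seconds each leave room for 20 new cards at 40 seconds each, while 300 due reviews use up the whole budget and leave no time for new material.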

2. But after some time spent learning the deck, some cards are already learned and mature, so the optimizer starts its calculation from a distinctly different situation.

Yes, the real world is very complex. I haven't figured out a method to deal with that.

The simulator assumes that you spend the same amount of time in Anki every day. You can set a review limit for that. The simulator also assumes that you finish your reviews before you learn any new cards.

That's basically what I am doing. I will try the suggested higher retention for some time and see how it goes.

Yes, the real world is very complex. I haven't figured out a method to deal with that.

I think I will put that quote in a frame over my desk.

Would it make sense to introduce some more variables to better model such situations? e.g. starting cards, ending cards, days to learn new cards, and days after all cards learnt?

One question, more out of curiosity: does the simulator take learning and relearning steps into consideration?
Depending on how many steps are configured, the real time spent on a failed card changes significantly. The number of distinct cards reviewed daily also depends on this setting.

Yes, it takes them into account. It sums the review durations of all relearning steps as the forgetting cost.
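As a hedged illustration of that answer, the forgetting cost can be sketched as a plain sum over the relearning-step durations. The second function, which weights the recall and forgetting costs by retrievability to get an expected cost per review, is an assumption about how such a cost would typically be used, not something confirmed in this thread; both function names are hypothetical.

```rust
/// Forgetting cost: total seconds spent across all relearning steps
/// that follow a lapse.
fn forget_cost(relearning_step_secs: &[f64]) -> f64 {
    relearning_step_secs.iter().sum()
}

/// Assumed expected cost of one review: a successful recall costs
/// `recall_cost_secs`, a lapse costs `forget_cost_secs`, and the two are
/// weighted by the probability of recall.
fn expected_review_cost(retrievability: f64, recall_cost_secs: f64, forget_cost_secs: f64) -> f64 {
    retrievability * recall_cost_secs + (1.0 - retrievability) * forget_cost_secs
}
```

More relearning steps make `forget_cost` larger, which raises the expected cost of every review at a given retrievability.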

Would it make sense to introduce some more variables to better model such situations? e.g. starting cards, ending cards, days to learn new cards, and days after all cards learnt?

I think that ideally, the simulator should use real data from the preset. Deck size should be determined by the number of cards in the preset, rather than just being an arbitrary number. And if the user has already reviewed some of them, their memory states can be used in the simulator, that way the simulator won't assume that the user is starting to learn them from zero.

The current 'deck size' argument is how many cards the user expects to learn, which cannot be derived from the existing collection. If the simulator supported a separate 'starting size' argument, that might make more sense.

I have an idea to improve the simulator. Here is the code to initialize the card table:

fsrs-rs/src/optimal_retention.rs, lines 108 to 113 in 19e7af4:

    let mut card_table = Array2::zeros((Column::COUNT, deck_size));
    card_table
        .slice_mut(s![Column::Due, ..])
        .fill(learn_span as f64);
    card_table.slice_mut(s![Column::Difficulty, ..]).fill(1e-10);
    card_table.slice_mut(s![Column::Stability, ..]).fill(1e-10);

We can pass vectors of the existing cards' memory states and due dates into the simulator and initialize the card table with that data. It's similar to this add-on: Anki Simulator

Here is the related code: https://github.com/giovannihenriksen/Anki-Simulator/blob/1b41dd74f109faa9da18d43b8701f310399424ea/src/anki_simulator/collection_simulator.py#L80-L91
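A minimal sketch of that idea, using a plain `Vec` of structs instead of the `ndarray` card table so that it stays self-contained. The `CardState` struct and `init_cards` function are hypothetical names; only the placeholder values (`learn_span` for the due day, `1e-10` for difficulty and stability) come from the snippet quoted in this thread.

```rust
/// Per-card state kept by the simulator (illustrative stand-in for one
/// column of the card table).
#[derive(Clone, Debug, PartialEq)]
struct CardState {
    due: f64,
    difficulty: f64,
    stability: f64,
}

/// Build the initial deck: existing cards keep their real memory state and
/// due day, and the remaining slots are filled with unseen-card placeholders.
fn init_cards(deck_size: usize, learn_span: usize, existing: &[CardState]) -> Vec<CardState> {
    // Same placeholder values the current initialization uses for every card.
    let unseen = CardState {
        due: learn_span as f64,
        difficulty: 1e-10,
        stability: 1e-10,
    };
    // Copy in as many existing cards as fit, then pad up to deck_size.
    let mut cards: Vec<CardState> = existing.iter().take(deck_size).cloned().collect();
    cards.resize(deck_size, unseen);
    cards
}
```

Passing an empty slice reproduces the current behavior, where every card starts as unseen.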

https://github.com/open-spaced-repetition/fsrs4anki/blob/main/docs/tutorial.md#step-5-optional-compute-optimal-retention
