BRSMProjectRepo

The paper introduces a novel optimization task that captures the trade-off between pursuing an option and permanently abandoning it in the hope of finding a better one later. To capture this "pursue vs. quit" problem, the paper combines two well-studied problems: the multi-armed bandit problem and the secretary problem.

The multi-armed bandit problem involves choosing between options with unknown reward rates to maximize total reward over a fixed number of trials. It is widely used in psychology and neuroscience to study decision-making under uncertainty. However, traditional versions assume that every option always remains available for selection, which limits their use for studying persistence and quitting.

The secretary problem involves selecting the best option from a sequence without being able to return to previously rejected options. It highlights the trade-off between accepting a potentially sub-optimal option early and waiting for a better one. However, traditional versions assume a fixed reward for each option, ignoring the stochastic nature of real-life decisions. This makes the problem less suitable for studying persistence and quitting when rewards vary over time.

The "pursue vs. quit" task introduced in the paper merges the stochastic rewards of the multi-armed bandit problem with the "no return" structure of the secretary problem. A decision-maker faces a series of sequentially presented options and aims to maximize total reward over their lifetime. Each option yields rewards with an initially unknown probability, so its value has to be learned through exploration. Once an option is abandoned it cannot be revisited, and only a finite number of trials is available across all options. This setup forces the decision-maker to balance persisting with the current option against abandoning it in pursuit of a potentially better one. Decisions are therefore made under uncertainty, with option quality inferred through exploration, which allows a systematic examination of the trade-off between persisting and quitting.
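
The task structure described above can be sketched as a small simulation. This is only an illustrative sketch, not the paper's or this repository's code: the names `run_task`, `success_probs`, and `policy` are hypothetical, and it assumes binary (success/failure) rewards, a finite list of options, that quitting does not itself use up a trial, and that the last option cannot be abandoned.

```python
import random


def run_task(success_probs, n_trials, policy, seed=0):
    """Simulate one run of a pursue-vs-quit style task (illustrative only).

    success_probs: hidden success probability of each option, in the order
                   the options are presented (assumed finite).
    n_trials:      total number of trials shared across all options.
    policy:        callable (successes, failures, trials_left) -> "persist" or "quit".
    """
    rng = random.Random(seed)
    total_reward = 0
    option = 0                 # index of the option currently being pursued
    successes = failures = 0   # outcomes observed for the current option

    for t in range(n_trials):
        trials_left = n_trials - t  # includes the trial about to be played
        can_quit = option < len(success_probs) - 1
        if can_quit and policy(successes, failures, trials_left) == "quit":
            # No return: move to the next option and never revisit this one.
            option += 1
            successes = failures = 0

        reward = 1 if rng.random() < success_probs[option] else 0
        total_reward += reward
        successes += reward
        failures += 1 - reward

    return total_reward


def always_persist(successes, failures, trials_left):
    """Baseline policy that never abandons the first option."""
    return "persist"
```

Calling `run_task([0.3, 0.8, 0.5], 20, always_persist)` plays all 20 trials on the first option; swapping in a different `policy` changes when options are abandoned.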

The paper also presents an optimal strategy and generates a dataset of the choices that a rational agent following this strategy would make. The optimal strategy for the "pursue vs. quit" task has three key features:

Always persist with a great option: If an option has only resulted in successes so far, the optimal strategy advises continuing with that option without abandoning it.
Always abandon a bad option immediately: If an option has produced at least as many failures as successes, the optimal strategy recommends quitting it right away.
Consider time remaining when evaluating good options: For options with more successes than failures but not exceptional performance, the decision to continue or quit depends on the number of remaining trials. If there are many trials left, the strategy suggests continuing, but if few trials remain, quitting is optimal. This decision is guided by a threshold τ, which varies based on the option's performance.
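
The three features above can be written as a simple decision rule. The sketch below is hedged: the function name `decide` is hypothetical, the source does not say how τ is computed, so `tau` is passed in by the caller, and the direction of the comparison simply follows the description above. An option that has not been tried yet is treated as having "only successes so far" (an assumption).

```python
def decide(successes: int, failures: int, trials_left: int, tau: int) -> str:
    """Return "persist" or "quit" for the current option (illustrative sketch).

    tau is a caller-supplied threshold on the number of remaining trials;
    its actual value depends on the option's performance and is not given here.
    """
    # Feature 1: a great option (no failures so far) is always kept.
    # An untried option also lands here (assumption).
    if failures == 0:
        return "persist"

    # Feature 2: a bad option (failures >= successes) is abandoned immediately.
    if failures >= successes:
        return "quit"

    # Feature 3: a good-but-not-great option is kept only if enough trials
    # remain, following the description above.
    return "persist" if trials_left > tau else "quit"
```

For example, `decide(3, 1, trials_left=10, tau=5)` returns `"persist"`, while the same record with `trials_left=2` returns `"quit"`.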

naimeeshgit/BRSMProjectRepo
