In most instances of the Multi-Armed Bandit (MAB) task, the cue aligns with the response. For example, you are required to select one of four bandits (A, B, C, or D), receive immediate feedback, and subsequently update the expected value of the selected bandit.
When the cue and the response differ, the agent needs to form a latent rule that links them. For example, in the arrow paradigm of Rmus et al. (2024) doi:10.1371/journal.pcbi.1012119, participants can only choose left or right, but what they actually need to learn is the value associated with each arrow color.
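The distinction between cue and response can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's implementation or the procedure used by Rmus et al. (2024): the colors, the learning rate `alpha`, and the helper names `respond` and `update` are assumptions made for the example.

```python
# Hypothetical sketch: values are learned per arrow color (the cue),
# while the overt response is the side on which that color appears.

def respond(values, layout):
    """Pick the side showing the highest-valued color; return (color, side).

    layout maps each side ("left"/"right") to the arrow color shown there.
    """
    side = max(layout, key=lambda s: values[layout[s]])
    return layout[side], side


def update(values, color, reward, alpha=0.5):
    """Delta-rule update of the chosen color's value (alpha is assumed)."""
    values[color] += alpha * (reward - values[color])
    return values


values = {"red": 0.0, "blue": 0.0}
layout = {"left": "blue", "right": "red"}   # colors may swap sides across trials

values = update(values, "red", reward=1.0)  # red was rewarded once
color, side = respond(values, layout)       # latent choice vs. overt response
```

Because values are stored per color, the same learned preference produces a different overt response whenever the colors swap sides.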
The final case reflects my own interpretation: when participants have limited working-memory capacity and an object can be decomposed into many elements, they may update the values of only a subset of those elements rather than the value of the entire object.
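One way to picture this interpretation is a capacity-limited update. The sketch below is an illustration under stated assumptions rather than a validated model: the element names, the `capacity` parameter, and the rule that the first `capacity` elements in attention order are the ones updated are all hypothetical.

```python
# Hypothetical sketch: an object is a bundle of elements, and only
# `capacity` of them receive a value update on a given trial.

def partial_update(values, elements, reward, capacity, alpha=0.5):
    """Update the values of at most `capacity` elements of the chosen object.

    elements is assumed to be ordered by attention priority; elements
    beyond the capacity limit keep their previous values.
    """
    attended = elements[:capacity]
    for name in attended:
        values[name] += alpha * (reward - values[name])
    return attended


values = {"color:red": 0.0, "shape:circle": 0.0, "texture:striped": 0.0}
chosen_object = ["color:red", "shape:circle", "texture:striped"]

attended = partial_update(values, chosen_object, reward=1.0, capacity=2)
```

With `capacity=2`, the third element is left untouched even though it belongs to the rewarded object.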
Slots
cue [CharacterVector]
A cue refers to the stimulus, or a component of the stimulus, presented in the paradigm. It represents the internal target the agent selects, which may differ from the actual behavioral response. For instance, in the arrow paradigm the cue is the color of an arrow rather than its direction.
mid [CharacterVector]
The mid slot represents user-defined internal variables generated by the model during the MDP process. It accepts a character vector of arbitrary length, where each element names an intermediate (latent) variable.
These variables are not external inputs; they are created, modified, and passed along internally as the model executes each function. Each function in the MDP pipeline may read from or write to mid, enabling flexible information flow.
Through this interface, users can implement custom intermediate states, track hidden dynamics, and exert fine-grained control over the behavior of the MDP process.
rsp [CharacterVector]
The rsp slot represents the action the agent actually makes. It typically maps onto the cue. For example, in the arrow paradigm of Rmus et al. (2024) doi:10.1371/journal.pcbi.1012119, the agent updates the value associated with the arrow's color, but the overt response is the direction of the arrow with the currently chosen color.
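How the three slots might interact can be sketched as a small pipeline. The sketch is written in Python purely for illustration and does not reproduce the package's interface: the `state` dictionary, the function names, and the intermediate variable `prediction_error` are hypothetical.

```python
# Hypothetical sketch of one trial: each step receives a shared state
# and may read from or write to the `mid` variables.

def choose_cue(state):
    """Select the internal target (cue) with the highest value."""
    state["cue"] = max(state["values"], key=state["values"].get)
    return state


def map_response(state):
    """Translate the chosen cue into the overt response (rsp)."""
    state["rsp"] = state["cue_to_rsp"][state["cue"]]
    return state


def learn(state, reward, alpha=0.5):
    """Write a prediction error to `mid`, then update the cue's value."""
    delta = reward - state["values"][state["cue"]]
    state["mid"]["prediction_error"] = delta
    state["values"][state["cue"]] += alpha * delta
    return state


state = {
    "values": {"red": 0.2, "blue": 0.6},
    "cue_to_rsp": {"red": "right", "blue": "left"},
    "mid": {},  # named intermediate variables, empty before the trial
}

state = learn(map_response(choose_cue(state)), reward=1.0)
```

Here the value update is keyed on the cue, the response is derived from it through a mapping, and the prediction error lives in `mid`, where a later step could read it.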
References
Rmus, M., Pan, T. F., Xia, L., & Collins, A. G. (2024). Artificial neural networks for model identification and parameter estimation in computational cognitive models. PLOS Computational Biology, 20(5), e1012119. doi:10.1371/journal.pcbi.1012119