
Experimental data from any Multi-Armed Bandit (MAB)-like task.

Class

data [data.frame]

subid  block  trial  object_1  object_2  object_3  object_4  reward_1  reward_2  reward_3  reward_4  action
1      1      1      A         B         C         D         20        0         60        40        A
1      1      2      A         B         C         D         20        40        60        80        B
1      1      3      A         B         C         D         20        0         60        40        C
1      1      4      A         B         C         D         20        40        60        80        D
...    ...    ...    ...       ...       ...       ...       ...       ...       ...       ...       ...

Details

Each row must contain all the information needed to present that trial of a decision-making task (e.g., a multi-armed bandit), as well as the feedback received on that trial.

In this type of paradigm, the reward associated with each possible action must be written explicitly in the table for every trial (i.e., the tabular case; see Sutton & Barto, 2018, Chapter 2).
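
For concreteness, here is a minimal sketch of how such a table can be assembled in base R. The object name `data` and the reward values simply reproduce the example table above; nothing here is part of the package API.

    # A minimal sketch, base R only; reward values reproduce the example
    # table above (scalars such as subid are recycled across rows).
    data <- data.frame(
      subid    = 1,
      block    = 1,
      trial    = 1:4,
      object_1 = "A", object_2 = "B", object_3 = "C", object_4 = "D",
      reward_1 = c(20, 20, 20, 20),
      reward_2 = c(0, 40, 0, 40),
      reward_3 = c(60, 60, 60, 60),
      reward_4 = c(40, 80, 40, 80),
      action   = c("A", "B", "C", "D")
    )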

Note

The package does not perform any real-time random sampling based on the agent's choices; therefore, users should pre-define the reward for each possible action in every trial.

You should never use true randomization to generate the rewards.

Doing so would result in different participants interacting with multi-armed bandits that do not share the same expected values. In that case, if two participants show different parameter estimates under the same model, we cannot determine whether the difference reflects stable individual traits or simply the fact that one participant happened to be luckier than the other.
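
A safe pattern is to draw the reward schedule once, under a fixed seed, and reuse that same schedule for every participant, so that everyone faces bandits with identical expected values. The sketch below is illustrative only: the four arms, their win probabilities, the 100-point payoff, and the trial count are assumptions, not package defaults.

    # Illustrative sketch, not part of the package: generate ONE reward
    # schedule under a fixed seed and give the same schedule to everyone.
    set.seed(123)                       # fixed seed -> identical schedule for all
    n_trials <- 100                     # assumed number of trials
    p_win    <- c(0.2, 0.4, 0.6, 0.8)   # assumed win probability per arm

    # one column of pre-drawn rewards (0 or 100 points) per arm
    schedule <- sapply(p_win, function(p) rbinom(n_trials, size = 1, prob = p) * 100)
    colnames(schedule) <- paste0("reward_", seq_along(p_win))

    # every participant's trial table then reuses these pre-defined rewards
    head(schedule)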

References

Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.