Experimental data from any Multi-Armed Bandit (MAB)-like task.
Class
data [data.frame]
| subid | block | trial | object_1 | object_2 | object_3 | object_4 | reward_1 | reward_2 | reward_3 | reward_4 | action |
|-------|-------|-------|----------|----------|----------|----------|----------|----------|----------|----------|--------|
| 1 | 1 | 1 | A | B | C | D | 20 | 0 | 60 | 40 | A |
| 1 | 1 | 2 | A | B | C | D | 20 | 40 | 60 | 80 | B |
| 1 | 1 | 3 | A | B | C | D | 20 | 0 | 60 | 40 | C |
| 1 | 1 | 4 | A | B | C | D | 20 | 40 | 60 | 80 | D |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Details
Each row must contain all of the information needed to run that trial of a decision-making task (e.g., a multi-armed bandit), together with the feedback the participant received.
In this type of paradigm, the rewards associated with every possible action must be written explicitly into the table for each trial (i.e., the tabular case; see Sutton & Barto, 2018, Chapter 2).
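For concreteness, here is a minimal sketch in R that reproduces the example rows shown above; the object name `data` follows the class description, and all values are purely illustrative.

```r
# Minimal sketch of the expected layout, matching the example rows above.
# All values are illustrative.
data <- data.frame(
  subid    = 1,
  block    = 1,
  trial    = 1:4,
  object_1 = "A", object_2 = "B", object_3 = "C", object_4 = "D",
  reward_1 = c(20, 20, 20, 20),
  reward_2 = c(0, 40, 0, 40),
  reward_3 = c(60, 60, 60, 60),
  reward_4 = c(40, 80, 40, 80),
  action   = c("A", "B", "C", "D")
)
str(data)
```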
Note
The package does not perform any real-time random sampling based on the agent's choices; users must therefore pre-define the reward for every possible action on every trial.
Never generate rewards with true run-time randomization, i.e., by sampling them anew for each participant.
Doing so would mean that different participants face bandits whose realized reward sequences differ. If two participants then yield different parameter estimates under the same model, we cannot tell whether the difference reflects stable individual traits or simply that one participant happened to be lucky while the other was not.
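One way to satisfy this constraint is sketched below: draw a single reward schedule once (here with a fixed seed) and reuse that identical schedule for every participant. The payoff probabilities and the `make_subject_data` helper are hypothetical illustrations, not part of the package.

```r
# Sketch: build ONE fixed reward schedule and reuse it for all participants,
# rather than re-sampling rewards for each subject at run time.
# The payoff probabilities and helper below are hypothetical examples.
set.seed(123)                                 # fixes the schedule once and for all
n_trials <- 100
schedule <- data.frame(
  block    = 1,
  trial    = 1:n_trials,
  object_1 = "A", object_2 = "B", object_3 = "C", object_4 = "D",
  reward_1 = rbinom(n_trials, 1, 0.2) * 100,  # e.g., 20% chance of a 100-point win
  reward_2 = rbinom(n_trials, 1, 0.4) * 100,
  reward_3 = rbinom(n_trials, 1, 0.6) * 100,
  reward_4 = rbinom(n_trials, 1, 0.8) * 100
)

# Every participant receives the identical schedule; only subid (and, after the
# experiment has been run, action) differs between subjects.
make_subject_data <- function(subid, schedule) {
  cbind(subid = subid, schedule)
}
all_subjects <- do.call(rbind, lapply(1:30, make_subject_data, schedule = schedule))
```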