Experimental data from any Multi-Armed Bandit (MAB)-like task.
Class
data [data.frame]
| subid | block | trial | object_1 | object_2 | object_3 | object_4 | reward_1 | reward_2 | reward_3 | reward_4 | action |
|-------|-------|-------|----------|----------|----------|----------|----------|----------|----------|----------|--------|
| 1 | 1 | 1 | A | B | C | D | 20 | 0 | 60 | 40 | A |
| 1 | 1 | 2 | A | B | C | D | 20 | 40 | 60 | 80 | B |
| 1 | 1 | 3 | A | B | C | D | 20 | 0 | 60 | 40 | C |
| 1 | 1 | 4 | A | B | C | D | 20 | 40 | 60 | 80 | D |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Details
Each row must contain all of the information needed to run that trial of a decision-making task (e.g., a multi-armed bandit), together with the feedback the participant received.
In this type of paradigm, the rewards associated with every possible action must be written explicitly into the table for each trial (i.e., the tabular case; see Sutton & Barto, 2018, Chapter 2).
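For concreteness, here is a minimal sketch in R that reproduces the example rows shown above; the object name `data` follows the class description, and all values are purely illustrative.

```r
# Minimal sketch of the expected layout, matching the example rows above.
# All values are illustrative.
data <- data.frame(
  subid    = 1,
  block    = 1,
  trial    = 1:4,
  object_1 = "A", object_2 = "B", object_3 = "C", object_4 = "D",
  reward_1 = c(20, 20, 20, 20),
  reward_2 = c(0, 40, 0, 40),
  reward_3 = c(60, 60, 60, 60),
  reward_4 = c(40, 80, 40, 80),
  action   = c("A", "B", "C", "D")
)
str(data)
```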
Note
The package does not perform any real-time random sampling based on the agent's choices; users must therefore pre-define the reward for every possible action on every trial.
Never generate rewards with true run-time randomization, i.e., by sampling them anew for each participant.
Doing so would mean that different participants face bandits whose realized reward sequences differ. If two participants then yield different parameter estimates under the same model, we cannot tell whether the difference reflects stable individual traits or simply that one participant happened to be lucky while the other was not.
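One way to satisfy this constraint is sketched below: draw a single reward schedule once (here with a fixed seed) and reuse that identical schedule for every participant. The payoff probabilities and the `make_subject_data` helper are hypothetical illustrations, not part of the package.

```r
# Sketch: build ONE fixed reward schedule and reuse it for all participants,
# rather than re-sampling rewards for each subject at run time.
# The payoff probabilities and helper below are hypothetical examples.
set.seed(123)                                 # fixes the schedule once and for all
n_trials <- 100
schedule <- data.frame(
  block    = 1,
  trial    = 1:n_trials,
  object_1 = "A", object_2 = "B", object_3 = "C", object_4 = "D",
  reward_1 = rbinom(n_trials, 1, 0.2) * 100,  # e.g., 20% chance of a 100-point win
  reward_2 = rbinom(n_trials, 1, 0.4) * 100,
  reward_3 = rbinom(n_trials, 1, 0.6) * 100,
  reward_4 = rbinom(n_trials, 1, 0.8) * 100
)

# Every participant receives the identical schedule; only subid (and, after the
# experiment has been run, action) differs between subjects.
make_subject_data <- function(subid, schedule) {
  cbind(subid = subid, schedule)
}
all_subjects <- do.call(rbind, lapply(1:30, make_subject_data, schedule = schedule))
```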