A simulated multi-armed bandit (MAB) dataset featuring a complex stimulus-response structure. The set of four distinct stimuli (red, blue, yellow, green) does not map one-to-one onto the set of four available choices (up, down, left, right). Crucially, multiple stimuli may map to the same underlying choice (e.g., Red and Blue both map to 'Up'). This design requires the reinforcement learning model to learn the latent mapping from observable stimuli to the set of potential actions, making it a challenging test case for model fitting.
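
A minimal R sketch of one such many-to-one mapping is shown below; the specific color-to-direction assignments are placeholders for illustration, since the actual mapping is latent in the data.

# Hypothetical latent mapping: several colors can share one direction
stim_to_action <- c(Red = "Up", Blue = "Up", Yellow = "Left", Green = "Down")
stim_to_action[["Red"]]   # "Up"
stim_to_action[["Blue"]]  # "Up" -- same underlying action as Red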

Format

A data frame with 9000 rows and 12 columns:

Subject

Subject ID, an integer ranging from 1 to 30.

Block

Block number, an integer ranging from 1 to 6.

Trial

Trial number within each block, an integer (1 to 50).

Object_1, Object_2, Object_3, Object_4

Stimulus-response combinations (string) for four objects, formatted as "Color_Direction" (e.g., "Red_Up"). Each column is independently balanced and shuffled.
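
A minimal sketch, assuming the data frame has been loaded as d, of splitting one such combination string into its color and direction parts:

# Split "Color_Direction" strings (e.g., "Red_Up") into color and direction
parts <- strsplit(d$Object_1, "_", fixed = TRUE)
d$Object_1_color     <- vapply(parts, `[`, character(1), 1)
d$Object_1_direction <- vapply(parts, `[`, character(1), 2)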

Reward_1, Reward_2, Reward_3, Reward_4

Reward values for four choice arms (Decks), following the classic Iowa Gambling Task (IGT) structure with adjusted values.

  • Reward_1 (Bad): High gain (+100) with high frequency, mid-sized fine (-250). Long-term net loss.

  • Reward_2 (Bad): High gain (+100) with low frequency, large fine (-1250). Long-term net loss.

  • Reward_3 (Good): Low gain (+50) with high frequency, small fine (-50). Long-term net gain.

  • Reward_4 (Good): Low gain (+50) with low frequency, mid-sized fine (-250). Long-term net gain.

Rewards are balanced at the Block level.
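
For reference, if "high frequency" and "low frequency" follow the classic IGT schedule of roughly 5 losses and 1 loss per 10 picks respectively (an assumption, since the values here are adjusted), the expected net outcome per 10 picks works out as follows:

# Expected net per 10 picks under the assumed classic IGT loss frequencies
net_per_10 <- c(
  Reward_1 = 10 * 100 - 5 * 250,   # -250 (bad deck)
  Reward_2 = 10 * 100 - 1 * 1250,  # -250 (bad deck)
  Reward_3 = 10 * 50  - 5 * 50,    # +250 (good deck)
  Reward_4 = 10 * 50  - 1 * 250    # +250 (good deck)
)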

Action

The simulated choice made by the subject on that trial (string), randomly sampled from "Up", "Down", "Left", or "Right".
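
A minimal sketch, again assuming the data frame is loaded as d, of checking that the simulated actions are roughly uniform over the four directions:

# Frequency of each simulated action overall and per subject
table(d$Action)
prop.table(table(d$Subject, d$Action), margin = 1)  # per-subject proportions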