Overview
This package is designed to help users build Rescorla-Wagner models for Two-Alternative Forced Choice (TAFC) tasks (e.g., multi-armed bandits). Beginners can define models using simple if-else
logic, which makes model construction more accessible.
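For example, the risk-sensitive idea behind one of the built-in models (different learning rates after positive versus negative prediction errors; cf. Niv et al., 2012) can be expressed with a single if-else rule. The snippet below is a generic base-R illustration, not the package's own interface; the function name and arguments are hypothetical, and the tutorial and ?run_m describe how binaryRL actually expects custom functions to be written.

# Generic illustration (hypothetical function, not binaryRL's interface):
# a learning-rate rule with different step sizes for positive vs. negative
# prediction errors, written as plain if-else logic.
update_value <- function(value, reward, eta_pos = 0.6, eta_neg = 0.3) {
  pe <- reward - value                 # prediction error
  if (pe >= 0) {
    eta <- eta_pos                     # better than expected
  } else {
    eta <- eta_neg                     # worse than expected
  }
  value + eta * pe                     # Rescorla-Wagner update
}

update_value(value = 10, reward = 40)   # positive prediction error
update_value(value = 10, reward = -40)  # negative prediction error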
How to cite
YuKi. (2025). binaryRL: Reinforcement Learning Tools for Two-Alternative Forced Choice Tasks. R package version 0.9.0. https://CRAN.R-project.org/package=binaryRL
Hu, M., & Liu, Z. (2025). binaryRL: A Package for Building Reinforcement Learning Models in R. Journal(7), 100-123. https://doi.org/
Installation
# Install the stable version from CRAN
install.packages("binaryRL")
# Install the latest version from GitHub (requires the remotes package)
remotes::install_github("yuki-961004/binaryRL@*release")
# Load package
library(binaryRL)
# Open the help documentation
?binaryRL
Tutorial
In tasks with small, finite state sets (e.g. TAFC tasks in psychology), all states, actions, and their corresponding rewards can be recorded in tables.
- Sutton & Barto (2018) refer to this kind of scenario as the tabular case and to the corresponding methods as tabular methods.
- The development and usage workflow of this R package adheres to the four stages (ten rules) recommended by Wilson & Collins (2019).
- The three basic models built into this R package are based on Niv et al. (2012).
- The example data used in this R package comes from the open dataset of Mason et al. (2024).
Reference
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. Elife, 8, e49547. https://doi.org/10.7554/eLife.49547
Niv, Y., Edlund, J. A., Dayan, P., & O’Doherty, J. P. (2012). Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. Journal of Neuroscience, 32(2), 551-562. https://doi.org/10.1523/JNEUROSCI.5498-10.2012
Mason, A., Ludvig, E. A., Spetch, M. L., & Madan, C. R. (2024). Rare and extreme outcomes in risky choice. Psychonomic Bulletin & Review, 31(3), 1301-1308. https://doi.org/10.3758/s13423-023-02415-x
Example Data
head(binaryRL::Mason_2024_G2)
Subject | Block | Trial | L_choice | R_choice | L_reward | R_reward | Sub_Choose | … |
---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | A | B | 36 | 40 | A | … |
1 | 1 | 2 | B | A | 0 | 36 | B | … |
1 | 1 | 3 | C | D | -36 | -40 | C | … |
1 | 1 | 4 | D | C | 0 | -36 | D | … |
… | … | … | … | … | … | … | … | … |
Example Result
binaryRL::run_m(
mode = "replay",
data = binaryRL::Mason_2024_G2,
id = 1,
eta = 0.5, tau = 0.5,
n_params = 2, n_trials = 360
)
A | B | C | D | … | L_prob | R_prob | … | Rob_Choose | … | Reward | … | ACC | … |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
36 | 0 | 0 | 0 | … | 0.50 | 0.50 | … | A | … | 36 | … | 1 | … |
36 | 40 | 0 | 0 | … | 0.50 | 0.50 | … | B | … | 40 | … | 1 | … |
36 | 40 | 0 | -40 | … | 0.50 | 0.50 | … | D | … | -40 | … | 0 | … |
36 | 40 | -36 | -40 | … | 0.50 | 0.50 | … | C | … | -36 | … | 0 | … |
… | … | … | … | … | … | … | … | … | … | … | … | … | … |
Parallel Data Fitting
Maximum Likelihood Estimation (MLE)
While this R package is primarily designed for constructing Reinforcement Learning (RL) models (with run_m() at its core), its flexibility extends further.
The key functions, rcv_d() and fit_p(), provide a unified interface to seamlessly integrate a diverse range of optimization algorithms. Crucially, they offer a parallel solution for tasks like parameter optimization, parameter recovery, and model recovery.
This means you can leverage this package not only for building and fitting RL models, but also as a versatile algorithm library for fitting other “black-box functions” in parallel for each subject. This significantly reduces processing time, provided your function’s parameters can be optimized independently for each subject.
Base R Optimization
- L-BFGS-B (from stats::optim)
Specialized External Optimization
- Simulated Annealing (GenSA::GenSA)
- Genetic Algorithm (GA::ga)
- Differential Evolution (DEoptim::DEoptim)
- Particle Swarm Optimization (pso::psoptim)
- Bayesian Optimization (mlrMBO::mbo)
- Covariance Matrix Adaptation Evolution Strategy (cmaes::cma_es)
Optimization Library
- Nonlinear Optimization (nloptr::nloptr)
NOTE:
1. If you want to use an algorithm other than L-BFGS-B, you’ll need to install its corresponding R package.
2. This package supports parallel computation. When you set the nc argument in rcv_d() or fit_p() to a value greater than 1, calculations will run in parallel, meaning each participant’s parameter optimization happens simultaneously.
3. If you’ve defined a custom model, you must provide the names of your custom functions as a character vector to the funcs argument within rcv_d() or fit_p().
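As a rough sketch of how these pieces fit together, the call below illustrates a parallel MLE fit with fit_p(). Only the nc and funcs arguments are taken directly from the notes above; every other argument name (including fit_model, model_name, the bound lists, and algorithm) is an assumption made for illustration and should be checked against ?fit_p.

# Hedged sketch: apart from `nc` and `funcs`, the argument names below are
# assumptions for illustration; consult ?fit_p for the actual interface.
fit_result <- binaryRL::fit_p(
  data = binaryRL::Mason_2024_G2,                  # trial-level data
  id = unique(binaryRL::Mason_2024_G2$Subject),    # subjects to fit
  n_trials = 360,
  fit_model = list(binaryRL::TD, binaryRL::RSTD),  # candidate models (assumed names)
  model_name = c("TD", "RSTD"),
  lower = list(c(0, 0), c(0, 0, 0)),               # per-model lower bounds
  upper = list(c(1, 10), c(1, 1, 10)),             # per-model upper bounds
  funcs = NULL,            # names of custom functions, if any (note 3)
  nc = 4,                  # > 1: each subject is optimized in parallel (note 2)
  algorithm = "L-BFGS-B"   # or any optimizer listed above (note 1)
)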
Maximum A Posteriori (MAP)
For more robust parameter estimates, the package supports Maximum A Posteriori (MAP) estimation via an EM-like algorithm (adapted from mfit). This approach leverages the entire group’s data to inform and regularize individual-level fits.
E-Step (Update Posterior): Find the optimal (MAP) parameter values for each subject individually, given the current prior distributions, and calculate the log-posterior.
M-Step (Update Priors): Update the prior distributions based on the optimal parameters obtained from the E-step, then repeat both steps iteratively.
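To make the alternation concrete, here is a simplified toy sketch in base R (not binaryRL's implementation, and it ignores the estimation uncertainty that the real algorithm also propagates): each simulated subject gets a MAP estimate under the current Normal prior in the E-step, and the prior's mean and standard deviation are re-estimated from those MAP values in the M-step.

# Toy sketch of the EM-like alternation (base R only; not binaryRL internals).
set.seed(1)
true_eta <- rnorm(20, mean = 0.4, sd = 0.1)                          # 20 simulated subjects
obs <- lapply(true_eta, function(m) rnorm(30, mean = m, sd = 0.2))   # each subject's data

neg_log_post <- function(eta, y, prior_mu, prior_sd) {
  -sum(dnorm(y, mean = eta, sd = 0.2, log = TRUE)) -                 # log-likelihood
    dnorm(eta, mean = prior_mu, sd = prior_sd, log = TRUE)           # log-prior
}

prior_mu <- 0.5
prior_sd <- 1                                                        # start from a weak prior
for (iter in 1:10) {
  # E-step: MAP estimate for every subject under the current prior
  map_eta <- sapply(obs, function(y) {
    optim(par = 0.5, fn = neg_log_post, y = y,
          prior_mu = prior_mu, prior_sd = prior_sd,
          method = "L-BFGS-B", lower = 0, upper = 1)$par
  })
  # M-step: update the group-level prior from the subject-level estimates
  prior_mu <- mean(map_eta)
  prior_sd <- sd(map_eta)
}
c(prior_mu = prior_mu, prior_sd = prior_sd)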
Note:
1. To enable MAP estimation, specify estimate = "MAP" in the fit_p() function and provide a prior distribution for each free parameter.
2. The fitting process imposes a Normal prior on all parameters except the inverse temperature, which is given an Exponential prior. This may not always be appropriate.
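A hedged sketch of such a call follows. The estimate = "MAP" argument comes from note 1 above, but the priors argument name is a placeholder introduced only for illustration (as are the remaining argument names, which mirror the MLE sketch earlier); see ?fit_p for the mechanism the package actually uses to accept one prior per free parameter.

# Hedged sketch: `priors` is a placeholder argument name; other argument
# names follow the (assumed) MLE sketch above. See ?fit_p for the real interface.
map_result <- binaryRL::fit_p(
  data = binaryRL::Mason_2024_G2,
  id = unique(binaryRL::Mason_2024_G2$Subject),
  n_trials = 360,
  fit_model = list(binaryRL::TD),
  model_name = "TD",
  lower = list(c(0, 0)),
  upper = list(c(1, 10)),
  estimate = "MAP",                                   # MAP instead of the default MLE
  priors = list(                                      # one prior per free parameter
    eta = function(x) dnorm(x, mean = 0.5, sd = 0.2, log = TRUE),
    tau = function(x) dexp(x, rate = 1, log = TRUE)   # Exponential prior on inverse temperature
  ),
  nc = 4,
  algorithm = "L-BFGS-B"
)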
Markov Chain Monte Carlo (MCMC)
For a full Bayesian analysis, you can perform Markov Chain Monte Carlo (MCMC) to characterize the entire posterior distribution, capturing a complete picture of parameter uncertainty.
- LaplacesDemon provides a convenient interface for performing MCMC on any black-box function. If you used rstan, you would need to rewrite the entire Markov decision process. The core functions of binaryRL are implemented in Rcpp, which ensures that the package remains flexible and easy to use while running very efficiently. We provide example code below.
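The following is a minimal sketch of what such code can look like (it is not necessarily the package's official example). To avoid assuming anything about binaryRL's internal fitting interface, the log-likelihood is written in plain R directly against the columns of the example data, and LaplacesDemon then samples the posterior of the learning rate and inverse temperature for one subject.

library(LaplacesDemon)

# Plain-R Rescorla-Wagner + softmax log-likelihood for one subject
# (uses only the columns of the example data; no binaryRL internals assumed).
rw_loglik <- function(eta, tau, df) {
  L <- as.character(df$L_choice)
  R <- as.character(df$R_choice)
  choice <- as.character(df$Sub_Choose)
  options <- unique(c(L, R))
  value <- setNames(rep(0, length(options)), options)  # initial option values
  ll <- 0
  for (t in seq_len(nrow(df))) {
    v_diff <- value[L[t]] - value[R[t]]
    p_left <- 1 / (1 + exp(-tau * v_diff))             # softmax choice rule
    p_left <- min(max(p_left, 1e-10), 1 - 1e-10)       # guard against log(0)
    chose_left <- choice[t] == L[t]
    ll <- ll + log(if (chose_left) p_left else 1 - p_left)
    reward <- if (chose_left) df$L_reward[t] else df$R_reward[t]
    value[choice[t]] <- value[choice[t]] + eta * (reward - value[choice[t]])  # RW update
  }
  ll
}

subj <- subset(binaryRL::Mason_2024_G2, Subject == 1)

# LaplacesDemon expects a Data list with parm.names and mon.names, and a
# Model function returning LP, Dev, Monitor, yhat, and parm.
MyData <- list(df = subj, parm.names = c("eta", "tau"), mon.names = "LP")

Model <- function(parm, Data) {
  eta <- interval(parm[1], 1e-3, 1)     # keep the learning rate in (0, 1)
  tau <- interval(parm[2], 1e-3, 10)    # keep the inverse temperature positive
  parm <- c(eta, tau)
  LL <- rw_loglik(eta, tau, Data$df)
  LP <- LL + dunif(eta, 0, 1, log = TRUE) + dexp(tau, rate = 1, log = TRUE)
  list(LP = LP, Dev = -2 * LL, Monitor = LP, yhat = 0, parm = parm)
}

fit <- LaplacesDemon(Model, Data = MyData, Initial.Values = c(0.5, 0.5),
                     Iterations = 5000, Status = 1000, Thinning = 5)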
Note:
1. With a small number of iterations, the results may be less accurate compared to standard MLE algorithms.