This function fits the optimal parameters of black-box functions (models) to real-world data. Provided that the black-box function adheres to the specified interface (demo: TD, RSTD, Utility), it can employ any of the optimization algorithms detailed below to find the best-fitting parameters for your model.
The function provides several optimization algorithms:
1. L-BFGS-B (from stats::optim)
2. Simulated Annealing (GenSA::GenSA)
3. Genetic Algorithm (GA::ga)
4. Differential Evolution (DEoptim::DEoptim)
5. Particle Swarm Optimization (pso::psoptim)
6. Bayesian Optimization (mlrMBO::mbo)
7. Covariance Matrix Adapting Evolutionary Strategy (cmaes::cma_es)
8. Nonlinear Optimization (nloptr::nloptr)
For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/
Usage
fit_p(
policy = "off",
estimate = "MLE",
data,
id = NULL,
n_trials = NULL,
funcs = NULL,
model_name = c("TD", "RSTD", "Utility"),
fit_model = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
priors = NULL,
tolerance = 0.001,
iteration_i = 10,
iteration_g = 0,
initial_params = NA,
initial_size = 50,
seed = 123,
nc = 1,
algorithm
)
Arguments
- policy
[string]
Specifies the learning policy to be used. This determines how the model updates action values based on observed or simulated choices. It can be either "off" or "on".
Off-Policy (Q-learning): This is the most common approach for modeling reinforcement learning in Two-Alternative Forced Choice (TAFC) tasks. In this mode, the model's goal is to learn the underlying value of each option by observing the human participant's behavior. It achieves this by consistently updating the value of the option that the human actually chose. The focus is on understanding the value representation that likely drove the participant's decisions.
On-Policy (SARSA): In this mode, the target policy and the behavior policy are identical. The model first computes the selection probability for each option based on their current values. Critically, it then uses these probabilities to sample its own action. The value update is then performed on the action that the model itself selected. This approach focuses on directly mimicking the stochastic choice patterns of the agent, rather than just learning the underlying values from a fixed sequence of actions.
default: policy = "off"
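The difference between the two update targets can be sketched in a few lines of plain R. All names below (values, eta, tau, choice_human, choice_model) are illustrative, not objects exported by the package:

```r
# Illustrative sketch only; not the package's internal code.
values <- c(A = 0.5, B = 0.5)   # current action values
eta    <- 0.3                    # learning rate
reward <- 1                      # reward on this trial

# Off-policy (Q-learning style): update the option the human chose.
choice_human <- "A"
values[choice_human] <- values[choice_human] +
  eta * (reward - values[choice_human])

# On-policy (SARSA style): the model samples its own action from the
# softmax choice probabilities, then updates that sampled action.
tau  <- 2
prob <- exp(tau * values) / sum(exp(tau * values))
choice_model <- sample(names(values), size = 1, prob = prob)
values[choice_model] <- values[choice_model] +
  eta * (reward - values[choice_model])
```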
- estimate
[string]
Estimation method. Can be either "MLE" or "MAP".
"MLE": (Default) Maximum Likelihood Estimation. This method finds the parameter values that maximize the log-likelihood of the data. A higher log-likelihood indicates that the parameters provide a better explanation for the observed human behavior. In other words, data simulated using these parameters would most closely resemble the actual human data. This method does not consider any prior information about the parameters.
"MAP": Maximum A Posteriori Estimation. This method finds the parameter values that maximize the posterior probability. It is an iterative process based on the Expectation-Maximization (EM) framework.
Initialization: The process begins by assuming a uniform distribution as the prior for each parameter, making the initial log-prior zero. The first optimization is thus equivalent to MLE.
Iteration: After finding the best parameters for all subjects, the algorithm assesses the actual distribution of each parameter and fits a normal distribution to it. This fitted distribution becomes the new empirical prior.
Re-estimation: The parameters are then re-optimized to maximize the updated posterior probability.
Convergence: This cycle repeats until the posterior probability converges or the maximum number of iterations (specified by iteration_g) is reached.
The priors argument must be specified to define the initial prior distributions.
default: estimate = "MLE"
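Conceptually, MAP maximizes the log-likelihood plus the log-priors; with uniform priors the log-prior term is zero inside the bounds, which is why the first EM step reduces to MLE. A hypothetical sketch (log_posterior, log_lik, and priors are placeholder names, not package internals):

```r
# Hypothetical sketch of the MAP objective, not the package's internal code.
log_posterior <- function(params, log_lik, priors) {
  lp <- log_lik(params)
  for (i in seq_along(priors)) {
    lp <- lp + priors[[i]](params[i])  # add the log-prior of each parameter
  }
  lp
}

# With a uniform prior on [0, 1], dunif(x, 0, 1, log = TRUE) is 0 inside
# the bounds, so the log-posterior equals the log-likelihood (i.e., MLE).
```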
- data
[data.frame]
This data should include the following mandatory columns:
sub: "Subject"
time_line: "Block", "Trial"
L_choice: "L_choice"
R_choice: "R_choice"
L_reward: "L_reward"
R_reward: "R_reward"
sub_choose: "Sub_Choose"
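A minimal data.frame with the mandatory columns might look like this (two made-up trials for a single subject; option labels and rewards are purely illustrative):

```r
# Illustrative input data with the mandatory columns.
demo_data <- data.frame(
  Subject    = c(1, 1),      # sub
  Block      = c(1, 1),      # time_line
  Trial      = c(1, 2),      # time_line
  L_choice   = c("A", "A"),
  R_choice   = c("B", "B"),
  L_reward   = c(10, 0),
  R_reward   = c(0, 10),
  Sub_Choose = c("A", "B")   # the option the participant actually chose
)
```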
- id
[CharacterVector]
A vector specifying the subject ID(s) for which parameters should be fitted. The function will process only the subjects provided in this vector.
To fit all subjects, you can either explicitly set the argument as id = unique(data$Subject) or leave it at the default (id = NULL). Both approaches direct the function to fit parameters for every unique subject in the dataset.
It is strongly recommended to avoid using simple numeric sequences like id = 1:4. This practice can lead to errors if subject IDs are stored as strings (e.g., subject four is stored as "004") or are not sequentially numbered.
default: id = NULL
- n_trials
[integer]
Represents the total number of trials a single subject experienced in the experiment. If this parameter is kept at its default value of NULL, the program will automatically detect how many trials a subject experienced from the provided data. This information is primarily used for calculating model fit statistics such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).
default: n_trials = NULL
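n_trials enters the standard information-criterion formulas. For example, with k free parameters and a fitted log-likelihood (all values below are made up for illustration):

```r
log_lik  <- -120.5  # fitted log-likelihood (illustrative)
k        <- 2       # number of free parameters (e.g., TD: eta, tau)
n_trials <- 160     # total trials per subject

aic <- -2 * log_lik + 2 * k             # AIC: 245
bic <- -2 * log_lik + k * log(n_trials) # BIC penalizes k by log(n_trials)
```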
- funcs
[CharacterVector]
A character vector containing the names of all user-defined functions required for the computation. When parallel computation is enabled (i.e., nc > 1), user-defined models and their custom functions might not be automatically accessible within the parallel environment.
Therefore, if you have created your own reinforcement learning model that modifies any of the package's six default functions (util_func = func_gamma, rate_func = func_eta, expl_func = func_epsilon, bias_func = func_pi, prob_func = func_tau, loss_func = func_logl), you must explicitly provide the names of your custom functions as a vector here.
- model_name
[List]
The names of the fitted models.
e.g. model_name = c("TD", "RSTD", "Utility")
- fit_model
[List]
A list of model functions to be fitted to the data.
e.g. fit_model = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility)
- lower
[List]
The lower bounds of the free parameters for each model.
e.g. lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0))
- upper
[List]
The upper bounds of the free parameters for each model.
e.g. upper = list(c(1, 10), c(1, 1, 10), c(1, 1, 10))
- priors
[List]
A list specifying the prior distributions for the model parameters. This argument is mandatory when using estimate = "MAP". There are two primary scenarios for its use:
1. Static MAP Estimation (Non-Hierarchical). This approach is used when you have a strong, pre-defined belief about the parameter priors and do not want the model to update them iteratively.
- Configuration: Set estimate = "MAP". Provide a list defining your confident prior distributions. Keep iteration_g = 0 (the default).
- Behavior: The algorithm maximizes the posterior probability based solely on your specified priors. It will not use the EM (Expectation-Maximization) framework to learn new priors from the data.
2. Hierarchical Bayesian Estimation via EM. This approach is used to let the model learn the group-level (hierarchical) prior distributions directly from the data.
- Configuration: Set estimate = "MAP". Specify a weak or non-informative initial prior, such as a uniform distribution for all parameters. Set iteration_g to a value greater than 0.
- Behavior: With a uniform prior, the initial log-posterior equals the log-likelihood, making the first estimation step equivalent to MLE. The algorithm then initiates the EM procedure: it iteratively assesses the actual parameter distribution across all subjects and updates the group-level priors. This cycle continues until the posterior converges or iteration_g is reached.
default: priors = NULL
- tolerance
[double]
Convergence threshold for MAP estimation. If the change in log posterior probability between iterations is smaller than this value, the algorithm is considered to have converged and the program will stop.
default:
tolerance = 0.001
- iteration_i
[integer]
The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time.
default: iteration_i = 10
- iteration_g
[integer]
The maximum number of iterations for the Expectation-Maximization (EM) based MAP estimation. The algorithm will stop once this iteration count is reached, even if the change in the log-posterior value has not yet fallen below the tolerance threshold.
default: iteration_g = 0
- initial_params
[NumericVector]
Initial values for the free parameters that the optimization algorithm will search from. These are primarily relevant when using algorithms that require an explicit starting point, such as L-BFGS-B. If not specified, the function will automatically generate initial values close to zero.
default: initial_params = NA
- initial_size
[integer]
This parameter corresponds to the population size in genetic algorithms (GA). It specifies the number of initial candidate solutions that the algorithm starts with for its evolutionary search. This parameter is only required for optimization algorithms that operate on a population, such as GA or DEoptim.
default: initial_size = 50
- seed
[integer]
Random seed. This ensures that the results are reproducible and remain the same each time the function is run.
default:
seed = 123
- nc
[integer]
Number of cores to use for parallel processing. Since fitting optimal parameters for each subject is an independent task, parallel computation can significantly speed up the fitting process:
nc = 1: The fitting proceeds sequentially. Parameters for one subject are fitted completely before moving to the next subject.
nc > 1: The fitting is performed in parallel across subjects. For example, if
nc = 4
, the algorithm will simultaneously fit data for four subjects. Once these are complete, it will proceed to fit the next batch of subjects (e.g., subjects 5-8), and so on, until all subjects are processed.
default: nc = 1
- algorithm
[string]
Choose an algorithm package from L-BFGS-B, GenSA, GA, DEoptim, PSO, Bayesian, CMA-ES.
In addition, any algorithm from the nloptr package is also supported. If your chosen nloptr algorithm requires a local search, you need to input a character vector: the first element represents the algorithm used for the global search, and the second element represents the algorithm used for the local search.
Value
The optimal parameters found by the algorithm for each subject,
along with the model fit calculated using these parameters.
This is returned as an object of class binaryRL containing results
for all subjects and all models.
Note
While both fit_p and rcv_d utilize the same underlying
optimize_para function to find optimal parameters, they play
distinct and sequential roles in the modeling pipeline.
The key differences are as follows:
Purpose and Data Source:
rcv_d should always be performed before fit_p. Its primary role is to validate a model's stability by fitting it to synthetic data generated by the model itself. This process, known as parameter recovery, ensures the model is well-behaved. In contrast, fit_p is used in the subsequent stage to fit the validated model to real experimental data.
Estimation Method:
rcv_d does not include an estimate argument. This is because the synthetic data is generated from known "true" parameters, which are drawn from pre-defined distributions (typically uniform for most parameters and exponential for the inverse temperature). Since the ground truth is known, a hierarchical estimation (MAP) is not applicable. The fit_p function, however, requires this argument to handle real data where the true parameters are unknown.
Policy Setting:
In fit_p, the policy setting has different effects: "on-policy" is better for learning choice patterns, while "off-policy" yields more accurate parameter estimates. For rcv_d, the process defaults to an "off-policy" approach because its main objectives are to verify whether the true parameters can be accurately recovered and to assess whether competing models are distinguishable, tasks for which off-policy estimation is more suitable.
Examples
if (FALSE) { # \dontrun{
comparison <- binaryRL::fit_p(
data = binaryRL::Mason_2024_G2,
id = unique(binaryRL::Mason_2024_G2$Subject),
#+-----------------------------------------------------------------------------+#
#|----------------------------- black-box function ----------------------------|#
#funcs = c("your_funcs"),
policy = c("off", "on"),
fit_model = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
model_name = c("TD", "RSTD", "Utility"),
#|--------------------------------- estimate ----------------------------------|#
estimate = c("MLE", "MAP"),
#|------------------------------------ MLE ------------------------------------|#
lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
upper = list(c(1, 10), c(1, 1, 10), c(1, 1, 10)),
#|------------------------------------ MAP ------------------------------------|#
priors = list(
list(
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
tau = function(x) {stats::dexp(x, rate = 1, log = TRUE)}
),
list(
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
tau = function(x) {stats::dexp(x, rate = 1, log = TRUE)}
),
list(
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
gamma = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
tau = function(x) {stats::dexp(x, rate = 1, log = TRUE)}
)
),
#|----------------------------- iteration number ------------------------------|#
iteration_i = 10,
iteration_g = 10,
#|-------------------------------- algorithms ---------------------------------|#
nc = 1, # <nc > 1>: parallel computation across subjects
# Base R Optimization
algorithm = "L-BFGS-B" # Gradient-Based (stats)
#|-----------------------------------------------------------------------------|#
# Specialized External Optimization
#algorithm = "GenSA" # Simulated Annealing (GenSA)
#algorithm = "GA" # Genetic Algorithm (GA)
#algorithm = "DEoptim" # Differential Evolution (DEoptim)
#algorithm = "PSO" # Particle Swarm Optimization (pso)
#algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
#algorithm = "CMA-ES" # Covariance Matrix Adapting (cmaes)
#|-----------------------------------------------------------------------------|#
# Optimization Library (nloptr)
#algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
#|-------------------------------- algorithms ---------------------------------|#
#################################################################################
)
result <- dplyr::bind_rows(comparison)
# Ensure the output directory exists before writing
if (!dir.exists("../OUTPUT")) {
dir.create("../OUTPUT", recursive = TRUE)
}
write.csv(result, "../OUTPUT/result_comparison.csv", row.names = FALSE)
} # }