Skip to contents

This function is designed to fit the optimal parameters of black-box functions (models) to real-world data. Provided that the black-box function adheres to the specified interface (demo: TD, RSTD, Utility ) , this function can employ the various optimization algorithms detailed below to find the best- fitting parameters for your model.

The function provides several optimization algorithms:

For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/

Usage

fit_p(
  data,
  id = NULL,
  n_trials = NULL,
  fit_model = list(TD, RSTD, Utility),
  funcs = NULL,
  model_name = c("TD", "RSTD", "Utility"),
  lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
  upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
  initial_params = NA,
  initial_size = 50,
  iteration = 10,
  seed = 123,
  nc = 1,
  algorithm
)

Arguments

data

[data.frame] This data should include the following mandatory columns:

  • "sub"

  • "time_line" (e.g., "Block", "Trial")

  • "L_choice"

  • "R_choice"

  • "L_reward"

  • "R_reward"

  • "sub_choose"

id

[vector] Specifies the subject ID(s) for whom optimal parameters are to be fitted. If you intend to fit all subjects within your dataset, it is strongly recommended to use id = unique(data$Subject). This approach accounts for cases where subject IDs in the dataset may not be simple numeric sequences (e.g., "1", "2", "3", "4") or may contain string entries (e.g., "1", "2", "3", "004"). Using id = 1:4 could lead to errors if IDs are not sequentially numbered integers.

default: id = NULL

n_trials

[integer] Represents the total number of trials a single subject experienced in the experiment. If this parameter is kept at its default value of `NULL`, the program will automatically detect how many trials a subject experienced from the provided data. This information is primarily used for calculating model fit statistics such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).

default: n_trials = NULL

fit_model

[list] A collection of functions applied to fit models to the data.

funcs

[character] A character vector containing the names of all user-defined functions required for the computation. When parallel computation is enabled (i.e., `nc > 1`), user-defined models and their custom functions might not be automatically accessible within the parallel environment.

Therefore, if you have created your own reinforcement learning model that modifies the package's default four default functions (default functions: util_func = func_gamma, rate_func = func_eta, expl_func = func_epsilon bias_func = func_pi prob_func = func_tau ), you must explicitly provide the names of your custom functions as a vector here.

model_name

[list] The name of fit modals

lower

[list] The lower bounds for model fit models

upper

[list] The upper bounds for model fit models

initial_params

[vector] Initial values for the free parameters that the optimization algorithm will search from. These are primarily relevant when using algorithms that require an explicit starting point, such as L-BFGS-B. If not specified, the function will automatically generate initial values close to zero.

default: initial_params = NA.

initial_size

[integer] This parameter corresponds to the population size in genetic algorithms (GA). It specifies the number of initial candidate solutions that the algorithm starts with for its evolutionary search. This parameter is only required for optimization algorithms that operate on a population, such as `GA` or `DEoptim`.

default: initial_size = 50.

iteration

[integer] The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time.

seed

[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run.

default: seed = 123

nc

[integer] Number of cores to use for parallel processing. Since fitting optimal parameters for each subject is an independent task, parallel computation can significantly speed up the fitting process:

  • `nc = 1`: The fitting proceeds sequentially. Parameters for one subject are fitted completely before moving to the next subject.

  • `nc > 1`: The fitting is performed in parallel across subjects. For example, if `nc = 4`, the algorithm will simultaneously fit data for four subjects. Once these are complete, it will proceed to fit the next batch of subjects (e.g., subjects 5-8), and so on, until all subjects are processed.

default: nc = 1

algorithm

[character] Choose an algorithm package from `L-BFGS-B`, `GenSA`, `GA`, `DEoptim`, `PSO`, `Bayesian`, `CMA-ES`.

In addition, any algorithm from the `nloptr` package is also supported. If your chosen `nloptr` algorithm requires a local search, you need to input a character vector. The first element represents the algorithm used for global search, and the second element represents the algorithm used for local search.

Value

The optimal parameters found by the algorithm for each subject, along with the model fit calculated using these parameters. This is returned as an object of class binaryRL containing results for all subjects with all models.

Note

The primary difference between `fit_p` and `rcv_d` lies in their data source: `fit_p` fits parameters to real data, while `rcv_d` fits parameters to synthetic data generated by models. Many studies inappropriately skip the `rcv_d` step.

Imagine `fit_p` as the actual boxing match, and `rcv_d` as the weigh-in. Boxers of different weight classes shouldn't compete directly.

Similarly, if a competing model lacks the ability of parameter or model recovering, even if your proposed model outperforms it in `fit_p`, it doesn't necessarily make your proposed model a good one.

Examples

if (FALSE) { # \dontrun{
comparison <- binaryRL::fit_p(
  data = binaryRL::Mason_2024_Exp2,
  id = unique(binaryRL::Mason_2024_Exp2$Subject),
##-----------------------------------------------------------------------------##
##----------------------------- black-box function ----------------------------##
  #funcs = c("your_funcs"),
  fit_model = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
  model_name = c("TD", "RSTD", "Utility"),
  lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
  upper = list(c(1, 10), c(1, 1, 10), c(1, 1, 10)),
##----------------------------- interation number -----------------------------##
  iteration = 10,
##-------------------------------- algorithms ---------------------------------##
  nc = 1,                 # <nc > 1>: parallel computation across subjects
  # Base R Optimization
  algorithm = "L-BFGS-B"  # Gradient-Based (stats)
##-----------------------------------------------------------------------------##
  # Specialized External Optimization
  #algorithm = "GenSA"    # Simulated Annealing (GenSA)
  #algorithm = "GA"       # Genetic Algorithm (GA)
  #algorithm = "DEoptim"  # Differential Evolution (DEoptim)
  #algorithm = "PSO"      # Particle Swarm Optimization (pso)
  #algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
  #algorithm = "CMA-ES"   # Covariance Matrix Adapting (cmaes)
##-----------------------------------------------------------------------------##
  # Optimization Library (nloptr)
  #algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
##-------------------------------- algorithms ---------------------------------##
#################################################################################
)

result <- dplyr::bind_rows(comparison)

# Ensure the output directory exists before writing
if (!dir.exists("../OUTPUT")) {
  dir.create("../OUTPUT", recursive = TRUE)
}

write.csv(result, "../OUTPUT/result_comparison.csv", row.names = FALSE)
} # }