Learning Rate: \(\alpha\)
$$Q_{new} = Q_{old} + \alpha \cdot (R - Q_{old})$$
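The update rule above can be sketched in a few lines of R. This is only an illustration of the formula; `update_q` is a hypothetical helper name and is not part of the multiRL API.

```r
# Delta-rule update: move the old value toward the reward
# by a fraction alpha of the prediction error (R - Q_old).
# update_q is a hypothetical helper, not a multiRL function.
update_q <- function(q_old, reward, alpha) {
  q_old + alpha * (reward - q_old)
}

# Apply the update over a short, made-up reward sequence.
q <- 0
for (r in c(1, 1, 0)) {
  q <- update_q(q, r, alpha = 0.5)
}
q  # 0 -> 0.5 -> 0.75 -> 0.375
```

With \(\alpha = 0\) the value never changes, and with \(\alpha = 1\) it is replaced by the most recent reward.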
Inverse Temperature: \(\beta\)
$$ P_{t}(a) = \frac{ \exp(\beta \cdot Q_{t}(a)) }{ \sum_{i=1}^{k} \exp(\beta \cdot Q_{t}(a_{i})) } $$
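As a minimal sketch of the softmax rule above, the following R function turns a vector of action values into choice probabilities. The name `softmax_policy` is hypothetical and not taken from multiRL; subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```r
# Softmax choice rule: P(a) = exp(beta * Q(a)) / sum_i exp(beta * Q(a_i)).
# softmax_policy is a hypothetical helper, not a multiRL function.
softmax_policy <- function(q, beta) {
  z <- beta * q
  z <- z - max(z)        # shift for numerical stability; result is unchanged
  exp(z) / sum(exp(z))
}

# Two actions with made-up values; the higher-valued action is more likely.
p <- softmax_policy(c(0.2, 0.8), beta = 3)
p
sum(p)  # probabilities sum to 1
```

With \(\beta = 0\) every action is equally likely, and larger values of \(\beta\) concentrate probability on the highest-valued action.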
Arguments
- params
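Numeric vector of free parameters used by the model's internal functions: the first element is the learning rate \(\alpha\) and the second is the inverse temperature \(\beta\). See params.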
Value
The output depends on the mode and estimate defined in the
runtime environment: each estimation method produces its corresponding
result, such as a single log-likelihood value or a set of summary
statistics.
Body
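TD <- function(params){
  # Map the numeric parameter vector onto named free parameters:
  # params[1] is the learning rate (alpha), params[2] the inverse temperature (beta).
  params <- list(
    free = list(alpha = params[1], beta = params[2])
  )

  # data, behrule, colnames, funcs, priors, and settings are not arguments;
  # they are resolved from the enclosing (runtime) environment.
  multiRL.model <- multiRL::run_m(
    data = data,
    behrule = behrule,
    colnames = colnames,
    params = params,
    funcs = funcs,
    priors = priors,
    settings = settings
  )

  # Store the model object in multiRL.env, then return the result
  # selected by .return_result().
  assign(x = "multiRL.model", value = multiRL.model, envir = multiRL.env)
  return(.return_result(multiRL.model))
}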