$$Q_{new} = Q_{old} + \alpha \cdot (R - Q_{old})$$
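As a quick numeric check of the rule above, the following sketch applies a single update in R. The values of `Q_old`, `alpha`, and `R` are illustrative assumptions, not package defaults.

```r
# Illustrative values only (assumptions, not package defaults)
Q_old <- 0.5   # current value estimate
alpha <- 0.1   # learning rate
R     <- 1     # reward received on this trial

# Prediction error: how far the outcome was from the estimate
PE <- R - Q_old

# Move the estimate a fraction alpha of the way toward the outcome
Q_new <- Q_old + alpha * PE
print(Q_new)
```

With these numbers the estimate moves one tenth of the way from 0.5 toward 1.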
Arguments
- shown
Which options are shown in this trial.
- is.fp
Is it the first time picking this option?
- qvalue
The expected Q-values of the different behaviors, as produced by the different systems, updated up to this trial.
- reward
The feedback the agent receives from the environment on trial t after executing action a.
- utility
The subjective value (internal representation) assigned by the agent to the objective reward.
- system
Whether a single system or multiple systems are at work when the agent makes a decision; see system.
- rownum
The trial number
- params
Parameters used by the model's internal functions; see params.
- hidden
All hidden variables within the MDP process belong here.
- ...
Currently contains the following information; more may be added in future package versions.
  - idinfo:
    - subid
    - block
    - trial
  - exinfo: contains information whose column names are specified by the user.
    - Frame
    - RT
    - NetWorth
    - ...
  - behave: includes the following:
    - action: the behavior performed by the human in the given trial.
    - latent: the object updated by the agent in the given trial.
    - simulation: the actual behavior performed by the agent.
    - position: the position of the stimulus on the screen.
  - cue and rsp: cues and responses within latent learning rules; see behrule.
  - state: stores the stimuli shown in the current trial (split into components by underscores) and the rewards associated with them.
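A minimal sketch of how a user-defined function might read these extra fields from `...`. The function name `peek_extras` and all input values are hypothetical. It assumes `idinfo`, `exinfo`, and `behave` arrive as vectors whose names may have been dropped, so elements are accessed by position rather than by name.

```r
# Hypothetical helper: pulls extra trial information out of `...` by position
peek_extras <- function(...) {
  extras <- list(...)
  list(
    trial  = extras$idinfo[3],  # idinfo order: subid, block, trial
    frame  = extras$exinfo[1],  # first user-specified exinfo column
    action = extras$behave[1]   # behave order: action comes first
  )
}

# Made-up inputs for illustration only
res <- peek_extras(
  idinfo = c("s01", "1", "12"),
  exinfo = c("Gain", "0.83", "100"),
  behave = c("A", "A", "A")
)
print(res$trial)
```

Positional access keeps the function working even if column names are not preserved when the values are passed in.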
Value
A list with the following elements:
- output [NumericVector]
A numeric value representing the updated Q-value after learning. This function specifies how prediction error (PE) is incorporated into value updating, using a learning rate that determines whether updates are more conservative or more aggressive in response to PE.
- hidden [CharacterVector]
User-defined internal variables generated by this function. These represent intermediate (latent) states produced during computation, which can be read or modified by other functions in the MDP process.
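To illustrate how the learning rate makes updating more conservative or more aggressive, the sketch below applies the same update rule repeatedly to a constant outcome, once with a small and once with a large learning rate. The helper `run_updates` and the chosen values are assumptions for illustration and are not part of the package.

```r
# Hypothetical helper: apply Q <- Q + alpha * (utility - Q) for n trials
run_updates <- function(alpha, utility = 1, q0 = 0, n = 5) {
  q <- numeric(n)
  prev <- q0
  for (t in seq_len(n)) {
    prev <- prev + alpha * (utility - prev)  # one learning step
    q[t] <- prev
  }
  q
}

conservative <- run_updates(alpha = 0.1)  # small steps toward the outcome
aggressive   <- run_updates(alpha = 0.9)  # large steps toward the outcome
print(round(conservative, 3))
print(round(aggressive, 3))
```

The large learning rate closes most of the gap to the outcome within the first trial or two, while the small one approaches it gradually.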
Body
func_alpha <- function(
    shown,
    is.fp,
    qvalue,
    reward,
    utility,
    params,
    rownum,
    system,
    hidden,
    ...
){
  # Expose the extra information passed via ... as local variables
  list2env(list(...), envir = environment())
  # Column names may be lost (C++), so access by index is recommended
  # e.g.
  # Trial <- idinfo[3]
  # Frame <- exinfo[1]
  # Action <- behave[1]

  Q0 <- params[["Q0"]]
  alpha <- params[["alpha"]]
  alphaN <- params[["alphaN"]]
  alphaP <- params[["alphaP"]]

  # No initial value (Q0 is NaN) and first pick of this option:
  # initialize the Q-value directly from the utility.
  if (is.nan(Q0) && is.fp) {
    update <- utility
    hidden[1] <- "first"
    return(list(output = update, hidden = hidden))
  }

  # Determine the model currently in use based on which parameters are free.
  if (
    system == "RL" && !(is.null(alpha)) && is.null(alphaN) && is.null(alphaP)
  ) {
    model <- "TD"
  } else if (
    system == "RL" && is.null(alpha) && !(is.null(alphaN)) && !(is.null(alphaP))
  ) {
    model <- "RSTD"
  } else if (
    system == "WM"
  ) {
    model <- "WM"
  } else {
    stop("Unknown Model! Please modify your learning rate function")
  }

  # TD: a single learning rate
  if (model == "TD") {
    update <- qvalue + alpha * (utility - qvalue)
  # RSTD: separate learning rates for negative and non-negative PE
  } else if (model == "RSTD" && utility < qvalue) {
    update <- qvalue + alphaN * (utility - qvalue)
  } else if (model == "RSTD" && utility >= qvalue) {
    update <- qvalue + alphaP * (utility - qvalue)
  # WM: store the reward directly
  } else if (model == "WM") {
    update <- reward
  }

  return(list(output = update, hidden = hidden))
}
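The RSTD branch can be sketched in isolation as follows. `rstd_update` is a hypothetical standalone helper written for illustration, not a function exported by the package, and the parameter values are made up.

```r
# Hypothetical helper: risk-sensitive update with separate learning rates
# for negative (alphaN) and non-negative (alphaP) prediction errors
rstd_update <- function(qvalue, utility, alphaN, alphaP) {
  pe <- utility - qvalue
  rate <- if (pe < 0) alphaN else alphaP
  qvalue + rate * pe
}

# Same starting value, one better-than-expected and one worse-than-expected outcome
up   <- rstd_update(qvalue = 0.5, utility = 1, alphaN = 0.4, alphaP = 0.1)
down <- rstd_update(qvalue = 0.5, utility = 0, alphaN = 0.4, alphaP = 0.1)
print(c(up = up, down = down))
```

With `alphaN` larger than `alphaP`, the estimate drops more after a disappointing outcome than it rises after an equally large positive surprise.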