It is now possible to mix calibration and policy elements in optimization payoff files. This makes it possible to do things like adding penalty terms to calibration payoffs that account for priors on model parameters or enforce complex constraints.
There are also several additional subtypes that make payoffs logarithmic, restrict the timing of payoff computation, and use non-Gaussian calibration error assumptions.
The new payoff keywords modify the existing *C (calibration) and *P (policy) options, and can be mixed.
For example:
•*CK - calibration, with Gaussian errors and a weight interpretation matching that in Kalman filtering
•*CL - calibration, with errors in log space
•*PF - policy, with payoff contribution computed only at FINAL TIME
•*RI - prior, with payoff contribution computed only at INITIAL TIME
Note that the extended payoff options are often not meaningful for Kalman filtering, which assumes Gaussian errors, and will generally be treated like type *C.
Examples
See the examples in OptSensi .
Calibration Payoffs
For calibration payoffs, the *C keyword may be followed by a Transform modifier L or X, and a Distribution modifier, G, K, O, R, Y, H, B, P or D. The Transform and Distribution modifiers may be combined, e.g. *CRL computes a robust error metric on log-transformed model and data.
•*CGL means calibration with Gaussian (Normal) errors on log-transformed data
•*CP means calibration with Poisson count errors
Distribution
The treatment of the distribution option depends on whether Kalman filtering is active.
•In ordinary simulations, the payoff is computed as below.
•The Kalman Filter assumes Gaussian errors (i.e. Normally distributed errors) with a specified variance. If you specify a distribution that is non-Gaussian, it will be translated to the Kalman format automatically (except for a few pathological cases that will cause an error). For example, if your payoff specifies a Poisson distribution, which has mean = variance, the Kalman filter will substitute N(mean,mean). The most likely consequences of this approximation are that (a) the tails of the Robust, Huber and Cauchy distributions will be lighter than expected, and (b) the skewness of the Poisson, Binomial and Overdispersed Poisson distributions will not be captured.
The interpretation of the Scale parameter varies with the distribution. In all but a few cases, the scale must be greater than 0; points with 0 scale will be ignored, with a warning or penalty if it represents a conceptual error.
Distribution |
Modifier |
Scale Interpretation |
Regular Calibration |
Kalman filtering |
Normal |
none (default) |
Weight = 1/StandardDeviation
>= 0 |
Payoff contribution calculated as:
- ((model-data)*weight)^2
where the weight is typically 1/(standard error of measurement). |
The weight is converted from 1/StdDev to variance, so filtering works as usual. |
Gaussian |
G |
Standard Deviation
> 0 |
This computes the complete normal log likelihood, excepting inessential constant terms, as in Kalman filtering, except that the scale parameter is the standard deviation of the measurement error rather than the variance. This is often the most convenient format to use, because the standard deviation is the most available and intuitive measure of the quality of the data.
The payoff contribution is
- (model-data)^2/std_dev^2/2 - LN(std_dev)
Inclusion of the LN(std_dev) term makes it possible to estimate the error scale as an optimization parameter. |
The scale parameter is converted from StdDev to variance, so filtering works as usual. |
Kalman |
K |
Variance
> 0 |
This makes the interpretation of the weight parameter consistent whether Kalman filtering is active or not.
Example:
*CK model|data/variance
Contributes:
- (model-data)^2/variance/2 - LN(variance)/2
Other differences will exist due to the state estimation in a Kalman simulation. |
Same as regular calibration. |
Kalman Filter Only |
O |
Variance
> 0 |
NA |
The 'O' modifier is for use only with Kalman filtering; it permits inclusion of an element in the filter's state update without contributing to the optimization payoff.
As for other Kalman filtering applications (*C or *CK), the scale parameter is a variance. |
Robust |
R |
Scale = MAD/ln(2)
> 0 |
The absolute value of differences between the model and the data rather than the squared residual. To use this feature, just specify *R as the payoff type, e.g.,
*CR model|data/scale
Which will contribute to the payoff as:
- ABS((model - data)/scale) - LN(scale)
The underlying assumption is that errors have a Laplace or double exponential distribution. The second term, LN(scale) arises from the Laplace likelihood, and makes the scale usable as an optimization parameter. |
The scale parameter is converted to a variance, according to the Laplace distribution, by
Var = 2*scale^2.
This preserves the variance, but not the kurtosis of the distribution; it becomes less heavy-tailed and therefore less robust (though the Kalman filter's outlier rejection may compensate). |
Cauchy |
Y |
Scale = MAD
> 0 |
The Cauchy distribution has undefined mean and variance (but a defined median), so it is a more extreme error assumption than the Laplace distribution. The scale parameter represents the MAD. To use this feature, specify *Y as the payoff type, e.g.,
*CY model_var|data_var/scale
Which will contribute to the payoff as:
- LN( 1 + ((model_var - data_var)/scale)^2 ) |
The scale parameter is converted to a variance by
Var = scale^2.
This makes the distribution less heavy-tailed and therefore less robust (though the Kalman filter's outlier rejection may compensate). |
Huber |
H |
Scale = Standard Deviation in central region
> 0 |
The Huber loss function combines aspects of the Normal (Gaussian) distribution and the Robust (Laplace) loss function.
*CH model|data/scale
Which will contribute to the payoff as:
if ( (model - data)/scale > 2 ) - 2*(ABS(model - data)/scale-1) - LN(scale) else - (model-data)^2/scale^2/2 - LN(scale)
In other words, it behaves like the Normal distribution with squared errors where the errors are of scale <= 2 (i.e. 2 standard deviations), and like a robust ABS(error) estimate for outliers. This combines the robustness of the absolute value with greater efficiency near the mean. |
The scale parameter is converted to a variance by
Var = scale^2.
This preserves the variance and central portion of the distribution, but not the kurtosis of the tails; it becomes less heavy-tailed and therefore less robust (though the Kalman filter's outlier rejection may compensate). |
Binomial |
B |
Number of trials
>= 0 (normally an integer) |
The Binomial distribution is provided for estimation of discrete events, as one might encounter in a model of individual health care outcomes. It represents the log likelihood of the outcome of Binomial trials with the given characteristics.
*CB model p success|data n successes/n trials
Unlike the other distributions above, •the first entry represents the model-calculated probability that an event occurs, not a (random) number of model events! Note that if the model time step and data interval do not match, you may need to adjust the probability to match the data interval. •the comparison data codes the number of events that actually occurred; •the third parameter is not a weight or scale factor, but represents the number of trials conducted.
In the special case where n_trials = 1, this simplifies to a Bernouilli trial, with probability of success p, so the payoff contribution is:
if ( data n successes <> 0 ) log(model p success) else log(1-model p success)
In this case, any nonzero value of the data is regarded as "success." The modeled probability of success must be greater than 0 and less than 1.
In other cases. The payoff contribution is:
log( model p success^data_n_successes * (1-model p success)^(n_trials-data_n_successes) )
The n_trials should be an integer >= 0 (zero trials will be ignored). The data_n_successes should be an integer >= 0 and <= n_trials. The modeled probability of success must be greater than 0 and less than 1. |
The Binomial variance is computed as
Var = n trials*p success*(1-p success) |
Poisson |
P |
Scale
> 0 (normally 1) |
The Poisson distribution is provided for estimation of discrete events, as one might encounter in a queueing process. It represents the log likelihood of the outcome of Poisson arrivals with the given characteristics.
*CP model f events|data n events/scale
Similar to the Binomial distribution, •the first entry represents the model-calculated frequency of events per time step, not a (random) number of model events! Note that if the model time step is not 1 unit, or events do not occur at all times, or the data interval differs from the time step, you may need to adjust the probability to match the data interval. •the comparison data codes the number of events that actually occurred. •Unlike the Binomial case, the third entry is a scaling factor, and should be 1 for the standard Poisson distribution. o You can set the scale to 0 to ignore this entry in the payoff. oAny other positive value will be interpreted as a scaling factor, Y/s ~ Poisson(X/s). This allows modification of the mean/variance relationship, such that Var = s*Mean.
The payoff contribution for a nonzero weight is:
( data n events / scale * log(model f events / scale) - model f events / scale ) - GAMMA_LN(data n events / scale + 1.0) - log(scale)
The data_n_events should be an integer >= 0. The modeled frequency of events must be greater than 0. If model=data=0, the likelihood contribution is 0. If model=0 and data>0, an impossible condition by definition, a large penalty (-1e12) is added to the payoff. |
The Poisson variance (which is the same as the mean) is computed as
Var = model f events/scale |
Overdispersed Poisson |
D |
Overdispersion
>= 0 (0 reverts to ordinary Poisson) |
The overdispersed Poisson distribution arises when the Mean=Var relationship of the Poisson distribution does not hold. There are several possible derivations, but a mixture of observations of Poisson variables with mean ~ Gamma produces a Negative Binomial distribution, with Var = Mean + overdispersion*Mean^2.
*CD model f events|data n events/overdispersion
Similar to the Poisson distribution, •the first entry represents the model-calculated frequency of events per time step, not a (random) number of model events! Note that if the model time step is not 1 unit, or events do not occur at all times, or the data interval differs from the time step, you may need to adjust the probability to match the data interval. •the comparison data codes the number of events that actually occurred. •Unlike the Poisson case, the third entry is the overdispersion term. oIf overdispersion=0, the distribution falls back to the ordinary Poisson distribution above. |
The overdispersed variance is
Var = model f events*(1+scale*model f events) |
The transformations modify the model-data comparison to use differences in logs or proportions of model-data values. This is often convenient, because it makes the choice of the weight or scale parameter easy: rather than being an absolute quantity, it can be expressed as a fraction of data.
Logarithmic
Like policy payoffs, calibration payoffs may use the āLā modifier to take logarithms of the model and data values before computing payoff contributions. Thus a *CGL payoff with the Gaussian distribution computes:
((LN(model)-LN(data))/scale)^2
which can be rewritten as
(LN(model/data)/scale)^2
The scale parameter then represents the expected geometric standard deviation of errors. A value of .2, for example, means that errors are expected to be about 20% of the model value.
This provides lognormal errors, rather than the standard normal assumption, with two consequences. First, 0 is impossible, and values much smaller than the model value are improbable. Second, the right tail of the distribution is heavier than the standard Normal.
The logarithm requires positive values for model and data variables. Nonpositive model values will be skipped with a warning message and (as of v10.3) a large penalty term of -1e12. Nonpositive data values result in a warning only. The condition model=data=0 is permitted (no warning, and no contribution to the payoff).
Proportional
The 'X' modifier makes a distribution's weight or scale parameter proportional to the model value. This implies that the scale of errors increases with larger model values, as in the logarithmic transform. A *CGX transform with the Robust distribution computes:
ABS( (model-data)/(model*scale) )
which can be rewritten as
ABS( (1-data/model)/scale )
Like the logarithmic transformation, here the scale can be interpreted as the expected error as a fraction of the model variable. The difference is that the proportional transform is symmetric around the model value (unlike the lognormal which is symmetric in log space, but skewed in linear space). The proportional transform therefore permits 0 values, and does not have a heavier-than-expected right tail.
Note that, because the scale parameter has varying interpretations according to the distribution choices above, the interpretation of "proportional" also varies:
•For Normal and Gaussian, the proportional scale can be thought of as expressing the Std Dev of the data relative to the scale of the variable, so a value of 0.1 means "the standard deviation is 10% of the model value".
•For Kalman, the proportionality is to the variance. So, a value of 1.0 mimics the Poisson distribution, for which mean = variance.
•For Robust, Cauchy and the tails of the Huber distribution, this implies that the Median Absolute Deviation is proportional to the model value times the scale.
Limitations
The Logarithmic transform is incompatible with Kalman filtering; if it is selected, the Proportional transform will be used instead, with a warning issued to the log.
The log and proportional transforms are incompatible with the discrete count distributions (Poisson, OD Poisson and Binomial); if used, they will be ignored with a warning.
Alternative
If you want some other transformation of the data or the scale of a distribution, you can make the payoff parameters model variables. For example:
*CG
log population plus 1 | log population data plus 1 / pop data frac error
with
log population plus 1 = LN( population/ref population + 1)
log population data plus 1 = LN( population data/ref population + 1)
pop data frac error = ref fractional error * SQRT( population*population data )/ref population
ref fractional error = 0.1
Policy Payoffs
For policy payoffs, the *P keyword may be followed by a Transform modifier L, and a Timing modifier, I or F. The Transform and Timing modifiers may be combined, e.g. *PFL contributes weight*LN(variable), only at final time.
Transform
•*PL indicates a logarithmic payoff, such that the payoff sums weight*LN(variable). The logarithm requires positive values for model and data variables; nonpositive values will be skipped with a warning message.
Timing
•*PI and *PF are policy elements that are computed only at INITIAL TIME and FINAL TIME, respectively.
•I and F cannot be combined.
These options are equivalent to adding a model variable with a timing switch, like:
payoff elm = IF THEN ELSE(Time > FINAL TIME-TIME STEP/2,model var,0)
Note that if you are combining initial or final values with other values, you may need to adjust the weights to compensate for the fact that ordinary values are integrated over the course of the simulation.
Priors
Priors may enter the payoff in two ways:
•From priors specified alongside parameters in the optimization control file (.voc). These are included in the payoff and payoff report, but do not need to be specified in the .vpd.
•As payoff elements specified explicitly in this payoff definition. These elements have type *R.
Normally it will be more convenient to specify priors in the .voc, but there may be reasons for including an explicit element:
•You want to use a distribution that isn't available among the builtin options for the .voc parameter list.
•The prior applies to a computed value. For example, in an SIR model, R0 = transmissionRate*generationTime, and you may wish to put the prior on R0 rather than the transmissionRate and generationTime individually.
•You want to use arbitrary timing.
*R prior elements are computed in the same way as policy payoff elements, except that they are reported separately in the payoff report.