This document is intended as a short introduction to fitting MCMC models using JAGS and runjags, focussing on learning by example. The document may grow slightly over time as new examples are added, but the intention is to keep this as breif as possible. Additional information and more details are available in the user guide. For detailed documentation of individual functions, see the runjags manual for that function.
The runjags packages requires a working installation of JAGS. To test the installation, it is recommended to use the testjags() function:
testjags()
## You are using R Under development (unstable) (2015-04-10 r68170)
## on a unix machine, with the X11 GUI
## The rjags package is installed
## JAGS version 3.4.0 found successfully using the command
## '/usr/local/bin/jags'
This should verify that your JAGS installation has been found. The rjags and modeest packages are not strictly required, but are reccomended packages and should be installed where possible.
A recommended introductory text for individuals not familiar with the BUGS language is:
Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D (2012). The BUGS book: A practical introduction to Bayesian analysis. CRC press.
To illustrate how to use run.jags we will look at the Salmonella example from chapter 6.5.2 of this book, with thanks to Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012, for kind permission to reproduce this code. The model below was taken from the WinBUGS example, modified to include a pair of curly brackets to demarcate the Data and Inits blocks (mandatory), and adding a second Inits block with different starting values for 2 chains:
filestring <- "
Poisson model...
model {
for (i in 1:6) {
for (j in 1:3) {
y[i,j] ~ dpois(mu[i])
}
log(mu[i]) <- alpha + beta*log(x[i] + 10) + gamma*x[i]
}
for (i in 1:6) {
y.pred[i] ~ dpois(mu[i])
}
alpha ~ dnorm(0, 0.0001)
beta ~ dnorm(0, 0.0001)
gamma ~ dnorm(0, 0.0001)
}
Data{
list(y = structure(.Data = c(15,21,29,16,18,21,16,26,33,
27,41,60,33,38,41,20,27,42),
.Dim = c(6, 3)),
x = c(0, 10, 33, 100, 333, 1000))
}
Inits{
list(alpha = 0, beta = 1, gamma = -0.1)
}
Inits{
list(alpha = 10, beta = 0, gamma = 0.1)
}
"
The following call to run.jags reads, compiles, runs and returns the model information along with MCMC samples and summary statistics:
results <- run.jags(filestring, monitor=c('alpha','beta','gamma'))
Note that this model is run from an embedded character vector here for convinience, but run.jags will also accept a file path as the main argument in which case the information will be read from the specified file. There is a print method associated with runjags class objects:
results
##
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##
## Lower95 Median Upper95 Mean SD Mode
## alpha 1.7474 2.171 2.5954 2.1633 0.22163 --
## beta 0.21075 0.31972 0.42886 0.32174 0.057586 --
## gamma -0.0014947 -0.0010189 -0.00053908 -0.0010229 0.00024571 --
##
## MCerr MC%ofSD SSeff AC.10 psrf
## alpha 0.01593 7.2 194 0.8136 1.0054
## beta 0.0043872 7.6 172 0.83752 1.0045
## gamma 0.000014817 6 275 0.61603 1.0033
##
## Total time taken: 1 seconds
This method displays relevant summary statistics - the effective sample size (SSeff) and convergence diagnostic (psrf) is shown for each stochastic variable. It is also advisable to plot the results using the default plot method to assess convergence graphically:
plot(results, layout=c(4,3))
In this case the chains have converged, but there is a large amount of auto-correlation and therefore a small effective sample size, which can be reduced by loading the GLM module in JAGS:
resultswithglm <- run.jags(filestring, monitor=c('alpha','beta','gamma'), modules='glm')
## module glm loaded
resultswithglm
##
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##
## Lower95 Median Upper95 Mean SD Mode
## alpha 1.7232 2.1564 2.5781 2.1553 0.21832 --
## beta 0.21051 0.32404 0.43174 0.32466 0.056563 --
## gamma -0.0014889 -0.0010392 -0.00054339 -0.00104 0.00024192 --
##
## MCerr MC%ofSD SSeff AC.10 psrf
## alpha 0.0028878 1.3 5716 0.0069201 0.99996
## beta 0.00069957 1.2 6537 0.0054517 0.99997
## gamma 2.7599e-06 1.1 7683 -0.0007299 1.0002
##
## Total time taken: 1.4 seconds
This second model has a smaller AC.10 (autocorrelation with lag 10) and therefore larger SSeff (effective sample size), so there will be much less Monte Carlo error associated with these estimates comapred to the first model.
To facilitate running models using data that is already in R, runjags will look for a number of comment tags in the model file to indicate the variables that should be extracted from the R working environment. For example the following model uses N, Y and X as data, and coef, int and precision are given initial values, all of which are taken from R. In addition, the model indicates to monitor coef, int and precision so that no monitor vector is required in the call to run.jags:
# Simulate the data:
set.seed(1)
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)
N <- length(X)
model <- "
model {
for(i in 1 : N){ #data# N
Y[i] ~ dnorm(true.y[i], precision) #data# Y
true.y[i] <- (coef * X[i]) + int #data# X
}
coef ~ dunif(-1000,1000)
int ~ dunif(-1000,1000)
precision ~ dexp(1)
#inits# coef, int, precision
#monitor# coef, int, precision
}"
# A function to return initial values for each chain:
coef <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
int <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
precision <- function(chain) return(switch(chain, "1"= 0.01, "2"= 100))
# Run the simulation:
results <- run.jags(model, n.chains = 2)
The output of this function can be extended using:
results <- extend.jags(results, sample=5000)
This function call takes an additional 5000 samples and combines them with the original simulation. To re-start the original simulation:
autoresults <- autorun.jags(model, method='parallel')
This re-runs the original model, but this time the convergence and sample size is automatically monitored and controlled, and the two chains are run in parallel to speed up lengthy simulations. Each of the functions run.jags, extend.jags, autorun.jags and autoextend.jags allows a method argument to be specified, with the options available detailed in the user guide.
One of the most difficult things about getting started with JAGS or BUGS is the requirement to write the model yourself. Looking at examples is a good way to learn, but it is even easier when the example is tailored towards your specific model. The template.jags function creates a model template file based on an lme4-style formula interface. We can look at an example from the help file for lm:
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
D9 <- data.frame(weight, group)
# The JAGS equivalent:
model <- template.jags(weight ~ group, D9, n.chains=2, family='gaussian')
The model is written to disk by template.jags, and includes all of the data and initial values necessary to run the model. Examining this model file is a good way to learn how the BUGS syntax works, as well as some of the options allowed by the runjags package. To run the models:
JAGS.D9 <- run.jags(model)
lm.D9 <- lm(weight ~ group, data=D9)
And to compare results:
JAGS.D9
##
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##
## Lower95 Median Upper95 Mean SD Mode
## regression_precision 0.81269 1.987 3.4114 2.0644 0.6903 --
## intercept 4.5616 5.0326 5.4924 5.0303 0.23436 --
## group_effect[1] 0 0 0 0 0 0
## group_effect[2] -1.0385 -0.37285 0.26592 -0.37053 0.33139 --
## deviance 40.179 42.752 48.611 43.418 2.6468 --
## resid.sum.sq 8.7293 9.4311 12.166 9.8192 1.2364 --
##
## MCerr MC%ofSD SSeff AC.10 psrf
## regression_precision 0.0053117 0.8 16889 -0.012628 0.99996
## intercept 0.0016656 0.7 19798 0.0023872 1
## group_effect[1] -- -- -- -- --
## group_effect[2] 0.0023433 0.7 20000 -0.0012253 1
## deviance 0.021233 0.8 15538 -0.0078901 0.99999
## resid.sum.sq 0.009686 0.8 16293 -0.0035878 1.0001
##
## Model fit assessment:
## DIC = 46.80822 (range between chains: 46.79467 - 46.82177)
## [PED not available from the stored object]
## Estimated effective number of parameters: pD = 3.39042
##
## Total time taken: 0.9 seconds
summary(lm.D9)
##
## Call:
## lm(formula = weight ~ group, data = D9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0710 -0.4938 0.0685 0.2462 1.3690
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.0320 0.2202 22.850 9.55e-15 ***
## groupTrt -0.3710 0.3114 -1.191 0.249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6964 on 18 degrees of freedom
## Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
## F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249
Note that lm reports sigma and JAGS the precision - to make them more comparable we could use a mutate function:
JAGS.D9 <- run.jags(model, mutate=list(prec2sd, 'precision'))
And focus on the results from the summary that we are interested in:
summary(JAGS.D9, vars=c('regression_precision.sd', 'intercept', 'group_effect'))
## Lower95 Median Upper95 Mean
## regression_precision.sd 0.5036577 0.7091036 0.9913811 0.7279563
## intercept 4.5674130 5.0313281 5.4908289 5.0336583
## group_effect[1] 0.0000000 0.0000000 0.0000000 0.0000000
## group_effect[2] -1.0348447 -0.3720177 0.2846335 -0.3732267
## SD Mode MCerr MC%ofSD SSeff
## regression_precision.sd 0.1300112 NA 0.001039635 0.8 15639
## intercept 0.2347665 NA 0.001660050 0.7 20000
## group_effect[1] 0.0000000 0 NA NA NA
## group_effect[2] 0.3325473 NA 0.002372569 0.7 19646
## AC.10 psrf
## regression_precision.sd -0.004710555 1.0002287
## intercept -0.006259967 0.9999553
## group_effect[1] NA NA
## group_effect[2] -0.013940930 0.9999598
We can also compare the estimated residuals using for example:
summary(residuals(lm.D9) - residuals(JAGS.D9, output='mean'))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0008054 -0.0008054 -0.0003525 -0.0003525 0.0001003 0.0001003
The standard likelihood ratio, AIC and BIC statistics are not relevant to Bayesian methods such as MCMC. Instead, the Deviance Information Criterion is typically used to compare models, and is displayed as part of the print method shwon above. This can also be extracted from a runjags model using the extract method, along with other information such as the samplers used:
extract(JAGS.D9, what='samplers')
## Index.No Sampler Node
## 1 1 Linear intercept
## 2 1 Linear group_effect[2]
## 3 2 ConjugateGamma regression_precision
Another possible method to assess the fit of Bayesian models is using leave-one-out cross-validation, or a drop-k study. A utility function is included within runjags to facilitate this - here we just assess the first 6 weight datapoints for brevity:
datafit <- drop.k(JAGS.D9, 'weight[1:6]', k=1, n.cores=2)
datafit
##
## Values obtained from a drop-k study with a total of 6 simulations:
##
## Target Median Mean Lower95%CI Upper95%CI Range95%CI
## weight[1] 4.17 5.1137 5.1174 3.5649 6.5966 3.0316
## weight[2] 5.58 4.9566 4.9604 3.3604 6.4849 3.1245
## weight[3] 5.18 5.0007 5.0047 3.3754 6.557 3.1816
## weight[4] 6.11 4.8985 4.9022 3.3961 6.3371 2.941
## weight[5] 4.5 5.0765 5.0804 3.4785 6.6066 3.1281
## weight[6] 4.61 5.0642 5.0681 3.4552 6.6049 3.1497
##
## Within95%CI AutoCorr(Lag10)
## weight[1] 1 0.0013443
## weight[2] 1 0.0013443
## weight[3] 1 0.0013443
## weight[4] 1 0.0013443
## weight[5] 1 0.0013443
## weight[6] 1 0.0013443
##
## Average time taken: 0.3 seconds (range: 0.2 seconds - 0.4 seconds)
## Average adapt+burnin required: 5000 (range: 5000 - 5000)
## Average samples required: 10000 (range: 10000 - 10000)
For more details of simulation studies, see the user guide.
There are a large number of global options that can be controlled using the runjags.options function. For an explanation of these see the help file for this function:
?runjags.options
The sourceforge repository associated with the runjags package contains additional information on where to get help etc. This package is in constant development, so if you find any issues or the code does not behave as expected please email the pacckage developer to report the problem - either by email or via the runjags forum. General comments and suggestions on how you find using this package are also very welcome.
Devleopment of this R package has consumed a great deal of time and energy, and is a somewhat peripheral activity in my current academic position. In order to justify continued devleopment of the software I need to be able to demonstrate that it is being used within the general research community. So, if you use this software as part of a publication, please remember to cite the software as follows:
citation('runjags')
##
## Please cite the 'runjags' package in publications as:
##
## M. J. Denwood (In Review). runjags: An R package providing
## interface utilities, parallel computing methods and additional
## distributions for MCMC models in JAGS. Journal of Statistical
## Software. URL: http://runjags.sourceforge.net
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS},
## author = {M. J. Denwood},
## year = {In Review},
## journal = {Journal of Statistical Software},
## url = {http://runjags.sourceforge.net},
## }
It is also important to cite JAGS and R!
sessionInfo()
## R Under development (unstable) (2015-04-10 r68170)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.9.5 (Mavericks)
##
## locale:
## [1] C/en_US.UTF-8/C/C/C/C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] rjags_3-14 coda_0.17-1 runjags_2.0.1-1
##
## loaded via a namespace (and not attached):
## [1] formatR_1.1 tools_3.3.0 grid_3.3.0 knitr_1.9
## [5] stringr_0.6.2 lattice_0.20-31 evaluate_0.5.5