A quick-start guide to using the runjags package

This document is intended as a short introduction to fitting MCMC models using JAGS and runjags, focussing on learning by example. The document may grow slightly over time as new examples are added, but the intention is to keep this as breif as possible. Additional information and more details are available in the user guide. For detailed documentation of individual functions, see the runjags manual for that function.

Preparation

The runjags packages requires a working installation of JAGS. To test the installation, it is recommended to use the testjags() function:

testjags()

## You are using R Under development (unstable) (2015-04-10 r68170)
## on a unix machine, with the X11 GUI
## The rjags package is installed
## JAGS version 3.4.0 found successfully using the command
## '/usr/local/bin/jags'

This should verify that your JAGS installation has been found. The rjags and modeest packages are not strictly required, but are reccomended packages and should be installed where possible.

Running models using run.jags

A recommended introductory text for individuals not familiar with the BUGS language is:

Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D (2012). The BUGS book: A practical introduction to Bayesian analysis. CRC press.

To illustrate how to use run.jags we will look at the Salmonella example from chapter 6.5.2 of this book, with thanks to Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012, for kind permission to reproduce this code. The model below was taken from the WinBUGS example, modified to include a pair of curly brackets to demarcate the Data and Inits blocks (mandatory), and adding a second Inits block with different starting values for 2 chains:

filestring <- "
Poisson model...

model {
for (i in 1:6) {
for (j in 1:3) {
y[i,j] ~ dpois(mu[i])
}
log(mu[i]) <- alpha + beta*log(x[i] + 10) + gamma*x[i]
}
for (i in 1:6) {
y.pred[i] ~ dpois(mu[i])
}
alpha ~ dnorm(0, 0.0001)
beta ~ dnorm(0, 0.0001)
gamma ~ dnorm(0, 0.0001)
}

Data{
list(y = structure(.Data = c(15,21,29,16,18,21,16,26,33,
27,41,60,33,38,41,20,27,42),
.Dim = c(6, 3)),
x = c(0, 10, 33, 100, 333, 1000))
}

Inits{
list(alpha = 0, beta = 1, gamma = -0.1)
}

Inits{
list(alpha = 10, beta = 0, gamma = 0.1)
}
"

The following call to run.jags reads, compiles, runs and returns the model information along with MCMC samples and summary statistics:

results <- run.jags(filestring, monitor=c('alpha','beta','gamma'))

Note that this model is run from an embedded character vector here for convinience, but run.jags will also accept a file path as the main argument in which case the information will be read from the specified file. There is a print method associated with runjags class objects:

results

## 
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##                                                                   
##          Lower95     Median     Upper95       Mean         SD Mode
## alpha     1.7474      2.171      2.5954     2.1633    0.22163   --
## beta     0.21075    0.31972     0.42886    0.32174   0.057586   --
## gamma -0.0014947 -0.0010189 -0.00053908 -0.0010229 0.00024571   --
##                                               
##             MCerr MC%ofSD SSeff   AC.10   psrf
## alpha     0.01593     7.2   194  0.8136 1.0054
## beta    0.0043872     7.6   172 0.83752 1.0045
## gamma 0.000014817       6   275 0.61603 1.0033
## 
## Total time taken: 1 seconds

This method displays relevant summary statistics - the effective sample size (SSeff) and convergence diagnostic (psrf) is shown for each stochastic variable. It is also advisable to plot the results using the default plot method to assess convergence graphically:

plot(results, layout=c(4,3))

In this case the chains have converged, but there is a large amount of auto-correlation and therefore a small effective sample size, which can be reduced by loading the GLM module in JAGS:

resultswithglm <- run.jags(filestring, monitor=c('alpha','beta','gamma'), modules='glm')

## module glm loaded

resultswithglm

## 
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##                                                                 
##          Lower95     Median     Upper95     Mean         SD Mode
## alpha     1.7232     2.1564      2.5781   2.1553    0.21832   --
## beta     0.21051    0.32404     0.43174  0.32466   0.056563   --
## gamma -0.0014889 -0.0010392 -0.00054339 -0.00104 0.00024192   --
##                                                  
##            MCerr MC%ofSD SSeff      AC.10    psrf
## alpha  0.0028878     1.3  5716  0.0069201 0.99996
## beta  0.00069957     1.2  6537  0.0054517 0.99997
## gamma 2.7599e-06     1.1  7683 -0.0007299  1.0002
## 
## Total time taken: 1.4 seconds

This second model has a smaller AC.10 (autocorrelation with lag 10) and therefore larger SSeff (effective sample size), so there will be much less Monte Carlo error associated with these estimates comapred to the first model.

Using data and initial values from R

To facilitate running models using data that is already in R, runjags will look for a number of comment tags in the model file to indicate the variables that should be extracted from the R working environment. For example the following model uses N, Y and X as data, and coef, int and precision are given initial values, all of which are taken from R. In addition, the model indicates to monitor coef, int and precision so that no monitor vector is required in the call to run.jags:

# Simulate the data:
set.seed(1)
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)
N <- length(X)

model <- "
    model {
        for(i in 1 : N){ #data# N
        Y[i] ~ dnorm(true.y[i], precision) #data# Y
        true.y[i] <- (coef * X[i]) + int #data# X
    }
    coef ~ dunif(-1000,1000)
    int ~ dunif(-1000,1000)
    precision ~ dexp(1)
    #inits# coef, int, precision
    #monitor# coef, int, precision
}"
# A function to return initial values for each chain:
coef <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
int <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
precision <- function(chain) return(switch(chain, "1"= 0.01, "2"= 100))

# Run the simulation:
results <- run.jags(model, n.chains = 2)

The output of this function can be extended using:

results <- extend.jags(results, sample=5000)

This function call takes an additional 5000 samples and combines them with the original simulation. To re-start the original simulation:

autoresults <- autorun.jags(model, method='parallel')

This re-runs the original model, but this time the convergence and sample size is automatically monitored and controlled, and the two chains are run in parallel to speed up lengthy simulations. Each of the functions run.jags, extend.jags, autorun.jags and autoextend.jags allows a method argument to be specified, with the options available detailed in the user guide.

Generating a GLMM template

One of the most difficult things about getting started with JAGS or BUGS is the requirement to write the model yourself. Looking at examples is a good way to learn, but it is even easier when the example is tailored towards your specific model. The template.jags function creates a model template file based on an lme4-style formula interface. We can look at an example from the help file for lm:

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
D9 <- data.frame(weight, group)

# The JAGS equivalent:
model <- template.jags(weight ~ group, D9, n.chains=2, family='gaussian')

The model is written to disk by template.jags, and includes all of the data and initial values necessary to run the model. Examining this model file is a good way to learn how the BUGS syntax works, as well as some of the options allowed by the runjags package. To run the models:

JAGS.D9 <- run.jags(model)
lm.D9 <- lm(weight ~ group, data=D9)

And to compare results:

JAGS.D9

## 
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##                                                                    
##                      Lower95   Median Upper95     Mean      SD Mode
## regression_precision 0.81269    1.987  3.4114   2.0644  0.6903   --
## intercept             4.5616   5.0326  5.4924   5.0303 0.23436   --
## group_effect[1]            0        0       0        0       0    0
## group_effect[2]      -1.0385 -0.37285 0.26592 -0.37053 0.33139   --
## deviance              40.179   42.752  48.611   43.418  2.6468   --
## resid.sum.sq          8.7293   9.4311  12.166   9.8192  1.2364   --
##                                                                
##                          MCerr MC%ofSD SSeff      AC.10    psrf
## regression_precision 0.0053117     0.8 16889  -0.012628 0.99996
## intercept            0.0016656     0.7 19798  0.0023872       1
## group_effect[1]             --      --    --         --      --
## group_effect[2]      0.0023433     0.7 20000 -0.0012253       1
## deviance              0.021233     0.8 15538 -0.0078901 0.99999
## resid.sum.sq          0.009686     0.8 16293 -0.0035878  1.0001
## 
## Model fit assessment:
## DIC = 46.80822  (range between chains: 46.79467 - 46.82177)
## [PED not available from the stored object]
## Estimated effective number of parameters:  pD = 3.39042
## 
## Total time taken: 0.9 seconds

summary(lm.D9)

## 
## Call:
## lm(formula = weight ~ group, data = D9)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4938  0.0685  0.2462  1.3690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
## groupTrt     -0.3710     0.3114  -1.191    0.249    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6964 on 18 degrees of freedom
## Multiple R-squared:  0.07308,    Adjusted R-squared:  0.02158 
## F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

Note that lm reports sigma and JAGS the precision - to make them more comparable we could use a mutate function:

JAGS.D9 <- run.jags(model, mutate=list(prec2sd, 'precision'))

And focus on the results from the summary that we are interested in:

summary(JAGS.D9, vars=c('regression_precision.sd', 'intercept', 'group_effect'))

##                            Lower95     Median   Upper95       Mean
## regression_precision.sd  0.5036577  0.7091036 0.9913811  0.7279563
## intercept                4.5674130  5.0313281 5.4908289  5.0336583
## group_effect[1]          0.0000000  0.0000000 0.0000000  0.0000000
## group_effect[2]         -1.0348447 -0.3720177 0.2846335 -0.3732267
##                                SD Mode       MCerr MC%ofSD SSeff
## regression_precision.sd 0.1300112   NA 0.001039635     0.8 15639
## intercept               0.2347665   NA 0.001660050     0.7 20000
## group_effect[1]         0.0000000    0          NA      NA    NA
## group_effect[2]         0.3325473   NA 0.002372569     0.7 19646
##                                AC.10      psrf
## regression_precision.sd -0.004710555 1.0002287
## intercept               -0.006259967 0.9999553
## group_effect[1]                   NA        NA
## group_effect[2]         -0.013940930 0.9999598

We can also compare the estimated residuals using for example:

summary(residuals(lm.D9) - residuals(JAGS.D9, output='mean'))

##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -0.0008054 -0.0008054 -0.0003525 -0.0003525  0.0001003  0.0001003

Model evaluation

The standard likelihood ratio, AIC and BIC statistics are not relevant to Bayesian methods such as MCMC. Instead, the Deviance Information Criterion is typically used to compare models, and is displayed as part of the print method shwon above. This can also be extracted from a runjags model using the extract method, along with other information such as the samplers used:

extract(JAGS.D9, what='samplers')

##   Index.No        Sampler                 Node
## 1        1         Linear            intercept
## 2        1         Linear      group_effect[2]
## 3        2 ConjugateGamma regression_precision

Another possible method to assess the fit of Bayesian models is using leave-one-out cross-validation, or a drop-k study. A utility function is included within runjags to facilitate this - here we just assess the first 6 weight datapoints for brevity:

datafit <- drop.k(JAGS.D9, 'weight[1:6]', k=1, n.cores=2)

datafit

## 
## Values obtained from a drop-k study with a total of 6 simulations:
##                                                                
##           Target Median   Mean Lower95%CI Upper95%CI Range95%CI
## weight[1]   4.17 5.1137 5.1174     3.5649     6.5966     3.0316
## weight[2]   5.58 4.9566 4.9604     3.3604     6.4849     3.1245
## weight[3]   5.18 5.0007 5.0047     3.3754      6.557     3.1816
## weight[4]   6.11 4.8985 4.9022     3.3961     6.3371      2.941
## weight[5]    4.5 5.0765 5.0804     3.4785     6.6066     3.1281
## weight[6]   4.61 5.0642 5.0681     3.4552     6.6049     3.1497
##                                      
##           Within95%CI AutoCorr(Lag10)
## weight[1]           1       0.0013443
## weight[2]           1       0.0013443
## weight[3]           1       0.0013443
## weight[4]           1       0.0013443
## weight[5]           1       0.0013443
## weight[6]           1       0.0013443
## 
## Average time taken:  0.3 seconds (range: 0.2 seconds - 0.4 seconds)
## Average adapt+burnin required:  5000 (range: 5000 - 5000)
## Average samples required:  10000 (range: 10000 - 10000)

For more details of simulation studies, see the user guide.

Options

There are a large number of global options that can be controlled using the runjags.options function. For an explanation of these see the help file for this function:

?runjags.options

Further information

The sourceforge repository associated with the runjags package contains additional information on where to get help etc. This package is in constant development, so if you find any issues or the code does not behave as expected please email the pacckage developer to report the problem - either by email or via the runjags forum. General comments and suggestions on how you find using this package are also very welcome.

Citation

Devleopment of this R package has consumed a great deal of time and energy, and is a somewhat peripheral activity in my current academic position. In order to justify continued devleopment of the software I need to be able to demonstrate that it is being used within the general research community. So, if you use this software as part of a publication, please remember to cite the software as follows:

citation('runjags')

## 
## Please cite the 'runjags' package in publications as:
## 
##   M. J. Denwood (In Review). runjags: An R package providing
##   interface utilities, parallel computing methods and additional
##   distributions for MCMC models in JAGS.  Journal of Statistical
##   Software.  URL: http://runjags.sourceforge.net
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS},
##     author = {M. J. Denwood},
##     year = {In Review},
##     journal = {Journal of Statistical Software},
##     url = {http://runjags.sourceforge.net},
##   }

It is also important to cite JAGS and R!

This vignette was built with:

sessionInfo()

## R Under development (unstable) (2015-04-10 r68170)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.9.5 (Mavericks)
## 
## locale:
## [1] C/en_US.UTF-8/C/C/C/C
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rjags_3-14      coda_0.17-1     runjags_2.0.1-1
## 
## loaded via a namespace (and not attached):
## [1] formatR_1.1     tools_3.3.0     grid_3.3.0      knitr_1.9      
## [5] stringr_0.6.2   lattice_0.20-31 evaluate_0.5.5