Usage

Consider a simple Bernoulli model, \(Y_{i} \sim Binomial(1, p)\), \(i=1, 2, \ldots, n\). The data generation corresponding to this model looks like as follows:

set.seed(4321) # set random seed for reproducibility
n <- 25 # sample size
p <- 0.3 # true parameter value
y <- rbinom(n = n, size = 1, prob = p)

Bayesian analysis in JAGS

library(dclone)
library(rjags)

## model specification
model <- custommodel("model {
    for (i in 1:n) {
        #Y[i] ~ dbin(p, 1) # Binomial(N,p)
        Y[i] ~ dbern(p) # Bernoulli(p)
    }
    p ~ dunif(0.001, 0.999)
}")

## data
dat <- list(Y = y, n = n)

## Bayesian MCMC results
fit <- jags.fit(data = dat, params = "p", model = model)

summary(fit)
plot(fit)

Data cloning based maximum likelihood estimation

To make sure that both locations and clones are independent (i.i.d.), it is safest to include and extra dimension and the corresponding loop:

## dclone-ified model specification
model <- custommodel("model {
    for (k in 1:K) {
        for (i in 1:n) {
            Y[i,k] ~ dbin(p, 1)
        }
    }
    p ~ dunif(0.001, 0.999)
}")

## dclone-ified data specification
dat <- list(Y = dcdim(data.matrix(y)), n = n, K = 1)

## data cloning based MCMC results
dcfit <- dc.fit(data = dat, params = "p", model = model,
    n.clones = c(1,2,4,8), unchanged = "n", multiply = "K")

summary(dcfit)
plot(dcfit)

coef(dcfit) # MLE
dcsd(dcfit) # asymptotic SEs
vcov(dcfit) # inverse Fisher information matrix
confint(dcfit) # asymptotic confidence interval

Data cloning based diagnostics can be used to spot identifiability issues (which is not the case here):

dctable(dcfit)
plot(dctable(dcfit))
dcdiag(dcfit)
plot(dcdiag(dcfit))

High performance computing

Because MCMC chains are independent, is can be seen as an embarrassingly parallel problem. On Windows, the only option is through clusters (the cluster object cl initialized using the makeCluster function and alikes). On other platforms, forking can be used (cl <- 3) besides clusters.

## model specification
model <- custommodel("model {
    for (i in 1:n) {
        #Y[i] ~ dbin(p, 1) # Binomial(N,p)
        Y[i] ~ dbern(p) # Bernoulli(p)
    }
    p ~ dunif(0.001, 0.999)
}")

## data
dat <- list(Y = y, n = n)

## parallel Bayesian MCMC results
cl <- makeCluster(3)
pfit <- jags.parfit(cl, data = dat, params = "p", model = model)
stopCluster(cl)

Data cloning increases the size of the DAG (directed acyclic graph). This means increasing computation times. Parallel computations for can be utilized to cut down computing times.

## dclone-ified model specification
model <- custommodel("model {
    for (k in 1:K) {
        for (i in 1:n) {
            Y[i,k] ~ dbin(p, 1)
        }
    }
    p ~ dunif(0.001, 0.999)
}")

## dclone-ified data specification
dat <- list(Y = dcdim(data.matrix(y)), n = n, K = 1)

## parallel data cloning based MCMC results
cl <- makeCluster(3)
dcpfit <- dc.parfit(cl, data = dat, params = "p", model = model,
    n.clones = c(1,2,4,8), unchanged = "n", multiply = "K",
    n.chains = 3, partype = "parchains")
stopCluster(cl)