Consider a simple Bernoulli model, \(Y_{i} \sim \mathrm{Binomial}(1, p)\), \(i = 1, 2, \ldots, n\). Data generation under this model looks as follows:
set.seed(4321) # set random seed for reproducibility
n <- 25 # sample size
p <- 0.3 # true parameter value
y <- rbinom(n = n, size = 1, prob = p)
library(dclone)
library(rjags)
## model specification
model <- custommodel("model {
    for (i in 1:n) {
        #Y[i] ~ dbin(p, 1) # equivalent Binomial(1, p) form
        Y[i] ~ dbern(p) # Bernoulli(p)
    }
    p ~ dunif(0.001, 0.999)
}")
## data
dat <- list(Y = y, n = n)
## Bayesian MCMC results
fit <- jags.fit(data = dat, params = "p", model = model)
summary(fit)
plot(fit)
To make sure that both locations and clones are independent (i.i.d.), it is safest to include an extra dimension and the corresponding loop:
## dclone-ified model specification
model <- custommodel("model {
    for (k in 1:K) {
        for (i in 1:n) {
            Y[i,k] ~ dbin(p, 1)
        }
    }
    p ~ dunif(0.001, 0.999)
}")
## dclone-ified data specification
dat <- list(Y = dcdim(data.matrix(y)), n = n, K = 1)
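Conceptually, with K clones the model sees the observation vector repeated K times as the columns of an n-by-K matrix, and dcdim marks which dimension should be expanded as n.clones grows. A base-R sketch (not using dclone itself) of what the cloned data looks like for K = 2:

```r
n <- 25
y <- rbinom(n = n, size = 1, prob = 0.3) # same kind of data as above
K <- 2
## repeat the observations as K identical columns,
## mimicking what the cloned Y matrix looks like
Ycloned <- matrix(y, nrow = n, ncol = K)
dim(Ycloned) # n rows, K columns
all(Ycloned[, 1] == Ycloned[, 2]) # the columns are identical clones
```

Because every clone is an exact copy of the data, the likelihood is raised to the K-th power, which is what makes the posterior converge to a degenerate distribution at the MLE as K increases.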
## data cloning based MCMC results
dcfit <- dc.fit(data = dat, params = "p", model = model,
    n.clones = c(1, 2, 4, 8), unchanged = "n", multiply = "K")
summary(dcfit)
plot(dcfit)
coef(dcfit) # MLE
dcsd(dcfit) # asymptotic SEs
vcov(dcfit) # inverse Fisher information matrix
confint(dcfit) # asymptotic confidence interval
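For this simple model the MLE has a closed form, \(\hat{p} = \bar{y}\), with asymptotic standard error \(\sqrt{\hat{p}(1-\hat{p})/n}\), so the data cloning output can be checked directly. A base-R sketch (the values returned by coef(dcfit), dcsd(dcfit), and confint(dcfit) should be close to these):

```r
set.seed(4321) # same seed and data as above
n <- 25
y <- rbinom(n = n, size = 1, prob = 0.3)
phat <- mean(y) # closed-form MLE for the Bernoulli p
se <- sqrt(phat * (1 - phat) / n) # asymptotic standard error
ci <- phat + c(-1, 1) * qnorm(0.975) * se # Wald-type 95% confidence interval
c(MLE = phat, SE = se)
ci
```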
Data cloning based diagnostics can be used to spot identifiability issues (not an issue in this example):
dctable(dcfit)
plot(dctable(dcfit))
dcdiag(dcfit)
plot(dcdiag(dcfit))
Because MCMC chains are independent, running them can be seen as an embarrassingly parallel problem. On Windows, the only option is to use clusters: a cluster object cl initialized with the makeCluster function (or similar) from the parallel package. On other platforms, forking can be used besides clusters; in that case cl is simply the number of worker processes (e.g. cl <- 3).
## model specification
model <- custommodel("model {
    for (i in 1:n) {
        #Y[i] ~ dbin(p, 1) # equivalent Binomial(1, p) form
        Y[i] ~ dbern(p) # Bernoulli(p)
    }
    p ~ dunif(0.001, 0.999)
}")
## data
dat <- list(Y = y, n = n)
## parallel Bayesian MCMC results
cl <- makeCluster(3)
pfit <- jags.parfit(cl, data = dat, params = "p", model = model)
stopCluster(cl)
Data cloning increases the size of the DAG (directed acyclic graph), which means increased computation times. Parallel computation can be utilized to cut down computing times.
## dclone-ified model specification
model <- custommodel("model {
    for (k in 1:K) {
        for (i in 1:n) {
            Y[i,k] ~ dbin(p, 1)
        }
    }
    p ~ dunif(0.001, 0.999)
}")
## dclone-ified data specification
dat <- list(Y = dcdim(data.matrix(y)), n = n, K = 1)
## parallel data cloning based MCMC results
cl <- makeCluster(3)
dcpfit <- dc.parfit(cl, data = dat, params = "p", model = model,
    n.clones = c(1, 2, 4, 8), unchanged = "n", multiply = "K",
    n.chains = 3, partype = "parchains")
stopCluster(cl)