Generalized Linear Models V

Dec. 10, 2018 $\newcommand{\bs}{\boldsymbol}$ $\newcommand{\argmin}[1]{\underset{\bs{#1}}{\text{arg min}}\,}$ $\newcommand{\argmax}[1]{\underset{\bs{#1}}{\text{arg max}}\,}$ $\newcommand{\tr}{^{\top}}$ $\newcommand{\norm}[1]{\left|\left|\,#1\,\right|\right|}$ $\newcommand{\given}{\,|\,}$ $\DeclareMathOperator{\E}{\mathbb{E}}$ $\newcommand{\blue}[1]{{\color{blue} {#1}}}$ $\newcommand{\red}[1]{{\color{red} {#1}}}$ $\newcommand{\orange}[1]{{\color{orange} {#1}}}$ $\newcommand{\pfrac}[2]{\frac{\partial #1}{\partial #2}}$

For a simplified version, we assume that the underlying data-generating distribution is Poisson, whose means $\lambda_{ij}$ are related to the covariates via

$$ \log \lambda_{ij} = \bs{x}_{ij}\tr \bs{\theta} + \beta_i + \gamma_{ij}. \tag{1} $$

Since the covariates $\bs{z}_{ij}$ and $\bs{t}_{ij}$ are all assumed to be 1, we can combine the one-dimensional coefficients as $\bs{B} = (\beta_1,..., \beta_n)'$ and $\bs{\Gamma}=(\gamma_{11},...,\gamma_{n,m_n})'$, so that

$$ \bs{B} \sim \mathcal{N}(\bs{0}, \sigma_1^2\bs{I}), \quad \bs{\Gamma} \sim \mathcal{N}(\bs{0}, \sigma_2^2\bs{I}). \tag{2} $$

We have the following data model:

$$ p(y_{ij}) = \frac{e^{-\lambda_{ij}}\lambda_{ij}^{y_{ij}}}{y_{ij}!}, \text{ where } \lambda_{ij} = e^{\bs{x}_{ij}\tr \bs{\theta} + \beta_i+ \gamma_{ij}}. \tag{3} $$

Writing the pdf in standard form:

$$ p(y_{ij}) = \text{exp}\left(y_{ij}\log \lambda_{ij} - \lambda_{ij} - \log y_{ij}! \right). \tag{4} $$

From this we see that the logarithm is indeed the canonical link function. In addition, we see that $b(\cdot)=\cdot$ is the identity function, and the dispersion parameter $\phi_i=1$. The link function $g(\lambda) = \log \lambda$, so its derivative is given by $g'(\lambda) = 1/\lambda$. Hence, at each iteration of the Bayesian IWLS, the weights is given by

$$ w_{ij} = \frac{1}{\text{Var}(Y_{ij})(g'(\lambda_{ij}))^2} = \lambda_{ij}. \tag{5} $$

Adding the priors $\mathcal{N}(\bs{a}, \bs{R})$ for $\bs{\theta}$ and the non-informative priors $p(\sigma_1^{-2}, \sigma_2^{-2}) \propto \sigma_1^{2}\sigma_2^{2}$, the posterior distribution becomes

$$ \begin{aligned} &\quad \pi(\bs{\theta}, \bs{B}, \bs{\Gamma}, \sigma_1^{-2}, \sigma_2^{-2}) \propto \\ &\text{exp}\left\{-\frac{1}{2}(\bs{\theta}-\bs{a})\tr\bs{R}^{-1}(\bs{\theta}-\bs{a}) + \sum_{i=1}^n \sum_{j=1}^{m_i}\left[y_{ij}\log\lambda_{ij}-\lambda_{ij}\right] \right\} \\ &\quad [\text{where } \lambda_{ij} = e^{\bs{x}_{ij}\tr \bs{\theta} + \beta_i+ \gamma_{ij}}] \\ \times \,\, & (\sigma_1^{-2})^{\frac{n}{2}-1} \text{exp}\left\{\frac{-\bs{B}\tr\bs{B}}{2\sigma_1^2}\right\} \times (\sigma_2^{-2})^{\frac{n}{2}-1} \text{exp}\left\{\frac{-\bs{\Gamma}\tr\bs{\Gamma}}{2\sigma_2^2} \right\} \end{aligned} \tag{6} $$

Sampling $\bs{\theta}$

For a vector of response $\bs{y}=(y_{11},...,y_{n,m_n})'$, the posterior log-likelihood for $\bs{\theta}$ is

$$ \pi(\bs{\theta}) \propto -\frac{1}{2}(\bs{\theta}-\bs{a})\tr \bs{R}^{-1}(\bs{\theta}-\bs{a}) + \sum_{i=1}^n\sum_{j=1}^{m_i} \left[y_{ij} \bs{x}_{ij}\tr \bs{\theta} - \lambda_{ij} \right]. \tag{7} $$

The linear predictor $\eta_{ij} = \bs{x}_{ij}\tr\bs{\theta}+\beta_i+\gamma_{ij}$ can be written in the vectorized form

$$ \bs{\eta} = \bs{X}\bs{\theta} + \bs{Z}\bs{B} + \bs{\Gamma}, \tag{8} $$

where $\bs{X} = (\bs{x}_{11}\tr,...,\bs{x}_{n,m_n}\tr)'$ is the design matrix, and

$$ \begin{aligned} \bs{Z} &= \text{diag}(\bs{1}_{m_1},...,\bs{1}_{m_n}) \\ \bs{B} &= (\beta_1,...,\beta_n)' \\ \bs{\Gamma} &= (\gamma_{11},...,\gamma_{n,m_n})' \end{aligned} \tag{9} $$

In [2]:
# loading necessary libraries
library(dplyr)
library(Matrix)
library(mvtnorm)
In [3]:
# define log-likelihood for theta
dtheta.log <- function(theta, X, y, Z, B, Gamma, a, Ri){
#     -----------
#     theta = fixed effect
#     X, y = design matrix and response vector
#     Z = diag(1m1,...,1mn) defined in (9)
#     B = (beta_1,...,b_n) vector of beta_i
#     Gamma = (gamma_11,...,gamma_nm_n) vector of gamma_ij
#     a, R = prior parameters for theta
#     Ri = R^-1, the inverse of R
#     -----------
    eta1 <- X%*%theta
    lambda <- exp(eta1 + Z%*%B + Gamma)
    sum(y*(eta1)-lambda)-as.numeric(t(theta-a)%*%Ri%*%(theta-a))
}

The proposal density has the form $\mathcal{N}(\bs{m}^{(t)}, \bs{C}^{(t)})$, where

$$ \begin{aligned} \bs{m}^{(t)} &= (\bs{R}^{-1} + \bs{X}\tr\bs{W}\bs{X})^{-1} (\bs{R}^{-1}\bs{a} + \bs{X}\tr \bs{W} (\bs{\tilde{y}}-\bs{Z}\bs{B} - \bs{\Gamma})) \\ \bs{C}^{(t)} &= (\bs{R}^{-1} + \bs{X}\tr\bs{W}\bs{X})^{-1} \end{aligned} \tag{10} $$

The adjusted response in this case is given by

$$ \bs{\tilde{y}}^{(t-1)} = \bs{\eta}^{(t-1)} + (\bs{y} - \bs{\lambda}^{(t-1)})/\bs{\lambda}^{(t-1)}, \tag{11} $$

where the division of vectors is defined elementwise, and

$$ \bs{\lambda}^{(t-1)} = \text{exp}(\bs{\eta}^{(t-1)}). \tag{12} $$

In [4]:
# returns mean vector and adjusted response
y.adjust <- function(theta, X, y, Z, B, Gamma){
    eta <- as.vector(X%*%theta + Z%*%B + Gamma) # linear predictor
    lambda <- exp(eta) # mean vector
    z <- as.vector(eta + (y-lambda)/lambda) # adjusted response
    return(list('lambda'=lambda, 'z'=z))
}
In [5]:
# returns mean and variance for proposal density for theta
J.theta <- function(theta, X, y, Z, B, Gamma, Ri, Ria){
#     -----------
#     Ri = R^-1, inverse of R
#     Ria = R^-1*a
#     -----------
    adjust <- y.adjust(theta, X, y, Z, B, Gamma)
    lambda <- adjust$lambda # mean
    z <- adjust$z # adjusted response
    z <- as.vector(z - Z%*%B - Gamma) # adjusted response with offset
    W <- diag(lambda) # weight matrix
    C.t <- solve(Ri + t(X)%*%W%*%X) # proposal variance
    m.t <- C.t%*%(Ria + t(X)%*%W%*%z) # proposal mean
    return(list('m'=m.t, 'C'=C.t))
}
In [6]:
# returns the acceptance probability for theta
alpha.theta <- function(theta.new, theta.old, X, y, Z, B, Gamma, C, a, Ri) {
#     -----------
#     theta.new = value of theta generated at current iteration
#     theta.old = value of theta from previous iteration
#     C = covariance matrix of the proposal density 
#     -----------
    top <- dtheta.log(theta.new,X,y,Z,B,Gamma,a,Ri)+dmvnorm(theta.old,theta.new,C,log=TRUE)
    bot <- dtheta.log(theta.old,X,y,Z,B,Gamma,a,Ri)+dmvnorm(theta.new,theta.old,C,log=TRUE)
    alpha <- min(1, exp(top-bot))
    rbinom(1,1,alpha)
}

Sampling $\beta_i$

For each subject level random effect $\beta_i$, the full conditional distribution is given by

$$ \begin{aligned} \pi(\beta_i) &\propto \text{exp}\left\{-\frac{\beta_i^2}{2\sigma_1^2} + \sum_{j=1}^{m_i}\left[y_{ij}\log\lambda_{ij}-\lambda_{ij}\right] \right\} \\ & \quad [\text{where } \lambda_{ij} = e^{\bs{x}_{ij}\tr \bs{\theta} + \beta_i+ \gamma_{ij}}] \\ & \propto \text{exp}\left\{-\frac{\beta_i^2}{2\sigma_1^2} + \sum_{j=1}^{m_i}\left[y_{ij}\beta_i-\lambda_{ij}\right] \right\} \end{aligned} \tag{13} $$

The posterior log-likelihood is implemented as follows. First we define two slicing functions slice.vector and slice.matrix that subsets the corresponding vector of matrix for the $i$th subject.

In [7]:
# Slicing a vector of length n*m_n
slice.vector <- function(i, Z, vector) {
#     -----------
#     i = index for the ith subject
#     -----------
    indices <- Z[,i]==1 # indices for the ith subject
    return(as.vector(vector[indices]))
}
In [8]:
# Slicing a matrix of size (n*m_n x ?)
slice.matrix <- function(i, Z, matrix) {
    indices <- Z[,i]==1
    return(as.matrix(matrix[indices,]))
}
In [12]:
# compute log-liklihood for beta_i
dbeta.log <- function(beta.i, i, theta, X, y, Z, Gamma, sig1) {
#     -----------
#     beta.i = beta_i, random effect for ith subject
#     i = index value for ith subject
#     sig1 = sigma1^-2, the precision parameter (not variance!!)
#     -----------
    X.i <- slice.matrix(i, Z, X) # design matrix for subject i
    y.i <- slice.vector(i, Z, y) # response for subject i
    Gamma.i <- slice.vector(i, Z, Gamma) # unit random effect for subject i
    v1 <- as.vector(rep(1, length(Gamma.i))) # a vector of 1's of length m_i
    eta.i <- as.vector(X.i%*%theta + beta.i*v1 + Gamma.i) # linear predictor
    lambda.i <- as.vector(exp(eta.i)) # mean vector
    beta.i*sum(y.i) - sum(lambda.i) - beta.i^2*sig1/2 # log-likelihood
}

For the IWLS sampling, the offset is now $\bs{x}_{ij}\tr\bs{\theta}+\gamma_{ij}$, so the proposal density is represented by $N(m_i^{(t)}, C_i^{(t)})$ where

$$ \begin{aligned} m_i^{(t)} &= (\sigma_1^{-2} + \bs{1}_{m_i}\tr\bs{W}_i\bs{1}_{m_i})^{-1}\bs{1}_{m_i}\tr\bs{W}_i(\bs{\tilde{y}}_i - \bs{X}_i\bs{\theta}-\bs{\Gamma}_i) \\ C_i^{(t)} &= (\sigma_1^{-2} + \bs{1}_{m_i}\tr\bs{W}_i\bs{1}_{m_i})^{-1}. \end{aligned} \tag{14} $$

In [10]:
# returns mean and variance for proposal density for beta_i
J.beta <- function(i, theta, X, y, Z, B, Gamma, sig1){
    adjust <- y.adjust(theta, X, y, Z, B, Gamma)
    lambda <- adjust$lambda # mean vector
    z <- adjust$z # adjusted response
    z.i <- slice.vector(i, Z, z) # slice adjusted response
    X.i <- slice.matrix(i, Z, X) # design matrix for subject i
    Gamma.i <- slice.vector(i, Z, Gamma) # unit random effect for subject i
    lambda.i <- slice.vector(i, Z, lambda) # mean vector for subject i
    z.i <- as.vector(z.i - X.i%*%theta - Gamma.i) # offset adjustment
    C.t <- (sig1 + sum(lambda.i))^(-1) # proposal variance
    m.t <- C.t * sum(lambda.i*z.i) # proposal mean
    return(list('m'=m.t, 'C'=C.t))
}
In [11]:
# returns the acceptance probability for beta_i
alpha.beta <- function(beta.new, beta.old, i, X, y, Z, Gamma, sig1, C) {
#     -----------
#     beta.new = value of theta generated at current iteration
#     beta.old = value of theta from previous iteration
#     C = variance for the proposal density 
#     -----------
    top <- dbeta.log(beta.new,i,theta,X,y,Z,Gamma,sig1)+dnorm(beta.old,beta.new,sqrt(C),log=TRUE)
    bot <- dbeta.log(beta.old,i,theta,X,y,Z,Gamma,sig1)+dnorm(beta.new,beta.old,sqrt(C),log=TRUE)
    alpha <- min(1, exp(top-bot))
    rbinom(1,1,alpha)
}

Sampling $\gamma_{ij}$

For the $\gamma_{ij}$, the full conditional distribution is given by

$$ \begin{aligned} \pi(\gamma_{ij}) &\propto \text{exp}\left\{-\frac{\gamma_{ij}^2}{2\sigma_2^2} + \left[y_{ij}\log\lambda_{ij}-\lambda_{ij}\right] \right\} \\ & \quad [\text{where } \lambda_{ij} = e^{\bs{x}_{ij}\tr \bs{\theta} + \beta_i+ \gamma_{ij}}] \\ & \propto \text{exp}\left\{-\frac{\gamma_{ij}^2}{2\sigma_2^2} + \left[y_{ij}\gamma_{ij}-\lambda_{ij}\right] \right\} \end{aligned} \tag{15} $$

In [13]:
# compute log-liklihood for gamma_ij
dgamma.log <- function(gamma.ij, i, j, theta, X, y, Z, B, sig2) {
#     -----------
#     gamma.ij = gamma_ij, unit random effect
#     i, j = index values for jth unit in ith subject
#     sig2 = sigma2^-2, precision parameter for gamma_ij
#     -----------
    y.ij <- slice.vector(i, Z, y)[j] # y_ij scaler
    x.ij <- as.vector(slice.matrix(i, Z, X)[j,]) # x_ij vector
    beta.i <- B[i] # beta_i
    eta.ij <- as.vector(sum(x.ij*theta) + beta.i + gamma.ij) # linear predictor
    lambda.ij <- as.vector(exp(eta.ij)) # mean
    y.ij*gamma.ij - lambda.ij - gamma.ij^2*sig2/2 # compute log-likelihood
}

The offset is $\bs{x}_{ij}\tr\bs{\theta}+\bs{z}_{ij}\tr\bs{\beta}_i$, so the proposal density is represented by $\mathcal{N}(\bs{m}_{ij}^{(t)}, \bs{C}_{ij}^{(t)})$ where

$$ \begin{aligned} \bs{m}_{ij}^{(t)} &= (\sigma_2^{-2} + w_{ij})^{-1}w_{ij}(\tilde{y}_{ij} - \bs{x}_{ij}\tr\bs{\theta}-\beta_i) \\ \bs{C}_i^{(t)} &= (\sigma_2^{-2} + w_{ij})^{-1}. \end{aligned} \tag{16} $$

In [14]:
# returns mean and variance for proposal density for gamma_ij
J.gamma <- function(i, j, theta, X, y, Z, B, Gamma, sig2){
    adjust <- y.adjust(theta, X, y, Z, B, Gamma)
    lambda <- adjust$lambda # mean vector
    z <- adjust$z # adjusted response
    z.ij <- slice.vector(i, Z, z)[j] # sliced adjusted response
    x.ij <- as.vector(slice.matrix(i, Z, X)[j,]) # covariates for ijth example
    lambda.ij <- slice.vector(i, Z, lambda)[j] # mean for ijth example
    beta.i <- B[i] # obtain beta_i
    z.ij <- z.ij - sum(x.ij*theta) - beta.i # adjusted response with offset
    C.t <- 1/(sig2 + lambda.ij) # proposal variance
    m.t <- C.t * lambda.ij * z.ij  # proposal mean
    return(list('m'=m.t, 'C'=C.t))
}
In [15]:
# returns the acceptance probability for gamma_ij
alpha.gamma <- function(gamma.new, gamma.old, i, j, theta, X, y, Z, B, sig2, C) {
#     -----------
#     gamma.new = value of gamma_ij generated at current iteration
#     gamma.old = value of gamma_ij from previous iteration
#     C = variance for the proposal density 
#     -----------
    top <- dgamma.log(gamma.new,i,j,theta,X,y,Z,B,sig2)+dnorm(gamma.old,gamma.new,sqrt(C),log=TRUE)
    bot <- dgamma.log(gamma.old,i,j,theta,X,y,Z,B,sig2)+dnorm(gamma.new,gamma.old,sqrt(C),log=TRUE)
    alpha <- min(1, exp(top-bot))
    rbinom(1,1,alpha)
}

Sampling $\sigma_1^{-2}, \sigma_2^{-2}$

It is easy to see that the posterior distribution for the precision parameters are

$$ \begin{aligned} \sigma_1^{-2} &\sim \text{Gamma}\left(\frac{n}{2}, \frac{2}{\bs{B}\tr\bs{B}} \right) \\ \sigma_2^{-2} &\sim \text{Gamma}\left(\frac{n}{2}, \frac{2}{\bs{\Gamma}\tr\bs{\Gamma}}\right). \end{aligned} \tag{17} $$

Full Model

In [16]:
# Bayesian IWLS algorithm for Poisson nested model
Poisson.mcmc <- function(theta, X, y, Z, B, Gamma, sig1, sig2, a, R, N=100, thin=1, burnin=0, seed=6, plot=TRUE) {
#     -----------
#     theta = initial value for fixed effect
#     X, y = design matrix and response vector
#     Z = subject location matrix defined in (9)
#     B = (beta_1,...,beta_n), vector of subject random effects
#     Gamma = (gamma_11,...,gamma_nmn), vector of unit random effects
#     sig1, sig2 = sigma1^-2, sigma2^-2
#     a, R = prior parameters for beta
#     N = number of iterations
#     thin = skipping iterations to decorrelate the chain
#     burnin = discarding the initial part of the chain  
#     -----------
    set.seed(seed) # control randomness
    acc.theta = 0 # keep track of acceptance for theta
    n <- dim(Z)[2] # number of subjects
    p <- length(theta) # dimension of fixed effect

    # calculate Ri and Ria
    Ri <- solve(R) %>% as.matrix()
    Ria <- Ri%*%a %>% as.vector()

    # create saving device for theta
    theta.mcmc <- matrix(0, ncol=p, nrow=(N/thin-1)) # save theta 

    # create saving device for beta
    beta.mcmc <- list() # using 2 layered list
    for (i in 1:n){
        beta.mcmc[[i]] <- rep(0, nrow=(N/thin-1))
    }

    # create saving device for gamma
    gamma.mcmc <- list() # saving gamma using 3 layered list
    for (i in 1:n){
        gamma.mcmc[[i]] <- list()
        for (j in 1:sum(Z[,i]==1)){
            gamma.mcmc[[i]][[j]] <- rep(0, (N/thin-1))
        }
    }

    for (t in 1:(N-1+burnin)){

        ##################################
        ######### sampling theta #########
        ##################################

        # obtain parameters for the proposal density
        params <- J.theta(theta, X, y, Z, B, Gamma, Ri, Ria)
        m.t <- params$m %>% as.vector()
        C.t <- params$C %>% as.matrix()

        # generate a new value of theta from the proposal density
        theta.s <- rmvnorm(1, m.t, C.t) %>% as.vector()

        # determine acceptance or not
        RR <- alpha.theta(theta.s, theta, X, y, Z, B, Gamma, C.t, a, Ri)

        if (RR==1){
            theta <- theta.s
            acc.theta <- acc.theta+1
        }

        # Saving the thinned Markov chain
        if ((t %% thin == 0) & (t >= burnin)){
            theta.mcmc[(t-burnin)/thin,] = as.vector(theta)
        }

        ###################################
        ######### sampling beta_i #########
        ###################################

        for (i in 1:n){

            # obtain parameters for the proposal density
            params <- J.beta(i, theta, X, y, Z, B, Gamma, sig1)
            m.t <- as.numeric(params$m)
            C.t <- as.numeric(params$C)

            # generate a new value of beta_i from the proposal density
            beta.s <- rnorm(1, m.t, sqrt(C.t))

            # determine acceptance or not
            RR <- alpha.beta(beta.s, B[i], i, X, y, Z, Gamma, sig1, C.t)

            if (RR==1){
                B[i] <- beta.s
            }

            # Saving the thinned Markov chain
            if ((t %% thin == 0) & (t >= burnin)){
                beta.mcmc[[i]][(t-burnin)/thin] = B[i]
            }
        }

        ####################################
        ######### sampling gamma_ij ########
        ####################################

        index.ij <- 0 # keep track of index of gamma_ij in Gamma

        for (i in 1:n){

            for (j in 1:sum(Z[,i]==1)){

                index.ij <- index.ij + 1

                # obtain parameters for the proposal density
                params <- J.gamma(i, j, theta, X, y, Z, B, Gamma, sig2)
                m.t <- as.numeric(params$m)
                C.t <- as.numeric(params$C)

                # generate a new value of gamma_ij from the proposal density
                gamma.s <- rnorm(1, m.t, sqrt(C.t)) %>% as.numeric()

                # determine acceptance or not
                RR <- alpha.gamma(gamma.s, Gamma[index.ij], i, j, theta, X, y, Z, B, sig2, C.t)

                if (RR==1){
                    Gamma[index.ij] <- gamma.s
                }

                # Saving the thinned Markov chain
                if ((t %% thin == 0) & (t >= burnin)){
                    gamma.mcmc[[i]][[j]][(t-burnin)/thin] = Gamma[index.ij]
                }
            }
        }

        ####################################
        ######### sampling sig1, 2 #########
        ####################################
        sig1 <- rgamma(1, n/2, 2/sum(B*B))
        sig2 <- rgamma(1, n/2, 2/sum(Gamma*Gamma))
    }

    p <- length(theta)

    if (plot==TRUE){

        # print basic information
        print('SUMMARY')
        print('-------------------------', quote=FALSE)
        print(paste('Iterations:', N))
        print(paste('Burn-in:', burnin))
        print(paste('Thin:', thin))
        print(paste('Acceptance rate:', acc.theta/N))
        print('-------------------------', quote=FALSE)

        options(repr.plot.width=6, repr.plot.height=2*p)
        par(mfrow=c(p, 1), mar=c(4, 4, 2, 1))

        # plot the chains
        for (i in 1:p){
            plot(theta.mcmc[,i],type='l',main=paste0('theta', i),xlab='',ylab='')
            print(paste0('Mean of theta', i, ': ', mean(theta.mcmc[,i])))
        }
    }
    return(list('theta'=theta.mcmc, 'beta'=beta.mcmc, 'gamma'=gamma.mcmc))
}

Simulation Study

Let's assume that the truth is indeed Poisson distributed with the following link:

$$ \begin{aligned} g(\lambda_{ij}) &= \eta_{ij} = \log \lambda_{ij} = \bs{x}_{ij}\tr \bs{\theta}+\beta_i + \gamma_{ij}\\ i &= 1,..., n, j=1,...,m_i \end{aligned}\tag{18} $$

In addition, we assume that $\beta_i\sim N(0, \sigma_1^2)$ and $\gamma_{ij}\sim N(0, \sigma_2^2)$ are independently distributed. Let's generate the data set.

In [17]:
data.load <- function(p, theta, sig1, sig2, counts, seed=1) {
#     -----------
#     sig1, sig2 = sigma1^-2, sigma2^-2 as precision parameters
#     counts = (m1, m2,..., m_n) number of units in each subject
#     p = number of columns of design matrix   
#     -----------
    set.seed(seed)
    n <- length(counts) # number of subjects
    nm <- sum(counts) # number of training examples

    X <- rep(1, nm) # design matrix
    for (i in 1:(p-1)){
        X <- cbind(X, runif(nm, 0, 5))
    }

    Y <- rep(0, nm) # initialize response vector

    # generate response Y based on fixed and random effects
    index.ij <- 0 # keep track of index
    for (i in 1:n){
        beta.i <- rnorm(1, 0, sqrt(1/sig1)) # generate beta_i
        for (j in 1:counts[i]){
            index.ij <- index.ij + 1 # updating index
            gamma.ij <- rnorm(1, 0, sqrt(1/sig2)) # generate gamma_ij
            x.ij <- as.vector(X[index.ij,]) # covariates for ijth example
            eta <- sum(x.ij*theta) + beta.i + gamma.ij # linear predictor
            Y[index.ij] <- rpois(1, exp(eta)) # generate Y
        }
    }
    cbind(X, Y)
}

Let's generate a uni-variate dataset with $\bs{\theta}=(0.1,1)'$, $\sigma_1^{-2}=1$, and $\sigma_2^{-2}=10$. We also want only 4 subjects with 25 training examples in each subject.

In [22]:
# Generate data
mydata <- data.load(p=2, theta=c(0.1, 1), sig1=1, sig2=20, counts=c(25, 25, 25, 25), seed=1) # generate dataset
options(repr.plot.width=4, repr.plot.height=4) # plotting
plot(mydata[,2], mydata[,3], xlab='X', ylab='Y', main='Poisson')
In [25]:
# Plotting each group separately
options(repr.plot.width=6, repr.plot.height=6)
par(mfrow=c(2, 2), mar=c(4, 4, 2, 1))
plot(mydata[1:25,2], mydata[1:25,3], xlab='X', ylab='Y', ylim=c(0, 250), main='Group 1')
plot(mydata[26:50,2], mydata[26:50,3], xlab='X', ylab='Y', ylim=c(0, 250), main='Group 2')
plot(mydata[51:75,2], mydata[51:75,3], xlab='X', ylab='Y', ylim=c(0, 250),  main='Group 3')
plot(mydata[76:100,2], mydata[76:100,3], xlab='X', ylab='Y', ylim=c(0, 250), main='Group 4')

We see that Group 3 has relatively lower response values. If the model works well, we should be able to detect that from the value of $\beta_3$. Let's initialize the model.

In [27]:
theta <- c(0, 1) # fixed effect
v1 <- rep(1,25)
Z <- bdiag(v1, v1, v1, v1) %>% as.matrix() # subject location matrix Z
X <- as.matrix(mydata[,1:2]) # design matrix
y <- as.vector(mydata[,3]) # response vector
B <- rep(0,4) # subject random effects beta_i
Gamma <- rep(0,100) # unit random effects gamma_ij
sig1 <- 1; sig2 <- 10 # precision parameter
a <- c(0,0); R <- 10*diag(2) # prior for fixed effect
In [29]:
result <- Poisson.mcmc(theta, X, y, Z, B, Gamma, sig1, sig2, a, R, N=5000, thin=10, burnin=1000, seed=1, plot=TRUE)
[1] "SUMMARY"
[1] -------------------------
[1] "Iterations: 5000"
[1] "Burn-in: 1000"
[1] "Thin: 10"
[1] "Acceptance rate: 0.8314"
[1] -------------------------
[1] "Mean of theta1: 0.102864587068628"
[1] "Mean of theta2: 0.973779786770928"

Wow, beautiful! I have to admit the IWLS sampling scheme is awesome. We get a extremely high acceptance rate. We can also get the posterior mean for $\beta_i$.

In [34]:
beta.1 <- mean(result$beta[[1]])
beta.2 <- mean(result$beta[[2]])
beta.3 <- mean(result$beta[[3]])
beta.4 <- mean(result$beta[[4]])
print(paste('beta1 =', beta.1))
print(paste('beta2 =', beta.2))
print(paste('beta3 =', beta.3))
print(paste('beta4 =', beta.4))
[1] "beta1 = 0.507830435349446"
[1] "beta2 = 1.03774406883753"
[1] "beta3 = -0.200640532311118"
[1] "beta4 = 0.413865556247885"

Indeed, we see that $\beta_3$ is negative, corresponding to a lower response rate in Group 3.