Exponential Random Graph Models

Modified: June 27, 2024

I strongly suggest reading the vignette in the ergm R package.

vignette("ergm", package="ergm")

The purpose of ERGMs, in a nutshell, is to describe parsimoniously the local selection forces that shape the global structure of a network. To this end, a network dataset, like those depicted in Figure 1, may be considered as the response in a regression model, where the predictors are things like “propensity for individuals of the same sex to form partnerships” or “propensity for individuals to form triangles of partnerships”. In Figure 1(b), for example, it is evident that the individual nodes appear to cluster in groups of the same numerical labels (which turn out to be students’ grades, 7 through 12); thus, an ERGM can help us quantify the strength of this intra-group effect.

(David R. Hunter et al. 2008)

Figure 1 (not reproduced here): Example network datasets. Source: Hunter et al. (2008)

In a nutshell, we use ERGMs as a parametric interpretation of the distribution of \(\mathbf{Y}\), which takes the canonical form:

\[ {\mathbb{P}\left(\mathbf{Y}=\mathbf{y}|\theta, \mathcal{Y}\right) } = \frac{\text{exp}\left\{\theta^{\text{T}}\mathbf{g}(\mathbf{y})\right\}}{\kappa\left(\theta, \mathcal{Y}\right)},\quad\mathbf{y}\in\mathcal{Y} \tag{8.1}\]

where \(\theta\in\Omega\subset\mathbb{R}^q\) is the vector of model coefficients and \(\mathbf{g}(\mathbf{y})\) is a \(q\)-vector of statistics based on the adjacency matrix \(\mathbf{y}\).

Model Equation 8.1 may be expanded by replacing \(\mathbf{g}(\mathbf{y})\) with \(\mathbf{g}(\mathbf{y}, \mathbf{X})\) to allow for additional covariate information \(\mathbf{X}\) about the network. The denominator, \(\kappa\left(\theta,\mathcal{Y}\right) = \sum_{\mathbf{y}\in\mathcal{Y}}\text{exp}\left\{\theta^{\text{T}}\mathbf{g}(\mathbf{y})\right\}\), is the normalizing factor that ensures that Equation 8.1 is a legitimate probability distribution. Even after fixing \(\mathcal{Y}\) to be all the networks of size \(n\), the size of \(\mathcal{Y}\) makes this type of statistical model hard to estimate, as there are \(N = 2^{n(n-1)}\) possible directed networks! (David R. Hunter et al. 2008)
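
To get a sense of the scale, consider the following sketch (plain R; theta is an arbitrary value chosen for illustration), which computes \(\kappa\) exactly for a directed network with \(n = 3\) nodes and a single statistic, the edge count:

# With n = 3 nodes there are 6 possible ties, so the sample space has
# 2^6 = 64 networks; the number of networks with k edges is choose(6, k)
theta <- -0.5
kappa <- sum(choose(6, 0:6) * exp(theta * (0:6)))

# For the edge-count statistic, kappa has a closed form: (1 + exp(theta))^6
all.equal(kappa, (1 + exp(theta))^6)

Already at \(n = 10\), the sum would run over \(2^{90}\) networks, which is why direct computation is infeasible.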

Later developments include new dependency structures to consider more general neighborhood effects. These models relax the one-step Markovian dependence assumptions, allowing investigation of longer-range configurations, such as longer paths in the network or larger cycles (Pattison and Robins 2002). Models for bipartite (Faust and Skvoretz 1999) and tripartite (Mische and Robins 2000) network structures have been developed. (David R. Hunter et al. 2008, 9)

A naïve example

In the simplest case, ERGMs are equivalent to a logistic regression. By simple, I mean cases with no Markovian terms, i.e., motifs involving more than one edge; the Bernoulli graph is one example. In the Bernoulli graph, ties are independent, so the presence/absence of a tie between nodes \(i\) and \(j\) won't affect the presence/absence of a tie between nodes \(k\) and \(l\).
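
For intuition, here is a minimal sketch (base R; the names n, p, and A are our own) of a Bernoulli graph, where each possible tie is an independent coin flip with a common probability:

set.seed(1)
n <- 10   # number of nodes
p <- 0.3  # common tie probability
# Draw each entry of the adjacency matrix independently
A <- matrix(rbinom(n * n, size = 1, prob = p), nrow = n)
diag(A) <- 0 # no self-ties
A[1:4, 1:4]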

Let’s fit an ERGM using the sampson dataset in the ergm package.

library(ergm)
library(netplot)
data("sampson")
nplot(samplike)

Using ergm to fit a Bernoulli graph requires using the edges term, which counts how many ties are in the graph:

ergm_fit <- ergm(samplike ~ edges)
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.

Since this is equivalent to a logistic regression, we can use the glm function to fit the same model. First, we need to prepare the data so we can pass it to glm:

y <- sort(as.vector(as.matrix(samplike)))
y <- y[-c(1:18)] # We remove the diagonal from the model, which is all 0.
y
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1

We can now fit the GLM model:

glm_fit <- glm(y~1, family=binomial("logit"))

The coefficients of both ERGM and GLM should match:

glm_fit
## 
## Call:  glm(formula = y ~ 1, family = binomial("logit"))
## 
## Coefficients:
## (Intercept)  
##     -0.9072  
## 
## Degrees of Freedom: 305 Total (i.e. Null);  305 Residual
## Null Deviance:       367.2 
## Residual Deviance: 367.2     AIC: 369.2
ergm_fit
## 
## Call:
## ergm(formula = samplike ~ edges)
## 
## Maximum Likelihood Coefficients:
##   edges  
## -0.9072

Furthermore, in the case of the Bernoulli graph, we can get the estimate directly by applying the logit function to the observed density:

pr <- mean(y)
# Logit function:
# Alternatively we could have used log(pr) - log(1-pr)
qlogis(pr)
## [1] -0.9071582

Again, the same result. The Bernoulli graph is not the only ERGM that can be fitted using a logistic regression. Moreover, if all the terms in the model are non-Markovian, ergm automatically defaults to a logistic regression.
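
As an illustration of that last point (a sketch; the group vertex attribute ships with the sampson data, and ergm_fit2, dat, and glm_fit2 are our own names), we can fit a dyad-independent model with a homophily term and recover the same coefficients with glm, using the dyad-level data returned by ergmMPLE:

# Dyad-independent model: edges + homophily on "group"
ergm_fit2 <- ergm(samplike ~ edges + nodematch("group"))

# Dyad-level response, predictor matrix, and weights
dat <- ergmMPLE(samplike ~ edges + nodematch("group"), output = "matrix")

# The same model as a weighted logistic regression; no intercept, as the
# "edges" column plays that role
glm_fit2 <- glm(dat$response ~ dat$predictor - 1,
                weights = dat$weights, family = binomial("logit"))

cbind(ergm = coef(ergm_fit2), glm = coef(glm_fit2))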

Estimation of ERGMs

The ultimate goal is to perform statistical inference on the proposed model. In a standard setting, we could use Maximum Likelihood Estimation (MLE), which consists of finding the model parameters \(\theta\) that, given the observed data, maximize the likelihood of the model. For the latter, we generally use Newton’s method. Newton’s method requires computing the model’s log-likelihood, which can be challenging in ERGMs.

For ERGMs, since part of the likelihood involves a normalizing constant that is a function of all possible networks, this is not as straightforward as in the regular setting. Because of this, most estimation methods rely on simulations.

In statnet, the default estimation method is based on a method proposed by Geyer and Thompson (1992), Markov-Chain MLE (MC-MLE), which uses Markov-Chain Monte Carlo to simulate networks and a modified version of the Newton-Raphson algorithm to estimate the parameters.

The idea of MC-MLE for this family of statistical models is to approximate the ratio of normalizing constants by rewriting it as an expectation, which can then be estimated using the law of large numbers. In particular, the following:

\[\begin{align*} \frac{\kappa\left(\theta,\mathcal{Y}\right)}{\kappa\left(\theta_0,\mathcal{Y}\right)} & = \frac{\sum_{\mathbf{y}\in\mathcal{Y}}\text{exp}\left\{\theta^{\text{T}}\mathbf{g}(\mathbf{y})\right\}}{\sum_{\mathbf{y}\in\mathcal{Y}}\text{exp}\left\{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y})\right\}} \\ & = \sum_{\mathbf{y}\in\mathcal{Y}}\left(\frac{1}{\sum_{\mathbf{y}'\in\mathcal{Y}}\text{exp}\left\{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y}')\right\}}\times\text{exp}\left\{\theta^{\text{T}}\mathbf{g}(\mathbf{y})\right\}\right) \\ & = \sum_{\mathbf{y}\in\mathcal{Y}}\left(\frac{\text{exp}\left\{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y})\right\}}{\sum_{\mathbf{y}'\in\mathcal{Y}}\text{exp}\left\{\theta_0^{\text{T}}\mathbf{g}(\mathbf{y}')\right\}}\times\text{exp}\left\{(\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{y})\right\}\right) \\ & = \sum_{\mathbf{y}\in\mathcal{Y}}\left({\mathbb{P}\left(\mathbf{Y} = \mathbf{y}|\mathcal{Y}, \theta_0\right)}\times\text{exp}\left\{(\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{y})\right\}\right) \\ & = \text{E}_{\theta_0}\left(\text{exp}\left\{(\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{Y})\right\}\right) \end{align*}\]

In particular, the MC-MLE algorithm uses this fact to maximize the log-likelihood ratio. The objective function itself can be approximated by simulating \(m\) networks from the distribution with parameter \(\theta_0\):

\[ l(\theta) - l(\theta_0) \approx (\theta - \theta_0)^{\text{T}}\mathbf{g}(\mathbf{y}_{obs}) - \text{log}{\left[\frac{1}{m}\sum_{i = 1}^m\text{exp}\left\{(\theta-\theta_0)^{\text{T}}\mathbf{g}(\mathbf{Y}_i)\right\}\right]} \]
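
To make this concrete, the following sketch approximates the log-likelihood ratio for the edges-only model on samplike; theta0, the number of simulations, and the search interval are arbitrary choices of ours:

# Observed statistic and an assumed starting value theta0
g_obs  <- summary(samplike ~ edges)
theta0 <- -1.1

# Simulate m = 200 networks from the model at theta0
set.seed(1)
sims  <- simulate(samplike ~ edges, coef = theta0, nsim = 200)
g_sim <- sapply(sims, function(net) summary(net ~ edges))

# Approximate log-likelihood ratio l(theta) - l(theta0)
llr <- function(theta)
  (theta - theta0) * g_obs - log(mean(exp((theta - theta0) * g_sim)))

# Maximizing it should land near the MLE we found earlier (about -0.907)
optimize(llr, interval = c(-3, 0), maximum = TRUE)$maximum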

For more details, see David R. Hunter et al. (2008). A sketch of the algorithm follows:

  1. Initialize the algorithm with an initial guess of \(\theta\), call it \(\theta^{(t)}\) (it must be a reasonably good guess).

  2. While (no convergence) do:

    a. Using \(\theta^{(t)}\), simulate \(M\) networks by means of small changes in \(\mathbf{Y}_{obs}\) (the observed network). This part is done by using an importance-sampling method which weights each proposed network by its likelihood conditional on \(\theta^{(t)}\).

    b. With the networks simulated, we can do the Newton step to update the parameter \(\theta^{(t)}\) (this is the iteration part in the ergm package): \(\theta^{(t)}\to\theta^{(t+1)}\).

    c. If convergence has been reached (which usually means that \(\theta^{(t)}\) and \(\theta^{(t + 1)}\) are not very different), then stop; otherwise, return to step a.

Lusher, Koskinen, and Robins (2012), Admiraal and Handcock (2006), Snijders (2002), and Wang et al. (2009) provide details on the algorithm used by PNet (the same as the one used in RSiena), and Lusher, Koskinen, and Robins (2012) provide a short discussion of the differences between ergm and PNet.

The ergm package


From the previous section:1

library(igraph)
library(dplyr)

load("03.rda")

In this section, we will use the ergm package (Handcock et al. 2018), from the statnet suite of packages, and the intergraph package (Bojanowski 2023). The latter provides functions to go back and forth between igraph and network objects, from the igraph and network packages respectively.2

library(ergm)
library(intergraph)

As a rather important side note, the order in which R packages are loaded matters. Why is this important to mention now? Well, it turns out that at least a couple of functions in the network package have the same name as some functions in the igraph package. When the ergm package is loaded, since it depends on network, it will load the network package first, which will mask some functions in igraph. This becomes evident once you load ergm after loading igraph:

The following objects are masked from ‘package:igraph’:

  add.edges, add.vertices, %c%, delete.edges, delete.vertices, get.edge.attribute, get.edges,
  get.vertex.attribute, is.bipartite, is.directed, list.edge.attributes, list.vertex.attributes, %s%,
  set.edge.attribute, set.vertex.attribute

What are the implications of this? If you call the function list.edge.attributes on an object of class igraph, R will return an error, as the first function matching that name comes from the network package! To avoid this, you can use the double-colon notation:

igraph::list.edge.attributes(my_igraph_object)
network::list.edge.attributes(my_network_object)

Anyway… Using the asNetwork function, we can coerce the igraph object into a network object so we can use it with the ergm function:

# Creating the new network
network_111 <- intergraph::asNetwork(ig_year1_111)

# Running a simple ergm (only fitting edge count)
ergm(network_111 ~ edges)
## [1] "Warning:  This network contains loops"
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
## 
## Call:
## ergm(formula = network_111 ~ edges)
## 
## Maximum Likelihood Coefficients:
##  edges  
## -4.734

So what happened here? We got a warning. It turns out that our network has loops (we didn't think about it before!). Let's take a look at them with the which_loop function:

E(ig_year1_111)[which_loop(ig_year1_111)]
## + 1/2638 edge from 737c929 (vertex names):
## [1] 1110111->1110111

We can get rid of these using igraph's subtraction operator, -.igraph. Let's also remove the isolates using the same operator:

# Creating the new network
network_111 <- ig_year1_111

# Removing loops
network_111 <- network_111 - E(network_111)[which(which_loop(network_111))]

# Removing isolates
network_111 <- network_111 - which(degree(network_111, mode = "all") == 0)

# Converting the network
network_111 <- intergraph::asNetwork(network_111)

Note that the loop removal could also have been done with simplify before the coercion, with either of these two equivalent calls:

asNetwork(simplify(ig_year1_111))
ig_year1_111 |> simplify() |> asNetwork()

A problem that we have with this data is the fact that some vertices have missing values in the variables hispanic, female1, and eversmk1. For now, we will proceed by imputing values based on the averages:

for (v in c("hispanic", "female1", "eversmk1")) {
  tmpv <- network_111 %v% v
  # Impute missing values with the modal value (TRUE if the mean exceeds .5)
  tmpv[is.na(tmpv)] <- mean(tmpv, na.rm = TRUE) > .5
  network_111 %v% v <- tmpv
}
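
As a quick sanity check (a sketch), we can confirm that no missing values remain after the imputation:

# Count remaining NAs in each imputed vertex attribute
sapply(c("hispanic", "female1", "eversmk1"), function(v)
  sum(is.na(network_111 %v% v)))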

Let’s take a look at the network

nplot(
  network_111,
  vertex.color = ~ hispanic
  )

Running ERGMs

Proposed workflow:

  1. Estimate the simplest model, adding one variable at a time.

  2. After each estimation, run the mcmc.diagnostics function to see how well (or poorly) behaved the chains are.

  3. Run the gof function and verify how well the model matches the network's structural statistics.

What to use:

  1. control.ergm: Maximum number of iterations, seed for the pseudo-RNG, how many cores to use.

  2. ergm.constraints: Where to sample the network from. This gives stability and (in some cases) faster convergence, since constraining the model reduces the size of the sample space.

Here is an example of a couple of models that we could compare3

ans0 <- ergm(
  network_111 ~
    edges +
    nodematch("hispanic") +
    nodematch("female1") +
    nodematch("eversmk1") +
    mutual,
  constraints = ~bd(maxout = 19),
  control = control.ergm(
    seed        = 1,
    MCMLE.maxit = 10,
    parallel    = 4,
    CD.maxit    = 10
    )
  )
## Warning: 'glpk' selected as the solver, but package 'Rglpk' is not available;
## falling back to 'lpSolveAPI'. This should be fine unless the sample size and/or
## the number of parameters is very big.
## Warning in nobs.ergm(object, ...): The number of observed dyads in this network
## is ill-defined due to complex constraints on the sample space. Disable this
## warning with 'options(ergm.loglik.warn_dyads=FALSE)'.
## Warning in nobs.ergm(object, ..., verbose = verbose): The number of observed
## dyads in this network is ill-defined due to complex constraints on the sample
## space. Disable this warning with 'options(ergm.loglik.warn_dyads=FALSE)'.

So, what are we doing here?

  1. The model is controlling for:

    1. edges: Number of edges in the network (as opposed to its density).

    2. nodematch("some-variable-name-here"): Includes a term that controls for homophily/heterophily.

    3. mutual: Number of mutual connections, i.e., pairs where both \((i, j)\) and \((j, i)\) are present. This can be related to, for example, triadic closure.

For more on control parameters, see Morris, Handcock, and Hunter (2008).

ans1 <- ergm(
  network_111 ~
    edges +
    nodematch("hispanic") +
    nodematch("female1") +
    nodematch("eversmk1")
    ,
  constraints = ~bd(maxout = 19),
  control = control.ergm(
    seed        = 1,
    MCMLE.maxit = 10,
    parallel    = 4,
    CD.maxit    = 10
    )
  )

This example takes longer to compute:

ans2 <- ergm(
  network_111 ~
    edges +
    nodematch("hispanic") +
    nodematch("female1") +
    nodematch("eversmk1") + 
    mutual +
    balance
    ,
  constraints = ~bd(maxout = 19),
  control = control.ergm(
    seed        = 1,
    MCMLE.maxit = 10,
    parallel    = 4,
    CD.maxit    = 10
    )
  )

Now, a nice trick to see all the regressions in the same table: we can use the texreg package (Leifeld 2013), which supports ergm outputs!

library(texreg)
## Version:  1.39.3
## Date:     2023-11-09
## Author:   Philip Leifeld (University of Essex)
## 
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
screenreg(list(ans0, ans1, ans2))
## Warning in nobs.ergm(object): The number of observed dyads in this network is
## ill-defined due to complex constraints on the sample space. Disable this
## warning with 'options(ergm.loglik.warn_dyads=FALSE)'.
## Warning: This object was fit with 'ergm' version 4.1.2 or earlier. Summarizing
## it with version 4.2 or later may return incorrect results or fail.
## Warning in nobs.ergm(object): The number of observed dyads in this network is
## ill-defined due to complex constraints on the sample space. Disable this
## warning with 'options(ergm.loglik.warn_dyads=FALSE)'.
## Warning: This object was fit with 'ergm' version 4.1.2 or earlier. Summarizing
## it with version 4.2 or later may return incorrect results or fail.
## Warning in nobs.ergm(object): The number of observed dyads in this network is
## ill-defined due to complex constraints on the sample space. Disable this
## warning with 'options(ergm.loglik.warn_dyads=FALSE)'.
## 
## ===============================================================
##                     Model 1        Model 2        Model 3      
## ---------------------------------------------------------------
## edges                   -5.63 ***      -5.49 ***      -5.60 ***
##                         (0.06)         (0.06)         (0.06)   
## nodematch.hispanic       0.22 ***       0.30 ***       0.22 ***
##                         (0.04)         (0.05)         (0.04)   
## nodematch.female1        0.87 ***       1.17 ***       0.87 ***
##                         (0.04)         (0.05)         (0.04)   
## nodematch.eversmk1       0.33 ***       0.45 ***       0.34 ***
##                         (0.04)         (0.04)         (0.04)   
## mutual                   4.12 ***                      1.75 ***
##                         (0.07)                        (0.14)   
## balance                                                0.01 ***
##                                                       (0.00)   
## ---------------------------------------------------------------
## AIC                 -40020.94      -37511.87      -39989.59    
## BIC                 -39970.60      -37471.60      -39929.18    
## Log Likelihood       20015.47       18759.94       20000.79    
## ===============================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05

Or, if you are using rmarkdown, you can export the results to LaTeX or HTML. Let's try the latter to see how it looks here:

library(texreg)
htmlreg(list(ans0, ans1, ans2))
## Warning in nobs.ergm(object): The number of observed dyads in this network is
## ill-defined due to complex constraints on the sample space. Disable this
## warning with 'options(ergm.loglik.warn_dyads=FALSE)'.
## Warning: This object was fit with 'ergm' version 4.1.2 or earlier. Summarizing
## it with version 4.2 or later may return incorrect results or fail.
## Warning in nobs.ergm(object): The number of observed dyads in this network is
## ill-defined due to complex constraints on the sample space. Disable this
## warning with 'options(ergm.loglik.warn_dyads=FALSE)'.
## Warning: This object was fit with 'ergm' version 4.1.2 or earlier. Summarizing
## it with version 4.2 or later may return incorrect results or fail.
## Warning in nobs.ergm(object): The number of observed dyads in this network is
## ill-defined due to complex constraints on the sample space. Disable this
## warning with 'options(ergm.loglik.warn_dyads=FALSE)'.
Statistical models

                     Model 1     Model 2     Model 3
edges                 -5.63***    -5.49***    -5.60***
                      (0.06)      (0.06)      (0.06)
nodematch.hispanic     0.22***     0.30***     0.22***
                      (0.04)      (0.05)      (0.04)
nodematch.female1      0.87***     1.17***     0.87***
                      (0.04)      (0.05)      (0.04)
nodematch.eversmk1     0.33***     0.45***     0.34***
                      (0.04)      (0.04)      (0.04)
mutual                 4.12***                 1.75***
                      (0.07)                  (0.14)
balance                                        0.01***
                                              (0.00)
AIC               -40020.94   -37511.87   -39989.59
BIC               -39970.60   -37471.60   -39929.18
Log Likelihood     20015.47    18759.94    20000.79

***p < 0.001; **p < 0.01; *p < 0.05

Model Goodness-of-Fit

In raw terms, once each chain has reached its stationary distribution, we can say that there are no problems with autocorrelation and that each sample point is iid. The latter implies that, since we are running the model with more than one chain, we can use all the samples (chains) as a single dataset.

Recent changes in the ergm estimation algorithm mean that these plots can no longer be used to ensure that the mean statistics from the model match the observed network statistics. For that functionality, please use the GOF command: gof(object, GOF=~model).

Source: ?ergm::mcmc.diagnostics

Since ans0 is the best model, let's take a look at its GOF statistics. First, let's see how the MCMC did. We can use the mcmc.diagnostics function included in the package. The function is a wrapper for a couple of functions from the coda package (Plummer et al. 2006), which are called on the $sample object holding the centered statistics from the sampled networks. At first, the $sample object can be confusing to look at, as it matches neither the observed statistics nor the coefficients.

When calling mcmc.diagnostics(ans0, centered = FALSE), you will see many outputs, including a couple of plots showing the trace and posterior distribution of the uncentered statistics (centered = FALSE). The following code chunks will reproduce the output from the mcmc.diagnostics function step by step using the coda package. First, we need to uncenter the sample object:

# Getting the centered sample
sample_centered <- ans0$sample

# Getting the observed statistics and turning it into a matrix so we can add it
# to the samples
observed <- summary(ans0$formula)
observed <- matrix(
  observed,
  nrow  = nrow(sample_centered[[1]]),
  ncol  = length(observed),
  byrow = TRUE
  )

# Now we uncenter the sample
sample_uncentered <- lapply(sample_centered, function(x) {
  x + observed
})

# We have to make it an mcmc.list object
sample_uncentered <- coda::mcmc.list(sample_uncentered)

Under the hood:

  1. Empirical means, standard deviations, and quantiles:

summary(sample_uncentered)
## 
## Iterations = 655360:12255232
## Thinning interval = 65536 
## Number of chains = 4 
## Sample size per chain = 178 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                      Mean    SD Naive SE Time-series SE
## edges              2473.9 58.60   2.1961          3.846
## nodematch.hispanic 1826.6 50.63   1.8976          3.602
## nodematch.female1  1875.4 50.54   1.8941          3.847
## nodematch.eversmk1 1755.6 50.66   1.8985          3.315
## mutual              483.5 21.97   0.8232          1.634
## 
## 2. Quantiles for each variable:
## 
##                    2.5%  25%  50%    75%  97.5%
## edges              2366 2431 2473 2516.0 2583.4
## nodematch.hispanic 1725 1790 1828 1862.0 1923.4
## nodematch.female1  1774 1839 1876 1910.0 1974.5
## nodematch.eversmk1 1661 1718 1755 1791.0 1857.0
## mutual              441  469  483  499.2  527.2

  2. Cross correlation:

coda::crosscorr(sample_uncentered)
##                        edges nodematch.hispanic nodematch.female1
## edges              1.0000000          0.8565106         0.8788820
## nodematch.hispanic 0.8565106          1.0000000         0.7565759
## nodematch.female1  0.8788820          0.7565759         1.0000000
## nodematch.eversmk1 0.8645339          0.7342901         0.7441760
## mutual             0.7359569          0.6684517         0.6943574
##                    nodematch.eversmk1    mutual
## edges                       0.8645339 0.7359569
## nodematch.hispanic          0.7342901 0.6684517
## nodematch.female1           0.7441760 0.6943574
## nodematch.eversmk1          1.0000000 0.6648087
## mutual                      0.6648087 1.0000000

  3. Autocorrelation: For now, we will only look at the autocorrelation of chain one. Autocorrelation should be small (in a general MCMC setting); if it is high, your sample is not iid. One way to solve this is to thin the sample, as in the sketch after the following output.

coda::autocorr(sample_uncentered)[[1]]
## , , edges
## 
##                   edges nodematch.hispanic nodematch.female1 nodematch.eversmk1
## Lag 0        1.00000000        0.836037936      0.8821700893         0.86741018
## Lag 65536    0.29546092        0.336082499      0.3126843913         0.28693021
## Lag 327680  -0.06232415       -0.006038374     -0.0825766181         0.02595267
## Lag 655360   0.07015730        0.053058193      0.0960399588         0.03077291
## Lag 3276800 -0.00989517       -0.009090348      0.0001679658        -0.05633911
##                  mutual
## Lag 0        0.73293230
## Lag 65536    0.46321953
## Lag 327680  -0.02060089
## Lag 655360   0.12705728
## Lag 3276800  0.01462118
## 
## , , nodematch.hispanic
## 
##                   edges nodematch.hispanic nodematch.female1 nodematch.eversmk1
## Lag 0        0.83603794         1.00000000       0.772940115         0.72892437
## Lag 65536    0.24693356         0.35329161       0.258747609         0.26629645
## Lag 327680  -0.01122090         0.01199279      -0.003794365         0.06759060
## Lag 655360   0.08604041         0.06242513       0.088828307         0.05919026
## Lag 3276800 -0.01659566        -0.03235612       0.000974174        -0.04643465
##                    mutual
## Lag 0        0.6716895580
## Lag 65536    0.3791243900
## Lag 327680   0.0261119585
## Lag 655360   0.0607146690
## Lag 3276800 -0.0001893006
## 
## , , nodematch.female1
## 
##                   edges nodematch.hispanic nodematch.female1 nodematch.eversmk1
## Lag 0        0.88217009         0.77294011        1.00000000         0.74934143
## Lag 65536    0.27898163         0.34275682        0.33764682         0.27489365
## Lag 327680  -0.08876005        -0.06253290       -0.11524990        -0.01943560
## Lag 655360   0.06074076         0.02953052        0.09298027         0.04512914
## Lag 3276800  0.01357995        -0.01878670        0.03039838        -0.04875359
##                  mutual
## Lag 0        0.68557322
## Lag 65536    0.40180603
## Lag 327680  -0.11239238
## Lag 655360   0.08831210
## Lag 3276800  0.01514836
## 
## , , nodematch.eversmk1
## 
##                    edges nodematch.hispanic nodematch.female1
## Lag 0        0.867410177        0.728924371       0.749341430
## Lag 65536    0.308858124        0.329895859       0.352890856
## Lag 327680  -0.059305945       -0.006457098      -0.067161252
## Lag 655360   0.084904907        0.086115019       0.110833574
## Lag 3276800 -0.008811374        0.016144057      -0.005430382
##             nodematch.eversmk1      mutual
## Lag 0              1.000000000  0.65218802
## Lag 65536          0.365355720  0.44773193
## Lag 327680         0.049205563 -0.02967711
## Lag 655360         0.006413673  0.07422969
## Lag 3276800       -0.066505311  0.02476719
## 
## , , mutual
## 
##                   edges nodematch.hispanic nodematch.female1 nodematch.eversmk1
## Lag 0        0.73293230        0.671689558        0.68557322        0.652188019
## Lag 65536    0.38912404        0.457808562        0.44041048        0.357851343
## Lag 327680  -0.05892551        0.015349925       -0.08588321        0.053255709
## Lag 655360   0.05482938        0.080365573        0.08786954        0.031917431
## Lag 3276800  0.02014438       -0.006550518        0.01638320        0.009361014
##                   mutual
## Lag 0        1.000000000
## Lag 65536    0.539527252
## Lag 327680  -0.002392256
## Lag 655360   0.064381438
## Lag 3276800 -0.007195045
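
Here is a sketch of the thinning fix mentioned above (the factor of 2 is an arbitrary choice of ours): coda's window function subsamples the chains, and effectiveSize shows how many independent draws the chains are worth before and after:

# Keep every other draw (the original thinning interval is 65536)
sample_thinned <- window(sample_uncentered, thin = 2 * 65536)

# Effective sample size before and after thinning
coda::effectiveSize(sample_uncentered)
coda::effectiveSize(sample_thinned)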

  4. Geweke Diagnostic: From the function's help file:

"If the samples are drawn from the stationary distribution of the chain, the two means are equal and Geweke's statistic has an asymptotically standard normal distribution. […] The Z-score is calculated under the assumption that the two parts of the chain are asymptotically independent, which requires that the sum of frac1 and frac2 be strictly less than 1."

Source: ?coda::geweke.diag

Let’s take a look at a single chain:


coda::geweke.diag(sample_uncentered)[[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##              edges nodematch.hispanic  nodematch.female1 nodematch.eversmk1 
##             0.4535             0.5490             0.5069             0.8436 
##             mutual 
##             0.5864

  5. (Not included) Gelman Diagnostic: From the function's help file:

"Gelman and Rubin (1992) propose a general approach to monitoring convergence of MCMC output in which m > 1 parallel chains are run with starting values that are overdispersed relative to the posterior distribution. Convergence is diagnosed when the chains have 'forgotten' their initial values, and the output from all chains is indistinguishable. The gelman.diag diagnostic is applied to a single variable from the chain. It is based on a comparison of within-chain and between-chain variances, and is similar to a classical analysis of variance."

Source: ?coda::gelman.diag

Unlike the previous diagnostic statistic, this one uses all chains simultaneously:


coda::gelman.diag(sample_uncentered)
## Potential scale reduction factors:
## 
##                    Point est. Upper C.I.
## edges                    1.00       1.02
## nodematch.hispanic       1.01       1.03
## nodematch.female1        1.00       1.01
## nodematch.eversmk1       1.00       1.01
## mutual                   1.01       1.03
## 
## Multivariate psrf
## 
## 1.03


As a rule of thumb, values in the \([0.9, 1.1]\) range are good.

One nice feature of the mcmc.diagnostics function is the nice trace and posterior distribution plots that it generates. If you have the R package latticeExtra (Sarkar and Andrews 2022), the function will override the default plots used by coda::plot.mcmc and use lattice instead, creating nicer-looking plots. The next code chunk calls the mcmc.diagnostics function, but we suppress the rest of the output.

# [2022-03-13] This line is failing for what it could be an ergm bug
# mcmc.diagnostics(ans0, center = FALSE) # Suppressing all the output

If we call the function mcmc.diagnostics, this message appears at the end:

MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).

mcmc.diagnostics(ans0)

Not that bad (although the mutual term could do better)!4 First, observe that the figure shows four different lines; why is that? Since we ran the model in parallel using four cores, the algorithm ran four chains of the MCMC algorithm. An eyeball test is to check whether all the chains moved around about the same place; in such a case, we can start thinking about model convergence from the MCMC perspective.

Once we are sure we have reached convergence in the MCMC algorithm, we can start thinking about how well our model predicts the observed network's properties. Besides the statistics that define the ERGM, the gof function's default behavior shows GOF for:

  1. In-degree distribution,
  2. Out-degree distribution,
  3. Edge-wise shared partners, and
  4. Geodesic distances.

Let’s take a look at it:

# Computing and printing GOF statistics
ans_gof <- gof(ans0)
ans_gof
## 
## Goodness-of-fit for in-degree 
## 
##           obs min  mean max MC p-value
## idegree0   13   0  1.27   4       0.00
## idegree1   34   2  7.37  15       0.00
## idegree2   37  10 20.57  30       0.00
## idegree3   48  24 39.88  56       0.22
## idegree4   37  43 57.68  80       0.00
## idegree5   47  49 65.84  86       0.00
## idegree6   42  47 65.37  86       0.00
## idegree7   39  40 56.79  73       0.00
## idegree8   35  24 40.92  55       0.34
## idegree9   21  17 27.27  38       0.18
## idegree10  12   6 16.76  29       0.34
## idegree11  19   1  9.03  18       0.00
## idegree12   4   1  4.91  12       0.94
## idegree13   7   0  2.33   6       0.00
## idegree14   6   0  1.09   5       0.00
## idegree15   3   0  0.59   3       0.06
## idegree16   4   0  0.19   1       0.00
## idegree17   3   0  0.08   1       0.00
## idegree18   3   0  0.03   1       0.00
## idegree19   2   0  0.01   1       0.00
## idegree20   1   0  0.00   0       0.00
## idegree21   0   0  0.02   1       1.00
## idegree22   1   0  0.00   0       0.00
## 
## Goodness-of-fit for out-degree 
## 
##           obs min  mean max MC p-value
## odegree0    4   0  1.37   5       0.08
## odegree1   28   2  7.70  15       0.00
## odegree2   45   8 20.36  35       0.00
## odegree3   50  22 39.91  56       0.14
## odegree4   54  39 57.28  72       0.64
## odegree5   62  53 65.65  85       0.60
## odegree6   40  46 65.17  90       0.00
## odegree7   28  38 56.32  76       0.00
## odegree8   13  29 41.26  57       0.00
## odegree9   16  18 27.93  41       0.00
## odegree10  20   9 16.57  25       0.46
## odegree11   8   3  9.60  19       0.74
## odegree12  11   0  4.92  10       0.00
## odegree13  13   0  2.42   7       0.00
## odegree14   6   0  0.96   4       0.00
## odegree15   6   0  0.39   3       0.00
## odegree16   7   0  0.14   2       0.00
## odegree17   4   0  0.05   1       0.00
## odegree18   3   0  0.00   0       0.00
## 
## Goodness-of-fit for edgewise shared partner 
## 
##           obs  min    mean  max MC p-value
## esp.OTP0 1032 1975 2218.24 2333          0
## esp.OTP1  755  159  240.50  440          0
## esp.OTP2  352    3   16.40   79          0
## esp.OTP3  202    0    1.20   20          0
## esp.OTP4   79    0    0.07    1          0
## esp.OTP5   36    0    0.00    0          0
## esp.OTP6   14    0    0.00    0          0
## esp.OTP7    4    0    0.00    0          0
## esp.OTP8    1    0    0.00    0          0
## 
## Goodness-of-fit for minimum geodesic distance 
## 
##       obs   min     mean   max MC p-value
## 1    2475  2373  2476.41  2601       0.94
## 2   10672 12655 13847.68 15148       0.00
## 3   31134 50975 56296.27 61965       0.00
## 4   50673 77045 80030.39 82203       0.00
## 5   42563 13813 19437.28 24922       0.00
## 6   18719   327  1072.27  2060       0.00
## 7    4808     1    29.61   209       0.00
## 8     822     0     0.55    27       0.00
## 9     100     0     0.00     0       0.00
## 10      7     0     0.00     0       0.00
## Inf 12333     0  1115.54  3320       0.00
## 
## Goodness-of-fit for model statistics 
## 
##                     obs  min    mean  max MC p-value
## edges              2475 2373 2476.41 2601       0.94
## nodematch.hispanic 1832 1769 1841.97 1941       0.88
## nodematch.female1  1879 1773 1874.83 1990       0.94
## nodematch.eversmk1 1755 1668 1758.81 1842       0.92
## mutual              486  447  483.09  514       0.88

# Plotting GOF statistics
plot(ans_gof)

Try the following configuration instead:

ans0_bis <- ergm(
  network_111 ~
    edges +
    nodematch("hispanic") +
    nodematch("female1") +
    mutual + 
    esp(0:3) + 
    idegree(0:10)
    ,
  constraints = ~bd(maxout = 19),
  control = control.ergm(
    seed        = 1,
    MCMLE.maxit = 15,
    parallel    = 4,
    CD.maxit    = 15,
    MCMC.samplesize = 2048*4,
    MCMC.burnin = 30000,
    MCMC.interval = 2048*4
    )
  )

We increased the sample size so the curves are smoother, used longer intervals (thinning) to reduce autocorrelation, and used a larger burn-in. All of this together improves the Gelman test statistic. We also added idegree from 0 to 10, and esp from 0 to 3, to explicitly match those statistics in our model.
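
As a follow-up sketch (assuming the fit above ran without issues), we would re-run the same diagnostics on the new fit:

# Re-checking the chains and the goodness-of-fit of the new model
mcmc.diagnostics(ans0_bis)
plot(gof(ans0_bis))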

An example of a terrible ERGM (no convergence at all) is also a good example of why running multiple chains can be useful.

More on MCMC convergence

For more on this issue, I recommend reviewing chapter 1 and chapter 6 from the Handbook of MCMC (Brooks et al. 2011). Both chapters are free to download from the book’s website.

For GOF, take a look at section 6 of the ERGM 2016 Sunbelt tutorial; for a more technical review, see David R. Hunter, Goodreau, and Handcock (2008).

Mathematical Interpretation

One of the most critical parts of statistical modeling is interpreting the results, if not the most important. In the case of ERGMs, a key aspect is based on change statistics. Suppose that we would like to know how likely the tie \(y_{ij}\) is to happen, given the rest of the network. We can compute such probabilities using what the literature sometimes describes as the Gibbs sampler.

In particular, the log-odds of the \(ij\) tie occurring conditional on the rest of the network can be written as:

\[\begin{equation} \text{logit}\left({\mathbb{P}\left(y_{ij} = 1|y_{-ij}\right) }\right) = {\theta}^\mathbf{t}\delta\left(y_{ij}:0\to 1\right), \end{equation}\]

with \(\delta\left(y_{ij}:0\to 1\right)\equiv s\left(\mathbf{y}\right)_{ij}^+ - s\left(\mathbf{y}\right)_{ij}^-\) as the vector of change statistics; in other words, the difference between the sufficient statistics when \(y_{ij}=1\) and their value when \(y_{ij} = 0\). To show this, we write the following:

\[\begin{align*} {\mathbb{P}\left(y_{ij} = 1|y_{-ij}\right) } & = % \frac{{\mathbb{P}\left(y_{ij} = 1, y_{-ij}\right) }}{% {\mathbb{P}\left(y_{ij} = 1, y_{-ij}\right) } + {\mathbb{P}\left(y_{ij} = 0, y_{-ij}\right) }} \\ & = \frac{\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^+_{ij}\right\}}{% \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^+_{ij}\right\} + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^-_{ij}\right\}} \end{align*}\]

Applying the logit function to the previous equation, we obtain:

\[\begin{align*} & = \text{log}\left\{\frac{\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^+_{ij}\right\}}{% \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^+_{ij}\right\} + % \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^-_{ij}\right\}}\right\} - % \text{log}\left\{ % \frac{\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^-_{ij}\right\}}{% \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^+_{ij}\right\} + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^-_{ij}\right\}}% \right\} \\ & = \text{log}\left\{\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^+_{ij}\right\}\right\} - \text{log}\left\{\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)^-_{ij}\right\}\right\} \\ & = {\theta}^\mathbf{t}\left(s\left(\mathbf{y}\right)^+_{ij} - s\left(\mathbf{y}\right)^-_{ij}\right) \\ & = {\theta}^\mathbf{t}\delta\left(y_{ij}:0\to 1\right) \end{align*}\] Henceforth, the conditional probability of the tie \(y_{ij}\) can be written as:

\[\begin{equation} {\mathbb{P}\left(y_{ij} = 1|y_{-ij}\right) } = \frac{1}{1 + \text{exp}\left\{-{\theta}^\mathbf{t}\delta\left(y_{ij}:0\to 1\right)\right\}} \end{equation}\]

i.e., a logistic probability.
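
For instance, here is a back-of-the-envelope sketch using the Model 1 point estimates from the table above (theta and delta are our own names): the probability of a tie that matches on all three attributes and does not reciprocate an existing tie is

# Model 1 point estimates
theta <- c(edges = -5.63, hispanic = 0.22, female1 = 0.87,
           eversmk1 = 0.33, mutual = 4.12)

# Change statistics for adding such a tie: +1 edge, +1 on each nodematch,
# and no new mutual dyad
delta <- c(1, 1, 1, 1, 0)

# Conditional probability of the tie
plogis(sum(theta * delta)) # about 0.015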

Markov independence

The challenge of analyzing networks is their interdependent nature. Nonetheless, in the absence of such interdependence, ERGMs are equivalent to logistic regression. Conceptually, if none of the statistics included in the model involves two or more dyads, then the model is non-Markovian in the sense of Markov graphs.

Mathematically, to see this, it suffices to show that the ERGM probability can be written as the product of each dyad's probabilities.

\[\begin{equation*} {\mathbb{P}\left(\mathbf{y} | \theta\right) } = \frac{\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)\right\}}{\sum_{y}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)\right\}} = \frac{\prod_{ij}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\}}{\sum_{y}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)\right\}} \end{equation*}\]

where \(s\left(\cdot\right)_{ij}\) is a function such that \(s\left(\mathbf{y}\right) = \sum_{ij}{s\left(\mathbf{y}\right)_{ij}}\). We now need to deal with the normalizing constant. To see how it can be separated, let's start from the result:

\[\begin{align*} & =\prod_{ij}\left(1 + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\}\right) \\ & = \left(1 + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{11}\right\}\right)\left(1 + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{12}\right\}\right)\dots\left(1 + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{nn}\right\}\right) \\ & = 1 + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{11}\right\} + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{11}\right\}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{12}\right\} + \dots + \prod_{ij}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\} \\ & = 1 + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{11}\right\} + \text{exp}\left\{{\theta}^\mathbf{t}\left(s\left(\mathbf{y}\right)_{11} + s\left(\mathbf{y}\right)_{12}\right)\right\} + \dots + \prod_{ij}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\} \\ & = \sum_{\mathbf{y}\in\mathcal{Y}}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)\right\} \end{align*}\]

where the last equality follows from the fact that the sum runs over all possible combinations of networks, starting from \(\text{exp}(0) = 1\) (the empty graph) up to \(\prod_{ij}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\}\) (the full graph). This way, we can now write:

\[\begin{equation} \frac{\prod_{ij}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\}}{\sum_{y}\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)\right\}} = \prod_{ij}\frac{\text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\}}{1 + \text{exp}\left\{{\theta}^\mathbf{t}s\left(\mathbf{y}\right)_{ij}\right\}} \end{equation}\]
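
A quick numeric check (a sketch reusing ergm_fit and y from the naïve example above): in the dyad-independent, edges-only case, every tie has the same logistic probability, which is just the observed density:

# Per-tie probability implied by the edges-only samplike fit
plogis(coef(ergm_fit)["edges"]) # about 0.288

# ... which matches the observed density
mean(y)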

Related to this, block-diagonal ERGMs can be estimated as independent models, one per block. To see more about this, read Snijders (2010). Likewise, since independence depends (pun intended) on partitioning the objective function, as pointed out by Snijders, non-linear functions make the model dependent: with \(s\left(\mathbf{y}\right) = \sqrt{\sum_{ij}y_{ij}}\), the square root of the edge count, the model is no longer a Bernoulli graph.


  1. You can download the 03.rda file from this link.↩︎

  2. Yes, the classes have the same name as the packages.↩︎

  3. Notice that this document may not include the usual messages that the ergm command generates during the estimation procedure. This is just to make it more printer-friendly.↩︎

  4. The statnet wiki website has a very nice example of (very) bad and good MCMC diagnostics plots here.↩︎