Elsevier

Social Networks

Volume 29, Issue 2, May 2007, Pages 216-230
Curved exponential family models for social networks

https://doi.org/10.1016/j.socnet.2006.08.005

Abstract

Curved exponential family models are a useful generalization of exponential random graph models (ERGMs). In particular, models involving the alternating k-star, alternating k-triangle, and alternating k-twopath statistics of Snijders et al. [Snijders, T.A.B., Pattison, P.E., Robins, G.L., Handcock, M.S., in press. New specifications for exponential random graph models. Sociological Methodology] may be viewed as curved exponential family models. This article unifies recent material in the literature regarding curved exponential family models for networks in general and models involving these alternating statistics in particular. It also discusses the intuition behind rewriting the three alternating statistics in terms of the degree distribution and the recently introduced shared partner distributions. This intuition suggests a redefinition of the alternating k-star statistic. Finally, this article demonstrates the use of the statnet package in R for fitting models of this sort, comparing new results on an oft-studied network dataset with results found in the literature.

Introduction

For a fixed set of n actors, or nodes, and a network on those nodes, assume that Y denotes the n×n adjacency matrix for the network; that is,

$$Y_{ij} = \begin{cases} 1 & \text{if an edge exists from } i \text{ to } j, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

In some social networks applications, the goal is to produce a probabilistic model for Y based on an observed network dataset. It is the goal of this article to explain a particular class of models, called curved exponential family models, for achieving this end. We assume here that the reader is at least somewhat conversant in certain basic techniques of statistical modelling such as logistic regression, though not necessarily familiar with the intricacies of statistical modelling of networks.

Implicit in definition (1) is the fact that we assume no valued or multiple edges are allowed; furthermore, we disallow self-edges, so $Y_{ii}=0$ for all i. Finally, we will treat only undirected networks in this article, which implies that $Y_{ij}=Y_{ji}$, though we make this choice only to simplify some of the arguments; there is little difficulty in extending all of the results here to the case of directed networks.
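The assumptions on Y stated above (binary entries, no self-edges, symmetry) can be checked mechanically. The following is a minimal Python sketch; the helper names `adjacency_from_edges` and `is_valid_undirected_adjacency` are hypothetical and are not part of statnet or of the original article.

```python
def adjacency_from_edges(n, edges):
    """Build an n x n adjacency matrix for an undirected network.

    `edges` is an iterable of (i, j) pairs with 0-based node labels
    (an assumption of this sketch; the article does not fix a labelling).
    """
    y = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("self-edges are not allowed")
        y[i][j] = 1
        y[j][i] = 1  # undirected: Y_ij = Y_ji
    return y


def is_valid_undirected_adjacency(y):
    """Return True if y is square, binary, symmetric, with a zero diagonal."""
    n = len(y)
    if any(len(row) != n for row in y):
        return False
    for i in range(n):
        if y[i][i] != 0:  # no self-edges
            return False
        for j in range(n):
            if y[i][j] not in (0, 1):  # no valued or multiple edges
                return False
            if y[i][j] != y[j][i]:  # symmetry
                return False
    return True
```

For example, `adjacency_from_edges(4, [(0, 1), (1, 2)])` yields a matrix that passes the check, whereas a matrix with a nonzero diagonal entry does not.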

Beyond the network information contained in Y, there are often additional data, such as a set of measured characteristics for each node in the network. For instance, when the nodes are people, we may know each person’s age and sex. Throughout this article, we let X denote the additional data.

We assume throughout this article that the probability of observing a particular network is a function of statistics that may depend on the network itself as well as covariates measured on the nodes. For the particular class of models known as exponential random graph models (ERGMs), the relationship between a particular graph y and its probability of occurrence conditional on the additional data X is generally expressed as

$$P_\eta(Y=y) = \frac{\exp\left\{\sum_{i=1}^{p} \eta_i g_i(y,X)\right\}}{\kappa(\eta)} = \frac{\exp\{\eta^t g(y,X)\}}{\kappa(\eta)}, \tag{2}$$

where g(y,X) is a user-defined p-vector of statistics and $\eta \in \mathbb{R}^p$ denotes the statistical parameter governing the probabilistic formation of the network. The denominator, κ(η), is a normalizing constant that ensures that the sum of (2) over all possible y equals 1.
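To make the role of κ(η) in (2) concrete, the sketch below evaluates the ERGM probability by brute-force enumeration of every undirected graph on a handful of nodes. It is illustrative only: the choice of statistics (edge count and triangle count), the absence of covariates X, and all function names are assumptions of this example, and enumeration is feasible only for very small n because the number of graphs is 2^(n(n-1)/2).

```python
from itertools import combinations, product
from math import exp


def all_graphs(n):
    """Yield every undirected simple graph on n nodes as a frozenset of dyads (i, j), i < j."""
    dyads = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(dyads)):
        yield frozenset(d for d, b in zip(dyads, bits) if b)


def g(edges, n):
    """Illustrative statistic vector: (number of edges, number of triangles)."""
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= edges
    )
    return (len(edges), triangles)


def unnormalized_weight(edges, eta, n):
    """Numerator of (2): exp{eta^t g(y)}."""
    return exp(sum(e * s for e, s in zip(eta, g(edges, n))))


def ergm_probability(edges, eta, n):
    """P_eta(Y = y), with kappa(eta) computed by summing over all graphs."""
    kappa = sum(unnormalized_weight(h, eta, n) for h in all_graphs(n))
    return unnormalized_weight(edges, eta, n) / kappa
```

With η = (0, 0) every graph on four nodes receives probability 1/64, and for any η the probabilities over all 64 graphs sum to 1, which is exactly what the normalizing constant guarantees.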

ERGMs are sometimes known in the social networks literature as p-star models (Wasserman and Pattison, 1996). We use “ERGM” instead of “p-star” here due to the vast statistical literature covering exponential family models (e.g., Barndorff-Nielsen, 1978, Brown, 1986). Nevertheless, we consider “p-star” to be synonymous with “ERGM” in this article, with one caveat: Wasserman and Pattison (1996) used a method of parameter estimation, maximum pseudo-likelihood estimation, that has come to be closely associated with the p-star models themselves. However, in this article we separate the name of the models (ERGMs or p-star models) from the method of estimating their parameters. In particular, we do not discuss maximum pseudo-likelihood estimation here, focusing instead on the better-understood method of maximum likelihood estimation. ERGMs are discussed in detail by Robins et al. (2007a).

The remainder of this article concerns generalizations of (2) known as curved exponential family models. Section 2 discusses these models in general terms, while Sections 3 and 4 focus on specific examples, namely models involving the alternating k-star, alternating k-triangle, and alternating k-twopath statistics developed by Snijders et al. (in press). These statistics have recently shown great promise in producing parsimonious models that fit certain social network datasets well; they are discussed further by Robins et al. (2007b). Much of Sections 3 and 4 is devoted to a reformulation of these statistics in terms of the degree statistics and the recently developed shared partner statistics, as well as an attempt to reveal how this reformulation aids in interpreting these statistics. Finally, Section 5 demonstrates the use of these models on real data. The analysis, which recreates and extends some earlier work using the same dataset, is carried out using the statnet package for R, available for use at csde.washington.edu/statnet. The computer code used in Section 5 may be found in Appendix A.

Section snippets

Curved exponential families

In model (2), the maximum likelihood estimator (MLE) of the parameter vector η is, by definition, the vector that maximizes $P_\eta(Y=y_{\mathrm{obs}})$ as a function of η, where $y_{\mathrm{obs}}$ is the observed network. In other words, if we let $\hat\eta$ denote the MLE and we assume that η is a vector contained in p-dimensional space (denoted $\mathbb{R}^p$), then we may write

$$\hat\eta = \arg\max_{\eta \in \mathbb{R}^p} \frac{\exp\{\eta^t g(y_{\mathrm{obs}},X)\}}{\kappa(\eta)}.$$

It is worth noting here that it is extremely difficult to find $\hat\eta$ except in the case of a simplistic model (e.g., one in which all

Rewriting alternating k-stars

The first curved exponential family model we will consider involves the k-star statistics $S_1(y), \ldots, S_{n-1}(y)$, where $S_k(y)$ denotes the number of k-stars in the graph y. A k-star is a set of k distinct edges that all share an endpoint. In particular, a 1-star is simply an edge. Note that the number of edges in the graph y is sometimes denoted by L(y), though we use $S_1(y)$ in this article.
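The k-star definition above translates directly into a count based on node degrees: a node of degree d is the shared endpoint of C(d, k) distinct k-stars, so for k ≥ 2 the statistic is the sum of C(d_i, k) over nodes. For k = 1 each edge has two endpoints and would be counted twice, so the sum is halved. The Python sketch below assumes the graph is given as a symmetric 0/1 adjacency matrix; the function names are hypothetical.

```python
from math import comb


def degrees(y):
    """Degree of each node in a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in y]


def k_star_count(y, k):
    """Number of k-stars S_k(y): sets of k distinct edges sharing an endpoint."""
    if k < 1:
        raise ValueError("k must be at least 1")
    total = sum(comb(d, k) for d in degrees(y))
    # A 1-star is simply an edge, and each edge is seen from both endpoints.
    return total // 2 if k == 1 else total
```

For a graph in which one node is tied to three others and there are no further edges, this gives three 1-stars, three 2-stars, and one 3-star.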

When each k-star statistic has its own coefficient in an ERGM, the resulting model is

$$P(Y=y) = \frac{\exp\left\{\sum_{i=1}^{n-1} \eta_i S_i(y)\right\}}{\kappa(\eta)}$$

Shared partner statistics

Section 3 defines the alternating k-star statistic and shows that it may be rewritten as in Eq. (12) in terms of the degree statistics. In an analogous way, we now define the alternating k-triangle and alternating k-twopath statistics of Snijders et al. (in press) and show that they may be rewritten in terms of the edgewise and dyadic shared partner statistics, respectively, which we also define.
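As a rough illustration of the shared partner statistics mentioned above, the sketch below tabulates both distributions for a small graph. It assumes the usual definitions, which are not reproduced in this excerpt: the edgewise statistic for k counts edges whose two endpoints have exactly k common neighbors, and the dyadic statistic for k counts all unordered pairs of nodes (tied or not) with exactly k common neighbors. The function name and the list-based return format are choices made for this example only.

```python
from itertools import combinations


def shared_partner_distributions(y):
    """Return (esp, dsp) for a symmetric 0/1 adjacency matrix with zero diagonal.

    esp[k] = number of edges whose endpoints share exactly k partners.
    dsp[k] = number of dyads whose members share exactly k partners.
    Both lists are indexed by k = 0, ..., n - 2.
    """
    n = len(y)
    esp = [0] * (n - 1)
    dsp = [0] * (n - 1)
    for i, j in combinations(range(n), 2):
        # The zero diagonal ensures that i and j never count as their own partners.
        k = sum(1 for h in range(n) if y[i][h] and y[j][h])
        dsp[k] += 1
        if y[i][j]:
            esp[k] += 1
    return esp, dsp
```

One consistency check: each triangle contributes one shared partner to each of its three edges, so the sum of k times esp[k] equals three times the number of triangles.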

The k-triangle and the k-twopath are concepts that generalize the ideas of triangle and 2-star,

Lazega’s Lawyer dataset

Lazega (Lazega and Pattison, 1999; Lazega, 2001) collected and analyzed data on working relations among 36 partners in a New England law firm. Here, we deal with only a subset of the data: An (undirected) edge will be said to exist between two partners if and only if each indicates a collaboration with the other. These data are analyzed by both Snijders et al. (in press) and Hunter and Handcock (2006). Here, we employ models similar to those used in both of these articles in order to compare and

Discussion

The class of curved exponential family models is a major generalization of the ERGM model class. Though we have focused here only on very particular models arising from the work of Snijders et al. (in press), we have shown how curved exponential family models can achieve parsimonious descriptions of data by reducing a large number of parameters to only a few (e.g., in the case of the GWD term, reducing from $n-1$ parameters to only two). These reductions also allow much better behavior of maximum

This research is supported by Grant DA012831 from NIDA and Grant HD041877 from NICHD.
