Curved exponential family models for social networks

doi:10.1016/j.socnet.2006.08.005

Social Networks

Volume 29, Issue 2, May 2007, Pages 216-230

Abstract

Curved exponential family models are a useful generalization of exponential random graph models (ERGMs). In particular, models involving the alternating k-star, alternating k-triangle, and alternating k-twopath statistics of Snijders et al. [Snijders, T.A.B., Pattison, P.E., Robins, G.L., Handcock, M.S., in press. New specifications for exponential random graph models. Sociological Methodology] may be viewed as curved exponential family models. This article unifies recent material in the literature regarding curved exponential family models for networks in general and models involving these alternating statistics in particular. It also discusses the intuition behind rewriting the three alternating statistics in terms of the degree distribution and the recently introduced shared partner distributions. This intuition suggests a redefinition of the alternating k-star statistic. Finally, this article demonstrates the use of the statnet package in R for fitting models of this sort, comparing new results on an oft-studied network dataset with results found in the literature.

Introduction

For a fixed set of n actors, or nodes, and a network on those nodes, let Y denote the $n \times n$ adjacency matrix for the network; that is,

$$Y_{ij} = \begin{cases} 1 & \text{if an edge exists from } i \text{ to } j, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

In some social network applications, the goal is to produce a probabilistic model for Y based on an observed network dataset. It is the goal of this article to explain a particular class of models, called curved exponential family models, for achieving this end. We assume here that the reader is at least somewhat conversant in certain basic techniques of statistical modelling such as logistic regression, though not necessarily familiar with the intricacies of statistical modelling of networks.

Implicit in definition (1) is the fact that we assume no valued or multiple edges are allowed; furthermore, we disallow self-edges, so $Y_{ii} = 0$ for all i. Finally, we will treat only undirected networks in this article, which implies that $Y_{ij} = Y_{ji}$, though we make this choice only to simplify some of the arguments; there is little difficulty in extending all of the results here to the case of directed networks.
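The constraints on Y described above (binary entries, zero diagonal, symmetry) can be made concrete with a small sketch. Python is used purely for illustration; the function name `adjacency_matrix` and the four-node example network are assumptions introduced here, not part of the article.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix Y of an undirected network.

    `edges` is an iterable of node pairs (i, j), with nodes labelled 0..n-1.
    Entries are binary, so repeated pairs do not create multiple edges.
    """
    Y = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("self-edges are not allowed, so Y[i][i] must stay 0")
        Y[i][j] = 1  # edge from i to j
        Y[j][i] = 1  # undirected network: Y is symmetric
    return Y


# Hypothetical example: a triangle on nodes 0, 1, 2 plus a pendant node 3.
Y = adjacency_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
for row in Y:
    print(row)
```

Because each undirected edge fills two symmetric entries, the sum of all entries of Y is twice the number of edges.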

Beyond the network information contained in Y, there are often additional data, such as a set of measured characteristics for each node in the network. For instance, when the nodes are people, we may know each person’s age and sex. Throughout this article, we let X denote the additional data.

We assume throughout this article that the probability of observing a particular network is a function of statistics that may depend on the network itself as well as covariates measured on the nodes. For the particular class of models known as exponential random graph models (ERGMs), the relationship between a particular graph y and its probability of occurrence conditional on the additional data X is generally expressed as

$$P_{η}(Y = y) = \frac{\exp\{\sum_{i=1}^{p} η_{i} g_{i}(y, X)\}}{κ(η)} = \frac{\exp\{η^{t} g(y, X)\}}{κ(η)}, \tag{2}$$

where $g(y, X)$ is a user-defined p-vector of statistics and $η \in \mathbb{R}^{p}$ denotes the statistical parameter governing the probabilistic formation of the network. The denominator, $κ(η)$, is a normalizing constant that ensures that the sum of (2) over all possible y equals 1.
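To see what the normalizing constant in an ERGM involves, the following sketch evaluates the model by brute force on a network small enough to enumerate. It is an illustration under stated assumptions, not the article's method: the statistic vector (edge count and triangle count), the parameter values, and all function names are hypothetical, and no covariates X are used.

```python
from itertools import combinations, product
from math import exp


def all_graphs(n):
    """Yield every undirected simple graph on n nodes as a frozenset of dyads (i, j), i < j."""
    dyads = list(combinations(range(n), 2))
    for bits in product([0, 1], repeat=len(dyads)):
        yield frozenset(d for d, b in zip(dyads, bits) if b)


def g(edges, n):
    """Illustrative statistic vector g(y): (number of edges, number of triangles)."""
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= edges
    )
    return (len(edges), triangles)


def ergm_pmf(eta, n):
    """Return {graph: probability} for the ERGM with parameter eta, by full enumeration."""
    weights = {
        y: exp(sum(e * s for e, s in zip(eta, g(y, n)))) for y in all_graphs(n)
    }
    kappa = sum(weights.values())  # normalizing constant kappa(eta)
    return {y: w / kappa for y, w in weights.items()}


# n = 4 gives 2**6 = 64 possible graphs, so enumeration is cheap here.
pmf = ergm_pmf(eta=(-1.0, 0.5), n=4)
print(len(pmf), sum(pmf.values()))
```

The number of graphs is $2^{n(n-1)/2}$, so this enumeration is feasible only for very small n; with 36 nodes there are $2^{630}$ graphs, which is why the normalizing constant cannot be computed directly in realistic applications.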

ERGMs are sometimes known in the social networks literature as p-star models (Wasserman and Pattison, 1996). We use “ERGM” instead of “p-star” here due to the vast statistical literature covering exponential family models (e.g., Barndorff-Nielsen, 1978; Brown, 1986). Nevertheless, we consider “p-star” to be synonymous with “ERGM” in this article, with one caveat: Wasserman and Pattison (1996) used a method of parameter estimation, maximum pseudo-likelihood estimation, that has come to be closely associated with the p-star models themselves. However, in this article we separate the name of the models (ERGMs or p-star models) from the method of estimating their parameters. In particular, we do not discuss maximum pseudo-likelihood estimation here, focusing instead on the better-understood method of maximum likelihood estimation. ERGMs are discussed in detail by Robins et al. (2007a).

The remainder of this article concerns generalizations of (2) known as curved exponential family models. Section 2 discusses these models in general terms, while Sections 3 and 4 focus on specific examples, namely models involving the alternating k-star, alternating k-triangle, and alternating k-twopath statistics developed by Snijders et al. (in press). These statistics have recently shown great promise in producing parsimonious models that fit certain social network datasets well; they are discussed further by Robins et al. (2007b). Much of Sections 3 and 4 is devoted to a reformulation of these statistics in terms of the degree statistics and the recently developed shared partner statistics, as well as an attempt to reveal how this reformulation aids in interpreting these statistics. Finally, Section 5 demonstrates the use of these models on real data. The analysis, which recreates and extends some earlier work using the same dataset, is carried out using the statnet package for R, available at csde.washington.edu/statnet. The computer code used in Section 5 may be found in Appendix A.

Section snippets

Curved exponential families

In model (2), the maximum likelihood estimator (MLE) of the parameter vector $η$ is, by definition, the vector that maximizes $P_{η}(Y = y_{obs})$ as a function of $η$, where $y_{obs}$ is the observed network. In other words, if we let $\hat{η}$ denote the MLE and we assume that $η$ is a vector contained in p-dimensional space (denoted $\mathbb{R}^{p}$), then we may write

$$\hat{η} = \arg\max_{η \in \mathbb{R}^{p}} \frac{\exp\{η^{t} g(y_{obs}, X)\}}{κ(η)}.$$

It is worth noting here that it is extremely difficult to find $\hat{η}$ except in the case of a simplistic model (e.g., one in which all

Rewriting alternating k-stars

The first curved exponential family model we will consider involves the k-star statistics $S_{1}(y), \dots, S_{n-1}(y)$, where $S_{k}(y)$ denotes the number of k-stars in the graph y. A k-star is a set of k distinct edges that all share an endpoint. In particular, a 1-star is simply an edge. Note that the number of edges in the graph y is sometimes denoted by $L(y)$, though we use $S_{1}(y)$ in this article.
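The k-star counts can be computed from node degrees alone, which hints at why these statistics can be rewritten in terms of the degree distribution. For $k \ge 2$ the shared endpoint of a k-star is unique, so a node of degree d is the centre of $\binom{d}{k}$ k-stars; for $k = 1$ each edge has two endpoints, so the degree sum counts every edge twice. The sketch below is illustrative Python, with hypothetical function names and a made-up four-node network.

```python
from math import comb


def degrees(n, edges):
    """Degree of each node 0..n-1 in an undirected simple graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def k_star_count(n, edges, k):
    """Number of k-stars S_k(y), computed from the node degrees."""
    deg = degrees(n, edges)
    if k == 1:
        # A 1-star is an edge; summing degrees counts each edge twice.
        return sum(deg) // 2
    # For k >= 2, choose k of the d edges incident to each possible centre.
    return sum(comb(d, k) for d in deg)


# Hypothetical network: node 0 is tied to 1, 2 and 3, and nodes 1 and 2 are tied.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
for k in (1, 2, 3):
    print(k, k_star_count(4, edges, k))
```

Here the degrees are 3, 2, 2 and 1, giving 4 edges, 5 two-stars and 1 three-star.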

When each k-star statistic has its own coefficient in an ERGM, the resulting model is

$$P(Y = y) = \frac{\exp\{\sum_{i=1}^{n-1} η_{i} S_{i}(y)\}}{κ(η)}$$

Shared partner statistics

Section 3 defines the alternating k-star statistic and shows that it may be rewritten as in Eq. (12) in terms of the degree statistics. In an analogous way, we now define the alternating k-triangle and alternating k-twopath statistics of Snijders et al. (in press) and show that they may be rewritten in terms of the edgewise and dyadic shared partner statistics, respectively, which we also define.
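The definitions of the shared partner statistics are not reproduced in this excerpt. The sketch below therefore assumes the usual meaning of the terms: the edgewise shared partner count for k is the number of edges whose two endpoints have exactly k common neighbours, and the dyadic shared partner count for k is the number of node pairs (tied or not) with exactly k common neighbours. The Python function name and the example network are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_partner_distributions(n, edges):
    """Return (esp, dsp): counts of edges and of dyads by number of shared partners."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    esp, dsp = Counter(), Counter()
    for i, j in combinations(range(n), 2):
        k = len(nbrs[i] & nbrs[j])  # number of partners shared by i and j
        dsp[k] += 1                 # every dyad contributes to the dyadic counts
        if j in nbrs[i]:
            esp[k] += 1             # only tied dyads contribute to the edgewise counts
    return esp, dsp


# Hypothetical network: node 0 is tied to 1, 2 and 3, and nodes 1 and 2 are tied.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
esp, dsp = shared_partner_distributions(4, edges)
print(dict(esp), dict(dsp))
```

Under these assumed definitions, two sanity checks hold for any graph: the sum of k times the edgewise count equals three times the number of triangles, and the sum of k times the dyadic count equals the number of 2-stars. In the example, both sums can be read off directly (3 for one triangle, 5 for five 2-stars).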

The k-triangle and the k-twopath are concepts that generalize the ideas of triangle and 2-star,

Lazega’s Lawyer dataset

Lazega (Lazega and Pattison, 1999; Lazega, 2001) collected and analyzed data on working relations among 36 partners in a New England law firm. Here, we deal with only a subset of the data: an (undirected) edge will be said to exist between two partners if and only if each indicates a collaboration with the other. These data are analyzed by both Snijders et al. (in press) and Hunter and Handcock (2006). Here, we employ models similar to those used in both of these articles in order to compare and

Discussion

The class of curved exponential family models is a major generalization of the ERGM model class. Though we have focused here only on very particular models arising from the work of Snijders et al. (in press), we have shown how curved exponential family models can achieve parsimonious descriptions of data by reducing a large number of parameters to only a few (e.g., in the case of the GWD term, reducing from $n - 1$ parameters to only two). These reductions also allow much better behavior of maximum

References (18)

Goodreau, S.M., 2007. Advances in exponential random graph (p*) models applied to a large social network. Social Networks.
Lazega, E., et al., 1999. Multiplexity, generalized exchange and cooperation in organizations: a case study. Social Networks.
Robins, G.L., et al., 2007. An introduction to exponential random graph (p*) models for social networks. Social Networks.
Robins, G.L., et al., 2007. Recent developments in exponential random graph (p*) models for social networks. Social Networks.
Albert, R., et al., 2002. Statistical mechanics of complex networks. Reviews of Modern Physics.
Barndorff-Nielsen, O.E., 1978. Information and Exponential Families in Statistical Theory.
Brown, L.D., 1986. Fundamentals of statistical exponential families. IMS Lecture Notes in Monograph Series...
Efron, B., 1975. Defining the curvature of a statistical problem (with applications to second order efficiency) (with discussion). Annals of Statistics.
Efron, B., 1978. The geometry of exponential families. Annals of Statistics.

There are more references available in the full text version of this article.


^☆: This research is supported by Grant DA012831 from NIDA and Grant HD041877 from NICHD.
