Clustering in weighted networks
Introduction
While a substantial body of recent research has investigated the topological features of a variety of networks Barabási et al., 2002, Ingram and Roberts, 2000, Kossinets and Watts, 2006, Uzzi and Spiro, 2005, Watts and Strogatz, 1998, relatively little work has been conducted that moves beyond merely topological measures to take explicitly into account the heterogeneity of ties (or edges) connecting nodes (or vertices) (Barrat et al., 2004). In a number of real-world networks, ties are often associated with weights that differentiate them in terms of their strength, intensity or capacity Barrat et al., 2004, Wasserman and Faust, 1994. On the one hand, Granovetter (1973) argued that the strength of social relationships in social networks is a function of their duration, emotional intensity, intimacy, and exchange of services. On the other, for non-social networks, weights often refer to the function performed by ties, e.g., the carbon flow (mg/m2/day) between species in food webs Luczkowich et al., 2003, Nordlund, 2007, the number of synapses and gap junctions in neural networks (Watts and Strogatz, 1998), or the amount of traffic flowing along connections in transportation networks (Barrat et al., 2004). In order to fully capture the richness of the data, it is therefore crucial that the measures used to study a network incorporate the weights of the ties.
A measure that has long received much attention in both theoretical and empirical research is concerned with the degree to which nodes tend to cluster together. Evidence suggests that in most real-world networks, and especially social networks, nodes tend to cluster into densely connected groups Feld, 1981, Friedkin, 1984, Holland and Leinhardt, 1970, Louch, 2000, Simmel, 1923, Snijders, 2001, Snijders et al., 2006, Watts and Strogatz, 1998. In particular, the problem of network clustering can be investigated from a two-fold perspective. On the one hand, it involves determining whether and to what extent clustering is a property of a network or, alternatively, whether nodes tend to be members of tightly knit groups (Luce and Perry, 1949). On the other, it is concerned with the identification of the groups of nodes into which a network can be partitioned. This can be obtained, for example, by applying algorithms for community detection that assess and compare densities within and between groups Newman, 2006, Newman and Girvan, 2004, Rosvall and Bergstrom, 2008, or by using the image matrix in blockmodeling for grouping nodes with the same or similar patterns of ties and uncovering connections between groups of nodes (Doreian et al., 2005).
In this paper, we focus our attention only on the problem of determining whether clustering is a property of a network. More specifically, to address this problem one may ask: If there are three nodes in a network, i, j, and k, and i is connected to j and k, how likely is it that j and k are also connected with each other? In real-world networks, empirical studies have shown that this likelihood tends to be greater than the probability of a tie randomly established between two nodes Barabási et al., 2002, Davis et al., 2003, Ebel et al., 2002, Holme et al., 2004, Ingram and Roberts, 2000, Newman, 2001, Uzzi and Spiro, 2005, Watts and Strogatz, 1998. For social networks, scholars have investigated the mechanisms that are responsible for the increase in the probability that two people will be connected if they share a common acquaintance Holland and Leinhardt, 1971, Simmel, 1923, Snijders, 2001, Snijders et al., 2006. The nature of these mechanisms can be cognitive, as in the case of individuals’ desire to maintain balance among ties with others Hallinan, 1974, Heider, 1946, social, as in the case of third-part referral (Davis, 1970), or can be explained in other ways, such as in terms of focus constraints Feld, 1981, Kossinets and Watts, 2006, Louch, 2000, Monge et al., 1985 or the differing popularity among individuals Feld and Elmore, 1982a, Feld and Elmore, 1982b. While clustering is likely to result from a combination of all these mechanisms, network studies have offered no conclusive theoretical explanation of its causes, nor have they concentrated as much on its underpinning processes as on the measures to formally detect its presence in real-world networks (Levine and Kurzban, 2006).
Traditionally, the two main measures developed for testing the tendency of nodes to cluster together into tightly knit groups are the local clustering coefficient (Watts and Strogatz, 1998) and the global clustering coefficient Feld, 1981, Karlberg, 1997, Karlberg, 1999, Louch, 2000, Newman, 2003. The local clustering coefficient is based on ego’s network density or local density Scott, 2000, Wasserman and Faust, 1994. For node i, this is measured as the fraction of the number of ties connecting i’s neighbors over the total number of possible ties between i’s neighbors. To create an overall local coefficient for the whole network, the individual fractions are averaged across all nodes.
Despite its ability to capture the degree of social embeddedness that characterizes the nodes of a network, nonetheless the local clustering coefficient suffers from a number of limitations. First, in its original formulation, it does not take into consideration the weights of the ties in the network. As a result, the same value of the coefficient might be attributed to networks that share the same topology but differ in terms of how weights are distributed across ties and, as a result, may be characterized by different likelihoods to befriend the friends of one’s friends. Second, the local clustering coefficient does not take into consideration the directionality of the ties connecting a node to its neighbors (Wasserman and Faust, 1994).1 Recently, there have been a number of attempts to extend the local clustering coefficient to the case of weighted networks Barrat et al., 2004, Lopez-Fernandez et al., 2004, Onnela et al., 2005, Zhang and Horvath, 2005. However, the issue of directionality still remains mainly unresolved (Caldarelli, 2007), thus making the coefficient suitable primarily for undirected networks.
Moreover, the local clustering coefficient, even in its weighted version, is biased by correlations with nodes’ degrees: a node with more neighbors is likely to be embedded in relatively fewer closed triplets, and therefore to have a smaller local clustering than a node connected to fewer neighbors Ravasz and Barabási, 2003, Ravasz et al., 2002. An additional bias might stem from degree–degree correlations. When nodes preferentially connect to others with similar degree, local clustering is positively correlated with nodes’ degree Ravasz and Barabási, 2003, Ravasz et al., 2002, Soffer and Vàzquez, 2005. Lack of comparability between values of clustering of nodes with different degrees thus makes the average value of local clustering sensitive with respect to how degrees are distributed across the whole network.
Unlike the local clustering coefficient, the global clustering coefficient is based on transitivity, which is a measure used to detect the fraction of triplets that are closed in directed networks (Wasserman and Faust, 1994, p. 243). It is not an average of individual fractions calculated for each node, and, as a result, it does not suffer from the same type of correlations with nodes’ degrees as the local coefficient. Despite its merits, however, in its original formulation, the global coefficient applies only to networks where ties are unweighted. To address this limitation, and make the coefficient suitable also to networks where ties are weighted, researchers have typically introduced an arbitrary cut-off level of the weight, and then dichotomized the network by removing ties with weights that are below the cut-off, and then setting the weights of the remaining ties equal to one Doreian, 1969, Wasserman and Faust, 1994. The outcome of this procedure is a binary network consisting of ties that are either present (i.e., equal to 1) or absent (i.e., equal to 0) Scott, 2000, Wasserman and Faust, 1994. For example, Doreian (1969) studied clustering in a weighted network by creating a series of binary networks from the original weighted network using different cut-offs. To address potential problems arising from the subjectivity inherent in the choice of the cut-off, a sensitivity analysis was conducted to assess the degree to which the value of clustering varies depending on the cut-off (Doreian, 1969). However, this analysis tells us little about the original weighted network, apart from the fact that the value of clustering changes at different levels of the cut-off.
In this paper, we focus on the global clustering coefficient, and propose a generalization that explicitly takes weights of ties into consideration and, for this reason, does not depend on a cut-off to dichotomize weighted networks. In what follows, we start by discussing the existing literature on the global clustering coefficient in undirected and unweighted networks. In Section 3, we propose our generalized measure of clustering. We then turn our attention to directed networks, and discuss the current literature on clustering in those networks. We extend our generalized measure of clustering to cover weighted and directed networks. In Section 5, we empirically test our proposed measure, and compare it with the standard one, by using a number of weighted network datasets. Finally, in Section 6 we summarize and discuss the main results.
Section snippets
Clustering coefficient
The global clustering coefficient is concerned with the density of triplets of nodes in a network. A triplet can be defined as three nodes that are connected by either two (open triplet) or three (closed triplet) ties. A triangle consists of three closed triplets, each centered on one node. The global clustering coefficient is defined as the number of closed triplets (or triangles) over the total number of triplets (both open and closed). The first attempt to measure the coefficient was made
Generalized clustering coefficient
We can generalize the clustering coefficient, C, to take weights of ties into consideration, by rewriting Eq. (1) in terms of a triplet value, . To this end, it is vital to choose an appropriate method for defining the triplet value as this impacts on the value of the coefficient. The method should be chosen based on the research question, and should reflect the way in which the weights of ties are defined. First, the triplet value, , can be defined as the arithmetic mean of the weights of
Directed networks
In directed networks, connections between nodes are described as ties that originate from one node and point toward another (Wasserman and Faust, 1994). The weight of a tie directed from node i to node j is expressed as . In a binary network, the weight of a present tie is set equal to 1, whereas the weight of an absent tie is 0. We define the triplet consisting of the two directed ties, and , as , and the value of this triplet as .
The standard clustering coefficient as
Empirical test of the generalized clustering coefficient
We now test the proposed generalization of clustering on a number of network datasets. We also compare the generalized coefficient with the standard one measured with different cut-offs.5
Conclusions and discussion
Relationships among unique people are unique. We live in an increasingly connected world with an increasing number of contacts to whom we relate in different ways, with different frequencies, and for different reasons. Each social relationship bears a special meaning to us, and it would be overly simplistic and grossly unfair to treat every contact in the same manner. Therefore, it is important to capture differences among relationships when mapping and studying social networks. In particular,
Acknowledgements
We wish to give very special thanks to Filip Agneessens, Stephen Borgatti, Carter Butts, and Tom Snijders for their valuable feedback on earlier versions of this paper. We are also grateful to participants of the 3rd Conference on Applications of Social Network Analysis, the 27th Sunbelt Social Networks Conference, and NetSci 2008 for their helpful comments and suggestions.
References (59)
- et al.
Evolution of the social network of scientific collaborations
Physica A
(2002) - et al.
Centrality in valued graphs: a measure of betweenness based on network flow
Social Networks
(1991) - et al.
Structure and time evolution of an Internet dating community
Social Networks
(2004) Testing transitivity in graphs
Social Networks
(1997)Personal network integration: transitivity and homophily in strong-tie relations
Social Networks
(2000)Identifying regular blocks in valued networks: a heuristic applied to the St. Marks carbon flow data, and international trade in cereal products
Social Networks
(2007)- et al.
Optimal connections: strength and distance in valued graphs
Social Networks
(2001) - Ahnert, S.E., Garlaschelli, D., Fink, T.M., Caldarelli, G., 2007. An ensemble approach to the analysis of weighted...
- et al.
The architecture of complex weighted networks.
Proceedings of the National Academy of Sciences of the United States of America
(2004) Structural Holes.
(1992)
Scale-Free Networks: Complex Webs in Nature and Technology
Reaction-diffusion processes and metapopulation models in heterogeneous networks
Nature Physics
The Hidden Power of Social Networks
Clustering and hierarchy in interpersonal relations: testing two graph theoretical models on 742 sociomatrices
American Sociological Review
The small world of the American corporate elite, 1982–2001
Strategic Organization
A note on the detection of cliques in valued graphs
Sociometry
Generalized Blockmodeling
Scale-free topology of e-mail networks
Physical Review E
On random graphs
Publicationes Mathematicae
The focused organization of social ties
American Journal of Sociology
Patterns of sociometric choices: transitivity reconsidered
Social Psychology Quarterly
Processes underlying patterns of sociometric choice: response to Hallinan
Social Psychology Quarterly
Structural cohesion and equivalence explanations of social homogeneity
Sociological Methods and Research
The strength of weak ties
American Journal of Sociology
A structural model of sentiment relations
American Journal of Sociology
Attitudes and cognitive organization
Journal of Psychology
A method for detecting structure in sociometric data
American Journal of Sociology
Transitivity in structural models of small groups
Comparative Group Studies
Cited by (823)
Dynamic link prediction by learning the representation of node-pair via graph neural networks
2024, Expert Systems with ApplicationsBipartite mixed membership distribution-free model. A novel model for community detection in overlapping bipartite weighted networks
2024, Expert Systems with ApplicationsOptimal scheme for vaccine allocation in multi-community networks
2023, Physica A: Statistical Mechanics and its ApplicationsCommunity detection for weighted bipartite networks
2023, Knowledge-Based SystemsStructural connectivity modifications in the brain of selected patients with tumour after its removal by surgery (a case study)
2023, Physica A: Statistical Mechanics and its Applications