Heterogeneity

We have learnt how to exploit the network topology to gauge the similarity (or dissimilarity) of pairs of nodes. This opens the possibility of defining a measure of heterogeneity for a node in terms of the dissimilarity of its neighbors:

A node is heterogeneous if it has dissimilar neighbors.

We will work on weighted undirected graphs. In the case of directed graphs, one of the two edge directions (incoming or outgoing) must be chosen. The unweighted case is a special case of the weighted one. Let $A = (a_{i,j})$ be the adjacency matrix of a weighted undirected graph, and let $i$ be a node with positive strength (weighted degree) $d_i$. We set:

$$p_{i,j} = \frac{a_{i,j}}{\sum_k a_{i,k}} = \frac{a_{i,j}}{d_i}$$

Notice that $0 \leq p_{i,j} \leq 1$ and $\sum_j p_{i,j} = 1$, hence $(p_{i,1}, \ldots, p_{i,n})$ is a probability distribution for every node $i$.
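A quick sketch of this construction in R (assuming the igraph package; the graph and weights are an arbitrary toy example):

```r
library(igraph)

# toy weighted undirected graph (names and weights are made up)
g = graph_from_literal(a-b, a-c, b-c, c-d)
E(g)$weight = c(2, 1, 1, 4)

A = as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
P = A / rowSums(A)   # p_{i,j} = a_{i,j} / d_i, row by row

rowSums(P)           # every row sums to 1
```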

We will work out the math for a general probability distribution $p = (p_1, \ldots, p_n)$, with $0 \leq p_i \leq 1$ and $\sum_i p_i = 1$. A measure of heterogeneity of $p$ is Shannon entropy:

$$H(p) = -\sum_i p_i \log p_i$$

where we set $p_i \log p_i = 0$ if $p_i = 0$. Another measure of heterogeneity of $p$ is Simpson diversity:

$$S(p) = \sum_{i \neq j} p_i p_j = \sum_{i,j} p_i p_j - \sum_i p_i^2 = 1 - \sum_i p_i^2$$

where in the last step we have used the fact that

$$\sum_{i,j} p_i p_j = \sum_i p_i \sum_j p_j = \sum_i p_i \cdot 1 = 1$$
Both measures are minimized when $p$ is totally concentrated on some element $k$, that is $p_k = 1$ and $p_i = 0$ for every $i \neq k$. In such case $H(p) = S(p) = 0$. Both measures are maximized when $p$ is evenly distributed among the elements, that is $p_i = 1/n$ for every $i$. In such case $H(p) = \log n$ and $S(p) = 1 - 1/n$. Notice that in this case both measures grow with $n$, although Simpson diversity is bounded above by 1.
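These extremes are easy to check numerically; a minimal sketch in R (base-2 logarithms, so the uniform maximum is $\log_2 n$):

```r
# minimal stand-ins for Shannon entropy (base-2 logs) and Simpson diversity
shannon = function(p) { x = p * log2(p); -sum(replace(x, is.nan(x), 0)) }
simpson = function(p) 1 - sum(p * p)

n = 4
conc = c(1, 0, 0, 0)   # totally concentrated distribution
unif = rep(1/n, n)     # evenly distributed

shannon(conc)  # 0
simpson(conc)  # 0
shannon(unif)  # log2(4) = 2
simpson(unif)  # 1 - 1/4 = 0.75
```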

If we have information about the pairwise distance (dissimilarity) $d_{i,j}$ between elements $i$ and $j$, then another measure of heterogeneity is Rao quadratic entropy:

$$R(p) = \sum_{i,j} p_i p_j d_{i,j}$$
There are two components in this definition of heterogeneity:

  1. the evenness of the distribution $p$, and
  2. the pairwise distances among the elements of $p$.
To isolate both components of the definition, let us first assume that the distribution $p$ is uniform, that is $p_i = 1/n$ for every $i$. Hence:

$$R(p) = \sum_{i,j} p_i p_j d_{i,j} = \frac{1}{n^2} \sum_{i,j} d_{i,j}$$

is the arithmetic mean distance among the elements of $p$. Hence, the more distant the elements of $p$ are from each other, the more heterogeneous $p$ is.
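A small numeric check of this identity in R (the distance matrix is an arbitrary symmetric toy example):

```r
# Rao quadratic entropy (a minimal stand-in)
rao = function(p, D) sum(diag(p) %*% D %*% diag(p))

n = 3
D = matrix(c(0, 1, 2,
             1, 0, 3,
             2, 3, 0), nrow = n)  # arbitrary symmetric distances
p = rep(1/n, n)                   # uniform distribution

rao(p, D)   # equals the mean pairwise distance sum(D) / n^2
```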

Let us now assume the simplest distance among nodes: $d_{i,j} = 1$ for $i \neq j$ and $d_{i,i} = 0$. In this case:

$$R(p) = \sum_{i,j} p_i p_j d_{i,j} = \sum_{i \neq j} p_i p_j = S(p)$$

Hence in this case the more uniform $p$ is, the more heterogeneous it is.
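With the 0/1 distance matrix, Rao quadratic entropy indeed coincides with Simpson diversity; a quick check on an arbitrary distribution:

```r
rao = function(p, D) sum(diag(p) %*% D %*% diag(p))
simpson = function(p) 1 - sum(p * p)

n = 4
D = 1 - diag(n)             # d_{i,j} = 1 for i != j, d_{i,i} = 0
p = c(0.5, 0.2, 0.2, 0.1)   # an arbitrary distribution

rao(p, D)                   # same value as simpson(p)
```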

It follows that, in general, $R(p)$ is large, hence $p$ is heterogeneous, when $p$ evenly distributes its probability among dissimilar elements. Conversely, $p$ is homogeneous when it concentrates its probability on similar elements. Going back to heterogeneity of nodes of a graph and applying $R(p)$ as a measure, a node is heterogeneous when it evenly distributes its strength among dissimilar neighbors, and homogeneous when it concentrates its strength on similar neighbors.

```r
library(igraph)

# Shannon entropy of a probability distribution (base-2 logs)
shannon = function(p) {
  x = p * log2(p)
  x = replace(x, is.nan(x), 0)  # convention: 0 * log 0 = 0
  return(-sum(x))
}

# Simpson diversity: 1 minus the sum of squared probabilities
simpson = function(p) {
  return(1 - sum(p * p))
}

# Rao quadratic entropy with distance matrix D
rao = function(p, D) {
  x = diag(p) %*% D %*% diag(p)  # entries p_i * d_{i,j} * p_j
  return(sum(x))
}

# Shannon, Simpson and Rao heterogeneity for every node of a weighted graph
heterogeneity = function(g, D, mode = "col") {
  A = as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
  if (mode == "col") {
    A = A %*% diag(1 / colSums(A))  # normalize columns
    dim = 2
  } else {
    A = diag(1 / rowSums(A)) %*% A  # normalize rows
    dim = 1
  }
  return(list(shannon = apply(A, dim, shannon),
              simpson = apply(A, dim, simpson),
              rao = apply(A, dim, rao, D)))
}
```
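A possible usage sketch (the graph, weights, and the 0/1 distance matrix are made-up toy inputs; the function definitions are repeated so the snippet runs standalone):

```r
library(igraph)

# the functions defined above, repeated for a self-contained snippet
shannon = function(p) { x = p * log2(p); -sum(replace(x, is.nan(x), 0)) }
simpson = function(p) 1 - sum(p * p)
rao = function(p, D) sum(diag(p) %*% D %*% diag(p))
heterogeneity = function(g, D, mode = "col") {
  A = as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
  if (mode == "col") {
    A = A %*% diag(1 / colSums(A)); dim = 2
  } else {
    A = diag(1 / rowSums(A)) %*% A; dim = 1
  }
  list(shannon = apply(A, dim, shannon),
       simpson = apply(A, dim, simpson),
       rao = apply(A, dim, rao, D))
}

# toy weighted graph and the 0/1 node distance matrix
g = graph_from_literal(a-b, a-c, b-c, c-d)
E(g)$weight = c(1, 2, 1, 3)
D = 1 - diag(vcount(g))

h = heterogeneity(g, D)
h$shannon   # one heterogeneity score per node
h$rao       # coincides with h$simpson, since D is the 0/1 distance
```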