Spectral maximization

This method maximizes the modularity of a partition by exploiting the spectral properties of the modularity matrix; in igraph it is implemented by the function cluster_leading_eigen. Recall that modularity is defined as:

$$Q = \frac{1}{2m}\sum_{i,j}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j) = \frac{1}{2m}\sum_{i,j} B_{ij}\,\delta(c_i, c_j)$$

Notice that the modularity matrix B is symmetric and all rows and all columns of B sum to 0. Indeed:

$$\sum_j B_{ij} = \sum_j \left(A_{ij} - \frac{k_i k_j}{2m}\right) = \sum_j A_{ij} - \frac{k_i}{2m}\sum_j k_j = k_i - \frac{k_i}{2m}\,2m = 0$$
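This zero row-sum property is easy to verify numerically. A minimal sketch with NumPy (the graph here, two triangles joined by a single edge, is just an illustrative example):

```python
import numpy as np

# Adjacency matrix of a small undirected graph:
# two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
k = A.sum(axis=1)                   # degree of each node
m = A.sum() / 2                     # number of edges
B = A - np.outer(k, k) / (2 * m)    # modularity matrix

print(np.allclose(B, B.T))            # True: B is symmetric
print(np.allclose(B.sum(axis=1), 0))  # True: every row sums to zero
print(np.allclose(B.sum(axis=0), 0))  # True: every column sums to zero
```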

Let us consider the division of a network into just two parts, namely group 1 and group 2. The more general case can be treated with repeated bisection, as described below. We represent the assignment of node $i$ to a group with the variable $s_i$, such that $s_i = +1$ if $i$ belongs to group 1 and $s_i = -1$ if $i$ belongs to group 2. It follows that:

$$\delta(c_i, c_j) = \frac{1}{2}\left(s_i s_j + 1\right)$$
Substituting this expression in the modularity formula we have:
$$Q = \frac{1}{4m}\sum_{i,j} B_{ij}\left(s_i s_j + 1\right) = \frac{1}{4m}\sum_{i,j} B_{ij}\, s_i s_j$$
where we have used the fact that $\sum_{i,j} B_{ij} = 0$. In matrix form we have:
$$Q = \frac{1}{4m}\,\mathbf{s}^T B\,\mathbf{s}$$
We wish to find the division of a given network, that is the value of $\mathbf{s}$, that maximizes the modularity $Q$. The elements of $\mathbf{s}$ are constrained to take the integer values $+1$ or $-1$, so that the vector always points to one of the corners of an $n$-dimensional hypercube.
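As a sanity check, the matrix form agrees with the original $\delta$-based definition on any concrete graph. A small sketch (the example graph, two triangles joined by an edge, is an arbitrary choice):

```python
import numpy as np

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
k = A.sum(axis=1)
m = A.sum() / 2
B = A - np.outer(k, k) / (2 * m)

# Put the two triangles into different groups
s = np.array([1, 1, 1, -1, -1, -1])

# Modularity from the delta-based definition: sum B_ij over pairs in the same group
same_group = (s[:, None] == s[None, :])
Q_delta = (B * same_group).sum() / (2 * m)

# Modularity from the matrix form Q = s^T B s / 4m
Q_matrix = s @ B @ s / (4 * m)

print(Q_delta, Q_matrix)  # both equal 5/14 ≈ 0.357 for this split
```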

Unfortunately, this optimization problem is a hard one, but it can be tackled approximately by a relaxation method. We relax the constraint that $\mathbf{s}$ must point to a corner of the hypercube and allow it to point in any direction, keeping only its length fixed: its elements may take any real values subject to the constraint that:

$$\mathbf{s}^T\mathbf{s} = \sum_i s_i^2 = n$$

We maximize this expression for the modularity by differentiating, imposing the constraint with a single Lagrange multiplier $\beta$:

$$\frac{\partial}{\partial s_i}\left[\sum_{j,k} B_{jk}\, s_j s_k + \beta\left(n - \sum_j s_j^2\right)\right] = 0$$
The derivative is non-zero only for the terms with either $j = i$ or $k = i$, that is:
$$\sum_k B_{ik}\, s_k + \sum_j B_{ji}\, s_j - 2\beta s_i = 2\sum_j B_{ij}\, s_j - 2\beta s_i = 0$$
which is:
$$\sum_j B_{ij}\, s_j = \beta s_i$$
or in matrix notation:
$$B\mathbf{s} = \beta\mathbf{s}$$
Hence the solution $\mathbf{s}$ is an eigenvector of the modularity matrix. Therefore the modularity is given by:
$$Q = \frac{1}{4m}\,\mathbf{s}^T B\,\mathbf{s} = \frac{1}{4m}\,\beta\,\mathbf{s}^T\mathbf{s} = \frac{n}{4m}\,\beta$$
which leads us to the conclusion that for maximum modularity we have to choose $\mathbf{s}$ as the eigenvector $\mathbf{u}$ corresponding to the largest (most positive) eigenvalue of the modularity matrix.

In fact we typically cannot choose $\mathbf{s} = \mathbf{u}$, since the elements of $\mathbf{s}$ are subject to the constraint $s_i \in \{+1, -1\}$. But we do the best we can and choose $\mathbf{s}$ as close to $\mathbf{u}$ as possible: $s_i = +1$ if $u_i > 0$ and $s_i = -1$ if $u_i < 0$ (either value is equally good if $u_i = 0$).

And so we are led to the following very simple algorithm. We calculate the eigenvector of the modularity matrix corresponding to the largest (most positive) eigenvalue and then assign vertices to communities according to the signs of the vector elements: positive signs in one group, negative signs in the other. If all elements of the eigenvector have the same sign, the network has no underlying community structure.
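The whole bisection step can be sketched in a few lines of NumPy. This is a dense-matrix illustration, not igraph's actual implementation; the example graph (two triangles joined by an edge) is an arbitrary choice:

```python
import numpy as np

def leading_eigen_bisect(A):
    """Split a network in two using the leading eigenvector of the
    modularity matrix; returns a vector of +1/-1 group assignments
    (all +1 if the network appears indivisible)."""
    k = A.sum(axis=1)
    m = A.sum() / 2
    B = A - np.outer(k, k) / (2 * m)
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order; B is symmetric
    if eigvals[-1] <= 1e-12:               # no positive eigenvalue: leave undivided
        return np.ones(len(A), dtype=int)
    u = eigvecs[:, -1]                     # eigenvector of the most positive eigenvalue
    return np.where(u >= 0, 1, -1)

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
s = leading_eigen_bisect(A)
print(s)  # the two triangles end up in opposite groups
```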

What about the division into more than two communities? Instead of maximizing modularity over divisions of a network into two groups, we could maximize it over divisions into any number of groups. Modularity is supposed to be largest for the best division of the network, no matter how many groups that division possesses. A simpler approach is repeated bisection of a network. We start by dividing the network first into two parts and then we further subdivide those parts into smaller ones, and so on. Given that our goal is to maximize the modularity for the entire network, we should only go on subdividing groups so long as doing so results in an increase in the overall modularity. If we are unable to find any division of a community that results in a positive change in the modularity, then we should simply leave that community undivided. When we have subdivided our network to the point where all communities are in this indivisible state, the algorithm is finished and we stop. This repeated bisection method works well in many situations, but it is by no means perfect. A particular problem is that there is no guarantee that the best division of a network into, say, three parts can be found by first finding the best division into two parts and then subdividing one of the two.
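When subdividing a group $g$, the change in overall modularity can be written as $\Delta Q = \frac{1}{4m}\,\mathbf{s}^T B^{(g)}\mathbf{s}$, where $B^{(g)}$ is the generalized modularity matrix with entries $B^{(g)}_{ij} = B_{ij} - \delta_{ij}\sum_{k \in g} B_{ik}$; by construction $\Delta Q$ is exactly zero when the group is left whole. A sketch of this stopping criterion (same illustrative two-triangle graph as above):

```python
import numpy as np

def delta_Q(A, group, s):
    """Modularity change from splitting `group` (a list of node indices)
    according to the +/-1 vector `s` (one entry per node of the group),
    via the generalized matrix B^(g)_ij = B_ij - delta_ij * sum_{k in g} B_ik."""
    k = A.sum(axis=1)
    m = A.sum() / 2
    B = A - np.outer(k, k) / (2 * m)
    Bg = B[np.ix_(group, group)].copy()
    Bg -= np.diag(Bg.sum(axis=1))   # subtract the group row sums on the diagonal
    return s @ Bg @ s / (4 * m)

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
# Separating the two triangles increases modularity...
dq_split = delta_Q(A, [0, 1, 2, 3, 4, 5], np.array([1, 1, 1, -1, -1, -1]))
# ...leaving the group whole changes nothing...
dq_whole = delta_Q(A, [0, 1, 2, 3, 4, 5], np.ones(6))
# ...and cutting a single triangle apart would decrease modularity.
dq_bad = delta_Q(A, [0, 1, 2], np.array([1, 1, -1]))
print(dq_split, dq_whole, dq_bad)  # ≈ 0.357, 0.0, negative
```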

One potential problem with the algorithm is that the matrix $B$ is not sparse, and indeed usually has all elements non-zero. Finding the leading eigenvector of a matrix takes time $O(mn)$, which is equivalent to $O(n^3)$ for a dense matrix, as opposed to $O(n^2)$ for a sparse one.
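In practice $B$ never needs to be formed explicitly: since $B = A - \mathbf{k}\mathbf{k}^T/2m$, the product $B\mathbf{x}$ can be computed from the sparse adjacency matrix alone as $A\mathbf{x} - \mathbf{k}(\mathbf{k}^T\mathbf{x})/2m$, at cost $O(m + n)$ per multiplication. A sketch of power iteration built on this identity (the shift constant and iteration count are illustrative choices, not igraph's implementation):

```python
import numpy as np

def leading_eigenvector(A, iters=500, seed=0):
    """Power iteration for the leading eigenvector of B = A - k k^T / 2m.
    B @ x is computed as A @ x - k * (k @ x) / 2m, so B is never materialized;
    with a sparse A each step would cost O(m + n) rather than O(n^2)."""
    k = A.sum(axis=1).astype(float)
    m = A.sum() / 2
    # Shift by c*I so that all eigenvalues become non-negative and power
    # iteration converges to the most positive eigenvalue of B.
    # By Gershgorin, |lambda| <= max_i sum_j |B_ij| <= 2 * k_max.
    c = 2 * k.max()
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(len(A))
    for _ in range(iters):
        x = A @ x - k * (k @ x) / (2 * m) + c * x   # (B + c*I) @ x
        x /= np.linalg.norm(x)
    return x

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
u = leading_eigenvector(A)
s = np.where(u >= 0, 1, -1)
print(s)  # the signs separate the two triangles
```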