In 1997, Kenneth Massey, then an undergraduate, created a method for ranking college football teams. We wrote about this method, which uses the mathematical theory of least squares, as his honors thesis.
The main idea of Massey's method is enclosed in the following equation:
Massey proposes to find the solution that minimizes:
We observe that the Massey's team ratings are in fact interdependent. Indeed, Massey's matrix can be decomposed as:
We now establish a connection between Massey's method and Katz's centrality. Observe that Massey's equation
Massey's rating has also an electrical interpretation in terms of resistor networks. If we view the symmetric matrix as the adjacency matrix of an undirected graph , then is the Laplacian matrix of the graph . The Massey's rating vector defined in system is then equivalent to the potential vector over a resistor network defined by with supply vector .
Following this metaphor, the resistor network has a resistor between nodes and as soon as teams and has matched with conductance (reciprocal of resistance) equal to the number of matches between and . Sources, that are nodes with positive supply through which current enters the network, are teams with positive point spread, while targets, that are nodes with negative supply through which current leaves the network, are teams with negative point spread. Notice that current entering and leaving the network must be equal, indeed the point spread vector sums to 0. The potential for node corresponds to the rating of team : teams with large potential are teams high in the ranking. Furthermore, the current flow through edge is, by Ohm's law, the quantity , which corresponds to the rating difference between teams and (which is also an estimate of the point spread in a match between and ) multiplied by the number of times they matched: current flows more intensely between teams of different strengths (as measured by Massey's method) that matched many times.
How do we solve the Massey's system ? Unfortunately, is a singular matrix, hence it is not possible to solve this system computing the inverse of . Assuming that is connected, a solution of the system can be obtained as follows: let be the matrix obtained by replacing any row, say the last, of with a vector of all 's, and let be the vector obtained by replacing the last element of with a 0. Then, solve the system:
Massey also proposed an offensive rating () and a defensive rating () characterizing the offensive and defensive strengths of teams. Massey assumes that , that is, the overall strength of a team is the sum of its offensive and defensive powers. Let's decompose the point spread vector , where holds the total number of points scored by each team and holds the total number of points scored against each team. Then, the equations defining and are:
Let and . It might happen that matrix is singular, hence it cannot be inverted. Assuming that is connected, it holds that is singular precisely when the graph of is bipartite. If is bipartite, we can apply a perturbation trick similar to the one exploited for the Massey's matrix . Let and be the two set of nodes of the bipartite graph . Let be a vector such that if and if . Let be the matrix obtained by replacing any row, say the last, of with vector and be the vector obtained by replacing the last element of with a 0. Then, a solution of system Massey's system for defensive ranking is obtained solving the following perturbed system:
The following user-defined function massey computes the various Massey's ratings described above:
## Massey
# ASSUMPTION: graph of A is connected
# INPUT
# matches: matches data frame
# teams: team names
# OUTPUT
# r: Massey rating (r = r1 + r2; r = o + d)
# r1: opponents rating
# r2: spread rating
# o: offensive rating
# d: defensive rating
# A: match matrix
# g: match graph
massey = function(matches, teams) {
# number of teams
n = length(teams)
# number of matches
m = dim(matches)[1]
# match matrix
X = matrix(0, nrow=m, ncol=n, byrow=TRUE)
# margin of victory vector
y = rep(0, m)
# points for vector
f = rep(0, n)
# points against vector
a = rep(0, n)
# populate match matrix
for (i in 1:m) {
if (matches[i,"score1"] >= matches[i,"score2"]) {
X[i, matches[i,"team1"]] = 1;
X[i, matches[i,"team2"]] = -1;
y[i] = matches[i,"score1"] - matches[i,"score2"]
} else {
X[i, matches[i,"team2"]] = 1;
X[i, matches[i,"team1"]] = -1;
y[i] = matches[i,"score2"] - matches[i,"score1"]
}
f[matches[i,"team1"]] = f[matches[i,"team1"]] + matches[i,"score1"]
f[matches[i,"team2"]] = f[matches[i,"team2"]] + matches[i,"score2"]
a[matches[i,"team1"]] = a[matches[i,"team1"]] + matches[i,"score2"]
a[matches[i,"team2"]] = a[matches[i,"team2"]] + matches[i,"score1"]
}
# compute Massey matrix
M = t(X) %*% X
# compute team spread vector (or p = f-a)
p = t(X) %*% y
# check assumptions
D = diag(diag(M), n, n)
A = D - M
g = graph_from_adjacency_matrix(A, mode="undirected", weighted=TRUE)
if (!is.connected(g)) {
cat("The graph is not connected");
return(NULL)
}
# perturbate the system
M1 = M
p1 = p
M1[n, ] = rep(1, n)
p1[n,1] = 0
# compute ranking r
r = solve(M1, p1)
# compute partial rankings r1 and r2 (r = r1 + r2)
D1 = solve(D)
r1 = D1 %*% A %*% r
r2 = D1 %*% p
# compute offensive and defensive rankings (r = o + d)
N = D + A
q = (D %*% r) - f
# check if graph of A is bipartite (we know that the graph of A is connected)
bipa = bipartite_mapping(graph_from_adjacency_matrix(A, mode="undirected"))
if (bipa$res == TRUE) {
# if bipartite, perturbate the system
v = bipa$type
v[v == TRUE] = 1
v[v == FALSE] = -1
N1 = N
q1 = q
N1[n, ] = v
q1[n,1] = 0
d = solve(N1, q1)
} else {
d = solve(N, q)
}
o = r - d
r = as.vector(r)
r1 = as.vector(r1)
r2 = as.vector(r2)
o = as.vector(o)
d = as.vector(d)
names(r) = teams
names(r1) = teams
names(r2) = teams
names(o) = teams
names(d) = teams
V(g)$teams = teams
return(list(r=r, r1=r1, r2=r2, o=o, d=d, A = A, g = g))
}
The table below shows the results of 4 matches (numbered 1, 2, 3, 4) involving 4 fictitious teams (labelled A, B, C, D):