Cursory Mathematical Analysis of MYHockey Rankings Algorithm (Part 2)

6 minute read

Published:

After discussing the MYHockey algorithm with my uncle, I learned that one of the assumptions in my previous analysis is wrong. In particular, the rating associated with the strength of a team’s opponents is not computed from static values from a previous time, but they are updated with every game result.

As before, $r$ is the vector of ratings for the different teams and $\Delta$ is the vector whose $i$’th element is the number of goals scored by team $i$ minus it’s number of goals yielded. $S$ is the matrix where the element $S_{i,j}$ represents how many times team $j$ played team $i$. Lastly, $n$ is the vector whose $i$’th element is the number of games played by team $i$ (Note that $n=S1$). I now believe that ratings are computed as the solution to the system of equations of the form:

\[\begin{align} r&=diag(n)^{-1}\left(\Delta + Sr\right) \end{align}\]

In words, the rating of a team is given by the sum of the average goal differential and the average rating of their opponents. This system can be rewritten as:

\[\begin{align*} \left( diag(n) - S \right)r &= \Delta \end{align*}\]

If the game schedule is interpreted as a weighted graph with edge weights equal to the number of games played between a pair of teams, the matrix on the lefthand side of the above equation is the Laplacian of this graph. Thus, we define $L\triangleq diag(n)-S$ giving:

\[\begin{align} Lr=\Delta \end{align}\]

Heat equation analogy

The Laplacian matrix of a graph can be interpreted as an finite-difference approximation of the negative continuous Laplacian operator [Smola and Condor, 2003]. So, we can draw an analogy between the MYHockey rating system, and the heat equation. The heat equation takes the form:

\[\begin{align*} Lu=\frac{\partial u}{\partial t} \end{align*}\]

where $u(x,t)$ is a temperature function across space and time, and $L$ is the spatial Laplacian operator. This equation says that the rate of change of temperature at a given location is equal to the spatial Laplacian of temperature at that point. The Laplacian, informally, measures how much a function at a point deviates from its average value in the surrounding neighborhood. So, this equation makes intuitive sense - if the temperature at a point is higher than its surrounding, it will cool over time.

In our ratings system, the temperature function is analagous to the ratings, and the time derivative is analagous to the goal differential. The hockey ratings are the temperatures under which changes in temperature are given by the negative goal differentials (emphasis on the negative). So, if a team is hot (has a high rating) relative to its neighbors (opponents) we expect that it will be cooling (high goal differential). If a team is cold (low rating) relative to its neighbors, we expect it will be heating (negative goal differential).

Existence of solution

The ratings are computed as the solution to equation 2. However, how do we know a solution exists? In fact, the matrix $L$ is not invertible since it has a zero eigenvalue associated with the $1$ vector. Fortunately, results from matrix analysis [Spielman, 2009] tell us that:

  • Eigenvalues of $L$ are real and non-negative
  • If the graph is connected, the second eigenvalue is positive

Thus, the nullspace of $L$ is equal to the span of the $1$ vector. Further, since $L$ is real an symmetric, it can be diagonalized into:

\[\begin{align*} L=\begin{bmatrix} U' | 1 \end{bmatrix}\begin{bmatrix} D | 0 \\ 0, 0 \end{bmatrix}\begin{bmatrix} U'^T \\ 1 \end{bmatrix} \end{align*}\]

It is clear from this representation that the range of $L$ is $1^\perp$, i.e. the linear subspace perpendicular to the $1$ vector. Fornately, since goal differentials must sum to zero across all teams, we have $\Delta \cdot 1=0 \implies \Delta \in 1^\perp$. Thus, we can use, for example, the pseudoinverse of $L$ to find a ratings vector that satisfies $Lr=\Delta$. This rating is unique up to a translation by $1$.

In summary: If the game schedule graph is connected (i.e. there is a “path” of games that connects each team to every other team), then there exists a rating vector such that $Lr=\Delta$. There are infinitely many solutions, but they are all related to each other by a constant translation of all the ratings.

Ratings do not change when goal differences match rating differences

Proof

Consider, without loss of generality, the case where a single game is played between team $i$ and team $j$ and the game differential and Laplacian matrix for this game only are represented by $\delta \Delta$ and $\delta L$ respectively.

\(\begin{align} L r_{old} &= \Delta_{old} \\ \left( L + \delta L \right)r_{new} &= \Delta_{old}+\delta\Delta \\ \text{Subtract (3) from (4)} \nonumber \\ L(r_{new}-r_{old})+\delta Lr_{new}&= \delta \Delta \end{align}\)=

What happens if team $i$ exceeds its expected goal differential? Define the margin $m=\delta \Delta - \delta L r_{old}$, and substituting into equation 5, we get:

\[\begin{align*} L(r_{new}-r_{old})+\delta Lr_{new}&= m+\delta L r_{old} \\ (L+\delta L)(r_{new}-r_{old})&= m \end{align*}\]

By the same reasoning as in the earlier section, we can solve for $r_{new}-r_{old}$. It is clear that if $m=0$, i.e. the difference between goal differential and rating differential is zero, then ratings are unchanged. However it is less clear (to me) what happens if $m \neq 0$. It would be nice to prove that if a team exceeds its rating margin, its rating will increase.

$\blacksquare$

Convergence of system to true ability vector

Say that the central hypothesis of this system is true, that a team’s ability can be summarized by a single number, scaled so that differences in ratings are the expected difference in goals. Say this “true” vector of abilities is $r^\ast$. If, for every pair of teams that plays each other, their number of games goes to infinity, then the rating will converge in probability to $r^\ast$.

Proof

$\Delta$ can be rewritten as a weighted sum of average goal differentials between pairs of teams. The law of large numbers implies that these averages converge to their expected value in probability, and therefore the rating vector can recover $r^\ast$, up to a translation by $1$ (constant shift of all true ratings).

$\blacksquare$