Thursday, March 25, 2010

Seeding March Madness

Each of the teams in the NCAA Men's Division I basketball tournament is seeded so that the best teams are likely to meet late in the tournament. In 1985 the number of teams was increased to 64, and a play-in game was added in 2001. There are 4 regions, and the strongest team in each region is given the 1 seed and the weakest the 16 seed. How well does the relative seeding of two teams predict the outcome of a game? It is well known that no 16 seed has ever defeated a 1 seed, but part of the attraction of the tournament is that the higher-ranked teams do not always win. I have examined the results from the 2007 through (most of) the 2010 tournaments to see how effective the seeding process is. The first figure below plots the winning margin for the favored team versus the ratio of the seeds (the underdog's seed divided by the favorite's). For example, since in 2008 Kansas, a 1 seed, beat Davidson, a 10 seed, by 2 points, there is a point with an x-value of 10.0 (10/1) and a y-value of +2 (positive because the favored team won). I have ignored the 4 games in which equally seeded teams played, which can only happen when the regional winners meet in the "Final Four". The plot shows that the favored team usually wins (about 75% of the time), and a linear trend line has a correlation coefficient of 0.30. But there is a wide range of winning margins -- typically about ±20 points, and the range is roughly independent of the ratio of the seeds. This is why there are many upsets.
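As a minimal sketch of how each plotted point could be computed, the function below turns one game into an (x, y) pair: the seed ratio and the favored team's signed margin. The function name and argument layout are my own illustration, not taken from the original analysis, and the scores 59 and 57 are used only because they differ by the 2 points quoted above.

```python
def seed_ratio_and_margin(seed_a, score_a, seed_b, score_b):
    """Return (seed ratio, favored team's margin) for one game.

    The favored team is the one with the numerically lower seed.
    Returns None when the seeds are equal, since those games are
    left out of the plot.
    """
    if seed_a == seed_b:
        return None
    if seed_a < seed_b:
        fav_seed, fav_score, dog_seed, dog_score = seed_a, score_a, seed_b, score_b
    else:
        fav_seed, fav_score, dog_seed, dog_score = seed_b, score_b, seed_a, score_a
    ratio = dog_seed / fav_seed      # x-value: always greater than 1
    margin = fav_score - dog_score   # y-value: negative means an upset
    return ratio, margin


# A 1 seed beating a 10 seed by 2 points (hypothetical scores 59 and 57)
print(seed_ratio_and_margin(1, 59, 10, 57))
```

The order in which the two teams are passed does not matter, and an upset shows up as a negative margin.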


The second figure shows the distribution of winning margins after correcting for the trend line. The standard deviation of the distribution is 11.0 points, and by construction the histogram is centered near 0. The standard deviation without correcting for seeding is 13.1 points. Once the trend line exceeds 11.0 points (at a ratio of about 5.7), the favored team is very likely to win. In fact, there is only one example of such an upset in the first figure (9-seed Northern Iowa defeating the No. 1 seed Kansas in 2010). There may be factors other than seeding that correlate with the winning margin, but I have not investigated them.
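The trend-line correction can be sketched as follows, assuming an ordinary least-squares fit of margin against seed ratio. The post does not say which fitting method or which standard-deviation convention was used, so the helper names, the use of the population standard deviation, and the sample numbers (pure placeholders, not tournament results) are all assumptions.

```python
from statistics import mean, pstdev


def fit_trend(ratios, margins):
    """Least-squares line through the points: margin = slope * ratio + intercept."""
    x_bar, y_bar = mean(ratios), mean(margins)
    sxx = sum((x - x_bar) ** 2 for x in ratios)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ratios, margins))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_std(ratios, margins, slope, intercept):
    """Spread of the margins after subtracting the trend line."""
    residuals = [y - (slope * x + intercept) for x, y in zip(ratios, margins)]
    return pstdev(residuals)


def ratio_where_trend_reaches(slope, intercept, target_margin):
    """Seed ratio at which the trend line equals target_margin."""
    return (target_margin - intercept) / slope


# Placeholder data for illustration only; NOT real game results.
ratios = [1.5, 2.0, 4.0, 8.0, 16.0]
margins = [-3, 5, 1, 12, 25]

slope, intercept = fit_trend(ratios, margins)
sigma = residual_std(ratios, margins, slope, intercept)
print(slope, intercept, sigma)
print(ratio_where_trend_reaches(slope, intercept, sigma))
```

With the real data set, the last line would correspond to the ratio at which the trend line rises above one standard deviation of the corrected margins.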