Interesting. I might have guessed 2010 would be higher, but 2015 and especially 2007 really surprise me. I had a related conversation on another channel, but I’ll copy the thoughts here.
Essentially, I’m seeing some serious shortcomings in using average Elo as a holistic measurement of a team’s season. I originally decided to use average Elo because that’s what 538 did and because I’ve recently been sick of seeing teams’ end-of-season Elos get artificially inflated by one late-season match. However, neither of these is an adequate justification on its own.
The biggest problem I’m seeing with average Elos is that they don’t do a particularly great job of isolating a single season’s performance, which is potentially why 1114 has so many years better than 2008. For the multi-year ranges, this is less of a problem, because it’s not that big of a deal if part of one year’s “rating” actually comes from the previous year, but for single-year strengths it’s a much larger issue. I did some rough calculations, assuming that each match changes a team’s Elo by about 5%, to break down how much impact 2012 and 2013 Elos would have on a hypothetical team’s 2014 average Elo. Here are the results (again, very crude):
Obviously, as teams play more matches, the dependency on the previous years’ Elos goes down, but not super quickly. In contrast, if we use end-of-season Elo, the breakdown looks more like this:
There’s much less dependence on past years, and that dependence drops off more quickly as teams play more matches.
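For anyone who wants to play with these numbers, here is a minimal Python sketch of this kind of rough calculation. It is only an illustration, not necessarily the exact calculation behind the breakdowns above. It assumes each match replaces about 5% of a team’s Elo with new information (so 95% carries over), that “average Elo” means the mean of the ratings after each match of the season, and that ratings are carried between seasons with no adjustment. The function names and the match counts in the loop are made up for the example.

```python
def end_of_season_carryover(n_matches, k=0.05):
    """Fraction of the end-of-season Elo that still comes from the
    rating the team carried into the season (assumes each match
    replaces a fraction k of the rating)."""
    return (1 - k) ** n_matches


def average_carryover(n_matches, k=0.05):
    """Fraction of the season-average Elo that comes from the carried-in
    rating. Assumption: the average is taken over the rating after each
    of the season's matches."""
    if n_matches < 1:
        raise ValueError("need at least one match")
    return sum((1 - k) ** i for i in range(1, n_matches + 1)) / n_matches


def season_shares(n_2014, n_2013, k=0.05, use_average=True):
    """Split a 2014 rating into shares attributable to 2014 matches,
    2013 matches, and everything before 2013.
    Assumption: no adjustment is applied to ratings between seasons."""
    if use_average:
        carry = average_carryover(n_2014, k)
    else:
        carry = end_of_season_carryover(n_2014, k)
    # Part of the rating entering 2014 was itself carried into 2013.
    kept_from_before_2013 = end_of_season_carryover(n_2013, k)
    return {
        "2014": 1 - carry,
        "2013": carry * (1 - kept_from_before_2013),
        "2012 and earlier": carry * kept_from_before_2013,
    }


if __name__ == "__main__":
    # Hypothetical match counts, purely for illustration.
    for n in (10, 20, 40):
        avg = season_shares(n, n, use_average=True)
        end = season_shares(n, n, use_average=False)
        print(n, "matches | average:", avg, "| end of season:", end)
```

Under these assumptions the carried-in share of the average is always at least as large as the carried-in share of the end-of-season rating, which is the same qualitative pattern described above.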
I need to do some proper investigation to see which is actually superior. The best approach will very likely be some blend of average, end-of-season, and max Elos, but I’d need to determine which weights to use. Another thing I want to look into is taking the average Elo only after the team has played X matches (I’ve seen 538 ignore the first 4 NFL matches, for example). This would help remove some of the dependence on the prior seasons’ performances.
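To make those two ideas concrete, here is a rough Python sketch of what they could look like. Nothing here is settled: the equal weights are placeholders (the right weights are exactly what still needs investigating), the function names are hypothetical, and `elos` is assumed to be a list of a team’s ratings after each match of one season, in order.

```python
def average_after(elos, skip=0):
    """Mean of the per-match Elo ratings, ignoring the first `skip` matches."""
    kept = elos[skip:]
    if not kept:
        raise ValueError("skip removes every match in the season")
    return sum(kept) / len(kept)


def blended_rating(elos, w_avg=1 / 3, w_end=1 / 3, w_max=1 / 3, skip=0):
    """Weighted blend of average, end-of-season, and max Elo.
    The default weights are placeholders, not fitted values."""
    return (
        w_avg * average_after(elos, skip)
        + w_end * elos[-1]
        + w_max * max(elos)
    )
```

Setting `skip` to a larger value drops the early matches where the rating is still mostly inherited from the prior season; setting one weight to 1 and the others to 0 recovers each of the plain measures.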