I’m planning to add some normalization between years to my Elo ratings soon. Presently, I find start of season Elos by taking 70% of the previous season’s end of season Elo plus 30% of the end of season Elo from 2 seasons ago. I then revert this sum toward 1550 by 20%. My concern with this method is that I don’t think it’s fair to directly sum Elos from different seasons, since the Elo distributions vary so greatly year to year based on the game. If we had the same game every year, this wouldn’t be a problem.
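As a rough illustration of the start of season calculation described above, here is a minimal sketch. The function and parameter names are my own placeholders, and I’m assuming “revert toward 1550 by 20%” means moving the blended value 20% of the way to 1550.

```python
def start_of_season_elo(elo_prev, elo_two_ago, mean_target=1550.0, reversion=0.20):
    """Blend the last two end of season Elos, then revert toward mean_target.

    elo_prev:    end of season Elo from the previous season
    elo_two_ago: end of season Elo from two seasons ago
    The names and default arguments are illustrative assumptions.
    """
    # 70% weight on last season, 30% on the season before that.
    blended = 0.70 * elo_prev + 0.30 * elo_two_ago
    # Move the blended value 20% of the way toward the target mean.
    return blended + reversion * (mean_target - blended)
```

For example, a team that finished at 1800 last season and 1700 the season before would blend to 1770 and then revert to roughly 1726.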

To start, I measured the average, stdev, skew, and kurtosis for the end of season Elo distributions in each year. The results are shown in this table:

The average hovers right around 1500 each year, but this is due to how I designed my Elo ratings, and doesn’t actually tell us much. Some actual measure of “true skill” would probably have higher averages in recent years, since most would agree the average robot in 2018 is much better than the average robot was in 2002.
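For anyone who wants to compute the same four summary statistics on their own data, here is a minimal sketch. It assumes population (divide by n) moments and reports excess kurtosis, where a normal distribution scores 0; the exact conventions behind my table may differ, and the function name is just a placeholder.

```python
import math

def distribution_summary(elos):
    """Return mean, stdev, skew, and excess kurtosis for a list of Elos."""
    n = len(elos)
    mean = sum(elos) / n
    # Central moments, population form (dividing by n).
    m2 = sum((x - mean) ** 2 for x in elos) / n
    m3 = sum((x - mean) ** 3 for x in elos) / n
    m4 = sum((x - mean) ** 4 for x in elos) / n
    stdev = math.sqrt(m2)
    return {
        "mean": mean,
        "stdev": stdev,
        "skew": m3 / stdev ** 3,         # positive means a longer high-side tail
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (normal = 0)
    }
```

Running this once per season on that year’s end of season Elos gives one row of statistics per year. These conventions should line up with the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis` if you would rather use a library.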

Stdevs move around each year, likely due to the game structure. 2018 had the highest stdev on record by a pretty solid margin. I have previously speculated that this could be due to the snowballing scoring structure of Power Up.

The skewness is interesting. For those of you unfamiliar with skewness, a positive skewness indicates a larger positive “tail” on the distribution than the negative “tail”. Every year on record has had a positive skew, which indicates that there are always more “outlier” good teams than “outlier” bad teams. Some years have had much higher skews than others, though. For example, 2015 had an incredibly positive skew, which means there were a large number of very dominant teams. 2017, in contrast, had one of the smallest skews on record. This is probably due to the severely limited scoring opportunities for the strong teams after the climb and 3 rotors, as well as the fact that the teams that lacked climbing ability were a severe hindrance to their alliances. The difference in skews between 2015 and 2017 can be seen in histograms of their Elo distributions. Notice how much longer the 2015 positive tail is than the 2017 one.

I also threw in kurtosis, which is a rough measure of how “outlier-y” or “taily” a distribution is. Kurtosis tracks very closely with skew every year. This means that the “outlier” teams driving the high kurtosis in some years are “good team” outliers and not “bad team” outliers. A high kurtosis with low skew would indicate that there are lots of both good team and bad team outliers. Plots of stdev vs skew and skew vs kurtosis can be seen below.

Next, I’ll be trying to normalize end of season Elos so that I can get better start of season Elos. We’ve now had two years in a row of games that have low skew/kurtosis, which means that without adjustment the 2019 start of season Elos will also have low skew/kurtosis even though the 2019 game likely will not. It’ll all come down to predictive power, though. If I can get enough of a predictive power increase I’ll add it in, otherwise I won’t.
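I haven’t settled on a normalization method yet, but to make the idea concrete, here is a sketch of one simple option: linearly rescaling each season’s end of season Elos to a shared mean and stdev before blending them. This is only an assumption about what normalization could look like, and the function name and the `target_mean` and `target_stdev` values are hypothetical.

```python
import statistics

def rescale_season(elos, target_mean=1500.0, target_stdev=100.0):
    """Linearly rescale one season's Elos to a common mean and stdev.

    target_mean and target_stdev are placeholder values chosen for
    illustration, not values taken from the actual ratings.
    """
    mean = statistics.fmean(elos)
    stdev = statistics.pstdev(elos)  # population standard deviation
    # Convert each Elo to a z-score, then map it onto the shared scale.
    return [target_mean + target_stdev * (x - mean) / stdev for x in elos]
```

A linear rescale like this leaves skew and kurtosis unchanged, so on its own it would only address the year to year differences in stdev. Matching the shape of the tails would need something beyond this.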