Paper: FRC Elo 2002-Present

I’ve uploaded a new book; this is the Elo model I am currently planning to use going into the 2019 season (although that is subject to change). The only key change in this book compared to my previous books is the addition of 2002-2004 matches, which has a negligible impact on 2018+ Elos.

I advised caution when looking at 2005-2007 data due to poor data quality, so I need to caution twice as hard when looking at 2002-2004 data. Even for events in which the data quality is solid, the incentive structure in those years can potentially cause Elo to be misleading. The predictive power in these years is worse than in basically every other year on record; however, these years are still worth including in the model, if only to improve predictive power in the years that follow.

Interestingly, the key model parameters for 2002-2004 are essentially the same as the parameters for 3-team alliance years. I could have made them a bit different to improve predictive power a touch, but I have opted to keep them the same for simplicity.

I didn’t make any major model changes other than including 2002-2004, but it certainly wasn’t for a lack of trying. Here are the big changes I attempted but ended up opting not to include:

  • Varying kQuals and kPlayoffs for 2-team alliances in 2002-2004: I mentioned this above, but the predictive power increase was small enough that this change didn’t merit inclusion.
  • Using the log of match winning margins instead of the raw match winning margin (stolen from xkcd and initiated by conversations with Bob Croucher): This was to hopefully counteract the annoying effects of ridiculous penalty matches and/or red cards. Unfortunately, this didn’t improve predictive power in every year I applied it, although some years did see improvement. In a similar vein, I tried taking the log of the difference between the predicted match score and the actual match score; this had similarly disappointing results. (There’s a rough sketch of this transform after this list.)
  • Updating Elo values differently for the highest, second highest, and lowest Elo teams (stolen from Eugene Fang): The idea here is that each team doesn’t contribute equally to matches, so it might make more sense to, for example, weight the Elo change more for the “better” teams than the worse teams, or vice versa. Unfortunately, this had negligible impact on predictive power while also being inconsistent between years.
  • Using a weighted sum of Elos instead of just taking a raw sum when predicting match results (stolen from Kevin Leonard on discord): This is a similar idea to the one above, but it would be applied before the match instead of after the match. For example, I found that, in 2018, it would have been best to take 1.5*(best team’s Elo) + 1.0*(second best team’s Elo) + 0.5*(worst team’s Elo) when determining the alliance’s strength instead of just taking a raw sum (sketched after this list). This actually provided a very large predictive power increase when applied to 2018, but the optimal weights varied drastically across years. For example, 2017 had weights very close to 1.0, 1.0, and 1.0, and 2016 was close to 1.5, 1.0, and 1.0. Because of this, I had to leave this improvement out of the model.
  • Using a geometric mean instead of an arithmetic mean when predicting match results: This is tied very closely to the above change. The idea would be to have a weighting of sorts for teams of varying skills, since geometric means (depending on where you place the zero) will be either higher or lower than arithmetic means and will also factor in alliance composition. The results were very similar to those of the different weightings: the optimal placement for the zero varied drastically year-to-year, but it could potentially provide a drastic predictive power increase in years like 2018.
  • Setting some cap on the amount a team’s Elo rating can change between matches: This was another approach to dealing with the ridiculous penalty matches (also covered in the first sketch after this list). Unfortunately, the predictive power increase with this change was negligible. Perhaps waiting until after a team has played a few matches to implement the cap could be valuable, since teams’ ratings move around more early in the season? I’m not too hopeful though.
  • Using a variant of the fivethirtyeight Elo update formula: The model currently only looks at the winning margin when updating Elos after matches. This seems a bit counterintuitive since, for example, it will cause a team’s Elo to drop if they win a match by 190 points when they were expected to win by 210. I’ve tried this change in the past unsuccessfully, but I thought I’d take another crack at it since I think I’m at least a little smarter than I was two years ago. Nothing much came of it though (the general shape of this formula is sketched after this list). I could be misunderstanding how to implement it, but I manually tweaked all of the parameters and couldn’t get anything even slightly more predictive than my current model. I’m starting to believe that, in most matches, perhaps teams aren’t actually trying to win, but are rather trying to maximize the match winning margin. This makes intuitive sense to me: matches are so short, there is no perfect information about the game state, any team trying to strictly win has two partners they don’t have direct control over, and there are frequently alternative incentives in a match besides winning (showing off for alliance selection, going for other RPs, etc.). If this is true, then it would make more sense for a model to not incorporate who wins or loses, but instead to look only at the winning margin.
  • Adjusting the Elo distribution to be “standard” between years: I analyzed Elo distributions in different years here. Using the 2008-2018 average distribution, I was able to get a really solid predictive performance increase in most years in that range. Unfortunately, there were a couple years that saw their predictive power drop with this change. There’s certainly potential here though if I ever decide to update parameters each year. (A minimal version of this rescaling is sketched after this list.)
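To make the winning-margin transform and the per-match cap ideas above concrete, here is a rough sketch, assuming a simplified margin-based update where the rating change is proportional to (actual margin − predicted margin). The constants (K, MARGIN_SCALE, the cap value) are purely illustrative and are not my model’s actual parameters, and the margins here are assumed to already be normalized:

```python
import math

# Illustrative constants only -- not the actual parameters in my model.
K = 12.0              # how hard a single match moves a rating
MARGIN_SCALE = 0.004  # converts an Elo difference into a predicted (normalized) margin

def predicted_margin(red_elo_sum, blue_elo_sum):
    """Predicted winning margin (red minus blue), in normalized score units."""
    return MARGIN_SCALE * (red_elo_sum - blue_elo_sum)

def margin_update(actual_margin, pred_margin, use_log=False, cap=None):
    """Per-match rating change under a simplified margin-based update,
    with the two tweaks described above bolted on."""
    if use_log:
        # Compress blowouts (penalty-fest matches, red cards) before updating.
        actual_margin = math.copysign(math.log1p(abs(actual_margin)), actual_margin)
        pred_margin = math.copysign(math.log1p(abs(pred_margin)), pred_margin)
    delta = K * (actual_margin - pred_margin)
    if cap is not None:
        # Clamp how far one match can move a team's rating.
        delta = max(-cap, min(cap, delta))
    return delta
```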
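Similarly, here is roughly what the weighted-sum and geometric-mean alliance strengths look like. The 1.5/1.0/0.5 weights are the approximate 2018 best-fit values mentioned above; the 1200 “zero” in the geometric version is a made-up placeholder:

```python
def alliance_strength_weighted(elos, weights=(1.5, 1.0, 0.5)):
    """Weighted combination of an alliance's Elos, strongest team first.
    (1.5, 1.0, 0.5) is roughly the 2018 best fit; (1.0, 1.0, 1.0) recovers the raw sum."""
    ranked = sorted(elos, reverse=True)
    return sum(w * e for w, e in zip(weights, ranked))

def alliance_strength_geometric(elos, zero=1200.0):
    """Geometric mean of the Elos measured relative to a chosen 'zero'.
    Where the zero sits controls how much a weak partner drags the alliance
    down; the 1200 here is purely a placeholder."""
    shifted = [max(e - zero, 1.0) for e in elos]  # guard against non-positive values
    prod = 1.0
    for s in shifted:
        prod *= s
    geo_mean = prod ** (1.0 / len(shifted))
    # Rescale back to a sum-like strength so it is comparable to the raw-sum model.
    return len(elos) * (zero + geo_mean)
```

Either of these would stand in for the raw Elo sum when determining alliance strength; for example, alliance_strength_weighted([1750, 1600, 1450]) gives 1.5*1750 + 1.0*1600 + 0.5*1450 = 4950.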
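For reference, the fivethirtyeight-style update has roughly the following shape. The specific constants (k=20, 2.2, 0.001) follow the published NFL Elo write-up and are illustrative rather than anything tuned for FRC:

```python
import math

def expected_win_prob(alliance_elo_sum, opp_elo_sum):
    """Standard logistic Elo win probability."""
    return 1.0 / (1.0 + 10.0 ** ((opp_elo_sum - alliance_elo_sum) / 400.0))

def update_538_style(alliance_elo_sum, opp_elo_sum, margin, k=20.0):
    """Win/loss-based update with a margin-of-victory multiplier, following the
    general shape of fivethirtyeight's published NFL Elo formula."""
    result = 1.0 if margin > 0 else (0.0 if margin < 0 else 0.5)
    expected = expected_win_prob(alliance_elo_sum, opp_elo_sum)
    # Elo difference from the winner's perspective, used to damp blowouts by heavy favorites.
    elo_diff = alliance_elo_sum - opp_elo_sum
    winner_elo_diff = elo_diff if margin >= 0 else -elo_diff
    mov_mult = math.log(abs(margin) + 1) * (2.2 / (winner_elo_diff * 0.001 + 2.2))
    return k * mov_mult * (result - expected)
```

The key difference from my current update is the (result − expected) term: this version cares first about who won, with the margin only scaling the size of the change.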
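Finally, the cross-year distribution adjustment amounts to rescaling each year’s ratings toward a common reference distribution. A minimal version, assuming a simple mean/standard-deviation rescale (the reference values are placeholders, and a quantile-based mapping would be another option):

```python
def standardize_elos(elos, ref_mean=1500.0, ref_std=90.0):
    """Linearly rescale one year's Elo ratings onto a reference mean and spread."""
    mean = sum(elos) / len(elos)
    std = (sum((e - mean) ** 2 for e in elos) / len(elos)) ** 0.5
    return [ref_mean + (e - mean) * ref_std / std for e in elos]
```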

It’s a bit frustrating that I don’t really have anything concrete to show from all of these investigations other than the addition of 2002-2004 data. I think I’m getting close to maximizing the predictive power I can get using the strict restrictions I have placed on myself, which means it might be time to branch out and break some of those restrictions. Roughly, here are the key restrictions I may loosen in the future to improve my model:

  • Demanding that model parameters stay the same every year: With the exception of the standard deviations, all of my model parameters are the same in 2018 as they were in 2008 or even 2002. The huge advantage of this approach is that I have a model that will probably work pretty darn well in 2019 and beyond, even though we know basically nothing about those games at the moment. The drawback is that I can’t incorporate things that clearly improve predictive power in some years if they reduce predictive power in other years. If I loosened this restriction, I would probably restrict myself to measurements I could take using only week 1 data (like stdevs). Possible additions after loosening this restriction would be different team weights or geometric mean usage, as well as accounting for the Elo distribution change between years.
  • Restricting my analyses only to raw match scores and match type: For much of the history of FRC data, this has been basically the only type of data we have had available, so looking historically this is about as good as we’re going to get. However, the addition of twitter score breakdowns (~2009?) opened up the possibility of using other match data, and the detailed API score breakdowns in 2015 expanded on this data drastically. There’s a ton of potential to use these data sets to improve upon predictions. I’ve played around with incorporating these data into my Elo ratings in the past; maybe it’s time to officially add some things like this. Looking forward, I think the TBA live scoring breakdowns will also add a whole new dimension to the metrics we can use.
  • Demanding a team’s rating be restricted to a single number (my current Elo rating): There are lots of supplementary metrics that I have envisioned that could be added onto raw Elo ratings to provide better predictions in certain scenarios. For example, giving teams a playoff Elo boost to improve playoff predictions or rating the region a team hails from to make better champs predictions. Additions such as these could certainly improve predictive power, but would require that I track extra metrics instead of just the simple Elo rating.

I’m not strictly opposed to loosening any of the above restrictions, although I do think I might make two separate ratings if I do: one for the tighter restrictions and one for the looser restrictions. It’s important to me that my model can be quickly applied in a new year, which my current model does very well, and loosening these restrictions would make the transition to a new year more difficult, though certainly still manageable.