I’ve been looking for a way to measure team performance that’s independent of the individual season, so that you can plot it on a line graph and get a good idea of how much better or worse a team has done over the years. Below is a summary of my research. I’d be interested to hear any commentary, or what others use for this.
Too Long, Didn’t Read (TLDR)
Elo is good but has a couple minor problems. I tried fixing the problems but encountered bigger problems. The best I’ve found is a normalized version of CCWM. Most simpler systems have huge flaws.
OPR scaling differs from year to year. Win-loss record is too dependent on the individual schedule of each team, and can reward teams for narrowly missing out on qualifying for DCMP/CMP. District points/rank is also too dependent on the events you go to. We need something that can compare across regions and seasons.
Caleb Sykes’ adaptation of the Elo rating is great. It’s what I’ve been using for this purpose and I’ve gotten some pretty good insights out of it. Unfortunately, I have a couple issues with it. The algorithm relies on each team’s Elo score going into a match to determine their updated score. Within a season, this is great, but it has the consequence of requiring some starting value for each season. There are two possible solutions.
First, you could use something based on past seasons. This relies on the assumption that a team’s performance can be approximated by their performances in previous seasons. While this is often true, it means that each year’s data isn’t strictly generated by performances from that year. Two teams with equal performances in 2019 will end up with different scores depending on their 2018 and prior performances. It’s tolerable but not ideal. Also, rookie teams can be seriously over/undervalued. They are presumed to have an average score (~1500) for the purposes of early calculations, but this means it could take several years before they approach their proper score. If you were making a line graph of a far-from-average rookie team’s Elo scores, you would see some slope to it over several years, even if their actual performance remained consistent. Again, tolerable but not ideal. This is the solution Caleb Sykes uses, and for good reason.
Second, you could start everyone off at the same score each season. This would address both issues with the above solution, but it comes with two downsides. First, the reality of FRC is that the vast majority of teams won’t see a huge (>100 pt) difference in Elo score from year to year. By ignoring previous data, the algorithm simply has less to go on when sorting teams into the right place. Second, early matches would be devalued. Elo awards more points for bigger upsets. If all teams start at the same score, no early result counts as an upset, so the same upset would exchange fewer points at the beginning of the season than at the end, even if all teams perform exactly the same way. Overall, the downsides of this solution are too severe to bother with, even if it’s a bit more elegant.
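To put rough numbers on the "early upsets are devalued" point, here is a minimal sketch using the textbook Elo update rule. It is not Caleb Sykes’ FRC adaptation, and the K-factor of 32 and the ratings are made-up values chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo win probability for A against B (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def points_gained_for_win(winner: float, loser: float, k: float = 32.0) -> float:
    """Rating points the winner gains (and the loser gives up) for one win."""
    return k * (1.0 - expected_score(winner, loser))


# Start of season: everyone is reset to 1500, so no result counts as an upset.
early = points_gained_for_win(1500, 1500)

# End of season: the same two teams have separated to 1400 and 1600,
# and the lower-rated team wins again.
late = points_gained_for_win(1400, 1600)

print(f"early: {early:.1f}, late: {late:.1f}")
```

With these assumed numbers the equal-rating win moves 16 points, while the same result as a 200-point upset moves about 24.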
CMR, or Contribution to Match Result, is an algorithm that I developed to experiment with a more elegant alternative to Elo that addresses the other concerns. I’m not entirely happy with it, but developing it made the core problems more clear, so that’s neat. If you’re interested, the GitHub is here.
The core idea of CMR is to use a linear algebra calculation, similar to OPR, to calculate each team’s contribution to whether the match is a win, loss, or tie. There are two main differences between OPR and CMR. First, the CMR left-hand “match schedule” matrix includes both alliances, meaning the teams’ scores depend both on what they contribute to their alliance and what they take away from their opponents’ alliance. If this sounds familiar, bear with me until the next section. Second, the CMR right-hand “match result” matrix includes just a 1 for red win, 0 for tie, and -1 for blue win, rather than the specific match scores. This means that rather than calculating how many points you contribute to your alliance, CMR calculates how much you contribute to the actual match result.
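As a rough illustration of the setup described above (not the code from the GitHub repo), the calculation can be sketched as a least-squares solve with NumPy. The function name, team numbers, and match list are all made up for the example, and it uses 2v2 alliances purely to keep the matrices small.

```python
import numpy as np


def contribution_to_match_result(matches, teams):
    """Least-squares CMR-style rating from (red, blue, result) tuples.

    result is +1 for a red win, 0 for a tie, -1 for a blue win.
    """
    index = {team: i for i, team in enumerate(teams)}
    schedule = np.zeros((len(matches), len(teams)))
    results = np.zeros(len(matches))
    for row, (red, blue, result) in enumerate(matches):
        for team in red:
            schedule[row, index[team]] = 1.0   # red teams push toward +1
        for team in blue:
            schedule[row, index[team]] = -1.0  # blue teams push toward -1
        results[row] = result
    # lstsq returns the minimum-norm least-squares solution, so it also
    # copes with a rank-deficient schedule matrix.
    ratings, *_ = np.linalg.lstsq(schedule, results, rcond=None)
    return dict(zip(teams, ratings))


# Hypothetical schedule: team 1 is on the red alliance and wins every match.
teams = [1, 2, 3, 4]
matches = [
    ([1, 2], [3, 4], 1),
    ([1, 3], [2, 4], 1),
    ([1, 4], [2, 3], 1),
]
cmr = contribution_to_match_result(matches, teams)
print(cmr)
```

In this toy schedule team 1 comes out with the highest value and the other three tie, since the only thing the results reveal is that whichever alliance had team 1 won.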
There are a few advantages compared to Elo. In short, it works completely independently of both the region and season. First, each season’s CMR value depends only on matches played during that season. Second, scores are determined based on final results, rather than cumulative results. With CMR, each team’s value is determined simultaneously, evaluating their performance in the context of who they were playing with and against. This means that the data shouldn’t be distorted by teams that started off with an incorrect value. Third, regional differences in performance can be communicated through far fewer matches. With Elo, cross-region communication requires directly playing against teams that have played out-of-region. With CMR, this is adjusted for automatically, provided that a few teams have crossed regions and their performances were fairly consistent.
However, it does have a fatal flaw: a binary representation of the match performance just isn’t enough data to measure teams on. I was finding that each team needed to play almost ten times more matches to get a comparable deviation on a “Performance Measure vs. OPR” graph. Without the additional matches, you could end up with the third-highest-OPR team at CMP rated as contributing nothing to their alliance on average, only due to the schedule they got. It’s just not suitable for real FRC data. I tried changing the right-hand “match result” matrix to instead be the difference in scores over the sum of scores. Unfortunately, this made the data much worse, as that isn’t a linear function between actual performance and measured performance. Fortunately, it led me to the next performance measure.
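A quick way to see the non-linearity of the "difference over sum" result is to raise one alliance’s score by the same amount twice and compare the change in the measured value. The scores below are made up, and the sketch assumes at least one alliance scored (otherwise the sum is zero).

```python
def normalized_margin(red_score: float, blue_score: float) -> float:
    """Difference in scores over the sum of scores, from red's point of view."""
    return (red_score - blue_score) / (red_score + blue_score)


# Red improves by the same 10 points twice while blue stays at 40.
steps = [normalized_margin(red, 40) for red in (60, 70, 80)]
gains = [later - earlier for earlier, later in zip(steps, steps[1:])]

print(steps)
print(gains)
```

The second 10-point improvement moves the measured value less than the first, so equal gains in actual performance don’t show up as equal gains in the result being fitted.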
CCWM is great. It’s basically OPR but better. The gist of CCWM (calculated contribution to winning margin) is that it’s how many points you contribute to your alliance plus how many points you take away from your opponents. It’s probably the best method for analyzing a robot in a given year. However, it still has two main flaws.
First, it’s also non-linear, but in a slightly different way. In short, many FRC games use scoring systems where the same action isn’t worth the same number of points every time, meaning that teams that are just a little better can be rated a lot higher.
Second, it depends on the details of the game. It was a lot easier to contribute a ton of points in 2018 than it was in 2019. This is really easily solved though: just divide the CCWM by the median CCWM that season. This normalized CCWM compares across regions and seasons without the inaccuracy of CMR or the lack of elegance of Elo. From what I’ve seen, its only flaw is the non-linearity. As that’s an inherent problem with FRC scoring, I’m pretty happy to just use normalized CCWM and call it a day.
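The normalization step itself is only a few lines. Below is a minimal sketch, assuming the per-team CCWM values for each season have already been computed elsewhere. The function name, team numbers, and CCWM values are hypothetical. The guard against a non-positive median is an assumption of this sketch rather than part of the method as stated: dividing by a zero or negative median would blow up or flip the ordering, so it’s worth checking on real data.

```python
import statistics


def normalize_by_season_median(season_ccwm: dict[int, float]) -> dict[int, float]:
    """Divide each team's CCWM by the median CCWM of the same season."""
    median = statistics.median(season_ccwm.values())
    if median <= 0:
        # Assumption of this sketch: refuse rather than divide by a zero
        # or negative median, which would blow up or flip the ordering.
        raise ValueError(f"season median {median} is not positive")
    return {team: value / median for team, value in season_ccwm.items()}


# Made-up CCWM values for the same three hypothetical teams in two seasons.
ccwm_2018 = {9001: 30.0, 9002: 60.0, 9003: 90.0}
ccwm_2019 = {9001: 10.0, 9002: 20.0, 9003: 30.0}

norm_2018 = normalize_by_season_median(ccwm_2018)
norm_2019 = normalize_by_season_median(ccwm_2019)

print(norm_2018)
print(norm_2019)
```

In this made-up data the raw values are three times larger in 2018 than in 2019, but each team lands on the same normalized value in both seasons, which is the behavior you want for a multi-year line graph.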
I’ve found three core problems that these algorithms need to handle: comparing across regions, comparing across seasons, and getting good data. Here’s a table of each measurement system’s performance at each of those, rated 1-5. Normalized CCWM is the best I’ve seen. However, if anyone knows of a better system, I would love to see it.
| System | Comparing Across Regions | Comparing Across Seasons | Getting Good Data |
|---|---|---|---|
| Elo (using past data) | 4 | 4 | 4 |
| Elo (not using past data) | 4 | 5 | 3 |