Great post, Brennon!
A few comments just for posterity:
. I’ve spent most of my time mentoring FTC rather than FRC. Elo is tough to do for FTC for a variety of reasons, including:
A. There is much less data available. There are lots of events and teams where the event results aren’t published.
B. Teams vary from year to year a lot more on average compared with FRC.
That’s why OPR-type stats are more commonly used for FTC than Elo.
. My Android app for FTC computes MMSE-OPR and match predictions on the fly. Teams I’ve mentored have used the match predictions mainly to see which matches are likely to be close, easy, or very difficult, and from that to decide whether a low-risk or high-risk strategy makes sense.
. Match predictions can also be a fun way to see which upcoming matches are likely to be the highest scoring of the event, or the most closely contested, etc. The app also sums up the match prediction win probabilities to give an estimate of a team’s final RP as the event progresses, which can be fun to track and sort. Some teams are highly ranked but have very hard matches later in the event, and the estimated RP (eRP) can show that as the event happens.
. The app also computes the component MMSE-OPRs, which again can be used for strategic planning (who’s best at auton? who’s best at end-game?).
. For the prior in the MMSE calculation, the app can either use event-specific OPR averages or import prior OPR estimates where they exist (e.g., late in a season at a regional championship, super-regional, or worlds).
. In limited experimentation, I found that using the most recent event’s OPR as an estimate for the MMSE prior wasn’t that helpful for FTC because the season is much longer and teams make substantial changes to their robots over a season. For example, one team may win an event in November with an OPR of 50 but come back with a whole new robot in March to compete at the state championship and get an OPR of >100 there.
. Scouting in FTC on average is much weaker and team sizes are much smaller, so there are more teams who might use OPR-type metrics, especially for alliance selections. It’s not unheard of to have teams of 2-4 people, making real-time scouting very difficult as the whole team might be competing in matches.
. In general, defensive stats are harder to compute because defensive benefits tend to be smaller. If a team can cause the opponents to score 20% fewer points through defense, then their “true” DPR-type value is likely to be around 20% of the opponents’ OPR. With smaller values, noise becomes more important, and filtering out the noise just requires more match results. And given that we have events with very few match results in general, it ends up being hard to find defensive stats that aren’t overwhelmed by the noise of the event. That said, defensive stats might be more viable if there were ever events with 10x the number of matches, or events where defense mattered as much as offense (e.g., if a good defensive robot could completely stop an opponent’s offense for the entire match).
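
To put rough numbers on the noise argument for defensive stats, here is a minimal sketch. It assumes per-match noise is independent and that the error of an averaged estimate shrinks as 1/sqrt(number of matches). That is a simplification, since OPR-type stats come from a least-squares fit rather than a plain average, and both function names are made up for illustration.

```python
import math


def signal_to_noise(effect: float, noise_sd: float, n_matches: int) -> float:
    """Effect size divided by the standard error after n_matches.

    Assumes independent per-match noise with standard deviation noise_sd,
    so the standard error of the average is noise_sd / sqrt(n_matches).
    """
    if n_matches <= 0 or noise_sd <= 0:
        raise ValueError("n_matches and noise_sd must be positive")
    return effect / (noise_sd / math.sqrt(n_matches))


def matches_needed_multiplier(effect_fraction: float) -> float:
    """How many times more matches are needed to estimate an effect that is
    effect_fraction the size of OPR with the same signal-to-noise ratio
    that OPR itself gets. Follows from the 1/sqrt(n) assumption above.
    """
    if not 0 < effect_fraction <= 1:
        raise ValueError("effect_fraction must be in (0, 1]")
    return 1.0 / effect_fraction ** 2


if __name__ == "__main__":
    # A defensive effect worth 20% of OPR needs about 25x the matches
    # to be as well resolved as OPR under these assumptions.
    print(round(matches_needed_multiplier(0.2), 2))
    # With 10x the matches, the signal-to-noise ratio improves by sqrt(10).
    print(signal_to_noise(10, 20, 50) / signal_to_noise(10, 20, 5))
```

Under these assumptions, 10x the matches improves the signal-to-noise ratio by a factor of about 3.2, which helps a lot but still leaves a 20%-sized defensive effect noisier than OPR is today.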
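
Going back to the estimated RP (eRP) idea mentioned earlier, here is a minimal sketch of how summing win probabilities can work. This is not the app's actual code. The normal-distribution model for turning a predicted score margin into a win probability is my assumption for illustration, as are all function names and the `rp_per_win` parameter. Ties are ignored.

```python
import math


def win_probability(predicted_margin: float, margin_sd: float) -> float:
    """Probability that an alliance wins, given its predicted score margin.

    Assumption (not from the app): the actual margin is normally distributed
    around the predicted margin with standard deviation margin_sd.
    """
    if margin_sd <= 0:
        raise ValueError("margin_sd must be positive")
    return 0.5 * (1.0 + math.erf(predicted_margin / (margin_sd * math.sqrt(2.0))))


def estimated_rp(current_rp: float, remaining_win_probs, rp_per_win: float = 1.0) -> float:
    """RP earned so far plus the expected RP from the remaining matches.

    The expected number of future wins is just the sum of the win
    probabilities. rp_per_win is a hypothetical scaling parameter.
    """
    probs = list(remaining_win_probs)
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("win probabilities must be between 0 and 1")
    return current_rp + rp_per_win * sum(probs)


def rank_by_erp(teams: dict) -> list:
    """Sort teams by eRP, highest first.

    teams maps a team name to (current_rp, list_of_remaining_win_probs).
    """
    scored = [(name, estimated_rp(rp, probs)) for name, (rp, probs) in teams.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up event state: team "A" leads now but has hard matches left.
    teams = {
        "A": (4, [0.25, 0.25]),
        "B": (3, [0.75, 0.75, 0.5]),
    }
    for name, erp in rank_by_erp(teams):
        print(name, erp)
```

In the made-up example, team A is ahead on current RP but drops below team B on eRP because of its harder remaining schedule, which is exactly the situation the eRP column is meant to expose.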