Elo is better than OPR in every way

…at least for predicting match outcomes. Full disclosure: I was not expecting these results – and no, @Caleb_Sykes did not sponsor this. :slight_smile:

Full post is available here. You’ll especially enjoy it if you want to know more about MMSE OPR and if you like matrix calculus. The previous CD topic for my post on Elo match prediction methods is available here.

Why do you think Elo methods are better than MMSE OPR methods for match prediction? I have some hunches, but I’d like to hear your thoughts. What other models would you like to see analyzed for match predictions?


It’s pretty simple: Elo uses data from multiple seasons, OPR does not. OPR has a much smaller data set to work from.
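For readers unfamiliar with the mechanics: the core Elo machinery is tiny, and ratings persist across events and seasons, which is exactly the carryover advantage described above. A minimal sketch (the constants here are illustrative, not the tuned values from the blog post):

```python
def elo_win_prob(r_red, r_blue, scale=400.0):
    # Logistic expected score for the red alliance given the rating gap.
    return 1.0 / (1.0 + 10.0 ** ((r_blue - r_red) / scale))

def elo_update(r_red, r_blue, red_won, k=32.0):
    # Shift both ratings toward the observed result. Because the updated
    # ratings carry forward to the next match (and the next season),
    # Elo accumulates far more history than a single-event OPR solve.
    expected = elo_win_prob(r_red, r_blue)
    actual = 1.0 if red_won else 0.0
    delta = k * (actual - expected)
    return r_red + delta, r_blue - delta
```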

A rating derived from teams' OPR ranks at their previous events and years could plausibly outperform both, in games where OPR makes some amount of sense.


I disagree. OPR is pretty useful for determining who thinks that “linear algebra go brrr” is a replacement for scouting.


As much as I love math, I'm not really sure why you'd use either Elo or OPR if you have scouting data. What are you really using the prediction estimates for? I can only foresee a couple of use cases, and they're only really useful if you don't want to put in the legwork. For example, if you wanted to know how your early match schedule is going to pan out in terms of areas of worry, you could have done some pre-event scouting.

Other than an actual use at a competition, sure it might be interesting.

But isn't that the idea? Elo is what you HAVE done. OPR is what you ARE doing in this competition.


I actually don't think it's that simple. I found that Elo outperformed OPR in 2010 on every metric (2010 is when I initialized Elo values, so there was no multi-year advantage).

Relevant 2010 stats:

| Method | Accuracy | Precision | Recall | MSE    |
|--------|----------|-----------|--------|--------|
| OPR    | 55.6%    | 60.9%     | 58.2%  | 0.2554 |
| Elo    | 63.5%    | 61.3%     | 66.0%  | 0.2295 |

Is that maybe because Elo implicitly includes defensive prowess but OPR, by definition, does not?

From personal experience, I don’t think many teams use their scouting data to predict match outcomes via mathematical models. And even if they did, most teams track different metrics and don’t release their data – making it hard to judge the performance of those models.

I'll add in the caveat of "if you have accurate, trustworthy, and detailed scouting data". I'll be blunt for a moment and say that most teams that have scouting data do not have scouting data which matches the descriptors I laid out above. As such, OPR can be a very valuable addition to their toolbox. Even for teams with great scouting data, OPR can provide a valuable sense check and forces you to ask yourself some smart questions: "Wait, why is there a huge discrepancy between this team's OPR and our scouted data?"


CCWM (which is OPR and DPR combined) has weaker predictive power than just OPR (in my setup, and Caleb found it as well).

The accuracy numbers I got for OPR from quals predicting elims at a single event (the most common way teams actually use OPR) were pretty close to Caleb's Elo (in 2017, and game-dependent, so things definitely could have changed). OP has taken a much more rigorous approach than I did, but I feel like it's also measuring something that isn't the primary way teams use OPR (most teams would be looking at OPR before alliance selection to decide who to pick for playoffs).
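For context on what's actually being solved here: vanilla OPR is just a least-squares fit of per-team contributions to alliance scores. A minimal sketch with made-up alliances and scores (team numbers are illustrative only):

```python
import numpy as np

# Hypothetical alliance results: (teams on the alliance, alliance score).
alliances = [
    (["254", "1678", "118"], 120.0),
    (["2056", "148", "971"], 95.0),
    (["254", "2056", "971"], 110.0),
    (["1678", "148", "118"], 100.0),
    (["254", "148", "118"], 108.0),
    (["1678", "2056", "971"], 102.0),
]
teams = sorted({t for members, _ in alliances for t in members})
idx = {t: i for i, t in enumerate(teams)}

# Design matrix: one row per alliance, a 1 in each member team's column.
A = np.zeros((len(alliances), len(teams)))
b = np.array([score for _, score in alliances])
for row, (members, _) in enumerate(alliances):
    for t in members:
        A[row, idx[t]] = 1.0

# OPR is the least-squares solution of A x = b.
# DPR uses opposing-alliance scores as b; CCWM uses winning margins.
opr, *_ = np.linalg.lstsq(A, b, rcond=None)
```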


Are (any/many) teams attempting to predict match outcomes at all? If so, to what end?

If by predict match outcomes you mean predict abilities of the alliance(s) as a whole, I think a lot of teams use their scouting data to do that and influence the match outcome.


It can be useful to know what to prioritize in a match. If a match prediction looks awfully close, we might choose to switch to a higher point-yield strategy rather than a higher-RP strategy (a good example of this was choosing to go for the cargo ship instead of finishing a rocket in 2019). If a match seems really difficult to win, we might try to go for extra RPs and show off skills to higher-seeded teams. If the prediction shows us handily winning, we might try something risky.
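The decision logic described above could be sketched as a simple rule; the function name and thresholds here are made up purely for illustration:

```python
def pick_strategy(p_win, close_band=0.10):
    # Hypothetical thresholds: hedge toward raw points in close matches,
    # chase bonus RPs when a win looks unlikely, experiment when it's safe.
    if abs(p_win - 0.5) <= close_band:
        return "switch to the higher point-yield strategy"
    if p_win < 0.5:
        return "go for extra RPs and showcase skills"
    return "try something risky"
```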


I don’t think CCWM – or any other metric that incorporates DPR – is the right metric to use. DPR captures strength of schedule more than a defensive contribution.

If you want to quantify defense in a least squares or minimum mean square error way, you should take a serious look at @wgardner’s sOPR metric (available here). I mentioned him and his work a few times in the blog post, but just want to reiterate here that his paper is superb.

Unfortunately, his real-world benchmarks revealed that sOPR had only a slight advantage over conventional OPR. However, I believe there's room for further testing here.

How are these predictions (for both the Elo and OPR models) generated? Are they generated post-facto (i.e., the results of all matches are known to the model when making predictions), or on-line (i.e., the model knows only the matches that happened prior to the event being predicted)? Given that FRC teams tend to improve over time, I would expect this to have a significant impact on predictions.

Great post, Brennon!

A few comments just for posterity:

1. I've spent most of my time mentoring FTC and not FRC. Elo is tough to do for FTC for a variety of reasons, including:
   A. There is much less data available. There are lots of events and teams where the event results aren't published.
   B. Teams vary from year to year a lot more on average compared with FRC.
   That's why OPR-type stats are more commonly used for FTC than Elo.

2. My Android app for FTC computes MMSE-OPR and match predictions on the fly. Teams I've mentored have used the match predictions largely to know which matches are likely to be close, easy, or very difficult, and hence what low- or high-risk strategies might be useful.

3. Match predictions can also be fun to see which upcoming matches are likely to be the highest scoring matches of the event, or the most contested, etc. The app also sums up the match prediction win probabilities to give an estimate of a team's final RP as the event progresses, which can be fun to track and sort. Some teams are highly ranked but have very hard matches later in the event, and the estimated RP (eRP) can show that as the event happens.

4. The app also computes the component MMSE-OPRs, which again can be used for strategic planning (who's best at auton? who's best at end-game?).

5. The app can also use either event-specific OPR averages for the prior for the MMSE calculation, or can import prior OPR estimates where they exist (e.g., late in a season at a regional championship, super-regional, or worlds).

6. In limited experimentation, I found that using the most recent event's OPR as an estimate for the MMSE prior wasn't that helpful for FTC because the season is much longer and teams make substantial changes to their robots over a season. For example, one team may win an event in November with an OPR of 50 but come back with a whole new robot in March to compete at the state championship and get an OPR of >100 there.

7. Scouting in FTC on average is much weaker and team sizes are much smaller, so there are more teams who might use OPR-type metrics, especially for alliance selections. It's not unheard of to have teams of 2-4 people, making real-time scouting very difficult as the whole team might be competing in matches.

8. In general, defensive stats are harder to compute because defensive benefits tend to be smaller. If a team can cause the opponents to score 20% fewer points through defense, then their "true" DPR-type value is likely to be around 20% of the opponent's OPR. With smaller values, noise becomes more important and filtering out the noise just requires more match results. And given that we have events with very few match results in general, it ends up being hard to find defensive stats that aren't overwhelmed by the noise of the event. That said, if there were ever events with 10x the number of matches, then defensive stats might be more viable. Or events where defense mattered as much as offense (e.g., if a good defensive robot could completely stop an opponent's offense for the entire match).
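The eRP estimate described above can be sketched as a simple sum of per-match win probabilities. This assumes a flat 2 RP per win and ignores ties and bonus RPs; the app's actual formula may differ:

```python
def expected_rp(win_probs, rp_per_win=2.0):
    # Expected ranking points over the remaining schedule, treating each
    # match as an independent Bernoulli trial on its win probability.
    return sum(rp_per_win * p for p in win_probs)
```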
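On the MMSE-with-a-prior idea: for a Gaussian prior on team contributions and Gaussian score noise, the MMSE estimate reduces to ridge-style shrinkage of the least-squares OPR toward the prior mean. A generic sketch of that math (not the app's exact formulation; `A` is the alliance design matrix and `b` the score vector):

```python
import numpy as np

def mmse_opr(A, b, prior_mean, prior_var, noise_var):
    # Posterior mean for x ~ N(prior_mean * 1, prior_var * I) given
    # b = A x + N(0, noise_var * I). A large prior_var recovers plain
    # least-squares OPR; a small one pulls every team toward the prior.
    n = A.shape[1]
    lam = noise_var / prior_var
    mu = np.full(n, prior_mean)
    lhs = A.T @ A + lam * np.eye(n)
    rhs = A.T @ b + lam * mu
    return np.linalg.solve(lhs, rhs)
```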


The model knows only the results of matches that happened prior to the match being predicted. I recalculated OPRs for each match prediction and used more of a “world OPR” formulation. So…closest to the latter.

Under the model I used, improvements over time would be represented as both a higher mean and a higher variance.

Interesting observation. Now that we’re in a post-bag era, this could be applicable to FRC.

We use match predictions as motivation to disprove them when we are expected to lose :wink:



I’d wager that component OPRs for the inner goal in 2020 were more accurate than team-scouted numbers.