[TBA Blog] OPR and You – Basic FRC Strategy

OPR and You – Basic FRC Strategy
by Tim Flynn

In the world of FIRST Robotics strategy, the term Offensive Power Rating (OPR) seems to continue to hold ground every year. How could it not — it’s an easy metric to find, being in FRC Spyder, the TBA app, and on The Blue Alliance itself. However, how good OPR is as a metric for team strength varies year to year and is heavily influenced by game design.

Check out the rest of Tim Flynn’s article here: https://blog.thebluealliance.com/2017/11/06/opr-you-basic-frc-strategy/
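For anyone who hasn't seen how OPR is actually calculated: it is the least-squares solution to a system that models each alliance's score as the sum of its members' contributions. Here is a minimal sketch in Python; the team numbers and scores are made up purely for illustration.

```python
import numpy as np

# Minimal OPR sketch: build the design matrix A (one row per alliance per match,
# a 1 for each team on that alliance) and the score vector b, then solve the
# least-squares problem A x = b. The match data below is invented for illustration.
matches = [
    # (red alliance, blue alliance, red score, blue score)
    (["254", "1678", "971"], ["118", "148", "2056"], 95, 80),
    (["118", "1678", "2056"], ["254", "971", "148"], 70, 105),
    # ... one entry per qualification match
]

teams = sorted({t for red, blue, *_ in matches for t in red + blue})
index = {t: i for i, t in enumerate(teams)}

rows, scores = [], []
for red, blue, red_score, blue_score in matches:
    for alliance, score in ((red, red_score), (blue, blue_score)):
        row = np.zeros(len(teams))
        for t in alliance:
            row[index[t]] = 1.0
        rows.append(row)
        scores.append(score)

A = np.vstack(rows)
b = np.array(scores, dtype=float)

# OPR = each team's estimated point contribution, assuming an alliance's score is
# exactly the sum of its members' contributions (the linearity assumption).
opr, *_ = np.linalg.lstsq(A, b, rcond=None)
for t in teams:
    print(t, round(float(opr[index[t]]), 1))
```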

Thanks Tim for a good read! It explains the assumption of linearity well, which is so critical for deciding if OPR is a telling number or essentially random noise.

Great work by Tim! I think it might be worth emphasizing that while OPR is weak in particular years, the weakness is due to more than random noise; OPR’s failings usually manifest as specific, identifiable biases (often in favor of flexible bots and teams with certain types of schedules, for example). In such cases, it’s good to be aware of what biases are in effect so that, with enough context, each OPR measurement can be read with an appropriate shaker of salt (I, for one, always carry ample salt with me from my past experiences with OPR ;)).

As another point of comparison for linearity, I found how many matches my current Elo model correctly predicted each year and made a graph similar to yours. Since Elo assumes linearity just like OPR, the years with the lowest predictive power are also probably indicative of nonlinear scoring of games. Here is the graph: https://imgur.com/a/wrorG
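For anyone who wants to see the basic mechanics, here is a sketch of a plain win/loss Elo update on alliance ratings. This is not the 7-parameter model I mention below, just the textbook version; the K-factor and scale are placeholder constants.

```python
import math

def red_win_probability(ratings, red, blue, scale=400.0):
    """Standard logistic Elo expectation, using summed alliance ratings."""
    diff = sum(ratings[t] for t in blue) - sum(ratings[t] for t in red)
    return 1.0 / (1.0 + 10 ** (diff / scale))

def update(ratings, red, blue, red_won, k=32.0):
    """Plain win/loss Elo update applied equally to every team on both alliances.

    The K-factor and scale here are placeholders, not values from any tested model.
    """
    expected = red_win_probability(ratings, red, blue)
    actual = 1.0 if red_won else 0.0
    delta = k * (actual - expected)
    for t in red:
        ratings[t] += delta
    for t in blue:
        ratings[t] -= delta

# Usage idea: start every team at some baseline rating (say 1500), then replay
# the season's matches in order, calling update() after each one.
```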

Since 2006, we have from roughly most predictive power to least:
Tier 1: 2011, 2013, and 2016
Tier 2: 2009, 2012, 2014 (and 2015 if you count it)
Tier 3: 2006, 2007, 2008, 2017
Tier IDK: 2010

A few of the years (2010 in particular) can vary a lot depending on how you count ties. Here is the same graph except that ties now count as incorrect predictions: https://imgur.com/a/nxCtg

I generally prefer Brier scores to correct% as a metric in part because we don’t have to worry about ties so much.
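For anyone unfamiliar, the Brier score is just the mean squared error between the predicted win probability and the actual outcome (lower is better). Scoring a tie as an outcome of 0.5 is one way to sidestep the tie-counting problem; a minimal example:

```python
def brier_score(predictions):
    """Mean squared error between predicted red-win probability and outcome.

    `predictions` is a list of (red_win_probability, outcome) pairs, where the
    outcome is 1.0 for a red win, 0.0 for a blue win, and 0.5 for a tie, so a
    tie doesn't have to be arbitrarily counted as correct or incorrect.
    """
    return sum((p - o) ** 2 for p, o in predictions) / len(predictions)

# An always-50/50 "coin flip" forecast scores 0.25 on every decisive match.
print(brier_score([(0.7, 1.0), (0.6, 0.0), (0.5, 0.5)]))
```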

Caleb, can you please plot ELO on the same scale as the OPR article? It would also be helpful to expand that scale to 0 - 100. Or at least 50 - 100.

Predicting the winner with a coin flip scores 50 on that chart (ignoring ties). Proper scale helps illustrate that when we use OPR we are leaning very heavily on very fine differences to make judgments on expected performance. It also shows ELO is a superior method.

Here you go: https://imgur.com/a/i294k
I took my best guesses for the values in the article’s graph. The 0-100 scale just had a bunch of white space, so I did 50-100. I can do the other one though if you want.

Predicting the winner with a coin flip scores 50 on that chart (ignoring ties). Proper scale helps illustrate that when we use OPR we are leaning very heavily on very fine differences to make judgments on expected performance.

Are you saying 65ish% correct predictions isn’t substantially better than a coin flip? I’m not upset or anything, just trying to figure out what you mean by this.

It also shows ELO is a superior method.

I don’t think that is a fair takeaway based on these graphs.
In the first place, we might not even be comparing apples to apples. I have no idea how they handled ties in their results, or what their methodology was for creating their predictions. My guess is that they used continuously updating predictions using ixOPR (predicted contributions), but even if I am right I have no idea what seed values they used.

Furthermore, my current Elo model uses 7 parameters to improve predictive power, while nearly every OPR prediction I have seen uses 1 or 2. I’m pretty sure OPR’s predictive power can be improved (for example, by choosing better seeds for the start of the event or finding a reasonable way to incorporate playoff performance), and with 7 parameters I would expect its predictive power to approach or exceed Elo’s.
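As a concrete example of what I mean by a 1-or-2-parameter OPR prediction: take the difference of the two alliances' summed OPRs and push it through a logistic curve, optionally falling back to a seed value (say, a prior season's OPR) for teams with no event data yet. The scale constant and the seeding rule below are placeholders, not a tested model.

```python
import math

def predict_red_win(oprs, seed_oprs, red, blue, scale=15.0):
    """Toy OPR-based match prediction: logistic in the summed-OPR margin.

    `oprs` holds current-event OPRs; `seed_oprs` is a fallback (for example a
    prior season's OPR) for teams without enough matches yet. Both the fallback
    rule and the scale constant are placeholders that would need real fitting.
    """
    def contribution(team):
        return oprs.get(team, seed_oprs.get(team, 0.0))

    margin = sum(contribution(t) for t in red) - sum(contribution(t) for t in blue)
    return 1.0 / (1.0 + math.exp(-margin / scale))
```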

Finally, the Elo model I am using hasn’t actually been tested yet in a regular season. My Elo system for the 2017 season had 5 parameters, and had slightly less predictive power than a relatively simple OPR system I cooked up. Getting something to have high predictive power on training data is not the same thing as getting it to have high predictive power on data it hasn’t seen yet. I’ve been pretty good about keeping my training and testing data separate, but I wouldn’t trust someone who says they have something more predictive than OPR when they haven’t actually put their model out there for others to test, so you all shouldn’t trust me yet either.

I’m planning to focus some serious effort before kickoff on trying to improve OPR predictions; when I do, I’ll post some comparisons to Elo.

If you could predict football against the spread at a 65% rate, you’d be a very wealthy person - eventually. You’d have to be financially able to weather substantial numbers of wrong predictions, of course.
But even though it’s a good success rate - statistically speaking - the full-scale axis helps show that it’s far from a reliable indicator. I only bring it up as there are plenty of students (and parents) who start to take Blue Alliance OPR-based match predictions a little too seriously and wonder why it isn’t right more often.

I trust you anyway. :slight_smile: Looking forward to next season’s statistics thesis. Keep up the good work.

Ties were treated as a red win (which is terrible but it was easy to implement). You’re correct that ixOPR was used, and that no seed values were used for week 1 (later weeks were seeded from previous weeks). This is by no means the best prediction model TBA has used, but that wasn’t the point of the article.

There’s a reason why those pages don’t have direct links to them… :stuck_out_tongue: Maybe in the future.

Got it, well since you want to see a bunch of whitespace to prove your point, here is my previous graph on a 0-100 scale: https://imgur.com/a/APGY1

I trust you anyway. :slight_smile: Looking forward to next season’s statistics thesis. Keep up the good work.

:slight_smile: Thanks.

Understandable. I forgot to mention the possibility that this might not have been TBA’s standard prediction model. I wasn’t originally trying to make a predictive-power comparison; I was just looking to show another perspective on linearity in game scoring.