Hybrid and Robust OPR

Motivated by a substantial difference between standard OPR and scouting point contribution estimates for my team at the 2020 Palmetto regional, I developed and studied some modifications to OPR with the goals of getting it closer to scouting estimates and also making it more accurate for playoff match score prediction.

Hybrid OPR

For Infinite Recharge, the field management system (FMS) records several per-robot match actions, so those scoring components do not need to be estimated with OPR. This enables hybrid OPR, which combines the exactly known per-robot scoring from the FMS with component OPRs for the scoring modes recorded only for the whole alliance.
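A minimal sketch of the hybrid idea, assuming per-robot FMS points are simply averaged per team and the alliance-only components get an ordinary least-squares OPR (the paper may weight or combine them differently). The function names, the every-alliance-once toy schedule, and the `auto_line`/`cells` data are hypothetical; for 2020 the per-robot fields would be things like the initiation-line exit and endgame status.

```python
from itertools import combinations

import numpy as np


def component_opr(alliances, scores, n_teams):
    """Least-squares OPR for one scoring component recorded per alliance."""
    A = np.zeros((len(alliances), n_teams))
    for row, teams in enumerate(alliances):
        A[row, list(teams)] = 1.0
    opr, *_ = np.linalg.lstsq(A, np.asarray(scores, dtype=float), rcond=None)
    return opr


def hybrid_opr(alliances, per_robot_points, alliance_only_points, n_teams):
    """Sketch: exact per-robot FMS averages plus OPR for alliance-only parts.

    per_robot_points[row][k] is the FMS-attributed points of the k-th robot in
    alliances[row]; alliance_only_points[row] is the alliance total of the
    components the FMS does not attribute to individual robots.
    """
    totals = np.zeros(n_teams)
    counts = np.zeros(n_teams)
    for teams, robot_pts in zip(alliances, per_robot_points):
        for team, pts in zip(teams, robot_pts):
            totals[team] += pts
            counts[team] += 1
    # Directly attributed points are averaged, not estimated.
    per_robot_avg = np.divide(totals, counts, out=np.zeros(n_teams), where=counts > 0)
    return per_robot_avg + component_opr(alliances, alliance_only_points, n_teams)


# Toy check with a hypothetical schedule: 6 teams, every 3-team alliance once.
alliances = list(combinations(range(6), 3))
auto_line = [[5.0 for _ in a] for a in alliances]          # per-robot, from FMS
cells = [sum(10.0 + 2 * t for t in a) for a in alliances]  # alliance-only
print(np.round(hybrid_opr(alliances, auto_line, cells, 6), 2))
```

Because the toy data are noise-free, the result for team t should be 5 (exact per-robot average) plus 10 + 2t (component OPR).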

Robust OPR

The results of some matches can mislead the OPR calculation. For example, my team was on an alliance where one robot did not move for the entire match. To make the situation worse, that robot turned out to be the highest-scoring robot at the event. The dramatically lower-than-“expected” score in that match significantly reduced the OPR of all three alliance robots, because the math has no way of knowing that the problem was due to only one robot. If outlier alliance performances like this could be identified and removed from the OPR calculation, the resulting robust OPR would hopefully be a better estimate of point contribution and more useful for predicting future scoring, such as in playoff matches.
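One plausible reading of the robust step, sketched under assumptions: fit standard OPR, flag alliances whose residual magnitude exceeds k0 times the residual standard deviation (k0 = 1.6 is the value mentioned later in the thread), drop them, and refit once. The exact outlier rule in the paper may differ, and `fit_opr`, `robust_opr`, and the toy event are hypothetical. The toy mirrors the match described above: the best robot sits dead in one match.

```python
from itertools import combinations

import numpy as np


def fit_opr(alliances, scores, n_teams):
    """Standard least-squares OPR; also returns the team-indicator matrix."""
    A = np.zeros((len(alliances), n_teams))
    for row, teams in enumerate(alliances):
        A[row, list(teams)] = 1.0
    opr, *_ = np.linalg.lstsq(A, scores, rcond=None)
    return opr, A


def robust_opr(alliances, scores, n_teams, k0=1.6):
    """One-pass outlier removal (assumed form, not necessarily the paper's)."""
    scores = np.asarray(scores, dtype=float)
    opr, A = fit_opr(alliances, scores, n_teams)
    residuals = scores - A @ opr
    # Flag alliances whose residual is more than k0 standard deviations out.
    outliers = np.abs(residuals) > k0 * residuals.std()
    kept = [a for a, bad in zip(alliances, outliers) if not bad]
    robust, _ = fit_opr(kept, scores[~outliers], n_teams)
    return robust, np.flatnonzero(outliers)


# Toy event: 8 teams, every possible alliance played once (hypothetical schedule).
true = np.array([10.0 + 5 * t for t in range(8)])
alliances = list(combinations(range(8), 3))
scores = np.array([true[list(a)].sum() for a in alliances])
dead = alliances.index((5, 6, 7))
scores[dead] -= true[7]  # team 7, the best robot, never moves in this match

standard, _ = fit_opr(alliances, scores, 8)
robust, flagged = robust_opr(alliances, scores, 8)
print(np.round(standard - true, 2))  # teams 5, 6, 7 pulled down; others pushed up
print(flagged, np.round(robust - true, 2))
```

In this noise-free toy, only the dead-robot alliance is flagged and the refit recovers the true contributions, while standard OPR pulls down all three robots on that alliance. Real events are noisy, so a threshold like k0 trades false positives against missed outliers.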

Hybrid, robust, and hybrid robust OPR were studied for all 52 events completed for Infinite Recharge. Each technique improved playoff alliance score prediction in a statistically significant way. Hybrid OPR reduced the sum of squared errors (SSE) of event playoff score predictions by an average of over 6% across all events. Robust OPR reduced SSE by 2.3%. Combined hybrid robust OPR reduced SSE by nearly 7%. While my team does not claim to conduct perfect scouting, especially given the difficulty of discerning inner vs. outer port scores and determining which robot each bounced-out shot came from, it was good to see that hybrid robust OPR reduced the sum of squared differences from scouting point-contribution estimates by a factor of almost 3 compared to standard OPR at 2020 Palmetto.
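The percentages above compare playoff-prediction SSE between methods. A sketch of that bookkeeping, assuming an alliance's predicted score is the sum of its three robots' OPRs and that per-event reductions are averaged with equal weight (pooling all playoff matches into one SSE is the other obvious choice, and the post does not say which was used). All names are hypothetical.

```python
import numpy as np


def predict_alliance(opr, teams):
    """Predicted alliance score: sum of member OPRs (assumed predictor)."""
    return sum(opr[t] for t in teams)


def playoff_sse(opr, playoff_alliances, actual_scores):
    """Sum of squared errors of OPR-based predictions for one event's playoffs."""
    preds = np.array([predict_alliance(opr, a) for a in playoff_alliances])
    return float(np.sum((np.asarray(actual_scores, dtype=float) - preds) ** 2))


def mean_sse_reduction(per_event_pairs):
    """Average percent SSE reduction across events.

    per_event_pairs: iterable of (sse_standard, sse_new), one pair per event.
    """
    reductions = [1.0 - new / std for std, new in per_event_pairs]
    return 100.0 * float(np.mean(reductions))
```

For example, `mean_sse_reduction([(100.0, 90.0), (200.0, 190.0)])` gives about 7.5, the mean of a 10% and a 5% reduction.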

The accuracy advantage of hybrid OPR will vary from season to season depending on how much of the scoring can be directly ascribed to individual robots based on what is recorded in the FMS. However, for the 2021 season and a replay of some form of Infinite Recharge, it seems likely that hybrid OPR will continue to have a significant advantage over standard OPR.

Plenty of detail on hybrid and robust OPR is available in a paper.

A GitHub repo also contains supporting figures, result data files, and Python code to generate hybrid, robust, and hybrid robust OPR.


Interesting paper. What you’re calling “hybrid OPR” has been discussed in the past. For some metrics it does make sense, but part of the point of OPR is that one team affects the other teams around it. If there’s a team that’s good at convincing their alliance partners to let them hang but their hanger actually works less often than average, they can actually have a negative hang OPR (because another alliance partner would be expected to hang a certain percentage of the time). The same thing goes for climbing last year, when only one robot could get the level 3 climb. On the other hand, ramp bots last year didn’t score any climb points themselves, but effective ones could greatly improve their alliance partners’ ability to climb. In 2018, many teams would help their alliance partners program basic autonomous modes to make sure they got the auto RP; teams that did this often could have an auto OPR higher than the point value for one robot because they improved their alliance partners’ ability to score.
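The negative hang OPR case can be reproduced with a toy least-squares fit. Assumptions, all hypothetical: an 8-team event where every 3-team alliance plays once, a single 12-point climb slot per alliance, and expected (not random) outcomes: team 0 always takes the slot and succeeds 25% of the time, while any other robot given the slot succeeds 75% of the time.

```python
from itertools import combinations

import numpy as np

N_TEAMS = 8
alliances = list(combinations(range(N_TEAMS), 3))  # hypothetical schedule


def expected_hang_points(teams):
    """Expected alliance climb points when only one robot can take the slot."""
    # Team 0 always claims the slot; otherwise a 75% climber takes it.
    return 0.25 * 12 if 0 in teams else 0.75 * 12


A = np.zeros((len(alliances), N_TEAMS))
for row, teams in enumerate(alliances):
    A[row, list(teams)] = 1.0
hang = np.array([expected_hang_points(a) for a in alliances])

hang_opr, *_ = np.linalg.lstsq(A, hang, rcond=None)
print(np.round(hang_opr, 2))  # team 0 comes out negative
```

Alliances without team 0 expect 9 points and alliances with it expect 3, so the fit gives every other team 3 and team 0 3 - 6 = -3.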

Regarding what you’re calling “robust OPR”, I think your testing procedure is flawed. If I understand correctly, the k0 value of 1.6 was determined by running all of the 2020 events and finding the value that gives the best results. Then, to test whether there is an improvement over traditional OPR, you used the same events that were used to find k0. If this is the case, your model’s “predictions” are based on the results you’re trying to predict, so you get an unfair improvement. You want to split the data into training and testing groups: use the training group of matches to find the proper k0, then test the model on a separate group of testing matches.


The power of OPR is that it doesn’t try to accurately represent FMS scores. Consider the situation where a robot has a buddy lift for endgame (popular in 2018, 2019, kind of 2017, and a little bit in 2020). In all ten qualification matches, team A lifts itself and a partner. FMS would award team A 1x endgame points, but OPR would “award” this team 2x endgame “points” and its partners close to 0x. Your “hybrid OPR” would give team A 1x endgame “points” and its partners 0.1x (matching the FMS). These values do not reflect each team’s actual ability. Team A actually earns 2x endgame points each match, even though FMS only reports it as earning 1x.
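The buddy-lift argument can be checked the same way. In this hypothetical toy (8 teams, every 3-team alliance once, 25 points per hang), team 0 hangs and lifts one partner every match and nobody else can climb. Standard OPR is fit on the alliance endgame totals, while the per-robot FMS average is what a hybrid approach would use for endgame.

```python
from itertools import combinations

import numpy as np

N_TEAMS = 8
HANG = 25.0  # assumed per-robot endgame hang value ("1x")
alliances = list(combinations(range(N_TEAMS), 3))  # hypothetical schedule

A = np.zeros((len(alliances), N_TEAMS))
alliance_endgame = np.zeros(len(alliances))  # alliance total, used by standard OPR
fms_points = np.zeros(N_TEAMS)               # per-robot FMS credit, summed
played = np.zeros(N_TEAMS)
for row, teams in enumerate(alliances):
    A[row, list(teams)] = 1.0
    played[list(teams)] += 1
    if 0 in teams:
        # Team 0 hangs and lifts one partner; no other robot can climb.
        lifted = next(t for t in teams if t != 0)
        fms_points[[0, lifted]] += HANG
        alliance_endgame[row] = 2 * HANG

opr, *_ = np.linalg.lstsq(A, alliance_endgame, rcond=None)
fms_avg = fms_points / played
print(np.round(opr, 2))      # team 0 near 2x, everyone else near 0
print(np.round(fms_avg, 2))  # team 0 at 1x; partners credited only when lifted
```

Standard OPR credits team 0 with the full 2x (50 points) and its partners with about 0, while the per-robot FMS average gives team 0 only 1x (25 points).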

Similarly, you may run into issues with your “Robust OPR”. It can be very dangerous to remove outliers; they usually exist for a reason. In this case, robots break down for a reason; it’s not random chance. OPR captures this by including those matches. Teams more likely to break down will have lower OPRs. Teams that can draw more foul points will have higher OPRs. Again, OPR doesn’t really try to represent an FMS score. It might be helpful to think of it as a relative ranking system rather than a measure of “points”.

You might be interested in looking more into OPR and discussions of the intangibles that affect it. That being said, if you and/or your team find these measures useful, please use them. There is no right answer to scouting, and everybody has their own approach.


This is a nicely put-together paper for what I expect is the closest to a scientific manuscript you have written so far. Very nice work on being thorough.

Admittedly, I do not have time at the moment to read through it, so I can’t speak to everything, but at face value it looks good. There are bits that seem a bit wordy and non-clinical.*

As for points others have brought up regarding methodology, you need to be careful of two things off the top of my head: selection bias and unequal variance in your tests.**

*Subjective, obviously. Would inevitably be cleaned up as you gain more experience in writing in this format

** Again, haven’t had time to read it all, so you may have touched on this.

The danger was noted in the paper. That’s why the concept was tested on a large sample of events. Given actual data, is it better to remove very large outliers or to just live with them?
Agreed that a team more likely to break down will have a lower OPR. However, teams that played with that team in a match where it broke down will also have a misleadingly lower OPR through no fault of their own. All the OPR math can do is spread the low alliance score to all three robots. Even worse, it also spreads the scoring done by the operating robots to the one broken down and inflates its OPR.
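The spreading effect can be written down exactly. Standard OPR solves the normal equations (A^T A) x = A^T s, so lowering one alliance's score by delta shifts the OPR vector by delta * (A^T A)^-1 a, where a is that alliance's row of the team-indicator matrix A. A short sketch on a hypothetical 8-team schedule where every 3-team alliance plays once; `opr_shift` is an illustrative name.

```python
from itertools import combinations

import numpy as np


def opr_shift(A, row, delta):
    """Change in every team's OPR if alliance `row` scores `delta` points off.

    OPR solves the normal equations (A^T A) x = A^T s, so changing one
    alliance score by delta moves x by delta * (A^T A)^-1 a_row.
    """
    return delta * np.linalg.solve(A.T @ A, A[row])


alliances = list(combinations(range(8), 3))  # hypothetical 8-team schedule
A = np.zeros((len(alliances), 8))
for r, teams in enumerate(alliances):
    A[r, list(teams)] = 1.0

row = alliances.index((0, 1, 2))
shift = opr_shift(A, row, delta=-60.0)  # e.g. a dead robot costs 60 points
print(np.round(shift, 3))
```

In this schedule each of the three robots on the affected alliance loses 60/21 ≈ 2.86 points of OPR, the dead robot no more than its working partners, and the five teams not in the match each gain 120/105 ≈ 1.14.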

Totally valid point. The question I was looking to answer is whether OPRs would be more predictive and more in line with scouting results using standard OPR, which better handles a case like the one you mention, or using hybrid OPR, which reduces error in the much larger number of cases where robots are involved in more common scenarios. No kind of OPR is ever going to be right, but which kind is closer more often?

Setting aside the limited data per team and limited number of teams at events for a moment, assume an event with hundreds of teams and tens of matches per team:

Outliers can be tricky here. You can do one of two things (especially for the case of a dead robot):
Non-zero means: just use the data that is above a certain minimum threshold, and acknowledge you are moving away from ratio data and have to settle for interval data. Since all we care about is score differential, we really don’t care about ratio data in most cases. This approach can be a bit tricky because FIRST does not publish any “dead robot/lost connection” info you could fold in, but there are some published stats that can be used as proxy data (didn’t complete autonomous movement/endgame parking, for example).
Throw out the high and low values: We get it, even the best teams can be dead on the field for no obvious reason (just look at Einstein playoffs EVERY SINGLE FREAKING YEAR). But since consistency is king just throw out your high and lows. This is done in all sorts of team sports tournament layouts, so I don’t see why it wouldn’t be broadly applicable here. So just run the OPR on the inner 8 deciles.
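A sketch of the second option, under a hypothetical reading of “inner 8 deciles”: fit OPR once, keep only alliances whose residuals fall between the 10th and 90th percentiles, and refit. The toy event (8 teams, every alliance once, one dead robot) and the function name are assumptions. The first option would instead drop alliances flagged by a proxy, such as a robot with no auto movement and no endgame park, which needs the per-robot FMS fields rather than residuals.

```python
from itertools import combinations

import numpy as np


def trimmed_opr(A, scores, lower_pct=10.0, upper_pct=90.0):
    """Hypothetical trimmed OPR: refit on alliances with mid-range residuals."""
    scores = np.asarray(scores, dtype=float)
    opr, *_ = np.linalg.lstsq(A, scores, rcond=None)
    residuals = scores - A @ opr
    lo, hi = np.percentile(residuals, [lower_pct, upper_pct])
    keep = (residuals >= lo) & (residuals <= hi)  # inner 80% of alliances
    trimmed, *_ = np.linalg.lstsq(A[keep], scores[keep], rcond=None)
    return trimmed, keep


# Hypothetical toy event: 8 teams, every 3-team alliance once, one dead robot.
alliances = list(combinations(range(8), 3))
A = np.zeros((len(alliances), 8))
for row, teams in enumerate(alliances):
    A[row, list(teams)] = 1.0
true = np.array([10.0 + 5 * t for t in range(8)])
scores = A @ true
dead = alliances.index((5, 6, 7))
scores[dead] -= true[7]

trimmed, keep = trimmed_opr(A, scores)
print(int(keep.sum()), "of", len(keep), "alliances kept")
```

Note that trimming discards roughly 20% of alliances even at an event with no breakdowns, which is part of why small qualification schedules make this approach costly.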

However, we do not have the benefit of large amounts of data on each team at each event, so the above is kinda… meh.

Just my $0.02

In this case, the major “training” is the fitting of all the OPR parameters in the OPR model. The “testing” is then the use of those parameters to predict scores in matches that were not used in training. The k0 parameter was not optimized on the OPR training data. Nonetheless, I understand your concern that k0 was optimized on the full set of playoff matches rather than that dataset also being split into training and test sets.

I re-ran the analysis with separate training and test sets for the k0 optimization. The training set was every other event, alphabetically by event key, and the test set was the remaining events, giving 26 events in each set. Optimizing on the training set alone gave the same k0 value.
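A sketch of that split-and-optimize procedure. The alphabetical every-other-event split follows the description above; the grid search over k0, the `sse_ratio` callback (robust SSE divided by standard SSE for one event's playoffs), and the 52 made-up event keys and toy ratio curve are hypothetical stand-ins.

```python
import numpy as np


def split_events(event_keys):
    """Alternate events alphabetically by key: even positions train, odd test."""
    ordered = sorted(event_keys)
    return ordered[0::2], ordered[1::2]


def best_k0(train_events, sse_ratio, grid):
    """Grid-search k0, minimizing the mean robust/standard SSE ratio on training events.

    sse_ratio(event, k0) is a hypothetical callback returning
    robust SSE / standard SSE for one event's playoff predictions.
    """
    means = [np.mean([sse_ratio(e, k0) for e in train_events]) for k0 in grid]
    return float(grid[int(np.argmin(means))])


def toy_ratio(event, k0):
    """Made-up stand-in for a real per-event SSE ratio, smallest at k0 = 1.6."""
    return 0.98 + 0.01 * (k0 - 1.6) ** 2


events = [f"event{i:02d}" for i in range(52)]  # hypothetical event keys
train_events, test_events = split_events(events)
grid = np.round(np.arange(1.0, 3.01, 0.1), 1)
k0 = best_k0(train_events, toy_ratio, grid)
test_ratio = np.mean([toy_ratio(e, k0) for e in test_events])
print(len(train_events), len(test_events), k0, round(float(test_ratio), 4))
```

With real data, `sse_ratio` would compute both OPR variants from an event's qualification matches and score them on its playoff matches, so the test events contribute nothing to the choice of k0.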

The ratio of robust SSE to standard SSE on the test set turns out slightly better than on the full playoff dataset.

Making different selections of training and test sets for the optimization is just sampling from the same distribution for average improvement. Sometimes the average improvement will be better for the test set compared to the full set as was the case here. Sometimes the average improvement for the test set will be worse than the full set. Using the full set of data not used in computation of OPR provides the best estimate of the average.