An improvement to OPR

Just a thought I had recently while working on some interesting linear algebra problems. Given that OPR is generated by this (thanks to Ether for this):

All of the operations performed to compute OPR work just as well over the complex numbers, so there appears to me to be no reason why OPR could not be solved with complex scores, where the real part of each score entry is the teleoperated score and the imaginary part is the autonomous score. This should yield an OPR vector with complex entries, which theoretically gives a least-squares fit for both teleop and auton at once.
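A minimal sketch of the idea in Python/numpy, with a made-up alliance matrix and made-up scores (toy data, not from a real event):

```python
import numpy as np

# Toy example: 4 teams, 4 qualification matches, 2 alliances per match,
# so A has 8 rows (one per alliance) and 4 columns (one per team).
# A[i, j] = 1 if team j was on alliance i, else 0.
A = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

teleop = np.array([60, 45, 50, 55, 65, 40, 58, 47], dtype=float)  # alliance teleop scores
auton  = np.array([20, 10, 15, 12, 18,  8, 22, 11], dtype=float)  # alliance auton scores

# Pack teleop into the real part and auton into the imaginary part,
# then do a single complex least-squares solve.
s = teleop + 1j * auton
opr, *_ = np.linalg.lstsq(A, s, rcond=None)

print("teleop OPR:", opr.real)
print("auton  OPR:", opr.imag)
```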

What advantage does this have over simply calculating independently with just auto scores, and then just teleop scores, and then just climb scores? I’ve actually seen people do that.

And the results are fun sometimes because you get teams where their climb OPR is like -2.

Since the OPR calculation boils down to

P = (A^-1) * S

where P is the OPR, A is the binary matrix denoting teams in each alliance and S is the alliance scores, then Frenchie461 is essentially advocating

Pt + Pa*i = (A^-1) * (St + Sa*i)

and since matrix multiplication is distributive

Pt + Pa*i = (A^-1) * St + ((A^-1) * Sa)*i

So you’ll end up with the same result as calculating each OPR component independently. You’ll get least-squares best fit for each component (as you would otherwise), but there won’t be any additional interaction gained between them. This makes sense, because the least-squares fitting part of the operation happens when taking the inverse of A, and isn’t affected by the value of S (whether real or complex) that it is post-multiplied by. Performance-wise, I would guess they would take about the same amount of time, assuming you’re not re-calculating the value of A^-1 when doing the calculations independently.

Note: the inverse operation written ^-1 above becomes the generalized inverse (pseudoinverse) for non-square A.
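A quick numerical check of that equivalence, reusing the toy A, teleop, and auton arrays from the sketch above:

```python
# One complex solve vs. two independent real solves (same toy data as above).
p_complex, *_ = np.linalg.lstsq(A, teleop + 1j * auton, rcond=None)
p_teleop, *_  = np.linalg.lstsq(A, teleop, rcond=None)
p_auton, *_   = np.linalg.lstsq(A, auton, rcond=None)

# The real and imaginary parts match the independent fits, as argued above.
assert np.allclose(p_complex.real, p_teleop)
assert np.allclose(p_complex.imag, p_auton)
```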

I’m not sure how this is an improvement; your code might be more concise but only if you’re working with a computational package like MATLAB.

It’s only an improvement insofar as it’s more data than most teams usually compute, and it should be computationally faster than a pair of OPR calculations. Hypothetically, let’s say that OPR is an O(N^3) operation; with this method it’s N^3 to find auton and teleop rather than 2(N^3).
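As noted upthread, though, that factor of two goes away if you precompute the (pseudo)inverse once and reuse it. A sketch, with the toy data from above:

```python
# The O(N^3) work is computing the pseudoinverse; do it once and reuse it.
A_pinv = np.linalg.pinv(A)      # expensive step, done a single time

opr_teleop = A_pinv @ teleop    # cheap matrix-vector products afterwards
opr_auton  = A_pinv @ auton
```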

In the formula for OPR, namely [A][OPR]~[SCORE], [OPR] and [SCORE] need not be vectors - they can be matrices.

So instead of [OPR] being an Nx1 column vector, it can be an Nx2 matrix… and [SCORE] can be a (2M)x2 matrix.

The first column of [OPR] and [SCORE] can then be for TeleOp, and the second column for Autonomous.

This can be extended to any desired number of columns. For example, use 4 columns for TeleOp, Autonomous, Climb, and Foul points.

Adding extra columns to [OPR] and [SCORE] increases the computation time only minimally, since the lion’s share of the computation is spent factoring [A]^T[A].
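A sketch of that multi-column form, again with the toy data from above (and assuming [A]^T[A] is nonsingular, as it will be for an ordinary match schedule):

```python
# [SCORE] as a (2M) x k matrix: one row per alliance, one column per
# component (here teleop and auton; append climb, foul, ... columns as needed).
S = np.column_stack([teleop, auton])

# Form and factor A^T A once; the solve handles every column of S together.
AtA = A.T @ A
AtS = A.T @ S
OPR = np.linalg.solve(AtA, AtS)     # N x k matrix of per-component OPRs

opr_teleop, opr_auton = OPR[:, 0], OPR[:, 1]
```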

Minor detour: does someone have comprehensive foul point data broken out from teleop scores? (because I would love them forever :])

Other than that, it’s a cool idea and would be a fun way to teach the concept to new students around that level. In terms of data, though, I don’t know that it brings anything new. It’d actually complicate my work to do it that way, because the matrix case Ether describes allows the simultaneous calculation of endgame OPR by the same method (i.e. it’s not limited to two components, and we’re looking for at least 3 basically every year).

Ether has created a spreadsheet with all of the Twitter data from every match.
http://www.chiefdelphi.com/forums/showthread.php?t=116088&highlight=twitter+data

It has everything you need.

A few weeks ago I posted a least-squares analysis of foul points using Twitter data:

http://www.chiefdelphi.com/forums/showpost.php?p=1262753&postcount=1

As with any analysis using Twitter data, caveat utilitor.

Sorry, I should have been more specific about “comprehensive” (Twitter this year is missing a good chunk of MAR data). Thank you though, you’re correct, I’ll use this. Awesome analysis as always, Ether. Thanks!

[/and now back to your regularly scheduled thread]

Note that the OPR parameters are like any other statistical measure and have standard errors associated with those parameters. I haven’t seen the SEs posted with the OPR parameters, but I can tell you that the SEs are likely to be VERY large for so few observations–only 8 per team at the Champs, at most 12 in any of the regionals. The OPRs are useful indicators, and probably can be used in a pinch if you lack other data, but they are highly unreliable for live scouting. I wouldn’t bother calculating them on the fly at a competition. (The CCWM and DPRs appear to be even less reliable).
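For anyone curious, those standard errors fall out of the ordinary least-squares formulas. A sketch, assuming A is the alliance matrix and s the (real) alliance-score vector in the shapes used earlier in the thread, and that A has full column rank; nothing here is FRC-specific:

```python
import numpy as np

def opr_standard_errors(A, s):
    """Ordinary least-squares standard errors for the OPR parameters.

    Each team's parameter rests on only ~8-12 alliance scores, so
    expect these standard errors to be large, as noted above.
    """
    n_rows, n_teams = A.shape
    opr, *_ = np.linalg.lstsq(A, s, rcond=None)
    residuals = s - A @ opr
    sigma2 = residuals @ residuals / (n_rows - n_teams)  # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)                # parameter covariance
    return np.sqrt(np.diag(cov))
```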

Our team had an OPR of 43 on Curie, yet our own scouting data showed we were scoring over 62 per match–quite a discrepancy. 4814 showed a CCWM ranked only 33rd yet went undefeated, and our defensive scouting data confirmed that they had substantial value added.

Obviously having your scouts get better data is the best solution, but there really is value in having the numbers, as unreliable as they may be. Even if the OPR for each phase of the game is +/- 50% off from what that team scores, that’s often close enough to figure out what your match strategy should be.

Also, DPR may be less reliable than OPR, but I’ve been told too many times that it’s totally meaningless. It isn’t. It just doesn’t mean what people think it means. It’s purely how many points your opponents score. So last year, when we were a robot that scored just enough points at championship that we were better off playing offense than defense, our DPR was awful; this year our DPR was great, because opposing alliances were forced to defend us and therefore had one less robot scoring.

I agree that the OPR is better than nothing–it still had a 0.91 correlation with our offensive scouting data. However, it tends to miss the outlier teams that may be hard to pick out otherwise. Our OPR was 27 points less than our actual offensive average–more than 50% off. But if your team has sufficient resources to calculate the OPR on the fly, then you probably have enough to do full scouting. On the other hand, you might be relying on the OPR calculated by one of the apps tracking the competition, in which case that’s the best you have.

4814 is a great case in point in Curie. Their OPR was only two points higher than their actual offensive output and their CCWM was only 9 points less than their OPR. They look like a defensive dud. Yet deeper digging shows that their defense was probably worth at least 30 points a match, and perhaps twice that.

How are you calculating their actual offensive output? Do you have six scouts monitoring every match, one scout recording the scoring for each team?

Yes.

Note: I was reviewing the 2834 database and think I found that the Championship OPRs are in error. The sums of the individual components often do not add up to the Total. (3824’s in Curie is off by 32.) A quick scan of the regionals finds in some cases no deviations whatsoever and <2 pts maximum in others. I suggest going back and recomputing the OPRs.

A possible explanation is that Ed took into account surrogate matches.

That is exactly the problem. OPR is calculated using all matches, including surrogate matches. I would still want to calculate OPR this way; more data points are better, even if the match does not count for that team.

Unfortunately, the team standings from the FIRST website only add up the totals of the auto, teleop, and climb points of non-surrogate matches. This means that when I solve A x = b, the matrix A contains the surrogate matches while the vector b does not.

My proposal is to scale the value of b for the teams that have the surrogate matches before solving A x = b. Does anybody have any other suggestion?
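A sketch of that scaling idea in numpy, assuming b here is the vector of per-team score totals from the standings (i.e. the right-hand side of the normal equations) and n_surrogate is a per-team count of surrogate matches taken from the schedule; both are placeholder names for the illustration:

```python
import numpy as np

def opr_with_scaled_totals(A, b, n_surrogate):
    """Proposed workaround: scale each team's standings total up by
    (matches played, surrogates included) / (matches counted in standings)
    before solving the normal equations."""
    matches_played = A.sum(axis=0)                    # appearances in A, surrogates included
    matches_counted = matches_played - n_surrogate    # matches actually summed into b
    b_scaled = b * matches_played / matches_counted   # scale totals up pro rata
    return np.linalg.solve(A.T @ A, b_scaled)
```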

Thank you for pointing out that the sums of the individual category OPRs do not add up to the total OPR. I don’t know exactly what you mean by recomputing, though. You made it sound like I do the calculations by hand. I can ask the computer to run it 100 times and I can guarantee you that I will get the same answer every time. :)

As a short-term solution that sounds like a reasonable approach to try to make the best out of the data that is available.

Going forward, perhaps someone who has Frank’s ear and is interested in statistics could make an appeal to him to resolve the Twitter data issues. At the very least, store the data locally (at the event) and don’t delete it until it has been archived at FIRST. Then make the data available to the community.