Incorporating Opposing Alliance Information in CCWM Calculations

And now for the results where a single testing point was removed from the training data, the model was computed from the rest, the removed point was evaluated, and this was repeated 80 times. So the results below are for separate training and testing data.
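For anyone who wants to reproduce this, here's a minimal MATLAB sketch of that leave-one-out procedure, assuming A (one row per match, +1 for red teams and -1 for blue teams) and b (the winning margins) are already loaded; the variable names are mine, not from any posted file. The same loop works for OPR and CCWM with their own A and b.

nMatches = size(A, 1);
resid = zeros(nMatches, 1);
for k = 1:nMatches
    keep = true(nMatches, 1);
    keep(k) = false;                  % remove match k from the training set
    x = pinv(A(keep, :)) * b(keep);   % fit the model on the remaining matches
    resid(k) = b(k) - A(k, :) * x;    % prediction residual on the held-out match
end
std(resid)                            % Stdev of the prediction residual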

Stdev of prediction residual of the winning margins:
OPR: 34.6
CCWM: 30.5
WMPR: 30.4

(note that on the testing data from a few posts ago, WMPR had a Stdev of 15.9, so this is an argument that WMPR is “overfitting” the small amount of data available and that it could benefit from having more matches per team)

Number of matches predicted correctly (out of 79 possible):

OPR: 63
CCWM: 55
WMPR: 58

So here, the WM-based measures are both better at predicting the winning margin itself but not at predicting match outcomes. CCWM and WMPR have almost identical prediction residual standard deviations but WMPR is slightly better at match outcome prediction in this particular example for some reason.

Again, it would be great to test this on some 2014 data where there was more defense.

When I get a chance I’ll write a script for TBA API to grab their 2014 data and convert it to CSV. It may be a while.

Otherwise, if someone can provide the 2014 qual match data in CSV format, I can quickly generate all the A and b for WMPR, OPR, and CCWM and post it here.

Sorry, I was trying to factor the “strength of schedule” of the matches into what you are doing. Lots of us have kicked the dirt going “Yes, 1640 is at the top, they played all their matches on alliances with three-digit teams against all-rookie alliances.”

But I spent an hour and did some “what if” runs through the data, and the effect is pretty small. While low-scoring opposing alliances do make a difference, by about match 8, 9, or 10 things swing the other way. So while we all hate the “random” selections, it seems to work out in the end.

I only did a small segment; with the full season available, feel free to run your own numbers.

I ran the numbers on the 2014 St. Joseph event. I checked that my calculations for 2015 St. Joseph match Ether’s, so I’m fairly confident that everything is correct.

Here’s how each stat did at “predicting” the winner of each match.

OPR: 87.2%
CCWM: 83.3%
WMPR: 91.0%

I’ve attached my analysis, WMPR values, A and b matrices, along with the qual schedules for both the 2014 and 2015 St. Joe event.

WMPR.zip (77.8 KB)


Attached are the A, b, and t CSV files from all official events from 2014 (except for 2014waell, because some match scores are missing). The only difference, other than the file names, is that the matches are not in sequential order within the A and b files, although they still correspond. The t files were generated by team number, and the matches are in whatever order TBA decides to publish their JSON data (which I think is match number, alphabetically sorted).

Also attached are CSV files that have the OPR, CCWM, and GPR (if we are still calling it that) of each team at each event.

I can easily generate these files for any year if anyone would like.

A b T 2014.zip (156 KB)
GPR 2014.zip (223 KB)

Thanks!

The files appear to have the “new” A and “new” b. Could you by any chance also generate the “old” A and “old” b? I can generate the old A from the new A by stripping out the +1s and -1s from each row, but I can’t generate the old b from the new b since the new b only has match winning margins and I can’t get red and blue scores from that. Having both the old and new versions allows for quick and direct comparisons between OPR (which requires the old b), and CCWM and WMPR (which require the new b that can also be derived from the old one).

Maybe we can just call the old A and b by their names, and the new A and b something like Awm and bwm?

It looks like the proposal of calling it WMPR instead of GPR may be sticking. As the “G” in GPR, I would prefer to not have the metric be name-based. :slight_smile: (Though I did work at Qualcomm for a long time and saw what “The Viterbi Algorithm” did for Dr. Viterbi’s fame, I doubt there’s as much money in obscure robotics statistical computations.)

For data that is “overfit” you can sometimes improve the prediction performance on the testing data by simply scaling down the solution.

For fun, I computed the standard deviation of the prediction residual of the testing data not in the training set using the WMPR solution, 0.9*WMPR, 0.8*WMPR, etc. The standard deviation of the prediction residual of the winning margin for the test data for this particular tournament was minimized by 0.7*WMPR, and that standard deviation was down to 28.4 from 30.4 for the unscaled WMPR. So again, more evidence that the WMPR is overfit and could benefit from additional data.
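A minimal MATLAB sketch of that scale-factor sweep, building on the leave-one-out loop above (scaling the solution just scales the held-out prediction, so the unscaled predictions can be reused); again the variable names are mine:

nMatches = size(A, 1);
pred = zeros(nMatches, 1);
for k = 1:nMatches
    keep = true(nMatches, 1);
    keep(k) = false;
    pred(k) = A(k, :) * (pinv(A(keep, :)) * b(keep));   % unscaled held-out prediction
end
scales = 1.0:-0.1:0.5;
stdevs = arrayfun(@(s) std(b - s*pred), scales);        % residual Stdev for each scale factor
[bestStd, iBest] = min(stdevs);
bestScale = scales(iBest)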

This doesn’t change the match outcome prediction that some folks are interested in, since scaling all of the WMPRs down doesn’t change the sign of the predicted winning margin which is all the match outcome prediction is.

Attached are A, b, and T for 2015 OPR, CCWM, and WMPR.

Also introducing a funky new metric, EPR, which uses 3 simultaneous equations for each match:

  1. r1+r2+r3-b1-b2-b3 = RS-BS
  2. r1+r2+r3=RS
  3. b1+b2+b3=BS

… and solves all the equations simultaneously.
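For concreteness, here is a minimal MATLAB sketch of how those three equations per match could be stacked into one least-squares system; redTeams, blueTeams, RS, BS, and teamList are my own assumed variables (per-match team column indices and red/blue scores), not names from the attached files:

nTeams = numel(teamList);
nMatches = size(redTeams, 1);
Aepr = zeros(3*nMatches, nTeams);
bepr = zeros(3*nMatches, 1);
for m = 1:nMatches
    r = redTeams(m, :);   bl = blueTeams(m, :);
    Aepr(3*m-2, r)  =  1;  Aepr(3*m-2, bl) = -1;   % equation 1: winning margin
    Aepr(3*m-1, r)  =  1;                          % equation 2: red score
    Aepr(3*m,   bl) =  1;                          % equation 3: blue score
    bepr(3*m-2) = RS(m) - BS(m);
    bepr(3*m-1) = RS(m);
    bepr(3*m)   = BS(m);
end
xEPR = pinv(Aepr) * bepr;                          % least-squares EPR for each team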


2015 A, b, and T for OPR CCWM and WMPR.zip (1.55 MB)

:slight_smile: How do you interpret these new EPR values? They look closer to an OPR than a WM-measure as #2 and #3 both only factor in offense. How will you measure its performance? Perhaps by comparing the overall residual of all 3 combined vs. other ways of predicting all 3 (e.g., using WMPR for #1 and standard OPR for #2 and #3)?

I just tossed it out there for fun. Since equation 1 is a linear combination of equations 2 and 3, I question its usefulness.

Compute the residuals of actual_alliance_scores minus alliance_scores_computed_using_xEPR:

residuals = b1 - Aopr*xEPR

…where b1 is column 1 of the provided bopr.
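In MATLAB, that check might look like the following sketch, assuming Aopr and bopr have been loaded from the posted CSVs (with the alliance scores in bopr's first column) and xEPR has been solved as above:

residuals = bopr(:, 1) - Aopr * xEPR;   % actual alliance score minus EPR-predicted score
std(residuals)                          % spread of the score-prediction error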

AGPapa’s results are using the full data as training data and then reusing it as testing data.

On the same data, doing the “remove one match from the training data, model based on the rest of the data, use the removed match as testing data, and repeat the process for all matches” method, I got the following results:

Stdev of winning margin prediction residual
OPR : 63.8
CCWM: 72.8
WMPR: 66.3

When I looked at scaling down each of the metrics to improve their prediction performance on testing data not in the training set, the best Stdevs I got for each were:
OPR*0.9: 63.3
CCWM*0.6: 66.2
WMPR*0.7: 60.8

Match prediction outcomes
OPR : 60 of 78 (76.9 %)
CCWM: 57 of 78 (73.1 %)
WMPR: 62 of 78 (79.5 %)

Yeah! Even with testing data not used in the training set, WMPR seems to be outperforming CCWM in predicting the winning margins and the match outcomes in this single 2014 tournament (which again is a game with substantial defense). I’m hoping to get the match results (b with red and blue scores separately) for other 2014 tournaments to see if this is a general result.

[Edit: found a bug in the OPR code. Fixed it. Updated comments. Also included the scaled down OPR, CCWM, and WMPR prediction residuals to address overfitting.]

I wrote a script last night to download all the 2014 match results from TBA and generate Aopr, Awmpr, bopr, bccwm, and bwmpr for all the 2014 qual events. I’ll post them here later this morning.

Make sure to read the README file.


2014 A b T for OPR CCWM WMPR.zip (1.42 MB)
2014 raw qual match data from TBA.zip (116 KB)

Iterative Interpretations of OPR and WMPR

(I found this interesting; some other folks might too, and some might not. :slight_smile: )

Say you want to estimate a team’s offensive contribution to their alliance scores.

A simple approach is to just compute the team’s average match score divided by 3. Let’s call this estimate O(0), a vector of the average match score/3 for all teams at step 0. (/3 because there are 3 teams per alliance; this would be /2 for FTC.)

But then you want to take into account the fact that a team’s alliance partners may be better or worse than average. The best estimate you have of the contribution of a team’s partners at this point is the average of their O(0) estimates.

So let the improved estimate be
O(1) = team’s average match score - 2*average ( O(0) for a team’s alliance partners).

(2*average because there are 2 partners contributing per match. This would be 1*average for FTC.)

This is better, but now we have an improved estimate for all teams, so we can just iterate this:

O(2) = team’s average match score - 2*average ( O(1) for a team’s alliance partners).

O(3) = team’s average match score - 2*average ( O(2) for a team’s alliance partners).
etc. etc.

This sequence of O(i) converges to the OPR values, so this is just another way of explaining what OPRs are.
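Here is a minimal MATLAB sketch of that iteration, assuming the usual OPR inputs are loaded: Aopr with one row per alliance per match (1s for that alliance’s three teams) and bopr with the corresponding alliance scores in its first column. The variable names and the fixed iteration count are mine.

nTeams = size(Aopr, 2);
scores = bopr(:, 1);                               % alliance score for each row of Aopr
avgScore = zeros(nTeams, 1);
for t = 1:nTeams
    avgScore(t) = mean(scores(Aopr(:, t) == 1));   % team's average alliance score
end
O = avgScore / 3;                                  % O(0)
for iter = 1:200                                   % repeat the update described above
    Onew = zeros(nTeams, 1);
    for t = 1:nTeams
        rows = find(Aopr(:, t) == 1);
        partnerSum = Aopr(rows, :) * O - O(t);     % sum of the 2 partners' estimates, per match
        Onew(t) = avgScore(t) - mean(partnerSum);  % = avg score - 2*average(partners' O)
    end
    O = Onew;                                      % O(i+1)
end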

WMPR can be iteratively computed in a similar way.

W(0) = team’s average match winning margin

W(1) = team’s average match winning margin - 2*average ( W(0) for a team’s alliance partners) + 3*average ( W(0) for a team’s opponents ).

W(2) = team’s average match winning margin - 2*average ( W(1) for a team’s alliance partners) + 3*average ( W(1) for a team’s opponents ).
etc. etc.

This sequence of W(i) converges to the WMPR values, so this is just another way of explaining what WMPRs are.
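And the analogous MATLAB sketch for the WMPR iteration, assuming Awmpr has one row per match (+1 for red teams, -1 for blue teams) and bwmpr holds the red-minus-blue winning margins; the sign bookkeeping converts everything to each team's own perspective:

nTeams = size(Awmpr, 2);
avgWM = zeros(nTeams, 1);
for t = 1:nTeams
    rows = find(Awmpr(:, t) ~= 0);
    avgWM(t) = mean(Awmpr(rows, t) .* bwmpr(rows));          % winning margin from t's perspective
end
W = avgWM;                                                   % W(0)
for iter = 1:200
    Wnew = zeros(nTeams, 1);
    for t = 1:nTeams
        rows = find(Awmpr(:, t) ~= 0);
        others = Awmpr(rows, t) .* (Awmpr(rows, :) * W) - W(t);  % partners' W minus opponents' W, per match
        Wnew(t) = avgWM(t) - mean(others);                   % = avg WM - 2*avg(partners) + 3*avg(opponents)
    end
    W = Wnew;                                                % W(i+1)
end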

Currently we’ve mostly been seeing how WMPR does at a small district event with a lot of matches per team (a best-case scenario for these stats). I wanted to see how it would do in a worse case. Here’s how each stat performed at “predicting” the winner of each match in the 2014 Archimedes Division (100 teams, 10 matches/team).

OPR: 85.6%
CCWM: 87.4%
WMPR: 92.2%
EPR: 89.2%

WMPR holds up surprisingly well in this situation and outperforms the other stats. EPR does better than OPR, but worse than WMPR. I don’t really like EPR, as it seems difficult to interpret. The whole idea behind using the winning margin is that the red robots can influence the blue score. Yet EPR also models BS = b1 + b2 + b3, which is counter to this.

The results above were also computed by using the training data as the testing data. Could you also run your method on it to check?

On another note:

I’ve also found that it’s difficult to compare WMPRs across events (whereas OPRs are easy to compare). This is because a match that ends 210-200 looks the same as one that ends 30-20. At very competitive events this becomes a huge problem. Here’s an example from team 33’s 2014 season.

WMPRs at each Event:
MISOU: 78.9
MIMID: 37.0
MITRY: 77.8
MICMP: 29.4
ARCHI: 40.8

Anyone who watched 33 at their second district event would tell you that they didn’t do as well as at their first, and these numbers show that. But these numbers also show that 33 did better at their second event than at the State Championship. This is clearly incorrect: 33 won the State Championship but got knocked out in the semis at their second district event.
You can see pretty clearly that the more competitive events (MSC, Archimedes) result in lower WMPRs, which makes it very difficult to compare this stat across events.

This occurs because the least-norm solution has an average of zero at every event. It treats all events as equal, when they’re not. I propose that instead of having the average be zero, the average should be how many points the average robot scored at that event. (So we should add the average event score / 3 to every team’s WMPR.) This will smooth out the differences between each event. Using this method, here are 33’s new WMPRs.

MISOU: 106.3
MIMID: 71.7
MITRY: 112.7
MICMP: 86.0
ARCHI: 93.5

Now these numbers correctly reflect how 33 did at each event. MIMID has the lowest WMPR, and that’s where 33 did the worst. Their stats at MICMP and ARCHI are now comparable to their district events.
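A minimal MATLAB sketch of that adjustment, assuming the event’s WMPR vector W and a vector of the event’s alliance scores (my names, not from the posted files) are available:

avgRobotScore = mean(allianceScores) / 3;   % average event score per robot
nWMPR = W + avgRobotScore;                  % shift every WMPR by the same constant
% Per the example above, 33's MICMP value goes from 29.4 to 86.0.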

OPR has proliferated because it’s easy to understand (this robot scores X points per match). With this change, WMPR also becomes easier to understand (this robot’s scoring, plus the points it takes away from its opponents, is worth X points per match).

Since this adds the same constant to everybody’s WMPR, it’ll still predict the match winner and margin of victory with the same accuracy.

Thoughts?

ARC 2014 Analysis.xlsx (126 KB)

Hi Ether,

I like this idea of solving all 3 equations simultaneously. WMPR is a good improvement over CCWM because winning margin depends on who you are playing against (except in 2015), but I think EPR is even better. I will adopt it after you guys finish your validation of how well it predicts outcomes.

I like this EPR because it is one number instead of two. It can replace both OPR and WMPR. The problem with OPR is similar to the problem with CCWM: it does not take into account who the opponents were. If you play against stronger opponents, you may not be able to score as many points, especially in those years with limited game pieces. Equation 1 will take care of that and will improve the fit. To me, I would interpret EPR as how many points a team will be able to score with typical opponents on the field. This reduces the error from schedule strength due to luck of the draw. A team may have a higher than normal score because they faced weaker opponents more often, and that would skew the OPR numbers. I think EPR would be more accurate in predicting match scores. Would somebody like to test it out?

Another reason I like EPR is that it is easier to compute without all that SVD stuff. I would prefer high school students to be able to understand and implement this on their own.

Yes, this makes sense if you want to compare results across events. Sounds like a good idea, though perhaps then it needs a different name as it’s not a WM measure? Also, if I continue to find that scaling the WMPRs down does a better job at winning margin prediction, that needs to be done before the average event score/3 is added in.

I’ll try to get to the verification on testing data in the next day or so.

I personally like this normalized WMPR (nWMPR?) better than EPR as the interpretation is cleaner: we’re just trying to predict the winning margin. EPR is trying to predict the individual scores and the winning margin and weighting the residuals all the same. It’s a bit more ad-hoc. On the other hand, one could look into which weightings result in the best overall result in terms of whatever measure of result folks care about.

I am still most interested in how well a metric predicts the winning margin of a match (and in my FTC Android apps I also hope to include an estimate of “probability of victory” from this, which incorporates the expected winning margin and the standard deviation of that expectation along with the assumption of a normally distributed residual). I’m also interested in using these as possible scouting/alliance selection aids (especially for lower picks). But other folks may be interested in using them for other things.
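A minimal sketch of that probability-of-victory estimate in MATLAB, assuming a normally distributed residual; redIdx and blueIdx (my names) are the indices of the two alliances’ teams in the WMPR vector W, and sigma is the residual standard deviation measured on held-out data:

predictedWM = sum(W(redIdx)) - sum(W(blueIdx));              % expected red-minus-blue margin
sigma = 30.4;                                                % e.g. the leave-one-out WMPR Stdev above
pRedWins = 0.5 * (1 + erf(predictedWM / (sigma*sqrt(2))));   % normal-CDF estimate of P(red wins)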

Here’s a generalized perspective.

Let’s say you pick r1, r2, r3, b1, b2, b3 to minimize the following error, summed over all matches:
E(w) = w * [ (R-B) - ( (r1+r2+r3) - (b1+b2+b3) ) ]^2 + (1-w) * [ (R - (r1+r2+r3))^2 + (B - (b1+b2+b3))^2 ]

if w=1, you’re computing the WMPR solution (or any of the set of WMPR solutions with unspecified mean).

if w=0, you’re computing the OPR solution.

if w = 1 − ε for a small ε, you’re computing the nWMPR solution (the relative values will be the WMPRs, but the mean will be selected to minimize the second part of the error, which pulls it toward the mean score in the tournament).

if w=0.5, you’re computing the EPR solution.

I wonder how the various predictions of winning margin, score, and match outcome fare as w goes from 0 to 1?
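One way to sweep this in MATLAB: since E(w) is a weighted sum of squares, scaling each row of the stacked EPR system (sketched earlier in the thread) by the square root of its weight and solving the ordinary least-squares problem minimizes it. Here Aepr and bepr follow the layout of that earlier sketch (the margin row is the first of each group of three); this is only a sketch under those assumptions.

w = 0.5;                                   % 0 = OPR, 0.5 = EPR, 1 = WMPR (least-norm)
Aw = Aepr;  bw = bepr;
marginRows = 1:3:size(Aepr, 1);            % equation-1 rows
scoreRows  = setdiff(1:size(Aepr, 1), marginRows);   % equation-2 and -3 rows
Aw(marginRows, :) = sqrt(w)     * Aw(marginRows, :);
bw(marginRows)    = sqrt(w)     * bw(marginRows);
Aw(scoreRows, :)  = sqrt(1 - w) * Aw(scoreRows, :);
bw(scoreRows)     = sqrt(1 - w) * bw(scoreRows);
xw = pinv(Aw) * bw;                        % minimizes E(w); least-norm if the minimizer isn't unique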

I’d still consider it a WM measure, as it doesn’t only take offense into account (like OPR does). This WMPR tells us how many points this robot will score and how many it’ll take away from the opponents; that sounds like win margin to me, no? I don’t really like the nWMPR name, it’s long and slightly confusing. I think this thread should work out the kinks in this new statistic and call the final product “WMPR”.

In order for this to catch on it should

  1. Be better than OPR at predicting the winner of a match
  2. Be easy to understand
  3. Have a catchy name
  4. Apply very well to all modern FRC games
  5. Be easy to compare across events

I think that by adding in the average score and calling it “WMPR” we accomplish all of those things. 2015 is probably the strangest game we’ve had (and I would think the worst for WMPR), and yet WMPR still works pretty well.

I’m not sure why scaling down gives you better results at predicting the margin. I know you said it decreases the variance of the residuals, but does it also introduce bias? Would you propose a universal scaling factor, or one dependent on the event/game?

You actually don’t need to know anything about singular value decomposition to understand WMPR. It can be explained simply like this:

Ax=b

Where A is who played on what alliance in each match and b is the margin of victory in each match; x is the contribution from each robot to the margin. You’d expect x to be the inverse of A times b, but A is not invertible, so we use the pseudoinverse of A instead.

In Matlab the code is

x = pinv(A)*b

And that’s it, pretty simple.
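To make A and b concrete, here is a toy two-match example with made-up team numbers and scores (not from any real event):

% Columns are six hypothetical teams; one row per match, +1 red / -1 blue.
% Match 1: teams 1,2,3 (red) vs 4,5,6 (blue), score 80-60
% Match 2: teams 1,4,5 (red) vs 2,3,6 (blue), score 70-75
A = [ 1  1  1 -1 -1 -1 ;
      1 -1 -1  1  1 -1 ];
b = [ 80-60 ; 70-75 ];
x = pinv(A) * b;        % least-norm WMPR estimates for the six teams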

I agree with you, though, that the ultimate test would be how it performs in predicting matches. I compared it to WMPR in the 2014 Archimedes division, although that was done using the training data as the testing data, so it’s probably not the best test.