View Full Version : Incorporating Opposing Alliance Information in CCWM Calculations
wgardner
24-05-2015, 18:26
This post is primarily of interest to stat-nerds. If you don't know or care what OPR and CCWM are and how they're computed, you probably want to ignore this thread. :)
---------------------------------------------
It has bothered me for a while that the CCWM calculation does not incorporate any knowledge of which teams are on the opposing alliance, which would seem to be important for a calculation involving the winning margin between the alliances.
The standard calculation is performed as follows:
Let's say that in the first match, the red alliance is teams 1, 2, and 3 and the blue alliance is teams 4, 5, and 6. Let's let the red score in this match be R and the blue score be B. We're trying to find the expected contributions to the winning margins for each team. Let's call these contributions C1, C2, ... for teams 1, 2, ...
The standard CCWM calculation models the winning margin of each match twice (!), first as
R-B = C1 + C2 + C3 (ignoring that teams 4, 5, and 6 are involved!)
and then again as
B-R = C4 + C5 + C6 (ignoring that teams 1, 2, and 3 are involved!)
It finds the least squares solution for the Ci values, or the values of the Ci numbers that minimize the squared prediction error over all of the matches.
This solution in matrix form is solving
(A' A) C = A' M
where (A' M) ends up being the vector of the sum of the winning margins for each team's matches, and
(A' A) is a matrix with diagonal elements equal to the # of matches each team plays and the off diagonal elements equal to the number of times teams i and j were on the same alliance.
Note again that nowhere does this factor in if teams were on opposing alliances (!). If a particular team on the blue alliance always scores 1000 points, that will make the winning margin for the red alliance be awful, and IMHO, that should be taken into account.
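For anyone who wants to follow along in code, here's a minimal NumPy sketch of the standard calculation described above (the match-tuple format and the function name are just my own illustration, not anybody's actual implementation):

import numpy as np

# matches: list of (red_teams, blue_teams, red_score, blue_score)
# teams:   list of team numbers
def standard_ccwm(matches, teams):
    idx = {t: i for i, t in enumerate(teams)}
    rows, margins = [], []
    for red, blue, rs, bs in matches:
        # each match contributes two equations, one per alliance
        r_row = np.zeros(len(teams)); r_row[[idx[t] for t in red]] = 1
        b_row = np.zeros(len(teams)); b_row[[idx[t] for t in blue]] = 1
        rows += [r_row, b_row]
        margins += [rs - bs, bs - rs]
    A = np.array(rows)
    m = np.array(margins)
    # least-squares solution of (A'A) C = A'm
    C, *_ = np.linalg.lstsq(A, m, rcond=None)
    return dict(zip(teams, C))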
So, here's my proposal.
Instead of modeling each match outcome twice as above, do it only once as follows:
R-B = (C1 + C2 + C3) - (C4 + C5 + C6)
(the left set is all the teams on the red alliance and the right set is the blue teams).
Now, we're factoring in both your alliance partners' abilities AND your opponent's abilities.
If you go through the entire derivation, you end up with a similar set of equations, but the new A matrix has a 1 in the i,jth spot if the jth team was on the red alliance in match i, a -1 if the jth team was on the blue alliance in match i, and 0 otherwise.
The solution has the same format, i.e. solving the following formula
(A' A) C = A' M
(A' M) ends up being exactly the same as before even though the A and M are a little different: (A' M) is just the sum of the winning margins for each team's matches.
But now (A' A) is a little different. The diagonal elements are the same, but the off diagonal elements are equal to the number of times teams i and j are on the same alliance minus the number of times they're on opposing alliances (!). So now opposing alliance contributions are included.
One oddity emerges from this formulation: the new (A' A) is not invertible (!). This is because if you add any constant to all of the teams' contributions, the winning margins are the same. For example, if you think the red teams contributed 10, 20, and 30 points and the blue teams contributed 40, 50, and 60 points, you'd get exactly the same winning margins if the teams' contributions were 110, 120, 130 and 140, 150, 160, or even 1010, 1020, 1030 and 1040, 1050, 1060.
But the easy way around this is to just find the minimum norm solution (one of the many solutions) using, say, the singular value decomposition (SVD), and then subtract off the mean from all of the values. The resulting combined contributions to winning margin values represent how much a team will contribute to its winning margin compared to the average team's contribution (which will be 0, of course).
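And here's the same kind of sketch for the proposed calculation, again with made-up names (np.linalg.pinv is the SVD-based Moore-Penrose pseudo-inverse, so it gives the minimum-norm least-squares solution directly):

import numpy as np

def proposed_ccwm(matches, teams):
    idx = {t: i for i, t in enumerate(teams)}
    A = np.zeros((len(matches), len(teams)))
    m = np.zeros(len(matches))
    for k, (red, blue, rs, bs) in enumerate(matches):
        A[k, [idx[t] for t in red]] = 1     # +1 for red alliance members
        A[k, [idx[t] for t in blue]] = -1   # -1 for blue alliance members
        m[k] = rs - bs                      # one winning-margin equation per match
    C = np.linalg.pinv(A) @ m               # minimum-norm solution via the SVD
    C -= C.mean()   # essentially a no-op for the minimum-norm solution, kept to match the text
    return dict(zip(teams, C))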
Thoughts? This seems like an improvement to me, but I'd be curious to hear what other stat-nerds like me have to say on the matter. And if somebody else has already looked into all of this, accept my apologies and please help educate me. :)
RyanCahoon
25-05-2015, 01:24
I'd love to see somebody actually quantitatively compare the predictive power of all of these different metrics across the various games. For a given year, take the games from the first half of every competition's qualifying rounds, compute a stat for every team and measure its ability to predict the outcome of the second half of the qualifying matches. Each of the last 4 years should give a sample size of ~7,500 matches and ~2,500 teams. EDIT: these counts are for FRC. An analysis for FTC would be interesting as well.
And also look at the opposing alliance prior record to get to this match. How did they get their OPR?
Wait, Dean is a big data guy and he wants us to dig into all the past matches? This entire robot thing is just a ruse to get interested in Big Data?
((I worked at one time for an MLB stat company, and all of the stats are important. Weather, prior events, crowd sizes, etc. We have just touched the surface of stat possibilities))
Caleb Sykes
25-05-2015, 16:26
And also look at the opposing alliance prior record to get to this match. How did they get their OPR?
I'm having trouble understanding this sentence. Could you please clarify?
EricDrost
25-05-2015, 17:01
I'd love to see somebody actually quantitatively compare the predictive power of all of these different metrics across the various games. For a given year, take the games from the first half of every competition's qualifying rounds, compute a stat for every team and measure its ability to predict the outcome of the second half of the qualifying matches.
After week 1 this year, normal OPR was the best (non-manual) score predictor. That's what I used throughout the season, though by CMP a different metric could have been better.
More useful than predicting the outcome of the second HALF of the matches is taking the Day 1 matches and seeing how they predict the Day 2 matches. Day 1 is more than half the matches, so the larger sample size alone should make it a better predictor too.
I'm having trouble understanding this sentence. Could you please clarify?
I believe he's talking about using an Opponent's WLT to determine how much defense (or in 2015, can starvation, noodles in landfill, etc) was encountered in the matches a team's OPR was based off of.
I don't suspect defensive ability and WLT have a very strong correlation though. I'd like to see that correlation proved before I try to "normalize" a team's OPR with this metric.
Thoughts? This seems like an improvement to me, but I'd be curious to hear what other stat-nerds like me have to say on the matter. And if somebody else has already looked into all of this, accept my apologies and please help educate me. :)
Won't you simply end up with OPR? OPR does R = C1 + C2 + C3 and B = C4 + C5 + C6. Your suggestion subtracts the second equation from the first, so it's really the same. Or will the solution be different because of the different error term being minimised?
wgardner
25-05-2015, 19:04
Won't you simply end up with OPR? OPR does R = C1 + C2 + C3 and B = C4 + C5 + C6. Your suggestion subtracts the second equation from the first, so it's really the same. Or will the solution be different because of the different error term being minimised?
I don't think so. Minimizing (R - (C1+C2+C3))^2 + (B - (C4+C5+C6))^2 is different from minimizing ((R-B) - ((C1+C2+C3)-(C4+C5+C6)))^2
Another way to look at it: Say that all Team 2 does is play defense against the opposing alliance and reduce its score by 25 points every time it plays. C2 should be 25 (minus any mean term). But if you only look at C1+C2+C3 as a way to predict R, C2 will look like zero because C2 doesn't affect R. But C2 does affect (R-B) by making B smaller by 25 points, so the new metric should be able to capture this effect.
So, here's my proposal.
Instead of modeling each match outcome twice as above, do it only once as follows:
R-B = (C1 + C2 + C3) - (C4 + C5 + C6)
I dub this metric GPR :-)
I ran the numbers for 2015 MISJO (40 teams, 80 qual matches, no DQs, no surrogates).
I dub this metric GPR :-)
I ran the numbers for 2015 MISJO (40 teams, 80 qual matches, no DQs, no surrogates).
EDIT2: Ignore this, it uses incorrect numbers.
I checked out how these stats relate to the match results.
Your numbers correctly predicted* the outcome of 66% of matches, while OPR and CCWM both predicted the correct winner 84% of the time.
It makes sense that this stat doesn't work for a game where the other alliance can't affect your score. Can you run the numbers for a 2014 event so we can see if it's better with that?
*I don't like these sorts of "predictions" because they occur with numbers obtained after the fact. Could you also run numbers for the first ~60 qual matches and then we'll see how they do on the next 20?
EDIT: Looking through the numbers a little more, I can see that this new stat gives radically different evaluations to a few teams than OPR and CCWM. Look at these select teams:
Team GCCWM OPR CCWM
3688 -22.0 44.7 23.5
2474 -2.3 54.2 21.8
1940 8.4 5.4 -22.5
The first two are very undervalued by GCCWM while the last one is very overvalued. These aren't the only egregious differences.
Here are the correlation coefficients for each pair of metrics:
OPR-CCWM: 0.82
GCCWM-CCWM: 0.39
GCCWM-OPR: 0.35
But the easy way around this is to just find the minimum norm solution (one of the many solutions) using, say, the singular value decomposition (SVD), and then subtract off the mean from all of the values. The resulting combined contributions to winning margin values represent how much a team will contribute to its winning margin compared to the average team's contribution (which will be 0, of course).
Could you explain a bit more how SVD will help you find the minimum norm solution? Unfortunately I'm only familiar with SVD in terms of geometric transformations.
saikiranra
26-05-2015, 01:55
Attached is the GPR calculated for all 8 championship divisions this year, with OPR, CCWM, and DPR also given for reference (I did not take into account surrogate matches and such). I can generate a new one of these for any event that has data on TBA.
Could you explain a bit more how SVD will help you find the minimum norm solution? Unfortunately I'm only familiar with SVD in terms of geometric transformations.
I believe you can use the SVD to find the pseudo-inverse matrix using the Moore-Penrose (http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse) method. I used a Moore-Penrose pseudo-inverse method in the Python NumPy library, although I believe that the general solve method in NumPy does something similar. I'm also new to this entire statistics world, so I'm sure one of the gurus here can fill us in. :)
wgardner
26-05-2015, 06:25
Long post to come later today with more detailed thoughts and examples (I hope), but some quick initial thoughts are:
1. Please drop the "G" from everything. :ahh: Since I can't think of anything better, may I suggest we call it CCWM with Opposing Alliances, or CCWMOA?
2. As others noted, CCWM and CCWMOA aren't well suited to the 2015 FRC game because there's virtually no defense and thus virtually no benefit to looking at winning margins over just final offensive scores. Can we look instead at 2014, which had a lot of defense?
3. I'm wondering if the method may be doomed by insufficient data. With CCWMOA, we only get 1 data point per match, while OPR and CCWM get 2 data points per match. In Ether's example tournament, there were 40 teams and 80 matches, so CCWMOA is trying to use 80 noisy data points to find 40 values, while OPR and CCWM are trying to use 160 noisy data points to find 40 values. Comparing CCWM and CCWMOA, I argue that CCWM's data values are noisier for the reasons I gave in my first post, but maybe fitting 160 noisier data points still gives a better result than fitting only 80 cleaner ones?
This is like trying to find the slope of a line that you know goes through the origin using 2 noisy points vs. 4 noisier points. Which one is better will depend on the ratio of the noise levels. I hope to think about this more and comment further with some data to back it up, but I'd be curious to hear the thoughts of other folks too.
Thanks everybody for the discussions!
wgardner
26-05-2015, 06:45
I believe you can use the SVD to find the pseudo-inverse matrix using the Moore-Penrose (http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse) method. I used a Moore-Penrose pseudo-inverse method in the Python NumPy library, although I believe that the general solve method in NumPy does something similar. I'm also new to this entire statistics world, so I'm sure one of the gurus here can fill us in. :)
Yeah, that. For a real positive semi-definite symmetric matrix (like A'A for any A), the SVD is U D U' where U is orthogonal and D is diagonal. In our case, A'A is not full rank, so the last diagonal value of D is 0. Using the method in the link above, the pseudo-inverse is computed as U E U' where E is diagonal with elements Ei = 1/Di except where Di = 0, in which case Ei = 0 too. This makes the product of the matrix and its pseudo-inverse equal to U F U' where F is diagonal with Fi = 1 if Di is non-zero and Fi = 0 if Di = 0.
So if the original matrix is full rank, the pseudo-inverse is just the regular inverse and the product of the matrix and its pseudo-inverse is U U' = I. If the original matrix is not full rank, multiplying by the pseudo-inverse basically zeros out the component of the vector along the null direction and applies the inverse on the remaining subspace. Equivalently, the product of the matrix and its pseudo-inverse is U U' except that the column of U corresponding to Di = 0 is replaced with a zero vector.
In this case, the null direction (the column of U corresponding to the zero Di) is proportional to [1 1 1 ... 1], i.e. the direction that picks out the mean, because the mean of the C vector is the one direction that cannot be determined.
One other formulation for CCWMOA would just be:
if we have T teams, have T-1 unknown values C1, C2, ..., C(T-1) and set CT = -Sum(C1, C2,... C(T-1)) in all of the equations (thus enforcing that all T values of Ci are zero mean). Then we only have T-1 equations with T-1 unknowns and everything is full rank. This is just another way of saying we want to find the values of C1, C2, ... CT that minimize the prediction error subject to the constraint that the resulting set of Ci values have zero mean.
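Here's a sketch of that reduced formulation, assuming A is the signed match matrix (+1 red, -1 blue, 0 otherwise) and m is the vector of winning margins from earlier in the thread:

import numpy as np

def wmpr_reduced(A, m):
    T = A.shape[1]
    # substitute CT = -(C1 + ... + C(T-1)) into every equation:
    # subtracting column T from each of the first T-1 columns does exactly that
    A_red = A[:, :T-1] - A[:, [T-1]]
    C_head, *_ = np.linalg.lstsq(A_red, m, rcond=None)
    C = np.append(C_head, -C_head.sum())   # recover CT from the zero-mean constraint
    return C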
wgardner
26-05-2015, 07:12
One more interesting tid-bit:
For the 2015 FRC game, we would expect the underlying OPR and CCWM/CCWMOA values to be identical except for a mean offset, since a team's only way to contribute to the winning margin is through offense. The fact that these numbers deviate substantially (or that DPR varies as much as it does) shows that we aren't close to having enough data to get precise estimates of the underlying parameters.
Edit: This may not be entirely true. Litter and the initial race for the center cans can both cause one alliance to adversely impact the other alliance's score, so it's not 100% true to say that the only way to contribute to the winning margin in the 2015 FRC game was through increasing your own alliance's score.
Yeah, that. For a real positive semi-definite symmetric matrix (like A'A for any A), the SVD is something like U D U' where U is orthogonal and D is diagonal. In our case, A'A is not full rank, so the last diagonal value of D is 0.
With rounding error, it will not be exactly zero. For the 2015 MISJO event, it is 3.43E-16. So your code needs to use a threshold.
Using the method in the link above, the pseudo-inverse is computed as U E U' where E is diagonal with elements Ei = 1/Di except where Di=0
use abs(Di)<threshold instead of Di==0
One other formulation for CCWMOA
We need a better acronym. That's too awkward.
... would just be:
if we have T teams, have T-1 unknown values C1, C2, ..., C(T-1) and set CT = -Sum(C1, C2,... C(T-1)) in all of the equations (thus enforcing that all T values of Ci are zero mean). Then we only have T-1 equations with T-1 unknowns and everything is full rank. This is just another way of saying we want to find the values of C1, C2, ... CT that minimize the prediction error subject to the constraint that the resulting set of Ci values have zero mean.
I ran the numbers for all 117 events in 2015 and found that the following two computational methods yield virtually identical results for min L2 norm of b-Ax:
Method 1
1a) [U,S,V] = svd(A)
1b) Replace the diagonal elements of S with their reciprocals, except when abs(Sjj)<threshold (I used 1e-4 for threshold), in which case make Sjj zero.
1c) compute x = V*S*(U'*b)
Method 2
2a) N = A'*A
2b) d= A'*b
2c) compute x = N\d ..... (Octave mldivide notation)
2d) compute m = mean(x)
2e) subtract m from each element of x
Notice Method 1 factors A, not A'A, resulting in less rounding error.
wgardner
26-05-2015, 09:20
Thanks Ether!
I'd love to see the residual of the predictions of the winning margins using OPR, CCWM, and whatever you want to call the new thing (how about WMPR, if you don't like CCWMOA)? It would be interesting to see the average squared winning margin prediction residual and the distribution of the prediction residual (like you did with your L1 v. L2 comparison) for both 2015 FRC tournaments (where defense was essentially non-existent) and 2014 FRC tournaments (where defense mattered more).
It might also be interesting to see if tournaments with lots of matches per team are different from tournaments with few matches per team.
I'm puzzled by AGPapa's finding that the match outcomes (such as they were in 2015) are not predicted as well with the new measure. While minimizing the prediction error in the winning margin isn't the same as predicting the match outcomes, I'd expect the match outcome results to be fairly similar. Thoughts? (BTW, I haven't verified AGPapa's finding, so I suppose there's a chance that there's a bug in the code he used to predict the match outcomes?)
[Edit: AGPapa later found an error with his initially reported results.]
And if you had a lot of time and/or processing power on your hands, I'd also love to see how well the winning margins are predicted for matches that aren't in the training data. Given that we're so low on data, I'm reluctant to suggest the "model with the first 1/2 of the data, then test with the second 1/2 of the data" proposals as we may not have enough data to get a reliable model as it is. Instead, I'd suggest the "model with all of the data except for match 1, then test with match 1, then remodel with all of the data except match 2, then test on match 2, etc." approach as then the data size is almost the same but you're testing on data that's not in the training set.
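For concreteness, here's a rough sketch of that leave-one-out procedure for the new metric (the same loop works for OPR and CCWM with their own A and b; the names are mine):

import numpy as np

# A: signed match matrix, m: winning margins
def leave_one_out_residuals(A, m):
    n = A.shape[0]
    res = np.zeros(n)
    for k in range(n):
        keep = np.arange(n) != k
        C = np.linalg.pinv(A[keep]) @ m[keep]   # refit without match k
        res[k] = m[k] - A[k] @ C                # prediction error on the held-out match
    return res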
I'd be happy to do this in scilab too, especially if you could get some 2014 tournament data in your nice formats.
BTW, I computed the new metric this morning using the data from the MISJO tournament you provided and got the same results for the new measures (using the new A and b that you provided), so that confirms that we're talking about the same thing. :)
wgardner
26-05-2015, 10:24
The following results are from my scilab sim for the 2015 MISJO tournament. Again, this should be a BAD tournament year for both CCWM and WMPR as there was little defense involved.
For OPR, the winning margin was predicted by computing the prediction for the offensive score of the red alliance and subtracting the prediction for the offensive score of the blue alliance from it.
For CCWM, the winning margin was predicted by computing the prediction of the winning margin of the red alliance and subtracting the prediction for the winning margin of the blue alliance from it.
For WMPR, the winning margin was computed the same way as in CCWM, but using the values from the WMPR derivation instead of the CCWM derivation.
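In other words, all three predictions use the same formula; only the per-team values differ. A tiny sketch of what I mean (names are just illustrative):

# rating: dict mapping team number -> OPR, CCWM, or WMPR value
def predicted_margin(red_teams, blue_teams, rating):
    return sum(rating[t] for t in red_teams) - sum(rating[t] for t in blue_teams)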
Standard deviations of the prediction residuals of the winning margins:
OPR: 25.6
CCWM: 21.1
WMPR: 15.9
(interesting that CCWM and WMPR both do better than OPR, even in a game with "no defense." Perhaps the race to get the center cans acts like defense in that a team that does well at that may cause the opposing alliance to have a lower score? Or litter?)
The tournament had 80 matches but one match appeared to be a tie, so there were only 79 matches where the match outcome could be predicted.
# of match outcomes predicted correctly:
OPR: 67
CCWM: 66
WMPR: 68
(This is all on the training data (!). I'm not using data not in the training set yet.)
# of match outcomes predicted correctly:
OPR: 67
CCWM: 66
WMPR: 68
Can you please attach match-by-match predictions?
I'm getting:
OPR: 67
CCWM: 66
WMPR: 53
Are you using the same numbers in the MISJO_GPR.CSV file that Ether attached? A difference in the numbers we're using seems to be the only explanation, since our OPR and CCWM predictions match up.
In the previously attached spreadsheet I erroneously awarded the blue alliance a victory in match 69; it should have been a tie.
wgardner
26-05-2015, 10:43
Can you please attach match-by-match predictions?
I'm getting:
OPR: 67
CCWM: 66
WMPR: 53
Are you using the same numbers in the MISJO_GPR.CSV file that Ether attached?
Attached.
Col1 is WMPR based WM prediction.
Col2 is CCWM based WM prediction.
Col3 is OPR based WM prediction.
Col4 is actual match WM.
I'm using my sim to compute the WMPR values, which I earlier verified matched Ether's values (at least the min and max were identical).
Attached.
Col1 is WMPR based WM prediction.
Col2 is CCWM based WM prediction.
Col3 is OPR based WM prediction.
Col4 is actual match WM.
I'm using my sim to compute the WMPR values, which I earlier verified matched Ether's values (at least the min and max were identical).
Thanks, it turns out that we were using different WMPR values. I redownloaded Ether's attachment and it contains different values. Maybe the initial download was corrupted? I'm baffled. Anyway, I can confirm that after redownloading the correct WMPR values I get your results.
The following results are from my scilab sim for the 2015 MISJO tournament
Attached are WMPR A and b for all 117 2015 events.
OPR A and b are posted here (http://www.chiefdelphi.com/media/papers/download/4461).
wgardner
26-05-2015, 11:17
And now for the data where the single testing point was removed from the training data, then the model was computed, then the single testing point was evaluated, and this was repeated 80 times. So the results below are for separate training and testing data.
Stdev of prediction residual of the winning margins:
OPR: 34.6
CCWM: 30.5
WMPR: 30.4
(note that when the training data was reused as testing data a few posts ago, WMPR had a Stdev of 15.9, so this is an argument that WMPR is "overfitting" the small amount of data available and that it could benefit from having more matches per team)
# of matches predicted correctly (out of 79 possible)
OPR: 63
CCWM: 55
WMPR: 58
So here, the WM-based measures are both better at predicting the winning margin itself but not at predicting match outcomes. CCWM and WMPR have almost identical prediction residual standard deviations but WMPR is slightly better at match outcome prediction in this particular example for some reason.
Again, it would be great to test this on some 2014 data where there was more defense.
Again, it would be great to test this on some 2014 data where there was more defense.
When I get a chance I'll write a script for TBA API to grab their 2014 data and convert it to CSV. It may be a while.
Otherwise, if someone can provide the 2014 qual match data in CSV format, I can quickly generate all the A and b for WMPR, OPR, and CCWM and post it here.
I'm having trouble understanding this sentence. Could you please clarify?
Sorry, I was trying to factor the "strength of schedule" of the matches into what you are doing. Lots of us have kicked the dirt going "Yes, 1640 is at the top, but they played all their matches with three-digit-team alliances against all-rookie alliances."
But I spent an hour and did some "what if" runs through the data, and the effect is pretty small. While low-scoring opposing alliances do make a difference, by about match 8, 9, or 10 things swing the other way. So while we all hate the "random" selections, it seems to work out in the end.
I only did a small segment; with the full season available, feel free to run your own numbers.
I ran the numbers on the 2014 St. Joseph event. I checked that my calculations for 2015 St. Joseph match Ether's, so I'm fairly confident that everything is correct.
Here's how each stat did at "predicting" the winner of each match.
OPR: 87.2%
CCWM: 83.3%
WMPR: 91.0%
I've attached my analysis, WMPR values, A and b matrices, along with the qual schedules for both the 2014 and 2015 St. Joe event.
saikiranra
26-05-2015, 16:23
When I get a chance I'll write a script for TBA API to grab their 2014 data and convert it to CSV. It may be a while.
Otherwise, if someone can provide the 2014 qual match data in CSV format, I can quickly generate all the A and b for WMPR, OPR, and CCWM and post it here.
Attached are the A, b, and t CSV files from all official events from 2014 (except for 2014waell, because some match scores are missing). The only difference, other than file name, is that the matches are not in sequential order within the A and b files, although they still correspond. The t files were generated by team number and the matches are in whatever order TBA decides to publish their JSON data (which I think is match number, alphabetically sorted).
Also attached are CSV files that have the OPR, CCWM, and GPR (if we are still calling it that) of each team at each event.
I can easily generate these files for any year if anyone would like.
...the following two computational methods yield virtually identical results for min L2 norm of b-Ax:
Method 1
1a) [U,S,V] = svd(A)
1b) Replace the diagonal elements of S with their reciprocals, except when abs(Sjj)<threshold (I used 1e-4 for threshold), in which case make Sjj zero.
1c) compute x = V*S*(U'*b)
Method 2
2a) N = A'*A
2b) d= A'*b
2c) compute x = N\d ..... (Octave mldivide notation)
2d) compute m = mean(x)
2e) subtract m from each element of x
Notice Method 1 factors A, not A'A, resulting in less rounding error.
There's a simpler way to do Method#1 above if you are using Matlab or Octave (hat tip to AGPapa):
x = pinv(A,tol)*b;
pinv() is explained in detail here:
http://www.mathworks.com/help/matlab/ref/pinv.html
(well worth reading; explains the interesting difference between x1=pinv(A)*b and x2=A\b)
wgardner
26-05-2015, 17:26
Attached are the A, b, and t CSV files from all official events from 2014 (except for 2014waell, because some match scores are missing). The only difference, other than file name, is that the matches are not in sequential order within the A and b files, although they still correspond. The t files were generated by team number and the matches are in whatever order TBA decides to publish their JSON data (which I think is match number, alphabetically sorted).
Thanks!
The files appear to have the "new" A and "new" b. Could you by any chance also generate the "old" A and "old" b? I can generate the old A from the new A by stripping out the +1s and -1s from each row, but I can't generate the old b from the new b since the new b only has match winning margins and I can't get red and blue scores from that. Having both the old and new versions allows for quick and direct comparisons between OPR (which requires the old b), and CCWM and WMPR (which require the new b that can also be derived from the old one).
Maybe we can just call the old A and b by their names, and the new A and b something like Awm and bwm?
Also attached are CSV files that have the OPR, CCWM, and GPR (if we are still calling it that) of each team at each event.
It looks like the proposal of calling it WMPR instead of GPR may be sticking. As the "G" in GPR, I would prefer to not have the metric be name-based. :) (Though I did work at Qualcomm for a long time and saw what "The Viterbi Algorithm" did for Dr. Viterbi's fame, I doubt there's as much money in obscure robotics statistical computations.)
wgardner
26-05-2015, 19:40
Stdev of prediction residual of the winning margins:
OPR: 34.6
CCWM: 30.5
WMPR: 30.4
(note that when the training data was reused as testing data a few posts ago, WMPR had a Stdev of 15.9, so this is an argument that WMPR is "overfitting" the small amount of data available and that it could benefit from having more matches per team)
For data that is "overfit" you can sometimes improve the prediction performance on the testing data by simply scaling down the solution.
For fun, I computed the standard deviation of the prediction residual of the testing data not in the training set using the WMPR solution, 0.9*WMPR, 0.8*WMPR, etc. The standard deviation of the prediction residual of the winning margin for the test data for this particular tournament was minimized by 0.7*WMPR, and that standard deviation was down to 28.4 from 30.4 for the unscaled WMPR. So again, more evidence that the WMPR is overfit and could benefit from additional data.
This doesn't change the match outcome prediction that some folks are interested in, since scaling all of the WMPRs down doesn't change the sign of the predicted winning margin which is all the match outcome prediction is.
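Here's roughly how I'd sweep the scale factor, assuming loo_pred is the vector of held-out winning-margin predictions from the leave-one-out procedure above and m is the vector of actual margins (names are mine):

import numpy as np

def best_shrinkage(loo_pred, m, weights=np.arange(0.5, 1.01, 0.1)):
    stds = [np.std(m - w * loo_pred) for w in weights]
    best = int(np.argmin(stds))
    return weights[best], stds[best]   # best scale factor and its residual stdev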
Attached are A, b, and T for 2015 OPR, CCWM, and WMPR.
Also introducing a funky new metric, EPR, which uses 3 simultaneous equations for each match:
1) r1+r2+r3-b1-b2-b3 = RS-BS
2) r1+r2+r3=RS
3) b1+b2+b3=BS
... and solves all the equations simultaneously.
wgardner
26-05-2015, 20:59
Attached are A, b, and T for 2015 OPR, CCWM, and WMPR.
Also introducing a funky new metric, EPR, which uses 3 simultaneous equations for each match:
1) r1+r2+r3-b1-b2-b3 = RS-BS
2) r1+r2+r3=RS
3) b1+b2+b3=BS
... and solves all the equations simultaneously.
:) How do you interpret these new EPR values? They look closer to an OPR than a WM-measure as #2 and #3 both only factor in offense. How will you measure its performance? Perhaps by comparing the overall residual of all 3 combined vs. other ways of predicting all 3 (e.g., using WMPR for #1 and standard OPR for #2 and #3)?
How do you interpret these new EPR values?
I just tossed it out there for fun. Since equation 1 is a linear combination of equations 2 and 3, I question its usefulness.
How will you measure its performance?
Compute the residuals of actual_alliance_scores minus alliance_scores_computed_using_xEPR:
residuals = b1-Aopr*xEPR
...where b1 is column1 of the provided bopr.
wgardner
27-05-2015, 07:27
I ran the numbers on the 2014 St. Joseph event. I checked that my calculations for 2015 St. Joseph match Ether's, so I'm fairly confident that everything is correct.
Here's how each stat did at "predicting" the winner of each match.
OPR: 87.2%
CCWM: 83.3%
WMPR: 91.0%
AGPapa's results are using the full data as training data and then reusing it as testing data.
On the same data doing the "remove one match from the training data, model based on the rest of the data, use the removed match as testing data, and repeat the process for all matches" method, I got the following results:
Stdev of winning margin prediction residual
OPR : 63.8
CCWM: 72.8
WMPR: 66.3
When I looked at scaling down each of the metrics to improve their prediction performance on testing data not in the training set, the best Stdevs I get for each were:
OPR*0.9: 63.3
CCWM*0.6: 66.2
WMPR*0.7: 60.8
Match prediction outcomes
OPR : 60 of 78 (76.9 %)
CCWM: 57 of 78 (73.1 %)
WMPR: 62 of 78 (79.5 %)
Yeah! Even with testing data not used in the training set, WMPR seems to be outperforming CCWM in predicting the winning margins and the match outcomes in this single 2014 tournament (which again is a game with substantial defense). I'm hoping to get the match results (b with red and blue scores separately) for other 2014 tournaments to see if this is a general result.
[Edit: found a bug in the OPR code. Fixed it. Updated comments. Also included the scaled down OPR, CCWM, and WMPR prediction residuals to address overfitting.]
I'm hoping to get the match results (b with red and blue scores separately) for other 2014 tournaments to see if this is a general result.
I wrote a script last night to download all the 2014 match results from TBA and generate Aopr, Awmpr, bopr, bccwm, and bwmpr for all the 2014 qual events. I'll post them here later this morning.
I wrote a script last night to download all the 2014 match results from TBA and generate Aopr, Awmpr, bopr, bccwm, and bwmpr for all the 2014 qual events. I'll post them here later this morning.
Make sure to read the README file.
wgardner
27-05-2015, 09:04
Iterative Interpretations of OPR and WMPR
(I found this interesting: some other folks might or some other folks might not. :) )
Say you want to estimate a team's offensive contribution to their alliance scores.
A simple approach is just to compute the team's average match score/3. Let's call this estimate O(0), a vector of the average match score/3 for all teams at step 0. (/3 because there are 3 teams per alliance. This would be /2 for FTC.)
But then you want to take into account the fact that a team's alliance partners may be better or worse than average. The best estimate you have of the contribution of a team's partners at this point is the average of their O(0) estimates.
So let the improved estimate be
O(1) = team's average match score - 2*average ( O(0) for a team's alliance partners).
(2*average because there are 2 partners contributing per match. This would be 1*average for FTC.)
This is better, but now we have an improved estimate for all teams, so we can just iterate this:
O(2) = team's average match score - 2*average ( O(1) for a team's alliance partners).
O(3) = team's average match score - 2*average ( O(2) for a team's alliance partners).
etc. etc.
This sequence of O(i) converges to the OPR values, so this is just another way of explaining what OPRs are.
WMPR can be iteratively computed in a similar way.
W(0) = team's average match winning margin
W(1) = team's average match winning margin - 2*average ( W(0) for a team's alliance partners) + 3*average ( W(0) for a team's opponents ).
W(2) = team's average match winning margin - 2*average ( W(1) for a team's alliance partners) + 3*average ( W(1) for a team's opponents ).
etc. etc.
This sequence of W(i) converges to the WMPR values, so this is just another way of explaining what WMPRs are.
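Here's a direct transcription of the OPR version of this iteration into NumPy (convergence is as claimed above; this sketch just runs a fixed number of passes, and the WMPR version only differs by the extra +3*average-of-opponents term):

import numpy as np

# matches: list of (red_teams, blue_teams, red_score, blue_score)
def iterative_opr(matches, teams, n_iter=50):
    apps = {t: [] for t in teams}   # (own alliance score, partners) for each appearance
    for red, blue, rs, bs in matches:
        for t in red:
            apps[t].append((rs, [p for p in red if p != t]))
        for t in blue:
            apps[t].append((bs, [p for p in blue if p != t]))
    # O(0): average alliance score / 3
    O = {t: np.mean([s for s, _ in apps[t]]) / 3 for t in teams}
    for _ in range(n_iter):
        # O(i+1) = average over matches of (alliance score - partners' current estimates)
        O = {t: np.mean([s - sum(O[p] for p in ps) for s, ps in apps[t]]) for t in teams}
    return O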
Currently we've mostly been seeing how WMPR does at a small district event with a lot of matches per team (a best-case scenario for these stats). I wanted to see how it would do in a worse case. Here's how each stat performed at "predicting" the winner of each match in the 2014 Archimedes Division (100 teams, 10 matches/team).
OPR: 85.6%
CCWM: 87.4%
WMPR: 92.2%
EPR: 89.2%
WMPR holds up surprisingly well in this situation and outperforms the other stats. EPR does better than OPR, but worse than WMPR. I don't really like EPR, as it seems difficult to interpret. The whole idea behind using the winning margin is that the red robots can influence the blue score. Yet EPR also models BS = b1 + b2 + b3, which is counter to this.
AGPapa's results are using the full data as training data and then reusing it as testing data.
On the same data doing the "remove one match from the training data, model based on the rest of the data, use the removed match as testing data, and repeat the process for all matches" method, I got the following results:
The data above is also done by using the training data as the testing data. Could you also run your method on it to check?
On another note:
I've also found that it's difficult to compare WMPRs across events (whereas OPRs are easy to compare). This is because a match that ends 210-200 looks the same as one that ends 30-20. At very competitive events this becomes a huge problem. Here's an example from team 33's 2014 season.
WMPRs at each Event:
MISOU: 78.9
MIMID: 37.0
MITRY: 77.8
MICMP: 29.4
ARCHI: 40.8
Anyone who watched 33 at their second district event would tell you that they didn't do as well as their first, and these numbers show that. But these numbers also show that 33 did better at their second event than at the State Championship. This is clearly incorrect, 33 won the State Championship but got knocked out in the semis at their second district event.
You can see pretty clearly that the more competitive events (MSC, Archimedes) result in lower WMPRs, which makes it very difficult to compare this stat across events.
This occurs because the least-norm solution has an average of zero for every event. It treats all events as equal, when they're not. I propose that instead of having the average be zero, the average should be how many points the average robot scored at that event (so we should add the average event score / 3 to every team's WMPR). This will smooth out the differences between each event. Using this method, here are 33's new WMPRs.
MISOU: 106.3
MIMID: 71.7
MITRY: 112.7
MICMP: 86.0
ARCHI: 93.5
Now these numbers correctly reflect how 33 did at each event. MIMID has the lowest WMPR, and that's where 33 did the worst. Their stats at MICMP and ARCHI are now comparable to their district events.
OPR has proliferated because it's easy to understand (this robot scores X points per match). With this change, WMPR also becomes easier to understand (this robot's combined scoring and defense is worth X points per match).
Since this adds the same constant to everybody's WMPR, it'll still predict the match winner and margin of victory with the same accuracy.
Thoughts?
Attached are A, b, and T for 2015 OPR, CCWM, and WMPR.
Also introducing a funky new metric, EPR, which uses 3 simultaneous equations for each match:
1) r1+r2+r3-b1-b2-b3 = RS-BS
2) r1+r2+r3=RS
3) b1+b2+b3=BS
... and solves all the equations simultaneously.
Hi Ether,
I like this idea of solving all 3 equations simultaneously. WMPR is a good improvement over CCWM because winning margin depends on who you are playing against (except in 2015), but I think EPR is even better. I will adopt it after you guys finish your validation on how well it predicts outcome.
I like this EPR because it is one number instead of two. It can replace both OPR and WMPR. The problem with OPR is similar to the problem with CCWM. It does not take into account who the opponents were. If you play against stronger opponents, you may not be able to score as many points, especially in those years with limited game pieces. Equation 1 will take care of that. It will improve on the line fitting. To me, I would interpret EPR as how many points a team will be able to score with typical opponents on the field. This eliminates the error of match schedule strength due to luck of the draw. A team may have a higher than normal score because they faced weaker opponents more often. That would skew the OPR numbers. I think EPR would be more accurate in predicting match scores. Would somebody like to test it out?
Another reason I like EPR is that it is easier to compute without all that SVD stuff. I would prefer high school students to be able to understand and implement this on their own.
wgardner
27-05-2015, 12:36
I propose that instead of having the average be zero, the average should be how many points the average robot scored at that event. (So we should add the average event score / 3 to every team's WMPR). This will smooth out the differences between each event.
<snip>
Thoughts?
Yes, this makes sense if you want to compare results across events. Sounds like a good idea, though perhaps then it needs a different name as it's not a WM measure? Also, if I continue to find that scaling the WMPRs down does a better job at winning margin prediction, that needs to be done before the average event score/3 is added in.
I'll try to get to the verification on testing data in the next day or so.
I personally like this normalized WMPR (nWMPR?) better than EPR as the interpretation is cleaner: we're just trying to predict the winning margin. EPR is trying to predict the individual scores and the winning margin and weighting the residuals all the same. It's a bit more ad-hoc. On the other hand, one could look into which weightings result in the best overall result in terms of whatever measure of result folks care about.
I still am most interested in how well a metric predicts the winning margin of a match (and in my FTC android apps I also hope to include an estimate of "probability of victory" from this which incorporates the expected winning margin and the standard deviation of that expectation along with the assumption of a normally distributed residual). And using these for possible scouting/ alliance selection aids (especially for lower picks). But other folks may be interested in using them for other things.
wgardner
27-05-2015, 13:02
Here's a generalized perspective.
Let's say you pick r1, r2, r3, b1, b2, b3 to minimize the following error
E(w)= w*[ (R-B) - ( (r1+r2+r3)-(b1+b2+b3) ) ]^2 + (1-w) * [ (R-(r1+r2+r3))^2 + (B- (b1+b2+b3))^2]
if w=1, you're computing the WMPR solution (or any of the set of WMPR solutions with unspecified mean).
if w=0, you're computing the OPR solution.
if w=1-small epsilon, you're computing the nWMPR solution (as the relative values will be the WMPR but the mean will be selected to minimize the second part of the error, which will be the mean score in the tournament).
if w=0.5, you're computing the EPR solution.
I wonder how the various predictions of winning margin, score, and match outcomes are as w goes from 0 to 1?
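One way to check would be to sweep w directly. A sketch, assuming A_wm/m are the signed matrix and margins for the WM formulation and A_opr/s are the 2-rows-per-match matrix and alliance scores for the OPR formulation (weighting each block by the square root of its weight reproduces E(w)):

import numpy as np

def blended_rating(A_wm, m, A_opr, s, w):
    A = np.vstack([np.sqrt(w) * A_wm, np.sqrt(1 - w) * A_opr])
    b = np.concatenate([np.sqrt(w) * m, np.sqrt(1 - w) * s])
    return np.linalg.pinv(A) @ b   # pinv handles the rank-deficient w = 1 case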
Yes, this makes sense if you want to compare results across events. Sounds like a good idea, though perhaps then it needs a different name as it's not a WM measure? Also, if I continue to find that scaling the WMPRs down does a better job at winning margin prediction, that needs to be done before the average event score/3 is added in.
I'll try to get to the verification on testing data in the next day or so.
I personally like this normalized WMPR (nWMPR?) better than EPR as the interpretation is cleaner: we're just trying to predict the winning margin. EPR is trying to predict the individual scores and the winning margin and weighting the residuals all the same. It's a bit more ad-hoc. On the other hand, one could look into which weightings result in the best overall result in terms of whatever measure of result folks care about.
I'd still consider it a WM measure, as it doesn't only take offense into account (like OPR). This WMPR tells us how many points this robot will score and how many it'll take away from the opponents; that sounds like win margin to me, no? I don't really like the nWMPR name; it's long and slightly confusing. I think this thread should work out the kinks in this new statistic and call the final product "WMPR".
In order for this to catch on it should
1. Be better than OPR at predicting the winner of a match
2. Be easy to understand
3. Have a catchy name
4. Apply very well to all modern FRC games
5. Be easy to compare across events
I think that by adding in the average score and calling it "WMPR" we accomplish all of those things. 2015 is probably the strangest game we've had (and I would think the worst for WMPR), and yet WMPR still works pretty well.
I'm not sure why scaling down gives you better results at predicting the margin. I know you said it decreases the variance of the residuals, but does it also introduce bias? Would you propose a universal scaling factor, or one dependent on the event/game?
I think EPR would be more accurate in predicting match scores. Would somebody like to test it out?
Another reason I like EPR is that it is easier to compute without all that
SVD stuff. I would prefer high school students to be able to understand and implement this on their own.
You actually don't need to know anything about singular value decomposition to understand WMPR. It can be explained simply like this:
Ax=b
Where A is who played on what alliance in each match and b is the margin of victory in each match. x is the contribution from each robot to the margin. You'd expect x to be the inverse of A times b, but A is not invertible, so we use the pseudoinverse of A instead.
In Matlab the code is
x = pinv(A)*b
And that's it, pretty simple.
I agree with you though that the ultimate test would be how it performs in predicting matches. I compared it to WMPR in the 2014 Archimedes division, although that was with using the training data as the testing data, so it's probably not the best test.
saikiranra
27-05-2015, 14:15
Attached are A, b, and T for 2015 OPR, CCWM, and WMPR.
Also introducing a funky new metric, EPR, which uses 3 simultaneous equations for each match:
1) r1+r2+r3-b1-b2-b3 = RS-BS
2) r1+r2+r3=RS
3) b1+b2+b3=BS
... and solves all the equations simultaneously.
If I'm understanding this properly, are we setting up the square matrix for that system as follows?
| 2 2 2 -1 -1 -1 | |r1| = |2RS - BS|
| 2 2 2 -1 -1 -1 | |r2| = |2RS - BS|
| 2 2 2 -1 -1 -1 | |r3| = |2RS - BS|
| -1 -1 -1 2 2 2 | |b1| = |2BS - RS|
| -1 -1 -1 2 2 2 | |b2| = |2BS - RS|
| -1 -1 -1 2 2 2 | |b3| = |2BS - RS|
Thoughts?
See attached XLS. I was playing around with it yesterday. There's all sorts of fun things you could try.
Column D looks a lot like what you're suggesting, except it adds the average OPR instead.
Also attached are 2013 A b T for OPR CCWM WMPR and EPR. The raw qual match data from TBA used to generate those is posted here (http://www.chiefdelphi.com/media/papers/download/4466).
If I'm understanding this properly, are we setting up the square matrix for that system as follows?
| 2 2 2 -1 -1 -1 | |r1| = |2RS - BS|
| 2 2 2 -1 -1 -1 | |r2| = |2RS - BS|
| 2 2 2 -1 -1 -1 | |r3| = |2RS - BS|
| -1 -1 -1 2 2 2 | |b1| = |2BS - RS|
| -1 -1 -1 2 2 2 | |b2| = |2BS - RS|
| -1 -1 -1 2 2 2 | |b3| = |2BS - RS|
Each match generates 3 equations (3 rows in the A matrix and 3 scores in the b matrix).
If you look at _Aepr.CSV (or _Aepr.dat) and _bepr.CSV (or _bepr.dat) it should be pretty clear.
Then you solve for EPR like so: EPR = pinv(Aepr)*bepr
If you want to see what the matrix for the normal equations looks like, look at Method 2 in this post (http://www.chiefdelphi.com/forums/showpost.php?p=1484233&postcount=26). N will be square.
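For anyone building these from raw match lists instead of the posted CSVs, here's a sketch of the 3-rows-per-match construction (the match-tuple format is just my own convention):

import numpy as np

# matches: list of (red_teams, blue_teams, red_score, blue_score)
def epr_system(matches, teams):
    idx = {t: i for i, t in enumerate(teams)}
    rows, b = [], []
    for red, blue, rs, bs in matches:
        margin = np.zeros(len(teams))
        score_r = np.zeros(len(teams))
        score_b = np.zeros(len(teams))
        margin[[idx[t] for t in red]] = 1
        margin[[idx[t] for t in blue]] = -1
        score_r[[idx[t] for t in red]] = 1
        score_b[[idx[t] for t in blue]] = 1
        rows += [margin, score_r, score_b]   # equations 1, 2, 3 for this match
        b += [rs - bs, rs, bs]
    return np.array(rows), np.array(b)

# then, as above: EPR = pinv(Aepr) @ bepr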
perhaps then it needs a different name as it's not a WM measure?
?? I thought Wm stood for "William"
Just kidding.
efoote868
27-05-2015, 15:04
Also introducing a funky new metric, EPR,
Ether Power Rating?
Ether Power Rating?
Electron Paramagnetic Resonance
Here's a generalized perspective.
Let's say you pick r1, r2, r3, b1, b2, b3 to minimize the following error
E(w)= w*[ (R-B) - ( (r1+r2+r3)-(b1+b2+b3) ) ]^2 + (1-w) * [ (R-(r1+r2+r3))^2 + (B- (b1+b2+b3))^2]
if w=1, you're computing the WMPR solution (or any of the set of WMPR solutions with unspecified mean).
if w=0, you're computing the OPR solution.
if w=1-small epsilon, you're computing the nWMPR solution (as the relative values will be the WMPR but the mean will be selected to minimize the second part of the error, which will be the mean score in the tournament).
if w=0.5, you're computing the EPR solution.
I wonder how the various predictions of winning margin, score, and match outcomes are as w goes from 0 to 1?
This is a very cool way of looking at it. By putting it this way, EPR seems to be half way between OPR and WMPR.
Again, I like it because it is one number instead of two numbers. I like it because it has a better chance to predict outcome regardless of the game, rather than OPR being good for some games and WMPR being good for some other games.
This is a very cool way of looking at it. By putting it this way, EPR seems to be half way between OPR and WMPR.
Again, I like it because it is one number instead of two numbers. I like it because it has a better chance to predict outcome regardless of the game, rather than OPR being good for some games and WMPR being good for some other games.
WMPR (with a mean of the average score/3) is also just one number instead of two. And what game is WMPR bad for? Recycle Rush seems like it would be the worst game for WMPR, but it's comparable to OPR in predicting outcomes, if not slightly better. (http://www.chiefdelphi.com/forums/showpost.php?p=1484189&postcount=16)
And from my testing, the order of predictiveness goes WMPR>EPR>OPR. The only improvement EPR has over OPR is that it's half WMPR! Why not just go all the way and stick with WMPR?
Again, this is with using the training data as the testing data, if EPR is shown to be better when these are separate then perhaps we should use it instead.
WMPR (with a mean of the average score/3) is also just one number instead of two. And what game is WMPR bad for? Recycle Rush seems like it would be the worst game for WMPR, but it's comparable to OPR in predicting outcomes, if not slightly better. (http://www.chiefdelphi.com/forums/showpost.php?p=1484189&postcount=16)
And from my testing, the order of predictiveness goes WMPR>EPR>OPR. The only improvement EPR has over OPR is that it's half WMPR! Why not just go all the way and stick with WMPR?
Again, this is with using the training data as the testing data, if EPR is shown to be better when these are separate then perhaps we should use it instead.
The reason I said two numbers is that in the past I looked at OPR and CCWM. I am considering WMPR as a replacement for CCWM, which is why I will be looking at OPR and WMPR.
When we have more data, multiple years and multiple events that support WMPR as the best predictor of match outcome, then I will stop looking at OPR. But sometimes in alliance selection for a first-round pick, when you have no scouting data and you want somebody for pure offense, OPR is still a good indicator.
wgardner
27-05-2015, 15:54
[Edit: The data has been updated to correct an error in the previous code. Previously, the data was reported for the scaled-down versions of the metrics in the TESTING DATA section. Now, the data is reported for the unscaled metrics (though the last table for each tournament shows the benefit of scaling them, which is substantial!)]
Here's the data for the four 2014 tournaments starting with "A". My thoughts will be in a subsequent post:
2014: archi
Teams = 100, Matches = 167, Matches/Teams = 1.670
TRAINING DATA
Stdev of winning margin prediction residual
OPR : 51.3. 66.9% of outcome variance predicted.
CCWM: 57.0. 59.2% of outcome variance predicted.
WMPR: 36.1. 83.6% of outcome variance predicted.
Match prediction outcomes
OPR : 142 of 166 (85.5 %)
CCWM: 146 of 166 (88.0 %)
WMPR: 154 of 166 (92.8 %)
TESTING DATA
Stdev of winning margin prediction residual
OPR : 72.1. 34.8% of outcome variance predicted.
CCWM: 85.2. 8.8% of outcome variance predicted.
WMPR: 89.3. -0.1% of outcome variance predicted.
Match prediction outcomes
OPR : 127 of 166 (76.5 %)
CCWM: 124 of 166 (74.7 %)
WMPR: 123 of 166 (74.1 %)
Stdev of testing data winning margin prediction residual with scaled versions of the metrics
Weight: 1.0 0.9 0.8 0.7 0.6 0.5
OPR: 72.1 70.8 70.2 70.3 71.2 72.8
CCWM: 85.2 80.3 76.3 73.5 71.9 71.7
WMPR: 89.3 84.3 80.3 77.3 75.4 74.7
2014: abca
Teams = 35, Matches = 76, Matches/Teams = 2.171
TRAINING DATA
Stdev of winning margin prediction residual
OPR : 59.8. 65.1% of outcome variance predicted.
CCWM: 62.9. 61.2% of outcome variance predicted.
WMPR: 51.5. 74.1% of outcome variance predicted.
Match prediction outcomes
OPR : 63 of 76 (82.9 %)
CCWM: 60 of 76 (78.9 %)
WMPR: 65 of 76 (85.5 %)
TESTING DATA
Stdev of winning margin prediction residual
OPR : 78.9. 39.1% of outcome variance predicted.
CCWM: 93.6. 14.4% of outcome variance predicted.
WMPR: 92.5. 16.3% of outcome variance predicted.
Match prediction outcomes
OPR : 56 of 76 (73.7 %)
CCWM: 55 of 76 (72.4 %)
WMPR: 55 of 76 (72.4 %)
Stdev of testing data winning margin prediction residual with scaled versions of the metrics
Weight: 1.0 0.9 0.8 0.7 0.6 0.5
OPR: 78.9 77.9 77.6 78.2 79.6 81.6
CCWM: 93.6 89.5 86.4 84.3 83.4 83.7
WMPR: 92.5 88.8 86.1 84.3 83.6 84.1
2014: arfa
Teams = 39, Matches = 78, Matches/Teams = 2.000
TRAINING DATA
Stdev of winning margin prediction residual
OPR : 45.8. 61.4% of outcome variance predicted.
CCWM: 46.6. 60.1% of outcome variance predicted.
WMPR: 38.2. 73.1% of outcome variance predicted.
Match prediction outcomes
OPR : 59 of 78 (75.6 %)
CCWM: 66 of 78 (84.6 %)
WMPR: 64 of 78 (82.1 %)
TESTING DATA
Stdev of winning margin prediction residual
OPR : 61.8. 29.8% of outcome variance predicted.
CCWM: 71.7. 5.6% of outcome variance predicted.
WMPR: 75.4. -4.5% of outcome variance predicted.
Match prediction outcomes
OPR : 55 of 78 (70.5 %)
CCWM: 53 of 78 (67.9 %)
WMPR: 49 of 78 (62.8 %)
Stdev of testing data winning margin prediction residual with scaled versions of the metrics
Weight: 1.0 0.9 0.8 0.7 0.6 0.5
OPR: 61.8 61.0 60.6 60.8 61.4 62.5
CCWM: 71.7 68.4 65.9 64.1 63.1 62.9
WMPR: 75.4 71.9 69.1 66.9 65.5 64.9
2014: azch
Teams = 49, Matches = 82, Matches/Teams = 1.673
TRAINING DATA
Stdev of winning margin prediction residual
OPR : 36.3. 78.2% of outcome variance predicted.
CCWM: 37.8. 76.4% of outcome variance predicted.
WMPR: 25.5. 89.2% of outcome variance predicted.
Match prediction outcomes
OPR : 66 of 79 (83.5 %)
CCWM: 68 of 79 (86.1 %)
WMPR: 73 of 79 (92.4 %)
TESTING DATA
Stdev of winning margin prediction residual
OPR : 52.1. 54.9% of outcome variance predicted.
CCWM: 67.5. 24.6% of outcome variance predicted.
WMPR: 63.0. 34.3% of outcome variance predicted.
Match prediction outcomes
OPR : 59 of 79 (74.7 %)
CCWM: 56 of 79 (70.9 %)
WMPR: 66 of 79 (83.5 %)
Stdev of testing data winning margin prediction residual with scaled versions of the metrics
Weight: 1.0 0.9 0.8 0.7 0.6 0.5
OPR: 52.1 52.1 52.8 54.2 56.2 58.7
CCWM: 67.5 65.7 64.6 64.1 64.2 65.0
WMPR: 63.0 59.6 57.3 56.1 56.1 57.3
wgardner
27-05-2015, 16:01
[Edit: my previously posted results had mistakenly reported the values for the scaled versions of OPR, CCWM, and WMPR as the unscaled values (!). Conclusions are somewhat changed as noted below.]
So my summary of the previous data:
WMPR always results in the smallest training data winning margin prediction residual standard deviation. (Whew, try saying that 5 times fast.)
WMPR is also very good at predicting training data match outcomes. For some reason, CCWM beats it in 1 tournament but otherwise WMPR is best in the other 3.
But on the testing data, things go haywire. There are significant drops in performance in predicting winning margins for all 3 stats, showing that all 3 stats are substantially overfit. Frequently, all 3 stats give better performance at predicting winning margins by using scaled down versions of the stats. The WMPR in particular is substantially overfit (look for a later post with a discussion of this).
BTW, it seems like some folks are most interested in predicting match outcomes rather than match statistics. If that's really what folks are interested in, there are probably better ways of doing that (e.g., with linear models but where the error measure better correlates with match outcomes, or with non-linear models). I'm going to ponder that for a while...
Citrus Dad
27-05-2015, 17:00
I've been watching this thread because I'm really interested in a more useful statistic for scouting--a true DPR. I think this path may be a fruitful way to arrive at that point.
Currently the DPR doesn't measure how a team's defensive performance causes the opposing alliance to deviate from its predicted OPR. The current DPR calculation simply assumes that the OPRs of the opposing alliances are randomly distributed in such a way that those OPRs are most likely to converge on the tournament average. Unfortunately that's only true if a team plays a very large number of matches that capture the potential alliance combinations. Instead we're working with a small sample set that is highly influenced by the individual teams in each alliance.
Running the DPR separately across the opposing alliances becomes a two-stage estimation problem in which 1) the OPRs are estimated for the opposing alliance and 2) the DPR is estimated against the predicted OPRs. The statistical properties become interesting and the matrix quite large.
I'll be interested to see how this comes out. Maybe you can report the DPRs as well.
I tested how well EPR predicted match outcomes in the four events in 2014 beginning with "a". These tests excluded the match being tested from the training data and recomputed the EPR.
EPR:
ABCA: 59 out of 76 (78%)
ARFA: 50 out of 78 (64%)
AZCH: 63 out of 79 (78%)
ARCHI: 123 out of 166 (74%)
And as a reminder, here's how OPR did (as found by wgardner)
OPR:
ABCA: 56 out of 76 (74%)
ARFA: 55 out of 78 (71%)
AZCH: 59 out of 79 (75%)
ARCHI: 127 out of 166 (77%)
So over these four events OPR successfully predicted 297 matches and EPR successfully predicted 295.
wgardner
28-05-2015, 15:18
On the Overfitting of OPR and WMPR
I'm working on studying exactly what's going on here with respect to the overfitting of the various stats. Look for more info in a day or two hopefully.
However, I thought I'd share this data point as a good example of what the underlying problem is.
I'm looking at the 2014 casa tournament structure (# of teams=54 which is a multiple of 6 and the # of matches is twice the # of teams, so it fits in well with some of the studies I'm doing).
As one data point, I'm replacing the match scores with completely random, normally distributed data for every match (i.e., there is absolutely no relationship between the match scores and which teams played!). The stdev of each match score is 1.0, so the winning margin is the difference of two such scores and has variance 2.0 and stdev 1.414.
I get the following result on one run (which is pretty typical).
2014 Sim: casa
Teams = 54, Matches = 108, Matches/Teams = 2.000
SIMULATED MATCH SCORES THAT ARE 100% RANDOM NOISE!
TRAINING DATA
Stdev of winning margin prediction residual
OPR : 1.3. 26.5% of outcome variance predicted.
WMPR: 1.1. 47.3% of outcome variance predicted.
Match prediction outcomes
OPR : 78 of 108 (72.2 %)
WMPR: 87 of 108 (80.6 %)
TESTING DATA
Stdev of winning margin prediction residual
OPR : 1.7. -31.3% of outcome variance predicted.
WMPR: 2.1. -105.0% of outcome variance predicted.
Match prediction outcomes
OPR : 58 of 108 (53.7 %)
WMPR: 56 of 108 (51.9 %)
Stdev of testing data winning margin prediction residual with scaled versions of the metrics
Weight: 1.0 0.9 0.8 0.7 0.6 0.5
OPR: 1.7 1.7 1.6 1.6 1.5 1.5
WMPR: 2.1 2.0 1.9 1.8 1.8 1.7
For random match scores, the OPR can still "predict out" 26% of the "training data" winning margin variance and the WMPR can still "predict out" 47% of the "training data" winning margin variance! And they can correctly predict 72% and 81% of the match results of the training set, respectively.
This is what I mean by overfitting: the metrics are modeling the match noise even when the underlying OPRs and WMPRs should all be zero. And this is why the final table shows that scaling down the OPRs and WMPRs (e.g., replacing the actual OPRs by 0.9*OPRs, or 0.8*OPRs, etc.) results in a lower standard deviation in the testing data prediction residual: it reduces the amount of overfitting by decreasing the variance of the predicted outputs. In this case, the best weighting should be zero, as it's better to predict the testing data with 0*OPR or 0*WMPR than it is to predict with completely bogus OPRs and WMPRs.
And WMPR seems to suffer from this more because there are fewer data points to average out (OPR uses 216 equations to solve for 54 values, whereas WMPR uses 108 equations to solve for 54 values).
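If anyone wants to reproduce this kind of experiment, here's a rough sketch of the idea. I'm assuming A_opr stores the red row then the blue row for each match, A_wm is the signed one-row-per-match matrix, and "variance predicted" means 1 minus (residual variance / margin variance):

import numpy as np

def noise_experiment(A_opr, A_wm, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = A_wm.shape[0]
    red, blue = rng.normal(0, sigma, n), rng.normal(0, sigma, n)   # pure-noise "scores"
    m = red - blue
    s = np.empty(2 * n); s[0::2] = red; s[1::2] = blue
    opr, *_ = np.linalg.lstsq(A_opr, s, rcond=None)
    wmpr = np.linalg.pinv(A_wm) @ m
    frac = lambda pred: 1 - np.var(m - pred) / np.var(m)   # training "variance predicted"
    opr_margin_pred = A_opr[0::2] @ opr - A_opr[1::2] @ opr
    return frac(opr_margin_pred), frac(A_wm @ wmpr)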
More to come...
wgardner
06-06-2015, 08:53
For posterity, the follow-up work I did on this is reported and discussed in the paper in this thread (http://www.chiefdelphi.com/forums/showthread.php?t=137451).
nuclearnerd
06-06-2015, 12:06
I just want to say how awesome you people are. My linear algebra skills are weak, but this thread has moved me a lot closer to a working understanding of the scouting stats. Thank you all for sharing your work.