Incorporating Opposing Alliance Information in CCWM Calculations
This post is primarily of interest to stat-nerds. If you don't know or care what OPR and CCWM are and how they're computed, you probably want to ignore this thread. :)
---------------------------------------------

It has bothered me for a while that the CCWM calculation does not incorporate any knowledge of which teams are on the opposing alliance, which would seem to be important for a calculation involving the winning margin between the alliances.

The standard calculation is performed as follows. Let's say that in the first match, the red alliance is teams 1, 2, and 3 and the blue alliance is teams 4, 5, and 6. Let the red score in this match be R and the blue score be B. We're trying to find the expected contributions to the winning margins for each team; call these contributions C1, C2, ... for teams 1, 2, ...

The standard CCWM calculation models the winning margin of each match twice (!), first as R-B = C1 + C2 + C3 (ignoring that teams 4, 5, and 6 are involved!) and then again as B-R = C4 + C5 + C6 (ignoring that teams 1, 2, and 3 are involved!). It finds the least squares solution for the Ci values, i.e. the values of the Ci numbers that minimize the squared prediction error over all of the matches. This solution in matrix form is solving (A' A) C = A' M, where (A' M) ends up being the vector of the sums of the winning margins of each team's matches, and (A' A) is a matrix with diagonal elements equal to the number of matches each team plays and off-diagonal elements equal to the number of times teams i and j were on the same alliance. Note again that nowhere does this factor in whether teams were on opposing alliances (!). If a particular team on the blue alliance always scores 1000 points, that will make the winning margin for the red alliance be awful, and IMHO, that should be taken into account.

So, here's my proposal. Instead of modeling each match outcome twice as above, do it only once as follows: R-B = (C1 + C2 + C3) - (C4 + C5 + C6) (the left set is all the teams on the red alliance and the right set is the blue teams). Now we're factoring in both your alliance partners' abilities AND your opponents' abilities.

If you go through the entire derivation, you end up with a similar set of equations, but the new A matrix has a 1 in the i,jth spot if the jth team was on the red alliance in match i, a -1 if the jth team was on the blue alliance in match i, and 0 otherwise. The solution has the same format, i.e. solving (A' A) C = A' M. (A' M) ends up being exactly the same as before even though A and M are a little different: (A' M) is just the vector of sums of the winning margins of each team's matches. But now (A' A) is a little different. The diagonal elements are the same, but the off-diagonal elements are equal to the number of times teams i and j are on the same alliance minus the number of times they're on opposing alliances (!). So now opposing alliance contributions are included.

One oddity emerges from this formulation: the new (A' A) is not invertible (!). This is because if you add any constant to all of the teams' contributions, the winning margins are the same. For example, if you think the red teams contributed 10, 20, and 30 points each and the blue teams contributed 40, 50, and 60 points each, you'd get exactly the same winning margins if the teams' contributions were 110, 120, 130 and 140, 150, 160, or even 1010, 1020, 1030 and 1040, 1050, 1060. But the easy way around this is to just find the minimum norm solution (one of the many solutions) using, say, the singular value decomposition (SVD), and then subtract off the mean from all of the values. The resulting combined contributions to winning margin values represent how much a team will contribute to its winning margin compared to the average team's contribution (which will be 0, of course).

Thoughts? This seems like an improvement to me, but I'd be curious to hear what other stat-nerds like me have to say on the matter. And if somebody else has already looked into all of this, accept my apologies and please help educate me. :)
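A minimal sketch of the proposal (Python with NumPy; the 4-team, 2v2 schedule and margins below are made up for illustration — FRC alliances are 3v3, but the construction is identical):

```python
import numpy as np

# Hypothetical tiny event: 4 teams playing 2v2 matches.
# Each match is (red teams, blue teams, winning margin R-B).
matches = [([0, 1], [2, 3], 10),
           ([0, 2], [1, 3], 4),
           ([0, 3], [1, 2], 6)]
n_teams = 4

# Signed design matrix: +1 for red teams, -1 for blue teams, 0 otherwise.
A = np.zeros((len(matches), n_teams))
M = np.zeros(len(matches))
for i, (red, blue, margin) in enumerate(matches):
    A[i, red] = 1.0
    A[i, blue] = -1.0
    M[i] = margin

# A'A is singular (adding a constant to every team leaves all margins
# unchanged), so take the minimum-norm least-squares solution via the
# pseudo-inverse, then remove the mean.
C = np.linalg.pinv(A) @ M
C -= C.mean()
print(C)  # each team's contribution to winning margin vs. the average team
```

For this toy schedule the minimum-norm solution already has zero mean, so the final subtraction is a no-op; with real schedules it guarantees the "compared to the average team" interpretation.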
Re: Incorporating Opposing Alliance Information in CCWM Calculations
I'd love to see somebody actually quantitatively compare the predictive power of all of these different metrics across the various games. For a given year, take the games from the first half of every competition's qualifying rounds, compute a stat for every team, and measure its ability to predict the outcome of the second half of the qualifying matches. Each of the last 4 years should give a sample size of ~7,500 matches and ~2,500 teams. EDIT: these counts are for FRC. An analysis for FTC would be interesting as well.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
And also look at the opposing alliance prior record to get to this match. How did they get their OPR?
Wait, Dean is a big data guy and he wants us to dig into all the past matches? This entire robot thing is just a ruse to get us interested in Big Data? ((I worked at one time for an MLB stat company, and all of the stats are important. Weather, prior events, crowd sizes, etc. We have just touched the surface of stat possibilities))
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
More useful than predicting the outcome of the second HALF of the matches is taking the Day 1 matches and seeing how they predict the Day 2 matches. This is more than half the matches, so it's bound to be a *better* predictor than half the matches because of the increased sample size too. Quote:
I don't suspect defensive ability and WLT have a very strong correlation though. I'd like to see that correlation proved before I try to "normalize" a team's OPR with this metric. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Another way to look at it: Say that all Team 2 does is play defense against the opposing alliance and reduce its score by 25 points every time it plays. C2 should be 25 (minus any mean term). But if you only look at C1+C2+C3 as a way to predict R, C2 will look like zero because C2 doesn't affect R. But C2 does affect (R-B) by making B smaller by 25 points, so the new metric should be able to capture this effect. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
Quote:
I ran the numbers for 2015 MISJO (40 teams, 80 qual matches, no DQs, no surrogates). |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
Quote:
I checked out how these stats relate to the match results. Your numbers correctly predicted* the outcome of 66% of matches, while OPR and CCWM both predicted the correct winner 84% of the time. It makes sense that this stat doesn't work for a game where the other alliance can't affect your score. Can you run the numbers for a 2014 event so we can see if it's better with that?

*I don't like these sorts of "predictions" because they occur with numbers obtained after the fact. Could you also run numbers for the first ~60 qual matches and then we'll see how they do on the next 20?

EDIT: Looking through the numbers a little more, I can see that this new stat gives radically different evaluations to a few teams than OPR and CCWM. Look at these select teams: Code:
Team  GCCWM  OPR  CCWM

Here are the correlation coefficients for each pair of metrics:
OPR-CCWM: 0.82
GCCWM-CCWM: 0.39
GCCWM-OPR: 0.35

Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
Attached is the GPR calculated for all 8 championship divisions this year, with OPR, CCWM, and DPR also given as reference (I did not take into account surrogate matches and such). I can generate a new one of these for any event that has data on TBA.
Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Long post to come later today with more detailed thoughts and examples (I hope), but some quick initial thoughts are:
1. Please drop the "G" from everything. :ahh: Since I can't think of anything better, may I suggest we call it CCWM with Opposing Alliances, or CCWMOA?

2. As others have noted, CCWM and CCWMOA aren't well suited to the 2015 FRC game because there's virtually no defense and thus virtually no benefit to looking at winning margins over just final offensive scores. Can we look instead at 2014, which had a lot of defense?

3. I'm wondering if the method may be doomed due to insufficient data. With CCWMOA, we only get 1 data point per match, while OPR and CCWM get 2 data points per match. In Ether's example tournament, there were 40 teams and 80 matches, so CCWMOA is trying to use 80 noisy data points to find 40 values, while OPR and CCWM are trying to use 160 noisy data points to find 40 values. Comparing CCWM and CCWMOA, I argue that CCWM's data values are noisier for reasons I gave in my first post, but maybe fitting with 160 noisier data points still gives you a better result than fitting with only 80 data points that are cleaner? This is like trying to find the slope of a line that you know goes through the origin using 2 noisy points vs 4 noisier points. Which one is better will depend on the ratios of the noises.

I hope to think about this more and comment further with some data to back it up, but I'd be curious to hear the thoughts of other folks too. Thanks everybody for the discussions!
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
So if the original matrix is full rank, the pseudo-inverse is just the regular inverse and the product of the matrix and pseudo-inverse is U U' = I. If the original matrix is not full rank, then multiplying by the pseudo-inverse basically zeros out the component of the vector in the zero-rank projection and multiplies by the inverse of the remaining subspace. Or, the product of the matrix and its pseudo-inverse is U U' except that you replace the vector of U corresponding to Di=0 with a zero vector. In this case, the zero-rank projection (or the row vector of U' that corresponds to the Di that is zero) is something like 1/T[ 1 1 1 1 ... 1 ], which computes the mean, because the direction in the C vector corresponding to its mean is the direction that cannot be determined.

One other formulation for CCWMOA would be: if we have T teams, have T-1 unknown values C1, C2, ..., C(T-1) and set CT = -Sum(C1, C2, ..., C(T-1)) in all of the equations (thus enforcing that all T values of Ci are zero mean). Then we have only T-1 equations with T-1 unknowns and everything is full rank. This is just another way of saying we want to find the values of C1, C2, ..., CT that minimize the prediction error subject to the constraint that the resulting set of Ci values has zero mean.
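The T-1-unknown reformulation can be sketched like this (Python/NumPy, with a made-up 4-team 2v2 schedule; FRC is 3v3 but the algebra is identical). Substituting CT = -(C1 + ... + C(T-1)) turns each row coefficient a_j into a_j - a_T:

```python
import numpy as np

# Made-up schedule: (red teams, blue teams, winning margin R-B).
matches = [([0, 1], [2, 3], 10),
           ([0, 2], [1, 3], 4),
           ([0, 3], [1, 2], 6)]
T = 4  # number of teams

A = np.zeros((len(matches), T))
M = np.zeros(len(matches))
for i, (red, blue, margin) in enumerate(matches):
    A[i, red], A[i, blue], M[i] = 1.0, -1.0, margin

# Eliminate the last unknown with C_T = -(C_1 + ... + C_{T-1}):
# each remaining coefficient becomes a_j - a_T, giving a full-rank system.
A_red = A[:, :-1] - A[:, -1:]
C_partial, *_ = np.linalg.lstsq(A_red, M, rcond=None)
C = np.append(C_partial, -C_partial.sum())  # recover C_T; mean is 0 by construction
```

On this toy schedule the result matches the mean-subtracted minimum-norm SVD solution, which is the point of the equivalence.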
Re: Incorporating Opposing Alliance Information in CCWM Calculations
One more interesting tid-bit:
For the 2015 FRC game, we would expect the underlying OPR and CCWM/CCWMOA values to be identical except for a mean, as a team's only way to contribute to a winning margin is with offense. The fact that these numbers do deviate substantially (or, that DPR varies as much as it does) shows that we aren't close to having enough data to really get precise estimates of the underlying parameters.

Edit: This may not be entirely true. Litter and the initial race for the center cans can both cause one alliance to adversely impact the other alliance's score, so it's not 100% true to say that the only way to contribute to the winning margin in the 2015 FRC game was through increasing your own alliance's score.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Quote:
Quote:
Quote:
Method 1
1a) [U,S,V] = svd(A)
1b) Replace the diagonal elements of S with their reciprocals, except when abs(Sjj) < threshold (I used 1e-4 for threshold), in which case make Sjj zero.
1c) compute x = V*S*(U'*b)

Method 2
2a) N = A'*A
2b) d = A'*b
2c) compute x = N\d ..... (Octave mldivide notation)
2d) compute m = mean(x)
2e) subtract m from each element of x

Notice Method 1 factors A, not A'A, resulting in less rounding error.
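Method 1 can be sketched in Python/NumPy like so (the threshold value mirrors the one above; the tiny signed schedule matrix is made up for illustration):

```python
import numpy as np

# Made-up signed schedule matrix and winning-margin vector (4 teams, 3 matches).
A = np.array([[1.0,  1.0, -1.0, -1.0],
              [1.0, -1.0,  1.0, -1.0],
              [1.0, -1.0, -1.0,  1.0]])
b = np.array([10.0, 4.0, 6.0])

# Factor A itself (not A'A) for better numerical behavior.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reciprocate the singular values, zeroing any below the threshold;
# this yields the minimum-norm least-squares solution.
threshold = 1e-4
s_inv = np.array([1.0 / sv if abs(sv) > threshold else 0.0 for sv in s])
x = Vt.T @ (s_inv * (U.T @ b))
```

The truncation step is what makes the rank-deficient case behave: directions with (near-)zero singular values simply contribute nothing to x.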
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Thanks Ether!
I'd love to see the residuals of the predictions of the winning margins using OPR, CCWM, and whatever you want to call the new thing (how about WMPR, if you don't like CCWMOA?). It would be interesting to see the average squared winning margin prediction residual and the distribution of the prediction residual (like you did with your L1 v. L2 comparison) for both 2015 FRC tournaments (where defense was essentially non-existent) and 2014 FRC tournaments (where defense mattered more). It might also be interesting to see if tournaments with lots of matches per team are different from tournaments with few matches per team.

I'm puzzled by AGPapa's finding that the match outcomes (such as they were in 2015) are not predicted as well with the new measure. While minimizing the prediction error in the winning margin isn't the same as predicting the match outcomes, I'd expect the match outcome results to be fairly similar. Thoughts? (BTW, I haven't verified AGPapa's finding, so I suppose there's a chance that there's a bug in the code he used to predict the match outcomes?) [Edit: AGPapa later found an error with his initially reported results.]

And if you had a lot of time and/or processing power on your hands, I'd also love to see how well the winning margins are predicted for matches that aren't in the training data. Given that we're so low on data, I'm reluctant to suggest the "model with the first 1/2 of the data, then test with the second 1/2 of the data" proposals, as we may not have enough data to get a reliable model as it is. Instead, I'd suggest the "model with all of the data except for match 1, then test with match 1, then remodel with all of the data except match 2, then test on match 2, etc." approach, as then the data size is almost the same but you're testing on data that's not in the training set. I'd be happy to do this in scilab too, especially if you could get some 2014 tournament data in your nice formats.
BTW, I computed the new metric this morning using the data from the MISJO tournament you provided and got the same results for the new measures (using the new A and b that you provided), so that confirms that we're talking about the same thing. :) |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
The following results are from my scilab sim for the 2015 MISJO tournament. Again, this should be a BAD tournament year for both CCWM and WMPR as there was little defense involved.
For OPR, the winning margin was predicted by computing the prediction for the offensive score of the red alliance and subtracting the prediction for the offensive score of the blue alliance from it. For CCWM, the winning margin was predicted by computing the prediction of the winning margin of the red alliance and subtracting the prediction for the winning margin of the blue alliance from it. For WMPR, the winning margin was computed the same way as in CCWM, but using the values computed using the WMPR derivation instead of the CCWM derivation.

Standard deviations of the prediction residuals of the winning margins:
OPR: 25.6
CCWM: 21.1
WMPR: 15.9

(Interesting that CCWM and WMPR both do better than OPR, even in a game with "no defense." Perhaps the race to get the center cans acts like defense, in that a team that does well at that may cause the opposing alliance to have a lower score? Or litter?)

The tournament had 80 matches but one match appeared to be a tie, so there were only 79 matches where the match outcome could be predicted.

# of match outcomes predicted correctly:
OPR: 67
CCWM: 66
WMPR: 68

(This is all on the training data (!). I'm not using data not in the training set yet.)
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
I'm getting:
OPR: 67
CCWM: 66
WMPR: 53

Are you using the same numbers in the MISJO_GPR.CSV file that Ether attached? A difference in the numbers we're using seems to be the only explanation for this, since our OPR and CCWM predictions match up.

In the previously attached spreadsheet I erroneously awarded the blue alliance a victory in match 69; it should have been a tie.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
Quote:
Col1 is WMPR based WM prediction. Col2 is CCWM based WM prediction. Col3 is OPR based WM prediction. Col4 is actual match WM. I'm using my sim to compute the WMPR values, which I earlier verified matched Ether's values (at least the min and max were identical). |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Thanks, it turns out that we were using different WMPR values. I redownloaded Ether's attachment and it contains different values. Maybe the initial download was corrupted? I'm baffled. Anyways, I can confirm that after redownloading the correct WMPR values that I get your results. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
Quote:
OPR A and b are posted here. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
And now for the data where the single testing point was removed from the training data, then the model was computed, then the single testing point was evaluated, and this was repeated 80 times. So the results below are for separate training and testing data.
Stdev of prediction residual of the winning margins:
OPR: 34.6
CCWM: 30.5
WMPR: 30.4

(Note that on the training data from a few posts ago, WMPR had a Stdev of 15.9, so this is an argument that WMPR is "overfitting" the small amount of data available and that it could benefit from having more matches per team.)

# of matches predicted correctly (out of 79 possible):
OPR: 63
CCWM: 55
WMPR: 58

So here, the WM-based measures are both better at predicting the winning margin itself but not at predicting match outcomes. CCWM and WMPR have almost identical prediction residual standard deviations, but WMPR is slightly better at match outcome prediction in this particular example for some reason. Again, it would be great to test this on some 2014 data where there was more defense.
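The leave-one-out procedure described above can be sketched like this (Python/NumPy; the 4-team 2v2 schedule is made up, and a real event would have far more matches):

```python
import numpy as np

# Made-up schedule: signed rows (+1 red, -1 blue) and winning margins.
A = np.array([[1.0,  1.0, -1.0, -1.0],
              [1.0, -1.0,  1.0, -1.0],
              [1.0, -1.0, -1.0,  1.0]])
M = np.array([10.0, 4.0, 6.0])

residuals = []
for i in range(len(M)):
    keep = [j for j in range(len(M)) if j != i]
    # Fit WMPR on every match except match i (minimum-norm solution,
    # since the reduced system is rank-deficient).
    C = np.linalg.pinv(A[keep]) @ M[keep]
    # Predict the held-out match's winning margin and record the residual.
    residuals.append(M[i] - A[i] @ C)

print(np.std(residuals))  # out-of-sample winning-margin error
```

Each match is scored by a model that never saw it, so the resulting standard deviation is an honest (if data-starved) estimate of predictive error.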
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Otherwise, if someone can provide the 2014 qual match data in CSV format, I can quickly generate all the A and b for WMPR, OPR, and CCWM and post it here.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
But I spent an hour and did some "what if" runs through the data, and the effect is pretty small. While low-scoring opposing alliances do make a difference, by about match 8, 9, or 10 things swing the other way. So while we all hate the "random" selections, it seems to work out in the end. I only did a small segment; with the full season available, feel free to run your own numbers.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
I ran the numbers on the 2014 St. Joseph event. I checked that my calculations for 2015 St. Joseph match Ether's, so I'm fairly confident that everything is correct.
Here's how each stat did at "predicting" the winner of each match.
OPR: 87.2%
CCWM: 83.3%
WMPR: 91.0%

I've attached my analysis, WMPR values, A and b matrices, along with the qual schedules for both the 2014 and 2015 St. Joe events.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
2 Attachment(s)
Quote:
Also attached are CSV files that have the OPR, CCWM, and GPR (if we are still calling it that) of each team at each event. I can easily generate these files for any year if anyone would like. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
x = pinv(A,tol)*b; pinv() is explained in detail here: http://www.mathworks.com/help/matlab/ref/pinv.html (well worth reading; explains the interesting difference between x1=pinv(A) and x2=A\b) |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
The files appear to have the "new" A and "new" b. Could you by any chance also generate the "old" A and "old" b? I can generate the old A from the new A by stripping out the +1s and -1s from each row, but I can't generate the old b from the new b since the new b only has match winning margins and I can't get red and blue scores from that. Having both the old and new versions allows for quick and direct comparisons between OPR (which requires the old b), and CCWM and WMPR (which require the new b that can also be derived from the old one). Maybe we can just call the old A and b by their names, and the new A and b something like Awm and bwm? Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
For fun, I computed the standard deviation of the prediction residual of the testing data not in the training set using the WMPR solution, 0.9*WMPR, 0.8*WMPR, etc. The standard deviation of the prediction residual of the winning margin for the test data for this particular tournament was minimized by 0.7*WMPR, and that standard deviation was down to 28.4 from 30.4 for the unscaled WMPR. So again, more evidence that the WMPR is overfit and could benefit from additional data. This doesn't change the match outcome prediction that some folks are interested in, since scaling all of the WMPRs down doesn't change the sign of the predicted winning margin which is all the match outcome prediction is. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
Attached are A, b, and T for 2015 OPR, CCWM, and WMPR. Also introducing a funky new metric, EPR, which uses 3 simultaneous equations for each match:

1) r1+r2+r3-b1-b2-b3 = RS-BS
2) r1+r2+r3 = RS
3) b1+b2+b3 = BS

... and solves all the equations simultaneously.
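A sketch of stacking those three equations per match (Python/NumPy; the tiny 2v2 schedule and scores are made up, and the pseudo-inverse is used for the solve as elsewhere in the thread):

```python
import numpy as np

# Made-up matches: (red teams, blue teams, red score, blue score).
matches = [([0, 1], [2, 3], 30, 20),
           ([0, 2], [1, 3], 24, 20),
           ([0, 3], [1, 2], 26, 20)]
n_teams = 4

rows, rhs = [], []
for red, blue, RS, BS in matches:
    margin_row = np.zeros(n_teams); margin_row[red] = 1; margin_row[blue] = -1
    red_row = np.zeros(n_teams);    red_row[red] = 1
    blue_row = np.zeros(n_teams);   blue_row[blue] = 1
    rows += [margin_row, red_row, blue_row]   # 3 equations per match
    rhs += [RS - BS, RS, BS]

Aepr = np.array(rows)
bepr = np.array(rhs, dtype=float)
EPR = np.linalg.pinv(Aepr) @ bepr  # one value per team
```

Note the margin equation is the difference of the two score equations, so each match contributes only two independent constraints; the third row effectively re-weights the margin residual.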
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Quote:
residuals = b1 - Aopr*xEPR

...where b1 is column1 of the provided bopr.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
On the same data, doing the "remove one match from the training data, model based on the rest of the data, use the removed match as testing data, and repeat the process for all matches" method, I got the following results:

Stdev of winning margin prediction residual:
OPR: 63.8
CCWM: 72.8
WMPR: 66.3

When I looked at scaling down each of the metrics to improve their prediction performance on testing data not in the training set, the best Stdevs I got for each were:
OPR*0.9: 63.3
CCWM*0.6: 66.2
WMPR*0.7: 60.8

Match prediction outcomes:
OPR: 60 of 78 (76.9%)
CCWM: 57 of 78 (73.1%)
WMPR: 62 of 78 (79.5%)

Yeah! Even with testing data not used in the training set, WMPR seems to be outperforming CCWM in predicting the winning margins and the match outcomes in this single 2014 tournament (which again is a game with substantial defense). I'm hoping to get the match results (b with red and blue scores separately) for other 2014 tournaments to see if this is a general result.

[Edit: found a bug in the OPR code. Fixed it. Updated comments. Also included the scaled-down OPR, CCWM, and WMPR prediction residuals to address overfitting.]
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
2 Attachment(s)
Quote:
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Iterative Interpretations of OPR and WMPR
(I found this interesting; some other folks might or might not. :) )

Say you want to estimate a team's offensive contribution to their alliance scores. A simple approach is to just compute the team's average match score/3. Let's call this estimate O(0), a vector of the average match score/3 for all teams at step 0. (/3 because there are 3 teams per alliance. This would be /2 for FTC.)

But then you want to take into account the fact that a team's alliance partners may be better or worse than average. The best estimate you have of the contribution of a team's partners at this point is the average of their O(0) estimates. So let the improved estimate be

O(1) = team's average match score - 2*average( O(0) for a team's alliance partners )

(2*average because there are 2 partners contributing per match. This would be 1*average for FTC.) This is better, but now we have an improved estimate for all teams, so we can just iterate this:

O(2) = team's average match score - 2*average( O(1) for a team's alliance partners )
O(3) = team's average match score - 2*average( O(2) for a team's alliance partners )
etc., etc.

This sequence of O(i) converges to the OPR values, so this is just another way of explaining what OPRs are.

WMPR can be iteratively computed in a similar way:

W(0) = team's average match winning margin
W(1) = team's average match winning margin - 2*average( W(0) for a team's alliance partners ) + 3*average( W(0) for a team's opponents )
W(2) = team's average match winning margin - 2*average( W(1) for a team's alliance partners ) + 3*average( W(1) for a team's opponents )
etc., etc.

This sequence of W(i) converges to the WMPR values, so this is just another way of explaining what WMPRs are.
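A minimal pure-Python sketch of the iterative OPR idea, using a made-up 4-team 2v2 schedule (so the update subtracts 1*average of partner estimates, the FTC case). The averaging step between iterations is my addition to keep this tiny example from oscillating and is not part of the description above; on this schedule the damped iteration converges to the exact least-squares OPR values:

```python
# Made-up 2v2 schedule: (alliance, alliance score).  Each team plays 3 matches.
alliances = [([0, 1], 20), ([2, 3], 10),
             ([0, 2], 18), ([1, 3], 12),
             ([0, 3], 16), ([1, 2], 14)]
n_teams = 4

# Each team's average alliance score and list of partners.
avg_score = [0.0] * n_teams
partners = [[] for _ in range(n_teams)]
for (a, b), score in alliances:
    avg_score[a] += score / 3; avg_score[b] += score / 3
    partners[a].append(b); partners[b].append(a)

# O(0): average alliance score / 2 (2 teams per alliance in this toy example).
O = [s / 2 for s in avg_score]
for _ in range(200):
    # One partner per match, so subtract 1*average(partner estimates),
    # then average with the previous iterate to damp oscillation.
    new = [avg_score[t] - sum(O[p] for p in partners[t]) / len(partners[t])
           for t in range(n_teams)]
    O = [(o + n) / 2 for o, n in zip(O, new)]

print(O)  # fixed point satisfies the OPR normal equations
```

The fixed point of the update is exactly the normal-equations solution, which is why the iteration (when it converges) lands on OPR.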
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
Currently we've mostly been seeing how WMPR does at a small district event with a lot of matches per team (a best-case scenario for these stats). I wanted to see how it would do in a worse case. Here's how each stat performed at "predicting" the winner of each match 2014 Archimedes Division (100 teams, 10 matches/team).
OPR: 85.6%
CCWM: 87.4%
WMPR: 92.2%
EPR: 89.2%

WMPR holds up surprisingly well in this situation and outperforms the other stats. EPR does better than OPR, but worse than WMPR. I don't really like EPR, as it seems difficult to interpret. The whole idea behind using the winning margin is that the red robots can influence the blue score. Yet EPR also models BS = b1 + b2 + b3, which is counter to this. Quote:
On another note: I've also found that it's difficult to compare WMPRs across events (whereas OPRs are easy to compare). This is because a match that ends 210-200 looks the same as one that ends 30-20. At very competitive events this becomes a huge problem. Here's an example from team 33's 2014 season.

WMPRs at each event:
MISOU: 78.9
MIMID: 37.0
MITRY: 77.8
MICMP: 29.4
ARCHI: 40.8

Anyone who watched 33 at their second district event would tell you that they didn't do as well as at their first, and these numbers show that. But these numbers also show that 33 did better at their second event than at the State Championship. This is clearly incorrect; 33 won the State Championship but got knocked out in the semis at their second district event. You can see pretty clearly that the more competitive events (MSC, Archimedes) result in lower WMPRs, which makes it very difficult to compare this stat across events. This occurs because the least-norm solution has an average of zero for every event. It treats all events as equal, when they're not.

I propose that instead of having the average be zero, the average should be how many points the average robot scored at that event. (So we should add the average event score / 3 to every team's WMPR.) This will smooth out the differences between each event. Using this method, here are 33's new WMPRs:

MISOU: 106.3
MIMID: 71.7
MITRY: 112.7
MICMP: 86.0
ARCHI: 93.5

Now these numbers correctly reflect how 33 did at each event. MIMID has the lowest WMPR, and that's where 33 did the worst. Their stats at MICMP and ARCHI are now comparable to their district events.

OPR has proliferated because it's easy to understand (this robot scores X points per match). With this change, WMPR also becomes easier to understand (this robot scores and defends their opponents by X points per match). Since this adds the same constant to everybody's WMPR, it'll still predict the match winner and margin of victory with the same accuracy. Thoughts?
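The proposed normalization is a one-line shift on top of the least-norm solution. A sketch (all numbers below are placeholders, not real event data):

```python
# Hypothetical zero-mean WMPR values and the event's alliance scores.
wmpr = {33: 29.4, 67: 41.0, 217: 18.2}          # placeholder values
event_scores = [120, 95, 150, 110, 88, 130]      # placeholder alliance scores

# Average points scored per robot at this event (3 robots per alliance).
avg_robot_score = sum(event_scores) / len(event_scores) / 3

# Shift every team by the same constant; rankings, predicted winners, and
# predicted margins are unchanged, but values become comparable across events.
normalized = {team: value + avg_robot_score for team, value in wmpr.items()}
```

Because the shift is uniform, any margin prediction (a sum of red values minus a sum of blue values) is identical before and after.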
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
I like this idea of solving all 3 equations simultaneously. WMPR is a good improvement over CCWM because winning margin depends on who you are playing against (except in 2015), but I think EPR is even better. I will adopt it after you guys finish your validation on how well it predicts outcomes. I like EPR because it is one number instead of two; it can replace both OPR and WMPR.

The problem with OPR is similar to the problem with CCWM: it does not take into account who the opponents were. If you play against stronger opponents, you may not be able to score as many points, especially in those years with limited game pieces. Equation 1 will take care of that and improve the line fitting.

To me, I would interpret EPR as how many points a team will be able to score with typical opponents on the field. This eliminates the error in match schedule strength due to luck of the draw. A team may have a higher than normal score because they faced weaker opponents more often, and that would skew the OPR numbers. I think EPR would be more accurate in predicting match scores. Would somebody like to test it out?

Another reason I like EPR is that it is easier to compute, without all that SVD stuff. I would prefer high school students to be able to understand and implement this on their own.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
I'll try to get to the verification on testing data in the next day or so.

I personally like this normalized WMPR (nWMPR?) better than EPR, as the interpretation is cleaner: we're just trying to predict the winning margin. EPR is trying to predict the individual scores and the winning margin while weighting the residuals all the same; it's a bit more ad hoc. On the other hand, one could look into which weightings give the best overall result in terms of whatever measure of result folks care about.

I still am most interested in how well a metric predicts the winning margin of a match (and in my FTC android apps I also hope to include an estimate of "probability of victory" from this, which incorporates the expected winning margin and the standard deviation of that expectation along with the assumption of a normally distributed residual), and in using these for possible scouting/alliance selection aids (especially for lower picks). But other folks may be interested in using them for other things.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Here's a generalized perspective.
Let's say you pick r1, r2, r3, b1, b2, b3 to minimize the following error:

E(w) = w*[ (R-B) - ( (r1+r2+r3)-(b1+b2+b3) ) ]^2 + (1-w)*[ (R-(r1+r2+r3))^2 + (B-(b1+b2+b3))^2 ]

If w=1, you're computing the WMPR solution (or any of the set of WMPR solutions with unspecified mean).
If w=0, you're computing the OPR solution.
If w=1-small epsilon, you're computing the nWMPR solution (as the relative values will be the WMPR, but the mean will be selected to minimize the second part of the error, which will be the mean score in the tournament).
If w=0.5, you're computing the EPR solution.

I wonder how the various predictions of winning margin, score, and match outcomes vary as w goes from 0 to 1?
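One way to sketch this family of solutions is to scale each stacked equation by the square root of its weight and solve a single least-squares problem (Python/NumPy; the 2v2 schedule and scores are made up, and on this consistent toy data every w < 1 gives the same answer):

```python
import numpy as np

# Made-up matches: (red teams, blue teams, red score, blue score).
matches = [([0, 1], [2, 3], 20, 10),
           ([0, 2], [1, 3], 18, 12),
           ([0, 3], [1, 2], 16, 14)]
n_teams = 4

def solve(w):
    """Minimize w*(margin residual)^2 + (1-w)*(score residuals)^2."""
    rows, rhs = [], []
    for red, blue, RS, BS in matches:
        m = np.zeros(n_teams); m[red] = 1; m[blue] = -1
        r = np.zeros(n_teams); r[red] = 1
        b = np.zeros(n_teams); b[blue] = 1
        # Weighted least squares = ordinary least squares on sqrt-scaled rows.
        rows += [np.sqrt(w) * m, np.sqrt(1 - w) * r, np.sqrt(1 - w) * b]
        rhs += [np.sqrt(w) * (RS - BS), np.sqrt(1 - w) * RS, np.sqrt(1 - w) * BS]
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs, dtype=float), rcond=None)
    return x

opr_like = solve(0.0)   # w=0: score equations only (OPR)
epr_like = solve(0.5)   # w=0.5: equal weights (EPR)
```

Sweeping w from 0 toward 1 then just means recomputing solve(w) on a grid; w=1 itself needs the minimum-norm treatment discussed earlier in the thread, since the margin-only system is rank-deficient.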
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
In order for this to catch on it should:
1. Be better than OPR at predicting the winner of a match
2. Be easy to understand
3. Have a catchy name
4. Apply very well to all modern FRC games
5. Be easy to compare across events

I think that by adding in the average score and calling it "WMPR" we accomplish all of those things. 2015 is probably the strangest game we've had (and I would think the worst for WMPR), and yet WMPR still works pretty well.

I'm not sure why scaling down gives you better results at predicting the margin. I know you said it decreases the variance of the residuals, but does it also introduce bias? Would you propose a universal scaling factor, or one dependent on the event/game? Quote:
Ax = b, where A encodes who played on which alliance in each match and b is the margin of victory in each match; x is the contribution from each robot to the margin. You'd expect x to be the inverse of A times b, but A is not invertible, so we use the pseudoinverse of A instead. In Matlab the code is x = pinv(A)*b, and that's it, pretty simple.

I agree with you, though, that the ultimate test would be how it performs in predicting matches. I compared it to WMPR in the 2014 Archimedes division, although that was with using the training data as the testing data, so it's probably not the best test. |
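For readers without Matlab, the pinv one-liner translates directly to numpy. A minimal sketch (the +1/-1 encoding of alliances follows the description above; the match-tuple format is made up for illustration):

```python
import numpy as np

def wmpr(matches, n_teams):
    """x = pinv(A) * b, as in the Matlab snippet above.
    A: one row per match, +1 for each red team, -1 for each blue team.
    b: the red-minus-blue margin of victory for that match."""
    A = np.zeros((len(matches), n_teams))
    b = np.zeros(len(matches))
    for i, (red, blue, R, B) in enumerate(matches):
        A[i, list(red)] = 1.0
        A[i, list(blue)] = -1.0
        b[i] = R - B
    # A is never full column rank: adding the same constant to every
    # team's contribution leaves every margin unchanged, hence pinv.
    return np.linalg.pinv(A) @ b
```

Because the all-ones vector lies in A's null space (each row has three +1s and three -1s), the pseudoinverse picks the particular solution whose contributions sum to zero; that's the "unspecified mean" ambiguity mentioned earlier in the thread.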
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
|  2  2  2 -1 -1 -1 | |r1| = |2RS - BS|
|  2  2  2 -1 -1 -1 | |r2| = |2RS - BS|
|  2  2  2 -1 -1 -1 | |r3| = |2RS - BS|
| -1 -1 -1  2  2  2 | |b1| = |2BS - RS|
| -1 -1 -1  2  2  2 | |b2| = |2BS - RS|
| -1 -1 -1  2  2  2 | |b3| = |2BS - RS|
|
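That 2/-1 block pattern is exactly A'A for a single match's EPR system (two score equations plus one margin equation). A quick numpy check of that claim, as a sketch (the example scores RS and BS are made up):

```python
import numpy as np

# One match: red = (r1, r2, r3), blue = (b1, b2, b3).
# EPR stacks three equations: red score, blue score, and the margin.
A = np.array([
    [1, 1, 1,  0,  0,  0],   # R = r1 + r2 + r3
    [0, 0, 0,  1,  1,  1],   # B = b1 + b2 + b3
    [1, 1, 1, -1, -1, -1],   # R - B = (r1+r2+r3) - (b1+b2+b3)
], dtype=float)
RS, BS = 55.0, 40.0          # example red/blue scores (made up)
b = np.array([RS, BS, RS - BS])

N = A.T @ A                  # normal-equations matrix: the 2/-1 pattern
rhs = A.T @ b                # (2RS - BS) for red rows, (2BS - RS) for blue
```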
Re: Incorporating Opposing Alliance Information in CCWM Calculations
2 Attachment(s)
Quote:
Column D looks a lot like what you're suggesting, except it adds the average OPR instead. Also attached are the 2013 A, b, T for OPR, CCWM, WMPR, and EPR. The raw qual match data from TBA used to generate those is posted here. Quote:
If you look at _Aepr.CSV (or _Aepr.dat) and _bepr.CSV (or _bepr.dat) it should be pretty clear. Then you solve for EPR like so: EPR = pinv(Aepr)*bepr If you want to see what the matrix for the normal equations looks like, look at Method 2 in this post. N will be square. Quote:
Just kidding. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
Again, I like it because it is one number instead of two numbers. I like it because it has a better chance to predict outcome regardless of the game, rather than OPR being good for some games and WMPR being good for some other games. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
And from my testing, the order of predictiveness goes WMPR>EPR>OPR. The only improvement EPR has over OPR is that it's half WMPR! Why not just go all the way and stick with WMPR? Again, this is with using the training data as the testing data, if EPR is shown to be better when these are separate then perhaps we should use it instead. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
When we have more data, multiple years and multiple events that support WMPR as the best predictor for match outcome, then I will stop looking at OPR. But sometimes in alliance selection for first round pick, without any scouting data and you want somebody for pure offense, OPR is still a good indicator. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
[Edit: The data has been updated to correct an error in the previous code. Previously, the numbers in the TESTING DATA section were reported for the scaled-down versions of the metrics. Now, the data is reported for the unscaled metrics (though the last table for each tournament shows the benefit of scaling them, which is substantial!)]
Here's the data for the four 2014 tournaments starting with "A". My thoughts will be in a subsequent post: Code:
2014: archi |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
[Edit: my previously posted results had mistakenly reported the values for the scaled versions of OPR, CCWM, and WMPR as the unscaled values (!). Conclusions are somewhat changed as noted below.]
So my summary of the previous data:

WMPR always results in the smallest training-data winning-margin prediction residual standard deviation. (Whew, try saying that 5 times fast.) WMPR is also very good at predicting training-data match outcomes; for some reason, CCWM beats it in 1 tournament, but otherwise WMPR is best in the other 3.

But on the testing data, things go haywire. There are significant drops in performance in predicting winning margins for all 3 stats, showing that all 3 stats are substantially overfit. Frequently, all 3 stats give better performance at predicting winning margins when scaled-down versions of the stats are used. WMPR in particular is substantially overfit (look for a later post with a discussion of this).

BTW, it seems like some folks are most interested in predicting match outcomes rather than match statistics. If that's really what folks are interested in, there are probably better ways of doing that (e.g., with linear models where the error measure better correlates with match outcomes, or with non-linear models). I'm going to ponder that for a while... |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
I've been watching this thread because I'm really interested in a more useful statistic for scouting--a true DPR. I think this path may be a fruitful way to arrive at that point.
Currently, DPR doesn't measure how a team's defensive performance causes the opposing alliance to deviate from its predicted OPR. The current DPR calculation simply assumes that the OPRs of the opposing alliances are randomly distributed in a manner that makes those OPRs most likely to converge on the tournament average. Unfortunately, that's only true if a team plays a very large number of matches that capture the potential alliance combinations; instead, we're working with a small sample set that is highly influenced by the individual teams included in each alliance.

Running the DPR separately across the opposing alliances becomes a two-stage estimation problem in which 1) the OPRs are estimated for the opposing alliance, and 2) the DPR is estimated against the predicted OPRs. The statistical properties become interesting and the matrix quite large. I'll be interested to see how this comes out. Maybe you can report the DPRs as well. |
Re: Incorporating Opposing Alliance Information in CCWM Calculations
1 Attachment(s)
I tested how well EPR predicted match outcomes in the four events in 2014 beginning with "a". These tests excluded the match being tested from the training data and recomputed the EPR.
EPR:
ABCA: 59 out of 76 (78%)
ARFA: 50 out of 78 (64%)
AZCH: 63 out of 79 (78%)
ARCHI: 123 out of 166 (74%)

And as a reminder, here's how OPR did (as found by wgardner):

OPR:
ABCA: 56 out of 76 (74%)
ARFA: 55 out of 78 (71%)
AZCH: 59 out of 79 (75%)
ARCHI: 127 out of 166 (77%)

So over these four events OPR successfully predicted 297 matches and EPR successfully predicted 295. |
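The leave-one-out procedure described above (exclude the match being tested, refit, then predict) can be sketched in a few lines of numpy. This is a sketch under assumptions, not the actual test code from the thread, and it uses WMPR as the metric being cross-validated; the match data in the usage example is synthetic:

```python
import numpy as np

def fit_wmpr(matches, n_teams):
    """Fit WMPR (x = pinv(A) @ b) on a list of
    (red_team_ids, blue_team_ids, red_score, blue_score) tuples."""
    A = np.zeros((len(matches), n_teams))
    b = np.zeros(len(matches))
    for i, (red, blue, R, B) in enumerate(matches):
        A[i, list(red)] = 1.0
        A[i, list(blue)] = -1.0
        b[i] = R - B
    return np.linalg.pinv(A) @ b

def loo_accuracy(matches, n_teams):
    """Leave-one-out: drop each match, refit on the rest, and count how
    often the predicted margin's sign matches the actual outcome
    (ties in either predicted or actual margin count as misses)."""
    correct = 0
    for k, (red, blue, R, B) in enumerate(matches):
        train = matches[:k] + matches[k + 1:]
        x = fit_wmpr(train, n_teams)
        predicted = x[list(red)].sum() - x[list(blue)].sum()
        if predicted * (R - B) > 0:
            correct += 1
    return correct / len(matches)
```

Swapping `fit_wmpr` for an OPR or EPR fit (and predicting each alliance's score instead of the margin) gives the corresponding comparison columns above.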
Re: Incorporating Opposing Alliance Information in CCWM Calculations
On the Overfitting of OPR and WMPR
I'm working on studying exactly what's going on here with respect to the overfitting of the various stats. Look for more info in a day or two, hopefully. However, I thought I'd share this data point as a good example of the underlying problem.

I'm looking at the 2014 casa tournament structure (# of teams = 54, which is a multiple of 6, and the # of matches is twice the # of teams, so it fits in well with some of the studies I'm doing). As one data point, I'm replacing the match scores with completely random, normally distributed data for every match (i.e., there is absolutely no relationship between the match scores and which teams played!). The stdev of each match score is 1.0, so the winning margin is the difference of 2 scores and has variance 2.0 and stdev 1.414. I get the following result on one run (which is pretty typical). Code:
2014 Sim: casa

This is what I mean by overfitting: the metrics are modeling the match noise even when the underlying OPRs and WMPRs should all be zero. And this is why the final table shows that scaling down the OPRs and WMPRs (e.g., replacing the actual OPRs by 0.9*OPRs, or 0.8*OPRs, etc.) results in a lower standard deviation in the predicted testing-data residual: it reduces the amount of overfitting by decreasing the variance of the predicted outputs. In this case, the best weighting should be zero, since it's better to predict the testing data with 0*OPR or 0*WMPR than with completely bogus OPRs and WMPRs.

WMPR seems to suffer from this more because there are fewer data points to average out (OPR uses 216 equations to solve for 54 values, whereas WMPR uses 108 equations to solve for 54 values). More to come... |
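The random-score experiment described above is easy to reproduce. Here's a hedged sketch (a smaller synthetic schedule than the 54-team casa structure, with a made-up random seed) showing that the least-squares fit happily "explains" pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams, n_matches = 12, 24

# Random schedule: each match draws 6 distinct teams, 3 red vs 3 blue.
A = np.zeros((n_matches, n_teams))
for i in range(n_matches):
    teams = rng.permutation(n_teams)[:6]
    A[i, teams[:3]] = 1.0
    A[i, teams[3:]] = -1.0

# Match scores are pure noise (score stdev 1.0, so margin variance 2.0),
# meaning every true contribution is zero and the ideal prediction of
# every winning margin is exactly 0.
margins = rng.normal(0.0, np.sqrt(2.0), n_matches)

wmpr = np.linalg.pinv(A) @ margins    # the fit assigns nonzero "skill"
train_resid = margins - A @ wmpr      # ...and shrinks the training error

# The reduction of the training residual below the raw margin spread is
# pure overfitting: there was no signal to find.
```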
Re: Incorporating Opposing Alliance Information in CCWM Calculations
For posterity, the follow-up work I did on this is reported and discussed in the paper in this thread.
|
Re: Incorporating Opposing Alliance Information in CCWM Calculations
I just want to say how awesome you people are. My linear algebra skills are weak, but this thread has moved me a lot closer to a working understanding of the scouting stats. Thank you all for sharing your work.
|