Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
And from my testing, the order of predictiveness goes WMPR > EPR > OPR. The only improvement EPR has over OPR is that it's half WMPR! Why not just go all the way and stick with WMPR? Again, this is from using the training data as the testing data; if EPR is shown to be better when these are separate, then perhaps we should use it instead.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
Quote:
When we have more data (multiple years and multiple events) supporting WMPR as the best predictor of match outcomes, then I will stop looking at OPR. But sometimes in alliance selection, when you want a first-round pick for pure offense and have no scouting data, OPR is still a good indicator.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
[Edit: The data has been updated to correct an error in the previous code. Previously, the numbers reported in the TESTING DATA section were for the scaled-down versions of the metrics. Now the data is reported for the unscaled metrics (though the last table for each tournament shows the benefit of scaling them, which is substantial!)]
Here's the data for the four 2014 tournaments starting with "A". My thoughts will be in a subsequent post: Code:
2014: archi
Re: Incorporating Opposing Alliance Information in CCWM Calculations
[Edit: my previously posted results had mistakenly reported the values for the scaled versions of OPR, CCWM, and WMPR as the unscaled values (!). Conclusions are somewhat changed as noted below.]
So my summary of the previous data: on the training data, WMPR always results in the smallest winning margin prediction residual standard deviation. (Whew, try saying that 5 times fast.) WMPR is also very good at predicting training data match outcomes; for some reason CCWM beats it in one tournament, but WMPR is best in the other three.
On the testing data, though, things go haywire. There are significant drops in winning margin prediction performance for all 3 stats, showing that all 3 are substantially overfit. Frequently, all 3 stats predict winning margins better when scaled-down versions of the stats are used. WMPR in particular is substantially overfit (look for a later post with a discussion of this).
BTW, it seems like some folks are most interested in predicting match outcomes rather than match statistics. If that's really what folks are after, there are probably better ways of doing it (e.g., linear models where the error measure better correlates with match outcomes, or non-linear models). I'm going to ponder that for a while...
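To make the "scaled-down stats" check concrete, here's a minimal sketch of the sweep I have in mind, assuming numpy; the match representation and helper name are illustrative, not from the code that generated the tables above: Code:
import numpy as np

def residual_std(metric, test_matches, scale):
    """Stdev of winning-margin residuals when the metric is shrunk by `scale`.
    test_matches: list of (red_team_indices, blue_team_indices, actual_margin)."""
    residuals = []
    for red, blue, margin in test_matches:
        # Predicted margin: sum of red's stats minus sum of blue's, scaled down.
        pred = scale * (metric[list(red)].sum() - metric[list(blue)].sum())
        residuals.append(margin - pred)
    return np.std(residuals)

# Sweep shrink factors; a best factor well below 1.0 is a sign of overfitting.
# for k in np.arange(0.0, 1.05, 0.05):
#     print(k, residual_std(opr, test_matches, k))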
Re: Incorporating Opposing Alliance Information in CCWM Calculations
I've been watching this thread because I'm really interested in a more useful statistic for scouting--a true DPR. I think this path may be a fruitful way to arrive at that point.
Currently, DPR doesn't measure how a team's defensive performance causes the opposing alliance to deviate from its predicted OPR. The current DPR calculation simply assumes that the OPRs of the opposing alliances are randomly distributed such that they are most likely to converge on the tournament average. Unfortunately, that's only true if a team plays a very large number of matches covering the potential alliance combinations; instead, we're working with a small sample that is highly influenced by the individual teams on each alliance.
Running the DPR separately against the opposing alliances becomes a two-stage estimation problem in which 1) the OPRs are estimated for the opposing alliance and 2) the DPR is estimated against those predicted OPRs. The statistical properties become interesting and the matrix quite large. I'll be interested to see how this comes out. Maybe you can report the DPRs as well.
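For reference, here's a rough sketch of the standard single-stage least-squares setup that computes OPR and DPR together, which the two-stage idea above would build on. This assumes numpy; the match representation is illustrative: Code:
import numpy as np

def compute_opr_dpr(matches, n_teams):
    """matches: list of (red_teams, blue_teams, red_score, blue_score),
    with teams given as 0-based indices. Returns (opr, dpr); ccwm = opr - dpr."""
    rows, own, opp = [], [], []
    for red, blue, red_score, blue_score in matches:
        for alliance, scored, allowed in ((red, red_score, blue_score),
                                          (blue, blue_score, red_score)):
            row = np.zeros(n_teams)
            row[list(alliance)] = 1.0     # 1 for each team on this alliance
            rows.append(row)
            own.append(scored)            # alliance's own score -> OPR
            opp.append(allowed)           # score given up -> DPR
    A = np.vstack(rows)
    opr, *_ = np.linalg.lstsq(A, np.array(own), rcond=None)
    dpr, *_ = np.linalg.lstsq(A, np.array(opp), rcond=None)
    return opr, dpr
The point of the critique above is the second regression: regressing the score you give up on your own alliance's indicator row only yields a meaningful defensive measure if the opposing OPRs average out, which they don't in a small schedule.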
Re: Incorporating Opposing Alliance Information in CCWM Calculations
I tested how well EPR predicted match outcomes in the four events in 2014 beginning with "a". These tests excluded the match being tested from the training data and recomputed the EPR.
EPR:
ABCA: 59 out of 76 (78%)
ARFA: 50 out of 78 (64%)
AZCH: 63 out of 79 (78%)
ARCHI: 123 out of 166 (74%)
And as a reminder, here's how OPR did (as found by wgardner):
OPR:
ABCA: 56 out of 76 (74%)
ARFA: 55 out of 78 (71%)
AZCH: 59 out of 79 (75%)
ARCHI: 127 out of 166 (77%)
So over these four events, OPR successfully predicted 297 matches and EPR successfully predicted 295.
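For anyone who wants to reproduce that protocol, here's a rough sketch of the leave-one-out loop; fit_metric is a stand-in for whatever EPR (or OPR) solver is under test, not the actual code behind the numbers above: Code:
def leave_one_out_accuracy(matches, n_teams, fit_metric):
    """matches: list of (red_teams, blue_teams, red_score, blue_score).
    fit_metric(matches, n_teams) must return a per-team numpy array."""
    correct = 0
    for i, (red, blue, red_score, blue_score) in enumerate(matches):
        training = matches[:i] + matches[i + 1:]   # exclude the tested match
        m = fit_metric(training, n_teams)
        pred_margin = m[list(red)].sum() - m[list(blue)].sum()
        if (pred_margin > 0) == (red_score > blue_score):  # ties count as misses
            correct += 1
    return correct, len(matches)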
Re: Incorporating Opposing Alliance Information in CCWM Calculations
On the Overfitting of OPR and WMPR
I'm studying exactly what's going on here with respect to the overfitting of the various stats; look for more info in a day or two, hopefully. In the meantime, I thought I'd share this data point as a good example of the underlying problem. I'm looking at the 2014 casa tournament structure (# of teams = 54, which is a multiple of 6, and the # of matches is twice the # of teams, so it fits in well with some of the studies I'm doing). As one data point, I'm replacing the match scores with completely random, normally distributed data for every match (i.e., there is absolutely no relationship between the match scores and which teams played!). The stdev of each match score is 1.0, so the winning margin, being the difference of two such scores, has variance 2.0 and stdev 1.414. I get the following result on one run (which is pretty typical). Code:
2014 Sim: casa
This is what I mean by overfitting: the metrics are modeling the match noise even when the underlying OPRs and WMPRs should all be zero. And this is why the final table shows that scaling down the OPRs and WMPRs (e.g., replacing the actual OPRs by 0.9*OPRs, or 0.8*OPRs, etc.) results in a lower standard deviation of the predicted testing data residual: it reduces the amount of overfitting by decreasing the variance of the predicted outputs. In this case, the best weighting would be zero, since it's better to predict the testing data with 0*OPR or 0*WMPR than with completely bogus OPRs and WMPRs. WMPR seems to suffer from this more because there are fewer data points to average out (OPR uses 216 equations to solve for 54 values, whereas WMPR uses 108 equations to solve for 54 values).
More to come...
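Here's a minimal sketch of that random-score experiment for anyone who wants to see the effect themselves (numpy assumed; the schedule generation is simplified in that it just draws 6 random teams per match rather than using the real casa schedule): Code:
import numpy as np

rng = np.random.default_rng(0)
n_teams, n_matches = 54, 108          # the 2014 casa structure described above

rows, scores = [], []
for _ in range(n_matches):
    teams = rng.choice(n_teams, size=6, replace=False)
    for alliance in (teams[:3], teams[3:]):
        row = np.zeros(n_teams)
        row[alliance] = 1.0           # indicator row for this alliance
        rows.append(row)
        scores.append(rng.normal())   # N(0,1) noise: no team effect at all

# 216 equations (2 alliances x 108 matches) for 54 unknowns, exactly as above.
A = np.vstack(rows)
opr, *_ = np.linalg.lstsq(A, np.array(scores), rcond=None)
print(np.std(opr))                    # clearly nonzero even though every true OPR is zero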
Re: Incorporating Opposing Alliance Information in CCWM Calculations
For posterity, the follow-up work I did on this is reported and discussed in the paper in this thread.
Re: Incorporating Opposing Alliance Information in CCWM Calculations
I just want to say how awesome you people are. My linear algebra skills are weak, but this thread has moved me a lot closer to a working understanding of the scouting stats. Thank you all for sharing your work.