Overview and Analysis of FIRST Stats
1 Attachment(s)
Another thread for stat nerds. Again, if you don't know or don't care about OPR and CCWM and how they're calculated, this thread will probably not interest you.
----------------------------------------------------------------
Based on the recent thread on stats, I did way too much study and simulation of what the different stats do in different situations. I've attached a very long paper with all of the findings. Many thanks to Ether, who helped a lot with behind-the-scenes comments and data generation. I think he's off working on some related ideas that I suspect we'll all hear about soon. The Overview and Conclusions of the paper are included below.
----------------------------------------------------------------
Overview:

This paper presents and analyzes a wide range of statistical techniques that can be applied to FIRST Robotics Competition (FRC) and FIRST Tech Challenge (FTC) tournaments to rate the performance of teams and robots competing in the tournament. The well-known Offensive Power Rating (OPR), Combined Contribution to Winning Margin (CCWM), and Defensive Power Rating (DPR) measures are discussed and analyzed. New measures which incorporate knowledge of the opposing alliance members are also discussed and analyzed. These include the Winning Margin Power Rating (WMPR), the Combined Power Rating (CPR), and the mixture-based Ether Power Rating (EPR).

New methods are introduced to simultaneously estimate separate offensive and defensive contributions of teams. These methods lead to new, related simultaneous metrics called sOPR, sDPR, sWMPR, and sCPR.

New MMSE estimation techniques are introduced. MMSE techniques reduce the overfitting problems that occur when Least Squares (LS) parameter estimation techniques are used to estimate parameters on a relatively small data set. The performance of LS and MMSE techniques is compared over a range of scenarios. All of the techniques are analyzed over a wide range of simulated and actual FRC tournament data, using results from the 2013, 2014, and 2015 FRC seasons.
----------------------------------------------------------------
Conclusions:

New improved techniques for incorporating defense into FRC and FTC tournament statistics have been introduced, along with new MMSE techniques for estimating model parameters. Most FRC tournaments suffer from a small amount of data, causing Least Squares estimates to be overfit to the noisy tournament data, which degrades their performance in predicting match outcomes not in the training set. MMSE techniques appear to provide limited but significant and consistent improvements in match score and winning margin prediction compared to similar Least Squares techniques.

While incorporating defense into the statistics using MMSE estimation techniques does not result in any decrease in statistical prediction performance, the advantages of doing so are usually quite small and may not be worth the effort unless a given FRC season is expected to have substantial defensive components. Occasionally, incorporating defense can result in around an 8-12% further reduction in winning margin prediction error (e.g., the 2014 casb, 2015 incmp, and 2015 micmp tournaments), but this is rare.

MMSE-based estimation of the sOPR, sDPR, and sCPR parameters results in the smallest squared prediction error for match scores and match winning margins across all of the studied parameters. MMSE-based estimation of OPR parameters often produces results that are quite close.

Least Squares estimates of OPR, CCWM, and DPR using FRC tournament data probably overestimate the relative differences in ability of the teams. MMSE estimates probably underestimate the relative differences.

The small amount of data created in FRC tournaments results in noisy estimates of statistics. Testing set match outcomes from 2013-2015 often had very significant random components to them that could not be predicted by the best linear prediction methods, most likely due to purely random issues that occur in FRC matches.
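For readers who haven't seen how OPR is computed, here is a minimal numpy sketch of the standard least-squares formulation discussed in the paper. The function and variable names are illustrative choices, not taken from the paper itself.

```python
import numpy as np

def ls_opr(alliances, scores, n_teams):
    """Least-squares OPR: solve A x ~= M, where each row of A marks the
    three teams on one alliance and M is that alliance's score."""
    A = np.zeros((len(alliances), n_teams))
    for row, teams in enumerate(alliances):
        A[row, list(teams)] = 1.0            # 1 for each team on that alliance
    M = np.asarray(scores, dtype=float)
    # Solves the normal equations x = (A^T A)^-1 A^T M; lstsq also
    # handles the rank-deficient case early in a tournament.
    x, *_ = np.linalg.lstsq(A, M, rcond=None)
    return x

# Toy example: 4 teams, 4 alliance/score pairs
alliances = [(0, 1, 2), (1, 2, 3), (0, 2, 3), (0, 1, 3)]
scores = [60, 75, 70, 55]
print(ls_opr(alliances, scores, n_teams=4))
```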
Re: Overview and Analysis of FIRST Stats
Thanks William. That's a super useful paper. My takeaway is that while stats that incorporate defence might help a little, good old (LS based) OPR seems to be just about as good in most cases.
I have another area of study you could look into. I like to use OPR during tournaments for two purposes:

1) To derive a strength of schedule (basically the predicted winning margin). This is useful for deciding where best to spend limited resources on strategy planning and alliance-mate repairs.
2) As a first-order sort for pick list order.

Both of these use cases suffer from an even more drastic lack of data points. For the SOS, I use previous tournament data, and for the pick list, we've usually only completed 70% of the matches. For these use cases, it would be valuable to know:

For 1) Which predictor does best when the training set and the testing set are from different tournaments.
For 2) Which predictor does best with 70% or less of the matches in the training set. (Maybe this is where MMSE solutions will shine.)

Lastly, I'm intrigued by the possibility of incorporating scouting data (such as individual team point counts) into the MMSE calculation as a way to verify scouting records, but I have to think about that more. If you can respond to any of these questions, I'd be obliged; otherwise, thanks again for sharing your work!
Re: Overview and Analysis of FIRST Stats
This is a really well-written paper, thanks for putting it together!
I have some questions about how to choose VarD/VarO and VarN/VarO since I'm unfamiliar with MMSE estimation. How would you go about choosing these values during/before an event?
Re: Overview and Analysis of FIRST Stats
They do vary from tournament to tournament because, again, there just isn't enough data in a tournament to settle on a "true" underlying value.
Re: Overview and Analysis of FIRST Stats
OPR doesn't even exist when the # of matches played is less than the number of teams/2; then suddenly it exists but is noisy, and then it progressively gets less noisy as more matches are played. So I was looking for a way to show stats, and it seemed like the stats should slowly incorporate information as matches are played. The app currently predicts match scores and winning margin, but I'd also like to incorporate a "probability of victory" measure to show what kind of confidence exists. The MMSE approach allows for estimated stats regardless of how many matches are played. I'll try to run some sims with 0-100% of matches played to see how well things work over time.

It also occurred to me to try to predict the match outcomes for the simulated tournaments where the underlying stats are completely known, just to see the limits of how good match prediction could be if perfect knowledge of the underlying parameters existed.

Another thing that I could do would be to simulate how picking alliances based on the estimated stats would do vs. picking them based on the underlying stats. For example, if the top 3 teams are picked based on the various estimates (LS OPR, MMSE OPR, MMSE sCPR, etc.) and they are compared with the top 3 teams in the simulated tournaments where the underlying actual data is known (the actual underlying O and D), how many fewer points will the alliance end up scoring on average? This might be the real question that folks want to know...

Gotta run now; more later.
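The "probability of victory" measure isn't specified above, but one simple way it could be computed (an assumption on my part, not something from the paper) is to treat the actual winning margin as Gaussian around the predicted margin, with a standard deviation taken from whatever match-noise variance the model is using:

```python
from math import erf, sqrt

def win_probability(pred_margin, margin_std):
    """P(actual margin > 0), assuming the margin is Gaussian with mean
    pred_margin and standard deviation margin_std."""
    return 0.5 * (1.0 + erf(pred_margin / (margin_std * sqrt(2.0))))

# e.g. predicted to win by 12 points with ~20 points of margin noise
print(win_probability(12.0, 20.0))   # about 0.73
```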
Re: Overview and Analysis of FIRST Stats
Both the 2014 galileo and newton sCPR searches picked VarN/VarO=3. If VarD/VarO = 0, then the total match variance would be 3*VarO + VarN = 6*VarO. If on the other hand VarD/VarO = 0.1, then the total match variance would be 3*VarO + 3*VarD + VarN = 6.3*VarO. So while this looks like a really different result, we're only talking about a change of about 5% in the overall match variance that could be predicted with VarD/VarO being 0.0 or 0.1.

Instead, it might be helpful to increase the step size in the VarN/VarO search, which is currently 1 (!), so each step in that search could cause a much greater change in the match variance.
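Writing the arithmetic above as one formula (my reading of the model: an alliance score is the sum of three independent offensive terms, three defensive terms from the opposing alliance, and match noise), the variance of a single alliance score is:

```latex
\operatorname{Var}(\text{score}) \;=\; 3\,\operatorname{Var}(O) \;+\; 3\,\operatorname{Var}(D) \;+\; \operatorname{Var}(N)
```

With Var(N) = 3 Var(O), the two cases above evaluate to 6.0 Var(O) and 6.3 Var(O), which is where the roughly 5% figure comes from.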
Re: Overview and Analysis of FIRST Stats
1 Attachment(s)
The attached png file shows some interesting data.
This is for the 2014 casa tournament with simulated data using the model from the paper with Var(O)=100 (or stdDev(O)=10), Var(D)=0, and Var(N)=3*Var(O). The tournament had 54 teams, so each team got to play 1 time every 9 total matches. The tournament had 108 total matches, or 12 matches played by every team.

Each of the first 4 plots shows the estimated OPRs vs. the number of matches played per team (so X=1 means 9 total matches, X=2 means 18 total matches, etc.). The data points from 1-12 on the X axis correspond to 1 match per team up to 12 matches per team (the whole tournament). The 13th point on the X axis is the actual underlying O values.

Plot 1 corresponds to the traditional Least Squares (LS) OPRs, which is also the MMSE solution where Var(N) is estimated to be equal to 0. Note that there are no OPR values until each team has played 4 matches, as that's the number of matches needed to make the matrix invertible.

Plot 2 corresponds to the MMSE OPR estimates where Var(N) is estimated to be equal to 1*Var(O). As the actual Var(N)=3*Var(O), this is underestimating the noise in each match.

Plot 3 corresponds to the MMSE OPR estimates where Var(N) is estimated to be equal to 3*Var(O) (the "correct" value).

Plot 4 corresponds to the MMSE OPR estimates where Var(N) is estimated to be equal to 10*Var(O), greater than the actual noise.

Plot 5 shows the percentage error each curve has in estimating the actual underlying O values.

Comments:

The LS OPR values start out crazy and then settle down a bit. Looking at the step from X=12 (the final OPRs) to X=13 (the "real" O values), you can see that the final OPRs have more variance than the real O values. This means that the final OPRs are still overestimating the variance of the abilities of the teams.

Look at the X=1 points for Plots 2-4. The MMSE estimates start conservatively with the OPRs bunched around the mean and then progressively expand out. Plot 4 overestimates the noise (the most conservative estimate), so the OPRs start out very tightly bunched and stay that way. Plot 2 starts out wider, and Plot 3 starts out in the middle. Interestingly, you can see that each X=1 point for the MMSE plots has groups of 3 teams with the same estimate. This makes sense, as after having played 1 match, the 3 teams on each alliance are indistinguishable from each other, and it requires more than 1 match played by each team to start separating them.

Look at the X=12 (the final estimates) vs. X=13 (the "real" O values) points for Plots 2-4. Plot 2 looks like it's still overestimating the variance, Plot 3 has it about right, and Plot 4 has underestimated the true variance even at the end of the tournament (you see the Plot 4 OPRs expand out from X=12 to X=13).

[Edit: checking the numbers for the run shown, the variances of the OPRs computed by LS, MMSE 1, MMSE 3, and MMSE 10 were respectively 164, 138, 102, and 47, confirming the above comment. The MMSE 3 solution using the "right" Var(N) estimate is quite close to the true underlying variance of 100. Over multiple runs, the MMSE 3 solution is slightly biased under 100 on average, showing that more matches are needed for it to converge to the "right" variance. All of the techniques do eventually converge to the right solution and variance if the tournament is simulated to have many more than 108 matches.]

In Plot 5, the performances of the different techniques get close to each other as the tournament nears completion. They should all converge as the number of matches grows large, as the LS and MMSE solutions will eventually converge to each other. But they are off by quite a bit early on. Even though the MMSE 1 solution, with Var(N) underestimated at 1*Var(O), is underestimating the noise, it still gives pretty good results.
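The exact pseudocode is in the attached paper, but a minimal numpy sketch of an MMSE-style OPR of this flavor, based on the simplified assumptions spelled out later in the thread (xbar = Oave, Cx = sig2o*I, Cz = sig2n*I), might look like the following. The names mmse_opr and var_ratio are mine, not the paper's.

```python
import numpy as np

def mmse_opr(A, M, var_ratio):
    """MMSE-flavored OPR that shrinks estimates toward the average team.

    A         : (n_matches x n_teams) 0/1 alliance matrix (one row per alliance score)
    M         : (n_matches,) array of alliance scores
    var_ratio : assumed Var(N)/Var(O); 0 recovers plain LS OPR (when A^T A is invertible)
    """
    n_teams = A.shape[1]
    o_ave = M.mean() / 3.0                        # average per-team offensive contribution
    xbar = np.full(n_teams, o_ave)                # a priori mean for every team
    lhs = A.T @ A + var_ratio * np.eye(n_teams)   # regularized normal equations
    rhs = A.T @ (M - A @ xbar)
    return xbar + np.linalg.solve(lhs, rhs)

# Illustrative use with the simulation above: opr_hat = mmse_opr(A, M, var_ratio=3.0)
```

Unlike LS OPR, the regularized system is invertible from the very first match, which is why the MMSE curves exist at X=1 in the plots.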
Re: Overview and Analysis of FIRST Stats
Super cool. It definitely looks like MMSE OPR gives better early predictions, at least in that data set. Now I need to write an app to give us live calculations during an event...
Did you get a chance to look at using training data from previous events? For instance, do MMSE OPR values from the last regional accurately predict a team's performance at CMP? What should we give for the MMSE parameter estimates in this case?
Re: Overview and Analysis of FIRST Stats
It's also impressive how much better the MMSE techniques are when an event is underway and not many matches have been played yet. This is helpful for predicting the outcome of the second day of matches (and thus seeing who is likely to be a captain). Is this behavior typical for all stats or just OPR? Could you run a similar plot for sCPR (since that stat seems to do slightly better than OPR)?

Additionally, how would you implement the techniques described in the "Advanced MMSE Estimation" section? What would you change in the pseudocode to, for instance, change a team's a priori Oi?
Re: Overview and Analysis of FIRST Stats
2 Attachment(s)
The top row is with the estimated Var(D)=0 and the estimated Var(N)=0, 1, 3, and 10 as before. So the top left corner is regular OPRs and the rest of the top row is MMSE OPRs. The middle row is the same but with Var(D) estimated at 0.1, and the bottom row is the same with Var(D) estimated at 0.2.

Note that with Var(D)>0 and Var(N)=0, the results are always the same: the "vanilla" LS sCPR. That's shown in the left middle. Even worse than the OPR, it doesn't really start having values until the rank of the matrix is 2*#teams-1, which is at the 7th match per team. It is VERY overfit, which is why it starts noisy and stays noisy.

The bottom left shows the plots of the percentage of the combined O+D (or O in the case when D=0) left after prediction. It's saturated to be no worse than 100%, though the LS OPR and sCPR are worse than 100% when the number of matches is small, meaning that the prediction error has more variance than the original set of parameters (!). The black curve is the LS OPR and the red curve is the LS sCPR, which is so overfit that it's worse than nothing until the very last match is played.
Basically, there's a general equation for arbitrary expected mean vectors and covariance matrices of both the parameters and the noise, so you can run the estimation algorithm given any set of expected mean vectors and covariance matrices.
Re: Overview and Analysis of FIRST Stats
1 Attachment(s)
In this equation, x is the parameter you're trying to estimate (like O), z is the noise (like N), xhat is your estimated parameters, xbar is the expected mean, Cx is the covariance matrix of x, and Cz is the covariance matrix of the noise.

For example, in my MMSE equation for the OPRs, xbar is just Oave (but you could have it be a vector with team-specific expectations). A*xbar is just the average match outcome, which I have as 3*Oave (but again, if you expect xbar to be team specific then that would cause non-constant match mean scores). Cz is just sig2n*I and Cx is just sig2o*I. I plugged these in and simplified the equations. But if things are more complicated (like with EPR), then you just plug in whatever complicated assumptions you have and go from there.

It would be neat if we could study the best predictor of a team's OPR at championships from the OPR they had in their last regional before championships, using data from previous years. We'd probably come up with a mean and variance of this best predictor, and then we could plug these in and have some expectations for what championships would look like even before the first match was played. Then as matches are played, the values update to include the new information using the MMSE equation with changing A and Mo.
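For anyone who can't open the attachment, the general linear MMSE estimator being described, reconstructed here from the variable definitions above (with observation model M = A x + z), should take the standard form:

```latex
\hat{x} \;=\; \bar{x} \;+\; C_x A^{T}\left(A\,C_x A^{T} + C_z\right)^{-1}\left(M - A\bar{x}\right)
```

Plugging in xbar = Oave*1, Cx = sig2o*I, and Cz = sig2n*I collapses this to the simpler OPR form described above.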
Re: Overview and Analysis of FIRST Stats
1 Attachment(s)
The dust seems to have settled in this thread, so I thought I'd toss this out for discussion.
Re: Overview and Analysis of FIRST Stats
I think this is essentially minimizing another "mixture of errors" like the original EPR proposal. In this case, you're minimizing the sum of the standard error measure that leads to your normal parameters plus the squared error between the parameters and your a priori expectations of them, with a weighting factor added in to adjust the importance of the original measure and your new second measure. If you form your error measure this way, then take the derivative of the squared error, set the result equal to zero, and solve for the parameters, you end up with the equation that you're solving (see the sketch at the end of this post).

That's another way to incorporate a priori expectations of the means, like the MMSE estimates do. And if you keep the weights constant, it would progressively go from a priori info at the start of a tournament to incorporating mainly match info later in a tournament, again like the MMSE estimates do. About the only thing that would be hard to do using this method would be incorporating a priori variance estimates, but then I suspect that this "feature" of the MMSE estimates has limited practical utility anyway. Did you do any testing of this method to see how it does and how well it compares?

If you were so inclined, I'd love to see a database with the following data for Championships for the past 3 years:
. Ar, Ab, Mr, Mb for the CMP divisions (which we already have because you already generated them: thanks again!)
. OPR, OPRm1, and OPRm3 for all of the regional tournaments that all of the CMP teams played in that year, and perhaps the mean and variance of each of these statistics at the respective tournaments. I'm calling OPRm1 and OPRm3 the MMSE estimates with Var(N)/Var(O) estimated at 1 and 3 respectively.

Ideally, there might be csv files with a row for each team in each CMP division (in the same order as the columns of the corresponding CMP Ar and Ab matrices) and columns with a 0 if the team didn't play in that week or a number if the team did play (e.g., OPR, or mean OPR for that tournament that week, etc.). So, for example, for 2014 archi, there might be the following files, each containing 100 rows (one per team) by 7 columns (one per week):
ARCHI_OPR, ARCHI_OPRm1, ARCHI_OPRm3
ARCHI_OPR_mean, ARCHI_OPRm1_mean, ARCHI_OPRm3_mean
ARCHI_OPR_var, ARCHI_OPRm1_var, ARCHI_OPRm3_var

Any chance you'd be up for generating this data? Given this data, we could test out different methods for predicting the match outcomes of the CMP divisions. I suggest that it would be interesting to try to predict:
A. The results of CMP matches completely based off the results from previous tournaments.
B. The results of the last 3/4 of the CMP matches based off the results from previous tournaments AND the first 1/4 of the CMP matches.
C. The results of the last 1/2 of the CMP matches based off the results from previous tournaments AND the first 1/2 of the CMP matches.
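For concreteness, here is one way to write down the error measure being described; the symbols x0 (the a priori parameter expectations) and lambda (the weighting factor) are my notation, not the original post's:

```latex
E(x) \;=\; \lVert M - A x \rVert^{2} \;+\; \lambda\,\lVert x - x_{0} \rVert^{2},
\qquad
\frac{\partial E}{\partial x} = 0
\;\Longrightarrow\;
\hat{x} \;=\; \left(A^{T}A + \lambda I\right)^{-1}\left(A^{T}M + \lambda\,x_{0}\right)
```

This solution matches the MMSE OPR update when x0 = Oave*1 and lambda = Var(N)/Var(O), which is consistent with the observation in the reply that follows.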
Re: Overview and Analysis of FIRST Stats
I think this might just be MMSE the more I think about it. Plug Oave in for your a priori means and stdevN/stdevO in for your weights, solve, and I think you just get the MMSE equations. I'm out now and typing on my phone but will look into it more later.