"standard error" of OPR values
#1
Re: "standard error" of OPR values
Quote:
OPR should then yield an estimate of the mean of this distribution. An estimate of the standard deviation can be obtained, as mentioned, by taking the RMS of the residuals. To approximate the standard deviation of the mean (which is what is usually meant by the "standard error" of these sorts of measurements), one would then divide this by sqrt(n), where n is the number of matches used in the team's OPR calculation (for those interested in a proof, simply consider the fact that when summing independent random variables, variances add).

This, of course, fails if the assumptions we made at the outset aren't good (e.g., OPR is not a good model of team performance). Moreover, even if the assumptions hold, if the distribution of the random variable describing a team's performance in a given match is sufficiently wonky that the distribution of the mean is not particularly Gaussian, then one is fairly limited in the conclusions one can draw from the standard deviation anyway.

Last edited by Oblarg : 17-05-2015 at 03:30.
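To make the arithmetic concrete, here is a minimal sketch with made-up residuals, assuming the simple independent-measurement model described above (all numbers hypothetical):

Code:
import math

# Hypothetical residuals (actual minus predicted contribution) for one
# team's n matches, under the simple independent-measurement model.
residuals = [8.0, -5.0, 12.0, -3.0, 6.0, -9.0]
n = len(residuals)

# RMS of the residuals estimates the standard deviation of a single
# match's contribution.
std_dev = math.sqrt(sum(r * r for r in residuals) / n)

# Standard deviation of the mean ("standard error"): divide by sqrt(n),
# since variances of independent random variables add.
std_err = std_dev / math.sqrt(n)
print(f"std dev ~ {std_dev:.2f}, standard error ~ {std_err:.2f}")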
#2
Re: "standard error" of OPR values
Quote:
What you say holds if one is taking a number of independent, noisy measurements of a value and computing the mean of the measurements as the estimate of the underlying value. So that would work if OPR were computed by simply averaging the match scores for a team (and dividing by 3 to account for each team contributing 1/3 of the match score). But that's not the way OPR is computed at all. It's computed using a linear regression, and all of the OPRs for all of the teams are computed simultaneously in one big matrix operation.

For example, it isn't clear to me what n should be. You say "n is the number of matches used in the team's OPR calculation." But all OPRs are computed at the same time using all of the available match data. Does n count matches that a team didn't play in, but that are still used in the computation? Is n the number of matches a team has played? Or the total matches?

OPR can be computed based on whatever matches have already occurred at any time. So if some teams have played 4 matches and some have played 5, it would seem like the OPRs for the teams that have played fewer matches should have more uncertainty than the OPRs for the teams that have played more.

And the fact that the computation is all intertwined, and that the OPRs for different teams are not independent (e.g., if one alliance has a huge score in one match, that affects 3 OPRs directly and the rest of them indirectly through the computation), seems to make the standard assumptions and arguments suspect.

Thoughts?

Last edited by wgardner : 17-05-2015 at 07:20.
#3
Re: "standard error" of OPR values
Quote:
Wikipedia gives the following definition: "The standard error (SE) is the standard deviation of the sampling distribution of a statistic, most commonly of the mean. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate."

So my real question is perhaps: what is the statistic, what are we trying to estimate, or what are we "computing the mean" of?

At each tournament, we have many matches and can get a standard error for the predicted match results, because we have many predicted match results and can compute the standard deviation of the distribution of the errors in the predictions. But each tournament only provides one single OPR estimate for each team. It's tough to compute a standard error on these OPR estimates based on this one sample, because you only have the one data point.

If OPRs are a value that we estimate for each team at each tournament, and we expect them to stay the same from tournament to tournament (stop laughing now), we can compute the standard deviation of each team's independent OPR values across all of the tournaments in a season to get a standard error for those values. Then you could use the standard error to estimate the distribution of a team's OPR in future tournaments based on their previous tournament results. And I suppose you could also use this same standard error to estimate how much a team's OPR might vary if the same tournament were run again and we had a different set of random match outcomes.

But I'm guessing that what you're really interested in is: if the same tournament were run multiple times, and if the match results varied randomly as we modeled (yeah, yeah, and if everybody had a can opener), what would be the standard error of the OPR estimates? In other words, if the same teams with the same robots and the same drivers played in 100 tournaments back-to-back and we computed the OPR for each team for all 100 tournaments, what would be the standard error of these 100 different OPR estimates? If this is the question you're interested in, then now we have a statistic that we can compute the standard error for. Let's look into this.

Let the OPR vector for all of the teams be called O (a t x 1 vector, where t is the number of teams). Let the match scores be called M (m x 1, where m is the number of scores, or 2x the number of actual matches). So we're modeling the matches as:

M = A O + N

where A is an m x t matrix whose i,jth element is 1 if team j was a member of the alliance producing the ith match score and 0 otherwise, and where N is an m x 1 noise vector with variance equal to the variance of the prediction residual for each match score. Let's call this variance sig^2.

Given M, the least-squares estimate for OPR is calculated as:

Oest = Inv(A' A) A' M = Inv(A' A) A' (A O + N) = O + Inv(A' A) A' N

Since N is zero-mean, Oest has mean O (which we want) and variance equal to the variance of the second term, Inv(A' A) A' N. Note that Inv(A' A) A' is a t x m matrix that is solely a function of the match schedule. The variance of the estimated OPR for the ith team is the variance of the ith element of Oest, which is:

sig^2 * (sum of the squared values of the elements in the ith row of Inv(A' A) A')

This can be different for each team if the match schedule represented in A is unbalanced (e.g., if a live OPR is being computed during a tournament and some teams have played more matches than others). I would hope that for a complete tournament with a balanced match schedule these variances would be equal, or very nearly so. But it would be interesting to compute Inv(A' A) A' for a tournament and see if the sums of the squared values of each row are truly the same. Then, finally, the standard error for each estimate is just the standard deviation, i.e., the square root of the variance we just computed.

To summarize the whole thing: if a tournament has random match scores created by M = A O + N, where N is zero-mean with variance sig^2, and if you estimate the underlying O values by computing Oest = Inv(A' A) A' M, then the ith team's OPR estimate (the ith value of the Oest vector) will have mean equal to the ith value of the O vector, will have variance sig^2 * (sum of the squared values of the elements in the ith row of the matrix Inv(A' A) A'), and thus will have a "standard error" equal to the square root of this variance.

To estimate this for a particular tournament, you first compute the OPR estimate Oest and compute sig^2 as the variance of the regression error in the predicted match results. Then you compute the matrix Inv(A' A) A' from the match schedule, and finally compute the standard errors as described.

Too much for a Sunday morning. Thoughts?

Last edited by wgardner : 17-05-2015 at 09:09. Reason: fixed minor error in derivation
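A minimal numpy sketch of that recipe (a made-up schedule with two-team alliances for brevity; real FRC alliances have three teams, and none of these numbers are real):

Code:
import numpy as np

# Hypothetical tiny event: 4 teams (columns), 6 alliance scores (rows).
# A[i, j] = 1 if team j contributed to the i-th alliance score.
A = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)
M = np.array([60.0, 45.0, 58.0, 47.0, 52.0, 55.0])  # alliance scores

# Least-squares OPR estimate: Oest = Inv(A'A) A' M.
P = np.linalg.pinv(A.T @ A) @ A.T        # the t x m matrix Inv(A'A) A'
Oest = P @ M

# sig^2: variance of the regression residuals.
residuals = M - A @ Oest
sig2 = np.var(residuals)

# Per-team variance: sig^2 times the sum of squared entries of P's i-th
# row (equivalently sig^2 * diag(Inv(A'A))); the SE is its square root.
std_err = np.sqrt(sig2 * np.sum(P * P, axis=1))
for team, opr, se in zip(range(1, 5), Oest, std_err):
    print(f"team {team}: OPR ~ {opr:.1f} +/- {se:.1f}")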
#4
Re: "standard error" of OPR values
And a follow-up:

Take the above derivation, but let's pretend that each match score is only the result of 1 team's efforts, not 3. In this case, each row of A would have only a single 1 in it, not 3, and the OPR IS exactly the average of that team's match scores(!).

A' A is diagonal, and the diagonal elements are the number of matches each team has played, so Inv(A' A) is diagonal with diagonal elements equal to 1/(the number of matches that team has played). Then the i,jth element of Inv(A' A) A' is just 1/(the number of matches team i has played) if team i played in match j, and 0 otherwise. The variance of the Oest values in this pretend case is the variance of the prediction residual / the number of matches that a team has played, and thus the standard error of the Oest value is the standard error of the match predictions divided by the square root of the number of matches that a team has played.

So this connects Oblarg's statements to the derivation. If match results were solely the result of one team's efforts, then the standard error of the OPR would just be the standard error of the match prediction / sqrt(n), where n is the number of matches that a team has played. But match results aren't solely the result of one team's efforts, so the previous derivation holds in the more complicated, real case.

Last edited by wgardner : 17-05-2015 at 08:05.
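A quick numerical check of that pretend case (made-up scores, one team per score): the least-squares estimate collapses to the per-team means.

Code:
import numpy as np

# Hypothetical: 2 teams, each "alliance" is a single team.
A = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
M = np.array([50.0, 54.0, 30.0, 36.0, 33.0])

P = np.linalg.pinv(A.T @ A) @ A.T
Oest = P @ M
print(Oest)                         # [52. 33.] -- exactly the per-team means
print(M[:2].mean(), M[2:].mean())   # 52.0 33.0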
#5
Re: "standard error" of OPR values
Quote:
#6
Re: "standard error" of OPR values
Quote:
Last edited by wgardner : 17-05-2015 at 10:52.
#7
Re: "standard error" of OPR values
Quote:
What I'm saying is that our model is that M = AO, where M and O are both vectors whose elements are random variables. Writing O as a vector of flat means and adding a noise vector N doesn't really gain you anything - in our underlying model, the *teams* have fundamental variances, not the matches. The match variances can be computed from the variances of each team's O variable.

Now, we have the problem that we cannot directly measure the variance of each element of O, because the only residuals we can measure are the totals for each match (the elements of the "noise vector" N). However, we can do another linear least-squares fit to assign estimated variance values to each team, which I believe is precisely what your solution ends up doing.

Last edited by Oblarg : 17-05-2015 at 11:17.
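If I'm reading that last step right, a sketch of one way such a second least-squares fit could go (all data made up; it regresses the squared match residuals on the same A matrix, since under the model an alliance's variance is the sum of its member teams' variances):

Code:
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical event: 6 teams, 60 alliance scores, 3 teams per alliance.
t, m = 6, 60
A = np.zeros((m, t))
for i in range(m):
    A[i, rng.choice(t, size=3, replace=False)] = 1.0

true_opr = np.array([40.0, 35.0, 30.0, 25.0, 20.0, 15.0])
true_sd = np.array([12.0, 4.0, 10.0, 3.0, 8.0, 2.0])  # per-team scatter

# Each team's contribution varies match to match; the alliance score is the sum.
per_team_noise = rng.normal(0.0, 1.0, size=(m, t)) * true_sd
M = A @ true_opr + (A * per_team_noise).sum(axis=1)

# First fit: the usual OPR least squares.
Oest, *_ = np.linalg.lstsq(A, M, rcond=None)
r = M - A @ Oest  # per-match residuals (the elements of N)

# Second fit: an alliance's residual variance is (roughly) the sum of its
# member teams' variances, so regress squared residuals on the same A.
var_est, *_ = np.linalg.lstsq(A, r**2, rcond=None)
sd_est = np.sqrt(np.clip(var_est, 0.0, None))
print("estimated per-team std devs:", np.round(sd_est, 1))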
#8
Re: "standard error" of OPR values
Quote:
#9
Re: "standard error" of OPR values
Quote:
We use OPR to estimate the score of an alliance in a match. Or, to be even more precise, we compute the OPR values as the ones that result in the best linear prediction of the match results.

If we have an alliance run the same match over and over, we will see variability in the match results and in the prediction error we get when we subtract the actual match results from the OPR-based prediction. We can compute the standard error of this prediction error. This SE tells us the range that we would expect the match result to fall in with a given probability, but it doesn't tell us anything about the range that we would expect OPR estimates to fall in over a full tournament.

I'm confused by this sentence: "So if we run the same match over and over, we would expect to see a similar OPR." ???
#10
Re: "standard error" of OPR values
Quote:
Code:
Team  Original OPR  Mean OPR        Standard Deviation  StdDev / Mean
1023  119.9222385   120.0083320153  11.227427964        0.0935554038
234   73.13049299   72.801129356    8.9138064084        0.1224404963
135   71.73803792   72.0499437529   7.953512079         0.1103888728
1310  68.29454232   69.3467152712   14.1978070751       0.2047365476
1538  66.51660956   65.739882921    10.0642899215       0.1530926049
1640  63.89355804   63.1124212044   12.5486944006       0.1988308191
4213  59.83218159   60.3799737845   9.7581954471        0.1616131117
2383  59.3454496    58.4390556944   8.8170835924        0.1508765583
5687  58.89565276   58.0801454327   8.5447703278        0.1471203328
2338  57.52050487   57.8998084926   9.9345796042        0.1715822533
68    57.31570571   57.5000280561   7.3734953486        0.1282346391
2342  56.91016998   57.2987212179   6.6038945531        0.115253786
2974  55.52108592   57.1342122847   8.3752237419        0.1465885921
857   56.58983207   56.5258351411   7.2736015551        0.1286774718
2619  55.87939909   55.7690519681   8.4202867997        0.150984937
314   54.93283739   54.2189755764   9.2781646413        0.1711239385
4201  54.36868175   53.4393101098   10.5474638148       0.1973727541
2907  52.20131966   52.8528874425   7.542822466         0.1427135362
360   50.27624758   50.4115562132   7.0992892482        0.1408266235
5403  50.29915841   50.3683881678   6.7117433122        0.133253089
201   45.9115291    44.7743914139   8.4846178186        0.189497111
2013  44.91032156   44.6243506137   6.8765159824        0.1540978387
3602  44.27190346   44.0845482182   9.1690079569        0.2079868872
207   43.76003325   43.534273676    9.6975195297        0.2227559739
1785  42.88695283   43.4312399486   8.2699452851        0.1904146714
1714  43.01192386   42.548981107    10.4744349747       0.2461735793
2848  42.09926229   42.3315382699   5.5963086425        0.1322018729
5571  41.52437471   41.7434170692   9.1647109829        0.2195486528
3322  41.46602143   41.5494849767   7.1743838875        0.1726708259
4334  40.44991373   41.05033774     8.7102627815        0.2121849237
5162  40.45440709   40.9929568271   8.2624477928        0.2015577414
5048  39.89000748   40.3308767357   11.0199899828       0.2732395344
2363  39.94545778   40.1152579819   6.6177263936        0.1649678134
280   39.5619946    39.5341268065   7.3717432763        0.1864653117
4207  38.2684727    39.4991498122   6.9528849981        0.1760261938
5505  39.67352888   38.9668291926   11.3348728596       0.2908851732
217   36.77649547   37.4492632177   6.4891284445        0.1732778668
836   36.43648963   37.0437210956   12.1307341233       0.3274707228
503   36.81699351   36.7802949819   7.9491833149        0.2161261436
1322  36.38199798   36.7254993257   8.5268395114        0.2321776332
4451  35.19372256   35.3483644749   9.807710599         0.2774586815
623   34.52165055   35.1189107974   7.930898959         0.2258298671
1648  35.50610406   35.0638323174   10.815198205        0.3084431304
51    34.66010328   34.6703806244   5.4485310273        0.157152328
122   34.32806143   33.5962803896   7.5092149942        0.223513285
115   31.91437124   31.3399395607   8.4108320311        0.2683742263
5212  30.01729221   30.4525516362   8.9862156315        0.2950890861
1701  29.87650404   30.3212455768   6.3833025833        0.2105224394
3357  29.17742219   29.6022237315   6.381280757         0.2155676146
1572  29.88934385   29.5148636895   7.882621955         0.2670729582
3996  29.80296599   29.071104692    12.1221539603       0.4169829144
2655  26.12997208   26.8414199039   8.2799141902        0.3084752677
3278  27.75400612   26.676383757    8.7090459236        0.3264702593
2605  26.77170149   26.4416718205   7.2093344642        0.2726504781
2914  25.16358084   25.6405460981   8.2266061339        0.3208436397
5536  25.12712518   25.537683706    8.9692243899        0.3512152666
108   25.12900331   24.9994393089   8.1059495087        0.3242452524
4977  23.84091367   24.1678220977   8.8309117942        0.3653995697
931   20.64386303   20.6395850124   9.7862519781        0.4741496485
3284  20.6263851    20.3004828941   7.7358872421        0.3810691244
5667  20.24853487   20.2012572648   10.5728126478       0.5233739915
188   19.63432177   19.5009951172   8.527091207         0.4372644142
5692  17.52522898   16.9741593261   9.9533189003        0.5863806689
1700  15.35451961   15.0093164719   7.5208523959        0.5010789405
4010  12.26210563   13.9952121466   9.8487154699        0.7037203414
1706  12.6972477    11.7147928015   6.1811481569        0.5276361487
3103  12.14379904   11.6822069225   8.4008681879        0.7191165371
378   11.36567533   11.6581748916   8.2483175766        0.7075136248
3238  8.946537399   9.2298154231    9.6683698675        1.0475149745
5581  9.500192257   8.7380812257    8.2123397521        0.9398333044
5464  4.214298451   5.4505495437    7.2289498778        1.326279088
41    5.007828439   4.3002816244    9.0353666405        2.1011104457
2220  4.381189923   4.2360658386    6.880055327         1.6241615662
4364  4.923793169   3.504087428     8.6917749423        2.4804674886
1089  1.005273551   0.9765385053    6.9399339807        7.1066670109
691   -1.731531162  -1.2995295456   11.9708242834       9.2116599609

In terms of whether this is a valid way of looking at it, I'm not sure--the results seem to have some meaning, but I'm not sure how much of it is just that only looking at 200 scores is even worse than just 254, or if there is something more meaningful going on.

*Using python's random.sample() function. This means that I did nothing to prevent duplicate runs (which are extremely unlikely; 254 choose 200 is ~7.2 * 10^55) and nothing to ensure that a team didn't "play" <3 times in the selection of 200 scores.
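The script itself isn't posted, but a rough sketch of the resampling procedure as described might look like this (data structures hypothetical; random.sample is used as in the footnote above):

Code:
import random
import statistics

import numpy as np

def opr(rows, scores):
    """Least-squares OPR from 0/1 alliance-membership rows and their scores."""
    A = np.array(rows, dtype=float)
    M = np.array(scores, dtype=float)
    est, *_ = np.linalg.lstsq(A, M, rcond=None)
    return est

def resampled_stddevs(all_rows, all_scores, runs=1000, k=200):
    """Recompute OPR on `runs` random subsets of k scores (nothing prevents
    a team appearing <3 times) and return each team's std dev across runs."""
    num_teams = len(all_rows[0])
    samples = [[] for _ in range(num_teams)]
    for _ in range(runs):
        idx = random.sample(range(len(all_scores)), k)  # e.g. 200 of 254
        est = opr([all_rows[i] for i in idx], [all_scores[i] for i in idx])
        for j, v in enumerate(est):
            samples[j].append(v)
    return [statistics.stdev(s) for s in samples]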
#11
Re: "standard error" of OPR values
Quote:
If you are asking for the individual standard error associated with each OPR value, no one ever posts them because the official FRC match data doesn't contain enough information to make a meaningful computation of those individual values.

In a situation (unlike FRC OPR) where you know the variance of each observed value - either from repeated observations using the same values of the predictor variables, or because you are measuring something with an instrument of known accuracy - you can put those variances into the design matrix for each observation and compute a meaningful standard error for each of the model parameters. Or if (unlike FRC OPR) you have good reason to believe the observations are homoscedastic, you can compute the variance of the residuals and use that to back-calculate standard errors for the model parameters. If you do this for FRC data, the result will be standard errors which are very nearly the same for each OPR value... which is clearly not the expected result.
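For reference, a minimal sketch of that first option - weighted least squares with known per-observation variances (all numbers made up; FRC data does not actually supply these variances):

Code:
import numpy as np

# Hypothetical design matrix and observations (3 teams, 5 alliance scores).
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)
M = np.array([50.0, 42.0, 46.0, 53.0, 40.0])
obs_var = np.array([25.0, 9.0, 16.0, 25.0, 9.0])  # known per-score variances

# Weighted least squares: weight each observation by 1/variance.
W = np.diag(1.0 / obs_var)
cov = np.linalg.inv(A.T @ W @ A)   # parameter covariance matrix
Oest = cov @ A.T @ W @ M
std_err = np.sqrt(np.diag(cov))    # per-parameter standard errors
for i, (o, se) in enumerate(zip(Oest, std_err), start=1):
    print(f"param {i}: {o:.1f} +/- {se:.1f}")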
#12
Re: "standard error" of OPR values
Quote:
#13
Re: "standard error" of OPR values
I think you missed my point entirely. Yes, they can be computed, but that doesn't mean they are statistically valid. They are not, because the data does not conform to the necessary assumptions.
#14
Re: "standard error" of OPR values
So it's not possible to perform a statistically valid calculation for standard deviation? Are there no ways to solve for it with a system that is dependent on other robots' performances?
#15
Re: "standard error" of OPR values
Quote:
As it turns out, I was recently asked for the average time it takes members of my branch to produce environmental support products. Because we get requests that range from a 10-mile-square box on one day to seasonal variability for a whole ocean basin, the (requested) mean production time means nothing. For one class of product, the standard deviation of production times was greater than the mean. Without the scatter info, the reader would probably have assumed that we were making essentially identical widgets and that the scatter was +/- 1 or 2 in the last reported digit.