#31
Re: "standard error" of OPR values
Quote:
Wikipedia gives the following definition: "The standard error (SE) is the standard deviation of the sampling distribution of a statistic, most commonly of the mean. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate." So my real question is perhaps: what is the statistic, what are we trying to estimate, or what are we "computing the mean" of?

At each tournament we have many matches, so we can get a standard error for the predicted match results: we have many predicted match results and can compute the standard deviation of the distribution of the prediction errors. But each tournament provides only one OPR estimate per team, and it's tough to compute a standard error from a single data point.

If OPR is a value we estimate for each team at each tournament, and we expect it to stay the same from tournament to tournament (stop laughing now), we can compute the standard deviation of each team's independent OPR values across all of the tournaments in a season to get a standard error for those values. You could then use that standard error to estimate the distribution of a team's OPR in future tournaments based on its previous tournament results. And I suppose you could also view this same standard error as an estimate of how much a team's OPR might vary if the same tournament were run again with a different set of random match outcomes.

But I'm guessing that what you're really interested in is: if the same tournament were run multiple times, and if the match results varied randomly as we modeled (yeah, yeah, and if everybody had a can opener), what would be the standard error of the OPR estimates? In other words, if the same teams with the same robots and the same drivers played 100 tournaments back-to-back and we computed each team's OPR for each of the 100 tournaments, what would be the standard error of those 100 different OPR estimates? If this is the question you're interested in, then now we have a statistic that we can compute the standard error for. Let's look into this.

Let the OPR vector for all of the teams be O (t x 1, where t is the number of teams). Let the match scores be M (m x 1, where m is the number of scores, or 2x the number of actual matches). So we're modeling the matches as

M = A O + N

where A is an m x t matrix whose i,jth element is 1 if team j was a member of the alliance producing the ith match score and 0 otherwise, and where N is an m x 1 noise vector with variance equal to the variance of the prediction residual for each match score. Let's call this variance sig^2.

Given M, the least squares estimate for OPR is

Oest = Inv(A' A) A' M = Inv(A' A) A' (A O + N) = O + Inv(A' A) A' N

As N is zero-mean, Oest has mean O (which we want) and variance equal to the variance of the second term, Inv(A' A) A' N. Note that Inv(A' A) A' is a t x m matrix that is solely a function of the match schedule. The variance of the estimated OPR for the ith team is the variance of the ith element of Oest, which is sig^2 * (sum of the squared values of the elements in the ith row of Inv(A' A) A'). This can differ from team to team if the match schedule represented in A is unbalanced (e.g., when a live OPR is computed during a tournament and some teams have played more matches than others).

I would hope that for a complete tournament with a balanced match schedule these variances would be equal, or very nearly so. But it would be interesting to compute Inv(A' A) A' for a tournament and see whether the sums of the squared values of each row really are the same. Then finally, the standard error for each estimate is just the standard deviation: the square root of the variance we just computed.

To summarize the whole thing: if a tournament has random match scores created by M = A O + N, where N is zero-mean with variance sig^2, and if you estimate the underlying O values by computing Oest = Inv(A' A) A' M, then the ith team's OPR estimate (the ith value of the Oest vector) will have mean equal to the ith value of the O vector, will have variance sig^2 * (sum of the squared values of the elements in the ith row of the matrix Inv(A' A) A'), and thus will have a "standard error" equal to the square root of this variance. To estimate this for a particular tournament, you first compute the OPR estimate Oest and compute sig^2 as the variance of the regression error in the predicted match results. Then you compute the matrix Inv(A' A) A' from the match schedule and finally compute the standard errors as described.

Too much for a Sunday morning. Thoughts?

Last edited by wgardner : 17-05-2015 at 09:09. Reason: fixed minor error in derivation
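For concreteness, here is a minimal numpy sketch of the computation described above (the function name, and the choice to divide by m - t for an unbiased sig^2, are my own assumptions, not anything specified in the thread):

Code:
import numpy as np

def opr_standard_errors(A, M):
    # A: (m, t) schedule matrix, A[i, j] = 1 if team j contributed to score i
    # M: (m,) vector of alliance scores
    P = np.linalg.inv(A.T @ A) @ A.T        # Inv(A'A) A', a (t, m) matrix
    opr = P @ M                             # Oest = Inv(A'A) A' M
    residuals = M - A @ opr                 # per-score prediction errors
    m, t = A.shape
    sig2 = (residuals ** 2).sum() / (m - t) # unbiased residual variance estimate
    # Var(Oest_i) = sig^2 * (sum of squared elements of row i of P)
    se = np.sqrt(sig2 * (P ** 2).sum(axis=1))
    return opr, se

Note that (P ** 2).sum(axis=1) equals the diagonal of Inv(A' A), so the standard errors are also just sqrt(sig^2 * diag(Inv(A' A))), the usual least-squares result.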
#32
Re: "standard error" of OPR values
And a follow up:
Take the above derivation, but let's pretend that each match score is the result of only 1 team's efforts, not 3. In this case, each row of A would have a single 1 in it, not three, and the OPR IS exactly the average of that team's match scores(!). A' A is diagonal, with diagonal elements equal to the number of matches each team has played, so Inv(A' A) is diagonal with diagonal elements equal to 1 / (the number of matches that team has played). The i,jth element of Inv(A' A) A' is then 1 / (the number of matches team i has played) if team i played in match j, and 0 otherwise.

The variance of the Oest values in this pretend case is the variance of the prediction residual divided by the number of matches a team has played, and thus the standard error of the Oest value is the standard error of the match predictions divided by the square root of the number of matches the team has played.

So this connects Oblarg's statements to the derivation. If match results were solely the result of one team's efforts, then the standard error of the OPR would just be the standard error of the match prediction / sqrt(n), where n is the number of matches a team has played. But match results aren't solely the result of one team's efforts, so the previous derivation holds in the more complicated, real case.

Last edited by wgardner : 17-05-2015 at 08:05.
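A quick numerical check of this special case (the toy schedule below is made up purely for illustration):

Code:
import numpy as np

# Pretend schedule: one team per score. Teams 0, 1, 2 produce
# 4, 3, and 2 scores respectively.
A = np.zeros((9, 3))
A[[0, 1, 2, 3], 0] = 1
A[[4, 5, 6], 1] = 1
A[[7, 8], 2] = 1

P = np.linalg.inv(A.T @ A) @ A.T
# Row sums of squares of Inv(A'A) A' come out to 1/n for each team,
# so SE_i = sig / sqrt(n_i), as claimed.
print((P ** 2).sum(axis=1))   # [0.25, 0.3333..., 0.5] = 1/4, 1/3, 1/2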
#33
Re: "standard error" of OPR values
Quote:
#34
Re: "standard error" of OPR values
That would be different, I think. N is match noise, an m x 1 vector. If I understand your equation correctly, O would be the OPR random variable, with mean equal to the "actual" OPR and some variance; but O is t x 1, not m x 1, so I don't think they're the same. And the noise that the regression is computing is truly the noise to be expected in each match outcome, not the noise in the OPR estimates themselves. Or am I misunderstanding what you're saying?

Last edited by wgardner : 17-05-2015 at 10:52.
#35
Re: "standard error" of OPR values
Quote:
What I'm saying is that our model is M = AO, where M and O are both vectors whose elements are random variables. Writing O as a vector of flat means and adding a noise vector N doesn't really gain you anything - in our underlying model, the *teams* have fundamental variances, not the matches. The match variances can be computed from the variances of each team's O variable.

Now, we have the problem that we cannot directly measure the variance of each element of O, because the only residuals we can measure are totals for each match (the elements of the "noise vector" N). However, we can do another linear least-squares fit to assign estimated variance values to each team, which I believe is precisely what your solution ends up doing.

Last edited by Oblarg : 17-05-2015 at 11:17.
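For what it's worth, here is one sketch of that second least-squares fit (this is my reading of the suggestion, not code from the thread; it assumes each team's contribution is independent, so a match's score variance is the sum of its teams' variances, making the squared residuals linear in the same schedule matrix A):

Code:
import numpy as np

def per_team_variances(A, M):
    # First fit: OPRs from the usual least squares.
    opr, *_ = np.linalg.lstsq(A, M, rcond=None)
    sq_resid = (M - A @ opr) ** 2
    # Second fit: apportion the squared residuals to teams.
    # Under independence, Var(M_i) = sum_j A[i, j] * var_j.
    var_team, *_ = np.linalg.lstsq(A, sq_resid, rcond=None)
    return opr, var_team   # var_team estimates are noisy and can even be negative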
#36
Re: "standard error" of OPR values
Quote:
Perhaps the truth of the matter is that there's variability in both. For example, if a driver screws up or an autonomous run doesn't work quite perfectly, I guess that's team-specific OPR variability; but if the litter gets thrown a particular random way that hinders performance or score, that's match variability. But I guess the bottom line is that if we're in agreement on the algorithm and the results of the equations, it probably doesn't matter much if we think about the underlying process differently.

Last edited by wgardner : 17-05-2015 at 12:22.
#37
Re: "standard error" of OPR values
One more thought on this:
I guess I wrote the equations the way I did because that's the way I've always seen the linear regression derived in the first place. Namely, to compute the OPR values you view the problem as

M = A O + N

then form the squared error

N' N = (M - A O)' (M - A O)

then compute the derivative of N' N with respect to O and solve for O, which gives you Oest = Inv(A' A) A' M. Is there a different way of expressing this derivation without resorting to a vector N of the errors that are being minimized?
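For reference, the standard way to write this derivation without naming a residual vector is to minimize the squared error directly:

$$
\hat{O} = \arg\min_{O} \lVert M - AO \rVert^2, \qquad
\frac{\partial}{\partial O}\,(M - AO)'(M - AO) = -2A'M + 2A'A\,O = 0
\;\Rightarrow\; A'A\,\hat{O} = A'M
\;\Rightarrow\; \hat{O} = (A'A)^{-1}A'M.
$$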
#38
Re: "standard error" of OPR values
Quote:
A O = M: 254 equations in 76 unknowns (for the example I posted); the system is overdetermined; there is no exact solution for O.

A'A O = A'M: 76 equations in 76 unknowns; the exact solution O for this system will be the least squares solution for the original 254x76 system.
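A small numpy sketch of that equivalence (the schedule and scores below are randomly generated stand-ins, not the Archimedes data; it assumes the random schedule leaves A'A nonsingular, as a real qual schedule does):

Code:
import numpy as np

rng = np.random.default_rng(0)
m, t = 254, 76                        # 254 alliance scores, 76 teams
A = np.zeros((m, t))
for i in range(m):
    A[i, rng.choice(t, size=3, replace=False)] = 1   # 3 teams per score
b = A @ rng.uniform(0, 40, size=t) + rng.normal(0, 10, size=m)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # solve the 76x76 normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # least squares on the 254x76 system
print(np.allclose(x_normal, x_lstsq))            # expect True: same solution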
#39
Re: "standard error" of OPR values
Guys,

Can we all agree on the following?

Computing OPR, as done here on CD, is a problem in multiple linear regression (one dependent variable and 2 or more independent variables).

The dependent variable for each measurement is alliance final score in a qual match. Each qual match provides 2 measurements (red alliance final score and blue alliance final score). If the game has any defense or coopertition, those two measurements are not independent of each other.

For Archimedes, there were 127 qual matches, producing 254 measurements (alliance final scores). Let [b] be the column vector of those 254 measurements.

For Archimedes, there were 76 teams, so there are 76 independent dichotomous variables (each having value 0 or 1). For each measurement, all the independent variables are 0 except for the 3 which are 1. Let [A] be the 254 by 76 matrix whose ith row is the vector of values of the independent variables for measurement i.

Let [x] be the 76x1 column vector of model parameters. [x] is what we are trying to find.

[A][x]=[b] is a set of 254 simultaneous equations in 76 variables. The variables in those 254 equations are the 76 (unknown) model parameters in [x]. We want to solve that system for [x]. Since there are more equations (254) than unknowns (76), the system is overdetermined, and there is no exact solution for [x].

Since there's no exact solution for [x], we use least squares to find the "best" solution[1]. The solution will be a 76x1 column vector of team OPRs. Let that solution be known as [OPR].

Citrus Dad wants to know "the standard error" of each element in [OPR].

Are we in agreement so far? If so, I will continue.

[1] Yes, I know there are other ways to define "best", but every OPR computation I've ever seen on CD uses least squares, so I infer that's what Citrus Dad had in mind.

Last edited by Ether : 17-05-2015 at 15:11.
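To make the setup concrete, here is a sketch of building [A] and [b] from a match list (the two matches and team numbers below are invented for illustration; a real schedule would work the same way):

Code:
import numpy as np

matches = [
    # (red alliance, blue alliance, red score, blue score)
    ((1, 2, 3), (4, 5, 6), 78, 64),
    ((1, 4, 7), (2, 5, 8), 55, 90),
]
teams = sorted({tm for red, blue, _, _ in matches for tm in red + blue})
col = {tm: j for j, tm in enumerate(teams)}

rows, b = [], []
for red, blue, red_score, blue_score in matches:
    for alliance, score in ((red, red_score), (blue, blue_score)):
        row = np.zeros(len(teams))
        row[[col[tm] for tm in alliance]] = 1   # the 3 dichotomous variables set to 1
        rows.append(row)
        b.append(score)
A, b = np.array(rows), np.array(b)   # A is (2 * #matches) x (#teams)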
#40
Re: "standard error" of OPR values
Quote:
For 1 tournament, you have a single estimate of each element of OPR, so there is no standard error. If you have multiple tournaments, then you have multiple estimates of the underlying OPR and can compute the standard error of those estimates. And if you use the baseline model to create a hypothetical set of random tournaments as I described, then you can compute the standard error of the estimates across that hypothetical set.
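A sketch of that hypothetical-tournaments idea as a Monte Carlo check (the function and its inputs are mine; it assumes you supply a schedule matrix A, "true" OPRs, and a match-noise standard deviation sig):

Code:
import numpy as np

def monte_carlo_opr_se(A, opr_true, sig, n_trials=100, seed=0):
    # Replay the same schedule n_trials times with fresh match noise
    # and measure the spread of the resulting OPR estimates.
    rng = np.random.default_rng(seed)
    P = np.linalg.inv(A.T @ A) @ A.T
    estimates = np.array([
        P @ (A @ opr_true + rng.normal(0, sig, size=A.shape[0]))
        for _ in range(n_trials)
    ])
    return estimates.std(axis=0)   # empirical standard error per team

The result should agree with the closed-form sqrt(sig^2 * diag(Inv(A' A))) from post #31, up to Monte Carlo noise.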
#41
Re: "standard error" of OPR values
Quote:
#42
Re: "standard error" of OPR values
There is a standard error for the OPR estimate from a single tournament. That standard error tells you the probability range your estimate falls within, given some fundamental assumptions. The normality assumption derives from the Central Limit Theorem. The OPR is essentially an estimate of the average point contribution across all of the matches in the tournament. The OPR itself assumes that in a perfect world the robot would contribute the same amount in each match, which of course isn't true. The variation in the contribution from match to match (which we don't always observe directly) is the source of the standard error.
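In textbook terms (with x_k standing for a team's unobserved contribution in match k, assumed i.i.d. with variance sigma^2):

$$
\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}},
$$

which is the sense in which a single tournament still yields a standard error.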
#43
Re: "standard error" of OPR values
Quote:
Do you agree with this post?
#44
Re: "standard error" of OPR values
Quote:
#45
Re: "standard error" of OPR values
Yes, and r_x in that post is exactly the N that I described.