Re: "standard error" of OPR values
Wikipedia gives the following definition: "The standard error (SE) is the standard deviation of the sampling distribution of a statistic, most commonly of the mean. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate." So my real question is perhaps: what is the statistic, what are we trying to estimate, or what are we "computing the mean" of?

At each tournament we have many matches, so we can get a standard error for the predicted match results: there are many predictions, and we can compute the standard deviation of the distribution of their errors. But each tournament provides only one single OPR estimate for each team. It's tough to compute a standard error for these OPR estimates from that single sample, because you only have the one data point.

If OPRs are a value we estimate for each team at each tournament, and we expect them to stay the same from tournament to tournament (stop laughing now), we can compute the standard deviation of each team's independent OPR values across all of the tournaments in a season to get a standard error for those values. You could then use that standard error to estimate the distribution of a team's OPR in future tournaments based on their previous results. And I suppose you could also use this same standard error to estimate how much a team's OPR might vary if the same tournament were run again with a different set of random match outcomes.

But I'm guessing that what you're really interested in is: if the same tournament were run multiple times, and the match results varied randomly as we modeled (yeah, yeah, and if everybody had a can opener), what would be the standard error of the OPR estimates?
Or in other words: if the same teams with the same robots and the same drivers played in 100 tournaments back-to-back, and we computed the OPR for each team in all 100 tournaments, what would be the standard error of these 100 different OPR estimates? If that is the question you're interested in, then we now have a statistic whose standard error we can compute. Let's look into this.

Let the OPR vector for all of the teams be O (a t x 1 vector, where t is the number of teams). Let the match scores be M (m x 1, where m is the number of scores, or 2x the number of actual matches). We model the matches as:

M = A O + N

where A is an m x t matrix whose (i, j)th element equals 1 if team j was a member of the alliance producing the ith match score and 0 otherwise, and where N is an m x 1 noise vector with variance equal to the variance of the prediction residual for each match score. Call this variance sig^2.

Given M, the least squares estimate for OPR is:

Oest = Inv(A' A) A' M = Inv(A' A) A' (A O + N) = O + Inv(A' A) A' N

Since N is zero-mean, Oest has mean O (which we want) and variance equal to the variance of the second term, Inv(A' A) A' N. Note that Inv(A' A) A' is a t x m matrix that is solely a function of the match schedule.

The variance of the estimated OPR for the ith team is the variance of the ith element of Oest, which is sig^2 * (sum of the squared values of the elements in the ith row of Inv(A' A) A'). This can differ from team to team if the match schedule represented in A is unbalanced (e.g., when a live OPR is being computed during a tournament and some teams have played more matches than others). For a complete tournament with a balanced match schedule, I would hope these variances are equal, or very nearly so. But it would be interesting to compute Inv(A' A) A' for a tournament and see whether the sums of the squared values of each row are truly the same.
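That last row-sum-of-squares check is easy to try. Below is a minimal pure-Python sketch (no external libraries; the 3-team round-robin schedule and all helper names are invented for illustration, not from any OPR tool) that builds A for a hypothetical tournament where every 2-team pairing produces two alliance scores, forms P = Inv(A' A) A', and computes each row's sum of squares:

```python
# Hypothetical balanced schedule: 3 teams, every 2-team pairing scores twice.
# Helper names (mat_mul, mat_inv, row_ss) are illustrative only.

def transpose(X):
    return [list(col) for col in zip(*X)]

def mat_mul(X, Y):
    # (p x q) times (q x r) -> (p x r)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def mat_inv(X):
    # Gauss-Jordan elimination with partial pivoting; fine for the small,
    # well-conditioned A'A matrices a match schedule produces.
    n = len(X)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Schedule matrix A: one row per alliance score, one column per team.
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]] * 2    # each pairing scores twice
At = transpose(A)
P = mat_mul(mat_inv(mat_mul(At, A)), At)     # P = Inv(A'A) A', a t x m matrix

# Variance of team i's OPR estimate = sig^2 * (sum of squares of row i of P).
row_ss = [sum(p * p for p in row) for row in P]
```

For this balanced schedule every entry of row_ss comes out the same (0.375), so each team's OPR estimate would indeed get the same standard error; an unbalanced A would make the rows differ.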
Then finally, the standard error for each estimate is just the standard deviation, i.e., the square root of the variance we just computed.

To summarize the whole thing: if a tournament has random match scores generated by M = A O + N, where N is zero-mean with variance sig^2, and you estimate the underlying O values by computing Oest = Inv(A' A) A' M, then the ith team's OPR estimate (the ith element of the Oest vector) will have mean equal to the ith element of the O vector, will have variance = sig^2 * (sum of the squared values of the elements in the ith row of the matrix Inv(A' A) A'), and thus will have a "standard error" equal to the square root of this variance.

To estimate this for a particular tournament, you first compute the OPR estimate Oest and compute sig^2 as the variance of the regression error in the predicted match results. Then you compute the matrix Inv(A' A) A' from the match schedule, and finally compute the standard errors as described.

Too much for a Sunday morning. Thoughts?

Last edited by wgardner : 17-05-2015 at 09:09. Reason: fixed minor error in derivation
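Putting the summary recipe together, here is a pure-Python sketch that runs the whole procedure end to end on a hypothetical 3-team schedule with invented match scores (all numbers and helper names are illustrative). It estimates Oest, computes sig^2 from the regression residuals (here using the residual mean square with m - t degrees of freedom, one common convention), and then the per-team standard errors:

```python
# End-to-end sketch of the summary procedure on an invented 3-team schedule.

def transpose(X):
    return [list(col) for col in zip(*X)]

def mat_mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def mat_inv(X):
    # Gauss-Jordan elimination with partial pivoting.
    n = len(X)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]] * 2    # schedule: each pairing twice
M = [52.0, 29.0, 41.0, 48.0, 31.0, 39.0]     # invented alliance scores

At = transpose(A)
P = mat_mul(mat_inv(mat_mul(At, A)), At)     # Inv(A'A) A'

# Step 1: OPR estimates, Oest = Inv(A'A) A' M.
Oest = [sum(p * m for p, m in zip(row, M)) for row in P]

# Step 2: sig^2 from the regression residuals M - A Oest.
pred = [sum(a * o for a, o in zip(row, Oest)) for row in A]
resid = [m - p for m, p in zip(M, pred)]
dof = len(M) - len(Oest)                     # m - t degrees of freedom
sig2 = sum(r * r for r in resid) / dof

# Step 3: standard error per team = sqrt(sig^2 * row sum of squares of P).
se = [(sig2 * sum(p * p for p in row)) ** 0.5 for row in P]
```

With these invented scores the estimates come out to Oest = [20, 30, 10], sig^2 = 4, and every team gets the same standard error, sqrt(1.5), about 1.22 points, since the schedule is balanced.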