"standard error" of OPR values
Quote:
Quote:
I'd like to hear what others have to say. Do you think the concept of standard error applies to the individual computed values of OPR, given the way OPR is computed and the data from which it is computed? Why or why not? If yes: explain how you would propose to compute the standard error for each OPR value, what assumptions would need to be made about the model and the data in order for said computed standard error values to be meaningful, and how the standard error values should be interpreted. |
Re: "standard error" of OPR values
Just to check that I understand this correctly: standard error is basically the standard deviation from the "correct" value, and you're asking if OPR values have this distribution from the "correct" value (i.e. the given OPR)?
Also, is OPR calculated by taking t1+t2+t3 = redScore1, t4+t5+t6 = blueScore1, etc. and then solving that series of linear equations? I would guess it would depend on what you mean by OPR. I always assumed, perhaps incorrectly, that OPR was the solution of the above calculations, and thus it is just a number, neither correct nor incorrect. If OPR is meant to indicate actual scoring ability, this would change.

However, I'm not sure how to figure out how many points a team contributes--if one team stacked 6 totes and another capped, does the first get 12 and the second 24, or does each get 36, or some other combination? I suppose one way to do it would be to take the difference between a team's OPR and 1/3 of their alliance's points from each match they played in, and see how that difference changes. Comparing that between teams will be tricky, since the very top/bottom teams will have a greater difference than an average one. Similarly, looking at a team's OPR after X matches and 1/3 of the match X+1 score would be interesting, but would have the same problem.

(Or I could just be very confused about what OPR and standard error really are--I've tried to piece together what I've read here and on the internet but haven't formally taken linear algebra or statistics.) |
Re: "standard error" of OPR values
I'm not sure if there is a good clean method which produces some sort of statistical standard deviation or the such, although I would be happy to be proven wrong.
However, I believe that the following method should give a useful result. Start with the standard OPR calculation, with the matrix equation A * x = b, where x is an n x 1 matrix containing all the OPRs, A is the matrix describing which teams a given team has played with, and b has the sum of the scores from the matches a team played in. Then, to compute a useful error value:

1) Calculate the expected score for each match (using OPR), storing the result in a matrix exp, which is m x 1. Also store all the actual scores in another m x 1 matrix, act.
2) Calculate the square of the error for each match, in the matrix err = (act - exp)^2 (using the squared notation to refer to squaring individual elements). You could also try taking the absolute value of each element, which would result in a distinction similar to that between the L1 and L2 norms.
3) Sum up the squared err from each team's matches into the matrix errsum, which will replace the b from the original OPR calculation.
4) Solve for y in A * y = errsum (obviously, this would be over-determined, just like the original OPR calculation). To get things into the right units, you should then take the square root of every element of y, and that will give a team's typical variance.

This should give each team's typical contribution to the change in their match scores.

Added-in note: I'm not sure what statistical meaning the values generated by this method would have, but I do believe that they would have some useful meaning, unlike the values generated by just directly computing the total least-squared error of the original calculation (i.e., (A*x - b)^2). If no one else does, I may implement this method just to see how it performs. |
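Here is a minimal Python/NumPy sketch of the method described above, under my own naming: I start from the m x n 0/1 design matrix D (one row per alliance score) rather than the n x n "played with" matrix and form the normal equations explicitly. The variable names are mine, not from the post, and negative solutions are clipped before the square root since the fitted values are not guaranteed to be non-negative.

Code:

import numpy as np

def opr_and_residual_spread(D, scores):
    """D: (m x n) 0/1 matrix, one row per alliance score, with a 1 for each team on that alliance.
       scores: (m,) vector of alliance scores."""
    N = D.T @ D                      # n x n "played with" matrix (the A in the post)
    b = D.T @ scores                 # per-team sum of alliance scores (the b in the post)
    opr = np.linalg.solve(N, b)      # the usual OPR solution

    exp = D @ opr                    # step 1: expected score for each alliance
    err = (scores - exp) ** 2        # step 2: squared error per alliance score
    errsum = D.T @ err               # step 3: per-team sum of squared errors
    y = np.linalg.solve(N, errsum)   # step 4: solve A * y = errsum
    spread = np.sqrt(np.clip(y, 0.0, None))  # back to score units; clip since y can go negative
    return opr, spread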
Re: "standard error" of OPR values
Calculation of the standard error in OPR for each team sounds straightforward - the RMS of the residuals between the linear model and the match data for the matches in which a team participated. However, this number would probably not cast much if any light on the source of this scatter. One obvious source of scatter is the actual match-to-match performance variation of each team - a team that puts up two stacks per match, but in one match they set the second stack on some litter and it knocked over the first. Another is non-linearity in the combined scoring (e.g. two good teams that perform very well when with mediocre partners, but run out of game pieces when allied, or a tote specialist allied with an RC specialist who do much better together than separately).
|
Re: "standard error" of OPR values
There are two types of error:
The first is the prediction residual, which measures how well the OPR model is predicting match outcomes. In games where there is a lot of match-to-match variation, the prediction residual will be high no matter how many matches each team plays.

The second is the error in measuring the actual, underlying OPR value (if you buy into the linear model). If teams actually had an underlying OPR value, then as teams play 10, 100, 1000 matches the error in computing this value will go to zero.

So, the question is, what exactly are you trying to measure? If you want confidence in the underlying OPR values, or perhaps the rankings produced by the OPR values, then the second error is the one you want to figure out, and the prediction residual won't really answer that. If you want to know how well the OPR model will predict match outcomes, then the first error is the one you care about. |
Re: "standard error" of OPR values
Quote:
If one were to assume that this is actually the case, though, then one would just take the error from the first part and divide it by sqrt(n) to find the error in the estimation of the mean. |
Re: "standard error" of OPR values
I agree with most of what people have said so far. I would like to add my observations and opinions on this topic.
First of all, it is important to understand how OPR is calculated and what it means from a mathematical standpoint. Next, it is important to understand all the reasons why OPR does not perfectly reflect what a team actually scores in a match. To put things in perspective, I would like to categorize all the reasons into two bins: things that are beyond a team's control, and things that reflect the actual "performance" of the team.

I consider anything that is beyond a team's control as noise. This is something that will always be there. Some examples, as others have also pointed out, are bad calls by refs, compatibility with partners' robots, non-linearity of scoring, accidents that are not due to carelessness, a field fault not being recognized, robot failures that are not repeatable, etc.

The second bin will be things that truly reflect the "performance" of a team. This will measure what a team can potentially contribute to a match. This will take into account how consistent a team is. The variation here will include factors like how careful they are in not knocking stacks down, getting fouls, or the robot not functioning due to avoidable wiring problems. The problem is that this measure is meaningful only if no teams are allowed to modify their robot between matches, meaning the robot is in the exact same condition in every match. However, in reality there are three scenarios. 1) The robot keeps getting better as the team works out the kinks or tunes it better. 2) The robot keeps getting worse as things wear down quickly due to an inappropriate choice of motors or bearings (or the lack thereof), or the design or construction techniques that were used. Performance can also get worse as some teams keep tinkering with their robot or programming without fully validating the change. 3) The robot stays the same.

I understand what some people are trying to do. We want a measure of expected variability around each team's OPR numbers, some kind of a confidence band. If we have that information, then there will be a max and min prediction of the outcome of the score of each alliance. Mathematically, this can be done relatively easily. However, the engineer in me tells me that it is a waste of time. Based on the noise factors I listed above and the fact that robot performance may change over time, this becomes just a mathematical exercise and does not contribute much to the prediction of the outcome of the next match. However, I do support the publication of the R^2 coefficient of determination. It will give an overall number as to how well the actual outcomes fit the statistical model. |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
I have to strongly agree with what Ed had to say above. Errors in OPR happen when its assumptions go unmet: partner or opponent interaction, team inconsistency (including improvement), etc. If one of these factors caused significantly more variation than the others, then the standard error might be a reasonable estimate of that factor. However, I don't believe that this is the case.
Another option would be to take this measure in the same way that we take OPR. We know that OPR is not a perfect depiction of a team's robot quality or even a team's contribution to its alliance, but we use OPR anyway. In the same way, we know the standard error is an imperfect depiction of a team's variation in contribution. People constantly use the same example in discussing consistency in FRC. A low-seeded captain, when considering two similarly contributing teams, is generally better off selecting an inconsistent team over a consistent one. Standard error could be a reasonable measure of this inconsistency (whether due to simple variation or improvement). At a scouting meeting, higher standard error could indicate "teams to watch" (for improvement). But without having tried it, I suspect a team's standard error will ultimately be mostly unintelligible noise. |
Re: "standard error" of OPR values
Has anyone ever attempted a validation study to compare "actual contribution" (based on scouting data or a review of match video) to OPR values? It seems like this would be fairly easy and accurate for Recycle Rush (and very difficult for Aerial Assist). I did that with our performance at one district event and found the result to be very close (OPR=71 vs "Actual"= 74).
In some ways, OPR is probably more relevant than "actual contribution". For example, a good strategist in Aerial Assist could extract productivity from teams that might otherwise just drive around aimlessly. This sort of contribution would show up in OPR, but a scout wouldn't attribute it to them as an "actual contribution". It would be interesting to see if OPR error was the same (magnitude and direction) for low, medium, and high OPR teams, etc. |
Re: "standard error" of OPR values
Quote:
It's the second error term that I haven't seen reported. And in my experience working with econometric models, having only 10 observations likely leads to a very large standard error around this parameter estimate. I don't think that calculating this will change the OPR per se, but it will provide a useful measure of the (im)precision of the estimates that I don't think most students and mentors are aware of. Also, as you imply, a linear model may not be the most appropriate structure even though it is by far the easiest to compute with Excel. For example, the cap on resource availability probably creates a log-linear relationship. |
Re: "standard error" of OPR values
Quote:
Someone did a study for Archimedes this year. I would say it is similar to 2011 where 3 really impressive scorers would put up a really great score, but if you expected 3X, you would instead get more like 2.25 to 2.5.... |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
And regardless, I think seeing the SEs lets us see whether a team has a more variable performance than another. That's another piece of information that we can then use to explore further. For example, is the variability arising because parts keep breaking, or is there an underlying improvement trend through the competition--either one would increase the SE compared to a steady performance. There are other tools for digging into that data, but we may not look unless we have that SE measure first. |
Re: "standard error" of OPR values
Kind of reminds me of a joke I heard this past weekend that was accidentally butchered:
A physicist, engineer and a statistician are out hunting. Suddenly, a deer appears 50 yards away. The physicist does some basic ballistic calculations, assuming a vacuum, lifts his rifle to a specific angle, and shoots. The bullet lands 5 yards short. The engineer adds a fudge factor for air resistance, lifts his rifle slightly higher, and shoots. The bullet lands 5 yards long. The statistician yells "We got him!"

************************************************** ********

A really interesting read into "what is important" from stats in basketball: http://www.nytimes.com/2009/02/15/ma...ewanted=1&_r=0

The +/- system is probably the most similar "stat" to OPR utilized in basketball. It is figured a different way, but is a good way of estimating a player's impact vs. just using points/rebounds and.... The article does a really good job of comparing a metric like that, and more typical event-driven stats, to the actual impactful details of a particularly difficult-to-scout player. I really enjoy the line where it discusses trying to find undervalued mid-pack players. Often with scouting, this is exactly what you too are trying to do: rank the #16-#24 teams at an event as accurately as possible in order to help foster your alliance's best chance at advancing.

If you enjoy this topic, enjoy the article, and have not read Moneyball, it is well worth the read. I enjoyed the movie, but the book is so much better about the details. |
Re: "standard error" of OPR values
Quote:
There's an equivalent economists' joke in which trying to feed a group on a desert island ends with "assume a can opener!":D ************************************************** ******** Quote:
In baseball, this use of statistics is called "sabermetrics." Bill James is the originator of this method. |
Re: "standard error" of OPR values
1 Attachment(s)
Getting back to the original question: Quote:
So for those of you who answered "yes": Pick an authoritative (within the field of statistics) definition for standard error, and compute that "standard error" for each Team's OPR for the attached example. |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Here's a poor-man's approach to approximating the error of the OPR value calculation (as opposed to the prediction error aka regression error):
1. Collect all of a team's match results.
2. Compute the normal OPR.
3. Then, re-compute the OPR but excluding the result from the first match.
4. Repeat this process by removing the results from only the 2nd match, then only the 3rd, etc. This will give you a set of OPR values computed by excluding a single match. So for example, if a team played 6 matches, there would be the original OPR plus 6 additional "OPR-" values.
5. Compute the standard deviation of the set of OPR- values.

This should give you some idea of how much variability a particular match contributes to the team's OPR. Note that this will even vary team-by-team. Thoughts? |
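A rough sketch of this leave-one-out procedure in Python/NumPy, under my own assumptions about the data layout: D is the m x n 0/1 alliance-score design matrix and scores is the m-vector of alliance scores, and "removing a match" here means dropping the alliance-score row the team appeared in (the post could also be read as dropping both alliance scores from that match).

Code:

import numpy as np

def leave_one_out_spread(D, scores):
    """For each team, recompute OPR with each of that team's alliance-score rows removed,
       then report the standard deviation of the resulting 'OPR-' values."""
    m, n = D.shape
    base_opr, *_ = np.linalg.lstsq(D, scores, rcond=None)   # step 2: normal OPR
    spread = np.full(n, np.nan)
    for team in range(n):
        rows = np.flatnonzero(D[:, team])                   # step 1: this team's matches
        opr_minus = []
        for r in rows:                                       # steps 3-4: drop one match at a time
            keep = np.ones(m, dtype=bool)
            keep[r] = False
            sol, *_ = np.linalg.lstsq(D[keep], scores[keep], rcond=None)
            opr_minus.append(sol[team])
        if len(opr_minus) > 1:
            spread[team] = np.std(opr_minus, ddof=1)         # step 5
    return base_opr, spread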
Re: "standard error" of OPR values
Quote:
The question in this thread is how (or if) a standard, textbook, widely-used, statistically valid "standard error" (as mentioned by Citrus Dad and quoted in the original post in this thread) can be computed for OPR from official FRC qual match results data, unsupplemented by manual scouting data or any other data. |
Re: "standard error" of OPR values
Quote:
Code:
Team   Original OPR   Mean OPR   Standard Deviation   StdDev / Mean

In terms of whether this is a valid way of looking at it, I'm not sure--the results seem to have some meaning, but I'm not sure how much of it is just that only looking at 200 scores is even worse than just 254, or if there is something more meaningful going on.

*Using python's random.sample() function. This means that I did nothing to prevent duplicate runs (which are extremely unlikely; 254 choose 200 is ~7.2 * 10^55) and nothing to ensure that a team didn't "play" <3 times in the selection of 200 scores. |
Re: "standard error" of OPR values
Quote:
The method I propose above gives a standard deviation measure of how much a single match changes a team's OPR. I would think this is something like what you want. If not, can you define what you're looking for more precisely?

Also, rather than taking 200 of 254 matches and looking at the standard deviation of all OPRs, I suggest just removing a single match (e.g., compute OPR based on 253 of the 254 matches) and looking at how that removal affects only the OPRs of the teams involved in the removed match. So if you had 254 matches in a tournament, you'd compute 254 different sets of OPRs (1 for each possible match removal) and then look at the variability of the OPRs only for the teams involved in each specific removed match.

This only uses the actual qualification match results, no scouting or other data, as you want. |
Re: "standard error" of OPR values
And just to make sure I'm being clear (because I fear that I may not be):
Let's say that team 1234 played in a tournament and was involved in matches 5, 16, 28, 39, 51, and 70. You compute team 1234's OPR using all matches except match 5. Say it's 55. Then you compute team 1234's OPR using all matches except match 16. Say it's 60. Keep repeating this, removing each of that team's matches, which will give you 6 different OPR numbers. Let's say that they're 55, 60, 50, 44, 61, and 53. Then you can compute the standard deviation of those 6 numbers to give you a confidence on what team 1234's OPR is.

Of course, you can do this for every team in the tournament and get team-specific OPR standard deviations and an overall tournament OPR standard deviation. Team 1234 may have a large standard deviation (because maybe 1/3 of the time they always knock over a stack in the last second) while team 5678 may have a small standard deviation (because they always contribute exactly the same point value to their alliance's final score). And hopefully the standard deviations will be lower in tournaments with more matches per team because you have more data points to average. |
Re: "standard error" of OPR values
Quote:
I am asking you (or anyone who cares to weigh in) to pick a definition from an authoritative source and use that definition to compute said standard errors of the OPRs (or state why not): Quote:
Quote:
|
Re: "standard error" of OPR values
Quote:
Citrus Dad asked why no-one ever reports "the" standard error for the OPRs. "Standard Error" is a concept within the field of statistics. There are several well-defined meanings depending on the context. So what I am trying to do is this: have a discussion about what "the" standard error might mean in the context of OPR. |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
@ Citrus Dad: If you are reading this thread, would you please weigh in here and reveal what you mean by "the standard errors" of the OPRs, and how you would compute them, using only the data in the example I posted? Also, what assumptions do you have to make about the data and the model in order for the computed standard errors to be statistically valid/relevant/meaningful, and what is the statistical meaning of those computed errors? |
Re: "standard error" of OPR values
Quote:
OPR should then yield an estimate of the mean of this distribution. An estimate of the standard deviation can be obtained, as mentioned, by taking the RMS of the residuals. To approximate the standard deviation of the mean (which is what is usually meant by "standard error" of these sorts of measurements), one would then divide this by sqrt(n) (for those interested in a proof of this, simply consider the fact that when summing random variables, variances add), where n is the number of matches used in the team's OPR calculation. This, of course, fails if the assumptions we made at the outset aren't good (e.g. OPR is not a good model of team performance). Moreover, even if the assumptions hold, if the distribution of the random variable describing a team's performance in a given match is sufficiently wonky that the distribution of the mean is not particularly Gaussian then one is fairly limited in the conclusions they can draw from the standard deviation, anyway. |
Re: "standard error" of OPR values
Quote:
What you say holds if one is taking a number of independent, noisy measurements of a value and computing the mean of the measurements as the estimate of the underlying value. So that would work if OPR were computed by simply averaging the match scores for a team (and dividing by 3 to account for 1/3 of the match score being due to each team's contribution). But that's not the way OPR is computed at all. It's computed using linear regression, and all of the OPRs for all of the teams are computed simultaneously in one big matrix operation.

For example, it isn't clear to me what n should be. You say "n is the number of matches used in the team's OPR calculation." But all OPRs are computed at the same time using all of the available match data. Does n count matches that a team didn't play in, but that are still used in the computation? Is n the number of matches a team has played? Or the total matches?

OPR can be computed based on whatever matches have already occurred at any time. So if some teams have played 4 matches and some have played 5, it would seem like the OPRs for the teams that have played fewer matches should have more uncertainty than the OPRs for the teams that have played more. And the fact that the computation is all intertwined and that the OPRs for different teams are not independent (e.g., if one alliance has a huge score in one match, that affects 3 OPRs directly and the rest of them indirectly through the computation) seems to make the standard assumptions and arguments suspect.

Thoughts? |
Re: "standard error" of OPR values
Quote:
Wikipedia gives the following definition: "The standard error (SE) is the standard deviation of the sampling distribution of a statistic, most commonly of the mean. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate."

So my real question is perhaps what is the statistic, or what are we trying to estimate, or what are we "computing the mean" of? At each tournament, we have many matches and can get a standard error for the predicted match results because we have many predicted match results and can compute the standard deviation of the distribution of the errors in the predictions. But each tournament only provides one single OPR estimate for each team. It's tough to compute a standard error on these OPR estimates based on this 1 sample because you only have the 1 data point.

If OPRs are a value that we estimate for each team at each tournament and we expect them to stay the same from tournament to tournament (stop laughing now), we can compute the standard deviation in each team's independent OPR values across all of the tournaments in a season to get a standard error for those values. Then you could use the standard error to estimate the distribution of a team's OPR in future tournaments based on their previous tournament results. And I suppose you could also view this same standard error to estimate how much a team's OPR might vary if the same tournament was run again and we had a different set of random match outcomes.

But I'm guessing that what you're really interested in is: if the same tournament were run multiple times and if the match results varied randomly as we modeled (yeah, yeah, and if everybody had a can opener), what would be the standard error of the OPR estimates? Or in other words, if the same teams with the same robots and the same drivers played in 100 tournaments back-to-back and we computed the OPR for each team for all 100 tournaments, what would be the standard error for these 100 different OPR estimates? If this is the question you're interested in, then now we have a statistic that we can compute the standard error for. Let's look into this.

Let's let the OPR vector for all of the teams be called O (t x 1 vector, where t is the # of teams). Let's let the match scores be called M (m x 1, where m is the number of scores, or 2x the number of actual matches). So we're modeling the matches as:

M = A O + N

where A is an m x t matrix with the i,jth element equal to 1 if team j was a member of the alliance leading to the ith match score and 0 otherwise, and where N is an m x 1 noise vector with variance equal to the variance of the prediction residual for each match score. Let's call this variance sig^2.

Given M, the least squares estimate for OPR is calculated as

Oest = Inv(A' A) A' M = Inv(A' A) A' (A O + N) = O + Inv(A' A) A' N

As N is zero-mean, Oest has mean O (which we want) and variance equal to the variance of the second term, Inv(A' A) A' N. Note that Inv(A' A) A' is a t x m matrix that is solely a function of the match schedule. The variance of the estimated OPR for the ith team is the variance of the ith element of Oest, which is sig^2 * (sum of the squared values of the elements in the ith row of Inv(A' A) A'). This can be different for each team if the match schedule represented in A is unbalanced (e.g., if when a live OPR is being computed during a tournament, some teams have played more matches than others).

I would hope for a complete tournament with a balanced match schedule that these variances would be equal or very nearly so. But it would be interesting to compute Inv(A' A) A' for a tournament and see if the sums of the squared values of each row are truly the same. Then finally the standard error for each estimate is just the standard deviation, or the square root of the variance we just computed.

To summarize the whole thing: If a tournament has random match scores created by M = A O + N where N is zero mean with variance = sig^2, and if you estimate the underlying O values by computing Oest = Inv(A' A) A' M, then the ith team's OPR estimate, which is the ith value of the Oest vector, will have mean equal to the ith value of the O vector, will have variance = sig^2 * (sum of the squared values of the elements in the ith row of the matrix Inv(A' A) A'), and thus will have a "standard error" equal to the square root of this variance.

To estimate this for a particular tournament, you first compute the OPR estimate O and compute sig^2 as the variance of the regression error in the predicted match results. Then you compute the matrix Inv(A' A) A' from the match schedule and then finally compute the standard errors as described.

Too much for a Sunday morning. Thoughts? |
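A minimal Python/NumPy sketch of the standard-error calculation derived above, assuming the same setup (A is the m x t 0/1 alliance matrix, M the vector of alliance scores). The divisor (m - t) for estimating sig^2 and the variable names are my choices, not from the post.

Code:

import numpy as np

def opr_standard_error(A, M):
    """Standard error of each team's OPR estimate under M = A O + N with
       zero-mean, equal-variance match noise, per the derivation above."""
    m, t = A.shape
    H = np.linalg.solve(A.T @ A, A.T)      # H = Inv(A'A) A', a t x m matrix
    Oest = H @ M                            # OPR estimates
    resid = M - A @ Oest                    # prediction residuals
    sig2 = resid @ resid / (m - t)          # estimate of the per-score noise variance
    var_O = sig2 * np.sum(H * H, axis=1)    # sig^2 * sum of squared elements in each row of H
    return Oest, np.sqrt(var_O)             # (OPR, standard error of each OPR)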
Re: "standard error" of OPR values
And a follow up:
Take the above derivation, but let's pretend that each match score is only the result of 1 team's efforts, not 3. So in this case, each row of A would only have a single 1 in it, not 3. In this pretend case, the OPR IS exactly just computing the average of that team's match scores(!). A' A is diagonal and the diagonal elements are the number of matches that a team has played, so Inv(A' A) is diagonal with diagonal elements that are 1 / the number of matches that a team has played. Then the i,jth elements of Inv(A' A) A' are just 1 / the number of matches a team has played if team i played in match j, or 0 otherwise.

The variance of the Oest values in this pretend case is the variance of the prediction residual / number of matches that a team has played, and thus the standard error of the Oest value is the standard error of the match predictions divided by the square root of the number of matches that a team has played.

So this connects Oblarg's statements to the derivation. If match results were solely the result of one team's efforts, then the standard error of the OPR would just be the standard error of the match prediction / sqrt(n), where n is the number of matches that a team has played. But match results aren't solely the result of one team's efforts, so the previous derivation holds in the more complicated, real case. |
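A tiny numerical check of this "one team per score" special case (my own toy numbers, not from the thread): when each row of A credits exactly one team, the least-squares OPR reduces to each team's average score.

Code:

import numpy as np

rng = np.random.default_rng(0)
teams, matches_each = 4, 10
A = np.kron(np.eye(teams), np.ones((matches_each, 1)))   # exactly one '1' per row
true_opr = np.array([20.0, 40.0, 60.0, 80.0])
M = A @ true_opr + rng.normal(0, 10, teams * matches_each)

Oest, *_ = np.linalg.lstsq(A, M, rcond=None)
means = np.array([M[A[:, i] == 1].mean() for i in range(teams)])
print(np.allclose(Oest, means))   # True: OPR equals each team's mean score in this case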
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
What I'm saying is that our model is that M = AO, where M and O are both vectors whose elements are random variables. Writing O as a vector of flat means and adding a noise vector N doesn't really gain you anything - in our underlying model, the *teams* have fundamental variances, not the matches. The match variances can be computed from the variances of each team's O variable. Now, we have the problem that we cannot directly measure the variance of each element of O, because the only residuals we can measure are total for each match (the elements of the "noise vector" N). However, we can do another linear least-squares fit to assign estimated variance values for each team, which I believe is precisely what your solution ends up doing. |
Re: "standard error" of OPR values
Quote:
Perhaps the truth of the matter is that there's variability in both: for example if a driver screws up or an autonomous run doesn't work quite perfectly, then I guess that's team-specific OPR variability, but if the litter gets thrown a particular random way that hinders performance or score, then that's match variability. But I guess the bottom line is that if we're in agreement on the algorithm and the results of the equations, then it probably doesn't matter much if we think about the underlying process differently. :) |
Re: "standard error" of OPR values
One more thought on this:
I guess I wrote the equations the way I did because that's always the way that I've seen the linear regression derived in the first place. Namely, the way you compute the OPR values is to view the problem as

M = A O + N

then form the squared error as

N' N = (M - A O)' (M - A O)

then compute the derivative of N' N with respect to O and solve for O, which gives you Oest = Inv(A' A) A' M.

Is there a different way of expressing this derivation without resorting to a vector N of the errors that are being minimized? |
Re: "standard error" of OPR values
Quote:
The original system A O = M has 254 equations in 76 unknowns (for the example I posted); the system is overdetermined, so there is no exact solution for O. Form the normal equations A'A O = A'M: 76 equations in 76 unknowns. The exact solution O for this system will be the least squares solution for the original 254x76 system. |
Re: "standard error" of OPR values
Guys, can we all agree on the following?

Computing OPR, as done here on CD, is a problem in multiple linear regression (one dependent variable and 2 or more independent variables). The dependent variable for each measurement is alliance final score in a qual match. Each qual match consists of 2 measurements (red alliance final score and blue alliance final score). If the game has any defense or coopertition, those two measurements are not independent of each other.

For Archimedes, there were 127 qual matches, producing 254 measurements (alliance final scores). Let [b] be the column vector of those 254 measurements.

For Archimedes, there were 76 teams, so there are 76 independent dichotomous variables (each having value 0 or 1). For each measurement, all the independent variables are 0 except for 3 of them which are 1. Let [A] be the 254 by 76 matrix whose ith row is a vector of the values of the independent variables for measurement i.

Let [x] be the 76x1 column vector of model parameters. [x] is what we are trying to find.

[A][x]=[b] is a set of 254 simultaneous equations in 76 variables. The variables in those 254 equations are the 76 (unknown) model parameters in [x]. We want to solve that system for [x]. Since there are more equations (254) than unknowns (76), the system is overdetermined, and there is no exact solution for [x]. Since there's no exact solution for [x], we use least squares to find the "best" solution1. The solution will be a 76x1 column vector of Team OPRs. Let that solution be known as [OPR].

Citrus Dad wants to know "the standard error" of each element in [OPR].

Are we in agreement so far? If so, I will continue.

1 Yes, I know there are other ways to define "best", but every OPR computation I've ever seen on CD uses least squares, so I infer that's what Citrus Dad had in mind. |
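For concreteness, here is a sketch (my own code, not from the thread) of building [A] and [b] from a list of alliance scores and solving the least-squares system. I assume `alliances` is a list of 3-team tuples, one per alliance score, in the same order as `scores`.

Code:

import numpy as np

def compute_opr(alliances, scores, team_list):
    """alliances: list of (team1, team2, team3) per alliance score.
       scores: matching list of alliance final scores.
       team_list: all team numbers, defining the column order of [A]."""
    col = {team: j for j, team in enumerate(team_list)}
    A = np.zeros((len(alliances), len(team_list)))
    for i, alliance in enumerate(alliances):
        for team in alliance:
            A[i, col[team]] = 1.0            # dichotomous variables: 1 if team played in measurement i
    b = np.asarray(scores, dtype=float)
    # Overdetermined system [A][x]=[b]: solve the normal equations A'A x = A'b (least squares)
    opr = np.linalg.solve(A.T @ A, A.T @ b)
    return dict(zip(team_list, opr))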
Re: "standard error" of OPR values
Quote:
For 1 tournament, you have a single estimate of each element of OPR. There is no standard error. If you have multiple tournaments, then you will have multiple estimates of the underlying OPR and can compute the standard error of these estimates. If you use the baseline model to create a hypothetical set of random tournaments as I described, then you can compute the standard error of these estimates from the hypothetical set of random tournaments. |
Re: "standard error" of OPR values
Quote:
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
1 Attachment(s)
Quote:
Do you agree with this post? |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
We use OPR to estimate the score of an alliance in a match. Or to be even more precise, we compute the OPR values as the ones that result in the best linear prediction of the match results. If we have an alliance run the same match over and over, we will see a variability in the match results and a variability in the prediction error we get when we subtract the actual match results from the OPR-based prediction. We can compute the standard error of this prediction error. This SE tells us the probability range that we would expect the match result to fall in, but doesn't tell us anything about the range that we would expect OPR estimates to fall in over a full tournament. I'm confused by this sentence: "So if we run the same match over and over, we would expect to see a similar OPR." ??? |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
Can we agree that, if we have multiple measurements of a value and make fundamental assumptions that these multiple measurements are representative of the underlying distribution, then we can model the underlying distribution and look at the standard error of the estimates assuming they are computed from the underlying distribution? If you're willing to accept this, then I humbly suggest that my long derivation from this morning is what you're looking for.

One topic of confusion is this statement: "The OPR is essentially an estimate of the average point contribution across all of the matches in the tournament." I will agree that if you truly computed the average contribution across all matches by just computing the average of the match results, then you could simply compute the standard error, because you have multiple estimates of the average: the individual match results.

But in fact OPR is not computed by averaging the match results. It's a single, simultaneous joint optimization of ALL OPRs for ALL teams at the same time. That's why, for example, if a new match is played and new match results are provided, ALL of the OPRs change, not just the OPRs for the teams in that match. We don't actually have a bunch of OPR estimates that we just average together to compute our final estimate. That's the rub. If we did, we could compute the standard error of these separate estimates. But in fact, we don't have them: only the single estimate computed from the whole tournament. |
Re: "standard error" of OPR values
I know we're beating a dead horse here. Here's my attempt at trying to create a simpler example of my confusion.
Let's say we have a single bag of 10 apples. We compute the average weight of the 10 apples. Let's say we want to know the confidence we have in the estimate of the average weight of the apples. I claim there are a few ways to do this.

Ideally we'd get some more bags of apples, compute the average weights of the apples in each bag, and compute the standard error of these different measurements.

Or, we could compute the average weight of the 10 apples we know, compute the standard deviation of the weights of the 10 apples, then assume that the true distribution of the weights of all apples has this average weight and standard deviation. If we buy into this assumed distribution, then we can look at the standard error of all estimates of the average weight of 10 apples as if each set of 10 apples was actually taken from this modeled distribution.

Does this make sense? Are there other ways y'all are thinking about this?

To relate this to OPRs, I'd claim that the "get lots of bags of apples" approach is like the "get lots of results from different tournaments and see what the standard error in those OPR estimates is" approach. I'd claim that the "model the underlying distribution and then look at how the estimates will vary if the data is truly taken from this distribution" approach is like what I derived this morning. |
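A toy sketch of the two approaches described above (the apple weights are invented numbers, purely for illustration): the model-based standard error of the mean, stdev/sqrt(n), roughly matches the spread you see if you simulate many 10-apple bags drawn from the assumed distribution.

Code:

import numpy as np

rng = np.random.default_rng(1)
bag = rng.normal(150, 20, 10)                     # one bag of 10 apple weights (grams), made up

# "Model the underlying distribution" approach:
se_model = bag.std(ddof=1) / np.sqrt(len(bag))

# "Get lots of bags" approach, simulated from the assumed distribution:
bag_means = [rng.normal(bag.mean(), bag.std(ddof=1), 10).mean() for _ in range(5000)]
se_bags = np.std(bag_means, ddof=1)

print(round(se_model, 2), round(se_bags, 2))      # the two estimates should roughly agree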
Re: "standard error" of OPR values
Quote:
"Citrus Dad wants to know "the standard error" of each element in [OPR]" ? |
Re: "standard error" of OPR values
Quote:
I think that would help greatly to clear the confusion about what you mean. |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
I do compute OPRs and match predictions in Java in my android apps available on Google play (Watch FTC Tournament and FTC Online) but it may take me a few days to find the time to either translate the new equations from this morning's derivation into the Java or to bring the code up in Octave, Scilab, or something similar as I haven't had to do that for a while. |
Re: "standard error" of OPR values
Quote:
Quote:
I am hoping he will do the specific computation for the example I posted; I think that will make things clear. |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
Your proof starts with N, squares it, finds the least-squares minimum, and shows that it's the solution to the normal equations. The proof I posted starts with the solution to the normal equations, and shows that it minimizes (in the least-squares sense) the residuals of the associated overdetermined system. |
Re: "standard error" of OPR values
Quote:
Quote:
Quote:
|
Re: "standard error" of OPR values
Quote:
Quote:
|
Re: "standard error" of OPR values
Quote:
Quote:
|
Re: "standard error" of OPR values
1 Attachment(s)
Scilab code is in the attachment.
Note that there is a very real chance that there's a bug in the code, so please check it over before you trust anything I say below. :)

----------------------

Findings:

stdev(M) = 47 (var = 2209). Match scores have a standard deviation of 47 points.

stdev(M - A O) = 32.5 (var = 1060). OPR prediction residuals have a standard deviation of about 32 points. So OPR linear prediction can account for about 1/2 the variance in match outcomes (1 - 1060/2209).

What is the standard error for the OPR estimates (assuming the modeled distribution is valid) after the full tournament? About 11.4 per team. Some teams have a bit more or a bit less, but the standard deviation of this was only 0.1, so all teams were pretty close to 11.4.

To be as clear as I can about this: this says that if we compute the OPRs based on the full data set, compute the match prediction residuals based on the full data set, then run lots of different tournaments with match results generated by adding the OPRs for the teams in the match and random match noise with the same match noise variance, and then compute the OPR estimates for all of these different randomly generated tournaments, we would expect to see the OPR estimates themselves have a standard deviation around 11.4. If you choose to accept these assumptions, you might be willing to say that the OPR estimates have a 1-standard-deviation confidence of +/- 11.4 points.

How does the standard error of the OPR (assuming the modeled distribution is valid) decrease as the number of matches increases? I ran simulations through only the first 3 full matches per team up to 10 full matches per team, or with match totals of: 76, 102, 128, 152, 178, 204, 228, 254.

sig (the standard deviation of the per-match residual prediction error) from 3 matches per team to 10 matches per team was 0.0, 19.7, 26.1, 29.3, 30.2, 30.8, 32.5, 32.5. (With only 3 matches played per team, the "least squares" solution can perfectly fit the data, as we only have 76 data points and 76 parameters. With 4 or 5 matches per team, the model is still a bit "overfit", as we have 102 or 128 data points being predicted by 76 parameters.)

mean(StdErr of OPR) from 3 matches per team to 10 is 0.0, 16.5, 16.3, 15.2, 13.6, 12.8, 12.2, 11.4 (so the uncertainty in the OPR estimates decreases as the number of matches increases, as expected).

stdev(StdErr of OPR) from 3 matches per team to 10 is 0.0, 1.3, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1 (so there isn't much variability in team-to-team uncertainty in the OPR measurements, though the uncertainty does drop as the number of matches increases. There could be more variability if we only ran a number of matches where, say, 1/2 the teams played 5 matches and 1/2 played 4?)

And for the record, sqrt(sig^2/matchesPerTeam) was 0.0, 9.9, 12.0, 12.0, 11.4, 11.1, 10.8, 10.3. (Compare this with "mean(StdErr of OPR)" above. As the number of matches per team grows, the OPR will eventually approach the simple average match score per team / 3, and then these two values should approach each other. They're in the same general range but still apart by 1.1 (or 11.4 - 10.3) with 10 matches played per team.) |
Re: "standard error" of OPR values
Quote:
I could be wrong, but I doubt this is what Citrus Dad had in mind. Can we all agree that 0.1 is real-world meaningless? There is without a doubt far more variation in consistency of performance from team to team. Manual scouting data would surely confirm this. @ Citrus Dad: you wrote: Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
For example, this means that if a team had, say, an OPR of 50, and they were in another identical tournament with the same matches and randomness in the match results, the OPR computed from that tournament would probably be between 39 and 61 (if you're being picky, 68% of the time the score would lie in this range if the data is sufficiently normal or Gaussian). So picking a team for your alliance that has an OPR of 55 over a different team that has an OPR of 52 is silly. But picking a team that has an OPR of 80 over a team that has an OPR of 52 is probably a safe bet. :)

In response to the latest post, this could be run on any other tournament for which the data is present. Ether made this particularly easy to do by providing the A match matrix and the vector of match results in nice csv files.

BTW, the code is attached and Scilab is free, so anybody can do this for whatever data they happen to have on hand. |
Re: "standard error" of OPR values
Quote:
http://www.chiefdelphi.com/media/papers/3132 |
Re: "standard error" of OPR values
Quote:
Here are the results for the Waterloo tournament:

mpt = matches per team (so the last row is for the whole tournament and earlier rows are for the tournament through 4 matches per team, through 5, etc.)
varM = variance of the match scores
stdevM = standard deviation of the match scores
varR and stdevR are the same for the match prediction residual, so varR/varM is the fraction of the match variance that can't be predicted by the OPR linear prediction model
/sqrt(mpt) = the standard deviation of the OPRs we would have if we were simply averaging a team's match scores to estimate their OPR, which is just stdevR/sqrt(mpt)
StdErrO = the standard error of the OPRs using my complicated model derivation
stdevO = the standard deviation of the StdErrO values taken across all teams, which is big if some teams have more standard error on their OPR values than other teams do

Code:

mpt   varM   stdevM   varR   stdevR   /sqrt(mpt)   StdErrO   stdevO

Code:

mpt   varM   stdevM   varR   stdevR   /sqrt(mpt)   StdErrO   stdevO

The OPR seems to do a much better job of predicting the match results in the Waterloo tournament (removing 80% of the match variance vs. 50% in Archimedes), and the standard deviation of the OPR estimates themselves is less (7.42 in Waterloo vs. 11.37 in Archimedes). |
Re: "standard error" of OPR values
Quote:
To be honest setting up a pooled time series with this data would take me more time than I have at the moment. I've thought about it and maybe it will be a summer project (maybe my son Jake (themccannman) can do it!) Note that the 1 SD SE of 11.5 is the 68% confidence interval. For 10 or so observations, the 95% confidence interval is about 2 SD or about 23.0. The t-statistic is the relevant tool for finding the confidence interval metric. |
Re: "standard error" of OPR values
Quote:
From this and other prior statements, I had the very strong impression you were seeking a separate error estimate for each team's OPR. Such estimates would certainly not be virtually identical for every team! It would be very helpful if you would please provide more information about statistical software packages you know that provide "parameter standard errors". I couldn't find any that could provide such estimates for the multiple-regression model we are talking about for OPR computation using FRC-provided match score data. I suspect that's because it's simply not possible to get such estimates for that model and data. |
Re: "standard error" of OPR values
Quote:
Note that this is computing the confidence of each OPR estimate for each team. This is different from trying to compute the variance of score contribution from match to match for each team, which is a very different (and also very interesting) question. I think it would be reasonable to hypothesize that the variance of score contribution for each team might vary from team to team, possibly substantially. For example, it might be interesting to know that team A scores 50 points +/- 10 points with 68% confidence but team B scores 50 points +/- 40 points with 68% confidence. At the very least, if you saw that one team had a particularly large score variance, it might make you investigate this robot and see what the underlying root cause was (maybe 50% of the time they have an awesome autonomous but 50% of the time it completely messes up, for example). Hmmm.... |
Re: "standard error" of OPR values
Quote:
Manual scouting data would surely confirm this. Consider the following thought experiment.

Team A gets actual scores of 40, 40, 40, 40, 40, 40, 40, 40, 40, 40 in each of its 10 qual matches.
Team B gets actual scores of 0, 76, 13, 69, 27, 23, 16, 88, 55, 33.

The simulation you described assigns virtually the same standard error to their OPR values. If what is being sought is a metric which is somehow correlated to the real-world trustworthiness of the OPR for each individual team (I thought that's what Citrus Dad was seeking), then the standard error coming out of the simulation is not that metric.

My guess is that the 0.1 number is just measuring how well your random number generator is conforming to the sample distribution you requested. |
Re: "standard error" of OPR values
Quote:
Your model certainly might be valid, and my derivation explicitly does not deal with this case. The derivation is for a model where OPRs are computed, then multiple tournaments are generated using those OPRs and adding the same amount of noise to each match, and then seeing what the standard error of the resulting OPR estimates is across these multiple tournaments.

If you know that the variances for each team's score contribution are different, then the model fails. For that matter, the least squares solution for computing the OPRs in the first place is also a failed model in this case. If you knew the variances of the teams' contributions, then you should use weighted least squares to get a better estimate of the OPRs.

I wonder if some iterative approach might work: First compute OPRs assuming all teams have equal variance of contribution, then estimate the actual variances of contributions for each team, then recompute the OPRs via weighted least squares taking this into account, then repeat the variance estimates, etc., etc., etc. Would it converge?

[Edit: 2nd part of post, added here a day later]

http://en.wikipedia.org/wiki/Generalized_least_squares

OPRs are computed with an ordinary-least-squares (OLS) analysis. If we knew ahead of time the variances we expected for each team's scoring contribution, we could use weighted-least-squares (WLS) to get a better estimate of the OPRs. The link also describes something like I was suggesting above, called "Feasible generalized least squares (FGLS)". In FGLS, you use OLS to get your initial OPRs, then estimate the variances, then compute WLS to improve the OPR estimate. It discusses iterating this approach also.

But the link also includes this comment: "For finite samples, FGLS may be even less efficient than OLS in some cases. Thus, while (FGLS) can be made feasible, it is not always wise to apply this method when the sample is small." If we have 254 match results and we're trying to estimate 76 OPRs and 76 OPR variances (152 parameters total), we have a pretty small sample size. So this approach would probably suffer from too small of a sample size. |
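A rough Python/NumPy sketch of the iterative FGLS-style idea floated above (my own construction, not code from the thread): fit OPRs by OLS, apportion the squared residuals among the alliance partners to estimate per-team variances, then refit by weighted least squares and repeat. Whether this converges or helps at FRC sample sizes is exactly the open question raised in the post.

Code:

import numpy as np

def fgls_opr(A, M, iterations=5):
    """A: (m x t) 0/1 alliance matrix, M: (m,) alliance scores."""
    m, t = A.shape
    opr, *_ = np.linalg.lstsq(A, M, rcond=None)                  # step 1: ordinary least squares
    team_var = np.full(t, np.var(M - A @ opr) / 3.0)              # initial equal per-team variances
    for _ in range(iterations):
        sq_resid = (M - A @ opr) ** 2
        team_var, *_ = np.linalg.lstsq(A, sq_resid, rcond=None)   # step 2: per-team variance estimates
        team_var = np.clip(team_var, 1e-6, None)                  # variances must stay positive
        w = 1.0 / (A @ team_var)                                  # weight each score by 1/its modeled variance
        Aw = A * w[:, None]
        opr = np.linalg.solve(Aw.T @ A, Aw.T @ M)                 # step 3: weighted least squares refit
    return opr, team_var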
Re: "standard error" of OPR values
See also this link:
http://en.wikipedia.org/wiki/Heteroscedasticity "In statistics, a collection of random variables is heteroscedastic if there are sub-populations that have different variabilities from others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion." And see particularly the "Consequences" section which says, "Heteroscedasticity does not cause ordinary least squares coefficient estimates to be biased, although it can cause ordinary least squares estimates of the variance (and, thus, standard errors) of the coefficients to be biased, possibly above or below the true or population variance. Thus, regression analysis using heteroscedastic data will still provide an unbiased estimate for the relationship between the predictor variable and the outcome, but standard errors and therefore inferences obtained from data analysis are suspect. Biased standard errors lead to biased inference..." |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
hi all,
As a student going into his first year of undergrad this fall, this kind of stuff interests me. What level (or what course equivalent or student experience) is this kind of stuff typically taught at? I have looked into interpolation, as I would like to spend some time independently developing spline path generation for auton modes, and that particular area requires a bit of knowledge of linear algebra, which I will begin the process of self-teaching soon enough. For this topic, what would be the equivalent prerequisite (as linear algebra is to interpolation)? I don't mean to hijack the thread, but it feels like the most appropriate place to ask... |
Re: "standard error" of OPR values
Quote:
If you are asking for an individual standard error associated with each OPR value, no one ever posts them because the official FRC match data doesn't contain enough information to make a meaningful computation of those individual values.

In a situation, unlike FRC OPR, where you know the variance of each observed value (either by repeated observations using the same values for the predictor variables, or if you are measuring something with an instrument of known accuracy), you can put those variances into the design matrix for each observation and compute a meaningful standard error for each of the model parameters.

Or if, unlike FRC OPR, you have good reason to believe the observations are homoscedastic, you can compute the variance of the residuals and use that to back-calculate standard errors for the model parameters. If you do this for FRC data, the result will be standard errors which are very nearly the same for each OPR value... which is clearly not the expected result. |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
I think you missed my point entirely. Quote:
Quote:
Quote:
Quote:
Quote:
|
Re: "standard error" of OPR values
So it's not possible to perform a statistically valid calculation for standard deviation? Are there no ways to solve for it with a system that is dependent on other robots' performances?
|
Re: "standard error" of OPR values
Quote:
Quote:
As it turns out, I was recently asked for the average time it takes members of my branch to produce environmental support products. Because we get requests that range from a 10 mile square box on one day to seasonal variability for a whole ocean basin, the (requested) mean production time means nothing. For one class of product, the standard deviation of production times was greater than the mean. Without the scatter info, the reader would have probably assumed that we were making essentially identical widgets and that the scatter was +/- 1 or 2 in the last reported digit. |
Re: "standard error" of OPR values
Quote:
Standard error of the model parameters is a very useful statistic in those cases where it applies. I mentioned one such situation in my previous post: Quote:
In such a case, computing the standard error of the model parameters is justified, and the results are meaningful. All modern land surveying measurement adjustment apps include it in their reports. Quote:
I briefly addressed this in my previous post: Quote:
In fact, when you use the above technique for OPR you are essentially assuming that all teams are identical in their consistency of scoring, so it's not surprising that when you put that assumption into the calculation you get it back out in the results. GIGO.

Posting invalid and misleading statistics is a bad idea, especially when there are better, more meaningful statistics to fill the role.

For Richard and Gus: If all you are looking for is one overall ballpark number ("how bad are the OPR calculations for this event?"), let's explore better ways to present that. |
Re: "standard error" of OPR values
Quote:
I just discussed this problem as a major failing of engineers in general--if they are not fully comfortable in reporting a parameter, e.g., a measure of uncertainty, they often will simply ignore the parameter entirely. (I was discussing how the value of solar PV is being estimated across a dozen studies. I've seen this tendency over and over in almost 30 years of professional work.)

Instead, the appropriate method ALWAYS, ALWAYS, ALWAYS is to report the uncertain or unknown parameter with some sort of estimate and all sorts of caveats. What happens instead is that decision-makers and stakeholders much too often accept the values given as having much greater precision than they actually have.

While calculating the OPR really is of no true consequence, because we are working with high school students who are very likely to be engineers, it is imperative that they understand and use the correct method of presenting their results. So the SEs should be reported as the best available approximation of the error term around the OPR estimates, and the caveats about the properties of the distribution can be reported with a discussion about the likely biases in the parameters due to the probability distributions. |
Re: "standard error" of OPR values
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Let's explore alternative ways to demonstrate the shortcomings of the OPR values. Quote:
As I've suggested in my previous two posts, how about let's explore alternative, valid ways to demonstrate the shortcomings of the OPR values. One place to start might be to ask whether or not the average value of the vector of standard errors of OPRs might be meaningful, and if so, what exactly it means. |
Re: "standard error" of OPR values
Ether
I wasn't quite sure why you dug up my original post to start this discussion. It seemed out of context with all of your other discussion about adding error estimates. That said, my request was more general, and it seems to be answered more generally by the other computational efforts that have been going on in the 2 related threads.

But one point: I will say that using a fixed effects model with a separate match progression parameter (to capture the most likely source of heteroskedasticity) should lead to parameter estimates that will provide valid error terms using FRC data. But computing fixed effects models is a much more complex process. It is something that can be done in R. Quote:
|
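For anyone who wants to experiment without R, here is a minimal sketch of the "separate match progression parameter" idea mentioned above, written in Python/NumPy. The function name, the m x n design matrix A (one row per alliance per match, 1s marking that alliance's teams), and the simple linear progression regressor are all my own illustrative choices, not anything specified in the post above.

Code:
import numpy as np

# Hypothetical inputs: A is the m x n 0/1 design matrix, scores is the
# m-vector of alliance scores, and match_index gives the qualification
# match number for each row.
def opr_with_match_progression(A, scores, match_index):
    m, n = A.shape
    # Normalize match number to [0, 1] so the progression slope reads as
    # points of score growth over the full event.
    progression = (match_index / match_index.max()).reshape(-1, 1)
    A_aug = np.hstack([A, progression])
    coefs, *_ = np.linalg.lstsq(A_aug, scores, rcond=None)
    return coefs[:n], coefs[n]   # per-team parameters, progression slope

This is only the crudest version of the idea (a single linear trend shared by all teams); a proper fixed effects treatment in R would also give robust standard errors, which this sketch does not attempt.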
Re: "standard error" of OPR values
Quote:
Quote:
Quote:
The usefulness of the fitted model can, however, be assessed without using said statistics. Quote:
Quote:
Quote:
Quote:
Given [A][x]=[b], the following computation produces the same values as those packages:

x = A\b;

The above code clearly shows that this computation is assuming that the standard deviation is constant for all measurements (alliance scores) and thus for all teams... which we know is clearly not the case. That's one reason it produces meaningless results in the case of FRC match results data. Quote:
|
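For concreteness, here is a short sketch (my own, in Python/NumPy rather than the A\b notation above, with made-up function and variable names) of the textbook standard error calculation being criticized. It makes the constant-variance assumption easy to see: a single pooled variance estimate is applied to every team.

Code:
import numpy as np

def opr_and_standard_errors(A, b):
    # A: m x n design matrix (rows = alliances, 1s for that alliance's teams)
    # b: m-vector of alliance scores
    m, n = A.shape
    x, *_ = np.linalg.lstsq(A, b, rcond=None)      # the usual OPR solution
    resid = b - A @ x
    # One pooled variance for *all* matches -- this is the "every alliance
    # score has the same standard deviation" assumption.
    sigma2 = resid @ resid / (m - n)
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return x, np.sqrt(np.diag(cov))                # OPRs, per-team standard errors

Because sigma2 is a single scalar, each team's standard error differs only through the geometry of the schedule (the diagonal of inv(A'A)), which is why the reported per-team values come out nearly identical.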
Re: "standard error" of OPR values
One definition of statistical validity:
https://explorable.com/statistical-validity

Statistical validity refers to whether a statistical study is able to draw conclusions that are in agreement with statistical and scientific laws. This means if a conclusion is drawn from a given data set after experimentation, it is said to be scientifically valid if the conclusion drawn from the experiment is scientific and relies on mathematical and statistical laws. Quote:
Here's a discussion for fixed effects from the SAS manual: http://www.sas.com/storefront/aux/en...48_excerpt.pdf |
Re: "standard error" of OPR values
Quote:
Quote:
Finally, if standard errors could be validly produced for each team as a measure of its consistency/reliability, that would be outstanding. Given that teams change strategy and modify robots between matches (and given this year's nonlinear scoring), it is not surprising that per-team standard error calculations are not valid. (And by the way, Ether's finding that the numbers could be calculated but did not communicate variability is at least qualitatively similar to Richard's argument concerning OPR.) This does not negate the need for a "standard error" or "probable error" of the whole data set. OPR is ultimately a measurement, and anyone using OPR to drive a decision needs to understand its accuracy. That is, does a difference of 5 points in OPR mean that one team is better than the other with 10% confidence, 50% confidence, or 90% confidence? |
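To make that last question concrete, here is a hedged back-of-the-envelope calculation (my own illustration, not an endorsed method): if two OPR estimates could be treated as independent and roughly normal with a common standard error, the confidence that a gap is real follows from a normal CDF. The 11.5-point standard error below is an assumed, illustrative value in the ballpark discussed elsewhere in this thread.

Code:
from math import erf, sqrt

# Illustrative only: assumes the two OPR estimates are independent and
# roughly normal with the same standard error -- a big "if", per this thread.
def confidence_a_ahead_of_b(opr_a, opr_b, se):
    z = (opr_a - opr_b) / (sqrt(2.0) * se)        # the difference has variance 2*se^2
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))       # standard normal CDF

print(confidence_a_ahead_of_b(45.0, 40.0, se=11.5))   # ~0.62 for a 5-point gap

In other words, with per-event standard errors of that size, a 5-point OPR difference is much closer to a coin flip than to 90% confidence.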
Re: "standard error" of OPR values
Quote:
Ether and I have been having some private discussions and running some simulations on this topic. I thought I'd report the general results here. I think Ether agrees with what I say below, but I'll leave that for him to confirm or deny. :)

Executive Summary:

1. The mean of the standard error vector for the OPR estimates is a decent approximation for the standard deviation of the team-specific OPR estimates themselves, and is a very good approximation for the mean of the standard deviations of the team-specific OPR estimates taken across all of the teams in the tournament.

2. Teams with more variability in their offensive contributions (e.g., teams that contribute a huge amount to their alliance's score by performing some high-scoring feats, but fail at doing so 1/2 the time) will have slightly more uncertainty in their OPR estimate than the mean of the standard error vector would indicate, but not by too much.

3. Teams with less variability in their offensive contributions (e.g., consistent teams that always contribute about the same amount to their alliance's score every match) will have slightly less uncertainty in their OPR estimate than the mean of the standard error vector would indicate, but not by too much.

Details: I simulated match scores in the following way.

1. I computed the actual OPRs from the actual match data (in this case, from the 2014 misjo tournament as suggested by Ether).

2. I computed the sum of the squared values of the prediction residual and divided this sum by (#matches - #teams) to get an estimate of the per-match randomness that exists after the OPR prediction is performed.

3. I divided the result from step #2 above by 3 to get a per-team estimate of the variance of each team's offensive contribution. I took the square root of this to get the per-team estimate of the standard deviation of each team's offensive contribution.

4. I then simulated 1000 tournaments using the same match schedule as the 2014 misjo tournament. The simulated match scores were the sum of the 3 OPRs for the teams in that match plus 3 zero-mean, variance-1 normally distributed random numbers scaled by the 3 per-team offensive standard deviations computed in step #3. Note that at this point, each team has the same value for the per-team offensive standard deviations.

5. I then computed the OPR estimates from the match scores for each simulated tournament and computed the actual standard deviation of the 1000 OPR estimates for each team. These standard deviations were all close to 11.5 (between 11 and 12), which was the average of the elements of the traditional standard error vector calculation performed on the original data. This makes sense, as the standard error is supposed to be the standard deviation of the estimates if the randomness of the match scores had equal variance for all matches, as was simulated. As a reminder, all of the individual elements of the standard error vector were extremely close to 11.5 in this case.

6. But then I tried something different. Instead of having the per-team standard deviation of the offensive contributions be constant, I added a random variable to these standard deviations and then renormalized all of them so that the average variance of the match scores would be unchanged. In other words, now some teams have a larger variance in their offensive contributions (e.g., team A might have an OPR of 30 but have its score contribution typically vary between 15 and 45) while other teams might have a smaller variance in their contributions (e.g., team B might also have an OPR of 30 but have its score contribution only typically vary between 25 and 35).

7. Now I resimulated another 1000 tournaments using this model. So now, some match scores might have greater variances and some match scores might have smaller variances. But the way OPR was calculated was not changed.

8. Then I calculated the OPRs for these new 1000 simulated tournaments and calculated the standard deviations of these 1000 new per-team OPR estimates. What I found was that the OPR estimates did vary more for teams that had a greater offensive variance and did vary less for teams that had a smaller offensive variance.

So, if you're convinced that different teams have substantially different variances in their offensive contributions, then just using the one average standard error computation to estimate how reliable all of the different OPR estimates are is not completely accurate. But the differences were not that large. For example, in one set of simulations, team A had an offensive contribution with a standard deviation of 8 while team B had an offensive contribution with a standard deviation of 29. So in this case, team B had a LOT more variability in their offensive contribution than team A did (almost 4x as much). But the standard deviation of the 1000 OPR estimates for team A was 10.8 while the standard deviation of the 1000 OPR estimates for team B was 12.9. So yes, team B had a much bigger offensive variability, and that made the confidence in their OPR estimates worse than the 11.5 that the standard error would suggest, but it only went up by 1.4; meanwhile team A had a much smaller offensive variability, and that only improved the confidence in their OPR estimates by 0.7. And also, the average of the standard deviations of the OPR estimates for the teams in the 1000 tournaments was still very close to the average of the standard error vector computed assuming that the match scores had identical variances.

So, repeating the Executive Summary:

1. The mean of the standard error vector for the OPR estimates is a decent approximation for the standard deviation of the team-specific OPR estimates themselves, and is a very good approximation for the mean of the standard deviations of the team-specific OPR estimates taken across all of the teams in the tournament.

2. Teams with more variability in their offensive contributions (e.g., teams that contribute a huge amount to their alliance's score by performing some high-scoring feats, but fail at doing so 1/2 the time) will have slightly more uncertainty in their OPR estimate than the mean of the standard error vector would indicate, but not by too much.

3. Teams with less variability in their offensive contributions (e.g., consistent teams that always contribute about the same amount to their alliance's score every match) will have slightly less uncertainty in their OPR estimate than the mean of the standard error vector would indicate, but not by too much. |
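For readers who want to reproduce something like this, below is my rough reconstruction of steps 1 through 5 in Python/NumPy; it is a sketch of the procedure described above, not the code actually used. A is the 0/1 match design matrix and scores the real alliance scores from the event (both placeholders here). To mimic steps 6 through 8, replace the constant team_sd vector with unequal values renormalized so the average match variance stays the same.

Code:
import numpy as np

def simulate_opr_spread(A, scores, n_sims=1000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape

    opr, *_ = np.linalg.lstsq(A, scores, rcond=None)     # step 1: OPRs from real data
    resid = scores - A @ opr
    match_var = resid @ resid / (m - n)                   # step 2: residual variance per match
    team_sd = np.full(n, np.sqrt(match_var / 3.0))        # step 3: equal per-team std dev

    opr_estimates = np.empty((n_sims, n))
    for k in range(n_sims):                               # step 4: simulate tournaments
        # independent noise for every (match, team) slot, scaled by that team's std dev
        slot_noise = rng.standard_normal((m, n)) * team_sd
        sim_scores = A @ opr + (A * slot_noise).sum(axis=1)
        opr_estimates[k], *_ = np.linalg.lstsq(A, sim_scores, rcond=None)

    # step 5: spread of the simulated OPR estimates for each team, to compare
    # against the standard error vector from the constant-variance formula
    return opr_estimates.std(axis=0)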
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
The elephant in the room here is the assumption that the alliance is equal to the sum of its members. For example, consider a 2015 (Recycle Rush) robot with a highly effective 2-can grab during autonomous, and the ability to build, score, cap, and noodle one stack of six from the HP station, or cap five stacks of up to six totes during a match, or cap four stacks with noodles loaded over the wall. For argument's sake, it is essentially 100% proficient at these tasks, selecting which to do based on its alliance partners. I will also admit up front that the alliance match-ups are somewhat contrived, but none of them are truly unrealistic. If I'd wanted to really stack the deck, I'd have assumed that the robot was the consummate RC specialist and had no tote manipulators at all.
The real point is that this variation is based on the alliance composition, not on "performance variation" of the robot in the same situation. I also left HP littering out, which would provide additional wrinkles.

My takeaway on this thread is that it would be good and useful information to know the rms (root-mean-square) of the residuals for an OPR/DPR data set (tournament or season). This would provide some understanding of how much difference really is a difference, and a clue as to when the statistics mean about as much as the scouting.

On another slightly related matter, I have wondered why CCWM (Combined Contribution to Winning Margin) is calculated by combining separate calculations of OPR and DPR, rather than by solving a single matrix equation for winning margin. I suspect that the single calculation would prove to be more consistent for games with robot-based defense (not Recycle Rush); if a robot plays offense in five matches and defense in five matches, then OPR and DPR would each have a lot of noise, whereas true CCWM should be a more consistent number. |
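Both suggestions above are easy to prototype. Here is a hedged sketch (the function name, variable names, and structure are mine): the first value is the rms of the OPR residuals for a data set, and the second regresses the winning margin directly against the same design matrix instead of differencing separate OPR and DPR fits.

Code:
import numpy as np

def rms_residual_and_margin_rating(A, scores, opp_scores):
    # A: m x n design matrix (rows = alliances); scores / opp_scores: that
    # alliance's score and its opponent's score in the same match.
    opr, *_ = np.linalg.lstsq(A, scores, rcond=None)
    rms = np.sqrt(np.mean((scores - A @ opr) ** 2))       # typical size of the misfit

    margin = scores - opp_scores                          # winning margin, one row per alliance
    margin_rating, *_ = np.linalg.lstsq(A, margin, rcond=None)
    return rms, margin_rating

Whether the single-solve margin rating really is more stable for defense-heavy games than OPR minus DPR would still need to be checked against real event data.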
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
The paper discusses MMSE-based estimation of the metrics (as opposed to the traditional least-squares method) which reduces the overfitting effects, does better at predicting previously unseen matches (as measured by the size of the squared prediction residual in "testing set" matches), and is better at predicting the actual underlying metric values in tournaments which are simulated using the actual metric models. |
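Without reproducing the paper's math, the flavor of an MMSE-style estimate can be illustrated by shrinking the least-squares solution toward a prior; with a Gaussian prior on the team parameters this reduces to a ridge-style calculation. The sketch below is my own simplification, not the paper's estimator, and both the crude prior and the regularization weight lam are made-up illustrative choices.

Code:
import numpy as np

# Simplified illustration only -- not the paper's estimator. Shrinks each
# team's parameter toward a common prior mean; lam controls the shrinkage.
def ridge_style_opr(A, b, lam=1.0):
    m, n = A.shape
    prior_mean = np.full(n, b.mean() / 3.0)    # crude prior: average alliance score / 3
    rhs = A.T @ (b - A @ prior_mean)
    delta = np.linalg.solve(A.T @ A + lam * np.eye(n), rhs)
    return prior_mean + delta

As lam goes to zero this collapses back to ordinary OPR, and as it grows the estimates are pulled toward the prior; trading a little bias for less variance on small samples is the basic mechanism by which such estimators do better on unseen matches.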
Re: "standard error" of OPR values
Quote:
Rather, it would be overfitting if the predictive power of the model (when tested against data not used to tune it) did not increase with the amount of data available to tune the parameters. I highly doubt that is the case here. |
Re: "standard error" of OPR values
Quote:
|
Re: "standard error" of OPR values
Quote:
On the first sentence of that quote: I previously found that if I replaced the data from the 2014 casa tournament (which had the greatest number of matches per team of the tournaments I worked with) with completely random noise, the OPR could "predict" 26% of the variance and WMPR could "predict" 47% of it. So the models are clearly describing the random noise in this case, whereas a "properly fit" model would come closer to finding no relationship between the model parameters and the data, as should be the case when the data is purely random.

On the second sentence: again for the 2014 casa tournament, the OPR calculation has only 4 data points per parameter and the WMPR has only 2, which again sounds like "having too many parameters relative to the number of observations" to me. BTW, I think the model is appropriate, so I view it more as a problem of having too few observations rather than too many parameters. And again, the casa tournament is one of the best cases; most other tournaments have even fewer observations per parameter.

So that's why I think it's overfitting. Your opinion may differ. No worries either way. :) This is also discussed a bit in the section on "Effects of Tournament Size" in my "Overview and Analysis of First Stats" paper. |
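The random-noise check described in the first paragraph is easy to replicate in spirit. The sketch below is my reconstruction, not the original code; the 26%/47% figures quoted above come from the actual 2014 casa schedule, not from anything this snippet computes by default.

Code:
import numpy as np

def noise_r_squared(A, n_trials=200, seed=0):
    # Fit the OPR model to pure noise and report how much "variance explained"
    # it appears to have anyway -- a quick gauge of overfitting for a schedule.
    rng = np.random.default_rng(seed)
    m, _ = A.shape
    r2 = []
    for _ in range(n_trials):
        b = rng.standard_normal(m)                       # fake scores: pure noise
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        resid = b - A @ x
        total = b - b.mean()
        r2.append(1.0 - (resid @ resid) / (total @ total))
    return float(np.mean(r2))

(With roughly 4 observations per parameter, an apparent R-squared near 25% on pure noise is about what you'd expect, and with roughly 2 per parameter, near 50% -- which lines up with the 26% and 47% figures.)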
Re: "standard error" of OPR values
Quote:
The problem here is that there are two separate things in that wikipedia article that are called "overfitting:" errors caused by fundamentally sound models with insufficient data, and errors caused by improperly revising the model specifically to fit the available training data (and thus causing a failure to generalize). If one is reasoning purely based on patterns seen in the data, then there is no difference between the two (since the only way to know that one's model fits the data would be through validation against those data). However, these aren't necessarily the same thing if one has an externally motivated model (and I believe OPR has reasonable, albeit clearly imperfect, motivation). We may be veering off-topic, though. |