#1
Re: "standard error" of OPR values
Quote:
There's an equivalent economists' joke in which trying to feed a group on a desert island ends with "assume a can opener!"
Quote:
In baseball, this use of statistics is called "sabermetrics." Bill James is the originator of this method.
Last edited by Citrus Dad : 13-05-2015 at 18:41. Reason: added about sabermetrics
#2
Re: "standard error" of OPR values
Hi all,
As a student going into my first year of undergrad this fall, this kind of stuff interests me. At what level (or in what course, or with what experience) is this kind of material typically taught? I have looked into interpolation, since I would like to spend some time independently developing spline path generation for auton modes, and that particular area requires some knowledge of linear algebra, which I will begin teaching myself soon enough. As for this topic, what would be the equivalent of interpolation : linear algebra? I don't mean to hijack the thread, but it feels like the most appropriate place to ask...
#3
Re: "standard error" of OPR values
Quote:
It's the second error term that I haven't seen reported. And in my experience working with econometric models, having only 10 observations likely leads to a very large standard error around this parameter estimate. I don't think that calculating this will change the OPR per se, but it will provide a useful measure of the (im)precision of the estimates that I don't think most students and mentors are aware of.

Also, as you imply, a linear model may not be the most appropriate structure even though it is by far the easiest to compute with Excel. For example, the cap on resource availability probably creates a log-linear relationship.
#4
Re: "standard error" of OPR values
Getting back to the original question:
Quote:
So for those of you who answered "yes": Pick an authoritative (within the field of statistics) definition for standard error, and compute that "standard error" for each Team's OPR for the attached example.
Last edited by Ether : 15-05-2015 at 19:16.
#5
Re: "standard error" of OPR values
... and for those of you who think the answer is "no", explain why none of the well-defined "standard errors" (within the field of statistics) can be meaningfully applied to the example data (provided in the linked post) in a statistically valid way.
Last edited by Ether : 16-05-2015 at 13:39.
#6
Re: "standard error" of OPR values
Here's a poor-man's approach to approximating the error of the OPR value calculation (as opposed to the prediction error, aka regression error):
1. Collect all of a team's match results.
2. Compute the normal OPR.
3. Re-compute the OPR, but excluding the result from the first match.
4. Repeat this process by removing the results from only the 2nd match, then only the 3rd, etc. This will give you a set of OPR values computed by excluding a single match. So for example, if a team played 6 matches, there would be the original OPR plus 6 additional "OPR-" values.
5. Compute the standard deviation of the set of OPR- values.
This should give you some idea of how much variability a particular match contributes to the team's OPR. Note that this will even vary team-by-team. Thoughts?
Last edited by wgardner : 16-05-2015 at 14:14.
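A minimal numpy sketch of steps 1-5 above. All match data here is invented, and alliances have 2 teams (instead of FRC's 3) purely for brevity; the names `compute_opr` and `opr_minus_spread` are my own, not from any FRC library.

```python
import numpy as np

def compute_opr(alliances, scores, teams):
    """Least-squares OPR: one row per alliance score, 1s in member-team columns."""
    col = {t: j for j, t in enumerate(teams)}
    A = np.zeros((len(alliances), len(teams)))
    for i, alliance in enumerate(alliances):
        for t in alliance:
            A[i, col[t]] = 1.0
    x, *_ = np.linalg.lstsq(A, np.array(scores, dtype=float), rcond=None)
    return dict(zip(teams, x))

def opr_minus_spread(alliances, scores, teams, team):
    """Steps 3-5: drop each of `team`'s matches in turn, re-fit, and take the
    sample standard deviation of the resulting "OPR-" values."""
    opr_minus = []
    for k, alliance in enumerate(alliances):
        if team in alliance:
            rest_a = alliances[:k] + alliances[k + 1:]
            rest_s = scores[:k] + scores[k + 1:]
            opr_minus.append(compute_opr(rest_a, rest_s, teams)[team])
    return float(np.std(opr_minus, ddof=1))
```

With real, noisy match scores the OPR- values differ from one another, and their standard deviation is the quantity of interest.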
#7
Re: "standard error" of OPR values
Quote:
The question in this thread is how (or if) a standard, textbook, widely used, statistically valid "standard error" (as mentioned by Citrus Dad and quoted in the original post in this thread) can be computed for OPR from official FRC qual match results data, unsupplemented by manual scouting data or any other data.
#8
Re: "standard error" of OPR values
Quote:
The method I propose above gives a standard deviation measure of how much a single match changes a team's OPR. I would think this is something like what you want. If not, can you define what you're looking for more precisely?

Also, rather than taking 200 of 254 matches and looking at the standard deviation of all OPRs, I suggest just removing a single match (e.g., compute OPR based on 253 of the 254 matches) and looking at how that removal affects only the OPRs of the teams involved in the removed match. So if you had 254 matches in a tournament, you'd compute 254 different sets of OPRs (one for each possible match removal) and then look at the variability of the OPRs only for the teams involved in each specific removed match.

This only uses the actual qualification match results, no scouting or other data, as you want.
#9
Re: "standard error" of OPR values
And just to make sure I'm being clear (because I fear that I may not be):
Let's say that team 1234 played in a tournament and was involved in matches 5, 16, 28, 39, 51, and 70. You compute team 1234's OPR using all matches except match 5. Say it's 55. Then you compute team 1234's OPR using all matches except match 16. Say it's 60. Keep repeating this, removing each of that team's matches in turn, which will give you 6 different OPR numbers. Let's say they're 55, 60, 50, 44, 61, and 53.

Then you can compute the standard deviation of those 6 numbers to give you a confidence measure on what team 1234's OPR is. Of course, you can do this for every team in the tournament and get team-specific OPR standard deviations and an overall tournament OPR standard deviation.

Team 1234 may have a large standard deviation (because maybe 1/3 of the time they knock over a stack in the last second) while team 5678 may have a small standard deviation (because they always contribute exactly the same point value to their alliance's final score). And hopefully the standard deviations will be lower in tournaments with more matches per team, because you have more data points to average.
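The arithmetic in this example is easy to check directly, using the six hypothetical OPR- values above:

```python
from statistics import mean, stdev

# The six hypothetical leave-one-out ("OPR-") values from the example above.
opr_minus = [55, 60, 50, 44, 61, 53]

print(round(mean(opr_minus), 2))   # → 53.83
print(round(stdev(opr_minus), 2))  # sample standard deviation → 6.37
```

So team 1234's OPR in this made-up example would be reported as roughly 53.8, with a match-to-match spread of about 6.4 points.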
#10
Re: "standard error" of OPR values
Quote:
#11
Re: "standard error" of OPR values
@Citrus Dad: If you are reading this thread, would you please weigh in here and reveal what you mean by "the standard errors" of the OPRs, and how you would compute them, using only the data in the example I posted? Also, what assumptions do you have to make about the data and the model in order for the computed standard errors to be statistically valid/relevant/meaningful, and what is the statistical meaning of those computed errors?
Last edited by Ether : 16-05-2015 at 23:11.
#12
Re: "standard error" of OPR values
Guys,

Can we all agree on the following?

Computing OPR, as done here on CD, is a problem in multiple linear regression (one dependent variable and 2 or more independent variables). The dependent variable for each measurement is alliance final score in a qual match. Each qual match consists of 2 measurements (red alliance final score and blue alliance final score). If the game has any defense or coopertition, those two measurements are not independent of each other.

For Archimedes, there were 127 qual matches, producing 254 measurements (alliance final scores). Let [b] be the column vector of those 254 measurements.

For Archimedes, there were 76 teams, so there are 76 independent dichotomous variables (each having value 0 or 1). For each measurement, all the independent variables are 0 except for 3 of them, which are 1. Let [A] be the 254-by-76 matrix whose ith row is the vector of the values of the independent variables for measurement i.

Let [x] be the 76x1 column vector of model parameters. [x] is what we are trying to find.

[A][x]=[b] is a set of 254 simultaneous equations in 76 variables. The variables in those 254 equations are the 76 (unknown) model parameters in [x]. We want to solve that system for [x]. Since there are more equations (254) than unknowns (76), the system is overdetermined, and there is no exact solution for [x]. Since there's no exact solution for [x], we use least squares to find the "best" solution¹. The solution will be a 76x1 column vector of Team OPRs. Let that solution be known as [OPR].

Citrus Dad wants to know "the standard error" of each element in [OPR].

Are we in agreement so far? If so, I will continue.

¹ Yes, I know there are other ways to define "best", but every OPR computation I've ever seen on CD uses least squares, so I infer that's what Citrus Dad had in mind.

Last edited by Ether : 17-05-2015 at 15:11.
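The [A][x]=[b] setup above can be sketched numerically. This is a toy example, not the Archimedes data: 6 hypothetical teams, 3-team alliances, and invented scores, just to show the shape of [A], [b], and the least-squares solve.

```python
import numpy as np

# Hypothetical mini-tournament: 6 teams, 3-team alliances, 5 qual matches
# (10 alliance final scores). All team numbers and scores are invented.
teams = [1, 2, 3, 4, 5, 6]
alliances = [(1, 2, 3), (4, 5, 6), (1, 2, 4), (3, 5, 6), (1, 3, 5),
             (2, 4, 6), (1, 4, 6), (2, 3, 5), (1, 5, 6), (2, 3, 4)]
scores = [60.0, 150.0, 70.0, 140.0, 90.0, 120.0, 110.0, 100.0, 120.0, 90.0]

# [A]: one row per measurement; dichotomous variables are 1 for the three
# teams on that alliance, 0 otherwise.
col = {t: j for j, t in enumerate(teams)}
A = np.zeros((len(alliances), len(teams)))
for i, alliance in enumerate(alliances):
    for t in alliance:
        A[i, col[t]] = 1.0
b = np.array(scores)

# Least-squares "best" solution of the overdetermined system [A][x] = [b].
opr, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(dict(zip(teams, np.round(opr, 2).tolist())))
# → {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 50.0, 6: 60.0}
```

These toy scores were built to be exactly consistent, so the residuals are ~0; with real match data the system is inconsistent and the residuals are nonzero, which is exactly where a "standard error" discussion starts.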
#13
Re: "standard error" of OPR values
Quote:
For 1 tournament, you have a single estimate of each element of [OPR]. There is no standard error.

If you have multiple tournaments, then you will have multiple estimates of the underlying OPR and can compute the standard error of these estimates.

If you use the baseline model to create a hypothetical set of random tournaments, as I described, then you can compute the standard error of these estimates from the hypothetical set of random tournaments.
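The "hypothetical set of random tournaments" idea can be sketched as a small Monte Carlo. Everything here is assumed for illustration: a fixed made-up 6-team schedule, known "true" contributions, and i.i.d. Gaussian noise on alliance scores as the baseline model. For ordinary least squares, the empirical spread should track the classical formula SE_j = sigma * sqrt([(AᵀA)⁻¹]_jj).

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented schedule: 10 alliances of 3 among teams 0..5 (not real FRC data).
alliances = [(0, 1, 2), (3, 4, 5), (0, 1, 3), (2, 4, 5), (0, 2, 4),
             (1, 3, 5), (0, 3, 5), (1, 2, 4), (0, 4, 5), (1, 2, 3)]
n_teams = 6
A = np.zeros((len(alliances), n_teams))
for i, alliance in enumerate(alliances):
    A[i, list(alliance)] = 1.0

true_opr = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # assumed baseline
sigma = 8.0        # assumed std dev of alliance-score noise
n_sims = 2000      # number of simulated random tournaments

estimates = np.empty((n_sims, n_teams))
for k in range(n_sims):
    b = A @ true_opr + rng.normal(0.0, sigma, size=len(alliances))
    estimates[k], *_ = np.linalg.lstsq(A, b, rcond=None)

# Standard error of each team's OPR estimate across simulated tournaments.
empirical_se = estimates.std(axis=0, ddof=1)

# Classical OLS comparison: sigma * sqrt of the diagonal of (A^T A)^-1.
theoretical_se = sigma * np.sqrt(np.diag(np.linalg.inv(A.T @ A)))
print(np.round(empirical_se, 2))
print(np.round(theoretical_se, 2))
```

With real data, sigma itself would have to be estimated, e.g. from the regression residuals, which is part of what makes the single-tournament case hard.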
#14
Re: "standard error" of OPR values
I am not defining "standard error".
I am asking you (or anyone who cares to weigh in) to pick a definition from an authoritative source and use that definition to compute said standard errors of the OPRs (or state why not):
Quote:
Quote:
#15
Re: "standard error" of OPR values
Quote:
Citrus Dad asked why no one ever reports "the" standard error for the OPRs. "Standard Error" is a concept within the field of statistics, with several well-defined meanings depending on the context. So what I am trying to do is this: have a discussion about what "the" standard error might mean in the context of OPR.