#91 | 12-07-2015, 22:46
GeeTwo (Gus Michel II), Mentor, FRC #3946 (Tiger Robotics)
Re: "standard error" of OPR values

Quote:
Originally Posted by wgardner
So, repeating the Executive Summary:

1. The mean of the standard error vector for the OPR estimates is a decent approximation for the standard deviation of the team-specific OPR estimates themselves, and is a very good approximation for the mean of the standard deviations of the team-specific OPR estimates taken across all of the teams in the tournament.

2. Teams with more variability in their offensive contributions (e.g., teams that contribute a huge amount to their alliance's score by performing some high-scoring feats, but fail at doing so 1/2 the time) will have slightly more uncertainty in their OPR estimate than the mean of the standard error vector would indicate, but not by too much.

3. Teams with less variability in their offensive contributions (e.g., consistent teams that always contribute about the same amount to their alliance's score every match) will have slightly less uncertainty in their OPR estimate than the mean of the standard error vector would indicate, but not by too much.
The bottom line here seems to be that even assuming that an alliance's expected score is a simple sum of each team's contributions, the statistics tend to properly report the global match-to-match variation, while under-reporting each team's match-to-match variation.
The elephant in the room is the assumption that the alliance is equal to the sum of its members. For example, consider a 2015 (Recycle Rush) robot with a highly effective two-can grab during autonomous and the ability to build, score, cap, and noodle one stack of six from the HP station, or cap five stacks of up to six totes during a match, or cap four stacks with noodles loaded over the wall. For argument's sake, it is essentially 100% proficient at these tasks and selects which to do based on its alliance partners. I will also admit up front that the alliance match-ups below are somewhat contrived, but none is truly unrealistic. If I'd wanted to really stack the deck, I'd have assumed the robot was a consummate RC specialist with no tote manipulators at all.
  • If the robot had the field to itself, it could score 42 points (one noodled, capped stack of six). The canburglar is useless, except as a defensive measure.
  • If paired with two HP robots that could combine to score two or three capped stacks, this robot would add at most a few noodles to the final score. It either can't get to the HP station, or it would displace another robot that would have been using the station. Again, the canburglar has no offensive value.
  • If paired with an HP robot that could score two capped and noodled stacks, and a landfill miner that could build and cap two non-noodled stacks, the margin for this robot would be 66 points (42 points for its own noodled, capped stack, and 24 points for the fourth stack that the landfill robot could cap). The canburglar definitely contributes here!
  • If allied with two HP robots that could put up four or five 6-stacks of totes (but no RCs), the marginal value of this robot would be a whopping 120 points (cap four 6-stacks with RCs and noodles, or cap five 6-stacks with RCs). Couldn't do it without that canburglar!

The real point is that this variation is driven by alliance composition, not by "performance variation" of the robot in the same situation. I also left out HP littering, which would add further wrinkles.
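For anyone who wants to check the arithmetic, here is a minimal sketch of those marginal values. It assumes the 2015 scoring I'm using above: 2 points per gray tote in a scored stack, 4 points per tote level under a recycling container (RC), and 6 points per noodle scored in an RC.

Code:
# Marginal values for the scenarios above (assumed 2015 Recycle Rush scoring:
# tote = 2, RC = 4 per tote level it sits on, noodle in an RC = 6).
TOTE = 2
RC_PER_LEVEL = 4
NOODLE = 6

def capped_stack(levels, noodled=False):
    """Points for one stack of `levels` totes capped with an RC."""
    return levels * TOTE + levels * RC_PER_LEVEL + (NOODLE if noodled else 0)

print(capped_stack(6, noodled=True))                      # alone on the field: 42
print(capped_stack(6, noodled=True) + 6 * RC_PER_LEVEL)   # HP robot + landfill miner partners: 66
print(4 * (6 * RC_PER_LEVEL + NOODLE))                    # cap four 6-stacks with RCs and noodles: 120
print(5 * (6 * RC_PER_LEVEL))                             # or cap five 6-stacks with RCs only: 120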

My takeaway from this thread is that it would be useful to know the rms (root-mean-square) of the residuals for an OPR/DPR data set (tournament or season). That would give some sense of how much of a difference really is a difference, and a clue as to when the statistics mean about as much as the scouting.
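Something like the sketch below would do it, I think. It assumes you already have the qualification schedule and alliance scores in hand (the matches and teams arguments are placeholders), uses numpy's ordinary least-squares fit for OPR, and pulls the rms of the residuals out of the same data.

Code:
import numpy as np

def opr_and_residual_rms(matches, teams):
    """matches: iterable of (red_teams, blue_teams, red_score, blue_score).
       Returns ({team: OPR}, rms of the per-alliance residuals)."""
    idx = {t: i for i, t in enumerate(teams)}
    rows, scores = [], []
    for red, blue, red_score, blue_score in matches:
        for alliance, score in ((red, red_score), (blue, blue_score)):
            row = np.zeros(len(teams))
            for t in alliance:
                row[idx[t]] = 1.0              # 1 for each team on the alliance
            rows.append(row)
            scores.append(float(score))
    A, b = np.array(rows), np.array(scores)
    opr, *_ = np.linalg.lstsq(A, b, rcond=None)    # the usual OPR fit
    rms = np.sqrt(np.mean((b - A @ opr) ** 2))     # rms of the residuals
    return dict(zip(teams, opr)), rms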

On another, slightly related matter: I have wondered why CCWM (Calculated Contribution to Winning Margin) is calculated by combining separate calculations of OPR and DPR, rather than by solving a single least-squares system on the winning margins. I suspect that the single calculation would prove more consistent for games with robot-based defense (not Recycle Rush); if a robot plays offense for five matches and defense for five, then OPR and DPR would each have a lot of noise, whereas the true CCWM should be a more consistent number.
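Roughly, the single solve I have in mind looks like this (same placeholder inputs as the OPR sketch above): one row per match, +1 for each red team, -1 for each blue team, and the winning margin as the target.

Code:
import numpy as np

def winning_margin_rating(matches, teams):
    """Solve one least-squares system on winning margins instead of
       combining separate OPR and DPR solves."""
    idx = {t: i for i, t in enumerate(teams)}
    rows, margins = [], []
    for red, blue, red_score, blue_score in matches:
        row = np.zeros(len(teams))
        for t in red:
            row[idx[t]] = 1.0       # red alliance members
        for t in blue:
            row[idx[t]] = -1.0      # blue alliance members
        rows.append(row)
        margins.append(float(red_score - blue_score))
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(margins), rcond=None)
    return dict(zip(teams, x))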
#92 | 13-07-2015, 00:22
Oblarg (Eli Barnett), Mentor, FRC #0449 (The Blair Robot Project)
Re: "standard error" of OPR values

Quote:
Originally Posted by GeeTwo
The elephant in the room is the assumption that the alliance is equal to the sum of its members.
This was directly addressed on the earlier pages; there's no real way to account for the degree to which the underlying OPR model itself is inaccurate (short of positing some complicated nonlinear model and using that instead).
#93 | 13-07-2015, 06:28
wgardner, Coach, no team
Re: "standard error" of OPR values

Quote:
Originally Posted by Oblarg
Couldn't one generate an estimate for each team's "contribution to variance" by doing the same least-squares fit used to generate OPR in the first place (using the matrix of squared residuals rather than of scores)? This might run the risk of assigning some team a negative contribution to variance (good luck making sense of that one), but other than that (seemingly unlikely) case I can't think of why this wouldn't work.
We just tried this about a day ago. Unfortunately, there isn't enough data in a typical tournament to get reliable estimates of the per-team offensive variation. With much larger tournament sizes, it does work OK, but it doesn't work when you only have about 5-10 matches played by each team. I'll send you some of our private messages where this is discussed and where the results are shown.
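For reference, the calculation being described is roughly the sketch below, reusing the alliance design matrix A, the alliance scores b, and the fitted OPR vector from an ordinary OPR solve. It runs fine; there just aren't enough observations behind it at typical tournament sizes.

Code:
import numpy as np

def contribution_to_variance(A, b, opr):
    """Regress the squared alliance residuals on the same design matrix used
       for OPR. Can come out negative for some teams, and is very noisy when
       each team only plays about 5-10 matches."""
    sq_resid = (b - A @ opr) ** 2
    ctv, *_ = np.linalg.lstsq(A, sq_resid, rcond=None)
    return ctv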
#94 | 13-07-2015, 06:39
wgardner, Coach, no team
Re: "standard error" of OPR values

Quote:
Originally Posted by GeeTwo
On another, slightly related matter: I have wondered why CCWM (Calculated Contribution to Winning Margin) is calculated by combining separate calculations of OPR and DPR, rather than by solving a single least-squares system on the winning margins. I suspect that the single calculation would prove more consistent for games with robot-based defense (not Recycle Rush); if a robot plays offense for five matches and defense for five, then OPR and DPR would each have a lot of noise, whereas the true CCWM should be a more consistent number.
Yes, read the paper attached in the first post of this thread. What you described is called the Winning Margin Power Rating (WMPR) or Combined Power Rating (CPR) in that paper depending on how you choose to normalize it (called WMPR if the means are 0 like CCWM or called CPR if the means equal the means of the OPRs). If combined with MMSE estimation to address some overfitting issues, it can occasionally result in improved match prediction compared to OPR, DPR, or CCWM measures. Even in years with a lot of defense though, it's not a whole lot better.
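Very loosely, the MMSE idea amounts to shrinking the least-squares ratings toward a prior rather than trusting the raw solve. The sketch below is only that generic ridge-style version, not the exact estimator from the paper; lam is an assumed knob, roughly the ratio of score-noise variance to the prior variance of the ratings.

Code:
import numpy as np

def shrunk_ratings(A, b, lam=1.0, prior_mean=0.0):
    """Ridge-style shrinkage of least-squares ratings toward prior_mean
       (0 is natural for a zero-mean margin metric like WMPR). Larger lam
       means more shrinkage and less overfitting."""
    n = A.shape[1]
    x0 = np.full(n, prior_mean)
    lhs = A.T @ A + lam * np.eye(n)
    return x0 + np.linalg.solve(lhs, A.T @ (b - A @ x0))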
#95 | 13-07-2015, 06:50
wgardner, Coach, no team
Re: "standard error" of OPR values

Quote:
Originally Posted by GeeTwo
My takeaway from this thread is that it would be useful to know the rms (root-mean-square) of the residuals for an OPR/DPR data set (tournament or season). That would give some sense of how much of a difference really is a difference, and a clue as to when the statistics mean about as much as the scouting.
Yes. In the paper in the other thread that I just posted about, the appendices show the percentage reduction in mean-squared residual achieved by each of the different metrics (OPR, CCWM, WMPR, etc.). An interesting thing to note is that the metrics are often much worse at predicting match results that weren't included in their computation, which indicates overfitting in many cases.

The paper also discusses MMSE-based estimation of the metrics (as opposed to the traditional least-squares method), which reduces the overfitting effects, does better at predicting previously unseen matches (as measured by the squared prediction residual on "testing set" matches), and is better at recovering the actual underlying metric values in tournaments simulated from the metric models themselves.
#96 | 13-07-2015, 21:48
Oblarg (Eli Barnett), Mentor, FRC #0449 (The Blair Robot Project)
Re: "standard error" of OPR values

Quote:
Originally Posted by wgardner
Yes. In the paper in the other thread that I just posted about, the appendices show the percentage reduction in mean-squared residual achieved by each of the different metrics (OPR, CCWM, WMPR, etc.). An interesting thing to note is that the metrics are often much worse at predicting match results that weren't included in their computation, which indicates overfitting in many cases.
I don't think this necessarily indicates "overfitting" in the traditional sense of the word - you're always going to get an artificially-low estimate of your error when you test your model against the same data you used to tune it, whether your model is overfitting or not (the only way to avoid this is to partition your data into model and verification sets). This is "double dipping."

Rather, it would be overfitting if the predictive power of the model (when tested against data not used to tune it) did not increase with the amount of data available to tune the parameters. I highly doubt that is the case here.
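Concretely, the partitioning I mean is something like the sketch below, where A and b are the alliance design matrix and scores from the OPR calculation (the 25% split is arbitrary): fit on one subset of rows and measure the error on the rows that were held out.

Code:
import numpy as np

def train_test_mse(A, b, test_fraction=0.25, seed=0):
    """Fit on a training subset and report mean-squared residuals on both
       subsets; the held-out number is the honest estimate of predictive error."""
    rng = np.random.default_rng(seed)
    test = rng.random(len(b)) < test_fraction
    x, *_ = np.linalg.lstsq(A[~test], b[~test], rcond=None)
    mse_train = np.mean((b[~test] - A[~test] @ x) ** 2)
    mse_test = np.mean((b[test] - A[test] @ x) ** 2)
    return mse_train, mse_test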
#97 | 13-07-2015, 22:31
GeeTwo (Gus Michel II), Mentor, FRC #3946 (Tiger Robotics)
Re: "standard error" of OPR values

Quote:
Originally Posted by wgardner
Yes, read the paper attached in the first post of this thread. What you described is called the Winning Margin Power Rating (WMPR) or Combined Power Rating (CPR) in that paper depending on how you choose to normalize it (called WMPR if the means are 0 like CCWM or called CPR if the means equal the means of the OPRs). If combined with MMSE estimation to address some overfitting issues, it can occasionally result in improved match prediction compared to OPR, DPR, or CCWM measures. Even in years with a lot of defense though, it's not a whole lot better.
Seems a shame for such a meaningful statistic to be referenced not with a bang, but with a WMPR. ;->
#98 | 14-07-2015, 09:22
wgardner, Coach, no team
Re: "standard error" of OPR values

Quote:
Originally Posted by Oblarg
I don't think this necessarily indicates "overfitting" in the traditional sense of the word - you're always going to get an artificially-low estimate of your error when you test your model against the same data you used to tune it, whether your model is overfitting or not (the only way to avoid this is to partition your data into model and verification sets). This is "double dipping."

Rather, it would be overfitting if the predictive power of the model (when tested against data not used to tune it) did not increase with the amount of data available to tune the parameters. I highly doubt that is the case here.
From Wikipedia on overfitting: "In statistics and machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations."

On the first sentence of that quote, I previously found that if I replaced the data from the 2014 casa tournament (which had the greatest number of matches per team of the tournaments I worked with) with completely random noise, the OPR could "predict" 26% of the variance and WMPR could "predict" 47% of it. So they're clearly describing the random noise in this case where a "properly fit" model would come closer to finding no relationship between the model parameters and the data, as should be the case when the data is purely random.
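For concreteness, the test I'm describing is roughly the sketch below, with A being the tournament's actual design matrix and the real scores replaced by pure noise. (The 26% and 47% figures above came from the real 2014 matrices, not from this toy.)

Code:
import numpy as np

def variance_explained_on_noise(A, seed=0):
    """Fit the usual least-squares model to pure-noise 'scores' and report the
       fraction of variance the fit still appears to explain."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(A.shape[0])            # scores replaced by noise
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    resid = b - A @ x
    return 1.0 - np.sum(resid ** 2) / np.sum((b - b.mean()) ** 2)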

On the second sentence, again for the 2014 casa tournament, the OPR calculation only has 4 data points per parameter and the WMPR only has 2, which again sounds like "having too many parameters relative to the number of observations" to me. BTW, I think the model is appropriate, so I view it more as a problem of having too few observations rather than too many parameters.

And again, the casa tournament is one of the best cases. Most other tournaments have even fewer observations per parameter.

So that's why I think it's overfitting. Your opinion may differ. No worries either way.

This is also discussed a bit in the section on "Effects of Tournament Size" in my "Overview and Analysis of First Stats" paper.
#99 | 14-07-2015, 13:35
Oblarg (Eli Barnett), Mentor, FRC #0449 (The Blair Robot Project)
Re: "standard error" of OPR values

Quote:
Originally Posted by wgardner
From Wikipedia on overfitting: "In statistics and machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations."

On the first sentence of that quote, I previously found that if I replaced the data from the 2014 casa tournament (which had the greatest number of matches per team of the tournaments I worked with) with completely random noise, the OPR could "predict" 26% of the variance and WMPR could "predict" 47% of it. So they're clearly describing the random noise in this case where a "properly fit" model would come closer to finding no relationship between the model parameters and the data, as should be the case when the data is purely random.

On the second sentence, again for the 2014 casa tournament, the OPR calculation only has 4 data points per parameter and the WMPR only has 2, which again sounds like "having too many parameters relative to the number of observations" to me. BTW, I think the model is appropriate, so I view it more as a problem of having too few observations rather than too many parameters.

And again, the casa tournament is one of the best cases. Most other tournaments have even fewer observations per parameter.

So that's why I think it's overfitting. Your opinion may differ. No worries either way.

This is also discussed a bit in the section on "Effects of Tournament Size" in my "Overview and Analysis of First Stats" paper.
Well, any nontrivial model at all that's looking at a purely random process without sufficient data is going to "overfit," nearly by definition, because no nontrivial model is going to be at all correct.

The problem here is that there are two separate things in that Wikipedia article that are called "overfitting": errors caused by fundamentally sound models with insufficient data, and errors caused by improperly revising the model specifically to fit the available training data (and thus causing a failure to generalize).

If one is reasoning purely based on patterns seen in the data, then there is no difference between the two (since the only way to know that one's model fits the data would be through validation against those data). However, these aren't necessarily the same thing if one has an externally motivated model (and I believe OPR has reasonable, albeit clearly imperfect, motivation).

We may be veering off-topic, though.