14-07-2015, 13:35
Oblarg
Registered User
AKA: Eli Barnett
FRC #0449 (The Blair Robot Project)
Team Role: Mentor
 
Join Date: Mar 2009
Rookie Year: 2008
Location: Philadelphia, PA
Posts: 1,112
Re: "standard error" of OPR values

Quote:
Originally Posted by wgardner View Post
From Wikipedia on Overfitting : "In statistics and machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations."

On the first sentence of that quote, I previously found that if I replaced the data from the 2014 casa tournament (which had the greatest number of matches per team of the tournaments I worked with) with completely random noise, the OPR could "predict" 26% of the variance and WMPR could "predict" 47% of it. So they're clearly describing the random noise in this case where a "properly fit" model would come closer to finding no relationship between the model parameters and the data, as should be the case when the data is purely random.

On the second sentence, again for the 2014 casa tournament, the OPR calculation only has 4 data points per parameter and the WMPR only has 2, which again sounds like "having too many parameters relative to the number of observations" to me. BTW, I think the model is appropriate, so I view it more as a problem of having too few observations rather than too many parameters.

And again, the casa tournament is one of the best cases. Most other tournaments have even fewer observations per parameter.

So that's why I think it's overfitting. Your opinion may differ. No worries either way.

This is also discussed a bit in the section on "Effects of Tournament Size" on my "Overview and Analysis of First Stats" paper.
Well, nearly by definition, any nontrivial model fit to a purely random process without sufficient data is going to "overfit," because no nontrivial model of pure noise can be correct.
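This is easy to demonstrate with a quick simulation (a sketch only; the team and match counts below are hypothetical round numbers, not the casa data): fit an OPR-style least-squares model to alliance scores that are pure noise, and it still "explains" a sizable fraction of the in-sample variance, roughly in proportion to the number of parameters per observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams, n_matches = 40, 60  # hypothetical event size, for illustration

# Build the OPR design matrix: one row per alliance per match,
# with a 1 in each column for a team on that alliance.
rows, scores = [], []
for _ in range(n_matches):
    teams = rng.choice(n_teams, size=6, replace=False)
    for alliance in (teams[:3], teams[3:]):
        row = np.zeros(n_teams)
        row[alliance] = 1.0
        rows.append(row)
        # Pure noise: scores have no dependence on which teams played.
        scores.append(rng.normal(100, 20))

A = np.array(rows)   # shape (2 * n_matches, n_teams)
b = np.array(scores)

# OPR: least-squares per-team contribution estimates.
opr, *_ = np.linalg.lstsq(A, b, rcond=None)

# In-sample fraction of variance "explained" by the fitted model.
pred = A @ opr
r2 = 1 - np.sum((b - pred) ** 2) / np.sum((b - b.mean()) ** 2)
print(f"in-sample R^2 on pure noise: {r2:.2f}")
```

With 40 parameters and only 120 alliance-score observations (3 per parameter), the fitted model soaks up roughly a third of the variance of data that is, by construction, pure noise, which is the same effect described above for the casa tournament.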

The problem here is that the Wikipedia article lumps two separate things under "overfitting": error caused by a fundamentally sound model with insufficient data, and error caused by improperly revising the model specifically to fit the available training data (and thus failing to generalize).

If one reasons purely from patterns seen in the data, there is no difference between the two (since the only way to know whether one's model fits would be validation against those same data). However, they aren't necessarily the same thing if one has an externally motivated model (and I believe OPR has reasonable, albeit clearly imperfect, motivation).

We may be veering off-topic, though.
__________________
"Mmmmm, chain grease and aluminum shavings..."
"The breakfast of champions!"

Member, FRC Team 449: 2007-2010
Drive Mechanics Lead, FRC Team 449: 2009-2010
Alumnus/Technical Mentor, FRC Team 449: 2010-Present
Lead Technical Mentor, FRC Team 4464: 2012-2015
Technical Mentor, FRC Team 5830: 2015-2016