




#1




The Case for the Median
As Karthik has said in one of his lectures, "a lot of people are just like: 'what's the average points per match? what's the average?'" In Karthik's case, he was advocating for people to pay attention to standard deviations: for people to care about the spread of a team's performance as well as measures of central tendency like the mean. I'd like to approach what Karthik observed from a different angle: teams are always saying "what's the average?", and I'd like to encourage them to ask: "what's the median?"*
Let's be real: it's extremely unlikely for any official competition to have more than 12 qualification matches per team. The Waterloo regional had 13 qual matches per team, but I can't think of any other official event with that many. So, when we make statistical decisions about teams for match strategy and for picklisting, we ought to remember that we are working with a maximum of n=12, and often far fewer. Important picklisting decisions are usually made with data from 8 or 9 matches, even at district events. I haven't competed in the regional system yet, but I expect that regional teams picklisting the night before alliance selection are working with even less data.

That sample size matters when it comes to outliers. The median is resistant to outliers, while the mean isn't. If you rely on the mean alone, you are choosing to be influenced strongly by outlier matches. In some situations this can be valuable, but most alliance captains, and most teams planning strategy for qualification matches, are more interested in a team's likely outcome than in its tail results.

It's not like the median is difficult to calculate, either. More teams than ever use complicated electronic scouting systems, and advanced statistics like OPR are commonplace among teams making any sort of strategic assessment. If you're doing the linear algebra to calculate OPR, how hard is it to type =MEDIAN(range) into Excel?

Of course, the mean has its place. It's a good measure of central tendency, and it has the advantage of combining all the available data points into one summary statistic rather than selecting one representative (or splitting the difference between two). The mean is the basis of most methods of statistical inference, so if teams want to use z-scores, confidence intervals, or p-values in their scouting and strategy decisions, the mean is critical. But how many FRC teams use statistical inference? I remember seeing z-scores in 1678's scouting documentation from 2016, and I know that 1712 used gear confidence intervals and tried to approximate the probability of a 4-rotor or 40 kPa match in 2017, but realistically most teams aren't building statistical inference into their scouting decisions.

Spending half a minute to add the median to your analysis toolbox is going to be more valuable than spending half a week to add OPR to it. That's not to say that OPR isn't a helpful statistic (which is an entirely different conversation), but rather that the median is heavily undervalued for the minimal time and effort it takes to use effectively. Teams should care about the median, not just the mean.

* Yes, the median is technically an average. But it's pretty obvious that people asking "what's the average" are asking about the arithmetic mean.

Last edited by GKrotkov : 09-02-2017 at 07:10 PM. Reason: Edits made to get the link to Karthik's 2015 talk to work.
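As a rough illustration of the outlier point in the post above, here is a minimal Python sketch. The gear counts are made up for the example (they are not from any real team), and the single 0 stands in for one bad match such as a dead robot.

```python
from statistics import mean, median

# Hypothetical gear counts for one team over 9 qualification matches.
# The single 0 is an invented outlier match.
gears = [3, 4, 3, 3, 4, 3, 0, 3, 4]

team_mean = mean(gears)
team_median = median(gears)

# The same team with the outlier match dropped
clean = [g for g in gears if g != 0]
clean_mean = mean(clean)
clean_median = median(clean)

print(f"with outlier:    mean={team_mean:.3f}  median={team_median}")
print(f"without outlier: mean={clean_mean:.3f}  median={clean_median}")
```

With only 9 matches, one bad match moves the mean noticeably, while the median does not change.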
#2




Re: The Case for the Median
More topics like this on the scouting forum please!
I've tended to look to mode and range more often than mean for evaluation. Mean has also given me mixed results for match prediction. I'll keep median in mind in the future. 
#3




Re: The Case for the Median
Quote:
0 0 131 134 136 139 147 160 161 163
Binning by 1 would give 0 as the mode, by 5 would give 160-164, by 10 would give 130-139. It's hard to attach much significance to a measurement which is almost as much a function of the details of the measuring as of that which is being measured. Bimodal/multimodal distributions are also not uncommon with such small data sets, even if the underlying phenomenon is Gaussian. 
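The binning effect described above can be checked with a short Python sketch. The helper name `modal_bin` is made up for this example, and bins are assumed to start at multiples of the bin width.

```python
from collections import Counter

scores = [0, 0, 131, 134, 136, 139, 147, 160, 161, 163]

def modal_bin(values, width):
    """Return (low, high) of the most populated bin of the given width.

    Bins are assumed to start at multiples of `width`.
    """
    counts = Counter(v // width for v in values)
    bin_index, _ = counts.most_common(1)[0]
    low = bin_index * width
    return low, low + width - 1

for width in (1, 5, 10):
    print(f"bin width {width:2d}: modal bin = {modal_bin(scores, width)}")
```

The same ten scores give three different "modes" depending only on the bin width.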
#4




Re: The Case for the Median
Quote:
Captain_Kirch also brought up the range as an estimator of spread: it's interesting to look at robust alternatives to standard deviation for measuring spread. Range isn't robust either: by definition it's entirely reliant on the maximum and minimum. Personally, I'd advocate for IQR (= Q3 - Q1), since it's resistant to outliers and provides more specificity than something like MAD (Median Absolute Deviation). On the other hand, though, if you're trying to determine how consistent a team is, you might want to be strongly influenced by outliers.

So, I went into my Mid-Atlantic Champs data and did some comparisons. I've pulled out, with respect to the number of gears scored, the most consistent teams by IQR and by standard deviation.

Here are the 5 most consistent teams in the dataset by IQR, along with their datasets.
56: 3, 4, 3, 2, 3, 3, 5, 3, 0, 3, 3, 2 IQR = 0
3974: 6, 4, 3, 4, 5, 4, 4, 4, 6, 4, 4, 3 IQR = .25
5407: 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 2, 3 IQR = .25
4285: 3, 2, 2, 3, 3, 3, 4, 3, 3, 2, 3, 4 IQR = .25
1257: 3, 4, 0, 3, 4, 4, 4, 4, 4, 4, 3, 0, 4 IQR = .5

Here are the 5 most consistent teams in the dataset by standard deviation, along with the dataset and the calculated (sample) standard deviation.
5407: 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 2, 3 StDev: .651
2600: 4, 3, 4, 3, 4, 4, 4, 4, 2, 3, 3, 3 StDev: .669
3929: 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 3, 2 StDev: .669
4285: 3, 2, 2, 3, 3, 3, 4, 3, 3, 2, 3, 4 StDev: .669
303: 4, 3, 3, 3, 3, 2, 2, 3, 4, 4, 3, 2 StDev: .739

Feel free to make of this what you wish. To me, it looks like the story of outliers: IQR is resistant, StDev isn't. I'd personally advocate for people to look at both when making a decision, and to understand what each one means. IQR definitely has some value: a team like 3974 or 1257, with one or two outlier matches but otherwise absurdly consistent performances, will be overlooked by StDev, but IQR would pull it out. Similarly, 3929's performance is definitely desirable*, but because all the deviations are on one side of the mean their IQR rises a bit. Teams like 5407 and 4285, with small deviations on both sides of the center, are the types that score well by both metrics.

* I mean, they won MAR Champs. 
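For anyone who wants to compute both spread measures outside a spreadsheet, here is a minimal Python sketch using three of the gear datasets from the post above. The quartile convention is an assumption: the post doesn't say how its quartiles were calculated, so this uses the "inclusive" interpolation from Python's `statistics.quantiles`, and other conventions can give different IQRs for the same data.

```python
from statistics import quantiles, stdev

# Gear counts per match, copied from the post above
gears = {
    5407: [3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 2, 3],
    3974: [6, 4, 3, 4, 5, 4, 4, 4, 6, 4, 4, 3],
    3929: [4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 3, 2],
}

def iqr(data):
    """Interquartile range Q3 - Q1 (assumed 'inclusive' quartile method)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

spread = {team: (iqr(d), stdev(d)) for team, d in gears.items()}

for team, (team_iqr, team_sd) in spread.items():
    print(f"{team}: IQR = {team_iqr:.2f}, sample StDev = {team_sd:.3f}")
```

`stdev` is the sample standard deviation (n - 1 in the denominator), matching the "(sample)" note in the post.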
#5




Re: The Case for the Median
As a statistician with training in game theory, I am going to plead with everyone not to place a lot of reliance on any of the numbers. I believe GKrotkov is completely correct that with sample sizes such as we have in a typical FRC competition, the median is generally going to be a better predictor than the mean of any single team's performance. But in general, sample sizes aren't big enough for any of the typical measures to be good choices. We try to record all of the hard number data we can in scouting, but often our most useful information is the qualitative observations. My gut feeling is that the overall effectiveness of scouting has not improved at all in the past half decade or so. In particular I see teams relying on overall numbers, and not paying enough attention to what is actually happening in matches. Sometimes a team may be great but can be shut down by a particular strategy. Sometimes a team had a really rough first day of competition, but their last three or four matches were great. (We never make any final decisions on Friday.)

#6




Re: The Case for the Median
Quote:
Yes! Understanding WHY something happens may be more important than WHETHER it happened. You may be able to mitigate your partner's weakness or exploit your opponents', but only if you understand what that weakness is. Leave space on your scouting forms for free form comments, and read them! 
#7




Re: The Case for the Median
Quote:

#8




Re: The Case for the Median
Instead of just looking at the median/mode/average etc., I find myself trying to look at the trend of the data over the course of the event. For example, look at "climb data" (an easy example because it's a boolean; N = not climbed, Y = climbed):
Team 1: NNNYNYYNYY
Team 2: YYYNYYNNNN
Team 3: YNNYNYYNNY
All three teams have the same climb percentage (50%), but Team 1 is the best pick. I have yet to determine a good way to produce a quantitative metric for this type of data analysis. (Sorry if this doesn't make sense or derails the thread too far off topic.) 
#9




Re: The Case for the Median
Quote:
Whenever attempting to make use of trend data, it's often helpful to actually understand the root cause of the trend (whether by observation or conversation with the team). 
#10




Re: The Case for the Median
Quote:
For example, on Saturday morning at Tech Valley this year, my head scout and I were going through teams and noticed that while 5952 could reliably score three gears a match, their climb rate was ~50%. We observed that missed climbs clearly increased in frequency over time, so we talked to them. We learned that the problem was their Velcro roller not sticking to their rope strongly enough. There were no other problems with the climber. Ultimately, they were unpicked by the time our alliance's second chance to pick came along (likely due to their ~50% climb rate), and we begged our captain to pick them. After lunch, their climber was good to go and they played a key role in helping us score the fourth rotor and walk away from the event with our first ever blue banner. 
#11




Re: The Case for the Median
This is very good advice. As Brian pointed out, if you know the reason for something it can completely change your analysis. I was chatting with my co-coach the other day and we think that the emphasis on people collecting and sharing data (which is a good thing, I am not trying to get teams not to collaborate and share) has led to a decrease in the number of people actually looking at what is going on and trying to figure out the whys. We also think that teams tend to do a really bad job of putting thought into which teams will best help them implement which strategies.

#12




Re: The Case for the Median
Quote:
Now that said, this is for small, resource-limited teams using collaborative scouting information. Large teams have fewer excuses (but I would accept some). 
#13




Re: The Case for the Median
Quote:
I remember once (in 2013, when I was a referee) a team telling me that they would never pick a mecanum drive team as an ally. They passed up on picking the eventual event champion, who clearly had the best driven robot at the event, because of this. Even if you had a bias against mecanum drive, if a bunch of scouts had all written down comments like "Wow, that driver is awesome!" or "Very good driver" you might well be convinced to make that pick. The same team opted to play defense against the team they did not pick because "They are mecanum and we will be able to shove them all over the place." That didn't work, and observation of the matches would have told them it wouldn't work. But the mecanum team had been unlucky with alliance partners and seeded only 11th. 
#14




Re: The Case for the Median
Quote:
I'm imagining a procedure which would be something like this:
1. Find conventional L2 OPRs for all teams.
2. For each team, find the median residual error for their matches and add that error to their normal OPR.
3. Possibly do something along these lines iteratively a few times?
For example, say a team has an OPR of 10 and the following residuals (actual score - (sum of alliance OPRs)) for their 6 matches: -4, -3, -1, 0, 1, 5. Since their median residual is -0.5, their median-adjusted OPR would be 9.5. Their new residuals would then be -3.5, -2.5, -0.5, 0.5, 1.5, 5.5, assuming that their partners' OPRs are unchanged. I might play around with something like this to see if this can be used to improve the predictive power of OPR.
*I would put a link here but I can't find the post or whitepaper 
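Step 2 of that procedure can be sketched in a few lines of Python. This only covers the single-team adjustment from the worked example, not the OPR solve or the iteration in step 3. It assumes the residuals are -4, -3, -1, 0, 1, 5, which are the signs that give a median of -0.5 and reproduce the adjusted OPR of 9.5 and the new residuals quoted in the post.

```python
from statistics import median

opr = 10.0
# Residuals = actual alliance score - sum of alliance OPRs, one per match
# (signs assumed as described above)
residuals = [-4, -3, -1, 0, 1, 5]

median_residual = median(residuals)
adjusted_opr = opr + median_residual

# Changing this team's OPR by the median residual changes every prediction
# by the same amount, so each residual shifts by -median_residual
# (partners' OPRs held fixed).
new_residuals = [r - median_residual for r in residuals]

print(f"median residual: {median_residual}")
print(f"median-adjusted OPR: {adjusted_opr}")
print(f"new residuals: {new_residuals}")
```

After the adjustment the new residuals have a median of zero, which is the point of the correction.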
#15




Re: The Case for the Median
Quote:
A_{0} = X_{0}
The weight of the older data decays exponentially; a small k is a fast decay; large k (though less than 1!) is slower. With a k of 0.9, after twelve matches, the first match is weighted about 31% as much as the most recent.
Last edited by GeeTwo : 09-08-2017 at 08:14 PM. Reason: Realized formula was too simple 
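Here is one possible way to turn exponential decay into a trend-aware climb rate, applied to the three climb strings from earlier in the thread. This is a sketch under an assumed weighting, not necessarily the formula from the post above: the most recent match gets weight 1, each earlier match gets k times the weight of the one after it, and the weights are normalized to sum to 1. The function name `decayed_rate` is made up for this example.

```python
def decayed_rate(results, k=0.9):
    """Exponentially weighted climb rate for a string of 'Y'/'N' results.

    Assumed weighting: the latest match has weight 1 and each earlier
    match has k times the weight of the match after it.
    """
    n = len(results)
    weights = [k ** (n - 1 - i) for i in range(n)]
    climbs = [1.0 if r == "Y" else 0.0 for r in results]
    return sum(w * c for w, c in zip(weights, climbs)) / sum(weights)

teams = {
    "Team 1": "NNNYNYYNYY",
    "Team 2": "YYYNYYNNNN",
    "Team 3": "YNNYNYYNNY",
}
rates = {name: decayed_rate(results) for name, results in teams.items()}

for name, rate in rates.items():
    print(f"{name}: weighted climb rate = {rate:.3f}")

# Relative weight of the first of twelve matches when k = 0.9
first_vs_last = 0.9 ** 11
print(f"first of 12 matches vs most recent: {first_vs_last:.2f}")
```

All three teams have a plain climb rate of 50%, but under this weighting the improving team ranks first and the declining team last. As noted earlier in the thread, the number only flags a trend; it doesn't explain why the trend happened.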

