Statistics for week 2

This covers all the qualifying matches played so far this weekend that were posted to usfirst.org, which is about 3 times as much data as my last post. For the time being, I’m ignoring finals: there aren’t enough finals rounds to get worthwhile data out of, and they are so different from qualifying rounds that I couldn’t really lump the two together.

Average score at BAE regional (last week): 24.6

This week (last week’s value in brackets, where available):
Average red score: 29.44 (23.8)
Average blue score: 29.53 (25.4)
Average score: 29.493 (24.6)
Average score when neither team scored zero: 30.41
Percent of matches where at least one team scored zero: 30% (35%)

Number of red wins: 257
Number of blue wins: 254
Number of ties: 8

Median score: 27 (50% of scores were less than this)

Average win margin: 24.34 (18.2)
Average win margin when both teams were nonzero: 24.06
Average winning score: 42.0 (33.7)
Average losing score: 17.2 (15.4)

Effect of multiple regional attendance
Average score for a team full of first-timers: 29.318
Average score for a team where one member had attended a previous regional: 34.244

Number of games in which one alliance had a member that had attended a previous regional and the other alliance didn’t: 37

Win rate for the experienced team in that situation: 57% (note: with such a small sample size, this may not actually be significant. We’ll know better as the weeks go on whether experience creates a significant advantage.)
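With only 37 such games, it’s worth checking how surprising 57% would be if experience made no difference at all. Below is a minimal exact binomial test in Python (math.comb needs Python 3.8+). It assumes the 57% means 21 wins out of 37 (21/37 ≈ 56.8%), since the raw count isn’t given, and it treats each game as an independent coin flip under the null hypothesis of a 50% win rate.

```python
from math import comb


def binomial_two_sided_p(wins: int, games: int) -> float:
    """Exact two-sided binomial test against a 50% win rate.

    Adds up the probability of every outcome that is at most as likely
    as the observed one when each game is a fair coin flip.
    """
    probs = [comb(games, k) / 2 ** games for k in range(games + 1)]
    observed = probs[wins]
    # The tiny tolerance keeps outcomes with equal probability from being
    # dropped because of floating-point noise.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Assumption: 57% of 37 games corresponds to 21 wins.
wins, games = 21, 37
p_value = binomial_two_sided_p(wins, games)
print(f"win rate {wins / games:.1%}, two-sided p = {p_value:.2f}")
```

With these assumed numbers the p-value comes out far above the usual 0.05 cutoff, which is consistent with the caution in the note about sample size.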

My comments:
As with last week, almost a third of games could have been won by scoring a SINGLE point and not taking any penalties.

CHARTS:

Definition: robot-regionals. A unit for the total number of regionals the robots in an alliance had previously attended. For example, if an alliance had 2 robots that had attended 1 regional each and 1 robot that was attending its first, it would measure 2 robot-regionals.
attendance.png
This is a boxplot showing alliance scores, separated by their robot-regional values. Surprisingly, no alliance ever included more than one robot that had attended a previous regional.
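The robot-regionals measure can be computed from each team’s event history. This is a sketch, assuming a hypothetical dict that maps team numbers to the dates of the regionals they attend; a regional only counts if it happened before the event being measured. The team numbers and dates are made up.

```python
from datetime import date


def prior_regionals(team_events, team, event_date):
    """How many regionals `team` attended before `event_date`."""
    return sum(1 for d in team_events.get(team, []) if d < event_date)


def robot_regionals(alliance, team_events, event_date):
    """Sum of prior regionals over the robots in one alliance."""
    return sum(prior_regionals(team_events, t, event_date) for t in alliance)


# Hypothetical teams and dates reproducing the definition's example:
# two robots with one earlier regional each, one robot at its first regional.
week1, week2 = date(2007, 3, 1), date(2007, 3, 8)
team_events = {111: [week1, week2], 222: [week1, week2], 333: [week2]}
print(robot_regionals([111, 222, 333], team_events, week2))  # 2
```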

matchesPlayed.png
Shows score achieved versus the average number of matches played by the robots making up the alliance. You can see that scores do increase slightly as the competition goes on. It’s only really accurate up to 8 matches, though, since only one regional had posted results for up to 11 matches when I downloaded the data this morning.

MatchNumber.png
This shows the scores achieved versus match number, so if a score of 40 was achieved in match #38, it would be plotted at (38, 40). Matches are grouped into segments of 10.
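A sketch of the grouping by match number, assuming “segments of 10” means matches 1-10, 11-20, and so on (the post doesn’t say exactly where the segments start). The sample data is made up apart from the match-38 example.

```python
from collections import defaultdict


def group_by_match_segment(results, width=10):
    """Group (match_number, score) pairs into segments of `width` matches.

    Segment 0 holds matches 1-10, segment 1 holds matches 11-20, and so on.
    """
    segments = defaultdict(list)
    for match_number, score in results:
        segments[(match_number - 1) // width].append(score)
    return dict(sorted(segments.items()))


# Made-up (match_number, score) pairs; (38, 40) is the example from the post.
results = [(1, 22), (9, 35), (10, 18), (11, 40), (38, 40)]
print(group_by_match_segment(results))
```

Each segment’s list of scores is what one box of a match-number boxplot would summarize.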

prevnext.png
This was the most interesting one for me. I wanted to see how well past performance predicts future performance. The x-axis shows a team’s score in one match, and the y-axis shows that team’s score in its following match. So if a team had scores (in order) of 21, 35, 70, and 30, it would have points at (21, 35), (35, 70), and (70, 30). From this graph we can see, for example, that of all the robots that scored 70 in a given match, the middle half (25th to 75th percentile) scored between 20 and 40 in their following match. For more details, see boxtutorial.png.
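The pairing of consecutive matches is easy to reproduce. A minimal sketch, assuming each team’s scores are stored in match order in a hypothetical dict keyed by team number; as in the post, a team’s score is its alliance’s score in that match.

```python
from collections import defaultdict


def consecutive_pairs(scores):
    """Pair each score with the score from the team's next match."""
    return list(zip(scores, scores[1:]))


def next_scores_by_previous(team_scores):
    """Collect next-match scores for every team, keyed by the previous score.

    Each list could feed one box of a previous-versus-next boxplot
    (a real graph might bin nearby previous scores together).
    """
    grouped = defaultdict(list)
    for scores in team_scores.values():
        for prev, nxt in consecutive_pairs(scores):
            grouped[prev].append(nxt)
    return dict(grouped)


print(consecutive_pairs([21, 35, 70, 30]))  # [(21, 35), (35, 70), (70, 30)]
```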

How to read a boxplot
Each box indicates the range from the 25th to the 75th percentile. The lines above and below the box indicate the range from the 5th percentile to the 95th (it could be 10th to 90th, I’m not sure). The dots outside the boxes and lines indicate outliers that are above the 95th percentile or below the 5th. The thick line in the middle of the box indicates the median. See boxtutorial.png. Boxplots are better at showing trends when you have a whole lot of data, because they show you how ranges of data did rather than making you eyeball a scatterplot and say ‘yarrr… that looks like a trend’. Think of them as a scatterplot with a bit of help for reading it.
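For anyone who wants to compute these boxes themselves, here is a sketch of the numbers behind one box. It assumes the common Tukey convention, where the whiskers reach the most extreme points within 1.5 × IQR of the box (the default in R’s boxplot() and in matplotlib), rather than the 5th/95th percentiles; which rule the graphs in this thread used is unknown. Quartile conventions also differ slightly between tools; this uses Python’s `statistics.quantiles` with `method="inclusive"`.

```python
import statistics


def box_summary(values, whisker=1.5):
    """Quartiles, Tukey-style whiskers and outliers for one box of a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "whisker_low": min(inside, default=q1),
        "whisker_high": max(inside, default=q3),
        # Everything beyond the fences is drawn as an individual dot.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up alliance scores, for illustration only.
summary = box_summary([10, 20, 25, 30, 30, 35, 40, 120])
print(summary)
```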













More stats, this time about teams. What has always interested me is what effect money has on teams, and how many teams attend multiple regionals.

Results:

Number of active teams: 1107
Number of teams attending a single regional: 806 (72.8%)
Number of teams attending two regionals: 278 (25.1%)
Number of teams attending three regionals: 23 (2.1%)
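This breakdown is a simple tally. A sketch, assuming registrations are available as hypothetical (team_number, regional_name) pairs; the last lines just re-derive the percentages from the published counts.

```python
from collections import Counter


def attendance_histogram(registrations):
    """Map number-of-regionals -> number of teams.

    registrations: iterable of (team_number, regional_name) pairs;
    duplicate pairs are ignored.
    """
    regionals_per_team = Counter(team for team, _ in set(registrations))
    return Counter(regionals_per_team.values())


def shares(histogram):
    """Percentage of teams in each group, rounded to one decimal place."""
    total = sum(histogram.values())
    return {k: round(100 * v / total, 1) for k, v in sorted(histogram.items())}


# Re-deriving the percentages from the counts in this post.
counts = {1: 806, 2: 278, 3: 23}
print(sum(counts.values()), shares(counts))
```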

This was pretty surprising to me. Having always attended the Greater Toronto Regional, I always felt that every team but my own had already been to 2-3 other regionals that year. However, it seems that the VAST majority of teams only attend one regional per year.

Money has been another interest of mine. What effect does having a well-funded team have on general robot quality? I decided I could measure robot quality by looking at the average score a team got at its first regional. This removes competition experience as a confounding factor, so it should mostly reflect the quality the team was able to build during the build period. Would a richer team generally do better at its first regional than a poorer team? I could get an idea of how rich a team was from the # of regionals it was attending, since each regional represents quite a sum of money to raise (not only for entry, but for travel and accommodation as well). Combining these two measures, we get this:

Average score for all single-regional robots at their first regional: 27.3
Average score for all double-regional robots at their first regional: 30.8
Average score for all triple-regional robots at their first regional: 34.0

Win rate for all single-regional robots in their first regional: 45.5%
Win rate for all double-regional robots in their first regional: 53.9%
Win rate for all triple-regional robots in their first regional: 55.1%
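Here is a sketch of the grouping behind these numbers. The data layout is hypothetical (the post doesn’t say how the data was stored), each score is the score of the alliance the team played on, and a tie is counted as a non-win; the author may have handled ties differently.

```python
from collections import defaultdict
from statistics import mean


def first_regional_stats(teams):
    """Average score and win rate at each team's first regional, grouped by
    how many regionals the team attends in total.

    teams: {team_number: (regionals_attended, [(own_score, opp_score), ...])}
    where the match list covers only qualifying matches at the team's first
    regional and own_score is the score of the team's alliance.
    Returns {regionals_attended: (average_score, win_rate)}.
    """
    scores = defaultdict(list)
    wins = defaultdict(int)
    for regionals, matches in teams.values():
        for own, opp in matches:
            scores[regionals].append(own)
            wins[regionals] += own > opp  # a tie counts as a non-win here
    return {k: (mean(scores[k]), wins[k] / len(scores[k])) for k in sorted(scores)}


# Made-up teams, for illustration only.
teams = {
    100: (1, [(30, 20), (10, 40)]),
    200: (2, [(50, 30)]),
    300: (1, [(20, 20)]),
}
print(first_regional_stats(teams))
```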

So the number of regionals a team attends is clearly correlated with its competitiveness. However, other explanations cannot be ruled out. Teams that have enough money to compete at many regionals may have more members or more team spirit, which would explain why they have more money: a greater ability to fundraise. It may not be the money that explains their greater ability, but rather the work ethic and sheer numbers that allow them both to raise money and to build a great robot.

In short:
Option A:
-Money causes great robots

Option B:
-Team spirit and work ethic causes money
-Team spirit and work ethic causes great robots
-Money does not itself cause great robots

With this data alone, it’s impossible to decide between option A and option B, because we don’t know whether team spirit and work ethic are also related to how many regionals a team attends. This is known as confounding: X may be related to Y, but there may also be a third factor Z, which we can’t measure, that is related to both X and Y.

Here’s an example of confounding: suicide rates (Y) are higher near airports (X). One might think that airports (X) cause suicides (Y). However, land near airports (X) is worth less, so people living near airports tend to have less money (Z). Having less money (Z) is also related to higher suicide rates (Y).

Here are updated versions of the graphs I posted for week 1. Since I have much more data and a bit more experience with my graphing program, they’re a bit easier to read.

teamNumVersusScore.png
This shows the alliance AVERAGE team number versus the score that alliance achieved. I’m using team number as a stand-in for age here. An alliance with three low-numbered robots has combined decades of experience to draw on, whereas one with 3 robots in the 1900s is probably 3 nervous rookies. And it shows: alliances with lower average team numbers tend to score higher. The median for an alliance of 200s is well above the 75th percentile of an alliance of 1900s.

score.png
This is just a histogram showing the frequency of scores. As you can see, the great majority of scores were pretty close to 30, tapering off as you head toward 100.







Time to give Bongle great reputation…

Only read your first post and am already hooked; keep up the great stat-talk

Fantastic, thanks for compiling this!

Statistics are funny things…

During the Great Renumbering of ???, how high did the numbers go? I know some of us relative newcomers think team 50 must be older than team 400, but we know that this is not true, since the numbers were assigned alphabetically during the Great Renumbering. Can an oldtimer fill us in on the year of the GR, and how many teams were numbered that year?

If the GR went up to 700 or so (which IIRC is pretty close), I can’t come up with any hypothesis as to why team number and scoring are related. I can think of some other measures that might make sense (team budget vs scoring, regional roots vs scoring, previous success vs scoring), but not team numbers within the range of “original numbers.” It’s interesting that your data seem to show a positive slope on the team number vs scoring curve, but I suspect it is either the law of small numbers (the sample is not statistically significant), or it is a meaningless coincidence.

I would also buy the hypothesis that Anciente Teames (pre-GR) can be treated as a single statistical population, and each year cohort since then could be treated as a population. It would be interesting to test this by grouping all the ATs together.

The last comment I wanted to make is that I find it encouraging that your data back up my intuition that rookie teams are outperformed by veterans, which suggests that veteran teams might spend more effort mentoring rookies.

Thanks for taking the time to assemble the numbers. Good work. (I learned more valuable life-math in Statistics than I did in my other math classes.)

Rick, teams kept the numbers assigned in 1998, and the highest number then was around 191 (the X-cats). Before then, numbers were assigned alphabetically, which explains why GOMPEI (WPI) and the X-cats (Xerox), highly prolific teams who’ve been here since the start, have such high numbers.

I personally think that money is the big dividing factor. However, within the money category, an interesting statistic, albeit probably harder to gather, would be the source of the big money. How much comes from one “deep pockets” sponsor? How much additional support (e.g., machining, paid support, etc.) does the sponsor supply? How do the NASA “house” teams fare compared to other teams?

The very high and very low averages on that graph will not be very significant, since it is rare for 3 randomly chosen teams to all have very low or all have very high numbers. Once you get to the 300s from the low side and the 1700s from the high side, though, you’re talking about more than 20 matches per group and the trend is probably valid. Keep in mind it is an average: any alliance with an average team # of 300 is still composed of three teams numbered below 900, which means it still includes experienced teams. If you plot individual team # versus score, the trend is much less evident. See the week 1 statistics thread for that.


One way I thought of measuring sponsorship was to graph the length of a team’s name versus the scores it achieved, but it probably wouldn’t show much. A team like “noname store 1 and noname store 2 and town of somewhere and a high school with a really long name” probably wouldn’t fare better than “Google and Short HS”. On the other hand, a team with many large sponsors would have a longer name than a team that just scrapes by with a single short sponsor. I think the # of regionals a team attends is a pretty good proxy for how much money it has.

As for team age, I just grabbed that from First Wiki. I had to guess for some newer teams, but it should be mostly accurate. See attachments for ‘age versus score’. Like the average team # chart, there are very few samples for extremely young and extremely old alliances.





We have a ton of sponsors… no one sponsor other than John Deere gives us great amounts of money. Most of it is small donations here and there.

The “deep pocket” teams aren’t always the ones that fare the best. Many teams with lots of money still don’t have the engineering resources to build a well-machined robot. Student commitment has a lot to do with it too. I hate to say it, but I think you’re wrong about money being the dividing factor. Experience is a precious commodity, and I believe it is the dividing factor.

And what percentage of your funds (and what dollar amount are we talking about) comes from John Deere? What other resources do they supply you with? John Deere has to be a pretty major mechanical engineering company, not to mention, in the 21st century, a controls company.

So, if I read this right, the highest number assigned at the Great Numbering was 191? That’s lower than I thought. Thanks for filling in my speculation with actual facts.

Thanks for the great stats Bongle! They show some very interesting trends which also may be useful to teams when determining a list of possible alliance partners; for example if you could choose two teams of fairly even stats as far as shooting and such, you may be inclined to pick one team over another based on team number or how many regionals they have attended since these both seem to be general factors in performance. Also, if you can keep your average team number for your alliance lower, then you may be able to do better, although all of this is theoretical and obviously varies based on individual teams selected. The trends are still interesting to look at and could help break some ties when choosing an alliance or guessing at a match outcome.

Thanks again and keep up the great work!
-Figment

I’d be careful reading that much into them. As properties of robots go, team number should be somewhere around the last thing you consider. Remember, correlation is not causation. High team numbers don’t cause teams to perform worse than average; high team numbers mean that the teams are relatively new. They’ll (in general) have fewer sponsors, fewer students, and less experience. There will be MANY exceptions to that rule, just as there will be MANY exceptions to the idea that lower-numbered teams will perform better. These graphs just show interesting trends that might be useful in predicting the winner of a match at a slightly better rate than pure chance. If you look at the raw robot number versus score data (i.e. not an average of the alliance’s team numbers), the trend is almost nonexistent.

Really, the only way that it’s a trend is in aggregate. On AVERAGE, lower-numbered alliances do better than high-numbered alliances. If you’re looking at a team that has been doing well all regional, for the love of god don’t turn them down because they may have a higher number. If that doesn’t explain it, it’s because I’m poor with words and because I’m distracted by an idea I had regarding this whole thing.

Wow, this one actually really surprised me. This is win rate versus alliance differential. The alliance differential is the difference between the average team numbers of the two competing alliances in a match, rounded to the nearest 100. With a difference of as little as 700, the win rate for the higher-numbered (read: less experienced) alliance drops to an astoundingly bad 20%. This is good information to know if you’re a bookie.

Keep in mind that this graph’s y-axis is cut off: the bottom is 20%, not zero.
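The alliance differential can be computed roughly as follows. This is a sketch, not the method actually used for the graph: the match-tuple layout, the use of an absolute difference, and the choice to skip ties and matches where both averages are exactly equal are all assumptions. The team numbers and scores are made up.

```python
from collections import defaultdict


def alliance_average(teams):
    """Average team number of one alliance."""
    return sum(teams) / len(teams)


def differential(red, blue):
    """Absolute difference of the alliance averages, rounded to the nearest 100.

    Note: Python's round() sends exact halves to the even value.
    """
    diff = abs(alliance_average(red) - alliance_average(blue))
    return round(diff / 100) * 100


def win_rate_by_differential(matches):
    """matches: iterable of (red_teams, blue_teams, red_score, blue_score).

    Returns {differential: win rate of the higher-numbered alliance}.
    Ties, and matches where both averages are identical, are skipped.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for red, blue, red_score, blue_score in matches:
        red_avg, blue_avg = alliance_average(red), alliance_average(blue)
        if red_score == blue_score or red_avg == blue_avg:
            continue
        red_is_higher = red_avg > blue_avg
        red_won = red_score > blue_score
        d = differential(red, blue)
        games[d] += 1
        # The higher-numbered alliance won if red won and red is higher,
        # or blue won and blue is higher.
        wins[d] += int(red_won == red_is_higher)
    return {d: wins[d] / games[d] for d in sorted(games)}


# Hypothetical team numbers and scores, for illustration only.
matches = [
    ([100, 200, 600], [1700, 1900, 2100], 40, 20),
    ([100, 200, 600], [1700, 1900, 2100], 10, 30),
    ([1100, 1200, 1300], [400, 500, 600], 25, 45),
]
print(win_rate_by_differential(matches))
```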

I’m really looking forward to making more of these graphs as more regionals pass. One thing I really want to check is the win rate when your alliance has 2 or 3 more robots that have attended a previous regional than your opponents’ alliance. I already know from this week that if your alliance has 1 robot that has been to a previous regional (and the other alliance has none), the win rate is 57%.