Some statistics on year-to-year consistency, 2005-2007

I made a bunch of these last year comparing 2005 and 2006 performance among teams. Basically, I made scatter plots showing that performance in a given year does almost nothing to predict performance in the next: top seeds in 2005 ended up anywhere from top seed again to nearly dead last in 2006.

With the 2007 regionals done, I again collected all the data. Since 2005 and 2007 are purported to be similar games, it would stand to reason that teams that performed well in 2005 would again do well in 2007. Does the data support this? Not really, though someone in a stats class could probably run some tests to see whether there are any statistically significant trends.

How to read the charts:
A team’s performance in a year is given a value from 0 to 1, indicating their average fractional placing across whatever regionals they attended that year. So a team that came 10th out of 60 and 20th out of 40 would be plotted as 0.33 ((0.167 + 0.5)/2 = 0.33). The x axis shows their performance in the earlier year, and the y axis shows their performance in the later year.
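The metric above can be sketched in a few lines of Python (a hypothetical `performance` helper; the original analysis was done in Excel, so the function name and data layout here are mine):

```python
def performance(placings):
    """Average fractional placing across a year's regionals.

    placings: list of (rank, field_size) tuples; lower result is better.
    """
    return sum(rank / size for rank, size in placings) / len(placings)

# The example from the text: 10th of 60 and 20th of 40
score = performance([(10, 60), (20, 40)])
print(round(score, 2))  # 0.33
```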

Areas:
Top left: teams that did well in the earlier year, poorly in the later year
Top right: teams that did poorly both years
Bottom left: teams that did well both years
Bottom right: teams that did poorly in the earlier year, then improved in the later year

Things you can kind of draw from these graphs:
-Teams that got top seed in a given year tend not to turn into bottom-performers the year after
-Likewise, teams that got near bottom seed in a given year tend not to turn into top seeds the year after
-There seems to be LESS correlation between 2005 and 2007 performance (both arm years) than between 2005/2006 or 2006/2007. My explanation: since students graduate, team quality has more room to drift as time passes. Even if you still have the arm you built in 2005, if everyone who knew how to build, maintain, and operate it has graduated, it isn’t going to help you.
-So long as you’re not top seed or bottom seed, it is difficult to predict much about your performance the following year
-I plotted average performance for the 3 years versus team number. There is a VERY slight correlation between team number and performance.

Graphs:
20052006.PNG - 2005 performance along x axis, 2006 performance along y axis
20062007.PNG - 2006 performance along x axis, 2007 performance along y axis
20052007.PNG - 2005 performance along x axis, 2007 performance along y axis
teamPerformance.PNG - Team number versus averaged performance in years 2005-2007. So if a team placed 20% in 2005, 15% in 2006, and 10% in 2007, then they would be plotted at position (teamNumber,0.15). Remember, lower is better!

That’s enough explaining of graphs. I also looked at some other stuff:
Of the 418 teams for which I have data (many 2005 regionals are missing standings so I don’t have the whole set), only EIGHT had an average top 20% placing in each of the three years. Those teams are 365, 118, 25, 254, 703, 126, 987, and 111. If you restrict it to top 10% placing in each year, then only 365 gets that honour.
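That consistency filter is easy to reproduce. A minimal sketch, assuming a made-up `{team: {year: avg_placing}}` layout (the team values below are illustrative, not the real dataset):

```python
# Hypothetical data layout: {team_number: {year: average placing, lower is better}}
perf = {
    365:  {2005: 0.05, 2006: 0.08, 2007: 0.09},
    1126: {2005: 0.30, 2006: 0.12},            # missing a year -> excluded
    111:  {2005: 0.15, 2006: 0.18, 2007: 0.10},
}

YEARS = (2005, 2006, 2007)

def consistent(perf, cutoff):
    """Teams whose average placing beat `cutoff` in every year on record."""
    return [t for t, p in perf.items()
            if all(y in p and p[y] <= cutoff for y in YEARS)]

print(consistent(perf, 0.20))  # [365, 111]
print(consistent(perf, 0.10))  # [365]
```

Run against the full 418-team dataset with a cutoff of 0.20, this is the query that yields the eight teams listed above.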

20052006.PNG
20062007.PNG
20052007.png
teamPerformance.png



I see we have a statistician amongst us! :slight_smile:

Yes, there’s just too much variability year to year. That’s probably FIRST’s goal: keep everyone guessing, or else it would be a free-for-all and there would be no challenge.

As for the teams that are statistically successful: they have the technology, experience, resources, and definitely luck on their side.

Wow, that is a heck of a lack of correlation. Yay FIRST!

I’d be interested to see any more data you can pull like this.

Do you have the raw numbers available? I’d like to grease up my stat gears.

Weird, I thought I attached that. It should be attached to this post.

You may need to turn down your macro security in order to open it.

teamPerform.rar (63.6 KB)



Could you tell me where 1126 was in these rankings? I can’t seem to open your files. Thanks.

Too bad this doesn’t take into account awards won; I think the graphs would look much different.

It looks like I don’t have data for all 3 years for you. In the 2005-2007 graph you would be near the bottom left corner, since you did very well in both years as near as I can tell. You need WinRAR to open the attached Excel file.

Too bad this doesn’t take into account awards won; I think the graphs would look much different.

Problem is that awards are sparse enough that I don’t think they would show any patterns. You’d have teams like 1114 that win a huge number of awards every year, teams that maybe won a single award in 3 years, and then everybody else. You can’t really go from winning 0.75 awards to 0.6 awards to 0.5 awards to make a nice trend. Most teams will go 1-0-0, 0-1-0, etc., and it’ll make an ugly graph.

It might be kinda cool just to see who has won the MOST awards in 3 years, but it would take quite a while to gather the data, and I feel lazier this morning than I did last night.

I have a feeling that 111 would be up there with the most awards won in that 3 year period.

Alright, I’m going to do awards, I thought of some interesting plots:
-Average performance by the set of teams that win each kind of award. Do ‘Delphi Driving Tomorrow’s Technology’ winners generally do better in competition than winners of less robotics-related awards such as the spirit award?

-Do winners of regional chairman’s awards tend to maintain a high level of performance afterwards?

I’ll probably think of more as I strip the data out.

Edit: I screwed something up, standby for update

only EIGHT had an average top 20% placing in each of the three years. Those teams are 365, 118, 25, 254, 703, 126, 987, and 111. If you restrict it to top 10% placing in each year, then only 365 gets that honour.

Let’s see who’s got a tough division
Curie - 118 , 126, 365
Galileo - 25, 703
Newton - 111, 987
Archimedes - 254

Let’s see who’s got a tough division
Curie - 118 , 126, 365
Galileo - 25, 703
Newton - 111, 987
Archimedes - 254

Well, it doesn’t mean they are any better or worse in a given year, it just means that they are astonishingly consistent. Said another way, you could say “in no year since 2005 has 365 averaged less than a top 10% finish at their regionals”

Ok, here are the awards results, attempt two. This one makes substantially more sense than my last attempt, which was screwed up by Excel not moving all the award labels when re-sorting, and by my having left some leftover data on the sheet when I re-ran the algorithm.

The y axis is team performance, lower is better.





What’s regional winner #4 in that graph? It looks like it’s ranked dead last.

I’m somewhat surprised that Regional Winner #4 is ranked last, unless one of my assumptions is wrong.

For a team to be #4 on the alliance, they would have to be a replacement team: the highest seed not to be picked. The lowest this team could be seeded is 25th, which happens if the top 8 seeds all choose within the top 24 (typically it seems to be more in the range of 16 or even lower). If they have a performance of 0.7, then 70% of teams placed better than them, so a 25th seed at that percentile implies a regional of 25/0.7 ≈ 36 teams. Since almost all regionals are larger than that and replacement teams are almost always ranked better than 25th, that number doesn’t seem right.

No surprise that the industrial design award winners usually do well…

I was thinking more along the lines of awards won per team per year, or something like the first scatter plot. If those are individual teams, then it’d be neat to see benchmark teams (25, 71, 111, 233 come to mind).

What’s regional winner #4 in that graph? It looks like it’s ranked dead last.

The team in question (there was only one #4 regional winner) was team 1216, who came 20/46 (0.43) at one regional (presumably the one where they were the #4 regional winner) and 34/34 (1.0) at another. So the math works out to an average placing of 0.71, which is why regional winner #4 is so low. Since there is only one sample point, it is artificially low and I probably shouldn’t have included it in the list.
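As a quick sanity check, the placing metric from the first post reproduces that figure (a minimal sketch; the exact value is about 0.72, and the 0.71 quoted above comes from rounding each placing before averaging):

```python
# 1216's two regionals: 20th of 46 and 34th of 34
placings = [(20, 46), (34, 34)]

avg = sum(rank / size for rank, size in placings) / len(placings)
print(round(avg, 2))  # 0.72; the post quotes 0.71 due to intermediate rounding
```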

All the other awards except for regional finalist #4 have at least 24 data points, and in the case of the judges award, as many as 52.

Being that I’m bored now, I’m going to update it to include 2005 and 2006, then I’ll see about doing some year-to-year relationships.

Oops, I forgot that a team’s rating came from multiple regionals.

Midwest and Peachtree had a 4th champion as well (1850 and 1848)

My team-data algorithm ignores teams after 1705 for now, because it was initially made just to do year-to-year comparisons, and there were no teams after 1705 in 2005. So anything involving team seeding performance only uses early teams.

Anyway, I did the award-count ranking, and the result kinda surprised me. 1305 is the king of the awards from 2005-2007, with 15.

Anyway, here’s the list of the top 10 award winners (THAT I HAVE DATA FOR, the archived 2005 data from FIRST is spotty*) from 2005-2007:


1305	15
48	13
111	13
375	13
494	13
1114	13
118	12
188	12
71	11
103	11

1305’s achievements are as follows:
2 x 2005 Regional Winner #2
2 x 2005 Regional Engineering Inspiration Award
2 x 2005 Motorola Quality Award
1 x 2005 Underwriters Laboratory Industrial Safety Award
2 x 2006 Kleiner Perkins Caufield & Byers Entrepreneurship Award
1 x 2006 Regional Finalist #1
1 x 2006 Regional Engineering Inspiration Award
1 x 2007 Website award
1 x 2007 Regional Finalist #1
1 x 2007 Regional Finalist #2
1 x 2007 Regional Chairman’s Award

*An example of the spottiness: I have 961 awards for 2006, 1082 for 2007, and just 392 for 2005. I should probably ignore 2005, since almost 100% of the regionals from the first 2 weeks of March are missing award lists, which shuts out most of the teams that played then. If I ignore 2005, then the top awards-given list looks like:

111	11
375	11
494	11
234	10
1114	10
103	9
114	9
188	9
469	9
1714	9

1305 drops to a still-impressive 21st since they lose their very plentiful 2005 season.

Here are 111’s accomplishments in 2006 and 2007:
2 x 2006 Innovation in Control Award
1 x 2006 Regional Winner #1
1 x 2006 Regional Winner #2
1 x 2006 Regional Chairman’s Award
2 x 2007 General Motors Industrial Design Award
1 x 2007 Motorola Quality Award
2 x 2007 Regional Finalist #2
1 x 2007 Regional Winner #1

Also attached is the updated awards-vs-performance rankings now using 2005 and 2006 data. I like how regional winners #1, #2, and #3 just edge out regional finalists #1, #2, and #3 in seeding on average. When you consider that most of the data is from 2006 and 2007 where the serpentine draft is in use and assume that the higher-seeded alliance tends to win (hmmm… this gives me an idea), you would think that the #3 pick for the winner would tend to be a lower-seeded team than the #3 pick for the finalists.





If you are really bored and want to go completely crazy with this, most regionals’ award pages will go all the way back to 2003 (if the regional existed back then) just by changing the 2007 in the URL to 2003 (like this for Philly: http://www2.usfirst.org/2003comp/events/PA/awards.htm)

It stops there right? Wrong, because we have the old FIRSTStar system. If you wipe all the dust off, it will give you an awards history for every team from 1998 (first year team # set) - 2002 all in one Excel page:eek: . You might have to ignore some of the awards that aren’t given anymore like #1 seed and Outstanding Defense. You also might have to take into account the fact that there were only 199 teams back then. Overall, there is 10 years of awards data available if you want it :slight_smile: