

Rachel Lim
20-09-2016, 00:14
Predicting Offseason Performance

The first regionals are still 163 days away, but Chezy Champs is coming up this weekend, marking the first of California's offseason events. Offseasons provide a great opportunity for prescouting, since each team has an entire season's worth of data behind it. The question, though, is how well competition season data actually predicts offseason performance.

To illustrate the trend, I graphed competition season OPRs against Chezy Champs OPRs from 2014 and 2015. The data includes all teams, non-CA teams included. OPR calculations include division data but exclude Einstein.

Without further delay, here are the graphs:

2014: max vs CC, avg vs CC
http://i.imgur.com/rRjHSlV.png

2014: min vs CC, last vs CC
http://i.imgur.com/2HlsbME.png

2015: max vs CC, avg vs CC
http://i.imgur.com/wxiSiss.png

2015: min vs CC, last vs CC
http://i.imgur.com/gGh5ggi.png


The trends are interesting: in 2014, teams almost always underperformed their season expectations, while 2015 split clearly between teams that qualified for champs and those that didn't. In both years, teams tended to score above their min OPR but below their max. Average and last OPRs were fairly decent indicators, especially in 2015 for teams that qualified for champs. In general, teams that qualified for champs had offseason performances in line with their season ones, while those that didn't tended to score below expectations.


A quick explanation of my naming:
- All calculations include regionals, district events, DCMPs, and division data
- Max, min, avg, and last OPRs are a team's highest/lowest/average/most recent event OPR across all events (excluding Einstein)
- cmp includes only data from teams that attended champs
- no_cmp includes only data from teams that didn't attend champs
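For anyone unfamiliar with where these numbers come from: OPR is conventionally fit by least squares, modeling each alliance's score as the sum of its members' OPRs. Here's a minimal pure-Python sketch with a made-up three-team example (a toy illustration, not the exact pipeline used for the graphs above):

```python
# Conventional OPR fit: model each alliance score as the sum of its
# members' OPRs and solve the resulting system by least squares.
# Team names and scores below are made up for illustration.

def opr(matches, teams):
    """matches: list of (alliance_members, alliance_score) pairs."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Build the normal equations (A^T A) x = A^T b for the least-squares fit.
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for members, score in matches:
        cols = [idx[m] for m in members]
        for i in cols:
            atb[i] += score
            for j in cols:
                ata[i][j] += 1.0
    # Solve via Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for r in range(n):
            if r != col and ata[r][col]:
                f = ata[r][col] / ata[col][col]
                ata[r] = [a - f * b for a, b in zip(ata[r], ata[col])]
                atb[r] -= f * atb[col]
    return {t: atb[idx[t]] / ata[idx[t]][idx[t]] for t in teams}

# Toy example: A+B=30, B+C=40, A+C=50 has the exact solution
# A=20, B=10, C=30, which the least-squares fit recovers.
scores = opr([(("A", "B"), 30), (("B", "C"), 40), (("A", "C"), 50)],
             ["A", "B", "C"])
```

Real OPR runs use three-team alliances and many matches per team, but the math is the same.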

It is also probably worth saying that the trendlines can be misleading, especially for smaller or more heavily grouped data sets (e.g. the 2015 no_cmp data). For those data sets--and perhaps even the rest--counting the number of teams above the 1:1 line (i.e. the teams that outperformed their season data) vs. those below it might be more accurate.
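As a concrete sketch of that counting approach (the season/offseason OPR pairs below are made up for illustration, not taken from the graphs):

```python
# Count teams above vs. below the 1:1 line.
# Each pair is (season OPR, offseason OPR); values are made up.
pairs = [(55.0, 61.2), (40.3, 33.1), (28.9, 30.5), (72.4, 58.0)]

# A team is "above the 1:1 line" when its offseason OPR beats its season OPR.
above = sum(1 for season, offseason in pairs if offseason > season)
below = len(pairs) - above
```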

The variation in teams that attended champs surprised me, so I colored them by qualification type.

However, I hit my image limit here, so I've included them in the post below.

The categories used are as follows:
- Captain/1st pick: the team won (or earned a wildcard as part of the finalist alliance) as the captain or 1st pick of its alliance. DCMP winners were also put here, even though they also qualified via points
- 2nd pick: same as above, but as the 2nd pick
- Awards: EI, RCA, RAS
- Waitlist: qualified via the waitlist (or I couldn't figure out how else they qualified)

Teams that qualified via multiple means were colored according to the method highest in this list. Pre-qualified teams were not colored differently, since they all qualified again through one of these methods.


Raw data: 21053

Rachel Lim
20-09-2016, 00:16
Here are the graphs I wasn't able to fit into the previous post:

http://i.imgur.com/JQTOGpD.png

http://i.imgur.com/Z5QEhHH.png

SoftwareBug2.0
20-09-2016, 02:10
Have you considered statistical tests for goodness of fit? They would allow you to compare the results in an objective way.

Francis-134
20-09-2016, 11:43
This is very interesting! Do you think the rule changes in 2015 caused the increase in scores compared to on-season performances? Or perhaps the rule changes benefited the better teams / teams that could handle cans the best more than the average.

Rachel Lim
21-09-2016, 17:38
Have you considered statistical tests for goodness of fit? They would allow you to compare the results in an objective way.

I couldn't find an easy way to do that in Excel, but I'll try to do that next time. The data is noisy, but it'd still be nice to have for comparison.
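For what it's worth, the R² of a linear trendline is also easy to compute outside Excel with just the standard library. A minimal sketch (the x/y values are made up):

```python
# Least-squares linear fit plus R^2, standard library only.
# xs could be season OPRs and ys offseason OPRs; values here are made up.
xs = [10.0, 20.0, 30.0, 40.0]
ys = [12.0, 19.0, 33.0, 38.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 = 1 - SS_residual / SS_total.
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot
```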

This is very interesting! Do you think the rule changes in 2015 caused the increase in scores compared to on-season performances? Or perhaps the rule changes benefited the better teams / teams that could handle cans the best more than the average.

Thanks!

I totally blanked on that, but rule changes could definitely have affected the scores. I'm not sure there's a way to really analyze those effects, but I'd guess you're right that the rule changes benefited teams who were previously hitting the limit on the number of recycling containers (i.e. the better teams to begin with).

SoftwareBug2.0
21-09-2016, 18:33
I couldn't find an easy way to do that in Excel, but I'll try to do that next time. The data is noisy, but it'd still be nice to have for comparison.

If you look on page 3 of this PDF:

http://dataprivacylab.org/courses/popd/lab2/ExcelLine.pdf

there's a picture that shows a checked box for "Display Equation on chart". Click the box below it, labeled "Display R-squared value on chart".