paper: A Statistical Analysis of the Success of New England FRC Robotics Teams

Thread created automatically to discuss a document in CD-Media.

A Statistical Analysis of the Success of New England FRC Robotics Teams
by: Maxwellfire

This is the paper that I wrote for my AP Statistics class on the success of FRC teams as determined by surveyable metrics.

This project was done based on data I collected on New England teams in a voluntary survey. The OPR data came from team 1114’s amazing scouting database as well as another collection of OPRs over the past couple of years. Thanks so much to those guys for compiling and creating those awesome resources.

For this project, I wish that I had had the time to analyze the data a bit deeper. Team 1114’s database includes a number of metrics that would have been really cool to compare with my survey results.

The graphs were created with Tableau, which is free for FRC teams and high school students.

Outliers have been hidden in a number of the graphs.

If anyone wants to use the data or graphs or anything for any reason, feel free to do so.

~Max Tepermeister

EDIT: Added spelling- and grammar-corrected documents

Graphs and Packaged Data.zip (5.28 MB)
Full Writeup.docx (287 KB)

This is my final project for my AP Statistics class this year. It involves an analysis of OPR and survey data on New England teams. Thanks so much to Steve Cremer for helping me distribute the survey to the teams and to team 1114 for their awesome scouting and OPR database.

I’m going to be a jerk here because I tried to read this and had to stop 5 paragraphs in because I was sick of seeing horrific misspellings. Please tell me you didn’t actually turn it in this way? It could have been interesting but the lack of attention to detail (by which I mean running spell check) undermines any points you make with your analysis.

I have to agree. Most/all of them could have been picked up with spell check, and they render the otherwise interesting paper difficult to read at best.

I did, however, read the whole thing, and I like the analysis you tried to do, although removing a few numbers to make the data look good is a little sketchy. When asking how many mentors each team had, you should have specified exactly what “mentor” meant, instead of removing a few cases that seemed to be outliers.

I’m actually quite surprised that, according to your data, budget had no correlation with OPR, considering team budgets tend to increase as teams advance to higher levels of competition; a team that was good enough to qualify for championship would therefore tend to have a larger budget.
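Just to illustrate what checking that would look like numerically — a quick Pearson's r sketch with invented budgets and OPRs, not the paper's actual survey data:

```python
# Hypothetical sketch: checking for a linear budget-OPR association with
# Pearson's r. All numbers below are invented, not the survey data.
from math import sqrt

budgets = [3000, 5000, 8000, 12000, 20000, 35000]   # reported robot budgets ($)
oprs    = [22.0, 18.5, 40.1, 25.3, 31.7, 28.9]      # matching OPRs (invented)

n = len(budgets)
mx, my = sum(budgets) / n, sum(oprs) / n
cov = sum((x - mx) * (y - my) for x, y in zip(budgets, oprs))
sx  = sqrt(sum((x - mx) ** 2 for x in budgets))
sy  = sqrt(sum((y - my) ** 2 for y in oprs))

r = cov / (sx * sy)
print(f"Pearson r = {r:.2f}")  # values near 0 suggest little linear association
```

A near-zero r on the real data would match the paper's "no correlation" finding, though with self-reported budgets the usual caveats apply.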

I went back and looked through the document and was really surprised at my spellchecker. It caught 0 of the misspellings. That’s actually really concerning…

I always go and read my whole paper before I turn it in. For this one, I read the whole thing out loud a number of times before turning it in. Reading through now word by word I am finding the misspellings, but I didn’t catch any of them before. Thanks for pointing it out, and I’m really sorry that it made the paper difficult to read.

For the data points that I removed, all of them were statistical outliers, not just random data points that didn’t fit the trend. I actually had a long conversation with my teacher about whether or not I should remove them.
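A common criterion for "statistical outlier" (the one AP Statistics courses teach) is the 1.5×IQR fence; sketched here with invented mentor counts, not the actual survey responses:

```python
# Hypothetical sketch of the 1.5*IQR outlier rule taught in AP Statistics.
# The mentor counts below are invented, not the survey data.
from statistics import quantiles

mentor_counts = [2, 3, 3, 4, 4, 5, 5, 6, 7, 40]  # one team reporting 40 mentors

q1, _, q3 = quantiles(mentor_counts, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [x for x in mentor_counts if x < low or x > high]
kept     = [x for x in mentor_counts if low <= x <= high]
print(outliers)  # the 40-mentor team falls outside the fences
```

Under this rule, a value is removed only when it lies more than 1.5 IQRs beyond the quartiles, not merely because it spoils the trend.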

The fact that team budget didn’t appear correlated with OPR really surprised me as well.

In general I was hoping to work a lot more with the data, but I was limited by a time constraint on the project. Now that it’s the summer I may do some more work with it.

In a few minutes I will replace the error-filled copy with one that contains none (I hope).

EDIT: I found that the document language had been set to none and so spell check wasn’t working…

I think your goal and process here are quite interesting… I think the strength of your conclusions is probably limited by the data, though (self-reported data and a lack of data that describes a ‘successful’ team).

Regarding using OPR… first, the use of OPR to define success is mediocre, because OPR itself is somewhat flawed (particularly in some games like 2014, where alliance score is highly dependent on partners) and because not all teams have the goal of best on-field success. You acknowledge this, and I do recognize that there aren’t many FRC stats available. Perhaps the District points system would be a better measure though? It is more all-encompassing and is actually quite meaningful over the course of a season.
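For anyone unfamiliar with how OPR is derived: it is the least-squares solution that best explains alliance scores as sums of per-team contributions. A minimal pure-Python sketch with invented teams and scores (real implementations handle many more matches the same way):

```python
# Minimal OPR sketch: model each alliance score as the sum of its members'
# contributions and solve the normal equations (A^T A) x = A^T b.
# Teams and scores below are invented for illustration.

teams = ["T1", "T2", "T3"]
matches = [                      # (alliance members, alliance score)
    (["T1", "T2"], 60.0),
    (["T1", "T3"], 50.0),
    (["T2", "T3"], 40.0),
]

n = len(teams)
A = [[1.0 if t in members else 0.0 for t in teams] for members, _ in matches]
b = [score for _, score in matches]

# Normal equations: M = A^T A, v = A^T b
M = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
     for i in range(n)]
v = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(n)]

# Gaussian elimination with partial pivoting on [M | v]
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(M[r][col]))
    M[col], M[piv] = M[piv], M[col]
    v[col], v[piv] = v[piv], v[col]
    for r in range(col + 1, n):
        f = M[r][col] / M[col][col]
        for c in range(col, n):
            M[r][c] -= f * M[col][c]
        v[r] -= f * v[col]

opr = [0.0] * n
for i in range(n - 1, -1, -1):
    opr[i] = (v[i] - sum(M[i][j] * opr[j] for j in range(i + 1, n))) / M[i][i]

print(dict(zip(teams, (round(x, 1) for x in opr))))
```

The partner-dependence complaint above is visible right in the model: each team's rating is inferred only from combined alliance scores, so credit gets smeared across partners.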

Additionally, you mentioned looking at the OPR of teams from 2008 to 2013… but this data spans a tremendous length of time, and a team’s success in 2008 or 2009 probably has incredibly little bearing on its success in 2014 or 2015. Teams wax and wane; the best FRC teams now weren’t the best teams then… even in NE, which has many old teams. Definitely use more recent data, but do use multiple years (2 or 3, probably; although keep in mind that the longer the span, the further you are from the current survey snapshot), as some teams do just have ‘off years’ or get ‘lucky’ with a particular robot’s success.

The self-reported stats are hard to really rely on… it means someone filling in data based on what they think you want (which may not be what you want), and that individual’s knowledge of the information may be inaccurate (even if they’re someone who should know). Just to prove a point, look at the ‘team budget’ data… why do half the teams report a budget of less than $5k when the minimum registration fee is $5k? Are these teams including only direct robot parts? Are some teams excluding registration fees? How about team apparel and team travel? What costs from team travel are included? How much of these items is self-funded by team members (and so excluded from the budget)? I’m guessing there is an overwhelming amount of variation in these team budgets largely just from what is being included…

Team mentors, parents, and even student numbers can be difficult too. Some team rosters are huge, but are bloated by people that barely participate/contribute, while other fairly large team rosters may still have moderately high contribution from each member.

Anyway, I would strongly recommend implementing this using the NEFIRST district points from 2014 and 2015… I’m not sure what can be done about the survey data.

The question that was asked about the budget was very specific. It asked about team budgets for the construction of prototypes and the final robot only. It was not about any other team expenses. I should have made that clearer in my explanation. I wanted to see if how much a team allotted to build the robot affected its success, not their overall budget.

District points would probably have been a much better measure to use. I was originally going to survey all of the teams that went to worlds in person, but that wasn’t practical, so I switched to just New England. For a worldwide survey, district points wouldn’t have worked.

Also, success for the sake of this paper was based pretty exclusively on robot performance and not other successes of the teams. District points include awards that aren’t robot related. It would be interesting to see what results I get using them anyway though. If I have some free time, I will try that and share the results.

What I was doing with the OPR data that spanned that time was trying to see if the correlation between team age and OPR held for many games and not just the 2015 one. All of the graphs that compared survey data to OPR were only based on the 2015 OPR data.

As a side note: since I had data on every award that every team in the world has won, it would have been really cool to try and correlate which awards teams have won with other data about them.

Thanks,
~Max

Regarding team budget - I’d actually make the claim that you could use “Number of Paid Entry Fees” as an approximate analog for budget. A team that can pay for 2 regionals tends to have access to more cash than a team that can’t. It’s not perfect, but it does get around the harder question of whether “team budget” includes student/mentor travel costs.

This caught me too. It gives the impression that the “real” budget of these teams is being hidden and, as a result, I can’t really trust any of your conclusions. It would be helpful to detail your data collection methods in this regard (though I’m not sure if that was in the scope of the paper). Like, did you ask how much budget a team had to spend on their robot? Or how much money was in their account after registrations?
For instance, when my team started we had a budget of $7,500. As in, there was $7,500 in our team account. Where would we fall in your graph? Would a team that goes to two regionals ‘lose’ an additional $4,000 due to the way you asked for the data?

Like, did you ask how much budget a team had to spend on their robot? Or how much money was in their account after registrations?

I believe I responded to this question above Andrew Schreiber’s post. I’m not sure if you noticed it.

The question that was asked about the budget was very specific. It asked about team budgets for the construction of prototypes and the final robot only. It was not about any other team expenses. I should have made that clearer in my explanation. I wanted to see if how much a team allotted to build the robot affected its success, not their overall budget.

I did not care about each team’s “real” budget for everything. I wanted information on their budget for the robot only. The actual question asked was as follows:

What is your team’s budget for the construction and development of the Robot?
This includes everything on the final robot as well as any prototypes or machining or second robots etc.
Thanks!
~Max