New Location for Historical Scouting Databases

I am honored to announce that, with permission from teams 1114 and 2834, these teams’ historical scouting databases have a new home on my GitHub. You can find them in the “archive” folder of the “SykesScoutingDatabase” repository. Direct link here. Below are brief descriptions of each database, followed by my reasons for doing this.

1114 Scouting Database:
This resource was originally published here on CD from 2005 to 2015, typically once a year around the time of championships. These databases contain win/loss/tie records, seeding info, alliance selection results, playoff results, and award data for every team.
1114 was one of the first teams in the world to analyze team strengths using a least-squares linear regression on match scores. This technique has since become more commonly known as OPR, but 1114 called it “calculated contributions” (a better name imo, and the one I try to use). They calculated these metrics internally for a few years before 2008, and starting that year they released them publicly in their database.
From 2012 to 2014 this database also used the FIRST Twitter data to determine calculated contributions on match subscores (component OPRs). This gave a better picture of teams’ abilities than raw total scores alone.
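For readers unfamiliar with the math, here is a minimal sketch of the least-squares idea behind calculated contributions, written in Python with NumPy. It assumes each alliance's score is simply the sum of its teams' individual contributions; the function name `calculated_contributions` and the input format are hypothetical illustrations, not 1114's actual implementation.

```python
import numpy as np


def calculated_contributions(alliances, scores):
    """Estimate each team's calculated contribution (OPR) by least squares.

    Hypothetical input format (for illustration only):
      alliances: list of team-number tuples, one per alliance per match
      scores:    the score each of those alliances earned
    Returns a dict mapping team number -> estimated contribution.
    """
    teams = sorted({team for alliance in alliances for team in alliance})
    index = {team: i for i, team in enumerate(teams)}

    # Design matrix: one row per alliance appearance, with a 1 in the
    # column of every team that played on that alliance.
    A = np.zeros((len(alliances), len(teams)))
    for row, alliance in enumerate(alliances):
        for team in alliance:
            A[row, index[team]] = 1.0

    b = np.asarray(scores, dtype=float)

    # Minimize ||A x - b||^2. lstsq uses an SVD, so it still returns an
    # answer if the schedule leaves A rank-deficient.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return {team: float(x[index[team]]) for team in teams}
```

OPR is often described via the normal equations (AᵀA)x = Aᵀb; `lstsq` solves the same minimization directly. Passing a subscore (for example, autonomous points) in place of the total score gives the component OPRs described above.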

2834 Scouting Database:
This resource was originally published here on CD from 2007 to 2018. In some years it was published weekly, and in others only around the time of championships. It has a similar look and feel to the 1114 scouting database, which was the original inspiration for its creator, Ed Law, although the internals of this database are completely different from those of the 1114 database. It contains many of the same datasets as the 1114 database, including records, alliance selection data, playoff results, and awards.
In 2007 it was one of the first publicly available tools to show calculated contributions (OPRs). It was also the originator of “Calculated Contribution to the Winning Margin,” or CCWM. In later years, it also published EPR (“Ether Power Rating” or “Equidistant Power Rating,” depending on your preference), which is a hybrid of sorts between OPR and CCWM.
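CCWM uses the same least-squares machinery as OPR, but regresses each alliance's winning margin (its score minus the opposing alliance's score) instead of its raw score, so it credits teams both for scoring and for holding opponents down. Below is a minimal Python/NumPy sketch; the `ccwm` function and the (red teams, blue teams, red score, blue score) match format are hypothetical, not taken from 2834's spreadsheet.

```python
import numpy as np


def ccwm(matches):
    """Estimate Calculated Contribution to the Winning Margin by least squares.

    Hypothetical input format (for illustration only):
      matches: list of (red_teams, blue_teams, red_score, blue_score)
    Returns a dict mapping team number -> estimated CCWM.
    """
    teams = sorted({t for red, blue, _, _ in matches for t in (*red, *blue)})
    index = {team: i for i, team in enumerate(teams)}

    rows, margins = [], []
    for red, blue, red_score, blue_score in matches:
        # Each match yields two equations: red's margin and blue's margin,
        # which are equal in size and opposite in sign.
        for alliance, margin in ((red, red_score - blue_score),
                                 (blue, blue_score - red_score)):
            row = np.zeros(len(teams))
            for team in alliance:
                row[index[team]] = 1.0
            rows.append(row)
            margins.append(margin)

    A = np.vstack(rows)
    b = np.asarray(margins, dtype=float)
    # Same least-squares solve as OPR, just with margins as the target.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return {team: float(x[index[team]]) for team in teams}
```

Because every match contributes a pair of margins that cancel, CCWM values in a balanced schedule average out to roughly zero, unlike OPRs, which sum to roughly the typical alliance score.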
Also included with this database are a series of presentations given by 2834 over the years describing the database and the math underlying it.
From 2012 to 2014 this database also used the FIRST Twitter data to determine calculated contributions on match subscores (component OPRs). From 2015 onward, it continued calculating these using the new API provided by FIRST.

There are two main reasons why I am adding these to my GitHub:
First, I consider my own scouting database to be a spiritual successor to these databases, so it makes sense to group them together. I want people who are interested in my statistics to easily find the sources that inspired my work, and to have as good an understanding as possible of the history of statistics in FRC.
Second, as time passes, I worry that great resources like these will either fade from people’s memories or become inaccessible because of dead links. For example, the 2004 version of the 1114 scouting database seems to be unrecoverable, and the 2005-2006 versions would have been lost if not for some help from Karthik. I don’t want this to happen to any more databases if I can help it.

Big thanks to Karthik and Ed Law for allowing me to do this.

These databases have had a strong influence on my work to this day, and I hope that making them more accessible will help them inspire others in the future.
