paper: 4536 scouting database BETA

Thread created automatically to discuss a document in CD-Media.

4536 scouting database BETA
by: Caleb Sykes

This is a database which contains component calculated contributions for every category FIRST’s API provides, and then some.

This is a scouting database which calculates component calculated contributions (OPRs) using the data from the FIRST API. As this project is still in its infancy, please report any bugs or potential improvements to Caleb Sykes (calebsyk@gmail.com). Every week, a new database will be published containing data from all events up to that date.

Be extremely careful when using the individual defense crossings (columns J-Q on each sheet). At a given event, if a defense is chosen fewer times than there are teams at the event, a #NUM! error will appear. If a defense is chosen less than twice as many times as there are teams at the event, place limited faith in the numbers.

See the “instructions” sheet for more detailed information on what each category represents.

4536 scouting database BETA.xlsx (710 KB)
4536 scouting database 1.3.xlsx (4.54 MB)
4536 scouting database 1.4.xlsx (4.15 MB)
4536 scouting database 1.5.xlsx (5.28 MB)
4536 scouting database 1.6.xlsx (6.16 MB)
4536 scouting database 1.7.xlsx (6.84 MB)
4536 scouting database 1.7.2.xlsx (7.29 MB)
4536 scouting database 1.8.xlsx (7.29 MB)

This is a beta test of a scouting database which calculates component calculated contributions (OPRs) using the data from the FIRST API. As this project is still in its infancy, please report any bugs or potential improvements to Caleb Sykes (calebsyk@gmail.com). Each sheet currently contains data from a distinct week 1 event. Starting weekly on 3/21, a new database will be published which will contain data from all events up to that date.

Be extremely careful when using the individual defense crossings (columns J-Q on each sheet). At a given event, if a defense is chosen fewer times than there are teams at the event, a #NUM! error will appear. If a defense is chosen less than twice as many times as there are teams at the event, place limited faith in the numbers.

See the “instructions” sheet for more detailed information on what each category represents.

I have just uploaded version 1 of this database. It is populated with all week 1-3 events.

In addition to the new data, the two main changes since the BETA are:
An alternative calculation of eOPR (elimination OPR) is now included for each team. So there are now two eOPR calculations, which I have dubbed eOPR1 and eOPR2. Details on how these are calculated can be found in the “instructions” sheet. Although I have not verified this, I expect eOPR1 to provide better elimination predictions at weaker events where captures are more infrequent, and eOPR2 to provide better elimination predictions at stronger events where captures are more frequent.

A new “world results” sheet has been added, which allows for component comparisons for every team at every event in which they have competed. Be aware that this list will have duplicates for teams that competed at 2+ events. Also, **don’t compare individual defense crossing data unless you know what you are doing.** For example, team 5114 has a drawbridge contribution of 1722968039259170.00. 5114 is not that good at crossing the drawbridge; this just means that the drawbridge was not chosen frequently enough at Midland for there to be meaningful results for drawbridge contributions. As a rule of thumb, you can almost always trust the rock wall, sally port, and cheval de frise contributions, but be wary of the others.

Remember, this project is still quite young, and there are very likely errors in places (especially since I have not yet automated everything, and have to do some copying by hand). If you see any errors, please let me know and I will look into it.

First: very cool spreadsheet! I’m glad to have a resource that looks at the component OPR for pretty much every possible condition! It has all the usual OPR caveats, but it does seem useful for establishing some trends and making some comparisons.

As a sidenote, thank you to FRC HQ for making this data more available for capture. The API certainly provides much better data than the Twitter feed has in recent years.

I have some questions about the “units” of some columns… I’m pretty sure my initial guesses are right for most of them, but I wanted to double-check.

For columns H and I (teleop Capture or Breach), I presume a “1” would indicate a successful Capture/Breach?

For columns J-V and AM (defense crossings), is “1” a single defense traversal (5 pts) or a weakened defense (10 pts, 2 traversals)?

Also, how are eOPR 1 and eOPR 2 calculated? What’s the difference? They differ dramatically from the OPRs based solely on match scores.

Thank you. I’m glad to see people are benefitting from it.

As a sidenote, thank you to FRC HQ for making this data more available for capture. The API certainly provides much better data than the Twitter feed has in recent years.

Agreed. Also thanks to Ether and team 2834, I have built off of both of their work, and this spreadsheet would not be possible without them.

I have some questions about the “units” of some columns… I’m pretty sure my initial guesses are right for most of them, but I wanted to double-check.

For columns H and I (teleop Capture or Breach), I presume a “1” would indicate a successful Capture/Breach?

For columns J-V and AM (defense crossings), is “1” a single defense traversal (5 pts) or a weakened defense (10 pts, 2 traversals)?

Also, how are eOPR 1 and eOPR 2 calculated? What’s the difference? They differ dramatically from the OPRs based solely on match scores.

The “instructions” tab has more detailed descriptions about each category, let me know if anything there is unclear and I can revise it. I’ll summarize here anyway though.

“teleop Tower Captured” and “teleop Defenses Breached” both have units of ranking points. A 1 in either of these would indicate that the given team contributes an average of 1 ranking point each match.

All categories that have “crossings” in their name have units of crossings, not weakenings. That is, a 2 in any of these categories would indicate that the given team contributed 2 scored CROSSINGS over this defense each match.

eOPR1 and eOPR2 are my rough attempts to compensate for different scoring methods in quals and elims. Since breaches and captures provide points in elims, but not in quals, “normal” OPR probably does a poor job predicting elimination match scores (although this is as of yet unverified). eOPR1 essentially makes boulders and crosses scored in quals worth more, and eOPR2 takes breaching/capturing contributions and assigns them point values, and then adds those to the “normal” OPR.
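Based on that description, eOPR2 could be sketched roughly as follows. This is my own illustrative reading, not the database's exact formula: the 20-point breach and 25-point capture constants are the 2016 Stronghold playoff bonus values as I understand them, and `eopr2` is a hypothetical helper name.

```python
# Hypothetical sketch of the eOPR2 idea described above: convert a team's
# breach/capture contributions (ranking points per qual match) into playoff
# match points and add them to the "normal" OPR. The constants are the 2016
# playoff bonuses as I understand them; the database's actual formula may differ.
BREACH_POINTS = 20.0   # playoff points awarded for a breach
CAPTURE_POINTS = 25.0  # playoff points awarded for a capture

def eopr2(opr: float, breach_contrib: float, capture_contrib: float) -> float:
    """Approximate playoff contribution: quals OPR plus playoff-only bonuses."""
    return opr + breach_contrib * BREACH_POINTS + capture_contrib * CAPTURE_POINTS
```

For example, a team with a 50-point OPR, a 0.5 breach contribution, and a 0.2 capture contribution would come out to 50 + 10 + 5 = 65.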

Week 4 data has been added.

Additionally, I deleted the unnecessary whitespace that was beneath most of the event sheets’ data. This will allow sorting to make much more sense and cause the scroll bar to be more appropriately sized.

Also, I hadn’t realized that Excel saved the position of the last cell selected, which is why seemingly random positions on each page were previously selected upon entering them for the first time. I have now selected the top-left corner cell on each sheet.

As always, I appreciate feedback and/or error reports.

Week 5 data has been added.

I will include the data for the Western Canada regional in the week 6 update.

Thanks for producing this every week! It is very interesting how closely the results from this data align with the scoring averages by type in our scouting data (not a perfect match, but very close); we’ll definitely be using it for Championships scouting.

I believe there is a good chance that I am currently calculating “tech foul count” and/or “tech fouls drawn” improperly. I will be investigating more tonight, but for the time being assume that these metrics are erroneous.

Caleb, thank you for pulling this together every week! Our team has been using it as a “pre-assessment” of teams before each event. We will for sure be using it for CMP!

Thanks again!

Thank you so much for putting this together. It’s been a great tool so far.

The results align reasonably well with our data as well. We got the most similar results when rounding all negative values up to 0 and rounding all positive values down to the nearest 0.1 or 0.2.
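That cleanup rule is easy to apply programmatically. A minimal sketch of it in Python (my own illustration, not part of the database; `clean` is a hypothetical helper name):

```python
import math

def clean(value: float, step: float = 0.1) -> float:
    """Round negative contributions up to 0; round positive ones
    down to the nearest multiple of `step` (e.g. 0.1 or 0.2)."""
    if value <= 0:
        return 0.0
    return math.floor(value / step) * step
```

So a contribution of -0.7 becomes 0, and 1.27 becomes 1.2 with the default 0.1 step.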

Thanks for the compliments, I’m glad to hear it is getting some good use.

Never mind, I think these are still being calculated properly.

The reason I thought these were wrong was primarily that I had forgotten how exactly tech fouls are scored.

Caleb, this is great stuff, very helpful. The world results sheet makes it great to use for Worlds scouting.

I do have a question on how you’ve calculated these numbers, though. I’m assuming for a given event you’re taking averages, but how do we end up with negative numbers for things like teleop high boulder points, etc.? There are several fields with values like this that I don’t understand, as the minimum value should really be zero.

Can you explain this? Thanks.

Also, I’m wondering if you’ll be producing a sheet that contains only the teams going to Worlds in St. Louis? That would be helpful for those who are going. Thanks for doing this work!

/mike

Thought of another couple:

Does this data just reflect qualification rounds or also playoff rounds? I assume the latter, but wanted to verify.

Also, how are you calculating total points? This seems to be a really low number…

/mike

These numbers are calculated using a least-squares approximation on qualification scores assuming that every team contributes the same amount to the selected category in every match. This value is each team’s calculated contribution (or OPR) in that category. The only inputs to the algorithm are the category scoring breakdown per match and the match schedule. For more detail on how OPR is calculated, see the first link on this page titled “Presentation to explain new scouting database.”
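To make the least-squares setup concrete, here is a minimal sketch in Python/NumPy using a made-up toy schedule (four teams, two-team alliances, invented category scores; real events use three-team alliances and far more matches):

```python
import numpy as np

# Toy schedule: row i of A marks which teams were on alliance i, and
# scores[i] is that alliance's total in the chosen category for that match.
A = np.array([
    [1, 1, 0, 0],  # teams 1 and 2 allied
    [0, 0, 1, 1],  # teams 3 and 4 allied
    [1, 0, 1, 0],  # teams 1 and 3 allied
    [0, 1, 0, 1],  # teams 2 and 4 allied
    [1, 0, 0, 1],  # teams 1 and 4 allied
], dtype=float)
scores = np.array([30.0, 20.0, 25.0, 25.0, 27.0])

# Least squares: find x minimizing ||A x - scores||^2. x[i] is team i+1's
# calculated contribution (OPR) in this category. Note nothing forces
# x >= 0, which is one way negative component values can arise.
opr, *_ = np.linalg.lstsq(A, scores, rcond=None)
```

With these invented scores the system happens to be exactly consistent, so the solve recovers each team's contribution exactly; with real match data the fit is approximate.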

As to why negative values arise, there are two main reasons this could occur. First, recognize that these values represent a given team’s contribution to a given category, which is generally not equivalent to what we conventionally think of as scoring. For example, a team which never takes shots, but transports boulders into the courtyard, could have a positive value in “teleop Boulders High.” Although scouts would never say that they scored boulders high, if alliances which they are a part of tend to score more high goals, their “teleop Boulders High” value might be positive. In the same way, if a team plays the game in a way that hinders partners from scoring high boulders (by taking balls from them, taking their desired shooting position, running into them, etc.), then this team will have a justifiably lower score in “teleop Boulders High” than just the average number of boulders they themselves scored high.

The other reason a team could have a negative value in a category boils down to our assumption that every team contributes the same amount every match. This is very clearly false, but it is a reasonable enough approximation that we can still arrive at reasonably good results when making it. If team A never scores in the high goal, but happens to be on the same alliance as a very good shooter in the same match that the shooter breaks down, team A will likely receive a small negative value in “teleop Boulders High.”

Personally, when I interpret these values, I generally round all negative values up to 0, but YMMV.

Also, I’m wondering if you’ll be producing a sheet that contains only the teams going to Worlds in St. Louis? That would be helpful for those who are going.

Good idea. I will include a sheet like this in my next update.

I do only factor in qualification rounds. There are a number of reasons for this, many of which are described by Ed Law here. The reasons there are important, but the largest reason for me is that using qualification matches only has become the de facto standard for calculations like these, and it is important to me that my scores are equivalent to those listed on TBA, the 2834 database, and the 1114 database.

Also, how are you calculating total points? This seems to be a really low number…

Total points is actually equivalent to OPR. This number represents the calculated contribution to the match scores of a team throughout their qualification matches.

Remember that these numbers represent only each team’s own contribution, not their average alliance’s score. Also, remember that playoff scores are calculated differently than qual scores. If you want to approximate a playoff alliance’s score, you will likely get better results using my eOPR1 or eOPR2 metrics.

Okay, knowing these are all calculated similarly to OPR makes sense and explains the numbers, probably including the next one I was going to ask about, which was why challenge/scale likelihood was often >1.0.

Looking forward to the next update, thanks.

/mike

Week 7 data has been added.

Per request, I have also added a “championship preview” sheet which contains data on the best event (by OPR) of every team registered for championships as of 5PM CST on 4/18/2016. There is no new information on this sheet; all data are copied directly from the “world results” sheet. I am not planning to release updates if/when the championship team list changes, so you will have to update this sheet yourself.

If someone could check the data from the Michigan State Championship against scouting data to see that they roughly correlate, I would appreciate it. When I originally made this database, all of my calculations assumed that no event would have more than 100 teams or more than 200 matches. Thus, I had to modify a few things to accommodate MSC, which makes me nervous that I may have introduced one or more small errors somewhere.

Unless someone notices an error, I will not be releasing another update until after championships.

These values have been verified to be correct. Ether ran calculations independently that provided results which matched the results in this database.

Update on updates.
By request, I have decided to release an update on Friday night with division preview tabs. I might also do match/ranking predictions using components, but no guarantees.