
paper: 4536 scouting database BETA


Caleb Sykes
18-03-2016, 12:41
Thread created automatically to discuss a document in CD-Media.

4536 scouting database BETA (http://www.chiefdelphi.com/media/papers/3248?) by Caleb Sykes

Caleb Sykes
18-03-2016, 12:42
This is a beta test of a scouting database which calculates component calculated contributions (OPRs) using the data from the FIRST API. As this project is still in its infancy, please report any bugs or potential improvements to Caleb Sykes (calebsyk@gmail.com). Each sheet currently contains data from a distinct week 1 event. Starting 3/21, a new database will be published weekly, containing data from all events up to that date.

Be extremely careful when using the individual defense crossings (columns J-Q on each sheet). At a given event, if a defense is chosen fewer times than there are teams at the event, a #NUM! error will appear. If a defense is chosen fewer than twice as many times as there are teams at the event, place limited faith in the numbers.
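A minimal sketch of why rarely chosen defenses break the calculation. It assumes (the thread does not say this) that only alliances which actually faced a defense contribute rows to that defense's least-squares system. The normal matrix A^T A can have rank no higher than the number of rows, so with fewer rows than teams it is singular and cannot be inverted; Excel's MINVERSE, for example, returns #NUM! for a singular matrix. Overlapping alliances can also leave rows dependent, which is one reason to want well over one row per team. The helper names and the 6-team schedule are hypothetical.

```python
from itertools import combinations


def normal_matrix(alliances, teams):
    """Build A^T A, where A has one row per alliance and a 1 for each team on it."""
    idx = {t: i for i, t in enumerate(teams)}
    m = [[0.0] * len(teams) for _ in teams]
    for alliance in alliances:
        for a in alliance:
            for b in alliance:
                m[idx[a]][idx[b]] += 1.0
    return m


def matrix_rank(m, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    rows = [r[:] for r in m]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            f = rows[r][col] / rows[rank][col]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


teams = [1, 2, 3, 4, 5, 6]
# Hypothetical: a defense faced by only 4 alliances at a 6-team event.
rare_defense = [(1, 2, 3), (4, 5, 6), (1, 4, 5), (2, 3, 6)]
# Every possible 3-team alliance faced it: plenty of independent rows.
common_defense = list(combinations(teams, 3))

print(matrix_rank(normal_matrix(rare_defense, teams)))    # 3: singular, cannot invert
print(matrix_rank(normal_matrix(common_defense, teams)))  # 6: full rank, invertible
```

In the rare case the last alliance is the first two combined minus the third, so the rank drops to 3 even though there are 4 rows.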

See the "instructions" sheet for more detailed information on what each category represents.

Caleb Sykes
21-03-2016, 18:27
I have just uploaded version 1 of this database. It is populated with all week 1-3 events.

In addition to the new data, the two main changes since the BETA are:
An alternative calculation of eOPR (elimination OPR) is now included for each team, so there are now two eOPR calculations, which I have dubbed eOPR1 and eOPR2. Details on how these are calculated can be found in the "instructions" sheet. Although I have not verified this, I expect eOPR1 to provide better elimination predictions at weaker events where captures are less frequent, and eOPR2 to provide better elimination predictions at stronger events where captures are more frequent.

A new "world results" sheet has been added, which allows component comparisons for every team at every event in which it has competed. Be aware that this list will have duplicates for teams that competed at 2+ events. Also, don't compare individual defense crossing data unless you know what you are doing. For example, team 5114 has a drawbridge contribution of 1722968039259170.00. That doesn't mean 5114 is incredibly good at crossing the drawbridge; it just means that the drawbridge was not chosen frequently enough at Midland for the drawbridge contributions to be meaningful. As a rule of thumb, you can almost always trust the rock wall, sally port, and cheval de frise contributions, but be wary of the others.



Remember, this project is still quite young, and there are very likely errors in places (especially since I have not yet automated everything, and have to do some copying by hand). If you see any errors, please let me know and I will look into it.

Nathan Streeter
22-03-2016, 11:38
First: very cool spreadsheet! I'm glad to have a resource that looks at the component OPR for pretty much every possible condition! It has all the usual OPR caveats, but it does seem useful for establishing some trends and making some comparisons.

As a sidenote, thank you to FRC HQ for making this data more available for capture. The API certainly provides much better data than the twitter feed over recent years.

I have some questions about the "units" of some columns... I'm pretty sure they're what I initially guessed for most of them, but I wanted to double-check.

For columns H and I (teleop Capture or Breach), I presume a "1" would indicate a successful Capture/Breach?

For columns J - V and AM (defense crossings), is "1" a single defense traversal (5 pts) or a weakened defense (10 pts, 2 traversals)?

Also, how are eOPR 1 and eOPR 2 calculated? What's the difference? They differ dramatically from the OPRs based solely on match scores.

Caleb Sykes
22-03-2016, 11:53
First: very cool spreadsheet! I'm glad to have a resource that looks at the component OPR for pretty much every possible condition! It has all the usual OPR caveats, but it does seem useful for establishing some trends and making some comparisons.

Thank you. I'm glad to see people are benefitting from it.


As a sidenote, thank you to FRC HQ for making this data more available for capture. The API certainly provides much better data than the twitter feed over recent years.


Agreed. Also, thanks to Ether and team 2834. I have built off of both of their work, and this spreadsheet would not be possible without them.


I have some questions about the "units" of some columns... I'm pretty sure they're what I initially guessed for most of them, but I wanted to double-check.

For columns H and I (teleop Capture or Breach), I presume a "1" would indicate a successful Capture/Breach?

For columns J - V and AM (defense crossings), is "1" a single defense traversal (5 pts) or a weakened defense (10 pts, 2 traversals)?

Also, how are eOPR 1 and eOPR 2 calculated? What's the difference? They differ dramatically from the OPRs based solely on match scores.

The "instructions" tab has more detailed descriptions of each category; let me know if anything there is unclear and I can revise it. I'll summarize here anyway, though.

"teleop Tower Captured" and "teleop Defenses Breached" both have units of ranking points. A 1 in either of these would indicate that the given team contributes an average of 1 ranking point each match.

All categories that have "crossings" in their name have units of crossings, not weakenings. That is, a 2 in any of these categories would indicate that the given team contributed 2 scored CROSSINGS over this defense each match.

eOPR1 and eOPR2 are my rough attempts to compensate for the different scoring methods in quals and elims. Since breaches and captures earn match points in elims but only ranking points in quals, "normal" OPR probably does a poor job of predicting elimination match scores (although this is as yet unverified). eOPR1 essentially makes boulders and crossings scored in quals worth more, and eOPR2 takes breaching/capturing contributions, assigns them point values, and adds those to the "normal" OPR.
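A minimal sketch of the eOPR2 idea described above. The 20-point breach and 25-point capture values are an assumption taken from the 2016 playoff rules, not from this thread; the real weights are documented in the database's "instructions" sheet. The function and parameter names are hypothetical.

```python
# ASSUMED 2016 playoff point values for a breach and a capture. The database's
# actual weights live in its "instructions" sheet and may differ.
BREACH_POINTS = 20.0
CAPTURE_POINTS = 25.0


def eopr2(total_points, breach_rp, capture_rp,
          breach_points=BREACH_POINTS, capture_points=CAPTURE_POINTS):
    """Sketch of an eOPR2-style value.

    total_points: the team's normal OPR from qualification match points
    breach_rp, capture_rp: the team's contributions to "teleop Defenses
    Breached" and "teleop Tower Captured", in ranking points per match
    """
    return total_points + breach_points * breach_rp + capture_points * capture_rp


print(eopr2(50.0, 0.3, 0.1))  # 50 + 20*0.3 + 25*0.1 = 58.5
```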

Caleb Sykes
30-03-2016, 12:23
Week 4 data has been added.

Additionally, I deleted the unnecessary whitespace that was beneath most of the event sheets' data. This will allow sorting to make much more sense and cause the scroll bar to be more appropriately sized.

Also, I hadn't realized that Excel saves the position of the last selected cell, which is why seemingly random cells were selected the first time you opened each sheet. I have now selected the top-left cell on each sheet.

As always, I appreciate feedback and/or error reports.

Caleb Sykes
05-04-2016, 13:07
Week 5 data has been added.

I will include the data for the Western Canada regional in the week 6 update.

Ben Martin
12-04-2016, 10:13
Thanks for producing this every week! It is very interesting how closely the results from this data align with the scoring averages by type in our scouting data (not a perfect match, but very close). We'll definitely be using it for Championships scouting.

Caleb Sykes
13-04-2016, 14:58
I believe there is a good chance that I am currently calculating "tech foul count" and/or "tech fouls drawn" improperly. I will be investigating more tonight, but for the time being assume that these metrics are erroneous.

Dancin103
13-04-2016, 15:38
Caleb, thank you for pulling this together every week! Our team has been using it as a "pre-assessment" of teams before each event. We will for sure be using it for CMP!

Thanks again!

hutchMN
13-04-2016, 16:13
Thank you so much for putting this together. It's been a great tool so far.

Caleb Sykes
13-04-2016, 17:19
Thanks for producing this every week! It is very interesting how closely the results from this data align with the scoring averages by type in our scouting data (not a perfect match, but very close). We'll definitely be using it for Championships scouting.

The results align reasonably well with our data as well. We got the most similar results when rounding all negative values up to 0 and rounding all positive values down to the nearest 0.1 or 0.2.
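A hypothetical helper for the adjustment described above: clamp negative values to 0, then round positive values down to a step of 0.1 or 0.2. The thread doesn't spell out the exact procedure, so the function name and the step handling are assumptions.

```python
import math


def conservative_value(x, step=0.1):
    """Clamp negative contributions to 0 and round positive ones down to a multiple of step."""
    if x <= 0:
        return 0.0
    # The small epsilon keeps float error (0.3 / 0.1 == 2.9999999999999996)
    # from knocking an exact multiple down a whole step.
    steps = math.floor(x / step + 1e-9)
    return round(steps * step, 10)


print([conservative_value(v) for v in (-1.7, 0.3, 2.37)])  # [0.0, 0.3, 2.3]
print(conservative_value(2.37, step=0.2))                  # 2.2
```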

Caleb, thank you for pulling this together every week! Our team has been using it as a "pre-assessment" of teams before each event. We will for sure be using it for CMP!

Thanks again!

Thank you so much for putting this together. It's been a great tool so far.

Thanks for the compliments, I'm glad to hear it is getting some good use.

Caleb Sykes
13-04-2016, 17:21
I believe there is a good chance that I am currently calculating "tech foul count" and/or "tech fouls drawn" improperly. I will be investigating more tonight, but for the time being assume that these metrics are erroneous.

Never mind, I think these are still being calculated properly.

The reason I thought these were wrong was primarily a result of me forgetting how exactly tech fouls are scored.

mitchellzone
17-04-2016, 13:08
Caleb, this is great stuff, very helpful. The world results sheet makes it great to use for Worlds scouting.

I do have a question on how you've calculated these numbers, though. I'm assuming for a given event you're taking averages, but how do we end up with negative numbers for things like teleop high boulder points, etc.? There are several fields with values like this that I don't understand, as the minimum value should really be zero.

Can you explain this? Thanks.

Also, I'm wondering if you'll be producing a sheet that contains only the teams going to Worlds in St. Louis? That would be helpful for those who are going. Thanks for doing this work!

/mike

mitchellzone
17-04-2016, 13:51
I do have a question on how you've calculated these numbers, though.


Thought of another couple:

Does this data just reflect qualification rounds or also playoff rounds? I assume the latter, but wanted to verify.

Also, how are you calculating total points? This seems to be a really low number...

/mike

Caleb Sykes
17-04-2016, 14:40
I do have a question on how you've calculated these numbers, though. I'm assuming for a given event you're taking averages, but how do we end up with negative numbers for things like teleop high boulder points, etc.? There are several fields with values like this that I don't understand, as the minimum value should really be zero.

Can you explain this? Thanks.



These numbers are calculated using a least-squares approximation on qualification scores assuming that every team contributes the same amount to the selected category in every match. This value is each team's calculated contribution (or OPR) in that category. The only inputs to the algorithm are the category scoring breakdown per match and the match schedule. For more detail on how OPR is calculated, see the first link on this (http://www.chiefdelphi.com/media/papers/2174) page titled "Presentation to explain new scouting database."
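A minimal pure-Python sketch of that least-squares calculation for one category, assuming the inputs are a list of alliances (one per alliance per qualification match) and each alliance's total in the category. It solves the normal equations directly; this is an illustration of the technique, not the spreadsheet's actual implementation, and the function names are hypothetical.

```python
from itertools import combinations


def solve_linear(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few independent alliances")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def calculated_contribution(alliances, values):
    """Least-squares contribution of each team to one scoring category.

    alliances: one tuple of team numbers per alliance per qualification match
    values: that alliance's total in the category, in the same order
    """
    teams = sorted({t for alliance in alliances for t in alliance})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Normal equations (A^T A) x = A^T b, where A[row][team] is 1 if the team
    # played on that alliance and b is the alliance's category total.
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for alliance, value in zip(alliances, values):
        for a in alliance:
            atb[idx[a]] += value
            for b in alliance:
                ata[idx[a]][idx[b]] += 1.0
    x = solve_linear(ata, atb)
    return {t: x[idx[t]] for t in teams}


# Hypothetical 6-team event where every possible 3-team alliance plays once and
# each team adds exactly the same amount every match (the OPR assumption).
true_contribution = {1: 10.0, 2: 20.0, 3: 30.0, 4: 5.0, 5: 15.0, 6: 25.0}
alliances = list(combinations(true_contribution, 3))
values = [sum(true_contribution[t] for t in a) for a in alliances]
print(calculated_contribution(alliances, values))  # recovers true_contribution, up to float error
```

When the assumption holds exactly, the solve recovers each team's contribution; real match data never fits perfectly, so the result is the best fit in the least-squares sense.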

As to why negative values arise, there are two main reasons this could occur. First, recognize that these values represent a given team's contribution to a given category, which is generally not equivalent to what we conventionally think of as scoring. For example, a team which never takes shots, but transports boulders into the courtyard, could have a positive value in "teleop Boulders High." Although scouts would never say that they scored boulders high, if alliances which they are a part of tend to score more high goals, their "teleop Boulders High" value might be positive. In the same way, if a team plays the game in a way that hinders partners from scoring high boulders (by taking balls from them, taking their desired shooting position, running into them, etc...) then this team will have a justifiably lower score in "teleop Boulders High" than just the average number of boulders they themselves scored high.

The other reason a team could have a negative value in a category boils down to our assumption that every team contributes the same amount every match. This is very clearly false, but it is a reasonable enough approximation that we can still arrive at reasonably good results when making it. If team A never scores in the high goal, but happens to be on the same alliance as a very good shooter in the same match that the shooter breaks down, team A will likely receive a small negative value in "teleop Boulders High."
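A small, hypothetical illustration of the breakdown case. To keep the arithmetic checkable by hand it uses 2-team alliances instead of 3 (a simplification) and made-up team labels. "S" is the only shooter and adds 10 high-goal points per match, except in its match with "A", where S breaks down. The least-squares fit then assigns A a negative value, and B and C pick up a little of S's credit.

```python
from itertools import combinations


def least_squares_contribution(alliances, values):
    """Per-team least-squares contribution, via Gauss-Jordan on the normal equations."""
    teams = sorted({t for alliance in alliances for t in alliance})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Augmented normal-equation matrix [A^T A | A^T b].
    aug = [[0.0] * (n + 1) for _ in range(n)]
    for alliance, value in zip(alliances, values):
        for a in alliance:
            aug[idx[a]][n] += value
            for b in alliance:
                aug[idx[a]][idx[b]] += 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The matrix is now diagonal, so each unknown is a single division.
    return {t: aug[idx[t]][n] / aug[idx[t]][idx[t]] for t in teams}


# Every pair of the four teams plays once; only alliances with a working S score.
alliances = list(combinations(["S", "A", "B", "C"], 2))
values = [0.0 if set(a) == {"S", "A"} else (10.0 if "S" in a else 0.0)
          for a in alliances]
for team, value in least_squares_contribution(alliances, values).items():
    print(team, round(value, 2))
# A -3.33, B 1.67, C 1.67, S 6.67
```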

Personally, when I interpret these values, I generally round all negative values up to 0, but YMMV.

Also, I'm wondering if you'll be producing a sheet that contains only the teams going to Worlds in St. Louis? That would be helpful for those who are going.

Good idea. I will include a sheet like this in my next update.

Thought of another couple:

Does this data just reflect qualification rounds or also playoff rounds? I assume the latter, but wanted to verify.

I only factor in qualification rounds. There are a number of reasons for this, many of which are described by Ed Law here (http://www.chiefdelphi.com/forums/showpost.php?p=1562188&postcount=5). The reasons there are important, but the largest reason for me is that using qualification matches only has become the de facto standard for calculations like these, and it is important to me that my scores are equivalent to those listed on TBA, the 2834 database, and the 1114 database.


Also, how are you calculating total points? This seems to be a really low number...

Total points is actually equivalent to OPR. This number represents the calculated contribution to the match scores of a team throughout their qualification matches.

Remember that these numbers represent only the given team's contribution, not the average score of its alliances. Also, remember that playoff scores are calculated differently than qual scores. If you want to approximate a playoff alliance's score, you will likely get better results using my eOPR1 or eOPR2 metrics.
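Because each value is one team's share, the OPR model predicts an alliance's score as the sum of its members' values, which is why a single team's total points looks low next to real match scores. A minimal sketch with hypothetical numbers; for a playoff alliance, one would presumably sum eOPR1 or eOPR2 values in the same way.

```python
def predicted_alliance_score(member_values):
    """Under the OPR model, an alliance's expected score is the sum of its members' contributions."""
    return sum(member_values)


# Hypothetical total points (OPR) for three alliance partners.
print(predicted_alliance_score([30.5, 25.0, 20.25]))  # 75.75
```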

mitchellzone
18-04-2016, 09:12
These numbers are calculated using a least-squares approximation on qualification scores assuming that every team contributes the same amount to the selected category in every match. This value is each team's calculated contribution (or OPR) in that category.

Okay, knowing these are all calculated similarly to OPR makes sense and explains the numbers, probably including the next one I was going to ask about which was why challenge/scale likelihood was often >1.0.

Looking forward to the next update, thanks.

/mike

Caleb Sykes
18-04-2016, 18:48
Week 7 data has been added.

Per request, I have also added a "championship preview" sheet which contains data on the best event (by OPR) of every team registered for championships as of 5PM CST on 4/18/2016. There is no new information on this sheet; all data are copied directly from the "world results" sheet. I am not planning to release updates if/when the championship team list changes, so you will have to update this sheet yourself.
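The "best event per team" selection can be sketched as a single pass over the world results, which (as noted earlier in the thread) list a team once per event. The row layout, key names, and data below are hypothetical.

```python
def best_event_per_team(world_results, registered_teams):
    """Keep, for each registered team, the single world-results row with the highest OPR.

    world_results: rows as dicts with hypothetical keys "team", "event", "opr";
    teams that played 2+ events appear more than once.
    """
    best = {}
    for row in world_results:
        team = row["team"]
        if team not in registered_teams:
            continue
        if team not in best or row["opr"] > best[team]["opr"]:
            best[team] = row
    return best


rows = [
    {"team": 1, "event": "event_a", "opr": 31.2},
    {"team": 1, "event": "event_b", "opr": 38.9},
    {"team": 2, "event": "event_a", "opr": 22.4},
    {"team": 3, "event": "event_c", "opr": 45.0},  # not registered
]
preview = best_event_per_team(rows, registered_teams={1, 2})
print({team: row["event"] for team, row in preview.items()})  # {1: 'event_b', 2: 'event_a'}
```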

If someone could check the data from the Michigan State Championship against scouting data to see that they roughly correlate, I would appreciate it. When I originally made this database, all of my calculations assumed that no event would have more than 100 teams or more than 200 matches. Thus, I had to modify a few things to accommodate MSC, which makes me nervous that I may have introduced one or more small errors somewhere.

Unless someone notices an error, I will not be releasing another update until after championships.

Caleb Sykes
19-04-2016, 11:32
If someone could check the data from the Michigan State Championship against scouting data to see that they roughly correlate, I would appreciate it. When I originally made this database, all of my calculations assumed that no event would have more than 100 teams or more than 200 matches. Thus, I had to modify a few things to accommodate MSC, which makes me nervous that I may have introduced one or more small errors somewhere.

These values have been verified to be correct. Ether ran calculations independently that provided results which matched the results in this database.

Caleb Sykes
19-04-2016, 13:04
Unless someone notices an error, I will not be releasing another update until after championships.

Update on updates.
By request, I have decided to release an update on Friday night with division preview tabs. I might also do match/ranking predictions using components, but no guarantees.

Caleb Sykes
20-04-2016, 12:45
I would just like to remind everyone that the changed tower strength at champs means that some of these metrics lose value if they are applied to the championship events. Specifically, teleop Tower Captured, eOPR1, and eOPR2 should not be directly applied to the championship event. I will create a new metric ceOPR (championship elimination OPR) in my Friday update which will be calculated in the same way eOPR1 is currently calculated, but modified to account for the change in tower strength.

joeojazz
20-04-2016, 21:43
Hi I was looking at your scouting info and didn't see my team 5712 in the championship preview. Also wanted to say how great this is and how useful this will be when checking out alliance partners and opponents.

Caleb Sykes
20-04-2016, 22:11
Hi I was looking at your scouting info and didn't see my team 5712 in the championship preview.

Per request, I have also added a "championship preview" sheet which contains data on the best event (by OPR) of every team registered for championships as of 5PM CST on 4/18/2016. There is no new information on this sheet; all data are copied directly from the "world results" sheet. I am not planning to release updates if/when the championship team list changes, so you will have to update this sheet yourself.

Your team will be included in this tab in the Friday update.

Caleb Sykes
22-04-2016, 21:51
I have uploaded a new championships preview database. It contains an updated teams list in the "championship preview" sheet, as well as divisional preview sheets. All results from these 9 sheets are taken directly from the "world results" sheet.

Additionally, I have created a new metric ceOPR (championship eliminations OPR), which can be used to predict elimination scores at championships. This value is equivalent to (total points) + 2.5*(subtracted tower strength) + 2.5*(cross defense count). This value is only in the world results, championship preview, and divisional preview sheets, not in the previous events.
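The ceOPR formula above, written out as a short sketch. The argument names are hypothetical stand-ins for the "total points", "subtracted tower strength", and "cross defense count" columns, and the example values are made up.

```python
def ceopr(total_points, subtracted_tower_strength, cross_defense_count):
    """ceOPR = total points + 2.5 * (subtracted tower strength) + 2.5 * (cross defense count)."""
    return total_points + 2.5 * subtracted_tower_strength + 2.5 * cross_defense_count


# Hypothetical component values for one team.
print(ceopr(45.0, 3.0, 4.0))  # 45 + 7.5 + 10 = 62.5
```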

Let me know if you have any questions or concerns.

jajabinx124
22-04-2016, 22:04
Just wanna say thanks for making this scouting database and making a championship preview version of it! This makes pre-scouting much easier and these stats will be useful.

Travis Hoffman
22-04-2016, 22:22
You are quite the awesome person for doing this. Thank you kindly.

mitchellzone
23-04-2016, 08:21
Caleb, you're awesome for doing this, thanks!

One suggestion: On the championship preview tab, it would be great to have an additional column with what division each team is in. This would allow us to just ingest the data once into any analytics platform and very quickly group the teams into divisions by that field rather than having to load them each separately and join them.

Thanks again!

/mike

Wayne TenBrink
24-04-2016, 07:58
Thanks for the excellent work!

Please check the Carson preview tab. It appears to be blank.

Caleb Sykes
24-04-2016, 11:49
Thanks for the excellent work!

Please check the Carson preview tab. It appears to be blank.

It displays fine for me. Do all of the other division preview sheets display properly for you?

Wayne TenBrink
24-04-2016, 22:46
It displays fine for me. Do all of the other division preview sheets display properly for you?
It's all there now. Third download was the charm. Thanks again.

Caleb Sykes
01-05-2016, 19:29
I have run the calculations for championships, but there seems to be a bit of a discrepancy between my data and what is posted on TBA. Could someone independently run OPR calculations for Carson to see if the error is more likely on my side or on TBA's side? I will be investigating more on my own; the discrepancy seems like it might be related to qualification match #1.

Here are Carson's top 15 OPRs according to TBA:

1024 67.95
868 64.40
973 59.73
225 57.22
2052 57.07
610 56.95
2122 56.15
2590 54.44
5895 54.13
41 52.82
2067 51.82
3824 50.56
2137 49.56
3538 47.72
2474 47.07


Here are the top 15 OPRs according to my calculations:

1024 67.75374586
868 63.84195886
2122 59.32732553
973 59.02368093
225 57.49356224
610 56.71770781
2052 56.52062215
2590 54.57781003
5895 53.93177915
41 52.86710115
2067 51.99822564
3824 50.64961634
2137 49.26312626
3538 47.7814427
2474 47.15846249
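One quick way to see which teams drive the discrepancy is to diff the two lists. The sketch below uses the 15 values posted above. The 0.5-point tolerance is an arbitrary assumption, and TBA's values are rounded to two decimals, so small differences are expected regardless.

```python
# Carson top-15 OPRs from this post: (team, TBA value, this database's value).
CARSON = [
    (1024, 67.95, 67.75374586),
    (868, 64.40, 63.84195886),
    (973, 59.73, 59.02368093),
    (225, 57.22, 57.49356224),
    (2052, 57.07, 56.52062215),
    (610, 56.95, 56.71770781),
    (2122, 56.15, 59.32732553),
    (2590, 54.44, 54.57781003),
    (5895, 54.13, 53.93177915),
    (41, 52.82, 52.86710115),
    (2067, 51.82, 51.99822564),
    (3824, 50.56, 50.64961634),
    (2137, 49.56, 49.26312626),
    (3538, 47.72, 47.7814427),
    (2474, 47.07, 47.15846249),
]


def discrepancies(rows, tolerance=0.5):
    """Return (team, mine - TBA) pairs that differ by more than tolerance, largest first."""
    diffs = [(team, mine - tba) for team, tba, mine in rows]
    flagged = [d for d in diffs if abs(d[1]) > tolerance]
    return sorted(flagged, key=lambda d: abs(d[1]), reverse=True)


for team, diff in discrepancies(CARSON):
    print(team, round(diff, 2))
# 2122 3.18, 973 -0.71, 868 -0.56, 2052 -0.55
```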

Ether
01-05-2016, 21:01
Could someone independently run OPR calculations for Carson

1024 67.94699488
868 64.39937013
973 59.72708602
225 57.21656682
2052 57.06823258
610 56.94982187
2122 56.15330503
2590 54.44241751
5895 54.13186804
41 52.82497403
2067 51.82418186
3824 50.55946533
2137 49.55923262
3538 47.71879089
2474 47.06784238
1918 47.06371556
4028 45.59624045
904 45.41944435
1718 45.29367738
1625 43.90561045
4362 43.58606725
2996 43.53157842
2655 43.28570403
525 43.20450209
2403 43.1175526
2771 42.92284815
3970 42.72891288
135 42.57181239
5907 42.13118592
1987 41.79576813
4264 41.54625885
2486 41.10297812
3688 41.04522675
5167 40.89402704
319 40.27903932
4131 40.17573866
1619 39.56191031
2485 37.35922029
1533 36.90397632
1137 36.73476528
6098 35.99548457
233 34.91407358
6144 34.34974789
5663 32.20977958
5913 31.95709232
60 31.8850688
1156 31.25624534
1258 30.90491415
5084 30.58091357
5332 28.5437442
5454 28.51064153
1126 28.40910065
2761 28.31012272
2445 28.30861263
1159 26.93097019
2202 26.2979391
5879 25.36524886
5712 25.16147429
6025 24.98551622
2978 24.36624945
3352 22.86652503
4592 22.59269553
11 22.58569118
4121 22.17481602
296 22.09641455
1939 21.41832877
4026 20.96318342
4135 20.9368346
5572 20.35016881
3021 20.24024128
2526 19.82585398
5897 19.74217565
746 18.84443104
51 17.41725192
1369 7.854537579

Nathan Streeter
03-05-2016, 11:38
Is a post-CMP update coming? :-)

Caleb Sykes
03-05-2016, 14:40
Is a post-CMP update coming? :-)

I normally update within a day of the 2834 scouting database being updated, because the "world results" page uses event information from it. So if the answer to this question (http://www.chiefdelphi.com/forums/showpost.php?p=1582426&postcount=18) is yes, I will update within a day after the 2834 update. If there will be no update to the 2834 database, I will have to rework some things, so it will take me a bit longer.

If the 2834 database isn't updated by Friday, I'll rework my database and publish an update no later than Saturday.

Caleb Sykes
04-05-2016, 19:18
I have uploaded a final update to this database. This update has tabs for each of the championship divisions. I have also removed all of the championships preview tabs.

I hope everyone who downloaded it found it useful. I am planning to maintain this effort in the upcoming years. I am also planning to spend some more time developing the interface to reach the level of the 1114 and 2834 databases in future versions. I will also be looking to develop new metrics next year, depending on what the game is and what data the API provides. Keep an eye out for a thread near the end of build season next year where I will be asking for feedback on what everyone would like to see calculated.

Thanks to teams 1114 and 2834 for providing my inspiration for creating this. Special thanks to Ether for providing the CSV files on which my entire database is founded. None of this would have been possible without him.