Sykes Scouting Database 2019


Another year, another scouting database, and this one is bigger and better than ever before. Key general additions since last year:
Added week to the top of each event sheet
Added Chairman’s Eligibility (I’ll get to actually using it soon)
Added country, state/province, city, and district for each team

Additionally, I have lots and lots of 2019 specific metrics that I’m including. The wealth of scoring data this year is nuts, but I did my best to break it down as well as I could without adding anything superfluous.

Why’d you change the name to “Sykes Scouting Database”? To be honest, I ran into some annoying issues with the apostrophe in “Caleb’s Scouting Database”, and since “Caleb Scouting Database” sounds stupid, I decided to go with “Sykes Scouting Database”.

I’m not quite sure what the best way to handle constantly updating documents is with the new CD. In the old whitepapers section, I could just upload revisions all into one place, but I don’t really have that option here. I’m trying out just telling people to scroll to the bottom. Maybe I could link to a Dropbox or something instead? I’m not sure, but I’m open to ideas.

Here are previous years’ scouting databases for the interested:

As always, feedback is appreciated.


Sykes_Scouting_Database_2019.0.1.xlsx (10.3 MB)

Here’s the very first week 0 version. It’s mostly just laying out the structure for future updates since we don’t have any actual data yet. I’m missing Chairman’s Strengths and eligibilities for now, but I should be able to get those by next week at the latest. All non-Elo metrics are seeded based on averages from the official week 0 matches. They’re not great but they’re the best I’ve got with no actual matches having been played yet.

So the only really interesting thing right now is the Elo ratings, which are start of season ratings. You can find more info on how these are calculated here.

Let me know if anything looks off; there are always bugs in the early weeks.

Brandon’s suggestion when I asked him this question was to use wikis. On its face I’m not a huge fan of this because then anyone can edit it, but I haven’t actually tried it yet so I don’t know if it’s actually a problem. The CD community is pretty reasonable; hopefully people won’t be randomly editing wikis just to mess them up.

As always, looking forward to reading through the new improvements in the upcoming weeks!

I’d suggest just hosting it somewhere else and updating the link every time you announce a new version. Heck, you could probably use Git to manage the file and point to a GitHub repo.


Here’s a week 1 update: Sykes_Scouting_Database_2019.1.1.xlsx (17.4 MB)

There are a lot of things I need to improve, but you should be able to trust all of the data shown; most of the issues are aesthetic. Here are my known issues:
Seeds for upcoming events are not set
Chairman’s strengths and eligibilities are not set
Some metrics (placeholder 1, right side bias, and own side bias) are not calculated for events that didn’t complete all of their qual matches
Some qual headers are not aligned properly
Some of the number formatting is ugly
Some columns are incorrect widths
Missing Home Championship
HAB Climb Level 2+ rate is not calculated
Israel 1 and Bosphorus will be included next week

So basically, if you want something prettier, wait until next week, but if you want the data now, I’m providing it. :slight_smile: Also on my todo list is to host this on GitHub as I think that’s a better solution than throwing it into the thread.

I want to talk briefly about some of the metrics and my thoughts on them. The biggest thing I want to emphasize is that just because I have included lots of metrics does not mean I think they all have equal value. Some are very useful/understandable, and some are basically worthless and/or difficult to meaningfully interpret. You’ll need to make the call on what you want to use, but here are some thoughts of mine:
Elo is my favorite and I think it’s the best summary statistic (it’s my baby though so your call). Also note that the Elos listed will change slightly next week after week 1 events are complete. This is because I’m using a score standard deviation of 15.0 right now, but once all week 1 events are done, I’ll find the actual score stdev and use that in Elo calcs, so expect the Elos listed to change a couple of points next week.
It’s not clear to me if total Points or unpenalized Total Points is superior this year. I analyzed 2015-2018, and in some years total Points was a better predictor of success, and in others unpenalized Total Points was a better predictor of success. Seems like an average of them was the best bet in previous years though, so use that if you want.
“win”, despite having the name that makes it sound the most important, is actually not a very good metric.
For summary metrics, my belief is Elo > total Points = unpenalized Total Points > winning Margin > win (which is also the order I have them arranged).
placeholder 1 was a poor attempt by me to calculate right side bias an alternative way. It didn’t work out well, so you can ignore it. I’ll likely remove it in the future.
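To illustrate why the Elos shift when the score standard deviation is updated, here is a minimal sketch. It assumes an Elo variant in which the match margin is divided by the score stdev before being compared against the predicted margin. The constants `K` and `ELO_PER_STDEV`, the function names, and the 18.0 stdev are all hypothetical and chosen only for illustration; they are not the values or code used in the database.

```python
# Hypothetical constants for illustration only; not the values used in the database.
K = 12.0               # assumed update strength per match
ELO_PER_STDEV = 250.0  # assumed Elo gap that predicts a one-stdev winning margin


def predicted_margin(red_elo, blue_elo):
    """Predicted red winning margin, in units of score standard deviations."""
    return (red_elo - blue_elo) / ELO_PER_STDEV


def elo_change(red_elo, blue_elo, red_score, blue_score, score_stdev):
    """Elo points to add to red (and subtract from blue) after one match."""
    # Normalize the real margin by the score stdev so it is comparable
    # to the predicted margin.
    actual = (red_score - blue_score) / score_stdev
    return K * (actual - predicted_margin(red_elo, blue_elo))


# Same match, evaluated with the 15.0 placeholder stdev and with a
# hypothetical measured stdev of 18.0.
placeholder = elo_change(1500, 1500, 60, 45, score_stdev=15.0)
measured = elo_change(1500, 1500, 60, 45, score_stdev=18.0)
print(placeholder, measured)
```

Under these assumptions, a larger measured stdev shrinks every normalized margin, so each match moves the ratings a little less, which is why a change in stdev nudges all of the listed Elos.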

For some reason, the FIRST API provides a field called “sandStormBonusPoints” as well as a field called “autoPoints”. As far as I can tell they are identical. I had penciled in 3 sandstorm fields to cover both of these as well as their difference, but since they are the same I guess I’ll have 3 identical fields now. :man_shrugging:
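A quick way to check whether two fields really are identical is to look for any record where they disagree. The sketch below assumes the score breakdowns have already been loaded into a list of dicts; the sample records and the helper name `mismatched_breakdowns` are made up for illustration, and only the two field names come from the API.

```python
def mismatched_breakdowns(breakdowns,
                          field_a="sandStormBonusPoints",
                          field_b="autoPoints"):
    """Return every breakdown where the two fields disagree (empty if identical)."""
    return [b for b in breakdowns if b.get(field_a) != b.get(field_b)]


# Hypothetical sample records, not real match data.
sample_breakdowns = [
    {"sandStormBonusPoints": 9, "autoPoints": 9},
    {"sandStormBonusPoints": 12, "autoPoints": 12},
    {"sandStormBonusPoints": 3, "autoPoints": 3},
]

mismatches = mismatched_breakdowns(sample_breakdowns)
print("identical" if not mismatches else f"{len(mismatches)} mismatches")
```

An empty result over every played match would support treating the two fields as duplicates.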

Right side bias is actually very interesting to me. I thought it might not have value, but I think it actually might for high-scoring teams. Here is the graph of right side bias magnitude versus a team’s teleop points:
The correlation isn’t strong, but it would seem that really high scoring teams (15+ teleop points) do frequently tend to have a measurable left or right side bias. I don’t know how much stock you should place in that, but it might be worth sending your defender over to your opponent’s preferred side at the start of the match and seeing how things go from there.
Even more interesting to me though is the “Own Side Bias Points” field. This is calculated by looking at the difference between the points scored on your left/right side of the field and the other side. Matches in which a team is in the center station are not considered.
As you can see from the graph, teams really strongly prefer scoring on their own side (duh). Some teams are better than others though at scoring on the far side, and you can learn that by comparing a team’s total teleop points to their bias points.
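As a rough sketch of the own side bias idea, the snippet below averages (own-side points minus far-side points) over a team's matches and skips any match played from the center station. It is a simplification under stated assumptions: the station numbering (1 = left, 2 = center, 3 = right), the dict keys, and the sample matches are all hypothetical, and it credits raw alliance points to the team rather than reproducing whatever per-team calculation the database actually uses.

```python
from statistics import mean

CENTER_STATION = 2  # assumed numbering: 1 = left, 2 = center, 3 = right


def own_side_bias(matches):
    """Average (own-side points - far-side points) over non-center matches.

    Each match is a dict with hypothetical keys:
      station      -- the team's driver station (1, 2, or 3)
      left_points  -- alliance points scored on the left side of the field
      right_points -- alliance points scored on the right side of the field
    Returns None if the team only ever played from the center station.
    """
    diffs = []
    for m in matches:
        if m["station"] == CENTER_STATION:
            continue  # center-station matches are not considered
        if m["station"] == 1:
            own, far = m["left_points"], m["right_points"]
        else:
            own, far = m["right_points"], m["left_points"]
        diffs.append(own - far)
    return mean(diffs) if diffs else None


# Hypothetical matches, not real data.
example_matches = [
    {"station": 1, "left_points": 14, "right_points": 8},   # own side is left
    {"station": 2, "left_points": 10, "right_points": 10},  # center: skipped
    {"station": 3, "left_points": 5, "right_points": 9},    # own side is right
]
bias = own_side_bias(example_matches)
print(bias)
```

Comparing a value like this against a team's total teleop points gives a sense of how much of their scoring depends on starting on a particular side.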


I’m moving the source of my database to a GitHub link in this thread

Please direct any questions/comments there
