Thread created automatically to discuss a document in CD-Media.
Scouting Accuracy Benchmarking Study by: Basel A
This study used 5 teams’ datasets from MSC 2014 to gauge overall quantitative scouting accuracy and to provide recommendations for improving scouting accuracy to all teams.
This study had three purposes: (1) to compare the scouting accuracy of some top Michigan teams; (2) to gauge the overall accuracy of scouting and assess the implications of accuracy on the value of scouting data; and finally (3) to provide recommendations on improving quantitative scouting data collection. The teams were compared using a few basic statistics, and the development of those statistics is explained in the methodology. The teams are presented as case studies of datasets, with only limited information on the methods of data collection and entry. Discussion of overall accuracy is based on the summary statistics and details of the case studies.
Before opening this up, I’d like to provide some of the background for this study. It came out of my concern about 3322’s scouting this season. We switched from a paper-only system to an entirely redesigned paper-to-digital system between Week 3 and Week 5. There were then additional changes to the system from Week 5 to MSC. With so many changes going on, I did not feel confident about the accuracy of our data. Following MSC, I began asking around for teams’ datasets from MSC to compare with ours. We collected 4 from elite teams, and together with our own, the data from these 5 teams became this study.
I decided to allow the teams involved to remain anonymous. If the teams would like to come forward, they’re free to do so.
If any other teams would like to add their (MSC 2014) data to the study, I’d be happy to take it. Just send me a PM for my email, and once I’ve received it, I’ll get right on adding your data.
With that, I’d like to open this up for questions or comments.
This is a great paper, and I really enjoyed reading it. In the back of my mind, I have always wanted to run an analysis like this for every team at an event that is willing to give me data, and then give an award to the team with the “best” scouting data. Of course, thinking up grand ideas is far different from implementing them.
The importance of good scouts too often seems undervalued relative to other team positions, and I think that giving a small team award could go a long way toward giving scouts a more enjoyable experience.
Hmm, was hoping for a more complete data set to try and grade the accuracy of OPR for 2014. Still a nice paper, and an interesting first look at accuracy in scouting.
The sample size may be small, but did you find a correlation between amount of data taken and accuracy? (My personal opinion/assumption is that teams generally take way too much data, which makes it harder to scout and hurts the data teams actually use.)
I tried to do a project like this back in 2012 using data from 2010-2012 (I used 2337’s datasets for this period, 2-3 events per year). While this was somewhat useful in comparing the utility of OPR across different years, I ran into difficulty comparing OPRs directly to scouting data because extrapolating scores from teams’ statistics is a nontrivial task. OPRs are, of course, dependent directly on match scores, so this extrapolation is necessary for comparison. For example, in 2011, tube scoring did not scale linearly. You’d encounter the same problem in 2014 with scores of balls depending on the number of assists. However, it’s definitely a topic worthy of further study.
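To make the nonlinearity concrete, here is a rough sketch of a 2014 teleop cycle score as a function of how many robots possessed the ball. The point values are as I recall them from the 2014 game (10 for a high goal, 1 for a low goal, and an assist bonus of 0, 10, or 30 for one, two, or three robots), so treat them as assumptions and check the manual. The function names are just illustrative. The point is that the value added by each extra robot is not constant, so no fixed per-robot credit reproduces the cycle score.

```python
# Assumed 2014 point values; verify against the game manual before relying on them.
HIGH_GOAL = 10
LOW_GOAL = 1
# Assist bonus keyed by the number of robots that possessed the ball in the cycle.
ASSIST_BONUS = {1: 0, 2: 10, 3: 30}


def cycle_score(robots_possessing: int, high_goal: bool = True) -> int:
    """Score for one teleop cycle that ends in a goal (truss/catch ignored)."""
    goal = HIGH_GOAL if high_goal else LOW_GOAL
    return goal + ASSIST_BONUS[robots_possessing]


def marginal_values(high_goal: bool = True) -> list[int]:
    """Points added by the 1st, 2nd, and 3rd robot taking part in a cycle."""
    values = []
    previous = 0
    for n in (1, 2, 3):
        current = cycle_score(n, high_goal)
        values.append(current - previous)
        previous = current
    return values


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "robot(s):", cycle_score(n), "points")
    # If scoring were linear in participation, these would all be equal.
    print("marginal values:", marginal_values())
```

Because the third robot's participation is worth more than the first or second under these assumed values, you can't just sum per-robot scouting totals and expect to land on the match score that OPR is fit to.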
Interesting question! This isn’t something I originally considered, but I took a look. With only 5 data points, 4 of which are pretty similar in accuracy, I wasn’t optimistic about having a real answer here, but then it got worse. All 5 teams took around the same amount of data! They varied from 16 fields of data to 21. None had exactly the same number, and the most accurate team was in the middle at 18 fields.