AI Scouting

Hi, I have seen so many great efforts from teams building scouting apps to make the job easier for scouts.

However, I was wondering whether any AI/machine learning approach has been tried. It would be interesting to use the vast amount of data on The Blue Alliance to train a scouting system that tells you your definite best picks.
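
To make this concrete, here is a minimal sketch (Python, using the requests library) of pulling qualification match results from The Blue Alliance read API v3 as raw training rows. You would need your own read API key, and the endpoint and field names here are my understanding of the public docs rather than anything authoritative:

```python
import requests

TBA = "https://www.thebluealliance.com/api/v3"
HEADERS = {"X-TBA-Auth-Key": "YOUR_TBA_READ_KEY"}  # from your TBA account page

def get(path):
    """Small helper around the TBA read API."""
    resp = requests.get(TBA + path, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def event_training_rows(event_key):
    """Turn one event's qualification matches into (teams, score) rows
    that could later feed a learning model."""
    rows = []
    for match in get(f"/event/{event_key}/matches/simple"):
        if match["comp_level"] != "qm":
            continue  # qualification matches only
        for color in ("red", "blue"):
            alliance = match["alliances"][color]
            rows.append((alliance["team_keys"], alliance["score"]))
    return rows

if __name__ == "__main__":
    # Example event key; swap in whatever event you care about.
    print(event_training_rows("2017casj")[:3])
```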

If you have an idea, feel free to share it! Maybe some of us can make this happen.

This is a cool idea and while I’ve seen many similar things (match predictors, data visualization, etc.), I’m not sure that I’ve ever seen anything quite like this.

A couple things to note:

  • OPR and other stats are anything but perfect, and even the best prediction software is often wildly inaccurate. A lot depends on whatever data FIRST offers in the API that year (which is rarely robot-specific). To generate actual predictions, you may need to collect manual data that the API doesn't provide.

  • There is no live alliance selection data. If you generate a pick list and manually delete teams as the selection goes on, that works fine. However, if you want to suggest an ideal alliance, you would need a list of many possible combinations rather than a single ordered list. Alternatively, you could use software where you delete picks as they happen and it then forms the best alliance from whoever is still available (a rough sketch of this is at the end of this post).

Regardless of what data you choose to use, don’t underestimate the power of Google Sheets.
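
On the alliance selection point above, here is a rough sketch of the "delete picks, then re-rank whoever is left" idea. The `ratings` dict and its numbers are made up; they stand in for whatever metric your scouting data actually produces:

```python
def best_remaining(ratings, picked, n=3):
    """Return the top-n available teams by rating, skipping anyone
    already picked or declined. 'ratings' is {team: score} from whatever
    scouting metric you trust; all values here are placeholders."""
    available = {t: r for t, r in ratings.items() if t not in picked}
    return sorted(available, key=available.get, reverse=True)[:n]

# During alliance selection, keep recording picks as they happen
# and re-query for the best alliance you could still form.
ratings = {"254": 95.0, "604": 88.5, "1678": 92.1, "971": 90.3, "5499": 78.2}
picked = set()

picked.add("254")        # alliance 1 captain is off the board
picked.add("1678")       # alliance 1's first pick
print(best_remaining(ratings, picked, n=2))  # best pair still available
```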

What exactly would the scouting system “learn” from? From what I’ve seen, the sample size at a single tournament is generally too small for any quantitative measure to completely summarize a team’s strengths. For example, even TBA’s OPR rankings, despite being a fairly good judge of team strength, are nothing more than a rough gauge; 254 at Silicon Valley probably made the best choice in selecting 604 even though they were 5th in OPR, significantly below some of the other top options.

If you are thinking about aggregating data from multiple events, I think all the information that such an effort would yield is that teams with more points and more wins tend to be better.
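
For context on why OPR is such a rough gauge: it is just a least-squares fit in which each alliance's score is modeled as the sum of its members' contributions, so with only around a dozen qualification matches per team the estimates are noisy. A small numpy sketch with made-up teams and scores:

```python
import numpy as np

# Each row is one alliance in one qualification match:
# (teams on that alliance, alliance score). All values are invented.
matches = [
    (["254", "604", "5499"], 305),
    (["1678", "971", "604"], 280),
    (["254", "971", "5499"], 290),
    (["1678", "254", "604"], 330),
    (["971", "5499", "1678"], 240),
]

teams = sorted({t for alliance, _ in matches for t in alliance})
index = {t: i for i, t in enumerate(teams)}

# A[i, j] = 1 if team j played on the alliance in row i.
A = np.zeros((len(matches), len(teams)))
b = np.zeros(len(matches))
for i, (alliance, score) in enumerate(matches):
    for t in alliance:
        A[i, index[t]] = 1.0
    b[i] = score

# OPR is the least-squares solution to A x = b.
opr, *_ = np.linalg.lstsq(A, b, rcond=None)
for t in teams:
    print(t, round(opr[index[t]], 1))
```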

It would be interesting to take each regional so far, get characteristics for each team that made eliminations, normalize within each competition (to minimize weak vs. strong regional/district effects), and then see whether the alliances progressing further share any notable characteristics. For instance, should the first pick be capable of scoring fuel (or is it not needed)? Should you risk the second pick on a more capable gear deliverer with a spottier climbing record, or is picking the best available climber the better choice? And of course, how might this differ depending on the characteristics of the alliance captain?
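
As a sketch of that normalization step, here is one way to z-score metrics within each event before pooling the data (pandas; the column names and numbers are invented, not real scouting data):

```python
import pandas as pd

# Invented example data: one row per team per event.
df = pd.DataFrame({
    "event": ["casj", "casj", "casj", "txho", "txho", "txho"],
    "team": ["254", "604", "5499", "118", "148", "231"],
    "climb_rate": [0.9, 0.8, 0.3, 0.7, 0.95, 0.4],
    "gears_per_match": [5.1, 4.2, 1.8, 3.0, 4.8, 1.2],
})

# Z-score each metric within its own event so a strong regional
# doesn't dominate a weak one once the data is pooled.
metrics = ["climb_rate", "gears_per_match"]
for m in metrics:
    df[m + "_z"] = df.groupby("event")[m].transform(
        lambda col: (col - col.mean()) / col.std(ddof=0)
    )

print(df)
```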

To go back to the machine learning question, I did k-means clustering last year, but it was more exploratory than predictive. I used the four component OPRs (beyond total OPR) as inputs and found three clusters to be most useful (top teams, elimination teams, and the rest). I haven’t tried it this year, but the OPR components can easily be obtained in Sheets (Rachel Lim’s) and accessed directly from Tableau (which now has built-in k-means). There are more components this year, but using all of them could be worse. My gut feeling is that OPR hasn’t been as useful this year as last, in part because of the 0-or-50-point swing for a successful climb and the 0/40/80/120-point jumps for rotors, which are non-linear in gears delivered.
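
For anyone who would rather do the clustering in Python than Tableau, here is a minimal scikit-learn sketch. The component columns and numbers below are placeholders, not real OPR exports:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder component OPRs; in practice you would export these from
# a Sheet or compute them from TBA score breakdowns.
df = pd.DataFrame({
    "team": ["254", "604", "1678", "971", "5499", "118"],
    "auto_opr": [22, 18, 20, 19, 8, 21],
    "gear_opr": [60, 45, 55, 50, 20, 52],
    "climb_opr": [48, 40, 45, 42, 10, 46],
    "fuel_opr": [15, 2, 10, 5, 0, 12],
})

features = ["auto_opr", "gear_opr", "climb_opr", "fuel_opr"]
X = StandardScaler().fit_transform(df[features])

# Three clusters roughly matching top teams, elimination-level teams,
# and everyone else.
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(df.sort_values("cluster"))
```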

One of the things I have noticed at regionals is that teams often tend to overlook robots that are not in the first 20 places. Let me give an example.

At the Laguna Regional in 2017, 4403 was experiencing some issues during the first day of competition and was ranked 33rd out of 35. By the end of the day, it began consistently placing 4 gears and climbing, and it kept that up on the second day. Even so, it was only ranked 24th.

The number 1 alliance picked this team and won.

Maybe a team’s improvement between the first and second day of a regional could be used as a feature for the ML system.
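
A quick sketch of what that feature might look like (Python; the per-match numbers are invented stand-ins for whatever a scouting sheet records, with 4403's pattern mirroring the Laguna example above):

```python
from statistics import mean

# Scouted contribution per team per match, tagged by competition day.
scouted = {
    "4403": {"day1": [1, 0, 2, 1], "day2": [4, 4, 5, 4]},
    "3478": {"day1": [3, 3, 4, 3], "day2": [3, 4, 3, 4]},
}

def improvement_feature(team_matches):
    """Day-2 average minus day-1 average: a simple 'trajectory' feature
    an ML pick-list model could use alongside raw averages."""
    return mean(team_matches["day2"]) - mean(team_matches["day1"])

for team, matches in scouted.items():
    print(team, round(improvement_feature(matches), 2))
```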