A Possible Universal Scouting Standard and System

Despite the apparent ease and benefit of a user-submission system for compiling basic team data, surprisingly few attempts have been made to extend the concept to additional, largely objective data useful for scouting. Instead, various disparate systems exist, each with a small following, and much redundant work is done.

I would thus like to present my vision of a universal scouting system and standard in three parts (see the replies, and a document to be posted later, for additional detail):

- Mechanical Specifications of Robot
- Performance Characteristics of Team
- Gameplay Performance of Team

Most of this information can fit either in a web database or in a simple CSV/XML/JSON file, possibly with custom delimiters. Additional information such as photographs or log files could be attached to the main file via a custom metadata field. With this system in place, it should be possible to produce datasheets and analyses using only custom input macros for the relevant software.

Furthermore, the problem of interoperability is largely removed by keeping the data in a generic plaintext format in the first place, and a dictionary file for objects and data entries would allow continued operation without significant reformatting or maintenance while keeping the main data file simple.
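As a purely illustrative sketch (none of these field names are final, and the specification document is still to come), a single record and its dictionary entries might look something like this:

```python
import json

# Hypothetical record for one team; every field name here is illustrative only.
record = {
    "team": 1234,
    "drivetrain": "swerve",
    "mass_kg": 50.3,
    "meta": {"photos": ["1234_front.jpg"]},  # custom metadata field for attachments
}

# Dictionary file mapping keys to human-readable descriptions, so the main
# data file stays simple. Note there is still no data typing, which is the
# limitation discussed below.
dictionary = {
    "team": "FRC team number",
    "drivetrain": "Drivetrain type, free text",
    "mass_kg": "Robot mass in kilograms",
}

print(json.dumps(record, indent=2))
```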

The only issue I see so far with this system is the lack of any data typing, which would create difficulties for data-processing programs. In exchange for ease of processing later, the system would also necessitate a considerable amount of manual labor during the early phases of development.

A document will be released tomorrow providing a complete specification of the format. All suggestions are welcome and will likely be accepted.

A private Git repository should be available by the end of this week, though as I am a senior preparing for my finals and the only developer of this system so far, this is an optimistic estimate and delays are very likely.

1 Like

I have worked a lot on data engineering projects, and let me tell you (as you can also witness here on CD), there is not a single standard that everyone can really agree on. Personally, I would start with one of the three parts and focus on that. Gameplay is always going to be determined by the game, while the other two are relatively static.
I was contemplating working on something like this previously, before life took over, so I’ll give you a few of my thoughts…
Think about creating tiers of data detail for increased flexibility, so that someone who collects at the most detailed tier can “merge” their data down and compare it against someone who collects at a less detailed level (see the sketch after the example below).

An example would be…
Tier 1: How many points in endgame?
Tier 2: How many robots hung?
Tier 3: How many robots hung on the left chain? The right chain? The center chain?
Tier 4: Which robot hung on which chain? e.g., Robot 1 hung on the center chain
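Something like this, purely illustrative (made-up field names, and endgame scoring simplified to 3 points per hang, ignoring parks and bonuses), is what I mean by merging down:

```python
# Roll detailed Tier 4 endgame data up to the coarser tiers.
# Field names and the 3-points-per-hang assumption are illustrative only.

# Tier 4: which robot hung on which chain (None = did not hang).
tier4 = {"Robot 1": "center", "Robot 2": "left", "Robot 3": None}

# Tier 3: how many robots hung on each chain.
tier3 = {}
for chain in tier4.values():
    if chain is not None:
        tier3[chain] = tier3.get(chain, 0) + 1

# Tier 2: how many robots hung in total.
tier2 = sum(tier3.values())

# Tier 1: endgame points (3 per hanging robot, under the assumption above).
tier1 = 3 * tier2

print(tier3, tier2, tier1)  # {'center': 1, 'left': 1} 2 6
```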

And as you will learn in the real world, take your timeline, double it, and add some more. I think I usually use a 150-250% multiplier based on how certain I am :slight_smile:

3 Likes

I was looking at this team’s documentation on a universal scouting data standard. It is interesting, but I think it is still a work in progress.

1 Like

I like the concept of a universal data format. However, the devil is very much in the details. Having contributed to IETF standards for 25+ years, I very much agree with @runneals’ comments above (especially about the timeline).

The biggest hurdle to a universal data format and its widespread adoption is how much demand there is for it. While there are various local or regional scouting alliances, they simply share their data in a single source such as Google Sheets or Excel, with no need to convert it. Also, lots of teams I’ve talked to still prefer to keep their scouting data “in house.” That can be for any number of reasons, including strategy, confidence in the data, or a preference to rely solely on themselves.

A universal scouting system that can be easily customized to each team’s needs and wants would be something many teams would consider adopting. The more they can focus on their data and how it’s used, and the less on building the entire system again, the more likely they would be to adopt it.

40 Likes

Very much true.

1 Like

I have read through the document and have seen their implementation of the standard on GitHub. While TPS is professionally done and very well structured, I do believe my system has some advantages, especially in terms of abstraction, accessibility, and modularity. I would not have posted this thread otherwise.

1 Like

The data system will easily allow for null values, and possibly even contradictory data.

After falling ill and being bedridden for several days due to jet lag and some heavy pre-exam cram sessions, here is a very, very rough draft, which I typed up in LaTeX in the span of a few hours. It is little more than a collection of thoughts so far. I might update it soon.
Proposal.pdf (124.6 KB)

My team actually had a bunch of debates about this during the season. We came to the conclusion that scouting is a broad field that, if done correctly, can be a real advantage. We disliked the idea because of all the individualized work we put into it this year. We had formulas and game-specific statistics that took the entire season to build, and we’d have wasted the time we spent creating that system if we’d shared it with everyone else.

OUR BIGGEST ISSUE was that we wouldn’t have been able to trust all the data coming in. We went to great lengths to ensure the accuracy of our data just within our team.

So, while I love this idea and I would 100% participate, I don’t believe there’s a truly fair OR fast process to do this for every game. :slight_smile:

I would strongly encourage you to look into The Purple Standard. It’s not universally accepted yet, but it’s as close as it gets.

3 Likes

Even with lots of training, it is still easy to get inaccurate scouting data. That can happen for lots of reasons, but it boils down to human error, which is not easily addressed by any scouting system or standard.

I used video footage to recheck the scouting data for just our robot at one event this season and found that 90% of the matches had incorrect data. There were several common minor mistakes, such as an incorrect Parked or Leave status. However, there were also more significant errors in ~30% of our matches, where our robot was under- or over-scored by as many as 3 Notes.

Even with multiple training sessions, we had some fairly significant issues that we still need to overcome. If anyone has suggestions or ideas for how they handle this on their team (or in their scouting alliance), I would love to hear them.

1 Like

Definitely an understandable struggle. We fact-checked all of our data, and most of it tended to be right. We think this is due to a few things:

  1. We emphasized the use of our DNR (Did Not Record) option instead of just making up or approximating data. This worked out REALLY WELL! A side bonus was that we could tell who didn’t pay attention during matches (we did have to take some people off the rotation).

  2. We put a lot of time and effort into making our form as concise and clear as possible for anyone using it. It still took training, but it was much more reliable than in previous years.

  3. Another thing we were conscious of during competitions was making sure people didn’t scout more than 3-5 matches at a time. They would then have AT LEAST a 5-match break, and we usually made sure they had a 10-match break.

Hope this helps! I’m also very curious about other ways teams have accomplished this! :+1:

1 Like

How did you fact-check your data? I’ve been working on some processes to do that with ours. I have an approximation I have been calling confidence/completion (detailed here: FRC Team 1710 | 2024 Build Thread | Open Alliance - #37 by CoreyBrown).

Thank you to everyone who has recommended The Purple Standard! Here’s the past CD thread about it and our Discord server invite, where we are definitely open to suggestions and improvements that ensure all FRC teams can utilize it. Our app development subteam is also working on some updates, so community input is much appreciated!

1 Like

Hey there, I’m one of the devs who worked on The Purple Standard! It’s definitely great to see more momentum for the concept of a universal scouting data standard. The TPS format is fully customizable and flexible based on the type of data you or any other scouting app wants to store. Since TPS is open source and seeks to foster community contributions, I’d love to see if there are any possibilities of working together so that we don’t create unnecessary competing standards (like the XKCD meme earlier in the thread). Feel free to send me a DM or ping me in the Discord server Tiffany linked above if you’re interested in discussing further!

Unfortunately, we had to fact-check most matches ourselves. We left some alone if they were done by experienced mentors or by our more frequent scouters. If there’s an easier way, we’d much prefer that… :laughing:

I don’t think there is a way to automatically validate data with 100% certainty. If there were, we wouldn’t need to scout!

I just calculate individual robot scores from our scouting data, add the TBA data, and then compare the totals.

Here is an example of simple validation:

TBA

  1. Blue Score = 92
    a. Blue Fouls = 0
    b. Blue Leave = 4
    c. Blue Co-op = 1
    d. Blue End Game = 2
  2. Red Score = 82
    a. Red Fouls = 10
    b. Red Leave = 6
    c. Red Co-op = 1
    d. Red End Game = 3

Scout Data

  1. Blue Robot A Note Points = 20
    a. Auto = 5
    b. Tele Amp = 1
    c. Tele Speaker = 14
    d. Trap = 0

  2. Blue Robot B Note Points = 36
    a. Auto = 15
    b. Tele Amp = 7
    c. Tele Speaker = 14
    d. Trap = 0

  3. Blue Robot C Note Points = 5
    a. Auto = 5
    b. Tele Amp = 0
    c. Tele Speaker = 0
    d. Trap = 0

  4. Red Robot D Note Points = 14
    a. Auto = 10
    b. Tele Amp = 0
    c. Tele Speaker = 4
    d. Trap = 0

  5. Red Robot E Note Points = 28
    a. Auto = 5
    b. Tele Amp = 5
    c. Tele Speaker = 13
    d. Trap = 5

  6. Red Robot F Note Points = 13
    a. Auto = 5
    b. Tele Amp = 0
    c. Tele Speaker = 8
    d. Trap = 0

Totals
Blue Calculated Total = 20 + 36 + 5 + 0 + 4 + 1 + 2 = 68
Blue Actual Total = 92
Blue Confidence = 1 - (|92 - 68| / 92) = .739 ≈ 74% Confidence

Red Calculated Total = 14 + 28 + 13 + 10 + 6 + 1 + 3 = 75
Red Actual Total = 82
Red Confidence = 1 - (|82 - 75| / 82) = .914 ≈ 91% Confidence

Each robot on the Blue alliance, and each scout assigned a robot on the Blue alliance, receives a .739 confidence/completion score for this match. Each robot and scout on the Red alliance receives a .914 confidence/completion score.

The confidence/completion scores can then be used to generate error bars for each robot’s aggregate score, reward particularly effective scouts, or even just understand where our app or scout training needs to be improved.
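In Python, the calculation is roughly this (the function name and data layout here are just illustrative, not the actual script):

```python
# Sketch of the match-level confidence score described above.
# Function name and argument layout are illustrative only.

def alliance_confidence(actual_score, scouted_note_points,
                        fouls, leave, coop, endgame):
    """Confidence = 1 - |actual - calculated| / actual."""
    calculated = sum(scouted_note_points) + fouls + leave + coop + endgame
    return 1 - abs(actual_score - calculated) / actual_score

# Blue alliance from the example above (robots A, B, C):
blue = alliance_confidence(92, [20, 36, 5], fouls=0, leave=4, coop=1, endgame=2)
print(round(blue, 4))  # 0.7391

# Red alliance (robots D, E, F):
red = alliance_confidence(82, [14, 28, 13], fouls=10, leave=6, coop=1, endgame=3)
print(round(red, 4))  # 0.9146
```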

Edit: I should probably relate this data validation back to the original thread. Having standards in place for scouting data is an important step toward sharing, viewing, and using other teams’ data. Another important step is trusting the data. Having some sort of data validation, even a simple one, goes a long way in earning the trust of those who want to use the published data.

1 Like

Ah, that’s a clever way to do it. Where is this system set up? We did something similar to check for duplicate entries (simpler, without multiple formulas), but that’s an interesting idea. Might have to play around with that during the offseason.

Currently, I just run this as a Python script after pulling our data out of our scouting app. I would love to get it baked into our scouting app in the near future.