Despite the apparent ease and benefit of a user-submission system for compiling basic team data, surprisingly few attempts have been made to extend this concept to additional, largely objective data useful for scouting. As a result, various disparate systems exist, each with a small following, and much redundant work is done.
I would thus like to present my vision of a universal scouting system and standard in three parts (see replies and a document to be posted later for additional detail):
-Mechanical Specifications of Robot
-Performance Characteristics of Team
-Gameplay Performance of Team
Most of this information can fit either in a web database or in a simple CSV/XML/JSON file, possibly with custom delimiters. Additional information such as photographs or log files could be attached to the main file via a custom metadata field. With this system in place, it should be possible to produce datasheets and analyses with only custom input macros for the relevant software.
Furthermore, the problem of interoperability is largely removed by keeping the data in a generic plaintext format in the first place, and a dictionary file for objects and data entries would allow continued operation without significant reformatting or maintenance while keeping the main data file simple.
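To make the idea concrete, here is a minimal sketch of what a single team record and its accompanying dictionary file might look like if JSON were chosen; every field name below is an assumption for illustration only, not part of any finalized specification.

```python
import json

# Purely hypothetical example record; every field name here is a placeholder,
# not part of any finalized specification.
record = {
    "team": 9999,
    "mechanical": {"drivetrain": "swerve", "mass_kg": 50.3},
    "performance": {"avg_cycle_time_s": 12.5, "max_speed_mps": 4.2},
    "gameplay": {"event": "EXAMPLE_EVENT", "matches_scouted": 10},
    "meta": {"attachments": ["photos/9999_front.jpg"]},
}

# A separate dictionary file could map the short keys to human-readable labels,
# keeping the main data file small and generic.
dictionary = {
    "mass_kg": "Robot mass in kilograms",
    "avg_cycle_time_s": "Average scoring cycle time in seconds",
}

print(json.dumps(record, indent=2))
print(json.dumps(dictionary, indent=2))
```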
The only issue I see so far with this system is the lack of any data typing, which would create difficulties for data-processing programs. The eventual ease of processing would also come at the cost of a considerable amount of manual labor during the early phases of development.
A document will be released tomorrow providing a complete specification of the format. All suggestions are welcome and will likely be accepted.
A private git repository should be up by the end of this week; as I am a senior preparing for finals and the only developer of this system so far, that is an optimistic estimate and delays are very likely.
I have worked on a lot of data engineering projects, and let me tell you (as you can also witness here on CD), there is not a single standard that everyone can really agree on. Personally, I would say to pick one of the parts and focus on that. Gameplay is always going to be determined by the game, while the other two are relatively static.
I was contemplating working on something like this previously, before life took over, and I'll give you a few of my thoughts…
Think about creating tiers of data detail for increased flexibility, so that if someone collects at the most detailed tier, they can "merge" their data down to compare it against someone who collects at a less detailed level; a rough code sketch of this merging follows the tier example below.
An example would be…
Tier 1: How many points in end game?
Tier 2: How many robots hung?
Tier 3: How many robots hung on the left chain? The right chain? The center chain?
Tier 4: Which robot hung on which chain? e.g., Robot 1 hung on the center chain
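Here is a rough Python sketch of that merging idea, using the chain example above; the record layout and the 3-point climb value are assumptions for illustration only.

```python
# Hypothetical most-detailed-tier records: which robot hung on which chain.
tier4 = [
    {"robot": "Robot 1", "chain": "center"},
    {"robot": "Robot 2", "chain": "left"},
]

CLIMB_POINTS = 3  # assumed point value per hanging robot, for illustration only

def to_tier3(records):
    """Tier 3: how many robots hung on each chain."""
    counts = {"left": 0, "center": 0, "right": 0}
    for r in records:
        counts[r["chain"]] += 1
    return counts

def to_tier2(records):
    """Tier 2: how many robots hung in total."""
    return len(records)

def to_tier1(records):
    """Tier 1: how many end-game points."""
    return to_tier2(records) * CLIMB_POINTS

print(to_tier3(tier4))  # {'left': 1, 'center': 1, 'right': 0}
print(to_tier2(tier4))  # 2
print(to_tier1(tier4))  # 6
```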
And as you will learn in the real world: take your timeline, double it, and add some more. I usually use a 150-250% multiplier depending on how certain I am.
I like the concept of a universal data format. However, the devil is very much in the details. Having contributed to IETF standards for 25+ years, I very much agree with @runneals' comments above (especially about the timeline).
The biggest hurdle to a universal data format and its widespread adoption is how much demand there actually is for it. While there are various local or regional scouting alliances, they simply share their data in a single source such as Google Sheets or Excel, with no need to convert it. Also, lots of teams I've talked to still prefer to keep their scouting data "in house." That can be for any number of reasons: strategy, data confidence, a preference to rely solely on themselves, and so on.
A universal scouting system that can be easily customized to each team's needs and wants is something many teams would consider adopting. The more they can focus on their data and how it's used, and the less on rebuilding the entire system, the more likely they are to adopt it.
I have read through the document and have seen their implementation of the standard on GitHub. While TPS is professionally done and very well structured, I do believe my system has some advantages, especially in terms of abstraction, accessibility, and modularity. I would not have posted this thread otherwise.
After falling ill and being bedridden for several days due to jet lag and some heavy pre-exam cram sessions, here is a very, very rough draft that I typed up in LaTeX over the span of a few hours. It is little more than a collection of thoughts so far; I may update it soon. Proposal.pdf (124.6 KB)
My team actually had a bunch of debates about this during the season. We came to the conclusion that scouting is a really broad field that, done correctly, can be a real advantage. We disliked the idea because of all the individualized work we put in this year: we had formulas and game-specific statistics that took the entire season to build, and we would have wasted that effort if we had shared the system with everyone else.
OUR BIGGEST ISSUE was that we wouldn't have been able to trust all the data coming in. We went to great lengths to ensure the accuracy of our data just within our team.
So, while I love this idea and I would 100% participate, I don't believe there's a truly fair OR fast process to do this for every game.
Even with lots of training it is still easy to get inaccurate scouting data. That can happen for lots of reasons, but it boils down to human error, which is not easily addressed by any scouting system or standard.
I used video footage to recheck the scouting data for just our robot at one event this season and found that 90% of the matches had incorrect data. There were several common minor mistakes, such as incorrect Parked or Leave status. However, there were also more significant errors in ~30% of our matches, where our robot was under- or over-scored by as many as 3 Notes.
Even with multiple training sessions we had some fairly significant issues that we still need to overcome. If anyone has suggestions or ideas for how they handle this on their team (or in their scouting alliance), I would love to hear about them.
Definitely an understandable struggle. We fact-checked all of our data, and most of it tended to be right. We think this is due to a few things:
We emphasized the use of our DNR (Did Not Record) option instead of making up or approximating data. This worked out REALLY WELL! A side bonus was being able to tell who wasn't paying attention during matches (we did have to take some people off the rotation).
We put a lot of time and effort into making our form as concise and as clear as possible for anyone who was using it. It still took training, but it was much more reliable than previous years.
Another thing we were conscious of during competitions was making sure people didn't scout more than 3-5 matches at a time. They would then get AT LEAST a 5-match break; we usually made sure they had a 10-match break.
Hope this helps! I'm also very curious about other ways teams have accomplished this!
Thank you to everyone who has recommended The Purple Standard! Here's the past CD thread about it and our Discord server invite, where we are definitely open to suggestions and improvements we can make to ensure all FRC teams can utilize it. Our app development subteam is also working on some updates, so community input is much appreciated!
Hey there, I'm one of the devs who worked on The Purple Standard! It's definitely great to see more momentum for the concept of a universal scouting data standard. The TPS format is fully customizable and flexible based on the type of data you or any other scouting app wants to store. Since TPS is open source and seeks to foster community contributions, I'd love to see if there are any possibilities of working together so that we don't create unnecessary competing standards (like the XKCD meme earlier in the thread). Feel free to send me a DM or ping me in the Discord server Tiffany linked above if you're interested in discussing further!
Unfortunately we had to fact-check most matches ourselves. We left some alone if they were done by experienced mentors or our more frequent scouters. If there's an easier way, we'd much prefer that…
I donāt think there is a way to automatically validate data with 100% certainty. If there was, we wouldnāt need to scout!
I just calculate individual robot scores from scouting data, add the TBA data, and then compare the scores.
Here is an example of simple validation:
TBA
Blue Score = 92
a. Blue Fouls = 0
b. Blue Leave = 4
c. Blue Co-op = 1
d. Blue End Game = 2
Red Score = 82
a. Red Fouls = 10
b. Red Leave = 6
c. Red Co-op = 1
d. Red End Game = 3
Scout Data
Blue Robot A Note Points = 20
a. Auto = 5
b. Tele Amp = 1
c. Tele Speaker = 14
d. Trap = 0
Blue Robot B Note Points = 36
a. Auto = 15
b. Tele Amp = 7
c. Tele Speaker = 14
d. Trap = 0
Blue Robot C Note Points = 5
a. Auto = 5
b. Tele Amp = 0
c. Tele Speaker = 0
d. Trap = 0
Red Robot D Note Points = 14
a. Auto = 10
b. Tele Amp = 0
c. Tele Speaker = 4
d. Trap = 0
Red Robot E Note Points = 28
a. Auto = 5
b. Tele Amp = 5
c. Tele Speaker = 13
d. Trap = 5
Red Robot F Note Points = 13
a. Auto = 5
b. Tele Amp = 0
c. Tele Speaker = 8
d. Trap = 0
Totals
Blue Calculated Total = 20 + 36 + 5 + 0 + 4 + 1 + 2 = 68
Blue Actual Total = 92
Blue Confidence = 1 - (|92 - 68| / 92) = .739 ≈ 74% Confidence
Red Calculated Total = 14 + 28 + 13 + 10 + 6 + 1 + 3 = 75
Red Actual Total = 82
Red Confidence = 1 - (|82 - 75| / 82) = .914 ≈ 91% Confidence
Each robot on the Blue alliance, and each scout assigned to a Blue-alliance robot, receives a .739 confidence/completion score for this match. Each robot and scout on the Red alliance receives a .914 confidence/completion score.
The confidence/completion scores can then be used to generate error bars for each robotās aggregate score, reward particularly effective scouts, or even just understand where our app or scout training needs to be improved.
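For anyone curious what this could look like as code, here is a minimal Python sketch of the confidence calculation described above; the function name and inputs simply mirror the worked example and are not taken from any particular scouting app.

```python
# Hypothetical sketch of the confidence calculation; numbers mirror the worked example above.

def alliance_confidence(actual_score, robot_note_points, fouls, leave, coop, end_game):
    """Compare a TBA alliance score against the total reconstructed from scouting data."""
    calculated = sum(robot_note_points) + fouls + leave + coop + end_game
    confidence = 1 - abs(actual_score - calculated) / actual_score
    return calculated, confidence

# Blue alliance: three robots' scouted Note points plus the TBA breakdown values.
blue_calc, blue_conf = alliance_confidence(
    actual_score=92, robot_note_points=[20, 36, 5], fouls=0, leave=4, coop=1, end_game=2
)
# Red alliance, same structure.
red_calc, red_conf = alliance_confidence(
    actual_score=82, robot_note_points=[14, 28, 13], fouls=10, leave=6, coop=1, end_game=3
)

print(blue_calc, round(blue_conf, 2))  # 68 0.74
print(red_calc, round(red_conf, 2))    # 75 0.91
```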
Edit: I should probably relate this data validation back to the original thread. Having standards in place for scouting data is an important step in sharing, viewing, and using other teams' data. Another important step is trusting the data. Having some sort of data validation, even if it is simple, goes a long way in earning the trust of those who want to use the data that is published.
Ah, that's a clever way to do it. Where is this system set up? We did something similar to check for duplicate entries (simpler, without multiple formulas), but that's an interesting idea. Might have to play around with that during the offseason.
Currently, I just run this as a Python script after pulling our data down out of our scouting app. I would love to get it baked into the app itself in the near future.