The Purple Standard: A Unified and Community-Driven Standard for FRC Scouting Data

As we worked with 140+ teams last season on scouting collaboratively through The Purple Warehouse, we realized that it’s still difficult for many teams to change their data collection system to a single communal scouting app like The Purple Warehouse, preventing easy data sharing between teams using different scouting apps. We were able to work with several teams to manually write reformatting scripts, converting their data to our app’s format so that they could contribute to and access The Purple Warehouse’s pool of shared scouting data. From our experiences this past season working with many different incompatible scouting data formats, we want to propose a better solution.

So, we are announcing The Purple Standard (TPS), a unified and community-driven standard for FRC scouting data. TPS includes a customizable data format and framework that supports an unlimited number of properties that fall into a set of data collection interfaces. The hope is that every scouting interface can support a TPS data export option so that every team can easily share and access data in a collaborative manner. Additionally, we will be releasing an API in the coming weeks to allow easy access for all shared data through The Purple Warehouse, which also performs analysis and accuracy checks on scouting data.


(from xkcd)

To ensure we don’t accidentally start an unnecessary standards war, we want your input and collaboration for TPS! Over the next few weeks, please share your thoughts and propose any changes/additions for TPS either by replying to this thread, sending suggestions on our Discord server, or opening issues/PRs on the TPS GitHub repo. If you’re building a scouting app, please reach out so we can learn more about how to best help you with TPS integrations.

We are Amplified :loud_sound: about the new season and can’t wait to see how much data we can collect, share, and analyze together this year!

- TPW Team (Kabir, Omkar, Deeya, Albert, Mikhil, Arturo), a part of Harker Robotics team 1072

10 Likes

Looks interesting, initial thoughts:

I’d recommend adding more documentation about format requirements etc.
Ex: when you say timers, are they in seconds or milliseconds, and is the value an integer or a decimal?
When you’re talking about a warehouse of conversion scripts between internal team data formats that are conforming to this spec, these kinds of things are important.

Are any of the categories nullable? If I don’t collect ability data can I just leave abilities section null?

What kind of data formats are acceptable? Is there specific metadata that teams should always have?
The document recommends starting from the TPW example but most teams will already have a format that they are starting with, so specifying what parts are actually important for aggregate use would be nice.

The only thing you’re not capturing here is any time series data, which in some games / to some teams is important, but aggregating that is a pain and I can understand why you would want to avoid it.

2 Likes

Great questions, and thank you so much for your valuable feedback!

  1. In the reference table and the values section of each timer property’s document, the unit of integer milliseconds is specified for the timers that are currently added in the repository.
  2. Yes, it is expected that any parser implementation should handle cases in which a certain section is null or not included. The TPW-based parser and API will handle this case, and we will make it clearer in our documentation that those fields can be null.
  3. The example format can be modified completely depending on the type of data collected by each team. Any fields can be replaced with different properties that teams use for collecting their own data (and they can contribute these properties to the TPS repo through opening a PR), as long as the overall structure of the JSON stays consistent. We can also make this clearer in our documentation by providing a more generalized example. The hope is that people can use existing properties from the TPS reference table and contribute documentation for new properties that don’t exist in TPS yet, that way everyone uses the same format and everything can be interoperable.
  4. Any team that wants to collect time series data is more than welcome to make a PR to the TPS repo with the details of their property, that’s the beauty of making this an open-source, community-driven initiative!
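As a rough illustration of points 1 and 2, a parser could treat a missing or null section as empty and check that timers are integer milliseconds. This is only a sketch under assumptions: the section names in `SECTIONS`, the `hypothetical-timer` property, and the `parse_entry` helper are made up for this example and are not taken from the TPS repo.

```python
import json
from typing import Any

# Hypothetical section names; the real TPS spec defines its own categories.
SECTIONS = ("metadata", "abilities", "counters", "timers")


def parse_entry(raw: str) -> dict[str, dict[str, Any]]:
    """Parse one scouting entry, treating null or missing sections as empty."""
    entry = json.loads(raw)
    parsed = {}
    for name in SECTIONS:
        section = entry.get(name)
        # A section may be absent or explicitly null; both become {}.
        parsed[name] = section if isinstance(section, dict) else {}
    # Timers are expected to be whole milliseconds, so reject floats and bools.
    for key, value in parsed["timers"].items():
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"timer {key!r} must be integer milliseconds")
    return parsed


example = (
    '{"metadata": {"team": 1072}, "abilities": null, '
    '"timers": {"hypothetical-timer": 2500}}'
)
result = parse_entry(example)
print(result["abilities"])  # null section becomes an empty dict
print(result["counters"])   # missing section becomes an empty dict
```

Normalizing both cases to an empty dict means downstream code never has to distinguish "null" from "not included".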

Ok, here’s my thing. I would gladly supply data, but I want it to be as easy as possible for me lol. I’m sure others feel the same way. They don’t want to do a lot of work to give you their data.
The user experience should be as easy as possible. I recommend a GUI or app of some sort where you can provide an easy way to upload data and convert it into TPS. You should be able to:

  • Add or define data points in one of TPS’s categories
  • Add all of the data associated with that data point to its category.
  • Insert a mass amount of data in as many formats as possible: JSON, raw text, CSV, XLSX, TSV, XLSM, etc.
  • Organize it by uploaded columns, rows or whatever other way teams want to organize data (if they uploaded file type with that type of metadata)
  • Convert all this organized data to TPS’s JSON format and upload it or send it to your database.
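To make the idea concrete, here is a minimal sketch of what the core of such a converter might look like for the CSV case. Everything in it is an assumption for illustration: the column names, the `COLUMN_MAP` table, the section and property names, and the nested output layout are hypothetical and not taken from the TPS spec, and it assumes every collected value is an integer.

```python
import csv
import io
import json

# Hypothetical mapping from a team's CSV column names to (section, property).
# Neither the column names nor the section names come from the TPS spec.
COLUMN_MAP = {
    "Team": ("metadata", "team"),
    "Match": ("metadata", "match"),
    "Auto Notes": ("counters", "auto-notes"),
    "Climb ms": ("timers", "climb-time"),
}


def convert_csv(text: str) -> list[dict]:
    """Turn each CSV row into a nested dict keyed by section, then property."""
    entries = []
    for row in csv.DictReader(io.StringIO(text)):
        entry: dict[str, dict] = {}
        for column, (section, prop) in COLUMN_MAP.items():
            value = row.get(column)
            if value is None or value == "":
                continue  # leave uncollected properties out entirely
            # Assumes integer values; a real tool would need per-column types.
            entry.setdefault(section, {})[prop] = int(value)
        entries.append(entry)
    return entries


sample = "Team,Match,Auto Notes,Climb ms\n1072,12,3,4200\n254,12,,3900\n"
converted = convert_csv(sample)
print(json.dumps(converted, indent=2))
```

A GUI would mostly be a way for teams to build the mapping table themselves instead of editing code.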

Instead of asking teams to come to you, you need to come to teams. Instead of asking tons of people to change their formats to yours, have something where they can easily upload their format and convert it to yours.

This is with no malice, just trying to help improve; convincing a ton of teams to change to your data format is going to be super difficult. If you write an app to help reformat data, you will attract a lot more people.

On a different note, XSS is a super big deal, and addressing it with just a small paragraph at the bottom of your documentation warning about it is unacceptable. Any data your database receives should be sanitized immediately so someone doesn’t get attacked using your platform (I don’t know if you do this already, but it should be done if you don’t).

There should be large warnings everywhere, including in this post, about it being a possibility, and it may even be worth using a different data format entirely to prevent this. Additionally, you should make a tool to scan files and verify that the data is in fact scouting data in TPS and not a malicious attack.
Hope this helps :slight_smile:

6 Likes

I would similarly suggest a conversion tool that can go between TPS and whatever [app] spits out (sneaky edit: ) or on the other hand whatever [app] takes in.

3 Likes

This is a good idea; an export feature or script that includes the filetypes mentioned would be great.

1 Like

Thank you for your feedback! We have been reaching out to scouting app developers over the past week to work directly with them on integrating TPS into their apps, including offering help with any conversion scripts that may be necessary to add TPS-compatible export options. In the past, we’ve written conversion scripts to help teams convert their data into a format that works with our scouting app, but we’ve run into issues with incompatibility in some areas and also limited time at competitions to convert data quickly, so the hope is to distribute the workload across different scouting apps (so that one single team doesn’t have the burden of writing tons of converters) while also ensuring that the integration can be done up-front and in an interoperable way.

In the future, we may create a conversion interface that can help people structure their data into a TPS-compatible format, but for now we’re hoping to integrate with scouting apps directly so that there is no additional effort needed by teams to convert data and it can be a one-time integration into a scouting app (which is often used by more than one team).

As for your concerns about XSS, we do understand that this is an important consideration (several members of our development team have extensive cybersecurity backgrounds, ranging from experience building cryptographically-secure protocols to working at established cybersecurity companies), which is why we included the XSS section in the first place. It’s worth noting that XSS attacks would only be applicable in certain environments (for example, viewing a CSV file or plain text would not result in any XSS attacks, so the data in this case should just be shown as-is without sanitization). We have provided the code necessary to sanitize data for the most common case of displaying data in HTML/DOM (replacing HTML tags with equivalent character entities, which will make any HTML tags display as plain text and eliminate potential for XSS attacks).

Of course, we will be working with app developers to ensure that their specific implementations are not subject to XSS attacks, and our API will check that submitted data is properly formatted as TPS data, but we can’t sanitize on the API directly as there is no universal solution that can work with every implementation (for example, sanitizing data for HTML output will cause plain-text viewers to improperly display the data). We feel that spreading extraneous warnings to users is not responsible and could cause unnecessary alarm and mistrust of scouting data when the risk is negligible for scouting apps that have implemented basic XSS prevention techniques (which should be the standard for any app displaying untrusted output in HTML, and is the case for most modern frameworks).
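The distinction between sanitizing at storage time and at display time can be sketched as follows. This is not the code from the TPS documentation; it is a small illustration using Python's standard `html.escape`, and the `record` contents and the two `render_*` helpers are hypothetical.

```python
import csv
import html
import io

# Hypothetical stored record; the notes field holds untrusted scout input.
record = {"team": "1072", "notes": '<img src=x onerror="alert(1)"> fast & reliable'}


def render_html_row(rec: dict[str, str]) -> str:
    """Escape every value at the moment it is placed into HTML."""
    cells = "".join(f"<td>{html.escape(value)}</td>" for value in rec.values())
    return f"<tr>{cells}</tr>"


def render_csv_row(rec: dict[str, str]) -> str:
    """Write values unchanged, as suggested above for CSV or plain text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(rec.values())
    return buffer.getvalue()


html_row = render_html_row(record)
csv_row = render_csv_row(record)
print(html_row)  # tags appear as character entities, so they show as text
print(csv_row)   # original characters are preserved
```

If the escaped form were stored instead, the CSV output would contain entities such as `&lt;` rather than the text the scout actually typed, which is the display problem described above.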

Please let us know if you have any other suggestions or feedback, and we’re happy to work with you and any scouting apps to securely and safely integrate TPS in the easiest way possible!

1 Like

I am glad to see you have taken good steps to prevent xss attacks and I hope you continue down that path as you create more integrations with other scouting apps.

On the topic of integrating scouting apps, I feel like you will get a lot more data with significantly less work by making a public converter, as it’s significantly easier imo than changing an app. The issue I believe you will run into as well is that writing tons of integrations will become unnecessarily time consuming at some point. I believe your time would be more effectively spent on something all teams can use instead of individual code for each team.

I would also appreciate it if you could put out some of the scripts you have already created in case people can modify them themselves to work.

I really support this idea; I just believe it needs to be more easily accessible. Some teams will have trouble modifying their app, some teams run offline apps (which I’m sure you could create code for, but I feel like it’s much easier to just dump data into an app that gets sent to you), and there are several other reasons. I just think it would be better to work on something public than with each individual team, and I hope you look more into a public system.

In any matter I really love y’all’s current app, the effort y’all are putting into the things y’all do, and I love to see this continue into the future with this great community scouting.

1 Like

The timing of your post was a bit unfortunate as it did not give everyone a chance to look it over and consider it before starting their 2024 work.

I was too busy after kickoff to keep up with threads like this on CD. Once I finish some post-season tasks I look forward to diving in and taking a closer look. I joined the Discord server so I’ll post any questions or comments there to the appropriate channels.

2 Likes

Thank you for your interest in The Purple Standard, feel free to reach out if you have any questions about it!

1 Like
