Should crowd sourced data be published in real time? [Poll + Discussion]

Context:
I am the developer of this scouting system: First Release of Analytics App for FRC MScout System
Currently I am in the planning stages of the next iteration of the scouting system (with a better name too).
A colleague and I have been discussing crowd sourced data and how publishing should work. In this next iteration, I’m looking to expand the scouting app to a cloud based model (still retaining full offline functionality, of course). In this cloud based model, data gathered by scouts would eventually be published to a public database (think TBA, but crowd sourced with a standardized quantitative scouting system). My colleague and I disagree on how to handle this, so I wanted to open this discussion to gather new perspectives and experiences before making a decision.

Opinion #1: Data created by the scouting system should be published as soon as possible (real-time)

The idea is that teams are still incentivized to do first hand scouting if they have the resources to do so, because first hand accounts are more reliable than second hand accounts from other teams. The scouting system serves to empower teams that do not have the resources to run a full scouting team (like rookie teams) and to make it easy to form scouting alliances. This approach addresses the free-rider problem in two ways: data gathered by your own team is more reliable than second hand data, and only quantitative data is shared publicly, so teams still have a reason to scout for themselves.
This opinion holds that publishing the data is a condition of using the scouting system. Private ownership of quantitative data is not recognized, since the whole point of the scouting system is crowd sourced data.

Opinion #2: Data created by the scouting system should be published by the scouting team

The idea is that the free-rider problem causes teams to stop scouting entirely when they deem the data of other teams as better than their own. It also asserts the idea that the data created is, at least in some part, the property of the scouter, not the developer. Scouting teams should be in control of who gets access to the data, not the developer. The crowd sourcing relies on the good will (does GP apply here?) of teams instead of the automatic publishing system.

Compromise:

The compromise solution is that data should be published automatically after the event has concluded. This allows for teams to have temporary control of data during the event and public publishing afterwards.
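
As a rough illustration of the compromise, here is a minimal Python sketch of the visibility rule. The names `ScoutingReport`, `shared_with`, and `can_view` are hypothetical and not part of any existing app; it assumes each report knows its owning team and when its event ends.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ScoutingReport:
    owner_team: int                      # team whose scout recorded the report
    event_end: datetime                  # when the event concludes (assumed known)
    shared_with: frozenset = frozenset() # scouting alliance partners, if any


def can_view(report: ScoutingReport, viewer_team: int, now: datetime) -> bool:
    """Owners and their partners can read at any time; everyone else waits
    until the event has concluded."""
    if viewer_team == report.owner_team or viewer_team in report.shared_with:
        return True
    return now >= report.event_end
```

With this rule, the owning team keeps control during the event and the data turns public automatically afterwards, without anyone having to press a publish button.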

Discussion:
Which opinion do you think is a good way to proceed with a crowd sourced scouting system and why? Do you have a different view? What experiences do you have with sharing scouting data?

Note:
I agree more with opinion #1 than #2, so I may have misrepresented #2 unintentionally. If you agree with opinion #2, please correct me.
I appreciate any and all feedback regarding this topic and I thank you for your time.

  • Opinion #1
  • Opinion #2
  • Compromise
  • Other

I like the idea of publishing in real time, but would be worried about the veracity of the results, if relying on it. It seems that it would be relatively easy for someone to feed in bad data, even if unintentional. I’ve often seen unconscious biases from teams when they attempt to scout themselves, and all it would take would be one individual frustrated that their robot wasn’t ranked where it “should” be ranked to go in and submit a bunch of made up data to impact the results. This sort of a concern would push you towards option 2, where the data wouldn’t have an impact on the event itself. But when you do that, you lose a lot of the utility of having such data published. It’s an interesting problem to try to solve!

I think FIRST/TBA should start publishing all the gathered Field Data. They log start location, sandstorm line cross and ending location for every team individually. This would greatly increase the information available, while not being enough to remove the need to scout.

Interesting…
Having bad data in the system is definitely a problem. I have thought of a few ways to increase the accuracy of aggregated data, including verifying data against TBA field data, a per-scout “reputation” system, and allowing the end user to exclude and filter data. Finding a good solution will be the challenge.

FIRST/TBA already do. In fact, TBA gets its data from FIRST FMS data! If you look at the TBA API, you can get a lot of per bot data, but it doesn’t track everything (like number of cargo scored on rocket level 1 by robot #1 on blue alliance)

Correct. The issue really is that the data the field needs is per alliance, not per robot. The field doesn’t care who scored that hatch panel or cargo, just that it was scored, and where it was scored. Scouters want more detail, but the field doesn’t need it in order to calculate the scores and ranking points.

The only reason the field cares about where each robot starts and which robots cross the line in auto is because that auto bonus depends on both. It’s both easier and more reliable to have the scorers put in the individual robot starting positions, and then mark which ones crossed the line. Asking them to remember where each one started when they mark which ones cross would lead to too many issues.

The field also cares about which teams get yellow/red cards, so it can appropriately carry that information forward for future matches, so that data is always provided.

Any scouting system that integrates data from the FRC Events API (or TBA) needs to consider carefully what data the field needs and what it doesn’t - by doing so, you can usually get a good idea of what will be available long before they announce the final API changes for that season. It’ll tell you what you can use to verify scouter data, and where you need your scouters to focus!
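
To make the verification idea concrete, here is a minimal Python sketch. It assumes the per-alliance totals have already been fetched from the FRC Events API or TBA and stored in a plain dictionary; the function name, the dictionary shapes, and the team numbers in the usage note are all made up for illustration.

```python
def find_mismatched_matches(scouted, field_totals, tolerance=0):
    """Compare summed per-robot scouting counts against per-alliance field totals.

    scouted:      {(match, alliance): {team_number: scouted_count}}
    field_totals: {(match, alliance): total_reported_by_field}
    Returns a list of (key, scouted_sum, field_total) for every disagreement.
    """
    mismatches = []
    for key, per_robot in scouted.items():
        if key not in field_totals:
            continue  # nothing to verify against for this match/alliance
        scouted_sum = sum(per_robot.values())
        if abs(scouted_sum - field_totals[key]) > tolerance:
            mismatches.append((key, scouted_sum, field_totals[key]))
    return mismatches
```

For example, if scouts recorded 2, 2, and 0 hatch panels for the three red robots in a match but the field reports 5 for the red alliance, that match is flagged for review. The check cannot say which scout was wrong, only that the three reports together disagree with the field.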

We’re a team that has a scouting app available for other teams as well, and we believe that while publicly sharing the platform to scout is key, publicly sharing the data between teams is not. We pull from TBA for general info, but also heavily rely on individually scouting each robot on an alliance. If teams want to make a scouting alliance, they can, but that’s their choice, and should not be a feature you can’t opt out of. We believe this for a few reasons:

  1. It’s our scouts, our time, our reward. We dedicate 6 students per match (1 per robot) for an entire event; that’s a lot of man hours. We, and any other team that uses our platform, put in the time in order to have a leg up on the competition. I’m all for sharing this data after the competition, but not during.

  2. Bad data is problematic. We occasionally get bad data, but we have a name attached to a scouting report, and can go to that student and tell them what they need to improve. If some random scout on another team keeps feeding in bad data, we can’t do anything to correct it.

  3. Inconsistent subjective data is also bad. We have several metrics that are on a 1-5 scale, and we specifically coach our scouts on how to rate these. Things like stability, reliability, driver skill, and human player skill. If we have data from other teams, those numbers will vary wildly from the standard we set with our own team.

  4. Allowance for custom metrics. We allow any scouting group to add whatever custom metrics they want (and they can name them whatever they like). If one group wants to track just hatches, but another group wants to track each specific level of hatch placement, they can both do that. Or if one team wants to track how friendly another team’s drive team is, they can add that in mid-competition. This, however, makes pooling the data on the fly nearly impossible. We believe in making our own decisions regarding metrics, and we want others to be able to do the same.

  5. Self reliance is expected. Obviously FIRST is all about Gracious Professionalism, and helping other teams is a pillar of this community, but I feel that going to a competition knowing ahead of time that you will be continually leeching off another team’s work, while having the ability to do it yourself, is abusing that courtesy. Just because 1798 can build you bumpers in their pit doesn’t mean you shouldn’t try to make bumpers ahead of time.

Our goal is to empower teams to gather the data they want and control it. It seems like that’s not quite the goal this scouting app has though, as it’s more about pooling data and visualizing it. If all teams are inputting data to the same database, then all teams should have access to it in real time. Maybe make it so you have to have put in x number of scouting reports before you can view the data (a low number, like 3).
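
The "put in x reports before you can view" idea could look something like the Python sketch below. The class name `ContributionGate` and the default threshold of 3 are only illustrative (3 is the "low number" suggested above), and it assumes reports are already tied to a team number.

```python
from collections import Counter


class ContributionGate:
    """Unlock read access once a team has submitted enough scouting reports."""

    def __init__(self, min_reports=3):
        self.min_reports = min_reports
        self._counts = Counter()  # team number -> reports submitted so far

    def record_report(self, team):
        self._counts[team] += 1

    def can_view(self, team):
        return self._counts[team] >= self.min_reports
```

A team that has submitted two reports is still locked out; the third report unlocks the pooled data. This only counts submissions, so it does nothing about the quality of what was submitted.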

I’m not sure why the data needs to be published to a public database. You can make the database private and then give every team that is part of the scouting alliance their own credentials to be able to pull data from the database whenever they want.

If you form scouting alliances (in order to bolster the number of scouts available for small teams), then, in exchange for contributing the scouts, the team gets access to the data. If you don’t make it a requirement to provide scouts, and give the teams access to the data, no one is going to contribute the scouts to the system.

In other words, before you can crowd-source the data, you need to source the crowd. The crowd comes from other teams and those teams need an incentive to supply their portion of the crowd. The incentive is the data itself.

The data needs to be available real time. You could offer it on a push basis where the team comes and requests a data push from your scouting lead. But it is easier to set up a pull system where they can pull the data from the database whenever they want to. If they want to strategize their next match and want some data to support that, they can pull the latest and have it with them in queue. If they want to keep track of the data on the second day of a district event (after they have put their picklist together the previous night), they can periodically pull the data and look for any notable changes that might need to be considered. Bottom line is that real time data is extremely valuable. If you are designing a scouting system, this should be high on your list of desired features.

TL;DR - real time data in a private database that can be pulled at any time by any of the teams that are part of the scouting alliance.

The reasoning behind the public database idea is that the app could be a TBA like database, but with crowd sourced data. We have discussed requiring team verified accounts before access (i.e. make it semi-private) but I don’t know if this will draw in teams evaluating scouting platforms. Thanks for the feedback though, I’m still struggling to form concrete policies to try to best execute my vision for the platform.

Hey Alex, our team did use your programs last season. I have thoughts there, but I’ll wait until we are into scouting this year because I’d have to check again what issues we had, and how much of it was our unpreparedness.

I’d be willing to try an open data system, even if some teams put little effort into gathering the data. I mean, I put almost zero effort into collecting FMS/TBA data. Now, I get the desire for privacy; it is a contentious idea to “give away your work for free”. To reach the greatest audience, I would suggest providing a private version. Given the goal of a public database, I think the private version should be opt-in. Also, make this clear upfront.

Now the other problem: free data will end up with too many freeloaders and not enough data collectors. I truthfully could do away with data collection completely and focus on analysis. After all, I want the students to think and apply their minds to the problem, and counting the number of hatch panels for 8 hours doesn’t inspire that. But I also think anyone who has scouted realizes that the better analyst is watching lots of matches, and combining the data with the knowledge and intuition they’ve gathered. There is still reason to scout, even if you could just download a dataset.

But maybe that isn’t enough. Events are small and even a small number of freeloaders could render the dataset incomplete. Not to mention we are allowing private users, who aren’t adding to the public data, so the pool of scouts is already diminished. This is truly the second part: there should be some additional incentives to attract a larger pool. I’ve thought of a top match scout award, maybe with some type of prize. This might push the most active scouts to contribute enough to get complete data. It could be team-based. It could also be referral-based, where you would get more points for referring other scouts when they complete reports (like a pyramid scheme).

Final point, comments are a concern, so I’d probably make them private if you have a comment section. Custom data is another area, where you can do what makes sense. Maybe some of it could be made public, but I’d have to think more about how.

Totally agree with the comments. If the app will handle qualitative data (comments, subjective ratings, etc.), those will totally be private by default.
Interesting point you bring up with wanting more analysis. This has been my driving vision for the app, empowering people to do analysis and stop worrying about how to collect the data.
Top scout awards? That’s an interesting idea, I might implement that. However, people may try to game the system if the way to get the award is not well thought out.

This thread has got me thinking about how to design a public input scouting system and still get quality results. Obviously, we all know the potential for junk data even with a trained scouting team, and that potential expands greatly when teams are using a system they are unfamiliar with and with no quality control regarding the data entry. There’s also the issue of “freeloaders” that has been mentioned before.

What if it was “gamified” to an extent, where users (or teams) earn credits based on how many matches they scout, and can spend those credits to purchase scouting data on other teams and/or matches? Rookies and low resource teams could be granted waivers or credit multipliers to help balance things out. That creates an incentive at least to enter match data and avoids the freeloader issue, but it could also incentivize the collection of “bad data” as teams try to gain credits for matches they didn’t actually scout.
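
A credit system like this might be sketched in Python as follows. Everything here is hypothetical (the `CreditLedger` name, one credit per match scouted, the multiplier mechanism for rookies), and it deliberately ignores the bad-data problem just mentioned.

```python
class CreditLedger:
    """Track credits earned by scouting and spent on viewing data."""

    def __init__(self):
        self._balances = {}     # team number -> current credits
        self._multipliers = {}  # team number -> earning multiplier (default 1.0)

    def set_multiplier(self, team, multiplier):
        # e.g. give rookie or low resource teams a boost
        self._multipliers[team] = multiplier

    def earn(self, team, matches_scouted=1):
        gained = matches_scouted * self._multipliers.get(team, 1.0)
        self._balances[team] = self._balances.get(team, 0.0) + gained

    def spend(self, team, cost):
        """Deduct cost and return True, or return False if the team cannot afford it."""
        if self._balances.get(team, 0.0) < cost:
            return False
        self._balances[team] -= cost
        return True

    def balance(self, team):
        return self._balances.get(team, 0.0)
```

A rookie team with a 2x multiplier that scouts two matches would hold four credits, enough to buy data priced at three but not a second purchase priced at two. How to price data and set multipliers is the hard policy question; the bookkeeping itself is simple.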

In order to avoid this, I think you would need to associate users with teams (and have a way for teams to verify that users are really part of their team, such as an invitation system) to put team reputations on the line. I think you would also need to have a way to report and reject bad data entries. There’s also the potential for some sort of reputation or ranking system, where quality scouting data is recognized.

Very well put and exactly my feelings on the matter.

I do have a way to tie users to teams. FIRST has a per-team account that you can add a Facebook account to. That Facebook account can be used to log in and become the “source of trust”, and all decisions taken by that account can be considered official team actions (inviting a user to the team group, etc.).
I do want to avoid “gamifying” the system too much though, as this could really incentivize very bad behavior, but that is an idea I’ll consider.

We have been doing scouting alliances with other teams at events for the past 4 years (before that we really did not do any real methodical scouting). The past 2 years (2018 and 2019) we have supplied the app. It has been a struggle to get people to commit the time to do the scouting in a consistent way so that the data in our database is valid. In general, we have needed to incentivize their involvement and the easiest way to do that is with the database. Keeping the data private and only sharing it with the teams that help scout is the only way that we have been able to make sure that those teams provide scouts.

I like some of the other “incentives” that have been discussed here, but all of them involve some layer of earning your way into the inner circle of the dataset. So at the end of the day, it is still a pay to play system. I would personally rather try to manage an “in or out” private system than layers of data within the database - some layers being public and others being private. I think it is much easier to say to a team “we will give you the full dataset if you provide scouts” and leave it at that.

We try to reach out to as many teams as possible before an event to build a scouting alliance for an event. If every team at that event agreed to be a part of the alliance, then everyone at the event would have access to the data. So, you could get to a data set that was effectively public. But our experience so far is that we are lucky to get more than 3 or 4 teams to join our scouting alliance and we still struggle to get 6 scouts in the stands for every match. If we took away the carrot of the data, we would have far fewer scouts than we get today.

Would publishing the data after the event still allow for the incentive to exist? Like, during the event, data would be in a “private” database that turns public after the event is over. Also, when do you publish the app? My aim is to have a working beta by end of week 1 of build season (last year, the collection system was done on kickoff day) through modular design. Would publishing early solve the problem of having people struggle to commit to a particular scouting app?

The primary structure of our scouting system (framework for the app and the framework for the database) is something that we try to get done in the offseason. Once we know the game, we have a pretty good idea of what data we will want to scout for, but I don’t think we really try to get the app complete until closer to week 0. We have made it available for other teams to use for their own in-house scouting, but I am not sure if anyone has actually used it or not. If we were to put this out publicly earlier in the build season, there might be more takers. I’m not sure.

So, the primary use of our app is for our own scouting program. We do crowd source the data by forming scouting alliances. For each of the events, we will start contacting teams a week or two before the event to see if anyone wants to join our scouting alliance. We typically scout some early events by watching the livestream on Twitch to test our app, and based on the results we typically tweak the layout and button options of the app up until a few days before our first event. We don’t really publish the app so much as make it available to download on the first day of the event for all the people joining the scouting alliance.

We have not officially published the data publicly after the event, but we could consider doing that in the future. While it has some historical value, the performance of a team at one event is not always a good predictor of their performance at the next event. So we pretty much start fresh at each event. We’ve gotten a few requests from people who wanted to look at trends throughout the season and we have shared our data with them so that they can look at how a specific robot’s performance changed during the season. But most of those projects are post-season exercises and not really in-season event scouting projects. Thus, we have really not found any real value in making the data public after an event.

Getting back to your question: I am not sure how publishing the data after an event would incentivize teams to contribute to the scouting during an event. Scouting is done to inform your picklist. The picklist is needed for alliance selection. So, publishing the data after the event is over makes it worthless to the teams’ alliance selection process. Using data from one event to create a picklist for a different event seems dangerous, and I am not sure I have heard of anyone using solely data from previous events rather than scouting the event they are at. So, I don’t think this would provide any real incentive for teams to participate in the scouting.

Maybe I’m missing something, but why can’t you provide both option one and two and let the user/team decide if/when the data is published?

As for bad data, maybe provide all the quantitative forms from the scouters and let the app or user average the results. Sure, someone might try to pad their robot’s stats, but if you see that someone’s data from team x is off from every other response, you can throw out their data. This assumes that no one is being disingenuous about their identity.
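
One simple way to "average and throw out what is off" is to compare each response to the median, sketched below in Python. The function name and the fixed `max_deviation` threshold are assumptions for illustration; a real system would probably tune the threshold per metric.

```python
from statistics import mean, median


def robust_average(values, max_deviation=2.0):
    """Average the responses after dropping any that sit too far from the median.

    Returns (average_of_kept_values, list_of_dropped_values).
    """
    if not values:
        raise ValueError("need at least one response")
    center = median(values)
    kept = [v for v in values if abs(v - center) <= max_deviation]
    dropped = [v for v in values if abs(v - center) > max_deviation]
    if not kept:
        # Responses disagree too much to trust any of them; fall back to the median.
        return center, dropped
    return mean(kept), dropped
```

If five scouts report 4, 5, 4, 5, and 12 cargo for the same robot in the same match, the 12 is dropped and the average of the rest is 4.5. This needs several independent responses per robot per match to work, which brings it back to the problem of getting enough scouts.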

Ah, sorry for the miscommunication. What I am talking about is the compromise solution. Teams can share and collaborate over a private database during the event (i.e. the scouting alliance) and then after the event is over, the data becomes public (turning that “private” database into a public database).

Yeah, sure. Nothing wrong with that.

I’m just not sure I understand who would want that data after the event.