Ranking defense with primarily offensive data in a game where defense does not require robot-to-robot contact?

Rapid React has been a really interesting game for defense, with defensive color sorting, ball hiding, and launchpad covering in addition to the standard robot-contact and blocking strategies. On top of that, with the new qualification path for Worlds, many of the robots available for defensive picks at high levels have played primarily offense leading up to selection.

How do you quantifiably measure the defensive contribution of a specific robot in this sort of game?

Short answer is you don’t.

You sit down as a group, open the Twitch VOD, and grind it out, subjectively arguing about driver ability and overall robot robustness.

Making a first pick is easy. Making a 2nd and 3rd pick is HARD, and normally is the deciding factor at the highest level of play.

26 Likes

An interesting metric that could be used would be to have a place on the offensive robot’s scouting sheet to mark which robot was defending them (I know many teams, ours included, ask the scouter whether a robot was being defended). You could then calculate the percentage decrease from the average scoring output that a defender causes. Of course, this wouldn’t work for matches where multiple robots are playing ‘opportunistic defense’ but should give a fairly reasonable metric of the defensive potential of a robot.

For example, 2910 defending 973 in Newton Q72 in 2019 would have a 66% ‘defense score’: they held 973 to 3 game pieces when 973 averaged 9 (data from Newton SPAMalytics - 2019 - Google Sheets).
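Roughly, that calculation could look like this. This is a minimal sketch; the file name and column names (`team`, `match`, `game_pieces`, `defended_by`) are hypothetical, not from any team's actual scouting export:

```python
import pandas as pd

# Hypothetical scouting export: one row per team per match.
# Assumed columns: team, match, game_pieces, defended_by (blank if undefended).
scouting = pd.read_csv("scouting_data.csv")

# Baseline: average game pieces scored in matches where the team was NOT defended.
baseline = (
    scouting[scouting["defended_by"].isna()]
    .groupby("team")["game_pieces"]
    .mean()
    .rename("undefended_avg")
)

# Each defended row becomes one observation of the defender's impact.
defended = scouting.dropna(subset=["defended_by"]).join(baseline, on="team")
defended["pct_decrease"] = 1 - defended["game_pieces"] / defended["undefended_avg"]

# Average percentage decrease caused by each defender across their defensive matches.
defense_score = (
    defended.groupby("defended_by")["pct_decrease"].mean().sort_values(ascending=False)
)
print(defense_score)
```

With the Q72 example above, 1 − 3/9 ≈ 0.67, i.e. the 66% figure.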

1 Like

I agree with JJ. I’d also add that I think the operative word here is measure. You can quantify according to JJ’s method: sit down and have a discussion, then assign a number. Once you decide how much to factor in defense, plug that in as a factor with the rest of your data.

But the issue isn’t the actual quantification. It’s the measurement. If I wanted to approach this systemically, I’d sit down and watch a ton of defense. I’d then put the teams into a few buckets – great defense, good defense, meh defense, and ineffective defense. I’d then look at what those teams have in common that put them into the buckets. Maybe great defenders delay scoring by an average of X seconds for the opposing alliance or cause the opposing alliance to miss Y% of shots. Then, once you have this framework, you can begin to measure.

That said, the time cost of creating this data set and evaluation system is much higher than the time cost of watching the teams individually and deciding without using numbers…especially if your scouts flag good defenders throughout the day and you don’t have to watch them all.

2 Likes

So, there actually is a metric out there that tries to measure that decrease.

And I’m going to get a LOT of downvotes for this…

Defensive Power Rating, or DPR, essentially measures how much a given robot probably decreased their opponents’ Offensive Power Ratings.

It’s a nightmare to calculate, and to say it’s imprecise is an understatement. There’s a reason it’s rarely if ever used.

This is a really interesting idea.
One possible solution for opportunistic defense is a button on the scouting app that, while held, records the time the robot spent defending against each opposing robot.
We can then split that into what percentage of the match each team was defended, and possibly even the specific timestamps. We could also collect timestamps for balls intaked and shot.
We could then combine that with accuracy data; for example, to see whether the defender forces missed shots, slows down the average time between intakes, etc. while they are defending.
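A rough sketch of what that held-button data could look like and how the timestamp overlap might be computed. Every field name and the 150-second match length here are assumptions, not an existing app’s format:

```python
from dataclasses import dataclass

# Hypothetical event records a scouting app could emit.
@dataclass
class DefenseInterval:
    defender: int
    target: int
    start: float  # seconds into the match
    end: float

@dataclass
class Shot:
    team: int
    time: float
    made: bool

MATCH_LENGTH = 150.0  # assumed auto + teleop, in seconds

def _avg(xs):
    return sum(xs) / len(xs) if xs else None

def pct_of_match_defended(intervals: list[DefenseInterval], target: int) -> float:
    """Fraction of the match a specific robot spent under defense."""
    return sum(i.end - i.start for i in intervals if i.target == target) / MATCH_LENGTH

def accuracy_split(shots: list[Shot], intervals: list[DefenseInterval], team: int):
    """Shooting accuracy while defended vs. undefended, from the timestamp overlap."""
    def defended_at(t: float) -> bool:
        return any(i.target == team and i.start <= t <= i.end for i in intervals)

    defended = [s.made for s in shots if s.team == team and defended_at(s.time)]
    free = [s.made for s in shots if s.team == team and not defended_at(s.time)]
    return _avg(defended), _avg(free)
```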

I’d say that this metric reduces a significant amount of the ambiguity of DPR by targeting a specific robot’s impact on a specific other robot during a match. Similarly to the way the scouting based “average points per match” metric provides a more accurate picture of how effective the robot is offensively than OPR, this should do the same for defense.

1 Like

While watching defenders play defense is the best tool for assessing defense skill, it is not the only one. I think trying to distill it to one single number is generally an exercise in futility, but that doesn’t mean that quantitative scouting data is useless. If you can see that top scorers consistently score significantly less than usual against a particular team’s defense, that team probably plays pretty good defense. If you notice that top scorers typically score just as much as they usually do against that team, their defense probably isn’t doing much.

The numbers aren’t always enough to come to firm conclusions but I think they’re a good place to start, especially for identifying teams at the extremes (very good or not very good at defense)

3 Likes

The issue with trying to infer game pieces prevented by defense by comparing scores is that teams/robots aren’t fixed performance quantities. The underlying assumption of these inference-based defensive metrics is that the delta between Team X’s individual match performance and Team X’s average match performance is the result of Team Y’s defense. But, in reality, any seasoned scout knows that individual match performances tend to vary across multiple matches. And the huge amount of variables involved in an FRC match can often make it difficult to assess a team’s “expected” scoring, let alone being able to isolate secondary impacts on that scoring from even smaller sample sizes.

First and foremost, these are human drivers who will have some matches better than others. Of similar importance is the fact that the robots themselves aren’t fixed quantities to deliver on that driver performance (they both get upgraded and break over the course of an event). Beyond that, there are game- and alliance-specific things that can impact an individual robot’s performance. Scoring opportunities for specific game pieces could be maxed out in games like 2017 and 2019, so having alliance partners that score more of those limited opportunities could suppress an individual match performance for Team X, rather than Team Y’s defense. In 2022, the randomness of the cargo return and/or how many cargo Team X’s alliance partners acquire can impact the cargo scored by Team X, rather than Team Y’s defense. In a game like 2018, strategy decisions based on the scale state and time remaining can impact Team X’s power cube scoring, rather than Team Y’s defense. In 2016, the trustworthiness of alliance partners to complete a breach and/or weaken the tower can impact how much time Team X can budget for scoring boulders (and their high/low goal decisions), rather than Team Y’s defense. And in most games, the decision of who goes to the end game activity, and when, will impact an individual match’s scoring performance for Team X.

So, barring things like a defensive team playing against the same offensive team four times in qualification matches, you are generally left with a very small sample size to try and infer the defensive impact of Team Y on Team X, and a whole plethora of variables to isolate with that limited sample size.
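To put a rough number on how noisy a one-match comparison is, here’s a toy simulation. All of the parameters (average of 9 cargo, match-to-match standard deviation of 3, a true defense effect of 2, eight undefended matches) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed: Team X normally scores ~9 cargo with a match-to-match SD of 3,
# and Team Y's defense actually costs them 2 cargo on average.
true_avg, match_sd, true_defense_effect = 9.0, 3.0, 2.0

n_sims = 10_000
baseline_avg = rng.normal(true_avg, match_sd / np.sqrt(8), n_sims)            # avg of ~8 undefended matches
defended_match = rng.normal(true_avg - true_defense_effect, match_sd, n_sims) # one defended match

inferred_effect = baseline_avg - defended_match
print("true effect:", true_defense_effect)
print("inferred from one match: mean %.1f, sd %.1f"
      % (inferred_effect.mean(), inferred_effect.std()))
# The inferred effect is right on average but swings wildly match to match,
# and frequently even comes out negative (the "defended" match just looks like a good match).
```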

4 Likes

One of the best strategies I’ve heard of: at the end of each match, the three scouters of a given alliance have a short discussion and rank the robots 1-3, with 3 being the best driving capabilities and 1 being the worst. Over time this creates comparable data on the strength of drivers at a competition.
I seem to recall this was a strategy first developed by FRC 2056, but I couldn’t find the source.
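A minimal sketch of turning those post-match rankings into an event-wide number. The aggregation here (simple average rank) is my own choice, not necessarily what 2056 did, and the data is made up:

```python
from collections import defaultdict

# Hypothetical post-match driver rankings: (match, {team: rank}) with 3 = best, 1 = worst.
rankings = [
    (1, {111: 3, 222: 2, 333: 1}),
    (2, {111: 3, 444: 2, 555: 1}),
    (3, {222: 3, 333: 2, 555: 1}),
]

totals, counts = defaultdict(int), defaultdict(int)
for _, ranks in rankings:
    for team, rank in ranks.items():
        totals[team] += rank
        counts[team] += 1

# Average rank per team becomes a comparable driver-skill number across the event.
avg_rank = {team: totals[team] / counts[team] for team in totals}
for team, score in sorted(avg_rank.items(), key=lambda kv: -kv[1]):
    print(team, round(score, 2))
```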

3 Likes

Is there any concern about weeding out teams with great drivers, and potentially a great defensive bot, that decided not to attempt defense during qualifications, or only attempted it once or twice?

1 Like

As with any scouting system, the key is to figure out specific traits that you are looking to add to your alliance for playoffs and then scout for those traits.

Some examples of quantitative traits that we have scouted for this year:

  1. How many G204/G205 penalties did the robot get while trying to play defense? (honestly, during our early comps, this was the biggest deciding factor and robots that did not even have an intake at all made pretty attractive potential defenders for this reason)

  2. How often was the defender able to disrupt the opposing robot and prevent a score or cause a scoring shot to miss the target?

  3. Was the defender able to push the opposing robot (or at least hold their own in a pushing match)?

  4. Were they able to keep up with the opposing robot that they were trying to defend (speed)?

These on-field performance metrics, along with some additional qualitative measures like maneuverability, willingness to work with our strategies, etc., were used to sort through potential defenders. We also incorporated their offensive capabilities into the mix for things like taxi points, autonomous points and end game points.

The bottom line is that you need to scout for specific capabilities and not just some amorphous bucket called “defense”.
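As an illustration only, traits like the four above can be rolled into a sortable score. The weights and sample data below are arbitrary, just to show the shape of the calculation, not our actual model:

```python
# Hypothetical per-robot defensive scouting totals.
candidates = {
    # team: (defense_penalties, disruptions_per_match, won_pushing_matches, kept_pace)
    111: (0, 4.5, True, True),
    222: (3, 6.0, True, False),
    333: (1, 2.0, False, True),
}

def defender_score(penalties, disruptions, pushes, keeps_pace):
    score = 2.0 * disruptions            # reward disrupting scores / forcing misses
    score -= 3.0 * penalties             # heavily penalize G204/G205 fouls
    score += 2.0 if pushes else 0.0      # can win or survive a pushing match
    score += 2.0 if keeps_pace else 0.0  # fast enough to stay with the target
    return score

ranked = sorted(candidates.items(), key=lambda kv: -defender_score(*kv[1]))
for team, stats in ranked:
    print(team, round(defender_score(*stats), 1))
```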

2 Likes

Eric, I gave you a downvote because DPR, the way it’s calculated, is not mathematically or statistically valid. Its mirror image, OPR, measures how much a single robot added to the score of an alliance given the average offensive output of the other robots on that alliance. The problem is that DPR does not measure how the offensive output of the 3 robots on the opposing alliance deviated from their average outputs across all of their other matches.

That said, we tried to calculate DPR correctly a half dozen years ago and found that we couldn’t derive a valid and robust estimator. This is probably because in many, even a majority, of matches the alliances don’t play effective defense on each other.

Given the few times that robots play defense, it’s very hard to quantitatively measure their impact. The best defenders usually play offense unless they lose their offensive tool. For example, watch as 973 jams its intake early and then virtually shuts down 8033: Semis 2 Match 1 - Sacramento Regional 2022 - The Blue Alliance

You want to look at which driving skills correlate with good defense in playoff matches and then look for those at the following competitions. That’s how we design our scouting system.

1 Like

Team 6328 calculated the impact of defense for a team in several different ways. At the end of each match, the scout records (on a scale of 0-5) a duration and a rating for that team both playing defense and avoiding defense (defense duration ratings: 0: none, 1: 0 - 15 secs, 2: 15 - 30 secs, 3: 30 - 60 secs, 4: 60 - 90 secs, 5: 90+ secs), similar to our driver/intake ratings. We also calculated each team’s average points per match. In Alteryx, we flagged teams that were under defense or playing defense for the majority of a particular match. We then created a spreadsheet showing a team’s points, the average of those points, and other subjective ratings (driver, intake, avoiding defense, and playing defense) to see if there were any teams that could be defended easily, or teams that had a significant impact while playing defense. The data was mainly used alongside the subjective driver, intake, playing-defense, and under-defense ratings to determine the best defensive team to choose.
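For reference, the duration bucketing and the majority-of-match flag can be expressed as simply as this. This is a Python sketch, not our Alteryx workflow, and the half-of-150-seconds threshold for “majority of the match” is an assumption:

```python
def defense_duration_rating(seconds: float) -> int:
    """0: none, 1: 0-15s, 2: 15-30s, 3: 30-60s, 4: 60-90s, 5: 90+s."""
    for rating, upper in ((0, 0), (1, 15), (2, 30), (3, 60), (4, 90)):
        if seconds <= upper:
            return rating
    return 5

MATCH_LENGTH = 150.0  # assumed auto + teleop, in seconds

def majority_of_match(seconds: float) -> bool:
    """Flag teams that played (or were under) defense for most of a match."""
    return seconds > MATCH_LENGTH / 2

print(defense_duration_rating(40))  # -> 3
print(majority_of_match(100))       # -> True
```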

  • Ayush
4 Likes

Though I would agree that it’s impossible to perfectly attribute changes in offensive output to defense, I do think a useful measurement is definitely possible.

At champs this year, we adjusted our 2nd pickability model to account for proxies of driver ability. Two good examples were (average # of intakes)^2 and (max # of intakes in a match). These were strong indicators of driver awareness and good proxies for the ability to play positional defense. We obviously still watched match video to evaluate speed, but modeling these proxies definitely made life a lot easier.
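A rough sketch of computing those two proxies from per-match intake counts. The sample data and the way they’re combined into one sort key below are made up for illustration, not our actual model; only the two formulas come from the description above:

```python
import pandas as pd

# Hypothetical per-match intake counts.
intakes = pd.DataFrame({
    "team": [111, 111, 111, 222, 222, 222],
    "match": [1, 2, 3, 1, 2, 3],
    "intakes": [8, 10, 9, 5, 12, 4],
})

per_team = intakes.groupby("team")["intakes"].agg(avg="mean", max="max")
per_team["avg_sq"] = per_team["avg"] ** 2   # (average # of intakes)^2
# Combine the two proxies into one pickability sort key; the weighting is an assumption.
per_team["driver_proxy"] = per_team["avg_sq"] + 5 * per_team["max"]
print(per_team.sort_values("driver_proxy", ascending=False))
```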

We don’t scout intakes so we just use average balls to get our initial sort before we go through the list watching videos and discussing. # of intakes seems like a great first sort to save some time shuffling things around.

2 Likes

Yeah, average number of balls scored was a tough one to use as a driver-ability proxy, since there were a significant number of teams with turrets that would compensate for poor driving, or with bad shooters that would underrepresent their driving ability. So when you’re comparing a few teams, there can be big differences in ability where the averages would suggest they were pretty even.

And the other factor was that the variance of balls scored could be very large for one team and quite small for another, so the average wasn’t necessarily a good indicator of a team’s driving ability. In fact, it was problematic to use even for choosing our first pick, which was primarily an offensive asset.

1 Like