Paper: 4536 scouting database 2017

Thread created to discuss this whitepaper.

This is a scouting database which calculates component calculated contributions (OPRs) and other metrics using data from the TBA API. Each sheet contains data for a distinct FRC event. A new database will be published weekly, within a day or two of all of that week’s events being completed. For sheets covering events that have not yet occurred, seed values for each category are provided in order to aid in pre-scouting.
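
For anyone who wants to pull the same raw data themselves, fetching an event’s matches from the TBA API is a single authenticated GET. A minimal sketch against the v3 REST endpoint (the auth key and event key below are placeholders, and this is not the database’s actual tooling):

```python
import requests

TBA_AUTH_KEY = "your-tba-read-key-here"  # placeholder; generate one on your TBA account page

def get_event_matches(event_key):
    """Fetch all match objects (including score breakdowns) for one event."""
    url = f"https://www.thebluealliance.com/api/v3/event/{event_key}/matches"
    response = requests.get(url, headers={"X-TBA-Auth-Key": TBA_AUTH_KEY})
    response.raise_for_status()
    return response.json()

# Example: a 2017 event key (illustrative only)
matches = get_event_matches("2017iacf")
```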

Key improvements over the 2016 database include:
Updated metrics for the 2017 game
All new metrics to determine defensive ability (coming soon)
Added Winning Margin Elo.
Added a “seed values” tab which contains an estimate of every team’s ability at this moment, whether or not they have competed yet in 2017.
Added in seed values for every upcoming event to assist teams in pre-scouting.
Added nicknames
Added a “home championship” column to the “world results” sheet.
Added championship tabs which will update each week as teams register for championships.
Better formatting (frozen rows/columns, sorting capability already added, vertical instead of horizontal column names).

I am publishing this database now primarily to get feedback before I update with week 1 data. Let me know if you have any suggestions for improvements.

If you want to use this for week 1 events, the only interesting information right now is the Elo ratings, as the other seeds are the same for every team and are based on the official week 0 event.

My plan for additions/improvements is roughly to have the following done for each revision:
Week 1: Better seed values for all teams (I can make some pretty good estimates after we know how this game actually plays).
Week 2: Fill in all of the “estimated Gears” categories based on analysis of Week 1 data.
Week 3: Fill in the defense categories based on analysis of Week 1 data.

Super excited to see this come out. We were already planning on using the WM Elos to help with match strategy, but the 2017-specific numbers and seed values are going to help a ton with pre-scouting.

Out of curiosity, how did you determine that 10% mean reversion was optimal with the seed scores?

I performed a comparison a few weeks ago which analyzed the best seed values to use for 2016 categories when given 2015 calculated contributions. I have attached a summary of the results of this investigation. K2015 represents how much mean reversion was used. The values in each category represent the RMS error in my week 1 predictions for each qual match; lower values indicate stronger predictive power. The absolute values aren’t as important as the relative values within each category, though. As can be seen, the best amount of mean reversion varies between ~0% and ~20% depending on the category. Even when the best mean reversion value is 0% or 20%, 10% usually provides the next best results.

I could use different amounts of mean reversion for each category, but I think that would probably be over-complicating things considering that using 10% generally works well, and only causes minimal added error in the worst cases.
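
To make the reversion concrete, here is a minimal sketch of what reverting a prior calculated contribution 10% toward the mean looks like (illustrative numbers and function name, not the actual spreadsheet formulas; normalizing 2015 values into 2016 categories is a separate step not shown here):

```python
def revert_toward_mean(prior_value, population_mean, k=0.10):
    """Pull a team's prior calculated contribution k of the way
    back toward the population mean (k = 0.10 means 10% reversion)."""
    return prior_value + k * (population_mean - prior_value)

# Illustrative numbers: a team with a 2015 contribution of 30 points,
# against a population mean of 20 points, seeds at 29.0 with 10% reversion.
print(revert_toward_mean(30, 20))         # 29.0
print(revert_toward_mean(30, 20, k=0.2))  # 28.0
```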

2015-2016 veteran seed comparison.xlsx (10.9 KB)

Thanks for the reply. That methodology makes sense, and it is clear how the ~0–20% range came out as optimal.

The RMSE values for Stronghold scoring methods seem a bit high compared to the average scores for each category (e.g. a total points RMSE of 18.4 vs. a total points average of ~26 in quals, and a teleop high goal RMSE of 1.13 vs. a teleop high goal average of ~0.9 in quals). Given this, do you think the seed values will be better at predicting for the standout, powerhouse teams? It seems that the variation given for the ‘average’ team is almost as large as, or larger than, their expected value in each category.

For some context, I have attached the mean values and standard deviation of values for all week 1 categories.

It isn’t generally very useful to compare RMSE values with averages, because although they have the same units, they represent fundamentally different things. It is generally more useful to compare the standard deviation to the RMSE. What you can learn from comparing averages to standard deviations is whether or not a normal distribution represents the data well. In cases where the RMSE is larger than the average (and values of <0 are not allowed), a normal distribution would pretty clearly be a poor representation of the data.

Comparing RMSE values to the standard deviation, however, tells us how helpful the predictive model is. When the RMSE is larger than the standard deviation, the prediction model is worse than just guessing the average score for each match. This can be seen for the foul categories; what this means is that trying to predict fouls with my current methodology is a fool’s errand, since I would get better predictions by just guessing the average foul points for each match.
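
As a concrete version of that check, you can compare a model’s RMSE against the standard deviation of the observed values, which equals the RMSE of always guessing the average. A small sketch with made-up numbers (not the database’s actual data):

```python
import numpy as np

def rmse(predicted, actual):
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Hypothetical per-match values for one category, plus a model's predictions.
actual = np.array([40.0, 55.0, 30.0, 65.0, 45.0])
predicted = np.array([42.0, 50.0, 35.0, 60.0, 48.0])

model_rmse = rmse(predicted, actual)
baseline_rmse = np.std(actual)  # equals the RMSE of always guessing the average

# If model_rmse < baseline_rmse the model adds predictive value; if it is
# larger (as with the foul categories), guessing the average is better.
print(model_rmse, baseline_rmse)
```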

Given this, do you think the seed values will be better at predicting for the standout, powerhouse teams?

The seed values for 2016 depend on each team’s 2015 calculated contribution to total points. These values are then normalized to each 2016 category and reverted 10% toward the mean. I would be surprised if the seeds tended to be better or worse on average for high-performing than for low-performing teams; I don’t really see how that conclusion would follow from your previous statements.

I hope the above helps to answer your question. If not, could you rephrase it? I am not sure that I am understanding it properly.

2015-2016 veteran seed comparison.xlsx (11.4 KB)

Thank you again for the very detailed response!

I do see how an RMSE lower than the standard deviation indicates a useful predictive model, but I am still not sure why one would not want to compare it to the average. If your RMSE is 2.5 gears and your average team is scoring 3 gears, then it seems you still have high variance relative to the predicted values and can only get a general sense of team strength. On the other hand, an RMSE of 0.25 gears against an average of 3 gears seems to mean that you can reliably predict how many gears a team will score.

As for the strong vs. weak team question, I understand now why the model would not differ between them. My original reasoning was that score values (for everything) in FRC tend to be heavily right-skewed, so strong teams would have a higher average/stddev ratio than weaker teams. On second thought, this doesn’t make much sense.

Week 1 data has been added.

I wasn’t quite sure how I wanted to deal with Elo considering every team’s Elo rating constantly updates. I have decided to post each team’s end of event Elo rating in the event sheets as well as in the world results sheet. Please be aware though that Elo is a fundamentally different metric that has some different properties than the other stats. The best place to compare Elo ratings is in the “seed values” tab.

What you are describing seems to be a ratio found by dividing the RMSE for a category by the category average. Let’s call this ratio r. In your first example, r1 = 2.5/3 = 0.83; in your second example, r2 = 0.25/3 = 0.083. Your statement appears to be that lower values of r indicate higher predictive power. Is this a valid summary of your thoughts?

Yes. Lower r = higher predictive power is the assumption I am making. I am approaching this similarly to how lower signal-to-noise ratios mean more precise measurements.

I take it there is something flawed/off with this approach, and would be interested in learning what it is. :slight_smile:

Hopefully this example will help. Let’s compare two predictive models for two different week 1 scoring categories: predicting when teams will achieve kPa ranking points in quals, and predicting when teams will get the first rotor activated. I am constructing these “models” after having looked at the data, so they aren’t true predictions, but you can hopefully see how the ideas generalize.

Model 1, predicting kPa ranking points: let’s say that my predictive model says that, for every alliance in every quals match, there is a 0.0% chance that they will get the kPa ranking point; equivalently, I predict that each alliance will achieve 0 kPa ranking points each match. Using data from TBA Insights, we find that an alliance got the kPa ranking point in 3 out of 3100 opportunities. The total squared error for this model would be 3*(1-0)^2 = 3, and the RMSE would be sqrt(3/3100) = 0.031. The average kPa ranking points an alliance achieved was 3/3100 = 0.00097. The r value described above is 0.031/0.00097 = 31.96.

Model 2, predicting rotor 1 engaged: let’s say that my predictive model says that, for every alliance in every quals match, they will receive 1 “rotor engaged” point. Again using TBA data, we find the RMSE for this model to be 0.082 and the average value to be 0.993. The r value is 0.082/0.993 = 0.083.

The r value for model 2 is hundreds of times smaller than the r value for model 1, so by that criterion we would conclude that model 2 is a far more useful model than model 1. However, this is clearly not the case: out of 3100 attempts, model 1 was only wrong in 3 cases, while out of the same number of attempts, model 2 was wrong in 21 cases.

Averages are basically just offsets. We can use them as the starting point for many models, but they only describe how far our model is from our zero reference. Predictive models are all about trying to explain/predict the variance between data points; how far the data points are from zero is not as important. Another way to think about it is that the zero reference point, which we need in order to compute an average, can be rather arbitrary. If instead of tracking the number of times rotor 1 was engaged we tracked the number of times rotor 1 was not engaged, we would change our average from that of model 2 (and thus increase our r value), but the variance would remain constant.
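
The arithmetic above can be reproduced directly from the counts in the example (3 misses out of 3100 for model 1, 21 out of 3100 for model 2); the last two lines also illustrate the offset point, since relabeling “engaged” as “not engaged” moves the average (and therefore r) but leaves the variance untouched:

```python
import numpy as np

n = 3100

# Model 1: predict 0 kPa ranking points every match; 3 alliances actually got 1.
kpa_actual = np.array([1] * 3 + [0] * (n - 3))
rmse_1 = np.sqrt(np.mean((0 - kpa_actual) ** 2))   # ~0.031
r_1 = rmse_1 / kpa_actual.mean()                   # ~32

# Model 2: predict rotor 1 engaged every match; 21 alliances actually missed it.
rotor_actual = np.array([0] * 21 + [1] * (n - 21))
rmse_2 = np.sqrt(np.mean((1 - rotor_actual) ** 2))  # ~0.082
r_2 = rmse_2 / rotor_actual.mean()                  # ~0.083

# Flipping the bookkeeping ("not engaged" instead of "engaged") changes the
# average (and hence r) dramatically, but the variance is untouched.
flipped = 1 - rotor_actual
print(rotor_actual.mean(), flipped.mean())  # ~0.993 vs ~0.007
print(rotor_actual.var(), flipped.var())    # identical
```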

Using the signal/noise analogy, consider averages to be DC offsets, and not noise. Random noise is problematic for signal systems because it is unpredictable. In contrast, known DC offsets can easily be fixed by changing your ground reference.

Also, higher SNRs are better than lower SNRs.

Thank you for the response and examples. I was having trouble seeing how a model with large expected values would be of equal strength to a model with low expected values and the same variance, but the DC offset example helped a lot.

Week 2 data has been added.

Since there are no weird Israel or Australia events in the next two weeks, expect week 3 and 4 updates on the Sunday or Monday after that week’s competitions.

Week 3 data has been added.

I’m surprised that didn’t elicit any comments :slight_smile:

Week 4 data has been added. Israel champs will be included in the week 5 update.

Hi Caleb,

How are you generating this scouting database? It looks like everything is static - there’s no calculation in the workbook itself.

Some of these statistics are extremely useful for scouting during an actual competition. For instance, ranking the “auto fuel high”, “teleop fuel high”, and “kPa added” calculated contributions can give a very accurate picture of who the best fuel robots are at an event. If this could be generated real time using TBA data, it almost means that teams wouldn’t have to manually scout fuel at all.

Also, since most teams climb at the touchpad in front of their driver station, the spreadsheet could automatically determine a team’s actual climb rate by associating the team’s driver station position with the “touchpad Near/Middle/Far” fields that are accessible through the TBA API. This could be a good sanity check or backup for scouting robot climbing ability. However, there doesn’t seem to be any raw data in the spreadsheet at all, and no mechanism to update the statistics in the database to reflect new data in real time.
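
As a sketch of how that association could work (the touchpadNear/Middle/Far field names come from the 2017 score breakdown, but the station-to-touchpad mapping and the exact field values are assumptions worth double-checking against the TBA API docs):

```python
from collections import Counter

# Station 1/2/3 (index 0/1/2 in team_keys) is assumed to face the
# Near/Middle/Far touchpad respectively -- verify for your field orientation.
STATION_TO_TOUCHPAD = {0: "touchpadNear", 1: "touchpadMiddle", 2: "touchpadFar"}

def touchpad_activation_rates(matches):
    """Fraction of quals matches in which the touchpad in front of each
    team's driver station ended in the ReadyForTakeoff state."""
    activations, appearances = Counter(), Counter()
    for match in matches:
        if match.get("comp_level") != "qm":
            continue
        breakdown = match.get("score_breakdown") or {}
        for color in ("red", "blue"):
            team_keys = match["alliances"][color]["team_keys"]
            for station, team in enumerate(team_keys):
                appearances[team] += 1
                if breakdown.get(color, {}).get(STATION_TO_TOUCHPAD[station]) == "ReadyForTakeoff":
                    activations[team] += 1
    return {team: activations[team] / appearances[team] for team in appearances}
```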

Over on my team, we have a program to download the up-to-date raw data from TBA, but we are unsure how to calculate the team contributions. Since you’ve already done it, do you by any chance have the downloading/data-crunching code on GitHub for us to peek at and hopefully contribute to?

Thanks!
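
For reference, calculated contributions (OPRs) are just the least-squares solution of alliance scores regressed on a 0/1 team-participation matrix. A rough sketch (not Caleb’s actual implementation), assuming TBA-style match dicts with score breakdowns:

```python
import numpy as np

def calculated_contributions(matches, component="totalPoints"):
    """Least-squares calculated contributions (OPRs) for one event.

    `matches` is assumed to be a list of TBA-style match dicts with
    alliances[color]["team_keys"] and score_breakdown[color][component];
    adjust the field names to whatever your downloader actually produces.
    """
    quals = [m for m in matches
             if m.get("comp_level") == "qm" and m.get("score_breakdown")]
    teams = sorted({t for m in quals for c in ("red", "blue")
                    for t in m["alliances"][c]["team_keys"]})
    index = {t: i for i, t in enumerate(teams)}

    rows, scores = [], []
    for m in quals:
        for color in ("red", "blue"):
            row = np.zeros(len(teams))
            for t in m["alliances"][color]["team_keys"]:
                row[index[t]] = 1.0
            rows.append(row)
            scores.append(float(m["score_breakdown"][color][component]))

    # Solve A x = b in the least-squares sense; x is each team's contribution.
    contributions, *_ = np.linalg.lstsq(np.array(rows), np.array(scores), rcond=None)
    return dict(zip(teams, contributions))
```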

I had considered giving every team a pseudo climb rate based on how frequently the touchpad in front of them was activated. The problem is that the scoring table is generally on the feeder station side of the field, but not exclusively. For events where the scoring table is on the other side, these climb rates would be completely worthless, since they would associate teams with the incorrect touchpads. I haven’t checked in a while, but if 90+% of scoring tables are on the feeder station side, I would be fine making this metric and lacing it with a couple of disclaimers.

I believe everything else you asked about should be contained in the 4536 event simulator. Let me know by PM or over on that thread if you have any ideas for improvements to the event simulator.

The event simulator looks to be an amazing scouting tool. I’ll show it to our scouts today - we’ll probably end up using it for future events. Thank you so much!

Week 5 data has been added.

After the discussion above, I investigated the usefulness of a pseudo-climb rate that I am calling “anterior Touchpad Activation Rate”. Every week 5 event I saw on webcasts had the scoring table on the feeder station side, so I felt comfortable making this metric. Note that it will not work for any event at which the scoring table is on the boiler side of the field.

This metric correlates much better with climb rate according to our scouting data from the Iowa regional than does calculated contribution to teleop takeoff points. I would be interested to know if this holds true for other events as well.
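
For anyone who wants to run the same comparison on their own event, it comes down to correlating the scouted climb rate with each candidate metric. A sketch with placeholder numbers (not the actual Iowa regional data):

```python
import numpy as np

# Per-team values for one event (placeholder numbers for illustration only).
scouted_climb_rate = np.array([0.9, 0.2, 0.7, 1.0, 0.5])
anterior_touchpad_rate = np.array([0.8, 0.3, 0.6, 1.0, 0.4])
teleop_takeoff_cc = np.array([40.0, 20.0, 25.0, 45.0, 30.0])

# Pearson correlation of each metric with the scouted climb rate.
print(np.corrcoef(scouted_climb_rate, anterior_touchpad_rate)[0, 1])
print(np.corrcoef(scouted_climb_rate, teleop_takeoff_cc)[0, 1])
```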