The Great Database

I don’t know about you, but I’ve always heard that OPR and CCWM aren’t the best of statistics, and it makes sense since these are calculated by solving 6 simultaneous equations. This is why I call for The Great Database.

One thing that I’m planning on implementing with my team at Palmetto and Peachtree is a system to record the match statistics of each individual robot in each match. If we count individually how much each robot scores, we get the most accurate OPR, CCWM, DPR, and Average Alliance Contribution. It is a lot of manual data entry, but if one team at every event could put 6-12 scouters on the job of recording just one robot every match, it would be possible to compile all of these into one sheet and have more accurate ranking numbers.
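To make that concrete, here’s a minimal sketch of the aggregation step, assuming a simple (event, match, team, points) record format; the team numbers and scores below are placeholders, not real data:

```python
from collections import defaultdict

# Hypothetical record format: (event, match, team, points scored).
# These entries are made up purely for illustration.
raw_entries = [
    ("Palmetto", 1, 1111, 24),
    ("Palmetto", 1, 2222, 8),
    ("Palmetto", 2, 1111, 30),
]

def average_contribution(entries):
    """Average the points each robot scored across its matches."""
    totals = defaultdict(lambda: [0, 0])  # team -> [points, match count]
    for _event, _match, team, points in entries:
        totals[team][0] += points
        totals[team][1] += 1
    return {team: points / n for team, (points, n) in totals.items()}

print(average_contribution(raw_entries))  # {1111: 27.0, 2222: 8.0}
```

With per-robot numbers like these in hand, the contribution stats stop being estimates and become direct measurements.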

I call for The Great Database to begin. We can have a team or two cover each regional. Any thoughts, questions, or suggestions are welcome.

I love the idea of “the Great Database”, but sometimes statistics are not the most accurate measure of a team’s impact.

In 2010, I knew a team that scored 11 goals in one match. In most other matches they were under 2. The reason was that they were being fed balls. It turns out that whoever partnered with the “non-scoring” but passing team had the same kind of success.
In basketball, a good player will often draw stronger defense, sometimes even a double-team; while that player scores less, his team will actually score more because the other players are more lightly defended. Basketball is starting to use a metric called +/- that accounts for this.
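For FRC, the naive version of +/- is easy to compute straight from match results: average a team’s winning margin across the matches it played (this is the quantity CCWM then tries to attribute to individual teams). A rough sketch, with invented match tuples:

```python
from collections import defaultdict

# Invented match format: (red teams, blue teams, red score, blue score).
matches = [
    ((111, 222, 333), (444, 555, 666), 80, 62),
    ((111, 555, 777), (222, 888, 999), 54, 70),
]

def plus_minus(matches):
    """Naive +/-: mean of (own alliance score - opposing score) per team."""
    margins = defaultdict(list)
    for red, blue, red_score, blue_score in matches:
        for team in red:
            margins[team].append(red_score - blue_score)
        for team in blue:
            margins[team].append(blue_score - red_score)
    return {team: sum(m) / len(m) for team, m in margins.items()}

print(plus_minus(matches))  # e.g. team 111: (18 - 16) / 2 = 1.0
```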
This year, I suspect OPR or CCWM or LS will have a close correlation to **team** performance. It might be the closest correlation since 2008, when OPR was an eerily accurate predictor.

*Brownie points for why I bolded team.

You don’t expect elimination scores to be close to the sum of the alliance OPRs?

This is so true. I am still trying to figure out how it happened, but in 2008, team 148 was the 3rd bot on the 1st seed alliance with 1114 and 217. I am assuming people didn’t pick 148 because they couldn’t handle a trackball? But their contribution to wins had to have been ridiculous. They were the best 3rd bot in the world that year (both by opinion and their championship banner), and they were completely overlooked by 7 alliances. Maybe there will be a robot that can feed discs to bots that pick up off the ground quickly, so they don’t have to take the time to go to the feeder station? Maybe there will be a bot that can assist robots in a 30-point climb with ease?

Every year I ask myself: “Who is gonna make a bot like 469 from 2010?” but a more realistic question is “Who is gonna be the next Tumbleweed?”

If it weren’t for a last-second decision to change the pick, 148 would not have been picked at the championship (unless I am mistaken, 148 was a last-minute addition to 1114’s pick list after not making it Friday night). For a team like 148, that would have been a pretty big disappointment. That is not to knock 148 at all, but maybe there is a reason they haven’t gone that strategy route again.

It is a risky strategy to depend on getting picked as the 3rd robot for a top alliance, especially for teams (like 148) that have the ability to build robots that can be alliance captains/first picks.

OPR has limitations, and a great database would definitely improve the usefulness of our data. You’d still have crucial non-numerical considerations that will often outweigh what the stats say, but it would be awesome to have real data instead of the estimates we currently have available.

967 would be up for providing data from Kansas City and Minnesota North Star. Admittedly, it has always been a challenge for us to put together a data set that is accurate for every single robot in every single match.

It shouldn’t be too hard to record the scoring from frisbees and lifting. Capturing penalties is tough, though.

Right, and I don’t mean to imply that this list be used as a pick list for each event. I just think it would be very nice and useful to be able to tabulate some values and put them together in a spreadsheet. This could definitely be used as a secondary resource for scouting (the first one obviously being observation of the robot in action). I know for a fact this sheet would fail to provide a good 2nd pick, and hell, it probably won’t be as great for a 1st pick either, but being able to list some numbers goes a long way for match strategy. We can concentrate on which robots to shut down when it comes to offense, and using the Average Alliance Contribution stats, we can tell which team will be most effective as a feeder or defender, and which one will be used as an offensive bot.

Well, as an ex-scout and now a mentor, I’m gonna have to add some of my experience.

Anupam, I understand your desire for The Great Database, but I feel it will turn out like Charlie Brown’s Great Pumpkin: it does not exist. But I’m not trying to discourage you by any means. Ideally, I would love for there to be a database for all the teams, but there are just so many different things to consider.

You have scoring, maneuverability, driver skill, hanging, endgame bonuses, and sometimes the human player.

You can have 3 high-scoring robots on an alliance, but if the drivers can’t cooperate and bump into each other the entire time, the opposing alliance has a chance.

For example, during Lunacy, the plan was for Exploding Bacon (shoutout to you guys!) to play defense on our opposing alliance, because tbh they were the lowest-scoring team on our alliance. They did great the first match and we were confident we were going to win. Then my team got pinned and we lost the 2nd match. We got pinned in the 3rd match too, but out of nowhere Bacon came in swinging and won the match for us with a last-minute unload into the trailers of the pinning robots.

I mean, it’s not all about statistics; sometimes it’s about how you cooperate and how you improvise.

This is a very interesting idea. A few details we should hash out first:

  1. How are we going to submit/store the data?
  2. Who will be in charge of maintaining the data?

If we make a few gross estimations (and correct me if these are too gross):

  • 78 regionals & districts
  • 75 matches per event
  • 6 teams per match

Then we can estimate 78 × 75 × 6 = 35,100, so something on the order of 35,000 total entries.

Right, and as I said above, I don’t wish this to replace actual experience. There’s a movie that comes to mind, “Trouble with the Curve”, that deals with this same concept. The baseball scouts were all riled up about one prospect because of his numbers, but one of the old scouts, who didn’t just use a computer, knew from observation that the prospect couldn’t hit a curveball.

I’m in no way in support of numerical data replacing actual experience or observations; I just feel like having such a resource could be beneficial as a secondary resource to go back and look at, with the primary being upfront observations. This list will likely be most useful during the first round of picks, not the second.

The purpose of this is to have a compiled list of statistics to help as a secondary source for scouting, and primarily to be used before a match to see whether your alliance leans more offensive or defensive and which robots to aim for when defending.

We can definitely start with a shared Google doc (though over time a single one would become much too small). We can compile one event per spreadsheet and keep all of the event spreadsheets in the same folder.

There’s actually an easier solution here. If you break the Google spreadsheet up into multiple pages, one for each event, the data becomes much easier to work with, especially for sorting and analysis. That way, you still have all the data in one spreadsheet, but it stays comprehensible.

We can definitely do this to some extent, but one thing to note is that Google spreadsheets only allow a limited number of cells. I know because we tried to have one sheet per event for Fantasy FIRST, and we hit the limit when trying to put them all into one spreadsheet. We can probably go week by week, putting week 1 events into one spreadsheet, week 2 events in another, etc.

I agree that this year OPR will be a good estimation of team performance. Again, it is up to how you want to analyze it, but I believe it will work out pretty well (except for how penalties caused by one team add to another team’s OPR). So with that in mind, I am very close to completing an Android app that I will release prior to the competition season, which will calculate team stats.

This app will have automatic OPR calculations on the fly as long as you have an internet connection. It will take information from The Blue Alliance, usfirst.org, The FIRST Alliance, and FRC-Spy (the Twitter XML feed). With all of this data, it should be very accurate. Now, instead of waiting until all of the matches are over to calculate OPR, all you have to do is hit refresh and it gives the OPR values up to that point. I got it down so that it calculates a 100-team tournament in under 5 seconds, taking about 3 seconds on average.

The app will also have average score, max score, rank, and all of the capabilities that FRC Tracker has (match results, team listings, team info, etc).

I am also going to add match prediction based on OPR and DPR, predicting all future matches in a tournament when you hit the update button.
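To give an idea of what that could look like, one simple way to combine OPR and DPR into a score prediction is sketched below; whether and how to net out the opponents’ DPR is a design choice (the app’s exact formula may differ), and the numbers here are made up:

```python
def predict_match(red, blue, opr, dpr):
    """One plausible OPR/DPR prediction: an alliance's expected score is
    the sum of its OPRs minus the sum of the opposing alliance's DPRs."""
    red_score = sum(opr[t] for t in red) - sum(dpr[t] for t in blue)
    blue_score = sum(opr[t] for t in blue) - sum(dpr[t] for t in red)
    return red_score, blue_score

# Placeholder stats purely for illustration:
opr = {111: 30.0, 222: 18.5, 333: 12.0, 444: 25.0, 555: 9.5, 666: 15.0}
dpr = {111: 10.0, 222: 14.0, 333: 8.0, 444: 12.5, 555: 11.0, 666: 9.0}
print(predict_match((111, 222, 333), (444, 555, 666), opr, dpr))  # (28.0, 17.5)
```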

Anyone think that this will be useful?

I happen to be on the FF team TeamRUSH, and we had all of our pick lists and final picks, along with a summary and draft dates, on a multi-page spreadsheet. Was each sheet filled with data for your team?

With the varied and difficult tasks this year, I suspect we’ll see many teams that specialize. I think you will see many “shoot the length of the field” teams, fewer “pick up from the floor” teams, even fewer “make it in the top of the pyramid” teams, and even fewer “30-point climb” teams.

The key, as always, will be to pick an effective alliance that combines these traits. You don’t need two length-of-field shooters. One of those, one floor-pickup robot, and one top-of-the-pyramid shooter would be a well-rounded alliance.

I don’t count the 30-point climbs, because those will be a different animal. Those robots may have huge trade-offs that make the 30 points less important than some may think. Consider that an unblocked team may easily score 15 two-point goals the length of the field in 20-25 seconds; that’s 30 points right there. How many climbing teams will climb at speeds faster than that? There’s always a trade-off.

Not to shock you, but OPR requires finding the least-squares solution to N simultaneous equations, where N is the number of teams in the data set being analyzed. N can vary from approximately 50 when analyzing only one event, all the way up to over a thousand when doing an end-of-season combined analysis for all events.

This type of data processing is widely used in many fields to process raw data to gain insight. It’s not the solving of simultaneous equations that affects the usefulness of the resulting statistics. It’s the assumptions that go into building the model. In the case of OPR for FIRST, the scoring method can have a strong effect on the usefulness of OPR.
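For anyone who wants to see the formulation Ether describes, here is a minimal numpy sketch with invented data: one row per alliance per match, a 1 in the column of each team on that alliance, solved by least squares against the alliance scores. (With real data you’d have many more rows than teams; this toy set is underdetermined, so lstsq returns the minimum-norm solution.)

```python
import numpy as np

# Invented mini-dataset: each alliance in each match contributes one row.
teams = [111, 222, 333, 444, 555, 666]
alliance_scores = [
    ((111, 222, 333), 80),  # one alliance and the score it put up
    ((444, 555, 666), 62),
    ((111, 555, 666), 54),
    ((222, 333, 444), 70),
]

col = {t: i for i, t in enumerate(teams)}
A = np.zeros((len(alliance_scores), len(teams)))
b = np.zeros(len(alliance_scores))
for row, (alliance, score) in enumerate(alliance_scores):
    for t in alliance:
        A[row, col[t]] = 1.0  # team t played on this alliance
    b[row] = score

# OPR is the least-squares solution x of A x = b.
opr, *_ = np.linalg.lstsq(A, b, rcond=None)
for t in teams:
    print(t, round(opr[col[t]], 2))
```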

One-upped again by Ether. :P I definitely didn’t mean to downplay any of the thought that went into forming the OPR algorithm, but my old team has been on the bad end of OPR inconsistencies, so I wanted to make a call for an accurate calculation based on actually counting each individual robot’s score.