pic: Alliance Scores Over the 2011 FRC Season

This is a plot of the 10-match moving average of alliance scores in 2011. The x axis is matches in the order they were played (i.e., they are not sorted by event, just by timestamp). A moving average averages a block of consecutive matches together; to get each successive point, you slide the block forward by one match, so only one number in the average changes. In this way, you reduce the volatility of the data. The trends are the same without the moving average; they are just much harder to see. The data is from the @FRCFMS Twitter feed; Andrew Schreiber was nice enough to mine it for me.
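
For anyone who wants to reproduce the smoothing, here is a minimal sketch of a trailing moving average. The function name and the sample scores are made up for illustration; this is not the script used to make the plot.

```python
def moving_average(scores, window=10):
    """Return the trailing moving average of `scores`.

    Each output point averages `window` consecutive matches; sliding one
    match forward drops the oldest score and adds the newest.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_total = 0
    for i, score in enumerate(scores):
        running_total += score
        if i >= window:
            # Remove the score that just fell out of the window.
            running_total -= scores[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


# Hypothetical alliance scores, in the order the matches were played.
example_scores = [10, 20, 0, 30, 40, 50, 20, 10, 60, 60, 110]
print(moving_average(example_scores, window=10))
```

Note that the first point only appears once a full window of matches is available, so the smoothed series is shorter than the raw one.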

I think it’s neat for a couple of reasons. There is a clear upward progression in alliance score over the course of the season. The spikes are all elimination matches, so you can see that elimination matches are clearly higher scoring. You can also see how large each week is compared to the others by the distances between the elimination peaks. Also interesting is that the Championship is clearly played on a different plane compared to other weeks, as the average match score is in line with a typical elimination round!

Worth noting that some of the upward trend in the data between the peaks could be due to a difference in time zones, as some regions may have been playing elimination matches while other regions were still in qualifying rounds.

In case anyone is interested, I did some more work on pinning down the average robot which EWCP has on their blog.

Average points per robot across all qualifying matches in 2010 was 1.4, and in 2011 was 11.3. At your typical event, the 50th percentile robot is in the elimination rounds or on the verge of the elimination rounds.

Some of the most interesting other trends in that data: in both 2010 and 2011, about 20% of alliances scored zero points after penalties, and in both years penalties reversed the winner about 5% of the time and turned a win into a tie about 10% of the time. I was not expecting to see such similar numbers between such different games.

Anyone care to do an analysis on how many teams would have to average net 0 pts. in order for 20% of alliances to have a resultant score of 0 pts.?

For example, with dice, if I have 3 dice, the probability of at least 1 of them being a 1 during a roll would be 3*1/6 or 50%. The prob of 2 being 1s would be 3/2*1/36 or 4.5%. The probability of 3 1s would be 0.5%. At a district event with 80 matches, there would be 160 alliances, and thus I would expect 1,1,1 0.8 times, or at 80% of events there would be at least 1 alliance that got 1, 1, 1.

If 0 is assumed as the lower limit, then a 0,0,0 should be difficult to get. If FRC were 2 vs 2, and 50% of the field could score 1 (or more) and 50% of the field could score 0, I believe you would expect 25% of alliances to have a score of 0.

For 3 vs. 3, it should (in theory) be significantly more difficult… in theory. I guess my argument is that the “average” robot might correspond with your values, but the “median robot” may perform significantly lower…
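
A rough sketch of that argument in code, under a deliberately simple assumption that is not in the data: each robot independently nets zero points with the same probability, and an alliance scores zero only if all of its robots do. The function names are made up for illustration.

```python
def zero_alliance_fraction(p_zero_robot, robots_per_alliance=3):
    """Fraction of alliances scoring zero if every robot independently
    nets zero with probability `p_zero_robot` (simplifying assumption)."""
    return p_zero_robot ** robots_per_alliance


def required_zero_robot_fraction(target, robots_per_alliance=3):
    """Invert the model: what share of robots must net zero so that
    `target` of alliances score zero?"""
    return target ** (1.0 / robots_per_alliance)


# The 2 vs 2 thought experiment: half the field scores nothing.
print(zero_alliance_fraction(0.5, robots_per_alliance=2))

# The 3 vs 3 question: share of zero-scoring robots needed for 20% zero alliances.
print(round(required_zero_robot_fraction(0.20), 3))
```

Under this toy model, well over half the field would have to net zero to produce 20% zero-score alliances in 3 vs. 3. Real matches also have penalties and defense, so treat it as a bound on intuition rather than a measurement.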

Here is the direct link to your article Ian.


That is a very good point. I should’ve thought of that. :o It only strengthens the argument that the typical robot isn’t as good as most people think on kickoff.

I’ll look at home to see if I can find my 2007 and 2008 scouting data from BAE to see if I can scare up an actual points/match by robot distribution. It would be interesting to compare that to a distribution predicted by OPR, just to see how well that metric matches the real world.

Not an exact answer to IKE’s question, but a move in that direction. His point is a good one: even if the mean robot scores 5 points, the median (or 50th percentile) robot may score significantly less if there are outliers skewing the high end of the field.

I don’t have any actual data for how many points robots score per match. So I used OPR, as that should do a decent job approximating the real distribution. All data is from BAE in 2011, OPR was calculated from Bongle’s OPR program.

First up, a histogram of OPR. OPR can be negative, just as robot contribution can be negative (more penalties than points). The mean was 10.1, the median was 6.7. This certainly supports the hypothesis that the median robot is not as good at scoring as the mean robot.

I then wrote a short script to simulate BAE using the OPR predicted scores. Interestingly, it did a very good job at simulating the top 50% of the field (the mean and third quartile barely moved), but the bottom 50% of the field was not as great. You can see it in the dotplot, but in real life teams tended to score 0 points or 30 points more often than they did 15 points. In the simulated matches, they scored 15 points more than 0 or 30. You can also see the significant movement (6 points) in the 1st quartile between the real and simulated matches. Meanwhile the median and third quartile moved by .2 and .1 points respectively.
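
For the curious, a simulation along these lines could look something like the sketch below. This is not the actual script: the team labels and OPR values are hypothetical, alliances are drawn at random rather than from the real schedule, and flooring predicted scores at zero is my assumption.

```python
import random
import statistics


def simulate_alliance_scores(opr_by_team, n_alliances, seed=0):
    """Predict alliance scores as the sum of three randomly drawn teams' OPRs.

    Assumptions: random alliances (not the real match schedule) and a
    floor of zero on the predicted score.
    """
    rng = random.Random(seed)
    teams = list(opr_by_team)
    scores = []
    for _ in range(n_alliances):
        alliance = rng.sample(teams, 3)
        predicted = sum(opr_by_team[team] for team in alliance)
        scores.append(max(0.0, predicted))
    return scores


# Hypothetical OPRs, skewed so that a few strong teams pull the mean up.
hypothetical_oprs = {
    "A": -3.0, "B": 0.0, "C": 1.5, "D": 2.0, "E": 4.0, "F": 6.7,
    "G": 8.0, "H": 10.0, "I": 12.0, "J": 18.0, "K": 25.0, "L": 37.0,
}

scores = simulate_alliance_scores(hypothetical_oprs, n_alliances=160)
q1, median, q3 = statistics.quantiles(scores, n=4)
print("Q1:", q1, "median:", median, "Q3:", q3, "mean:", statistics.mean(scores))
```

Because each predicted score is a sum of per-team averages, a sketch like this cannot reproduce the 0-or-30 clumping that minibots create in the real scores.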

From this one regional analysis, it appears IKE is right. Even using OPR to predict alliance scores (and I have a feeling that is less skewed than the true distribution), the median robot scored about 30% fewer points than the mean robot. As an interesting side note, it looks like in 2011 OPR did a surprisingly good job at predicting scores of the top 50% of the field, and was less good with the bottom 50% of the field.
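
As a quick check on that figure, assuming it refers to the OPR mean (10.1) and median (6.7) from the BAE histogram, the gap can be computed directly. The helper name is made up for illustration.

```python
def percent_below_mean(mean, median):
    """How far the median sits below the mean, as a percentage of the mean."""
    return 100.0 * (mean - median) / mean


gap = percent_below_mean(mean=10.1, median=6.7)
print(round(gap, 1))
```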

Very interesting analysis, well done.

Hypothesis for the poor prediction of the lower 50%:

-It’s mainly sig figs. You can see this in your “Actual Scores” dotplot; the minibot scoring put a dip in values in the mid range. Scores are not evenly distributed throughout the entire range (0~100), because the minibots caused wild swings in scores. If you scored at all with a minibot, you were swinging your score upward by a large %.

I would be interested just to see what % of scores were under 14, and what % of scores are above 40. If the percentage is similar, it helps my hypothesis. Basically what I am saying is that if a large number of scores are within a limited range of data (0-14) it will be more difficult to properly predict team order. With the scores in the higher range, you have more values at your disposal (40-100) so placing teams in that range becomes simpler.

Does that make sense?


I agree that minibots throw big wrenches into the works, it makes the scoring nonlinear and hard to categorize typical (is 30/0/30/0 worth the same as 15/15/15/15?). I am not quite sure what you are saying with the 14 vs. 40 though, I’ll chew on it some more.

See Chris’s post below. FLR is apparently not indicative of the average match because of 6v0. Other regionals remain skewed, so I’ll do one of those tomorrow.

In the meantime, I ran the same thing for 2010 just to see what it looks like, and it paints a very different picture. I used FLR because BAE does not have posted scores on FIRST’s website, so Bongle’s OPR calculator won’t work, and I’ve never quite got my MATLAB one to work. FLR is still an older week 1 event though, so I would hope the trends hold. (Famous last words, right?)

Firstly, the OPR distribution is very normal. The mean and the median differ by less than 10%.

And compared to the halfway decent match we got in 2011, the only thing that matches up is that the minimum score is zero. Below are a boxplot and histogram to bear that out. The OPR predicted scores are fairly normal, but the true distribution looks much more like a chi-squared distribution. So in 2010 OPR overpredicted the average score fairly significantly.

Will the OPR algorithm always produce some relatively normal distribution barring a big disruptive force like minibots? I could see how that might be the case, but I’m afraid I’m not nearly good enough at math to go about demonstrating that is the case.

Ian, I think you’re failing to account for the fact that FLR was an event that very quickly caught on to the 6v0 strategy. That would explain such a large scoring discrepancy.

Can you recommend a BAE-esque regional for replacement? :slight_smile:

EDIT: Yup, Chris is right, other regionals in 2010 are significantly more skewed. I’ll replace FLR with another regional tomorrow.

So, here is WPI then. It is skewed slightly, the mean and the median differ by .25. However, the OPR predicted distribution and the actual one match up quite well.

For this set the quartiles line up pretty well, with just a couple of outliers in the real world case. Chris, do you know if these were 6v0, or just exceptional performances?

WPI in Week 2 had similar depth to BAE and only one repeating team (20). Very little 6v0 was played (mostly by us but we had a reasonably accurate OPR anyway)

I don’t want to say 6v0 because that implies 2791 contributed to the point spread, but one of the 10 point matches had us playing “helpful defense”…

I would say the rest of the high points could be attributed to 230 being allowed to run the field without interference, scoring points even without a ball magnet.

Your OPR Alliance score estimator will always create a more normal distribution than the actuals because it is using an average value, and not a scoring distribution. The minibot is a great example of this, and your 30/0/30/0 vs. 15/15/15/15 hits the nail right on the head. Both of these scenarios have the same average and thus would feed into the OPR scoring algorithm the same way. In the Actuals though, the 30/0 would lead to 2 groupings, which is more accurate for the 2011 scoring.
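
To make the 30/0/30/0 vs. 15/15/15/15 point concrete, here is a small sketch using only those two made-up score lines: the averages are identical, but the spread is not.

```python
import statistics


def summarize(scores):
    """Return (mean, population standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


streaky = [30, 0, 30, 0]    # minibot either scores big or not at all
steady = [15, 15, 15, 15]   # same average, no swings

print("streaky:", summarize(streaky))
print("steady: ", summarize(steady))
```

Any estimator that only keeps the per-team average sees these two robots as the same, which is why predicted alliance scores bunch toward the middle while the actual scores split into groups.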

The point to my comment was that the “average” or median robot does score significantly less than most people would estimate.

Our kids did an estimate on the VEX game this year. I think the max possible score was on the order of 60 pts. I then let them work on an estimate of what a good score would be. Initially they came up with a figure in the 40s. We did some refining techniques and their new estimate was much closer to 24-25 points. At their tournament this past weekend, that was pretty much exactly where the “good” scores came in. One alliance hit a 29 during a match.

For 2010, the average alliance score was around 4 points, but this was partially skewed by having the higher scorers contribute to more alliance scores because eliminations data was used as well (16 elims matches at FLR relative to 74 Qual matches, with the best of the best playing in 50% of those elims matches). If you only use Qualification data, the average alliance score will be slightly lower than 3 pts., which means the “average” contribution would/should be just under 1 pt., with the Median being slightly below this.

To put this into perspective, if you started in the home zone, and just scored the 1 ball in the home zone every match, you would be better than 50% of the 2010 field. If you could hang (worth 2 pts.) 100% of the time, you would be over 2X the national “average”. If you could put 1 ball in and hang, then you would make it to 3 pts. and be able to outscore about 50% of alliances, all by yourself. At an event like FLR, this would put you in the top 7 or so of teams. Top 7 and you are only pushing 1 ball in the goal, and hanging at the end…:yikes: If your goal was to be an alliance captain or picked, targeting those easy 2-3 points is a very reasonable target.

Notice the strategic difference though between these 3 points (which can be accomplished in the home zone) versus 3 points from a different zone. 3 points kicking balls means moving 3 balls into the home zone, then moving the robot into the home zone, and then re-collecting and scoring the 3 balls. By my count this is a minimum of 7 actions to get 3 points (if you consider acquire and then transfer separate moves, it can be as many as 13 moves), versus the original strategy, which is 2-3 actions for 3 points…

For 2011, similar analysis shows the average score for an alliance being under 30 pts. It also showed that minibots were frequently not launched at all. Doing a post-season analysis, if you simply had a good reliable minibot system (not even a sub-2-second minibot), you would win most of your matches. At an absolute minimum, a scoring minibot was worth 10 points, which was again more than the “average” contribution and well above the Median.

Compare this to scoring tubes. Top row tubes are worth 3 points, 2x if you make a logo. If you hang an ubertube, it’s 6 in Auto, and up to an additional 6 points if you make a logo over it. In other words, in order to score 30 points in tubes, you would need to score an ubertube in Autonomous, then acquire and hang 3 different shaped tubes in the right order (one of which would be difficult, as you are hanging it over an ubertube). Again, this is 7 actions just to get to 30 points, versus essentially 2 actions for the minibot (align to tower and launch minibot). Using the minibot minimum of 10 points, you would still need to score an ubertube and then acquire and hang another tube over it in order to beat the minibot minimum. If you don’t have an autonomous, then you would have to hang a minimum of 3 different tubes on the top row creating a logo (6 actions) or 4 tubes on the top row not creating a logo (8 actions) just to beat the minimum minibot contribution…
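
One way to frame the comparison is points per distinct action. The sketch below uses only the figures quoted in this post; the 18 and 12 point totals are my arithmetic from the stated tube values (3 top-row tubes doubled by a logo, and 4 top-row tubes without a logo), and the strategy labels are made up.

```python
def points_per_action(points, actions):
    """Average points earned per distinct robot action."""
    return points / actions


# (strategy, points, distinct actions), from the figures quoted in the post.
strategies = [
    ("minibot at its minimum value", 10, 2),
    ("ubertube plus 3-tube top-row logo", 30, 7),
    ("3-tube top-row logo, no autonomous", 18, 6),
    ("4 top-row tubes, no logo", 12, 8),
]

# Rank strategies from most to least efficient.
ranked = sorted(strategies, key=lambda s: points_per_action(s[1], s[2]), reverse=True)
for name, points, actions in ranked:
    print(f"{name}: {points_per_action(points, actions):.2f} pts/action")
```

Even at its minimum value, the minibot comes out ahead per action, and that is before counting the time each tube cycle takes.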

This season:

  1. Do a scoring analysis (all the ways to get and block points, and then prioritize the ways to get those points with the fewest distinct actions).
  2. Do some field analysis. The best way to be playing in elims is to win qualifications and be an alliance captain. Be realistic on what a real alliance score will be. Understand that only about 25% of teams will get autonomous bonus points, and only about 25% of teams will hit most end game bonuses. Being able to get one of those bonuses every time will usually move you towards the top of the field.
  3. Be realistic in your goals, and relentless in hitting them.

The attached Graph shows the distribution of individual team’s season OPRs for the 2011 Season.
The trend you see here is pretty typical and is important when doing game analysis and strategy: Typically about 25% of the FRC population has a season contribution of 0 or less (26% in 2011). The 50% population point is much lower than you think. This has been the case in 2008, 2010, and 2011, since the GDC got “penalty happy”. (2009 was an exception with only about 5% being negative, but the distributions are the same, just shifted to the right.) The average score per team increases quickly as teams play more events: Last year the average OPR by number of events played was 6.1, 18.0, 27.7, 34.4, 39.2, for 1-5 events played. Notice that it nearly triples going from 1 event to 2.
The performance distribution roughly follows a Gamma distribution for all the teams and is very asymmetrical. Last year 532 teams had a net contribution at or below zero, while only 112 teams were at 30 or higher.
However, this function changes dramatically the more teams play.
If you can achieve half of the season maximum at your first event (OPR of 35 last year), you will be in the top 5% or so in the world at the beginning. If you keep this same level of performance, by your 3rd event you will only be barely above average relative to other teams with the same level of experience.


Jim, this is incredible! I hope lots of people get a chance to look at this and IKE’s comments and realize that they are better off aiming low and hitting their goals than shooting for the moon and coming up way short. It also speaks volumes for having a robot done early so you can practice.

Since I assume 33 has pretty decent points per robot per match data, have you ever matched up the OPR with the actual points a robot is worth per match? Can you make any comments as to how good that fit is?

The fit changes every year based on how linear the scoring is and how dependent it is on your alliance partners. 2008 was the best year for OPR. Here’s how our scouting matched with OPR in 2008. http://www.chiefdelphi.com/forums/showthread.php?p=732115&highlight=opr+scouting#post732115

We do a comparison at the events we are at to see how good of an indicator OPR is. We don’t run stats on it, but mostly do visual checks to see if we are missing something or to look for trends. This probably would be a good thing to run stats on though…

Like Joe Ross said, 2008 correlated incredibly well, especially if you used the top 2 offensive teams’ OPR. Unfortunately, negative OPRs also correlated well that year.*

2009 didn’t work well at all. Over 50% of moonrocks were scored by humans, and many teams rotated the human player position.
2010 was pretty good. With good teams, though, 2+2+2 = 8: at MSC, the scores inflated drastically as teams could do zone play because there were good teams that could pick up the slack. 2010 also had some unique strategies that would underpredict certain teams. 67, 254, and 1114 were so good that even though OPR had them at 8-10, they would frequently score that many points, plus attempt a couple of points for the other teams! In reality, in a close match, those guys could do 10-12. OPR would often underpredict good alliances that year. Our qualifying match against 254 had an OPR predictor of, I believe, 16 to 12. The actual match turned out to be 20 to 18 (still one of my favorite FRC qualifying matches even though we lost).

2011 was interesting. OPR was a reasonable predictor, but due to digressive scoring, it had the opposite effect from 2010. In 2011, 50+60+45 = 120-130 was not uncommon. There wasn’t enough room for 3 good robots to put up a good score, and there were only 2 minibot poles.

*One partner in 2008 had a 6WD with omnis on the corners. Every lap, the driver would inadvertently spin the wrong way when changing direction, spin back over the line, and get a penalty. I convinced the team to zip-tie grip-mat to their omni wheels for our match, and they only got 1 penalty that match.

Examples like this are where scouting can pay dividends. Often a team with a negative OPR is either breaking a rule or driving poorly. If you can observe their issue and point it out to them, you can frequently get a few more points, ah er, not lose a few points that you likely would have without the comments. I was surprised that there were not more DQs in 2011 with all the red-card opportunities the GDC had in the rules.

Same data graphed by Percentage lets you see the trends a little better:
You can see how the population center moves to the right as experience increases.
A little bit about the data set and this method. I have a database which has all the OPRs for all the teams at each event they play, spanning many years. I take all of the OPRs and group them into categories; this year it is in segments of 5 points per segment. I have used the “20 slices” method since 2006 to allow me to overlay data from several years’ worth of competition onto the same chart for analysis of multi-year trends even though the games often have very different scoring systems.
Included in the 2011 data set:
Teams who played at least one event = 2053
Teams who played at least two events = 800
Teams who played at least 3 events = 244
Teams who played at least 4 events = 45
Beyond this, the populations are too small to be relevant.
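
A minimal sketch of the grouping step, assuming fixed 5-point segments and made-up OPR values; the actual database and the exact slice boundaries are not shown here, so treat the details as illustrative. Converting counts to percentages is what lets groups of very different sizes share one chart.

```python
import math
from collections import Counter


def bin_oprs(oprs, width=5):
    """Group OPRs into segments of `width` points and return the share of
    teams (in percent) per segment, keyed by each segment's lower edge."""
    counts = Counter(math.floor(opr / width) * width for opr in oprs)
    total = len(oprs)
    return {lower: 100.0 * n / total for lower, n in sorted(counts.items())}


# Hypothetical OPRs for a handful of teams.
sample_oprs = [-2.3, 0.0, 4.9, 5.0, 12.0]
for lower, percent in bin_oprs(sample_oprs).items():
    print(f"[{lower}, {lower + 5}): {percent:.0f}% of teams")
```

With these edges an OPR of exactly 0 lands in the 0 to 5 segment, so counting teams "at or below zero" would need that case handled separately.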

You can see from the chart some of the things IKE mentions: At 4 events, the teams are clearly limiting one another’s total performance, as indicated by the big peak at 40-45. With multiple robots of this caliber, the per-team score actually goes down.

Observation on the first graph: from about 6400 to 6600 on the x axis looks like MSC.

The peaks from the 6 competition weeks, then MSC at a much higher caliber (quals nearly as good as week 6 elims), followed by CMP with CMP quals at MSC elims level.

Just wanting to correct you slightly on your math here.

If you throw three dice, the probability of at least one of them coming up with a 1 is not simply 3 * 1/6. Using this logic we could then assume that if we throw 6 dice then the number 1 is going to appear every single time (which is false, the actual probability in this case is about 66.5%). When throwing three dice, the probability of throwing at least one 1 is equal to 1 - (5/6)^3. This number turns out to be about 42.1%.

The probability of throwing exactly two 1’s is a little bit trickier, but it’s not too difficult. There are 216 possible dice rolls for three dice, and 15 of those rolls have exactly two 1’s in them. 15/216 is roughly 6.9%. If we include the 1, 1, 1 case (that is, all situations where at least two 1’s come up) then our probability is 16/216, or 7.4%.

The probability that all three dice show 1’s is 1/216, or .46%, so you were right about that one.

And this is where the house edge on the casino game of Sic Bo comes from. Yay math.
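
For anyone who wants to check these numbers, a brute-force sketch that enumerates every roll and counts the 1s is below. The function name is made up; exact fractions are used so there is no rounding to argue about.

```python
from fractions import Fraction
from itertools import product


def ones_distribution(n_dice=3):
    """Map k -> exact probability that exactly k of `n_dice` dice show a 1."""
    counts = {}
    for roll in product(range(1, 7), repeat=n_dice):
        k = roll.count(1)
        counts[k] = counts.get(k, 0) + 1
    total = 6 ** n_dice
    return {k: Fraction(c, total) for k, c in counts.items()}


dist = ones_distribution(3)
at_least_one = 1 - dist[0]          # same as 1 - (5/6)^3
exactly_two = dist[2]
at_least_two = dist[2] + dist[3]
all_three = dist[3]

for label, p in [("at least one 1", at_least_one), ("exactly two 1s", exactly_two),
                 ("at least two 1s", at_least_two), ("three 1s", all_three)]:
    print(f"{label}: {p} = {float(p):.4f}")
```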