[FVC]: Analysis Shows Improvement Possible in Ranking System

Previously, I’ve mentioned that I thought grouping teams in lots of four and then having them play three matches against each other (in each possible combination of the pairings) would speed things up and might also provide a better selection mechanism. Well curiosity got the better of me, so I put together a Monte Carlo simulation of using that scenario vs. a series of random assignments.

The quick answer for those who do not wish to read further is that grouping appears to improve the selection process. A more notable finding, however, is that the current system of ranking is a fairly poor predictor compared to other methods. In fact, if other methods were used, half as many matches could be run while still achieving more accurate rankings.

I have attached a PDF file that summarizes the simulation. If people are interested, I will roll up all the spreadsheets and post those as well.

For those who are curious, I will attempt to briefly explain the method used to arrive at these conclusions. But first let me add that I am not a statistician, nor do I pretend to play one on TV. So be gentle.

That said, I was trying to answer a Boolean question: Is one approach better than the other? I knew it would be helpful to quantify the difference, but that was not my goal. Therefore, you might want to check my underlying assumptions before using the numerical results as gospel.

In any case, it seems to me that we could line up 100 robots in a room, and argue for hours about the merits of each and probably not agree on the exact sequential ranking of the group. We could, however, probably agree that a ranking does exist; we just do not have the power to easily determine it. Fortunately, for this simulation, that is all we need and, therefore, I assigned each robot a random “seed” value, much as they do in sports such as tennis. If a simulation is using 96 robots, each one has been assigned a unique seed value from 1 to 96.

A higher seeded robot (one with a lower seed value) should win more often than a lower seeded robot. To simulate this behavior, I established a baseline score. The highest seeded robot received the full score while the lowest received none; those in between were prorated.

I made the assumption that the scores of an individual robot would follow a normal distribution with a standard deviation of 20 percent. In other words, if a robot played 100 matches and its average score was 50, in about 68 of the matches its score would have been between 40 and 60.

For each match, I added or subtracted a randomly determined amount (drawn from the normal distribution described above, scaled to the upper baseline score) to each robot's individual baseline score (as determined by its seed value).

This was done for each of the robots on a team, and their total score was matched against the total score of the opposing pair of robots.
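For anyone who would rather read code than spreadsheets, here is a rough sketch of the scoring step in Python. It is illustrative only, not the actual spreadsheet formulas; in particular, the 100-point top baseline is an arbitrary choice, and whether the 20 percent noise is taken from the top baseline or from each robot's own baseline is a detail this sketch glosses over (it uses the top baseline).

```python
import random

NUM_ROBOTS = 96      # one of the field sizes used in the simulations
TOP_SCORE = 100.0    # arbitrary value chosen here for the top seed's baseline
SD_FRACTION = 0.20   # the 20 percent standard deviation described above

def baseline(seed):
    """Prorate the baseline between the top seed (full score) and the bottom seed (zero)."""
    return TOP_SCORE * (NUM_ROBOTS - seed) / (NUM_ROBOTS - 1)

def robot_score(seed):
    """One robot's score in one match: its baseline plus normally distributed noise."""
    return baseline(seed) + random.gauss(0.0, SD_FRACTION * TOP_SCORE)

def alliance_score(seed_a, seed_b):
    """An alliance's match score is the sum of its two robots' scores."""
    return robot_score(seed_a) + robot_score(seed_b)
```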

The result was a table of wins and losses for the robots. I also tabulated the total points scored and, because someone had suggested it, a "delta" score that is figured by summing the difference between winning and losing scores: winners add the difference to their running total, losers subtract it.

At this point, all that was left to do was to compare the final ranking with the original seed value and determine the difference. A comparison of the average difference and standard deviation would determine which assignment method was preferable.
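Sketched the same way (again illustrative only, not the spreadsheet), the comparison step boils down to sorting by whichever tally is being tested and measuring how far each robot landed from its true seed:

```python
def ranking_from_totals(totals):
    """Sort robots best-first by whichever metric is being tested
    (wins, total points, or the delta score), given as a dict."""
    return sorted(totals, key=totals.get, reverse=True)

def mean_rank_error(totals, seeds):
    """Average |finishing position - true seed| across the field;
    a perfect tournament would score 0, and smaller is better."""
    ranking = ranking_from_totals(totals)
    return sum(abs(position + 1 - seeds[robot])
               for position, robot in enumerate(ranking)) / len(ranking)
```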

After running the simulation 100 times, it appears that using the group assignment method results in more accurate rankings—particularly if the win/loss method of scoring is used.

Now for the interesting part: It quickly became apparent that the total points scored by a team was a much better predictor than the win/loss record. In an effort to simulate the current scoring system, I added another measure which ranked the robots by wins and losses and then by total points. This helped somewhat, but it is still nowhere near the ability of the total points method.

In fact, you can run fewer matches and still have a better result using the total point method of scoring. To get a feeling for how many matches would be necessary, I ran simulations for 3, 6 and 9 matches in tournaments having 32, 64 and 96 robots.

Ok, fire away.

FVC Simulation Summary 06-03-07.pdf (20.1 KB)



Bravo!

Bill,

With ManicMechanic, I have been working on a match scheduling program that will use some heuristics to produce (and display/print in attractive formats) match schedules for N teams each playing M matches.

Assuming nothing knocks the train off the tracks, and no one finds a fatal flaw in your analysis, I should be able to fairly easily include an option that creates match schedules of the ilk you describe. Assuming MM & I succeed, local leagues and FVC APs will have a tool that will help them employ the approach you describe.

Blake
PS: There might even be value in combining the two efforts (my non-random match schedules fed into your simulations so that the simulations don’t have to use random pairings in the match assignments).

Very cool!

This is no surprise. At the NorCal Championship, I felt bad for an above-average (IMO) team that had a winless record. Unlike several higher-ranking teams that had multiple scoreless matches, they never scored fewer than 25 points in any match. Using the point system would have given a more representative idea of their abilities.

However, I’m not sure how well a ranking system based primarily on points (rather than W-L-T) would go over. People are used to the sports paradigm, where a win is a win, no matter how squeakily it was obtained. Also, defensive play would go out the window.

The saving grace of a limited system has been alliance selection. The above-average team described was selected in the first draft, as they should have been.

Let me preface this with the observation that perhaps FIRST intended the selection process to be purposefully uncertain. In that case, the process is working, and we should attempt to understand the rationale for this course of thinking rather than attempting to improve upon the system. If this is the case, it would be helpful if FIRST would acknowledge that it wants the system to have an element of randomness (and perhaps provide the reasoning).

Personally, I can come up with only three reasons for FIRST wanting a system such as this: (1) It teaches kids that the world is not necessarily fair or just. (2) It gives teams that have lesser robots an opportunity to experience the excitement of making it to the finals and thus inspires them to do better work in the future. (3) It adds greater emphasis to the scouting aspect of the game.

Both numbers 1 and 2 are probably valid, but I wonder about number 3. It would seem to me that you would want your top robots to become alliance captains. Because these are the teams that had the interest and energy to build a competitive robot, they are the teams that are most likely to have invested time towards devising and organizing a coherent scouting effort. They also have a clue that they might be required to act as alliance captains. Contrast this with a team that “accidentally” becomes an alliance captain. These teams are less likely to have formed a rigorous scouting effort and may not have done any work in this regard because they did not “expect” to be put in this position. This leaves them in the unfortunate circumstance of having to pick partners by using incomplete or non-existent data, which then results in alliance teams that may include the robots in the bottom quartile of the field.

In some respects, I think the unreliability of the current system has to do with the rather large disparity in abilities between the top and bottom performing robots. The nature of these pairings, coupled with the rather large step values that a win/loss/tie system creates, makes the system prone to unreliable results. The introduction of the grouping method normalizes some of this out. But as could be seen in the simulations, grouping has much less effect if a system with greater granularity (such as total points) is used. Although I did not test this, my instinct is that grouping would also show less improvement if the robots were more evenly matched. I may attempt to simulate this.

As I see it right now, the biggest weakness with grouping is that it requires a precise number of teams–in increments of thirty-two. Perhaps this could be avoided by using the pairing algorithm that Blake and MM are working on. I would be happy to run it if they give me data (or I can give them the engine).

As MM pointed out, our culture expects winning to be rewarded, and using total points might not be as readily accepted or understood (although who really understands the rationale for scoring into your opponents' goals to boost QPs).

In any case, it is possible to create a system which adds “bonus” points to the winner’s total points tally. I may attempt to simulate this also.

I will report back with any results.

There is one major kink in your analysis: it negates defense (especially in the points-scored sector). If you don't think defense has an impact during an FVC event, you've apparently never been to one. In your overall ranking analysis, you can claim that the "baseline score" is not actually points scored but a ranking of ability; I find that claim hard to support, and even harder to accurately quantify, especially considering that the ability of the defender is also heavily dependent on the offensive robot.
In addition, I have a question. How were the groupings formed? If they were formed randomly, there is still a large chance of dropping a well-qualified team to the bottom of the rankings, as the four best (or best-qualified) teams have a chance of all landing in the same grouping. In that scenario you could see one of the best dropped to the bottom, and/or multiple (if not all four) lodged in the middle of the rankings. The inverse applies as well: it can still propel a sub-par robot to the top. I don't see how that would solve much (even if it slightly reduces it).
A large part of FIRST (imo, although I know many who support my claim) is being able to play against every (or many) different teams and robots. I wouldn’t want to interact with the same three teams the entire time.
This change would not be well received by the FIRST community as a whole. It has had little exposure and reaction because of the particular time of year and the sub-forum it is in. But if you look at the similar situation from the Week 1 FRC regionals, you will see how the FRC community reacted. At the NASA/VCU, Pacific Northwest, and Granite State regionals, each team had a "partner" they competed against in every qualification match. My team, FRC 116, was included, as we competed against FRC 122 in all eight qualification matches. The reception by the teams involved, and by the overall FRC community, was almost entirely negative. St. Louis and New Jersey also had similar situations that weekend, though not 100% (same reactions, though).
http://www.chiefdelphi.com/forums/showthread.php?t=55178&highlight

In closing: FIRST will not accept a schedule that isn't random, and for good reason. In addition, it would be next to impossible to impose ranking criteria based on individual team performance (and unfair if they do not include some measure of defensive ability), nor would all of FIRST support it.

Ahhhh - A debate worth sinking one’s teeth into :slight_smile:

It doesn’t negate defense. It does ignore the effect defense can have on match scores.

Bill’s “model” (in the mathematical sense) for what an FVC robot does is purposefully one-dimensional. Any model is imperfect. This one has a substantial imperfection, in general, but I have yet to see anyone demonstrate that that imperfection is important in this particular discussion.

Because in this sort of analysis one can play the part of a god and can know the proper ranking of every robot in the simulation, Bill chose to make the ranking metric simple. This allowed the simulation to clearly focus on whether a few rounds of 4-team, 3-match, mini-tournaments would create a ranking that is close to the correct one (known by the god-like analyst who set up the experiment).

His results are that if you want to employ a system that lets you discover what that correct ranking is, his system of having randomly chosen groups of 4 teams stick together for three quick matches appears to figure out the correct rankings faster than randomly selecting a totally new group of four teams for each and every match.

And… he has a body of results that can be (and should be) reproduced. No amount of anyone’s opinions or arm-waving can make his results untrue. If anyone wants to reject them they need to choose one of these options and then do the math.

  1. Devise a simulation that includes the effects of defense and show that Bill’s hypothetical match scheduling method does not reveal true rankings as well as the new-teams-every-single-match method(s), once defense has been included. I personally do not think that this is a foregone conclusion.

  2. Show that one of his assumptions, inputs or formulae is off-target in a way that makes his results inaccurate.

  3. Something else equally rigorous.

You don’t want to go there. Trust me - Bill has seen an FVC event.

Read it again. You are looking through the wrong end of the telescope.

Because Bill (or whoever duplicates/checks his very repeatable work) sets up the experiment, before the simulation runs he/they can perfectly sort/rank the simulated robots according to their scoring ability. He knows exactly how good each robot is (using scoring ability as his metric) because he assigned those scoring abilities. The simulated tournament's job, much like the job of a real tournament, is to attempt to (re)discover what that perfect ranking is.

Monte Carlo simulations run the same problem over and over again, scrambling (in appropriate ways) the initial conditions and inputs. At the end of the simulation (limited by the fidelity of your simulation’s models) you have a fairly good picture of how well the algorithm you are testing will perform over the long-haul.

Pointing out a few situations where the algorithm works poorly is not adequate evidence that the algorithm isn’t useful. I dare say that current methods and other alternatives all fail in some unusual situations. What is more important is how well they perform in most situations, and/or whether any of the algorithms being weighed produce completely unacceptable results in any pathological situations.

Bill does not recommend interacting with the same three teams all the time.

Bill suggests a quick 3-match mini-tournament for a first group of four FVC teams; followed by another 3-match mini-tournament for a different group of four teams, followed by …. until all teams have played in a mini-tournament. Then you scramble the teams and do it again… until time runs out for the entire tournament. In each/any mini-tournament the teams would be matched up like this:
Match 1 AB vs CD
Match 2 AC vs BD
Match 3 AD vs BC
You can see that in any given mini-tournament, a team never has the same ally more than once and faces an ever changing opposing alliance in the three matches (played in rapid succession).
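In code form, the whole grouping idea is tiny. Here is a minimal sketch (the shuffling, the names, and the multiple-of-four assumption are mine, not necessarily how Bill's spreadsheet builds its schedules):

```python
import random

def tri_match_schedule(teams):
    """Shuffle the field, split it into groups of four, and have each group
    play the three pairings AB-CD, AC-BD, AD-BC back to back.
    Assumes the number of teams is a multiple of four."""
    pool = list(teams)
    random.shuffle(pool)
    schedule = []
    for i in range(0, len(pool), 4):
        a, b, c, d = pool[i:i + 4]
        schedule += [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return schedule
```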

Bill has not necessarily suggested it to the FIRST community as a whole. The strong implication of his original message is that FVC tournaments might benefit from using it. Even if he had recommended it to all of FRC, FVC, FLL, JFLL; I think it is a bit presumptuous for you to assert that “the FIRST community” would dislike a method that could be proven to generally produce more accurate “top 8”s than current methods (and that is likely to be no worse than current methods in extreme situations).

PS: The results of the math will not change if they are posted at some other time or under a more high-falootin heading.

As you may remember, I was there with you at NASA/VCU and I agree that that match scheduling method was a mistake (http://www.chiefdelphi.com/forums/showpost.php?p=589163&postcount=55). This one is very different. Also a pure application of this one might not be well-suited for use with 3-team FRC alliances. I don’t think Bill suggested using it in an FRC context.

That’s an interesting opinion you have there; but I’m not sure which schedule you are talking about. Bill’s thought experiment is chock full of randomness.

And… the last I checked, neither you nor I nor anyone else was authorized to speak for all of the participants in FIRST activities or for all of the folks who organize and run the FIRST corporation.

If I understand it correctly, Bill’s approach does not attempt to do this. Bill assigns the alliance’s score to the robots in the alliance. He uses a running total of these points to rank the robots at the end of a full tournament. He does this instead of assigning 2 qual points to each robot on a winning alliance.

Blake
PS: My large font “Bravo!” reply to Bill’s original message was intended to commend the methods he employed. Whether the results stand up in a more sophisticated analysis is beside the point. Well-written presentations of results obtained through proper use of the scientific method, computer science and math should be applauded.

My general point about defense was that his formula didn't factor in defensive play properly to create an accurate simulation. In many situations, a defensive team will not score a single point, but can have a MASSIVE impact on the match as a whole. It is also impossible to quantify the value of a defensive bot (even as the person setting up the experiment), because the value of that bot changes each and every match, based on the offensive bots it is facing. A purely defensive bot cannot possibly have a value any higher than the combined value of the two offensive bots it is facing (i.e., it cannot negate more points than they can score). In addition, a defensive design may or may not be effective against all offensive designs (i.e., some bots this year were excellent at defending the high goals but couldn't scoop balls out of the low goals, and vice versa). Or consider the situation where four purely defensive bots (not capable of scoring, which is possible, particularly if the scoring method is rather complicated or difficult, such as the FRC objective this year) are in a match against each other; the result is guaranteed to be a tie (something not addressed in his simulation).

Even though you only have the same partner once, you still interact with the same three other teams three matches in a row. In a six-match tournament (fairly large by FVC standards), you will interact with only six other teams out of the entire field during qualifications. The reaction to the FRC events of this past season suggests that the FIRST community would not accept this (look at the multiple threads about the scheduling algorithm for the 2007 FRC season), and in addition, those same threads suggest that the FIRST community favors random match generation (in terms of teams participating; they still value space between matches).

A similar argument could be made about the scheduling algorithm put in place for the 2007 FRC competition, which seemed to be designed to create more competitive matches (a good thing, right?). The FIRST community reacted with disgust to this, and especially to its side effects, which limited the number of other teams that each team got to interact with (ask 1504 how they felt about facing 25 four times in Atlanta). FIRSTers, or at least those on CD, have shown a desire to play with as many different teams as possible on more than one occasion in the past.

While his scheduling method doesn't group teams based on individual performance, he also gave math showing that a more proper ranking would be based on individual points scored, which, as I said, would be next to impossible and would discredit defensive efforts.

This is a fascinating thread; Great job, billw. I have one suggestion that I believe would make the model more realistic.

The initial deterministic ranking of teams from 1 through 96 implies a linear distribution of team 'merit' from top to bottom. In reality, I suspect that relative team merit within a tournament more closely follows a normal distribution around a mean. What if instead, as an input, you assigned each team a baseline score between 1 and 96 with a specified mean and standard deviation? You would still be able to rank the teams 1 through 96, but the top team might have a baseline value of 92.4, the middle teams' baseline values would be more closely bunched, etc. Then continue the exercise as you have done.

I would be interested to see if this makes a difference in the simulation outcomes.
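In code, the tweak might look something like this; the mean and standard deviation below are placeholders I picked for illustration, not values anyone has proposed:

```python
import random

def normal_baselines(num_teams, mean=48.5, sd=16.0):
    """Draw each team's baseline 'merit' from a normal distribution and clamp
    it to the 1-96 range; sorting these values still yields a 1-through-N
    ranking, but the middle of the field is bunched more closely together."""
    return [min(96.0, max(1.0, random.gauss(mean, sd))) for _ in range(num_teams)]
```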

First, I apologize for taking so long to get back to this discussion—I’ve had too many balls in the air lately.

I would like to thank Blake for covering in my absence. He has accurately described the simulation, and I am in alignment with all of his comments.

I would like to elaborate on three of Sean’s concerns:

  1. Non-random groupings have been shown to be fraught with problems.

I was dumbfounded when I read the FRC thread you referenced above. It seems that the outcome was entirely predictable–and unsatisfactory. Playing the same team repeatedly will only skew results, not refine them. Sadly, in the end, the true top eight teams were probably not represented as alliance captains in any of these tournaments.

During this past VEX season, it appeared to me and others that a similar problem was occurring and that many teams which belonged in the top eight were not selected. My suspicion was that because very good teams were occasionally paired with a very poor team (or one that did not even show up), they would record a loss. In a similar fashion, some poor teams would record an "undeserved" win. The fact that only a few matches were played exacerbated the problem and did not allow the results to "average" out. Furthermore, the ranking points did little to help refine the results among teams that were tied.

It occurred to me, however, that forcing the group members to play against each other would do two things: it would spread these disparities more evenly, and it might add enough efficiency that more matches could be played. I was simply attempting to answer the question: Would such a system result in more accurate rankings?

The answer is: yes, it does appear to improve the results. It may also allow more matches to be run during a tournament. And although some teams have more interaction with certain teams than with others, the overall rankings are more accurate, which implies that the grouping algorithm is, in fact, more impartial than the current system.

Hopefully, you can now see that the goal of this system is really aligned with what you and others ultimately want—an unbiased ranking system.

  2. The simulation does not account for defensive abilities.

This is true insofar as there is not a discrete input which attempts to quantify this behavior.

If another random variable was added to account for defense and then combined in a linear fashion with the existing metric, we would essentially be summing two random variables to create a third. Because the sum of two independent normal variables is itself normal, the result would still follow a normal distribution, with a variance equal to the sum of the two variances. (My statistics are very rusty; someone please jump in if this is somehow incorrect.) Thus, I believe that the seed value assigned to each robot inherently accounts for defense as well as offense. If the defensive and offensive capabilities were exponentially different, a second variable would be necessary (as would a much faster computer, because the simulation would take 100 times longer).
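For reference, the result I am leaning on here is the standard one for sums of independent normal variables:

```latex
X \sim \mathcal{N}(\mu_X, \sigma_X^2),\quad
Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2),\quad X \perp Y
\;\Longrightarrow\;
X + Y \sim \mathcal{N}\!\left(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2\right)
```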

A larger problem, in my mind, is that the simulation assumes that a normal distribution accurately approximates the overall performance of the robots. Given my personal experience, I would be surprised if this were the case. I base this on the fact that I have seen far too many robots experience partial or catastrophic failures which effectively disable them for the entire match. In a normal distribution, this occurrence would be quite rare, but in practice it is not. On the other hand, actual data would be required to determine the true distribution. Without that data, I am forced to rely upon a distribution which is probably "close enough" to make the simulation valid. As an aside, there are also many very low scores in the data, so perhaps a normal distribution is not that far off.

One last note regarding defense: I am not a fan of aggressive tactics where one robot is purposely pushing (and possibly entangling) another. I will elaborate on this idea further in the game suggestions thread, but I do not see this type of strategy as being appropriate for VEX. I did not see many examples this past season of a more refined defense whereby a goal was blocked or emptied of balls—both of which I would consider appropriate. I understand that FRC has a much different platform, and I can understand all of the benefits of the defensive tactics as they are applied there, but I am not convinced that they are beneficial for VEX.

  3. This would never work in FRC.

This simulation had nothing to do with FRC. The grouping algorithm would require far too many matches and robots. It really only works for a paired situation as is used in VEX.

Sean, I would be happy to further discuss any of the above if you wish.

Josh, I think you make a good point. I would be curious about the outcome of normally distributing the “ability” level as well. Unfortunately, it would take quite a bit of time to incorporate this change. At some point, it may be necessary to make some other major revisions. I’ll put it in then, if possible.

New Data: Since the original run, I have been bothered by the fact that I did not add in ranking points. I have since modified the simulation and replaced the “Delta” method with a W/L/T + ranking points. The results do not show much improvement. They are attached.

For those who are not looking at the results, keep in mind that simply switching to a “Total Points” system far surpasses the ability of the current algorithm as well as that of the grouping algorithm. Furthermore, it does away with the inconvenient requirement imposed by the grouping algorithm that a specific multiple of robots be used.

One of my largest concerns with the VEX, now the FTC, program is that it will be shackled to the paradigms used by the FRC program. It is a new platform with different constraints and possibilities, all of which should be considered. Again, I will elaborate further on this in the other thread, but the real question is: Should FIRST be exploring an alternative ranking system? If you think that this is a worthy undertaking, you might want to send them an e-mail or add a note of support/dissent to this thread so that some sense of the community becomes apparent.

As I mentioned before, I would be happy to pass along the spreadsheets used to arrive at these results to anyone that is interested.

FVC Simulation Summary 06-23-07.pdf (20.1 KB)



My point was not that it would work in FRC, but merely to reference the results of non-random schedules in FRC. After the Week 1 regionals, the problem of facing the same team every match was fixed (although high rates of repeat match-ups still occurred), but the other non-random features still existed. In short, the algorithm sorted teams by age and grouped them to (theoretically) create more competitive matches. http://www.chiefdelphi.com/forums/showthread.php?t=55622
Once again, the FIRST community reacted with disgust to the non-random features and the inability to play against a wide variety of different opponents. While I recognize you aren't doing the "social experimenting" of organizing teams by age, you are still lowering the number of different teams each participant interacts with. In an average Vex tournament, each team would only get to interact with either 3 or 6 other teams during the entire qualification rounds with your system. With the current system, they could get 3x that number of different opponents.
My general point about defense is that the defender's "ranking" within the group would vary depending on their opponents. They cannot negate more points than the opponent can score, and therefore cannot have a higher ranking than the combined rankings of their opponents. Thus, their value changes each and every match, and a single ranking assigned to them for the entire tournament would not be accurate. (Regardless of whether or not defense should be played, at this time we have to assume it will be.)
Everyone has a problem with "false" captains ranking in the Top 8, but I don't believe that this is the solution. Even if it creates more accurate rankings (which I question, given the defensive variables), you sacrifice too much interaction with other teams to make it worth it. The most reliable way, under any format (and your data shows this too), to increase the accuracy of the rankings is to play more matches. I think our efforts would be better focused on how to make competitions more streamlined and quicker-running in order to have more qualification matches. I assume you attended the Farmville competition, given your location in Virginia and relationship with Blake, where there were only three qualification matches for each team. Most other competitions got more than this, producing more accurate results.

And I agreed… What I challenged readers to do is to show that using a more accurate simulation will cause Bill’s suggested method to produce an end-of-tournament ranking in which the contestants’ estimated ranks (that are a consequence of their performance during the tournament) are farther from their correct ranks than are the estimated ranks that result from using the current FVC method.

In a non-linear, multi-dimensional problem space such as the one you (LL) portray fairly well in your message, ranking the entities in the space (i.e. the robots) according to a 1-dimensional metric (which is exactly what BOTH Bill's suggestion and last season's FVC (and FRC, FLL, NFL, NBA, NCAA, FIFA, etc.) methods do) results in a compromise. If you accept that the BW suggestion and the current FVC method intrinsically suffer from this compromise, then you hopefully come back to asking two questions.
• Will a more sophisticated simulation cause the intriguing results of Bill's first simulation to evaporate, or will the increased realism be mostly irrelevant to the result (i.e. will the BW method still create end-of-tournament ranks that are more accurate than last season's FVC method)?

• Is there anything other than creating an accurate 1-dimensional ranking (so that the "top" 8 teams can become alliance leaders) that should be used to choose the match pairings (i.e. should the number of other teams encountered on the field be important, or should creating close contests (weak-vs-weak, strong-vs-strong) be important), or…?
The question is not whether Bill’s simulation is 100% realistic. The question remains: “Will increased realism change the result? Will the rank assigned by the experimenter (which will in itself be a compromise in a non-linear, multi-dimensional problem) be reflected more accurately by the results of a BW tournament or by the results of a last-season FVC tournament?”

Let’s do the arithmetic:
ASSUMPTIONS:
• Assume 32 teams, 1 field and 6 hours

• Further assume paces of 1 match per 6 minutes for current FVC methods and 3 matches per 12 minutes in the BW method.

CALCULATIONS:
• Last-season method
  o Current FVC appearances on the field = 7.5 per team = {[(6 Hrs x 60 Min/Hr) / (6 min/match)] x (4 teams/match)} / (32 teams/tournament)
  o If each match is filled with a new ally and new opponents, a team will see 22.5 other teams on the field in a total of 7.5 matches.
  o Note that last season's methods did not guarantee that each match was a combination of teams that had never seen each other on the field before.
• Other method being discussed (BW)
  o BW appearances on the field = 3.75 tri-matches per team = {[(6 Hrs x 60 Min/Hr) / (12 min/tri-match)] x (4 teams/tri-match)} / (32 teams/tournament)
  o If each tri-match is filled with a new trio of other teams, a team will see 11.25 other teams on the field over a total of 11.25 matches. More matches, more time on the field, more accurate selection of the "Top 8", fewer other teams.
• The suggested (BW) method with two fields available
  o Choose to run two fields in parallel to get the non-trivial benefit of seeing more teams, in person, on the field, as both ally and opponent. Do this in addition to using the BW suggestion to get the non-trivial benefit of a substantially more accurate end-of-qualification ranking.
  o Now each team sees 22.5 other teams on the field (same as in the single-field FVC method).
  o Each team gets to appear in 6.5 tri-matches. This results in 19.5 actual contests on the field, instead of the 15 they get from a two-field version of the current FVC method.

Assuming that the simulation's accuracy holds up when/if the nonlinear effects of defensive ability against different types of scoring approaches are added to it (see above and other discussions about whether this is necessary), I still see Bill's suggestion being worth some serious consideration. It has attractive advantages and tolerable disadvantages.
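If anyone wants to check or vary these numbers, the single-field arithmetic boils down to a few lines. The function and its parameters are my own shorthand; the two-field tri-match figures above include some allowance beyond this idealized pace, so that case is not reproduced here:

```python
def field_time_per_team(total_minutes, minutes_per_unit, num_teams,
                        teams_per_unit=4, matches_per_unit=1, fields=1):
    """Return (units per team, individual matches per team) under the stated
    pace assumptions; a 'unit' is either a single match or a 12-minute tri-match."""
    units_run = fields * total_minutes / minutes_per_unit
    units_per_team = units_run * teams_per_unit / num_teams
    return units_per_team, units_per_team * matches_per_unit

print(field_time_per_team(360, 6, 32))                       # last season, 1 field  -> (7.5, 7.5)
print(field_time_per_team(360, 6, 32, fields=2))             # last season, 2 fields -> (15.0, 15.0)
print(field_time_per_team(360, 12, 32, matches_per_unit=3))  # BW tri-matches, 1 field -> (3.75, 11.25)
```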

Analogies to last season’s FRC are weak at best.

In addition to once again pointing out that I was one of the people who disliked last FRC season’s match scheduling algorithm, and pointing out that the FRC algorithm FIRST desired apparently did not get fully implemented, let me make a few other points.
• The BW suggestion should not be implemented as a surprise.

• The degree to which the BW suggestion has teams opposing or allied with repeat teams is not much greater than under last FVC season's method:
  o The first match of any tri-match is exactly like a match in the last-season method.

  o The second match is similar to the many matches under last season's method in which you encounter past allies/opponents.

  o The third of the three is definitely something that would be very unusual under last season's method.

    However, if everyone involved knows why it is occurring, and has been shown how it produces more accurate rankings; and if everyone has been reminded that they are getting more time on the field as a side benefit; then I think that contestants will, in general, take the change in stride, even if they don't wholeheartedly embrace it at first.

To steal a phrase from you, that was something of a “social experimenting” attempt. Also, it was based on the flawed assumption that team number is a useful enough indicator of team success on the field.

I'm not 100% sure what that fiasco was supposed to accomplish, but it was implemented incorrectly, was based on flawed assumptions, was a surprise to the general population, and, so far as I know, cannot be justified by any public simulations of its consequences. In this thread we (all of us) are not making those same mistakes.

You are going to have to point this out to me. (with a link or a quote). I don’t recall seeing this in the results or in his conclusions/conjectures derived from those results.

Are you perhaps thinking about his descriptions of the experiment set-up, in which the experimenter knows/assigns a scoring ability to each simulated robot so that the experiment can later tell us whether the final rankings are a good estimate of those assigned scoring abilities? (Scoring ability is the one-dimensional metric that is used to order/rank the simulation's robots 100% "correctly" when the experiment is set up.)

Blake

Bill's method does not produce non-random schedules. Bill's method randomly selects teams to populate (tri) matches, in which the 4 randomly selected teams get to enjoy competing in all pairings of the 4 teams rather than being stuck with the results of only one of those pairings. To say it another way, **random** selection of the teams in matches is combined with the tri_match notion, which improves the tournament's ability to identify valid alliance captains.

See my other message below. Are my assumptions reasonable? Is my arithmetic correct? Do the results lead you to rethink this objection?

I agree; and I agree that there is a further complication because a defender's effect on any one match strongly depends on the scoring method of the robot they are harassing. However, I am not convinced that this will change the net result of Bill's original simulation.
To assert that, one either needs to do the math or run an adequate simulation. No one has posted the results of doing either.

At the worst, I think that accurately incorporating defense into the simulation will reduce but not eliminate its advantages. At best I think the advantages will continue to be substantial. These are my opinions, and like all of our opinions on the subject, they have limited usefulness without some hard facts to back them up.

See my other message below. Are my assumptions reasonable? Is my arithmetic correct? Do the results lead you to rethink this objection?

If I recall Bill’s results correctly, dramatically increasing the number of matches in a tournament run using last season’s methods had far less effect on the rankings’ accuracy than the effect of switching to Bill’s suggested method.

I think you are up against a diminishing-returns situation here. See my comment immediately above and see my other message below. Are my assumptions reasonable? Is my arithmetic correct? Do the results lead you to rethink this objection?

Bill and I have spoken in person for about 15 minutes once and have exchanged a few emails. You made a good guess, but are mistaken. He and I aren’t close friends. I simply like the results of his math and the carefully-described, easily-repeated, easily-tested method he used to develop them. Future scientists, engineers, and business managers/owners should learn how to apply the same methods and rigor.

I don’t remember the exact number but I am pretty sure that teams at the Virginia FVC Championship (in Farmville) were able to participate in more than 3 qual matches. I think the number was more like 5 or 6.
Regardless, Bill’s team went to another championship event and to the FVC World Championship, and I don’t think his suggestion is prejudiced too much by his experiences at any one of those events.

Blake (still enjoying the debate) Ross

Blake, Thanks for filling the void once again.

Sean is correct in that I actually did say:

For those who are not looking at the results, keep in mind that simply switching to a “Total Points” system far surpasses the ability of the current algorithm as well as that of the grouping algorithm. Furthermore, it does away with the inconvenient requirement imposed by the grouping algorithm that a specific multiple of robots be used.

If I were to lobby for a new system, it would probably be in the direction of a random assignment of partners, playing in the grouped method, using a total points method of scoring. I haven’t actually tried to simulate the above, so I can only speculate that it would produce the most accurate and flexible system.

Sean, your comments made me curious as to the probability of repeat partners (offensive or defensive) that a team would typically encounter when assigned on a random basis. Obviously the probability goes up as more matches are played and fewer unseen partners remain.

If I have done the calculations correctly, a random selection will produce fewer repeats than the grouped method in most cases, but the difference is not as dramatic as you might expect. In fact, when using 32 teams, the number of repeats is about equal over 6 matches and actually lower for the grouped method when playing 9 matches.

Note that in the grouped method, you will never have the same offensive partner twice, and you will face each defensive opponent exactly twice.

I have attached the spreadsheet used to calculate this. In general, it assumes that if you are on the third match, your team has played with or against 6 robots in the prior two matches. This means that the probability of a repeat partner in the third match would be the following for 32 robots:

               6/31 + 6/30 + 6/29

The above really needs to be adjusted to account for the fact that one of the prior 6 partners may have already been a repeat. If, for example, this prior probability was 0.3, then the corrected calculation would be:

             (6 - 0.3)/31 + (6 - 0.3)/30 + (6 - 0.3)/29 

As always, I am open to any corrections/improvements.
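As a rough cross-check on the spreadsheet (not a replacement for it), the same question can be attacked by brute force. The sketch below looks at a random-assignment schedule from one team's point of view and counts repeat companions (partner or opponents); the sampling shortcut ignores whole-schedule constraints, so treat the numbers as approximate:

```python
import random

def average_repeats(num_teams=32, num_matches=6, trials=5000):
    """Estimate the expected number of repeat companions one team sees when
    each of its matches draws three companions at random from the rest of
    the field."""
    total_repeats = 0
    for _ in range(trials):
        seen, repeats = set(), 0
        for _ in range(num_matches):
            companions = random.sample(range(1, num_teams), 3)
            repeats += sum(1 for t in companions if t in seen)
            seen.update(companions)
        total_repeats += repeats
    return total_repeats / trials

print(average_repeats(32, 6))   # compare against the spreadsheet figures
print(average_repeats(32, 9))
```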

Repeat Partners Analysis.xls (40 KB)



Bill,

ManicMechanic (Yolande) and I have spent some time figuring out how to schedule matches that create no repeat opponents or allies for any/all team(s) until all the other teams in a tournament have been used in one of those capacities (i.e. until one is forced to start seeing repeat opponents/allies).

The work has gone reasonably well (and I confess that I haven't dug through any archives to see how many times this has already been solved in the past - I'm doing it for fun as much as to create a useful tool). When finished, it will be a Java program that includes a GUI that not only shows the match "pairings" but also shows info to help scouts and to help teams focus on just their own matches, if they care to.

Success with this project should make it unnecessary to see repeat opponents/allies; and if the core algorithm is adopted by APs, it should improve the last-season method of running FVC Qual matches.

Side notes:

  • I haven’t tried to extend it to 6-team FRC matches yet; but I am optimistic that I will be able to do that.
  • So far, the more general of the two scheduling algorithms we are working on is not a closed-form equation or procedure that is guaranteed to work in all circumstances. At present it is a set of heuristics (they work well but aren't perfect).

Blake
PS: Does anyone out there want to do an archive search for me; or does anyone have info from other sources about past or current algorithms?

Bill & Sean,

If I remember correctly, this suggestion from Bill was about giving all robots on a victorious alliance a qual-score increase equal to the total points earned by the alliance in the match they won. It was not a suggestion that we attempt to track the number of points scored by each individual robot. This would take the place of giving each winning alliance member 2 qual points.

Am I remembering correctly?

Blake

@ Blake:
How do you get a tri_match (3 matches) in the time it takes for 2 regular matches? If the proposal is to run these matches back to back with little (or no) down time, I think that is a far worse drawback than the reduced number of opponents. Repair/downtime is essential to being competitive, and often, to even functioning. If you have no time to repair, then if something breaks in the first match of the set, you're essentially doomed (along with your partners) in the next two.
If that factor was removed, then you'd be running an identical number of matches to the previous style, and as shown (with the exception of 9 matches and 32 teams), you'd get less repetition, particularly in the larger competitions, where the repetition was half that of the "tri_match" format or even less.
And Farmville did only get three qualification matches per team, I'm positive about it. Each team also got a few practice rounds in during the morning, and 24 of the 29 got at least one match in the eliminations (their alliances at least got two).
And my comments weren't that you two were friends, but rather that, because he has met you, it is likely he's involved with Vex in Virginia to some extent and likely went to Farmville.
Finally, I don't think there are six hours of actual playtime for qualification matches once you factor in opening ceremonies, inspection, practice matches, lunch, and alliance selection. At least not from what I've experienced at my previous one-day Vex events.
And yes, I think the defense would change his results, given that you face each team twice as a defender. You could be the best at scoring Goal X, but if Defender Y can shut down Goal X, you’re likely to pick up two losses (and you’re less likely to face Defender Y twice in the current format).

@Billw
The results are curious, as I’ve been to an FRC competition (in 2006, so not with last year’s algorithm) with around 32 teams, and with 9 qualification matches (and with 3 teams per alliance, so even more total partners), and we had far less repetition than that. Perhaps FIRST’s [old] algorithm is in fact not truly random, but written to avoid repeat match-ups whenever possible.
In your simulation, it does show at the 32-team size (which is moderate; only a couple of Vex competitions were larger) that repetition in your system is not much worse. But in the likely 3-6 match range (based on my experience with FVC), it is still slightly worse, though perhaps in the tolerable range. And when applied to something like the Championship event in Atlanta, which had hundreds of teams and where each team only had 4 qualification matches, the repeat rate would be much higher.

Let's suppose that teams would ideally play nine matches across a 4.5-hour window. This means that there are nine trips to/from the pits, nine attempts at getting drivers/coaches onto the field, nine attempts at making sure the batteries are charged and plugged in, and nine attempts at obtaining and correctly inserting crystals. That is a lot of coordination with a lot of room for mistakes–all of which I witnessed this past season, over and over, when teams were only playing 3 or 4 times. (Nine trips in Atlanta might not even be physically doable, given the distance.) So I had the idea that it might simplify things if teams played three successive matches with two-minute breaks in between. This is not unlike what occurs during the finals. I also recognized that some robots will fail. Therefore, I proposed that the previous teams remain at the field and be available to fill in for no-shows and broken bots. Even though this system adds only a minimal efficiency gain (.28 vs. .37 matches per minute by my calculations), it greatly reduces the "herding cats" problem of running nine matches.

Sean, I understand that you do not want to play any repeats, and if the scoring is changed to a "total alliance points" (not total per-team points) method, there is practically no benefit to the grouping and its repeats anyway. I do believe, however, that there is a benefit to only having to "assemble the troops" three times vs. nine. So perhaps it would be possible to mix the teams playing on the two fields and create a no-repeat-partner system that still plays three matches in relatively quick succession.

Blake, do you know what the total number of no-repeat partner combinations is for a given number of teams? If so, how many teams are necessary to enable nine no-repeat matches per team?

Although I am not writing the code, it seems to me that you could solve this algorithmically by tracking teams' prior partners, removing them from a list of teams still available, and then randomly selecting from that pool. I do think it becomes much more complex if you do not have sufficient teams to fill out the schedule with entirely unique partners.

Given n teams, the number of no-repeat partner combos is :
(n-1) + (n-2) + (n-3) + … + 1

To get 9 unique partners, you need at least 10 teams.

# of combos // pairings listed (teams are labeled 1 through 10)

9 ///////////// 1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9, 1-10
8 ///////////// 2-3, 2-4, 2-5, … 2-10
7 ///////////// 3-4, 3-5, … 3-10
…
1 ///////////// 9-10

Add up the first column, and you get 45 pairs. Since 4 teams (2 pairs) play a match, you need 22.5 or 23 total matches.

The hard part is not in choosing unique partners – it’s in choosing unique opponents. Since each team plays 2 opponents, you’re twice as likely to see a given opponent. Also, pre-setting the condition of avoiding repeat partners first gives less freedom in choosing opponents.
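A few lines of code reproduce the same counts (the function name is just illustrative):

```python
import math
from itertools import combinations

def partner_pairs(num_teams):
    """Every distinct partner pair among the teams: the
    (n-1) + (n-2) + ... + 1 = n*(n-1)/2 pairs described above."""
    return list(combinations(range(1, num_teams + 1), 2))

pairs = partner_pairs(10)
print(len(pairs))                 # 45 distinct partner pairs
print(math.ceil(len(pairs) / 2))  # 23 matches (two partner pairs per match, rounded up)
```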

I have been looking at this as a system of entirely unique groupings, where none of the four participants has had any of the others as a prior offensive partner or defensive opponent.

The number of pairs can be found using the standard combinatorial formula N! / [P!(N-P)!], where N is the total number of elements and P is the number of elements in each grouping. The problem is that it does not work for P greater than 2, because it does not enforce the restriction that members of a grouping may appear only once with any other member.

I don't know of a formula that answers this question, although I would expect one exists. In any case, I believe it would show that for N=8, P=4, the result is 2; for N=16, P=4, the result is 12. My brain is not wired to figure it out via brute force any further than that. And given the way the combinatorial formula works, I very much doubt that I could derive the underlying relationship on my own.
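Lacking a formula, a greedy brute-force search is one way to poke at it. The sketch below counts groups of four in which no pair of teams appears together more than once; it is only a stand-in for the missing closed form, the greedy order is not guaranteed to find the true maximum, and the quantity it counts (groups with no repeated pair, regardless of how they fit into rounds) may not match the exact constraint I was describing above:

```python
from itertools import combinations

def greedy_unique_quads(num_teams):
    """Greedily collect groups of four such that no pair of teams appears
    together in more than one group."""
    used_pairs = set()
    groups = []
    for quad in combinations(range(1, num_teams + 1), 4):
        quad_pairs = set(combinations(quad, 2))
        if quad_pairs.isdisjoint(used_pairs):
            groups.append(quad)
            used_pairs |= quad_pairs
    return groups

print(len(greedy_unique_quads(8)))   # 2, consistent with the N=8 figure above
print(len(greedy_unique_quads(16)))  # greedy count for N=16 (not necessarily maximal)
```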