ARENA Fault

ARENA as it is defined in the rules.

The ARENA includes all elements of the game infrastructure that are required to play Breakaway: the FIELD, the ALLIANCE STATIONS, the GOALS, the BALLS, and all supporting communications, arena control, and scorekeeping equipment.

The only rule I see regarding ARENA faults.

<T18> If, in the judgment of the Head Referee, an “ARENA fault” occurs that affects either the play or the outcome of the MATCH, the MATCH will be replayed. Example ARENA faults include broken field elements, power failure to a portion of the field, improper activation of the field control system, errors by field personnel, etc.

A common response that I have heard is “After reviewing the log of the entire match, nothing seems out of the ordinary, so the match will not be replayed.” How is 1 to 4 robots losing comms not something wrong? Something is clearly wrong. Thousands of fans can tell without looking at a log that something went wrong.

Outside of FIRST, when someone has an IT-related problem and it cannot be identified from the log, we don’t tell the person(s) they are out of luck and we aren’t going to help them. The IT department looks into the problem and rectifies the situation as soon as possible.

The least they could do is have a professional on site at each regional to inspect the robot after the match back in the pit and help figure out what is wrong. How can a bunch of high-schoolers be expected to find the problem when the inspectors at the event can’t? From the looks and sounds of it, the only reason that help is not offered is that no one actually knows what causes all the comm problems.

If FIRST cannot help us troubleshoot the mysterious comm problems that never happen at the shop but randomly happen on the field, then teams should be given the benefit of the doubt and the match should be replayed when they occur.

If the field is communicating correctly, then the problem lies with either the teams or Murphy. Quite often, robots lose communication if a cable gets knocked loose–always check your radio-cRIO cable before and after the match. This is quite difficult to pinpoint using field equipment–everything will look normal, but the robot won’t do a thing.

I know that at Arizona, we did have two experts (not necessarily professionals) who could find errors. One designed the FMS system and was judging the event; the other was a beta tester (2 years) who got a volunteer pass so that he wouldn’t keep getting sent out of the field area because the FTA was always asking him to look at something. Sometimes, even they couldn’t figure out why the comm link was lost.

As for replaying such matches when they occur, there is a really really good chance that that will cause the event to run very much over the time allowed. Not just a couple of minutes, either. We’re talking running over the way the Championship always does–a couple of hours.

Something obviously went wrong. It is not obvious that what went wrong has anything to do with the ARENA. If it was really an issue of losing communication, the FMS logs would confirm that to be the case.

More than one team has replaced their WGA and seen their troubles go away. That’s not an ARENA fault. More than one team has confirmed problems with reading joysticks and/or the Cypress board after letting their Classmate go to sleep or when running low on battery power. That’s not an ARENA fault either.

Team 45 lost a qualification match when the robot appeared to disable itself midway through autonomous, showed that it was re-enabled at teleop but with the drive motors nonresponsive, and sat there not moving for the rest of the match. Obviously a communication failure, right? Wrong. The diagnostic lights were green the whole time. The real fault was the battery we used for that match, which decided to die sometime between having been charged and the start of the match. (In that case, the FMS did pick up on the low voltage issue, and so did the drive team who knew what to look for on the Driver Station.)
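The reasoning in that anecdote (link indicators green, battery voltage low, so suspect the battery before the comm link) can be written down as a small triage sketch. This is only an illustration: the `Snapshot` fields, the `first_suspect` function, and the 9.0 V threshold are hypothetical names and values chosen for the example, not anything taken from the Driver Station software.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """What a drive team might note from the Driver Station during a fault."""
    comm_link_ok: bool    # link indicators were green
    battery_volts: float  # battery voltage shown on the Driver Station


# Hypothetical threshold, chosen only for illustration.
LOW_BATTERY_VOLTS = 9.0


def first_suspect(snapshot: Snapshot) -> str:
    """Suggest where to start looking; a starting point, not a diagnosis."""
    if not snapshot.comm_link_ok:
        return "comm link"
    if snapshot.battery_volts < LOW_BATTERY_VOLTS:
        return "battery"
    return "robot wiring or code"
```

For example, `first_suspect(Snapshot(comm_link_ok=True, battery_volts=6.5))` points at the battery, which matches what happened in the match described above.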

I’ve pulled two parts out of Alan’s response here to highlight something that I have found extremely helpful in diagnosing our robot issues in the past.

The Driver’s Station and Dashboard provide a large amount of feedback about what is going on with your robot.

At our event last year our robot stopped moving in three separate matches. By observing our Dashboard during these faults I was well aware that our DS was still connected to the robot and in 2 of the 3 matches to the cRIO. After thorough checks of the electrical system a few loose wires/chassis shorts/other issues were found and the problems never returned.

Knowing what information is provided on the Driver Station and Dashboard and how to use it to diagnose your robot is the best way to help track down these mysterious faults.

I understand your frustration with regards to robot(s) losing communication with FMS during matches, but you need to understand a few things.

  1. Sometimes, it is genuinely the fault of the robot (or the team that fielded the robot). Robots can lose communication with FMS for a variety of reasons, a large number of which result from how the team hooked up their robot. A bad ribbon cable, a poor electrical connection, etc. could all cause this. FMS logs will just show a large number (100%) of dropped packets for the duration of such events. However, many other events can also cause FMS to show 100% dropped packets, hence it is (near) impossible to arrive at a sound conclusion from the logs alone. As much as any of us who’ve been hit by comm issues don’t want to admit it, it is very likely that the fault isn’t in FMS but in some aspect of our robot. In the rare situations where FMS did fail, I wholeheartedly trust the people behind the scoring table to diagnose the issue accordingly and act appropriately, but I’m also willing to accept that they can’t possibly figure everything out. Remember, our control system consists of an ultraportable PC, a wireless gaming adapter, and an industrial-grade controller. Were any 2 of those components really intended to work in unison? No. This leaves plenty of room for small mistakes to cause catastrophic results. Consider all the possible issues with the radio’s reset button, a plug for the radio that’s probably not designed to be shaken as robots collide, cross bumps, etc., and all the other potential failure points. These wouldn’t exist on a system whose sole purpose was to robustly control FRC robots, but such a system would probably also be cost-prohibitive.

  2. The field crew does its very best to give every team the chance to play, regardless of whether their comm issues are robot-sided or field-sided. Genuine mistakes happen, and the FTA and co. would never intentionally ignore situations in which it is clear that an FMS bug caused robot(s) to lose communication. However, I have yet to see a situation in which FMS causes fewer than all of the robots on the field to lose communication.

  3. The FTAs do their best to diagnose the issue, but understand that they’re people too. I know first-hand that the FTA for the TC and Detroit FiM District events will stop at nothing to pinpoint the cause of robots not communicating with the field right. The FTA at Cass Tech was also obviously working extremely hard, as the back of his shirt was clearly covered in sweat. Unfortunately, there were still robots that had comm issues at both events. Nevertheless, it is important to credit the hard work that the FTAs (and FTAAs) do at all the events, since they’re the ones working (largely) behind the scenes to make the whole event run smoothly.

  4. All that being said, it is crucial that each and every team take all necessary measures to ensure that their robot doesn’t become the cause of comm issues. Ensure that all electrical connections (especially those to the radio, cRIO, etc) are secure, especially if you traverse bumps. Ensure your code is robust, as a watchdog bug can (and will) cause a loss of comm, either momentarily or permanently.

  5. Remember that FRC competitions are volunteer run. The FTAs are volunteers, the refs are volunteers, the queuers are volunteers. If you know a cRIO/controls/something expert who’s willing to volunteer at the events to look at robots that have had communication issues, then invite them to offer their help. Otherwise, other teams are more than willing to provide any assistance they can. If you’re continuously having comm problems, ask the FTA if you can try to sync to the field after the end of matches on Friday (or maybe even at lunch).
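Point 4 mentions that a watchdog bug can look like a loss of comm. As a rough sketch of why: a watchdog cuts the outputs when the code fails to "feed" it in time, so one slow loop iteration is enough to stop the robot even though every cable and radio is fine. The class below is a generic illustration of the pattern, not the actual cRIO or WPILib watchdog, and the 100 ms timeout is an assumed value.

```python
class Watchdog:
    """Generic watchdog: expires if not fed within timeout_ms."""

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.last_fed_ms = 0

    def feed(self, now_ms: int) -> None:
        self.last_fed_ms = now_ms

    def expired(self, now_ms: int) -> bool:
        return now_ms - self.last_fed_ms > self.timeout_ms


def run_loop(iteration_ms, timeout_ms=100):
    """Simulate a control loop; each entry is how long one iteration took.

    Returns the time (ms) at which the watchdog cut the outputs,
    or None if every iteration fed the watchdog in time.
    """
    dog = Watchdog(timeout_ms)
    now_ms = 0
    for duration in iteration_ms:
        now_ms += duration
        if dog.expired(now_ms):
            return now_ms  # outputs disabled; from the stands the robot looks dead
        dog.feed(now_ms)
    return None
```

With steady 20 ms iterations the watchdog never trips, but a single 500 ms stall (for example, code blocked waiting on something) trips it on that iteration.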

I guess that’s all I have to contribute…

To take a shot in the dark here, there has been a major problem with the Driver Stations this year. There have been problems with the joysticks reporting that they are connected but not responding. Most times the I/O board is attached to the hub and draws too much current for everything connected to the hub to remain operational. There have been some cases where it has happened with just joysticks connected to the hub, and multiple solutions have been thrown out there to be tested. Now, relating to your particular situation, there is a chance that it was a field problem; it is plausible. But it is also quite possible that this lack of joystick response on the Classmate is the culprit, and this problem does not appear in the log.

I keep seeing this idea presented in all these threads about suspected field faults. I disagree with it, and I’m surprised more people don’t feel the same way.

  1. Most of us have no idea what the FMS even logs in the first place, and therefore what kind of failures could be detected with these logs. For all I know, the log may consist of a timer that fires every 100ms that just writes “Everything is fine” into the logfile. I’m sure that the logs contain lots of data, and maybe it’s enough that it really can rule out anything but a team problem, but how do we know there isn’t a bug in the logging mechanism? Or some other gremlin? Surely anyone who’s been involved with software for more than a few weeks can admit that bugs can end up anywhere and can easily trick you into thinking something is working when it isn’t.

  2. Somewhat of a rehashing of #1, can the field system confirm that packets from the OI are getting to the robot? Or does it just verify that it can “see” the robot and OI from its vantage point? Is it possible that FMS can be successfully connected to the team’s OI and successfully pinging (or whatever) the robot, but that the OI and the robot aren’t talking to each other?

  3. Dozens of teams are reporting problems here. Ultimately, it may not be a problem with the field hardware, but it seems pretty clear that there are defects with the system as a whole (the system being everything from the Classmate and cRIO to the arena controls, the WiFi adapters, joysticks, etc). Who’s responsible for the overall operation of the entire controls package? Who’s looking into these reports of issues? Surely all the technical folks at FIRST aren’t just writing this off as a team issue, are they? Even if they believe it is the fault of the teams, I would hope that they would want to get to the bottom of it anyway, just to be sure. With so many different vendors involved in the system, though, I’m not sure who takes ultimate ownership of a potential system-level issue.
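The question in point 2 comes down to a general property of monitoring: two checks made from a third vantage point do not prove that the two endpoints can talk to each other. The toy model below only illustrates that logic. The node names and the idea of a flat set of "working links" are assumptions made for the example; it says nothing about how the real FMS is wired or what it actually verifies.

```python
def link_up(links: set, a: str, b: str) -> bool:
    """True if a working link between a and b is listed."""
    return frozenset((a, b)) in links


def monitor_view_ok(links: set) -> bool:
    """What a monitor sees: it can reach the operator interface and the robot."""
    return link_up(links, "FMS", "OI") and link_up(links, "FMS", "ROBOT")


def end_to_end_ok(links: set) -> bool:
    """What the drive team cares about: the OI can reach the robot."""
    return link_up(links, "OI", "ROBOT")


# Hypothetical state: both monitor checks pass, the OI-to-robot path does not.
links = {frozenset(("FMS", "OI")), frozenset(("FMS", "ROBOT"))}
```

In this state `monitor_view_ok(links)` is true while `end_to_end_ok(links)` is false, which is the scenario the question asks about. Whether it can actually occur on the field depends on details of FMS that the thread does not establish.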

I’ve been involved in enough FIRST competitions to know that when the people working at the event say, “there were no errors with the field”, what they usually really mean is, “We couldn’t find any specific problem, and therefore we’re going to blame you (the team) so we don’t have to replay the match. If we admit that we aren’t 100% sure what happened, we’ll be here for weeks as we replay almost every single match.” I’ve worked at plenty of events and I’ve been part of these conversations several times.

If the best of the best can’t figure it out, then there is clearly something wrong here. FIRST should not provide us with a control system that is so advanced that no one is able to troubleshoot it, and then tell us we’re out of luck when a team that has a 15-0 record mysteriously loses comms in the Finals and loses it all with no explanation at all.

I’d have to argue that when one robot loses comms and another loses its video feed at the exact same time, there is something wrong between the alliance station and the robot… The only thing between the alliance station and the robot that I am aware of is the Arena/FMS.

We tried to do that. We had the newer bridge and were trying to configure it during matches. We asked the WPA inspector a question regarding the WPA or WPA2 setting. He did not answer the question; instead he took the radio, set it to factory settings, plugged it into his computer, and tried to use his script/program to configure it. He got confused as to why it didn’t work and wanted to go watch his own team compete. When we tried to call a timeout to configure it and swap radios, the head ref told us no.

We disabled the Classmate from going to sleep early in the season. During the final matches we used another team’s Classmate battery because ours was dead.

That is my team’s exact opinion on the matter.

Certainly the control/field system interaction has grown complex, and there are far too many places and ways gremlins can interrupt normal robot operation. It’s not like the IFI days when an expert on the full system was delivered with each field. Even in the IFI days, although they look rosy two years later, not every robot death on the field could be or was explained.

Nowadays, without a single source supplier, it’s mostly luck if a Regional ends up with a volunteer or two with the necessary skills and experience to chase down and troubleshoot team issues with the field. With technical volunteers scarce in any case, to find one free to roam between the field and the pits to follow up with team troubleshooting is indeed rare.

I worked the NJ Regional running between the field and the teams to track down obscure problems that teams were having, and I do the same thing at the Long Island Regional. Part of the reason teams tend to blame the field when their robot stops is the black box nature of what FMS supplies. It’s common for teams to blame the field because one of their motors didn’t work (but it worked in the pit!), or if their robot takes off on its own.

Almost without fail teams who blamed the field did not look at their own multitude of status lights and couldn’t give me any useful information. They couldn’t even tell me what the RSL was doing when their robot failed. I had to be there on the field to analyse the problem and provide a correct diagnosis (correct diagnosis being one that finds a problem that would cause the symptoms).

The art of troubleshooting is something we mentors need to teach more of, but I’ve come to realize that many mentors don’t have good troubleshooting skills either.

Most of the time, the best of the best did figure it out. The one they couldn’t figure out was why a particular combination of new/old adapters would trigger a field-wide comm loss for no apparent reason. No other combination did that. Try troubleshooting that one…

The WPA inspector apparently didn’t read the update that informed everyone of the program. You can’t use it on the new bridge–it has to be done manually. His wanting to go see his own team compete, however–while that’s understandable, there’s a time and place to do it, like when there’s not a team that needs your help right now to be able to compete.

As for the timeout, there are specific rules regarding when you can/can’t use one. Quals, you’re out of luck. Elims, there’s a specific window to call it–if you don’t call it in that window, you’re out of luck.

Regarding the loss of comms and the loss of video feed at the same time on two different teams: It may be that there were separate failures that just happened to happen at the same time–say, alliance partners hit each other by accident and both cables come loose simultaneously. That’s not a field issue, it’s a robot issue.

I agree, and I will add that the opposite is also true: it’s common for event staff and volunteers to blame the teams’ equipment, because the field worked fine in the previous match. I think both parties assume their side is correct and therefore the other must be broken. The reality is there have been enough root-caused issues with both robots and the field that both should be suspected when an issue occurs. There’s plenty of threads in the scorekeeper’s forum on usfirst.org that talk about having FMS crash multiple times and having to “reboot the field”, and obviously there’s been enough issues found with wiring etc. on robots over the years too.

We had the entire Blue side dead for Teleop in an early Quals match.
I’m sure everyone in the stands “knows” it was a field fault.

Of course one robot had a 5v battery, another had a 0 volt Classmate, and the third was pinned under a partner and didn’t want to rip its camera off trying to get out from under.

It seems that Classmate battery issues are definitely a contributing factor to the feeling that the field doesn’t work very well. I wish someone could explain to me why FIRST is torturing themselves by not providing a $5 power strip at each alliance station and in turn suffering from the massive number of reported issues that are related to Classmate batteries. It seems like that rule isn’t doing anyone any good and is just making all the perceived problems appear worse than they are. I guess I don’t have much sympathy for FIRST if they’re getting blamed for field faults when the issue is really a low battery when they could so easily solve that particular problem.

Be careful there. I’m a scorekeeper. :smiley: But yes, sometimes my “law” does come into play unfortunately.

Everyone needs to remember that with every year that FIRST changes something with the control system, there will be a huge learning curve for all to understand how it all works. Also, it takes time to really fine-tune the system. Yes, NI and FIRST may be able to debug and get the system working as well as it can, but the true test really won’t come until the Regionals, when we bombard the system with all we’ve got.

I know that in DC there were many teams that were having comm issues, but I assure you (99.9% at least) that these problems had nothing to do with the FMS faulting. Many teams were forgetting to plug their radios back into the e-port on the cRIO, or were forgetting to put a fresh battery in their robot, or hadn’t restarted the DS program since their last match, to name some of the major issues. We had one team that barely even moved the whole competition because they were having one comm issue after another, but they were gracious and went back to the pit and worked hard to figure out solutions. We had an NI rep at the regional and he worked very closely with that team to help them through these problems. It is issues and interactions like that that will help everyone learn more about the system, where its weaknesses are, and how we can improve on them in the future.

I know that it gets frustrating to be out there and all of a sudden not be able to compete, but remember your gracious professionalism and trust that the technical people behind the scenes are working hard to solve any and all problems and will make sure that there is a fair solution that comes out of it.

Any good troubleshooter will suspect everything. I witnessed some of those FMS reset issues, so I know it’s not faultless.

The problem is of course that there are far too few troubleshooters who straddle the fence and can unemotionally examine both sides, and aren’t consumed by official duties. I’ve always been able to find a problem with the robot that solved the issue (last year and this year), as long as I’m there to examine the status lights and diagnostic messages. Even with wireless bridge failures the statistics page on the bridge usually tells that story, as long as I can prevent the team from turning their robot off after failing in a match. That doesn’t mean I don’t list FMS as a potential culprit every time. In fact that’s always the first consideration, because it needs to be checked immediately while the problem is occurring. FMS failures tend to take the whole field down, or one entire end, or a single driver station eStop might fail, but it’s pretty obvious to the field crew when that happens.
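The observation about how FMS failures usually present (whole field, or one entire end) suggests a rough first-pass rule based on which stations are affected. The sketch below encodes only that rule of thumb. The station labels and the `where_to_look_first` function are hypothetical, and the result is a place to start checking, not a verdict: an entire alliance can still be dead for three unrelated robot reasons.

```python
# Hypothetical station labels for the two ends of the field.
RED = frozenset({"R1", "R2", "R3"})
BLUE = frozenset({"B1", "B2", "B3"})


def where_to_look_first(affected) -> str:
    """Suggest a starting point from the set of stations that lost control."""
    affected = frozenset(affected)
    if not affected:
        return "nothing to check"
    if affected == RED | BLUE:
        return "field first: whole field affected"
    if affected in (RED, BLUE):
        return "field first: one entire end affected"
    return "affected robots and driver stations first"
```

A single dead robot, for example `where_to_look_first({"B2"})`, lands in the last branch, which is where the status lights, diagnostic messages, and bridge statistics page come in.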

You can’t actually solve a problem unless you can divorce yourself from taking sides. If you find yourself blaming something you know nothing about, then that’s not troubleshooting and it won’t solve the problem.

One of the issues I’ve noticed this year and last is that teams look on the NI reps as experts in the whole system, whereas, they typically are not. They are all experts in LabVIEW, many in cRIO operation, some work with teams and know the whole FRC control scheme, but few of them know anything about FMS.

706 was with 2481 in the finals at Milwaukee. We had the camera loss at the same time as they lost all control. It was immediately after the auto mode ended, and there had been no robot-to-robot interactions and no collisions with the field. After the match we logged out of driver mode and back in and regained the camera. Sure was hard trying to win the match with 2481 down again and a sub playing defense. Nothing against the sub, they did a great job and kept the score down to 7-3, but with no one feeding us balls from the middle we were pretty helpless. As coach I am hitting myself now for not telling our third-match defence partner (sorry, I don’t remember the team number) to abandon defence and feed us some balls. Grrr. Hindsight.
Anyway, I have to ask why is it that we are using a control system that is so complex and error prone? I can go to a model airplane field and have 15 planes in the air at the same time and none of them will have comm problems.
Are there some things that every team should do between matches other than look for hardware issues, like reboot the Classmate or log out and back in? I know that some computer systems need to be restarted regularly to keep from getting flaky errors. Also, having power at the driver stations could be helpful here. Is it legal to bring a battery and an inverter to the driver station?
Bruce

Yes.

http://forums.usfirst.org/showthread.php?t=15017

I agree too.

That’s the first thing I put a stop to when I was asked to Tech the SBPLI field last year: empty speculation by field staff. If the field staff didn’t KNOW what the problem was, they were to refer it to me and not start guessing. Guessing is actually harmful in that it usually focuses troubleshooting in exactly the wrong place. I worked the WPA desk at one regional where, if they had any field connection problems, they sent the radio back to us to be reset, but I never found even one that was set incorrectly. Of course, the robot that caught on fire was plenty obvious…

It worked out well for Long Island last year, but I still had an IT professional telling me it was a field fault that shut down only his team’s drive motors. His explanation was that it couldn’t be his code, and to be fair it actually wasn’t, but it also wasn’t a field failure. It could have been prevented by adding an error check in his code.
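The post does not say what the missing error check was, so the following is only a generic illustration of the idea: if a single bad input read is allowed to raise out of the drive loop, the drive motors stop for the rest of the match, whereas catching it costs one cycle. The `read_axis` function and its failure mode are hypothetical.

```python
def read_axis(raw):
    """Hypothetical input read: returns a value in [-1, 1] or raises ValueError."""
    if raw is None or not -1.0 <= raw <= 1.0:
        raise ValueError(f"bad axis reading: {raw!r}")
    return float(raw)


def drive_outputs(readings):
    """Compute one motor command per cycle, surviving bad readings."""
    outputs = []
    for raw in readings:
        try:
            outputs.append(read_axis(raw))
        except ValueError:
            # Hold still for this cycle but keep the loop alive.
            outputs.append(0.0)
    return outputs
```

Here `drive_outputs([0.5, None, -0.25])` returns `[0.5, 0.0, -0.25]`: the bad middle reading costs one cycle instead of ending the loop.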