I’m curious about the prevalence and root cause analysis of issues controlling robots during competition:
Has your drive team ever completely lost control of the robot for a second or more in any match?
Has responsiveness been delayed by more than a quarter of a second and made it unnatural to drive?
Did you have the proper analysis tools to find the root cause of the problems?
What percentage of control problems were you able to isolate?
What percentage of problems didn’t show up again, but left no clear indication of what caused them in the first place?
My perception is that solid, repeatable problems get isolated and solved cleanly, while intermittent problems and sluggishness are sometimes accepted as part of life instead of being isolated and resolved.
How do the best teams determine that they have enough safety margin on their radio signal to never worry about being in a hard pushing scrum with three other robots in the corner farthest from the radio access point?
We were brainstorming about logging voltage, radio signal, transmission delay to the DS, and transmission delay from the DS at a 100 ms rate, and linking the log to video of the matches.
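A rough sketch of what that kind of log could look like, in plain desktop Java rather than anything tied to the robot libraries. Every name here (TelemetrySample, TelemetryLog, the CSV columns, the video offset) is made up for illustration, and how you would actually read the radio signal or the delays is not covered. It only shows the 100 ms throttling and the arithmetic for mapping a log row onto a video frame.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** One log row. All field names are hypothetical. */
final class TelemetrySample {
    final long matchTimeMs;        // ms since match start, used to line up with video
    final double batteryVolts;
    final int radioSignalPercent;
    final long toDsDelayMs;        // robot -> driver station
    final long fromDsDelayMs;      // driver station -> robot

    TelemetrySample(long matchTimeMs, double batteryVolts, int radioSignalPercent,
                    long toDsDelayMs, long fromDsDelayMs) {
        this.matchTimeMs = matchTimeMs;
        this.batteryVolts = batteryVolts;
        this.radioSignalPercent = radioSignalPercent;
        this.toDsDelayMs = toDsDelayMs;
        this.fromDsDelayMs = fromDsDelayMs;
    }

    String toCsvRow() {
        return String.format(Locale.ROOT, "%d,%.2f,%d,%d,%d",
                matchTimeMs, batteryVolts, radioSignalPercent, toDsDelayMs, fromDsDelayMs);
    }
}

/** Keeps at most one sample per 100 ms slot. */
final class TelemetryLog {
    static final long PERIOD_MS = 100;
    static final String HEADER =
            "match_time_ms,battery_v,radio_signal_pct,to_ds_delay_ms,from_ds_delay_ms";

    private final List<TelemetrySample> samples = new ArrayList<>();
    private long nextDueMs = 0;

    /** Stores the sample only if the next 100 ms slot is due; returns true if stored. */
    boolean offer(TelemetrySample s) {
        if (s.matchTimeMs < nextDueMs) {
            return false;
        }
        samples.add(s);
        nextDueMs = s.matchTimeMs - (s.matchTimeMs % PERIOD_MS) + PERIOD_MS;
        return true;
    }

    int size() {
        return samples.size();
    }

    String toCsv() {
        StringBuilder sb = new StringBuilder(HEADER).append('\n');
        for (TelemetrySample s : samples) {
            sb.append(s.toCsvRow()).append('\n');
        }
        return sb.toString();
    }

    /** Video frame for a log time, given where the match starts in the video and the frame rate. */
    static long videoFrameFor(long matchTimeMs, long videoOffsetMs, double fps) {
        return Math.round((matchTimeMs + videoOffsetMs) * fps / 1000.0);
    }
}
```

With the match time in every row, a sluggish moment spotted in the video can be looked up in the log (or the other way around) once you know the offset between the start of the recording and the start of the match.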
I know we’ve had responsiveness problems in the pits. They were always associated with a ton of CAN errors. I’ve yet to discover the root cause of the CAN errors, aside from the fact that one of the Jaguars always flashes. I’m certain the CAN errors are what cause the delayed responses, since the CAN library uses blocking calls, which means everything stops until the command times out or gets a response. So the program hits a snag, eventually times out, and then updates everything with stale joystick values that haven’t been refreshed.
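To illustrate the blocking-call problem, here is a minimal sketch of one generic way to keep a slow call from stalling the main loop: run it on a worker thread and give up after a short timeout. This is ordinary desktop Java using java.util.concurrent. BoundedCall and callOrFallback are hypothetical names, nothing here comes from the CAN library, and I am assuming (not claiming) that the robot's Java runtime has these classes available.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a possibly slow call on a worker thread with a time limit. */
final class BoundedCall {
    // Daemon worker so a stuck call cannot keep the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    /** Returns the call's result, or the fallback if it fails or takes longer than timeoutMs. */
    <T> T callOrFallback(Callable<T> call, long timeoutMs, T fallback) {
        Future<T> pending = worker.submit(call);
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a call that ignores interrupts will keep running.
            pending.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    void shutdown() {
        worker.shutdownNow();
    }
}
```

The point is only that the main loop gets control back after the timeout and can read fresh joystick values instead of waiting on the bus. If the underlying call does not respond to interruption, later calls queue up behind it on the single worker and will time out too, so this hides the lag rather than fixing the CAN errors.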
Yes, we’ve had this intermittently, but it never was a major problem, I believe. Last year with the new Classmate there was an internal issue that would periodically trip the watchdogs at a concerning rate, but we fixed it after searching on Chief Delphi. We sometimes get errors in the diagnostics tab when starting up, but we are quite confident that this is a common occurrence during boot-up (it doesn’t affect control since we haven’t enabled yet). Most of the time when the robot hiccups it’s because of a watchdog error, although most of those times we don’t even notice it; it seems to be instant. We often can’t think of what might be wrong, though. But we would check for lagging code, the diagnostics tab of course, a dead battery, if the drivers are cranking on the controls and drawing too much current…mostly basic things (we don’t have a plan like you mentioned). Sometimes we do image the cRIO and that’s usually a good “reset” point to start from.
Always flashes as in the same flash as a disabled Jaguar? (Slow yellow flash?)
Or an intermittent blink? Solid yellow (or Red/Green) - off - working again?
399 had the second kind during build. It manifested itself as a lag, especially when driving backwards.
We were able to narrow the cause down to a single Jag [the one that would intermittently flash], and removing it solved the unresponsiveness and all of our CAN issues.
Under BDC-COMM, when that Black Jaguar was commanded to go in a certain direction [backwards], it would drop off the CAN bus and not respond until it was re-enumerated.
You know, I’m not certain if it’s always the same jag. It was doing the slow yellow disabled blink and we couldn’t see it on the CAN bus, until we cycled power on the robot and everything was fine. I’ll tell our programmer to note which jag is causing the problems the next time it happens and we might just swap it out for a spare one if it’s the same one.
Related note: was your problem jag a black jaguar on your drivetrain or another system subject to sudden heavy current reversals? The gray jags have issues with sudden reversals blowing up gate drivers and disabling half the h bridge. Perhaps the new ones are subject to some other, more unlikely sort of failure when hit with a sudden reversal.
It used to be on the drivetrain.
We noticed that it sometimes stuttered (and would produce a whole bunch of CAN errors) and then we took it out.
When it was run under BDC-COMM, it wasn’t connected to any motors, just the serial port and terminator.
We figured out that for us it was the camera as well. I had noticed during development that on occasion the camera image being displayed was 3-5 seconds behind live, so some fairly major buffering is occurring on the way from the cRIO. If the bandwidth is marginal during a match (all teams using cameras, or simply interference), then it is reasonable that commands and responses get stuck in a queue behind video that is being buffered.
We are using Java and could never get the camera commands to set lower resolution/quality. With version 28 our camera connection stopped working for analytics in autonomous mode.
Trading email with WPI, I got some advice to make sure the M1011 firmware was updated and to put the camera in DHCP mode rather than the static IP indicated in the install guide. This implies the cRIO has a DHCP server and uses the Axis software library to discover cameras. I made the changes and, sure enough, the camera started working again.
Now the bad news. During practice at regionals they had a ton of problems getting robots to register. They fixed the problem at 3:30 and we were able to get in some practice sessions. Autonomous mode in our code was a big problem and we would lose the robot for short or long periods. Since we couldn’t find the problem, we turned off autonomous mode but left the camera init in the code.
In our first three matches we had a dead, non-controllable robot. We finally figured out that bandwidth could be the problem, and then it occurred to me that the camera was still streaming video and that wasn’t helping. We disabled the camera in the code and, for the most part, our robot control problems went away.
We finished 26th out of 60 at Florida Regionals as a rookie team. We really wanted those three matches back and would have placed much higher. We have a very reliable minibot, can score on the top row, and had autonomous mode working until we disabled it because all the other communication problems left no time to spend on it. We just found out we won the Rookie All-Star award, so we get to go to nationals and will see if we can get another shot at being a competitive robot.
We need to figure out whether the Java API makes it possible to use the camera in autonomous mode and then disable it for the rest of the match. Has anyone had any luck setting the camera resolution/image quality or stopping the camera from streaming after startup?
I completely lost control of my robot for an entire match for an unknown reason while competing myself. My robot would not respond at all. The people behind the desks said that “everything on their end was fine, so it must be us”. But when we went back to the pits and tested it, it worked fine. This happened to a few other teams throughout the day. I was getting very annoyed with the whole thing myself.
This is exactly what happened to us with CAN on Jaguars at Traverse City. We sat still for a match, went back to test in the pit, and everything looked fine. We went out for the next match and the robot sat still again. All this time it said communication was fine. We decided to control our drive by PWM instead, and CAN stopped pestering us until a Jaguar blew, knocking out our entire robot because the CAN stopped the robot when it couldn’t find all the Jaguars.
Our team encountered no robot control twice this weekend. During both matches the autonomous worked, but once the robot switched to teleop there was no control at all. The team was told both times that the field showed the robot was communicating and the driver’s station showed no loss of comm.
I quizzed an alumnus who was volunteering for a little more information, and he stated that they found that our driver’s station was showing no signals from the joysticks. However, when I discussed this with our driver, he said he had run the joystick diagnostics while the match was running and found that the driver’s station was seeing them correctly. Keep in mind, the field and the driver’s station showed no errors whatsoever.
After this match we took the robot back to the pits WITHOUT TURNING IT OFF, and tethered it to the driver’s station. The robot continued to not respond to any teleop control, but showed that it was communicating and the cRIO was flashing the User1 indicator, which generally indicates that teleop is active. We rebooted the program and it worked fine after that point.
The frustrating bit is not once could we duplicate this while tethered and it never happened while testing the robot for two weeks during the build season. So far it has only happened when connected to the game field. Unfortunately, one time had to occur during a quarter final match. :\
It has been very disappointing and I hope something is done to address it soon as we haven’t been able to determine any obvious cause.
Edit: I should point out that our robot is controlled using Jaguars only. Four Jaguars run off PWM signal to control the 4 drive motors (mecanum). Another 4 are operated off CAN to control the tube mechanism. None of the Jaguars work at all when this occurs.
I did find in testing that once you enter autonomous mode you need to keep track of the time spent and return from the method after 15 seconds. I have a keepGoing() method that I put in all the loops and before each next step, which forces a return once 15 seconds have elapsed.
If you don’t return, the joysticks are never sampled and the appropriate drive method is never called. We are using Java but assume the program constructs are the same in LabVIEW or C++.
The field switching from autonomous mode to teleop mode does not, by itself, affect the code you have running on your robot.
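For anyone who wants to try the same idea, this is roughly the shape of it, as a simplified sketch and not our actual robot code. AutonomousDeadline and AutonomousRoutine are names invented for this post, the clock is passed in so it can be tested off the robot, and the "step" stands in for whatever your autonomous routine really does.

```java
import java.util.function.LongSupplier;

/** Tracks the 15 second autonomous window from the moment it is created. */
final class AutonomousDeadline {
    static final long AUTONOMOUS_MS = 15_000;

    private final LongSupplier clockMs;   // e.g. System::currentTimeMillis
    private final long startMs;

    AutonomousDeadline(LongSupplier clockMs) {
        this.clockMs = clockMs;
        this.startMs = clockMs.getAsLong();
    }

    /** True while less than 15 seconds have elapsed since autonomous started. */
    boolean keepGoing() {
        return clockMs.getAsLong() - startMs < AUTONOMOUS_MS;
    }
}

final class AutonomousRoutine {
    /**
     * Runs up to maxSteps of work, checking the deadline before every step,
     * and returns how many steps were completed before returning.
     */
    static int run(AutonomousDeadline deadline, int maxSteps, Runnable step) {
        int done = 0;
        while (done < maxSteps && deadline.keepGoing()) {
            step.run();
            done++;
        }
        return done;   // returning here is what lets the teleop code start running
    }
}
```

On the robot you would construct the deadline with System::currentTimeMillis (or whatever timer you already use) at the start of autonomous. The check only happens between steps, so any single step that blocks for a long time will still overrun the 15 seconds.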
We had talked about the possibility of the robot not leaving autonomous mode, but wouldn’t that cause the User1 indicator to turn off? I’ll admit that this assumption is not based on direct knowledge, but on what I had been told by a mentor who is experienced with LabVIEW and the cRIO.
We stuck with PWM output this year (there was absolutely no reason to switch to CAN, since everything that can be done with it can also be done by coding the cRIO) and dropped our camera. Everything runs so much smoother now.
Unless LabVIEW does something special to understand the notion of autonomous vs. teleop, I think it will simply do whatever a program loop or block of code asks it to do. If you spend 30 minutes in a loop adding up numbers, there is no reason why that block of code would return, yield, or be interrupted just because the mode changed on the field.
The way the LabVIEW framework is written, you cannot get stuck in autonomous mode. When the field changes over, your autonomous code is terminated whether you are in a loop or not. If you change any of the framework or put your auto code somewhere other than the Autonomous Independent VI, all bets are off. Also, I’m not sure whether this functionality will work if you are in a loop that never yields the processor (i.e. no loop delay).
This is not the same as the C++ and Java frameworks, which have no such functionality (from what I’ve been told).