We’ve been seeing some odd and somewhat alarming behavior while testing our practice robot over the weekend. Our control system is Java, with CAN Jaguars running closed-loop speed control of the drive motors.
Basic symptoms are a loss of control and of SmartDashboard output while the driver station stays in the teleop enabled state. Note that in some cases the drive motors would continue to run at the last set speed although there was no control authority from the DS!
Pressing disable on the DS did stop the motors. A cRIO reboot was needed to run properly again.
Initially the onset of the problem appeared to coincide with a sudden polarity reversal of one or more CIMs, suggesting a momentary voltage drop or brownout. We implemented setpoint rate limiting, and while the problem became less frequent it was not eliminated.
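For anyone searching later, here is a minimal sketch of what setpoint rate limiting can look like. It is not our actual robot code: the class name, the method name and the step size are made up for illustration, and it is plain Java with no WPILib calls.

```java
// Hypothetical helper (not from WPILib): limits how far a speed setpoint
// may move on each control loop iteration.
class SetpointRateLimiter {
    private final double maxStep; // largest allowed change per call, in setpoint units
    private double current;       // last setpoint actually issued

    SetpointRateLimiter(double maxStep, double initial) {
        this.maxStep = Math.abs(maxStep);
        this.current = initial;
    }

    // Move toward the requested setpoint by at most maxStep and return the result.
    double calculate(double requested) {
        double delta = requested - current;
        if (delta > maxStep) {
            delta = maxStep;
        } else if (delta < -maxStep) {
            delta = -maxStep;
        }
        current += delta;
        return current;
    }
}
```

The idea is to call calculate() once per loop with the driver's requested speed and send the returned value to the speed controller, so a full-forward to full-reverse request is spread over several iterations instead of landing in one step.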
The DS log viewer shows the cRIO CPU dropping off during the loss-of-control event. The DS stays in teleop, but the robot state leaves teleop while apparently continuing to report voltage (see attached image).
Finally, there was no evidence of a user code crash on the NetBeans output terminal.
We are competing in a Week 1 regional so I’d appreciate anyone’s thoughts on potential causes.
We’re using the command-based programming model, so the scheduler is within WPILib.
My primary concern is that the robot is not controllable under these circumstances, and having it under control is usually FRC’s prime directive.
I guess in the broadest sense it is “controllable” since changing the DS to disable (& presumably emergency stop) does disable the motors.
I agree that a loop sounds like a reasonable cause. The command based scheduler is a cooperative scheduler. All of the command methods are assumed to execute quickly, with no delays or loops. If you have a delay or a loop in initialize, execute, isFinished, end, or interrupted, you would keep other commands from executing.
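To illustrate what I mean by "no delays or loops", here is a rough sketch of a wait that never blocks. It is a hypothetical stand-in, not the WPILib Command class: the class name and the explicit time parameters are assumptions made so the example is self-contained.

```java
// Hypothetical stand-in for a command (not WPILib's Command class).
// The blocking version of a wait would look like
//     while (now() < deadline) { /* spin */ }
// inside execute(), which starves every other command.
// Here each method returns immediately and the scheduler keeps polling.
class NonBlockingWait {
    private final long durationMs; // how long the wait should last
    private long startMs;          // time recorded when the command started

    NonBlockingWait(long durationMs) {
        this.durationMs = durationMs;
    }

    // Called once when the command starts: just record the start time.
    void initialize(long nowMs) {
        startMs = nowMs;
    }

    // Called every scheduler pass: nothing to do while waiting.
    void execute(long nowMs) {
    }

    // Called every scheduler pass: a quick comparison, never a sleep or loop.
    boolean isFinished(long nowMs) {
        return nowMs - startMs >= durationMs;
    }
}
```

The state that a loop would keep on the stack (here, the start time) moves into a field, and the scheduler's repeated calls take the place of the loop.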
What kind of error handling code are you using for your CAN Jaguars? Any loops or delays?
We don’t have any loops within the command or subsystem methods themselves (at least there are no occurrences of “for” or “while” in the code).
In our current version, CAN timeout exceptions are handled by marking a fault status on the subsystem. Fault status is reported every few iterations of {Auto|Teleop|Disabled}Periodic() to the SmartDashboard.
(A better approach would be to report changes in fault status only and do so immediately - we’ll change the implementation soon.)
We do not attempt to re-transmit the CAN message, but we also don’t disable the subsystem.
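As a rough sketch of the "report changes only" idea, something like the following could sit between the subsystem and the dashboard. The class and method names are hypothetical and nothing here touches WPILib or the CAN bus.

```java
// Hypothetical tracker: remembers the last fault state that was reported
// so the caller only sends an update when the state actually changes.
class FaultStatusTracker {
    private boolean lastReported;  // value most recently reported
    private boolean everReported;  // false until the first report has gone out

    // Returns true when the caller should report the status right now:
    // on the first call, and afterwards only when the status differs
    // from the last reported value.
    boolean update(boolean faultedNow) {
        if (!everReported || faultedNow != lastReported) {
            lastReported = faultedNow;
            everReported = true;
            return true;
        }
        return false;
    }
}
```

The periodic method would call update() every iteration with the subsystem's current fault flag and write to the SmartDashboard only when it returns true, which gives immediate reporting without sending the same value over and over.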
The only sources of delay in the user code are the CAN operations themselves. There are at most 4 CAN messages during the drive execute method: setX() on both drive motors and, optionally, configNeutralMode() if a button changes state.
Note that since we are running closed loop on the Jaguars we don’t need to ask for encoder readings.
The worst case for any command method is 10 CAN messages to set up the drive closed loop for both motors and that happens only during the init() method once per teleop period.
Finally, thanks for the loop/delay suggestion. I can’t find any likely culprits in the code, but I’ll take a deeper dive.
We found and fixed this problem but I thought I should reply to this thread for posterity in case it appears in anyone’s future search results.
We typically see two types of error from the CAN system:
1. something serious, usually caused by wiring problems (power or CAN) and detected in the CANJaguar constructor, or
2. transitory errors from an occasional packet.
When we see a transitory error we set an indicator light for the appropriate subsystem on the SmartDashboard and recently we have also been logging the error message.
The error message was logged from the exception using the getMessage() method, but we discovered that getMessage() can return null. We were therefore sometimes sending null to the SmartDashboard, which caused a runtime exception and a user code crash.
To solve the problem we made two changes:
1. wrap the top-level code in a try {} catch {}
2. use the toString() method of the exception to safely pass a string to the SmartDashboard
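Here is a minimal sketch of the two changes. The class and method names are made up for this post, and a StringBuffer stands in for the SmartDashboard so the example runs without WPILib.

```java
// Hypothetical helper illustrating both fixes; not our actual robot code.
class SafeReporting {
    // getMessage() may return null when an exception was created without a
    // message. toString() returns the class name, followed by ": " and the
    // message only when one is present, so it gives a usable string either way.
    static String describe(Throwable t) {
        return t.toString();
    }

    // Top-level guard: run the body and record any runtime exception
    // instead of letting it escape and crash the user code.
    // Returns true if the body completed normally.
    static boolean runGuarded(Runnable body, StringBuffer log) {
        try {
            body.run();
            return true;
        } catch (RuntimeException e) {
            log.append(describe(e));
            return false;
        }
    }
}
```

For example, describe(new RuntimeException()) gives "java.lang.RuntimeException", whereas calling getMessage() on the same exception gives null.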