Swerve, CAN, and motor safety

My team is pondering swerve-drive as an off-season project and could use some advice. With the number of motors and encoders implied by swerve, CAN bus seems like a natural choice. CAN also lets you use convenient embedded encoders in NEO motors via SparkMax speed controllers. PWM is essentially a write-only interface from the perspective of the RoboRIO, forcing the use external encoders.

However, our team just made a (somewhat reluctant) decision to move back to PWM for drive due to the lack of motorSafety support on CAN-connected speed controllers. We had an unexplained motor safety issue in competition on our CAN-connected drive that drove us to this decision. PWM gets you motorSafety support in the firmware of the SparkMax as well as library support in WPIlib that descends from motorSafety.

We’re interested in hearing how folks are implementing swerve. Are you using embedded or external encoders? Are you using CAN? If so, how are you coping with a lack of motor safety support?

We are swerve neophytes, so please be gentle…

1 Like

What exactly was this motor safety issue that drove you to abandon CAN? The WPILIb motor safety interface is fairly basic and many teams don’t use it at all without major safety issues.

4 Likes

We had a robot go completely nutso as soon as it was enabled in autonomous (despite having no autonomous code at the time). It remained nutso throughout the match and didn’t respond to any input from the driverstion until it was e-stopped by the field. A cause was never determined and it never happened again. Nonetheless, that’s not the sort of thing we’d like to have happen in a demo situation around people/children.

So are you saying that the WPIlib motorSafety stuff isn’t reliable? Using it was the advice we got from an NI representative.

While I don’t want to discard the possibility that there is an issue with the CAN devices, keep in mind you are taking action on a fix without proving root cause. This is generally not a good way to engineer solutions.

7 Likes

Possible causes were:

  • We had a color sensor plugged into the I2C port on the RoboRIO. That is apparently a known problem that can cause crashes and disconnected/unresponsive driver stations.
  • We did discover a CAN bus intermittency later in the competition that might have triggered the strange behavior.
1 Like

It does not do anything unless you explicitly enable it or use one of the prebuilt drive classes that turns it in for the motors injected into it.

Even when on, it is only a guard against hanging the motor output loop, which is one particular failure mode that rookies tend to encounter. It would likely not help in the situation you described.

I believe motor safety is based on a heartbeat. if the motor controller stops receiving the heartbeat it’ll shut down.

All that being said, to specifically answer your questions -

Both.

Yes

There is a safety mechanism:

1 Like

The SparkMax advertises that it will set power to zero if it doesn’t receive communication after 60ms (if memory serves). This is the functionality we were after.

OK. Thanks. So you’re implementing your own motor safety mechanism using the CAN heartbeat?

No user implementation required, it lives at the CAN bus driver level inside the RIO. The NI/WPILib-provided libraries will always generate the heartbeat message (as long as their code is executing) and all FRC-approved CAN devices are required to listen to it and respond to it.

What we do ensure is that during disabled, all motors are commanded to zero, integrators are reset, autonomous paths are re-initialized to their starting point, etc.

2 Likes

Interesting, thanks. In our anomaly a CAN-connected motor or motors (one one side of the chassis) somehow got turned on as soon as we were enabled. This was despite having no autonomous code at the time, and the behavior continued unabated across the transition to teleop until they finally shut us down for doing unpredictable donuts on the field. I’m glad to know the CAN heartbeat is implemented by FRC-approved CAN vendors but it didn’t help us in this case. I’m puzzled as to how this could have happened if the heartbeat function was working.

Could you give more details of “nutso”? Is there a video? I’m assuming the robot was making uncommanded movements of some kind. Was the robot powered up on the field, or transported to the field already powered?

This was originally reported as a crashing problem but there have now been numerous reports that it causes any code that reads the colour to pause for a long time (>20ms), which makes the robot laggy and unresponsive for both autonomous and teleop. There has been no suggestion that it causes uncommanded motion.

1 Like

Is the code that was running on the robot at the time available to inspect? I have a couple additional theories to add in:

  1. Non-zero I gains in closed loop controllers, which accumulated during disabled and got too “wound up” as soon as the robot was enabled
  2. Faulty driver station joystick
  3. Long-running parts of code that interrupted the normal input-calculate-output cycle (including, but not limited to, I2C-driven lockups)
  4. Motor controllers were set() to nonzero values in teleop before you got on the field. Then, when the FMS commanded the robot into autonomous, the heartbeat let the motors enable, and they took off at the last set() value, which happened to be nonzero in this case.
  5. Faulty encoder readings feeding closed-loop control

If you haven’t seen it yet, Fishbone Diagrams are a good way to help organize and track this sort of investigation.

1 Like

There is a video… but I’ll have to find it. Basically a motor or motors on one side of the (tank-drive) chassis clearly turned on as soon as we were enabled in Autonomous. This was despite having no Autonomous code at the time. This caused us to basically do donuts. The behavior continued consistently across the Auto/Teleop transition up until they finally killed us. There was no responsiveness to DriverStation commands so our drivers had to just watch.

Do you have the driverstation logs for this match? The complete lack of responsiveness is similar to I2C lockup I’ve seen elsewhere, but I’m not sure if you’d see this with no autonomous code at all.

Is it possible for us to see your source code?

I would not expect the motor safety heartbeat to help here. It would kick in if the roboRIO crashed (or was e-stopped) or was severely congested. Otherwise, if the heartbeat continues, motors will continue to obey their last setting.

1 Like

We did look at the logs at the time, but I’m not sure they were retained. A very knowledgeable NI employee (who’s name I forget) helped us with this at the competition and escalated the issue with NI at the time. The best answer we got back was either a short (a distinct possibility as we did find a CAN bus issue) or “ghosts”…

They shut you down? That means the heartbeat mechanism worked.

Time to investigate what your code was feeding and any preconditions were set on the motors.

3 Likes

Code has been modified dramatically since.