Motor sometimes does not respond to commands on swerve steering (Falcon 500)

We’ve been seeing a really weird issue with one of our swerve drive steering motors. Seemingly at random when we power on or deploy code, our front left steering motor does not respond to commands. When this happens, we can still read encoder values, bus voltage, velocity, etc. from it, but it does not respond when we tell it to go to a position. The motor appears to be operating on the CAN network perfectly fine: it flashes amber when disabled, turns solid amber when enabled, and the status lights stay solid amber regardless of what we tell the drivetrain to do.
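For reference, the steering command is just a Phoenix 5 position setpoint to the Falcon's onboard PID, along these lines (a sketch with placeholder device ID, bus name, and gearing, not our exact code):

```java
import com.ctre.phoenix.motorcontrol.ControlMode;
import com.ctre.phoenix.motorcontrol.can.TalonFX;

public class SteeringSketch {
    // Placeholders: device ID, CANivore bus name, and gearing are not our real values.
    private static final double FALCON_TICKS_PER_REV = 2048.0;
    private static final double STEERING_GEAR_RATIO = 12.8;
    private final TalonFX steeringFalcon = new TalonFX(1, "canivore");

    /** Ask the Falcon's onboard position PID to move the module to an angle in degrees. */
    public void setAngle(double degrees) {
        double targetTicks = degrees / 360.0 * STEERING_GEAR_RATIO * FALCON_TICKS_PER_REV;
        steeringFalcon.set(ControlMode.Position, targetTicks);
    }
}
```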

What makes this especially weird is that it is always the front left steering motor; we have not seen this issue on any other motor (drivetrain or otherwise). Since it only happens on one module while all of the modules run identical code, it doesn’t seem likely to be a software issue on our end.

We replaced the motor and still saw the issue on the replacement, so it is unlikely to be a hardware issue. Also, every time this happens, rebooting the robot or redeploying code makes the issue go away and the motor acts normally again.

The only thing I can think of that could be special about this motor between swaps is that it’s the first device past the CANivore on our bus. I have no idea why that would matter but it’s the only thing I can think of that we have not yet accounted for.

It’s super frustrating to deal with because the band-aid fix is so easy, but we have not yet found a way to tell that it is happening before trying to drive. We are concerned that we will go on the field without seeing it coming and our match will be ruined as a result.

I believe the issue only started showing up after we switched to using a CANivore for our CAN bus, but I’m not sure about that. We regularly see CAN bus errors in our driver station console (CTR: CAN frame not received/too-stale), but they show up for random motors and none of those motors seem to have their performance affected. I’m also not sure why these errors pop up – we have a lot of motors, but our CANivore bus utilization in Phoenix Tuner sits below 50% all the time.

Has anyone seen any issues like this in the past, and if so, what can be done to resolve this?

Code is at https://github.com/team4099/RapidReact-2022

This sounds like a possible race condition in the CTRE robot-side code. Have you tried changing the bus order?

We saw a similar issue when running autons multiple times in a row without deploying code between them. Check that condition, as it may be a value being saved or something like that.


We have not, but we will try it today. My concern is that if it does work, another motor (likely the drive motor for that module) will just have the same issue instead, so it’s not really a permanent fix. But we will try it for diagnostic purposes.

Our issue seems to pop up right after a code deploy, not after running for a long time, so it does not seem like a problem with state being maintained poorly. Initially we thought it might have something to do with an encoder value overflowing, but we reseed the internal encoder position from the absolute encoder on teleop init, and the issue does not go away with a disable/enable, so that seems unlikely to be the cause.
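The reseed is roughly this (a sketch with placeholder names and gearing, not our exact code):

```java
import com.ctre.phoenix.ErrorCode;
import com.ctre.phoenix.motorcontrol.can.TalonFX;
import com.ctre.phoenix.sensors.CANCoder;

public class SteeringReseedSketch {
    private static final double FALCON_TICKS_PER_REV = 2048.0;
    private static final double STEERING_GEAR_RATIO = 12.8; // placeholder gearing

    /** Seed the Falcon's integrated sensor from the absolute encoder on teleop init. */
    public static void reseed(TalonFX steeringFalcon, CANCoder absoluteEncoder) {
        double absDegrees = absoluteEncoder.getAbsolutePosition(); // 0..360 deg
        double ticks = absDegrees / 360.0 * STEERING_GEAR_RATIO * FALCON_TICKS_PER_REV;

        // Blocking timeout so a dropped frame surfaces as an error instead of
        // silently leaving the old position in place.
        ErrorCode err = steeringFalcon.setSelectedSensorPosition(ticks, 0, 250);
        if (err != ErrorCode.OK) {
            System.err.println("Reseed failed: " + err);
        }
    }
}
```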

We have also checked faults/sticky faults when the issue happens but we have not seen anything there either.
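For what it’s worth, this is roughly how we check them (placeholder names, not our exact code):

```java
import com.ctre.phoenix.motorcontrol.Faults;
import com.ctre.phoenix.motorcontrol.StickyFaults;
import com.ctre.phoenix.motorcontrol.can.TalonFX;

public class FaultCheckSketch {
    /** Print any active or sticky faults on a Falcon; we run this when the motor misbehaves. */
    public static void reportFaults(TalonFX falcon) {
        Faults faults = new Faults();
        StickyFaults sticky = new StickyFaults();
        falcon.getFaults(faults);
        falcon.getStickyFaults(sticky);

        if (faults.hasAnyFault() || sticky.hasAnyFault()) {
            System.out.println("Faults: " + faults);
            System.out.println("Sticky faults: " + sticky);
        } else {
            System.out.println("No faults reported");
        }
    }
}
```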

Yeah, that’s the reason I was suggesting it. I don’t think there’s any obvious quick fix here; the fact that the issue is intermittent strongly suggests a race condition, and the only code in which that seems likely to occur is the CTRE framework code, which is known to be heavily multithreaded (but is also proprietary…).

It looks like you’re running Phoenix version 5.20.2, our kickoff release.
We’ve since released two more versions of Phoenix, both of which include improvements for CANivore; notably, 5.21.1 had major performance and diagnostic-server improvements.

I would recommend updating Phoenix to our latest version, 5.21.2, and letting us know if you still see the issue, as any of the changes since your version could resolve it.


Huh, not sure how we didn’t catch that in our debug process – we made sure to update the firmware for the CANivore and our Talon FXes, but somehow missed this. We will try that today.

Is this a known issue with the old libraries, or alternatively, is it something you suspect could happen with the old libraries (knowing what was changed in the newer versions)? I’m asking because we don’t have much testing time before DCMP load-in tomorrow. The issue has been intermittent enough for us that I’m not sure we will be able to conclusively tell whether it still happens with the new libraries.

It’s not a known issue with the old libraries, but we haven’t done a lot of testing on the old libraries with CANivore, as we knew there were issues with 5.20.1 that 5.21.1 fixed.

I wouldn’t be surprised if the issue was related to something fixed in 5.21.1 or 5.21.2.
The symptoms you describe can be caused by one of three things:

  1. Falcon is commanded to 0 percent output.
  2. Falcon has zeroed-out PID gains.
  3. Race condition between Phoenix and CANivore.

If it’s 3, updating should fix the issue. Otherwise, a self-test snapshot of the Falcon while it’s misbehaving should provide the information needed to distinguish between 1 and 2 and to narrow down your search for a mitigation or fix.
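If it’s easier than grabbing a self-test snapshot in Tuner, logging something like this from robot code while the motor is misbehaving shows roughly the same information (a sketch assuming position control on slot 0 / PID index 0, with placeholder names):

```java
import com.ctre.phoenix.ParamEnum;
import com.ctre.phoenix.motorcontrol.can.TalonFX;

public class SteeringSnapshotSketch {
    /** Log enough state to separate case 1 (zero output command) from case 2 (zeroed gains). */
    public static void snapshot(TalonFX steeringFalcon) {
        System.out.println("Control mode:   " + steeringFalcon.getControlMode());
        System.out.println("Output percent: " + steeringFalcon.getMotorOutputPercent());
        System.out.println("Target:         " + steeringFalcon.getClosedLoopTarget(0));
        System.out.println("Error:          " + steeringFalcon.getClosedLoopError(0));
        // Read kP back from the device itself rather than trusting what the code configured.
        System.out.println("kP on device:   "
                + steeringFalcon.configGetParameter(ParamEnum.eProfileParamSlot_P, 0, 100));
    }
}
```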


If the Phoenix update doesn’t help, I’d try:

  • When calling steeringFalcon.configAllSettings, call it with a timeout and check the error return (see the sketch after this list).
  • Reorder the swerve modules in the code so that the problem module is not first.
  • (As @TytanRock says) Read the PID values from the Falcon when it’s misbehaving.
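For the first bullet, a minimal sketch (Phoenix 5 Java; steeringFalcon and config stand in for whatever your code uses):

```java
import com.ctre.phoenix.ErrorCode;
import com.ctre.phoenix.motorcontrol.can.TalonFX;
import com.ctre.phoenix.motorcontrol.can.TalonFXConfiguration;

public class ConfigCheckSketch {
    /** Apply the steering config with a blocking timeout and retry if a frame was dropped. */
    public static void applyConfig(TalonFX steeringFalcon, TalonFXConfiguration config) {
        ErrorCode err = steeringFalcon.configAllSettings(config, 250); // 250 ms timeout
        for (int attempt = 1; err != ErrorCode.OK && attempt <= 4; attempt++) {
            System.err.println("configAllSettings failed (" + err + "), retrying");
            err = steeringFalcon.configAllSettings(config, 250);
        }
        if (err != ErrorCode.OK) {
            System.err.println("Steering config never applied cleanly: " + err);
        }
    }
}
```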
