CAN bus load is fluctuating severely after some issues with our drive system

One of our motor controllers was going into neutral (brake) at a very consistent period during forward driving, both on the ground and on blocks. We tracked it down to one of our subsystems causing a delay, and after removing that subsystem from the code the drive system behaved normally again.

After this we noticed the CAN bus load was fluctuating from 35% to 70% in fractions of a second. This is definitely not normal, especially considering that by then we had removed practically everything except the drive code, so the CAN bus should be experiencing less than 35% load the entire time. We had just checked every connection in the drive system’s CAN bus and reset all the CAN IDs to troubleshoot the drive problems. We’re pretty stumped, so any ideas at all would be greatly appreciated.

I’ve attached a screenshot of the CAN bus load to show what I mean.

Since you say you already checked everything physical (wiring, etc.), I would start looking for a software cause. Comment out (or otherwise disable) as much as possible and confirm the symptom disappears, then re-add code until it reappears. This is easy to do with command-based (if using Java or C++, since the presence of a class is irrelevant so long as there are no references to it), but it can be done fairly easily regardless of the style of your code.
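
In a command-based Java project that can be as simple as commenting out the subsystem instantiations in RobotContainer. A minimal sketch, with purely illustrative subsystem names (not from the OP’s code):

public class RobotContainer {
    // Start with only the drive subsystem live and confirm the symptom is
    // gone, then comment these back in one at a time until it reappears.
    private final DriveSubsystem m_drive = new DriveSubsystem();
    // private final IntakeSubsystem m_intake = new IntakeSubsystem();
    // private final ShooterSubsystem m_shooter = new ShooterSubsystem();
    // private final ClimberSubsystem m_climber = new ClimberSubsystem();
}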

Also, please share your code (ideally via GitHub or a similar source-control platform) if you’d like more eyes on it.

Would also be useful to know which devices are on the CAN bus.

Edit: that maybe sounded snarky, but that’s not my intent. Posting code will help with some of it, but it’s also good to know whether the PDH or PDP is used, likewise which pneumatics controller, and so on. Plus whatever devices might be on the bus but not yet programmed for, if any. Also, OG Rio or Rio 2? The more details the better.

Which subsystem was the one you removed to make the driving symptoms go away? What devices are associated with that subsystem, etc.?

While we’ve never done it in production/competition code, one thing I have experimented with to reduce CAN bus traffic is extending the manufacturer’s libraries. We use Talons, so briefly it would be something like this:

import com.ctre.phoenix.motorcontrol.ControlMode;
import com.ctre.phoenix.motorcontrol.can.WPI_TalonFX;

public class LowCAN_WPI_TalonFX extends WPI_TalonFX {

    private double m_last_speed = Double.NaN;        // NaN forces the first set() through
    private ControlMode m_last_control_mode = null;

    public LowCAN_WPI_TalonFX(int deviceNumber) {
        super(deviceNumber);
    }

    @Override
    public void set(ControlMode mode, double speed) {
        // Only forward the call (and its control frame) when the request changes.
        if (m_last_control_mode != mode || m_last_speed != speed) {
            super.set(mode, speed);
        }
        m_last_control_mode = mode;
        m_last_speed = speed;
    }
}

It may not fix your problem, but your CAN bus utilization will certainly be a lot lower.

Edited to add the constructor, on the off chance you tried my code snippet directly.

This used to be necessary, but I believe all of the modern FRC vendorlibs do this internally nowadays.

I just took a cursory look at the WPI_TalonFX implementation, and I didn’t see it in there. Not sure about any of the other vendors.

The volatile fluctuation of the CAN bus utilization measurement was something we saw during beta; it appears to be a bug in the DS reporting of bus utilization.

There’s nothing that can be done on the user end to improve the measurement, but the good news is that this doesn’t impact the real bus utilization or your ability to use CAN devices.

Practically speaking, on-change checking like this won’t impact your bus utilization. This is already done in the low level of Phoenix, plus the control data is sent periodically anyway.

Here’s our robot’s code that was causing the issue with the drive motor. I’ve messaged Max, the software engineer who got it working again, to have him commit the working code, but from what he said I believe the only code left is the DriveSubsystem and the ManualDriveTank command, along with the bare-bones necessary stuff, of course.

https://github.com/Carlthird/V386-Troubleshooting

I see. Is there anything in particular that seems to cause it? And could it present a hazard by pushing CAN bus utilization above 100% once we add more devices and code beyond what’s needed for just driving? Thanks!

Wanted to add here: for CTRE, checking this isn’t necessary. However, if you are using REV and setting PID constants, it is absolutely necessary. Adding the “on change” checking to REV for the Spark Max took us from over 80% to a much more reasonable 40%.
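
Not our actual code, but a minimal sketch of that kind of wrapper, assuming the 2022-era REVLib API (the class name and caching scheme here are just illustrative):

import com.revrobotics.CANSparkMax;
import com.revrobotics.CANSparkMaxLowLevel.MotorType;
import com.revrobotics.SparkMaxPIDController;

public class LowCANSparkMax extends CANSparkMax {
    private final SparkMaxPIDController m_pid;
    private double m_lastP = Double.NaN; // NaN forces the first write through

    public LowCANSparkMax(int deviceId) {
        super(deviceId, MotorType.kBrushless);
        m_pid = getPIDController();
    }

    // Only send the configuration frame when the gain actually changes.
    public void setPIfChanged(double p) {
        if (p != m_lastP) {
            m_pid.setP(p);
            m_lastP = p;
        }
    }
}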

So in this case, be aware of what vendor you are using.

You mention adding the “on change” checking to REV… where is that? In code, or in REV hardware?

Also, is a noisy CAN bus normal? Our utilization swings from 50% to 90%, so the line graph is 40% wide. That is with the bot just sitting.

Thanks!

We’re seeing a similar issue, only more extreme. It isn’t affecting our normal operation at all, but our CAN bus traffic is spiking roughly every 240 milliseconds, usually up to 100%. We think the culprit is the Falcon (TalonFX) motor controllers; the size of the traffic spike seemed to correlate with the number of them we had enabled.
In addition, we looked at the getStatusFramePeriod and setStatusFramePeriod functions (found in the CTRE BaseMotorController class). getStatusFramePeriod suggested that the following status frames have a period of 240 milliseconds by default:
Status 4: AinTempVbat (analog input, temperature, battery voltage)
Status 12: Feedback1
Status 13: Base_PIDF0
Status 15: FirmwareApiStatus
We then wrote some code to setStatusFramePeriod to various prime numbers, in the hope of reducing the amount of overlap. However, when we ran this, it errored, causing our TalonFXs to drop off the CAN bus. It’s quite possible we wrote some bad code, but we’re wondering if anybody else has attempted a similar solution, or if anybody has insight into why changing the status frame period could cause the TalonFXs to error.
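
For reference, a sketch of the shape we were going for, assuming Phoenix 5’s StatusFrameEnhanced enum and the timeout overload of setStatusFramePeriod so any error actually gets reported (the helper and the specific primes are illustrative, and if I recall correctly the period maxes out at 255 ms):

import com.ctre.phoenix.ErrorCode;
import com.ctre.phoenix.motorcontrol.StatusFrameEnhanced;
import com.ctre.phoenix.motorcontrol.can.WPI_TalonFX;

public class StatusFrameTuning {
    // Slow one status frame down and surface any error instead of ignoring it.
    static void slowStatusFrame(WPI_TalonFX talon, StatusFrameEnhanced frame, int periodMs) {
        final int kTimeoutMs = 50; // block until confirmed so failures are visible
        ErrorCode err = talon.setStatusFramePeriod(frame, periodMs, kTimeoutMs);
        if (err != ErrorCode.OK) {
            System.out.println("setStatusFramePeriod(" + frame + ") failed: " + err);
        }
    }

    // Example: stagger the 240 ms defaults onto nearby primes.
    static void apply(WPI_TalonFX talon) {
        slowStatusFrame(talon, StatusFrameEnhanced.Status_4_AinTempVbat, 251);
        slowStatusFrame(talon, StatusFrameEnhanced.Status_12_Feedback1, 241);
        slowStatusFrame(talon, StatusFrameEnhanced.Status_13_Base_PIDF0, 239);
    }
}
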
Thanks!

What was the error? And the spiking to 100% is normal due to the bug mentioned above.

I don’t recall the exact errors at the moment, but they seemed like the general “can’t find” error for the motors (all of the configuration commands afterwards failed, and the devices grayed out in Phoenix Tuner). Also, that bug sounds consistent with what we were seeing, but I’d be curious to know if it also explains the periodicity of the spikes.
