Command Scheduler Loop Overrun

A .zip of some of tonight’s Driver Station log files can be found on drive.

Our code is hosted on GitHub as MusicalToaster.

Each subsystem has its own thread (the camera class also has a thread, but it is no longer a subsystem). This approach is similar to what CTRE did with their SwerveDrivetrain class.

It became an issue when we were using Phoenix Tuner to run the right shooter motor so we could develop a table of good angles at different distances for our turret. It would run for maybe 5 seconds, then the loop overrun error would occur and the motor would stop.

We noticed that the code would just restart, and I am unsure what would cause that. Loop Overrun is just a warning, right?

We commented out the telemetry for a few of the subsystems’ states to see if reducing the output to NetworkTables would resolve the issue.

Should we use periodic() instead? Having a separate thread theoretically lets us update values more often, but if for some reason it’s causing us issues, then we can use periodic(). As of right now, as you can see in the logs, the CPU usage is relatively low. To my mind the threads are not fighting for CPU, but I could be wrong.

We cannot afford another whole night dealing with this. Any suggestions to help guide us in the right direction and pinpoint the root cause are greatly appreciated.

I am also thinking that it could be the thread priority. Unlike CTRE’s SwerveDrivetrain class, we are not setting the thread priority to 1. I do not know what the default priority of a thread is, but if nothing else, we will try setting the priority for each of the threads tomorrow.
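On the default-priority question, a minimal sketch using only the standard `java.lang.Thread` API: a new Java thread starts with the priority of the thread that created it, which is `Thread.NORM_PRIORITY` (5) when the creating thread has not been changed. The class name `ThreadPriorityDemo` and the thread name are hypothetical. Java priorities are only hints to the OS scheduler, so this may not be the same mechanism as the "priority 1" mentioned above; that is an assumption worth checking against CTRE's source.

```java
public class ThreadPriorityDemo {
    /** Returns the priority a freshly constructed thread starts with. */
    public static int defaultPriorityOfNewThread() {
        Thread worker = new Thread(() -> { /* no work needed for this check */ });
        return worker.getPriority(); // inherited from the creating thread
    }

    /** Creates (but does not start) a daemon worker with an explicit, clamped priority. */
    public static Thread newWorker(Runnable body, int priority) {
        int clamped = Math.max(Thread.MIN_PRIORITY, Math.min(Thread.MAX_PRIORITY, priority));
        Thread worker = new Thread(body, "subsystem-worker"); // hypothetical name
        worker.setDaemon(true);
        worker.setPriority(clamped);
        return worker;
    }

    public static void main(String[] args) {
        System.out.println("creating thread priority: " + Thread.currentThread().getPriority());
        System.out.println("new thread priority:      " + defaultPriorityOfNewThread());
        System.out.println("explicit worker priority: " + newWorker(() -> {}, 1).getPriority());
    }
}
```

Printing the priority from inside each subsystem thread is a cheap way to confirm what the threads are actually running at before changing anything.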

This is a known issue with using “.whileTrue()” in your button bindings. You can fix it by changing “controller.a().whileTrue()” to “controller.getHID().getAButton()”.

More info can be found here: “Periodically calling CommandXboxController button functions causes CommandScheduler loop overruns”, Issue #5903, wpilibsuite/allwpilib on GitHub.

I saw that. That’s why, as you can see in RobotContainer.java, we stopped using our TGR enum (which names our triggers) for any controller input where we just need the boolean value. For those inputs we now read the value from the XboxController inside CommandXboxController via getHID(); in a few cases, the CommandXboxController method itself returns the value from the hid variable.

Now you’re saying that any code using the Trigger class, even the .whileTrue calls that are only called once during initialize(), are going to cause issues? I’m not familiar with how these Triggers work behind the scenes.

What’s the workaround for triggering commands from the controller without using whileTrue? Wouldn’t I have to create a new Trigger and reference the controller boolean from the HID? Is that not the same thing?

Are you saying there is a difference between these two lines of code? Or is there another way that I’m not aware of?

new Trigger(() -> driverController.getHID().getAButton()).whileTrue(Commands.none());

driverController.a().whileTrue(Commands.none());

No, that’s definitely not the case. The only issue is with calling a trigger factory (i.e., a(), leftBumper(), etc.) repeatedly as a way of polling the button. You also don’t want to call binding methods (such as whileTrue) each loop, but that’s not a common issue.
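To illustrate the distinction, here is a rough standalone sketch that does not use WPILib at all. `FakeEventLoop` and `FakeController` are hypothetical stand-ins, and the sketch assumes, purely for illustration, that every factory call registers one more poller with a shared loop. Under that assumption, calling the factory each cycle makes the polled list grow without bound, while calling it once and reading the raw button each cycle does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class TriggerFactoryDemo {
    /** Stand-in for a scheduler's button loop: everything registered is polled every cycle. */
    static final class FakeEventLoop {
        private final List<Runnable> pollers = new ArrayList<>();
        void bind(Runnable poller) { pollers.add(poller); }
        void poll() { pollers.forEach(Runnable::run); }
        int size() { return pollers.size(); }
    }

    /** Stand-in for a command-style controller wrapper (not the real WPILib class). */
    static final class FakeController {
        private final FakeEventLoop loop;
        private boolean aPressed;
        FakeController(FakeEventLoop loop) { this.loop = loop; }

        /** Raw read, like polling the HID directly: registers nothing. */
        boolean getAButton() { return aPressed; }

        /** Factory: ASSUMED here to register one new poller per call. */
        BooleanSupplier a() {
            boolean[] latest = new boolean[1];
            loop.bind(() -> latest[0] = aPressed);
            return () -> latest[0];
        }
    }

    /** Anti-pattern: calls the factory every cycle. Returns the number of registered pollers. */
    public static int factoryEveryLoop(int cycles) {
        FakeEventLoop loop = new FakeEventLoop();
        FakeController controller = new FakeController(loop);
        for (int i = 0; i < cycles; i++) {
            controller.a().getAsBoolean(); // new registration on every pass
            loop.poll();
        }
        return loop.size();
    }

    /** Safe pattern: one factory call up front, raw reads inside the loop. */
    public static int factoryOnce(int cycles) {
        FakeEventLoop loop = new FakeEventLoop();
        FakeController controller = new FakeController(loop);
        BooleanSupplier a = controller.a(); // created once, like a binding in a constructor
        for (int i = 0; i < cycles; i++) {
            controller.getAButton();
            a.getAsBoolean();
            loop.poll();
        }
        return loop.size();
    }

    public static void main(String[] args) {
        System.out.println("pollers, factory every loop: " + factoryEveryLoop(500));
        System.out.println("pollers, factory once:       " + factoryOnce(500));
    }
}
```

Both lines of code quoted earlier in the thread fall under the safe pattern as long as they run once during setup rather than every loop.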

Thank you for clarifying. I believe I have already fixed all the places in our code that meet the criteria for this issue, and we still had loop overruns.

We may have a few problems here. The crashing may be due to the same memory issues that this solution by Peter Johnson attempts to solve. We have a roboRIO 1 and don’t plan on upgrading, so we might have to try the sysctl changes made there and see if that resolves the crashing, because a loop overrun shouldn’t cause a crash.

@Peter_Johnson Can you comment on this?

We also observe unexplained loop overruns (with a Rio 1). Should we try caching all our triggers?

Thanks!

See Ryan’s response. Don’t create new triggers in periodic at all (e.g., by calling a trigger factory function like a()).

Oh… rereading the post carefully, the problem only occurs when calling a trigger factory in periodic. We don’t do that. Shouldn’t post so late. Sorry.

Tonight, we went all out.

To get to the bottom of our loop overruns (which are causing more issues than just warnings), we went ahead and did a few things.

We tested an empty TimedRobot. This performed without overruns.

We stripped our code down to just the drivetrain. With only the drivetrain, we still had overruns.

We are using the CTRE swerve library. We have a couple of classes that extend its functionality, and they create a thread. I don’t see how this thread could impact overruns, but we decided to copy their SwerveDrivetrain class, make it extend SubsystemBase, and move the thread logic into periodic().

By printing out the time that certain chunks of code in the periodic function took, we noticed that the spikes in loop time were coming primarily from applying the module states. In other words, calling setControl on the TalonFX class would every once in a while spike beyond normal. Applying the control requests nominally took about 2.5 ms. A spike while applying all 4 module states could certainly exceed 10 ms. I don’t have the data from tonight, but you can view the code on the “overruns” branch.
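For anyone wanting to repeat this kind of measurement, a minimal sketch of a section timer built only on `System.nanoTime()`. The class `SectionTimer` and the section name `applyModuleStates` are hypothetical and are not taken from the team's code. It accumulates a mean and a maximum per section so results can be printed occasionally instead of every loop, since console output itself takes time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SectionTimer {
    // Per-section statistics: {sample count, total nanoseconds, max nanoseconds}
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    /** Runs the section and records how long it took. */
    public void time(String name, Runnable section) {
        long start = System.nanoTime();
        section.run();
        record(name, System.nanoTime() - start);
    }

    /** Records an already-measured duration in nanoseconds. */
    public void record(String name, long nanos) {
        long[] s = stats.computeIfAbsent(name, k -> new long[3]);
        s[0]++;
        s[1] += nanos;
        s[2] = Math.max(s[2], nanos);
    }

    /** Mean duration in milliseconds, or 0.0 if nothing was recorded. */
    public double meanMillis(String name) {
        long[] s = stats.get(name);
        return (s == null || s[0] == 0) ? 0.0 : s[1] / 1e6 / s[0];
    }

    /** Worst-case duration in milliseconds, or 0.0 if nothing was recorded. */
    public double maxMillis(String name) {
        long[] s = stats.get(name);
        return s == null ? 0.0 : s[2] / 1e6;
    }

    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        for (int i = 0; i < 100; i++) {
            timer.time("applyModuleStates", () -> { /* placeholder for the real calls */ });
        }
        System.out.printf("applyModuleStates mean=%.3f ms max=%.3f ms%n",
                timer.meanMillis("applyModuleStates"), timer.maxMillis("applyModuleStates"));
    }
}
```

Tracking the maximum separately from the mean matters here because the reported problem is occasional spikes, which an average alone would hide.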

2.5 ms seems like quite a bit of time for the nominal case, and the spikes are large enough that this bit of code alone can cause loop overruns.

Is there something I am missing? Is there a reason this takes so long? Is this amount of time to be expected? If so, why? We also have to run other subsystems, and even if this call is on another thread, the roboRIO 1 only has 2 cores to my knowledge. I would think this could cause overruns, and seemingly does, whether it is on its own thread, run in periodic, or run directly from a command.

This is quite concerning, considering that we make the same kind of calls to the motors in our other subsystems as well. Note that we changed the loop time to 0.010 while testing. With just the drivetrain, this shouldn’t be a problem, but it is.
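To put those numbers in context, a small arithmetic sketch, assuming the 0.010 loop time is in seconds (10 ms) and comparing against TimedRobot's default 20 ms period. The class and method names are hypothetical. At 10 ms, the nominal 2.5 ms is 25% of the period and a 10 ms spike is 100% of it; at 20 ms, the same figures are 12.5% and 50%.

```java
public class LoopBudget {
    /** Percentage of one loop period consumed by a section of code. */
    public static double percentOfPeriod(double sectionMs, double periodMs) {
        return 100.0 * sectionMs / periodMs;
    }

    public static void main(String[] args) {
        double[] periodsMs = {10.0, 20.0};  // tested period and the default period
        double[] sectionsMs = {2.5, 10.0};  // nominal and spike figures quoted above
        for (double period : periodsMs) {
            for (double section : sectionsMs) {
                System.out.printf("%.1f ms of a %.0f ms loop = %.1f%%%n",
                        section, period, percentOfPeriod(section, period));
            }
        }
    }
}
```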

Hi Ryan,

I just want to be absolutely clear about your comment, as we’re chasing our own loop overrun issues and I’m new to this stuff.

The issue only arises if you are doing your own button polling in a repeating loop (like a periodic function), right? If you are configuring button bindings in the RobotContainer constructor, is that still a safe pattern?

If you’re constructing buttons (i.e. new JoystickButton(...)) or polling buttons (i.e. myController.getRawButton(...)), whether in a periodic loop or otherwise, you’re fine.

If you’re using the Trigger factory methods (i.e. myCommandXboxController.a(), or really nearly any method that returns a trigger), then that shouldn’t be done in a periodic context. If you call it once (i.e. in configureButtonBindings), then you’re fine.

Thanks, that tracks. I thought we were going to have to build our own event processor. I just wanted to verify the safe pattern.
