What actually happens if your command takes more than 20 ms?

I know all commands should finish their execute loops well under 20 ms, but what actually happens if the code doesn’t finish in time?

For example, let’s say the start of Auto launches a command, and inside this command a really complex trajectory path is loaded: it has to read a text file, build splines, parameterize, profile, etc., and this line of code doesn’t finish inside 20 ms.

  1. Will the next .run() call of the scheduler (called from robot periodic) start executing the command’s execute loop again even though the last loop hasn’t finished?
  2. If so, does it terminate the last execute loop because it didn’t finish in time?
  3. Or does robot periodic get delayed and not fire when you would expect, because the scheduler is still waiting for one of the command’s execute methods to finish?
  4. Is there any multithreading happening on the RIO, or is all code just being run sequentially?

I know people are going to say that you shouldn’t be building paths in a command’s execute method and that they should all be built in the robot init method before auto starts, etc., but I’d like to know what actually happens if a command’s execute method takes, say, 100 ms to finish (e.g. a very poor implementation of on-the-fly motion profile planning from your current position to some known point on the field).
Thanks
Warren.

TimedRobot and scheduler run everything within a single thread. So the answers are:

  1. No
  2. No (N/A)
  3. Yes. What actually happens is that the periodic loop gets run as fast as possible, as many times as needed, until it “catches up”. E.g. if you have a 20 ms loop that usually takes 10 ms and it gets delayed 100 ms one time, the next 10 iterations (100 / (20 - 10)) will run back to back, because each back-to-back iteration takes 10 ms while the schedule advances 20 ms, so each one only recovers 10 ms. I think this is actually not the behavior we want, so we may change this next year to not try to “catch up” but rather simply run the next iteration at the next scheduled periodic opportunity.
  4. TimedRobot and the command scheduler are all single threaded. There are many other threads running in the background and you can create your own threads; just keep in mind the mutex protections required for variables shared between threads. Some WPILib functions do call callbacks on other threads, e.g. NetworkTables callbacks and Notifier callbacks.
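
To make the arithmetic in point 3 concrete, here is a minimal sketch in plain Java. It is a toy model of a fixed-rate loop that tries to catch up, not WPILib code; `CatchUpSketch`, `lateIterations`, and the parameter names are made up for illustration, and it assumes every iteration after the slow one takes the same amount of time.

```java
/** Toy model of a fixed-rate loop that tries to "catch up"; not WPILib code. */
final class CatchUpSketch {
    /**
     * Counts how many iterations start behind schedule once the loop is
     * lateMs behind, assuming every later iteration takes workMs.
     */
    static int lateIterations(long periodMs, long workMs, long lateMs) {
        if (workMs >= periodMs) {
            throw new IllegalArgumentException("loop can never catch up");
        }
        long now = lateMs;   // time elapsed since the first missed start
        long scheduled = 0;  // when the next iteration should have started
        int count = 0;
        while (now > scheduled) {
            count++;
            now += workMs;          // the iteration itself takes workMs
            scheduled += periodMs;  // but the schedule moves a full period
        }
        return count;
    }

    public static void main(String[] args) {
        // Numbers from the reply above: 20 ms period, 10 ms of work, 100 ms late.
        System.out.println(lateIterations(20, 10, 100)); // prints 10
    }
}
```

Under these assumptions the count matches the 100 / (20 - 10) = 10 figure above: the loop only gains back (period - work) per iteration, so a short stall can turn into a much longer burst of back-to-back runs.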

I didn’t realize that it tries to catch up, and I don’t like it. I just always assumed those loops were lost. This could cause serious problems with integrals if teams don’t take care to use a measured dt. It also messes with anything that is based on loop counts rather than the clock.

Slightly unrelated, but I’ve noticed that our loop time is generally < 1 ms, with an occasional one over that but still < 10 ms. However, if the robot is on with the DS not connected, the Driver Station and Shuffleboard are both running, and you then connect the DS to the Wi-Fi network, the DS software seems to lag. The measured loop times are then around 14 ms, with many over 20 ms. Closing Shuffleboard and the DS and relaunching them brings the loop time back to < 1 ms. Is there a known problem here? This seems very close to the sequence that would take place when connecting on the field. I’d need to double check, but I think the comms light is half red, half green when this condition occurs.
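
For anyone wondering what “use a measured dt” looks like in practice, here is a rough sketch. `MeasuredDtIntegrator` is a hypothetical helper, not a WPILib class; the caller passes in a timestamp in seconds from whatever monotonic clock they trust.

```java
/** Integrates a signal using the measured time between samples. */
final class MeasuredDtIntegrator {
    private double integral = 0.0;
    private double lastTimestampSec = Double.NaN; // NaN until the first sample

    /**
     * Adds value * dt, where dt is the time since the previous call.
     * The first call only records the timestamp and adds nothing.
     */
    double update(double value, double timestampSec) {
        if (!Double.isNaN(lastTimestampSec)) {
            integral += value * (timestampSec - lastTimestampSec);
        }
        lastTimestampSec = timestampSec;
        return integral;
    }

    /** Returns the accumulated integral so far. */
    double get() {
        return integral;
    }
}
```

In robot code the timestamp could come from something like WPILib’s `Timer.getFPGATimestamp()`. The point is only that a hard-coded 0.02 is wrong both for the stalled iteration (which covered much more than 20 ms) and for the back-to-back ones (which covered less).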

I also don’t like the current behavior; we can’t change it mid-season (as it could break existing code) but will definitely look at changing it for next year.

While the DS is connected, I would expect to see slightly higher loop times, because there is higher CPU utilization (e.g. NetComm is sending/receiving packets, NT is sending/receiving data, etc.). On initial connection, there might be a very short-term slowdown as new code runs that hasn’t been touched before at the Java layer, but that should settle out within a loop iteration or two. I have no idea why it would be as extreme as you are describing, continuously, and not return to “normal” until after a DS restart. The half-green indicator on the DS indicates there is a UDP connection but not a TCP one.

There’s been a PR open to fix that for two weeks (wpilibsuite/allwpilib Pull Request #4101, “[wpilib] Fix repeat TimedRobot callbacks on loop overrun” by calcmogul), but the tests fail nondeterministically. We’ve seen issues like this before due to race conditions in the sim Notifier impl, so that PR may be blocked on us rewriting the sim Notifier impl (again).

Also, Watchdog, MotorSafety, and TimedRobot each have their own cooperative multitasking schedulers around Notifiers that we’re planning on refactoring at some point.

Thanks Peter, Kyle and Tyler,

This is precisely the info I was looking for.

One method would be to split the command into several parts and store the state in the command or subsystem.

Pass 1) read file etc etc
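
A rough sketch of that idea in plain Java, with one pass per call to execute. `StagedWork` is a hypothetical class, not part of WPILib; its `execute()` and `isFinished()` just mirror the names of a command’s lifecycle methods, and the stages themselves are whatever pieces you can split the work into (read the file, build splines, and so on).

```java
import java.util.List;

/** Runs one stage of a long job per execute() call; not a WPILib class. */
final class StagedWork {
    private final List<Runnable> stages;
    private int next = 0; // index of the stage to run on the next call

    StagedWork(List<Runnable> stages) {
        this.stages = List.copyOf(stages);
    }

    /** Runs at most one stage, so a single call stays short. */
    void execute() {
        if (next < stages.size()) {
            stages.get(next).run();
            next++;
        }
    }

    /** True once every stage has run. */
    boolean isFinished() {
        return next >= stages.size();
    }
}
```

This only helps if each individual stage fits comfortably inside the loop period; a single stage that takes 100 ms still stalls the loop.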

Or

Push the info to NetworkTables and have a coprocessor (on robot or off) do the calculations.

Don’t wait for the results, but check the status of the job on each entry into execute.

I believe execute runs in a manner very similar to periodic.
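
The “don’t wait, just check the status” pattern can be sketched in plain Java with a background thread instead of a coprocessor. This is only an illustration under that assumption: `BackgroundJob` is a hypothetical class (not WPILib), and `initialize()`, `execute()`, and `isFinished()` simply mirror a command’s lifecycle method names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Starts slow work on another thread and polls for it; not a WPILib class. */
final class BackgroundJob<T> {
    private final Supplier<T> slowWork;
    private CompletableFuture<T> future;
    private T result;
    private boolean finished = false;

    BackgroundJob(Supplier<T> slowWork) {
        this.slowWork = slowWork;
    }

    /** Kicks off the slow work on a background thread and returns at once. */
    void initialize() {
        finished = false;
        result = null;
        future = CompletableFuture.supplyAsync(slowWork);
    }

    /** Polls the job; never blocks waiting for the result. */
    void execute() {
        if (!finished && future != null && future.isDone()) {
            // The future is already complete, so join() returns immediately.
            // It throws CompletionException if the slow work itself threw.
            result = future.join();
            finished = true;
        }
    }

    boolean isFinished() {
        return finished;
    }

    /** Result of the slow work, or null if it has not finished yet. */
    T getResult() {
        return result;
    }
}
```

Because the result is only read after `isDone()` reports true, the main loop never touches data the background thread is still writing, which is the kind of shared-variable care mentioned earlier in the thread.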
