I’ve recently been a bit unhappy with my teleop main loop running kind of slow. Without any wait/sleep added in, I think each iteration takes around 50-70 ms, and sometimes starves Robot Drive (the “loop is not running fast enough” errors).
This year I’ve also been using a ton of variables in network tables, one for basically every constant on the robot. So that means maybe 50 network tables calls for each iteration of teleop.
When trying to figure out why the loop is slow, I considered that maybe retrieving these values is kind of slow, if it has to deal with things over the network. Could this actually be the case, or is the loop slowness probably caused by something else?
NetworkTables should actually be fairly efficient in that use case. All of the data is cached, so reading 50 variables should not take long at all.
There are a few other things to look at, however. Some WPILib methods take 0.2-0.5 ms to run, so if you call several of them per iteration they add up quickly. One of the biggest culprits is the Driver Station mode checks: IsEnabled, IsTeleop, and similar methods take a long time, so calling them repeatedly in a loop adds a lot of lag. Joystick get methods other than axes or buttons are also slow. The last ones are CAN calls, which can take a while, especially reads from CAN Talons, which depending on the parameter can take over a millisecond.
Joystick calls are not slow. They access variables stored in memory that are received from the driver station at regular intervals (around every 20ms for a Driver Station Control Packet). When one of these packets is received, the joystick values are stored in memory, and those are accessed through the Joystick class. There is nothing slow about this at all, in fact, it’s probably the fastest way to do something like this. I don’t usually applaud WPILib for doing something right, but this is one of those times.
This is true for the majority of WPILib’s get functions, as it enables control loops to run quickly without ‘blocking’ on an external resource. Reading up on the different types of frames used by CAN, the Driver Station, and FMS will give you a good idea of why all of these methods are non-blocking.
If you’re wondering about your ‘taking too long to loop’ issue, I’m afraid there’s not much help we can provide unless you provide some code. If the loops are taking too long, there are two possibilities.
a) The issue is occurring within the loop. Somewhere in the loop, your code is running slowly. This may be an intensive math problem or a function call. If you’re using reflection anywhere, that will be your culprit.
b) The issue is occurring outside the loop and bogging down the system. This can be caused by delayed Driver Station / FMS Control Packets (check your wifi router), or by having too many active threads. A good status indicator is your RSL (Robot Signal Light). If it’s flashing irregularly, it’s an issue outside the loop and your RoboRIO is being bogged down by an intensive process.
If you’re running any kind of vision on your RoboRIO, this can also be an issue with the system being bogged down.
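One way to narrow down case (a) is to time each part of the loop and see where the milliseconds go. The following is only a minimal sketch in plain Java: `SectionTimer` and the section names are hypothetical and not part of WPILib, and it measures wall-clock time, so anything that preempts the loop shows up in whichever section was running.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: accumulates wall-clock time per named section of a loop. */
class SectionTimer {
    private final Map<String, Long> totalsNanos = new LinkedHashMap<>();
    private long lastMark = System.nanoTime();

    /** Call at the top of each loop iteration to restart the clock. */
    void start() {
        lastMark = System.nanoTime();
    }

    /** Attribute the time elapsed since the previous mark (or start) to this section. */
    void mark(String section) {
        long now = System.nanoTime();
        totalsNanos.merge(section, now - lastMark, Long::sum);
        lastMark = now;
    }

    /** Total time recorded for a section, in milliseconds (0 if never marked). */
    double totalMillis(String section) {
        return totalsNanos.getOrDefault(section, 0L) / 1_000_000.0;
    }

    /** One line per section, in the order the sections were first marked. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totalsNanos.entrySet()) {
            sb.append(String.format("%s: %.3f ms%n", e.getKey(), e.getValue() / 1_000_000.0));
        }
        return sb.toString();
    }
}
```

In a teleop loop you would call `start()` at the top of the iteration, then `mark("networktables")`, `mark("joysticks")`, `mark("can")` and so on after each chunk of work, and print `report()` every few hundred iterations rather than every loop, since printing is itself slow.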
If you actually look at the code, the Joystick get-button and get-axis calls are cached; however, GetJoystickName and GetJoystickIsXbox are not cached, and are slow. The Robot State methods are not cached either; they call directly into the FPGA and are slow. And finally, in CANTalon, whenever some parameters are requested, there is an explicit Thread.sleep of about 1 ms, which also causes a context switch and could delay things even longer. So even though these methods should be non-blocking, in the current implementation they are not exactly; however, they have known block times, rather than the indefinite waits that actual blocking calls would have.
GetJoystickName and GetJoystickIsXbox both retrieve values from the HALJoystickDescriptor, which stores values in memory (source 1, source 2). The Joystick Descriptor is updated each control packet as defined by DriverStation::GetData(). This code appears to be common between Java and C++ variants of the library.
A state retrieval doesn’t affect the inner loop at all, since state ticking is done by WPILib before user code, and therefore should not cause a loop to slow.
I will, however, admit that CAN Status Signals are delayed, and this was a mistake in my reading. My apologies for that.
FWIW, in the past NetworkTables has had locking issues. However, it was completely rewritten this year by someone I trust to write it correctly, but it’s possible that there are some issues somewhere that haven’t been caught yet.
Even without bad locking issues causing delays, I’d be willing to bet that each read of a NetworkTables variable is going through a lock (I haven’t looked at this year’s code) or two, so it will add up a little bit… though, probably not 75ms worth (take it out, see what happens?).
The way I get around this latency in pynetworktables is we provide “auto update values” and ntproperty, which just create a placeholder object that you can read from atomically without taking a lock. The object gets updated by a subscriber which will update the value anytime an update is received from the network – so it’s only changing when the value actually changes.
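For anyone not using Python, the same idea can be sketched in Java. This is not how pynetworktables is implemented, just an illustration of the pattern: `AutoUpdateValue` is a hypothetical class, and the `Consumer` it hands out stands in for whatever listener mechanism delivers updates from the network.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical holder: written by an update callback, read without taking a lock. */
class AutoUpdateValue<T> {
    private final AtomicReference<T> current;

    AutoUpdateValue(T defaultValue) {
        current = new AtomicReference<>(defaultValue);
    }

    /** Lock-free read for the control loop: returns the most recently stored value. */
    T get() {
        return current.get();
    }

    /** Callback to register with whatever delivers table updates (a stand-in here). */
    Consumer<T> listener() {
        return current::set;
    }
}
```

The loop only ever calls `get()`, so a robot with 50 tunable constants does 50 plain atomic reads per iteration, and the write side runs only when a value actually changes.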
Not true for Java… if you look at the Java code most of the methods are marked as ‘synchronized’. The DriverStation thread task calls getData() which goes into HAL a bunch so it’s probably not particularly fast (JNI overhead) – and it’s also marked as synchronized, so they’ll all block each other in various ways.
I didn’t realize that C++ did this differently though, very interesting.
Yeah, Java does it differently. I have some patches in Gerrit that bring Java and C++ closer together, but they are not going to be merged until after the season. They also use ReaderWriterLocks and double buffering in order to reduce memory usage and increase speed. Also, Java’s GetData doesn’t get the status data anyway, so those values are not cached and are fetched individually.
Also, it’s not HAL or JNI overhead. For some reason, any call into the FRC_NetworkCommunication library takes 0.2-0.3 ms. It doesn’t matter which language; just entering the library takes that time. The DriverStation robot mode check methods enter that library every single time they are called, so every single call takes 0.2-0.3 ms. If your Teleop loop calls DriverStation.IsEnabled 300 times in a for loop, that entire sequence will take about 60 ms to run.
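The arithmetic is 300 calls × 0.2 ms ≈ 60 ms, versus roughly 0.2-0.3 ms if the check is made once per iteration and the result reused. A minimal sketch of that workaround, assuming the mode does not need to be re-read mid-iteration: `PerLoopFlag` is a hypothetical wrapper, and the `BooleanSupplier` stands in for the real Driver Station check.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper: samples a slow boolean check once per loop iteration. */
class PerLoopFlag {
    private final BooleanSupplier slowCheck;
    private boolean cached; // false until the first refresh()

    PerLoopFlag(BooleanSupplier slowCheck) {
        this.slowCheck = slowCheck;
    }

    /** Call once at the top of each iteration; this is the only slow call. */
    void refresh() {
        cached = slowCheck.getAsBoolean();
    }

    /** Cheap read of the value sampled by the last refresh(). */
    boolean get() {
        return cached;
    }
}
```

The trade-off is that the cached value can be up to one loop iteration stale, which is usually acceptable since the mode only changes when a new control packet arrives.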