We are experiencing a lag issue in our code


#1

our teleop is taking to long around 23 ms. I have been through the code and i can not find any issues was hoping someone would see something we are not.


#2

We found this year that using SmartDashboard.put* in the periodic() method of a subsystem greatly increased the scheduler loop’s run time.

You could try commenting those out and running a simple timing function on Scheduler.getInstance().run() to see if it had any effect on loop times.


#3

We attached visualvm, a free profiler, to the running roborio code to see where it’s spending its time.
Interestingly, updates of the LiveWindow data showed up quite high.

Drilling down further, LiveWindow.updateValues() ends up querying for example our Solenoids, which was surprisingly slow and unfortunate because we never actually look at the LiveWindow data. The fix there was to call LiveWindow.disableAllTelemetry() to suppress it.
We also throttled some smart dashboard updates, only sending data every Nth call to the periodic methods, but don’t want to suppress them because the data is actually useful to have on the dashboard.

My main point: Try to measure where your code is really spending its time, for example via visualvm.


#4

I am concerned about LiveWindow on our robot; we appear to be having trouble with NT updates. But we are not using Test mode and thus LiveWindow should not be active. See https://wpilib.screenstepslive.com/s/currentCS/m/smartdashboard/l/255405-enabling-test-mode-livewindow

Do you (or anyone else) know why LiveWindow would be running when Test mode is not enabled?


#5

LiveWindow telemetry is always active unless you disable it (e.g. with LiveWindow.disableAllTelemetry()). Test mode only enables NT changes changing things like motor values.


#6

It looks like you’ve managed to hit this issue (hence confirming that more than one team is definitely running into this):

which I suspect is related to the performance issues raised by CTRE: CTRE Phoenix 5.14.1 performance improvements?

It would be great if you could show us more detailed timing. It’d be interesting to see what parts of the Solenoid code actually end up being slow, and how long it takes on average. (I suspect most of the time is spent in HAL.getSolenoid?)


#7

Umm, should that not be announced loudly? Certainly, it is not obvious from the documentation. That can be quite a load on the Rio and on the network. We have been experiencing what looks like overloading of NetworkTables, and this is now my #1 suspect.


#8

No, NetworkTables does not consume much network bandwidth or CPU. A few hundred bytes at a 10 Hz rate is not much data. In terms of CPU load, the NetworkTable set() calls are non-blocking; it’s basically grabbing a lock (that doesn’t have much contention on it) and setting a value in a hash table, this takes a dozen microseconds at most per value, so even a hundred values would be less than 1 ms.

However, if you have a lot of CAN devices, those CAN calls are indeed turning out to be much higher overhead than we had anticipated, and indeed don’t need to be polled every loop (since the data is actually sent at a much slower rate).


#9

I believe we have around 12 CAN devices (lots of motors), so yes that could be a problem.


#10

In our case LiveWindow.updateValues() was mostly stuck in getting the solenoid states via SolenoidJNI.getSolenoid(). The Java profiler doesn’t show detail below the JNI call, but I assume it’s CAN bus access.
Recent CTR blog message suggest that the CTR lib optimizes CAN bus access, i.e. only the first call out to a CAN bus device set fetches data, and other calls within the same update period get cached data. But without the LiveWindow we would have no solenoid access at all unless somebody pushes buttons which actuate them, so that would be even better.
Looking at the LiveWindow code I was thinking that it’s only active in test mode. Took a while to notice that it’s actually always enabled unless you specifically disable it (and then it would only run in test mode).


#11

For what it’s worth, being able to run visualvm has been extremely helpful to us, but it’s a little hard to enable it because you need to add options like these to the java command that starts the robot:

-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=172.22.11.2

That’s doable for somebody familiar with Linux: ssh to the robot, use vi to edit the robotCommand. But then you need to re-do that each time you deploy code to the robot, which restores the robotCommand.
Would be great to add that to the gradle setup, but no idea how to do that.


#12

you can put those args in the jvmArgs array in the frcJavaArtifact section of your build.gradle to accomplish that.


#13

Just tonight I found what seemed like an major delaying factor in Java:
Improbable Java slow down in combining doubles and strings

This seems to explain the problems we’ve been seeing, and I could see it causing ripple problems throughout the code stack if my analysis is correct.


#14

Did you have to do anything special to get visualvm to read the data from the RIO? I had no luck (but I am far from a Java expert) when I tried.


#15

Our code doesn’t seem to have a jvmArgs option in the default-generated frcJavaArtifact section.
Can you provide an example or point to a link that shows how to add JVM arguments properly to the build.gradle file? Would it just look like this?

        frcJavaArtifact('frcJava') {
            targets << "roborio"
            // Debug can be overridden by command line, for use with VSCode
            debug = frc.getDebugOrDefault(false)
            jvmArgs = [
                "-Dcom.sun.management.jmxremote=true", 
                "-Dcom.sun.management.jmxremote.port=1099",
                "-Dcom.sun.management.jmxremote.local.only=false",
                "-Dcom.sun.management.jmxremote.ssl=false",
                "-Dcom.sun.management.jmxremote.authenticate=false",
                "-Djava.rmi.server.hostname=172.22.11.2"
            ]
        }

#16

Here’s what I did:

Get visualvm from https://visualvm.github.io

When you launch it, you’ll see locally running Java programs.
The hard part is instrumenting the roborio so that you can connect to the java instance running inside the robot.

For now we do that by connecting to the robot and editing the file over there that starts the robot code.
We installed ‘git’ support, and when you open the “git bash” shell, you can connect to the robot via

# Use 10.TE.AM.2 when using radio, otherwise the 'USB' IP address
ssh lvuser@172.22.11.2

If ssh issues “WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!” because you had used it with a different RoboRIO:

# Remove old SSH info, then try to connect again
rm .ssh/known_hosts
ssh lvuser@172.22.11.2

There is no password required.

Edit the file robotCommand . By default it will look like this:

/usr/local/frc/JRE/bin/java -Djava.library.path=/usr/local/frc/third-party/lib   -jar "/home/lvuser/NameOfYourProject.jar"

Add these settings:

-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=172.22.11.2 

The settings need to be added after the .../bin/java , and use the correct IP for the hostname setting. To do that with the vi editor:

vi robotCommand
# Move cursor to the end of .../bin/java
# Press 'i' for insert
# Paste the text for the added settings
# Press `ESC` to exit insert mode
# Type ':wq' and press 'Enter' to write and quit

Stop the running Java instance:

/usr/local/frc/bin/frcKillRobot.sh

After a few seconds you should see that a new one is automatically started, this time using the additional JMX settings:

ps | grep java

Back on the laptop, start visualvm and connect to the program running on the robot.

  • File, Add JMX Connection
  • ‘Connection:’ 172.22.11.2:1099 (respectively 10.TE.AM.2:1099)
  • Check ‘Do not require SSL connection’
  • A new entry with a ‘pid’ should appear under the ‘Remote’ list. Double-click, then check ‘Monitor’, ‘Sample… CPU’ etc.

As mentioned in the discussion above, it should be possible to add those -Dcom.sun.m... settings to the build.gradle file, and then you simply deploy your code as usual, no need to ssh into the robot and edit files over there after each deployment.


#17

Just tried the build.gradle route. Adding this worked for me:

 artifacts {
        frcJavaArtifact('frcJava') {
            targets << "roborio"
            // Debug can be overridden by command line, for use with VSCode
            debug = frc.getDebugOrDefault(false)
            // ADDED SECTION TO ENABLE visualvm
            jvmArgs = [ "-Dcom.sun.management.jmxremote=true",
                        "-Dcom.sun.management.jmxremote.port=1099",
                        "-Dcom.sun.management.jmxremote.local.only=false",
                        "-Dcom.sun.management.jmxremote.ssl=false",
                        "-Dcom.sun.management.jmxremote.authenticate=false",
                        "-Djava.rmi.server.hostname=172.22.11.2"
                     ]
             // END OF ADDED SECTION
        }

Now simply ‘deploy’, then on the laptop start visualvm, File, Add JMX Connection, ‘Connection:’ 172.22.11.2:1099, …


#18

Ideally, since gradle is uploading to the roborio and thus knows its IP adddess, it should be able to automagically place that IP address in the "-Djava.rmi.server.hostname=<The IP Address>", but no idea how to do that.


#19


this is our cpu usage. not sure what waitfornotifieralarm is but it seems to be using up the entire cpu.


#20

I would call that a false positive. That function is waiting for the next time step in TimedRobot. So it would make sense that most of the time is being spent there (e.g. if your code spends 10 ms doing something, and the loop time is 50 ms, it will spend 40 ms in that function).