Teams, what is your average Rio CPU utilization?

This is half a help post, half a “what’s normal” post.

With all the complex stuff being done on the Rio in modern WPILib, I’m looking to get an idea of what CPU utilization is common. For context, we often find ourselves at 95-100% CPU (on a Rio 1) for most of the runtime. We’re using several PID controllers and SwerveDrivePoseEstimator, but the actual AprilTag detection and pose estimation are done offboard. Profiling hasn’t uncovered any single major contributor.

Is high CPU utilization just something we should expect with the Rio 1? What kind of numbers are other teams running at?

Just to confirm data points, this is with the latest WPILib with the NT performance fix?

Yes. Anecdotally, 2023.4.3 helped a little with loop overruns, but it didn’t seem to lower CPU usage significantly.

I’ve heard that reimaging the Rio can help with CPU usage. No idea why that would be.

Also we are pushing maybe 100 topics from user code.

We’re counting CPU usage from the DS log/diagnostics panel.

I don’t remember exactly, but I think we’re also quite high, maybe around 80-90%. We’re also doing path generation on the fly and quite a lot of logging with AdvantageKit.

We do OTF path generation too, but we see this even while disabled, between code start and the first time we’d call the generator.

Have you found there to be significant effects on loop runtimes?

60-80ish

If teams with very high cpu usage can share links to their code, that would also be helpful to enable further investigation in the off-season. It can be hard to reproduce directly without a full robot because of CAN effects (eg every received CAN message takes some amount of CPU to process, and also CAN device not present errors swamp other things), but it might help identify common things to look at.

Here’s ours: GitHub - frc6995/Robot-2023 at lvr. We’re getting loop overruns faster than I can read them, mixed with CAN errors of timed out frames (presumably from dropped outgoing spark max config messages)

Here’s our code too: GitHub - pittsfordrobotics/ChargedUp2023: FRC 3181's code for the 2023 season

We were able to clean up the loop overruns at one point, but I think they might be back again. I was also thinking about trying to optimize some status frames for SparkMaxes since we also do drop CAN messages occasionally. The code also isn’t fully merged in to master, but will likely be in the next few days.

One idea, although it would be a painful finding: try JDK 11 instead of 17? It might not work depending on what libraries you’re using though. You have to remove /usr/local/frc/bin/java (or reimage) and change build.gradle per GitHub - wpilibsuite/GradleRIO: The official gradle plugin for the FIRST Robotics Competition

Why would that be so painful?

It means we would potentially want to stick with 11 indefinitely, and thus teams wouldn’t benefit from newer Java features. Or, glass half full, we just need to do a better tuning job.

We have not profiled our code, but our loops commonly take 22ms with WPILib 2023.4.3 on Rio 1

FRC2495/FRC2495-2023: Java Code for CHARGED UP (github.com)

How do we get started with profiling? Thanks.

We recently opened an issue to add this to the docs, but here’s a team page on how to set up VisualVM: GitHub - team2393/FRC2023: 2023 Robot
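Short of a full VisualVM session, a rough first pass is to time named chunks of the periodic loop by hand. The sketch below is only an illustration: SectionTimer is a hypothetical helper, not part of WPILib, and it measures wall-clock time, so it will also count time the thread spent preempted or paused by the GC.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of WPILib: accumulates wall-clock time per named section. */
final class SectionTimer {
    // Section name -> {total elapsed nanoseconds, number of calls}, in first-seen order.
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    /** Runs the section and adds its elapsed time to that section's running total. */
    void time(String name, Runnable section) {
        long start = System.nanoTime();
        try {
            section.run();
        } finally {
            long[] entry = stats.computeIfAbsent(name, key -> new long[2]);
            entry[0] += System.nanoTime() - start;
            entry[1]++;
        }
    }

    /** Number of times the named section has run (0 if never seen). */
    long calls(String name) {
        long[] entry = stats.get(name);
        return entry == null ? 0 : entry[1];
    }

    /** Mean time per call in milliseconds (0.0 if never seen). */
    double averageMillis(String name) {
        long[] entry = stats.get(name);
        return entry == null || entry[1] == 0 ? 0.0 : entry[0] / 1e6 / entry[1];
    }

    /** One line per section, meant to be printed every few seconds rather than every loop. */
    String report() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            out.append(String.format("%s: %.3f ms avg over %d calls%n",
                    e.getKey(), averageMillis(e.getKey()), e.getValue()[1]));
        }
        return out.toString();
    }
}
```

In a robot project, each chunk of robotPeriodic() could be wrapped along the lines of timer.time("odometry", () -> drive.updateOdometry()), where drive and updateOdometry are stand-ins for your own code.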

I think that changing the line in build.gradle (without removing the JRE from the RIO) should be enough, but I haven’t tested this.

We’ve been having some pretty bad loop overrun issues all season, so we’re going to give this a shot. Question about the instruction to

Set gcType = 'CMS' in the java deploy block to use the CMS garbage collector

I’m getting an error message

  • What went wrong:
    A problem occurred evaluating root project ‘Robot-2023-Charged-Up’.
    Could not set unknown property ‘gcType’ for extension ‘deploy’ of type edu.wpi.first.deployutils.deploy.DeployExtension.

when I add the line above to the deploy {} block in build.gradle. Am I missing something?

The docs are wrong, and need to be fixed. It needs to go in the FRCJavaArtifact block.

uhh

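        // TODO: Remove this when we're done optimizing our code
        // Hack: uses reflection (java.lang.reflect.Field) to raise the private loop watchdog's
        // timeout to 1 second, which hides the overrun warnings rather than fixing them.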
        try {
            Field watchDog = IterativeRobotBase.class.getDeclaredField("m_watchdog");
            watchDog.setAccessible(true);
            Watchdog actualWatchDog = (Watchdog) watchDog.get(this);
            actualWatchDog.setTimeout(1);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new RuntimeException(e);
        }

On a more serious note, I’ve found that the default garbage collector settings aren’t ideal for the amount of garbage my team generates. Specifically, the -XX:MaxGCPauseMillis=1 setting appears to be too low for us, and we saw more consistent loop times after increasing the max pause time by a few ms. I suspect that leaving this at the default 1ms keeps the GC from collecting fast enough, which eventually leads to a full collection (because too many objects get promoted to the old generation).

On that note, there doesn’t seem to be a nice way for us to configure this flag, because it’s added automatically and I haven’t found a way to remove it. Would it be possible to add some sort of escape hatch so we could easily customize the flags ourselves?
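One way to check a suspicion like the full-collection theory above is to read the JVM's own GC counters through the standard java.lang.management API and log them periodically. This is a minimal sketch: GcStats is a hypothetical class name, the collector names reported depend on the JVM and the GC in use, and it assumes the java.management module is available in the JRE on the Rio.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: snapshots of the JVM's cumulative garbage collection counters. */
final class GcStats {
    /** Total collections across all collectors; a collector reporting -1 (undefined) adds 0. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate total time spent collecting, in milliseconds, across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** One line per collector with its raw cumulative count and time. */
    static String summary() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(String.format("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return out.toString();
    }
}
```

Printing GcStats.summary() every few seconds and comparing it before and after a flag change should show whether old-generation collections are actually happening, and roughly how much time they cost.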

You can set gcType() to ‘other’ and then you can pass in your own args to jvmArgs.
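After changing the GC settings, it may be worth confirming from robot code which flags the JVM was actually started with. The sketch below uses the standard RuntimeMXBean for that; JvmFlags is a hypothetical class name, and it again assumes java.management is present in the Rio's JRE.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: inspects the arguments the running JVM was launched with. */
final class JvmFlags {
    /** All JVM input arguments (e.g. -XX:... and -Xmx...), excluding program arguments. */
    static List<String> all() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** The subset of the given arguments that start with the given prefix. */
    static List<String> matching(List<String> args, String prefix) {
        return args.stream()
                .filter(arg -> arg.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

For example, printing JvmFlags.matching(JvmFlags.all(), "-XX:") once at startup would show whether the MaxGCPauseMillis value and collector choice match what build.gradle was meant to pass.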
