Huge memory usage & OOM crashes

For a while, we’ve been seeing occasional crashes from out-of-memory errors. Previously they were rare and no big deal. However, in the past 24 hours, it’s gone from crashing after 30 minutes, to crashing after 5, to now crashing less than 5 seconds after the robot code starts.

Here’s a link to our code.

In order to help debug, I looked at all our periodically running functions. All of them are either logging or SmartDashboard calls. Removing them changes nothing. No heap objects (new) are created in any periodic functions.

Checking with top, the RoboRIO’s free memory steadily drops from ~30MB, decreasing by about 4MB every 2-3 seconds, until it reaches 0 and the code crashes. Per another thread, we disabled the web server, to no avail.
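For anyone who wants to log the same number that top shows from inside the robot program, here is a minimal sketch that reads /proc/meminfo. The class name MemInfo and its methods are hypothetical helpers, not part of WPILib, and this assumes a Linux target such as the roboRIO where /proc/meminfo exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper (not a WPILib class): reads fields from /proc/meminfo. */
public final class MemInfo {
  private MemInfo() {}

  /** Returns the value for a key such as "MemFree" (in kB), or -1 if the key is absent. */
  public static long parseKb(List<String> lines, String key) {
    String prefix = key + ":";
    for (String line : lines) {
      if (line.startsWith(prefix)) {
        // Lines look like "MemFree:   29876 kB"; the first token is the number.
        String[] parts = line.substring(prefix.length()).trim().split("\\s+");
        return Long.parseLong(parts[0]);
      }
    }
    return -1;
  }

  /** Reads the live value from the running system (Linux only). */
  public static long readKb(String key) throws IOException {
    return parseKb(Files.readAllLines(Path.of("/proc/meminfo")), key);
  }
}
```

Calling MemInfo.readKb("MemFree") once a second or so and logging the result would give a trend line without needing an SSH session, though it only shows that memory is shrinking, not which process is responsible.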

We also tried to run VisualVM on it, but were unsuccessful as it crashed before we could connect. Instead, I tried running VisualVM on a simulation, and what I found is that HashMap$Node is being allocated at a crazy high rate.

It climbs from a few KB on startup to lots and lots of megabytes as the program ages. Occasionally GC will clear it but that takes a while.
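One way to check whether growth like this is inside the JVM heap at all, as opposed to native memory that VisualVM's heap view would not show, is to log the heap numbers the JVM itself reports. This is a rough sketch using only the standard Runtime API; the class name HeapUsage is a hypothetical helper.

```java
/** Hypothetical helper (not a WPILib class): reports JVM heap usage. */
public final class HeapUsage {
  private HeapUsage() {}

  /** Bytes of JVM heap currently in use (committed heap minus free heap). */
  public static long usedBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /** One-line summary suitable for a log message. */
  public static String summary() {
    Runtime rt = Runtime.getRuntime();
    long mib = 1024L * 1024L;
    return String.format(
        "heap used=%d MiB, committed=%d MiB, max=%d MiB",
        usedBytes() / mib, rt.totalMemory() / mib, rt.maxMemory() / mib);
  }
}
```

If the used-heap figure stays small while the system's free memory keeps falling, the leak is probably outside the Java heap.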

We never use HashMap so it’s likely a vendor library.

Help would be appreciated. Thanks

Can you click on the + next to the HashMap to drill down to where it’s being called from?

Just curious. Do you have a Rio 1 or Rio 2? We had a memory issue with the Rio 1 crashing continuously and updated to the Rio 2 and have had zero problems since.

Didn’t realize that existed, thanks.

Everything seems to be coming from the pathfinding from PathPlanner. I’ll take a closer look.

We have a Rio 1.

We had a similar issue and had to update to the Rio 2. Everything is working great now. This issue didn’t start happening until this year. I don’t know exactly what changed, but our memory would slowly get eaten up until a crash occurred.

OK, so I removed pathfinding from the project temporarily. Nothing’s using up more than 2MB, and it’s still crashing.

It’s still crashing in 5 seconds? Even after a full Rio reboot?

Just simulated this and was about to tell you, something very weird is going on with your pathplanner :)

Yes. The only thing that seems to be using up any appreciable amount of memory in the simulation is some odometry stuff, but nothing huge.

What coprocessors do you have sending data? On a phone so limited in code review ability right now.

Here’s a recent error log in case it’s useful

hs_err_pid8355.txt (61.7 KB)

An Orange Pi and a Limelight; however, the Orange Pi was unplugged for the past few hours. Removing the Limelight does nothing.

Plenty of available memory, might be memory fragmentation? There’s a sysctl you could try setting: Memory Leaks - #11 by truher

Another more major step would be to switch to an alternative lvrt process to free up more memory. I think there’s a link in that thread to one.

That appears to have fixed it! Thanks.

Thank you Mr. Johnson for solving our problems. You are greatly appreciated. W man

FWIW this should probably be on an frc-docs page outlining how to deal with memory issues, I can probably write this sometime soon

I will have to try this tomorrow. I believe we are experiencing similar issues, although it’s tangled up with loop overrun issues at the moment.

What was your fix? Serial gc?

EDIT: I misunderstood… Was it the sysctls?

The default this year is serial GC. I think it was the sysctls? But clarity on which one(s) helped would be helpful so we can document it (and even do a patch release which does it for teams for this year, and fix it in the image next year).
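To confirm which collector a given deployment is actually running, the JVM can be asked directly through the standard management beans. A small sketch follows; the class name GcReport is hypothetical, and it assumes the java.management module is included in the runtime on the robot. On HotSpot JVMs the serial collector typically shows up under the names "Copy" and "MarkSweepCompact".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a WPILib class): lists the active garbage collectors. */
public final class GcReport {
  private GcReport() {}

  /** Returns the names of the collectors the running JVM reports. */
  public static List<String> collectorNames() {
    List<String> names = new ArrayList<>();
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      names.add(gc.getName());
    }
    return names;
  }
}
```

Printing GcReport.collectorNames() once at startup is enough to rule the GC choice in or out when comparing reports between teams.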