Unusually high memory usage by WPILib

We noticed this towards the end of the beta testing program, but it would appear that WPILib is using a very large amount of memory for seemingly nothing. Normally, I wouldn’t be too worried about memory usage, but it frequently interferes with our deploys (using GradleRIO, it gets in a loop where code cannot start because there isn’t enough free memory, requiring a RIO reboot). It also leaves very little room in memory for third party libraries.

To verify that the issue was actually WPILib, we created an empty project consisting of a single, empty robot class. No third party libraries were included. We deployed this code and checked memory usage using the ‘top’ command over SSH and saw 89% (around 25 MB free) system memory usage from the user program. These tests were run on the latest firmware (2018_v16) and WPILib release (2018.1.1).

For reference, a fully loaded robot program on our 2016 robot (with 2017 firmware and WPLib, with third party libs) leaves around 89 MB free.

1 Like

Bumping for visibility since this post took so long to get approved. We’re still experiencing this issue.

1 Like

Note that the “free” reported by top does not represent the amount of available memory (as Linux will use up memory for caches). Similarly, the VSZ reported by top does not represent the amount of memory a process is actually using (the resident set size). Conceivably this could be an issue with GradleRIO not waiting for the previous robot program to die before the new one is started; generally there’s only enough memory on the rio to run one robot program at a time.

The output of the following commands would be helpful to see if you actually are seeing unusual memory usage:

cat /proc/meminfo
cat /proc/PID/status

where PID=the PID of the FRCUserProgram as shown in top.

Just got to the shop and ran some tests. Only 84% used by robot program this time, leaving 52-54 MB free as reported by both top and the Driver Station.


$ cat /proc/meminfo

MemTotal:         250280 kB
MemFree:           54264 kB
MemAvailable:     129244 kB
Buffers:               0 kB
Cached:            91404 kB
SwapCached:            0 kB
Active:            87628 kB
Inactive:          35012 kB
Active(anon):      47052 kB
Inactive(anon):      656 kB
Active(file):      40576 kB
Inactive(file):    34356 kB
Unevictable:       37516 kB
Mlocked:           37516 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         250280 kB
LowFree:           54264 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         68760 kB
Mapped:            50548 kB
Shmem:               864 kB
Slab:              18168 kB
SReclaimable:       8532 kB
SUnreclaim:         9636 kB
KernelStack:        2192 kB
PageTables:         1712 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      215240 kB
Committed_AS:     173980 kB
VmallocTotal:     770048 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB


$ cat /proc/<PID>/status

Name:	java
State:	S (sleeping)
Tgid:	3425
Ngid:	0
Pid:	3425
PPid:	3416
TracerPid:	0
Uid:	500	500	500	500
Gid:	500	500	500	500
FDSize:	256
Groups:	5 46 498 500 
VmPeak:	  211072 kB
VmSize:	  211072 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	   32440 kB
VmRSS:	   32040 kB
RssAnon:	   15872 kB
RssFile:	   16124 kB
RssShmem:	      44 kB
VmData:	   47336 kB
VmStk:	     136 kB
VmExe:	       4 kB
VmLib:	   25440 kB
VmPTE:	      74 kB
VmPMD:	       0 kB
VmSwap:	       0 kB
Threads:	17
SigQ:	0/1947
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000002000
SigCgt:	0000000181005ccf
CapInh:	0000000000000000
CapPrm:	0000000000800000
CapEff:	0000000000800000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000
Cpus_allowed:	3
Cpus_allowed_list:	0-1
Mems_allowed:	1
Mems_allowed_list:	0
voluntary_ctxt_switches:	3
nonvoluntary_ctxt_switches:	59

Hopefully this is helpful. Most of this is over my head!

1 Like

Just to provide some clarification, we usually only start having issues after several code redeploys. We use GradleRIO for our deploys (guess I should try eclipse to make sure that’s not the problem). Once the issue starts, it doesn’t stop until we reboot the RIO. Below is one of the errors we’ve seen (it happens repeatedly until the RIO is rebooted).

thread -1443068816 also had an error]  OpenJDK Client VM warning: INFO: os::commit_memory(0xa9f79000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)  # /tmp/hs_err_pid4603.log  # An error report file with more information is saved as:  # Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.  # There is insufficient memory for the Java Runtime Environment to continue.  #
java.lang.OutOfMemoryError: unable to create new native thread 

This error is followed by a stack trace of a ThreadPoolExecutor, saying it can’t create a native thread (we do use executors in our robot code, however since the error says it can’t create the thread, I don’t think we’re leaking any memory). I’m not posting the trace since it doesn’t contain any WPILib or robot code calls, only ThreadPoolExecutor methods responsible for creating threads. We’ve also had a variant of this error happen on the empty, no external lib project, but without a stacktrace.

Another error we see a lot is this one

Error: failed /usr/local/frc/JRE/lib/aarch32/client/libjvm.so, because /usr/local/frc/JRE/lib/aarch32/client/libjvm.so: cannot map zero-fill pages 

This error doesn’t crash robot code, but prints out repeatedly in the console. This also happens after code redeploys, and can be fixed by a reboot followed by a deploy. This error is by far the most frustrating, and happens the most frequently.

It almost seems like the JRE is leaking memory across redeploys, but that’s just a guess.

My posts seem to be taking a long time to get approved, so I have no idea when this will go live, but hopefully this info is helpful.

1 Like

That looks pretty normal. There’s 129M memory available (MemAvailable in /proc/meminfo), and the resident set size (VmRSS in /proc/PID/status) of the robot program is 32M. The DS is reporting MemFree rather than MemAvailable.

It sounds like GradleRIO may not be properly killing the robot program? Memory is pretty tight overall so it’s important for deployment that the old robot program completely die before the new robot program is started. Have you had any issues when deploying with Eclipse?

The part of GradleRIO that kills the current robot task appears to be here - this might help with debugging.

The Eclipse plugin is doing something very similar.

GradleRIO has called sync before the script to kill robot code at least as far back as 2017, and we had no issues with it.

As best we can tell this started happening between FRC 2018 Beta 3 and FRC 2018 Beta 5. We’re planning to replace the Zulu JRE with an Oracle one to see if that helps anything.

Which GradleRIO version are you using? Since one of the near-kickoff releases, we’ve switched to a new version of the Zulu JRE that decreases memory usage pretty significantly. Keep in mind GradleRIO doesn’t automatically delete old JRE’s, so if you SSH in and run rm -rf /usr/local/frc/JRE/ as the admin user it should bypass GradleRIO’s cache check and redeploy the JRE.

2018.01.15 is the latest GradleRIO release.

GradleRIO calls sync to make sure the files on the system are committed to flash memory, to avoid corrupted files when rebooted suddenly. GradleRIO has done this since 2016.

If the issue represents itself, please let me know ASAP.

We are using GradleRIO 2018.01.15, we try to stay up to date with the latest release.

I’ll try redeploying the JRE today, but we did reimage the roboRIO with the kickoff release, so there shouldn’t be an old JRE.

Perhaps a gradle command to force the redeploy (eg execute that command) wouldn’t be a bad idea.

I only got to do a bit of testing on this today, I tried removing the current JRE and redeploying with GradleRIO but the JRE deploy did not seem to work.

Pic of what I did:

We’re going to reimage the RIO tomorrow and continue testing with Zulu and maybe a different JRE.

School was canceled for us today, so I was able to spend a long time trying out various things.

I started by completely formatting our RIO using NI MAX. Next, I flashed the latest image (2018_v16). My first test was to use the old java installer to deploy the Oracle JRE onto the RIO. After loading our full robot code, we saw absolutely no memory issues whatsoever. Running free -m reported an average of 109 MB free after accounting for Linux caches. I was able to re-deploy 25 times in a row without a single issue.

Next, I once again fully reformatted the RIO and this time allowed GradleRIO to deploy the Zulu JRE. After deploying code just 3 times, I was greeted with the same looping “out of memory” errors.

 OpenJDK Client VM warning: INFO: os::commit_memory(0xacb9a000, 131072, 0) failed; error='Cannot allocate memory' (errno=12) 
 # 
 # There is insufficient memory for the Java Runtime Environment to continue. 
 # Native memory allocation (mmap) failed to map 131072 bytes for committing reserved memory. 
 # An error report file with more information is saved as: 
 # /tmp/hs_err_pid5103.log

Once again, the “error report” was empty.

For now, the solution for us will be to use the Oracle JRE. However, the simplicity of having the JRE deployed for us is very convenient, and it would be great if we could figure out what is causing these issues.

1 Like

I can’t seem to recreate this. Have you run at least one ./gradlew build while connected to the internet? Try doing ./gradlew deploy -Pdeploy-dry while connected to the internet, and then a ./gradlew deploy if the problem persists.

I had done a build while connected to the internet prior to that screenshot, but I’m also kind of dumb so I may have screwed up somewhere. I’ll get one of the programming students who actually know what they’re doing to test it again today. :o

We’ve been snowed out of our shop for several days so testing has been limited, but we were able to reproduce the memory issues with Zulu. Cameron has a post that’s waiting for approval describing his testing.

The Zulu JRE does take more memory at startup than the Oracle one, in part because it’s a full JRE instead of a compact2 JRE. The 2018 Beta 3 release (available at https://github.com/wpilibsuite/zulu-jre-ipk/releases) was a compact2 JRE, so it did have lower memory usage, although I think that one might not support remote debugging (Zulu provided another one that does but we never packaged a release for it). We reverted to the “full” JRE for the final release because we worked with NI to decrease memory usage elsewhere in the system (https://github.com/wpilibsuite/EclipsePlugins/pull/154).

There seem to be very few teams running into this issue, which still makes me think it’s GradleRIO related.

Are you guys still running Kotlin? Do you use multiple Gradle projects with a single parent build file?

After updating to 2018.1.15, make sure you use the cleanEclipse task and --refresh-dependencies flag.

Just trying to think of the nuances I’ve run into this season.

Peter, I’m still pretty sure it’s us screwing up somewhere, just not sure where yet. The only functional differences I could discern between GradleRIO’s deploy and the EclipsePlugin version is when sync is called and having calls to both the C++ and Java kill scripts.

GradleRIO:

ctx.execute("sync")
ctx.execute(". /etc/profile.d/natinst-path.sh; /usr/local/frc/bin/frcKillRobot.sh -t -r 2> /dev/null")

EclipsePlugin:

command="${deploy.kill.command}; sync" />

Which for Java works out to:

/etc/profile.d/natinst-path.sh; sync

Cam has a post that should go live whenever a mod gets around to it, he wiped a different one of our roboRIOs, loaded the stock 2018 image onto it, deployed the Zulu JRE, and had the same memory issues. Replacing the Zulu JRE with an Oracle JRE resulted in no memory issues after 20+ deploys. I don’t know all of the details on this as I wasn’t around for testing.

Jesse, we are running Kotlin for everything, and yes we’re using multiple project. That’s a good point about cleanEclipse.

Currently we’re planning to just use the Oracle JRE, but will continue testing.

Have you tried deploying using Eclipse instead of GradleRIO from a freshly imaged RoboRIO?

One major difference is I think GradleRIO runs those commands as admin rather than lvuser.

I also don’t know if GradleRIO disables the StartupDLLs when deploying the JRE. Specifically, I’m talking about the sed command to comment out the StartupDLLs shown here: Reduce lvrt RAM usage by commenting out StartupDLLs. by PeterJohnson · Pull Request #154 · wpilibsuite/EclipsePlugins · GitHub. You can of course check this on your RoboRIO by looking at /etc/natinst/share/ni-rt.ini.

I was actually just about to post about that, it looks like GradleRIO isn’t using the sed command, as best I can tell it should go here.

I’m not sure if Cam has done testing with Eclipse yet, will double check, and will check that ini when I get a chance.