No robot code after running for a while (Java VM ran out of memory)

During practice, our drivers complained that after running the robot for a while, it would disconnect and show “No Robot Code” until we rebooted the RoboRIO. “Restart Code” didn’t help; only a full RoboRIO reboot would recover it. I finally looked into the problem and found that the Java VM had run out of memory.
This is the first year our team switched over to Java; we used C++ in previous years. We have a C++ library that we ported over to Java, and one of its modules is Vision Targeting. I suspect this module is “leaking memory”. In our C++ implementation, we allocated memory for processing each video frame and freed it when we were done with the frame. But in Java there is no “free memory” call; Java relies on the garbage collector to reclaim memory. I suspect the reclaim is not immediate and depends on whether the Java VM can detect that the memory is no longer in use. So if we are processing video frames at some frame rate, we are losing memory on video frames at that same rate, which could become substantial after a while. How would one deal with this kind of issue? Sorry if this is a naïve question; we are new to Java.

When the Java VM detects that it is running low on memory, the garbage collector should try as hard as it can to destroy unused objects before the VM actually runs out. You should make sure you are not somehow retaining a reference to old objects, which would prevent them from being garbage collected. It may also be that you are allocating memory faster than the garbage collector can keep up with, but I’m not sure if that is possible.
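As a hypothetical example of accidental retention (not the OP’s code), a pattern like this keeps every frame reachable forever even though only the latest one is needed:

```java
import java.util.ArrayList;
import java.util.List;

class TargetHistory {
    // BUG: every processed frame is appended and never removed, so the list
    // holds a live reference to each one and the GC can never reclaim them.
    private final List<byte[]> frames = new ArrayList<>();

    void record(byte[] frame) {
        frames.add(frame); // grows without bound
    }

    // Fix: keep only what you need, e.g. a single field for the latest frame,
    // so older frames become unreachable and collectible.
}
```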

Another recommendation I have is to reuse objects as much as possible between frames. Memory allocation is a fairly expensive operation, and in Java, since you cannot explicitly destroy objects, constructing large objects in a fast loop forces the garbage collector to work very hard to destroy them, which can slow down your program.
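To make the reuse idea concrete, here is a minimal sketch (the Camera interface and buffer format are hypothetical placeholders for whatever your vision library provides):

```java
/**
 * Minimal sketch of buffer reuse between frames. The Camera interface and
 * pixel format here are hypothetical stand-ins for your vision library.
 */
interface Camera {
    void captureInto(byte[] dest); // fills the caller's buffer in place
}

class VisionLoop implements Runnable {
    private static final int WIDTH = 640, HEIGHT = 480, BYTES_PER_PIXEL = 3;

    private final Camera camera;
    // Allocated once; every frame is captured and processed in this buffer,
    // so the loop creates no per-frame garbage for the collector to chase.
    private final byte[] frameBuffer = new byte[WIDTH * HEIGHT * BYTES_PER_PIXEL];

    VisionLoop(Camera camera) {
        this.camera = camera;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            camera.captureInto(frameBuffer); // overwrite, don't reallocate
            process(frameBuffer);
        }
    }

    private void process(byte[] frame) {
        // ... vision processing on the shared buffer ...
    }
}
```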

It may be helpful to note that we ran into similar issues, despite not using any vision tracking. Our diagnostics showed that it may have been an issue related to too many DirectByteBuffers being allocated. We couldn’t figure out exactly why this happened, but it may be related to the fact that WPILibJ allocates a DirectByteBuffer and lets the Garbage Collector collect it … every single time that it makes a JNI call. However, I couldn’t seem to fix this by caching them, so I’m not sure if it was the issue.
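For reference, the caching I attempted was roughly along these lines (a sketch with an arbitrary initial size, not WPILibJ’s actual code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Sketch of the DirectByteBuffer caching I attempted. One direct buffer is
 * kept per thread and handed out repeatedly, instead of allocating a fresh
 * one for every JNI call.
 */
final class DirectBufferCache {
    private static final ThreadLocal<ByteBuffer> CACHE =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(1024)
                                                    .order(ByteOrder.nativeOrder()));

    /** Returns a cleared direct buffer of at least {@code size} bytes. */
    static ByteBuffer get(int size) {
        ByteBuffer buf = CACHE.get();
        if (buf.capacity() < size) {
            // Grow once; the old buffer becomes garbage, but this is rare.
            buf = ByteBuffer.allocateDirect(size).order(ByteOrder.nativeOrder());
            CACHE.set(buf);
        }
        buf.clear();
        return buf;
    }
}
```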

It might be helpful to disable your vision tracking code to see if the problem is with vision tracking or a different part of your code.

There is no real way to guarantee that a DirectByteBuffer (DBB) is freed, even if you give the user the option. This matters especially on a Real-Time Operating System, where terminating the process may not necessarily free all of its reserved memory (WPILib’s Deploy Task does kill the process), though you would hope this was tested.

To the OP, you have a few options:

  1. Provide an explicit native method that de-allocates the resources held by the native code. This requires direct action by the caller, but does not require many changes to your C++ code (though that depends on the code itself). The standard library has this option, although it is hidden inside DirectByteBuffer as a Deallocator, which calls the hidden VM class and the hidden Bits class and makes extensive use of Unsafe operations (obviously, since it’s native memory). As a side project I have worked on more robust DirectBuffer allocation/deallocation myself. A sketch of this explicit-deallocation pattern follows the list.

  2. Change your code to be reusable. This is really an extension of option 1: it takes the same idea further and avoids allocating a new Java object at all.

  3. Run a profiler. I have been trying to figure out the best way to do this, and I don’t have access to any kind of control module, but deploying in Debug mode should activate something, from what I have read and seen. Learning how to debug a Java program on Linux might be useful as well; look into using opkg to install some helpful tools. Command-line profiling is always the best.
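Here is a minimal sketch of option 1, explicit native deallocation (all class and method names are hypothetical; your JNI layer would map them onto the C++ library’s allocate/free calls):

```java
/**
 * Sketch of option 1: a Java wrapper that owns a native image buffer and
 * exposes an explicit free() via AutoCloseable. The native method names are
 * hypothetical; the JNI layer would bind them to the C++ allocate/free calls.
 */
public class NativeFrame implements AutoCloseable {
    private long nativePtr; // opaque handle returned by the C++ side

    public NativeFrame(int width, int height) {
        nativePtr = nativeAlloc(width, height);
    }

    /** Deterministically releases the native memory; safe to call twice. */
    @Override
    public void close() {
        if (nativePtr != 0) {
            nativeFree(nativePtr);
            nativePtr = 0;
        }
    }

    private static native long nativeAlloc(int width, int height);
    private static native void nativeFree(long ptr);
}

// Usage: try-with-resources frees the native memory as soon as the frame
// is done, without waiting for the garbage collector.
// try (NativeFrame frame = new NativeFrame(640, 480)) {
//     processFrame(frame);
// }
```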

This is assuming you are absolutely confident that the problem is the native code and not some other memory allocation gone awry. Watch your references and always micro-test when possible.

We examined our vision target module closely and found only one potentially problematic allocation per processed frame. We have now changed the algorithm so that it no longer does that allocation. We will test the new code tonight and hope it fixes the problem.

So far, so good: the problem seems to be gone.

Would you please elaborate on what you mean by “problematic” allocation? i.e., why did you identify it as potentially problematic?

Our vision targeting module does its work in a separate thread, processing each video frame and saving the result away for the main robot thread to retrieve when it needs the target info. Part of that result was the filtered image. Since the processed image was saved away for the main robot thread to access later, the vision thread allocated another frame buffer for capturing and processing the next frame. That’s the problematic allocation. We looked closer at why we needed to save the processed frame and discovered that it was used to determine the distance to the tote (this came from WPI’s sample vision code). Looking further, the method that needs the processed image only uses it to determine the image size, in particular just the image width. Therefore, we can determine the image size right after processing the image and save away just that number instead of passing along the entire image frame. As a result, we no longer need to allocate another frame buffer for the next frame; we just reuse the previous buffer.
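For anyone curious what that change looks like in practice, here is a rough sketch (all names are hypothetical, not our actual code):

```java
/**
 * Sketch of the change described above (all names hypothetical). Instead of
 * handing the whole processed frame to the robot thread, the vision thread
 * publishes just the numbers it computed, so the single frame buffer can be
 * reused for the next capture.
 */
class TargetReport {
    final double distance;
    final int imageWidth; // all the main thread actually needed from the frame

    TargetReport(double distance, int imageWidth) {
        this.distance = distance;
        this.imageWidth = imageWidth;
    }
}

class VisionThread implements Runnable {
    private volatile TargetReport latest; // retrieved by the main robot thread
    private final byte[] frameBuffer = new byte[640 * 480 * 3]; // reused forever

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            captureInto(frameBuffer);                       // overwrite in place
            double distance = computeDistance(frameBuffer); // hypothetical
            latest = new TargetReport(distance, 640);       // tiny, cheap object
        }
    }

    TargetReport getLatestReport() {
        return latest;
    }

    private void captureInto(byte[] buf) { /* camera capture, in place */ }
    private double computeDistance(byte[] buf) { return 0.0; /* vision math */ }
}
```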

Thanks for the detailed explanation.

The question still remains, I guess: why did that “problematic allocation” cause the code to crash after an extended period of running?

It would be useful to understand exactly what’s going on. Such understanding could provide Java teams with some guidance concerning memory allocation.

Java gurus?

In our previous C++ implementation, the main robot thread would free the image frame once it was done with it. But in Java there is no “delete” operator; Java relies on the garbage collector to automatically reclaim unused memory. I am not an expert in Java, but my understanding is that garbage collection is a relatively expensive operation: it needs to scan references to make sure an object is unreachable before it can be reclaimed. Because of that expense, the garbage collector tends to run only when memory is running low (below a certain threshold). In our case, we were allocating an image frame at the processing rate (which could be as fast as the video frame rate). My guess is that the garbage collector could not reclaim memory fast enough to keep up with the allocations, so the Java VM simply ran out of memory and crashed.
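For teams wanting to check a theory like this on their own robot, here is a minimal sketch using the standard Runtime API to log heap usage from the code:

```java
/**
 * Minimal sketch for sanity-checking this theory: log the JVM's used heap
 * once a second from the vision loop. A steady climb toward the max with no
 * drops between collections suggests allocations are outpacing the collector.
 */
class HeapMonitor {
    private long lastLogMs = 0;

    void maybeLog() {
        long now = System.currentTimeMillis();
        if (now - lastLogMs >= 1000) {
            Runtime rt = Runtime.getRuntime();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            long maxMb = rt.maxMemory() / (1024 * 1024);
            System.out.println("Heap used: " + usedMb + " MB of " + maxMb + " MB");
            lastLogMs = now;
        }
    }
}
```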
In my personal opinion, I have never liked the fact that Java takes away the programmer’s ability to explicitly free memory. A garbage collector that automatically reclaims memory is nice, but it didn’t have to take away the delete operator. I know why they did it, but I don’t agree with it.