Our team is building a robot for an offseason competition, and we’ve run into several out-of-memory errors when testing code on the robot. Based on some tests I’ve run, I believe I’ve narrowed the issue down to the number of NT clients connected. The issues started when we added a Limelight 2 (v2023.6.0) for AprilTag vision and connected it to the roboRIO through a network switch. When we turned on the robot, it would work fine until I connected either AdvantageScope (v3.0.0-beta-5) or Shuffleboard (v2024.1.1-beta-3). The robot would keep running for about 2-4 seconds and then throw a memory error. The robot code would then restart and throw another memory error after ~2 seconds of running.
I tried disconnecting the Limelight (and removing references to it in the code) and was able to connect two instances of both AS and SB to the robot without issue for several minutes. With AS connected, I tried reconnecting the Limelight, and once again the code errored after a couple of seconds. I repeated this test with SB connected instead of AS and got the same result. We had a Limelight 1 available, so I tried connecting that alongside the Limelight 2 without connecting anything else to NT, and yet again it threw a memory error.
I don’t really know how to solve this issue besides not connecting AS or SB to the robot, but that’s a huge detriment to the software team and the drivers. I know that other teams run a lot more cameras than we do and can still connect AS and SB to their robots without problems, so I’m very curious what the actual issue with our robot is. One of our programmers also pointed out that the Limelight 2 connects with NT3 instead of NT4, which might be an issue if NT for some reason doesn’t like using both NT3 and NT4 at the same time. I’ll have more info on the memory errors later today when I recreate the issue and record what gets printed to the console.
Here’s our robot code repo, in case we’re doing something very wrong in our code that’s causing the memory errors. We’re using WPILib v2024.1.1-beta-3.
It might be better to take this conversation to the beta forums so we can track it more easily. It would also help me debug if you can capture everything that is getting published (e.g. a copy of the datalog if you’re saving one). One thing that’s unique to NT3 clients is that they effectively subscribe to everything in NT, but that’s also true of Shuffleboard currently. It may also be that the Limelight is sending a large amount of data, which causes a memory allocation spike as we replicate it to other clients. Do you see a problem with just the Limelight connected (without AS and/or Shuffleboard)?
My guess is that lots of data being sent by the Limelight, combined with AS subscribing with options to send the full history of things, may be causing a lot of memory usage (preserving history with less frequent periodic sends means the server has to save it until it can send it). Does Limelight+SB work without AS connected?
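To put rough numbers on that guess, here is a back-of-the-envelope sketch in Java. It assumes a deliberately simple model that is not taken from the NT implementation: every update published between two periodic sends is held in memory once per subscribed client. The names `BacklogEstimate` and `bytesQueuedPerClient` and all input values are hypothetical; substitute rates and payload sizes measured from your own datalog.

```java
/** Rough estimate of how much data a server might hold between periodic sends. */
final class BacklogEstimate {
    /**
     * Bytes queued for one client between two periodic sends, assuming every
     * update is kept (full history) until the next send. Simplified model only.
     */
    static long bytesQueuedPerClient(double updatesPerSecond,
                                     int bytesPerUpdate,
                                     double sendPeriodSeconds) {
        return Math.round(updatesPerSecond * bytesPerUpdate * sendPeriodSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 50 updates/s of a 4 KiB string, flushed once per second.
        long perClient = bytesQueuedPerClient(50.0, 4096, 1.0);
        int clients = 2; // e.g. AS and SB both subscribed to the same topic
        System.out.println("Queued per client: " + perClient + " bytes");
        System.out.println("Queued for " + clients + " clients: " + clients * perClient + " bytes");
    }
}
```

The point of the sketch is only that the backlog scales with update rate, payload size, send period, and client count, so a large, frequently updated topic is the first thing to look for in the datalog.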
Limelight + SB still throws a memory error even without AS connected.
I made this spreadsheet while running the tests to try to narrow down the issue (note that “Limelight 1” in the spreadsheet is the physical Limelight 2, and “Limelight 2” is the physical Limelight 1).
The log notes below come from reviewing the console output that AdvantageKit logs, as viewed in AdvantageScope.
The memory error is consistently:
OpenJDK Client VM warning: INFO: os::commit_memory(0xb439f000, 32768, 1) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 32768 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid2652.log
(We’ve gotten other types of memory errors so I thought I’d clarify)
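Since the failure above is a native `mmap` allocation failing with errno=12 rather than a Java heap error, it may help to log how much system memory is left just before a crash. Below is a minimal sketch, assuming the roboRIO's Linux exposes `/proc/meminfo` with a `MemAvailable` line. The class `MemInfo` and its method `parseKb` are hypothetical helpers written for this post, not part of WPILib or AdvantageKit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Reads system memory figures from Linux's /proc/meminfo. */
final class MemInfo {
    /** Returns the value in kB for the given key (e.g. "MemAvailable"), or -1 if absent. */
    static long parseKb(List<String> lines, String key) {
        for (String line : lines) {
            if (line.startsWith(key + ":")) {
                // Lines look like "MemAvailable:   123456 kB"
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Path.of("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.out.println("/proc/meminfo is not available on this system");
            return;
        }
        List<String> lines = Files.readAllLines(meminfo);
        System.out.println("MemTotal (kB): " + parseKb(lines, "MemTotal"));
        System.out.println("MemAvailable (kB): " + parseKb(lines, "MemAvailable"));
    }
}
```

Calling something like `parseKb` from a periodic method and logging the result would show whether available memory drops steadily or collapses the moment a client connects.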
Log notes:
0_InitialBootup
Log after powering on the robot
Crashes 20 seconds after Shuffleboard is connected
4_Disconnected_SB
Disconnected Shuffleboard shortly after robotInit()
Crashes 0.167 seconds (!!!) after AdvantageScope is connected
6_LongCrash
Somehow runs for a full minute before crashing
7_Disconnected_LL
Disconnected the Limelight
Constant lljson error prints because of disconnected Limelight
Unknown crash cause: probably because I started downloading the logs at that point