roboRIO Chronically Crashing

We had a great morning of robotics before we were suddenly plagued by some sort of native crash.

I can’t figure it out. I’ve collected the log file from each crash, and they all seem to come back to the same line:

C [libntcore.so+0x7d0d4] (anonymous namespace)::SImpl::SetValue((anonymous namespace)::ClientData*, (anonymous namespace)::TopicData*, nt…

I’ve reverted to code that has worked for hours, restarted the driver stations, checked all the CAN connections, swapped from a roboRIO 1 to a roboRIO 2, etc.

I’m at a loss. This doesn’t make any sense. I’m curious whether anyone has seen anything similar before I dig in further. I attached 3 different error logs.

hs_err_pid1761.txt (55.2 KB)
hs_err_pid2377.txt (55.1 KB)
hs_err_pid3095.txt (56.4 KB)
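For anyone comparing several crash logs like these, a quick way to confirm they all die in the same place is to pull the "Problematic frame" entry out of each one. This is a minimal POSIX shell sketch, assuming the standard HotSpot hs_err layout where the frame is printed on the line right after "# Problematic frame:"; the function name problematic_frames is hypothetical, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the "Problematic frame" entry of each JVM crash
# log so several hs_err_pid*.txt files can be compared side by side.
# Assumes the standard HotSpot layout, where the frame sits on the line
# right after "# Problematic frame:".
problematic_frames() {
  for log in "$@"; do
    # Read the line after the marker, drop the leading "# ", print it once.
    frame=$(awk '/^# Problematic frame:/ { getline; sub(/^# /, ""); print; exit }' "$log")
    printf '%s: %s\n' "$log" "$frame"
  done
}
```

Usage: run problematic_frames hs_err_pid*.txt in the folder holding the logs. If every line shows the same libntcore.so offset, the crashes share a single native frame; an empty entry means the marker was not found in that file.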

Seems like the attachment didn’t work properly.

@Peter_Johnson

Have you tried closing all of the NetworkTables and AprilTag software on your driver station? Total shot in the dark on that one.

What version of WPILib are you using?

If you can post your code somewhere, I can see if I can reproduce it. There’s a debug step we can do to get more info, but it requires ssh’ing in and running some commands.

Should be there now. Sorry about that.

I believe it’s the latest version now. I had to update because I wanted that PoseEstimator CR.

Can you define “latest version”? Posting your code would be awesome.

I think it was the latest release.

The crash almost looks like some sort of concurrency bug, but I don’t really do anything in parallel. I’m at a loss.

Posted the code!

I’m not able to reproduce it quickly on my Rio here, but that could be for a variety of reasons (e.g., different hardware, clients, etc.). What dashboards or other clients are connected?

Could you run through the steps below and see if you can trigger the crash? This will produce a core dump file I can dig through in more detail.

ssh into the Rio as lvuser and run the following:

ulimit -c unlimited
/usr/local/frc/frcKillRobot.sh -t
./robotCommand

This enables core dumps and runs the robot program interactively. When the program crashes, there will be a long delay before it returns back to the prompt as it writes the core file. Then use sftp or ftp (e.g. FileZilla or Windows Explorer) to copy the /var/local/natinst/log/core_dump.!usr!local!frc!JRE!bin!java file to your local machine, zip it up, and send it to me. This file is about 100 MB unzipped, but should be a lot smaller zipped up.
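The copy-and-compress step can also be scripted from a Linux or macOS machine. This is only a sketch under several assumptions: the Rio answers at the usual roborio-TEAM-frc.local mDNS name, lvuser can read the core file, and gzip stands in for a .zip archive. The helper names fetch_core and pack_core are hypothetical.

```shell
#!/bin/sh
# Sketch: pull the JVM core dump off the roboRIO and compress it for sending.
# Path copied from the instructions above; single quotes keep the "!" literal.
CORE_PATH='/var/local/natinst/log/core_dump.!usr!local!frc!JRE!bin!java'

# Hypothetical helper: copy the core file into the current directory as
# ./core_dump. Assumes the roborio-<team>-frc.local hostname; pass your
# own team number as the first argument.
fetch_core() {
  team="$1"
  scp "lvuser@roborio-${team}-frc.local:${CORE_PATH}" ./core_dump
}

# Hypothetical helper: compress a local core file with gzip (replacing the
# original, overwriting any old .gz) and print the archive name.
pack_core() {
  gzip -9 -f "$1" && printf '%s.gz\n' "$1"
}
```

For example, fetch_core 1234 && pack_core ./core_dump, where 1234 is a placeholder for your team number.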

Got it! Thanks for the reply. I won’t be around my Rio until Monday, but I’ll try it then.

Is there anything I can look into right now as to why?

The weirdest thing to me is that it ran fine all morning and then suddenly started doing that.

Connection-wise: 2 instances of PhotonVision. We had Shuffleboard and SmartDashboard on one PC, and another PC running Shuffleboard as well.

So odd to me.

Has anyone else run into something like this?
No idea if it’s WPILib-related, but I’m probably going to downgrade as well.

I guess it’s related to where that exception is… it looks like some sort of networking function.
Is there a concern that other clients could cause this?

Long shot…does the issue persist when using JDK 11?

Haven’t tried that yet, but I can.

There’s been some speculation on Discord, from another team seeing this issue, that it’s related to the quantity of data put on NetworkTables. Apparently removing DataLog usage helped one team.

I am using NT pretty heavily (it should be able to support that amount of data), but I guess that could be it? The trace looks related to some sort of client.

I was able to reproduce and figure out what was happening. Fix will be in 2023.4.1.

Incredible news! Thank you for your hard work!

What was the issue? It seemed like something in the native code, but I don’t know that codebase even remotely.

Do you know when 2023.4.1 is coming? There isn’t a beta/dev branch, is there? If this is the same crash we’ve been seeing, it’s very annoying for testing =)

Instructions on how to test the fix were posted in your allwpilib GitHub issue a few days ago.