Fatal WPILib bug

Hello,

I am seeing heap corruption on my team’s robot, and I would like to verify that my robot’s image is valid. Could someone please compute and post the checksums of their copy of the 2014 image? The problem appeared when I reimaged the cRIO to 2014v52; with the 2013 image, the robot does not crash. When I instantiate a Gyro, WPILib silently corrupts memory. I can demonstrate this by running this program (https://gist.github.com/electromatter/8995248) in Test mode. When I run it in Test mode and then switch to Teleop, the robot crashes within a second. However, when I run it only in Teleop, the robot does not crash. The jump to 0xEEEEEEEE is not consistent.

related files:
C:\WindRiver\WPILib\cRIO_Images\FRC_2014_v52.zip
C:\WindRiver\WPILib\WPILibC++Source20140101rev3876.zip
C:\WindRiver\vxworks-6.3\target\lib\WPILib.a

md5 sums:
1077394c36cb21eb698ef16484c4c14c FRC_2014_v52.zip
a769926dbce97e3f9222e6fa79281068 WPILib.a
84869dd148975d43b8fad2670a6a02b8 WPILibC++Source20140101rev3876.zip

sha1 sums:
0989431bad7e34d5fcce2d42e290aa313dafb764 FRC_2014_v52.zip
a7edb741b589fb53f8e49eaaa38cd242de463dfa WPILib.a
bd6de27d6ba8c6814c2ba43f6e0f121bb8876ecc WPILibC++Source20140101rev3876.zip

Information from the debugger:

Exception in Kernel Task FRC_RobotTask:0xd0ce00
at pc=0xEEEEEEEC

Vector 0x200: Machine Check status=0xEEEEEEEE

Faulting module: FRC_RobotTask - 0xd0ce00

Stack trace:
0xEEEEEEEC
LiveWindow::SetEnabled() - LiveWindow.cpp:64
SimpleRobot::StartCompetition() - SimpleRobot.cpp:141
RobotBase::robotTask() - RobotBase.cpp:145
vxTaskEntry() - 0x000b48cc

I’ve been watching the OP struggle with this issue for a few days now. Any help you could offer would be greatly appreciated.

Our next step will probably be to redownload and reinstall the v52 image.

Do you get similar behavior when switching from Test to Autonomous?

I have a few theories, but don’t want to get too in depth without knowing a little bit more. Re-imaging the cRIO shouldn’t hurt it even if it’s not the cause of the error, so I wouldn’t hesitate to try that, though all of your checksums match what I got here.

(Also greetings from a NHGS alum)

The corruption or crash could be caused by the user code, by WPILib, or by a corrupted image. As mentioned already, it will not hurt to reimage, though the file system is quite robust.

I suspect that the symptom will remain. I’d suggest debugging the Test code. If DriverStationLCD::GetInstance() doesn’t return a valid object, the next line will crash. I don’t know much about the WPILib C++ implementation, but it could also be responsible. Add the LCD code and the gyro code separately and see which one triggers the crash.

Greg McKaskle
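
One way to rule out the invalid-pointer case described above is to guard the call before dereferencing. This is only a rough sketch: FakeLcd, its simulateFailure flag, and SafePrint are hypothetical stand-ins invented for illustration, not the real DriverStationLCD API.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for a WPILib-style singleton; NOT the real DriverStationLCD.
class FakeLcd {
public:
    // Returns the shared instance, or NULL when the (simulated) setup failed.
    static FakeLcd *GetInstance(bool simulateFailure) {
        static FakeLcd instance;
        return simulateFailure ? NULL : &instance;
    }
    void Print(const std::string &text) { last_ = text; }
    const std::string &Last() const { return last_; }

private:
    FakeLcd() {}
    std::string last_;
};

// Writes to the display only if the pointer is valid; reports whether it did.
bool SafePrint(FakeLcd *lcd, const std::string &text) {
    if (lcd == NULL) {
        std::fprintf(stderr, "LCD instance is NULL, skipping print\n");
        return false;
    }
    lcd->Print(text);
    return true;
}
```

If a guard like this never reports a NULL pointer on the robot and the crash still occurs, the LCD lookup is probably not the culprit.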

Since the crash happens when switching from Test to OperatorControl, and both methods use DriverStationLCD, I’m wondering if the bug is somewhere in the ~Gyro destructor path. That would explain why it happens when switching to other methods that don’t contain that object.

If this is the case, it might be mitigated by making your Gyro object a member of the class rather than an object local to a particular method.
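
To make the lifetime difference concrete, here is a small sketch of when a destructor runs for a method-local object versus a class member. FakeSensor and FakeRobot are hypothetical stand-ins, not the real Gyro or SimpleRobot classes; the sketch only shows ordinary C++ scoping and assumes nothing about what ~Gyro actually does.

```cpp
#include <string>
#include <vector>

// Records constructor/destructor calls so the order can be inspected.
static std::vector<std::string> g_events;

// Hypothetical stand-in for a sensor class; NOT the real WPILib Gyro.
class FakeSensor {
public:
    explicit FakeSensor(const std::string &name) : name_(name) {
        g_events.push_back("ctor " + name_);
    }
    ~FakeSensor() { g_events.push_back("dtor " + name_); }

private:
    std::string name_;
};

// Hypothetical stand-in for a robot class with Test and OperatorControl modes.
class FakeRobot {
public:
    FakeRobot() : member_("member") {}
    // The local sensor is destroyed when this method returns.
    void Test() {
        FakeSensor local("local");
        g_events.push_back("in Test");
    }
    void OperatorControl() { g_events.push_back("in OperatorControl"); }

private:
    FakeSensor member_;  // lives as long as the robot object
};

// Runs Test followed by OperatorControl and returns the event log.
std::vector<std::string> RunModes() {
    g_events.clear();
    {
        FakeRobot robot;
        robot.Test();
        robot.OperatorControl();
    }  // robot (and its member sensor) is destroyed here
    return g_events;
}
```

The local object’s destructor runs as soon as Test() returns, before OperatorControl() starts, while the member’s destructor runs only when the robot object itself goes away.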

pcfens: I moved the Gyro to the stack to make the test more localized. I have also tested code with the Gyro as a member, and it had the same problem.
Greg: GetInstance should throw std::bad_alloc if allocation fails. Otherwise, GetInstance should return a valid object.
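
For reference, a sketch of the standard C++ behavior this relies on: a plain new expression reports failure by throwing std::bad_alloc rather than returning a null pointer. FakeInstance and its failNext hook are hypothetical and exist only to simulate a failed allocation; whether the VxWorks toolchain and WPILib build behave the same way is an assumption that would need checking.

```cpp
#include <cstddef>
#include <new>

// Hypothetical class whose allocation can be forced to fail; not part of WPILib.
class FakeInstance {
public:
    static bool failNext;  // test hook: make the next allocation fail

    static void *operator new(std::size_t size) {
        if (failNext) {
            failNext = false;
            throw std::bad_alloc();  // simulate running out of memory
        }
        return ::operator new(size);
    }
    static void operator delete(void *p) { ::operator delete(p); }
};
bool FakeInstance::failNext = false;

// Returns true if an instance was created, false if allocation threw.
bool TryCreate() {
    try {
        FakeInstance *p = new FakeInstance();
        delete p;
        return true;
    } catch (const std::bad_alloc &) {
        return false;
    }
}
```

Under that assumption, a failed allocation inside GetInstance would surface as an exception at the call site, not as a null pointer on the following line.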

Do you mind updating the Gist to reflect the test that you’re running now with the same issue?

The test code that was posted does assume that a valid object pointer is returned. If the LCD class is implemented properly, that isn’t a problem. But if it isn’t, the test code will dereference and crash.

I can’t debug the code, but that is one of the things I’d check.

Greg McKaskle

This is probably something you’ve already checked, but can you confirm that your analog module (NI 9201) is plugged into slot 1 on the cRIO?

At some point these were switched on ours, and I found that when I tried to instantiate a gyro, it would throw an exception. I didn’t dive too deep into the details of what the exception was so I don’t know if it matches what you’re getting.