CRio Connection: Buggy

I re-imaged our cRIO to v27, first with just NetConsole and then with both NetConsole and the CAN Jaguar Black Serial plugin. Of course, now it seems to work. I ran LabVIEW code that was essentially the default FRC cRIO Robot Project modified to drive a single motor using CAN.

From the console log, I did see two troubling messages when the cRIO boots:
interrupt: PCI Error: initiator aborted due to timeout
interrupt: PCI Error: initiator aborted due to timeout

I’m not sure what they mean, but the system appeared to work. I was able to deploy and run CAN code. The system crashed once, I think when I was using the System Monitor. In the several times I ran code from LabVIEW, I received the “Unable to connect to real-time target” error only once, accompanied by four flashes of the status LED. I didn’t see anything in the NetConsole at that time.

I’ll do more testing tomorrow. The NetConsole log I captured is attached.

NetConsole2.txt (7.37 KB)



We are seeing the same issue here. We spent a few days trying to figure out what was wrong with the rio before finding this.

We flashed with NetConsole and CAN enabled for the Black Jags (didn’t try with CAN disabled).

Flashed to v27 innumerable times (flashing always worked fine), but LabVIEW would not deploy code. I was able to ping and ftp into the rio without issue. The driver station would see robot comms momentarily on bootup, then lose them. I was able to successfully deploy from LabVIEW maybe 10% of the time. After a successful deploy, control through the driver station worked reliably.
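
For anyone who wants to script the "can I reach the rio" check instead of doing it by hand, here is a rough Python sketch. It assumes the usual static 10.TE.AM.2 cRIO address and only tests whether the FTP port accepts a TCP connection, so it says nothing about whether LabVIEW can deploy. The function names are made up for illustration.

```python
import socket


def crio_ip(team_number):
    """Build the conventional 10.TE.AM.2 address (assumes the standard static-IP setup)."""
    if not 1 <= team_number <= 9999:
        raise ValueError("team number must be between 1 and 9999")
    return "10.%d.%d.2" % (team_number // 100, team_number % 100)


def port_open(host, port=21, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling `port_open(crio_ip(1234))` (1234 being a placeholder team number) right after boot, and again a few seconds later, would at least show whether basic networking drops along with the driver station comms.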

This is pretty easy to replicate on our end.

The only fix that seemed to work reliably was to flash back to v25.

We seem to be hitting bugs this year every step of the way. Last week it was problems with the encoders. I don’t mean to complain, but these types of issues don’t help when you’re trying to teach the kids how something “should work”.

What gives?

I have an update from last night.

We added case structures to timed tasks to be sure that none of our code in timed tasks runs unless the robot is enabled. It took about 4 tries to deploy this software (using Run As Startup). After getting this software on the cRIO, we never had another problem.

It makes me think that something in v27 is taking up a lot of CPU overhead causing issues with the communication. I seem to remember issues last year when the user software started taking up too much CPU time, it became difficult to deploy code. Perhaps this is similar.

It does have the look and feel of something that doesn’t finish booting due to some race condition and locks up specific cRIO resources. Once you break free of it, everything is fine.

Some services are blocked or slowed while others remain unimpaired, probably due to specific resource conflicts.

I would hazard that it isn’t cRIO CPU overhead though, since some low-priority services (ftp) sometimes work. Those are the first to fail when there’s a CPU shortage.
The 4-flashes might indicate that memory isn’t being released by something that sucked it up, then doesn’t complete and just sits there. Maybe two boot tasks asking for a combined 110% of memory at the same time, then waiting for the other to give up first. Sort of an OS hydra staring contest.

One thing to test, if it happens to any of us again, is to run the LabVIEW System Manager and monitor the CPU and Memory usage for clues. Do it before attempting to download or do anything else, while the 4 flashes are occurring.
If anything significant happens take a screen shot and post it for the rest of us.

Well, I spoke too soon. When our programmers started up the cRio this afternoon we again got the four flash status light and inability to deploy code to the cRio. They reimaged the cRio with v27 without NetConsole and everything seemed to work.

To debug this further, I reimaged with the v27 with NetConsole and CAN Black Jag. The attached log shows the results of two instances of failed boots. Both resulted in four flash status lights. I tested several more times and got the same results - we had no successful boots. I re-imaged without the NetConsole and the system works. CAN is enabled and working.

It appears from the logs that we are getting two crashes without a reboot, which is one of the interpretations of the four-flash status. All appear to be 0x00300 exceptions.
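
To tally exceptions across a saved NetConsole capture, something like this Python sketch could help. The "Exception code: 0x..." wording matches the message the console prints for this crash; everything else about the log layout is an assumption, so adjust the pattern to your own file. The function names are hypothetical.

```python
import re
from collections import Counter

# Pattern based on the "Exception code: 0x00000300" console message;
# the surrounding log layout is an assumption.
EXCEPTION_RE = re.compile(r"Exception code:\s*(0x[0-9A-Fa-f]+)")


def count_exceptions(log_text):
    """Return a Counter mapping each exception code (as an int) to its count."""
    counts = Counter()
    for match in EXCEPTION_RE.finditer(log_text):
        counts[int(match.group(1), 16)] += 1
    return counts


def summarize(path):
    """Read a saved NetConsole log and print one line per exception code."""
    with open(path, "r", errors="replace") as handle:
        counts = count_exceptions(handle.read())
    for code, n in sorted(counts.items()):
        print("0x%08X seen %d time(s)" % (code, n))
```

Running `summarize()` on each attached log makes it easy to compare how many crashes each boot produced.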

So it may not be memory related, though removing the NetConsole code seems to help. The other day, when the system was working with the full image including the NetConsole, System Manager reported that I still had about 18900K of free memory.

netConsole0210.txt (17.4 KB)



Some of you have narrowed it down to what appears to be NetConsole.out, but the logs that captured the exceptions all showed lvrt.out on the call stack. So I’d like you to try pulling the NetConsole.out binary from the v25 image, placing it in the v27 image, and attempting to reproduce the issue. It’s possible that something the NetConsole is doing on startup is causing lvrt.out to crash, but I’d like to make sure the v27 version of NetConsole is to blame and not the v27 environment.

Thanks,
-Joe

I’ll give that a try this evening.

We were able to reproduce the issue on one of the cRIOs here and believe we have fixed the issue causing the 4 blinks and the “Exception code: 0x00000300”.

Please download the new image from FIRST Forge here: http://firstforge.wpi.edu/sf/go/projects.wpilib/frs.crio_images.crio_image_v28

Let me know if this resolves the issue for you.

Thanks,
-Joe

Thanks Joe! We’ll try it tomorrow.

As you expected, replacing the NetConsole.out file in v27 with the v25 file removed the error.

We flashed to v28 today from v25.
Everything appears to be functioning as expected. We’ll report back if we see any issues.

Thanks!

Our experience seems to indicate something going on with the “Enable NetConsole - CAN Driver Plugin - Black Jaguar Serial Bridge” option that we chose to install with our v25 and v27 imaging.

At 1:30PM, we installed the FRC_2011_v27.zip image that came with the latest Labview v3.1 update, via the tether. The dialog box indicated that the installation was successful. I rebooted/cycled the cRIO while the main programming crew was at lunch.

At 1:50PM - when the main programming crew came back - they simply tried to install our working baseline code. The download just about immediately stalled and threw an error. Seems that an entry in the ni-rt.ini file under the “Startup Dlls” entry section on the cRIO was incompatible with the newer image. We ftp’d into the cRIO, pulled the ni-rt.ini file to the Developer laptop and deleted the entry - and placed the ni-rt.ini file back on the cRIO. At least that got us back to an earlier version of the ini file.
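
If you'd rather not hand-edit the file, here is a rough Python sketch of the same fix applied to a local copy pulled over FTP. It assumes the startup list is a single `key=value` line with entries separated by semicolons, which is a guess about the file layout; check it against your own ni-rt.ini before trusting it. The function name is made up for illustration.

```python
def remove_list_entry(ini_text, key, entry, sep=";"):
    """Remove `entry` from the `sep`-separated value of `key` in INI text.

    Assumes a single-line "key=value" layout. Key matching is
    case-insensitive, and every other line is passed through unchanged.
    """
    out_lines = []
    for line in ini_text.splitlines(keepends=True):
        name, eq, value = line.partition("=")
        if eq and name.strip().lower() == key.lower():
            # Preserve whatever line ending (CRLF or LF) the file uses.
            ending = line[len(line.rstrip("\r\n")):]
            items = [v for v in value.strip().split(sep) if v]
            kept = [v for v in items if v.strip() != entry]
            new_value = sep.join(kept)
            # Keep the trailing separator if the original value had one.
            if kept and value.strip().endswith(sep):
                new_value += sep
            line = name + "=" + new_value + ending
        out_lines.append(line)
    return "".join(out_lines)
```

Usage would be along the lines of `remove_list_entry(text, "StartupDLLs", "SomePlugin.out")`, where both the key spelling and the entry name are placeholders. Keep a backup of the original file before putting the edited copy back on the cRIO.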

At 2:20PM – after finding a copy of the v25 image on a Mentor’s laptop - we imaged the cRIO with the older v25 image. We checked the “Enable NetConsole - CAN Driver Plugin - Black Jaguar Serial Bridge” options. This imaging was also successful. AND our CAN/Black Jag code now worked. We figure we lost about 1 1/2 hours of prime Saturday programming time dealing with this anomaly.

Our experience with v27 and our basic CAN/Black Jag-centric baseline code indicates that there is a serious incompatibility. We figure that until v27 is “fixed” - we will just have to use the v25 image.

v27 had changes to a number of components including NetConsole. If NetConsole was enabled, the changes were overflowing a thread stack and in some situations caused the crash, reboot, and blink. v28 is identical to v27 except it contains a NetConsole with about 16KB more stack for the thread.

Obviously you can continue with v25 for the remainder of the build, but v28 has been posted. You can try it if it fits your schedule. FIRST should officially announce it in the next update – the bug was reproduced and fixed on Friday.

Greg McKaskle

Thx Greg, Thx Joe - we’ll try the new v28 image when we get back to our Developer laptop.

We reimaged with the v28 image and things seem to be working as expected. Joe, thanks for figuring this out and fixing it!

How do we update to v28? Please help.

Take the LabVIEW v28 image from FIRST Forge and drop it in:
C:\Program Files\National Instruments\LabVIEW 8.6\project\CRIO Tool\FRC Images\FRC_2011_v28.zip

Then it will appear in the cRIO Imaging Tool list.
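
The same step as a small Python sketch, in case you want to script it for several laptops. The image folder is the one given above; the download location and the function name are placeholders.

```python
import shutil
import zipfile
from pathlib import Path


def install_image(zip_path, images_dir):
    """Copy a downloaded cRIO image zip into the imaging tool's image folder.

    Checks that the file is a readable zip first, then copies it without
    unpacking. Returns the destination path.
    """
    zip_path = Path(zip_path)
    images_dir = Path(images_dir)
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("%s is not a valid zip file" % zip_path)
    if not images_dir.is_dir():
        raise FileNotFoundError("image folder not found: %s" % images_dir)
    dest = images_dir / zip_path.name
    shutil.copy2(zip_path, dest)
    return dest
```

For example, `install_image(r"C:\Downloads\FRC_2011_v28.zip", r"C:\Program Files\National Instruments\LabVIEW 8.6\project\CRIO Tool\FRC Images")`, where the Downloads path is hypothetical. The zip check is there to catch a partial download before it reaches the imaging tool.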

@mark
Thanks, I’ll try it when I get to the build site.

It works, thank you very much.

Just to cover some bases for ChaosX73:

I can’t seem to image the cRio from my personal laptop, so I have to use the Classmate. I ‘bricked’ our cRio (turned it into an expensive paperweight) by using the imaging tool on my laptop, and the way to fix that is to format the cRio. You format it by putting the cRio in safe mode. Once it’s in safe mode, try to deploy code using the imaging tool. It will recognize the cRio in safe mode and ask if you want to format it; do so. This will return it to its normal state so you can put new code on.

It’s likely not bricked, so just try to deploy v28 first. If your computer doesn’t recognize the cRio and can’t reach its IP address, it’s likely bricked.

Again, this is just to cover some bases for ChaosX73. Hope it helps! :smiley:

Where do we put the downloaded image on the PC so it shows up in the FRC cRIO Imaging Tool?
Thanks :)