cRIO Drops connection after deploy

Hello -

We’re running a new 4-slot cRIO, imaged with the 2013 LabVIEW firmware, from a Lenovo ThinkPad. The Lenovo’s IP address is configured to 10.0.23.5 for Ethernet and 10.0.23.9 for wireless, with a subnet mask of 255.0.0.0. (The cRIO is configured at IP 10.0.23.2.)
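
For anyone who wants to rule out addressing before looking further, here is a minimal Python sketch that checks the three addresses quoted above fall in one network under that mask and do not collide. The labels in `hosts` are just names chosen for this example.

```python
import ipaddress

# Addresses exactly as given in the post; 255.0.0.0 is a /8 prefix.
MASK = "255.0.0.0"
hosts = {
    "laptop ethernet": "10.0.23.5",
    "laptop wireless": "10.0.23.9",
    "cRIO": "10.0.23.2",
}

def network_of(ip, mask):
    """Return the network an address belongs to under the given mask."""
    return ipaddress.ip_interface(f"{ip}/{mask}").network

networks = {name: network_of(ip, MASK) for name, ip in hosts.items()}
same_subnet = len(set(networks.values())) == 1
unique_addresses = len(set(hosts.values())) == len(hosts)

for name, net in networks.items():
    print(f"{name}: {hosts[name]} in {net}")
print("same subnet:", same_subnet, "| unique addresses:", unique_addresses)
```

Both checks pass for these values, so the addressing described here is at least self-consistent.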

When deploying code (directly over Ethernet, over wireless, or over bridged Ethernet), the initial deployment goes smoothly, albeit extremely slowly. If I stop RobotMain to change a few things and redeploy, the deployment succeeds, but the cRIO promptly drops communication with the Driver Station, forcing us to redeploy from the start.

Note, the Dashboard is closed during the entire process.

Any help?
Dom

Are you running the Driver Station on the same computer you’re programming on?

What do you mean by “deploy”? Are you actually choosing the Deploy menu command, or are you just running Robot Main?

Describe the symptoms. What exactly do you see happening when what you call “the cRIO will promptly drop communication” occurs?

I’m developing the project that is currently being used on the cRIO with the same computer that is connecting to said cRIO.

The cRIO has had code deployed and set as startup, however in the context of this issue, I am running RobotMain.

I remember looking at the cRIO LEDs today when it was dropping comms, but I completely forgot what they were flashing. I believe the User1 LED was doing something funky. I can verify tomorrow.

When I run RobotMain initially (run button), the code will deploy properly and function as expected. However, when stopped to edit small details and then redeployed (run button), we encounter an issue. The code will deploy properly, and the grid on the front panel of RobotMain will disappear as if it were running properly. When I then switch to the Driver Station, the Robot Code light remains red, and the Communications light turns from green to red. RobotMain will then echo an error, “Communication with realtime target lost”.

EDIT: The cRIO has code built and deployed as startup on it, if that has any bearing in the matter.

I would suggest connecting up a serial cable or running netconsole to see what is going to the console. That sounds like your cRIO may be crashing either during deploy or shortly after.

Greg McKaskle

It doesn’t seem reasonable to do otherwise (i.e. to do development on a computer but not connect that computer to the cRIO), so it’s likely that I don’t understand what you mean.

When I run RobotMain initially (run button), the code will deploy properly and function as expected. However, when stopped to edit small details and then redeployed (run button), we encounter an issue. The code will deploy properly, and the grid on the front panel of RobotMain will disappear as if it were running properly.

I’m going to accept your statement about “deploy properly” without asking you to describe exactly what is happening, but if we find out later that there’s something going wrong that you think is normal you’re going to feel very silly. :slight_smile:

How are you stopping execution? Are you clicking the big “Finish” button, or are you using the “stop” icon in the Robot Main window’s toolbar?

The “run” arrow should tell you whether it’s running or not. If it turns from white to black with “motion” lines, it’s running.

When I then switch to the Driver Station,

What does “switch to the Driver Station” mean, exactly?

the Robot Code light remains red, and the Communications light turns from green to red. RobotMain will then echo an error, “Communication with realtime target lost”.

Does the error occur only after switching to the Driver Station? If you wait before switching, does the error happen later?

And what would I be looking for?

Alan -

I’ve tried both using the large “FINISH” button on the front panel of RobotMain, as well as the stop icon next to the run icon in the LabVIEW toolbar above the VI.

When I say “switch to the Driver Station”, I am referring to the application. I switch from RobotMain to the 2013 Driver Station.

I have not tried waiting before switching, but I will try tomorrow.

If you look at the console during boot, it is similar to what you’ll see on a Linux console or a DOS console. The stream of text will tell you that it is booting, list some components and version numbers, and at some point will stop changing.

Then if you have an error at the driver or OS level, it will print out details about the error – very useful.

If it crashes, it will often tell you the library name, maybe an address or reason for the crash – also very useful.

Sometimes it crashes and doesn’t tell you anything, but you see the boot sequence and know for certain that it crashed.

Greg McKaskle
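
If the NetConsole client is not handy, the console stream can in principle be captured with a few lines of Python. This is only a sketch: it assumes NetConsole sends its console text as UDP datagrams to port 6666, which is not stated anywhere in this thread, and the function names are made up for the example.

```python
import socket

# Assumption: the cRIO NetConsole sends console output as UDP datagrams to this port.
NETCONSOLE_OUT_PORT = 6666

def open_listener(port=NETCONSOLE_OUT_PORT, bind_addr=""):
    """Open a UDP socket that receives datagrams sent to the given port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    return sock

def read_console(sock, max_packets=None, timeout=None):
    """Yield (sender_ip, text) per datagram until timeout or max_packets is reached."""
    sock.settimeout(timeout)
    count = 0
    while max_packets is None or count < max_packets:
        try:
            data, (sender, _port) = sock.recvfrom(4096)
        except socket.timeout:
            return
        count += 1
        yield sender, data.decode("ascii", errors="replace")

def main():
    """Print console text as it arrives; stop with Ctrl+C."""
    with open_listener() as sock:
        for sender, text in read_console(sock):
            print(f"[{sender}] {text}", end="")
```

Calling `main()` and then redeploying should show the boot text and any exception log as they are printed. Redirecting the output to a file keeps a copy to attach to a support request.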

I assume I have to reformat the cRIO to enable the NetConsole?

If you didn’t make that selection when you first imaged the cRIO, then yes you have to go back and click the box so the netconsole part of the image gets downloaded to the cRIO.

Enabling NetConsole doesn’t require reformatting the cRIO. You just have to run the Imaging Tool and apply the appropriate selection.

Ah, gotcha. I’ll have to do this Tuesday/Wednesday…we were just given the chassis to populate with various electronic components and wires.

Dom

-> * Loading nisysrpc.out: nisysrpc
* Loading NiRioRpc.out: NiRioRpc
* Loading nivissvc.out: nivissvc
* Loading nivision.out: nivision
NI-RIO Server 12.0.0b10 started successfully.
* Loading visa32.out: visa32
* Loading niserial.out: niserial
* Loading NiFpgaLv.out: NiFpgaLv
* Loading FRC_FPGA.out: FRC_FPGA
* Loading FRC_NetworkCommunication.out: FRC_NetworkCommunication
FRC_NetworkCommunication version: p4-1.4.0a10
FPGA Hardware GUID: 0x1394F6DC1FEB42EC6910E5767ED1D22C
FPGA Software GUID: 0x1394F6DC1FEB42EC6910E5767ED1D22C
FPGA Hardware Version: 2012
FPGA Software Version: 2012
FPGA Hardware Revision: 1.6.4
FPGA Software Revision: 1.6.4

Startup Application: /c/ni-rt/startup/startup.rtexe

Welcome to LabVIEW Real-Time 12.0rc7
Too much error data!
Too much error data!
*** BEGIN SYSTEM EXCEPTION LOG ***

Target type: cRIO-FRC II
Target code: 75C7

System time (UTC): 1970-01-01 00:04:07
System tick count: 247171 ms

Exception code: 0x00000300

Register contents:
DAR = 0x65727665  DSISR = 0x40000000
MSR = 0x0000B012  FPCSR = 0x82002000
 LR = 0x01874C6C    CTR = 0x00000000
 CR = 0x00000000    XER = 0x00000000

GPR  0 = 0x01874C6C    GPR  1 = 0x01150b70
GPR  2 = 0x00000000    GPR  3 = 0x65727665
GPR  4 = 0x02464994    GPR  5 = 0x00000000
GPR  6 = 0x00000000    GPR  7 = 0x00000000
GPR  8 = 0x00000010    GPR  9 = 0x02320000
GPR 10 = 0x00000000    GPR 11 = 0x00070000
GPR 12 = 0x24000442    GPR 13 = 0x00000000
GPR 14 = 0x00000000    GPR 15 = 0x00000000
GPR 16 = 0x00000000    GPR 17 = 0x00000000
GPR 18 = 0x00000000    GPR 19 = 0x00000000
GPR 20 = 0x00000000    GPR 21 = 0x00000000
GPR 22 = 0x00000000    GPR 23 = 0x00000000
GPR 24 = 0x00000000    GPR 25 = 0x00000000
GPR 26 = 0x00000000    GPR 27 = 0x00000000
GPR 28 = 0x00000000    GPR 29 = 0x0265daa0
GPR 30 = 0x018714C0    GPR 31 = 0x00c75800
 PC = 0x018750a0 in module 0x0

Thread ID: 0x01150DB0   Thread name: LabVIEW Execution System 2 Thre
Thread stack base: 0x01150DB0  stack size: 131072

Call Stack:
0x1df8900+0x284: ThThreadDestroy () in module lvrt.out
0x1e59b24+0x1124: OnOccurrenceAndOccurAtTimeForExec () in module lvrt.out
0x1e58d3c+0x33c: OnOccurrenceAndOccurAtTimeForExec () in module lvrt.out
0x18750a0+0x3fc8f8: _dtors () in module 0x0

All Loaded Modules:
	MODULE NAME     MODULE ID  TEXT START DATA START  BSS START
	--------------- ---------- ---------- ---------- ----------
	FRC_NetworkCommunication.out 0x00e96728 0x010dcfd0 0x01110308 0x01110438
	   FRC_FPGA.out 0x00e84330 0x01040668 0x010b0060 0x010b01f8
	   NiFpgaLv.out 0x00e80538 0x00e891e0 0x00e961b8 0x00e96398
	     NiFpga.out 0x00e7fb40 0x00f23a58 0x00f6c970 0x00f6cf10
	   niserial.out 0x00ec4f90 0x00fa9420 0x0103f8f0 0x0103f9c0
	     visa32.out 0x00e7f5c0 0x00ea80e8 0x00ec1610 0x00ec1780
	   NiViSv32.out 0x00da8b88 0x00e3b2e8 0x00e49d50 0x00e4a8d8
	   nivision.out 0x00e39260 0x030b22a0 0x03a0aa48 0x03acc3f0
	   nivissvc.out 0x00e39e28 0x011cedb0 0x014767e0 0x014976c0
	   NiRioRpc.out 0x00e2b8a8 0x00d3dd28 0x00d47978 0x00d47b50
	   nisysrpc.out 0x00cf6e98 0x00d54790 0x00d8f6b0 0x00d92ee8
	   nisysapi.out 0x00cf5528 0x00dac150 0x00e26868 0x00e2b468
	 NetConsole.out 0x02370840 0x0236e4c0 0x0236fa38 0x0236fa78
	startuppatch.out 0x0236d6f0 0x023f0cc8 0x02404968 0x02405628
	   nisvcloc.out 0x0236ca48 0x0238bef0 0x02391a60 0x02391af0
	   tsengine.out 0x023522f8 0x023c5d38 0x023eea28 0x023f0b78
	      nipci.out 0x0233d7e8 0x0234fae8 0x02352018 0x023520b0
	niriompc5125k.out 0x0233bad8 0x023720c8 0x0238bcd8 0x0238bd50
	   niriosrv.out 0x00aea5e8 0x00b4efc0 0x00c0fd70 0x00c0ff60
	       lvrt.out 0x00ab92e0 0x0192a8b8 0x022b3f70 0x02318a88
	     nirpcs.out 0x00abb578 0x00b30cf8 0x00b4e800 0x00b4ee48
	   niCPULib.out 0x00ab8518 0x00abca28 0x00ae1238 0x00ae17d8
	   libexpat.out 0x00ab7f90 0x00aed738 0000000000 0x00b0b5c8
	     nirtdm.out 0x009022c0 0x00a2fd00 0x00a39f30 0x00a3a058
	   libiconv.out 0x00a2f778 0x008c2b20 0x009021c8 0x009021f0
	   ftpserve.out 0x008ad970 0x008ae138 0x008be3b0 0x008bef00
	nimdnsResponder.out 0x00890b10 0x00a60c58 0x00aa5c40 0x00aa5cc0
	    vxfpsup.out 0x00876828 0x00877080 0x00877470 0x00877578
	     ni_emb.out 0x00863278 0x00863800 0x008757e8 0x00875b20
	     lvuste.out 0x00862518 0x00862cd8 0x00862f30 0x00862f60
	     target.out 0x00825a70 0x009f96b8 0x00a021b0 0x00a023a8
	    vx_exec.out 0x00824700 0x0092f358 0x009f5288 0x009f7570

Memory statistics:
Total system memory:           130551792 bytes
Free memory:                   74232400 bytes
Largest free block:            72530880 bytes
Peak usage:                    56429856 bytes

*** END SYSTEM EXCEPTION LOG ***

[SYSTEM MESSAGE] System is Shutting Down...

Two deploys before the error occurs. I noticed when the code is first deployed, “Too much error data!” is displayed. This remains the case when code is redeployed.
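
Since the full log is long, here is a rough Python sketch that pulls out the parts people usually ask for first: the exception code, the thread name, and the call stack. The `LOG` text is an excerpt copied from the capture above. The regular expressions are guesses at the line format based on this one log, not a documented format.

```python
import re

# Excerpt copied from the console capture above.
LOG = """\
Exception code: 0x00000300
DAR = 0x65727665  DSISR = 0x40000000
 PC = 0x018750a0 in module 0x0
Thread ID: 0x01150DB0   Thread name: LabVIEW Execution System 2 Thre
Call Stack:
0x1df8900+0x284: ThThreadDestroy () in module lvrt.out
0x1e59b24+0x1124: OnOccurrenceAndOccurAtTimeForExec () in module lvrt.out
0x1e58d3c+0x33c: OnOccurrenceAndOccurAtTimeForExec () in module lvrt.out
0x18750a0+0x3fc8f8: _dtors () in module 0x0
"""

# One call-stack line: base+offset: symbol () in module name
FRAME = re.compile(
    r"^(0x[0-9a-fA-F]+)\+(0x[0-9a-fA-F]+): (\S+) \(\) in module (\S+)$", re.M
)

def summarize(log):
    """Extract the exception code, thread name, and call-stack frames from a log."""
    code = re.search(r"Exception code: (0x[0-9a-fA-F]+)", log)
    thread = re.search(r"Thread name: (.+)", log)
    frames = [
        {"addr": int(base, 16) + int(off, 16), "symbol": sym, "module": mod}
        for base, off, sym, mod in FRAME.findall(log)
    ]
    return {
        "exception_code": int(code.group(1), 16) if code else None,
        "thread": thread.group(1).strip() if thread else None,
        "frames": frames,
    }

summary = summarize(LOG)
print(hex(summary["exception_code"]), summary["thread"])
for frame in summary["frames"]:
    print(f"  {frame['symbol']} in {frame['module']} at {frame['addr']:#x}")
```

Running the same function over logs from several crashes makes it easy to see whether the failing thread and top frame are the same each time.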

Sorry I didn’t get to this earlier. Work stuff.

The crash log is very helpful.

I’m pretty sure that the “Too much error data!” messages occur when the robot buffer for errors won’t fit into a status packet. The status packet will contain some of the messages, some will be left out, and some text about messages being left out will be appended to the message display and log file. I’m not sure that this has anything to do with the error, but since that code is exercised pretty rarely, it could give a hint.

Can you give more context on what is going on when this occurs? ThThreadDestroy() is a wrapper for POSIX threads or whatever thread package LV is using on the OS. As far as I know, LV doesn’t often destroy threads during execution. This suggests that LV is either exiting or reconfiguring itself for a different execution, and that a data structure in the thread, or one owning the thread, has been corrupted.

Do you think this would be pretty repeatable? Could you send your code to us and tell us how to run it in order to cause the crash?

In the meantime, it may disappear as you change your code. Please keep your spidey sense alert for what you think it was related to. In particular, these sorts of corruptions are generally due to a DLL or .out that hands LV a bad structure or writes past one of LV’s structures. The LV-generated code is obviously capable of the corruption too, but that is less common.

Feel free to contact NI support or my PM if you need more info on how to get this to us.

Greg McKaskle
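
One quick check that fits the corruption theory is to see whether any register value in the log looks like text rather than an address. The sketch below is illustrative only: the register values are copied from the log above, and the helper name `as_ascii` is made up for this example.

```python
def as_ascii(word):
    """Return the 4 bytes of a 32-bit value as text if all are printable ASCII, else None."""
    raw = word.to_bytes(4, "big")  # big-endian byte order, matching the PowerPC target
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None

# Values copied from the "Register contents" section of the exception log.
registers = {
    "DAR": 0x65727665,
    "GPR3": 0x65727665,
    "LR": 0x01874C6C,
    "PC": 0x018750A0,
}

for name, value in registers.items():
    print(f"{name} = 0x{value:08X} -> {as_ascii(value)!r}")
```

DAR and GPR 3 both decode to the characters "erve", while LR and PC do not decode to text. That is only suggestive, but it would be consistent with a pointer having been overwritten by part of a string. As far as I know, exception code 0x00000300 corresponds to the PowerPC data storage interrupt, which would fit a bad data access at that address.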

The snow is keeping us out of our shop, but hopefully come Monday we’ll be back to work. I’ll be able to give more information then.

We are having the identical problem that the OP described. Here’s how I describe the sequence.

Setup:
Using the same laptop running LabVIEW for code development and for running the Driver Station. I believe we had the Dashboard application closed. Both LabVIEW and the Driver Station application are actively running on the computer. The laptop is wired directly into the D-Link to eliminate the possibility of wireless issues.

  1. Power up the robot and wait for the cRIO to come online.
  2. Open Robot Main.vi and click the white arrow to “run” the code.
  3. Wait for the (slow) initial deployment to complete. No errors. Click the “Close” button on the deployment dialog box.
  4. Click “Enable” on the driver station and operate the robot. All user controls work (joystick, etc). The robot behaves perfectly according to the latest code changes.
  5. Click “Finish” on Robot Main.vi and click “Disable” on the driver station.
  6. Make some minor code change in LabVIEW, save it, and click the white arrow in Robot Main again.
  7. Wait for the very fast re-deployment to occur, as it only updates the changed parts of the program in memory. This re-deployment appears to occur without error. Click the “Close” button on the deployment dialog box.
  8. A pop-up error immediately follows, saying communication with the target has been lost.

This sequence, after one night of troubleshooting, appears to be very repeatable and consistent. We will try enabling the netconsole feature and seeing if we get similar errors as the OP. Has there been any further progress with troubleshooting this problem?
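
To put timestamps on step 8, something like the following Python sketch could run in a terminal while repeating the sequence. It polls a TCP port on the target and records each change between reachable and unreachable. The choice of port is an assumption: the module list in the crash log shows ftpserve.out loaded, so port 21 (the standard FTP port) is a plausible probe, but any port the cRIO is known to accept connections on would do.

```python
import socket
import time

def is_reachable(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(host, port, samples, interval=0.5, clock=time.time):
    """Poll the target and return a list of (timestamp, reachable) state changes."""
    changes = []
    last = None
    for _ in range(samples):
        state = is_reachable(host, port)
        if state != last:
            # Record only transitions, so the result stays short.
            changes.append((clock(), state))
            last = state
        time.sleep(interval)
    return changes
```

For example, `watch("10.0.23.2", 21, samples=240)` would cover roughly two minutes at the default interval. Comparing the recorded drop time against the NetConsole output shows whether the loss of communication lines up with a crash and reboot.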

If this is repeatable, please bundle up your project folder and send it to NI support so that it can be investigated. I don’t know of a bug or a reproducible set of steps that causes this behavior.

Greg McKaskle

Hi. I’m from team 4125 and we are also using a Lenovo ThinkPad. The same thing is happening with us. We’ve tried everything, with no luck. Maybe it’s the laptop? We’re going to try to solve this problem by switching to the computer we used last year. We’ll let you know how it goes…

Despite many reformats and tweaks to our code, we continue to have the same issue as initially described. We will be taking a second cRIO to competition with us in the event we need to switch them.

Note, the issue is only present on one of our cRIOs. The issue does not appear on either of our two other cRIO IIs formatted with the 2013 LabVIEW firmware.

EDIT - @WeRrobots - I find it hard to believe the issue is stemming from the computer. The error is not reproducible on other cRIOs.

We had issues until we realized the new routers have QoS enabled on them. I would try turning QoS off on the router if you have not already.

Spencer - The symptoms were reproducible when connected directly to the cRIO via an Ethernet cable, through the (2012 legal) D-Link 1522 Rev A - wired and wireless, as well as through the (2013 legal) D-Link 1522 Rev B - wired and wireless.