Non-Classmate DS Problems
ayeckley
06-02-2010, 21:41
Is anyone besides me having problems when running the Driver Station application on a PC (in this case, a laptop) other than the Classmate? The problem is that the cRIO and DS application seem to lose their connection after a random amount of time (anywhere from 10 to 30 seconds). We're not seeing this issue when using the actual Classmate. It would be helpful to be able to leave the Classmate at the school while doing off-site development work using our other cRIO.
Some details:
The problem occurs in both the Disabled and the Teleop DS states.
Sometimes the Communication Link goes down first, followed by the Robot Code indicator; other times the sequence is reversed. The link will automatically re-establish itself after about 10 seconds, and then go down again after 10 to 30 seconds, the cycle repeating itself ad nauseam.
The problem occurs with the same frequency using our code as well as the default 2010 code, so it's safe to assume it's not a programming error on our part.
There are no errors reported in the DS Diagnostic window except those associated with the camera (since it's not connected) and the known issue with Watchdog errors in that same window. There's no correlation between the watchdog error and the loss of communication (at least as indicated in the Diagnostic window).
The loss-of-connection occurs at the same frequency with the dashboard panel open or closed.
The problem occurs with both deployed (non-volatile startup) and non-deployed (volatile) code.
The cRIO is directly tethered to the PC using a straight cable (not crossover) to eliminate the router/WGA configuration as a possible problem source.
The cRIO updates have been installed (FRCLabVIEWUpdate2.0for2010).
Driver Station Update #1 is installed.
The PC's wireless NIC is disabled.
The PC is running Windows XP. The same unit was used to do a significant portion of the 2009 programming, and worked without issue.
The PC has LabVIEW installed since it is our primary development station this year, but the problem occurs regardless of whether or not the LabVIEW application is executed.
No other applications are running on the laptop, aside from the normal laptop services. Anti-virus is disabled as well.
I'm using an account with Administrator privileges.
The PC assumes the proper 10.22.52.5 IP address when running the DS application. We use 10.22.52.6 when using the same laptop simultaneously with the Classmate.
The link is a 100BaseT, and there's no indication that the TCP/IP link between the two devices is getting broken.
This same PC is able to act as the development workstation with no issues when the Classmate is used to run the DS application.
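The addresses in the list above follow the 10.TE.AM.x pattern, where TE.AM comes from the team number. A minimal sketch of that mapping, assuming team number 2252 (inferred from the addresses quoted in this thread) and using a hypothetical helper name `frc_address`:

```python
def frc_address(team: int, host: int) -> str:
    """Build a 10.TE.AM.host address from a team number.

    `frc_address` is a hypothetical helper written for illustration only.
    """
    if not 1 <= team <= 9999:
        raise ValueError("team number must be between 1 and 9999")
    if not 1 <= host <= 254:
        raise ValueError("host octet must be between 1 and 254")
    # Upper digits of the team number form the second octet, lower two the third.
    return f"10.{team // 100}.{team % 100}.{host}"


# Host octets as they appear in this thread: .2 for the cRIO, .5 for the DS PC.
CRIO_HOST = 2
DS_HOST = 5

print(frc_address(2252, CRIO_HOST))
print(frc_address(2252, DS_HOST))
```

Checking the PC's address against this pattern is a quick way to confirm that the DS application has actually taken over the NIC configuration.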
Based on the number of questions/issues I'm reading about this year's DS here on CD, it looks like the 2010 control system architecture might be a step backward from last year's Blue Box, despite its ESD-sensitivity :rolleyes:
Is anyone besides me having problems when running the Driver Station application on a PC (in this case, a laptop) other than the Classmate? The problem is that the cRIO and DS application seem to lose their connection after a random amount of time (anywhere from 10 to 30 seconds). We're not seeing this issue when using the actual Classmate. It would be helpful to be able to leave the Classmate at the school while doing off-site development work using our other cRIO.
I use it on a non-classmate all the time with no issues.
Have you tried looking at the cRIO console output to determine if the cRIO is rebooting or if it is just a networking or application drop out? Is it possible that your program is crashing? What language are you programming in?
What is the processor utilization of the PC you are running on? Is it saturated?
Based on the number of questions/issues I'm reading about this year's DS here on CD, it looks like the 2010 control system architecture might be a step backward from last year's Blue Box, despite its ESD-sensitivity :rolleyes:
It seems that most of the issues with the driver station are just getting the update installed. At least you can try again. Perhaps you don't recall that last year if a firmware update failed, the device was bricked and had to be sent in for recovery. How can you possibly step backward from that?
Greg McKaskle
07-02-2010, 07:31
It may be useful to connect a serial cable and see if anything on the console corresponds to the drops in communications.
As for the different LED patterns, here is what they mean.
The communications LED is an overall litmus test indicating that the DS is receiving some number of valid response packets from the robot in a given second. If that is on, the no code LED means that the communications task is sending back packets, but nothing in the robot code is providing updates to it.
The other lights under communications are approximately ping results. The first should mirror your link light for the DS NIC. The FMS indicates the FMS has sent control packets, and the others are a successful ping of that device.
So, if the Robot LED stays green, it pretty much rules out the low level networking stuff. It means that the DS can successfully ping the robot every few seconds. If it is going on/off, check to see that the cRIO is not rebooting. Ten to fifteen seconds is all it takes. You can watch for the yellow LED on the cRIO, or watch the serial console.
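The LED description above amounts to a small decision tree. Here is a rough sketch of it, assuming each indicator can be read as a simple boolean; the function name and the result strings are made up for illustration and are not part of the DS software:

```python
def diagnose(comms_ok: bool, robot_code_ok: bool, robot_ping_ok: bool) -> str:
    """Map the three DS indicators to a likely cause (illustrative only)."""
    if comms_ok and robot_code_ok:
        # Valid response packets are arriving and robot code is updating them.
        return "healthy"
    if comms_ok:
        # Packets come back from the communications task, but the robot
        # code is not providing updates to it.
        return "robot-code-not-updating"
    if robot_ping_ok:
        # Ping works, so low-level networking is mostly ruled out.
        return "comms-task-or-ds-problem"
    # No ping: check the link, or whether the cRIO is rebooting.
    return "check-link-or-crio-reboot"


print(diagnose(comms_ok=False, robot_code_ok=False, robot_ping_ok=True))
```

The order of the checks mirrors the explanation: the communications LED is the overall test, and the ping light only matters once communications are down.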
Once again, if you don't figure it out soon, I'd be interested in the output of the serial console.
Also, you say this works successfully on the laptop DS. What about another PC? If this were being caused by the DS, it would more likely show up as an unresponsive screen or as system watchdogs.
Greg McKaskle
ayeckley
07-02-2010, 09:39
What is the processor utilization of the PC you are running on? Is it saturated?
That seems to be the problem. Driver Station.exe is periodically jumping (and holding) to 100% utilization of one CPU (of two), even with the cRIO disconnected. The processor is an Intel Core 2 T7250 running at 2.0GHz with 1.0GB of RAM. There doesn't appear to be much if any load-sharing between the two processors. I'll try a "lower power" PC with a single CPU (if I can find one) and see if that improves things.
It seems that most of the issues with the driver station are just getting the update installed. At least you can try again. Perhaps you don't recall that last year if a firmware update failed, the device was bricked and had to be sent in for recovery. How can you possibly step backward from that?
I don't mean to throw stones. I know everyone has the best of intentions. I don't have a whole lot of sympathy for users who are having problems because they didn't read the manuals. But I do have a lot of sympathy/empathy for folks who are reading the manuals, who are installing the updates, who are reading the forums, who are using the proper help channels and yet are still having problems. This year's bugs will no doubt get worked out; let's just hope that a new batch of bugs isn't introduced again next year while attempting to add functionality.
As always, thanks for your help. I'm also going to try the console to see if it yields any new info, but at this point the theory I'm going to test is that it's an issue caused by the CPU design of this laptop.
ayeckley
07-02-2010, 10:49
Update:
Since we are snowed-out from the school today, the most productive thing I could do was to connect to the cRIO RS-232 console and look for error messages. I've attached the dialog below per Greg's suggestion. I'm not seeing anything that would suggest any problems on the cRIO side (as mostly expected):
cRIO-907x Boot
Copyright 2007 National Instruments Corp.
Bootrom version: 2.4.7
Creation date: Jul 9 2007, 00:45:07
Press any key to stop auto-boot...
1
0
auto-booting...
boot device : tffs=0,0
unit number : 0
processor number : 0
host name : host
file name : /c/ni-rt/system/vxWorks
flags (f) : 0x8
Mounting tffs...
Attaching to TFFS... Datalight Reliance v3.00.1218T
VxWorks Edition for ppc603
Copyright (c) 2003-2007 Datalight, Inc.
done.
3572144
Starting at 0x100000...
ATA device not detected.
Adding 8459 symbols for standalone.
Mounting onboard storage...
Reliance File System Driver
Datalight Reliance v3.2.2 Build 1376BV
VxWorks Edition for ppc603
Copyright (c) 2003-2008 Datalight, Inc. All Rights Reserved Worldwide.
-> lvusEngine: PPC603 CPU detected...
CPU tick frequency: 33.000652 MHz [Using: 1000 MHz]
MAX system identification name: FRC-cRIO-2252
Initializing network...
Device 1 - MAC addr: 00:80:2F:11:42:32 - 10.22.52.2 /8 (primary - static)
Device 2 - MAC addr: 00:80:2F:11:42:33 - 192.168.0.3 /24 (static)
Loading LVRT...
* Loading EarlyStartupLibraries: tsengine
Time sync source: rtc now active
* Loading StartupDlls: NiRioRpc
* Loading StartupDlls: niorbs
* Loading StartupDlls: NiViSrvr
* Loading StartupDlls: nivissvc
NI-RIO Server 3.2 started successfully.
* Loading StartupDlls: nivision
* Loading StartupDlls: niserial
* Loading StartupDlls: FRC_FPGA
* Loading StartupDlls: FRC_NetworkCommunication
FRC_NetworkCommunication was compiled from SVN revision 2064
FPGA Hardware GUID: 0xAD9A5591CC64E4DF756D77D1B57A549E
FPGA Software GUID: 0xAD9A5591CC64E4DF756D77D1B57A549E
FPGA Hardware Version: 2010
FPGA Software Version: 2010
FPGA Hardware Revision: 1.3.0
FPGA Software Revision: 1.3.0
Welcome to LabVIEW Real-Time 8.6.1f2
NI-VISA Server 4.5 started successfully.
[At this point the DS will start losing its connection, with no further messages generated by the cRIO on the console.]
If the system watchdog were timing out, would we be getting the same "hard" reboot dialog from the console? Or is it some sort of soft-reboot?
Greg McKaskle
07-02-2010, 11:16
The console looks fine to me too.
The system watchdogs and even user watchdogs will primarily show up in the diagnostics screen with a count of how many have been tripped. This finally lets us see even small glitches where the WD would be fed shortly after a timeout.
If you have any development tools on the computer, you may have a tool called perfmon (performance monitor). It will let you view CPU usage and tons of other performance-related counters for a single process. It might be interesting to see what it looks like in the normal and broken states.
Also, there will shortly be an update to the DS which lowers the CPU need for most of the screens and ups the determinism of the control loop by getting rid of context switches within the control loop. If you are interested in testing this out, let me know your email contact info via PM.
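Once per-process CPU samples have been logged, spotting the "jumps to 100% and holds" pattern can be automated. This is only a sketch under stated assumptions: the samples are a hypothetical list of percentages (100 meaning one fully loaded core) taken at a fixed interval, and `saturated_runs` is an invented helper, not part of perfmon:

```python
from typing import List, Tuple


def saturated_runs(samples: List[float],
                   threshold: float = 95.0,
                   min_len: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs where usage stays at or
    above `threshold` for at least `min_len` consecutive samples."""
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current high-usage run began
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the log.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples) - 1))
    return runs


# Hypothetical one-second samples for the DS process.
log = [4, 5, 100, 100, 100, 100, 6, 99, 3]
print(saturated_runs(log))
```

Comparing the timestamps of these runs with the moments the communications LED drops would show whether the CPU spikes and the dropouts really line up.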
Another test you could do would be to go to the DS task in the task manager, right click and experiment with changes to the processor assignment and the priority.
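For reference, the processor assignment set in Task Manager corresponds to a per-process bitmask in which bit n allows CPU n. A small sketch of how such a mask is formed; `affinity_mask` is a hypothetical helper for illustration and does not change any process settings:

```python
from typing import Iterable


def affinity_mask(cpus: Iterable[int]) -> int:
    """Return a bitmask with bit n set for every CPU index n in `cpus`."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


# On a two-CPU machine: CPU 0 only, CPU 1 only, or both.
print(affinity_mask([0]), affinity_mask([1]), affinity_mask([0, 1]))
```

On a dual-core laptop the experiment would be to try each single-CPU mask in turn and see whether the dropouts change.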
Also, while there it might be interesting to see if a virus scan or some other task was responsible for the spike in CPU that seems to be starving the DS.
Greg McKaskle
Tom Line
07-02-2010, 13:44
How many startup processes do you have running on the PC? Is this a PC that kids have access to on a regular basis: i.e. has it been loaded full of non-robotics related garbage that is slowing it down?
I only ask because we've used a 1.4 GHz Celeron Dell Latitude 510 with 512 MB of RAM (now a 1.6 GHz Pentium M Latitude 600) with absolutely no issues performance-wise. The first thing I did was to wipe the computer to clean off all the fun programs (LimeWire, etc.) that tend to get installed.
Radical Pi
07-02-2010, 16:52
You talked about using a 2nd cRIO you have with that laptop in the first post. Does the same thing happen when you switch cRIOs?
ayeckley
07-02-2010, 22:58
Is this a PC that kids have access to on a regular basis: i.e. has it been loaded full of non-robotics related garbage that is slowing it down?
It's my work laptop, so it's full of work-related non-robotics garbage :)
For what it's worth, the CPU overhead is about 4% before I open the DS application, so I suspect that it's not other applications running in the background that are causing the problem. I have noticed in other situations (my desktop PCs, for example) that when Windows Update is running there is no evidence. In other words, I'll be like "gosh this PC is running slowly" and then the PC will be like "but the CPU is only loaded to 4%" and then I'll be like "what about network activity?" and the PC will be like "no, I'm not doing anything" and I'll be like "huh." and then the PC will be like "oh, I've just downloaded an update, would you like to install it now?" and then I'm like "dangit!". [To use the parlance of the day.] I've got WU turned off on this laptop for exactly this reason.
It's my work laptop, so it's full of work-related non-robotics garbage :)
For what it's worth, the CPU overhead is about 4% before I open the DS application, so I suspect that it's not other applications running in the background that are causing the problem. I have noticed in other situations (my desktop PCs, for example) that when Windows Update is running there is no evidence. In other words, I'll be like "gosh this PC is running slowly" and then the PC will be like "but the CPU is only loaded to 4%" and then I'll be like "what about network activity?" and the PC will be like "no, I'm not doing anything" and I'll be like "huh." and then the PC will be like "oh, I've just downloaded an update, would you like to install it now?" and then I'm like "dangit!". [To use the parlance of the day.] I've got WU turned off on this laptop for exactly this reason.
Do you see any difference in processor utilization based on which tab on the DS is selected?
That version of the DS updates some UI components more often than it should... I'm not sure of the implementation of those updates, but it's possible that the machine's graphics hardware (or lack thereof) could be causing the big performance decrease. Just a thought.
I have seen these exact same symptoms (No Code, Watchdog Timeout, No Communications cycling every few seconds) using the Classmate DS after it has been suspended and restored. I tried exiting the software and suspending from the Windows login screen and the same thing happened.
Restarting Windows without rebooting the cRIO restored normal operation. I'll try to re-create the situation and look at processor loading later.
This could easily happen before a match but there is no indication of a problem until it's too late to reboot.
Hopefully the update will help.
Greg McKaskle
09-02-2010, 08:25
I have seen these exact same symptoms, ( No
Restarting Windows without rebooting the cRIO restored normal operation. I'll try to re-create the situation and look at processor loading later.
Hopefully the update will help.
Can you verify the version of the DS you are seeing this issue on? It is on the Diagnostics tab.
Greg McKaskle