Hundreds of "Watchdog Expiration: System X" Errors


lollypop2020
02-02-2010, 19:52
A few days ago, our cRIO started reporting large numbers of "Watchdog Expiration: System X, User 0" errors whenever we enable the robot (it doesn't happen when disabled). It averages about 4 or 5 additional system watchdog expirations per second, and if left on long enough, the counter climbs into the thousands. I have tried everything I can think of to fix it: I've re-imaged and re-updated everything (I'm 100% positive all the updates worked) and tried unchanged new project code. It isn't the wireless (I've tried an Ethernet cable directly from the DS to the cRIO). Does anyone know what causes this or how to fix it? The robot still works, and we can use the camera and motors, but it is slightly noticeable that the Jaguars stop communicating very briefly and then start again.

Tom Line
02-02-2010, 20:56
Please use the search function. You will find that NI is already working on resolving this - it is not your fault. It is a bug in the current software.

Greg McKaskle
02-02-2010, 21:28
Can you describe the situations that provoke this? What processes are running? What tab is frontmost on the DS? Is the dashboard also visible? Is Vision displaying in the dashboard? What account was logged in?

Greg McKaskle

lollypop2020
02-02-2010, 21:53
Tom Line wrote:
Please use the search function. You will find that NI is already working on resolving this - it is not your fault. It is a bug in the current software.

I did search first, but most of the threads I found were for people getting a few expirations here and there, not the hundreds I am seeing.

Greg McKaskle wrote:
Can you describe the situations that provoke this? What processes are running? What tab is frontmost on the DS? Is the dashboard also visible? Is Vision displaying in the dashboard? What account was logged in?

It happens whenever the robot is enabled. I even tried taking all code out of the teleop enabled state, but the expirations still happen. I don't have the DS with me at the moment, but I can find out more about what processes are running tomorrow afternoon. The DS and Dashboard work fine, as does vision. I honestly wouldn't have known anything was going on were it not for the hundreds of system watchdog expiration errors in the diagnostic tab and the periodic flickering of the RSL and Jaguar LEDs. I was logged into the driver account only.

BLProgram2010
03-02-2010, 20:38
We're having the exact same problem. Every time we enable the robot, there is a 'watchdog not fed' message and the robot won't respond to the joysticks. Please help!!!!!

Greg McKaskle
03-02-2010, 21:04
Watchdog not fed messages will normally indicate that the code is not feeding the watchdog. This could be because there is a loop or slow operation in the Teleop or Autonomous Iterative method.

Does this happen with teleop, auto, both?

Greg McKaskle
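
To make the "slow operation" case concrete, here is a minimal toy model of a deadline watchdog. It is only a sketch, not the WPILib implementation: the `Watchdog` class, `count_expirations`, and the 100 ms timeout are all hypothetical names and values chosen for illustration. Each loop pass feeds the watchdog once, and a pass that takes longer than the timeout counts as one expiration.

```python
class Watchdog:
    """Toy deadline watchdog (hypothetical, not the WPILib class)."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_fed_ms = 0
        self.expirations = 0

    def feed(self, now_ms):
        # If the feed arrives later than the timeout, the watchdog
        # would already have expired once before this feed.
        if now_ms - self.last_fed_ms > self.timeout_ms:
            self.expirations += 1
        self.last_fed_ms = now_ms


def count_expirations(pass_durations_ms, timeout_ms=100):
    """Simulate an iterative loop that feeds once at the end of each pass.

    pass_durations_ms: how long each pass of the loop took, in ms.
    timeout_ms: assumed watchdog timeout (illustrative value only).
    """
    dog = Watchdog(timeout_ms)
    now_ms = 0
    for duration in pass_durations_ms:
        now_ms += duration
        dog.feed(now_ms)
    return dog.expirations


# Three quick passes and one slow 500 ms pass.
print(count_expirations([20, 20, 500, 20]))  # prints 1
```

A loop that always finishes quickly never trips the toy watchdog, while a single slow pass is enough to register an expiration, which is the pattern to look for in the Teleop or Autonomous code.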

lollypop2020
04-02-2010, 00:34
I used the totally unchanged new project code that was working perfectly a few days ago. I also tried taking all the code out of the teleop method, but the system watchdog still expired rapidly. This happened in both auto and teleop, but only when enabled.

I called NI today and spoke for quite a while with someone who was unable to figure out what was wrong. We went through all the troubleshooting steps, but she said that it was beyond her knowledge and that, short of talking to R&D, there was nothing I could do except wait to see whether the upcoming update fixes it.

Tom Line
04-02-2010, 02:57
FYI - we are also seeing this on a heavily modified project. Literally hundreds.

BLProgram2010
04-02-2010, 16:47
I'm having the exact same problem. Do you know if there is any way to fix it?

Tom Line
05-02-2010, 00:46
NI is looking into it. It's a bug, but so far it hasn't impacted how our robot actually functions. It's just annoying seeing error messages when we don't think we have errors :D

lollypop2020
06-02-2010, 23:51
I tried installing the DS and Dashboard software on a desktop computer to see if that would improve anything. Amazingly enough, our system watchdog expirations completely stopped. It's good that we have something to work with until the next update, but still rather frustrating that the Classmate doesn't work.

With the Classmate, I decided to start over from scratch. I got out the recovery flash drive and completely reinstalled Windows. I installed the LabVIEW update and installed/uninstalled the DS update until it worked. Even after doing all that, re-imaging the cRIO, and still using a 100% unchanged project, we are still getting hundreds of errors. It makes me inclined to think that something is actually wrong with our Classmate, but I'm not sure.

Greg McKaskle
07-02-2010, 07:49
Is it possible that you are fast user switching and running the DS in both driver and developer accounts?

If not, let me sound like an optometrist for a minute to see if we can discover what is causing this.

Log into the classmate DS as developer. Note that means booting with shift key down or exiting the DS in driver first. Run the driver station from the Start menu. Open the task manager from a right click on the task bar.

If the system watchdogs are occurring, what is the CPU load? Does this get better or worse if you plug in the stop button? Does it get better or worse on a particular DS tab? If you click the button on the dashboard video to halt the video, better or worse? Go ahead and close the DB, better or worse.

If one of these makes the situation significantly different, please post the results.

Greg McKaskle

Tom Line
07-02-2010, 11:54
We'll check that tomorrow, and also check it on another computer. We'll compare the Classmate driver account, the developer account, and then that other computer, and tell you if we get any significant differences.

lollypop2020
07-02-2010, 14:46
I am very sure that I haven't been fast user switching. I only ever use one account at a time and make sure the other one is logged off.

When I run the DS and Dashboard in developer, interestingly enough, I get significantly fewer system watchdog expirations, but still more than if I use a completely different computer. It goes down to about 3 or 4 every 5 seconds rather than the 5-6 per second I was getting in the driver account.

In task manager, when the DS is running, the CPU is always at 100%. The DS usually uses 50-60%, the Dashboard uses 10-20%, and something called mscorsvw.exe uses whatever is left. Occasionally another process will use a bit of CPU, but nothing unusually high.

Amazingly enough, after a few minutes, mscorsvw.exe went away and the watchdog expirations almost completely stopped. I still get one when switching tabs or windows, but that is much more acceptable than what I was getting before. However, this was with no vision and an e-stop button plugged in. When I turned vision back on and used the override for the stop button, the number of errors went way back up.

Greg McKaskle
07-02-2010, 22:47
That process is something related to Microsoft .NET compilation. I'm not sure why it is acting the way it is, but perhaps you can google it and figure out what to install or reconfigure.

Edit: I found a number of articles that indicate this is usually caused by a Visual Studio install. Several approaches seemed to correct it, including this one.

Kusok wrote:
Well, I don't know why you do all this insane stuff to stop the process. You can just go to Control Panel/Administrative Tools/Services, stop the optimization service (which is the first in the list), and set it to manual start.

Greg McKaskle

brianc217
10-02-2010, 18:47
Does anyone know if this solution works?

Greg McKaskle
11-02-2010, 02:13
The service is easy to put back to auto if this doesn't improve things. I don't have the problem, so I'm just passing on the results of a search, hoping that it will be helpful. I'd also like confirmation.

Greg McKaskle

lollypop2020
15-02-2010, 12:17
The most recent update to LabVIEW and the DS happened to fix our problem. We barely get any watchdog expirations now. Thanks!

kavisiegel
16-02-2010, 11:02
I am having the exact same issue, even after the DS and LV updates and re-imaging the cRIO with the update as well.

We've been on this for HOURS, maybe 8 in total, because our compressor kept cutting on and off every time we got an error. We blew a fuse and deemed the whole thing unhealthy for the robot. We were getting an average of about 1.3 per second.

We made the odd discovery that it stops when the sidecar is unplugged. You can't do much without a sidecar, but maybe that helps the NI guys pinpoint the issue...

Greg McKaskle
17-02-2010, 05:55
Since the issues we knew of and understood were corrected in the latest update, please give a thorough description of any error messages, DS indicators, and errors displayed in the diagnostics tab.

Greg McKaskle

Bongle
18-02-2010, 08:07
We were getting brief Watchdog errors last night. With the robot sitting, doing nothing, we'd get spikes and Jaguars flicking to neutral.

We did some packet-sniffing, and it looks like each "spell" of no signal corresponds to the classmate taking too long to send a packet.

Typically, packets are sent from the classmate and from the robot at 50 Hz. During each spell of watchdog, we saw that the robot would continue sending packets at a steady rate (once every 20 ms), but the classmate would hiccup. In the 3 instances we recorded, the classmate's hiccup packet was sent 100, 150, and 200 ms after the one that preceded it, rather than the 20 ms we expect.

Here's our setup:
Classmate:
-Wired to router
-Running just the "bottom half" of the driver station (with the diagnostics, setup, IO, etc tabs). The camera feed and other user dashboard code was running on a different PC to minimize load on the classmate.
-This was observed in both developer and driver accounts
-Running most recent driver station

Robot:
-Communicating wirelessly with router about 6ft away.
-Battery was reading 12.6V
-No motors running
-Main watchdog-feeding thread is in a fast loop that just feeds the watchdog. No camera image analysis, no long computations, just checking joystick values and setting motors
-Two PID loops were running. One at 1000 Hz, one at 100 Hz, though the 1000 Hz one was probably Disable()-d at the time.
-PCVideoServer was running on its separate thread, sending data.
-Running the most recent WPILib build (from early Feb)
-Running v20 cRIO for C++

Last night the robot was fairly well behaved. We probably only saw about 10 of these hiccups all night, but on Monday night it was awful, and on Tuesday night it was apparently equally bad.

Hope this helps, and hope I'm not duplicating a post by a 2702 comrade in another thread.
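
For anyone who wants to repeat the packet-timing check on their own capture, a minimal sketch of the gap analysis might look like the following. It assumes you have already exported the send times of the driver station packets as a list of millisecond timestamps; `find_hiccups`, the 10 ms tolerance, and the sample capture are all made up for illustration. Only the 20 ms period and the 100/150/200 ms gap sizes come from the description above.

```python
def find_hiccups(timestamps_ms, period_ms=20, tolerance_ms=10):
    """Return (packet_index, gap_ms) for every late packet.

    timestamps_ms: send times of consecutive packets, in ms.
    period_ms: expected spacing between packets (20 ms at 50 Hz).
    tolerance_ms: assumed allowance for normal jitter (illustrative).
    """
    hiccups = []
    for i in range(1, len(timestamps_ms)):
        gap = timestamps_ms[i] - timestamps_ms[i - 1]
        if gap > period_ms + tolerance_ms:
            hiccups.append((i, gap))
    return hiccups


# Invented capture: mostly 20 ms spacing with three late packets.
capture = [0, 20, 40, 140, 160, 180, 330, 350, 550]
for index, gap in find_hiccups(capture):
    print(f"packet {index} arrived {gap} ms after the previous one")
```

Running the same check on the robot-side timestamps should come back empty if the robot really is sending steadily, which would point at the sending computer rather than the network.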

Racer26
18-02-2010, 09:52
This seems a lot like what we experienced last night. Our compressor would flick off while not charged, and the solenoids would all get put back to their "off" state, which caused things to move when nobody was touching the controls. Also, the top half of the DS (the dashboard part) isn't doing anything. I assume we have to put some sort of call to WPILib in our C++ code for it to start working?