Watchdog!?!?!?!

Ok, so about 3-4 times during the match at Traverse City, MI we got this error on the driver station computer (the little blue box). Usually it says Disabled or Enabled, but those few times it said WATCHDOG. WHAT THE HECK IS WATCHDOG!?!?!?! We had to sit there the whole match until we learned how to do the remote reset from the computer during the match, the last time it happened. So anyone have ANY info on this? I have no idea what it is, and our programmer doesn't either. Help please! Thanks

I am not an expert on the watchdog, but what I do know is that you have to feed it, and feed it constantly. If the watchdog is not fed, the system is interrupted and you can’t do anything. It’s meant to catch never-ending while loops and “hangs”.

I am probably half wrong, so correct me. =] This is my learning experience, too.

Nope, you got it right.

As Keehun indicated, the watchdog is a safety feature that fail-safes the robot by shutting down all actuators when it isn’t happy. Keep it happy by feeding it at regular intervals. This protects you when code hangs.
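The feed-or-starve behavior is easy to picture with a toy model. This is plain Python for illustration only, not actual robot code; the class, method names, and the 100 ms expiration are assumptions, not the real WPILib API:

```python
import time

class Watchdog:
    """Toy model of a watchdog timer. If feed() isn't called within
    `expiration` seconds, is_alive() reports False -- on the real robot,
    that's the point where all outputs get shut down."""
    def __init__(self, expiration):
        self.expiration = expiration
        self.last_fed = time.monotonic()

    def feed(self):
        self.last_fed = time.monotonic()

    def is_alive(self):
        return (time.monotonic() - self.last_fed) < self.expiration

wd = Watchdog(expiration=0.1)   # 100 ms, roughly the figure people quote
wd.feed()
assert wd.is_alive()            # fed recently -> happy
time.sleep(0.15)                # simulate a hung or tight loop that never feeds
assert not wd.is_alive()        # starved -> outputs would be disabled
```

The point is that the robot code has to actively call the feed, on time, every time; the watchdog itself never waits for you.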

One common cause of watchdog timeouts is putting slow camera code in your fast drive code.

I have seen a similar issue on Team 2022’s robot, where their auton would cause the watchdog message. This was caused by their LabVIEW implementation tight-looping, which prevented the watchdog from being fed. Basically, your program should “feed” the watchdog periodically, so that if your program does something bad and “hangs”, it will fail to feed the watchdog, causing a timeout which will halt your motors. Are you using LabVIEW or WindRiver?

We had several code issues at GSR and it didn’t help that we traveled without a programmer. Our autonomous mode prevented the watchdog from getting fed, which basically resulted in our robot being disabled the rest of the match. It took us forever to find the error since we had no programmer, so we were stationary for 6 out of our 8 matches…

We had the same problem after getting our replacement driver station in KC. The robot was working one second, then not the next. :frowning: Our head programmer was able to fix it eventually, though. It seemed like as soon as we realized this was an issue, it wasn’t the worst thing to fix; it just stunk that it had to flip out before we realized. Even though it counted against us, the FIRST people were helpful in explaining the problem :slight_smile:

Calling printf (in C++, or any other print or file-writing call) too frequently also causes the watchdog not to be fed in time. I suggest that any downloaded competition code have only a few infrequent print statements, if any.

I think the Watchdog has to be fed about every 100 ms, otherwise it will time out.

OK, just to be proactive (I have not run into this problem with our own robot or any other that I’ve messed with, but then, Murphy’s law and such):

If you take out all the delays and feeds in auto mode, replace them with your own way of delaying, AND place a parallel 100 ms feed loop, will everything be OK?

I’m not the programmer, and neither is the original poster, so please bear with our ignorance. Our programming mentor has about 2 months worth of experience with LabView.

When you all describe “feeding the watchdog”, what actually does that? From your description, I assume it’s some built-in, behind-the-scenes function of the code that just happens without any special “instruction” from our programmer. And if the code is too busy or gets stuck in a “while” loop, it doesn’t feed the watchdog.

What would make that happen only intermittently? It seemed to happen to us at one particular setup location. It isn’t camera related, because we don’t use it. We use 4 CIMs/4 Jaguars, a gyro, two limit switches, and 3 microswitches for setting autonomous patterns. Two of the match failures occurred at the start of the match, resulting in no autonomous motion, and nothing after it either (until we learned how to reset the cRIO from the DS (thanks 494)). One of our failures occurred after a successful autonomous. I don’t know for sure if we got the “watchdog” message that time, but one side of the drive train (Jaguar/CIM) didn’t work after autonomous. After the match, we found no problems and the problem didn’t repeat.

We would like to think it was field/system related, but we really don’t know. If it’s a robot problem (short, programming issue, loose connection, improper connection, etc.) we would really like to know. Intermittent, random stuff is scary and frustrating, and we would like to do whatever we can to fix it (if it’s something under our own control).

Is there a credible scenario where static electricity could be involved? Doesn’t seem like it to me since it happened at the start of the match. What about startup sequencing? I’ve read where some teams had communication problems when they powered up the robot before the DS. We generally powered up the DS first, but probably not always. I wouldn’t think that would necessarily have anything to do with watchdog.

We generally like the new control system and LabView, but are anxious to find out where the bugs are hiding out and get rid of them.

Thanks for all your help so far in this thread.

I don’t do programming, but I am pretty sure that when Simbotics, ThunderChickens, and Bomb Squad helped us, they found that we had our code in an infinite loop among other problems.

I know that for two qualifying matches, we would put autonomous in and then have no control of the drivebase for the rest of the match. However, when we tried to recreate this in the pit with a tether, we couldn’t.

The exact same thing happened to us, except it took us 6 matches to fix it. One of the mentors from 1831 (Chris) found an infinite loop in our drive code. When we took that out it worked.

Did your drive shut down before or after it moved in autonomous? Did the failures occur in your first two matches or was it intermittent? Did the autonomous work in testing before you hooked up to the field system? Did you get the “watchdog” message?

It doesn’t make sense to me that code would generate intermittent hang ups. Our system worked fine before the failure, and then again after rebooting. That implies something outside the code is influencing the system.

As far as I know it is not possible for the field to cause a Watchdog error.
EDIT: It sounds like, based on the document that StephenB linked to, that communication problems could potentially cause the system Watchdog to time out.

Depending on the structure of your code it may be possible that one specific case somewhere or one value for a sensor or variable causes a hang that times out the User Watchdog. The robot gets cycled through a specific set of modes on the competition field that may not be the same as what is happening when you are testing on the practice field.

There has been quite a bit of misinformation posted so far about this topic so I thought I’d try to clear it up: http://decibel.ni.com/content/docs/DOC-2957

Main thing is, there are two watchdogs. One you shouldn’t ever worry about, and the other you should only worry about if you want to (it is optional and configurable)

Check out the doc, let me know if I can clear anything up.

We only experienced problems when on the field. Tethered in the pits we had autonomous; on the field we had no autonomous, and no teleoperated control except our manipulator. Thank you Simbotics for finding our problem.

Don’t worry, I was the main programmer and that’s about what I had, too.

Yes. There is a WPI library block which takes a boolean(?) input, which I think is tied to some low-level VxWorks call. When VxWorks doesn’t see it, it probably stops the .rtexe and decides to tell the DS that the Watchdog has sadly starved to death (figuratively).

I highly doubt it is anything but the code…

This, I highly doubt, especially if it was only your robot with this problem.

What I think is possible is that your battery was low (6~7 volts) and the cRIO had problems. That “symptom” would look different, though, so I doubt it.

This would be more of an issue with “No Comms” error message, which is worse.

If you compete in another regional, maybe you can upload your code and we can look it over for any unending loops?

Good luck!
Keehun
Team 2502

The watchdogs are actually implemented on the FPGA. So when the cRIO gets a new TCP packet it feeds the FPGA system watchdog. And if you enable the user watchdog and call the feed subVI… you are feeding the FPGA section for that watchdog. Again, this is explained in more detail at: Archived: [FRC 2014] An Overview of FRC Watchdogs - NI Community
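The two-watchdog arrangement described above can be sketched as a toy model: one watchdog fed by incoming Driver Station packets, an optional one fed by team code, and outputs live only while both are happy. Plain Python for illustration only; the class, method names, and expiration values are all made up, not the real FPGA numbers:

```python
class TwoWatchdogs:
    """Toy model of the two-watchdog setup: a system watchdog fed by
    every packet from the Driver Station, and an optional user watchdog
    fed by team code. Outputs are live only while BOTH are happy."""
    def __init__(self, sys_expiration=0.125, user_expiration=0.1,
                 user_enabled=True):
        self.sys_expiration = sys_expiration
        self.user_expiration = user_expiration
        self.user_enabled = user_enabled
        self.sys_last = 0.0
        self.user_last = 0.0

    def packet_received(self, t):   # a DS packet feeds the system watchdog
        self.sys_last = t

    def user_feed(self, t):         # your code feeds the user watchdog
        self.user_last = t

    def outputs_enabled(self, t):
        sys_ok = (t - self.sys_last) < self.sys_expiration
        user_ok = (not self.user_enabled) or \
                  (t - self.user_last) < self.user_expiration
        return sys_ok and user_ok

wd = TwoWatchdogs()
wd.packet_received(0.0)
wd.user_feed(0.0)
assert wd.outputs_enabled(0.05)      # both fed recently -> outputs live
assert not wd.outputs_enabled(0.2)   # user code hung -> outputs cut
wd.user_feed(0.2)
assert not wd.outputs_enabled(0.25)  # comms lost -> still cut, fed or not
```

Note how the last case matches StephenB’s edit above: even perfect user code can’t keep outputs alive if DS packets stop arriving and the system watchdog starves.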

I highly doubt it is anything but the code…
This, I highly doubt, especially if it was only your robot with this problem.

Yes, exactly. If you, say, set up your user watchdog to have a 0.5 s timeout and you only feed the watchdog every 1 s, then your motors will cut out for half of every second.
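Working those numbers out explicitly (toy Python arithmetic, not robot code): sample one 1 s feed cycle in 10 ms steps and count how long the outputs stay enabled.

```python
# A 0.5 s user-watchdog expiration, fed only once per second.
expiration = 0.5
feed_period = 1.0
step = 0.01
steps = int(feed_period / step)

# Outputs are enabled while the time since the last feed is under the
# expiration; after that, they're cut until the next feed.
enabled = sum(1 for i in range(steps) if i * step < expiration)
fraction = enabled / steps
print(fraction)   # 0.5 -> motors are live for half of every second
```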

The .rtexe doesn’t get stopped. Your code keeps running just like before, but the FPGA gets set to a fail-safe state where no outputs are usable. You can still read in anything and run like normal, but when either watchdog is tripped, no outputs are available.

This happened to us (Team 4) about 3 years ago. Our lead programmer was using interrupts while trying to program the gear tooth sensor we had on our robot. This did not work out too well, because the code had to finish within 23.6 milliseconds (if I remember correctly), but the interrupts made the code exceed that time limit. This is what tripped the watchdog and caused a code error. You might want to check any of your code that uses interrupts and make sure that no interrupts are interrupting interrupts (boy, does that sound confusing). Hope I helped in any way.

We did not experience the problem when tethered or when working with our own wireless system. Once connected to the field management system we experienced the same problem (“WATCHDOG” appearing in the Driver Station).

Check the wait timers in all your loops. Make sure you have wait timers in all loops!
Don’t make them ALL the same value.
Feed the watchdog in the fastest loop.
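The advice above (a wait timer in every loop, with the feed in the fastest one) can be sketched in text form. Plain Python standing in for LabVIEW here; the class and the timings are made up for illustration:

```python
import time

class ToyWatchdog:
    """Minimal stand-in for the user watchdog: alive only if fed within
    `expiration` seconds."""
    def __init__(self, expiration):
        self.expiration = expiration
        self.last_fed = time.monotonic()
    def feed(self):
        self.last_fed = time.monotonic()
    def is_alive(self):
        return (time.monotonic() - self.last_fed) < self.expiration

wd = ToyWatchdog(expiration=0.1)   # ~100 ms timeout
starved = False
for _ in range(10):                # the fastest loop: drive code
    # ... read joysticks, set motor outputs ...
    wd.feed()                      # feed here, in the fastest loop
    time.sleep(0.02)               # 20 ms wait timer, well under 100 ms
    if not wd.is_alive():
        starved = True
assert not starved                 # fed every 20 ms -> never times out
```

A slower loop (camera code, say, at 200 ms) gets its own longer wait timer; putting the feed in the fast loop means the watchdog trips quickly if that loop ever stalls, instead of being kept alive by an unrelated slow loop.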