Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   General Forum (http://www.chiefdelphi.com/forums/forumdisplay.php?f=16)
-   -   Watchdog!?!?!?! (http://www.chiefdelphi.com/forums/showthread.php?t=75361)

NC GEARS 01-03-2009 22:53

Watchdog!?!?!?!
 
Ok, so about 3-4 times during matches at Traverse City, MI we got this error on the driver station computer (the little blue box). Usually it says Disabled or Enabled, but those few times it said WATCHDOG. WHAT THE HECK IS WATCHDOG!?!?!?! We had to sit there the whole match until, the last time it happened, we learned how to do the remote reset from the computer during the match. So does anyone have ANY info on this? I have no idea what it is, and our programmer doesn't either. Help please! Thanks

keehun 01-03-2009 22:57

Re: Watchdog!?!?!?!
 
I am not an expert on the watchdog, but what I do know is that you have to feed it, and feed it constantly. If the watchdog is not fed, the system is interrupted and you can't do anything. It is meant to catch never-ending while loops and other "hangs".

I am probably half wrong, so correct me. =] This is my learning experience, too.

EricVanWyk 01-03-2009 23:02

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by keehun (Post 829955)
I am probably half wrong, so correct me. =] This is my learning experience, too.

Nope, you got it right.

As keehun indicated, the watchdog is a safety feature that fail-safes the robot by shutting down all actuators when it isn't happy. Keep it happy by feeding it at regular intervals. This protects you when code hangs.

One common cause of watchdog trips is putting slow camera code in your fast drive code.
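
A minimal sketch of the behavior described above, not the real WPILib or FPGA implementation: a software stand-in for a watchdog that allows outputs only while it has been fed within its timeout. The class name `SimWatchdog`, the helper `GateMotorOutput`, the caller-supplied millisecond clock, and the 100 ms timeout used in the note are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a watchdog. Time is supplied by the caller in
// milliseconds so the behavior is easy to reason about and test.
class SimWatchdog {
public:
    explicit SimWatchdog(std::int64_t timeoutMs, std::int64_t nowMs = 0)
        : timeoutMs_(timeoutMs), lastFedMs_(nowMs) {
        assert(timeoutMs > 0);
    }

    // "Feeding" just records the time of the most recent check-in.
    void Feed(std::int64_t nowMs) { lastFedMs_ = nowMs; }

    // Outputs are allowed only if the last feed is recent enough.
    bool OutputsEnabled(std::int64_t nowMs) const {
        return nowMs - lastFedMs_ <= timeoutMs_;
    }

private:
    std::int64_t timeoutMs_;
    std::int64_t lastFedMs_;
};

// Motor command as it would reach the actuator: forced to zero when starved.
double GateMotorOutput(const SimWatchdog& dog, std::int64_t nowMs, double command) {
    return dog.OutputsEnabled(nowMs) ? command : 0.0;
}
```

With a 100 ms timeout, a feed at t = 50 ms keeps outputs alive through t = 150 ms; a command issued at t = 151 ms would be gated to zero until the next feed.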

ozrien 01-03-2009 23:04

Re: Watchdog!?!?!?!
 
I have seen a similar issue on Team 2022's robot, where their autonomous code would cause the watchdog message. It was caused by a tight loop in their LabVIEW implementation, which prevented the watchdog from being fed. Basically, your program should "feed" the watchdog periodically, so that if your program does something bad and "hangs", it fails to feed the watchdog, causing a timeout that halts your motors. Are you using LabVIEW or Wind River?

NoahTheBoa 01-03-2009 23:12

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by ozrien (Post 829964)
I have seen a similar issue on Team 2022's robot, where their autonomous code would cause the watchdog message. It was caused by a tight loop in their LabVIEW implementation, which prevented the watchdog from being fed. Basically, your program should "feed" the watchdog periodically, so that if your program does something bad and "hangs", it fails to feed the watchdog, causing a timeout that halts your motors. Are you using LabVIEW or Wind River?

We had several code issues at GSR, and it didn't help that we traveled without a programmer. Our autonomous mode prevented the watchdog from getting fed, which basically resulted in our robot being disabled for the rest of the match. It took us forever to find the error since we had no programmer, so we were stationary for 6 out of our 8 matches...

Hanna2325 01-03-2009 23:13

Re: Watchdog!?!?!?!
 
We had the same problem after getting our replacement driver station in KC. The robot was working one second, then not the next. :( Our head programmer was able to fix it eventually. Once we realized this was the issue, it wasn't the worst thing to fix; it just stunk that the robot had to flip out before we realized. Even though it still counted against us, the FIRST people were helpful in explaining the problem :)

The Lucas 01-03-2009 23:18

Re: Watchdog!?!?!?!
 
Using printf in C++ (or any other print or file-writing call) too frequently can also keep the watchdog from being fed in time. I suggest that any downloaded competition code have only a few infrequent print statements, if any.
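
One way to keep some diagnostic output without flooding the console is to print only every Nth pass through the loop. The helper below is a sketch under assumptions: `ThrottledLog`, `CountPrints`, and the every-50-passes interval are illustrative names and values, not part of WPILib.

```cpp
#include <cstdio>

// Hypothetical helper: lets a message through once every `interval` calls.
class ThrottledLog {
public:
    explicit ThrottledLog(unsigned interval)
        : interval_(interval == 0 ? 1 : interval), calls_(0) {}

    // Returns true when the message was actually printed.
    bool Print(const char* message) {
        bool emit = (calls_ % interval_ == 0);
        ++calls_;
        if (emit) {
            std::printf("%s\n", message);
        }
        return emit;
    }

private:
    unsigned interval_;
    unsigned calls_;
};

// Runs `iterations` passes of a pretend control loop and returns how many
// messages were really printed.
unsigned CountPrints(unsigned iterations, unsigned interval) {
    ThrottledLog log(interval);
    unsigned printed = 0;
    for (unsigned i = 0; i < iterations; ++i) {
        if (log.Print("drive loop alive")) {
            ++printed;
        }
    }
    return printed;
}
```

With an interval of 50, a loop that runs 200 passes prints on passes 0, 50, 100, and 150 only, so the slow I/O call is hit 4 times instead of 200.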

Caroline2399 01-03-2009 23:24

Re: Watchdog!?!?!?!
 
I think the Watchdog has to be fed about every 100 ms, otherwise it will time out.

Stuart 02-03-2009 00:00

Re: Watchdog!?!?!?!
 
OK, just to be proactive (I have not run into this problem with our own robot or any other that I've messed with, but then, Murphy's law and such):

If you take out all the delays and feeds in auto mode, replace them with your own way of delaying, AND place a parallel 100 ms feed loop, will everything be OK?

Wayne TenBrink 02-03-2009 00:39

Re: Watchdog!?!?!?!
 
I'm not the programmer, and neither is the original poster, so please bear with our ignorance. Our programming mentor has about 2 months worth of experience with LabView.

When you all describe "feeding the watchdog", what actually does that? From your description, I assume that it's some built-in, behind-the-scenes function of the code that just happens without any special "instruction" from our programmer. And if the code is too busy or gets stuck in a "while" loop, it doesn't feed the watchdog.

What would make that happen only intermittently? It seemed to happen to us at one particular setup location. It isn't camera related, because we don't use it. We use 4 CIMs/4 Jaguars, a gyro, two limit switches, and 3 microswitches for setting autonomous patterns. Two of the match failures occurred at the start of the match, resulting in no autonomous motion, and nothing after it either (until we learned how to reset the cRio from the DS (thanks 494)). One of our failures occurred after a successful autonomous. I don't know for sure if we got the "watchdog" message that time, but one side of the drive train (Jaguar/CIM) didn't work after autonomous. After the match, we found no problems and the problem didn't repeat.

We would like to think it was field/system related, but we really don't know. If it's a robot problem (short, programming issue, loose connection, improper connection, etc.) we would really like to know. Intermittent, random stuff is scary and frustrating, and we would like to do whatever we can to fix it (if it is something under our own control).

Is there a credible scenario where static electricity could be involved? Doesn't seem like it to me since it happened at the start of the match. What about startup sequencing? I've read where some teams had communication problems when they powered up the robot before the DS. We generally powered up the DS first, but probably not always. I wouldn't think that would necessarily have anything to do with watchdog.

We generally like the new control system and LabView, but are anxious to find out where the bugs are hiding out and get rid of them.

Thanks for all your help so far in this thread.

big1boom 02-03-2009 00:44

Re: Watchdog!?!?!?!
 
I don't do programming, but I am pretty sure that when Simbotics, ThunderChickens, and Bomb Squad helped us, they found that we had our code in an infinite loop among other problems.

I know that for two qualifying matches, we would put autonomous in and then have no control of drivebase for the rest of the match. However, when we tried to recreate this in the pit with a tether, we couldn't.

NoahTheBoa 02-03-2009 00:54

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by big1boom (Post 830027)
I don't do programming, but I am pretty sure that when Simbotics, ThunderChickens, and Bomb Squad helped us, they found that we had our code in an infinite loop among other problems.

I know that for two qualifying matches, we would put autonomous in and then have no control of drivebase for the rest of the match. However, when we tried to recreate this in the pit with a tether, we couldn't.

The exact same thing happened to us, except it took us 6 matches to fix it. One of the mentors from 1831 (Chris) found an infinite loop in our drive code. When we took that out it worked.

Wayne TenBrink 02-03-2009 00:57

Re: Watchdog!?!?!?!
 
Did your drive shut down before or after it moved in autonomous? Did the failures occur in your first two matches or was it intermittent? Did the autonomous work in testing before you hooked up to the field system? Did you get the "watchdog" message?

It doesn't make sense to me that code would generate intermittent hang ups. Our system worked fine before the failure, and then again after rebooting. That implies something outside the code is influencing the system.

Vikesrock 02-03-2009 01:01

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Wayne TenBrink (Post 830035)
Did your drive shut down before or after it moved in autonomous? Did the failures occur in your first two matches or was it intermittent? Did the autonomous work in testing before you hooked up to the field system? Did you get the "watchdog" message?

It doesn't make sense to me that code would generate intermittent hang ups. Our system worked fine before the failure, and then again after rebooting. That implies something outside the code is influencing the system.

As far as I know it is not possible for the field to cause a Watchdog error.
EDIT: Based on the document that StephenB linked to, it sounds like communication problems could potentially cause the system Watchdog to time out.

Depending on the structure of your code it may be possible that one specific case somewhere or one value for a sensor or variable causes a hang that times out the User Watchdog. The robot gets cycled through a specific set of modes on the competition field that may not be the same as what is happening when you are testing on the practice field.
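
As an illustration of how a single sensor value can produce a hang, consider an autonomous turn that loops until a gyro reaches a target angle. If the gyro never gets there (unplugged, or the robot is blocked), the loop never exits. The sketch below bounds the wait with a poll budget. `TurnUntil`, the `readAngle` callback, the two simulated gyros, and the numbers are all hypothetical; a real robot would command motors and feed the watchdog inside the loop.

```cpp
#include <functional>

enum class TurnResult { Reached, GaveUp };

// Polls readAngle() until it reports at least targetDeg, but never more
// than maxPolls times, so a stuck sensor cannot hang the caller forever.
TurnResult TurnUntil(const std::function<double()>& readAngle,
                     double targetDeg, int maxPolls) {
    for (int poll = 0; poll < maxPolls; ++poll) {
        if (readAngle() >= targetDeg) {
            return TurnResult::Reached;
        }
        // Real code would drive the motors, feed the watchdog, and wait a
        // few milliseconds here.
    }
    return TurnResult::GaveUp;
}

// Simulated gyro that advances 5 degrees per reading.
struct TurningGyro {
    double angle = 0.0;
    double operator()() { angle += 5.0; return angle; }
};

// Simulated gyro that is stuck at zero, e.g. a loose connector.
struct StuckGyro {
    double operator()() const { return 0.0; }
};
```

With the stuck gyro, an unbounded `while` loop would spin forever; the bounded version gives up after `maxPolls` readings and lets the rest of the program carry on.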

StephenB 02-03-2009 01:02

Re: Watchdog!?!?!?!
 
There has been quite a bit of misinformation posted so far about this topic so I thought I'd try to clear it up: http://decibel.ni.com/content/docs/DOC-2957

Main thing is, there are two watchdogs. One you shouldn't ever worry about, and the other you should only worry about if you want to (it is optional and configurable)

Check out the doc, let me know if I can clear anything up.

big1boom 02-03-2009 01:03

Re: Watchdog!?!?!?!
 
We only experienced problems when on the field. Tethered in the pits we had autonomous; on the field we had no autonomous, and had no teleoperated except our manipulator. Thank you Simbotics for finding our problem.

keehun 02-03-2009 01:10

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Wayne TenBrink (Post 830024)
Our programming mentor has about 2 months worth of experience with LabView.

Don't worry, I was the main programmer and that's about what I had, too.

Quote:

Originally Posted by Wayne TenBrink (Post 830024)
When you all describe "feeding the watchdog", what actually does that? From your description, I assume that it's some built-in, behind-the-scenes function of the code that just happens without any special "instruction" from our programmer. And if the code is too busy or gets stuck in a "while" loop, it doesn't feed the watchdog.

Yes. There is a WPI library block which takes a boolean(?) input, which I think is tied to some low-level vxWorks call. When vxWorks doesn't see it, then it probably stops the .rtexe and decides to tell the DS that the Watchdog has sadly starved to death. (figuratively)

Quote:

Originally Posted by Wayne TenBrink (Post 830024)
What would make that happen only intermittently?

I highly doubt it is anything but the code..

Quote:

Originally Posted by Wayne TenBrink (Post 830024)
We would like to think it was field/system related, but we really don't know.

This, I highly doubt, especially if it was only your robot with this problem.

Quote:

Originally Posted by Wayne TenBrink (Post 830024)
Is there a credible scenario where static electricity could be involved?

What I think is possible is that your battery was low (6~7 volts) and the cRIO had problems. The "symptom" would be different, though, so I doubt it.

Quote:

Originally Posted by Wayne TenBrink (Post 830024)
What about startup sequencing? I've read where some teams had communication problems when they powered up the robot before the DS. We generally powered up the DS first, but probably not always. I wouldn't think that would necessarily have anything to do with watchdog.

This would be more of an issue with "No Comms" error message, which is worse.

Quote:

Originally Posted by Wayne TenBrink (Post 830024)
We generally like the new control system and LabView, but are anxious to find out where the bugs are hiding out and get rid of them.

If you compete in another regional, maybe you can upload your code and we can look it over for any unending loops?

Good luck!
Keehun
Team 2502

StephenB 02-03-2009 01:32

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by keehun (Post 830046)
Yes. There is a WPI library block which takes a boolean(?) input, which I think is tied to some low-level vxWorks call. When vxWorks doesn't see it, then it probably stops the .rtexe and decides to tell the DS that the Watchdog has sadly starved to death. (figuratively)

The watchdogs are actually implemented on the FPGA. So when the cRIO gets a new TCP packet it feeds the FPGA system watchdog. And if you enable the user watchdog and call the feed subVI... you are feeding the FPGA section for that watchdog. Again, this is explained in more detail at: http://decibel.ni.com/content/docs/DOC-2957


Quote:

I highly doubt it is anything but the code..
This, I highly doubt, especially if it was only your robot with this problem.

Yes, exactly. If you, say, set up your user watchdog with a 0.5 s timeout and only feed the watchdog every 1 s, then your motors will cut out for half of every second.


The .rtexe doesn't get stopped. Your code keeps running just like before, but the FPGA gets set to a fail safe state, where no outputs are usable. You can still read in anything and run like normal, but when either watchdog is tripped no outputs are available.
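
The arithmetic in the 0.5 s / 1 s example can be checked with a small simulation. This is a sketch, not the FPGA logic: it steps a millisecond clock, feeds on a fixed period, and counts the milliseconds during which a non-latching watchdog with the given timeout would hold outputs off. `StarvedMillis` and its parameters are names invented here, and `feedPeriodMs` is assumed to be positive.

```cpp
// Counts how many milliseconds in [0, durationMs) the outputs would be
// disabled, given a feed every feedPeriodMs and a timeout of timeoutMs.
// Assumes the watchdog re-enables outputs as soon as it is fed again.
long StarvedMillis(long durationMs, long feedPeriodMs, long timeoutMs) {
    long lastFedMs = 0;   // fed once at start-up
    long starved = 0;
    for (long now = 0; now < durationMs; ++now) {
        if (now % feedPeriodMs == 0) {
            lastFedMs = now;          // the feed call happens on this tick
        }
        if (now - lastFedMs >= timeoutMs) {
            ++starved;                // timeout has elapsed since last feed
        }
    }
    return starved;
}
```

`StarvedMillis(10000, 1000, 500)` comes out to 5000, i.e. half of a ten-second run, matching the "half of every second" figure. Feeding every 100 ms with the same 500 ms timeout gives 0.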

Arthur S 02-03-2009 02:21

Re: Watchdog!?!?!?!
 
This happened to us (Team 4) about 3 years ago. Our lead programmer was using interrupts while trying to program the gear tooth sensor we had on our robot. This did not work out too well, because the code had to finish within 23.6 milliseconds (if I remember correctly), and the interrupts made the code exceed that time limit. This is what tripped the watchdog and caused a code error. You might want to check any of your code that uses interrupts and make sure that no interrupts are interrupting interrupts (boy, does that sound confusing). Hope I helped in some way.

PinionTwister 02-03-2009 07:55

Re: Watchdog!?!?!?!
 
We did not experience the problem when tethered or when working with our own wireless system. Once connected to the field management system we experienced the same problem ("WATCHDOG" appearing in the Driver Station).

Check the wait timers in all your loops. Make sure you have wait timers in all loops!
Don't make them ALL the same value.
Feed the watchdog in the fastest loop.

professorX 02-03-2009 08:58

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Wayne TenBrink (Post 830024)
...
Two of the match failures occurred at the start of the match, resulting in no autonomous motion, and nothing after it either (until we learned how to reset the cRio from the DS (thanks 494)). One of our failures occurred after a successful autonomous. I don't know for sure if we got the "watchdog" message that time, but one side of the drive train (Jaguar/CIM) didn't work after autonomous. After the match, we found no problems and the problem didn't repeat.

Can you please share with us how to reset the cRIO from the DS?

I appreciate it.

Tom Line 02-03-2009 09:12

Re: Watchdog!?!?!?!
 
To try to clarify this (and get rid of the cute "feed" term):

Feeding the watchdog is nothing more than letting the watchdog code run once in a while. This tells the system that the robot is still "under control".

A VERY easy way to not allow the watchdog code to run is to put your camera code in the same loop as the drive code. The camera code is processor hungry and will run as often as it can - and not allow the watchdog to run.

As other people have said - infinite loops, looping with no wait statement, and a couple other items can cause the watchdog to kick off. We struggled with it earlier when we put our camera code in our drive code as an experiment. It was a good learning experience!

EricVanWyk 02-03-2009 09:44

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Stuart (Post 829993)
OK, just to be proactive (I have not run into this problem with our own robot or any other that I've messed with, but then, Murphy's law and such):

If you take out all the delays and feeds in auto mode, replace them with your own way of delaying, AND place a parallel 100 ms feed loop, will everything be OK?

The parallel loop effectively kills the usefulness of the watchdog - even if your important code hangs, this artificial loop will keep it fed. At that point, just disable the dog and be done with it (I don't recommend this).
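
To see why, here is a sketch of the two arrangements on a simulated clock. In both, the drive code feeds every 20 ms until it hangs at a chosen time; in the second, an independent feeder keeps feeding every 100 ms regardless. The function name `TripsAfterHang`, the `parallelFeeder` flag, and the timings are assumptions for illustration.

```cpp
// Simulates durationMs of robot time. The drive loop feeds the watchdog
// every 20 ms until it hangs at hangAtMs. If parallelFeeder is true, a
// separate 100 ms feed loop keeps running no matter what.
// Returns true if the watchdog ever times out.
bool TripsAfterHang(long durationMs, long hangAtMs, long timeoutMs,
                    bool parallelFeeder) {
    long lastFedMs = 0;
    for (long now = 0; now < durationMs; ++now) {
        bool driveLoopAlive = now < hangAtMs;
        if (driveLoopAlive && now % 20 == 0) {
            lastFedMs = now;               // feed from the real control code
        }
        if (parallelFeeder && now % 100 == 0) {
            lastFedMs = now;               // feed from the decoupled loop
        }
        if (now - lastFedMs > timeoutMs) {
            return true;                   // outputs would be shut off
        }
    }
    return false;
}
```

With a 200 ms timeout and a hang at t = 1000 ms, the first arrangement trips shortly after the hang, while the second never trips even though the drive code is dead, which is exactly the failure the watchdog was supposed to catch.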

Wayne TenBrink 02-03-2009 13:01

Re: Watchdog!?!?!?!
 
Regarding our robot, apparently we do have some camera code in the program even though we aren't using the camera. We had previously planned to use it, and our programmer had made some progress with tracking software. We later decided to focus on defensive maneuvers and enter teleoperation with a loaded hopper.

Any code for driving the motors (or servos?) is disabled, but apparently the code is still trying to collect images from the camera. We will remove all of that. I don't know if it is (was) in the same loop as the drive code.

We will also verify all the device setup (IP addressing, etc.).

To professorX: To reset the cRio from the DS, simultaneously press all three white buttons on the DS and hold for a second or so. A new menu will come up and you can "select" reset. It takes about 20 seconds, but that's a lot better than sitting idle for the entire match. I recommend you test/practice in the pit on tether.

Thanks to all for your help on this.

NC GEARS 02-03-2009 16:13

Re: Watchdog!?!?!?!
 
Yes, thanks to everyone for the help. Hopefully we can resolve our issues with your support. Thanks again to everyone.

Kingofl337 02-03-2009 16:24

Re: Watchdog!?!?!?!
 
I think the driver station should be updated so that it reads either "System Watch Dog" or "User Watch Dog". That would help teams who are not sure which is active.

I also wish they would change the DS so it says "Status: No Code" instead of "Battery: No Code". Because people keep asking, "Why does the battery need code?"

StephenB 02-03-2009 18:57

Re: Watchdog!?!?!?!
 
Quote:

I think the driver station should be updated so that it reads either "System Watch Dog" or "User Watch Dog". That would help teams who are not sure which is active.

Since the system watchdog is tripped entirely by communication loss, that event is already covered. 'No comms' means no network connection, and therefore the system watchdog has been tripped. 'Watchdog' means the user watchdog.

Mike Bennett 02-03-2009 20:22

Re: Watchdog!?!?!?!
 
Now that we have learned that we probably need a timer delay, the question has been raised as to "what is a reasonable delay?"
For our robot during autonomous mode the only thing that might be time critical is the reading of the gyro to test if we have turned the number of degrees that we want.

It would seem that we don't want to tie up the CPU any more than necessary.

Tom Line 03-03-2009 12:11

Re: Watchdog!?!?!?!
 
In the past, all examples I have seen suggested something on the order of 5-10 milliseconds.
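
A loop paced at roughly that rate might look like the sketch below. It uses standard C++ (`std::this_thread::sleep_until`) rather than any WPILib wait call, and `RunPacedLoop` with its `feed` and `step` callbacks is a placeholder for whatever the robot code actually does.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Runs `iterations` passes of a control loop at a fixed period. Each pass
// feeds the watchdog, does one step of work, then sleeps until the next
// tick so the loop never spins flat out. Returns the number of passes run.
int RunPacedLoop(int iterations, std::chrono::milliseconds period,
                 const std::function<void()>& feed,
                 const std::function<void()>& step) {
    auto nextTick = std::chrono::steady_clock::now();
    int passes = 0;
    for (int i = 0; i < iterations; ++i) {
        feed();
        step();
        ++passes;
        nextTick += period;
        std::this_thread::sleep_until(nextTick);
    }
    return passes;
}
```

Sleeping until an absolute tick, instead of sleeping a fixed amount after the work, keeps the period steady even when a pass takes a millisecond or two longer than usual.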

professorX 03-03-2009 19:28

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Wayne TenBrink (Post 830347)
Regarding our robot, apparently we do have some camera code in the program even though we aren't using the camera. We had previously planned to use it, and our programmer had made some progress with tracking software. We later decided to focus on defensive maneuvers and enter teleoperation with a loaded hopper.

Any code for driving the motors (or servos?) is disabled, but apparently the code is still trying to collect images from the camera. We will remove all of that. I don't know if it is (was) in the same loop as the drive code.

We will also verify all the device setup (IP addressing, etc.).

To professorX: To reset the cRio from the DS, simultaneously press all three white buttons on the DS and hold for a second or so. A new menu will come up and you can "select" reset. It takes about 20 seconds, but that's a lot better than sitting idle for the entire match. I recommend you test/practice in the pit on tether.

Thanks to all for your help on this.

Thank you for the help.

Alexa Stott 03-03-2009 19:46

Re: Watchdog!?!?!?!
 
A few notes on some of the things that have appeared in this thread:
1. I recommend disabling the watchdog entirely. This should be done by simply calling GetWatchdog().SetEnabled(false), NOT by "killing" the watchdog, which causes even more problems. I found this feature to be more of a pain than anything.
2. We have discovered that, if anything goes wrong in autonomous regarding code issues, you will be unable to retain control during teleop unless you reset, so resetting the system from the DS is critical. As such, we went through the process with our drive team. I highly recommend all programmers make sure that at least one driver knows how to do it. Wayne TenBrink posted instructions earlier on how that's done.

Vikesrock 03-03-2009 20:39

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Alexa Stott (Post 831325)
1. I recommend disabling the watchdog entirely. This should be done by simply calling GetWatchdog().SetEnabled(false), NOT by "killing" the watchdog, which causes even more problems. I found this feature to be more of a pain than anything.

Watchdogs exist for a reason. I would not make a blanket statement saying to disable the watchdog, because for a number of teams this may be a bad idea. I know that there are parts of our robot that could be broken if our code were to hang and not be disabled by the watchdog.

For some teams disabling the watchdog may be the easy solution for now, but the "better" solution is probably a properly fed watchdog.

kirtar 03-03-2009 20:47

Re: Watchdog!?!?!?!
 
In any case, according to our programmers, not feeding the watchdog was the most common mistake in autonomous programming at Buckeye.

Alexa Stott 03-03-2009 21:15

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Vikesrock (Post 831360)
Watchdogs exist for a reason. I would not make a blanket statement saying to disable the watchdog, because for a number of teams this may be a bad idea. I know that there are parts of our robot that could be broken if our code were to hang and not be disabled by the watchdog.

For some teams disabling the watchdog may be the easy solution for now, but the "better" solution is probably a properly fed watchdog.

We have not seen any problems arising from not using the watchdog, and we disabled it in all our code.

StephenB 03-03-2009 21:26

Re: Watchdog!?!?!?!
 
Either way works. The main reason the watchdog was put in was for debugging. If you hit a breakpoint, you might not want the robot to keep chugging along.

Vikesrock 03-03-2009 21:27

Re: Watchdog!?!?!?!
 
Quote:

Originally Posted by Alexa Stott (Post 831389)
We have not seen any problems arising from not using the watchdog, and we disabled it in all our code.

Which is why I said it may work for some teams. We will not be disabling our watchdog because if our code is actually hanging somewhere we want the outputs disabled so our robot doesn't break itself or incur a penalty.

BLAQmx 03-03-2009 21:44

Re: Watchdog!?!?!?!
 
NI FIRST Support: FRC Robot Modes / States Explained

This is some documentation on these modes/states. Feel free to submit questions and feedback.

ahudson 23-01-2010 08:44

Re: Watchdog!?!?!?!
 
Glad to know we are not the only team struggling with programming. When using driver station we are able to connect to robot and drive sporadically, but have a watchdog not fed message. Does it have anything to do with the DS being so sporadic?
Also we have an ERROR Code 44015.

Bongle 23-01-2010 08:52

Re: Watchdog!?!?!?!
 
We had a watchdog issue when our battery got too low.

In C++, the default code has something like
GetWatchdog().Feed();
at the top of the OperatorControl() loop. You need to make sure that that function gets called frequently.

In C++ (and probably all the other languages, since they are all the same library), the image analysis code seems to be a bit too slow to keep your watchdog fed if you put it in your main loop. I recommend you create a new task (aka thread) and do the image analysis there, putting the results in some variable accessible to the main OperatorControl() or Autonomous() loops.

That's what we're going to be doing today. I don't know the specifics of how we're going to do it yet, but I do know that the PIDController class spawns a new thread in its constructor, so we're going to be copying that.
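
A rough sketch of that arrangement in standard C++ follows. It does not use the WPILib/vxWorks task classes; `std::thread`, the `SlowImageAnalysis` stand-in, `RunWithVisionThread`, and the timings are assumptions chosen so the example runs on a desktop. The slow work runs on its own thread and publishes its latest result through an atomic, while the main loop stays fast and could feed the watchdog on every pass.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Stand-in for expensive camera processing: sleeps, then returns a fake
// target offset derived from the frame number.
int SlowImageAnalysis(int frame) {
    std::this_thread::sleep_for(std::chrono::milliseconds(30));
    return frame * 10;
}

// Runs the slow analysis on a worker thread while a fast loop polls the
// latest published result. Returns the final published result.
int RunWithVisionThread(int frames) {
    std::atomic<int> latestResult(0);
    std::atomic<bool> done(false);

    std::thread vision([&] {
        for (int frame = 1; frame <= frames; ++frame) {
            latestResult.store(SlowImageAnalysis(frame));
        }
        done.store(true);
    });

    int seen = 0;
    while (!done.load()) {
        // Fast loop: this is where the watchdog would be fed and the
        // drive outputs updated, using whatever result is available.
        seen = latestResult.load();
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    vision.join();
    seen = latestResult.load();   // pick up the last value after the join
    return seen;
}
```

The fast loop never waits on the image analysis; it only reads the most recent value, so a slow frame delays the vision result but not the feed.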


All times are GMT -5.