Hello, I am from Team 2941. Over a fairly short time frame I have converted our robot from LabVIEW to C++. For the most part the robot has worked solidly (that said, I have stored previous versions of the robot code that do not suffer such sporadic power failures).
Now to the point, beyond the background information: at some interval (~5 seconds over wireless; very random on wired, anywhere from ten seconds to a whole minute) the cRIO cuts power to all the systems, so the Victors and solenoids flash for a moment (no output is being sent through). I was originally programming with pthreads (I avoid the watchdog where I can), and have since removed all threads except for the robot drive.
In its current state, the diagnostic light blinks every 200 ms, for about 200 ms, even though we do not have the jumper hooked up across it. (Mind you, the only thread running now is the teleoperated main.) We have re-flashed the cRIO from Wind River, swapped out our distribution board and sidecar, and replaced our Victors with Jaguars.
So with that monster of a story, anyone have thoughts or answers?
I would try physically disabling every part on the robot and slowly re-enabling them until you find the bad component; it's usually a rogue Jaguar or something.
You can disable the user watchdog, but you can’t disable the system (communications) watchdog. Is there anything printed on the diagnostics tab of the driver station? Does it do it with the default code?
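(For what it's worth, turning the user watchdog off is a one-liner in the SimpleRobot template; a minimal sketch from memory, assuming the stock template:)

#include "WPILib.h"

class MyRobot : public SimpleRobot
{
public:
    void OperatorControl(void)
    {
        // The user watchdog can be switched off entirely...
        GetWatchdog().SetEnabled(false);

        while (IsOperatorControl())
        {
            // ...but the system (communications) watchdog will still kill
            // all outputs if Driver Station packets stop arriving.
            Wait(0.05);
        }
    }
};

START_ROBOT_CLASS(MyRobot);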
Are you attempting to run the camera? We experienced a similar fault during the season because our frame was pushing on the camera's reset button. If there are any references to the camera class at all in your code and the camera is not connected or otherwise unable to communicate, the camera task will consume enough resources to trip the system watchdog.
Just a minor clarification: I believe the camera classes will time out when the camera is down or not connected. If you have the camera code in the same thread that feeds the watchdog, your watchdog will be fed late, but the CPU load will not be high. If the camera code is in its own thread/task, the camera code will slowly return errors, with no impact on the rest of the robot.
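In other words, what matters is where any potentially blocking call sits relative to the watchdog feed. A rough sketch of the structure I mean, assuming the stock SimpleRobot template; the camera calls are from memory of the C++ AxisCamera class, so treat them as approximate:

#include "WPILib.h"

class CameraBot : public SimpleRobot
{
public:
    void OperatorControl(void)
    {
        // The camera class does its image acquisition in its own task,
        // so simply constructing it should not stall this loop.
        AxisCamera &camera = AxisCamera::GetInstance();

        GetWatchdog().SetEnabled(true);
        while (IsOperatorControl())
        {
            GetWatchdog().Feed();

            if (camera.IsFreshImage())
            {
                // Process the image here. A call that blocks waiting on a
                // dead camera in this spot is what delays the next Feed().
            }

            Wait(0.02);
        }
    }
};

START_ROBOT_CLASS(CameraBot);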
Greg,
Can you check with NI rep Andrew on this one? I believe we found that under certain conditions, loops within user programs caused a slight lag in program execution, which caused a watchdog issue. This was a common problem throughout last season regardless of which language was used. When the error occurred, all output was halted for a brief period; it manifests as a power loss but is not one. Symptoms included loss of the robot indicator, loss of motor drive, the compressor shutting off, the robot twitching, etc.
Sorry, a little more to add to this: we have swapped out the power distribution board, the sidecar, and the Jaguars, taken the camera code out, and probably something else I have forgotten.
This has produced nearly the same results. (But if it is a communication error, say the communication watchdog, is there a way around it?)
Just curious, and perhaps my reading of what you wrote above is not what you intended, but have you tried re-loading one of those “stored previous versions of the robot that do not suffer such sporadic power failures” to see if it still works?
while (TeleOperated()) {
    Wait(0.05);
}
A bit off-topic, but why are you running Teleop at only 20 Hz?
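For comparison, the default SimpleRobot template loops much faster and feeds the watchdog every pass. Roughly this, from memory; the channel numbers are just placeholders:

#include "WPILib.h"

class DefaultLoop : public SimpleRobot
{
    RobotDrive myRobot;   // drive motors on PWM 1 and 2 (placeholders)
    Joystick stick;       // joystick on port 1 (placeholder)

public:
    DefaultLoop(void) : myRobot(1, 2), stick(1) {}

    void OperatorControl(void)
    {
        GetWatchdog().SetEnabled(true);
        while (IsOperatorControl())
        {
            GetWatchdog().Feed();          // feed on every pass through the loop
            myRobot.ArcadeDrive(stick);    // push fresh joystick values to the speed controllers
            Wait(0.005);                   // 5 ms per iteration instead of 50 ms
        }
    }
};

START_ROBOT_CLASS(DefaultLoop);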
This did not appear to be the case on our robot. The only reference to the camera was in the (C++) constructor, so that camera images would be forwarded to our robot. Watching the output on NetConsole, errors were flooding in and appeared to have no timeout. Another symptom we encountered was the Driver Station clock freezing along with the robot, yet no watchdog error was shown in the bottom-left corner. This was also running with serial-based CAN.
Camera communication is over TCP, and both the reads and writes have timeouts of 0.5 or 1 second in LV. I can't say what they are in C++. This was my basis for saying that I didn't expect a large CPU spike when the camera wasn't plugged in or powered.
When I said slow errors, that is of course not very definitive. It is true that last year's system could choke a bit when a large stream of fast errors was being sent to the DS. I don't think those errors would come only from the camera, but if you had lots of other errors, the camera could contribute to making things even worse.
As for loops causing watchdog issues, it isn't the fact that it is a loop, but the time that the code takes to execute. If the code manipulates a solenoid, a common occurrence last year, the code would often have delays in it and would trip the watchdog each time it ran. Even worse, it would cause many teleop packets to be overwritten, since the teleop handler would take far longer than 20 ms to execute.
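To make that concrete, the problem pattern is Set(true), Wait(0.25), Set(false) in the middle of the handler. A non-blocking version looks roughly like this; I'm sketching it SimpleRobot-style, and the channel numbers and the 0.25 second pulse length are made up:

#include "WPILib.h"

class ShifterBot : public SimpleRobot
{
    Solenoid shifter;     // solenoid on channel 1 (placeholder)
    Joystick stick;       // joystick on port 1 (placeholder)
    Timer pulseTimer;

public:
    ShifterBot(void) : shifter(1), stick(1) {}

    void OperatorControl(void)
    {
        GetWatchdog().SetEnabled(true);
        while (IsOperatorControl())
        {
            GetWatchdog().Feed();

            // Start the pulse when the trigger is pressed...
            if (stick.GetTrigger() && !shifter.Get())
            {
                shifter.Set(true);
                pulseTimer.Reset();
                pulseTimer.Start();
            }
            // ...and end it a few loop iterations later, without ever
            // sitting in a Wait() that stalls the teleop handler.
            if (shifter.Get() && pulseTimer.Get() > 0.25)
            {
                shifter.Set(false);
                pulseTimer.Stop();
            }

            Wait(0.005);
        }
    }
};

START_ROBOT_CLASS(ShifterBot);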
A printout was inserted deep in the libraries last year to help determine when short disables were occurring. It kept a count of the user and system watchdog disables since boot. If those counts correspond with the robot “power failures”, then indeed the watchdog was disabling the robot due to a missed deadline. If not, then it was something else.
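If you can't get at that printout, you can approximate it from user code. As I recall, the Watchdog object exposes IsAlive() and IsSystemActive(), so counting their falling edges gives roughly the same tally. A rough sketch, with the method names from memory:

#include <stdio.h>
#include "WPILib.h"

class WatchdogCounter : public SimpleRobot
{
    int userTrips;
    int systemTrips;
    bool userWasAlive;
    bool systemWasActive;

public:
    WatchdogCounter(void) :
        userTrips(0), systemTrips(0),
        userWasAlive(true), systemWasActive(true) {}

    void OperatorControl(void)
    {
        GetWatchdog().SetEnabled(true);
        while (IsOperatorControl())
        {
            // Check before feeding so an expiration from the previous
            // iteration is still visible.
            bool alive  = GetWatchdog().IsAlive();
            bool active = GetWatchdog().IsSystemActive();
            bool newTrip = (userWasAlive && !alive) || (systemWasActive && !active);
            if (userWasAlive && !alive)     userTrips++;
            if (systemWasActive && !active) systemTrips++;
            if (newTrip)
                printf("user WD trips: %d  system WD trips: %d\n", userTrips, systemTrips);
            userWasAlive = alive;
            systemWasActive = active;

            GetWatchdog().Feed();
            Wait(0.05);
        }
    }
};

START_ROBOT_CLASS(WatchdogCounter);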