Jaguars intermittently disconnecting

Hi all,

I just updated our 2010 cRIO to the 2011 image and rebuilt the code against the 2011 libraries. After switching the Jaguar code to the new WPILib version, though, we are having some major issues. The Jaguars have been updated to v92, and we are using a serial bridge.

When the code is enabled, at random intervals one or more motors stop for a few seconds, then continue moving again. When a motor stops, the status LED on its Jaguar flashes yellow; when the motor is working, the LED is in the expected state. I’ve reduced the code down to just joystick input and output to the 4 Jaguars with no change. I haven’t seen it die when using only two Jaguars, though. Even when idle, the status LEDs on the Jaguars show the same symptoms. NetConsole gives no errors, MotorSafety has been disabled on all of the Jaguars, and everything works fine in BDC-COMM.

Any help would be appreciated.

We had this problem, but it disappeared after a good reboot and code redeploy. That seemed suspiciously easy, so we’ll keep an eye out for these symptoms. Since you’ve made several alterations to your code, I’m sure your cRIO has been through a couple of reboots, so clearly that isn’t a solution.

Perhaps you have some CAN wiring in a bad place coupling in some interference from motors? That thought just occurred to me, and while I was doing my reboot, the wires had been rearranged (temporary test bed), so I may have incorrectly attributed the solution to the reboot.

Check and see where your wires are routed. If they’re picking up EMI from the motors, it could be causing a communications fault.

Matt

What happens when you decrease the rate at which you are sending the control messages?
Does this happen if you tell your Jaguars to have a certain output, and leave it at that, or does this only happen when you are continuously updating the outputs?
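
One way to try this without changing the main teleop loop rate is to throttle the set calls themselves. Below is a minimal sketch in Python, not WPILib code: `ThrottledSetter`, `send`, `min_interval`, and `clock` are hypothetical names, and `send` stands in for whatever call your code uses to write an output to a Jaguar.

```python
import time


class ThrottledSetter:
    """Forward a setpoint at most once per min_interval seconds."""

    def __init__(self, send, min_interval=0.05, clock=time.monotonic):
        self._send = send                  # hypothetical: writes one output to one Jaguar
        self._min_interval = min_interval  # seconds between forwarded messages
        self._clock = clock                # injectable so the class can be tested
        self._last_sent = None             # time of the last forwarded message

    def set(self, value):
        """Send value if enough time has passed; return True if it was sent."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False  # too soon, skip this update
        self._send(value)
        self._last_sent = now
        return True


if __name__ == "__main__":
    sent = []
    setter = ThrottledSetter(sent.append, min_interval=0.05)
    for _ in range(20):   # pretend teleop loop running every 10 ms
        setter.set(0.5)
        time.sleep(0.01)
    print(len(sent), "of 20 updates were actually sent")
```

Wrapping each Jaguar’s output this way and raising `min_interval` would show whether the dropout rate follows the message rate.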

I do have wires running a little close to a pair of motors, but the wires don’t seem to be the problem, since I still see it even if those motors aren’t running. Also, Jaguars downstream of the fault don’t always have the problem, and if I run the motors via BDC-COMM it seems to work fine.

This problem only appears to be happening under continuous messages.

I haven’t tried changing the rate of the main teleop loop, but I did see a difference when I cut the code down to 2 Jaguars instead of the 4 drive Jaguars. That may just have been because there were fewer active Jaguars to see a problem on, though (I think the Jags were still failing at around the same rate).

One thing that may be of importance is that the heartbeat coming from the cRIO has never seemed to work on our network. When I run via BDC-COMM all of the jags turn solid, but when running through the cRIO the jags are only solid when they are receiving set commands from my code. (mainly a problem in autonomous, forced us to use a state machine last year)

Could you check your termination resistor to make sure it has no chance of shorting?

It sounds like the rate you send messages DOES affect the problem, but I don’t think it’s the only issue.

More frequent messages mean more data streams that can get interrupted by noise. I’m still leaning towards noise: we had the symptoms reappear today when some added CAN wiring on the system got twisted up with the motor wiring. Shaking up the wires and pulling them apart seemed to resolve the problem.
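
As a rough way to put numbers on that: if each message is corrupted independently with probability p, the chance that at least one of n messages is hit is 1 - (1 - p)^n, which grows with n. The sketch below is only that arithmetic. The independence assumption, the function name `p_any_corrupted`, and the sample values for p and the message rates are made up for illustration, not measured on a robot.

```python
def p_any_corrupted(p_per_message, n_messages):
    """Probability that at least one of n independent messages is corrupted."""
    if not 0.0 <= p_per_message <= 1.0:
        raise ValueError("p_per_message must be between 0 and 1")
    if n_messages < 0:
        raise ValueError("n_messages must not be negative")
    # Complement of "every message gets through cleanly".
    return 1.0 - (1.0 - p_per_message) ** n_messages


if __name__ == "__main__":
    p = 0.001  # hypothetical per-message corruption probability
    for rate_hz in (10, 50, 200):  # hypothetical message rates
        n = rate_hz  # messages sent in one second
        print(rate_hz, "msg/s ->", round(p_any_corrupted(p, n), 4))
```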

That’s strange. CAN is a complementary (differential) signal, like USB and Ethernet. It shouldn’t be susceptible to induced noise like that.
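
To show what that buys you, here is a small Python sketch of a differential receiver. The voltage levels are the nominal CAN bus levels; the noise samples, the `gain` parameter, and the function names are assumptions invented for the illustration. Noise that couples equally onto both wires cancels in the subtraction, while noise that couples unequally does not.

```python
def differential_receive(can_h, can_l):
    """Return the per-sample difference CAN_H - CAN_L seen by a receiver."""
    return [h - l for h, l in zip(can_h, can_l)]


def couple_noise(samples, noise, gain=1.0):
    """Add noise to one wire, scaled by how strongly that wire picks it up."""
    return [s + gain * n for s, n in zip(samples, noise)]


# Nominal levels in volts: dominant is about 3.5/1.5, recessive about 2.5/2.5.
CAN_H = [3.5, 2.5, 3.5, 2.5]
CAN_L = [1.5, 2.5, 1.5, 2.5]
NOISE = [0.5, -0.25, 1.0, -0.5]  # made-up interference samples

if __name__ == "__main__":
    clean = differential_receive(CAN_H, CAN_L)
    # Same noise on both wires (well-twisted pair): it cancels.
    balanced = differential_receive(
        couple_noise(CAN_H, NOISE), couple_noise(CAN_L, NOISE))
    # One wire picks up only half the noise: a residue survives.
    unbalanced = differential_receive(
        couple_noise(CAN_H, NOISE), couple_noise(CAN_L, NOISE, gain=0.5))
    print("clean:     ", clean)
    print("balanced:  ", balanced)
    print("unbalanced:", unbalanced)
```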

Out of curiosity, what motors were you using? Did it matter whether they were stopped or at full power?

I would agree, and I’m sure it definitely helps, but it doesn’t make it absolutely immune, just resistant. Perhaps see if you can get a scope on the pins and check whether the signals are noisy?

To use the scope without removing your termination resistor, you can use a phone splitter, and make a pigtail to attach your leads to. Only the two center pins are important.

I would expect the serial connection to be more susceptible to noise than the CAN bus. Specifically susceptible to noise coupled from the CAN bus.

-Joe

This makes sense, especially since the wire we had tangled was coming from serial to the black Jag to be bridged. If the symptoms present themselves again, I will scope it and see what I can find.

Would you suggest shielding to common potential, or shielding to the frame?

This was a common problem last year and has more to do with one of the watchdogs. Joe, Andrew Watchom from NI identified it in Minnesota last year. It has something to do with restarting the code under certain conditions. Most often teams notice it as a compressor restart.

Al, could you clarify what the common problem was?
As far as I’m aware, CAN wasn’t common enough last year to have any common problems.

Were teams running their compressor over CAN? I thought this was against the rules.

Marshall,
This is not a problem strictly limited to CAN. As it is related to a code issue, CAN is not yet involved in the scheme of things. This was one of the problems with diagnosing the issue last year, in that both CAN and non-CAN robots were affected, as were LabVIEW and C++.

His name is Andy Watchorn. A “watchdog error” as people are familiar with will make all motor controllers turn off at the same time. Not some of them on the bus. On the other hand, if noise is occasionally making the safety protocol fail to get through to a motor controller, then that controller will drop out until the next renegotiation. There is a 2 second renegotiation period.
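
A toy model of that dropout behavior, sketched in Python. Only the 2 second period comes from what I said above. The rest is a simplifying assumption for illustration: that renegotiation happens on fixed 2 second ticks and that a controller which misses a safety message stays out until the next tick. `dropout_windows` and `miss_times` are hypothetical names, and this is not how the firmware is actually written.

```python
import math

RENEGOTIATION_PERIOD_S = 2.0  # figure quoted in this thread


def dropout_windows(miss_times, period=RENEGOTIATION_PERIOD_S):
    """Return (start, end) spans during which one controller is dropped out.

    miss_times: times in seconds at which a safety message failed to arrive.
    Assumes renegotiation occurs at fixed multiples of period (an assumption).
    """
    windows = []
    for t in sorted(miss_times):
        if windows and t < windows[-1][1]:
            continue  # controller is already out; this miss changes nothing
        next_tick = (math.floor(t / period) + 1) * period
        windows.append((t, next_tick))
    return windows


if __name__ == "__main__":
    for start, end in dropout_windows([0.5, 1.0, 3.9]):
        print("out from", start, "to", end, "seconds")
```

Under this model a single lost message costs anywhere from almost nothing to a full 2 seconds, depending on where it lands in the period.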

-Joe

Well, I just confirmed with our student programmer that our watchdog has been disabled for all of these incidents. Also, we received no warnings about the watchdog not being fed, so I’m inclined to believe this is something else, unless there is a lower level watchdog potentially malfunctioning. We also experienced the same problem with BDC-COMM but an otherwise identical set-up, so I’m still leaning towards EMI.

Matt

That sounds like the symptoms I’m seeing. It doesn’t explain though why this problem didn’t appear until after I upgraded to v25 and v92.

Is there any code on the cRIO that could possibly be causing the set packets to be dropped? Also, is the code supposed to send out a global heartbeat signal to the bus?

Joe,
Check with Andy. I never did fully understand the issue, but when he explained it to a C++ programmer, that programmer understood the problem and corrected it. (I am a hardware guy.) That team had no further problems through the season and up to the Champ finals.
It manifested itself in a very similar manner to what has been described here. It did not show any watchdog error, but it seemed to be less than a two second delay. If the robot was running, it would stop for a moment and then run normally. Since many teams used pneumatics last year, most manifestations occurred while observing the compressor.

It seems like a bug with communications. Like I said before, we exhibited identical symptoms with the Jaguar controlled by the desktop application, with the cRIO completely out of the loop. Unless the problem also exists in NI’s application, that would seem to preclude it being a software issue.

My theory on EMI works if you think about the compressor kicking on and causing a sudden surge of inrush current, or maybe IFI cheaped out and there are no snubber diodes in those Spikes :wink:

I don’t know, I don’t mean to seem stubborn, but I can’t find any consistency with a software glitch.

Unfortunately I don’t usually make it to the shop during the week (new job, don’t want to start skipping out just yet!) so I won’t be able to look in to this further until next weekend.