Hi all -
Chill Out Team 1778 here. We are chasing an intermittent problem with our robot, and it is usually connected to a large collision. We’ve been troubleshooting the connections, but wondered if anyone has suggestions on how to isolate this problem effectively.
Robot config:
Basic kitbot chassis with 4 drive motor CIMs, and two additional motors for gate & roller attachment. All 6 motors are controlled by black Jaguars on a CAN bus wired to a 2CAN bus controller and terminated with a 100 ohm terminator. The 2CAN is also connected to a cRIO and on the network side to a wireless router.
Normally everything works fine, but if the robot gets hit hard on the gate, the robot will stop all response for 5 seconds before continuing. During this time the 2CAN shows a flashing red light, and then after 5 seconds returns to green flashing. We have checked the cRIO to 2CAN cable, and nothing seems to be wrong there. 5 seconds isn’t enough time for the cRIO to reboot, so something else appears to be going on.
Check your wiring. This is typically caused by some loose wires. They are typically in the least suspected places and have a very high resistance. Something may have gotten disconnected after the collision!
Check your Ethernet cables and the condition of your router ports. Also check the power connection to the 2CAN.
Red flashing means the 2CAN lost the cRio. As Don mentioned, it takes longer than 5 seconds for the cRio to reboot. It takes ~35 seconds for the router to recover from losing power (that hit us last weekend). Now that I think about it, the 2CAN takes about five seconds to reboot…
We’ve had small bits of plastic break off of Ethernet plugs and receptacles. The result is a physically loose and electrically intermittent connection. Everything looks fine, until a jolt in the wrong direction breaks a connection.
-Edit- I just re-read the OP. Sounds like a direct connection from cRio (8-slot) to 2CAN? My suggestion was based on the four slot cRio, assuming a router was in play.
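To make the timing argument above concrete, here is a minimal sketch in Python that compares an observed dropout against the recovery times quoted in this thread (router about 35 seconds, 2CAN about 5 seconds). The function name `matching_devices` and the 2 second tolerance are assumptions for illustration only, and the times are posters' recollections rather than measured values.

```python
# Recovery times quoted in this thread, in seconds.
# These are approximate recollections, not measurements.
RECOVERY_TIMES_S = {
    "router": 35.0,  # "~35 seconds for the router to recover from losing power"
    "2CAN": 5.0,     # "the 2CAN takes about five seconds to reboot"
}


def matching_devices(outage_s, tolerance_s=2.0):
    """Return devices whose quoted recovery time is within tolerance_s of the outage.

    tolerance_s is an arbitrary illustrative value, not something from the thread.
    """
    return sorted(
        name
        for name, recovery_s in RECOVERY_TIMES_S.items()
        if abs(recovery_s - outage_s) <= tolerance_s
    )


print(matching_devices(5.0))   # ['2CAN']
print(matching_devices(35.0))  # ['router']
print(matching_devices(15.0))  # []
```

A match only says the dropout length is consistent with that device restarting; it does not prove which device actually lost power.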
We have seen issues with the Ethernet connections on the 2CAN being touchy in the past. Try wiggling some of these connections and see what happens. We had multiple 2CANs that had the same issue, so it’s worth a check.
We originally had some suspect cables, but these have all been replaced, and no amount of jiggling or tugging can cause this fault in the pit. Additionally, artificial vibration in the pit (read: whacking robot with a hammer) cannot replicate this failure.
We replaced the 2CAN today, and still experienced the fault. [just read the comment from 1736… doh!] We have tried powering the 2CAN from both the regulated 12V and 24V sources, and still experienced the fault. I have a hard time believing it is current related, as we have done plenty of high current maneuvering on our practice field without observing it. Additionally, the drivers seem to be reversing direction with no problem on the competition field.
At this point we are very good friends with the CSAs. I know there have been some weird fringe electrical issues in the past with the 120 amp breakers doing weird things, and with bad PDBs, but I assume I’ll be saying “DOH! Can’t believe we missed that!!” when this is all over.
We played lots of back to back matches today, so tomorrow we are changing our cabling such that if we lose the 2CAN the cRio will still be able to talk to the field. As Don said, I don’t think we’ll get much additional information out of this – if the cRio was losing power we’d be dead in the water for a long time.
Al has been adamantly against hot gluing electrical connections because it’s apparently “not an industry standard” (though I’ve seen it a thousand times).
Should anyone stumble upon this in the future, the issue stopped happening today. We had a tightly timed loop with no wait in our teleop code that was driving 100% CPU usage. We added a 10ms delay and our CPU usage dropped down to ~75% (on an 8 slot cRio). We also changed our cabling, so that instead of cRio -> 2CAN -> Router, it went cRio -> Router -> 2CAN. Not sure if it was one of these or the combination, but it stopped happening!
We may try to isolate it further in the off-season.
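For anyone who wants to picture the loop change described above, here is a minimal sketch in Python (not the language running on the robot). It shows the general idea of a loop with no wait versus the same loop with a 10 ms delay. The names `run_loop` and `do_work` and the 0.2 second run time are placeholders for illustration, not anything from the actual teleop code.

```python
import time


def run_loop(do_work, period_s=0.010, duration_s=0.2):
    """Call do_work repeatedly for duration_s seconds.

    If period_s > 0, sleep that long after each call so the loop
    gives the CPU back to other tasks instead of spinning.
    Returns the number of iterations completed.
    """
    iterations = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        do_work()
        iterations += 1
        if period_s > 0:
            time.sleep(period_s)  # the added wait
    return iterations


# Same (empty) workload, with and without the 10 ms wait.
paced = run_loop(lambda: None, period_s=0.010)
unpaced = run_loop(lambda: None, period_s=0.0)
print("iterations with 10 ms wait:", paced)
print("iterations with no wait:   ", unpaced)
```

The loop with no wait runs as fast as the processor allows, which is what pins CPU usage at 100%. The paced loop does the same work a bounded number of times per second and leaves the rest of the time for everything else on the controller.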
Glad you got things working (if not entirely solved). As an aside for anyone else reading, the original configuration as described should not have passed inspection as it violates R56.
I know just enough about electricity and software to get myself in a whole bunch of trouble. Is there a plausible explanation why 100% CPU usage would make the 2CAN go out temporarily in a high energy collision?
I can see how our new routing could maybe be the fix. Even though we couldn’t replicate it by tugging on cables, maybe there was something specific to the on-field collisions that made certain connectors in certain ports temporarily disconnect.
I’m not sure if the two are connected but 100% CPU usage is never good. Maybe some “experts” can chime in on whether there is any connection between the two.
Last year we put while loops and sequence tasks in TeleOp and it killed our code. Wasn’t until this year that we discovered Periodic Tasks.
I am glad you pointed that out. I think that we may have our 2CAN wired wrong too. We haven’t had any problems but it seems we may be able to prevent some.
When you see a 5 second gap after a hard collision, expect that the radio power supply was interrupted. Not long enough for a complete reboot, but certainly long enough for the radio to go into a recovery mode and reestablish comms with the cRio and field. Just enough of an interruption will corrupt some of the data. With the 2CAN in the loop, check to see how long it takes for the 2CAN to recover from a power fault. That is more likely the issue. You moved the comms link so you likely moved the power as well. This is just an accident waiting to happen again. Check this when you can.
I recommend against hot glue for two reasons. If you have applied it correctly, you might not be able to change something if it has failed. That might forever damage the device. If it does pull apart easily, then the hot glue was not applied correctly in the first place and you are only fooling yourself that you fixed the issue. Just because you think you see adhesive applied in an industrial device, do not think it was meant to come apart. The manufacturer might just think the part is expendable.
Any chance your radio connector has a custom cable? I’ve seen teams use the wrong barrel connector and see intermittent issues on the radio.
I’ve worked with enough teams with random issues caused by continuous* 100% CPU usage to know that pretty much anything can happen. The DS Log Viewer makes this obvious.
*I’m distinguishing “continuous” here from occasional 100% CPU usage, for example when complex imaging logic is running.
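As a rough illustration of the continuous versus occasional distinction, here is a small Python sketch that finds the longest unbroken run of saturated CPU readings in a list of samples. It does not read the DS Log Viewer's file format. The sample lists, the function name `longest_saturated_run`, and the 99% threshold are all made up for the example.

```python
def longest_saturated_run(samples, threshold=99.0):
    """Return the length of the longest consecutive run of samples >= threshold."""
    longest = 0
    current = 0
    for pct in samples:
        if pct >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # run broken by a sample below the threshold
    return longest


# Hypothetical CPU percentage samples, one per log interval.
occasional = [60, 72, 100, 100, 65, 70, 100, 68]  # short bursts, e.g. imaging
continuous = [100, 100, 100, 100, 100, 100, 100, 100]  # loop with no wait

print(longest_saturated_run(occasional))  # 2
print(longest_saturated_run(continuous))  # 8
```

A run that spans essentially the whole log is the continuous case; a few short runs separated by lower readings is the occasional case.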
We just spent 2 days at Buckeye tracking down Jag-related issues, which ultimately turned out to be a loose wire in a power feed to one of them. (We have the black ones on our robot.)
While we’ve used the additional features of the Jags to great benefit in the past, the fact that any one of a number of issues can shut down the entire string is problematic (and explains why a number of teams won’t use them.)
There are CAN repeaters that would isolate the segments, but they seem pretty pricey for FRC applications.
In the particular circumstance you are describing, the issue was likely a result of talking to a Jag that was (at least temporarily) not there. Which means that a star configuration would have the same issue.
The CAN signal will pass right through an unpowered Jag: they use a short tap on the bus internally, so the signal is not passed through a chip on the board.
The other possibility is that your power issue was on the Jag doing the RS232->CAN bridging, which does require powered circuitry.