#13
03-04-2011, 18:23
John Heden
Registered User
FRC #1073
 
Join Date: Jan 2011
Location: Hollis, NH
Posts: 29
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR

Greetings All,

We started this thread after experiencing a number of catastrophic CAN startup problems at the GSR regional. While we were hopeful that the V29 update would provide some relief, we continued to see the problem occasionally at the Hartford regional this past weekend. Our drive team was instructed to monitor the Driver Station diagnostic tab for streaming CAN errors, but they became complacent on Friday after never encountering the issue on Thursday. A cRIO reboot issued from the Driver Station after the match began did NOT recover the situation, and we sat idle during the entire match (our alliance won without our participation).

We experienced this problem again on Saturday while setting up before opening ceremonies (we had the first match). Again the system did NOT recover with a warm reboot from the Driver Station on the first attempt; it required either a third reboot attempt or possibly an actual robot power cycle to recover. This early-morning triple failure scared us a bit, but we NEVER experienced this catastrophic CAN failure during our later matches. The drive team did occasionally see a few startup CAN errors that concerned (panicked) them, but they did not see the catastrophic scrolling CAN error behavior.

We saw this failure at GSR at a frequency of about 1 out of 6 matches, ONLY while we were actually on the playing field and never while tethered to the robot, with approximately the same statistic at Hartford. The failure was also seen once on the practice field, making us curious as to whether the use of the radio was somehow a catalyzing factor. Our radio was physically touching the 2CAN, so we decided to try to give them some space (inverse square law). We couldn't try this with the radio in the pits, but we proceeded to do some repetitive tethered power on/off tests, trying out different power-up sequences (relative to the laptop) to try to reproduce this. We must have done this 20 or 30 times and NEVER saw a single CAN transaction failure, and certainly not the continuous scrolling catastrophic CAN failure signature. We thought this was a radio-ONLY failure, but we did eventually experience it once while we were tethered in the pits. A quick power cycle and the problem went away! I wish we had tried a soft reboot as an experiment, but our robot was being queued and our goal at that point was recovery rather than experimentation.
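To put the 20 or 30 clean tethered power-ups in perspective, here is a rough back-of-the-envelope sketch. It assumes each power-up is an independent trial and that the tethered failure rate would be the same 1 in 6 we saw on the field; both are assumptions for illustration, and the function name is made up for this post.

```cpp
#include <cassert>
#include <cmath>

// Probability of seeing zero failures in `trials` independent power-ups
// when each power-up fails with probability `failure_rate`.
// Independence between power-ups is an assumption, not something we measured.
double ProbabilityOfNoFailures(double failure_rate, int trials) {
    assert(failure_rate >= 0.0 && failure_rate <= 1.0 && trials >= 0);
    return std::pow(1.0 - failure_rate, trials);
}
```

With a 1 in 6 failure rate, `ProbabilityOfNoFailures(1.0 / 6.0, 20)` comes out to roughly 0.026 and `ProbabilityOfNoFailures(1.0 / 6.0, 30)` to roughly 0.004. Under those assumptions, a clean run of that length would be unlikely if tethered power-ups failed as often as field power-ups, which fits the suspicion that the radio changes the odds rather than being the sole cause.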

I believe we have some type of CAN/2CAN/cRIO/WPILib startup race condition that occasionally prevents some type of low-level initialization, causing the complete loss of the CAN bus. The manifestation we see is as if we had simply pulled the CAN cable out of the 2CAN, preventing any successful transaction to any CAN device. I believe use of the radio somehow widens this window of opportunity for failure, given our ratio of match failures to pit failures and given that we power up much more often while tethered in the pits than during actual matches. We had little working radio-based experience prior to arriving at GSR due to the late availability of the physical robot for software testing. This radio testing and its influence on CAN failures will be a priority when we get our robot back.

My apologies that some of this data is so soft, but we were unable to find any hard correlation or anything definitive other than an occasional complete startup failure that always recovers on a power cycle, mostly ruling out cabling issues. This failure occurs BEFORE being enabled, essentially ruling out any real voltage drop or current/noise problems. If we start up successfully, we do run successfully. In fact, we have performed a number of tests where we pull the breakers out of the Jags and even the 2CAN. This causes CAN errors to be reported, but the system nicely recovers within a couple of seconds after we plug the breakers back in. We use the default voltage mode, so others who have a more complex initialization or control scheme may not recover so easily.
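Since the failure looks like the whole bus is gone right from startup, one thing we could do is probe every Jaguar once the robot objects are built and flag a total loss before the match starts. Below is a minimal sketch of that idea. It is not WPILib code: `probe` is a hypothetical callback standing in for whatever single CAN transaction you choose (for example a bus voltage read), and `BusStatus` and `CheckCanBusAtStartup` are names invented for this post.

```cpp
#include <functional>
#include <vector>

// Hypothetical summary of a startup probe; not part of WPILib.
enum class BusStatus { kAllResponding, kPartial, kBusDown };

// Probes each device up to `attempts_per_device` times.
// `probe(id)` is an assumed callback that returns true when the device
// with that CAN ID answers one transaction.
// Note: an empty device list is reported as kAllResponding.
BusStatus CheckCanBusAtStartup(const std::vector<int>& device_ids,
                               const std::function<bool(int)>& probe,
                               int attempts_per_device) {
    int responding = 0;
    for (int id : device_ids) {
        for (int attempt = 0; attempt < attempts_per_device; ++attempt) {
            if (probe(id)) {
                ++responding;
                break;  // this device answered; move on to the next one
            }
        }
    }
    if (responding == static_cast<int>(device_ids.size())) {
        return BusStatus::kAllResponding;
    }
    return responding == 0 ? BusStatus::kBusDown : BusStatus::kPartial;
}
```

A `kBusDown` result would match the "cable pulled out of the 2CAN" signature described above, where no device answers at all, while `kPartial` would point more toward a single Jag or its wiring. What to do on `kBusDown` (alert the drive team, request a power cycle) is left open, since so far only a power cycle has reliably recovered the bus for us.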

There was also some anecdotal data coming from others at Hartford (other teams and even some of the Hartford technical field folks) who believe the serial CAN interface is more robust than the 2CAN and recommended we switch away from the 2CAN. While this startup problem is catastrophic, it feels like some type of simple initialization glitch that is solvable. The CAN & 2CAN approach is a nice technology with perhaps this single gremlin to be exorcised. We'll try to diagnose this further when our bot comes home, but unless we can convince our team that this is behind us, we may be forced to return to the simpler ways of PWMs....

Cheers and thanks,

John

1) CAN Wiring is correct with proper termination.
2) All components are ground isolated from the frame and electrical wiring has no shorts or ground faults.
3) We run the cRIO connection directly to the radio and connect the radio directly to the 2CAN rather than passing all cRIO traffic through the 2CAN.
4) We used all tan Jags.
5) We programmed with C++.
6) Used Voltage Mode only.
7) 3 Jags with optical encoders (1 of these has a single limit switch as well), plus 2 Jags each with 2 limit switches and no encoder.

8) I should also add that our software launches a separate dashboard thread at the end of the constructor, AFTER the Jags and other robot objects are created. This thread reads data (encoder values, currents, voltages, etc.) for display and capture by our custom dashboard. This explains why we see a continuous, never-ending stream of errors, while others, I suspect, may see a number of errors during startup that stop but then resume once they are enabled and autonomous and control Jaguar transactions begin.
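Because our dashboard thread polls the Jags continuously, its results could also be used to tell the occasional startup error apart from the catastrophic scrolling failure. Here is a minimal sketch of a consecutive-failure counter for that purpose. The class name and the threshold are invented for illustration, and it is up to the polling code to report whether each CAN transaction succeeded.

```cpp
#include <cassert>

// Tracks consecutive failed CAN transactions reported by a polling loop.
// Hypothetical helper for illustration; not part of WPILib.
class CanErrorStreak {
public:
    explicit CanErrorStreak(int threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Records one poll result. Any success resets the streak.
    // Returns true once the streak has reached the threshold.
    bool Record(bool transaction_ok) {
        consecutive_failures_ = transaction_ok ? 0 : consecutive_failures_ + 1;
        return IsBusLikelyDown();
    }

    bool IsBusLikelyDown() const {
        return consecutive_failures_ >= threshold_;
    }

private:
    int threshold_;
    int consecutive_failures_ = 0;
};
```

A few isolated errors at startup never build a streak, so they would not trip the flag, while the continuous failure signature would trip it within `threshold` polls. The right threshold depends on the polling rate, and we have not tuned one.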
All times are GMT -5.