#91
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Not to pile on, but in case this may help either another team or anyone working on these problems:
- If you try to use an indexed encoder with the closed-loop PID control feature of the Jags, I'm pretty sure it won't work (based on our experience). The likely issue is that the index mark resets the encoder count once per revolution, and the PID logic doesn't expect this: it needs a continuous count, not one that wraps around at a certain point. You can get around this by disconnecting the index pin, which is certainly worth a try if this applies to you. You'll then have to rely on the encoder being at zero when things start up, or do something else to reference the count.
- If you are using PID, stuttering could certainly be caused by not having the P/I/D coefficients set correctly.
- I second the recommendation not to crimp an RJ connector directly onto the solid leads of a resistor; this is less reliable than making a short pigtail from cable designed for these connectors and soldering the terminator resistor to it.
- If you use a 2CAN, the 2CAN webpage is helpful for seeing how things are behaving. In particular, it provides error counters that can help validate wiring and basic communications connectivity.
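To make the first point concrete, here is a minimal sketch of why a count that resets at the index mark can defeat a position loop. It assumes the count really does reset once per revolution, as suspected above. The resolution `kCountsPerRev` and the helper names are made up for illustration and are not part of the Jaguar firmware or WPILib.

```cpp
#include <cassert>

// Hypothetical encoder resolution, chosen only for illustration.
const int kCountsPerRev = 360;

// Continuous count: keeps growing as the shaft keeps turning.
int ContinuousCount(int totalCounts) { return totalCounts; }

// Indexed count: ASSUMED to reset to zero at every index pulse,
// i.e. once per revolution (non-negative totals only in this sketch).
int IndexedCount(int totalCounts) { return totalCounts % kCountsPerRev; }

// Proportional term of a position loop: output = kP * (setpoint - measured).
double POutput(double kP, int setpoint, int measured) {
    return kP * (setpoint - measured);
}
```

With a setpoint of 540 counts (1.5 revolutions at the assumed 360 counts/rev), the continuous count reaches 540 and the error goes to zero. The indexed count reads 180 at the same shaft position, so the loop still sees an error of 360 and keeps driving; a setpoint above one revolution can never be reached.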
#92
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Greetings All,
We started this thread after experiencing a number of catastrophic CAN startup problems at the GSR regional. While we were hopeful that the V29 update would provide some relief, we continued to see the problem occasionally at the Hartford regional this past weekend.

Our drive team was instructed to monitor the Driver Station diagnostic tab for streaming CAN errors, but they became complacent on Friday after never encountering the issue on Thursday. A Driver Station-side cRIO reboot did NOT recover the situation after the match began, and we sat idle during the entire match (our alliance won without our participation). We experienced the problem again on Saturday while setting up before opening ceremonies (we had the first match). Again the system did NOT recover with a warm Driver Station reboot on the first attempt; it required either a third reboot attempt or possibly an actual robot power cycle to recover. This early-morning triple failure scared us a bit, but we NEVER experienced the catastrophic CAN failure during our later matches. The drive team did occasionally see a few startup CAN errors that concerned (panicked) them, but not the catastrophic scrolling CAN error behavior.

At GSR we saw this failure in about 1 out of 6 matches, ONLY while we were actually on the playing field and never while tethered to the robot, with approximately the same rate at Hartford. It was also seen once on the practice field, which made us curious whether use of the radio was somehow a catalyzing factor. Our radio was physically touching the 2CAN, so we decided to give them some space (inverse square law). We couldn't try this with the radio in the pits, but we did run repeated tethered power on/off tests, trying different power-up sequences (relative to the laptop) to reproduce the failure. We must have done this 20 or 30 times and NEVER saw a single CAN transaction failure, and certainly not the continuous scrolling catastrophic CAN failure signature.

We thought this was a radio-ONLY failure, but we did eventually experience it once while tethered in the pits. A quick power cycle and the problem went away! I wish we had tried a soft reboot as an experiment, but our robot was being queued and our goal at that point was recovery rather than experimentation.

I believe we have some type of CAN/2CAN/cRIO/WPILib startup race condition that occasionally prevents some low-level initialization, causing complete loss of the CAN bus. The manifestation is as if we had simply pulled the CAN cable out of the 2CAN, preventing any successful transaction to any CAN device. I believe use of the radio somehow widens this window of opportunity for failure, given our ratio of match failures to pit failures and given that we power up much more often while tethered in the pits than during actual matches. We had little working radio-based experience before arriving at GSR because the physical robot became available late for software testing. Radio testing and its influence on CAN failures will be a priority when we get our robot back.

My apologies that some of this data is so soft, but we were unable to find any hard correlation or anything definitive other than an occasional complete startup failure that always recovers on a power cycle, which mostly rules out cabling issues. The failure occurs BEFORE being enabled, essentially ruling out any real voltage drop or current/noise problems. If we start up successfully, we run successfully. In fact, we have performed a number of tests where we pull the breakers out of the Jags and even the 2CAN. This causes CAN errors to be reported, but the system recovers nicely within a couple of seconds after we plug the breakers back in. We use the default voltage mode, so others who have a more complex initialization or control scheme may not recover so easily.

There was also some anecdotal data from others at Hartford (other teams and even some of the Hartford technical field folks) who believe the serial CAN interface is more robust than the 2CAN and recommended we switch away from the 2CAN.

While this startup problem is catastrophic, it feels like some type of simple initialization glitch that is solvable. The CAN & 2CAN approach is a nice technology with perhaps this single gremlin to be exorcised. We'll try to diagnose this further when our bot comes home, but unless we can convince our team that this is behind us, we may be forced to return to the simpler ways of PWMs....

Cheers and thanks,
John

1) CAN wiring is correct with proper termination.
2) All components are ground isolated from the frame, and the electrical wiring has no shorts or ground faults.
3) We run the cRIO connection directly to the radio and connect the radio directly to the 2CAN rather than passing all cRIO traffic through the 2CAN.
4) We used all tan Jags.
5) We programmed in C++.
6) Used voltage mode only.
7) 3 Jags with optical encoders (1 of these also has a single limit switch); 2 Jags each with 2 limit switches and no encoder.
8) I should also add that our software launches a separate dashboard thread at the end of the constructor, AFTER the Jags and other robot objects are created. This thread reads data (encoder values, currents, voltages, etc.) for display and capture by our custom dashboard. This explains why we see a continuous, never-ending error stream while others, I suspect, may see a number of errors during startup that stop, then resume once they are enabled and autonomous and control Jaguar transactions begin.
#93
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
John,
I may be misunderstanding your comment #3. Please clarify if my comments don't make sense.

*****
3) We run the CRIO connection directly to the radio and connect the radio directly to the 2CAN rather than passing all CRIO traffic through the 2CAN.
*****

I thought the 2CAN was to be connected to the cRIO on port #2. Since port #2 is on a different network, its traffic is isolated from the robot communication traffic on the wireless. That is how ours is wired, and we just completed 2 regional events without any control issues. We had other issues, just not control ones...

It seems that adding that additional load through the radio switch and robot communication network could indeed cause problems.

-Hugh
#94
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
My $.02,
Mike
#95
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Hi Hugh,
With respect to my comment #3: we initially had ALL of our cRIO traffic going through the 2CAN device, by connecting the cRIO to the 2CAN and then to the radio. This worked very well except for the same problem that we are now discussing.

Our second topology was to run all Ethernet devices (cRIO, Camera 1, Camera 2, and the 2CAN) directly to the radio, which felt like a more robust approach since the 2CAN device did not need to manage all cRIO traffic. This was an experimental change that did not seem to help or hurt, but that's the way we have left our robot wired. This approach leaves 1 of the two 2CAN Ethernet ports unconnected and required us to disconnect 1 of our cameras when tethered (maybe the other 2CAN port could have been used for tethering, but we never investigated this).

I hope this clarifies my comment #3 a bit. Perhaps our team's next experiment will be to use port #2 to connect directly to the 2CAN device and see whether that helps. Given that this port has NO other Ethernet traffic, perhaps it would be a bit more consistent in any network-influenced timing dynamics.

Thanks,
John
#96
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
Thanks for your thoughts. We initially had the 2CAN device on the cRIO port 1 connector, which is where we first started to see our problems. The relative buffer capacity of the 2CAN and the radio was unclear, but our intuition gave the radio the advantage here. I believe our next step will be to simply move to port 2 of the cRIO and see whether that helps.

Thanks,
John
#97
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Do you know if your 2CAN has had the firmware updated to version 2.5 or not? We have a regional this coming week/end and will try out v29 on the cRIO. We've been using serial, but it is easy enough to switch back and forth that we might try the 2CAN again, at least for practice matches. We have not yet had a chance to try either of these updates.
#98
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
Did you by chance have Wind River's target debugger connection open when this occurred in your pits? If so, did it report any abnormal terminations or errors? If not, you might consider doing so, as it can provide a wealth of information when things go awry. You don't have to be "debugging" the code at the time; you can be running a "deployed" program. The nice part is that it will report any task failures/terminations to you, as well as stack information if available.

Are you able to go into a bit more detail on what exactly your dashboard task is doing? Particularly in relation to CANJaguar objects, and how frequently it iterates. We send dashboard data during disable as well, but do it as part of the normal disable processing routine rather than as a separate task.
#99
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
My team's robot (programmed in Java using IterativeRobot) has gone through one event with our 2CAN plugged into our DAP-1522 and has had no CAN-related trouble. We were running cRIO v28 (v29 wasn't out at the time) and 2CAN firmware v2.5 with the SVN rev 66 plugin on the cRIO. We have 6 black Jaguars on the CAN bus with no sensor inputs or limit switches. Perhaps we were very fortunate during our regional, but we have not had any serious CAN issues (knock on wood) since build. When we did have problems back then, all of them could be traced to poorly made cables. I am hoping our luck carries us through championship.
#100
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
We continue to be plagued with timeouts on the CAN bus. And yes, I've checked the termination, and it all looks good. We are running V29 and I don't have the numbers for the plugin or the 2CAN firmware.
One thing I have seen that is causing no end of issues: if a message times out, the API returns no indication of the failure. For example, if GetForwardLimitOK() is called and times out, you get back false. There is no way to know that this has happened, and if you are making decisions based on these results...

We have an encoder on our lift mechanism. To zero the encoder, we drive to the bottom limit switch, and when we get there, we set the encoder to 0. This works fine until we lose a message to a timeout. From that point on the lift is offset by wherever the timeout occurred. There really needs to be a way within the API to detect that the transaction timed out. Of course, the best answer would be that we don't have any timeouts.

Another observation related to timeouts: we have an on-board compressor, and if during initial startup the pressure sensor indicates that the compressor should run and it starts immediately, we get a number of timeout messages.

All in all, I'm really regretting the decision to use the CAN bus. And for the most part, all of the features that I really wanted to use, that were provided by the CAN bus, proved to be unusable.
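As a rough illustration of the kind of interface being asked for, the sketch below separates "timed out" from "limit reached" so that homing code only zeroes on a confirmed reading. Everything here (`RawReply`, `LimitRead`, `ClassifyLimit`, `Lift`) is hypothetical and stands in for whatever the low-level transaction actually returns. It is not the WPILib API, which, per the post above, does not expose the timeout.

```cpp
#include <cassert>

// Three-way result instead of a bare bool.
enum class LimitRead { kOk, kAtLimit, kTimedOut };

// Hypothetical stand-in for a low-level reply from the Jaguar:
// `responded` is false when no reply arrived before the timeout,
// `limitOk` is the payload (true = limit switch not tripped).
struct RawReply {
    bool responded;
    bool limitOk;
};

// Keep the timeout visible rather than folding it into "false".
LimitRead ClassifyLimit(const RawReply& reply) {
    if (!reply.responded) return LimitRead::kTimedOut;
    return reply.limitOk ? LimitRead::kOk : LimitRead::kAtLimit;
}

// Homing logic for a lift: zero the encoder only on a confirmed limit hit.
struct Lift {
    int encoderOffset = 0;
    bool homed = false;

    void OnLimitRead(LimitRead result, int rawCount) {
        if (result == LimitRead::kAtLimit) {
            encoderOffset = rawCount;  // this raw count becomes position 0
            homed = true;
        }
        // kTimedOut and kOk: leave the offset alone and keep trying.
    }

    int Position(int rawCount) const { return rawCount - encoderOffset; }
};
```

With a bare bool, a timeout at raw count 123 would look like a tripped limit and the lift would be zeroed there. With the three-way result, that read is ignored and the offset is only set when the switch is actually reported as tripped.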
#101
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
R59 adds some expansion to R50, stating:

<R59> If CAN-bus communications are used, the CAN-bus must be connected to the cRIO-FRC through either the Ethernet network connected to Port 1, Port 2, or the DB-9 RS-232 port connection.

Our 2CAN connection directly to the radio is probably an (inadvertent) rule violation, but it was an attempt to prevent the errors we saw when we were connected (legally) to port 1. A port 2 to 2CAN connection will be our next experiment to try to avoid this issue.

john
#102
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
Quote:
Thanks,
john
#103
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
#104
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
Quote:
#105
Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
I was experimenting with the CAN bus on our spare electronics board earlier and discovered some characteristics caused by termination problems which might be helpful.
I'm not claiming this is the cause of many, or indeed any, CAN issues for other teams, but it is one potential failure mode.

Our practice board uses the documented terminator: an RJ-12 with a 100 ohm resistor crimped between pins 3 & 4. (We also added a short length of telephone wire crimped into pins 1 & 6 as a handle for insertion/extraction, which I don't believe has any bearing on the issues.) DMM testing showed a ~100 ohm resistance as expected. However, at some stage the resistor leads had bent towards each other to the point where they were almost touching, creating a potential short across the CANH & CANL lines. Mechanical shock could potentially cause them to touch for an instant.

I discovered that when the lines were shorted, even briefly, the CAN bus failed completely. Interestingly, removing the short did not restore CAN, and neither did removing the short and rebooting the cRIO. However, removing the short and power-cycling the robot did restore CAN. I predict that just power-cycling the Jaguars would have the same effect, but I didn't test this (and our code would still need to be rerun to initialize CAN properly).

We are using serial-CAN, so we could still control the bridging BlackJag over RS232 while the CAN bus was out. I conclude that seeing only the bridging Jaguar is a useful diagnostic indicator of a potential terminator problem - sorry, 2CAN users.

Last edited by MikeE : 04-04-2011 at 22:07. Reason: stylistic reasons
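The diagnostic rule of thumb above (only the bridging Jaguar responds, so suspect the bus or terminator) can be written down as a small check. This is only a sketch for a serial-CAN setup. `DeviceStatus`, `BusDiagnosis`, and `Diagnose` are hypothetical names, and how the `responding` flag gets filled in (for example, by polling each Jaguar) is left to the robot code.

```cpp
#include <cassert>
#include <vector>

// One entry per Jaguar expected on the bus (hypothetical bookkeeping type).
struct DeviceStatus {
    int id;
    bool isBridge;    // the Jaguar bridging RS232 to CAN
    bool responding;  // did it answer the last poll?
};

enum class BusDiagnosis {
    kHealthy,
    kSuspectBusOrTerminator,  // bridge answers, nothing behind it does
    kSomeDevicesMissing,
    kNothingResponds
};

BusDiagnosis Diagnose(const std::vector<DeviceStatus>& devices) {
    bool bridgeUp = false;
    int othersUp = 0;
    int othersTotal = 0;
    for (const DeviceStatus& d : devices) {
        if (d.isBridge) {
            bridgeUp = bridgeUp || d.responding;
        } else {
            ++othersTotal;
            if (d.responding) ++othersUp;
        }
    }
    if (bridgeUp && othersTotal > 0 && othersUp == 0)
        return BusDiagnosis::kSuspectBusOrTerminator;
    if (!bridgeUp && othersUp == 0)
        return BusDiagnosis::kNothingResponds;
    if (!bridgeUp || othersUp < othersTotal)
        return BusDiagnosis::kSomeDevicesMissing;
    return BusDiagnosis::kHealthy;
}
```

A result of `kSuspectBusOrTerminator` matches the shorted-terminator symptom described above. It does not prove the terminator is at fault, only that the RS232 link works while the CAN side does not.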