Chief Delphi

John Heden 07-03-2011 20:25

Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
We had a couple of interesting and still unexplained catastrophic CAN problems at the Granite State Regional this weekend that we are still puzzling over. We use the 2CAN interface to connect 5 ‘tan’ Jaguars, and everything seemed to work fine while we were tethered and also during most of our matches.

We had two matches, however, where we failed to move at all, with an endless stream of CAN transaction timeouts being reported on the Driver Station Diagnostic tab. Our drivers reported that they were able to recover well into one of these matches by rebooting the robot and did successfully drive a bit in the little time remaining. Everything was always fine once we were tethered back at the pits, and we were only able to reproduce this stream of CAN errors by actually unplugging one of the CAN cables. Interestingly, we were also able to reproduce the same sequence by disconnecting the Ethernet cable from the 2CAN interface. I was hoping to see some type of different error to help differentiate between a problem on the actual CAN bus vs. a problem with the 2CAN adapter. While the error messages were consistent with a cable disconnect of some type, it seems unlikely that a robot reboot would have resolved this type of problem, and we were unable to locate any intermittent connections despite our best efforts.

Our robot is now bagged and tagged and headed to the Hartford regional, so we are unable to do any further experimentation and troubleshooting. There is now some pressure to abandon the CAN interface and rewire everything to our historical PWM interfaces unless this problem can be understood and solved.

Here are a few questions for anybody familiar with the Jaguar / 2CAN interface. I am working from memory, but the details are hopefully accurate. Any thoughts would be most appreciated.

1) How do folks recommend powering the actual 2CAN device? There are references to ensuring the D-Link radio is powered by the boosted 12V (white Wago connector) to help avoid excessive voltage drops and possible problems. The 2CAN itself has a wide voltage range (6.5 to 28V), but should it be connected to the raw 12V fused power bus? Our autonomous start is probably the worst case for power consumption, where we simultaneously power up 3 big CIM motors to near full power.
2) Accessing the 2CAN web page while our robot code is executing does result in a number of our robot CAN transactions failing (~ 1/1000). This is more of an annoyance for us than a problem, but it is suggestive of some type of system vulnerability when the 2CAN is being accessed from multiple sources. The WPI libraries seem to have access protection, but the 2CAN itself may have some issue when it is being accessed by both the cRIO robot code and a browser on the 2CAN page.
3) There is some mention that the FRC system does send additional CAN-related messages to CAN-based robots. Can anybody confirm this possibility? Could there be some type of negative interaction with the FMS system?
4) I did speak to a few CAN Jaguar users who spoke of possibly overloading the CAN bus and needing to reduce their access rate. The current WPI library interfaces, however, look like simple blocking calls where all access is nicely serialized and completes before the next sequence is initiated. While all of these calls may eventually slow down our control loops, it seems unlikely that we could overload the bus.
5) We do see some type of “too much error data!” message once in a while when we are playing, but we could not locate this error message anywhere in our code or in the WPI libraries. Any idea as to the source of this unique message?
6) There is a rev 66 (1/29/2011) 2CAN plug-in update for FRC posted on the Cross the Road Electronics website. We bagged our robot before I could confirm which version of the 2CAN plug-in is incorporated in the 2/11/2011 v28 cRIO update. Is there a need to install this file manually, or is the latest included with the v28 update?

Thanks in advance for any CAN thoughts…

John

Radical Pi 07-03-2011 21:45

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
We noticed another issue in the CAN libraries (serial) that may be related. This only happened in the pits for us, but occasionally NetConsole would show an error about the FRC_CANJaguar_ReceiveTask failing, which would subsequently cause the user code to fail quite violently. As with you, a soft reboot of the cRIO fixed the problem.

Judging by the similar symptoms between Serial and 2CAN interfaces, I'm thinking there's something wrong with the FRC_NetworkCommunication portion of the CAN library.

Phalanx 07-03-2011 22:37

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Hi John,

We also used CAN, but with all BLACK JAGS and 2CAN. We didn't have any issues at all with it.

I can answer a few of your questions based on how we did things.

1) We power the 2CAN from the raw 12V on the PD. We didn't experience any brownouts even when applying full power to 4 CIMs and 4 BaneBots motors simultaneously. I will say that low battery voltage, bad terminators, and faulty wiring are the biggest contributors to CAN network problems.

2) We haven't experienced that issue. However, the 2CAN web page is only supposed to be accessible when attached to cRIO Ethernet port 1, which is great for diagnostics but is not allowed in competition. 2CAN must be on port 2 in competition mode, it's a clearly defined rule. The only question that comes to mind is whether the 2CAN is connected to the wireless bridge/AP. If it is, that could explain some of the packet loss.

3) There is a "TRUSTED" mode that CAN uses for FIRST, and it requires FIRST-specific firmware on the Jaguars. Level 92 I believe is the latest. It shouldn't have any negative impact on the FMS. We certainly didn't experience any through 9 qualifying rounds plus QFs, SFs and Finals.

4 & 5) I haven't seen or experienced either of those.

6) I took no chances and FTP'd the latest 2CAN driver to the CRIO after I installed V28.

WizenedEE 08-03-2011 02:39

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Phalanx (Post 1036301)
2CAN must be on port 2 in competition mode, it's a clearly defined rule.

Source Please? I'm looking here:

<R59> If CAN-bus communications are used, the CAN-bus must be connected to the cRIO-FRC through either the Ethernet network connected to Port 1, Port 2, or the DB-9 RS-232 port connection.

And

<R50> Connections to the cRIO-FRC Ethernet ports must be compliant with the following parameters:
A. The DAP-1522 radio is connected to the cRIO-FRC Ethernet port 1 (either directly or via a CAT5 Ethernet pigtail).
B. Ethernet-connected COTS devices or custom circuits may connect to either cRIO-FRC Ethernet port; however, these devices may not transmit or receive UDP packets using ports 1100-1200 except for ports 1130 and 1140.

That seems to be two conflicting rules, but the 2CAN would probably count as a "pigtail"

Also, the 2CAN must be connected to the PD board by a dedicated 20 A breaker, ie not on the end part.

<R39> F. Custom circuits and sensors powered via the cRIO-FRC or the Digital Sidecar are protected by the breaker on the circuit(s) supplying those devices. Power feeds to all other custom circuits must be protected with a dedicated 20-amp circuit breaker on the PD Board.

Hooray for rules quotations. I am confused about why rule <R50> says the radio may be connected directly OR with a pigtail. Them female to female connections, eh?

(Yeah, yeah I know, a pigtail is a male-female...)
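
For anyone writing custom Ethernet code, the port restriction in <R50> B quoted above can be expressed as a one-line check. This is only a sketch of the rule text as quoted; the function name `IsUdpPortAllowed` is hypothetical and is not part of WPILib or any FRC library.

```cpp
#include <cstdint>

// Hypothetical helper that encodes only the <R50> B text quoted above:
// UDP ports 1100-1200 are off limits, except for 1130 and 1140.
bool IsUdpPortAllowed(std::uint16_t port) {
    const bool inReservedRange = (port >= 1100 && port <= 1200);
    const bool isException = (port == 1130 || port == 1140);
    return !inReservedRange || isException;
}
```

Under this reading, ports 1130 and 1140 pass, port 1150 does not, and anything outside 1100-1200 is unaffected by the rule.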

Andy A. 08-03-2011 04:30

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
As another data point-

95 also had CAN bus issues at GSR, as did several other teams. We used all black jags and the serial port. It was enough of an issue that FIRST personnel polled CAN teams and advised them that they couldn't say what was going on, but that it seemed like CAN wasn't always working. We elected to switch to PWM control (which also brought a half dead sidecar to our attention).

The last I had heard was that a common factor in all this was that the team was using Java as their programming language. Beyond that, there didn't seem to be any commonality. I heard some accusations of it only occurring on one driver station, or side of the field, but I can't confirm if that was true or not.

heydowns 08-03-2011 11:09

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
1511 also saw similar failures during our time at FLR this past weekend. Fortunately for us this never manifested out on the playing field, only when testing in our pit.

I was able to attach the Windriver debugger a few times when we caught the error and have some technical details to compile and share with the developers (on my list of stuff to do hopefully today).

It is almost certainly a race condition during startup: when the failure is tripped, a task initiated for CAN handling terminates after following a bad pointer.

For completeness, our setup is serial/black jag based, all black jags in chain. C++ with latest (2/16) update, cRIO with image v28.

Phalanx 08-03-2011 11:36

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by WizenedEE (Post 1036391)
Source Please?

http://www.usfirst.org/uploadedFiles...%20%281%29.pdf

I also remember reading it somewhere else, like FIRST Forums, or a Team Update, but I can't remember exactly where. If/when I find/remember I'll update this.

Quote:

Originally Posted by WizenedEE (Post 1036391)
<R39> F. Custom circuits and sensors powered via the cRIO-FRC or the Digital Sidecar are protected by the breaker on the circuit(s) supplying those devices. Power feeds to all other custom circuits must be protected with a dedicated 20-amp circuit breaker on the PD Board.

I guess I'll need to move the 2CAN's power from the direct PD connection to a dedicated 20-amp breaker.

Quote:

Originally Posted by Andy A. (Post 1036398)
We used all black jags and the serial port.
The last I had heard was that a common factor in all this was that the team was using Java as their programming language. Beyond that, there didn't seem to be any commonality. I heard some accusations of it only occurring on one driver station, or side of the field, but I can't confirm if that was true or not.

We are using LabVIEW.

We intentionally chose not to use the serial interface for the following reasons:
128 kbps transfer rate over serial versus 1 Mbps transfer rate using 2CAN. (CAN is typically a 1 Mbps network.)
I also noticed that under heavy processing load the serial interface on the cRIO would lag, or appear to garble some data, etc. So we opted to avoid that potential pitfall.
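
To put the two rates quoted above in perspective, here is a rough back-of-the-envelope calculation in C++. It is only a sketch: the function name is made up for illustration, and it ignores all protocol overhead (CAN framing, the serial bridge protocol, Ethernet encapsulation), so real throughput will be lower on both paths.

```cpp
// Time in milliseconds to move `payloadBytes` at `bitsPerSecond`,
// counting `bitsOnWirePerByte` bits on the wire for every payload byte
// (for example 10 for 8N1 serial, which is an assumption here).
double TransferTimeMs(double payloadBytes, double bitsPerSecond,
                      double bitsOnWirePerByte) {
    return payloadBytes * bitsOnWirePerByte / bitsPerSecond * 1000.0;
}
```

With those assumptions, `TransferTimeMs(100, 128000, 10)` comes to about 7.8 ms for 100 bytes over the serial link, while `TransferTimeMs(100, 1000000, 8)` comes to about 0.8 ms at the CAN bit rate.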

boomergeek 08-03-2011 11:54

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Add Team 241 to the list of teams at GSR with CAN/2CAN problems.

But we were using C++. We did not ask around early enough and did not get the word that it was a general problem. We did switch our robot over to PWM before it was crated for Boston.

http://www.chiefdelphi.com/forums/sh...13&postcount=3

techhelpbb 08-03-2011 12:20

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Just call me curious...

Everyone that has the Jaguar problem on the CAN bus, please consider telling us:

Have you put capacitors on the backs of your motors to dampen the noise?

Have you considered putting capacitors across the encoder power wires?

It's possible that noise from your electrical sources and the brush motors attached to the motor side of the Jaguars is causing issues for the CAN bus.

We've noticed this issue as well, and we are taking two approaches. First, we detect the failure and force a reset. Second, we are trying to get rid of all the noise that could cause things like this to happen.

Mike Copioli 08-03-2011 12:37

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
We have also experienced the -44087 error along with the same host of symptoms. It happened once during a practice match and never during an actual match. We of course are using a 2CAN and have only seen the issue when it is used in FRC (cRIO, FRC JAG firmware 92). We have not seen the issue when the 2CAN is being used in the Crosslink Control System. It does seem to be a race condition based on the observed behavior and the difficulty of reproducing the failure. I have reported the problem to Omar, and he has been unable to reproduce the issue. The fact that a soft reboot of the cRIO fixes the issue, and that the problem has been seen with both the serial and Ethernet gateways, leads me to believe the problem is above the gateway level. We will continue to test using the 2CAN and advise if there are any bugs found on the side of the 2CAN.

Hugh Meyer 08-03-2011 14:54

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Last year we experienced several issues using CAN and lag was one of them.

We connected an oscilloscope to two digital outputs. One scope channel was controlled by software code that turned on at the start and off at the end of our periodic loop. The other channel was driven by a toggle function at the beginning of our code. This helped to define the state and made a good trigger signal.

We found that our code was taking longer than our periodic time. Not a good situation. The problem occurred because of the CAN communications. We were driving 8 Jaguars and polling for just about all the data we could, so we could log the data.

As a result we multiplexed the data out over 4 periods. This reduced our code run time down to about 55 milliseconds. We run our periodic loop at 100 milliseconds.
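
The multiplexing idea above can be sketched in C++ roughly as follows. This is not our actual code; the class name `MultiplexedPoller` and the use of generic callables in place of real Jaguar status reads are assumptions made to keep the example self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Spreads a set of slow status reads across several periodic-loop passes,
// so each pass only pays for a fraction of the total bus traffic.
class MultiplexedPoller {
public:
    explicit MultiplexedPoller(std::size_t numSlots)
        : slots_(numSlots), next_(0) {}

    // Assigns a read (any callable) to one slot; throws if slot is out of range.
    void Add(std::size_t slot, std::function<void()> read) {
        slots_.at(slot).push_back(std::move(read));
    }

    // Call once per periodic loop: runs one slot's reads, then advances.
    void RunNextSlot() {
        if (slots_.empty()) return;
        for (auto& read : slots_[next_]) read();
        next_ = (next_ + 1) % slots_.size();
    }

private:
    std::vector<std::vector<std::function<void()>>> slots_;
    std::size_t next_;
};
```

With four slots, each logged value is refreshed once every four passes. The motor commands themselves would stay outside the poller so they still go out on every pass.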

This arrangement has been institutionalized this year with a special cable that connects directly to the break out board and scope. We regularly verify that the code is running under the period.
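
The same check can also be approximated in software as a complement to the scope. The following is a minimal sketch, assuming a standard C++11 toolchain with `std::chrono`; the class name `LoopTimer` is hypothetical and nothing here comes from WPILib.

```cpp
#include <chrono>

// Counts passes of a periodic loop that ran longer than the loop period.
class LoopTimer {
public:
    using Duration = std::chrono::steady_clock::duration;

    explicit LoopTimer(std::chrono::milliseconds period)
        : period_(period), worst_(Duration::zero()), overruns_(0) {}

    // Call at the top of the periodic loop.
    void Begin() { start_ = std::chrono::steady_clock::now(); }

    // Call at the bottom of the periodic loop; returns true on an overrun.
    bool End() { return Record(std::chrono::steady_clock::now() - start_); }

    // Records one measured pass; split out so known durations can be fed in.
    bool Record(Duration elapsed) {
        if (elapsed > worst_) worst_ = elapsed;
        if (elapsed > period_) {
            ++overruns_;
            return true;
        }
        return false;
    }

    int Overruns() const { return overruns_; }
    Duration Worst() const { return worst_; }

private:
    std::chrono::milliseconds period_;
    std::chrono::steady_clock::time_point start_;
    Duration worst_;
    int overruns_;
};
```

With a 100 millisecond period, a 55 millisecond pass is not counted, while anything over 100 milliseconds increments the overrun count. The scope remains the more trustworthy measurement, since it does not depend on the code under test.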

A change we made from last year was ground loop isolation. We removed the ground wire in the CAN cable, which otherwise creates a direct ground loop. If you are using one encoder to drive two Jaguars, you will need optical isolation to remove the ground loop.

Another change is that we added filtering on the Jaguar at the encoder connector. We were seeing resets on the Jaguars and the filter seems to be the solution. Resets will cause all kinds of problems.

I have posts regarding these issues on other threads.

We are using a 2CAN and C++. The serial port baud rate of 115k was just not fast enough to transfer all the data we wanted to transfer.

We don’t have all of the issues fixed, but I wanted to share these in hopes that it will help others using CAN Jaguar.

Good luck!

-Hugh

jhersh 08-03-2011 15:03

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Thanks for the detailed write-up!

Quote:

Originally Posted by John Heden (Post 1036185)
1) How do folks recommend powering the actual 2CAN device? There are references to ensuring the D-Link radio is powered by the boosted 12V (white Wago connector) to help avoid excessive voltage drops and possible problems. The 2CAN itself has a wide voltage range (6.5 to 28V), but should it be connected to the raw 12V fused power bus? Our autonomous start is probably the worst case for power consumption, where we simultaneously power up 3 big CIM motors to near full power.

Powered from a 20A fused PD output.

Quote:

Originally Posted by John Heden (Post 1036185)
2) Accessing the 2CAN web page while our robot code is executing does result in a number of our robot CAN transactions failing (~ 1/1000). This is more of an annoyance for us than a problem, but it is suggestive of some type of system vulnerability when the 2CAN is being accessed from multiple sources. The WPI libraries seem to have access protection, but the 2CAN itself may have some issue when it is being accessed by both the cRIO robot code and a browser on the 2CAN page.

I'm not sure how this is handled in the 2CAN. Mike or Omar, can you please comment?

Quote:

Originally Posted by John Heden (Post 1036185)
3) There is some mention that the FRC system does send additional CAN-related messages to CAN-based robots. Can anybody confirm this possibility? Could there be some type of negative interaction with the FMS system?

This is inaccurate. The FMS has no knowledge of the CAN bus. I'm not aware of anything at the application layer that would cause any interaction between CAN and FMS.

Quote:

Originally Posted by John Heden (Post 1036185)
4) I did speak to a few CAN Jaguar users who spoke of possibly overloading the CAN bus and needing to reduce their access rate. The current WPI library interfaces, however, look like simple blocking calls where all access is nicely serialized and completes before the next sequence is initiated. While all of these calls may eventually slow down our control loops, it seems unlikely that we could overload the bus.

This is generally accurate. There was an issue in beta where the Jaguar token negotiation could be starved out by continuous traffic from the user application, but that was worked around.

Quote:

Originally Posted by John Heden (Post 1036185)
5) We do see some type of “too much error data!” message once in a while when we are playing, but we could not locate this error message anywhere in our code or in the WPI libraries. Any idea as to the source of this unique message?

That message comes from the error reporting to the driver station. If too many errors are generated in a short time, they won't all fit in the fixed sized packets to the driver station, and will result in that message.
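
As a rough illustration of that mechanism (not the actual FRC_NetworkCommunication implementation, whose packet layout is not shown in this thread), a fixed-size error buffer might behave like the C++ sketch below. The class name, the capacity, and the newline separator are all assumptions.

```cpp
#include <cstddef>
#include <string>

// Illustrative fixed-capacity buffer: messages are accepted until the
// next one would not fit, after which the packet is flagged as truncated.
class ErrorPacket {
public:
    explicit ErrorPacket(std::size_t capacity)
        : capacity_(capacity), truncated_(false) {}

    // Appends one message plus a newline if it fits; returns true if stored.
    bool Add(const std::string& message) {
        if (truncated_ || data_.size() + message.size() + 1 > capacity_) {
            truncated_ = true;  // a sender could report "too much error data!" here
            return false;
        }
        data_ += message;
        data_ += '\n';
        return true;
    }

    bool Truncated() const { return truncated_; }
    const std::string& Data() const { return data_; }

private:
    std::size_t capacity_;
    bool truncated_;
    std::string data_;
};
```

The practical consequence is that a burst of CAN timeouts can crowd out other diagnostics in the same reporting interval.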

Quote:

Originally Posted by John Heden (Post 1036185)
6) There is a rev 66 (1/29/2011) 2CAN plug-in update for FRC posted on the Cross the Road Electronics website. We bagged our robot before I could confirm which version of the 2CAN plug-in is incorporated in the 2/11/2011 v28 cRIO update. Is there a need to install this file manually, or is the latest included with the v28 update?

Revision 66 of the 2CAN plug-in is included in v28.

Please let me know if you come up with any more details that might lead to the issue.

Thanks,
-Joe

jhersh 08-03-2011 15:14

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by WizenedEE (Post 1036391)
Source Please? I'm looking here:

<R59> If CAN-bus communications are used, the CAN-bus must be connected to the cRIO-FRC through either the Ethernet network connected to Port 1, Port 2, or the DB-9 RS-232 port connection.

And

<R50> Connections to the cRIO-FRC Ethernet ports must be compliant with the following parameters:
A. The DAP-1522 radio is connected to the cRIO-FRC Ethernet port 1 (either directly or via a CAT5 Ethernet pigtail).
B. Ethernet-connected COTS devices or custom circuits may connect to either cRIO-FRC Ethernet port; however, these devices may not transmit or receive UDP packets using ports 1100-1200 except for ports 1130 and 1140.

That seems to be two conflicting rules, but the 2CAN would probably count as a "pigtail"

I don't see these as conflicting. A pigtail is simply a wire, not an active component like the 2CAN. The 2CAN does not count as a pigtail. It still doesn't matter, however, since you can plug the 2CAN into one of the other ports on the DAP-1522 switch, not in between the radio and the cRIO.

Quote:

Originally Posted by WizenedEE (Post 1036391)
Hooray for rules quotations. I am confused about why rule <R50> says the radio may be connected directly OR with a pigtail. Them female to female connections, eh?

(Yeah, yeah I know, a pigtail is a male-female...)

The idea of the pigtail for this application is obsolete and probably should be removed from the rules. Instead of ever unplugging your cRIO from your radio, you should instead have a pigtail from one of the other ports on the DAP-1522 that you use to tether to your DS in the pits. This means you will never forget to plug the radio back in and miss a match (or delay it if your FTA is especially generous).

-Joe

Mark McLeod 08-03-2011 15:18

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by jhersh (Post 1036576)
The idea of the pigtail for this application is obsolete and probably should be removed from the rules. Instead of ever unplugging your cRIO from your radio, you should instead have a pigtail from one of the other ports on the DAP-1522 that you use to tether to your DS in the pits. This means you will never forget to plug the radio back in and miss a match (or delay it if your FTA is especially generous).

Not quite true, although in principle I agree.

On the practice field teams must unplug their DLink and replace it with the Practice field DLink.

jhersh 08-03-2011 15:23

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mark McLeod (Post 1036579)
Not quite true, although in principle I agree.

On the practice field teams must unplug their DLink and replace it with the Practice field DLink.

I guess that's true... however if a team wanted to avoid that, they could simply have a spare power connection for the practice radio and connect it to the tether pigtail.

John Heden 08-03-2011 20:51

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I would like to thank everybody for their thoughts on our CAN issue and all the suggestions you have offered. Our team crafted a custom dashboard that saves all of our data to disk, including any UDP NetConsole data that it happens to catch, and I’ve started to work my way through this information looking for possible clues. The freeze in our first quarterfinal match did yield some interesting results that I’m still trying to fully understand, but it certainly looks like a complete loss of CAN integrity through some mechanism. The following is part of the recorded error sequence (it goes on quite a bit longer), which shows a total of 16 InitCANJaguar() calls. The only call I can find to InitCANJaguar(), however, is in the constructor for CANJaguar(), so I am a bit perplexed at this point given that we have only 5 CANJaguars. After this initialization-like sequence there is a litany of getTransaction() errors before we eventually do a reset and regain control.

At this point I’m partial to the startup race condition theory of some type. I would also add that we do launch a status monitoring thread that does read information from CANJaguar at the end of our robot constructor well before the autonomous loop is initiated. This seems to work but I wonder if the 2CAN occasionally needs a bit more time to settle down before it is called upon for status and CAN transactions…

Thanks again,

John

<Code>-44087 ERROR: status == -44087 (0xFFFF53C9) in getTransaction() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 425

<Code>-63194 ERROR: status == -63194 (0xFFFF0926) in InitCANJaguar() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 47

<Code>-44087 ERROR: status == -44087 (0xFFFF53C9) in getTransaction() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 425

<Code>-63194 ERROR: status == -63194 (0xFFFF0926) in InitCANJaguar() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 47

<Code>-44087 ERROR: status == -44087 (0xFFFF53C9) in getTransaction() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 425

<Code>-44087 ERROR: status == -44087 (0xFFFF53C9) in setTransaction() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 392

<Code>-44087 ERROR: status == -44087 (0xFFFF53C9) in setTransaction() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 392

<Code>-44087 ERROR: status == -44087 (0xFFFF53C9) in getTransaction() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 425

<Code>-63194 ERROR: status == -63194 (0xFFFF0926) in InitCANJaguar() in C:/windriver/workspace/WPILib/CANJaguar.cpp at line 47
Etc. etc. etc...
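
For anyone who wants to count calls in a saved log like this one, a small C++ helper along these lines may be useful. It is a sketch that assumes every line has the shape shown above, with the function name following the first ") in " and ending at "()". The names `ExtractFunction` and `TallyByFunction` are made up for this example.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Returns the function name from a line such as
// "... (0xFFFF53C9) in getTransaction() in C:/... at line 425",
// or an empty string if the line does not match that shape.
std::string ExtractFunction(const std::string& line) {
    const std::string marker = ") in ";
    const std::size_t start = line.find(marker);
    if (start == std::string::npos) return "";
    const std::size_t nameBegin = start + marker.size();
    const std::size_t nameEnd = line.find("()", nameBegin);
    if (nameEnd == std::string::npos) return "";
    return line.substr(nameBegin, nameEnd - nameBegin);
}

// Counts how many log lines mention each function name.
std::map<std::string, int> TallyByFunction(const std::string& log) {
    std::map<std::string, int> counts;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        const std::string fn = ExtractFunction(line);
        if (!fn.empty()) ++counts[fn];
    }
    return counts;
}
```

Run over the nine lines shown above, this gives 4 for getTransaction, 2 for setTransaction, and 3 for InitCANJaguar. Blank lines and lines that do not match are skipped.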

kamocat 08-03-2011 21:08

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mark McLeod (Post 1036579)
Not quite true, although in principle I agree.

On the practice field teams must unplug their DLink and replace it with the Practice field DLink.

Are you saying that the D-link will broadcast a network, even in bridge mode?

Joe:
Was that issue with starving the Ack worked out? I haven't retested it.

Ken Streeter 08-03-2011 22:23

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mark McLeod (Post 1036579)
Not quite true, although in principle I agree.

On the practice field teams must unplug their DLink and replace it with the Practice field DLink.

There is one other time that it is important for teams to unplug the cRIO from the DLink -- if you are in the finals and are tethering the robot to prepare it for the next match of the finals (for example, performing a system check or compressing air to have full tanks for the next match).

We ran into this problem at the Week Zero Scrimmage, so we were ready for the problem during the GSR finals. During the finals, the same teams are on the field in consecutive matches, so the field access point is still configured to communicate with the teams that were just on the field. Accordingly, as soon as the Driver Station is connected to the DLink, the DS enters the "FMS Connected" mode, forcing the robot into a disabled state and prohibiting "tethered" control.

If you find yourself in the final matches and need to tether the robot in between matches to add air to the tanks or perform any system checks, you'll want to connect the DS directly to the cRIO without going through the DLink, in order to avoid the FMS control.

jhersh 09-03-2011 03:45

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by kamocat (Post 1036743)
Joe:
Was that issue with starving the Ack worked out? I haven't retested it.

Remind me what issue that was? Are you referring to the starved token resynchronization? If so, then yes, the v28 image (and several before that) include the work-around that restarts the token synchronous to the sendMessage call where the token stream is detected to be expired.

-Joe

Mark McLeod 09-03-2011 07:30

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Ken Streeter (Post 1036794)
There is one other time that it is important for teams to unplug the cRIO from the DLink -- if you are in the finals and are tethering the robot to prepare it for the next match of the finals (for example, performing a system check or compressing air to have full tanks for the next match).

That makes sense.
The field staff must leave the previous match teams up in FMS until the scores and penalties have been debated and submitted. I sometimes borrow one of the unoccupied player stations to test laptop link-ups in those moments (setting the laptop to one of the absent team #s).

Mike Copioli 09-03-2011 12:10

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by jhersh (Post 1036572)

I'm not sure how this is handled in the 2CAN. Mike or Omar, can you please comment?

I will give you a preliminary answer until Omar has a chance to provide a more detailed one. In short, the problem is caused by a lack of synchronization between the cRIO CAN transactions and the 2CAN dashboard transactions. This is a simplified explanation; the problem is actually a bit more involved as Omar has explained it. Omar has written some management code that is intended to deal with this problem; however, the web dash's ability to interact with the CAN bus is second chair to the user code. If the user code is sending CAN throttle requests too frequently, for example, the time the web dash has to interact with the bus is limited. This is not an issue with the Cross-link Control System because the 2CAN performs all synchronization and has more of a 'master' role. But again, this is Omar's area of expertise.
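
One way user code could leave more idle time on the bus is to cap how often it sends throttle updates. The C++ sketch below shows a simple rate limiter; the class name and the idea of passing in the current time in seconds are assumptions for illustration, and whether this actually helps the web dash in a given setup would need to be tested.

```cpp
// Caps how often a command is transmitted; all times are in seconds.
class CommandRateLimiter {
public:
    explicit CommandRateLimiter(double minIntervalSec)
        : minIntervalSec_(minIntervalSec), lastSendSec_(0.0), hasSent_(false) {}

    // Returns true if the caller may transmit now, and records the send time.
    bool ShouldSend(double nowSec) {
        if (hasSent_ && (nowSec - lastSendSec_) < minIntervalSec_) {
            return false;  // too soon since the last transmission
        }
        lastSendSec_ = nowSec;
        hasSent_ = true;
        return true;
    }

private:
    double minIntervalSec_;
    double lastSendSec_;
    bool hasSent_;
};
```

The caller would check `ShouldSend(now)` before each throttle write and simply skip the write when it returns false, always sending the most recent value when a send is allowed.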

jhersh 09-03-2011 12:35

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by John Heden (Post 1036730)
At this point I’m partial to the startup race condition theory of some type.

Me too. I believe I've found and fixed the issue. We will be testing on a real field this evening and working on a plan for distributing the fix.

Please stay tuned.

-Joe

John Heden 09-03-2011 12:42

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
It sounds like the experts are converging on the problem, and we remain hopeful for a robust CAN solution. For anybody following this CAN problem thread, there was an interesting statement in the March 8 (yesterday) Team Update #17:

Quote:

If a team is using a CAN network on the robot, they should check the messages in the “Diagnostic” tab of the Driver Station before a match starts to ensure that there aren’t any scrolling CAN timeouts. If there are such messages, give the MC a “thumbs down” to show you’re not ready and click on “Reboot Robot” to restart the cRIO and clear the errors. Teams will only see such timeout errors if it's properly handled in code, and they should take care to ensure that these exceptions are handled such that they can be seen on the field.

I’m going to try to convince our team that we should maintain our CAN implementation (not go back to PWM cables) but carefully monitor for this possible problem.

Thanks again,

john

Zme 09-03-2011 13:20

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Our team wrote some code so that the robot would reboot itself automatically if it did not have communication with the CAN bus on boot. Not the prettiest workaround, but it works for what we needed.
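
The general shape of such a boot-time check might look like the C++ sketch below. It is not the actual code described above: `CheckBusOrReboot` is a hypothetical name, and the probe and reboot actions are passed in as callables because the real calls depend on the library and language in use.

```cpp
#include <functional>

// Probes the CAN bus up to `maxAttempts` times at startup.
// Returns true if any probe succeeds; otherwise calls `reboot` and returns false.
bool CheckBusOrReboot(const std::function<bool()>& probeBus,
                      int maxAttempts,
                      const std::function<void()>& reboot) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (probeBus()) return true;  // bus answered, continue normal startup
    }
    reboot();  // bus never answered, so restart the controller
    return false;
}
```

One caveat with this design is that a genuinely broken bus (an unplugged cable, for example) would cause an endless reboot loop, so a cap on the number of automatic reboots is worth considering.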

techhelpbb 09-03-2011 13:27

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by jhersh (Post 1037056)
Me too. I believe I've found and fixed the issue. We will be testing on a real field this evening and working on a plan for distributing the fix.

Please stay tuned.

-Joe

Does this solution also address the problem when it doesn't happen at startup?

We get CAN issues even when the system manages to come fully online.

jhersh 09-03-2011 14:02

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by techhelpbb (Post 1037082)
Does this solution also address the problem when it doesn't happen at startup?

We get CAN issues even when the system manages to come fully online.

No... this is a start-up issue only. Can you describe as much about your setup and the behavior you see?

Thanks,
-Joe

techhelpbb 09-03-2011 14:17

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by jhersh (Post 1037098)
No... this is a start-up issue only. Can you describe as much about your setup and the behavior you see?

Thanks,
-Joe

We usually don't have connectivity issues at startup. We'll be driving and all of a sudden we'll lose one or more Jaguars off the bus with timeout or communications errors (I haven't looked at the debug information myself and am relying on student feedback for the specific details).

The odd part is that in many cases these Jaguars are performing similar tasks to other Jaguars in the system, so it's not something specific to their actuators.

The Jaguar will drop off the bus; when it comes back, it won't respond to further adjustments until we soft boot it. Hence we detect when a Jaguar fails like this and automatically force a soft boot at that point.
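
A detection scheme of this kind can be as simple as counting consecutive failed transactions per Jaguar. The C++ below is a minimal sketch under that assumption; the class name `TimeoutMonitor` and the threshold are illustrative, and the soft boot itself is left to the caller.

```cpp
// Tracks consecutive failed CAN transactions for one device.
class TimeoutMonitor {
public:
    explicit TimeoutMonitor(int threshold)
        : threshold_(threshold), consecutive_(0) {}

    // Feed the result of each transaction. Returns true once the number of
    // consecutive failures reaches the threshold.
    bool Report(bool transactionOk) {
        if (transactionOk) {
            consecutive_ = 0;  // any success clears the streak
            return false;
        }
        ++consecutive_;
        return consecutive_ >= threshold_;
    }

    // Call after the recovery action so the streak starts over.
    void Reset() { consecutive_ = 0; }

private:
    int threshold_;
    int consecutive_;
};
```

Requiring several failures in a row avoids forcing a soft boot over a single dropped message, at the cost of reacting a few loop passes later.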

Our most specific issue with this has been on the drive system. We have 2 Jaguars per side connected to CIM motors in the CIMple gearboxes. We split the encoders, which are 100% isolated from each other, and we want to run PID to target a velocity setpoint (it works fine when we don't get timeouts). Even with potentiometers we've seen this (but those Jaguars are on high ratio, shallow pitch worm drives). It seems to happen less when we use CAN for Vbus and hence lose the external reference, but it still does happen.

Given that we'll be fine for protracted periods of time, then suddenly experience a timeout under hard driving conditions this is what leads me to believe we have some sort of noise issue at work. Obviously if there was a spike that reached logic level on the CAN bus it would cause issues as the CAN bus is basically unmodulated single ended open collector digital. However, when using PWM, the worst you'd get is a shorter than expected pulsewidth at a frequency that is possibly wrong unless you have a periodic source of interference (unlikely in this case at the normal center frequency this system uses). So basically PWM would be more noise immune mostly because the Jaguars aren't fast enough or powerful enough electrically to instantly decompose it and overcome the load's inertia in response.

For the most part, the biggest problem we've had at startup with the cRIO using JAVA has been the bridge Jaguar just outright failing or the bus being improperly terminated. We had one Jaguar that just literally up and died once we turned it off. It was raining that day, we thought maybe water got into it somehow, but I inspected it and it was dry.

Zme 09-03-2011 14:50

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I can provide a little more information on something that seems to behave similarly to this problem.
We are using C++ and have had similar issues with Jaguars suddenly no longer responding to commands when using a closed-loop control mode.
We noticed that it generally happened when there was a fault on the Jaguar (current, voltage, etc.). When this happened, the Jag stopped working for whatever reason. We thought that perhaps the fault caused the heartbeat to time out and therefore the Jag would no longer respond to commands.
The fix we tried was running back through the initialization of the Jaguar (setting PIDs and enabling control) whenever we detected a fault. This seemed to alleviate the problem but didn't catch everything. We then put a button on the joystick that would run through the re-init, and while it didn't solve the problem, it made things bearable.

jhersh 09-03-2011 14:50

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by techhelpbb (Post 1037101)
We usually don't have connectivity issues at startup. We'll be driving and all of a sudden we'll lose one or more Jaguars off the bus with timeout or communications errors (I haven't looked at the debug information myself; I'm relying on student feedback for the specific details).

The odd part is that in many cases these Jaguars are performing similar tasks to other Jaguars in the system, so it's not something specific to their actuators.

The Jaguar will drop off the bus, come back, and then not respond to further adjustments until we soft boot it. Hence we detect when they fail like this and automatically force a soft boot at that point.

Our most specific issue with this has been on the drive system. We have 2 Jaguars per side connected to CIM motors in the CIMple gear boxes. We split the encoders so they are 100% isolated from each other, and we want to run PID to target a velocity setpoint (it works fine when we don't get timeouts). Even with potentiometers we've seen this (though those Jaguars drive high-ratio, shallow-pitch worm drives). It seems to happen less when we use CAN for Vbus and hence lose the external reference, but it still does happen.

Given that we'll be fine for protracted periods of time, then suddenly experience a timeout under hard driving conditions this is what leads me to believe we have some sort of noise issue at work. Obviously if there was a spike that reached logic level on the CAN bus it would cause issues as the CAN bus is basically unmodulated single ended open collector digital. However, when using PWM, the worst you'd get is a shorter than expected pulsewidth at a frequency that is possibly wrong unless you have a periodic source of interference (unlikely in this case at the normal center frequency this system uses). So basically PWM would be more noise immune mostly because the Jaguars aren't fast enough or powerful enough electrically to instantly decompose it and overcome the load's inertia in response.

For the most part, the biggest problem we've had at startup with the cRIO using JAVA has been the bridge Jaguar just outright failing or the bus being improperly terminated. We had one Jaguar that just literally up and died once we turned it off. It was raining that day, we thought maybe water got into it somehow, but I inspected it and it was dry.

Are you certain you aren't tripping a breaker or browning out the Jaguars under high load (due to poor wiring to the power input terminals of the Jaguar)? What you described are all symptoms of the Jag rebooting.

-Joe

jhersh 09-03-2011 14:53

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Zme (Post 1037115)
I can provide a little more information on something that seems to behave similarly to this problem.
We are using C++ and have had similar issues with Jaguars suddenly no longer responding to commands when using a closed-loop control mode.
We noticed that it generally happened when there was a fault on the Jaguar (current, voltage, etc.). When this happened, the Jag stopped working for whatever reason. We thought that perhaps the fault caused the heartbeat to time out and therefore the Jag would no longer respond to commands.
The fix we tried was running back through the initialization of the Jaguar (setting PIDs and enabling control) whenever we detected a fault. This seemed to alleviate the problem but didn't catch everything. We then put a button on the joystick that would run through the re-init, and while it didn't solve the problem, it made things bearable.

This also sounds like your Jags are rebooting due to a brown out of the input voltage. If you brown out the Jag or trip a breaker, the closed-loop settings are lost. You will have to re-initialize them (as you are) to recover the closed-loop control.

-Joe
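A minimal sketch of the re-initialization workaround discussed in the posts above, written against a hypothetical stand-in rather than the real WPILib CANJaguar class. `ControllerStatus`, `ClosedLoopConfig`, and `ReinitGuard` are illustrative names that do not exist in any library; the only idea taken from the thread is that closed-loop settings are lost when a Jag reboots and must be sent again when a power cycle or fault is seen.

```cpp
#include <functional>
#include <utility>

// Hypothetical snapshot of what the robot code reads back each loop.
struct ControllerStatus {
    bool powerCycled;  // what a GetPowerCycled()-style query reported
    bool faulted;      // true if any fault flag was set
};

// Closed-loop settings that a controller forgets when it reboots.
struct ClosedLoopConfig {
    double p;
    double i;
    double d;
};

// Re-sends the stored configuration whenever a reboot or fault is seen.
class ReinitGuard {
public:
    ReinitGuard(ClosedLoopConfig cfg,
                std::function<void(const ClosedLoopConfig&)> apply)
        : cfg_(cfg), apply_(std::move(apply)) {}

    // Call once per control loop. Returns true if settings were re-sent.
    bool Update(const ControllerStatus& status) {
        if (status.powerCycled || status.faulted) {
            apply_(cfg_);
            ++reinitCount_;
            return true;
        }
        return false;
    }

    int reinitCount() const { return reinitCount_; }

private:
    ClosedLoopConfig cfg_;
    std::function<void(const ClosedLoopConfig&)> apply_;
    int reinitCount_ = 0;
};
```

In real robot code the `apply` callback would be the place to set the control mode, gains, and position or speed reference again and then re-enable control. Logging `reinitCount()` to the dashboard is one way to tell a single glitch from a controller that is rebooting repeatedly.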

Zme 09-03-2011 14:56

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I can tell you that it is not losing power, as outputs were added to the code that would trigger when the GetPowerCycled() function returned true. A very early version of our code actually used both faults and power cycles as a trigger to re-init, but that was taken out later; why it was taken out I don't know.

techhelpbb 09-03-2011 15:02

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by jhersh (Post 1037116)
Are you certain you aren't tripping a breaker or browning out the Jaguars under high load (due to poor wiring to the power input terminals of the Jaguar)? What you described are all symptoms of the Jag rebooting.

-Joe

I can be certain we didn't blow any fuses or trip any breaker.

It's possible, I suppose, that the load is too great and we are somehow browning out. We did consider that and tried fresh batteries.

We don't seem to be consistently dragging down the batteries which would loosely indicate we aren't being overly aggressive in a general sense.

I can certainly try putting a datalogger on the power supply side of the Jaguar and see what I can find that way. I'm not sure I have a current probe I can use in that configuration on 12VDC, and a shunt is not a good idea given the load impedance.

The wires are clearly of sufficient gauge...I didn't wire that part of the system but it's better than 12AWG (so it's not 12AWG), and the problem doesn't seem to follow any particular wire so it's not likely the crimps. We've crimped plenty of other wires in the past with these tools and this make and brand of lug and they are firmly screwed down.

I could try putting a large 25V or 50V capacitor near the Jaguar on the input side to provide additional low internal resistance power if somehow the impedance of the wires from the power distribution board was suspect. It may not be legal on the competition floor, but for troubleshooting I think the battery will handle that.

Mr. Lim 09-03-2011 15:19

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
We are also seeing an occasional CANTimeoutException being thrown under brownout conditions.

Strangely, it is one Black Jaguar in particular that is always the one that goes first. It is connected to a CIM in our 4-motor drivetrain, and it is a "master" Jaguar, in that we have an encoder hooked up to it, and run it under speed control.

Under a fresh, full battery, the exceptions don't occur, but within a 2 minute match, the voltage certainly drops enough to start throwing these exceptions. We feel it is voltage related, as when we are in low gear, the exceptions never occur, whereas in high gear, they happen very frequently.

Our workaround so far has been to catch the exception, and re-initialize the Jaguars when they occur.

This seems to work decently well.

We've gone through the wiring several times, and can't seem to isolate why one Jaguar is more prone than the others. Less tolerant Jaguar? A CIM that likes to draw more current than the others? Your best guess is as good as mine...

techhelpbb 09-03-2011 15:24

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mr. Lim (Post 1037132)
We are also seeing an occasional CANTimeoutException being thrown under brownout conditions.

Strangely, it is one Black Jaguar in particular that is always the one that goes first. It is connected to a CIM in our 4-motor drivetrain, and it is a "master" Jaguar, in that we have an encoder hooked up to it, and run it under speed control.

Under a fresh, full battery, the exceptions don't occur, but within a 2 minute match, the voltage certainly drops enough to start throwing these exceptions. We feel it is voltage related, as when we are in low gear, the exceptions never occur, whereas in high gear, they happen very frequently.

Our workaround so far has been to catch the exception, and re-initialize the Jaguars when they occur.

This seems to work decently well.

We've gone through the wiring several times, and can't seem to isolate why one Jaguar is more prone than the others. Less tolerant Jaguar? A CIM that likes to draw more current than the others? Your best guess is as good as mine...

I can say for certain we had one CIM in our pile of parts that had some issues with brushes. No matter what we did with that CIM, it was trouble everywhere it went. It sort of worked on a Victor 884, but every time it touched a Jaguar of either model you'd either fault on startup or you'd get moving and shortly thereafter strange and bad things would happen.

We discovered it early on. I thought we disposed of it, or at the very least marked it. Suddenly it 'reappeared' and made for another 2 or 3 hours of troubleshooting. I believe one of the other mentors showed it to the garbage can when I wasn't around so that it couldn't find its way back into anything important again.

I can also say that one side of our robot drive train exhibits higher inertia than the other. While messing around with encoders before we isolated them that side was by far more likely to really have serious show stopping issues than the other which for the most part is the same exact design. Once we made the isolation board it stopped misbehaving and for the most part accepted the same PID tuning parameters for a setpoint of velocity (you could at least start tweaking the values from the side with less inertia and you'd be close). It drove nice and straight once we isolated the encoders until we started getting the CAN bus issues and then, after we trapped that error, it was good again. It doesn't seem to me that this particular part of our robot is any less well built than the other. We didn't, after all, fabricate the gear boxes themselves, and cursory examination doesn't indicate any obvious mechanical issue. However, it seems just a slight variation in the mechanism and even if we take the Jaguars from the side with less inertia they seem to have issues on the side with more inertia. Now I do grant the reader that in swapping the Jaguars we are using the wiring from the side of the robot that has problems and that might be why that side keeps being slightly more prone to problems. The wiring just does not itself seem bad.

As soon as I get a chance I'll rig a datalogger up on the robot while it moves. I'd really like to see what the power supply side actually looks like when this goes on.

John Heden 11-03-2011 11:49

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by John Heden
At this point I’m partial to the startup race condition theory of some type.

Quote:

Me too. I believe I've found and fixed the issue. We will be testing on a real field this evening and working on a plan for distributing the fix.

Please stay tuned.

-Joe

Greetings,

I was curious as to whether your testing was successful and whether we have a fix at this point? Would it be a fix on the FMS side or a new cRIO image?

Thanks,

John

jhersh 11-03-2011 11:54

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by John Heden (Post 1037860)
Greetings,

I was curious as to whether your testing was successful and whether we have a fix at this point? Would it be a fix on the FMS side or a new cRIO image?

Thanks,

John

It has nothing to do with the FMS. We are testing a new image that should fix it. We have had no failures yet. However, FIRST wants more testing before making it public.

Are you aware of any teams having trouble at any regionals using v28 this weekend?

-Joe

ozrien 11-03-2011 12:56

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mike Copioli (Post 1037036)
I will give you a preliminary answer until Omar has a chance to provide a more detailed one. In short, the problem is caused by a lack of synchronization between the cRIO CAN transactions and the 2CAN dashboard transactions. This is a simplified explanation; the problem is actually a bit more involved, as Omar has explained it. Omar has written some management code that is intended to deal with this problem; however, the web dash's ability to interact with the CAN bus is second chair to the user code. If the user code is sending CAN throttle requests too frequently, for example, the time the web dash has to interact with the bus is limited. This is not an issue with the Cross-link Control System because the 2CAN performs all synchronization and has more of a 'master' role. But again, this is Omar's area of expertise.

The 2CAN keeps track of outstanding transactions from the cRIO and from the Web Dash (WD). The 2CAN knows which CAN IDs represent requests that will have ACK responses and keeps track of when the cRIO or WD is waiting on an ACK.
So if the WD wants to request, say, bus voltage on Jag1, it first checks whether the cRIO has performed any requests (resync tokenization, throttle set, etc.). If it has, then the WD will hold off on transactions to Jag1 until Jag1 sends an ACK (intended to go to the cRIO).

cRIO Set Throttle
<--------------------------User opens Web Dash, WD holds off...
cRIO ACK
WD Get Voltage <--------------- Jag sends an ACK, it is now available
WD ACK

Similarly, if the WD is waiting on Jag1 for a response/ACK and a cRIO request comes in, the 2CAN will delay transmission until Jag1 responds OR a one-millisecond timeout occurs (to ensure the cRIO gets minimal latency to the bus).

Without this management code there would be problems where...
cRIO Sets Throttle
WD Gets Voltage <--------------------------User opens Web Dash,
ACK
ACK
...which confuses the Jag and sometimes causes the Jag to not respond at all.
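The hold-off rule described in this post can be sketched roughly as follows. This is not the 2CAN firmware; `BusArbiter`, `Source`, and the method names are hypothetical, and the sketch simplifies by allowing only one outstanding request per Jaguar from either source. The only values taken from the post are the two requesters (cRIO and Web Dash) and the one-millisecond limit on how long a cRIO request waits behind a Web Dash request.

```cpp
#include <cstdint>
#include <map>

enum class Source { kCrio, kWebDash };

// Tracks one outstanding request per device ID (a simplifying assumption).
class BusArbiter {
public:
    // Returns true if the request may go out now and records it as pending.
    // Returns false if the caller should hold off and try again later.
    bool TrySend(Source src, int deviceId, std::uint64_t nowUs) {
        auto it = pending_.find(deviceId);
        if (it != pending_.end()) {
            const Pending& p = it->second;
            // A cRIO request waits at most 1 ms behind a Web Dash request.
            const bool webDashTimedOut = p.src == Source::kWebDash &&
                                         src == Source::kCrio &&
                                         nowUs - p.sentUs >= kCrioMaxWaitUs;
            if (!webDashTimedOut) {
                return false;
            }
        }
        pending_[deviceId] = Pending{src, nowUs};
        return true;
    }

    // Call when the device answers; the device becomes available again.
    void OnAck(int deviceId) { pending_.erase(deviceId); }

private:
    struct Pending {
        Source src;
        std::uint64_t sentUs;
    };
    static constexpr std::uint64_t kCrioMaxWaitUs = 1000;
    std::map<int, Pending> pending_;
};
```

Keeping the pending table per device ID means a slow response from one Jag does not block traffic to the others, which matches the way the post talks about holding off transactions "to Jag1" specifically.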

John Heden 12-03-2011 14:27

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by jhersh (Post 1037862)
It has nothing to do with the FMS. We are testing a new image that should fix it. We have had no failures yet. However, FIRST wants more testing before making it public.

Are you aware of any teams having trouble at any regionals using v28 this weekend?

-Joe

Thanks Joe!

No, I'm not aware of any other teams reporting CAN startup problems with V28 from this weekend. Perhaps the Team Update #17 suggestion to monitor for CAN errors was effective (or perhaps scared a few more teams into a PWM conversion). Maybe someone at this weekend’s competition might add their observations on this issue…

ozrien 13-03-2011 06:11

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
New 2CAN firmware 2.5 available
http://www.crosstheroadelectronics.com/2CAN.htm

I found and fixed a few issues that could be related...
Major Fixes....
-Fixed a potential situation where the 2CAN stops sending UDP frames - only happens at higher Ethernet utilization.
-Improved the lag in Web Dash browsing while the robot is enabled.
See release notes for details on all fixes.

There is no FRC legal obligation to update from 2.1 to 2.5. Both function with the FRC_2CANPlugin (SVN rev66).

PhilBot 13-03-2011 11:26

A different Serial CAN problem.
 
I’d like to throw a totally unreported problem into the CAN problem bucket.

This relates to closed-loop potentiometer-based arm control.

This happened to our team after the second-to-last qualifying match at the Pittsburgh regional this weekend.

Setup: We use 4 Jaguars running on Serial CAN. The first 2 “drive” Jags are run open loop voltage control (no encoders) but the last 2 “Arm” Jags are switched between open loop voltage control (for manual adjustment), and Potentiometer based Closed loop control (for recalling pre-learned arm positions). A single VI is used to do this switch by issuing new sets of setup commands each time the mode is changed.

We also have two VI’s in “Periodic Tasks” that do a “read back” of the current arm position for diagnostic purposes and to use as inputs to LEARN arm position presets. This has been working seamlessly for the last week of build, and the first 2 ½ days of competition. Note: This read-back isn’t needed for normal operation because the Jags are normally sent “Goto” positions and left to do their own closed loop control.

Just before the last match of Quals, we broke an arm chain while trying to adjust a position reset (using our tried and true “Learn” mode). The root cause for the mechanical failure turned out to be the fact that for some reason both Jags were reporting arm positions of “0.0”, instead of the actual arm position (so the learn process was recording bogus information).

We hadn’t made any software or hardware changes for several matches, and we tried multiple reboots, measured pot voltages, checked wiring etc., but couldn’t get the jags to read proper values. It was odd that both Jags were reading zero position although they had little in common physically (other than the CAN bus and the software). There were no more than the usual occasional drive Jag CAN errors on the diag screen. Eventually we loaded up the code in debug mode to see exactly what was coming out of the Arm Jags. To our surprise both JAGS were reporting no errors and positions of 0.0. We did a Highlight Execution to verify that the reads were being done… they were, but returning no valid data.

Through this entire operation, the Arm JAGs were still responding to voltage output commands and could drive the arms to any position. They just would not return valid “position” status on request.
We were stumped as to what to try next. If I’d had my USB to Serial adapter I would have reloaded the JAG code or run the COMM program (bad me for forgetting to bring that).

The very odd thing was the problem corrected itself as if by magic after an hour of debugging when we loaded some new arm presets onto the CRIO flash drive programmatically. I can’t explain this in detail as it would require a full understanding of our code, but suffice it to say that this change should not have changed the JAG operation in any tangible way…. all it would have done is sent more reasonable “value” into the Drive VI as pot setpoints.

Once things started working again, we just rebooted back to the original code (that had never been unloaded) and redid the presets using the normal “learn” mode, and everything seems to be back to normal (except my future failure fears).

Has anyone seen this reluctance of the JAGs to return pot position, especially once they have been setup correctly?

jhersh 13-03-2011 13:51

Re: A different Serial CAN problem.
 
Quote:

Originally Posted by PhilBot (Post 1038499)
Has anyone seen this reluctance of the JAGs to return pot position, especially once they have been setup correctly?

Hi Phil,

That can happen if a Jag reboots or has a communication problem when you are configuring the Jag position reference. I'm guessing if you had set the position reference again, you would have seen it fixed. Maybe your Jags were browning out due to a low battery?

-Joe

PhilBot 13-03-2011 14:11

Re: A different Serial CAN problem.
 
Quote:

Originally Posted by jhersh (Post 1038565)
Hi Phil,

That can happen if a Jag reboots or has a communication problem when you are configuring the Jag position reference. I'm guessing if you had set the position reference again, you would have seen it fixed. Maybe your Jags were browning out due to a low battery?

-Joe

None of the above.

When the problem first occurred we changed batteries and rebooted several times. We changed batteries again later, after matches were over. No effect.

The problem is that this "0.0 position" error persisted for many reboots, debug downloads, etc. and then just returned to normal an hour later.

As for the code...

The Jags are switched between "Voltage out" and "Position mode" each time a preset recall button is pressed (and held). Before the new mode parameters are loaded, the JAGs are disabled, and then re-enabled at the end. All of the JAG parameters (control mode, position source, pot turns, PI gains etc) are re-loaded, and the code checks for any errors and re-tries the command sequence 3 times if errors persist.

Note: We'd run into the occasional bad-command-write problem several days before shipping and had built in the retries. So it's pretty bullet proof.
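The team's code here is LabVIEW, so the following is only a rough C++ sketch of the same check-and-retry idea. `ConfigStep` and `ApplyWithRetries` are made-up names, and treating "3 times" as three total attempts is an assumption about what the post means.

```cpp
#include <functional>
#include <vector>

// One configuration write (control mode, position source, gains, ...).
// Returns true if the controller acknowledged it without error.
using ConfigStep = std::function<bool()>;

// Runs the whole sequence in order and restarts from the first step if any
// step fails. Returns the 1-based attempt that succeeded, or 0 if none did.
int ApplyWithRetries(const std::vector<ConfigStep>& steps, int maxAttempts = 3) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        bool ok = true;
        for (const auto& step : steps) {
            if (!step()) {
                ok = false;
                break;  // abandon this pass and start the sequence over
            }
        }
        if (ok) {
            return attempt;
        }
    }
    return 0;
}
```

A return value of 0 is worth surfacing to the drivers, since it means the Jag may be running with a partial configuration. Note that a scheme like this only catches writes that report an error; it would not notice a setting that was acknowledged but later lost to a reboot.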

Do you think accidentally sending the JAG an invalid "position value" (eg: a negative value for target position) could lock up the controller????

Phil.

techhelpbb 14-03-2011 07:36

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Adding to my list of troubleshooting tools that I guess I'll be fabricating...hopefully I can keep enough students engaged at this point...

It's far too easy to think that you have a brown out condition on this robot.

The power source is limited and the surge currents for the CIMs can exceed 120A each, with each robot having generally at least 2.

Then add the fact that the platform moves...and the brownouts could be of *very* short duration, and I have this picture in my head of an oscilloscope being dragged behind a robot on a wheeled cart like hospital equipment. :eek: Never mind how many channels you might need in a system like this!

Luckily a whole digital sampling oscilloscope can be implemented in an FPGA at up to 100 million samples per second at 8 bits and plug into a USB port. Depending on what and how you trigger, and how many samples per period you require, this is at best a 50MHz logic analyzer at the Nyquist rate, or maybe a 5-10MHz oscilloscope. I assume that to reconstruct a waveform you want at least 10 to 20 sample points per cycle, which is more than the Nyquist rate requires, since that rate only eliminates aliasing. Luckily F.I.R.S.T. is starting to let us put laptops on the robot. However, that's still at least 2 pounds, plus probes and some current measuring apparatus.

I have a better idea. Let's call it a brown out detector circuit. It's analog (so it's not subject to sampling issues...just integration and when in doubt there's some pretty darn fast operational amplifiers around these days). It'll not need to store all the data either, so the robot won't have to cart around something to fill up...just so you can analyze only what you see when you get it off there. Plus this way you'll get instant feedback. Too bad I don't have a source for small DVBST tubes (how an analog storage oscilloscope works).

When we're done with this, I'll put the information out there for everyone as well. I hope such a thing will make the whole issue of a power brown out as simple as looking at the LED or small LED backlit LCD. I can't see people forking over a few hundred bucks for a digital sampling oscilloscope and possibly 2x-5x more than that for a good inductive current probe. This will be far more economical and mobile.

Mike Copioli 14-03-2011 10:25

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by techhelpbb (Post 1039146)
I hope such a thing will make the whole issue of a power brown out as simple as looking at the LED or small LED backlit LCD. I can't see people forking over a few hundred bucks for a digital sampling oscilloscope and possibly 2x-5x more than that for a good inductive current probe. This will be far more economical and mobile.

This is a good idea; maybe it is something we could make. One thing to consider is that brownout is device dependent and not the same for each piece of hardware. So the question becomes which device you are trying to detect brownout on. The CAN Jaguar class has a method that allows you to determine if power has been cycled since its last call, GetPowerCycled(). This may be something that you could use to rule out Jaguar brownout. Since the FRC control system has several voltages (5v, 12v, 12v boost, 24v boost), it is possible for only one device to brown out based on its location, wiring, and individual power requirements.

Your problem does not sound like brownout to me. Having said that, I can state that a Jag will go into a non-responsive state if you bridge CANH to CANL several times or at just the right time; only a power cycle will fix this problem. This suggests either a problem with the Jag firmware or an issue with the Jaguar CAN controller itself. I have also observed Jags becoming non-responsive after a brownout, causing the entire CAN bus to crash. This should not happen.

We try our best to make CAN a high performance option for teams that choose to use it. We test our products vigorously to ensure that they meet the expectations of FIRST teams and the CAN spec. Sometimes there are changes made to the system that force us to retest and re-validate. Unfortunately there is a lack of transparency that we are subject to that makes validation difficult. This lack of transparency causes us to spend most of our time testing when we could be adding new features. In a perfect world we would have full access to all of the things that affect CAN; this is not the case. I am confident in saying that these issues that teams are seeing are not caused by the 2CAN. I do hope that they are resolved and want teams, FIRST, NI and TI to know that we are willing to do whatever is required to help find solutions to these issues.

kamocat 14-03-2011 10:35

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
What about measuring the thing that actually causes the brownout?
The 3.3v power supply on the Jaguar. What voltage does the processor brown out at? 2.7v?
You can use an interrupt on an analog trigger to wait until a brownout occurs.
(You can get the 3.3v off of the brake/coast pins. If you're using CAN, you can override that jumper anyways)

Mike Copioli 14-03-2011 10:59

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by kamocat (Post 1039239)
What about measuring the thing that actually causes the brownout?
The 3.3v power supply on the Jaguar. What voltage does the processor brown out at? 2.7v?
You can use an interrupt on an analog trigger to wait until a brownout occurs.
(You can get the 3.3v off of the brake/coast pins. If you're using CAN, you can override that jumper anyways)

The brownout on the Jag is around 4.8 volts based on the TPS54040 datasheet. The Jag uses a 5 volt buck in series with a 3.3v regulator. This is the rated lockout for that regulator; most likely it is above 5 volts. Also, the CAN controller and RS232 converter both operate on 5 volts, not 3.3v.

The Lucas 14-03-2011 11:18

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mike Copioli (Post 1039231)
This is a good idea; maybe it is something we could make. One thing to consider is that brownout is device dependent and not the same for each piece of hardware. So the question becomes which device you are trying to detect brownout on. The CAN Jaguar class has a method that allows you to determine if power has been cycled since its last call, GetPowerCycled(). This may be something that you could use to rule out Jaguar brownout.

In addition to GetPowerCycled(), would it be useful to call GetFaults() (available in C++) and check for kBusVoltageFault to determine brownout conditions? I haven't tried reading the CAN messages, but I think it is possible to enter this fault mode without dropping to a full power cycle voltage (I have seen this indicated by red flashing LEDs across all Jags on previous robots). I don't know if this fault mode affects sensors and closed-loop control, but it is worth a look.

Mike, can the 2CAN run on low enough voltage to log or display errors like the Bus Voltage fault via the web interface? I unfortunately don't have a 2CAN (yet) to test these things.
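A sketch of the check suggested above: combine a GetPowerCycled()-style flag with a GetFaults()-style bitmask to tell a full reboot apart from a bus-voltage fault. The code does not call WPILib; the fault bit values below are assumptions for illustration and should be checked against the CANJaguar header you actually build with. `PowerEvent` and `Classify` are made-up names.

```cpp
#include <cstdint>

// Assumed bit layout of the fault word, for illustration only.
enum FaultBits : std::uint16_t {
    kCurrentFault = 1,
    kTemperatureFault = 2,
    kBusVoltageFault = 4,
    kGateDriverFault = 8
};

enum class PowerEvent { kNone, kVoltageFault, kPowerCycle };

// A power cycle is the stronger signal, so it is reported first.
PowerEvent Classify(std::uint16_t faults, bool powerCycled) {
    if (powerCycled) {
        return PowerEvent::kPowerCycle;
    }
    if ((faults & kBusVoltageFault) != 0) {
        return PowerEvent::kVoltageFault;
    }
    return PowerEvent::kNone;
}
```

Counting the two outcomes separately per Jaguar would show whether a controller is dipping into the voltage fault without rebooting, which is the case the question above is asking about.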

techhelpbb 14-03-2011 13:00

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Eh...I'm somehow repeating myself?

Epic atomic transaction failure.

techhelpbb 14-03-2011 13:04

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mike Copioli (Post 1039231)
Since the FRC control system has several voltages, 5v, 12v, 12v boost, 24v boost it is possible for only one device to brown out based on its location, wiring, and individual power requirements.

The idea in this case would be to rig the comparator with one of various references. Possibilities include:

1. A unity gain buffered non-forced voltage out D/A converter (most voltage out D/A converters are buffered somehow).
2. One of the various reference voltage components already available for various conversion systems (some of them can be 'in-system' calibrated).
3. A 'digital potentiometer' in a voltage divider configuration possibly with a unity gain buffer.

One can measure current using a shunt resistor, and I have dug out my collection of 2 milliOhm 30W shunts because using an inductive current probe is just a double problem. A standard Rogowski coil probe would be slow and limit your reaction time to about 3uS using most off-the-shelf components (and it assumes the hall effect sensor is capable of measuring DC saturation of the coil). So even if you did want to sample, you'd probably top out around 100,000 samples per second before you alias. Obviously a professionally designed oscilloscope current probe responds faster than this, but its coil size is small and its amplification circuits are expensive and more difficult to calibrate. I can't see spending $400-$2,000 on this...possibly per device. :eek:

I was concerned about using the shunt in other posts because I wasn't convinced I could find a ready source of this wattage and value shunt resistors, but I found one. Lucky enough the stall current of each CIM isn't much higher. I just don't think telling people to trim their own shunts is going to work out unless they have a precision meter or bridge.

As to the placement, I figure the device has to be designed to be connected at the device that you think is 'browning' out. This includes the shunt resistor which should be on the load end.

In case it's not apparent where I'm going with this, as the concern here isn't purely passive circuit dynamics, I'm gearing up to create a way to check the phase relationship and magnitude of the voltage and current by setting the levels to where you consider anything beyond or below that a failure. Then latching that state. So the instant something slips out of the range of 'acceptable' you merely get an indicator that you have a problem. If you really wanted you could use a S/R flip flop on that configuration (like a 555 timer) and then use a digital device like a cheap PIC/AVR to do some QOS analysis. For example...how many times did you slip out of range....how long approximately was it out of range?
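As a rough illustration of the kind of QOS analysis a cheap microcontroller could do, here is a sketch in Python rather than PIC/AVR code. It assumes the readings arrive as (time, value) pairs, and the 6 V and 14 V window is purely hypothetical; it latches a fault flag, counts the excursions, and adds up the approximate time spent out of range.

```python
def summarize_excursions(samples, low, high):
    """samples: list of (time_seconds, value) pairs in time order.

    Returns (latched, count, total_seconds), where latched is True once any
    sample has fallen outside [low, high], count is the number of separate
    excursions, and total_seconds is the approximate time spent out of range.
    """
    latched = False
    count = 0
    total = 0.0
    start = None  # time the current excursion began, or None while in range
    for t, v in samples:
        out_of_range = v < low or v > high
        if out_of_range and start is None:
            start = t          # a new excursion begins
            count += 1
            latched = True     # the latch stays set once tripped
        elif not out_of_range and start is not None:
            total += t - start  # the excursion has ended
            start = None
    if start is not None:       # still out of range when the log ends
        total += samples[-1][0] - start
    return latched, count, total

# Hypothetical 1 kHz log of a supply rail with two brief dips.
log = [(0.000, 12.1), (0.001, 11.8), (0.002, 5.2), (0.003, 4.9),
       (0.004, 11.5), (0.005, 12.0), (0.006, 5.5), (0.007, 11.9)]
latched, count, total = summarize_excursions(log, low=6.0, high=14.0)
print(latched, count, round(total, 3))
```

The duration is only as good as the sample spacing, so this answers "how many times" more reliably than "how long".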

techhelpbb 14-03-2011 13:12

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mike Copioli (Post 1039252)
The brown out on the Jag is around 4.8 volts based on the TPS54040 datasheet. The Jag uses a 5 volt buck in series with a 3.3 volt regulator. This is the rated lockout for that regulator; most likely it is above 5 volts. Also, the CAN controller and RS232 converter both operate on 5 volts, not 3.3 volts.

Thanks, you just saved me some time.

I've been looking at various ways to put a little extra capacitance on the 5V and 3.3V sides to increase the noise rejection.

I thought I saw this in the Jaguar schematic before, but I haven't had time to look.

techhelpbb 14-03-2011 13:14

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I should like to note for the record, that we do not use the 2CAN on the cRIO.

We are using the RS232/CAN bridge in the black Jaguar.

I just want to point this out because there could be yet more complications depending on how you get to the CAN bus. :rolleyes:

techhelpbb 14-03-2011 13:22

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Ultimately, even with the high performance of the Jaguar's embedded controller I am concerned about using a digital measuring system to measure 'brown outs' because the noise spikes are short and they could easily alias the conversion process. If you get a good shot of noise into the circuit and it's very brief you might not even know it was there.
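To make the concern concrete, here is a toy model. It is my own sketch and not a model of the Jaguar's actual converter: an idealized 12 V rail dips to 4 V for 2 microseconds, and we sample it at a fixed period and report the lowest value seen. All of the numbers are assumptions chosen for illustration.

```python
def sampled_minimum(sample_period_us, dip_start_us=1003, dip_width_us=2,
                    duration_us=2000, nominal_v=12.0, dip_v=4.0):
    """Sample an idealized rail at a fixed period and return the lowest reading."""
    lowest = nominal_v
    for t in range(0, duration_us + 1, sample_period_us):
        in_dip = dip_start_us <= t < dip_start_us + dip_width_us
        lowest = min(lowest, dip_v if in_dip else nominal_v)
    return lowest

print(sampled_minimum(10))  # one sample every 10 us (100,000 samples per second)
print(sampled_minimum(1))   # one sample every 1 us
```

At one sample every 10 microseconds the dip falls between samples and the lowest reading is still 12.0, while at one sample every microsecond the same dip shows up as 4.0. Whether a slower sampler catches a spike comes down to timing luck, which is the argument for an analog comparator and latch.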

I've worked on some integrated circuit design and there are a few ways to go about the idea of an internal reset and brownout detector and sometimes the nice clean digital approach isn't a great idea around motors.

In other robotics designs...for example the RB5X (circa the 1980s) they used to use 2 batteries. One for the control circuitry and one for the drive motors. Specifically because this meant the noise couldn't walk back to the control logic as long as you had some isolation. They used 'logic level relays' to operate those drive motors, so there was no physical connection between the motor battery and the logic battery.

In this case, we don't get many choices with the Jaguars because they steal their logic power from the same terminals that feed the H-bridge.

BTW, in case anyone here never heard of the RB5X robot:
Meet a blast from the past and a certain recent Pawn Star TV show episode....
http://www.rbrobotics.com/
http://www.rbrobotics.com/Specs/rb5x_specs.htm

Mike Copioli 14-03-2011 18:01

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
UUUUmmmmm Yaaaah......... or the Jag could just handle brown out better. Im just sayin.

techhelpbb 14-03-2011 18:18

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Mike Copioli (Post 1039533)
UUUUmmmmm Yaaaah......... or the Jag could just handle brown out better. Im just sayin.

:yikes:

Next you'll want it to motion control or something.
Maybe differential encoder inputs?

:p

Seriously, you'd think at 3.3V from a 12V+ source it could find the overhead to handle the noise.
RB5X has a 6V battery...for 5V logic...and while it moves obviously it discharges to even less.
It has *much* less room to play, coming from the days before DC-DC converter modules were everywhere.

I'm quite sure that this can be made to work. However, the question remains what to tweak.
Hopefully, the rules won't eventually conflict with whatever needs to be done to clear this up.

EricVanWyk 14-03-2011 18:46

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by techhelpbb (Post 1039327)
The idea in this case would be to rig the comparator with one of several possible references. Possibilities include:

1. A unity gain buffered non-forced voltage out D/A converter (most voltage out D/A converters are buffered somehow).
2. One of the various reference voltage components already available for various conversion systems (some of them can be 'in-system' calibrated).
3. A 'digital potentiometer' in a voltage divider configuration possibly with a unity gain buffer.

One can measure current using a shunt resistor, and I have dug out my collection of 2 milliOhm 30W shunts because using an inductive current probe is just a double problem. A standard Rogowski coil probe would be slow and limit your reaction time to about 3 µs using most off-the-shelf components (and it assumes the hall effect sensor is capable of measuring DC saturation of the coil). So even if you did want to sample, you'd probably top out around 100,000 samples per second before you alias. Obviously a professionally designed oscilloscope current probe responds faster than this, but its coil size is small and its amplification circuits are expensive and more difficult to calibrate. I can't see spending $400-$2,000 on this...possibly per device. :eek:

I was concerned about using the shunt in other posts because I wasn't convinced I could find a ready source of shunt resistors of this wattage and value, but I found one. Luckily enough, the stall current of each CIM isn't much higher. I just don't think telling people to trim their own shunts is going to work out unless they have a precision meter or bridge.

As to the placement, I figure the device has to be designed to be connected at the device that you think is 'browning' out. This includes the shunt resistor which should be on the load end.

In case it's not apparent where I'm going with this, as the concern here isn't purely passive circuit dynamics, I'm gearing up to create a way to check the phase relationship and magnitude of the voltage and current by setting the levels to where you consider anything beyond or below that a failure. Then latching that state. So the instant something slips out of the range of 'acceptable' you merely get an indicator that you have a problem. If you really wanted you could use a S/R flip flop on that configuration (like a 555 timer) and then use a digital device like a cheap PIC/AVR to do some QOS analysis. For example...how many times did you slip out of range....how long approximately was it out of range?

Please remember that this forum has a wide range of audience. It is cool that you know all these words, but clarity is more valuable. These posts are effectively a smoke screen to those without the electrical background to parse it.

Allow me to translate:
We could detect brownouts by measuring current and voltage.


And please excuse my bluntness, but the rest of it is entirely superfluous and ridiculous. Seriously? DVBST? Rogowski coils? Hand trimming shunts?

This is the PID show all over again. Inspire students. Don't try to impress them.

techhelpbb 14-03-2011 19:01

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by EricVanWyk (Post 1039567)
Please remember that this forum has a wide range of audience. It is cool that you know all these words, but clarity is more valuable. These posts are effectively a smoke screen to those without the electrical background to parse it.

Allow me to translate:
We could detect brownouts by measuring current and voltage.


And please excuse my bluntness, but the rest of it is entirely superfluous and ridiculous. Seriously? DVBST? Rogowski coils? Hand trimming shunts?

This is the PID show all over again. Inspire students. Don't try to impress them.

I don't need to impress anyone at all. It has no impact on me either way...whatsoever.

1. There are lots of kinds of coils, and even field measuring coils, I'm being specific about which one.
2. There are other ways to measure current, I'm being specific about why this way.
3. I had stated concerns about why a shunt resistor to measure current with Ohm's law was a bad idea in this case before (availability), I am adjusting my view based on the available materials.
4. I was looking for an even more simplistic and cheap way to get to this goal of this brown out detector, and I had hoped to seek out specifics so I was seeking the limits of the problem. Hence why I considered how an analog storage oscilloscope is made (it's a DVBST tube).
5. I provided an alternate example of how others solved this problem, using a robot that was targeted specifically towards an educational setting.

Further, for anyone that actually wants to know what those terms mean...

http://en.wikipedia.org/wiki/Rogowski_coil

http://en.wikipedia.org/wiki/Oscilloscope

http://en.wikipedia.org/wiki/Direct-..._Storage_Tubes

http://www.reuk.co.uk/Make-a-Shunt-Resistor.htm

We need to solve the problems and I am trying my best to find a cost effective way to do that with the help of these mentors in the spirit of community contribution.

Thanks.

gblake 14-03-2011 21:08

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by techhelpbb (Post 1039327)
The idea in this case would be to rig the comparator to have either various references, possibilities include:
...

Techhelpbb and other folks - My EE degree is a bit dusty; but recently there seems to be a lot of uncertainty about possible problems (and solutions) in this thread and very few clear, specific questions/answers in the last 10-15 posts.

I think the entire conversation would be improved by a clear, clean, concise list of simple observable symptoms linked to proven root causes. There have been a lot of "maybe"s, "sometimes", "somebody told me"s, and possible test method posts (techhelpBB - I have to say some of those have been way into the weeds...); but there have lately been few "Here is the checklist you follow to fix problem X", or "Here is the checklist you follow to separate problem X from problem Y".

Who can cut through the fog, drain the swamp, and help that majority of students & mentors who aren't EEs, and who don't have the time (right now) to immerse themselves in the arcane minutiae?

Maybe someone could start a new thread to discuss the physics of the COTS circuitry and possible home-brew test methods, and this one could stick to proven results?

Blake

techhelpbb 14-03-2011 22:07

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by gblake (Post 1039662)
Techhelpbb and other folks - My EE degree is a bit dusty; but recently there seems to be a lot of uncertainty about possible problems (and solutions) in this thread and very few clear, specific questions/answers in the last 10-15 posts.

I think the entire conversation would be improved by a clear, clean, concise list of simple observable symptoms linked to proven root causes. There have been a lot of "maybe"s, "sometimes", "somebody told me"s, and possible test method posts (techhelpBB - I have to say some of those have been way into the weeds...); but there have lately been few "Here is the checklist you follow to fix problem X", or "Here is the checklist you follow to separate problem X from problem Y".

Who can cut through the fog, drain the swamp, and help that majority of students & mentors who aren't EEs, and who don't have the time (right now) to immerse themselves in the arcane minutiae?

Maybe someone could start a new thread to discuss the physics of the COTS circuitry and possible home-brew test methods, and this one could stick to proven results?

Blake

Fair enough, I'll summarize our situation...can't speak for the others.

The criteria for the problem we have:

1. We are not using the 2CAN; we are using all black Jaguars with one as the RS232 bridge. For what it's worth, we are programming in Java.

2. We are using them in one of 2 modes as it relates to the wheels, which exhibit the problem most often (we do occasionally have some issues with the arm that are similar...but our arm is highly geared down).

A. In one mode, we have >no< encoders. We drive completely off vbus and we see a problem occasionally where the Jaguars just time out. A quick reset and they generally return...and that's being general....when they don't we have to reboot.

B. In the other mode we drive 2 Jaguars, both with a PID velocity setpoint, and we have a circuit to isolate the encoders using an optocoupler and TTL logic. This circuit is the subject of another topic on this very forum:
http://www.chiefdelphi.com/forums/sh...t=89282&page=4

The circuit shown is the work of the other engineer...functionally it is equivalent to what we made. With the only obvious difference being that our circuit has a pull up on each encoder channel (for testing the circuit without either the Jaguar or the encoders mostly to check for wiring problems to either). Also our circuit uses the extra inverters to provide test points for an oscilloscope...consider it proof of concept.

Even with this circuit our robot will drive straight as an arrow, spin, dance and do everything it is supposed to...for only so long. Then we'll suddenly develop the very same CAN errors (every few minutes). In fairness I'll say we see the problem more often when we use the encoders...but...we do see the problem without them. I personally think we can attribute some of that to the simple fact that if the Jaguar malfunctions on the CAN bus, whatever caused that could also cause it to lose track of the encoder. The other reference is a different matter.

3. It has been noted by myself and other mentors that the circuit in 2B above becomes necessary because of various sources of electrical noise. The wiring of the Jaguars is prone to a specific source of noise called a ground loop, especially when the grounds are connected as in a more simplistic encoder split (see other topic for more detail). This circuit eliminates this source of noise from the encoders as it isolates them from each other and even the encoder by using light. It also lets you put some test equipment on there to see what's going on...minus the software. We found that invaluable because we have a bunch of bad encoders from years past and it's hard...with so many possible issues...to find them all. With this circuit (as we made it) we can see on an oscilloscope when the encoders are not working properly and we can do that test while the wheels spin at any test RPM. I can assure you by test myself that the encoders are working properly.

4. The circuit in 2B does not do anything about any electrical noise issues on the CAN bus, it has nothing to do with it. Other mentors and myself have noted that placing a capacitance across the various sensor power supply terminals on the Jaguar (the only place we can legally get to the internally regulated power sources) improves the issues with the CAN bus errors moderately (not completely).

5. It has been suggested earlier in this topic that we were either overloading the Jaguars or they were browning out. We have no evidence of overloading the Jaguars in that we have not blown or tripped any current limiting devices (fuse/breakers). The robot doesn't seem to overly drain any properly operating battery we put on it. The Jaguars themselves, via BDC-COM do not indicate any major overloads. This leaves us with the possibility of a brown out condition. A brown out condition we would need to test for on a robot moving at high gear around a playing field and that does not appear on any test we've been able to reproduce while stationary.

6. I then proceeded to attempt to figure out how to test for this condition, in the most practical, least expensive, most definitive manner. I did so in detail as it would seem that we are not the only one to experience this problem. Furthermore, even if we didn't have this specific problem...we all should know by now that a low battery condition can cause issues on the robot and sometimes people don't notice it because batteries are often best tested under load.

7. If you use the Jaguars as you would the Victor 884 (no CAN, just PWM cables from the Digital Sidecar) you'd sidestep the issues. To put it as simply as possible: because of the way the Jaguar responds to PWM, unless you really do overload the Jaguar, it'll mostly ignore major sources of non-repeating noise that might otherwise trip up the CAN bus. In this mode, you can still use the encoders on the cRIO if you want, but you have effectively given up on the CAN bus.

8. To date...the only options I've considered beyond those discussed are to put capacitors across the CIM motors to try to snub some of the noise from the brushes, and to put capacitors on the Jaguar's sensor power supply wires to try to add additional filtering to the Jaguar's internal power supply. Removing the ground wires from the CAN cables has been suggested in the other topic.

9. Other people in this topic have reported an issue from boot up, which until recently we never experienced. In their failure mode they turn on the robot and basically fail to initialize the Jaguars on the CAN bus. It has been suggested that this is a timing issue and can be addressed in software. We have seen this issue a total of once.

Conclusion:
What I am looking for, like you, is a way to standardize the process and eliminate these painful problems from the system. They are *too technical* and expensive and effectively form a barrier to getting this working properly for anyone that doesn't have the knowledge/skills/time to make it work. Given the state of my physical health right now....I more than anyone...do not want to be making anything or wasting money looking for phantom problems.

The tangent above was merely my way of looking for an economical test for making sure that the power quality on the robot is not causing adverse problems. If we can determine that the power quality is the problem...then we can at least define a reasonable process to find the problems. Right now...I'm looking for something I can't test...because I'd have to chase the robot with test equipment in high gear.

It's possible there's simply something wrong with the CAN communications, but honestly, when other mentors have had to troubleshoot that issue, they've had to confront the manufacture of other circuits because otherwise they'd need to buy something for a logic analyzer to test it.

DonRotolo 14-03-2011 22:36

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I think you're looking too deep, the problem must be simpler. Like fixing a car, 99% of the time it's simple, just go back to basics.

Put a 12 volt light bulb across the Jaguar inputs, you'll (literally) see any brownout or breaker issues as you drive.

You say the wiring is good, how do you know? Explain your tests and their results.

In all your posts I see a lot of assumptions and third-hand information. To really solve this, we need first-hand data.

Thanks.

techhelpbb 14-03-2011 22:52

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by DonRotolo (Post 1039733)
I think you're looking too deep, the problem must be simpler. Like fixing a car, 99% of the time it's simple, just go back to basics.

Put a 12 volt light bulb across the Jaguar inputs, you'll (literally) see any brownout or breaker issues as you drive.

You say the wiring is good, how do you know? Explain your tests and their results.

In all your posts I see a lot of assumptions and third-hand information. To really solve this, we need first-hand data.

Thanks.

I'll find some light bulbs to put on it and see what there is to see with that.
If it were a breaker issue, I would expect it to trip and stay that way.

The wiring was tested with multimeter for resistance from the lugs that would go on the Jaguar back to the power board.

A multimeter was applied to the Jaguar input power while it was run off the ground, and no major issues were noted (but multimeters are fairly slow to measure).

When the robot was misbehaving I did put an oscilloscope on the power supply once it was stationary and I saw no major issues that would account for the fact that the Jaguars at that time were basically locked up.

When we first encountered the problem weeks ago we just rebooted the robot a few times. However, that was slow and painful. So we adjusted the software to detect the Jaguar error and reset. For the most part that improved the situation, as long as you ignore that we're still getting time out errors.

I don't have my numbers with me tonight. I'll have to swing by and get them and we don't have a DSO or an oscilloscope camera to work with, so I have no waveforms for anyone to look at.

DonRotolo 14-03-2011 23:05

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by techhelpbb (Post 1039746)
If it were a breaker issue, I would expect it to trip and stay that way.

You'd be wrong. They trip for anywhere from a fraction of a second to a second or two. Self-resetting.
Quote:

Originally Posted by techhelpbb (Post 1039746)
multimeter for resistance from the lugs that would go on the Jaguar back to the power board.

A multimeter was applied to the Jaguar input power while it was run off the ground, and no major issues were noted (but multimeters are fairly slow to measure).

Nyet. Measure voltage drop along the wire - from power board to input screw, and output screw to motor connector - while passing high current. Only way to spot a bad wire (in this case) is under load. An Ohmmeter will steer you wrong.
Quote:

Originally Posted by techhelpbb (Post 1039746)
When the robot was misbehaving I did put an oscilloscope on the power supply once it was stationary and I saw no major issues that would account for the fact that the Jaguars at that time were basically locked up.

When they are 'locked up' there's no current draw, you won't see anything of interest - too late in the process.

I think 1676 will be visiting 11 soon, we'll look at the practice bot with you then.

Back to basics. Maybe build a test platform without external influences - just the bare basics - and see if you can reproduce it on the bench. Time-consuming, yes, but ultimately can be valuable. And it'll keep the students involved.
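To show why the voltage-drop test described above catches what an ohmmeter misses, here is a small sketch of the arithmetic. The 0.30 V and 30 A readings are invented for illustration, and the figure of about 1 milliOhm per foot for 10-gauge copper is an approximation I am assuming, not a measurement from any robot in this thread.

```python
def wire_resistance_milliohms(voltage_drop_volts, current_amps):
    """Resistance implied by a voltage drop measured while current flows (R = V / I)."""
    return 1000.0 * voltage_drop_volts / current_amps

# Hypothetical reading: 0.30 V dropped between power board and Jaguar input at 30 A.
measured = wire_resistance_milliohms(0.30, 30.0)

# Assumed rule of thumb: 10-gauge copper is roughly 1 milliOhm per foot.
expected = 2.0 * 1.0  # a 2-foot run

print(f"implied resistance: {measured:.1f} milliOhm, expected about {expected:.1f} milliOhm")
```

An implied 10 milliOhm on a run that should be near 2 milliOhm points at a bad crimp or connection, yet both values would read as essentially zero on a typical handheld ohmmeter.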

techhelpbb 14-03-2011 23:15

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by DonRotolo (Post 1039763)
You'd be wrong. They trip for anywhere from a fraction of a second to a second or two. Self-resetting.
Nyet. Measure voltage drop along the wire - from power board to input screw, and output screw to motor connector - while passing high current. Only way to spot a bad wire (in this case) is under load. An Ohmmeter will steer you wrong.
When they are 'locked up' there's no current draw, you won't see anything of interest - too late in the process.

I think 1676 will be visiting 11 soon, we'll look at the practice bot with you then.

Back to basics. Maybe build a test platform without external influences - just the bare basics - and see if you can reproduce it on the bench. Time-consuming, yes, but ultimately can be valuable. And it'll keep the students involved.

That's fine with me.
I just want nice...clean...functional solutions.

We ran this robot off the ground for 20 minutes at a clip under test and it was fine, though in fairness I was measuring from the battery negative to the Jaguar positive input terminal. Not actually measuring the difference across the wires.

However, again...if there is a problem here we've missed under test...then let's remove it.

I don't like all the effort we had to pump into this any more than anyone else.

jhersh 15-03-2011 20:32

Re: A different Serial CAN problem.
 
Quote:

Originally Posted by PhilBot (Post 1038571)
The problem is that this "0.0 position" error persisted for many reboots, debug downloads etc. and then just returned to normal an hour later.

Sounds like you had a short between the wiper and ground on your pot.

Quote:

Originally Posted by PhilBot (Post 1038571)
The Jags are switched between "Voltage out" and "Position mode" each time a preset recall button is pressed (and held). Before the new mode parameters are loaded, the JAGs are disabled, and then re-enabled at the end. All of the JAG parameters (control mode, position source, pot turns, PI gains etc) are re-loaded, and the code checks for any errors and re-tries the command sequence 3 times if errors persist.

Note: We'd run into the occasional bad-command-write problem several days before shipping and had built in the retries. So it's pretty bullet proof.

Sounds like a nice implementation. We did something similar last year and had no problems like this. The main difference between our setup and yours is we were using an encoder instead of a pot.

Quote:

Originally Posted by PhilBot (Post 1038571)
Do you think accidentally sending the JAG an invalid "position value" (eg: a negative value for target position) could lock up the controller????

I don't think it should matter, though I can't say I've explicitly tested for that. If you can reproduce it with a simple program, let me know.

-Joe

nuttle 20-03-2011 13:45

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
FWIW, team 2641 also had CAN timeout issues at init time, using a 2CAN. We have 8 Jags on CAN, and pretty consistently saw errors before we ever started to drive, with the robot on a stand and so not much current being drawn from any of the motors (all 8 were part of a 4 drive/4 steer holonomic drive system). Also, this happened with a fresh battery. We saw some non-zero error counts when using the 2CAN web page (the 2CAN was only connected to a laptop in this configuration). We tried using the 2CAN web page to drop the bus speed for the CAN bus, but this didn't seem to help. We eventually switched to using the serial connection, and everything cleared up. Our 2CAN is from last season. We have short CAN cables (the Jags are all reasonably close to each other) and a good terminator. However, we do not use twisted pair wiring for the CAN bus. Is this recommended with the 2CAN? Just to note, the Jags are all powered by short 10-gauge connections to the 8 40-amp connections on the power board.

We'll try the new 2CAN firmware, but any other helpful advice would be appreciated. For other teams who might have trouble in the future, trying to lower the speed of the CAN bus to the lowest setting is probably worth a shot. Also, having code that handles timeout exceptions is a very good idea. We use Java and have a state machine that cycles through all of the steps we need to set up a Jag; it checks the power-cycled condition after any error and, if there has been a power cycle, resequences through the initialization states.
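For teams who want to try the same approach, here is a minimal sketch of that kind of state machine. It is written in Python only to keep it short and is not our actual Java code. The step names, the FakeJaguar class, and the CanTimeout exception are all hypothetical stand-ins, and retrying the same step after a plain timeout is an assumption on my part.

```python
class CanTimeout(Exception):
    """Stand-in for whatever timeout exception the real CAN library raises."""

class FakeJaguar:
    """Hypothetical test double, not a real library class."""
    def __init__(self, faults):
        # faults maps a call number (1-based) to True if that timeout was
        # caused by a power cycle, or False for a plain timeout.
        self.faults = dict(faults)
        self.calls = 0
        self.power_cycled = False
        self.config = []

    def send(self, step):
        self.calls += 1
        if self.calls in self.faults:
            if self.faults[self.calls]:
                self.power_cycled = True
                self.config = []  # a power cycle wipes the configuration
            raise CanTimeout(step)
        self.config.append(step)

INIT_STEPS = ["set_control_mode", "set_position_reference", "set_pid_gains", "enable"]

def initialize(jag, steps=INIT_STEPS, max_errors=5):
    """Walk the init steps in order, recovering from timeouts along the way."""
    state = 0
    errors = 0
    while state < len(steps):
        try:
            jag.send(steps[state])
            state += 1
        except CanTimeout:
            errors += 1
            if errors > max_errors:
                return False              # give up and report the failure
            if jag.power_cycled:
                jag.power_cycled = False  # acknowledge the flag
                state = 0                 # resequence from the first step
            # otherwise loop around and retry the same step
    return True

jag = FakeJaguar({2: False, 4: True})  # one plain timeout, then a power cycle
ok = initialize(jag)
print(ok, jag.calls, jag.config == INIT_STEPS)
```

The cap on errors keeps a dead bus from hanging the robot code forever; what to do after giving up is a separate decision.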

Phalanx 20-03-2011 15:34

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by jhersh (Post 1037862)
We are testing a new image that should fix it. We have had no failures yet. However, FIRST wants more testing before making it public.
-Joe

As week 4 events will begin this coming week, I'm wondering if sufficient testing has been done to allow for a release of this new image?

MikeE 21-03-2011 13:57

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
At the risk of being redundant and rehashing the problems that are described above and on other threads, we've identified two distinct problems with our Java / CAN system (we're using the BlackJag serial/CAN bridge).

1) Intermittent initialization failures. This always occurs as a failure of all Jags so we initially checked the cabling/termination extensively on the bridging Jaguar. Typically a power cycle of the robot fixed the issue, but sometimes a cRIO reboot worked and sometimes it persisted over multiple reboots. It seemed to be more prevalent on our practice bot than competition bot, but that may be due to a side effect of a higher frequency of power cycling on practice bot. We had one total failure to move in eliminations at Chesapeake which may have been caused by this problem. We didn't get the memo which provoked some teams to abandon CAN in favor of PWM. We will be implementing the initialization retry work-around in code on the practice bot once it's been de-cannibalized. I hope the potentially imminent NI patch solves the root cause.

2) Stuttering. This is not strictly a CAN problem because we see all outputs disabling briefly. Occurs both in autonomous and teleop, and looks quite dramatic when pneumatic solenoids are briefly disabled and our arm mechanism looks to be giving a round of applause. Originally appeared to be a code problem in teleop, but refactoring didn't solve the problem. I don't believe we've seen the problem in competition. We will check for Jags reporting a powercycle in case this is a voltage problem, although I don't think that's likely.

martin417 21-03-2011 14:59

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I don't think anyone has mentioned this before so I will bring it up. We're using the BlackJag serial/CAN bridge. We also have limit switches connected directly to one of the Jags. The only time we have ever had the scrolling CAN error is when the cRIO is booted with one of the limit switches depressed. Even with a limit switch depressed, it only happens about 5% of the time. We have never had the issue if no limit switches are depressed.

Part of our pre-match checklist was to make sure that the limit switch was not depressed. Naturally, the only time we forgot to check, we got the errors and didn't move at all that match. After that, we removed that limit switch and depended on the encoder to stop the mast at the bottom. We didn't have any more problems.

PhilBot 21-03-2011 15:18

Re: A different Serial CAN problem.
 
Quote:

Originally Posted by jhersh (Post 1040328)
Sounds like you had a short between the wiper and ground on your pot.

-Joe

Unlikely. It happened on both arm joints at the same time. I also measured the return wiper voltage and both changed when the joints moved.

The only thing these joints had in common (other than system power) was the CAN bus and the cRIO/code.

Phil.

I tell you, there is weird S**t going on with this system.

techhelpbb 21-03-2011 15:52

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I am suspicious of something and I'll share it here merely as speculation until I can prove it.

If you look at the schematics of the Jaguars (both models) you'll note that virtually every single interface on the Jaguar references either both the internal ground and one of the internal power supplies, or just ground.

The exception to this rule is the PWM input...which does in fact have a logic level optoisolator.

Now...the people using PWM don't read errors out...but in using PWM they are isolated from the Jaguar's internal power supplies entirely.

Anything you put on the Jaguar other than that input has the ability to directly impact the internal power supplies...and unlike the input power to the Jaguar which is heavily filtered because of all those power supply circuits...you'd be impacting in some cases the very same power that runs the Jaguar's microcontroller...and provides signal conditioning for its analog to digital circuits.

I think this is the sort of thing that could explain why...when people hang even modest limit switches off the Jaguar...odd things can happen...but don't always happen.

Further...CAN is a robust bus...but if the circuits within the Jaguars suffer from a power supply issue...these issues might translate into things like CAN timeouts.

If you think about it...you are basically putting a long antenna on the Jaguar's internal power supply whenever you connect a sensor.

*I am merely speculating.* I am awaiting the opportunity to test this. I am also accepting the possibility from other posters in this topic that...at least in the case of our issues...it might be a wiring problem.

kamocat 21-03-2011 16:23

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
This is a good point, but it doesn't seem to add up.
Most of the inputs have a resistor (1k or greater) inline with their power supplies. There are two exceptions to this: the encoder and the brake/coast jumper.
Furthermore, limit switches are active when they are left OPEN. Their default state is closed (when the jumper is installed). This means a limit switch must be "normally closed", and break the connection when pressed. When a limit is pressed, it is drawing LESS power than when it is not.

It was a good suggestion, though. I'm curious how close the Jaguar is to drawing more than its power supplies can handle.

techhelpbb 21-03-2011 17:01

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by kamocat (Post 1043278)
This is a good point, but it doesn't seem to add up.
Most of the inputs have a resistor (1k or greater) inline with their power supplies. There are two exceptions to this: the encoder and the brake/coast jumper.
Furthermore, limit switches are active when they are left OPEN. Their default state is closed (when the jumper is installed). This means a limit switch must be "normally closed", and break the connection when pressed. When a limit is pressed, it is drawing LESS power than when it is not.

It was a good suggestion, though. I'm curious how close the Jaguar is to drawing more than its power supplies can handle.

I'm not entirely thinking in terms of DC voltages and currents.

I'm thinking of it more like an antenna...merely a piece of wire.

In some cases they have it plugged directly into the microcontroller.

Take for example the limit switches...no AC decoupling capacitors.
Even the potentiometer input...no AC decoupling.

Even the grounds could be an issue if you extend the ground plane out the wrong way.

In the black Jaguar the 5V power supply is a: TPS54040
"3.5V to 42V Input, 0.5 A Step Down SWIFT™ Converter with Eco-Mode™ "

http://focus.ti.com/lit/ds/symlink/tps54040.pdf

The 3.3V regulator is:TPS73633
"Single Output LDO, 400-mA, Fixed (3.3 V), Cap-Free, Low-Noise, Reverse Current Protection"

http://focus.ti.com/docs/prod/folder.../tps73633.html

The microcontroller the: LM3S2616

http://www.luminarymicro.com/product...html#Datasheet

kamocat 21-03-2011 17:13

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
The trouble with using AC decoupling capacitors is that they only work on an oscillating signal. Limit switches and potentiometers are often very constant.
I suppose you could put opto-isolators on the limit switch inputs and the coast/brake.
Potentiometers, however, I would expect to be noise-resistant. You could still put an op-amp inline if you want.

To answer my own question, the 3.3v power supply has about 15.5 mA of load on it, assuming the brake/coast jumper and JTAG aren't shorted.
The 5v power supply has the 3.3v power supply, plus another 193mA of load.
(The actual loads may be less than what I've calculated; I used the maximum current draw. This assumes the encoder input is not shorted)
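For a sense of scale, those worst-case loads can be set against the regulator ratings quoted earlier in the thread for the black Jaguar (0.5 A for the TPS54040, 400 mA for the TPS73633). This is only a back-of-the-envelope sketch: it assumes the 3.3 V LDO passes its load current straight through to the 5 V rail, and the class and method names are made up for illustration.

```java
// Rough headroom check for the Jaguar's internal supplies,
// using only the figures quoted in this thread.
final class SupplyHeadroom {

    /** Returns the load as a fraction of the regulator's rated output current. */
    static double utilization(double loadMilliamps, double ratedMilliamps) {
        return loadMilliamps / ratedMilliamps;
    }

    public static void main(String[] args) {
        double load3v3 = 15.5;       // mA, worst-case 3.3 V load from the post above
        double load5vOther = 193.0;  // mA, the other worst-case 5 V loads from the post above
        // Assumption: a linear regulator draws roughly its output current from its input.
        double load5vTotal = load3v3 + load5vOther;

        double rated3v3 = 400.0;     // mA, TPS73633 rating as quoted in the thread
        double rated5v = 500.0;      // mA, TPS54040 rating as quoted in the thread

        System.out.printf("3.3 V rail: %.1f mA of %.0f mA (%.0f%%)%n",
                load3v3, rated3v3, 100 * utilization(load3v3, rated3v3));
        System.out.printf("5 V rail:   %.1f mA of %.0f mA (%.0f%%)%n",
                load5vTotal, rated5v, 100 * utilization(load5vTotal, rated5v));
    }
}
```

On these numbers the 5 V rail carries 208.5 mA, well under half its rating, so steady-state overload looks unlikely. That says nothing about noise coupled in through sensor wiring.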

techhelpbb 21-03-2011 17:32

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by kamocat (Post 1043303)
The trouble with using AC decoupling capacitors is that they only work on an oscillating signal. Limit switches and potentiometers are often very constant.
I suppose you could put opto-isolators on the limit switch inputs and the coast/brake.
Potentiometers, however, I would expect to be noise-resistant. You could still put an op-amp inline if you want.

Use the capacitors as a part of a filter or as bypass for frequencies that should not be there. After all a potentiometer and limit switch should be pretty constant as you've said.
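To put a rough number on the filter idea: a series resistor followed by a capacitor to ground forms a first-order low-pass filter with cutoff f = 1 / (2 * pi * R * C). The sketch below uses the 1k series resistance mentioned earlier in the thread and an assumed 0.1 uF capacitor. The capacitor value is purely illustrative, not a recommendation, and nothing here has been tried on a Jaguar.

```java
// First-order RC low-pass cutoff, for illustration only.
final class RcFilter {

    /** Cutoff (-3 dB) frequency in hertz of a single-pole RC low-pass filter. */
    static double cutoffHz(double resistanceOhms, double capacitanceFarads) {
        return 1.0 / (2.0 * Math.PI * resistanceOhms * capacitanceFarads);
    }

    public static void main(String[] args) {
        double r = 1_000.0;  // ohms: the "1k or greater" series resistor mentioned earlier
        double c = 100e-9;   // farads: 0.1 uF, an assumed value chosen only as an example
        System.out.printf("Cutoff: %.0f Hz%n", cutoffHz(r, c));
    }
}
```

With those values the cutoff is roughly 1.6 kHz, far above how fast a limit switch or potentiometer signal normally changes and well below typical switching or RF noise.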

MikeE 21-03-2011 17:35

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by martin417 (Post 1043214)
... The only time we have ever had the scrolling CAN error is when the CrIo is booted with one of the limit switches depressed...

I've read several comments about scrolling messages in the diagnostic tab when a CAN timeout occurs, but I don't think I've ever seen them in our system.
During development we write to System.err and in competition to the DS display with code like this:
Code:

try {
    driveMotor.setX(something);
} catch (CANTimeoutException e) {
    DSmessage.getInstance().println(Line.kUser4, 1, "CAN timeout on drive");
}
...
DSmessage.getInstance().updateLCD();

Since there seems to be a built in logging facility in the diagnostic tab I'd prefer that they appeared there. Also the FTA(A) is more likely to check that location if problems occur on the field.

Can anyone share a Java snippet that sends CAN timeouts to the diagnostic tab?
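This does not answer the diagnostic tab question, but the catch block above could at least report more than a bare "CAN timeout". The sketch below keeps a running count per device and builds a one-line summary that could be passed to whatever display call is in use. `TimeoutTally` is a made-up helper, and it is written in ordinary desktop Java so it can be tried off the robot; as far as I recall the 2011 cRIO Java runtime is a more limited Java ME subset, so generics and the collections used here would need adapting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: counts timeouts per device name.
final class TimeoutTally {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Records one timeout for the named device and returns its running total. */
    int record(String device) {
        return counts.merge(device, 1, Integer::sum);
    }

    /** One-line summary in insertion order, for example "drive=3 arm=1". */
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

In the catch block this would be something like `tally.record("drive")` followed by printing `tally.summary()`, which makes it easier to tell one flaky Jaguar from a dead bus.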

jhersh 22-03-2011 12:08

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Phalanx (Post 1042535)
As week 4 events will begin this coming week, I'm wondering if sufficient testing has been done to allow for a release of this new image?

Indeed... You can find the update here: http://firstforge.wpi.edu/sf/go/proj...e_frc_2011_v29

JasonStern 26-03-2011 08:04

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Sad to say, update 29 did not fix the CAN timeout issues for us at the DC regional. We have switched to PWM for the remainder of our matches.

Thanks to everyone who is trying to resolve these issues!

-Jason

drakesword 27-03-2011 18:52

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
346 had issues at DC as well. During a couple of matches the speed of the robot would unexpectedly drop or the robot would begin shuddering (not a high speed shudder that a PID issue would show but a slow 1 second on 1 second off type issue). Finally in one of our last matches the second half of the CAN bus did nothing.

What is the upper limit of temperature before the jags begin to shutdown?

We did some tests with our robot and the results are still inconclusive.

First we ran the robot and got no errors.
Then we unplugged a jag while running and still had no errors.
Then we unplugged a terminator without errors.
Unplugged any jag-jag wire and the entire bus stopped (not just beyond the lost connection)
Finally we unplugged encoders and still had no error.

We were running v29.

During the build season we were using the older image and ran the chassis for over 30 minutes without problems (except the one time we blew a grey jag by going full forward to full reverse too fast)

Dad1279 27-03-2011 19:23

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I'm sorry to hear that there were still issues with CAN at DC.

After our two matches at NY with 'NO COMM' issues, we thoroughly tested and could not come up with any ideas but to switch to PWM for DC.

We had no issues running PWM at DC.

Phalanx 27-03-2011 21:33

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
We ran all of the DC Regional with CAN Jaguars in voltage control mode.
We had no encoders, pots, or limit switches in use on the JAGs.
We programmed in Labview, with Update 29.

We used encoders and limit switches attached to the digital side car.

We ran all of DC Regional, 9 Qualifiers(We missed one due to mechanical repairs), 2 Quarter Finals, and 2 Semi Finals matches. We had ZERO CAN issues.

I think it's time to investigate all of the following to attempt to find a commonality for failure.

1) Correct/Proper wiring both CAN and Power.
2) Ground isolation of components.
3) CAN Interface used, Serial or 2CAN
4) Type of JAGS used, Black, Grey, Both
5) The programming language used
6) The control mode(s) used.
7) encoders, pots, limit switches are being used.

So for us, I can say with 99.99999% certainty that for our machine...
1) CAN Wiring is correct with proper termination 100ohm.
2) All components are ground isolated from the frame and electrical wiring has no shorts or ground faults.
3) We used 2CAN on PORT 2 of the CRIO.
4) We used all BLACK JAGS.
5) We Programmed with Labview.
6) Used Voltage Mode only.
7) No Encoders, POTS, Limit Switches attached to the JAGS.

heydowns 27-03-2011 23:51

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
v29 solved the problems we were seeing with our CAN-connected Jaguars. We no longer get random failures of the entire bus to work on robot startup as described in my earlier post.
We ran with v28 until Thursday lunch this past week and saw the boot-time failure twice during that morning.
After Thursday lunch when we put v29 on, the CAN chain (7 Jags) worked without issue for the remainder of the competition.

Quote:

Originally Posted by drakesword (Post 1045950)
346 had issues at DC as well. During a couple of matches the speed of the robot would unexpectedly drop or the robot would begin shuddering (not a high speed shudder that a PID issue would show but a slow 1 second on 1 second off type issue). Finally in one of our last matches the second half of the CAN bus did nothing.

Sorry to hear you had problems, same for 1123 :( We were at DC too. I wish I'd known; I would have come and taken a look at your setup and tried to offer some suggestions.


The CAN driver (at least for C++ maybe the others I have not looked) blocks while waiting for CAN replies to commands that are sent. If you have a Jag that is disconnected, failing, powered off, or has a poor bus connection the driver will block for a "long" time waiting for that response, thus stalling your overall code execution. This might or might not account for the slow on/off issues you describe. Of course that doesn't explain the root cause.

We've actually modified our copy of the CAN driver to shut down a device that's failing communications (repeatedly) rather than continuing to bog down the entire system in order to mitigate this.
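The modified driver itself is not shown here. As a rough illustration of the same idea done one level up, in team code rather than in the driver, a wrapper can stop talking to a device after a run of consecutive timeouts. `MotorLink`, `LinkTimeoutException` and `GuardedMotor` are hypothetical stand-ins, not WPILib types, and the sketch is in plain desktop Java so it can be run off the robot.

```java
// Hypothetical stand-in for a timeout thrown by a blocking bus call.
class LinkTimeoutException extends Exception {
}

// Hypothetical stand-in for a motor controller reached over the bus.
interface MotorLink {
    void set(double output) throws LinkTimeoutException;
}

/** Stops sending to a device after too many consecutive timeouts. */
final class GuardedMotor {
    private final MotorLink link;
    private final int maxConsecutiveTimeouts;
    private int consecutiveTimeouts = 0;
    private boolean disabled = false;

    GuardedMotor(MotorLink link, int maxConsecutiveTimeouts) {
        this.link = link;
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
    }

    /** Returns true if the command went out, false on timeout or if the device is disabled. */
    boolean set(double output) {
        if (disabled) {
            return false;  // skip the bus entirely so a dead device cannot stall the loop
        }
        try {
            link.set(output);
            consecutiveTimeouts = 0;  // any success clears the failure streak
            return true;
        } catch (LinkTimeoutException e) {
            consecutiveTimeouts++;
            if (consecutiveTimeouts >= maxConsecutiveTimeouts) {
                disabled = true;
            }
            return false;
        }
    }

    boolean isDisabled() {
        return disabled;
    }

    /** Re-enables the device, for example on the transition into a new mode. */
    void reset() {
        consecutiveTimeouts = 0;
        disabled = false;
    }
}
```

The threshold is a judgment call: set it too low and a brief glitch takes a motor out for the rest of the match unless something calls `reset()`.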

One thing I've seen helping teams with CAN issues is that termination resistors connected in the manner described in the Jaguar documentation (that is, crimping them directly into the RJ connector on the appropriate pins) are a potential source of intermittent bus errors, particularly during match play when vibration and collisions are occurring a lot. Instead of making the cable in this fashion I recommend crimping in a short length of wire and soldering the resistor to that.
Again, not necessarily the problem your team is encountering but something worth trying/checking along with double-checking all your other bus cabling.

Also, on a very general note, as I read back on this thread it is really apparent that a number of different issues are being referenced here (symptoms may seem similar but they are caused by different things). For those just coming to this thread please keep that in mind and if you post your experiences please be as specific as possible.

taichichuan 28-03-2011 08:25

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Hi Gang!

Team 116 used CAN at the DC regionals as well with V29 and saw the exact same type of stuttering behavior as many of the other teams:

1) Correct/Proper wiring both CAN and Power.

This is our 2nd year with CAN. I check the cables with my automated cable tester and all checked good. Termination using the 2CAN on one end and 100 Ohm terminator at the other.

2) Ground isolation of components.

Our Jags are on standoffs and are isolated from the robot chassis ground.

3) CAN Interface used, Serial or 2CAN

We were using the 2CAN with firmware 2.5 (3/13/11). This worked well at the Bayou regional except for one time with the CAN connect problem from V28.

4) Type of JAGS used, Black, Grey, Both

Mixture of both black and grey jags. All jags at Firmware V92.

5) The programming language used

C/C++

6) The control mode(s) used.

Voltage mode (we were trying to use position mode for autonomous but ran out of time)

7) encoders, pots, limit switches are being used.

No encoders, pots or limit switches were being used.


I spoke to Brad Miller at DC and described the problem and we were actually able to see a little of the problem in the pits with him standing there. He suggested changing the motor safety timeout, which we did. However, in the heat of eliminations, I don't recall if we saw the stuttering post change.
__________________

HTH,

Mike

drakesword 28-03-2011 12:44

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
1) Correct/Proper wiring both CAN and Power.
Going to rewire most of the can-related components at VCU. We were using 6 conductor wire. Going to switch to two. 120 ohm terminator soldered to wire not crimped.

2) Ground isolation of components.
Again as above. possible ground loop through our can wires

3) CAN Interface used, Serial or 2CAN
Serial

4) Type of JAGS used, Black, Grey, Both
Black (5)

5) The programming language used
Java.

6) The control mode(s) used.
Speed and Position

7) encoders, pots, limit switches are being used.
Us-Digital(I think) encoders 200 pulses per revolution. Don't know what switches we use but we have 2 on our gripper.

To address potential blocking we will make individual threads to set output to each motor.

Phalanx 28-03-2011 13:03

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by drakesword (Post 1046411)
1) Correct/Proper wiring both CAN and Power.
Going to rewire most of the can-related components at VCU. We were using 6 conductor wire. Going to switch to two. 120 ohm terminator soldered to wire not crimped.

We used 6 conductor wire, but we used 100ohm resistor as a terminator instead of 120ohm. I have heard second and third hand reports that using 120ohm terminators create issues, so I would try that before rewiring the entire CAN bus.

In the "Getting Started Guide" on Page 29 for the Black Jags the section "Jaguar Communication Cables" states 100ohm. Also see Table 8-1. CAN Wiring Parameters on page 26. http://www.luminarymicro.com/jaguar

heydowns 28-03-2011 13:14

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by drakesword (Post 1046411)
To address potential blocking we will make individual threads to set output to each motor.

Just a warning - the bus transactions are serialized via a monitor lock in the CANJaguar driver (at least on the C++ version, as well as in the last public source code I can easily get to of the Java version). Thus threading the transactions won't entirely alleviate any potential for blocking with respect to one another (that is, if one Jag thread were to block waiting for a bus response all other concurrently executing Jag threads will also block). Of course, it can offer the advantage of allowing the remainder of your code to execute while the Jag thread(s) are tied up.
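A rough sketch of that last point: since the transactions end up serialized anyway, a single background thread can own the blocking call while the main loop just drops off the newest setpoint and moves on. `LatestSetpointSender` is a made-up name, the `blockingSend` callback is a hypothetical stand-in for the blocking driver call, and the code is plain desktop Java (it uses `java.util.concurrent`, which may not be available in the robot's Java runtime).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.DoubleConsumer;

/** Passes only the newest setpoint to a background thread that makes the blocking call. */
final class LatestSetpointSender {
    private final AtomicReference<Double> pending = new AtomicReference<>();
    private final ExecutorService busThread = Executors.newSingleThreadExecutor();
    private final DoubleConsumer blockingSend;  // hypothetical stand-in for the driver call

    LatestSetpointSender(DoubleConsumer blockingSend) {
        this.blockingSend = blockingSend;
    }

    /** Called from the main loop. Never waits on the bus; overwrites any unsent value. */
    void offer(double setpoint) {
        // Schedule a send only if the slot was empty; otherwise an already
        // queued send will pick up this newer value instead of the old one.
        if (pending.getAndSet(setpoint) == null) {
            busThread.execute(this::sendPending);
        }
    }

    private void sendPending() {
        Double next = pending.getAndSet(null);
        if (next != null) {
            blockingSend.accept(next);  // may block for a long time on a bad bus
        }
    }

    /** Lets queued sends finish, then stops the thread. Do not call offer() afterwards. */
    boolean shutdownAndWait(long timeoutMillis) throws InterruptedException {
        busThread.shutdown();
        return busThread.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

This holds one slot for one motor. A real robot would need a slot per Jaguar, and a dead Jaguar would still delay commands to the healthy ones behind it; the only gain is that the rest of the robot code keeps running.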

techhelpbb 28-03-2011 15:34

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by drakesword (Post 1046411)
1) Correct/Proper wiring both CAN and Power.
Going to rewire most of the can-related components at VCU. We were using 6 conductor wire. Going to switch to two. 120 ohm terminator soldered to wire not crimped.

2) Ground isolation of components.
Again as above. possible ground loop through our can wires

3) CAN Interface used, Serial or 2CAN
Serial

4) Type of JAGS used, Black, Grey, Both
Black (5)

5) The programming language used
Java.

6) The control mode(s) used.
Speed and Position

7) encoders, pots, limit switches are being used.
Us-Digital(I think) encoders 200 pulses per revolution. Don't know what switches we use but we have 2 on our gripper.

To address potential blocking we will make individual threads to set output to each motor.

Our team's usage was similar:

1) Correct/Proper wiring both CAN and Power.

I wish we had load tested our robot's wiring, but it appears that no one ever tested the wire with a static resistive load...it's on our agenda for future builds. As such I can't be sure there wasn't a bad wire or crimp somewhere...however...please be aware that since last week our robot has seen competition in Palmetto and from what I understand no wiring issue presented with the robot configured to do only %VBUS over the CAN. Of course that doesn't mean there isn't still an issue in that wiring somewhere...just not any major show stopping issues.

We tested our practice robot (same build as the production robot in the crate) for many hours at our test field and of all those hours we had less than 4 total lock ups that could not be cleared by any means other than a complete power down. That is statistically small...but troubling enough that the team is traveling with PWM cables and code to bypass any issues related to using them.

2) Ground isolation of components.

Our robot is using telephone wire between the Jaguars. All the wires on the interface follow that bus from the input at the serial port on the cRIO to the output where the terminator is. Hugh had suggested this as a possible ground loop, but we had to stop messing around with this for the moment and so we never broke this connection. Also that cable is not twisted in any way, and it is unshielded, stranded 26AWG.

Our Jaguars are bolted to plexiglass, so they aren't connected to the frame ground.

As a result of the sheer amount of work involved with the encoders and the drive train using the Jaguars, we entirely stopped using the encoders. When we had them connected, we used the encoder splitter we designed from the other topic and that provides complete isolation between the encoders and the Jaguars so there was no option of a ground loop from that source. When we did use this splitter our target set points for the Jaguar PID loops were velocity...and we did tune the loops successfully. However, in this configuration the number of time outs was greater than if we just ran %VBUS without the encoders. So it was either scrap CAN or scrap the encoders given the time constraints.

We still use some potentiometers on our arm that are connected to the Jaguars. We read those values back through the CAN, then hand them back to the Jaguars conditioned for %VBUS. This change was made by the students to make the arm motion more fluid. It also eliminated any further need for the Jaguar PID loop functions. The potentiometers should not be a source of a ground loop.

3) CAN Interface used, Serial or 2CAN
Serial

4) Type of JAGS used, Black, Grey, Both
Black (7)

5) The programming language used
Java and we use threading...doesn't help us when the CAN decides it won't stop blocking as we have it now. Never tried to avoid the blocking...it may be possible.

6) The control mode(s) used.
Velocity and position.

Velocity was working, position was always jerky...so we ended up using neither.

7) encoders, pots, limit switches are being used.

On the gear boxes but not used: U.S. Digital Encoders 360 count x 2
No limit switches
2 multi-turn industrial sensor grade potentiometers on the shoulder and wrist.

drakesword 28-03-2011 15:37

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by heydowns (Post 1046424)
Of course, it can offer the advantage of allowing the remainder of your code to execute while the Jag thread(s) are tied up.

That's the effect I was after.

@Phalanx

According to some other threads the ground wire between jags can cause ground loop interference, e.g. the ground of one jag may be higher than the ground of another jag.

techhelpbb 28-03-2011 16:06

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Just to clear up my post above...

Even with our Jaguars using %VBUS with CAN and only the potentiometers connected to the Jaguars as sensors...

We still get time out errors in addition to the occasional lock up.

The more wear and tear we put on our batteries the worse the problem gets.

Using the PID loops in the Jaguars seems to make the problems worse for us (to the point of dysfunction in some cases).

Due to the competition schedule, and now our additional appearance in St. Louis, we'll probably not want to tinker too much with the practice or production robot until it can't impact on performance at competition.

As soon as we can, I'd like to revisit this issue...even if we have to slap together another frame.
Our team has a spare new cRIO and I have a new Jaguar and some CIM motors.

kamocat 28-03-2011 16:20

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
I wonder if we could be seeing the effects of process starving on the Jaguar?

CAN and RS232 communications are low priority tasks in the processor.

taichichuan 28-03-2011 16:43

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by kamocat (Post 1046533)
I wonder if we could be seeing the effects of process starving on the Jaguar?

CAN and RS232 communications are low priority tasks in the processor.

That might explain the serial CAN problems, but we saw them with the 2CAN via Ethernet as well. I think serial CAN tops out at something like 2K pps but the 2CAN jumps that to more like 8K pps from what I understand. At least it's comforting to know that it's not just a 2CAN problem.

Mike

JasonStern 28-03-2011 20:05

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
1) Correct/Proper wiring both CAN and Power.
We verified all wiring. In addition, the termination resistor was soldered onto short wires and not crimped directly to the plug.

2) Ground isolation of components.
All components were mounted to Plexiglas and the frame was verified as electrically isolated by the DC inspection team.

3) CAN Interface used, Serial or 2CAN
2CAN*

4) Type of JAGS used, Black, Grey, Both
Both

5) The programming language used
Java

6) The control mode(s) used.
Speed, %VBUS

7) encoders, pots, limit switches are being used.
US Digital E3 quadrature encoders with 1024 CPR and 1/2" bore for the black jaguars; nothing for the grey.

*We can cause the jaguars to freeze in both %VBUS and Speed mode using the BD-COMM utility and quickly changing values. This is true even if we remove all other jaguars from the bus.

-Jason

nuttle 28-03-2011 21:59

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Not to pile on, but in case this may help either another team or anyone working on these problems:


- If you try to use an indexed encoder with the closed-loop PID control feature of the Jags, I'm pretty sure this won't work (based on our experiences). The issue here is probably that the index mark causes the encoder count to be reset once per revolution, and the PID logic doesn't expect this behavior, since it is not a continuous count but a count that essentially wraps around at a certain point. You can get around this by disconnecting the index pin -- certainly worth a try if this applies to you. You'll have to rely on the encoder being at zero when things start up, or doing something else to reference the count.


- If you are using PID, stuttering could certainly be caused by not having the P/I/D coefficients set correctly.

- I second the recommendation not to crimp an RJ connector directly to the solid leads coming from a resistor; this is not as reliable as making a short pigtail using cable that is designed for use with these connectors and soldering the terminator resistor to this.
- If you use a 2CAN, the 2CAN webpage is helpful for seeing how things are behaving. In particular, it provides error counters that can help validate wiring and basic communications connectivity.
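On the indexed encoder point above, here is a toy model of why a count that is zeroed once per revolution would upset a position loop. It only models the behavior guessed at in that bullet, not anything confirmed about the Jaguar firmware, and the resolution and target are hypothetical numbers.

```java
// Toy model: what a position loop sees if the index pulse zeroes the count each revolution.
final class IndexedEncoderDemo {

    /** Count the controller would see if the index pulse resets it once per revolution. */
    static int wrappedCount(int trueCount, int countsPerRev) {
        return Math.floorMod(trueCount, countsPerRev);
    }

    public static void main(String[] args) {
        int countsPerRev = 1000;  // hypothetical encoder resolution
        int target = 1500;        // hypothetical target: one and a half revolutions

        for (int trueCount = 900; trueCount <= 1500; trueCount += 100) {
            int seen = wrappedCount(trueCount, countsPerRev);
            // The error jumps when the count wraps and never reaches zero,
            // because the seen count can never get as high as the target.
            System.out.printf("true=%d seen=%d error=%d%n", trueCount, seen, target - seen);
        }
    }
}
```

In this model any target beyond one revolution is unreachable, and even within one revolution the error jumps by a full revolution at the index mark, which fits the advice to disconnect the index pin.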

John Heden 03-04-2011 18:23

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Greetings All,

We started this thread after experiencing a number of catastrophic CAN startup problems at the GSR regional and while we were hopeful that the V29 update would provide some relief on this issue, we continued to see this problem occasionally at the Hartford regional this past weekend. While our Drive team was instructed to monitor the Driver Station diagnostic tab for streaming CAN errors, they became complacent on Friday after never encountering the issue on Thursday. A Driverstation side CRIO Reboot did NOT recover the situation after the match began and we sat idle during the entire match (our Alliance won without our participation).

We experienced this problem again on Saturday while setting up before opening ceremonies (we had the first match) and again the system did NOT recover with a warm Driver station reboot on the first attempt and required either a third reboot attempt or possibly an actual robot power cycle to recover. This early morning triple failure scared us a bit but we NEVER experienced this catastrophic CAN failure during our later matches. The drive team did occasionally see a few startup CAN errors that concerned (panicked) them but did not see the catastrophic scrolling CAN error behavior.

We saw this failure at GSR at a frequency of about 1 out of 6 matches, ONLY while we were actually on the playing field and never while tethered to the robot, with approximately the same statistic at Hartford. It was also seen once on the practice field, making us curious as to whether the use of the radio was somehow a catalyzing factor. Our radio was physically touching the 2CAN so we decided to try to give them some space (inverse square law). We couldn't try this with the radio in the pits but we proceeded to do some repetitive tethered power on/off tests trying out different power up sequences (relative to laptop) to try to reproduce this. We must have done this 20 or 30 times and NEVER saw a single CAN transaction failure and certainly not the continuous scrolling catastrophic CAN failure signature. We thought this was a radio ONLY failure but we did eventually experience this once while we were tethered in the pits. A quick power cycle and the problem went away! I wish we had tried a soft reboot as an experiment but our robot was being queued and our goal at that point was recovery rather than experimentation.

I believe we have some type of CAN/2CAN/CRIO/WPILib startup race condition that occasionally prevents some type of low level initialization causing the complete loss of the CAN bus. The manifestation we see is as if we simply pulled the CAN cable out of the 2CAN preventing any successful transaction to any CAN device. I believe use of the radio somehow amplifies this window of opportunity for failure given our ratio of match failures to pit failures and given we power up much more often while tethered in the pits than during actual matches. We had little working radio based experience prior to arriving at GSR due to the late availability of the physical robot for software testing. This radio testing and its influence on CAN failures will be a priority when we get our robot back. My apologies that some of this data is so soft but we were unable to find any hard correlation or anything definitive other than an occasional complete startup failure that always recovers on a power cycle mostly ruling out cabling issues. This failure occurs BEFORE being enabled essentially ruling out any real voltage drop or current/noise problems. If we start up successfully, we do run successfully. In fact, we have performed a number of tests where we pull the breakers out of the Jags and even the 2CAN. This causes CAN errors to be reported but the system nicely recovers within a couple of seconds after we plug the breakers back in. We use the default voltage mode so others who have a more complex initialization or control scheme may not recover so easily.

There was also some anecdotal data coming from others at Hartford (other teams and even some of the Hartford technical field folks) who believe the serial CAN interface is more robust than the 2CAN and recommended we switch away from the 2CAN. While this startup problem is catastrophic, it feels like some type of simple initialization glitch that is solvable. The CAN & 2CAN approach is a nice technology with perhaps this single gremlin to be exorcised. We'll try to diagnose this further when our bot comes home but unless we can convince our team that this is behind us, we may be forced to return to the simpler ways of PWMs....

Cheers and thanks,

John

1) CAN Wiring is correct with proper termination.
2) All components are ground isolated from the frame and electrical wiring has no shorts or ground faults.
3) We run the CRIO connection directly to the radio and connect the radio directly to the 2CAN rather than passing all CRIO traffic through the 2CAN.
4) We used all Tan JAGS.
5) We Programmed with C++.
6) Used Voltage Mode only.
7) 3 Jags with optical encoders, 1 of these has a single limit switch as well
2 Jags each with 2 limit switches, no encoder

8) I should also add that our software launches a separate dashboard thread at the end of the constructor AFTER the Jags and other robot objects are created with this data (encoder values, currents, voltages, etc) being read for display and capture by our custom dashboard. This explains why we see a continuous never ending data stream while others, I suspect, may see a number of errors during startup that stop but resume once they are enabled and autonomous & control Jaguar transactions begin.

Hugh Meyer 04-04-2011 11:00

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
John,

I may be misunderstanding your comment # 3. Please clarify if my comments don't make sense.

*****
3) We run the CRIO connection directly to the radio and connect the radio directly to the 2CAN rather than passing all CRIO traffic through the 2CAN.
*****

I thought the 2CAN was to be connected to the CRIO on port # 2. Since port # 2 is on a different network the traffic is isolated from the robot communication traffic on the wireless. That is how ours is wired and we just completed 2 regional events without any control issues. We had other issues, just not control ones...

It seems adding that additional load through the radio switch and robot communication network could indeed cause problems.

-Hugh

taichichuan 04-04-2011 12:06

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by Hugh Meyer (Post 1049155)
John,

I may be misunderstanding your comment # 3. Please clarify if my comments don't make sense.

*****
3) We run the CRIO connection directly to the radio and connect the radio directly to the 2CAN rather than passing all CRIO traffic through the 2CAN.
*****

I thought the 2CAN was to be connected to the CRIO on port # 2. Since port # 2 is on a different network the traffic is isolated from the robot communication traffic on the wireless. That is how ours is wired and we just completed 2 regional events without any control issues. We had other issues, just not control ones...

It seems adding that additional load through the radio switch and robot communication network could indeed cause problems.

-Hugh

Routing CAN traffic through the radio could certainly cause problems. The radio's ports have a limited amount of buffer space before the 802.3x congestion control messages start flying around on the net. I'm not sure what the 2CAN would do if it suddenly started getting a lot of source quench message traffic. Certainly, packets would start getting lost and that would be bad on a half-duplex style network like CAN (at least in the way it's implemented for FIRST). So, it's probably better to wire the 2CAN to port 1 on the cRIO and wire the radio on the other port of the 2CAN.

My $.02,

Mike

John Heden 04-04-2011 12:37

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Hi Hugh,

With respect to my comment #3, we initially had ALL of our CRIO traffic going through the 2CAN device by connecting the CRIO to the 2CAN and then to the radio. This worked very well except for the same problem that we are now discussing.

Our second topology was to run all Ethernet devices (CRIO, Camera 1, Camera 2, and the 2CAN) directly to the radio which felt like a more robust approach as the 2CAN device did not need to manage all CRIO traffic. This was an experimental change that did not seem to help or hurt but that's the way we have left our robot wired.

This approach leaves 1 of the 2 2CAN Ethernet ports unconnected and required us to disconnect 1 of our cameras when tethered (maybe the other 2CAN port could have been used for tethering but we never investigated this).

I hope this clarifies my comment #3 a bit. Perhaps our Team's next experiment would be to use port #2 to connect directly to the 2CAN device and see whether that helps. Given that this port has NO other Ethernet traffic, perhaps it would be a bit more consistent in any network influenced timing dynamics.

Thanks,

John

John Heden 04-04-2011 13:17

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by taichichuan (Post 1049185)
Routing CAN traffic through the radio could certainly cause problems. The radio's ports have a limited amount of buffer space before the 802.3x congestion control messages start flying around on the net. I'm not sure what the 2CAN would do if it suddenly started getting a lot of source quench message traffic. Certainly, packets would start getting lost and that would be bad on a half-duplex style network like CAN (at least in the way it's implemented for FIRST). So, it's probably better to wire the 2CAN to port 1 on the cRIO and wire the radio on the other port of the 2CAN.

My $.02,

Mike

Mike,

Thanks for your thoughts. We initially had the 2CAN device on the CRIO port 1 connector which is where we first started to see our problems. The buffer capacity of the 2CAN vs. the radio was unclear but our intuition gave the radio the advantage here. I believe our next step will be to simply move to port 2 of the CRIO and see whether that helps things.

Thanks,

John

nuttle 04-04-2011 14:41

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Do you know if your 2CAN has had the firmware updated to version 2.5 or not? We have a regional this coming week/end and will try out v29 on the cRIO. We've been using serial, but it is easy enough to switch back and forth that we might try the 2CAN again, at least for practice matches. We have not yet had a chance to try either of these updates.

heydowns 04-04-2011 14:57

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by John Heden (Post 1048852)
We started this thread after experiencing a number of catastrophic CAN startup problems at the GSR regional and while we were hopeful that the V29 update would provide some relief on this issue, we continued to see this problem occasionally at the Hartford regional this past weekend.

Hi John,

Did you by chance happen to have Windriver's target debugger connection open when this occurred in your pits? If so, did it report any abnormal terminations or errors?

If not, you might consider doing so as this can provide a wealth of information when things go awry. You don't have to be "debugging" the code at the time -- you can be running a "deployed" program. The nice part about it is that it will report any task failures/terminations to you as well as stack information if available.

Are you able to go into a bit more detail on what your dashboard task is doing exactly? Particularly in relation to CANJaguar objects, as well as frequency of iteration.
We send dashboard data during disable as well, but do it as part of the normal disable processing routine rather than as a separate task.

jtdowney 04-04-2011 15:03

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
Quote:

Originally Posted by taichichuan (Post 1049185)
Routing CAN traffic through the radio could certainly cause problems. The radio's ports have a limited amount of buffer space before the 802.3x congestion control messages start flying around on the net. I'm not sure what the 2CAN would do if it suddenly started getting a lot of source quench message traffic. Certainly, packets would start getting lost and that would be bad on a half-duplex style network like CAN (at least in the way it's implemented for FIRST). So, it's probably better to wire the 2CAN to port 1 on the cRIO and wire the radio on the other port of the 2CAN.

My $.02,

Mike

R50A states "The DAP-1522 radio is connected to the cRIO-FRC Ethernet port 1 (either directly or via a CAT5 Ethernet pigtail)." We took this to mean that under no circumstance can an active device (2CAN) sit between the DAP-1522 and the cRIO. That leaves two choices, connect the 2CAN to the DAP-1522 or connect the 2CAN to Ethernet port 2 on the cRIO.

My team's robot (programmed with Java using IterativeRobot) has gone through one event with our 2CAN plugged into our DAP-1522 and has had no CAN related trouble. We were running cRIO v28 (v29 wasn't out at the time) and 2CAN firmware v2.5 with the SVN rev 66 plugin on the cRIO. We have 6 black jaguars on the CAN bus with no sensor inputs or limit switches.

Perhaps we were very fortunate during our regional but we have not had any serious CAN issues (knock on wood) since build. All of our trouble then could be traced back to poorly made cables when we did have problems.

I am hoping our luck carries us through championship.

mjcoss 04-04-2011 18:12

Re: Unexplained intermittent CAN / 2CAN Jaguar problems at GSR
 
We continue to be plagued with timeouts on the CAN bus. And yes, I've checked the termination, and it all looks good. We are running V29 and I don't have the numbers for the plugin or the 2CAN firmware.

One thing that I have seen which is causing no end of issues is that if you get a timeout on the messages, the API returns no indication of the failure. So, for example, if the GetForwardLimitOK() function is called and times out, you get back false. There is no way to know that that has happened and if you are making decisions based on these results... We have an encoder on our lift mechanism. To zero the encoder, we drive to the bottom limit switch, and when we get there, we set the encoder to 0. This works fine until we lose the message due to a timeout. From that point on the lift is offset by wherever the timeout occurred. There really needs to be a way within the API to detect that the transaction timed out.
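A minimal C++ sketch of the kind of timeout-aware read being asked for here. Nothing in it is part of WPILib or the 2CAN plugin: the LimitRead struct, ReadLimitWithRetry and ShouldZeroEncoder are hypothetical names used only for illustration, and it assumes that a "limit OK" value of false means the switch is tripped. The idea is to keep "timed out" separate from "false" so the encoder is never zeroed on a lost message.

```cpp
#include <functional>

// Hypothetical result type (not part of WPILib): 'valid' is false when the
// CAN transaction timed out, in which case 'limitOk' carries no meaning.
struct LimitRead {
    bool valid;
    bool limitOk;
};

// Retries a caller-supplied read up to maxAttempts times and returns the
// first valid result; returns an invalid result if every attempt timed out.
LimitRead ReadLimitWithRetry(const std::function<LimitRead()>& readLimit,
                             int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        LimitRead r = readLimit();
        if (r.valid) {
            return r;
        }
    }
    return LimitRead{false, false};
}

// Zero the encoder only when a confirmed read reports the limit as tripped
// (limitOk == false). A timed-out read never triggers zeroing.
bool ShouldZeroEncoder(const LimitRead& r) {
    return r.valid && !r.limitOk;
}
```

With something like this, the zeroing routine would call ShouldZeroEncoder on each loop and simply skip the reset when the read was invalid, rather than treating a timeout as a tripped switch.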

Of course, the best answer would be that we don't have any timeouts. :)

Another observation related to timeouts: we have an on-board compressor, and if during initial startup the pressure sensor indicates that the compressor should run and the compressor starts immediately, we get a number of timeout messages.

All in all, I'm really regretting the decision to use the CAN bus. And for the most part all of the features that I really wanted to use, that were provided by the CAN bus, proved to be unusable.


All times are GMT -5.
