Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   CAN (http://www.chiefdelphi.com/forums/forumdisplay.php?f=185)
-   -   CAN reliability (http://www.chiefdelphi.com/forums/showthread.php?t=86259)

kamocat 17-07-2010 01:04

Re: CAN reliability
 
I got the black-out detection and reconfiguration working. (In speed mode, the Jaguar is up and running with all relevant settings restored in about 200 ms.)
However, I'm having more trouble detecting situations when the controller is simply disabled (brown-outs, watchdog timeouts, etc.)
What I tried to do was:
If robot is enabled
AND the output setpoint sent is greater than 2^-8
AND the output setpoint (when retrieved) is 0
THEN the Jaguar is disabled and needs to be re-enabled.

As you might guess, being disabled does reset the output setpoint of a Jaguar back to 0. (The Jaguar does not remember its last nonzero output when it is re-enabled)
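
A minimal sketch of the check described above, written in Python rather than LabVIEW. The function name and arguments are hypothetical, and using the absolute value of the setpoint (so that reverse outputs are covered too) is an assumption that the post does not state.

```python
# Threshold from the post: setpoints no larger than 2^-8 are treated as zero.
MIN_SETPOINT = 2 ** -8


def jaguar_looks_disabled(robot_enabled: bool,
                          sent_setpoint: float,
                          read_back_setpoint: float) -> bool:
    """Return True when the heuristic says the Jaguar has been disabled.

    All three conditions from the post must hold:
      1. the robot is enabled,
      2. a clearly nonzero setpoint was sent (abs() is an assumption),
      3. the setpoint read back from the Jaguar is 0.
    """
    return (robot_enabled
            and abs(sent_setpoint) > MIN_SETPOINT
            and read_back_setpoint == 0.0)


if __name__ == "__main__":
    # Sent 0.5 but read back 0 while enabled: the Jaguar needs re-enabling.
    print(jaguar_looks_disabled(True, 0.5, 0.0))
```

The check cannot fire while the commanded output is near zero, so a Jaguar that disables while idle would only be noticed once a nonzero setpoint is sent.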

Anyways, I was having some trouble with that, and I'm not quite sure what the issue is. Unfortunately, the Jaguars do not have a message which tells you whether they are enabled or disabled (though they can tell you what mode of operation they are in).

I have not gone back and tested the robot with the wireless radio, but I may get to that on Monday.
I will be busy over the weekend, and most likely will not make any progress on this for that time. However, please post your thoughts and experiences.

What issues do you want the Jaguars to automatically recover from?
Black-out? (completely reconfigure device)
Brown-out? (re-enable device and re-send output setpoint)
Temporary watchdog timeout? (re-enable device and re-send output setpoint)
CAN cable faulty? (re-enable device and re-send output setpoint)
Others?
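
The list above can be written down as a lookup table from fault to recovery steps. This is only a sketch: the enum names and the `recovery_actions` helper are hypothetical, and the mapping simply mirrors the parenthesized suggestions in the list.

```python
from enum import Enum, auto
from typing import Dict, List


class Fault(Enum):
    BLACK_OUT = auto()
    BROWN_OUT = auto()
    WATCHDOG_TIMEOUT = auto()
    CAN_CABLE_FAULT = auto()


class Action(Enum):
    RECONFIGURE = auto()      # completely reconfigure the device
    RE_ENABLE = auto()        # re-enable the device
    RESEND_SETPOINT = auto()  # re-send the output setpoint


# One entry per question in the list; "Others?" is left open.
RECOVERY_PLAN: Dict[Fault, List[Action]] = {
    Fault.BLACK_OUT: [Action.RECONFIGURE],
    Fault.BROWN_OUT: [Action.RE_ENABLE, Action.RESEND_SETPOINT],
    Fault.WATCHDOG_TIMEOUT: [Action.RE_ENABLE, Action.RESEND_SETPOINT],
    Fault.CAN_CABLE_FAULT: [Action.RE_ENABLE, Action.RESEND_SETPOINT],
}


def recovery_actions(fault: Fault) -> List[Action]:
    """Return a copy of the recovery steps planned for the given fault."""
    return list(RECOVERY_PLAN[fault])


if __name__ == "__main__":
    for fault in Fault:
        steps = ", ".join(action.name for action in recovery_actions(fault))
        print(f"{fault.name}: {steps}")
```

Three of the four faults share the same recovery, so the code only really needs to distinguish a black-out from everything else unless diagnostic messages are wanted.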

Should there be diagnostic messages? Should the robot try to determine what the issue was? (distinguish between brown-out, watchdog, and CAN bus problem?)
Should a robot have a system check on startup to make sure everything is working?

kamocat 18-07-2010 22:10

Re: CAN reliability
 
Well, I got the re-enabling to work. (I had simply forgotten to add the watchdog to the refnum registry.)
I did confirm that the Jaguars disable often when run in the temporary deployment over wireless. However, I found it was actually an issue with the System Watchdog, and so I will resume work on analyzing the Driver Station reliability.

kamocat 26-07-2010 13:00

Re: CAN reliability
 
I noticed that each Jaguar reference has its own unique semaphore, meaning the semaphore only prevents simultaneous messages to the same Jaguar. This implies other Jaguars on the CAN bus can still communicate while this Jaguar is processing.
I tried sending commands to several Jaguars in parallel, but it didn't execute any faster than when I did it in a FOR loop.
I suspect the CAN messages are taking disproportionately longer when they contain data.

In other news, I created a CAN manager over the weekend, as a central location for getting updates on the status of CAN devices, and so notifications (like "lost comms with Device 12" and "Device 13 rebooted") can be received in a timely and synchronized manner.
However, I'm only getting about 12 iterations per second out of it right now. That's probably due to my use of the Enumeration (which takes 64ms plus however long it takes to send and receive the data). I'm hoping to get about 5 times that rate.
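
A quick arithmetic check of those numbers, using only the figures quoted above (12 iterations per second, a 64 ms enumeration wait, and a target of 5 times the current rate). The variable names are just for illustration.

```python
ENUMERATION_WAIT_S = 0.064        # fixed wait inside the Enumeration
measured_rate_hz = 12.0           # iterations per second seen today
target_rate_hz = 5 * measured_rate_hz

measured_period_s = 1 / measured_rate_hz
target_period_s = 1 / target_rate_hz

# Fraction of each current iteration spent in the enumeration wait alone.
enumeration_share = ENUMERATION_WAIT_S / measured_period_s

print(f"current period: {measured_period_s * 1000:.1f} ms")
print(f"target period:  {target_period_s * 1000:.1f} ms")
print(f"enumeration share of current period: {enumeration_share:.0%}")
print("enumeration fits in target period:",
      ENUMERATION_WAIT_S <= target_period_s)
```

The wait alone is longer than a whole iteration at the target rate, so the target can only be met if the enumeration is shortened or not run on every iteration.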

EDIT: CAN Receive is SLOW when a Jaguar loses comms. I wonder if I could change the timeout?

Radical Pi 26-07-2010 20:15

Re: CAN reliability
 
The Driver plugin has a semaphore on outgoing data, so they still have to wait in line for the data to leave the serial port. On the receiving line of the cRIO though, messages can arrive out of order and are dispatched to their code as soon as they arrive (or not, that's just what it looks like from the comments). Considering how fast the jags run (main loop every 1ms), and that most messages are handled by the CAN or Serial interrupts, the return is probably fast enough to cause a minimal change in the speed of the commands.
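
A toy model of why parallel sends do not finish sooner when every message has to pass through one lock on the outgoing side. This is not the real plugin code: `serial_port_lock`, `send_message`, and the 20 ms send time are stand-ins chosen for illustration.

```python
import threading
import time

SEND_TIME_S = 0.02                    # assumed time to push one message out
serial_port_lock = threading.Lock()   # stands in for the outgoing-data semaphore


def send_message(device_id: int) -> None:
    """Pretend to send one message; only one sender may hold the port."""
    with serial_port_lock:
        time.sleep(SEND_TIME_S)


def send_in_loop(device_ids) -> float:
    """Send to each device one after another and return the elapsed time."""
    start = time.perf_counter()
    for device_id in device_ids:
        send_message(device_id)
    return time.perf_counter() - start


def send_in_parallel(device_ids) -> float:
    """Send to every device from its own thread and return the elapsed time."""
    threads = [threading.Thread(target=send_message, args=(device_id,))
               for device_id in device_ids]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    ids = [2, 3, 4, 5]
    print(f"loop:     {send_in_loop(ids) * 1000:.0f} ms")
    print(f"parallel: {send_in_parallel(ids) * 1000:.0f} ms")
```

In this model both versions take at least the number of devices times the send time, because the threads simply queue on the shared lock.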

Why don't you put the enumeration on a timer so it only executes every 2 seconds? There isn't really any need to enumerate on every single loop, especially since it is a slow command.
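
One way to sketch the timer idea, assuming the main loop can call a cheap `poll()` on every iteration. `PeriodicTask` and its arguments are hypothetical names, and the injectable clock is only there so the behavior can be checked without waiting.

```python
import time


class PeriodicTask:
    """Runs an action at most once per period, however often poll() is called."""

    def __init__(self, period_s: float, action, clock=time.monotonic):
        self._period_s = period_s
        self._action = action
        self._clock = clock
        self._next_due = clock()  # due immediately on the first poll

    def poll(self) -> bool:
        """Run the action if it is due; return True when it ran."""
        now = self._clock()
        if now < self._next_due:
            return False
        self._action()
        self._next_due = now + self._period_s
        return True


if __name__ == "__main__":
    # The lambda stands in for the slow enumeration call.
    enumerate_task = PeriodicTask(2.0, lambda: print("enumerating..."))
    for _ in range(5):
        enumerate_task.poll()   # only the first call prints
        time.sleep(0.01)
```

The rest of the loop keeps running at full speed and only pays the enumeration cost once every 2 seconds.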

I saw the same slowdown in the C++ code. If 3 jags lose connection, it slows down the code enough to set off watchdogs on the jags and knock out the entire network. I plan on writing a wrapper class for CANJaguar that blocks messages if the jag is lost.
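
A rough sketch of what such a wrapper could look like, in Python rather than C++. Nothing here is the real CANJaguar API: `GuardedJaguar`, `CanTimeout`, the `send` callable, and the 1 second retry interval are all assumptions made for illustration.

```python
import time


class CanTimeout(Exception):
    """Raised by the (hypothetical) send function when a device does not answer."""


class GuardedJaguar:
    """Skips messages to a lost device, retrying only once per interval."""

    def __init__(self, send, retry_interval_s: float = 1.0,
                 clock=time.monotonic):
        self._send = send                  # callable that may raise CanTimeout
        self._retry_interval_s = retry_interval_s
        self._clock = clock
        self._lost = False
        self._next_retry = 0.0

    @property
    def lost(self) -> bool:
        return self._lost

    def set_output(self, value: float) -> bool:
        """Try to send a setpoint; return True only if it was delivered."""
        now = self._clock()
        if self._lost and now < self._next_retry:
            return False                   # blocked: do not pay the timeout again
        try:
            self._send(value)
        except CanTimeout:
            self._lost = True
            self._next_retry = now + self._retry_interval_s
            return False
        self._lost = False
        return True
```

With this shape, a lost device costs one slow timeout per retry interval instead of one per loop iteration, which is the slowdown that was tripping the watchdogs on the other devices.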

kamocat 27-07-2010 16:23

Re: CAN reliability
 
I want to know when a Jaguar is lost, and quickly, so my code can deal with the problem of (a) not getting feedback on that motor, (b) not being able to control that motor, and (c) reenabling/reconfiguring that Jaguar when communication is regained.
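
A small sketch of the bookkeeping this implies: remember whether each device answered last time, and report changes as they happen. The `CanManager` class here is a hypothetical stand-in, not the LabVIEW CAN manager, and the `responded` and `rebooted` inputs are assumed to come from whatever status polling is already in place.

```python
from typing import Dict, Iterable, List


class CanManager:
    """Tracks which devices are reachable and reports changes as messages."""

    def __init__(self, device_ids: Iterable[int]):
        # Assume every device is reachable until a poll says otherwise.
        self._online: Dict[int, bool] = {d: True for d in device_ids}

    def update(self, device_id: int, responded: bool,
               rebooted: bool = False) -> List[str]:
        """Record one poll result and return any notifications it causes."""
        notes: List[str] = []
        was_online = self._online[device_id]
        if was_online and not responded:
            notes.append(f"lost comms with Device {device_id}")
        if not was_online and responded:
            notes.append(f"regained comms with Device {device_id}")
        if responded and rebooted:
            notes.append(f"Device {device_id} rebooted")
        self._online[device_id] = responded
        return notes


if __name__ == "__main__":
    manager = CanManager([12, 13])
    print(manager.update(12, responded=False))
    print(manager.update(12, responded=True))
    print(manager.update(13, responded=True, rebooted=True))
```

A "regained comms" or "rebooted" notification is the natural trigger for re-enabling or reconfiguring that Jaguar.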

I think I'm starting to see why the Jaguars support motor control through PWM with simultaneous feedback through CAN. But that brings the benefit of CAN to nearly nothing, because you now have the issues with the PWM cables: no keying (easy to put in backwards), easy to wire the cable to the wrong controller, cables are fragile and pins bend easily. It's even the same cable as on the GPIO, Relay outputs, and Analog Inputs.


About the Enumeration, though, I wonder if I could cut that wait function shorter? Would it just cut off some of the motor controllers if they were above that value?

Also, I'll retest that series vs parallel thing to see if parallel executes faster when one of the controllers is unplugged.

kamocat 27-07-2010 18:52

Re: CAN reliability
 
Here's something interesting I found in the LabVIEW Help documentation:
Quote:

Input/Output
Input/Output (I/O) calls generally incur a large amount of overhead. They often take much more time than a computational operation. For example, a simple serial port read operation might have an associated overhead of several milliseconds. This overhead occurs in any application that uses serial ports because an I/O call involves transferring information through several layers of an operating system.

The best way to address too much overhead is to minimize the number of I/O calls you make. Performance improves if you can structure the VI so that you transfer a large amount of data with each call, instead of making multiple I/O calls that transfer smaller amounts of data.

For example, if you are creating a data acquisition (NI-DAQ) VI, you have two options for reading data. You can use a single-point data transfer function such as the AI Sample Channel VI, or you can use a multipoint data transfer function such as the AI Acquire Waveform VI. If you must acquire 100 points, use the AI Sample Channel VI in a loop with a Wait function to establish the timing. You also can use the AI Acquire Waveform VI with an input indicating you want 100 points.

You can produce much higher and more accurate data sampling rates by using the AI Acquire Waveform VI, because it uses hardware timers to manage the data sampling. In addition, overhead for the AI Acquire Waveform VI is roughly the same as the overhead for a single call to the AI Sample Channel VI, even though it is transferring much more data.
This may be why the communication is so slow. If we could have higher-level commands integrated into the CAN driver, then we could achieve greater communication speeds.
I think an excellent addition would be the ability to get the same status from multiple controllers, or set the same configuration/output of multiple controllers (with different data for each one).
Unfortunately, this is beyond my current ability; I don't know how to make .out files for the cRIO, and I'm certainly not interested in programming the Jaguars in C.

Ether 27-07-2010 19:30

Re: CAN reliability
 
Quote:

a simple serial port read operation might have an associated overhead of several milliseconds. This overhead occurs in any application that uses serial ports because an I/O call involves transferring information through several layers of an operating system
What "operating system" are they talking about here, in the context of a program running on the cRIO? Does VxWorks have that much overhead for RS232?

~

Radical Pi 27-07-2010 21:04

Re: CAN reliability
 
VxWorks is a real-time OS, so it's designed to have as little overhead as possible when running system calls (such as the Serial Port). I do, however, see where the multiple layers of "OS" come into play in this system. When a CAN message is sent over serial, it is sent to the NetworkCommunication code, is passed from there to BlackJagBridge, which sends raw bytes to the VISA VIs to be sent over the Serial port.

I'm not sure exactly what you mean about high-level communication. If you mean bundling a bunch of messages together and sending them at once, I'm not sure it's possible to do without support from NI/FIRST. The system is currently built to send a single message with each call into the plugin. Trying to lump them together could have strange side-effects, since for example trusted messages wouldn't get signed properly.

I also don't see how much of a difference it would make. The serial port code seems to be set to flush when it gets a chance (I'm not 100% sure about this), so there is still the same amount of accessing the serial port. The only difference is there are fewer calls through the NetworkCommunication and BlackJagBridge libraries

On the .out files, they are the equivalent of .exe files (or .dll files, not quite sure). I know for sure they are built by Wind River whenever it compiles our code, not sure if LabVIEW does the same thing.

Ether 27-07-2010 21:24

Re: CAN reliability
 
Quote:

I'm not sure exactly what you mean about high-level communication.
Just to be clear, although the above post was linked to my post http://www.chiefdelphi.com/forums/sh...2&postcount=37 , the pronoun "you" refers to kamocat's post http://www.chiefdelphi.com/forums/sh...7&postcount=36

kamocat 30-07-2010 13:02

Re: CAN reliability
 
For some reason I was thinking that by joining messages together there could be less data sent over RS232. However, the whole arbitration ID is necessary for every Jaguar, so I guess it wouldn't actually reduce the required throughput.
Now I'm just fantasizing here, but scheduling status messages to be periodically sent and stored until retrieval could be a way to get around it. I have the feeling, however, that one could not use a motor controller for this.
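
To make the "sent periodically and stored until retrieval" idea concrete, here is a sketch of the storage side only. It assumes something else delivers periodic status messages and calls `store()`; the `StatusCache` name, its methods, and the `max_age_s` staleness limit are all hypothetical.

```python
import time
from typing import Any, Dict, Optional, Tuple


class StatusCache:
    """Keeps the most recent status per device until someone asks for it."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._latest: Dict[int, Tuple[float, Any]] = {}

    def store(self, device_id: int, status: Any) -> None:
        """Record a status message at the moment it arrives."""
        self._latest[device_id] = (self._clock(), status)

    def latest(self, device_id: int, max_age_s: float) -> Optional[Any]:
        """Return the stored status, or None if it is missing or too old."""
        entry = self._latest.get(device_id)
        if entry is None:
            return None
        stored_at, status = entry
        if self._clock() - stored_at > max_age_s:
            return None
        return status


if __name__ == "__main__":
    cache = StatusCache()
    cache.store(12, {"speed_rpm": 100})
    print(cache.latest(12, max_age_s=0.1))
```

Reading from the cache never waits on the bus, and a stale or missing entry doubles as a hint that the device has stopped reporting.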

kamocat 01-08-2010 01:37

Re: CAN reliability
 
3 Attachment(s)
While I'm waiting for TI and NI to release an update to improve real-world communication speeds, I'm going to start working on a CAN system startup check.

I also realized I forgot to publicly release the code I've been working on, so here it is.
  • The "CAN" folder contains individual tests or VIs that I don't plan to use in robot projects.
  • The "CANJaguar" folder is what I place in user.lib so it's accessible on my functions/controls palettes and I can easily use it in other projects. It contains the original CANJaguar VIs available from FIRST Forge, though many have been modified.
  • The "CAN location tracking" is a project I was working on to have the robot record its location as it travels so it can then map the obstacles around it. (This folder doesn't include the obstacle mapping. I'm thinking of doing that with the Dashboard so the cRIO doesn't have to manipulate and display large arrays).

kamocat 03-08-2010 13:16

Re: CAN reliability
 
1 Attachment(s)
Well, I got the startup test to work, and it does work beautifully.
I've uploaded "CAN Jaguar" again. (I had to make some modifications to the "get status" VI to honor the "inverted" property)
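
For readers without LabVIEW, the general shape of a startup check can be sketched as follows. This is not the attached VI: `startup_check`, the `probe` callable, and the retry count are assumptions, with `probe(device_id)` standing in for any request that a working Jaguar would answer.

```python
from typing import Callable, Iterable, List


def startup_check(expected_ids: Iterable[int],
                  probe: Callable[[int], bool],
                  attempts: int = 3) -> List[int]:
    """Return the IDs of expected devices that never answered a probe.

    Each device gets up to `attempts` tries before it is reported missing.
    """
    missing: List[int] = []
    for device_id in expected_ids:
        # any() stops at the first successful probe.
        if not any(probe(device_id) for _ in range(attempts)):
            missing.append(device_id)
    return missing


if __name__ == "__main__":
    responding = {2, 3}   # pretend only these devices answer
    missing = startup_check([2, 3, 4], lambda d: d in responding)
    if missing:
        print("Not responding:", missing)
    else:
        print("All CAN devices responded.")
```

Reporting the missing IDs, rather than a single pass/fail, makes it clear which controllers need their power cycled.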

kamocat 14-09-2010 22:47

Re: CAN reliability
 
I think it's time to conclude this thread.

I'll summarize what I've learned:
  • CAN messages take a long time over serial due to the current implementation (messages wait for errors on the CAN bus before completing). If there is an error, the function takes longer still. A driver update is expected to help alleviate the issue. I have no data on the 2CAN module.
  • There are multiple issues with the Jaguar firmware:
    • Speed is dealt with in RPM (the documentation states revolutions per second). Although only the encoder can be used to calculate speed, the "speed reference" configuration is required. The "Speed" status is reported as zero until the Jaguar has been enabled in speed mode. The speed status reports positive regardless of the direction the encoder is turning.
    • The "Position" status is reported as zero until the Jaguar has been enabled in position mode.
    • There is no status message to tell if the Jaguar is enabled or disabled.
    • The "control mode" status is not implemented.
    • "Device Query" takes half a second to execute, but returns nothing.
  • The current sensor is only accurate within 1 amp. This makes current control unsuitable for all FRC motors smaller than the CIM.
  • There are occasional reliability issues on startup. Sometimes only some of the motor controllers will function. When my auto-configuration utility is running, the quickest way to fix this is to cycle power to the nonfunctioning devices.


Because of these issues, I would not recommend CAN to a team for competition use, and I will not be teaching a CAN workshop at our preseason kickoff.

However, I will continue working with CAN and publishing my work on Chief Delphi.

Radical Pi 14-09-2010 23:18

Re: CAN reliability
 
Quote:

Originally Posted by kamocat (Post 974195)
There are occasional reliability issues on startup. Sometimes only some of the motor controllers will function. When my auto-configuration utility is running, the quickest way to fix this is to cycle power to the nonfunctioning devices.

What kind of issues were you having? The only startup problems we saw with the Jaguars were the old ID reset bug and the device firmware version returning 0.

kamocat 15-09-2010 20:06

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 974196)
What kind of issues were you having? The only startup problems we saw with the Jaguars were the old ID reset bug and the device firmware version returning 0.

I don't know. I haven't found any causes for the failure, and no solid repeatability. However, sometimes only some of my Jaguars will work (which is why I made the startup test). A Jaguar may spontaneously start working several minutes later.

I used to think that this was due to occasional watchdog timeouts. However, enabling control to the Jaguar does not help, and a watchdog timeout does not disable control to a Jaguar (it just cuts out the heartbeat).


All times are GMT -5.