Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   CAN (http://www.chiefdelphi.com/forums/forumdisplay.php?f=185)
-   -   CAN reliability (http://www.chiefdelphi.com/forums/showthread.php?t=86259)

kamocat 06-07-2010 23:12

CAN reliability
 
One of the things I like about CAN is that it gives us the ability to detect failures (loss of power, loss of communication, etc).
I made a program to do just that, and log it to a file. After running it for an hour, I got some interesting results. I've linked the code and the log file, so I'll explain the format it is logged in.

First off, the file is created in the top directory, using the name "CAN_status_"+timestamp (In the file I've provided, it is 5:33pm on July 6th.)

Anyways, for each entry (each line), there is the status, the list of devices which the status applies, and the timestamp.
The possible statuses are:
  • lost (communication with this Jaguar has been lost)
  • power up (power has been cycled since this was last checked)
  • got comms (communication with Jaguar has resumed, but the interruption was not due to loss of power)
  • brown out (voltage fault)
  • over temp (temperature fault)
  • over current (current fault)

I was surprised at the number of interruptions there were, seeing as the robot was undisturbed during this hour. It seems each of the black jaguars (devices 10, 11, and 12) had an interruption in communication about once a minute, and the tan jag (device 13) had no interruptions whatsoever.
The interruptions seem to be on the scale of 200ms.

Why might there be communication interruptions on an undisturbed robot? Is enumeration a flaky thing?

Bot190 10-07-2010 11:12

Re: CAN reliability
 
My team used CAN this year on our robot and found some issues with connection. We found that when polling for information while sending output information to fast the jaguars would lose connection briefly. The bandwidth afforded by a serial connection is a lot less than that of a CAN connection. This creates a big chokepoint on sending and receiving information from the jaguars.

Radical Pi 10-07-2010 12:14

Re: CAN reliability
 
I haven't seen the kind of information in your program, but I've found that CAN is VERY unreliable if there are any problems on your robot that affect the code (primarily its execution speed), the jags will become intermittent (watchdogs I think)

When you were running this, was the robot disabled? If so, is there any change when enabled and you can see if the status LEDs blink?

Ether 10-07-2010 14:11

Re: CAN reliability
 
Quote:

The bandwidth afforded by a serial connection is a lot less than that of a CAN connection.
RS232 and CAN are both serial connections. But their speeds may differ.

Does anyone know the max speed supported for RS232 by the various drivers provided (LabVIEW, Java, C++), and do these drivers offer interrupt-driven receive and transmit buffers?

If 115Kbps and interrupt-driven software buffers are supported by the RS232 drivers, it seems that should be adequate for both commands and data acquisition at 50Hz for 5 motors (depending on how efficiently the data is encoded).


~

~

Bot190 10-07-2010 16:57

Re: CAN reliability
 
By serial, i was refering to the connection on the Crio, The CAN bus has a bandwidth of 1Mb/s much greater than that of the serial connection on the Crio.

RyanCahoon 11-07-2010 03:52

Re: CAN reliability
 
Quote:

Originally Posted by Bot190 (Post 968604)
The bandwidth afforded by a serial connection is a lot less than that of a CAN connection.

Marshal - You mention the black jags, so were you using the serial connection as well? (I should say I haven't actually looked at your code)

Does anyone have a 2CAN that they try could the code with? I've been thinking about switching to CAN this year, but if it's going to be flaky, I'm starting to wonder if that's the best idea.

--Ryan

kamocat 11-07-2010 19:26

Re: CAN reliability
 
Okay, let me clear some things up:

I had no driver station connected when this code was run. (I was just testing static connectivity, without dealing with the enabled/disabled stuff. This is to help isolate the components of the control system)

The serial port on the cRIO is capable of 14 KB/s (cRIO-FRC operating instructions, found under c:/program files/National Instruments/NI RIO/manuals/crio-frc_operating_instructions.pdf)
The CAN transceivers on the Jaguars are capable of 1 MB/s (page 7 of http://www.luminarymicro.com/index.p...e&Itemid=59 1) It should also be noted that the physical limit for the CAN network when using a Black Jaguar as a master is only 16 devices.


EDIT:
Sorry for the confusion, I had several facts wrong. 14 kb/s is the minimum speed of the CAN transceivers in the jaguars.
Thank you Ether for catching my mistakes.

Radical Pi 11-07-2010 21:41

Re: CAN reliability
 
Could you provide screenshots of your VIs? I'd be interested in writing a C++ equivalent to compare with when I get my hands on the programming laptop and the robot at the same time (I don't have a copy of LabVIEW at home)

kamocat 11-07-2010 23:01

Re: CAN reliability
 
Here's the screenshots. My first post has the basic concept.

I'm using a queue to ensure that the logging process does not slow down the main loop.
The semaphore is opened at the beginning because it is used in the CAN Jaguar VIs to prevent conflicts on the RS232 bus.


This is just a method of filtering out the elements from the array. Only device numbers corresponding to TRUE elements in the boolean array are extracted.


Here's where the message is added to the queue. Where I interleave the numbers with the commas, I should change to "array to spreadsheet string".
If the array of device numbers is empty, nothing is added to the queue.


Here's the actual logging VI. I use the "preview element" function to force the VI to wait until there is an element in the queue, and then it flushes it into an array. The "write to text file" function automatically puts a line return after each element in the array. The reason it opens and closes the file every time is this ensures the write is actually saved to file at that time. (The cRIO caches the writes into large sections, on the scale of 8 or 16KB)

Ether 12-07-2010 11:55

Re: CAN reliability
 
Quote:

Originally Posted by Bot190 (Post 968604)
The bandwidth afforded by a serial connection is a lot less than that of a CAN connection. This creates a big chokepoint on sending and receiving information from the jaguars.

cRIO RS232 hardware can do 115200 baud.

That's approximately 11,520 data bytes per second.

At 50Hz, that's 230 bytes every 20ms iteration (230 bytes each direction, xmit & recv).

Are you trying to send or receive more than 230 bytes each iteration?

~

kamocat 12-07-2010 14:02

Re: CAN reliability
 
Quote:

Originally Posted by Ether (Post 968811)
cRIO RS232 hardware can do 115200 baud.

That's approximately 11,520 data bytes per second.

At 50Hz, that's 230 bytes every 20ms iteration (230 bytes each direction, xmit & recv).

Are you trying to send or receive more than 230 bytes each iteration?

~

Let's use the voltage mode "set vbus" command, and assume there message sent over RS232 is the very same message sent over CAN.
The sections are:
Start of Frame (1 bit)
Arbitration field (32 bits)
Control field (6 bits)
Data field (vbus is a 8.8 signed fixed-point number, so that's 16 bits)
Cyclic Redundancy Check field (15 bits)
Acknowledge field (2 bits)
End of Frame (7 bits)

That is 79 bits, or 10 bytes. 230 / 10 is 23. At least for the black Jaguar controlled network, the physical limit for devices is 16. (I don't know the physical limit for 2CAN)
Therefore, at least during normal use, the RS232 communication will have enough throughput. Now, during startup, it may take a little longer. I'll get theoretical stats for how long it should take to initialize a Jaguar for position control in just a bit. (Position control is something that takes a lot of initialization.)
EDIT: I was assuming the time it takes for the message to be sent on the 1Mb/s CAN bus is negligible, but I suppose it could be included. 100 bits / 1Mb/s is about 0.1 ms. (The reason for multiplying it by 10, not 8, is to approximately account for the bit stuffing.) Let's make that 0.2ms to allow for a reply message. 0.0002s * 115,200 bits/s means there's about 23 bits of wait time. I'll round that up to three bytes. So it's 230 / 13, which is a little under 18. If you're using speed mode, that would now be 15 bytes. 230 / 15 is a little over 15, and you're almost at the limit there of 16 devices per network.


Now, I've been assuming worst-case scenario: that the entire CAN message is sent over RS232. It's quite possible that only the arbitration and data fields are being sent. The arbitration field is 4 bytes. For "set vbus", the data field is 2 bytes. Add in the 3 bytes of wait time, and you have 5 bytes. 230 / 5 is 46.
It would then be a 7-byte message for set speed. 230 / 7 is about 33.


NOTE: I am following the Robert Bosch CAN standard. The poster I'm looking at is the vector "CAN Protocol Reference Chart" which you can order (free, I think) from can-solutions.com

kamocat 12-07-2010 14:57

Re: CAN reliability
 
Okay, so on to initialization. Here's a list of commands that would be logical to use when configuring a Jaguar for position control. (I'm choosing position control because it has the most things to configure)
  1. Firmware version (0 bytes data)
  2. Position Mode Enable (0 bytes data)
  3. Position reference (0 bytes data)
  4. Proportional Constant (4 bytes data)
  5. Integral Constant (4 bytes data)
  6. Derivative Constant (4 bytes data)
  7. Encoder lines or Potentiometer turns (4 bytes data)
  8. Break/Coast (1 byte data)
  9. Soft limit switches (1 byte data)
  10. Forward soft limit (5 bytes data)
  11. Reverse soft limit (5 bytes data)
  12. Position Set (4 bytes data)
The sum of data bytes being sent here is 32 bytes, though it is being sent over 12 messages.
If the whole CAN message is sent through RS232, that should be (14*(8+3)+32)bytes / 11,520 bytes/s or 16ms to initialize a Jaguar for position mode.

If only the arbitration field and the data is sent, it should be (14*(4+3)+32)bytes / 11,520 bytes/s or 9ms to initialize a Jaguar for position mode.

So, now that I have some theoretical calculations to compare it to, I shall make some tests.

Bot190 12-07-2010 14:59

Re: CAN reliability
 
Looking at the Can Jaguar library for C++, The maximum size of a sent message is 10 bytes, as was previously determined, and the maximum size of a received message is 14 bytes.
This assumes that a separate message is sent for every request to every jaguar. In this experiment 5 jaguars were used, this would limit the requests for information to 4, but limits the data being received to 3 messages.
Please correct me if I'm not thinking about this right, or if anything looks off.

kamocat 12-07-2010 15:41

Re: CAN reliability
 
Quote:

Originally Posted by Bot190 (Post 968822)
Looking at the Can Jaguar library for C++, The maximum size of a sent message is 10 bytes, as was previously determined, and the maximum size of a received message is 14 bytes.
This assumes that a separate message is sent for every request to every jaguar. In this experiment 5 jaguars were used, this would limit the requests for information to 4, but limits the data being received to 3 messages.
Please correct me if I'm not thinking about this right, or if anything looks off.

It seems I don't quite understand their thinking. The maximum length of the data field in CAN is 8 bytes, so the maximum length should be 12 bytes. (I suppose it's true the Jaguars don't have any messages with data fields longer than 5 bytes)
Perhaps there is some error handling in the message returned, which would be added on by the main Black Jaguar?

You are correct that a separate message is sent to each jaguar, in most situations. There are some messages (heartbeat and enumerate) that are sent to all devices. I'm pretty sure that the main Black Jaguar acts both as the master and a slave on the CAN bus.
I don't understand where your numbers (4 and 3) came from. Could you please elaborate?

Bot190 12-07-2010 16:03

Re: CAN reliability
 
If every data request is 10 bytes, and your requesting it from 5 jaguars, 10 bytes * 5 jaguars = 50 bytes, 230 bytes / 50 bytes = 4.6, rounded down to 4 messages.

Given 14 bytes per response for 5 jaguars, 14 bytes * 5 jaguars = 70 bytes, 230 bytes / 70 bytes = 3.3 rounded down to 3 messages. The size of the Messages was based off the C++ Jaguar library, in the JaguarCANDriver.h file if you want to look.

Radical Pi 12-07-2010 16:25

Re: CAN reliability
 
Just a quick confirmation for your assumption about the serial-to-can bridge. Assuming FIRST didn't change too much from the default Jaguar firmware, I'm looking at the default source code right now and it looks like it directly resends the body of the serial message over the CAN bus if the message is intended for another jaguar or if it's meant for all the jaguars. It looks like the CAN headers are added on in the process, so the serial message is only containing the device ID and the message itself. If the message is intended for the bridging jaguar, it never reaches the CAN bus and is just directly processed.

kamocat 12-07-2010 16:44

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 968827)
Just a quick confirmation for your assumption about the serial-to-can bridge. Assuming FIRST didn't change too much from the default Jaguar firmware, I'm looking at the default source code right now and it looks like it directly resends the body of the serial message over the CAN bus if the message is intended for another jaguar or if it's meant for all the jaguars. It looks like the CAN headers are added on in the process, so the serial message is only containing the device ID and the message itself. If the message is intended for the bridging jaguar, it never reaches the CAN bus and is just directly processed.

Really? That's not what I expected, but I'm glad you looked it up.
I suppose this means there could be a small difference in communication speed between the bridging jaguar and the rest.

Bot190:
I understand your reasoning now, thank you.
From this new information, it appears that a 6-byte data field can be sent, but a 10 byte data field can be received. The maximum data field length in CAN is normally 8 bytes, so perhaps those last two bytes are used for something else.

kamocat 12-07-2010 17:38

Re: CAN reliability
 
1 Attachment(s)
Okay, I tested how long it took to configure a Jaguar for Position mode.


Clearly, there is something here I'm not accounting for. My estimate was 16ms, and it's turning out to be 172ms.
The figure is very consistent, always 172 or 173 with this combination of actions.
There is no difference in time between the black or tan jags, nor between the bridging jag and the others.

(For some reason, the Device Query action was taking 500ms, and not returning anything, so took that out to make the "total time elapsed" a relevant figure.)

EDIT: Device Query now seems to be taking about 172ms, but still not returning anything.
Anyways, when I run this VI in a loop, with DS Comms running in parallel, it seems to take around 220ms disabled, 240ms enabled. (teleop vs autonomous doesn't make a difference) However, it has a LOT more jitter; the time varies by at least 10ms each way.

kamocat 13-07-2010 16:45

Re: CAN reliability
 
I looked up the baud rate of the Black Jaguars, and it matches the cRIO at 115,200 bits/s. (There is no min or max listed, just the typical, so I think it MUST go at this speed.)

Perhaps the VIs are sending a lot more messages than we thought.
I count two sends and two receives in the "Jag Open". Add one more to each if you're in speed mode, or several more in voltage mode.
It appears that data must be requested from the Black Jaguar; it doesn't automatically send the data back. (I don't know why else CAN Receive would require the arbitration field, referred to here as the message ID)

Radical Pi 13-07-2010 23:37

Re: CAN reliability
 
You're right about the baud, both the black jag code and the cRIO driver init at 115,200.

I'm not sure how the LabVIEW code runs, but I see it a bit different in the C++ version. 3 messages are sent for each Jag getting an init. First is a firmware version request. 2nd, if in Voltage mode, is a message to enter voltage mode (for the other modes, that message is sent when telling the jag to enable PID). The 3rd message is telling the jag that the data line for speed is the encoder (the comment says that the jag requires it for all modes).

Normally, each message, either sending data or requesting data, uses 2 packets. On a sending command (such as setting the speed reference), the command is sent and then the jag replies with an ACK packet. On a requesting command (such as firmware version), the ACK packet is replaced with the reply, and no ACKs are sent. For some reason, the Voltage mode enabler (but not the speed, current, or position enablers. bug?) doesn't check for an ACK packet, so only one packet is used in the code, but according to the Jag firmware an ACK is still sent, so that should still count for 2 packets.

So, I count 6 packets traveling over the serial connection for each jag during init. All other commands use 2 packets , either a single set or a single get (except for setting PID values and configuring the position limits, which use 3 send messages each, so 6 packets each).

One slight correction to my earlier post about what goes over the serial. The arbitration field IS included in the serial packets. It holds the command for the jaguar as well as the device id as a 32-bit unsigned int. Here's my count for the size of the serial packet: 1 byte start of packet, 1 byte for size of packet, 4 bytes for message ID, and then 0 to 4 bytes of data, for a total of 6-10 bytes per packet.

ACK messages contain no data, so 6 bytes large. The original sent message could be 6-10 bytes still, so a single send message totals to 12-16 bytes.

Requests for data from the jag also carry no data, so those again are 6 bytes. Responses always carry at least one byte of data, so those total to 13-16 bytes.

Assuming worst-case scenario (16 bytes for each command), 900 messages can pass through at 115,200 baud, or a little less than 1 full command (send and receive) every millisecond.

At 1 command per millisecond, I doubt the data transfer itself is the bottleneck in your tests.

woah, that post got a bit out of control. :ahh: Time for bed

Ether 14-07-2010 00:40

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 968924)
Assuming worst-case scenario (16 bytes for each command), 900 messages can pass through at 115,200 baud, or a little less than 1 full command (send and receive) every millisecond.

at 115200 baud 8N1 each data byte requires 10 bit-times, so max throughput is 115200/10=11520 data bytes per second, or 11.5 data bytes per millisecond (in each direction, xmit & recv).


~

kamocat 14-07-2010 01:13

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 968924)
You're right about the baud, both the black jag code and the cRIO driver init at 115,200.

I'm not sure how the LabVIEW code runs, but I see it a bit different in the C++ version. 3 messages are sent for each Jag getting an init. First is a firmware version request. 2nd, if in Voltage mode, is a message to enter voltage mode (for the other modes, that message is sent when telling the jag to enable PID). The 3rd message is telling the jag that the data line for speed is the encoder (the comment says that the jag requires it for all modes).

Normally, each message, either sending data or requesting data, uses 2 packets. On a sending command (such as setting the speed reference), the command is sent and then the jag replies with an ACK packet. On a requesting command (such as firmware version), the ACK packet is replaced with the reply, and no ACKs are sent. For some reason, the Voltage mode enabler (but not the speed, current, or position enablers. bug?) doesn't check for an ACK packet, so only one packet is used in the code, but according to the Jag firmware an ACK is still sent, so that should still count for 2 packets.

So, I count 6 packets traveling over the serial connection for each jag during init. All other commands use 2 packets , either a single set or a single get (except for setting PID values and configuring the position limits, which use 3 send messages each, so 6 packets each).

One slight correction to my earlier post about what goes over the serial. The arbitration field IS included in the serial packets. It holds the command for the jaguar as well as the device id as a 32-bit unsigned int. Here's my count for the size of the serial packet: 1 byte start of packet, 1 byte for size of packet, 4 bytes for message ID, and then 0 to 4 bytes of data, for a total of 6-10 bytes per packet.

ACK messages contain no data, so 6 bytes large. The original sent message could be 6-10 bytes still, so a single send message totals to 12-16 bytes.

Requests for data from the jag also carry no data, so those again are 6 bytes. Responses always carry at least one byte of data, so those total to 13-16 bytes.

Assuming worst-case scenario (16 bytes for each command), 900 messages can pass through at 115,200 baud, or a little less than 1 full command (send and receive) every millisecond.

At 1 command per millisecond, I doubt the data transfer itself is the bottleneck in your tests.

woah, that post got a bit out of control. :ahh: Time for bed

Radical Pi, that's good to know.
The only thing about the messages that doesn't explain is how the limit switch settings are modified. I'll quote from Luminary Micro's Firmware Development Package User's Guide.
Quote:

Forward Soft Limit Switch
This setting takes two values. The first is 16.16 signed fixed-point value that is the position of the forward soft limit switch. The second is an 8-bit value that specifies if the motor position must be greater than or less than the position of the forward soft limit switch. Greater than is encoded as 0
and less than is encoded as 1. Less than should be used if positive voltage results in the position increasing and greater than if positive voltage results in the position decreasing.
If this command is sent with no data, the motor controller will respond by sending this same message to report the position of the forward soft limit switch.
This has not changed in the FRC firmware.


Also, I'm sure the Black Jaguar driver is the same for all languages; we use the same .out file, don't we?



So, what's left to cause the commands to take so long?
Bogging down of the cRIO processor? (I don't think so; this is the only thing running on it)
Slow response time on the Jaguar? It has a 50 Mhz processor.

I don't know what the issue could be.

Is there anyone who has a robot available that can confirm my results? (I don't care if you use my code or create your own)
From another standpoint, it's remotely possible my code is at fault. I can test anyone's LabVIEW code for them. If you tell me how to deploy the compiled code over FTP, then I can test code written in C++ and Java as well. (If you didn't notice, I have lots of time)

Radical Pi 14-07-2010 09:25

Re: CAN reliability
 
My mistake on the serial bandwidth. The point still stands though that it's not an issue.

I see the 5 bytes too of data in the soft limits code for C++ as well.

Yes, both languages use the same .out file, but the driver only handles pulling the data off of the serial line and splitting messageID from data. The CAN protocol is still defined by user code.

I think I may have a reason for the slowdown and the jitter. In the C++ code, all requests to the CAN driver are routed through either FRC_NetworkCommunication_JaguarCANDriver_sendMessa ge, or FRC_NetworkCommunication_JaguarCANDriver_receiveMe ssage. Also, when the driver is loaded by the OS, the init calls FRC_NetworkCommunication_JaguarCANDriver_registerI nterface with a pointer to the driver class. By the names, I'd assume that all messages are run through the NetworkCommunication library before they get to the driver itself, most likely so a disabled robot cannot continue to send CAN messages. NetworkCommunication is searching for a driver station and communicating with the driver station, so it could account for the delay and the jitter.

As a test, could you put a timer around setTransaction, or even better around the receiveMessage inside setTransaction? receiveMessage directly calls the NetworkCommunication library, so it would be the best place to time the command

One quick question. I see a few "trusted" commands (Enable and Set for each of the control modes) which get an extra 2 bytes of 0s at the beginning of their data section. I don't have the source to the jag firmware that reads Trusted commands, so I'm wondering if you know what trusted means and why it should be used

EricVanWyk 14-07-2010 10:02

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 968954)
One quick question. I see a few "trusted" commands (Enable and Set for each of the control modes) which get an extra 2 bytes of 0s at the beginning of their data section. I don't have the source to the jag firmware that reads Trusted commands, so I'm wondering if you know what trusted means and why it should be used

"Trusted" communication with the Jaguars is how we assert safety in a CAN enabled system. If you used CAN last year, you used trusted mode. In untrusted mode, an extra PWM cable is attached to each CAN jaguar just to tell that the system is still up and good - untrusted mode was not legal for 2010 FRC use.

There is a security layer that inserts some extra stuff between the commands you send and the data sent to the 2CAN/BlackJagBridge. I won't go in to details, it is what prevents someone from e.g. putting rogue firmware on a 2CAN and driving the robot while disabled.

kamocat 14-07-2010 14:06

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 968954)
I think I may have a reason for the slowdown and the jitter. In the C++ code, all requests to the CAN driver are routed through either FRC_NetworkCommunication_JaguarCANDriver_sendMessa ge, or FRC_NetworkCommunication_JaguarCANDriver_receiveMe ssage. Also, when the driver is loaded by the OS, the init calls FRC_NetworkCommunication_JaguarCANDriver_registerI nterface with a pointer to the driver class. By the names, I'd assume that all messages are run through the NetworkCommunication library before they get to the driver itself, most likely so a disabled robot cannot continue to send CAN messages. NetworkCommunication is searching for a driver station and communicating with the driver station, so it could account for the delay and the jitter.

As a test, could you put a timer around setTransaction, or even better around the receiveMessage inside setTransaction? receiveMessage directly calls the NetworkCommunication library, so it would be the best place to time the command.

Well, what if we changed that? We could open up CAN Send and CAN Receive and make them call the Jaguar plugin directly.

First, however, I'll try what you suggested. But I think I'll actually want it around the "send" and "receive" VIs, because the "set transaction" and "get transaction" use both. However, it might be a good idea to do all of the above: measure how long the transaction is taking as a total, measure the time Send and Receive take to execute, and measure the time in-between them as well.

kamocat 14-07-2010 15:53

Re: CAN reliability
 
The "Clear Ack" VI seems to be taking 16 miliseconds, where as the "send" and "recieve" take 1 or 2.
I'm going to try out the Execution Trace toolkit, and see if I can dig in deeper.

EDIT:
Here's some data collected with the Execution Trace toolkit:
CAN Send takes around 600 microseconds with a message, 10 microseconds without.
CAN Receive takes around 2 miliseconds with a message, 20 microseconds without.
Clear Ack only takes about 15 microseconds.

I guess I'll try changing those DLL calls to directly reference "FRC_BlackJagBridgePlugin.out"

EDIT2: To be honest, it didn't make much difference. I think I'll change it back, however, because I think otherwise it won't let my motors be enabled.

Something that's not quite making sense about this execution trace toolkit is that subVIs are sometimes shown as executing for longer than their owning VI. Does it simply time how long the processor spends on each task, regardless of the VI hierarchy?

kamocat 15-07-2010 14:37

Re: CAN reliability
 
Quote:

Originally Posted by Bot190 (Post 968604)
My team used CAN this year on our robot and found some issues with connection. We found that when polling for information while sending output information to fast the jaguars would lose connection briefly. The bandwidth afforded by a serial connection is a lot less than that of a CAN connection. This creates a big chokepoint on sending and receiving information from the jaguars.

Well, I tried to actually run a robot today. It works pretty well with the default project they provide, but that was just voltage mode.
I modified to to be in speed mode, with synchronous update, and linear control (instead of that weird squared thing they have in arcade drive). I was having trouble with some of my motors not running, so I used Periodic Tasks to poll all four motors on all status, PID, output, and encoder lines.
It turns out that they have exactly the same configuration as the others, but their output seemed to be disabled. It is repeatable, but inconsistent. The only pattern I noticed is that they usually work again if I stop the program and restart, and that it is only these two controllers. They are my rear motors (which are updated first), and in the middle of my CAN bus. Switching the front and reverse motors don't seem to help. Their Device IDs are 11 and 13; the other two are 10 and 12. 13 is my tan jaguar, but the issue usually occurs with 11.
Sometimes, the motors don't run initially, and then kick in when I get to higher speeds. However, I usually don't sit at low speeds, so this may be simply time-related. When their output is disabled, they also report speed as 0, regardless of the actual speed of the wheel. Under normal operations, if their encoder cable was unplugged, PID would cause them to speed up drastically, not cut power.

If you like, I can provide a video of the feedback I'm getting in the periodic tasks loop, or I can make a program to log it as it occurs.


For the record, I've made some significant changes to some CAN VIs. If any of these things create issues, please tell me now.
CAN Jag Open now enables control for ALL modes (not just voltage).
Enable and Disable control now work for ALL modes (including voltage).
The invert direction option now works for ALL modes (not just voltage).
I'm using a copy of "set output" which appends an unsigned byte to the end of the message (for synchronization group. It's actually a bit-mask.)
CAN Motors now uses this "set output with synchro", and also has a "CAN Send" with the system message "Sync Update" with the synchronization group mask as a message. (The value I chose is equivelant to decimal 128. It's the 8th synchronization group.)

In addition to this information, I have an explicit question:
What is a Token message? I noticed that the token messages (which seem to be the output set, enable, and disable for each mode) require two empty bytes at the beginning of each message. I can't find anything like this in Luminary Micro's CAN Specification. Is this where the "trusted mode" comes in? Does the FRC_BlackJagPlugin put some secret number in these two bytes, which the jaguars must have if they are to accept the message?

Radical Pi 15-07-2010 15:56

Re: CAN reliability
 
Is the problem following the Jag or the ID (as in try switching it so the code sends the rear commands to the front motors)

Are you sending trusted SET messages every 100 ms? If a jag doesn't receive one every 100ms it will disable itself

Quote:

Originally Posted by kamocat (Post 969086)
In addition to this information, I have an explicit question:
What is a Token message? I noticed that the token messages (which seem to be the output set, enable, and disable for each mode) require two empty bytes at the beginning of each message. I can't find anything like this in Luminary Micro's CAN Specification. Is this where the "trusted mode" comes in? Does the FRC_BlackJagPlugin put some secret number in these two bytes, which the jaguars must have if they are to accept the message?

Read EricVanWyk's message above your's. I just asked the same thing (It's referred to as trusted mode in the C++ code).

kamocat 15-07-2010 23:22

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 969090)
Is the problem following the Jag or the ID (as in try switching it so the code sends the rear commands to the front motors)

Are you sending trusted SET messages every 100 ms? If a jag doesn't receive one every 100ms it will disable itself



Read EricVanWyk's message above your's. I just asked the same thing (It's referred to as trusted mode in the C++ code).

Trusted SET messages every 100ms? You mean these jaguars effectively have TWO heartbeats?
The problem follows the motor controllers around, regardless of whether they are front or rear.
However, i haven't tried rerouting the CAN bus in a different sequence.

UPDATE:
I'm connected to the robot directly over ethernet, and have the driver station running on my computer. I'm not having any issues.
This means it probably has something to do with timeouts; the delay of running live on a computer over wireless was apparently enough to disable one of the Jaguars?

UPDATE 2:
I happened to try unplugging one of the Jaguars, and then plugging it back in. It created some errors, caused some watchdog expirations, and would no longer run. I didn't realize that once a Jaguar is lost, it's gone until you restart the code. I'll see what I can do to fix that.
(It probably just has to do with the encoder and PID settings being replaced)
I'm also going to try to programmatically determine when the Jaguar is not doing what it should. (If the watchdog is alive and a nonzero output setpoint is being sent, and the output setpoint of the controllers is zero, then that motor is disabled and needs to be reenabled.)

NOTE: I've changed this message several times. Please reread if you have not since 9:33.

Radical Pi 16-07-2010 00:34

Re: CAN reliability
 
nvm about the trusted set thing. I just saw the trusted heartbeat message in the API. Not sure why if there is a trusted heartbeat the system would need EN and SET messages to be trusted as well. (Although it seems like the heartbeat doesn't work all the time, namely autonomous for us)

Oh and I didn't mean to change the CAN IDs of the jags, I meant to send commands that are meant for (example) 12 and 13 to 10 and 11 instead to see if it is your logic or the jag

kamocat 17-07-2010 01:04

Re: CAN reliability
 
I got the black-out detection and reconfiguration working. (In speed mode, the Jaguar is up and running with all relevant settings restored in about 200ms )
However, I'm having more trouble detecting situations when the controller is simply disabled (brown-outs, watchdog timeouts, etc.)
What I tried to do was:
If robot is enabled
AND the output setpoint sent is greater than 2^-8
AND the output setpoint (when retreived) is 0
THEN the robot is disabled and needs to be re-enabled.

As you might guess, being disabled does reset the output setpoint of a Jaguar back to 0. (The Jaguar does not remember its last nonzero output when it is re-enabled)

Anyways, I was having some trouble with that, I'm not quite sure what the issue is. Unfortunately, the Jaguars do have a message which tells you whether they are enabled or disabled (though they can tell you what mode of operation).

I have not gone back and tested the robot with the wireless radio, but I may get to that on Monday.
I will be busy over the weekend, and most likely will not make any progress on this for that time. However, please post your thoughts and experiences.

What issues do you want the Jaguars to automatically recover from?
Black-out? (completely reconfigure device)
Brown-out? (re-enable device and re-send output setpoint)
Temporary watchdog timeout? (re-enable device and re-send output setpoint)
CAN cable faulty? (re-enable device and re-send output setpoint)
Others?

Should there be diagnostic messages? Should the robot try to determine what the issue was? (distinguish between brown-out, watchdog, and CAN bus problem?)
Should a robot have a system check on startup to make sure everything is working?

kamocat 18-07-2010 22:10

Re: CAN reliability
 
Well, I got the re-enabling to work. (I had simply forgotten to add the watchdog to the refnum registry.)
I did confirm that the Jaguars disable often when run in the temporary deployment over wireless. However, I found it was actually an issue with the System Watchdog, and so I will resume work on analyzing the Driver Station reliability.

kamocat 26-07-2010 13:00

Re: CAN reliability
 
I noticed that each Jaguar reference has its own unique semaphore (meaning the semaphore only prevents simultaneous messages to the same Jaguar. This implies other Jaguars on the CAN bus can still communicate while this Jaguar is processing.)
I tried sending commands to several Jaguars in parallel, but it didn't execute any faster than when I did it in a FOR loop.
I suspect the CAN messages are taking disproportionately longer when they contain data.

In other news, I created a CAN manager over the weekend, as a central location for getting updates on the status of CAN devices, and so notifications (like "lost comms with Device 12" and "Device 13 rebooted") can be recieved in a timely and synchronized manner.
However, I'm only getting about 12 iterations per second out of it right now. That's probably due to my use of the Enumeration (which takes 64ms plus however long it takes to send and receive the data). I'm hoping to get about 5 times that rate.

EDIT: CAN Receive is SLOW when a Jaguar loses comms. I wonder if I could change the timeout?

Radical Pi 26-07-2010 20:15

Re: CAN reliability
 
The Driver plugin has a semaphore on outgoing data, so they still have to wait in line for the data to leave the serial port. On the receiving line of the cRIO though, messages can arrive out of order and are dispatched to their code as soon as they arrive (or not, that's just what it looks like from the comments). Considering how fast the jags run (main loop every 1ms), and that most messages are handled by the CAN or Serial interrupts, the return is probably fast enough to cause a minimal change in the speed of the commands.

Why don't you put the enumeration on a timer so it only executes every 2 seconds. There isn't really any need to enumerate for every single loop, especially being a slow command

I saw the same slowdown in the C++ code. If 3 jags loose connection it slows down the code enough to set off watchdogs on the jags and knock out the entire network. I plan on writing a wrapper class for CANJaguar that blocks messages if the jag is lost.

kamocat 27-07-2010 16:23

Re: CAN reliability
 
I want to know when a Jaguar is lost, and quickly, so my code can deal with the problem of (a) not getting feedback on that motor, (b) not being able to control that motor, and (c) reenabling/reconfiguring that Jaguar when communication is regained.

I think I'm starting to see why the Jaguars support motor control through PWM with simultaneous feedback through CAN. But that brings the benefit of CAN to nearly nothing, because you now have the issues with the PWM cables: no keying (easy to put in backwards), easy to wire the cable to the wrong controller, cables are fragile and pins bend easily. It's even the same cable as on the GPIO, Relay outputs, and Analog Inputs.


About the Enumeration, though, I wonder if I could cut that wait function shorter? Would it just cut off some of the motor controllers if they were above that value?

Also, I'll retest that series vs parallel thing to see if parallel executes faster when one of the controllers is unplugged.

kamocat 27-07-2010 18:52

Re: CAN reliability
 
Here's something interesting I found in the LabVIEW Help documentation:
Quote:

Input/Output
Input/Output (I/O) calls generally incur a large amount of overhead. They often take much more time than a computational operation. For example, a simple serial port read operation might have an associated overhead of several milliseconds. This overhead occurs in any application that uses serial ports because an I/O call involves transferring information through several layers of an operating system.

The best way to address too much overhead is to minimize the number of I/O calls you make. Performance improves if you can structure the VI so that you transfer a large amount of data with each call, instead of making multiple I/O calls that transfer smaller amounts of data.

For example, if you are creating a data acquisition (NI-DAQ) VI, you have two options for reading data. You can use a single-point data transfer function such as the AI Sample Channel VI, or you can use a multipoint data transfer function such as the AI Acquire Waveform VI. If you must acquire 100 points, use the AI Sample Channel VI in a loop with a Wait function to establish the timing. You also can use the AI Acquire Waveform VI with an input indicating you want 100 points.

You can produce much higher and more accurate data sampling rates by using the AI Acquire Waveform VI, because it uses hardware timers to manage the data sampling. In addition, overhead for the AI Acquire Waveform VI is roughly the same as the overhead for a single call to the AI Sample Channel VI, even though it is transferring much more data.
This may be why the communication is so slow. If we could have higher-level commands integrated into the CAN driver, then we could achieve greater communication speeds.
I think an excellent addition would be the ability to get the same status from multiple controllers, or set the same configuration/output of multiple controllers (with different data for each one).
Unfortunately, this is beyond my current ability; I don't know how to make .out files for the cRIO, and I'm certainly not interested in programming the Jaguars in C.

Ether 27-07-2010 19:30

Re: CAN reliability
 
Quote:

a simple serial port read operation might have an associated overhead of several milliseconds. This overhead occurs in any application that uses serial ports because an I/O call involves transferring information through several layers of an operating system
What "operating system" are they talking about here, in the context of a program running on the cRIO? Does VxWorks have that much overhead for RS232?

~

Radical Pi 27-07-2010 21:04

Re: CAN reliability
 
VxWorks is a real-time OS, so it's designed to have as little overhead as possible when running system calls (such as the Serial Port). I do, however, see where the multiple layers of "OS" come into play in this system. When a CAN message is sent over serial, it is sent to the NetworkCommunication code, is passed from there to BlackJagBridge, which sends raw bytes to the VISA VIs to be sent over the Serial port.

I'm not sure exactly what you mean about high-level communication. If you mean bundling a bunch of messages together and sending them at once, I'm not sure it's possible to do without support from NI/FIRST. The system is currently built to send a single message with each call into the plugin. Trying to lump them together could have strange side-effects, since for example trusted messages wouldn't get signed properly.

I also don't see how much of a difference it would make. The serial port code seems to be set to flush when it gets a chance (I'm not 100% sure about this), so there is still the same amount of accessing the serial port. The only difference is there are fewer calls through the NetworkCommunication and BlackJagBridge libraries

On the .out files, they are the equivalent of .exe files (or .dll files, not quite sure). I know for sure they are built by Wind River whenever it compiles our code, not sure if LabVIEW does the same thing.

Ether 27-07-2010 21:24

Re: CAN reliability
 
Quote:

I'm not sure exactly what you mean about high-level communication.
Just to be clear, although the above post was linked to my post http://www.chiefdelphi.com/forums/sh...2&postcount=37 , the pronoun "you" refers to kamocat's post http://www.chiefdelphi.com/forums/sh...7&postcount=36

kamocat 30-07-2010 13:02

Re: CAN reliability
 
For some reason I was thinking that my joining messages together there could be less data sent over RS232. However, the whole arbitration ID is necessary for every Jaguar, so I guess it wouldn't actually reduce required throughput.
Now I'm just fantasizing here, but scheduling status messages to be periodically sent and stored until retrieval could be a way to get around it. I have the feeling, however, that one could not use a motor controller for this.

kamocat 01-08-2010 01:37

Re: CAN reliability
 
3 Attachment(s)
While I'm waiting for TI and NI to release an update to improve real-world communication speeds, I'm going to start working on a CAN system startup check.

I also realized I forgot to publicly release the code I've been working on, so here it is.
  • The "CAN" folder contains individual tests or VIs that I don't plan to use in robot projects.
  • The "CANJaguar" folder is what I place in user.lib so it's accessible on my functions/controls pallettes and I can easily use them in other projects. It contains the original CANJaguar VI available from FIRST Forget, though many have been modified.
  • The "CAN location tracking" is a project I was working on to have the robot record its location as it travels so it can then map the obstacles around it. (This folder doesn't include the obstacle mapping. I'm thinking of doing that with the Dashboard so the cRIO doesn't have to manipulate and display large arrays).

kamocat 03-08-2010 13:16

Re: CAN reliability
 
1 Attachment(s)
Well, I got the startup test to work, and it does work beautifully.
I've uploaded "CAN Jaguar" again. (I had to make some modifications to the "get status" VI to honor the "inverted" property)

kamocat 14-09-2010 22:47

Re: CAN reliability
 
I think it's time to conclude this thread.

I'll summarize what I've learned:
  • CAN messages take a long time over serial due to the current implementation. (messages wait for errors on the CAN bus before completing). If there is an error, the function takes longer still. There is a driver update expected to help alleviate the issue. I have no data on the 2CAN module.
  • There are multiple issues with the Jaguar firmware:
    • Speed is dealt with in RPM (the documentation states revolutions per second). Although only the encoder can be used to calculate speed, the "speed reference" configuration is required. The "Speed" status is reported as zero until the Jaguar has been enabled in speed mode. The speed status reports positive regardless of the direction the encoder is turning.
    • The "Position" status is reported as zero until the Jaguar has been enabled in position mode.
    • There is no status message to tell if the Jaguar is enabled or disabled.
    • The "control mode" status is not implemented.
    • "Device Query" takes half a second to execute, but returns nothing.
  • The current sensor is only accurate within 1 amp. This makes current control unsuitable for all FRC motors smaller than the CIM.
  • There are occasional reliability issues on startup. Sometimes only some of the motor controllers will function. When my auto-configuration utility is running, the quickest way to fix this is to cycle power to the nonfunctioning devices.


Because of these issues, I would not recommend CAN to a team for competition use, and I will not be teaching a CAN workshop at our preseason kickoff.

However, I will continue working with CAN and publishing my work on Chief Delphi.

Radical Pi 14-09-2010 23:18

Re: CAN reliability
 
Quote:

Originally Posted by kamocat (Post 974195)
There are occasional reliability issues on startup. Sometimes only some of the motor controllers will function. When my auto-configuration utility is running, the quickest way to fix this is to cycle power to the nonfunctioning devices.

What kind of issues were you having? The only startup problems we saw with the Jaguars was the old ID reset bug, as well as device firmware version returning 0.

kamocat 15-09-2010 20:06

Re: CAN reliability
 
Quote:

Originally Posted by Radical Pi (Post 974196)
What kind of issues were you having? The only startup problems we saw with the Jaguars was the old ID reset bug, as well as device firmware version returning 0.

I don't know. I haven't found any causes for the failure, and no solid repeatability. However, sometimes only some of my Jaguars will work (which is why I made the startup test). A Jaguar may spontaneously start working several minutes later.

I used to think that this was due to occasional watchdog timeouts. However, enabling control to the Jaguar does not help, and a watchdog timeout does not disable control to a Jaguar (it just cuts out the heartbeat).

Ether 15-09-2010 21:40

Re: CAN reliability
 
Quote:

Originally Posted by kamocat (Post 974265)
sometimes only some of my Jaguars will work

Is it always the same Jags which sometimes don't work ?

Try swapping them with Jags that always work and see what happens.




kamocat 16-09-2010 21:23

Re: CAN reliability
 
I'll record which Jags it is next time it happens.

jhersh 10-12-2010 19:43

Re: CAN reliability
 
I'd like to address the implications of the lessons learned for the upcoming season...

Quote:

Originally Posted by kamocat (Post 974195)
  • CAN messages take a long time over serial due to the current implementation. (messages wait for errors on the CAN bus before completing). If there is an error, the function takes longer still. There is a driver update expected to help alleviate the issue. I have no data on the 2CAN module.


NI and TI are each working to improve performance of the parts of the system they are responsible for.

As for the timeout, the default timeout for a single device transaction has been reduced from 100ms to a much more reasonable 10ms.

The 2CAN plugin is another place where performance needs to be considered (but has not yet been addressed... support from CTRE is needed).

Quote:

Originally Posted by kamocat (Post 974195)
  • There are multiple issues with the Jaguar firmware:
    • Speed is dealt with in RPM (the documentation states revolutions per second). Although only the encoder can be used to calculate speed, the "speed reference" configuration is required. The "Speed" status is reported as zero until the Jaguar has been enabled in speed mode. The speed status reports positive regardless of the direction the encoder is turning.

This was an issue with the library implementation and has been fixed for 2011.

Quote:

Originally Posted by kamocat (Post 974195)
    • The "Position" status is reported as zero until the Jaguar has been enabled in position mode.

Same issue, also fixed for 2011.

Quote:

Originally Posted by kamocat (Post 974195)
    • There is no status message to tell if the Jaguar is enabled or disabled.

This is sort of implied... if you can talk to it and you told it to enable, it's enabled. If it's not enabled, but you told it to be, then you probably can't talk to it to ask it if it's enabled. ;)

Quote:

Originally Posted by kamocat (Post 974195)
    • The "control mode" status is not implemented.

It is for 2011.

Quote:

Originally Posted by kamocat (Post 974195)
    • "Device Query" takes half a second to execute, but returns nothing.

I'm not sure what the utility of this message is... I've never tried it. Are you implementing this yourself?

Quote:

Originally Posted by kamocat (Post 974195)
  • The current sensor is only accurate within 1 amp. This makes current control unsuitable for all FRC motors smaller than the CIM.

This is a known feature request at TI. No change in this respect for 2011.

Quote:

Originally Posted by kamocat (Post 974195)
  • There are occasional reliability issues on startup. Sometimes only some of the motor controllers will function. When my auto-configuration utility is running, the quickest way to fix this is to cycle power to the nonfunctioning devices.

I haven't experienced this behavior. Any information that would help reproduce the issue would allow me to investigate.

We are working hard to make using CAN with Jaguar a competitive advantage and not a liability. We need the help of people like you to identify issues and bring them to our attention.

Thanks for all of your investigation and feedback!

-Joe

kamocat 10-12-2010 23:05

Re: CAN reliability
 
Thanks for posting this reply! I'm under oath of NDA until the info is made publicly available.

The "Device Query" message is something that would be useful if there was more than one type of device that worked on CAN.
Quote:

Originally Posted by SW-RDK-BDC-UG-5228.pdf
Device Query
This command indicates that the motor controller should return some basic information about itself. This command uniquely addresses a device and only the addressed device will respond to this message. In response to this message, the motor controller will send back eight bytes of data. The first byte indicating the motor controller’s function and the second indicates the manufacturer. The remaining bytes are reserved for future use.

I'm assuming there's a typo in here, and they meant "the device's function".

I'm not sure if this command has practical usage. If you send a CAN device a message, using the wrong device type and manufacturer in the device ID, will it respond with the wrong device type and manufacturer?

Mike Copioli 12-12-2010 02:19

Re: CAN reliability
 
Quote:

Originally Posted by jhersh (Post 984800)

The 2CAN plugin is another place where performance needs to be considered (but has not yet been addressed... support from CTRE is needed).


A new plugin and firmware version for the 2CAN will be released shortly. Joe you should be hearing from Omar soon regarding this if not already. I believe he has some CAN traffic snap shots that you may find helpful.

jhersh 12-12-2010 02:35

Re: CAN reliability
 
Quote:

Originally Posted by Mike Copioli (Post 985014)
A new plugin and firmware version for the 2CAN will be released shortly. Joe you should be hearing from Omar soon regarding this if not already. I believe he has some CAN traffic snap shots that you may find helpful.

Sounds great... Thanks Mike!

ozrien 12-12-2010 15:15

Re: CAN reliability
 
New Firmware and plugin is available at http://www.crosstheroadelectronics.com/2CAN.htm

Marshall (or any one), have you observed any latencies issue with a 2CAN? I know you've done the extensive testing with the BlackJaguar.

One reproducible problem I can report is if I create a Robot app that drives throttle for several jaguars (say 1-10). Then only put one jaguar (id 1) on the CAN bus. When I do this the Jaguar does not drive at all (massive delays and timeout issues). Looking at the CAN bus it looks like the cRIO is attempting to reestablish tokenization with the other jags. When this happens the throttle CAN frames are not going out for Jaguar 1, which causes Jaguar 1 to timeout as well.

This tokenization that occurs to keep the Jaguars enabled seems to block the transmit requests from the user's code (CANJagar::sendmessage) if the Jag on the other side doesn't send it's ack with token response, presumably because cRIO internal logic is waiting on the response. Now this could happen because of an intermittent cable issue somewhere, or because a jaguar is left unplugged (deliberately or accidently) or because of software.

Mike Copioli 12-12-2010 18:07

Re: CAN reliability
 
Also note that there are now two different versions of the 2CAN firmware. Please be sure to download the correct version based on your application. A seperate link to a zip containing the plugin and the 2CAN firmware has been added under the Downloads for FIRST section at the bottom of the page.

kamocat 13-12-2010 00:35

Re: CAN reliability
 
Quote:

Originally Posted by ozrien
Marshal (or any one), have you observed any latencies issue with a 2CAN? I know you've done the extensive testing with the BlackJaguar.

I have not tested with 2CAN.
I know I was originally complaining about the low throughput of serial port. (The 2009 Beta threads are gone now, or I would link to it.) However, the 2CAN wasn't released until January, by which time I was busy with the new season. By the time I was purchasing more Jaguars to make a test robot in late spring, the Black Jaguars were the only ones available.
And so, having some of the capabilities of CAN with almost no increase in cost, I couldn't rationalize buying a 2CAN.
That's not to say I wouldn't like one.


All times are GMT -5. The time now is 03:59.

Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi