CAN reliability

One of the things I like about CAN is that it gives us the ability to detect failures (loss of power, loss of communication, etc.).
I made a program to do just that, and log it to a file. After running it for an hour, I got some interesting results. I’ve linked the code and the log file, so I’ll explain the format it is logged in.

First off, the file is created in the top directory, using the name “CAN_status_”+timestamp. (In the file I’ve provided, it is 5:33pm on July 6th.)

Anyways, for each entry (each line), there is the status, the list of devices to which the status applies, and the timestamp.
The possible statuses are:

  • lost (communication with this Jaguar has been lost)
  • power up (power has been cycled since this was last checked)
  • got comms (communication with Jaguar has resumed, but the interruption was not due to loss of power)
  • brown out (voltage fault)
  • over temp (temperature fault)
  • over current (current fault)

I was surprised at the number of interruptions there were, seeing as the robot was undisturbed during this hour. It seems each of the black jaguars (devices 10, 11, and 12) had an interruption in communication about once a minute, and the tan jag (device 13) had no interruptions whatsoever.
The interruptions seem to be on the scale of 200ms.

Why might there be communication interruptions on an undisturbed robot? Is enumeration a flaky thing?

My team used CAN this year on our robot and found some issues with connection. We found that when polling for information while sending output information too fast, the jaguars would lose connection briefly. The bandwidth afforded by a serial connection is a lot less than that of a CAN connection. This creates a big chokepoint on sending and receiving information from the jaguars.

I haven’t seen the kind of information in your program, but I’ve found that CAN is VERY unreliable if there are any problems on your robot that affect the code (primarily its execution speed). When that happens, the jags become intermittent (watchdogs, I think).

When you were running this, was the robot disabled? If so, is there any change when it is enabled, and can you see whether the status LEDs blink?

The bandwidth afforded by a serial connection is a lot less than that of a CAN connection.

RS232 and CAN are both serial connections. But their speeds may differ.

Does anyone know the max speed supported for RS232 by the various drivers provided (LabVIEW, Java, C++), and do these drivers offer interrupt-driven receive and transmit buffers?

If 115Kbps and interrupt-driven software buffers are supported by the RS232 drivers, it seems that should be adequate for both commands and data acquisition at 50Hz for 5 motors (depending on how efficiently the data is encoded).

~

~

By serial, I was referring to the connection on the cRIO. The CAN bus has a bandwidth of 1Mb/s, much greater than that of the serial connection on the cRIO.

Marshal - You mention the black jags, so were you using the serial connection as well? (I should say I haven’t actually looked at your code)

Does anyone have a 2CAN that they could try the code with? I’ve been thinking about switching to CAN this year, but if it’s going to be flaky, I’m starting to wonder if that’s the best idea.

–Ryan

Okay, let me clear some things up:

I had no driver station connected when this code was run. (I was just testing static connectivity, without dealing with the enabled/disabled stuff. This is to help isolate the components of the control system)

The serial port on the cRIO is capable of 14 KB/s (cRIO-FRC operating instructions, found under c:/program files/National Instruments/NI RIO/manuals/crio-frc_operating_instructions.pdf)
The CAN transceivers on the Jaguars are capable of 1 MB/s (page 7 of http://www.luminarymicro.com/index.php?option=com_remository&func=download&id=1123&chk=7041e2a5f36506de7c55251bfbfa621e&Itemid=591) It should also be noted that the physical limit for the CAN network when using a Black Jaguar as a master is only 16 devices.

EDIT:
Sorry for the confusion, I had several facts wrong. 14 kb/s is the minimum speed of the CAN transceivers in the jaguars.
Thank you Ether for catching my mistakes.

Could you provide screenshots of your VIs? I’d be interested in writing a C++ equivalent to compare with when I get my hands on the programming laptop and the robot at the same time (I don’t have a copy of LabVIEW at home)

Here’s the screenshots. My first post has the basic concept.

I’m using a queue to ensure that the logging process does not slow down the main loop.
The semaphore is opened at the beginning because it is used in the CAN Jaguar VIs to prevent conflicts on the RS232 bus.


This is just a method of filtering out the elements from the array. Only device numbers corresponding to TRUE elements in the boolean array are extracted.


Here’s where the message is added to the queue. Where I interleave the numbers with the commas, I should change to “array to spreadsheet string”.
If the array of device numbers is empty, nothing is added to the queue.
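
For anyone without LabVIEW, here is a rough Python analogue of the filtering and queueing steps described above. It is only a sketch: the function names, the tab separator between fields, and the timestamp format are my assumptions, not something taken from the VIs.

```python
import queue
from datetime import datetime


def select_devices(device_ids, flags):
    """Keep only the device numbers whose matching flag is True."""
    return [dev for dev, flag in zip(device_ids, flags) if flag]


def format_entry(status, devices, when):
    """Build one log line: status, comma-separated devices, timestamp.

    The tab separator and the timestamp format are assumptions.
    """
    device_text = ",".join(str(d) for d in devices)
    return f"{status}\t{device_text}\t{when:%Y-%m-%d %H:%M:%S}"


def enqueue_status(log_queue, status, device_ids, flags, when):
    """Queue a log line unless no device is affected.

    Returns True if a line was queued, False if the device list was empty.
    """
    devices = select_devices(device_ids, flags)
    if not devices:
        return False
    log_queue.put(format_entry(status, devices, when))
    return True
```

Calling enqueue_status(q, "lost", [10, 11, 12, 13], [True, False, True, False], datetime.now()) would queue a single line naming devices 10 and 12, and an all-False flag list would queue nothing.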


Here’s the actual logging VI. I use the “preview element” function to force the VI to wait until there is an element in the queue, and then it flushes it into an array. The “write to text file” function automatically puts a line return after each element in the array. The reason it opens and closes the file every time is this ensures the write is actually saved to file at that time. (The cRIO caches the writes into large sections, on the scale of 8 or 16KB)
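
The logging loop described above could be sketched in Python roughly like this. The names are hypothetical, and Python's queue has no "preview" operation, so the sketch takes the first element with a blocking get instead of peeking at it. Closing the file on every pass mirrors the open/close in the VI; in Python that hands the buffered data to the operating system each time.

```python
import queue


def drain(log_queue):
    """Block until at least one entry is queued, then take everything waiting."""
    entries = [log_queue.get()]  # blocks until the first entry arrives
    while True:
        try:
            entries.append(log_queue.get_nowait())
        except queue.Empty:
            return entries


def append_entries(path, entries):
    """Open the file, append one line per entry, and close it again."""
    with open(path, "a", encoding="utf-8") as log_file:
        for entry in entries:
            log_file.write(entry + "\n")


def log_once(log_queue, path):
    """One pass of the logging loop; returns how many lines were written."""
    entries = drain(log_queue)
    append_entries(path, entries)
    return len(entries)
```

The main loop only ever calls put on the queue, so a slow file write delays the logger and not the CAN polling.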

cRIO RS232 hardware can do 115200 baud.

That’s approximately 11,520 data bytes per second.

At 50Hz, that’s 230 bytes every 20ms iteration (230 bytes each direction, xmit & recv).

Are you trying to send or receive more than 230 bytes each iteration?
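
The arithmetic above, written out as a small Python sketch. The 10 bits per byte (8 data bits plus start and stop bits) is my assumption about the serial framing; it is what makes 115200 baud come out to 11,520 bytes per second.

```python
def bytes_per_second(baud, bits_per_byte=10):
    """Payload bytes per second; 10 bits per byte assumes start and stop bits."""
    return baud / bits_per_byte


def bytes_per_iteration(baud, loop_hz, bits_per_byte=10):
    """Bytes available in each direction during one control-loop iteration."""
    return bytes_per_second(baud, bits_per_byte) / loop_hz


print(int(bytes_per_iteration(115200, 50)))  # 230
```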

~

Let’s use the voltage mode “set vbus” command, and assume the message sent over RS232 is the very same message sent over CAN.
The sections are:
Start of Frame (1 bit)
Arbitration field (32 bits)
Control field (6 bits)
Data field (vbus is a 8.8 signed fixed-point number, so that’s 16 bits)
Cyclic Redundancy Check field (15 bits)
Acknowledge field (2 bits)
End of Frame (7 bits)

That is 79 bits, or 10 bytes. 230 / 10 is 23. At least for the black Jaguar controlled network, the physical limit for devices is 16. (I don’t know the physical limit for 2CAN)
Therefore, at least during normal use, the RS232 communication will have enough throughput. Now, during startup, it may take a little longer. I’ll get theoretical stats for how long it should take to initialize a Jaguar for position control in just a bit. (Position control is something that takes a lot of initialization.)
EDIT: I was assuming the time it takes for the message to be sent on the 1Mb/s CAN bus is negligible, but I suppose it could be included. 100 bits / 1Mb/s is about 0.1 ms. (The reason for multiplying it by 10, not 8, is to approximately account for the bit stuffing.) Let’s make that 0.2ms to allow for a reply message. 0.0002s * 115,200 bits/s means there’s about 23 bits of wait time. I’ll round that up to three bytes. So it’s 230 / 13, which is a little under 18. If you’re using speed mode, that would now be 15 bytes. 230 / 15 is a little over 15, and you’re almost at the limit there of 16 devices per network.
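
Here is the same estimate as a Python sketch, so the numbers can be rechecked with other data lengths. The field widths are the ones listed above; the function names and the default 0.2ms round trip are just how I chose to parameterize it.

```python
import math

# Field widths in bits as listed above, with the data field left out.
FRAME_OVERHEAD_BITS = {
    "start_of_frame": 1,
    "arbitration": 32,
    "control": 6,
    "crc": 15,
    "acknowledge": 2,
    "end_of_frame": 7,
}


def frame_bytes(data_bytes):
    """Whole bytes needed to carry one full CAN frame over RS232."""
    bits = sum(FRAME_OVERHEAD_BITS.values()) + 8 * data_bytes
    return math.ceil(bits / 8)


def wait_bytes(round_trip_s=0.0002, baud=115200):
    """Serial byte-times that pass while the CAN round trip completes."""
    return math.ceil(round_trip_s * baud / 8)


def commands_per_iteration(data_bytes, budget=230, include_wait=True):
    """How many commands of this size fit in one iteration's byte budget."""
    cost = frame_bytes(data_bytes) + (wait_bytes() if include_wait else 0)
    return budget / cost
```

With a 2-byte data field this gives 10 bytes per frame, 23 commands without wait time and a little under 18 with it. A 4-byte data field gives 15 bytes including the wait and a little over 15 commands.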

Now, I’ve been assuming worst-case scenario: that the entire CAN message is sent over RS232. It’s quite possible that only the arbitration and data fields are being sent. The arbitration field is 4 bytes. For “set vbus”, the data field is 2 bytes. Add in the 3 bytes of wait time, and you have 9 bytes. 230 / 9 is a little over 25.
It would then be an 11-byte message for set speed. 230 / 11 is a little under 21.

NOTE: I am following the Robert Bosch CAN standard. The poster I’m looking at is the Vector “CAN Protocol Reference Chart”, which you can order (free, I think) from can-solutions.com

Okay, so on to initialization. Here’s a list of commands that would be logical to use when configuring a Jaguar for position control. (I’m choosing position control because it has the most things to configure)

  1. Firmware version (0 bytes data)
  2. Position Mode Enable (0 bytes data)
  3. Position reference (0 bytes data)
  4. Proportional Constant (4 bytes data)
  5. Integral Constant (4 bytes data)
  6. Derivative Constant (4 bytes data)
  7. Encoder lines or Potentiometer turns (4 bytes data)
  8. Brake/Coast (1 byte data)
  9. Soft limit switches (1 byte data)
  10. Forward soft limit (5 bytes data)
  11. Reverse soft limit (5 bytes data)
  12. Position Set (4 bytes data)

The sum of data bytes being sent here is 32 bytes, though it is being sent over 12 messages.
If the whole CAN message is sent through RS232, that should be (14*(8+3)+32)bytes / 11,520 bytes/s or 16ms to initialize a Jaguar for position mode.

If only the arbitration field and the data is sent, it should be (14*(4+3)+32)bytes / 11,520 bytes/s or about 11ms to initialize a Jaguar for position mode.
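
A small Python sketch of the initialization estimate, in case anyone wants to try other assumptions. The data byte counts come from the list above. Note that the list has 12 commands while the formulas use 14 messages, so the sketch leaves the message count as a parameter instead of picking one. The function and parameter names are my own.

```python
# Data bytes for the twelve commands listed above, in order.
DATA_BYTES = [0, 0, 0, 4, 4, 4, 4, 1, 1, 5, 5, 4]


def init_time_ms(overhead_bytes, messages=len(DATA_BYTES),
                 wait_bytes=3, bytes_per_s=11520):
    """Estimated time in milliseconds to send the whole configuration.

    overhead_bytes is the per-message cost without data: 8 if the whole
    CAN frame crosses RS232, 4 if only the arbitration field does.
    """
    total_bytes = messages * (overhead_bytes + wait_bytes) + sum(DATA_BYTES)
    return 1000 * total_bytes / bytes_per_s


whole_frame = init_time_ms(8, messages=14)       # whole CAN message over RS232
arbitration_only = init_time_ms(4, messages=14)  # arbitration field and data only
```

Either way the estimate stays in the low tens of milliseconds, whether 12 or 14 messages are counted.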

So, now that I have some theoretical calculations to compare it to, I shall make some tests.

Looking at the CAN Jaguar library for C++, the maximum size of a sent message is 10 bytes, as was previously determined, and the maximum size of a received message is 14 bytes.
This assumes that a separate message is sent for every request to every jaguar. In this experiment 5 jaguars were used, which would limit the requests for information to 4 per 20ms iteration and the data being received to 3 messages per iteration.
Please correct me if I’m not thinking about this right, or if anything looks off.

It seems I don’t quite understand their thinking. The maximum length of the data field in CAN is 8 bytes, so the maximum length should be 12 bytes. (I suppose it’s true the Jaguars don’t have any messages with data fields longer than 5 bytes)
Perhaps there is some error handling in the message returned, which would be added on by the main Black Jaguar?

You are correct that a separate message is sent to each jaguar, in most situations. There are some messages (heartbeat and enumerate) that are sent to all devices. I’m pretty sure that the main Black Jaguar acts both as the master and a slave on the CAN bus.
I don’t understand where your numbers (4 and 3) came from. Could you please elaborate?

If every data request is 10 bytes, and you’re requesting it from 5 jaguars, 10 bytes * 5 jaguars = 50 bytes, 230 bytes / 50 bytes = 4.6, rounded down to 4 messages.

Given 14 bytes per response for 5 jaguars, 14 bytes * 5 jaguars = 70 bytes, 230 bytes / 70 bytes = 3.3 rounded down to 3 messages. The size of the Messages was based off the C++ Jaguar library, in the JaguarCANDriver.h file if you want to look.
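
The same calculation as a tiny Python sketch (the function name is made up), assuming the 230-byte budget per 20ms iteration from earlier in the thread:

```python
def messages_per_iteration(message_bytes, jaguars, budget_bytes=230):
    """Whole rounds of one message per Jaguar that fit in the byte budget."""
    return budget_bytes // (message_bytes * jaguars)


requests = messages_per_iteration(10, 5)   # 4 requests to each of 5 Jaguars
responses = messages_per_iteration(14, 5)  # 3 responses from each of 5 Jaguars
```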

Just a quick confirmation for your assumption about the serial-to-can bridge. Assuming FIRST didn’t change too much from the default Jaguar firmware, I’m looking at the default source code right now and it looks like it directly resends the body of the serial message over the CAN bus if the message is intended for another jaguar or if it’s meant for all the jaguars. It looks like the CAN headers are added on in the process, so the serial message is only containing the device ID and the message itself. If the message is intended for the bridging jaguar, it never reaches the CAN bus and is just directly processed.

Really? That’s not what I expected, but I’m glad you looked it up.
I suppose this means there could be a small difference in communication speed between the bridging jaguar and the rest.

Bot190:
I understand your reasoning now, thank you.
From this new information, it appears that a 6-byte data field can be sent, but a 10 byte data field can be received. The maximum data field length in CAN is normally 8 bytes, so perhaps those last two bytes are used for something else.

Okay, I tested how long it took to configure a Jaguar for Position mode.

Clearly, there is something here I’m not accounting for. My estimate was 16ms, and it’s turning out to be 172ms.
The figure is very consistent, always 172 or 173 with this combination of actions.
There is no difference in time between the black or tan jags, nor between the bridging jag and the others.

(For some reason, the Device Query action was taking 500ms, and not returning anything, so I took that out to make the “total time elapsed” a relevant figure.)

EDIT: Device Query now seems to be taking about 172ms, but still not returning anything.
Anyways, when I run this VI in a loop, with DS Comms running in parallel, it seems to take around 220ms disabled, 240ms enabled. (teleop vs autonomous doesn’t make a difference) However, it has a LOT more jitter; the time varies by at least 10ms each way.
http://content.screencast.com/users/kamocat/folders/Jing/media/4291df72-8d4f-4408-bab8-413683f72653/config%20jaguar.png

config Jag.vi (40.5 KB)



I looked up the baud rate of the Black Jaguars, and it matches the cRIO at 115,200 bits/s. (There is no min or max listed, just the typical, so I think it MUST go at this speed.)

Perhaps the VIs are sending a lot more messages than we thought.
I count two sends and two receives in the “Jag Open”. Add one more to each if you’re in speed mode, or several more in voltage mode.
It appears that data must be requested from the Black Jaguar; it doesn’t automatically send the data back. (I don’t know why else CAN Receive would require the arbitration field, referred to here as the message ID)

You’re right about the baud, both the black jag code and the cRIO driver init at 115,200.

I’m not sure how the LabVIEW code runs, but I see it a bit different in the C++ version. 3 messages are sent for each Jag getting an init. First is a firmware version request. 2nd, if in Voltage mode, is a message to enter voltage mode (for the other modes, that message is sent when telling the jag to enable PID). The 3rd message is telling the jag that the data line for speed is the encoder (the comment says that the jag requires it for all modes).

Normally, each message, either sending data or requesting data, uses 2 packets. On a sending command (such as setting the speed reference), the command is sent and then the jag replies with an ACK packet. On a requesting command (such as firmware version), the ACK packet is replaced with the reply, and no ACKs are sent. For some reason, the Voltage mode enabler (but not the speed, current, or position enablers. bug?) doesn’t check for an ACK packet, so only one packet is used in the code, but according to the Jag firmware an ACK is still sent, so that should still count for 2 packets.

So, I count 6 packets traveling over the serial connection for each jag during init. All other commands use 2 packets, either a single set or a single get (except for setting PID values and configuring the position limits, which use 3 send messages each, so 6 packets each).

One slight correction to my earlier post about what goes over the serial. The arbitration field IS included in the serial packets. It holds the command for the jaguar as well as the device id as a 32-bit unsigned int. Here’s my count for the size of the serial packet: 1 byte start of packet, 1 byte for size of packet, 4 bytes for message ID, and then 0 to 4 bytes of data, for a total of 6-10 bytes per packet.

ACK messages contain no data, so 6 bytes large. The original sent message could be 6-10 bytes still, so a single send message totals to 12-16 bytes.

Requests for data from the jag also carry no data, so those again are 6 bytes. Responses always carry at least one byte of data, so those total to 13-16 bytes.

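Assuming worst-case scenario (16 bytes for each command), about 900 commands per second can pass through at 115,200 baud if you count 8 bits per byte (about 720 per second at the 11,520 bytes/s figure used earlier, which allows for start and stop bits). Either way, that is a little less than 1 full command (send and receive) every millisecond.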

At 1 command per millisecond, I doubt the data transfer itself is the bottleneck in your tests.
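
To make the packet counting above easy to recheck, here is a short Python sketch. The byte layout is the one from my count above; the function names are made up, and the 10 bits per byte default (start and stop bits on the wire) is an assumption about the serial framing.

```python
HEADER_BYTES = 1 + 1 + 4  # start of packet, packet size, message ID


def packet_bytes(data_bytes=0):
    """Size of one serial packet carrying the given number of data bytes."""
    return HEADER_BYTES + data_bytes


def send_command_bytes(data_bytes):
    """A set command plus the data-less ACK that answers it."""
    return packet_bytes(data_bytes) + packet_bytes(0)


def request_command_bytes(reply_data_bytes):
    """A data-less request plus the reply that carries the data."""
    return packet_bytes(0) + packet_bytes(reply_data_bytes)


def commands_per_second(command_bytes, baud=115200, bits_per_byte=10):
    """Full commands per second if the serial link is the only limit."""
    return baud / bits_per_byte / command_bytes
```

For the 16-byte worst case, commands_per_second(16) gives 720, and commands_per_second(16, bits_per_byte=8) gives 900. Both are a little under one command per millisecond.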

woah, that post got a bit out of control. :ahh: Time for bed