CAN-based delays

We are loving the new Talon SRX controllers! There are so many kewl features. But the other day our programmer-students were reporting delays processing commands from the driver station. Our application in the robot is all message-based with multiple threads responding to messages to perform subsystem-specific tasks. The students had inserted some code to query 6 different Talons for information about connected sensors inline with code that answers flurries of messages (100s per sec) and were seeing some delay. The delay was from the CANbus queries. I helped them fix it by only querying the Talons when they needed the data, not all of the time, and by handling the messages more efficiently.

Has anybody else experienced this? Any unexpected delays issuing commands to Talons (to which the Talons must answer)? If Omar is listening, how fast do the Talons respond to messages requesting data about sensor outputs etc?

TIA

I haven’t looked up the specific CAN algorithms, but this is a common problem on many baseband (shared) busses. A likely cause of the problem is your use of multiple threads all trying to ping CAN at the same time. If you wrote a dedicated CAN thread, then had all the subsystems that need it submit (possibly prioritized) requests to that one thread for processing, you may reduce collisions.
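For anyone who wants to try the single-thread idea, here is a minimal sketch of what it could look like in plain C++11. Everything in it (BusWorker, BusRequest, Submit) is a made-up name for illustration, not part of WPILib or the CTRE library, and the std::function stands in for whatever CAN call a subsystem would actually make. Whether this reduces delays on a real robot is untested.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical request: a priority plus the work to run on the bus thread.
struct BusRequest {
    int priority;                // larger value is served first
    unsigned long sequence;      // tie-breaker that preserves submission order
    std::function<void()> work;  // stands in for the real CAN call
};

// Orders the queue so the highest priority (then the oldest request) is on top.
struct BusRequestOrder {
    bool operator()(const BusRequest& a, const BusRequest& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.sequence > b.sequence;
    }
};

// Owns the one thread that is allowed to touch the bus.
class BusWorker {
public:
    BusWorker() : worker_([this] { Run(); }) {}

    // Drains any pending requests, then stops the thread.
    ~BusWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Called from any subsystem thread; returns immediately.
    void Submit(int priority, std::function<void()> work) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(BusRequest{priority, next_sequence_++, std::move(work)});
        }
        wake_.notify_one();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and nothing left to do
            BusRequest next = pending_.top();
            pending_.pop();
            lock.unlock();  // do not hold the lock while doing bus work
            next.work();
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::priority_queue<BusRequest, std::vector<BusRequest>, BusRequestOrder> pending_;
    unsigned long next_sequence_ = 0;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

A subsystem would call something like bus.Submit(5, [&] { /* read a sensor */ }) and pick up the result later. The trade-off is extra latency for low-priority requests while the queue is busy.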

If you look at section 20 of the Talon SRX manual, it looks like the implementation already uses the CAN bus from a single thread, with different data requested at different times. So (based on the document) whenever you call the CAN functions for the SRX, all you are really doing is reading a global that already contains the data.

We are running 9 Talon SRX controllers, and three of them are reading their sensor data in separate threads. We have not seen any issues, though we do see strange and quite large spikes in the CPU usage data at a very periodic rate (~7 seconds?) that line up with small increases in CAN utilization. Not sure if that is related, but I could imagine this would cause issues if the spikes hit 100% for any extended time.

Based on the threads he’s started, wireties’ team is using C++, not LabVIEW. It appears they may be using SampleRobot, not IterativeRobot. That may make a difference.

Shouldn’t make too big of a difference, they should both be calling the same library.

The requests are inter-leaved by the Linux drivers. A dedicated CANbus thread won’t help - I think.

Ours is totally custom and in C++. We use the WPI libraries but none of the example infrastructures.

If you are getting signals that are unsolicited, then there is zero increased utilization in CAN bus. All those getters just grab the last received decoded value, so they are fast. There is no “command” and “response” to wait for. You could disconnect CANBus completely and it shouldn’t affect timing. Almost all the signals work this way and are listed in section 20 (along with the default frame rates).
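To picture why those getters are cheap, here is a rough sketch of the "last received value" pattern in plain C++. This is only an illustration of the idea and not the actual CTRE implementation; CachedSignal, OnFrameReceived and Get are hypothetical names.

```cpp
#include <atomic>

// Illustrative cache for one unsolicited signal (for example a limit switch state).
class CachedSignal {
public:
    // Called by whatever receives the periodic status frame from the device.
    void OnFrameReceived(int decoded) {
        latest_.store(decoded, std::memory_order_relaxed);
        frames_.fetch_add(1, std::memory_order_relaxed);
    }

    // Called by robot code; reads memory only and never touches the bus.
    int Get() const { return latest_.load(std::memory_order_relaxed); }

    // Number of frames received so far; reads do not change it.
    unsigned FramesSeen() const { return frames_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> latest_{0};
    std::atomic<unsigned> frames_{0};
};
```

Calling Get() a thousand times a second costs a thousand memory reads and zero CAN frames, which is why the read rate of an unsolicited signal should not show up in bus utilization.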

If you are reading signals that are solicited (motor control profile params or Iaccum) then the CAN bus utilization will increase. So check the CAN utilization to see if it’s high (section 15). I can’t imagine why you would need to do that though.

Are you getting DS messages in the log? See section 16.8.

What signals are you reading?

What is the observed behavior in the robot that is not meeting your expectations? Is it something you can reproduce with a smaller test app? It sounds like your app is more unique/complex than a typical robot application, so without an understanding of what’s going on, we’re not likely to pinpoint what you are seeing in a forum. The first step should be figuring out what is actually taking more time than you’d expect. If you want, you can send your source to support@crosstheroadelectronics.com and I can take a look.

Then it must not be the CAN queries.

Just the limits

The loop time in the task controlling our stacker was long enough to produce perceptible delays getting back to the top of the loop and servicing the next message (not from the DS but from our demultiplexer task).

Not easy to reproduce - you’ve ruled out the CAN queries. The students were issuing many hundreds of SmartDashboard updates per sec, maybe that was it. I moved the CAN queries and accompanying SmartDashboard updates around so they are only called as needed by the state machine in this area of the code. All is well now but I wondered where the delay came from.
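If the dashboard traffic was the culprit, another option besides moving the calls is to rate-limit them. Below is a small sketch of a throttle in standard C++; UpdateThrottle and ShouldSend are hypothetical names, and the 100 ms period in the note after the code is just an example value, not a recommendation from the WPILib docs.

```cpp
#include <chrono>

// Lets a caller through at most once per period; everything else is skipped.
class UpdateThrottle {
public:
    using Clock = std::chrono::steady_clock;

    explicit UpdateThrottle(Clock::duration period) : period_(period) {}

    // Returns true when enough time has passed since the last accepted update.
    // The time is a parameter so the logic can be exercised without waiting.
    bool ShouldSend(Clock::time_point now = Clock::now()) {
        if (sent_once_ && now - last_sent_ < period_) return false;
        last_sent_ = now;
        sent_once_ = true;
        return true;
    }

private:
    Clock::duration period_;
    Clock::time_point last_sent_{};
    bool sent_once_ = false;
};
```

In the message loop, this would guard the update, e.g. construct UpdateThrottle with std::chrono::milliseconds(100) and only call the SmartDashboard Put functions when ShouldSend() returns true, which caps that key at roughly 10 updates per second no matter how many messages arrive.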

Thanks

Just the limits

Ok then. Yeah, the limit switches [GetForwardLimitOK() and GetReverseLimitOK()] are definitely unsolicited.