Feature Request: Synchronized status frames for CAN motor controllers

Having motor controllers on CAN is great and simplifies both programming and wiring while allowing a lot of added functionality, but one thing that’s always bothered me in my 4 years of using these motor controllers in FRC is that their status and control frames are sent out periodically and asynchronously with respect to user code. This means that if I want to run a 100 Hz control loop on the RIO, then in order to guarantee that I don’t get stale data each loop cycle from a CAN device, I essentially need to set that device’s status frame to 200 Hz, and even this only guarantees me data that is accurate to within 5 ms. Control frames being transmitted asynchronously from user code only adds to this “end-to-end” latency. Does this amount of latency matter for controlling FRC mechanisms? Probably not, although I have seen more and more teams running increasingly complex control algorithms RIO-side. However, CAN utilization does matter. Setting status frames to 5 ms periods on several motor controllers and sensors adds up, and would be entirely unnecessary if user code could request that a device, or a list of devices, send a specific status frame on demand.
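
For reference, here’s roughly what this workaround looks like with current vendor APIs (a sketch from memory; exact method and enum names may vary by version):

```java
import com.ctre.phoenix.motorcontrol.StatusFrameEnhanced;
import com.ctre.phoenix.motorcontrol.can.TalonSRX;
import com.revrobotics.CANSparkMax;
import com.revrobotics.CANSparkMaxLowLevel;

public class StatusFrameConfig {
  // Feeding a 100 Hz RIO loop without stale data means 5 ms (200 Hz)
  // feedback frames, which adds up quickly across several devices.
  public static void configure(TalonSRX talon, CANSparkMax spark) {
    // Phoenix: sensor feedback frame every 5 ms (50 ms config timeout).
    talon.setStatusFramePeriod(StatusFrameEnhanced.Status_2_Feedback0, 5, 50);
    // REV: Status 1 carries velocity data; request a 5 ms period.
    spark.setPeriodicFramePeriod(CANSparkMaxLowLevel.PeriodicFrame.kStatus1, 5);
  }
}
```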

My question is this: is it feasible for CAN device manufacturers (CTRE, REV, etc.) to add a feature to their APIs and firmware where periodic status frames could be disabled on specific devices and instead requested on demand by user code? The default out-of-the-box behavior could and should still be periodic status frames (the same as it is now), and this feature could be enabled optionally by teams wishing to have tighter control of their timings. For what it’s worth, REV already partly solved this issue by making control frames send on demand (when a “set” function is called) by default, instead of periodically. I’m not sure whether CTRE’s latest API behaves the same way, but I’d like to see this added to theirs as well if it’s not already the case.

I’d like to know if other teams are interested in having this feature available on CAN based products in FRC, or if this is even possible (perhaps there’s some technical barrier that I’m not aware of that prevents this). Thanks!

5 Likes

There’s nothing inherent that would prevent this, but it would actually do very little to reduce CAN utilization. A request and a response every 10 ms take just as much actual bus time as two sends every 5 ms. You save a tiny amount because RTR frames are smaller, but it’s not a huge gain. You would shave off a bit of latency, but not a ton.

1 Like

Please! This isn’t an issue my team has run into yet, but I suspect it might be in the future. Another solution might be sending back data with timestamps. Then teams could trivially account for the time differences between measurements and how they impact the motor’s dynamics, and adjust their control correspondingly.
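
For example, a first-order correction is trivial once you have a timestamp (a minimal sketch; the helper name is made up, and it assumes the frame’s capture time is available):

```java
// First-order latency compensation: project a measurement forward from
// when it was taken to "now" using the velocity reported alongside it.
public final class LatencyCompensation {
  public static double compensatedPosition(
      double measuredPosition,  // position from the status frame
      double measuredVelocity,  // velocity from the status frame
      double frameTimestampSec, // when the frame was captured
      double nowSec) {          // current loop time
    double latencySec = nowSec - frameTimestampSec;
    return measuredPosition + measuredVelocity * latencySec;
  }
}
```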

2 Likes

We actually do get timestamps; it’s just a bit more memory-intensive in Java to actually get them back, so we don’t by default. If that’s something you’re interested in, create an issue on the allwpilib GitHub and we’ll take a look.

2 Likes

Thanks! I just created an issue for this.

1 Like

This would be a welcome addition as well, but it would still require some sort of extrapolation to account for the latency, I think? I do like having everything timestamped though.

Is this true in the case of a status frame? The “request” frame to each device wouldn’t need to contain much data, while the status frame from each device presumably contains a lot (relatively). I’m not a CAN expert, so I could be wrong here.

A way I could see this problem being solved is a frame that requested multiple devices to send a status frame, so the RIO would only be sending out one of these requests per loop cycle. The real goal for me is just to gain additional confidence that my measurements are being taken as close as possible to when we use them.

2 Likes

You’re probably right - I’d suspect it would take some time to send the signal through the CAN bus. There’s always going to be latency, since it still takes time to get the last encoder position, divide by the change in time, average it with previous measurements if you’re using any sort of onboard filtering, etc. But given that Talons run their loops at 1 kHz (I don’t know about REV), that latency would definitely be negligible. I guess what also matters is how consistent the transmission time is. If it’s not very consistent, you might also want to account for it as some sort of “sensor” noise.

1 Like

An RTR frame (what would be used to request a status frame) is 64 bits, whereas the actual status frame is 128 bits. So it’s not exactly 2x, it’s 1.5x, but that’s still not a huge bump.

A vendor could enable a way to have all devices send their packets in response to one broadcast packet from the RIO, but that actually adds a decent amount of difficulty and would need a more advanced API. You would need a separate SendRequest function at the beginning of your loop. Then after that, there is currently no way in the low-level CAN API to actually wait for a packet to be received, so you would have to loop, checking whether it has arrived. It’s not that it isn’t doable, but it would be a fairly complicated API. And getting it working between vendors would require some work as well.
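
To make that concrete, the user-facing side might look something like this (entirely hypothetical names; the point is the request-then-poll structure and the busy-wait it forces):

```java
// Hypothetical vendor API (names invented): one broadcast frame requests
// status frames from a group of devices; user code then polls for arrival.
interface StatusFrameGroup {
  void sendRequest();           // broadcast/RTR frame at the top of the loop
  boolean allFramesReceived();  // non-blocking check of the message map
  double velocity(int deviceId); // decoded data once the frames are in
}

class LoopSketch {
  void controlLoop(StatusFrameGroup group, int shooterId) {
    group.sendRequest();

    // No blocking receive exists in the low-level CAN API, so spin-poll.
    long deadline = System.nanoTime() + 2_000_000; // 2 ms timeout
    while (!group.allFramesReceived() && System.nanoTime() < deadline) {
      Thread.onSpinWait();
    }

    double velocity = group.velocity(shooterId);
    // ... run the feedback controller with (hopefully) fresh data ...
  }
}
```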

As for timestamps, all we get on the RIO is the RIO timestamp when the packet arrives. We don’t know how long it took from when the device wrote the frame to when we read it, or how long it took the device to take the measurement. Adding a device-side timestamp to every frame is not feasible, as each CAN frame carries only 8 bytes of data, and it would require a way to synchronize device clocks to the RIO. All we can get on the RIO is how long ago the packet was received, which is still useful in a lot of cases.
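
For anyone curious, the receive side is visible today through WPILib’s raw CAN class (a sketch; I believe CANData carries the millisecond receive timestamp, but check the current API docs):

```java
import edu.wpi.first.hal.CANData;
import edu.wpi.first.wpilibj.CAN;

public class RawCanRead {
  // Poll for a raw frame and note when the RIO received it.
  public static void poll(CAN device, int apiId) {
    CANData frame = new CANData();
    if (device.readPacketNew(apiId, frame)) {
      long receivedAtMs = frame.timestamp; // RIO receive time, ms resolution
      // This only tells us the RIO-side age of the data; device-side
      // measurement and queueing delay stay invisible, as noted above.
    }
  }
}
```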

2 Likes

This is the kind of API I’m looking for. I know there would be some decent work involved in setting this up, I’m just trying to gauge interest and see if we can’t get the attention of vendors to maybe get this process started.

I’m not too worried about the fraction of a millisecond that it takes for these activities, but having the RIO FPGA timestamp of when it received data would be a useful feature! Thanks for considering it.

1 Like

It’s not actually the FPGA timestamp; it’s steady_clock (because CAN is done without the FPGA). That does mean it’s only a millisecond-resolution clock, but that’s the best the RIO’s CAN hardware can currently give us.

The functionality exists already; it’s just up to vendors to find a way to expose it via their APIs.

1 Like

One thing to watch out for here is creating a thundering herd problem. Only one CAN message may be on the bus at any point in time, and if there are other messages needing to be sent which overlap in time, some of these have to wait (unless they are dropped). If a broadcast message requests a response from each other device, this will cause all these devices to try to send replies at about the same time. CAN handles the collisions that result, but it means the responses all have to queue up.

It’s actually more complicated (bit stuffing, etc. – here is one paper that goes into more detail), but the simplest analysis is close enough (and errs on the side of minimizing problems). At 1Mb/s, a 128-bit message takes 128µs of time on the bus. If we’re considering doing something every 5ms, this is ~1/39th of 5ms. Let’s say there are 8 devices – by the time all 8 have replied, over 1ms will have passed; >20% of the bus bandwidth is gone, and there’s still a variable latency of over a millisecond.
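
Spelling that arithmetic out (simplified; real frame lengths vary with bit stuffing and ID length):

```java
public class BusMath {
  public static void main(String[] args) {
    double bitTime = 1e-6;                  // 1 µs per bit at 1 Mb/s
    double frameTime = 128 * bitTime;       // ~128 µs per 128-bit frame
    int devices = 8;
    double burstTime = devices * frameTime; // ~1.024 ms of queued replies
    double utilization = burstTime / 5e-3;  // ~20.5% of each 5 ms period
    System.out.printf("burst = %.3f ms, utilization = %.1f%%%n",
        burstTime * 1e3, utilization * 100);
  }
}
```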

Maybe this is fine, but the point is that there are limitations to consider. For example, time stamping at the RIO wouldn’t cover latency in getting onto the CAN bus (CANopen has a network time sync feature, so devices could provide their own timestamps). Another problem could show up if you have code that tries to send 8 CAN messages every 5ms – it could wind up blocking for >1ms, just waiting to complete the sends.

I’ll mention a couple of things I’d like to be considered for future evolution of the control system, since they could address some of the issues raised in this discussion thread.

Hardware: I’d like to see open motor controllers (see this thread) as an option to provide low-latency control. Also, it would help to have multiple CAN busses (higher speed might be an issue with signal integrity/wiring on some FRC robots), which could be done independent of opening up controllers.

Software: I’d like to see the CAN interface implemented following an event-driven pattern – in fact, I’d like to see the lower-level handling of everything be event-driven, with higher-level APIs that look the same as today. Interaction with the driver station is already mostly this way, but interaction with CAN devices is more like a blocking remote procedure call interface. The current setup blocks while waiting for CAN messages to be sent and/or for replies. This is simplest to program, but blocking has costs. Changing the internals would enable an alternate set of non-blocking APIs for code designed to operate this way. [Note that it’s been a while since I’ve looked much at any of this code!]
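
As a sketch of what the event-driven shape could look like at the user level (hypothetical interfaces, not an existing WPILib API):

```java
// Hypothetical event-driven CAN API (names invented): register a callback
// instead of polling a message map or blocking on a read.
interface CanFrameListener {
  void onFrame(int apiId, byte[] data, long rioTimestampMs);
}

interface EventDrivenCanBus {
  void addListener(int apiId, CanFrameListener listener);
}

// Usage sketch: run the control update whenever fresh feedback arrives,
// instead of on a fixed timer that can race the status frame:
//   bus.addListener(kFeedbackApiId, (id, data, ts) -> runController(data));
```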

6 Likes

Second this!

The OP was concerned about data latency within the robot control system.
Have there been any studies to quantify the data latency issues across FRC CAN?
It’s typically not as simple as increasing data communication rates.
For example, at what point in its control cycle does the RIO dispatch command messages, and when does it act on status/measurement messages? The same questions apply to each of the target controllers.

Great write-up, except that since the 2020 netcomm rewrite on the roboRIO, none of the calls are blocking. Writing can technically block for 3 retry cycles when sending, up to 300µs, but that’s it. Reading, however, is not blocking; it just checks whether the requested message exists in the message map. If it does, it returns it; otherwise it returns “message not found”.

1 Like

Thanks! That is great to hear!!!

Do you have a sense as to how universally the various CAN vendors have adopted this? In other words, are there methods one can invoke on some of the vendor-specific controllers which will block, or has this gone away – either with changes in semantics of WPILib calls or with the vendors adapting to the changes made in the rewrite?

Nice work, BTW. This is a big deal.

Does a read fetch the latest data, or does it return the last data received?

My team saw two factors that were much bigger contributors to instability than the CAN bus transit time:

  1. Sensor delay caused by the motor controller’s onboard filtering
  2. Status frames arriving out of phase with a roboRIO control loop at much lower update rates than the feedback loop

Here’s my team’s story on that.

My FRC team had four options for our flywheel controller in the 2020 season (when I say onboard, I mean running on the motor controller):

  1. Onboard sensor with onboard control
  2. External encoder with onboard control
  3. Onboard sensor with roboRIO control
  4. roboRIO sensor with roboRIO control

(1) has onboard sensor filtering that can’t be disabled on the SPARK MAX if one’s using the hall effect encoder. The sensor delay introduced by the filtering made our controller unstable in our operating regime. The Talons do support reducing the amount of filtering if I recall correctly.
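
For the Talons, the relevant knobs are the velocity measurement period and the rolling-average window (from memory of the Phoenix API; the defaults of a 100 ms period and 64-sample window add a lot of lag):

```java
import com.ctre.phoenix.motorcontrol.VelocityMeasPeriod;
import com.ctre.phoenix.motorcontrol.can.TalonSRX;

public class TalonFilterConfig {
  // Reduce velocity measurement lag at the cost of noisier data.
  public static void configure(TalonSRX talon) {
    talon.configVelocityMeasurementPeriod(VelocityMeasPeriod.Period_10Ms, 50);
    talon.configVelocityMeasurementWindow(4, 50); // 4-sample rolling average
  }
}
```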

(2) fixes the sensor lag issue, but we couldn’t use it because we wanted to use NEOs, and external encoders weren’t supported in brushless mode. It’s supported now, but the docs as of November 2019 didn’t mention Alternate Encoder Mode: https://web.archive.org/web/20191130192507/https://www.revrobotics.com/sparkmax-users-manual/. Wayback doesn’t have backups from later than that, so I don’t know when the docs were added.

(3) has nondeterministic latency introduced by status frames arriving asynchronously at 10 Hz, a much slower rate than the 200 Hz roboRIO feedback loop. We tried increasing the status frame update rate to 100 Hz, but it caused CAN frames to get dropped as well as spurious zeroes in the encoder data.

You can throw optimal control at the latency problem, but the optimal solution at least doubles the rise time and makes the controller less robust to disturbances.

We went with (4) because it didn’t have sensor latency issues and the CAN write latencies weren’t noticeable. We couldn’t just reduce the feedback loop rate to match the sensor data because our flywheel dynamics were too fast (shot recovery happened in 4 timesteps at 200Hz, aka 20ms). We had a 1:2 dual-NEO gearbox.
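
Option (4) in code is roughly this shape (a sketch; ports and rates are placeholders):

```java
import edu.wpi.first.wpilibj.Encoder;
import edu.wpi.first.wpilibj.Notifier;

public class FlywheelLoop {
  private final Encoder encoder = new Encoder(0, 1); // DIO 0/1, placeholder ports
  private final Notifier loop = new Notifier(this::runControl);

  public void start() {
    loop.startPeriodic(0.005); // 200 Hz to keep up with fast flywheel dynamics
  }

  private void runControl() {
    double velocity = encoder.getRate(); // fresh reading, no status-frame latency
    // ... feedback law here; only the motor *command* crosses the CAN bus,
    // and writes were cheap after the 2020 netcomm rewrite ...
  }
}
```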

3 Likes

This is my preferred solution as well, but there are only so many DIO channels and 4x decoders in the FPGA to work with, and wiring discrete sensors all the way to the RIO can be very inconvenient for some mechanisms. CAN latency to a SPARK MAX was probably not noticeable since REV did away with the periodic control frame for the 2020 release and now sends control frames as soon as you call “set” in the API (thanks to the excellent CAN changes made in WPILib 2020!).

This would be a problem if you had the entire bus syncing messages on the RIO’s request. If this got implemented I would just want to use it on a couple “key” motor controllers, while leaving the rest on the default behavior. Better yet, it would be nice if, as part of these changes, vendors allowed users to disable certain status frames entirely on specific motor controllers as well.

Multiple busses sound great, but I don’t think we’ll be seeing this for quite a while, since NI’s new roboRIO has already been confirmed for the next few years. I’m a big fan of your suggestions though, and I’d love to see the FRC control system move toward this!

What I’m really looking for is a way to sync up a few (maybe 3-4) encoder signals on the CAN bus with a control loop on the RIO by calling some kind of “requestFrames” method which blocks until either all associated controllers respond or a timeout is reached. Timestamps on each frame would be an added bonus. This would allow low-latency RIO-side control of complex mechanisms without having to run more encoder wire and use DIO channels (and FPGA decoder modules) for these mechanisms. I understand that enabling this feature on every motor controller on the bus would be unwise, which is why, if this gets implemented, the default out-of-the-box behavior should remain as it is.
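
Something like this, as a strawman (all names hypothetical):

```java
import java.util.List;

// Strawman user-facing call (names invented): block until every listed
// device has responded, or until the timeout expires.
interface SyncedCanBus {
  boolean requestFrames(List<Integer> deviceIds, double timeoutSeconds);
}

class ShooterLoopSketch {
  void runOnce(SyncedCanBus bus) {
    boolean fresh = bus.requestFrames(List.of(1, 2, 3), 0.002); // 2 ms timeout
    if (fresh) {
      // All encoder frames arrived; run the controller on data measured
      // as close as possible to when it's used.
    } // else: fall back to the last (ideally timestamped) measurements
  }
}
```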

This feature also would make a lot of sense for products like the new CANCoder from CTRE!

2 Likes

Enjoy.

EDIT: I actually think it’s possible to write a “module” to create a “virtual” socketCAN interface and talk through the CAN interface that NI has on the roboRIO, but it would definitely be a hack. I’m not 100% sure I want to pursue it, but it’s an idea. It’s not going to be useful for everyone, but it would make switching out hardware a bit easier.

3 Likes

I should have known you guys would have something like this already created! :grin:

Now FIRST just needs to make it legal to control stuff with this. I suppose it would be perfectly legal, though, to use one of these USB-to-CAN boards combined with this, provided all you put on that bus is sensors? Might be an interesting time explaining it to inspectors, but it could solve some congestion issues if the main bus is already heavily loaded.

2 Likes