Real-time versus normal timing

I know that timing is a tough subject, but I’m trying to figure out why some of my assumptions about real-time timing versus normal LabVIEW timing seem to be wrong. I read and re-read the paper from 358, which pointed to alternative methods but didn’t really help. We are trying to get more repeatable and faster control while keeping the CPU usage down.

OK, first off, we are having some issues with CPU usage on our robot. With normal waits, we were hovering between 80 and 100% CPU usage. We turned off debug code and enabled inlining on some frequently called subVIs, and our CPU usage dropped to ~50%, with a spike to 100% every second or so.

Our fastest timed loop is our driving code in Periodic Tasks. It has a 50 ms wait and contains a standard PID VI and an unaltered Cartesian holonomic drive VI. We use CAN with a 2CAN, and found that shorter waits only hurt our CPU usage. We still get occasional warnings about CAN Jaguar timeouts.

OK, here’s where the assumptions come in. I thought that the real-time palette used timing circuits on the FPGA, essentially offloading the timing from the cRIO CPU, so we tried substituting the real-time waits for the standard LabVIEW waits. Our CPU usage jumped back to 80-100%. We took those out and tried Wait Until Next ms Multiple (the metronome from the standard timing palette), and that made the robot very laggy, pegging the CPU usage between 95 and 100%.

We are considering moving the PID, and our gyro (which can be read much faster than 20 Hz), into a faster loop away from anything that has to talk over CAN. If we do that, would we benefit from using the real-time timing? Would there be a reason to use the real-time PID over the standard one?

Does the real-time palette use the cRIO’s FPGA like I assumed, or does it just increase the workload for the cRIO CPU? Does what we are seeing make sense? Should we just avoid the real-time palette of VIs altogether?

I know that I ask lots of questions here. Hopefully what shakes out can help many others.

The LabVIEW FRC codebase has approximately zero control over the FPGA on the cRIO. The FPGA image is pre-compiled by the wizards who create the WPILib FRC framework, and all LabVIEW/C++/Java code merely interfaces with the FPGA through DMA calls and the like. So no, the real-time timing VIs don’t have much of anything to do with the FPGA and wouldn’t offload any processing from the CPU to the FPGA. They are much more accurate and more deterministic than the standard “Wait (ms)” function, however, and that obviously carries a higher CPU cost with it.

To reinforce what Kevin said: the cRIO product allows for a combination of hard and soft real-time operation. But for FRC, that division has already been defined and compiled into the image. Until the FPGA tools better support partial reconfiguration, it isn’t really feasible to allow FPGA modification and still ensure safety of outputs.

So in the RT palette, the timers and such are real-time implementations of those concepts. They will attempt to minimize latency and jitter at the expense of efficiency. They are more likely to use spin-locks instead of messaging or signaling, and more likely to run at elevated priorities.
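For anyone who thinks better in text than in block diagrams, here is a minimal Java sketch (not LabVIEW, and certainly not the actual RT implementation) of that trade-off: a spin-wait hits its deadline with very little jitter but burns the CPU for the whole interval, while a sleep-based wait yields the CPU but wakes up as late as the scheduler allows.

```
public class WaitDemo {
    // Spin-wait: low jitter, but the CPU is pegged for the whole interval.
    static void spinWaitNanos(long nanos) {
        long deadline = System.nanoTime() + nanos;
        while (System.nanoTime() < deadline) {
            // busy loop: nothing else runs on this core
        }
    }

    // Sleep-based wait: the scheduler can run other work, but the actual
    // wake-up can be a millisecond or more late.
    static void sleepWaitNanos(long nanos) throws InterruptedException {
        Thread.sleep(nanos / 1_000_000, (int) (nanos % 1_000_000));
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            spinWaitNanos(10_000_000);   // 10 ms
            long spinError = System.nanoTime() - start - 10_000_000;

            start = System.nanoTime();
            sleepWaitNanos(10_000_000);  // 10 ms
            long sleepError = System.nanoTime() - start - 10_000_000;

            System.out.printf("spin error: %6d us, sleep error: %6d us%n",
                    spinError / 1_000, sleepError / 1_000);
        }
    }
}
```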

What I’d suggest is identifying the elements of your code that are causing your CPU usage to be high. They may not be the same ones causing the jitter in timing. It would also be helpful for you to describe your usage of CAN. What CAN messages are being sent to how many controllers? You may also want to measure just the CAN nodes and see if they are the long pole in the tent.
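If it helps to see the measurement idea in text form, here is a rough Java sketch of what I mean by measuring a suspect node; sendCanStatusRequest() is a hypothetical stand-in for whatever call you want to characterize:

```
public class TimingProbe {
    public static void main(String[] args) {
        final int runs = 100;
        long worst = 0, total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sendCanStatusRequest();           // the operation under test
            long elapsed = System.nanoTime() - start;
            total += elapsed;
            worst = Math.max(worst, elapsed);
        }
        System.out.printf("avg %.2f ms, worst %.2f ms over %d runs%n",
                total / (runs * 1e6), worst / 1e6, runs);
    }

    // Hypothetical placeholder for the call being measured.
    static void sendCanStatusRequest() { /* ... */ }
}
```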

Greg McKaskle

Thanks Kevin and Greg! That makes sense, and explains why there isn’t a free lunch.

I’m a little confused by the suggestion of measuring the CAN nodes. Is there an easy way to get a round-trip time for a command sent and the acknowledgement back?

We are using a total of six Jaguars: four for the drive and two for the pick-up mechanism. The four drive Jaguars are part of a sync group that is called within the standard WPI Cartesian holonomic drive VI, and the two pick-up Jaguars are part of a sync group called from within a 100 ms loop in our Periodic Tasks.

We poll the two pick-up Jaguars for the state of their limit switch inputs and use that as an interlock for our pneumatic catapult mechanism.

We are using a 2CAN to interface with the CAN bus, and it doesn’t report any errors. The Driver Station sees a lot of -44086 errors indicating that the CAN GetStatus VI did not get the appropriate number of bytes. We still see the correct output indicating the state of our limit switches. We tried routing the errors from the CAN GetStatus to a Clear Errors VI, but the DS is still reporting the same thing. I’ll post a log later tonight, hopefully with some snippets from our code.

What Greg said. If you aren’t measuring it, you can’t fix it. There’s a corollary: if you’re measuring it too often, you’re interfering (hello, Dr. Heisenberg). On Greg’s suggestion, we ran the profiler from our workstation and found a number of stupid user tricks that had pegged our CPU usage, mostly relating to sending data too often to the SmartDashboard (SD). Werner slapped us good.

I am CAN ignorant, but are your calls to the CAN VIs waiting for completion (is your code waiting for them to return), or are they fire-and-forget, passing control back as soon as the hardware involved is loaded up?

This may be unrelated, but we learned the hard way that the Compressor Start VI does not return. We had it sequentially preceding the Compressor Control Loop, and had a terrible time until we parallelized them. Thanks to Doug, the CSA at the Arizona Regional, for steering our analysis where it needed to go.

I am looking forward to a higher level of CAN integration in the new controller.

How often was too often?

Our calls are mostly embedded within the holonomic drive VI, but they are fire-and-forget as far as our code goes. At a lower level, there is a command-and-response exchange on the CAN bus required by the FRC architecture, so there may be some waiting for a response at levels far below what we can touch. We occasionally get timeout errors for the Jaguars, but that happens far less often than the above-mentioned wrong-number-of-bytes error.

We had the SD write calls in 10 and 20 ms loops. Per Doug, the SD write calls each try to send a TCP packet, and that takes time. No, the drivers don’t need that much data. We also discovered that one update per second was too slow; we were missing reports of transitions in our state machines. At three per second, we were getting good info to the DS, and the robot behaved much better. Setting the same motors from both autonomous and periodic code was not a good thing either.
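In text form, the fix amounted to a simple rate limiter. A hedged Java sketch of the pattern, with publish() standing in for the actual dashboard write:

```
public class ThrottledPublisher {
    private final long minIntervalNanos;
    private long lastSendNanos = 0;

    ThrottledPublisher(double updatesPerSecond) {
        minIntervalNanos = (long) (1e9 / updatesPerSecond);
    }

    // Publish at most updatesPerSecond times; intermediate values are
    // dropped. For discrete state values you might also publish on every
    // change, since a slow fixed rate hid state-machine transitions on us.
    void maybePublish(String key, double value) {
        long now = System.nanoTime();
        if (now - lastSendNanos >= minIntervalNanos) {
            lastSendNanos = now;
            publish(key, value);              // stand-in for the SD write
        }
    }

    private void publish(String key, double value) {
        System.out.println(key + " = " + value);
    }
}
```

Something like new ThrottledPublisher(3.0) would give the three-per-second rate that worked for us.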

My CAN comments were a shot in the dark for you to contemplate. I haven’t dealt with CAN yet in FIRST, and at work, CAN interfaces are just another bunch of transistors. The (somewhat naive) point is to keep the control loop period longer than the time it takes to set up the CAN messages.

I’m still pining away for those CAN ports hiding down deep in the cRIO :slight_smile:

Tim

I’m going to see how many writes to SD we have, and try to combine them into flattened strings from clusters or arrays. It has been nice to use the SD for prototyping and debugging, but it has been a bit too easy to arbitrarily add things to the table.

We used to think long and hard about what we would send via UDP, and there was only one one-way stream that we would make. Now we have SD reads and writes in autonomous, teleop, and timed tasks. About a quarter are no longer used, so they will get pruned. Not knowing the mechanism under the SD, I had just assumed that the data was coalesced and sent as part of the Driver Station’s scheduled communication. Now, knowing that each write creates a TCP packet (pretty inefficient for a Boolean write), I have more reason to minimize those SD writes.

The SD writes do not immediately do TCP; they are already amortized. When you write to an SD variable, it needs to map the name to a storage location, update the storage locally, compare it to the current value to see if it is truly an update, and possibly mark it for replication. Every 100 ms by default, the library accumulates the marked elements and sends them to the other participants.
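For what it’s worth, here is a rough Java sketch of that pattern (not the actual SmartDashboard code, just the shape of it): writes stay local and cheap, and a periodic task ships only the entries marked as changed.

```
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AmortizedTable {
    private final Map<String, Object> storage = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    // A write is cheap: a local map update plus a change check.
    synchronized void put(String name, Object value) {
        Object old = storage.put(name, value);
        if (old == null || !old.equals(value)) {
            dirty.add(name);                 // mark for replication
        }
    }

    // Called every ~100 ms by a timer; only changed entries hit the network.
    synchronized void replicate() {
        for (String name : dirty) {
            sendToPeers(name, storage.get(name));  // stand-in for real I/O
        }
        dirty.clear();
    }

    private void sendToPeers(String name, Object value) {
        System.out.println("replicating " + name + " -> " + value);
    }
}
```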

So, by all means, remove the elements you do not need to update, but I don’t think you need to change things to make them hard to use.

As a test, you can eliminate the overhead of synchronizing them by commenting out just the NT Server in Robot Main. Another test is to run it normally, but close your dashboard and other clients and see how much the CPU drops. This test will still include the overhead of the local name lookup and local storage. When SD clients and servers aren’t communicating, they still function as local storage, essentially late-binding globals, but they do no networking.

One other clarification: Compressor Start doesn’t block, but the Compressor Control function has the loopy border and will not return as long as the compressor refnums are still good. Start just sets state and returns.

Greg McKaskle

The code below is what is giving us the CAN error. I’m thinking that we should probably sequence the GetStatus VIs with a small delay (~10 ms). I assume that our error may be caused by the second request hitting the CAN bus before the first response is returned. Because the 2CAN is in the middle, as well as the TCP/IP stack, I’m just really stabbing in the dark here.

Any thoughts?

What happens to the outside 100 ms loop when you hit that 500 ms delay?

Probably not timing-relevant, but the logic using the Pass and Shoot globals is a little convoluted.

What’s that eraser?! thingy?

When you hit that 500 ms delay, the whole loop stops for half a second: the motors keep doing whatever they were doing, you stop reading limit switches, and so on. If that’s what Levansic is expecting, then I expect it’s not the problem. In C++/Java, pauses like that can cause motor safety timeout errors because you’re not updating the motors often enough, but I don’t remember if that’s the case in LabVIEW.

The eraser block there is a clear errors block. It basically sinks any errors on the input wire and makes them disappear without needing to be handled, etc.

It’s probably moderately insane and not particularly necessary to do it this way, but it should be possible to alternate those status reads so that you read one every other pass through the loop: on odd iterations you read channel 1, and on even iterations you read channel 2. Dunno if that’s fast enough for your purposes, but I think it’d be more deterministic than tossing all of that into a sequence structure to get a 10 ms delay between reads.
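Something like this Java sketch, if that helps picture it; readLimitSwitch() is a hypothetical stand-in for the CAN status read, and the array plays the role of a LabVIEW shift register:

```
public class AlternatingPoller {
    private final boolean[] lastState = new boolean[2]; // cached switch states
    private int iteration = 0;

    void loopBody() {
        int channel = iteration % 2;          // odd/even passes
        lastState[channel] = readLimitSwitch(channel);
        iteration++;

        // Interlock: fire only when BOTH cached switches are tripped.
        // One of the two readings is at most one loop period stale.
        if (lastState[0] && lastState[1]) {
            // safe to fire the catapult
        }
    }

    // Hypothetical placeholder for the real CAN status request.
    private boolean readLimitSwitch(int channel) {
        return false;
    }
}
```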

On the question about the 500 ms delay: it is there only for the case of firing the solenoids. It doesn’t affect other threads, and it doesn’t fire until the motors in this thread are disabled (in one direction) by their limit switches.

Checking on alternating loops may be a good idea, but we use the status of both limit switches together as an interlock on our catapult. We don’t want to launch until both are tripped, or else parts will fly.

If we did that, I guess we could use a shift register to keep the state of the unread one. At that point, we would probably have to speed up the loop or face problems with our autonomous code. Maybe it wouldn’t be a problem, and putting less noise on the CAN bus may benefit other parts (like the drive) more.

Thanks for the suggestion!

:cool:
I am amused to hear that stalling a loop isn’t always bad. It seems like it might actually prevent undesired reactions to input changes, and it’s probably more efficient and easier to explain than a state machine that either counts iterations or watches time deltas before moving on.

We had an interesting day. Early on, one of our limit switches failed. This exposed a flaw in our interlock logic, where our catapult could, and did, fire. Parts flew.

The problem was traced to the switch. We had the exact same failure later on, on the other side of the robot. Luckily, no one was hurt.

On the positive side, we had cleaned enough cruft from our code that our CPU usage dropped to a steady 30%. We achieved a large portion of this by moving the CAN-calling portion of the drive code into another loop (thread), away from all of our driver filtering, gyro, and PID code. With this change, we also stopped getting any CAN errors.
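For teams working in the text languages, a rough Java analogue of that restructuring might look like the sketch below; computePid() and writeCan() are hypothetical stand-ins. The control loop only hands off its latest outputs, and a single separate thread owns all CAN traffic.

```
import java.util.concurrent.atomic.AtomicReference;

public class SplitLoops {
    // Latest commanded outputs, shared between the two loops.
    static final AtomicReference<double[]> command =
            new AtomicReference<>(new double[4]);

    public static void main(String[] args) {
        // CAN loop: the only thread that touches the bus.
        Thread canLoop = new Thread(() -> {
            while (true) {
                writeCan(command.get());      // send the most recent command
                sleep(20);                    // bus pacing, ~50 Hz
            }
        });
        canLoop.setDaemon(true);
        canLoop.start();

        // Control loop: driver filtering, gyro, PID; never blocks on CAN.
        while (true) {
            command.set(computePid());        // hand off, fire and forget
            sleep(10);
        }
    }

    static double[] computePid() { return new double[4]; }  // placeholder
    static void writeCan(double[] outputs) { /* placeholder */ }

    static void sleep(long ms) {
        try { Thread.sleep(ms); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```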