Worried about high CPU usage in cRIO

I admit I haven’t been 100% involved with the software group for the past few years, but this year I am back in the saddle again.

One thing I have noted is the high CPU usage in various things we have running. Here are some observations so far. I guess I am wondering if these are “normal behaviors”.

We have noted that if we load up a default 2012 framework with the default arcade drive, we see about 55-60% CPU usage.

We wrote a serial driver that takes this up to around 65-70% CPU usage. The loop rate here is about 60 milliseconds.

On another project, we have a vision application (separate from the serial driver program) that, when the vision is tracking, consumes about 75-85% CPU usage. The loop rate is that of the standard vision processing VI.

So I am sitting here watching these two programs in development, and I am concerned that when we combine them we are going to max the CPU out at 100%, cause watchdog errors, and have the robot start dropping out of Tele-op mode because we get a “drive loop not running fast enough” error.

So a few questions.

  1. Are these CPU usages normal?
  2. Should we be concerned and start thinking about enabling and disabling loops in the program only when we need them? For example if we don’t need vision tracking all the time, just turn off that loop until the moment we need it, then enable it, and shut off other loops?

Any other pointers, advice on CPU usage, watchdogs, or loop rates would be very appreciated.

Thanks

I can’t tell you whether it is “normal”, but I can confirm the numbers you’re seeing: 65% CPU with the default code, and 100% with everything running. When we added vision processing to our testbed robot, the controls were unusably sluggish, the vision was lagging, and the system watchdog kept shutting down the motors.

We have done a lot of optimizing to keep the amount of actual processing in Teleop to a minimum, but it wasn’t helping enough. We moved all the vision code to the Dashboard and are now using UDP packets to send the target data array back to the robot for action. I was worried that the E11 Classmate might not be up to the task, but it works very well. We haven’t tried the E09 yet, but I suspect it won’t do as well as the E11.
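If it helps, here is roughly what the robot-side end of that could look like in C++. This is a sketch only, assuming BSD-style sockets; the port number, the comma-separated packet format, and the TargetData struct are placeholders for illustration, not our actual protocol.

```cpp
// Sketch only: robot-side UDP listener for target data sent from the Dashboard.
// Assumes BSD-style sockets; the port (1130), the comma-separated packet format,
// and the TargetData fields are placeholders, not the protocol we actually use.
#include <cstdio>
#include <cstring>
#include <sys/socket.h>
#include <netinet/in.h>

struct TargetData {
    double x;         // horizontal offset of the target
    double y;         // vertical offset of the target
    double distance;  // estimated distance
};

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(1130);             // placeholder port number
    bind(sock, (sockaddr *)&addr, sizeof(addr));

    char buf[128];
    TargetData target;
    for (;;) {
        // Block until the Dashboard sends the next packet. The parsing is cheap,
        // so this costs almost nothing on the cRIO compared to doing vision locally.
        int n = recvfrom(sock, buf, sizeof(buf) - 1, 0, NULL, NULL);
        if (n <= 0) continue;
        buf[n] = '\0';
        if (sscanf(buf, "%lf,%lf,%lf", &target.x, &target.y, &target.distance) == 3) {
            // Hand the target off to the drive/turret code here.
            printf("target x=%.2f y=%.2f dist=%.1f\n", target.x, target.y, target.distance);
        }
    }
    return 0;
}
```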

Alan - I appreciate your input on this matter. Thanks for confirming what we see as well. Our robot does the same thing: “unusable” because of watchdogs.

Ok, so we are unfamiliar with offloading the vision code to the dashboard. Is there a white paper for newbies on how to do this?

In response to your question about the E09 and E11 comparison: setting aside the size of the unit, the real question seems to be whether a normal laptop, rather than a little “netbook” PC like the E09 or E11, would be the better choice to run the driver station if we do what you recommend and move vision there for the performance boost.

At first, when I saw some other teams’ questions about:

  1. Adding a laptop on your robot.
  2. Adding a SECOND CRIO on your robot.

I was a little concerned with those kinds of questions teams were asking, but this may confirm why. I think we have maxed out the cRIO… I remember back in the IFI days, at least the CMUcam processed the vision data on its own and sent a serial string to the robot controller… maybe we need to look into a small PC too… http://smallpc.com/panelmounts.php (not under $400)

Or a low-cost vision processor that can handle vision in a co-processor relationship with the cRIO, like in the CMUcam days.

If I recall, the E09 and E11 laptops are very similar in terms of CPU and RAM. You may want to get an SD drive so that you have additional disk for development tools.

As for maxing out the cRIO CPU, there are indeed lots of things in the framework. I doubt that the dashboard code is all that useful, so you may want to configure it, or better yet, rewrite your own.

I’ll look at the usage when I get back into town and see if there are some obvious things to tune up.

Greg McKaskle

It does not solve the total CPU bandwidth issue, but you can prioritize activities on the cRIO. We run the DS comms in a task with higher priority than the camera, so the camera gets the remaining bandwidth AFTER the comms, motors, and sensors have been serviced. We work in C++ and use the native VxWorks (the OS in the cRIO) API for doing this, but I believe LabVIEW, Java, and WPI/C++ have APIs to create independent tasks and set their relative priorities.
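If it helps, the bare VxWorks version of what we do looks roughly like this. It is a sketch only: the task names, priority numbers, and stack sizes are example values, not our real ones, and the two loop functions are stubs.

```cpp
// Sketch of spawning the camera work at a lower priority than the comms/drive
// task using the native VxWorks taskLib API (lower number = higher priority).
// Task names, priorities, and stack sizes are example values, not our real ones.
#include <vxWorks.h>
#include <taskLib.h>

void CommsAndDriveLoop() { /* service DS packets, motors, and sensors in a loop */ }
void CameraLoop()        { /* grab and process images with whatever time is left */ }

void StartTasks() {
    // Comms/drive gets the smaller (higher) priority so it always preempts vision.
    taskSpawn((char *)"tComms", 90, VX_FP_TASK, 64 * 1024,
              (FUNCPTR)CommsAndDriveLoop, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);

    // Vision runs at a lower priority; it only gets the CPU when comms is waiting.
    taskSpawn((char *)"tVision", 110, VX_FP_TASK, 64 * 1024,
              (FUNCPTR)CameraLoop, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
}
```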

We are processing images at about 10Hz (on the cRIO) which (we hope) is fast enough.

HTH

We are seeing CPU usage interfering with operations, and so are also optimizing our code. We are fooling with Arduinos to manage some of the easier computing tasks offboard, though I’m not sure if we’ll actually use any of them. Some of our motors are taking advantage of the PID loops in the Jaguars (available under CAN control) to further offload CPU cycles.
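For what it’s worth, here is roughly what the Jaguar speed-loop setup looks like in 2012-era WPILib C++. This is a sketch only: the CAN device ID, PID gains, and encoder resolution are placeholders, and the method names may differ slightly between WPILib releases.

```cpp
// Sketch of closing a speed loop inside the Jaguar over CAN (2012-era WPILib C++).
// The CAN device ID, PID gains, and encoder counts-per-rev are placeholders;
// method names may differ slightly between WPILib releases.
#include "WPILib.h"

class ShooterWheel {
public:
    ShooterWheel()
        : jag(5, CANJaguar::kSpeed)                      // device ID 5, closed-loop speed mode
    {
        jag.SetSpeedReference(CANJaguar::kSpeedRef_QuadEncoder);
        jag.ConfigEncoderCodesPerRev(360);               // placeholder encoder resolution
        jag.SetPID(0.3, 0.003, 0.0);                     // placeholder gains, tuned on the robot
        jag.EnableControl();
    }

    // The cRIO only sends a setpoint; the PID math runs in the Jaguar's firmware,
    // so those cycles never show up on the cRIO's CPU chart.
    void SetRPM(double rpm) { jag.Set(rpm); }

private:
    CANJaguar jag;
};
```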

A 400 MHz Power PC and we’re maxing it out. Incredible.

Agreed 100%!

If the basic code is already running at 65%, couldn’t that indicate there is a flaw in the basic control code?
I don’t recall any previous years’ code acting this way.
Don’t we have tools that could point us to where the biggest users of CPU cycles are? Something akin to “Task Manager” in Windows?

Ok, I think I have attracted all the “power users” of LabVIEW at CD in one post. “Y’all” are scaring me with your comments…

I just got back from helping a rookie team with a new cRIO-II. We downloaded the default code into that cRIO-II and it was running 40-45%. We only have two of the cRIO-I. Maybe I’ll order a cRIO-II tomorrow just to gain another 20%…

Maybe I should post this in the NI community and hook up with a LabVIEW engineer over there to understand whether we are doing something wrong or whether this is just the fact of the matter…

One man’s flaw is another man’s feature.

You know, I asked our programming mentor that exact question Friday. He said that if I could find the Ctrl, Alt and Del keys on the cRIO I could access the Program Manager…:stuck_out_tongue:

Wait, so you can string together multiple cRIO II’s to get more processing power? :ahh:

You should get really worried when Mark McLeod shows up.

We’ve struggled with high usage since the beginning. I wonder how much of this is that it’s now much easier to see the CPU% on the driver station, so more people are noticing it. All the extra monitoring unfortunately also takes more processor time. Unfortunately, I didn’t look at the utilization for the default project in previous years, and now that the LabVIEW license has expired, I’m not sure it’s possible.

There are a few things you can do to track down high usage. In the default project, there is a VI called Elapsed Times. You can drop it into each loop and wire in a name, and it will keep track of how long it takes between calls of that VI. This can help track down slow loops. You can also go to Tools -> Profile -> Performance and Memory for NI’s equivalent of Task Manager.

You may have an excellent point there, except the watchdog errors have always been there. In the case where 100% CPU is seen, the watchdog errors fill the driver station diagnostic window. If in the past we had hit the same high CPU usage, I assume we would have seen the same watchdog errors fill the screen, even though we could not see the CPU usage in previous years.

My logic is: if we did not see any watchdog errors in 2009, 2010, and 2011, then I assume we did not have a maxed-out CPU. This year, we see the CPU usage hit 100%, then the watchdog errors fill the screen, teleop disables, and the robot shudders to a stop when we have too much loaded in the cRIO. (Which, by the way, isn’t much code at all compared to the past robots. If you want to see the past robots, click the link at the bottom of the screen to the repository. The only time we used vision was 2009, and that robot was fine.)

By the way, whoever designed the CPU, latency, and chart displays: that’s a great tool. Thanks for giving us the chart trends to see this information, whoever you are. Thank you.

There is a function called “spy” which one can run on the cRIO console. It will print CPU usage by task every 10 seconds. There is a problem with the spy utility, though: it uses the auxiliary clock library in the OS to profile the system, and I’m not sure if NI uses that timer/clock library for anything else; next time I am in our lab I will check. There is also a remote display of nearly the same information when using Workbench in debug mode.
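For anyone who hasn’t used it, it is started and stopped from the VxWorks shell prompt (reached over the cRIO’s console/serial connection or a telnet session), roughly like this; optional arguments can change the report interval, so check the VxWorks docs for your image:

```
-> spy
-> spyStop
```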

The watchdog goes off if it does not get “petted” regularly, and the FRC comms code interprets this as a dangerous condition (thus the disabling of motors, etc.). 100% CPU usage is not a good sign, but it does not automatically mean something is wrong. If the watchdog is going off, you could be doing too much work serially (one thing right after another) in between messages from the DS. Try parallelizing your activities and prioritizing the comms with the DS. The watchdog alarms should go away and you’ll be giving the camera all the “leftover” time. Then slow down and/or simplify the camera code until utilization drops just below 100%.

HTH

The CPU just barely hitting 100% will not cause watchdog/motor safety errors on its own. Consider a simplified example where you run a single loop every 20 ms. If the loop takes 10 ms to run, it will use 50% of the CPU. If it takes 20 ms, it will use 100% of the CPU. The default value for motor safety is 100 ms, so it won’t be until your loop iteration takes more than 100 ms that you will see a motor safety problem. I believe the system watchdog timeout is even longer than that.

It gets a lot more complicated when you have multiple loops involved, like the framework code has. However, even this year, we’ve definitely had the CPU at 100% without watchdog/motor safety errors.
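To make that concrete, here is roughly where those numbers live in 2012-era WPILib C++. This is a sketch, not anyone’s actual robot code; the PWM channels, joystick port, and loop structure are just illustrative, and 0.1 s matches the default expiration discussed above.

```cpp
// Sketch only: where the motor-safety timeout lives in 2012-era WPILib C++.
// PWM channels, the joystick port, and the loop structure are illustrative.
#include "WPILib.h"

class ExampleRobot : public SimpleRobot {
public:
    ExampleRobot() : drive(1, 2), stick(1) {
        drive.SetSafetyEnabled(true);
        drive.SetExpiration(0.1);   // motors stop if not updated within 100 ms
    }

    void OperatorControl() {
        while (IsOperatorControl()) {
            // Each ArcadeDrive() call resets the safety timer. A busy CPU alone
            // won't trip motor safety; only a single loop iteration that takes
            // longer than the 100 ms expiration will.
            drive.ArcadeDrive(&stick);
            Wait(0.02);             // ~20 ms loop, i.e. roughly 50 Hz
        }
    }

private:
    RobotDrive drive;
    Joystick stick;
};

START_ROBOT_CLASS(ExampleRobot);
```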

ASPCA would be happy to know you are petting your watchdog rather than kicking it.

My role here is to balance that equation. What’s the opposite of “power user?” :stuck_out_tongue:

Yes, just as you can put a laptop (or even a desktop, I suppose) onto the robot as an auxiliary processor, under [R65]. But remember [R52] applies here as well.

One thing I have noticed when running the code is that sending the data to the dashboard using the default code takes up a lot of CPU resources. In the past, when I deleted that, it freed up between 10 and 20% of the CPU. I haven’t tried it with this year’s code or the new cRIOs. I will try this tomorrow when I get access to our robots.

We take over that default operation with our own dashboard-sending code and send the data less often; more often than 10 Hz is of questionable value.
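Something like this is all it takes. It is a sketch: SendDashboardData() is a hypothetical stand-in for whatever packs and transmits your data, and the 0.1 s interval is the 10 Hz rate mentioned above. Call MaybeSendDashboard() from your teleop/periodic loop; every pass that falls inside the 100 ms window returns almost immediately.

```cpp
// Sketch of rate-limiting dashboard updates so they don't run every loop pass.
// SendDashboardData() is a hypothetical stand-in for whatever packs and
// transmits your data; 0.1 s is the ~10 Hz rate mentioned above.
#include "WPILib.h"

void SendDashboardData();   // hypothetical: pack and send the dashboard packet

void MaybeSendDashboard() {
    static Timer timer;
    static bool started = false;
    if (!started) {
        timer.Start();
        started = true;
    }

    // Only send when at least 100 ms have passed since the last update;
    // every other call through this function costs almost nothing.
    if (timer.Get() >= 0.1) {
        SendDashboardData();
        timer.Reset();
    }
}
```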

Last year we had so little code that we sent it every teleop period, and it worked, but this year I know that will not be possible because of the amount of stuff we have to have. We will have to test how often we can send our dashboard data, or we might not have it at all.

While adding some code to my test project when I was demonstrating for a local team, I kept killing the cRIO. (BTW, the DoS bug in the network stack still exists.) I had to add some careful performance controls in my code to keep the CPU utilization down. (Partially my fault to begin with.) I was running between 65 and 75% CPU on the cRIO with nearly default code.

Previous years have not been a whole lot better; I normally saw these utilization numbers for most LabVIEW projects. The vision loop was the worst, normally consuming whatever was left of the cRIO. The performance monitoring in LabVIEW is very useful for tracking down problems. A built project running at start-up should take a bit fewer resources than just hitting the run button, since it is not running in debug mode.

PS. During my testing the other night, I saw some interesting metrics. I will have to dive into it tonight.