Beta test teams: Has base processor utilization decreased?

Last year, I ran into issues with processor utilization when writing our code in Java. The default code’s base processor usage was around 70%, IIRC, and it went up fast.

This year, to train new programmers, we’re programming last year’s robot in LabVIEW with the 2012 system. The default code had a processor usage of about 95% – reading a single Analog channel at 100 Hz (granted, this was output to a front panel, but no other processing was performed) maxed out the processor and prevented code from deploying (due to starvation).

I note that we’re not the only team to run into this – there were threads here last year documenting these issues.

To anyone who might know (Beta Test teams in particular), has the default code’s processor utilization improved?

Thank you for any insights.

Just out of curiosity – were you reading the analog channel during any set time intervals? I had a problem like this while working on a videogame in C# where I would get the states of around 500 sprites individually in the main game loop and would starve the game’s resources.

LabVIEW specific, however: we always read information from the Digital Sidecar (and assorted sensors) in the 100 ms or 10 ms timed loops, which you can find in the Timed Tasks VI, outside of the case statement which contains all of the various robot states and just ‘under’

A quick solution/troubleshoot (since I don’t have our code handy) is to:

  1. Make sure you aren’t repeatedly getting values from that sensor inside of an indefinite while loop (keeping in mind that the case statement for the robot state is itself in a while loop, and LabVIEW is a dataflow language).

  2. Read the value in one of the Timed Tasks loops (the VI contains two loops that run every X ms, typically 100 or 10), store it in a global variable (grr, scope), then use that variable wherever you need it (e.g. have it output to the Dashboard in Teleop).

  3. While LV can be rather heavy at times, just running the base Joystick+Arcade drive 2012 Robot Project had me hovering at around 30-60% processor utilization on an 8-slot cRIO. Given the parity that WPILib has, I wouldn’t expect much else from C++ or Java deploys.

  4. See if there’s anything extraneous running, look in your error messages (inside the Diagnostics tab of the Driver Station) for things about Robot Main running slow, and if it says anything about not finding a camera. Chances are, it’s throwing so many errors (and tying up resources) because of the camera code that is part of the default project. Try removing it, see if that helps?
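For the Java folks, the idea in point 2 (one timed loop reads the sensor, everything else reuses the stored value) might look roughly like the sketch below. This is plain desktop Java, not WPILib and not tested on a cRIO; `CachedSensor` and the `DoubleSupplier` standing in for the real sensor read are names made up for illustration.

```java
import java.util.function.DoubleSupplier;

/** Caches the most recent sensor reading so only one loop touches the hardware. */
class CachedSensor {
    private final DoubleSupplier source; // hypothetical stand-in for the real sensor read
    private volatile double latest;      // plays the role of the LabVIEW global variable

    CachedSensor(DoubleSupplier source) {
        this.source = source;
    }

    /** Call this from the periodic (e.g. 10 ms) loop only. */
    void sample() {
        latest = source.getAsDouble();
    }

    /** Call this from anywhere else; it never touches the sensor. */
    double get() {
        return latest;
    }
}

class CachedSensorDemo {
    public static void main(String[] args) throws InterruptedException {
        CachedSensor analog = new CachedSensor(Math::random); // fake sensor for illustration
        for (int i = 0; i < 5; i++) {
            analog.sample();                  // periodic task: one sensor read per cycle
            System.out.println(analog.get()); // consumers reuse the cached value
            Thread.sleep(10);                 // crude 10 ms pacing, good enough for a demo
        }
    }
}
```

Teleop or dashboard code would then only ever call `get()`, so the number of hardware reads is set by the timed loop and nothing else.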

I’m eager to hear what the beta teams have to say, but I can say that time was spent reimplementing and simplifying Network Tables for all three languages, and for LabVIEW in particular, some debugging was turned off and some optimizations took place.

You shouldn’t see 95% processor usage like that unless something else is going on. I suspect that you were throwing errors each control packet or something similar. Flip to the diagnostics tab and see if messages are flying by.

I’m happy to help if you don’t find your issue.
Greg McKaskle

We are also having some odd results that cause slow response times, lost packets and so on. Please see the attached picture, which will help explain. This is the base LabVIEW code from 2012. The only change to the code is that the two drive motors have been changed to PWM ports 3 and 4.
You can see that the CPU starts giving error messages such as “loops running slow” at the same time as lost packets occur. These errors occur even when we are doing nothing on the controls. The robot is just turned on and enabled.
I wonder if there is something we have done wrong or is it something else?

CPU useage.doc (687 KB)


Perhaps this was it – I recall errors flowing by, and they weren’t triggered by any of our code.

I did not realize that the code would repeatedly throw errors if no camera was present – during the build season, I’ll have to try removing that code and see if it improves.

If you look inside, the camera code is there for you to look at, and could account for the lost packets you guys are receiving.

You’re sampling the analog input at 100 Hz, which means you’re sampling the signal 100 times per second. Some quick math (1000 ms = 1 second; 1000 ms / 100 samples = 10 ms) indicates that you only need to read the signal in the code once every 10 ms. Reading it more often than that can starve whichever thread the getSignal function is in (and slow down its loop), and is particularly inefficient. I think this may be your real problem, since whcirobotics would have the same camera problem but has exceedingly low processor utilization (~12.5%) (also, note that his problem was a large number of dropped packets).
Navigate to Robot, and look for something called Timed near the bottom of the ‘screen’ (just below the Driver Station StartCOM VI and Vision VI). You’ll see two loops, one that runs every 10 ms and one that runs every 100 ms, and some accompanying joystick code. Essentially, replicate the joystick code but have the action you want performed inside the 10 millisecond loop. This has the added benefit of running independently from whichever mode the robot is in*, which can be handy.

*Note that you should just get the value of whatever signal the sensor is reading there, and then pass that on to <Autonomous/Teleop/Whatever> via Global Variable or however you want to handle it.
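If you’re in a text language instead, the same 100 Hz = 10 ms arithmetic can be turned into a small gate that lets a faster loop read the sensor only when a sample is actually due. This is just a sketch in plain Java; `SampleGate` and its methods are hypothetical names, and the timestamps are whatever millisecond clock you have available.

```java
/** Lets a fast loop perform an action at most once per sample period. */
class SampleGate {
    private final long periodMs;
    private long nextDueMs;

    /** rateHz is the desired sample rate; 100 Hz gives 1000 / 100 = 10 ms (integer division). */
    SampleGate(int rateHz, long startMs) {
        this.periodMs = 1000L / rateHz;
        this.nextDueMs = startMs;
    }

    long periodMs() {
        return periodMs;
    }

    /** Returns true when a new sample is due at time nowMs, false otherwise. */
    boolean isDue(long nowMs) {
        if (nowMs < nextDueMs) {
            return false;
        }
        nextDueMs += periodMs;
        if (nextDueMs <= nowMs) {
            nextDueMs = nowMs + periodMs; // fell behind: resynchronise instead of bursting
        }
        return true;
    }
}

class SampleGateDemo {
    public static void main(String[] args) {
        SampleGate gate = new SampleGate(100, 0L);
        for (long nowMs = 0; nowMs <= 30; nowMs++) { // simulated 1 ms ticks
            if (gate.isDue(nowMs)) {
                System.out.println("sample at t=" + nowMs + " ms");
            }
        }
    }
}
```

A gate like this only limits how often the read happens. The loop that calls it should still sleep or wait between iterations, otherwise it will eat the CPU on its own.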

Last year our team had slow response times, dropped packets and a cpu that was bouncing from 90% - 100% all the time.

** We programmed in C++ **

We are also having some odd results …

The log file shows that every few minutes, the number of lost packets goes up, latency goes up, and CPU usage actually goes down because there is less to process. My guess is that this is due to something on your laptop interfering with the communications. If enough packets are lost, the robot may be disabled or safety may shut down the drive motors.

I’d open up the Task Manager and see if you can identify a process that decides to index the hard drive, check for updates, or something else every few minutes. While it is possible that this is related to the cRIO, that seems less likely than the laptop.

If you figure out what it is, please post the results.

Greg McKaskle

This “100% CPU” metric, unless defined carefully, is ambiguous.

Every embedded project I’ve ever worked on had 100% CPU usage, but met all its hard real time deadlines. Huh? Yes, because all the leftover processing time was used for doing background (lowest priority) health monitoring.

A more meaningful metric would be throughput margin for your critical realtime tasks.

The best metric we can get (in LabVIEW at least) is the RT Get CPU Loads block.

It identifies what proportion of processor time is used by what priority tasks. It includes a number for each priority, and the sum is 100.

All tasks are Normal or inherit from their parent by default. If we sum everything except Idle, we get our consumed CPU load. The default framework does not use any of the VxWorks RT tasks, so timing determinism goes out the window: it varies wildly with CPU load on both ends and with link quality (since it’s timed to DS packets).
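To spell out the "sum everything except Idle" step in a text language, here is a rough Java sketch. It does not call the LabVIEW block; it just assumes you already have a percentage per priority level, and the bucket names and numbers in the demo are made up purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CpuLoad {
    /** Sums every priority bucket except the idle one. Bucket names are hypothetical. */
    static double consumedPercent(Map<String, Double> loadByPriority, String idleKey) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : loadByPriority.entrySet()) {
            if (!e.getKey().equals(idleKey)) {
                total += e.getValue();
            }
        }
        return total;
    }
}

class CpuLoadDemo {
    public static void main(String[] args) {
        // Illustrative values only, not measurements from a real robot.
        Map<String, Double> loads = new LinkedHashMap<>();
        loads.put("High", 5.0);
        loads.put("Normal", 55.0);
        loads.put("Idle", 40.0);
        System.out.println("consumed % = " + CpuLoad.consumedPercent(loads, "Idle"));
    }
}
```

Since the per-priority numbers sum to 100, this is the same as 100 minus the Idle figure.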

For the past few years, the CPU load on our robot has been so high that we’ve had to use the ‘No App’ DIP switch to reprogram the robot, because the NI bootloader is running at lowest priority and is being starved. It pains me every time I must open the darn imaging tool, No App (Virtually), reboot, flash, un No-App in the imaging tool, and reboot again. It’s a process far too long for a competition environment.

The Default code gets around this limitation by not running most of the code when disabled. I hate this method, as I can’t debug without enabling the robot, and it’s sometimes useful to probe an output without allowing it to move.

If there’s a microsecond (or better) system clock, you can read it at the start (ts) and end (te) of each realtime task. These can be used to measure the margin and scheduling accuracy for each task.

// start of task

// task code goes here

// end of task
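A rough Java version of the ts/te idea, for anyone who wants something concrete. `System.nanoTime()` stands in for the system clock here, and the definitions of margin (period minus execution time) and start error (actual start minus scheduled start) are one plausible reading, not necessarily the exact metrics meant above. `TaskTiming` and the 10 ms period are assumptions for the example.

```java
class TaskTiming {
    /** Execution time of one run: end timestamp minus start timestamp. */
    static long executionNs(long tsNs, long teNs) {
        return teNs - tsNs;
    }

    /** Margin: time left in the period after the task finished; negative means overrun. */
    static long marginNs(long tsNs, long teNs, long periodNs) {
        return periodNs - executionNs(tsNs, teNs);
    }

    /** Scheduling error: how late (positive) or early (negative) the task started. */
    static long startErrorNs(long tsNs, long scheduledStartNs) {
        return tsNs - scheduledStartNs;
    }
}

class TaskTimingDemo {
    public static void main(String[] args) {
        long periodNs = 10_000_000L;          // assumed 10 ms task period
        long ts = System.nanoTime();          // start of task
        double x = 0;
        for (int i = 0; i < 100_000; i++) {   // placeholder for the task code
            x += Math.sqrt(i);
        }
        long te = System.nanoTime();          // end of task
        System.out.println("work result: " + x);
        System.out.println("execution ns: " + TaskTiming.executionNs(ts, te));
        System.out.println("margin ns: " + TaskTiming.marginNs(ts, te, periodNs));
    }
}
```

Logging the worst-case (smallest) margin over a match tells you much more than an averaged CPU percentage does.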

First, we solved the issue Palardy mentions by having a switch so we could enable and disable all our programming in the disabled VI. We also run a ton of code in disabled: every sensor is polled in disabled and we have some autonomous script code running as well.

Secondly - our default project in 2012 and 2013 runs in the neighborhood of 60-70% CPU utilization as read from the driver station. This was true for us on both the 4-slot and 8-slot cRIOs. Usually we saw it closer to 60%. You should not be seeing 95% utilization with a default project.

We saw the ‘loop running slowly’ messages when running the code from the programming computer. We did a couple things that seemed to help. First, we noticed that they completely disappeared when we did a permanent deploy. Perhaps it’s the number of indicators / controls we had in our project. We also started tethering. We were running through our wireless network with all the other computers on it. That occasionally caused a failed deploy and was definitely causing connectivity problems. I assume that was because of latency: we didn’t dig any deeper because it was easier to just tether the robot.

Our 2012 code in teleop rarely went over 85% CPU utilization. However, when vision was enabled, which was only for about 1/4-1/2 second during our lock-on phase, usage spiked to 100%.

We kept a very close eye on it during development. Several times we implemented code and saw it jump substantially, so we went back and restructured things. Our most common trick was to put an equality check in front of VIs so that they did not execute if the incoming data had not changed.
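In a text language the equality trick might look something like the following. It is only a sketch in plain Java: `ChangeGate` is a made-up helper, and the action passed in stands for whatever expensive call (a motor set, a dashboard write) you want to skip when nothing has changed.

```java
import java.util.Objects;
import java.util.function.Consumer;

/** Runs an expensive action only when its input differs from the previous input. */
class ChangeGate<T> {
    private final Consumer<T> action;
    private T last;
    private boolean hasLast = false;

    ChangeGate(Consumer<T> action) {
        this.action = action;
    }

    /** Returns true if the action ran, false if the value was unchanged and it was skipped. */
    boolean offer(T value) {
        if (hasLast && Objects.equals(last, value)) {
            return false;
        }
        last = value;
        hasLast = true;
        action.accept(value);
        return true;
    }
}

class ChangeGateDemo {
    public static void main(String[] args) {
        ChangeGate<Double> motor = new ChangeGate<>(v -> System.out.println("set output " + v));
        double[] commands = {0.0, 0.0, 0.5, 0.5, 0.5, 0.0}; // illustrative command stream
        for (double c : commands) {
            motor.offer(c); // only the changes reach the expensive call
        }
    }
}
```

One thing to watch with this kind of gate: if the thing being driven has a safety timeout and expects regular updates, skipping unchanged writes entirely may not be acceptable, so it fits best on outputs that hold their last value.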

In addition, we implemented Palardy’s ‘improved’ motor set VI’s. That dropped our cpu utilization by a fair amount. Perhaps 6-8%, but I can’t recall exactly anymore.

Finally, we found that our old '09 classmate was no longer adequate for our driver station. We found this accidentally at our first competition. After the first practice match the drivers complained of lag in the control system. We checked the cpu and it looked good, however the timing in the communication graph was shot through the roof. We reimaged clean and nothing changed. We were not using a custom dashboard. We plugged in our development laptop (quad core etc) and the ping spikes disappeared.

That’s all I can think of right now.

We also kept a close eye on our vision processing when it was enabled. When not using vision, we were at around 90% usage; when we turned vision on, we went to 100%. After we saw this, we decided to disable the vision section of the code to leave room for other robot tasks, as we did lose packets every now and then.