12-01-2015, 14:41
Greg McKaskle
Registered User
FRC #2468 (Team NI & Appreciate)
 
Join Date: Apr 2008
Rookie Year: 2008
Location: Austin, TX
Posts: 4,753
Re: Utilizing Both RoboRIO Cores

Backing way up to the original question, paraphrased as -- How can my team's code take advantage of a multi-core target?

First off, this is no different from any other modern OS and processor. If you know how to do this on a Windows or Linux desktop, you know how to do it on your roboRIO. And once you learn to do it on the roboRIO, you will know how to do it elsewhere.

If you do nothing in your code, or intentionally write your code to run in a single thread, the OS will still run other processes' code simultaneously with yours. So both cores will get some use, but nowhere near optimal use: your code will only ever run on one of the cores at a time.
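By the way, you can ask the OS how many cores it reports using standard Java; nothing here is roboRIO-specific:

[code]
// Standard Java: ask the OS how many cores the JVM can see.
public class CoreCount {
    public static void main(String[] args) {
        System.out.println("Available cores: "
                + Runtime.getRuntime().availableProcessors());
        // On the roboRIO's dual-core ARM this should print 2.
    }
}
[/code]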

In order for your code to run on more than one core at once, the process executing your code needs to expose multiple threads of execution to the OS. Within your process, you allocate these finer-grained, lighter-weight units of execution called threads. The OS scheduler can then schedule the pool of threads contributed by the various processes and utilize both cores.
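In Java, for example, allocating a thread amounts to constructing and starting an object. A minimal sketch using nothing but standard Java:

[code]
// A minimal sketch: the process exposes a second thread of execution
// to the OS, so the scheduler is free to run it on the other core
// while the main thread keeps working.
public class TwoThreads {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long sum = 0;
            for (long i = 0; i < 1_000_000_000L; i++) {
                sum += i;                   // busy work for one core
            }
            System.out.println("worker done: " + sum);
        });
        worker.start();                     // expose the thread to the OS

        long product = 1;
        for (int i = 1; i <= 20; i++) {
            product *= i;                   // main thread works in parallel
        }
        System.out.println("main done: " + product);

        worker.join();                      // wait for the worker to finish
    }
}
[/code]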

For C++ and Java, you can use tasks or allocate threads. I do not think that commands currently use more than one thread, but PID subsystems do. So WPILib code in these languages will automatically get some benefit from multiple cores. As with PID, WPILib can do some things automatically, and more of that will likely happen over time. Other libraries such as OpenCV may also use threads internally to take advantage of multiple cores.
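To sketch what a team-written background thread might look like, here is a worker that could wrap something like image processing. processFrame() is a hypothetical placeholder, not a WPILib or OpenCV call:

[code]
// Sketch of a background worker in robot code. processFrame() stands
// in for heavy work such as image processing.
public class VisionWorker implements Runnable {
    private volatile boolean running = true;
    private volatile double latestResult = 0.0;

    @Override
    public void run() {
        while (running) {
            latestResult = processFrame();  // heavy work off the main loop
            try {
                Thread.sleep(20);           // sleep so other threads get time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public double getLatestResult() { return latestResult; }  // read from main loop
    public void stop() { running = false; }

    private double processFrame() {
        return Math.random();               // placeholder for real processing
    }
}
[/code]

Started from robot init code with something like new Thread(new VisionWorker(), "vision").start(), this runs on whichever core is free while the main loop does its thing.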

Now for LabVIEW:
The LabVIEW runtime automatically creates a pool of threads of varying priorities and sub-schedules your code onto those threads. The white paper covers some of the details, and I'll try to summarize the implementation below, but the upshot is that the compiler and runtime take advantage of parallel expressions automatically. LV doesn't expose threads as a user-level concept because dataflow languages can do this quite well without one. Independent code executes on as many cores as you have. If you want to know how, read the white paper or the details section below.

LabVIEW Implementation Details:
LabVIEW is a compiled language. Whether deployed or running in the debugger, the VI is always compiled first. A VI resembles a function, but in many respects, it is more. The VI is analyzed for asynchronous operations and parallel operations. Nontrivial parallel operations are compiled into what we call "clumps". As an example, if a VI branches the parameters into two or three parallel loops that independently compute values on the inputs, the compiler will typically generate two or three independent clumps, one for each asynchronous loop. The VI function internally has smaller elements -- clumps -- that it schedules as it executes.

When a VI begins execution, one or more clumps are scheduled for execution, and as they complete, they schedule downstream clumps that depend on their output data. This continues until the system is idle. Since clumps can call other VIs, which expose more clumps, the system is often executing loops or loop contents from numerous VIs.
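If it helps to see the idea in a textual language, here is a rough Java analogy of that data-driven scheduling using CompletableFuture. This is just an illustration of the concept, not how LabVIEW is implemented:

[code]
import java.util.concurrent.CompletableFuture;

// Each "clump" runs as soon as its input data is ready, and the two
// independent middle stages are free to run on separate cores.
public class DataflowAnalogy {
    public static void main(String[] args) {
        CompletableFuture<Integer> input =
                CompletableFuture.supplyAsync(() -> 21);

        // Two independent downstream "clumps", eligible to run in
        // parallel once the input is available.
        CompletableFuture<Integer> doubled = input.thenApplyAsync(x -> x * 2);
        CompletableFuture<Integer> squared = input.thenApplyAsync(x -> x * x);

        // A final "clump" that is scheduled once both inputs arrive.
        CompletableFuture<Integer> sum =
                doubled.thenCombine(squared, Integer::sum);

        System.out.println("result: " + sum.join());  // 42 + 441 = 483
    }
}
[/code]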

The default template for LabVIEW has a periodic tasks VI with parallel loops that wake up on a timer, do work, and go back to sleep. Each loop is its own clump. The vision VI is similarly its own loop/clump. The Network Tables Server -- another collection of parallel clumps. Start Communication is another. So by default, there are probably about a dozen clumps sharing the two cores -- they also take naps.
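For a Java-flavored picture of that periodic-tasks pattern, here is a sketch with made-up periods and task names; in LabVIEW, the parallel loops on the diagram give you this for free:

[code]
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Several loops that wake on a timer, do a little work, and go back
// to sleep, leaving the cores idle in between.
public class PeriodicTasks {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

        scheduler.scheduleAtFixedRate(
                () -> System.out.println("10 ms loop: drive control"),
                0, 10, TimeUnit.MILLISECONDS);
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("100 ms loop: dashboard update"),
                0, 100, TimeUnit.MILLISECONDS);

        TimeUnit.SECONDS.sleep(1);  // let the loops run briefly
        scheduler.shutdown();
    }
}
[/code]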

By the way, this is also how LV VIs execute on the cRIO. Of course, with only one core, only one clump can actually be running at a time, but there are still benefits, since timed and I/O operations can wait in parallel with other code's execution.

There is a tool called the Execution Trace Toolkit that will show all active clumps over time and how they interrupt each other based on priority and other mechanisms. It produces a ton of data to look at, but it is geeky fun to see into the runtime and watch how it executed your code at such a fine level of detail.

Feel free to ask other questions.
Greg McKaskle