Jetson GPU Usage

We’ve bought a Jetson TX1 board to do our vision work this year, and I have a question about its benefits. We’re not actually using the GPU module because we’re not using any of the vision algorithms that it can accelerate. Given this, is there any benefit to doing vision on the Jetson over a cheaper board like the Raspberry Pi 3?

Well, the CPU is a bit more powerful than the Raspberry Pi 3’s, but that’s about it. If you’re not using the GPU, an Odroid XU4 or C2 generally beats the Jetson.

I highly doubt that. Thresholds, desaturates, and blurs are very common operations that can be GPU accelerated.
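To illustrate why these operations suit a GPU, here is a minimal pure-Python sketch of a desaturate followed by a threshold. The tiny image, the unweighted grayscale average, and the cutoff of 128 are all assumptions made for illustration, not anything from a real pipeline. The point is that each output pixel depends only on its own input pixel (a blur reads a small neighborhood, but each output pixel is still independent of the others), which is exactly the kind of work that maps onto many GPU threads.

```python
# Tiny 2x2 BGR image as nested lists; the values are made up for illustration.
image = [
    [(10, 20, 30), (200, 210, 220)],
    [(0, 255, 0), (90, 90, 90)],
]


def desaturate(pixel):
    """Average the three channels (a simple, unweighted grayscale)."""
    b, g, r = pixel
    return (b + g + r) // 3


def threshold(value, cutoff=128):
    """Binary threshold: 255 at or above the cutoff, otherwise 0."""
    return 255 if value >= cutoff else 0


# Each output pixel depends only on the matching input pixel, so every
# pixel could be handled by a separate GPU thread with no coordination.
gray = [[desaturate(p) for p in row] for row in image]
mask = [[threshold(v) for v in row] for row in gray]

print(gray)
print(mask)
```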

Also, we’re writing our CV code in Python, but I can’t find any GPU functions written for Python. Are there wrappers available?

No. The Python bindings just got OpenCL support, but the Jetson doesn’t support OpenCL. You’d have to write your own bindings for the CUDA functions.

Without doing a lot of complicated moving of memory from the CPU to the GPU, the one thing that seems to be offloaded by the built-in CUDA version of OpenCV on the Jetson is decoding the encoded video from the camera, and I believe also encoding video (for example, if you want to create a JPEG image and write it to disk so an MJPG streamer can stream it to the driver station).

It’s not clear how that compares to the Raspberry Pi 3 yet.

We are transitioning from a Jetson TK1 to a Raspberry Pi 3 this year, and one of my notes is to get performance stats; if I get to that I will post them.

  • scott

Here are published specs, for comparison:

Jetson TX1
CPU: ARM Cortex-A57 MPCore (Quad Core) w/NEON technology. Up to 1.73 GHz.
Cache: L1 Cache: 48 KB; L2 Cache: 2 MB
RAM: 4 GB LPDDR4 (1600 MHz)
GPU: NVIDIA Maxwell, 256-core
OpenGL ES Shader Performance: Up to 1024 GFLOPS.

Raspberry Pi 3
CPU: ARM Cortex-A53 (Quad Core). Up to 1.2 GHz.
Cache: L1 Cache: 32 KB; L2 Cache: 512 KB
RAM: 1 GB LPDDR2 (900 MHz)
GPU: Broadcom VideoCore IV. Up to 24 GFLOPS.

My overall takeaway is if you’re not taking advantage of the Jetson TX1 GPU, the performance between the two is comparable, though the Jetson TX1 clearly wins.

The really interesting question is how much of the GPU a team is actually taking advantage of. If the answer is not much, the Raspberry Pi 3’s cost/performance is very compelling.

I’d expect that over the coming years the integration and ease of use between OpenCV and GPUs will improve, and at that point the difference in performance (as realizable by FRC teams w/out deep GPU experience) will become more pronounced in the Jetson TX1’s favor. Perhaps someone w/significant experience can shed more light on the current state of the art and where future OpenCV/GPU integration is headed…

It should be noted that the Pi’s Cortex-A53 DOES support ARM NEON (just like the Jetson), as it’s mandatory per-core in its specification.

Compiling OpenCV with the proper flags, and being mindful of your memory allocations, will give a speed increase on the Pi 3, as OpenCV will use NEON where possible. It won’t be nearly as fast as a GPU-enabled Jetson, but the speed is generally what we call “Fast Enough” for this application, and should work more than fine if your vision pipeline isn’t too heavy.
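One way to check whether an installed OpenCV build was compiled with NEON is to look through the text returned by `cv2.getBuildInformation()`. The sketch below is only a rough check: the helper name `build_mentions_neon` is made up here, and it simply searches for the word NEON rather than parsing the report properly, so treat a positive result as a hint and read the full report to confirm.

```python
def build_mentions_neon(build_info):
    """Return True if any line of an OpenCV build report mentions NEON.

    This is a plain text search, not a real parser, so it is only a hint.
    """
    return any("NEON" in line for line in build_info.splitlines())


if __name__ == "__main__":
    try:
        import cv2  # only needed for the live check below
    except ImportError:
        print("OpenCV is not installed; nothing to check.")
    else:
        report = cv2.getBuildInformation()
        print("Build report mentions NEON:", build_mentions_neon(report))
```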

Most of the time will be spent in operations like thresholding, since at that point you’re limited by memory speed and CPU looping speed. NEON will speed up the CPU looping, and the memory bandwidth is more than enough for a 30 fps stream.
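As a rough back-of-the-envelope check on the memory claim, the sketch below estimates the traffic generated by thresholding alone. The 640x480 resolution, the 3-byte BGR input, and the 1-byte mask output are assumptions; substitute your own camera settings. It only counts one read and one write per frame, so a real pipeline with several stages will move more data than this.

```python
# Assumed camera settings (not from the thread): 640x480 BGR at 30 fps.
WIDTH, HEIGHT = 640, 480
FPS = 30
INPUT_BYTES_PER_PIXEL = 3   # one byte each for B, G, R
OUTPUT_BYTES_PER_PIXEL = 1  # single-channel binary mask

# A threshold pass reads every input pixel once and writes one mask pixel.
bytes_read = WIDTH * HEIGHT * INPUT_BYTES_PER_PIXEL
bytes_written = WIDTH * HEIGHT * OUTPUT_BYTES_PER_PIXEL

traffic_per_second = (bytes_read + bytes_written) * FPS
traffic_mb_per_second = traffic_per_second / 1e6

print(f"Threshold traffic: {traffic_mb_per_second:.1f} MB/s")
```

Under these assumptions the result is a few tens of MB/s, which you can compare against whatever memory bandwidth you measure on your own board.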

An A57 is about 2x faster per clock than the A53. Add in the extra clock speed, memory bandwidth, and cache, and it might be up to 3x. That’s pretty significant - whether it is worth the cost or not is another question, but I’m not sure I’d call it comparable performance.
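The arithmetic behind that estimate can be sketched as follows, using the clock speeds from the published specs earlier in the thread. The 2x per-clock figure is the rough estimate given in this post, not a measurement, and the calculation ignores memory bandwidth and cache, which is where the remaining gap toward 3x would have to come from.

```python
# Rough per-clock advantage of the A57 over the A53 (an estimate, not measured).
PER_CLOCK_SPEEDUP = 2.0

# Maximum clock speeds from the published specs quoted in the thread.
TX1_CLOCK_GHZ = 1.73
PI3_CLOCK_GHZ = 1.2

clock_ratio = TX1_CLOCK_GHZ / PI3_CLOCK_GHZ
estimated_speedup = PER_CLOCK_SPEEDUP * clock_ratio

print(f"Clock ratio: {clock_ratio:.2f}x")
print(f"Estimated CPU speedup: {estimated_speedup:.2f}x")
```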

That being said, I’d be happy to see actual numbers if people have code which they think is representative of what teams typically run.
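For anyone who wants to collect such numbers, here is a minimal timing harness that could be run unchanged on both boards. Everything in it is a stand-in: `measure_fps` and `threshold_frame` are hypothetical names, the frame is synthetic 320x240 grayscale data, and the pure-Python threshold is far slower than OpenCV would be. Swap in your real pipeline function and a real captured frame to get meaningful results.

```python
import time


def measure_fps(pipeline, frame, iterations=20):
    """Run pipeline(frame) repeatedly and return the average frames per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        pipeline(frame)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def threshold_frame(frame, cutoff=128):
    """Stand-in pipeline: binary threshold over a flat grayscale byte buffer."""
    return bytes(255 if value >= cutoff else 0 for value in frame)


# Synthetic 320x240 grayscale frame (76,800 bytes) with repeating 0..255 values.
frame = bytes(range(256)) * 300

fps = measure_fps(threshold_frame, frame)
print(f"Stand-in pipeline: {fps:.1f} frames per second")
```

Timing the whole pipeline call rather than individual operations keeps the comparison close to what a team would actually see on the field.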