PhotonVision 2021 Official Release

After months of hard work and beta testing, the PhotonVision team is excited to announce the first full release of PhotonVision! With a focus on robustness and usability, this feature-packed release is a big step towards making vision processing accessible to all teams by improving ease of use and decreasing cost. With everything from GPU acceleration (on supported devices), a Raspberry Pi image, and a custom vendor dependency, to a reimagined web dashboard and vendor support on devices like the Gloworm, PhotonVision is ready to take on the 2021 game. Our most important features are detailed below:

GPU Acceleration

  • PhotonVision can offload large portions of processing to the GPU. This allows it to run 2x faster than similar software running on the same Pi 3/Pi Compute Module 3 hardware. This release unlocks the following performance improvements on your Gloworm or Pi 3:
Resolution  | PhotonVision on Gloworm | Limelight
------------|-------------------------|------------
320 x 240   | 90 FPS                  | 90 FPS
640 x 480   | 85 FPS                  | Unsupported
960 x 720   | 45 FPS                  | 22 FPS
1920 x 1080 | 15 FPS                  | Unsupported

Note: to consistently get the above performance at resolutions above 480p, you’ll need well-tuned HSV thresholds. Also, with the Pi Camera v2, PhotonVision can reach up to 120 FPS at 320x240.

PhotonLib

  • Vendor library makes interfacing with your camera as simple as creating a PhotonCamera and getting the latest result – no more remembering confusing NetworkTables keys
  • SimPhotonCamera allows teams to accurately simulate and unit test the vision component of their robot code with the WPILib simulator
  • Complete code examples and example projects make getting up and running easy
  • PhotonUtils, included in the vendor library, provides helpers for common tasks like distance estimation, as well as utilities for estimating target and robot pose without SolvePNP (see the sketch below)
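
To make this concrete, here is a minimal robot-side sketch using PhotonLib. Treat it as illustrative rather than canonical: the camera name ("gloworm") and the mounting geometry constants are placeholder assumptions, while PhotonCamera, getLatestResult(), and PhotonUtils.calculateDistanceToTargetMeters() come from the vendor library described above.

```java
import org.photonvision.PhotonCamera;
import org.photonvision.PhotonUtils;

public class VisionExample {
    // "gloworm" is a placeholder; use the camera name configured in the PhotonVision UI.
    private final PhotonCamera camera = new PhotonCamera("gloworm");

    // Example mounting geometry; these values are assumptions, not recommendations.
    private static final double CAMERA_HEIGHT_METERS = 0.6;
    private static final double TARGET_HEIGHT_METERS = 2.5;
    private static final double CAMERA_PITCH_RADIANS = Math.toRadians(20.0);

    public void updateVision() {
        var result = camera.getLatestResult();
        if (result.hasTargets()) {
            var target = result.getBestTarget();
            double yawDegrees = target.getYaw();

            // Distance estimate from the target's pitch angle, no SolvePNP required.
            double rangeMeters = PhotonUtils.calculateDistanceToTargetMeters(
                    CAMERA_HEIGHT_METERS,
                    TARGET_HEIGHT_METERS,
                    CAMERA_PITCH_RADIANS,
                    Math.toRadians(target.getPitch()));

            // Feed yawDegrees and rangeMeters into aiming or drive code here.
        }
    }
}
```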

Vendor Support

  • Vendors like Gloworm have images that “just work” on their devices with no user configuration needed
  • Raspberry Pi-based devices support LED control over hardware PWM

PhotonVision Usability Improvements

  • Comprehensive documentation on installation, pipeline tuning, and robot code (with full examples for common uses)
  • A premade Raspberry Pi image makes setup fast and simple; just download the image and flash it onto an SD card
  • Offline access to documentation and code examples means easy access on-field
  • Completely reimagined user interface allows you to view more information at once; every element has a helpful ‘hint’ that shows up when you hover over it
  • Take snapshots on-field and view them later
  • Thresholded and color streams can be viewed simultaneously
  • Settings can be exported to a zip backup and later imported
  • Key metrics such as memory usage and CPU utilization can be viewed from the web interface
  • Test mode (--test-mode) makes trying out PhotonVision as simple as running a JAR file (no camera required)

The PhotonVision team is excited to bring a new level of accessibility and ease of use to FRC vision processing. If you have a Raspberry Pi lying around, we encourage you to flash the image onto it and try out PhotonVision. If your team is looking for a more “plug and play” experience, then you should check out the Gloworm, which can run PhotonVision. And if you just want to follow along (or contribute to the project), we’d love for you to join our Discord!

–The PhotonVision Team


If this is your first time playing with computer vision in FRC, we’ve got you covered with complete pipeline tuning docs, which cover everything from HSV filtering and target grouping to camera calibration.



Those FPS numbers are spicy. Do you have pipeline latency measurements as well, and how does that compare to the camera-sensor-to-start-of-pipeline latency?


Quick Pipeline Latency samples I just ran on a Pi3b and a v1 camera:

Resolution  | 3D (SolvePNP) | 2D
------------|---------------|-------
320 x 240   | 4 ms          | 2 ms
640 x 480   | Untested      | 5 ms
960 x 720   | Untested      | 10 ms
1920 x 1080 | 25 ms         | 21 ms

Still in the works: doing a full, end-to-end “photons to NT target” latency checkout.


Gloworm image available here: Release PhotonVision v2021.1.3 (Gloworm 023434d) · gloworm-vision/pi-gen · GitHub


Sorry for the bump, but we would love to use Photon as a coprocessor for our VMX-Pis. I got it working (using static addressing) for Java. However, my team is beginning to explore C++ (I know that we have been focusing on Java, but our programmers want to try out C++, and this is a wonderful way to do so).

However, the VMX-Pi only supports the FRC 2020 library. Is there either a way to install the 2020 version of photon (at least on the robot side), or a way to install the Units/Time dependency in the 2020 code?

The issue I am trying to solve is that our robot code will not build because of the missing time dependency required by the Photon header.


My students are thinking about upgrading our vision system, and PhotonVision is obviously a good choice. However, it got me thinking that I don’t fully understand your speed w.r.t. using the GPU.

We use an ODROID XU4, which supports OpenCL. When I have tested using OpenCV with and without turning on OpenCL, it took me a while to understand what I was seeing:

  • If I tested single operations (such as convertColor() or inRange()), using the GPU was a big win.
  • However, when I put it all together (convert through to findContours), using the GPU or not was pretty much a wash (no improvement).

I hypothesized (with no actual proof) that the overhead of moving the image in and out of the GPU was canceling out the speedup of the actual image processing. Note we are using WPILib cscore as the server “backbone”, so some of the operations which might have run on the GPU did not. Also, OpenCV findContours() is not GPU compatible, so the (binary) image needs to be pulled out of the GPU. For instance, I could see it being a big win if the raw camera JPEG were read directly into the GPU (or moved there as the first step), decoded there, and the rest of the processing continued inside the GPU.

So, for the PhotonVision people:

  • Do you read the image directly into the GPU and keep it there?
  • I know the RPi 3 GPU code is pretty specialized, but is that code modular? If someone managed to write OpenCL routines, could those be used on other platforms?
  • Anything else I am missing?

Thanks!!

Oh and one other: if you use a USB camera, are the times comparable to what is quoted? Or does the RPi Camera get a speed boost from DMA into the GPU?

@pietroglyph is better placed to answer on the specific implementation, its boundary diagram, and the “why”. But, to start the answer, the GPU driver’s code is available here: https://github.com/PhotonVision/photon-picam-driver

IMO, using a MIPI/CSI-2 camera is a huge gain over anything done with a USB webcam, regardless of GPU involvement. Other than the PS-Eye, all webcams I’ve dealt with were optimized for web streaming, where the difference between 5ms and 200ms of frame latency is negligible. Using the lower-level interface (and a more general-purpose camera) greatly improves the latency numbers.

I know the PhotonVision GPU acceleration is only enabled for MIPI/CSI-2 camera types (it’s not possible to use it with USB cameras). My understanding is that a big chunk of its gain comes from not having to copy arrays of full pixel data around (at least the thresholding is done on the GPU, meaning the output is an array of binary pixels, not RGB). I’d imagine that doing this with a USB camera wouldn’t see much improvement, for some of the reasons you mentioned.

There are three reasons the Pi 3 GPU acceleration we have is faster than pretty much anything else on a comparable device:

  1. As @gerthworm pointed out, both available Pi Camera sensors, MIPI CSI, and the image processing pipeline on the Pi VideoCore can operate at framerates that are much higher than most webcams, which are designed for video conferencing. Just compare the available video modes (and level of control over those video modes) to what you get on consumer-grade webcams.
  2. As you pointed out, the HSV conversion and thresholding can be done very easily and efficiently on the GPU—for reference, our implementation of this in GLSL, which runs as an OpenGL fragment shader, is 35 lines and runs much faster than the OpenCV CPU implementation does on the Pi 3. Because this is the slowest part of our pipeline after finding contours, this makes a big difference.
  3. As you also guessed, none of the GPU acceleration pans out if you have to e.g. copy data from a USB webcam into the GPU, process it there, and copy it back off. We actually have an alternative GPU implementation that does this and I’ve run it on the Pi 3 and 4 and it’s much slower than the CPU-only alternative. The only reason what we’re doing is feasible is because:
    a. We only make a single copy, and it’s off of the GPU. Data from the camera streams over CSI after some minor on-camera-module processing, and it is read into GPU memory and post-processed and de-Bayered. It is then in a form we can read. At this point the data is usually copied off the GPU, but in our case we keep it there and it can be used directly (with ~no copying) on the GPU by our OpenGL fragment shader. Finally, once we’re done converting the color and thresholding, our GPU memory buffer contains a thresholded image.
    b. That copy off the GPU is very fast. This is for three reasons: 1) The Pi has GPU and CPU memory in the same place physically—it is the same memory and is just split (you can actually configure this split.) This means that we can, using the VideoCore shared memory APIs (which are only available on the Pi 3), map GPU memory into our program’s virtual address space and access it directly with almost the same performance characteristics as we would have accessing CPU memory. 2) We use SIMD instructions to copy the RGBA (the only supported texture format for VCSM) off the GPU into a single channel buffer (remember: this is basically just a binary image of “thresholded” or not.) 3) We triple-buffer the GPU-side processing so that the GPU can be processing an image in one buffer at the same time we’re copying from one or two other buffers.

It is possible you could replicate these things on another non-Pi device if it has an architecture that supports it and exposed APIs that enable you to put images right into the GPU and efficiently copy the processed result off. Very fast GPU<->CPU DMA would maybe work too (the Pi actually has this, but it doesn’t work out well for our use case because of cache coherence issues that we can’t work around without a performance loss). Our Pi Camera GPU acceleration code is very much not modular. The fragment shader and maybe some of the buffering and copying code could be reused elsewhere. Porting the camera shader to OpenCL would be difficult because of the camera handling and shared memory bits—even if we could make these things work with OpenCL it wouldn’t afford us much modularity, because the camera handling uses MMAL and the shared memory stuff uses the VCSM API, which are both Pi/Broadcom specific.


Wow, thanks for the details!

One “last” question, and I guess this is more theoretical, since I don’t have the time (or probably knowledge) to act on it:

If there were a different platform, e.g. the Jetson Nano, where one could implement useful (custom) GPU acceleration, would that be easy to combine with the rest of the PhotonVision functionality (web config, etc.)?

That’s a good question. To propose a path:

  1. Rearchitect the picam-specific code to have multiple drivers (Pi+GPU, Jetson Nano, or whatever else) with the same boundary diagram and JNI.
  2. Update photon core to support a generic GPU-accelerated camera, of which picam is one specific type.
  3. Add logic to have a priority and autodetect available GPU acceleration options for the current hardware, and select appropriately. This probably requires a bit of tear-up to the existing camera selection/matching process.

1 is probably easy to rearchitect, but difficult when it comes to a) getting the guts of the GPU interaction correct and b) building for multiple targets in a clean, CI-friendly way. 2 would be mostly a rename operation, probably not too bad. 3 could get a little fuzzy, but also wouldn’t be the worst.

An alternate hack method: swap the guts of the picam driver with whatever a jetson needs, and ignore the word “pi” everywhere in the code and UI. Hack the camera selection to only work with your hardware assumptions.

TL;DR: my opinion is the hard part would be supporting the specific GPU - the rest of photon should be flexible enough to integrate without massive tearup.

Sounds good, but as I said, I don’t think this is something I would take on (yet), so prioritize it as you wish. I mentioned the Jetson because it has a MIPI/CSI interface and I also vaguely remember seeing a reference about reading a USB camera directly into GPU memory. The RPi3 has a pretty wimpy CPU, so the Jetson (while bigger) would be a better general purpose processor (eg using 1 CSI and 1 USB camera).

Sure, gotcha. Agreed, if you need to use a USB camera for some reason, being able to have that at a lower latency (presumably that’s what the GPU achieves) could be a help.

One thing to consider - I agree the Jetson in general has more horsepower than a Pi3. But how much horsepower does an average FRC team actually need?

To put a sample point out there, here’s the latest build of Photon running on a Pi3 in my basement. It’s got a Pi camera and a Logitech webcam hooked up, streaming the webcam simultaneously with blob target detection on the Pi cam at 960x720 at ~40-50 FPS. Including the GPU acceleration, we’re hitting the framerate of the camera, with ~10ms of processing latency on average.

My punchline for teams investigating vision processing: don’t discount the Pi just because its specs aren’t quite as good as other options. It packs a punch, and (IMO) is by far the best bang for the buck today.


Well, what we have been using for a few years is a webcam (a nice Logitech C930e, which has a wider FOV than most) and an ODROID XU4. We use SolvePnP for targeting, so we run at 848x480 and can get close to 30 FPS. When I tested the RPi3, the GPU was unusable by OpenCV and the CPU was about 1/3 the speed, so I was getting 5-10 FPS (I don’t remember the exact number). So, if you need to process bigger images from a USB camera, the RPi3 is just not up to it. (An RPi4 might work, but then you don’t get the GPU support.)

Also, the Jetson Nano with 2GB is now only $60, so on price/performance it is a much better deal than an RPi (3 or 4). But it is definitely bulkier, so you would probably not want to mount it somewhere weight matters.

BTW, has anyone out there experimented with better or longer cables for the CSI cameras? Those ribbon cables would be terrible to run compared to a nice round USB cable. I know most solutions combine the camera and computer in one package (Gloworm, Limelight), but the RPi camera is tiny, and sometimes it would be great to be able to mount it at a distance from the “bulky” computer.

Adafruit sells a 2 meter one, but warns:

Please note, we did test this length cable with our Pi Model B/B+ and a Pi Camera and it worked great but 2 meters is really long for this kind of camera protocol, so if you have a very electrically noisy environment (inside a tesla coil?) you may have corrupted images.

It also looks like people have done adapter boards to other types of cables (8p8c, HDMI) over the years, but I’m not finding anything readily for sale with a quick search.

Hi gerthworm,

Do you know how the processing latency is measured? My experience with the rpi cam is that it takes ~5ms to transfer the frame from the camera to the computer. Is this counted in the 10ms?

Larry

EDIT: for a 720p frame


Hey Larry, I’ll need Declan to help confirm. But I believe the answer is “yes, mostly”.

Based on this line in the driver implementation, the latency duration starts counting from the mmal driver’s “pts” timestamp.

A more in-depth discussion of the PTS timestamp can be found here, but from what I can tell the latency should include any time to transfer image data from the camera chip over the CSI2 bus to the GPU.

More info on the mmal driver can be found here.
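
For what it’s worth, PhotonVision reports this measured latency with each pipeline result, so robot code can back out an estimate of when the frame was captured. A minimal sketch, assuming PhotonLib’s getLatencyMillis() accessor and WPILib’s FPGA timestamp (the camera name is a placeholder):

```java
import org.photonvision.PhotonCamera;
import edu.wpi.first.wpilibj.Timer;

public class LatencyExample {
    // Placeholder camera name; match the name set in the PhotonVision UI.
    private final PhotonCamera camera = new PhotonCamera("gloworm");

    public void updateVision() {
        var result = camera.getLatestResult();
        if (result.hasTargets()) {
            // The reported latency starts at the PTS timestamp (per the discussion above),
            // so it should include most of the sensor-to-pipeline transfer time.
            double latencySeconds = result.getLatencyMillis() / 1000.0;
            double captureTimestamp = Timer.getFPGATimestamp() - latencySeconds;

            // Use captureTimestamp to latency-compensate pose estimates or aiming.
        }
    }
}
```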

Yes actually. I’ve had a mental ticket for doing just that for a while now. With CUDA and native zero-copy memory support it’d be very very fast. I use these same techniques in vision pipelines at work, and in some cases have seen ~400x speedups when implementing CUDA on top of the native zero-copy.

As for the PhotonVision integration side, it’d take some special casing in a spot or two just like we do for the PiCam, as well as some JNI to interface with a C++ “driver” that would handle the image grab and pre-processing. No more complex than the PiCam driver, and if anything, less so.
