Chief Delphi


Jaci 14-11-2016 08:24

30fps Vision Tracking on the RoboRIO without Coprocessor
 
Howdy,

A hot topic in the FRC community is Vision Tracking and Processing. Vision processing is becoming more accessible faster and faster, with community projects, code releases, frameworks and new hardware to play with. There's also a common misconception that the RoboRIO just isn't powerful enough to run a Vision System and still have CPU time to spare for your own program. Let's debunk that.

Here you can find the post I've made on how we can achieve 30fps, 640x480 Vision Processing on the RoboRIO itself without the need for a coprocessor.

In short, we can process 30 frames in about 231ms (7.7ms per frame), which is about 23% of the time available at 30fps (1000ms for 30 frames, or 33.3ms per frame). This leaves processing room for the FRC Network Daemon, as well as your own user code.
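The arithmetic behind those figures can be checked with a few lines. This is only a sketch of the calculation; the function name `frame_budget` is made up for illustration and is not part of the benchmark code.

```python
def frame_budget(total_ms, frames, fps):
    """Return (ms spent per frame, fraction of the per-frame time budget used)."""
    per_frame_ms = total_ms / frames      # measured cost of one frame
    budget_ms = 1000.0 / fps              # time available per frame at this rate
    return per_frame_ms, per_frame_ms / budget_ms

# Figures quoted in the post: 30 frames processed in about 231 ms, target 30 fps.
per_frame, used = frame_budget(total_ms=231.0, frames=30, fps=30.0)
print(f"{per_frame:.1f} ms per frame, {used:.0%} of the budget")
```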

The code used in this investigation is available here

nickbrickmaster 14-11-2016 09:57

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
This is pretty cool. Who's the guy on the R thread that was complaining about optimization? :P

Is this feasible to use in competition? How flexible is it? (I have limited experience with vision. Is a threshold the only thing that you need?)
How much CPU time does a typical robot program take up (as I don't have one in front of me)? What if I'm running 3-4 control loops on the RIO?

Jaci 14-11-2016 10:20

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by nickbrickmaster (Post 1616202)
This is pretty cool. Who's the guy on the R thread that was complaining about optimization? :P

Is this feasible to use in competition? How flexible is it? (I have limited experience with vision. Is a threshold the only thing that you need?)
How much CPU time does a typical robot program take up (as I don't have one in front of me)? What if I'm running 3-4 control loops on the RIO?

1) Feasible, for sure. In a way, it may be a better alternative to a coprocessor for a few reasons, such as freeing up a port on the router, or not having to worry about network latency and bandwidth. I can easily see this being used inside of a competition environment.

2) Flexibility depends on how you want to develop the code further and/or use it. The most expensive operations in Computer Vision are memory allocation and copying. Thresholding is one of the biggest culprits here, and a threshold is present in just about every algorithm. The assembly can be modified to work with different types of thresholding (less than instead of greater than, or both!), or with other algorithms depending on your use case. The code I've provided is just a stub of all the possibilities. Normal OpenCV functions and operations still apply, leaving it about as flexible as any other vision program. The actual copy function itself only takes 2ms, leaving you with about 31ms of the 33ms frame period to do everything else.
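To make the allocation point concrete, here is a minimal sketch of what an in-range threshold computes when it writes into a buffer that was allocated once and is reused for every frame. It is plain Python for illustration, not the NEON assembly from the investigation, and `in_range_into` is a hypothetical name.

```python
def in_range_into(src, lo, hi, dst):
    """Write 255 to dst[i] where lo <= src[i] <= hi, else 0, reusing dst."""
    if len(dst) != len(src):
        raise ValueError("dst must be preallocated to the size of src")
    for i, px in enumerate(src):
        dst[i] = 255 if lo <= px <= hi else 0

WIDTH, HEIGHT = 8, 2
mask = bytearray(WIDTH * HEIGHT)   # allocated once, outside the per-frame loop

# Stand-in for one single-channel frame (for example, an IR image).
frame = bytes([0, 50, 100, 150, 200, 250, 120, 30] * HEIGHT)
in_range_into(frame, 100, 200, mask)   # no new buffer is created per frame
```

Swapping the comparison (only `px <= hi`, or only `px >= lo`) gives the other threshold variants mentioned above.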

3) The CPU usage of a robot program is pretty hard to judge, as most of it depends on how the code is written. I'll take the closest example that I have, and that is ToastC++. Running at a 1000Hz update rate, the main process (which interfaces with WPILib) uses about 20% CPU, and the child process (the actual user control) uses about 2% CPU. This 1000Hz loop is updating 4 motors based on 4 axes of a joystick (although the main process actually updates all allocated motors, digital IO, analog IO and joysticks each loop). In a competition I wouldn't recommend a 1000Hz update rate; something like 200Hz would be more than plenty, as you likely have a lot more stuff going on. If you design your control loops carefully (that is, running them all in a single loop, see this for implementation details), you should easily be able to satisfy your needs without hitting 100% avg CPU. If you're still afraid, thread priorities are your friend. Obviously this depends on a number of factors (what you're doing, whether you're using C++ or Java, etc).
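Running all control loops from a single fixed-rate loop could look roughly like the sketch below. It is not the ToastC++ implementation; `run_control_loops` and `make_loop` are hypothetical names, and the counters stand in for real controller updates.

```python
import time

def run_control_loops(loops, rate_hz, iterations,
                      now=time.monotonic, sleep=time.sleep):
    """Call every function in `loops` once per period, for `iterations` periods."""
    period = 1.0 / rate_hz
    next_deadline = now()
    for _ in range(iterations):
        for step in loops:
            step(period)                 # each controller gets the nominal dt
        next_deadline += period          # fixed schedule, so timing does not drift
        remaining = next_deadline - now()
        if remaining > 0:
            sleep(remaining)             # give the rest of the period back to the CPU

counts = {"drive": 0, "arm": 0, "shooter": 0}

def make_loop(name):
    def step(dt):
        counts[name] += 1                # stand-in for a PID update
    return step

# Three controllers sharing one 200 Hz loop, run for 20 periods (about 0.1 s).
run_control_loops([make_loop(n) for n in counts], rate_hz=200, iterations=20)
```

One loop means one sleep and one wake-up per period instead of one per controller, which is where the CPU saving comes from.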

euhlmann 14-11-2016 10:41

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Pretty cool!
Now if only all of OpenCV could be NEON-optimized :rolleyes:

Or if somebody could teach me what black magic I need to invoke to get OpenCV GPU acceleration on Android :D

Jaci 14-11-2016 10:55

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by euhlmann (Post 1616211)
Pretty cool!
Now if only all of OpenCV could be NEON-optimized :rolleyes:

Or if somebody could teach me what black magic I need to invoke to get OpenCV GPU acceleration on Android :D

OpenCV does have NEON and VFP build options, both of which were enabled during these tests, which is part of the reason cv::inRange executed so quickly.

euhlmann 14-11-2016 11:01

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by Jaci (Post 1616214)
OpenCV does have a NEON and VFP build option, both of which were enabled during these tests, which is part of the reason cv::inRange executed so quickly

Yes, but few things have been NEON-optimized so far

RyanShoff 14-11-2016 11:12

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Have you looked at how much overhead comes from getting 30fps from a USB camera?

Also findContours() should run faster on non-random data.

Jaci 14-11-2016 11:23

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by RyanShoff (Post 1616217)
Have you looked at how much overhead comes from getting 30fps from a USB camera?

Also findContours() should run faster on non-random data.

I don't have a USB camera to test with, and I have to fix my Kinect adapter before I can run this live.

I understand that findContours() will run faster with non-random data; however, I chose random data to provide a worst-case scenario. Using a real image from a Kinect, it runs somewhat faster.

Andrew Schreiber 14-11-2016 11:41

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
5 Attachment(s)
First, pretty awesome write up. Running on board removes a lot of risk associated with reliance on vision processing. The communication step is hard.

Second, I'd be curious how you derived the requirement of 640x480. It seems to me that using a lower resolution image would process faster and the quickest win in this whole process would be to compute what the min image resolution required would be.

I've attached some of the test images 125 produced, which I've downsampled as an example if folks want to play with them. They were taken from 14 feet away, dead straight on, and then scaled using ImageMagick from 1280x960 down to 80x60. While the 80x60 image is just silly, I do believe there are applications where much lower resolutions are just as effective.

It also opens the possibility of using low res images for identifying ROI and then processing just the smaller region in the higher resolutions.
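A rough sketch of that two-stage idea: find the target in a decimated copy, then process only the matching region at full resolution. The helper names, the nearest-neighbour decimation and the one-pixel padding are all assumptions made for illustration.

```python
def downsample(img, factor):
    """Keep every `factor`-th pixel in both directions (nearest-neighbour)."""
    return [row[::factor] for row in img[::factor]]

def bounding_box(img, thresh):
    """Return inclusive (x0, y0, x1, y1) of pixels >= thresh, or None."""
    xs = [x for row in img for x, px in enumerate(row) if px >= thresh]
    ys = [y for y, row in enumerate(img) if any(px >= thresh for px in row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)

def roi_from_low_res(img, factor, thresh):
    """Find the target in a low-res copy and return a padded full-res ROI."""
    box = bounding_box(downsample(img, factor), thresh)
    if box is None:
        return None
    x0, y0, x1, y1 = box
    h, w = len(img), len(img[0])
    # Pad by one low-res pixel per side, since decimation can miss the edges.
    return (max(0, (x0 - 1) * factor), max(0, (y0 - 1) * factor),
            min(w - 1, (x1 + 2) * factor - 1), min(h - 1, (y1 + 2) * factor - 1))

# Synthetic 32x32 image with a bright rectangle at x 10..21, y 12..19.
img = [[255 if 10 <= x <= 21 and 12 <= y <= 19 else 0 for x in range(32)]
       for y in range(32)]
roi = roi_from_low_res(img, factor=4, thresh=128)
x0, y0, x1, y1 = roi
crop = [row[x0:x1 + 1] for row in img[y0:y1 + 1]]      # full-res pixels, ROI only
bx0, by0, bx1, by1 = bounding_box(crop, 128)
target = (bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)      # back to image coordinates
```

Here the precise pass touches a 20x16 crop instead of the whole 32x32 image; the saving grows with the size of the full frame.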

Jared Russell 14-11-2016 11:49

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
This is very cool, though I'm not (yet) convinced that you can get 30fps @ 640x480 with an RGB USB camera using a "conventional" FRC vision algorithm. But now you have me thinking...

Why I think we're still a ways off from RGB webcam-based 30fps @ 640x480: Your Kinect is doing several of the most expensive image processing steps for you in hardware.

With a USB webcam, you need to:

1. Possibly decode the image into a pixel array (many webcams encode their images in formats that aren't friendly to processing).

2. Convert the pixel array into a color space that is favorable for background-lighting-agnostic thresholding (HSV/HSL). This is done once per pixel per channel (3*640*480), and each op is the evaluation of a floating point (or fixed point) linear function, and usually also involves evaluating a decision tree for numerical reasons.

3. Do inRange thresholding on each channel separately (3x as many operations as in your example) and then AND together the outputs into a binary image.

4. Run FindContours, filter, etc... These are usually really cheap, since the input is sparse.

So in order to do this with an RGB webcam, we're talking at least 6x as many operations assuming a color space conversion and per-channel thresholding, and likely more because color space conversion is more expensive than thresholding. Plus possible decoding and USB overhead penalties. Even if we ignore that, we're at 7.7 * 6 = 46.2ms per frame, which would be 15 frames per second at about 69% CPU utilization. Anecdotally, I'd expect another 30+ ms per frame of overhead.
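Working the estimate through in code, under the same assumption of 6x the measured 7.7ms and with no decode or USB overhead included (`webcam_estimate` is a hypothetical helper):

```python
def webcam_estimate(base_ms, op_multiplier, fps):
    """Scale a measured per-frame cost and report the resulting CPU load."""
    per_frame_ms = base_ms * op_multiplier
    max_fps = 1000.0 / per_frame_ms            # ceiling if the CPU did nothing else
    utilization = per_frame_ms * fps / 1000.0  # share of each second spent on vision
    return per_frame_ms, max_fps, utilization

per_frame, max_fps, util = webcam_estimate(base_ms=7.7, op_multiplier=6, fps=15)
print(f"{per_frame:.1f} ms per frame, at most {max_fps:.1f} fps, "
      f"{util:.0%} CPU at 15 fps")
```

With roughly 46ms per frame the ceiling is under 22 fps, so 30 fps is out of reach under these assumptions, and 15 fps already uses about two thirds of the CPU.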

The Kinect is doing all of the decoding for you, does not require a color space conversion, and gives you a single channel image that is already in a form that is appropriate for robust performance in FRC venues. No Step 1, No Step 2, and Step 3 is 1/3 as complex when compared to the above.

However...

Great idea hacking the ASM to use SIMD for inRange. I wonder if you could also write an ASM function to do color space conversion, thresholding, and ANDing in a single function that only touches registers (may require fixed point arithmetic; I'm not sure what the RoboRIO register set looks like). This would add several more ops to your program, and have 3x as many memory reads, but would have the same number of memory writes.
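As a scalar illustration of that fused approach, the sketch below converts each RGB pixel to HSV with integer arithmetic, thresholds all three channels and ANDs the results in a single pass, writing only the one-byte mask. It is not assembly, the hue scale (0 to 359 degrees) and the threshold values are arbitrary choices, and `fused_hsv_mask` is a hypothetical name.

```python
def fused_hsv_mask(rgb, h_range, s_min, v_min):
    """Convert, threshold and AND per pixel; only the 1-byte mask is written."""
    out = bytearray(len(rgb) // 3)
    for i in range(len(out)):
        r, g, b = rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]
        v = max(r, g, b)
        delta = v - min(r, g, b)
        s = 0 if v == 0 else delta * 255 // v          # saturation, 0..255
        if delta == 0:
            h = 0                                       # grey: hue is undefined
        elif v == r:
            h = (60 * (g - b) // delta) % 360
        elif v == g:
            h = 120 + 60 * (b - r) // delta
        else:
            h = 240 + 60 * (r - g) // delta
        ok = h_range[0] <= h <= h_range[1] and s >= s_min and v >= v_min
        out[i] = 255 if ok else 0
    return out

# Pure green, white, red and dark green pixels, interleaved as R, G, B.
pixels = bytes([0, 255, 0,  255, 255, 255,  255, 0, 0,  0, 40, 0])
mask = fused_hsv_mask(pixels, h_range=(100, 140), s_min=100, v_min=100)
```

Memory traffic is three reads and one write per pixel, which matches the trade-off described above: more reads than the single-channel case, the same number of writes.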

Jared Russell 14-11-2016 11:57

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by Andrew Schreiber (Post 1616224)
Second, I'd be curious how you derived the requirement of 640x480. It seems to me that using a lower resolution image would process faster and the quickest win in this whole process would be to compute what the min image resolution required would be.

This is definitely true. The resolution you need is a function of range, target geometry, angle of incidence, camera field of view, the frequency and type of non-target objects that pass the threshold, and required precision. 640x480 has been overkill for all vision challenges to date.
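For the range and field-of-view part of that, a simple pinhole-camera estimate gives the number of pixels a target covers. This sketch assumes the target is centered and facing the camera, and the numbers in the example (target width, lens angle) are hypothetical.

```python
import math

def pixels_on_target(image_width_px, target_width, distance, hfov_deg):
    """Approximate horizontal pixels covered by a target facing the camera."""
    # Width of the scene visible at this distance, in the same units as distance.
    scene_width = 2.0 * distance * math.tan(math.radians(hfov_deg) / 2.0)
    return image_width_px * target_width / scene_width

# Hypothetical numbers: 20 in wide target, 14 ft (168 in) away, 60 degree lens.
px640 = pixels_on_target(640, 20.0, 168.0, 60.0)
px80 = pixels_on_target(80, 20.0, 168.0, 60.0)
print(f"640 wide: {px640:.1f} px on target, 80 wide: {px80:.1f} px on target")
```

Picking the lowest resolution that still leaves enough pixels on the target for the precision you need is the calculation being suggested.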

640x480x30 fps is a convenient benchmark, though, as it is achievable with largely unoptimized code by many forms of coprocessors.

Jaci 14-11-2016 11:58

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by Andrew Schreiber (Post 1616224)
First, pretty awesome write up. Running on board removes a lot of risk associated with reliance on vision processing. The communication step is hard.

Second, I'd be curious how you derived the requirement of 640x480. It seems to me that using a lower resolution image would process faster and the quickest win in this whole process would be to compute what the min image resolution required would be.

I've attached some of the test images 125 produced, which I've downsampled as an example if folks want to play with them. They were taken from 14 feet away, dead straight on, and then scaled using ImageMagick from 1280x960 down to 80x60. While the 80x60 image is just silly, I do believe there are applications where much lower resolutions are just as effective.

It also opens the possibility of using low res images for identifying ROI and then processing just the smaller region in the higher resolutions.

Honestly I used 640x480 as a kind of 'boast' as to how much potential this can hold (that, and it's also the default resolution of a Kinect camera @ 30fps). You can actually downscale this image entirely using the VFP, by using vld1.64 to load into the D registers, and a variation of vst to shift back out to memory, interleaved, discarding the extras, or saving them for later use as you proposed in your last paragraph. This is 'effectively' zero cost to the entire algorithm, as it does it 128 bits at a time.
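The effect of that load, de-interleave and store sequence can be modelled in a few lines: keep every other pixel of every other row and drop the rest. This is only a Python model of the result, not the NEON code, and `halve_resolution` is a hypothetical name.

```python
def halve_resolution(pixels, width, height):
    """Keep the even columns of the even rows of a row-major 8-bit image."""
    out = bytearray()
    for y in range(0, height, 2):
        row = pixels[y * width:(y + 1) * width]
        out += row[::2]        # de-interleave: keep even pixels, discard odd ones
    return bytes(out)

# 4x4 test image holding the values 0..15 in row-major order.
small = halve_resolution(bytes(range(16)), width=4, height=4)
```

The discarded odd pixels could just as well be kept in a second buffer for the later full-resolution pass over a region of interest.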

Andrew Schreiber 14-11-2016 12:05

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by Jared Russell (Post 1616233)
This is definitely true. The resolution you need is a function of range, target geometry, angle of incidence, camera field of view, the frequency and type of non-target objects that pass the threshold, and required precision. 640x480 has been overkill for all vision challenges to date.

640x480x30 fps is a convenient benchmark, though, as it is achievable with largely unoptimized code by many forms of coprocessors.

Quote:

Originally Posted by Jaci (Post 1616234)
Honestly I used 640x480 as a kind of 'boast' as to how much potential this can hold (that, and it's also the default resolution of a Kinect camera @ 30fps). You can actually downscale this image entirely using the VFP, by using vld1.64 to load into the D registers, and a variation of vst to shift back out to memory, interleaved, discarding the extras, or saving them for later use as you proposed in your last paragraph. This is 'effectively' zero cost to the entire algorithm, as it does it 128 bits at a time.

Understood, just wanted to make sure other folks reading the thread didn't get the idea that 640x480 was required.

Jaci 14-11-2016 12:14

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Quote:

Originally Posted by Jared Russell (Post 1616229)
This is very cool, though I'm not (yet) convinced that you can get 30fps @ 640x480 with an RGB USB camera using a "conventional" FRC vision algorithm. But now you have me thinking...

Why I think we're still a ways off from RGB webcam-based 30fps @ 640x480: Your Kinect is doing several of the most expensive image processing steps for you in hardware.

With a USB webcam, you need to:

1. Possibly decode the image into a pixel array (many webcams encode their images in formats that aren't friendly to processing).

This is certainly true. As I mentioned, I don't really have a benchmark to gather data from a conventional USB webcam, so I can't really provide input to this part.

Quote:

Originally Posted by Jared Russell (Post 1616229)
2. Convert the pixel array into a color space that is favorable for background-lighting-agnostic thresholding (HSV/HSL). This is done once per pixel per channel (3*640*480), and each op is the evaluation of a floating point (or fixed point) linear function, and usually also involves evaluating a decision tree for numerical reasons.

3. Do inRange thresholding on each channel separately (3x as many operations as in your example) and then AND together the outputs into a binary image.

These can actually both be turned into 1 set of instructions if your use case is target-finding.
Most robots use some sort of light source to find the retro-reflective target. Most typically this is the green ring (for our Kinect, it's the IR projector). If your image is already in the RGB form, you can actually just isolate the Green channel (which you can do with SIMD extremely simply, vld3.8) and proceed onward. Leaving the R and B channels in D registers without writing them to RAM will save a lot of time here, and then your thresholding function will only take one set of data.
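In plain Python, the green-only shortcut looks like the sketch below: take every third byte of an interleaved RGB buffer and threshold just that channel. The function name and threshold values are made up for illustration.

```python
def green_mask(rgb, lo, hi):
    """Threshold only the G channel of an interleaved R, G, B byte buffer."""
    green = rgb[1::3]          # de-interleave: bytes 1, 4, 7, ... are green
    return bytearray(255 if lo <= g <= hi else 0 for g in green)

# Three pixels: bright green, white, dim green.
pixels = bytes([10, 200, 10,  250, 250, 250,  0, 90, 0])
mask = green_mask(pixels, lo=150, hi=255)
```

Note that the white pixel passes as well, since its green value is also high. That is the price of skipping the other two channels, so this works best when the exposure is low enough that only the lit target is bright.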

Something similar can be done with HSV/HSL, however this will require a bit more math on the assembly side of things to isolate the Lightness for a specific hue or saturation. Nonetheless, it's still faster than calculating for all 3 channels.

Quote:

Originally Posted by Jared Russell (Post 1616229)
However...

Great idea hacking the ASM to use SIMD for inRange. I wonder if you could also write an ASM function to do color space conversion, thresholding, and ANDing in a single function that only touches registers (may require fixed point arithmetic; I'm not sure what the RoboRIO register set looks like). This would add several more ops to your program, and have 3x as many memory reads, but would have the same number of memory writes.

I believe it would be possible to do HS{L,V}/RGB color space conversion with SIMD if you're willing to take on the challenge. I may give this a try when I have some time to burn.
Putting them all into one set of instructions dealing only with the NEON registers is entirely possible. In fact, the thresholding and ANDing are already grouped together, operating on the Q registers. I can confirm that the ARM NEON instruction set does include fixed-point arithmetic, although it requires the vcvt instruction to convert the values to floating-point first, which is also done by the NEON system.

NotInControl 14-11-2016 14:19

Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
 
Interesting work.

We took a look at using the RoboRIO for Vision Processing back in 2014, during the alpha test of the new hardware. We tried IP cameras and webcams using the same vision detection algorithm to find hot goals as implemented on our 2014 robot.

This was an OpenCV implementation in C++, compiled with NEON enabled and running on the RoboRIO.

Take a look at our data at the link below, under Vision, in the IP camera test.

We would need to dust it off, but for our complete end-to-end solution I think we could only get 20fps at 320x240 on the RIO.

http://controls.team2168.org/


Over the past few years we have grown to develop a decoupled, off-board vision system, for various reasons we deemed beneficial, but I am glad to see progress in this area.


All times are GMT -5.
