30fps Vision Tracking on the RoboRIO without Coprocessor
Howdy,
A hot topic surrounding the FRC community is Vision Tracking and Processing. Faster and faster, vision processing is becoming more accessible, with community projects, code releases, frameworks and new hardware to play with. There's also a common misconception that the RoboRIO just isn't powerful enough to run a Vision System, with CPU time to spare for your own program. Let's debunk that. Here you can find the post I've made on how we can achieve 30fps, 640x480 Vision Processing on the RoboRIO itself without the need for a coprocessor. In short, we can process 30 frames in about 231ms (7.7ms per frame), which is about 23% of the 30fps boundary. This leaves processing room for the FRC Network Daemon, as well as your own user code. The code used in this investigation is available here |
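As a quick check of the figures quoted above, here is a minimal sketch of the arithmetic. The function names are illustrative only and do not come from the linked code.

```cpp
#include <cassert>

// Average time per frame, in milliseconds, for a batch of frames.
double ms_per_frame(double total_ms, int frames) {
    assert(frames > 0);
    return total_ms / frames;
}

// Fraction of each second spent on vision when running at `fps`
// with the given per-frame cost in milliseconds.
double budget_used(double per_frame_ms, double fps) {
    assert(per_frame_ms >= 0.0 && fps > 0.0);
    return per_frame_ms * fps / 1000.0;
}
```

With 231ms for 30 frames, `ms_per_frame` gives roughly 7.7ms, and `budget_used` at 30fps comes out at roughly 0.23, which matches the 23% figure.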
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
This is pretty cool. Who's the guy on the R thread that was complaining about optimization? :P
Is this feasible to use in competition? How flexible is it? (I have limited experience with vision. Is a threshold the only thing that you need?) How much CPU time does a typical robot program take up (as I don't have one in front of me?) What if I'm running 3-4 control loops on the RIO? |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
2) Flexibility depends on how you want to develop the code further and/or use it. The most expensive operations in Computer Vision are memory allocations and copying. Thresholding is one of the biggest culprits of this, and a threshold is present in just about every algorithm. The assembly can be modified to work on different types of thresholding (less than instead of greater than, or both!), or on other algorithms depending on your use case. The code I've provided is just a stub of all the possibilities. Normal OpenCV functions and operations still apply, leaving it about as flexible as any other vision program. The actual copy function itself only takes 2ms, leaving you with 31ms per frame to do everything else.
3) The CPU usage of a robot program is pretty hard to judge, as most of it depends on how the code is written. I'll take the closest example that I have, and that is ToastC++. Running at a 1000Hz update rate, the main process (which interfaces with WPILib) uses about 20% CPU, and the child process (the actual user control) uses about 2% CPU. This 1000Hz loop is updating 4 motors based on 4 axes of a joystick (although the main process actually updates all allocated motors, digital IO, analog IO and joysticks each loop). In a competition I wouldn't recommend a 1000Hz update rate; something like 200Hz would be more than enough, as you likely have a lot more going on. If you design your control loops carefully (that is, running them all in a single loop, see this for implementation details), you should easily be able to meet your needs without hitting 100% average CPU. If you're still afraid, thread priorities are your friend. Obviously this depends on a number of factors (what you're doing, whether you're using C++ or Java, etc) |
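To illustrate the "all control loops in a single loop" idea from point 3, here is a rough sketch in plain C++. It is not taken from ToastC++ or WPILib; `run_control_loop` and the callback signature are assumptions made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Runs every registered control step once per period, all in one thread.
// Each step receives the nominal period in seconds.
// Returns the number of iterations completed.
int run_control_loop(const std::vector<std::function<void(double)>>& steps,
                     std::chrono::microseconds period, int iterations) {
    assert(period.count() > 0 && iterations >= 0);
    using clock = std::chrono::steady_clock;
    const double dt = std::chrono::duration<double>(period).count();
    auto next = clock::now();
    int done = 0;
    for (; done < iterations; ++done) {
        for (const auto& step : steps) step(dt);  // all loops share one tick
        next += period;
        std::this_thread::sleep_until(next);      // absolute deadline avoids drift
    }
    return done;
}
```

A 200Hz loop would pass `std::chrono::microseconds(5000)` as the period. Sleeping until an absolute deadline, rather than for a fixed duration, keeps the rate steady even when the steps take a variable amount of time.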
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Pretty cool!
Now if only all of OpenCV could be NEON-optimized :rolleyes: Or if somebody could teach me what black magic I need to invoke to get OpenCV GPU acceleration on Android :D |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Have you looked at how much overhead comes from getting 30fps from a USB camera?
Also findContours() should run faster on non-random data. |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
I understand the findContours() method will run faster with non-random data, however I chose random data to provide a worst-case scenario. Using a real image from a Kinect, the speed is somewhat faster. |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
5 Attachment(s)
First, pretty awesome write up. Running on board removes a lot of risk associated with reliance on vision processing. The communication step is hard.
Second, I'd be curious how you derived the requirement of 640x480. It seems to me that a lower resolution image would process faster, and the quickest win in this whole process would be to compute the minimum image resolution required. I've attached some of the test images 125 produced, which I've downsampled, as examples if folks want to play with them. They were taken from 14 feet away, dead straight on, and then scaled using ImageMagick from 1280x960 down to 80x60. While the 80x60 image is just silly, I do believe there are applications where much lower resolutions are just as effective. It also opens the possibility of using low-res images for identifying the ROI and then processing just that smaller region at the higher resolution. |
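The "find the ROI at low resolution, then process only that region at full resolution" idea could look something like the sketch below. It is an illustration only: it samples every `step`-th pixel as a cheap stand-in for a properly downsampled image, works on a single-channel buffer, and the names `Roi` and `coarse_roi` are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Roi { int x0, y0, x1, y1; bool found; };  // x1/y1 are exclusive

// Scan every `step`-th pixel of a width x height grayscale image and return
// the bounding box, in full-resolution coordinates, of the samples brighter
// than `thresh`, padded by one step on each side.
Roi coarse_roi(const std::vector<std::uint8_t>& img, int width, int height,
               int step, std::uint8_t thresh) {
    assert(step > 0 && width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    Roi r{width, height, 0, 0, false};
    for (int y = 0; y < height; y += step) {
        for (int x = 0; x < width; x += step) {
            if (img[static_cast<std::size_t>(y) * width + x] > thresh) {
                r.x0 = std::min(r.x0, x);
                r.y0 = std::min(r.y0, y);
                r.x1 = std::max(r.x1, x + 1);
                r.y1 = std::max(r.y1, y + 1);
                r.found = true;
            }
        }
    }
    if (!r.found) return Roi{0, 0, 0, 0, false};
    // Pad so target edges that fall between samples are still inside the box.
    r.x0 = std::max(0, r.x0 - step);
    r.y0 = std::max(0, r.y0 - step);
    r.x1 = std::min(width, r.x1 + step);
    r.y1 = std::min(height, r.y1 + step);
    return r;
}
```

With `step = 8` on a 640x480 image, the coarse pass touches the same number of pixels as an 80x60 image. A target smaller than one step in either direction can be missed entirely, so the step should be chosen from the smallest target size you expect.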
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
This is very cool, though I'm not (yet) convinced that you can get 30fps @ 640x480 with an RGB USB camera using a "conventional" FRC vision algorithm. But now you have me thinking...
Why I think we're still a ways off from RGB webcam-based 30fps @ 640x480: Your Kinect is doing several of the most expensive image processing steps for you in hardware. With a USB webcam, you need to:
1. Possibly decode the image into a pixel array (many webcams encode their images in formats that aren't friendly to processing).
2. Convert the pixel array into a color space that is favorable for background-lighting-agnostic thresholding (HSV/HSL). This is done once per pixel per channel (3*640*480), and each op is the evaluation of a floating point (or fixed point) linear function, and usually also involves evaluating a decision tree for numerical reasons.
3. Do inRange thresholding on each channel separately (3x as many operations as in your example) and then AND together the outputs into a binary image.
4. Run FindContours, filter, etc... These are usually really cheap, since the input is sparse.
So in order to do this with an RGB webcam, we're talking at least 6x as many operations assuming a color space conversion and per-channel thresholding, and likely more because color space conversion is more expensive than thresholding. Plus possible decoding and USB overhead penalties. Even if we ignore that, we're at 7.7 * 6 = 46.2ms per frame, which would be 15 frames per second at about 69% CPU utilization. Anecdotally, I'd expect another 30+ ms per frame of overhead.
The Kinect is doing all of the decoding for you, does not require a color space conversion, and gives you a single channel image that is already in a form that is appropriate for robust performance in FRC venues. No Step 1, no Step 2, and Step 3 is 1/3 as complex when compared to the above.
However... Great idea hacking the ASM to use SIMD for inRange. I wonder if you could also write an ASM function to do color space conversion, thresholding, and ANDing in a single function that only touches registers (may require fixed point arithmetic; I'm not sure what the RoboRIO register set looks like). This would add several more ops to your program, and have 3x as many memory reads, but would have the same number of memory writes. |
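For reference, the "threshold each channel and AND the results in one function" idea can be sketched in scalar C++ as below. This is only an illustration of the memory access pattern (three reads and one write per pixel); it does no color space conversion, is not NEON code, and `Range` and `in_range3` are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Range { std::uint8_t lo, hi; };  // inclusive bounds

// Threshold an interleaved 3-channel image and AND the three per-channel
// results in one pass: out[i] is 255 only if every channel of pixel i is
// inside its range, otherwise 0. No intermediate images are allocated.
void in_range3(const std::uint8_t* src, std::uint8_t* out, std::size_t pixels,
               Range c0, Range c1, Range c2) {
    assert(src != nullptr && out != nullptr);
    for (std::size_t i = 0; i < pixels; ++i) {
        const std::uint8_t a = src[3 * i + 0];
        const std::uint8_t b = src[3 * i + 1];
        const std::uint8_t c = src[3 * i + 2];
        const bool ok = a >= c0.lo && a <= c0.hi &&
                        b >= c1.lo && b <= c1.hi &&
                        c >= c2.lo && c <= c2.hi;
        out[i] = ok ? 255 : 0;
    }
}
```

Compared with three separate inRange calls followed by two ANDs, this avoids writing and re-reading the intermediate single-channel masks.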
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
640x480x30 fps is a convenient benchmark, though, as it is achievable with largely unoptimized code by many forms of coprocessors. |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
Quote:
Most robots use some sort of light source to find the retro-reflective target. Most typically this is the green ring (for our Kinect, it's the IR projector). If your image is already in the RGB form, you can actually just isolate the Green channel (which you can do with SIMD extremely simply, vld3.8) and proceed onward. Storing the R and B channels out to a D register but not writing it to RAM will save a lot of time here, and then your thresholding function will only take one set of data. Something similar can be done with HSV/HSL, however this will require a bit more math on the assembly side of things to isolate the Lightness for a specific hue or saturation. Nonetheless, it's still faster than calculating for all 3 channels.
Quote:
Putting them all into one set of instructions dealing only with the NEON registers is entirely possible, in fact the thresholding and ANDing are already grouped together, operating on the Q registers. I can confirm that the ARM NEON instruction set does include fixed-point arithmetic, although it requires the vcvt instruction to convert them to floating-point first, which is also done by the NEON system. |
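As a plain C++ reference for the green-channel isolation described above, a scalar sketch might look like this. It is not the NEON assembly itself; `extract_green` is a name invented for the example, and the interleaved 3-bytes-per-pixel layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy only the green byte of each interleaved 3-byte pixel into a planar
// buffer. This is the scalar equivalent of de-interleaving with vld3.8 and
// storing just one of the three resulting registers.
void extract_green(const std::uint8_t* rgb, std::uint8_t* green,
                   std::size_t pixels) {
    assert(rgb != nullptr && green != nullptr);
    for (std::size_t i = 0; i < pixels; ++i) {
        green[i] = rgb[3 * i + 1];  // green is the middle byte in RGB and BGR
    }
}
```

The output buffer is a third of the size of the input, so the thresholding step that follows reads a third as much memory.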
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Interesting work.
We took a look at using the RoboRIO for Vision Processing back in 2014 under the alpha test of the new hardware. We tried IP and Web Cams using the same vision detection algorithm to find hot goals as implemented on our 2014 robot. This was an OpenCV implementation in C++ which was compiled with NEON enabled and run on the RoboRIO. Take a look at our data, at the below link, under Vision, at the IP camera test. We would need to dust it off, but for our complete end to end solution I think we could only get 20fps at 320x240 on the Rio. http://controls.team2168.org/ Over the past few years we have grown to develop a decoupled, off-board vision system, for various reasons we deemed beneficial, but I am glad to see progress in this area. |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
No fair heading down so close to bare metal
:) |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
No one knows what vision processing will be needed in the future. For this year we found that feeding the results of processing into a control loop did not work well. We take a picture and calculate the degrees of offset from the target, then use this offset and the IMU to rotate the robot. Take another frame and check that we are on target. If not, rotate and check. If on target, shoot. We did not need a high frame rate and it worked very well. I'll note that our biggest problem was not the vision but the control loop to rotate the bot. There was a thread on this earlier. We hosted MAR Vision day this past weekend. It has become very apparent that most teams are struggling with vision. While it's nice to see work like this, I would like to see more of an effort to bring vision to the masses. GRIP helped a lot this year.
|
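The "degrees of offset from the target" step above can be sketched as follows. This is one common way to do it, not necessarily what the poster's team did: it assumes an ideal pinhole camera with no lens distortion, and the function name and the field-of-view parameter are assumptions for the example.

```cpp
#include <cassert>
#include <cmath>

// Horizontal angle, in degrees, from the camera's optical axis to a target
// whose centre is at pixel column `target_x`. Positive means the target is
// to the right of centre.
double offset_degrees(double target_x, int image_width, double hfov_degrees) {
    assert(image_width > 0 && hfov_degrees > 0.0 && hfov_degrees < 180.0);
    const double kPi = 3.14159265358979323846;
    const double half_width = image_width / 2.0;
    // Focal length in pixels, derived from the horizontal field of view.
    const double focal_px = half_width / std::tan(hfov_degrees * kPi / 360.0);
    return std::atan((target_x - half_width) / focal_px) * 180.0 / kPi;
}
```

The result would be added to the current IMU heading to get a setpoint for the rotation, after which a fresh frame confirms whether the robot is on target. The field of view has to be measured or taken from the camera's datasheet.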
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Given that HSV requires a bunch of conditional code, it's going to be tough to vectorize. You could give our approach from last year a try:
Code:
vector<Mat> splitImage;
After that we did a threshold on the newly created single-channel image. We used Otsu thresholding to handle different lighting conditions but you might get away with a fixed threshold as in your previous code.
To make this fast you'd probably want to invert the red_scale and blue_scale multipliers so you could do an integer divide rather than convert to float and back - but you'd have to see which is quicker. You should be able to vdup them into all the uint8 lanes in a q register at the start of the loop and just reuse them. And be sure to do this in saturating math because overflow/underflow would ruin the result.
Oh, and I had some luck getting the compiler to vectorize your C code if it was rewritten to match the ASM code. That is, set a mask to either 0 or 0xff, then AND the mask with the source. Be sure to mark the function args as __restrict__ to get this to work. The generated code was using d and q regs but seemed a bit sketchy otherwise; still, it might be fast enough that you could avoid coding in ASM. |
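A minimal sketch of the "mask then AND" rewrite mentioned in the last paragraph, with `__restrict__` on the pointer arguments. The function name and the greater-than comparison are assumptions for illustration; whether a given compiler actually vectorizes this depends on its version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Keep pixels brighter than `thresh` and zero the rest, written as
// "build a 0x00/0xFF mask, then AND it with the source" so the loop body is
// branch-free. __restrict__ (a GCC/Clang extension) promises the compiler
// that src and dst do not overlap.
void threshold_mask(const std::uint8_t* __restrict__ src,
                    std::uint8_t* __restrict__ dst,
                    std::size_t n, std::uint8_t thresh) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t mask = (src[i] > thresh) ? 0xFF : 0x00;
        dst[i] = static_cast<std::uint8_t>(src[i] & mask);
    }
}
```

Checking the compiler's output (for example with objdump) is the only way to be sure the loop was turned into d/q register operations.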
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
If you web-search for Zynq vision systems or Zynq vision processing, you will see that a number of companies and integrators use this for professional systems. Zynq from Xilinx is the processor of the RoboRIO.
So I think my take on this is ...
You likely don't need 640x480. It is almost as if that were taken into account when the vision targets were designed.
You likely don't need 30 fps. Closing the loop with a slow, noisy sensor is far more challenging than with a fast and less noisy one. Some avoid challenges, but others double down.
The latency of the image capture and processing is important to measure for any form of movement (robot or target). Knowing the latency is often good enough; minimizing it is of course good. If there isn't much movement, it is far less important.
The vision challenge has many solutions. Jaci has shown, and I think the search results also show, that many people are successful using Zynq for vision. But this does take careful measurements and consideration of image capture and processing details.
By the way, folks typically go for color and color processing. This is easy to understand and teach, but it is worth pointing out that most industrial vision processing is done with monochrome captures.
Greg McKaskle |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
I'll do a write up on this at some time, but I've got a lot on my plate over the next 2 weeks and I have to clean up the code a bit. |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
However, I disagree with the not-needing 30fps. Vision targeting in FRC is very hit-or-miss. While a high update rate might not be needed for alignment operations, I find sending back to the driver station a 30fps ("""""natural framerate""""") outline of what targets have been found is quite useful. For example, this year I sent back the bounding boxes of contours our vision system found to the driver station. This had the advantage that the driver had some kind of feedback about just how accurately we were lined up (and could adjust if necessary), and took next to no bandwidth as we were only sending back a very small amount of data 30 times a second (per contour). This was insanely useful and you can see that if you look at our matches (we implemented it between Aus Regional and Champs). |
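To give a feel for why sending bounding boxes takes "next to no bandwidth", here is a rough sketch. The 8-byte wire format, the `Box` struct and both function names are made up for this example and are not the format used on the robot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Box { std::uint16_t x, y, w, h; };

// Pack one bounding box into 8 bytes, big-endian (hypothetical format).
std::array<std::uint8_t, 8> pack_box(const Box& b) {
    const std::uint16_t f[4] = {b.x, b.y, b.w, b.h};
    std::array<std::uint8_t, 8> out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[2 * i]     = static_cast<std::uint8_t>(f[i] >> 8);
        out[2 * i + 1] = static_cast<std::uint8_t>(f[i] & 0xFF);
    }
    return out;
}

// Payload bits per second for `boxes` boxes sent `fps` times a second,
// ignoring any protocol overhead.
long payload_bits_per_second(int boxes, int fps) {
    assert(boxes >= 0 && fps >= 0);
    return 8L * 8L * boxes * fps;
}
```

With this format, three boxes at 30 updates a second come to 5,760 bits per second of payload, which is tiny compared with streaming the frames themselves.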
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
WPILib did a poor job of wrapping NIVision (work NOT done by NI, by the way). The history is that a few folks tried to make a dumbed-down version for the first year, and it was a dud. Then some students hacked at a small class library of wrappers. But the hack showed through.
That doesn't mean NIVision, the real product, is undocumented or trying to be sneaky. NI publishes three language wrappers for NIVision (.NET, C, and LV). The documentation for NIVision is located here -- C:\Program Files (x86)\National Instruments\Vision\Documentation. And one level up is lots of samples, help files, utilities, etc.
If the same people had done the wrappers on top of OpenCV, it would have been just as smelly. Luckily, good people are involved in doing this newer version of vision for WPILib. But I see no reason to make NIVision into the bad guy here. If you choose to ignore the WPILib vision stuff and code straight to the NIVision libraries from NI, I think you'll find that it is a much better experience. That is what LV teams do, by the way. LV-WPILib has wrappers for the camera, but none for image processing. They just use NIVision directly.
If my time machine batteries were charged up, I guess it would be worth trying to fix the time-line. But then I'm still worried about the kids, Marty.
Greg McKaskle |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
If you can set the camera where it will simplify your task, and put the targets where it will simplify your task, you can simplify the system, lower cost, increase effectiveness, increase throughput, etc. The FRC robot field is not nearly as controllable or predictable, but it is beneficial to spend some time thinking about what you can control.
Also, monochrome cameras can have about 3x the frame rate at the same resolution, or higher resolution at the same frame rate. They can have higher sensitivity, allowing faster exposures. Monochrome doesn't have to have a broad spectrum of lighting or capture. Lasers are already monochrome. Filters on your lens or light source make it narrower. Lenses don't have to worry about different refraction for different wavelengths. The first step most team code performs is an HSL threshold -- turning an RGB image into a binary/monochrome one.
So, I'm not saying monochrome is better, but it is different, and powerful, and common. My point is that color cameras aren't a requirement to make a working solution and there are benefits and new challenges in each approach.
As for frame rate: 30fps is based on a human perception threshold. Industrial cameras, and SLRs for that matter, operate at many different exposures and rates. If the 30 fps is to align with a driver feedback mechanism, then it is a good choice. If it is to align with a control feedback mechanism, slower but more accurate may be better, or far faster may be needed. The task should define the requirements, then you do your best to achieve them with the tools you have.
It is exciting to see folks reevaluating and sharpening the tools.
Greg McKaskle |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
Tim . |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
The Zynq architecture has a hard ARM CPU and an FPGA on a single chip. The ARM is completely open because the safety code for FRC has been pushed into the hard-realtime FPGA. It is not possible with current tools to easily allow a partial FPGA update. So the FPGA is static for FRC during the regular season. If you want to use tools to change it in offseason, go for it.
The FRC FPGA doesn't currently have any vision processing code in it. It wasn't a priority compared to accumulators, PWM generators, I2C and SPI and other bus implementations. If you get specific enough about how you want the images processed, I suspect that there are some gates to devote. But many times, the advantage of using an FPGA is to make a highly parallel, highly pipelined implementation, and that can take many, many gates. And if the algorithm isn't exactly what you need, you are back to the CPU. So, with today's tools, CPU, GPU, and FPGA are all viable ways to do image processing. All have advantages, and all are challenging. There are many ways to solve the FRC image processing challenges, and none of them are bullet-proof. That is part of what makes it a good fit. Greg McKaskle |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
I was digging through the source and found this
https://github.com/opencv/opencv/blo...pp#L1536-L1690 So now I'm curious: does the performance boost come from not using CV_NEON in your OpenCV library build, or because NEON intrinsics are significantly slower than using plain assembly? |
Re: 30fps Vision Tracking on the RoboRIO without Coprocessor
Quote:
An objdump would tell for sure, though, if anyone's up to a challenge. |
Copyright © Chief Delphi