Our team wants to get serious about vision this year, and I’m curious what people think is the state of the art in vision systems for FRC.
Questions:
Is it better to do vision processing onboard or with a coprocessor? What are the tradeoffs? How does the RoboRIO change the answer to this question?
Which vision libraries? NI Vision? OpenCV? RoboRealm? Any libraries that run on top of any of these that are useful?
Which teams have well developed vision codebases? I’m assuming teams are following R13 and sharing out the code.
Are there alternatives to the Axis cameras that should be considered? What USB camera options are viable for 2015 control system use? Is the Kinect a viable vision sensor with the RoboRIO?
I cannot speak for all of FRC. I can only speak from the perspective of a three-time award winner for the vision/object tracking system we developed and used in 2014.
I’m going to try to answer your questions, from our perspective, in order.
Up through 2014, it was “better” to use a co-processor. Vision tracking on the cRIO will work, but unless it is done VERY CAREFULLY and with limited requirements, it could easily max out the processor. We did use it successfully a few years ago, but we really limited the requirements. Since 2013, we have been using a PCDuino (this link is to a retired version, but it is the version of the board we used). How does the RoboRIO change the answer to this question? Sorry, I can’t say either way. We are not a Beta team, so we have no direct experience with the RoboRIO. My guess is that it will have “much better” performance, but how much better remains to be seen. (I too am curious!)
We used OpenCV running under Ubuntu. Our scripts were written in Python. We will likely stick with this for 2015, but that determination is yet to be made. The PCDuino offers excellent access to the GPIO pins, thus allowing us to do some really neat software tricks.
Our code was shared in this post. That post is the main reason we won the “Gracious Professionalism Award” on Curie this year. This code is board specific, but can easily be modified to run on many different boards.
Sorry, without being a Beta team, we cannot address this question.
There are a number of threads that already talk about computer vision in FRC; however, it never hurts to talk about it every once in a while, as the participants in the competition are always changing.
I’ll go through your questions one at a time.
I do not know if this question has an objective answer. Teams have had success with the cRIO, with off-board processing, or with a second onboard computer like the Pi or an ODROID. The trade-off for using an onboard computer is obviously weight, but a Pi weighs something like 200 g, so that shouldn’t concern you. You have to somehow communicate what the vision system outputs to your cRIO; for the past 3 years we’ve been using simple UDP.
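As a rough illustration of what that looks like (a minimal sketch, not our actual protocol — the IP, port, and message format are placeholders):

#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    // Open a UDP socket and fire vision results at the robot controller.
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    sockaddr_in robot{};
    robot.sin_family = AF_INET;
    robot.sin_port = htons(5800);                       // hypothetical port
    inet_pton(AF_INET, "10.12.34.2", &robot.sin_addr);  // hypothetical robot IP

    char msg[64];
    std::snprintf(msg, sizeof(msg), "%.2f,%.2f", -3.5, 120.0);  // e.g. "angle,distance"
    sendto(sock, msg, std::strlen(msg), 0,
           reinterpret_cast<sockaddr*>(&robot), sizeof(robot));

    close(sock);
    return 0;
}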
As for off-board processing, I have heard it can be slow, and with the cap on data per team, I personally think it is no longer a valid option.
Using the cRIO, and programming it in LabVIEW or something, is doable, and there are plenty of resources around for doing so, but the truth is it won’t be as fast as something running on an SBC.
I haven’t looked at the specs of the roboRIO, so my opinion on that shouldn’t be taken seriously. I have heard it is “linuxy,” but I have yet to get a clear definition of what that actually means. It would be cool to make your vision program an application and run it on the roboRIO, but I don’t know if that is doable.
As for vision libraries, any will do, but some have more support than others. I don’t know the current state of OpenNI, but last time I checked it was no longer funded. OpenCV has been around since… I want to say 1999. It is used in industry and academia, and it is the one I suggest using, though of course I’m biased. I played around with RoboRealm a bit, but it seems too simple; I feel someone using it wouldn’t get a fundamental understanding of what is happening. If you are just looking to figure out how far away you are from something and don’t really care about understanding, though, then I suggest it.
There is always the option of writing your own computer vision library, of course, but I’d be willing to bet money it wouldn’t be as encompassing or efficient as existing open-source CV libraries like NI Vision and OpenCV.
I like to think computer vision is a little unique. Yes, you can develop code bases, but the vision task changes so dramatically from year to year (except for game-piece detection) that I don’t think it’d be worth it. Take OpenCV, for example. It is an extremely developed library that now focuses on readability and ease of development. If you really know what you’re doing, you can solve the vision problem in 100 lines of code; if you don’t, it can take upwards of 2,000. You could maybe group some routine operations, such as binary image morphology, into one step, but even then you’d only be saving a line of code.
Once you have game-piece detection code, however, it can just be “pulled from the shelf” and used year after year.
As long as your library can get the image from the camera, it really doesn’t matter which one you use. The common option is the Axis camera. We used the Kinect for 2 years, then this past year we used three 120-degree cameras (it was more a proof of concept for future years). If you want to do some research on hardware, look up the Asus Xtion, Pixy, PlayStation Eye, and Kinect, and keep an eye out for the Kinect One to be “hacked” and thus become usable.
I feel that computer vision is a developing aspect of FRC. My freshman year it was Logo Motion. I didn’t ask around in the pits, but I don’t remember seeing many cameras. When 1114 and 254 were doing multi-game-piece autonomous (in the Einstein finals, mind you), I think it really inspired people, as did 254’s autonomous routine this year. Just look at all the path trajectory stuff that has been posted over the past several months.
This is my 2 cents on the topic. Take it how you will. If you have any questions, or want to see some code from my previous vision solutions (from 2012-2014), I’d be more than happy to go over it with you.
I emailed a FIRST rep requesting they consider an aerial camera for future games. If they do that, or allow teams to put up their own aerial camera, then onboard vision systems become obsolete, because you essentially have an “objective” view instead of a “subjective” one.
There are tradeoffs here that depend on the robot, the strategy, and the team’s resources. All have been made to work. The roboRIO is somewhere between 4x and 10x faster. It has two cores and includes NEON instructions for vector processing.
But vision is a hungry beast and will bring any computer to its knees if you just brute force it. The “right” solution depends on where the camera is mounted, where it is pointed, lighting, lenses, and of course processing. Makes it challenging, huh.
All of them are capable of solving the FRC imaging problems quite well, IMO.
I’ve seen plenty who make it work.
The WPI libraries will continue to support Axis and will support many UVC cameras via NI-IMAQdx drivers. This means that many USB webcams will work. Additionally, we have also tested the Basler USB3 industrial cameras. I have not tested Kinect directly on the roboRIO, though if you google myRIO and Kinect you will see others who have.
I don’t think people realize this. After the image is acquired, it is literally pure mathematics. If you aren’t careful — as in, you don’t put in a wait function — your computer will try to execute the program as fast as possible, and the program is basically in an infinite loop until you somehow close it. (I believe the majority of teams don’t gracefully exit their program after a match, but instead simply power off.)
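To make that concrete, here is the shape of loop I mean, with the wait added so the process doesn’t peg a core (the 20 ms figure and the commented pipeline steps are just placeholders):

#include <chrono>
#include <thread>

int main() {
    while (true) {
        // grabFrame(); processFrame(); sendResults();  // hypothetical pipeline steps
        // Without this sleep the loop spins as fast as the CPU allows and pins a core.
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
    return 0;
}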
To give an example: when you have the binary image you desire — your targets are white (1) and everything else is black (0) — you perform the operation described in this paper (“Topological structural analysis of digitized binary images by border following”). This paper was published in 1983. Other methods do exist, but the computational cost is about the same. (See also: Edge detection - Wikipedia.)
Basically, what this step does is reduce a 640x480 (or whatever resolution you have) matrix to a sequence of curves that let you do cool things, such as recursively approximate a polygon to guess how many sides the shape has.
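In OpenCV terms, that step is roughly the following (a sketch; the file name and the 2% epsilon are just typical starting points):

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    // 'binary' is assumed to already be thresholded: target pixels white, everything else black.
    cv::Mat binary = cv::imread("binary.png", 0);  // 0 = load as grayscale

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(binary, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (size_t i = 0; i < contours.size(); ++i) {
        std::vector<cv::Point> poly;
        // Approximate each border curve with a polygon and count its sides.
        double epsilon = 0.02 * cv::arcLength(contours[i], true);
        cv::approxPolyDP(contours[i], poly, epsilon, true);
        std::printf("contour %zu: %zu sides, area %.1f\n",
                    i, poly.size(), cv::contourArea(poly));
    }
    return 0;
}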
What is the purpose? Do you want to process a handful of images at the beginning of the match, or do you want to continually process images and perform AI-like pathfinding? Those questions are extremely important for answering many of your other questions. The first option should not require more power than the RoboRIO offers. The second option would probably benefit from a coprocessor. These tiny boards can offer a lot of power, easily many times the available oomph of the RoboRIO.
Running OpenCV on the RoboRIO is far from realistic. The onboard storage is quite limited, and it seems to me there will be missing pieces of Linux that will make it extremely difficult to compile all of OpenCV’s dependencies.
Also, I think OpenCV can be a tad inefficient in general. I was running a small test and OpenCV’s display windows maxed out one of my system’s cores (an i3)!
I really think converting color spaces, thresholding, and other relatively simple transformations should be much faster, especially because the per-pixel formula is constant.
It can be inefficient, but it is great at what it does. It is an open-source library that people volunteer their time to contribute to. I think I’m going to be able to get away with taking 6 credit hours next semester for doing research in the field of computer science. That means I’ll have dedicated time to work on stuff like optimizing the OpenCV library (or just the functions I use regularly, such as erode, dilate, findContours, approxPolyDP, solvePnP, and optical flow).
I’ve still not looked at the source code for OpenCV in depth. I really hope some things in it aren’t computed in parallel, because I love doing parallel computing and it’d give me great practice.
Like you mentioned to me… two months ago, it spawns threads when it’d be quicker not to. Maybe I could simply add a condition that if the image resolution < (x, y), then don’t spawn another thread.
Team 900 is currently working with an NVIDIA Tegra TK1 (quad-core ARM Cortex-A15 with a Kepler GPU with 192 CUDA cores, <$200) to tackle vision processing this year. So far we are seeing good results, with high FPS (>40) at large image resolutions (720p), doing just filtering at the moment, all on the GPU with OpenCV. We are working on tracking next.
Answering question 4: the Kinect is a viable means of vision sensing. I’d recommend checking out this paper from Team 987, who used the Kinect very effectively as a camera in 2012’s FRC challenge, Rebound Rumble. I believe one of the major advantages of the Kinect is that its depth perception is much better than a standard camera’s, though I’m not really a vision expert.
I don’t really understand what you mean by depth perception. It has a depth map, if that is what you mean. The Kinect is a great piece of hardware because it has RGB, IR, and depth cameras, plus a built-in IR emitter (though I suggest adding some IR LEDs if you’re going the IR route). It is fairly cheap now, and there has been a lot of work done with it, but if you’re using only the RGB camera, the only thing different is how much it distorts the image. All the cameras you will ever think about using have basically the same amount of distortion, so that shouldn’t be a concern. If it is, then calibrate your camera:
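The standard OpenCV chessboard routine is roughly the following (a sketch — the board size, number of views, and file names are placeholders):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::Size board(9, 6);  // inner corners of the printed chessboard (placeholder size)
    std::vector<std::vector<cv::Point3f> > objectPoints;
    std::vector<std::vector<cv::Point2f> > imagePoints;

    // Ideal (flat) chessboard corner positions, one square = 1 unit.
    std::vector<cv::Point3f> ideal;
    for (int y = 0; y < board.height; ++y)
        for (int x = 0; x < board.width; ++x)
            ideal.push_back(cv::Point3f((float)x, (float)y, 0));

    cv::Size imageSize;
    for (int i = 0; i < 20; ++i) {  // ~20 views of the board from different angles
        cv::Mat img = cv::imread(cv::format("calib_%02d.png", i), 0);  // 0 = grayscale
        if (img.empty()) continue;
        imageSize = img.size();
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, board, corners)) {
            imagePoints.push_back(corners);
            objectPoints.push_back(ideal);
        }
    }

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::cout << "RMS error: " << rms << "\n"
              << cameraMatrix << "\n" << distCoeffs << std::endl;
    return 0;
}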
But I’m going to say that it is a waste of time unless you are doing a camera pose calculation.
The kinect does have a subtle advantage over other cameras, like the axis: It draws attention to it. Little kids love seeing things they see everyday used in a way they never thought possible.
For me, I think this mostly depends on whether or not you want to process as you move. If you’re able to process from a stopped position, then you should be fine processing on the cRIO. Pray for more reflective tape, because that stuff makes life 100x easier. Also, make sure you grab a camera whose exposure you can modify, and ideally one that will hold those exposure settings after a reboot.
That being said, if you’re looking to process while moving, I’d recommend going with a coprocessor. While it may be feasible to do it on the roborio, it’s probably safer to do it on a coprocessor, and you’ll get much better performance/resolution. Like 900, I’m very excited about the Jetson TK1, and with the new voltage regulator, getting set up with it should be simplified.
I have stayed away from RoboRealm. This is more of a personal preference, but any software that requires licenses immediately makes it more challenging to work in a distributed fashion. This year we utilized both NI Vision and OpenCV. The NI Vision library was used for hot/cold detection, and OpenCV for Java was used for distance-to-target calculations. I was fond of this setup because it allowed us to work in the same language off-board and onboard. That being said, I think the advantages the TK1 presents may outweigh the advantages of working in a single language.
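For anyone curious, the distance-to-target math itself is just the pinhole model; a rough sketch (the field of view, image width, and target size below are made-up numbers, not our actual constants):

#include <cmath>
#include <cstdio>

const double kPi = 3.14159265358979323846;

// Estimate distance to a target of known physical width from its apparent width in pixels.
// Every constant here is an illustrative placeholder.
double estimateDistance(double targetWidthPx, double imageWidthPx,
                        double targetWidthInches, double horizontalFovDeg) {
    // Focal length in pixels, derived from the camera's horizontal field of view.
    double focalPx = (imageWidthPx / 2.0) / std::tan((horizontalFovDeg / 2.0) * kPi / 180.0);
    return targetWidthInches * focalPx / targetWidthPx;
}

int main() {
    // A hypothetical 24 in wide tape target that shows up 120 px wide in a 640 px image
    // from a camera with a ~47 degree horizontal field of view.
    std::printf("estimated distance: %.1f in\n", estimateDistance(120.0, 640.0, 24.0, 47.0));
    return 0;
}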
NI Vision was a lot easier to get started with. Once the camera was calibrated correctly, it was relatively simple. We spent 90% of our time trying to make it more efficient (essentially a moot point). Once we landed on only doing hot cold detection onboard, that made things easier.
OpenCV took a lot more to get set up. If you haven’t done vision processing in the past and opt to go with something like OpenCV, be prepared to spend most of the season on it, and even then, it may never find its way onto the robot. In the backpacking world we have an expression, “ounces make pounds,” and it’s definitely true in FIRST. Expect a coprocessor to be one of the first things to go when you have to shed weight.
They’re out there, but you’ll have to dig, and you’ll have to dig through full code bases to find it. From what I’ve seen, teams often release their entire codebase rather than a set of libraries.
Like many others, I’m not on a Beta team, but if you’re using NI Vision, the Axis camera is actually a fairly nice solution. The big thing I’ve found is that WPILib is already configured for the Axis camera specs. Also, it holds its values after a reset, so you don’t have to worry about your exposure settings resetting in between matches.
All that being said, I think 900 is probably a good reference point for the state of the art solution/execution. The way they implemented their single frame vision processing was rather clever. Also, it sounds like they’re following the FIRST roadmap by getting started on the Jetson TK1 early. I think that is where a lot of the beneficial state of the art stuff is going to go. Teams have been utilizing the kinect and doing object detection for several years now, but it seems to me these are more bullet points for awards as opposed to practical implementations.
For me, the state-of-the-art stuff I’m interested in is that which eases implementation while addressing the various constraints of the platforms we work on.
I’ve been working on a library extension for OpenCV that adds features (there’s currently no vanilla way to rotate an image x number of degrees!). I am also going to use OpenMP to attempt to create extremely high-performance functions that utilize all cores on your system!
The nice thing about OpenMP is that if it is not available, everything will work properly. The function won’t be multithreaded though!
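For context on the rotation point above, the closest thing in stock OpenCV is the two-call getRotationMatrix2D / warpAffine combination, roughly like this (a sketch; a single-call helper that also grows the canvas to fit the rotated corners is the sort of thing an extension would add):

#include <opencv2/opencv.hpp>

// Rotate an image about its center by angleDeg using stock OpenCV calls.
// Note the result is cropped to the original size.
cv::Mat rotate(const cv::Mat& src, double angleDeg) {
    cv::Point2f center(src.cols / 2.0f, src.rows / 2.0f);
    cv::Mat rot = cv::getRotationMatrix2D(center, angleDeg, 1.0);
    cv::Mat dst;
    cv::warpAffine(src, dst, rot, src.size());
    return dst;
}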
OpenCV actually has three modes for threading, as I remember: it can use OpenMP, Intel TBB, or have threading disabled.
I guess you could use C++11 threading to perform many simple tasks in parallel. It is quite cool how the threading can be done with lambdas:
std::thread x([](){ printf("Hello World, from a thread :D\n"); });
x.join();
or using regular function pointers:
#include <cstdio>
#include <thread>

void hello()
{
    printf("Hello World, from a thread :D\n");
}

int main()
{
    std::thread x(hello);
    x.join(); // Join before main returns; destroying a still-joinable std::thread terminates the program!
}
Instead of just skipping threading, better threading techniques should be used! Let’s say cvtColor:
If the res is less than 32 by 32 (let’s just say), then skip threading. This resolution should be quite low because threading is actually incredibly fast! Use libraries like pthread because of their speed. News Flash: C++11 threading under POSIX uses pthread :D.
What I would do is divide the image up into equal parts, with the number of parts equal to the hardware concurrency. I would then run my own version of cvtColor, with no optimizations, on each small image. Afterwards, I would stitch those images back together to return. Voilà! You have a cvtColor function that is highly optimized for your hardware concurrency, which is really the number of threads the computer can run TRULY in parallel (not switching back and forth between threads).
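A rough sketch of that idea, splitting by rows so each thread writes straight into its own slice of the destination and no stitching copy is actually needed (this wraps the stock cvtColor per strip rather than a hand-rolled version):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <thread>
#include <vector>

// Thread-per-strip color conversion: split the image across hardware_concurrency()
// threads and let each one convert its own band of rows.
void parallelCvtColor(const cv::Mat& src, cv::Mat& dst, int code) {
    dst.create(src.size(), src.type());  // assumes same channel count, e.g. BGR -> HSV
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    int rowsPerStrip = (src.rows + (int)n - 1) / (int)n;

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        int r0 = (int)i * rowsPerStrip;
        int r1 = std::min(src.rows, r0 + rowsPerStrip);
        if (r0 >= r1) break;
        workers.push_back(std::thread([&, r0, r1]() {
            cv::Mat srcStrip = src.rowRange(r0, r1);  // views, not copies
            cv::Mat dstStrip = dst.rowRange(r0, r1);
            cv::cvtColor(srcStrip, dstStrip, code);   // writes directly into dst's rows
        }));
    }
    for (size_t i = 0; i < workers.size(); ++i) workers[i].join();
}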
I believe dlib has some optimized color conversion code too. Just convert the Mat to a cv_image and it should work beautifully:
dlib::cv_image<dlib::bgr_pixel> image(cv::imread("image.png"));
Good luck! Maybe we can work on this stuff together!
As Lineskier mentioned, OpenCV is kind of difficult to get started with!
This is true. OpenCV is quite difficult to set up and start running. However, once the setup is complete, it is an extremely easy-to-learn library, especially because of its documentation. There’s a 600-page manual that explains nearly every function. Just use [CTRL/CMD] + [F] and you should be good.
However, I faced these challenges with OpenCV myself, so I have a working install script that downloads my code and everything. Feel free to use the script under Debian GNU/Linux. If you don’t want my old code, just remove those lines. I might remove them myself, as they download a lot of code and fill up your drive!
If you want, Go ahead and check out my GitHub (@yash101)! I have all sorts of OpenCV programs!
Also, I spent around 3 hours a day during build season last year with OpenCV, as I was learning it and coding with it at the same time. I also hadn’t coded in C++ in 4 years, so I had lost all my good coding habits and had to recode the entire app a couple of times. At least I got rid of the 1,500 lines of code in int main() :D!
I am also working on the DevServer, which hopefully should make it much less of a hassle to get data to the cRIO!
That’s great that you are learning the NVIDIA Tegra TK1. I would like to know more about your experiences with this device. Will you post some example code? Are you using C++ or Python?
We used the PCDuino last year and we downloaded code from Billbo911’s (above) website. His code is written in python and very easy for the kids to understand. It worked very well last year. We tracked balls, reflective tape and bumper reflectors. We will probably use the PCDuino and Bill’s code again this year.
We used a USB webcam but had to downsample to 480x320 to maintain 30 Hz with our image processing output. OpenCV and Python work very well, but you have to be careful because Python loops can slow you down.
One thing that I did not see mentioned in this thread is how to enhance the reflective tape image. We used white ring lights last year, which is very WRONG! There is so much ambient white light that we had a terrible problem tracking the reflective tape. I recommend using three green ring lights, then pre-filtering for only green pixels. You can buy small-, medium-, and large-diameter rings; the small fits in the medium and the medium fits in the large.
We’re using OpenCV in C++ so example code is plentiful around the web. Our students have just now started to use it so we don’t have anything to share just yet. If we make progress to the point where we can share it then we will, probably towards the end of build season.
The big deal with the TK1 is that it has the ability to use the GPU to assist with offloading work. To my knowledge, there is no method to use the GPU assisted functions for OpenCV with Python currently but that might be changing with the 3.x code release around the corner. We’re using the 2.4.x code right now.
C++ is what we are using for the GPU integration right now, because you have to manually manage the GPU memory and shuffle images onto it and off of it as you need them. NVIDIA has a decent amount of resources out there for the Jetson, but it is definitely not a project for those unfamiliar with Linux. It’s not a Raspberry Pi and not anywhere near as clean as a full laptop. To get it working you have to do a bit of assembly. It’s a nice computer, just not as straightforward as a Pi or a PCDuino or any of the others that have larger user bases. There are also problems running X11 on it, so you really need to run it headless (NVIDIA writes binary-blob graphics drivers for Linux that are not super stable).
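To give a flavor of what that shuffling looks like with the 2.4.x gpu module (a rough sketch of the upload/process/download pattern, not our actual pipeline — the file names and threshold are placeholders):

#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

int main() {
    cv::Mat frame = cv::imread("frame.png");       // would come from the camera in reality

    cv::gpu::GpuMat d_frame, d_gray, d_mask;
    d_frame.upload(frame);                         // host -> device copy (the expensive part)

    cv::gpu::cvtColor(d_frame, d_gray, CV_BGR2GRAY);                  // runs on the Kepler cores
    cv::gpu::threshold(d_gray, d_mask, 200, 255, cv::THRESH_BINARY);

    cv::Mat mask;
    d_mask.download(mask);                         // device -> host for the CPU-side steps
    cv::imwrite("mask.png", mask);
    return 0;
}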
We’re aiming for full 1080p, but depending on the challenge we will likely have to downsample to 720p to get the frame rates we need.
Granted, this is all off-season right now and we have a lot of testing to do between now and the events before any of this is guaranteed to go on the robot. For all I know FIRST is going to drop vision entirely… I mean, cameras don’t work under water do they?
That is quite an old document. OpenKinect has changed significantly and is much harder to use now! The documentation kind of sucks as the examples are all in very difficult (for me) C!
The greatest problem with the Kinect was getting it to work. I have never succeeded in opening a kinect stream from OpenCV!
The depth map of the Kinect is surprisingly accurate and powerful!
As of last year, thresholding was the easy part :)! Just create a simple OpenCV program to run on your PC, to connect to the camera and get video! Create sliders for each of the HSV values, and keep messing with one bar until the target starts barely fading! Do this for all three sliders. You want to end with the target as white as possible! It is OK if there are tiny holes or 1-4 pixels in the target not highlighted. Next, perform a GaussianBlur transformation. Play around with the kernel size until the target is crisp and clear!
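Something like this is all the tuning tool needs to be (a bare-bones sketch; the camera index, slider defaults, and blur kernel are arbitrary):

#include <opencv2/opencv.hpp>

// Minimal HSV-tuning tool: sliders for the low/high bounds, live view of the mask.
int lowH = 40, highH = 80, lowS = 100, highS = 255, lowV = 100, highV = 255;

int main() {
    cv::VideoCapture cap(0);                 // assumes the camera shows up as device 0
    if (!cap.isOpened()) return 1;

    cv::namedWindow("mask");
    cv::createTrackbar("lowH",  "mask", &lowH,  179);
    cv::createTrackbar("highH", "mask", &highH, 179);
    cv::createTrackbar("lowS",  "mask", &lowS,  255);
    cv::createTrackbar("highS", "mask", &highS, 255);
    cv::createTrackbar("lowV",  "mask", &lowV,  255);
    cv::createTrackbar("highV", "mask", &highV, 255);

    cv::Mat frame, hsv, mask;
    while (cv::waitKey(30) != 27) {          // Esc to quit
        cap >> frame;
        if (frame.empty()) break;
        cv::GaussianBlur(frame, frame, cv::Size(5, 5), 0);
        cv::cvtColor(frame, hsv, CV_BGR2HSV);
        cv::inRange(hsv, cv::Scalar(lowH, lowS, lowV),
                         cv::Scalar(highH, highS, highV), mask);
        cv::imshow("mask", mask);
    }
    return 0;
}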
Last year, I used std::fstream to write configuration files. It is a good idea, unless you have a program with a much better configuration parser! Just write the HSV values to the file and push it onto your processor. Voilà! You have your perfect HSV inRange values!
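The read/write itself is only a few lines (a sketch; the whitespace-separated format is just one option):

#include <fstream>

struct HsvRange { int lowH, lowS, lowV, highH, highS, highV; };

// Write the tuned values out...
void saveConfig(const char* path, const HsvRange& r) {
    std::ofstream out(path);
    out << r.lowH << ' ' << r.lowS << ' ' << r.lowV << ' '
        << r.highH << ' ' << r.highS << ' ' << r.highV << '\n';
}

// ...and read them back on the vision processor at startup.
bool loadConfig(const char* path, HsvRange& r) {
    std::ifstream in(path);
    return static_cast<bool>(in >> r.lowH >> r.lowS >> r.lowV
                                >> r.highH >> r.highS >> r.highV);
}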
Hunter mentioned to me last year that, at competitions, you should ask field staff as soon as possible whether there will be a time when you can calibrate your vision systems! At the Phoenix regional, this was during the first lunch break. USE THAT PERIOD! Take the bot onto the field and take a gazillion pictures USING THE VISION PROCESSOR CAMERA, so that later, when you aren’t under as much stress, you can go through a couple of them at random locations and find the best values!
As I mentioned before, and will again in caps lock, underline and bold: SET UP A CONFIGURATION FILE!
This way, you can change your program without actually changing code!
I see an issue getting a USB camera driver to read images at more than 30 Hz. This was an issue with the PCDuino and our webcam last year: the Ubuntu USB driver would not feed the processor more than 30 Hz. Dumping the images from RAM to the GPU could also be a bottleneck because of the huge sizes of the frame buffers.
I used Python bindings at work to copy data to (and from) the GPU queue. Python might be easier for the kids to use if it is available. I wonder if you can use OpenCL on the TK1 dev kit? OpenCL might give you the OpenCV/Python bindings on that OS.
I hope FIRST continues to have image processing during the games. Some of the kids enjoy that more than any other task. Good luck with the TK1.
AUVs (autonomous underwater vehicles) are gradually developing vision systems. A big problem is correcting the color distortion from the water. A good friend of mine is working in a lab at Cornell and is detecting, and retrieving, different colored balls at the bottom of a swimming pool.
The task of finding your local position (aka GPS-denied) becomes exponentially more complex when you do it in 3 dimensions (think quad-copters or AUVs).
Over half the battle is getting everything to work, in my opinion. You have to compile source code and sometimes change CMakeLists files (if you want to compile OpenCV with OpenNI, for example).
For those of you interested in what the depth map looks like for the kinect: depth map
You can do a lot of cool things with a depth map, but that’s for another discussion.
I personally am not a fan of blurring an image unless I absolutely have to, or if my calculation requires a center and not corners of a contour.
You should be asking when you can calibrate vision, to the point that it is borderline harassment, until you get an answer. A lot of venues are EXTREMELY poor environments due to window locations, but there isn’t much you can do about it. As an example: uhhhh
By lunch on Thursday, I got it working like it did in St. Louis: stl
Here is a short “video” a student and I made during calibration at St. Louis: video. We tweaked some parameters and got it to work nearly perfectly. As you can guess, we tracked the tape and not the LEDs for hot-goal detection. I somewhat regret that decision, but it’s whatever now.
I don’t think that most teams fail at vision processing because of any of the items listed. FIRST provides vision sample programs for the main vision task that generally work well. Here’s what I think teams need to work on to be successful with vision processing:
You need to have a method to tweak constants fairly quickly, to help with initial tuning and also to tweak based on conditions at competition.
You need to have a method to view, save, and retrieve images which can help tune and tweak the constants.
You need to have a way to use the vision data, for example to accurately turn to an angle and drive to a distance (see the sketch after this list).
You need to understand exactly what the vision requirements are for the game. Most of the time, there are one or more assumptions you can make which will greatly simplify the task.
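For the “turn to an angle” piece, the conversion from pixel offset to heading error is simple camera geometry. A sketch using the proportional approximation (fraction of the image off-center times the field of view); the FOV and image width are placeholders for whatever camera you use:

#include <cstdio>

// Convert a target's x position in the image into an approximate heading error
// the drivetrain can turn through. Camera constants here are illustrative placeholders.
double headingErrorDeg(double targetCenterXPx, double imageWidthPx, double horizontalFovDeg) {
    // Fraction of the image the target is off-center, times the horizontal FOV.
    double offsetFraction = (targetCenterXPx - imageWidthPx / 2.0) / imageWidthPx;
    return offsetFraction * horizontalFovDeg;
}

int main() {
    // A target centered at x = 400 px in a 640 px wide image from a ~60 degree HFOV camera.
    std::printf("turn %.1f degrees toward the target\n", headingErrorDeg(400.0, 640.0, 60.0));
    return 0;
}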
I have plans for an OpenCV codegen, where I basically make a drag-and-drop (more like click-to-add) interface that writes the C++ code for you. It won’t be 100% efficient, because it really is just putting together known bits of code that work; it is up to you to change the variable names and optimize the code afterwards. I am trying to learn how to thread HighGUI at the moment, so hopefully everything should be 100% threaded!
This is meant to help beginner (and adept) programmers get OpenCV code down in no time!
I will also try to add two network options – DevServer-based, and C Native socket calls (Windows TCP/UDP, UNIX TCP/UDP).
I have been slowly working on this project since last year. I am thinking about it being 100% web-based. Hopefully, this will make getting started with OpenCV a no-brainer!
It is my goal this year to get my vision code completed as soon as possible!