Best Way to Learn Vision Processing

Many of you may know that I’ve been trying to learn vision processing. I finally got a working install of OpenCV on Windows, using VS C++ Express 2010. However, I don’t know where to start when learning OpenCV. What tutorials do you suggest? Are there any books I should look out for when I go to the public library? Video tutorials would work too.

Other than that, what webcam should I persuade either my parents or my team to buy? I don’t think my laptop’s integrated camera will cut it :wink:

Thanks :wink:

The OpenCV documentation is actually rather good, and I found the example code to be pretty informative when I was learning OpenCV. I learn best by example, though, so YMMV. If you want a more edited, traditional book, there are a bunch listed here; you’re probably most likely to find the O’Reilly OpenCV book (the first one listed, with the butterfly).

Once you’ve gone through a few examples, I’d suggest trying to solve one of the past years’ challenges (or you can wait until Saturday :)). OpenCV is a large enough library that you’ll get lost if you try to learn the entire thing without some sort of specific challenge to direct your efforts.

If you have access to the game elements from those years, that would obviously be the best; otherwise I’d suggest looking for pictures on Chief Delphi (e.g., Jared from 341 posted some example camera captures from the 2012 game) or on-robot camera footage off of YouTube (keepvid.com/). Just try applying some of the filters and see what happens. I’d recommend starting with smoothing operators, color-space conversions (e.g., cvtColor: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#cvtcolor), and thresholds. From there, you can move on to contour analysis and simple shape detection using Hough transforms.

When it comes to webcams, some are slightly better than others, but in my experience they’re all about the same. The real advantage to having another webcam besides the one built into your laptop is that you can easily aim it at things without having to contort yourself and your laptop into weird poses.

C++ OpenCV 2.0 may or may not be the best way to get into vision processing depending on your software background. Because it is a big library that does a lot of complicated things, there is a pretty steep learning curve. Moreover, current versions of OpenCV rely upon a pretty solid foundation in modern C++.

If you know Python, I recommend using OpenCV’s Python bindings for rapid prototyping: it is syntactically simplified and quicker to throw together something that works in my experience.

If you have access to it, I highly recommend using Matlab’s Image Processing Toolbox. I learned a large part of my foundation in vision from using its functions and (just as importantly) reading its documentation.

Computer vision is such an interesting subject because it combines so many disciplines of mathematics and computer science. I first took several courses in signal processing before diving into vision, and was glad I did so; many vision problems can be described as filtering problems. Graph theory, 3D geometry, and machine learning are all also very relevant fields. I would start with whichever one of these you have some background in, and grow from there.

Classic algorithms or techniques to get familiar with:

The pinhole camera model
Camera calibration
Thresholding
Binary morphology (erode/dilate)
Edge/corner detection
Correlation/Cross-correlation

I think that is pretty close to the order I would go in, as well (maybe gloss over calibration quickly at first).

Here’s a Python ‘tutorial’ I put together to help our team better understand the algorithm described in the FIRST vision processing doc @ http://wpilib.screenstepslive.com/s/3120/m/8731, and the similar vision code that team 341 posted.

Maybe you will find it helpful if you aren’t past this point yet.

It also leverages some of the OpenCV tutorials to open multiple windows for the various stages of the processing, and to provide some knobs for experimenting with tuning the algorithm in real time.

I started writing it in C++ using just the open-source gcc compiler, but, as mentioned in another post here, I decided to take advantage of the interactive and friendlier Python environment.

The ‘tutorial’ involves running the program and seeing how the tuning knobs affect the targeting results, as well as some suggested ‘next steps’ for students to take in the comments at the beginning of the file, to facilitate learning by example.

Last year we were successful using cheap Logitech webcams (C110 or C210) that you can find at a big-box store for about $20. Since they are cheap and have been available for a long time, the Linux device drivers for them are fairly robust.

That is one reason why I am looking at Logitech. They are dirt cheap and they work: good value for the price.

I have a working Windows install with C++. I also have a working Python install on Ubuntu/Linux. I don’t know if that link works; it takes me to the index page of ScreenStepsLive.

IMO, there are basically two ways to go about learning something: 1) you can pick a corner to start nibbling at and random-walk your way outward until you cobble together enough to solve the problem you’re currently working on, or 2) you can try to learn the big picture and theory of the entire subject, and then figure out how that applies to your current problem.

Jared and I gave you lists of topics to get started with also as a way to try to give you some direction about where to focus initially.

If you like (1), the 341 example code and OpenCV tutorial code is a great way to start. If you like (2), either pick up one of those books (if there’s nothing available in the library, then perhaps a late nondenominational winter holiday present? or can you convince your team to buy it as a resource to you and future students?) or you can check out courses on Coursera or EdX - though be warned that these will probably quickly require a decent understanding of calculus and linear algebra.

If you’re going with the (1) experimentation route, I’d suggest that you define a specific project to work on sooner rather than later. Find a brightly-colored object around your house and try to track it - e.g. tennis balls can work well (more so if they’re newer).

I’d recommend you steer away from C and C++ if you’re not already very comfortable with them. Essentially, with great power comes great responsibility, and you’ll want to be able to focus your efforts on working out the computer vision problems without making it even more complicated.

As Jared says, Python’s more concise syntax will allow you to iterate faster than you would in Java. Java will probably have a little better performance, though if you’re mostly just calling OpenCV functions and not trying to process the pixels yourself, this is much less of a concern. Go with whatever language is most comfortable for you.

The recent versions of the Windows prebuilt binaries have included the Java bindings in opencv\build\java. I haven’t used it myself, but I believe you should be able to just include the .jar file in your classpath and the path to either the x86 or x64 .dll in the library path. Something like

java -Djava.library.path=C:\opencv\build\java\x64 -cp .;C:\opencv\build\java\opencv-247.jar MainClass

I believe 341 used JavaCV instead, which is another option.

It looks like the wpilib.screenstepslive links were changed when 2014 rolled in, and I don’t see that same doc in the 2014 set. I’m posting it here for you. I found it helpful reading to get a handle on the generic approach I’m seeing used in other teams’ posted code.

Vision_Processing.pdf (3.51 MB)



I think I’ll stick with C++ because I have a greater C knowledge than python or java. Also, it works on Windows with minimal configuration!

I have read through the document you attached, as well as the (less complete) 2014 equivalent. (I am programming in C++.)

I have also setup the Axis M1011 Camera (plugged into the D-Link, and it has a green ring light) so it has a brightness of ~10, and have it displaying on the driver station (would like to process it using the cRIO). I added a picture to the cRIO root directory (testimage.jpg) from the vision example program.

I loaded the 2014 Vision Code and ran Autonomous but nothing happened.

What should I do?

I strongly recommend the PS3 Eye camera. They can be had for ~$20 on Amazon. They’re also proper computer vision cameras, as they’re designed to work with the little glowing controller thing from Sony. You get:

  • Up to 120fps capture
  • Up to 640x480 resolution
  • Good driver support on Windows and Linux
  • 2 lens settings (50deg and 70deg FOV if memory serves)

The other cool thing with the Eye is that it’s so popular you can get third-party lenses and IR imaging solutions for it. In 2012 my team had a pretty sophisticated CV approach, but at one competition our sensor was blinded by stadium lighting mounted directly behind the backboard. In 2013 we switched to using infrared for the retro-illumination and did not have a repeat.

@DavisC: I haven’t tried running the cRIO-hosted vision code. To learn about vision processing, our team is taking the approach of writing algorithms using OpenCV on laptops and ironing out bugs there, then attempting to deploy to a second on-robot processor that would run our code in either Python or C++ and hand calculated results to the cRIO code. Sorry I don’t have an answer for you about next steps with the approach you’re taking.

I have read this book: http://simplecv.org/ . It is pretty good. In past seasons I have used OpenCV 2.0 with C++ (VS 2010 on Windows 7 on an i3 core), and I have used SimpleCV on an Odroid-U2 (just bought the U3). I learned a lot more about vision processing itself while using just OpenCV; there are lots of tutorials and books. The main reason I learned more is that SimpleCV does a lot of the work for you and is a higher abstraction built on OpenCV. Like many abstractions, it leaks, so it is good to know what is underneath. http://www.joelonsoftware.com/articles/LeakyAbstractions.html

We use SimpleCV because I know OpenCV (master beginner) and that makes me more productive. I am glad I started with OpenCV to learn simple vision processing, but it has a steeper learning curve.

There is also this, and it is not cheap (though I did get MathWorks to give me a node-locked copy of Matlab with all of the packages for student use on our FIRST team). http://www.petercorke.com/Machine_Vision_Toolbox.html

I also have access through work. It is not the cheapest way to learn vision but it is a common way in college and it can show you some advanced material and concepts.

I have looked at the code that comes with 2014. It is pretty much what we did as far as an algorithm last year to get distance in SimpleCV.