New to object detection - team 5970

Our team programs in Python 3 and has never used vision for either of the two competitions we’ve been in. This year we are attempting vision (given the significant ranking point incentive from Auto-Quest). I have been experimenting with writing scripts for Haar cascades (mine is training at the moment - stage 8) using OpenCV. I was wondering how this could be implemented on the robot, and whether any teams have experience with this kind of object detection in the past. We were considering running it via Raspbian on the robot. What kind of camera would be best? Is speed an issue with this? Should I turn down saturation in OpenCV, or value, or some other attribute? Advice would be appreciated.

I’m not an expert, but…

My very limited experience with Haar cascades is that they aren’t very useful in FIRST. In the off season after Stronghold, I tried to train a Haar cascade to recognize the balls from that season. Grey things about the size of a kickball. After generating the data sets and doing the training, it didn’t work worth a hoot.

I looked into them a little and noticed that most of the demos were about face recognition. I then looked into the actual recognizers and saw that they were particularly well suited to fairly structured objects with particular patterns, but slight variations on those patterns - i.e. two eyes and a nose, but with the eyes at different widths. For regular geometric objects, like balls, cubes, and lines, they didn’t work as well.

I had encountered something similar way back in my college days at the University of Illinois in the mid '80s. I was into neural nets, and as a TA, gave a design to some undergrads for a neural net processor, based on material from a paper I had read (Something called WISARD, by some guy named Aleksandr), which could recognize faces. It worked awesomely, and wowed people who saw it, but it couldn’t do anything except faces. (And it couldn’t do many faces, but for the mid '80s, it was pretty darned impressive as a demo.)

In other words, I’m not sure Haar cascades will work all that well, although I would love to be wrong about that, especially if source code is provided. I’m going to try the sort of things that can be done by detecting lines, circles, polygonal approximations of contours, etc. I started working with some things today just fiddling with GRIP settings, and I’m pretty sure that will be adequate to find the location of cubes on the floor, or strings of red and blue lights.

(And, did I mention I’m not an expert? The work I’ve done with Haar cascades could best be described as “fiddling”, not real work. It just didn’t work for me, and in searching for papers, I couldn’t find anything comparable to something I would want to try in FIRST. If someone came along with better experience, I would gladly listen.)

Thanks for the feedback. Do you know of a different method using OpenCV? Also, if not stated already, I would not be classified as an expert either. My experience in programming could be described as maybe one and a half days messing with these cascade scripts in Python.

There is geek cred in doing a machine learning solution, but I would advise walking before running.

Since your team hasn’t used vision before, play around a little with GRIP. At the very least, it exposes you to some of the concepts that OpenCV offers for doing basic vision.

Last year, our team leapt from using GRIP to using Python + OpenCV. With basic thresholding, smart use of camera settings, and a light ring, you can easily get the vision targets on the side of the switch. Simple and way less computationally expensive than Haar cascades.

We added a second mode to find and quantify the position of the yellow gears, which didn’t have retroreflective tape (HSV is your friend). A similar trick can be used to find the bright yellow covers on the milk crates. Again, this doesn’t require exotic bits of OpenCV.

Simple and working beats ambitious and flakey every time.

Team 900 has a white paper that touches on these methods.
I’d also recommend http://pyimagesearch.com as an excellent resource. Just be careful there as Dr. Rosebrock is now heavily into the deep learning aspects that are very cool, but overkill for this game.

I’ll echo what Levansic said, and add a couple of elements.

Generally, you are looking for something specific. It might be the red or blue lights on the plates of the scale. It might be the retroreflective tape, illuminated by LEDs from an LED ring. It might be a crate.

You start with an HSV filter to get rid of things that are the wrong color. The next step is to find something the right shape. There are OpenCV functions to find lines, circles, and the best approximate polygon with a certain number of sides. For example, to find a square, you find everything of the right color, then find polygons that approximate the outline of everything found, and then throw out everything that doesn’t have four sides. (You can’t generally apply an equal-length-side test initially, because that assumes you are looking at the square from the front.)
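
To make that concrete, a rough OpenCV sketch of that chain might look like the following. The color bounds, the 0.04 epsilon factor, and the area cutoff are placeholder values you would tune against your own pictures, not numbers from any team’s actual code:

    import cv2
    import numpy as np

    def find_squares(frame, lower_hsv, upper_hsv):
        # Throw out everything that is the wrong color
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))

        # Outline every blob that survived the color filter
        # (findContours returns 2 or 3 values depending on OpenCV version)
        found = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        contours = found[0] if len(found) == 2 else found[1]

        squares = []
        for c in contours:
            # Approximate each outline with a simpler polygon
            perimeter = cv2.arcLength(c, True)
            approx = cv2.approxPolyDP(c, 0.04 * perimeter, True)
            # Keep four-sided shapes that are big enough to matter
            if len(approx) == 4 and cv2.contourArea(approx) > 100:
                squares.append(approx)
        return squares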

And how do you figure out how to do a lot of this? You play with GRIP. GRIP is a truly amazing program. Download it, take some pictures of the things you want to find, and start playing with settings until you see something close to what you want. Then have it generate some code for you by selecting “generate code” from the menu, and you can see the OpenCV functions used. (I don’t think it has the polygon approximation step. Google for cv2.approxPolyDP, or “find rectangles using OpenCV”.)

There’s also a lot of code in all of these code reveal threads that have been posted. I’m afraid that we aren’t reusing anything ourselves, so we didn’t do a code reveal, but lots of other teams have. Hopefully, someone will pipe up with an easy to follow link that does something an awful lot like what you are thinking of doing.

That should get you started.

This is probably the key to most of FRC, especially vision.

The challenge is usually that it’s not just about detecting an object, it’s about doing something useful and quickly with that info.

I’m going to echo the others and say that Haar cascades are unlikely to be the most efficient way to achieve your goals. Our CV pipeline last year was very basic: convert to HSV, blur (to smooth out the pixels), inRange to isolate the retroreflective tape echoing our green LEDs back to us, and then findContours to get our initial list of Regions of Interest.

I could imagine Haar cascades for detecting non-retroreflective tape shapes (such as cube tracking) but you’re trading off training time for iterating your solution. If I were facing that, I’d probably jump straight to Convolutional Neural Nets (CNNs) and deploy using a Movidius stick. But one way or the other, if you’re using machine learning / differentiable programming, there are a lot of moving parts (data pipeline, training iterations, deployment process). I’m a big fan of jumping into the deep end, but there’d be very high technical risk associated with such a strategy.

On the specific technical questions: speed may or may not be an issue. We achieved 13-15 FPS last year at 640x480 on a Raspberry Pi. That was sufficient for us, and our big performance problem was actually latency from NetworkTables! NetworkTables is easy, but a challenge for targeting (which is what we were doing last year). This year, I think we’ll use raw sockets.
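
For reference, a raw socket doesn’t have to mean much code. A minimal UDP sketch on the Pi side might look like the following; the roboRIO address, port, and message format here are illustrative assumptions, not a description of what any particular team runs:

    import socket

    ROBORIO_IP = "10.59.70.2"  # hypothetical roboRIO address (10.TE.AM.2 for team 5970)
    PORT = 5800                # team-use ports 5800-5810 are open on the field

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP: no connection, no waiting

    def send_target(angle, distance):
        # Whatever message format you pick, the robot-side code just has to parse it back out
        msg = "{:.2f},{:.2f}".format(angle, distance)
        sock.sendto(msg.encode("utf-8"), (ROBORIO_IP, PORT))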

We found that manually controlling the exposure was critical, since the retroreflective tape is so effective that it blows out the pixels and they come across as white (and thus can be confused with background lights). We found an old webcam that supported:

    # Turn the camera brightness way down so the tape's return doesn't wash out to pure white
    EXPOSURE = 0.05
    self.cap = cv2.VideoCapture(video_device_index)
    self.cap.set(cv2.CAP_PROP_BRIGHTNESS, EXPOSURE)

Once we were able to get hue from the retro tape, we switched to HSV. Note that OpenCV represents the “H” in HSV as 0-180, not the 0-360 you see in typical HSV discussions, so:

    # Convert to HSV so that we can focus in on green
    # Note that OpenCV Hue ranges only to 180 (not 360), so use 1/2 the values shown in graphics programs
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Smooth out pixels
    blurred = cv2.blur(hsv, (5, 5))

    # Modify this as necessary
    min_green = (40, 100, 100)
    max_green = (80, 255, 255)
    inrange_green = cv2.inRange(blurred, min_green, max_green)
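
From there, findContours is what turns that mask into the Regions of Interest mentioned above. A rough sketch of that next step could look like this (the area cutoff and the largest-blob-wins approach are illustrative choices, not our exact code):

    # Outline the blobs that passed the green filter
    # (findContours returns 2 or 3 values depending on OpenCV version)
    found = cv2.findContours(inrange_green, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = found[0] if len(found) == 2 else found[1]

    # Ignore specks that are too small to plausibly be the target
    candidates = [c for c in contours if cv2.contourArea(c) > 50]

    if candidates:
        # Bounding box of the largest blob becomes the region of interest
        biggest = max(candidates, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(biggest)
        center_x = x + w / 2  # horizontal offset feeds the aim-at-target logic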