Vision Processing - Target Recognition

What are some approaches to target recognition that FRC uses? That is, what is the algorithm teams use to recognize the vision target in the image?

CHS Robotics used a jury-rigged system with the NI Vision Library to look at rectangles and sizes to identify targets in 2012. I later personally wrote another algorithm that did pattern recognition on the corners of the targets. At UCSD, I'm now working on a project that uses a biologically inspired discriminant saliency algorithm for target acquisition and recognition. Are the approaches just as varied and complex in FRC?

Our team used OpenCV (through JavaCV, since this was a SmartDashboard extension) to find the targets. We used contour analysis to find possible targets, then performed a convex hull and polygon approximation. If the approximated contour had 4 edges oriented properly (i.e. alternating nearly-horizontal and nearly-vertical), it was selected as a target, and if its aspect ratio fell in the proper (configurable) range, it was further identified as either a 2- or 3-point target.
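A rough sketch of that kind of pipeline in modern OpenCV C++ (we actually used JavaCV; the color range, the approximation epsilon, and the aspect-ratio cutoff below are placeholders, not our tuned values):

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    cv::Mat frame = cv::imread("camera_frame.png");   // placeholder input image
    if (frame.empty()) return 1;

    // Threshold on the light-ring color to get a binary mask (example HSV range).
    cv::Mat hsv, mask;
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(50, 100, 100), cv::Scalar(90, 255, 255), mask);

    // Contour analysis to find possible targets.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (const std::vector<cv::Point>& c : contours) {
        // Convex hull followed by polygon approximation.
        std::vector<cv::Point> hull, poly;
        cv::convexHull(c, hull);
        cv::approxPolyDP(hull, poly, 0.02 * cv::arcLength(hull, true), true);

        // 4 edges suggests a taped rectangle. (The alternating horizontal/vertical
        // edge-orientation check is omitted here for brevity.)
        if (poly.size() != 4) continue;

        // Classify by aspect ratio (example cutoff; the real range was configurable).
        cv::Rect box = cv::boundingRect(poly);
        double aspect = static_cast<double>(box.width) / box.height;
        if (aspect > 2.4)
            std::printf("wide (3-point?) target at (%d, %d)\n", box.x, box.y);
        else
            std::printf("narrower (2-point?) target at (%d, %d)\n", box.x, box.y);
    }
    return 0;
}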

In 2013 we ended up using a convex hull and then looking at rectangles (it had some success, but we never used it in competition). In 2012, though, we had a harder time picking out the targets without spending time calibrating the camera. So we turned on our green light ring, captured an image, turned off the green ring, captured another image, and compared the two. This gave us a great view of the targets, because only items that turned green were allowed to show up in the final image. Then we told the camera to ignore anything above or below the actual targets, or too far off in the wrong direction; the point was to only search for targets where they could actually be. Finally, we acquired HSL ranges for the targets based on the average of the regions where the camera thought it saw a target, and used the same convex hull/rectangle scoring/PID loop to do the final alignment. This worked well, but it took an additional two seconds for the two images to be captured and for our laptop to do all the math.
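For illustration, the lit/unlit differencing idea looks roughly like this in C++/OpenCV (the file names, the row band, and every threshold here are placeholders, not what we actually ran):

#include <opencv2/opencv.hpp>
#include <cstdio>

int main() {
    cv::Mat lit   = cv::imread("ring_on.png");    // frame captured with the green ring on
    cv::Mat unlit = cv::imread("ring_off.png");   // frame captured with the ring off
    if (lit.empty() || unlit.empty()) return 1;

    // Pixels that changed a lot between the two frames were lit by our ring.
    cv::Mat diff, diffGray, changed;
    cv::absdiff(lit, unlit, diff);
    cv::cvtColor(diff, diffGray, cv::COLOR_BGR2GRAY);
    cv::threshold(diffGray, changed, 40, 255, cv::THRESH_BINARY);

    // Ignore rows above or below where the targets could physically be.
    cv::Mat roiMask = cv::Mat::zeros(changed.size(), CV_8UC1);
    roiMask(cv::Rect(0, changed.rows / 4, changed.cols, changed.rows / 2)) = 255;
    cv::Mat candidates;
    cv::bitwise_and(changed, roiMask, candidates);

    // Sample the lit frame inside the candidate regions to pick color ranges
    // for the final single-image threshold/convex-hull/scoring pass.
    cv::Mat hls;
    cv::cvtColor(lit, hls, cv::COLOR_BGR2HLS);
    cv::Scalar meanHLS = cv::mean(hls, candidates);
    std::printf("mean H,L,S over candidate regions: %.1f %.1f %.1f\n",
                meanHLS[0], meanHLS[1], meanHLS[2]);
    return 0;
}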

I can only comment on what I’ve seen, but the vision processing approaches I’ve seen are highly varied but fairly basic.

I’ve seen teams use NI Vision, OpenCV, and RoboRealm, with a few writing custom algorithms.

Most are ad hoc and purpose-made. They only need to know orientation and perhaps distance, and that is all they calculate. It would be great to see robots that detected game elements, field boundaries, friend/foe robots, and navigational obstacles in general, but this is not generally seen as beneficial to the game, so they are largely ignored or are accomplished with other sensors.

Some of the ad hoc approaches are well done, and others are brittle. If you would like to make your knowledge and/or code available to teams, I’d suggest a blog or white paper that analyzes previous games and their vision elements. Perhaps include things like lines on the field, carpet detection, bumper detection. I’d discourage a highly sophisticated black box over an understandable and tinker-able set of widgets that teams can use in many different ways.

And of course you can offer up your opinions and insight on this forum.

Greg McKaskle

1706 used a fairly standard process.

We grabbed the image as grayscale, then put it through a binary threshold. We then found the contours of the image, as mentioned in a few posts above. Next, we took the sequence of contours and applied moments to it, which gave us a subpixel-accurate corner. Since we had such an accurate corner, we threw object pose out (we used it in 2012, and it did work, but it was rather complicated) and found distance with basic trig. The angle in the screen was calculated as:

(pixels above the screen center) / (half the screen’s height) = (y rotation) / (half the vertical field of view)

We added that angle to our camera’s mounting angle in the world. Since the height difference between the camera and the center of the 3-point target is constant, that is the opposite side, and we simply solve for the adjacent side: the distance along the ground to the target. From the feeder station we were reading ~48.5 feet, and we measured it to be something like 48.7.
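In code, that proportion and the trig come out to something like this (the FOV, camera tilt, camera height, and target height are made-up example numbers, not our actual measurements):

#include <cmath>
#include <cstdio>

int main() {
    const double PI              = 3.14159265358979;
    const double imageHeight     = 480.0;   // pixels
    const double verticalFovDeg  = 47.0;    // example vertical field of view
    const double cameraTiltDeg   = 20.0;    // example upward tilt of the camera
    const double cameraHeightFt  = 2.0;     // example lens height off the floor
    const double targetCenterFt  = 8.7;     // example height of the 3-point center

    double pixelsAboveCenter = 95.0;         // example target center found by vision

    // pixels above center / (half the screen height) = y rotation / (half the FOV)
    double yRotationDeg = (pixelsAboveCenter / (imageHeight / 2.0))
                          * (verticalFovDeg / 2.0);
    double totalDeg = cameraTiltDeg + yRotationDeg;

    // The constant height difference is the opposite side; solve for the adjacent
    // side, which is the distance along the ground to the target.
    double opposite = targetCenterFt - cameraHeightFt;
    double adjacent = opposite / std::tan(totalDeg * PI / 180.0);
    std::printf("estimated ground distance: %.1f ft\n", adjacent);
    return 0;
}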

As for differentiating between the 2- and 3-point targets, a simple aspect ratio was calculated. Our magic number was 2.4. The aspect ratio was (top length + bottom length) / (left side + right side), where length was in pixels, and the side lengths came from the simple Algebra 1 distance equation. We got the corners from approximating a polygon (which I have nearly fixed; I will post the finished code here in about a week, once it works for multiple targets and not just one). A simulation of the field (including the pyramid) was created to virtually test the vision system. In the simulation, object pose was used, and you can change rotation, y displacement (crossrange), and x displacement (downrange). Then the vision program was applied, and 2.4 was found to be the magic number.
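Here is a quick sketch of that aspect-ratio test from four corner points (the 2.4 cutoff is the number from above; the corner coordinates are made up):

#include <cmath>
#include <cstdio>

struct Pt { double x, y; };

// Plain distance formula between two corner points, in pixels.
double dist(Pt a, Pt b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

int main() {
    // Example corners, ordered top-left, top-right, bottom-right, bottom-left.
    Pt tl{100, 80}, tr{260, 82}, br{258, 150}, bl{102, 148};

    double top    = dist(tl, tr);
    double bottom = dist(bl, br);
    double left   = dist(tl, bl);
    double right  = dist(tr, br);

    double aspect = (top + bottom) / (left + right);
    if (aspect > 2.4)
        std::printf("aspect %.2f: looks like the wide 3-point target\n", aspect);
    else
        std::printf("aspect %.2f: looks like a 2-point target\n", aspect);
    return 0;
}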

A neat thing we discovered this year was the tree retrieval setting in cvFindContours. To eliminate everything that wasn’t a target, I simply made this check:

if(contours->v_next == NULL)   // no interior contour: this blob is solid
{
    contours = contours->h_next;   // skip it and move on to the next contour
}

What this does: if a detected contour was solid, it wasn’t a target, so it was skipped over, the aspect-ratio calculation was skipped, and we moved on to the next contour. The same skip was applied if the aspect ratio came out greater than 4.2.
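For anyone on the newer OpenCV C++ API instead of the old CvSeq pointers, the same interior-contour test looks roughly like this (hierarchy-based; this is not our exact code):

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    cv::Mat binary = cv::imread("threshold.png", cv::IMREAD_GRAYSCALE);  // placeholder
    if (binary.empty()) return 1;

    std::vector<std::vector<cv::Point>> contours;
    std::vector<cv::Vec4i> hierarchy;   // [next, previous, first child, parent]
    cv::findContours(binary, contours, hierarchy,
                     cv::RETR_TREE, cv::CHAIN_APPROX_SIMPLE);

    for (size_t i = 0; i < contours.size(); ++i) {
        if (hierarchy[i][2] < 0) {
            // No child contour: the blob is solid, so it can't be a taped target.
            continue;
        }
        std::printf("contour %zu has an interior contour; keep it as a candidate\n", i);
    }
    return 0;
}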

Sorry about the length of this response, but I’m almost done!

In Terre Haute on Thursday we ran into some problems with the sun and with detecting robots, so a field-of-view limit was applied to eliminate the lower part of the screen, just below the lowest point the 3-point target could be, which was found using the simulation. We would also detect windows behind the alliance wall, so I applied an upper bound; it read:

if(center.y > 260)   // candidate is outside the band where targets can appear
{
    contours = contours->h_next;   // skip it and move on to the next contour
}

*Note: the coordinate origin was repositioned to the center of the screen.

This was a nightmare on Thursday morning when the problem was discovered, especially since St. Louis had gone so smoothly, but it got fixed!

Lastly, I calculated x rotation similarly to how I calculated y rotation, and sent the cRIO the distance and x rotation. The cRIO programmer tuned a PID so that whenever he held a button, we locked onto the 3-point target and our turret automatically adjusted based on distance. BUT, we ran into some tough defense, so we made two more cases for the PID: one was tuned to make the vision read 12 degrees to the right of the 3-point and lower the turret, so we could shoot into the near 2-point, and the other was for the other 2-point. So we had three buttons, one for each of the targets, and since we had mecanum wheels, it took us half a second to realign and pick another target to shoot at, which proved very annoying for a defending robot.
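Stripped way down (no cRIO/WPILib calls; the far 2-point offset is just an illustrative mirror of the 12 degrees mentioned above), the button/offset idea amounts to:

#include <cstdio>

enum Target { THREE_POINT, NEAR_TWO_POINT, FAR_TWO_POINT };

// Angular offset applied to the vision reading for each aim button.
double aimOffsetDegrees(Target t) {
    switch (t) {
        case THREE_POINT:    return 0.0;    // lock straight onto the 3-point
        case NEAR_TWO_POINT: return 12.0;   // read 12 degrees right of the 3-point
        case FAR_TWO_POINT:  return -12.0;  // illustrative mirror offset
    }
    return 0.0;
}

int main() {
    double visionXRotation = 3.5;        // example x rotation to the 3-point from vision
    Target selected = NEAR_TWO_POINT;    // whichever button the driver is holding

    // This is the error the turret PID would try to drive to zero
    // (turret elevation was handled separately, based on distance).
    double error = visionXRotation - aimOffsetDegrees(selected);
    std::printf("turret error: %.1f degrees\n", error);
    return 0;
}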

DONE.

For rookie teams, or teams that want to shut down a robot with vision: don’t worry about blocking the shot. The best defender we encountered just put a big pole up, and while it did block our shots, it also made my interior-contour test fail, which caused the vision program to get no reading, which made us blind. So, disturb the camera’s view! It’s a competition, not a friendly.