Tracking Rectangles

For this year’s game, it seems necessary to locate the backboards programmatically in both the “hybrid” and tele-op periods. I found a paper from Team 1511 that helps determine a robot’s position relative to a vision target (see: http://www.chiefdelphi.com/media/papers/2324).

I feel like most of the math and thought behind that paper would translate well to this scenario too. The only problem is finding the rectangles accurately in an image using the camera or Kinect. I was wondering if any teams had any tips on how to do this with the camera, since we haven’t really tried camera-tracking since we played with CircleTrackerDemo two years ago. Unfortunately, most of the CircleTrackerDemo code seems specific to ellipses only. Any ideas on how to do rectangles with a camera? Perhaps some code we can use?

If that isn’t possible, an alternative would be using the Kinect, although I’m sort of clueless when it comes to shape tracking (outside of human shapes) with the Kinect.

Thank you for your help, and I appreciate any input you may have.


http://homepages.inf.ed.ac.uk/rbf/HIPR2/hough.htm
http://www.inf.ufrgs.br/~crjung/research_arquivos/paper_1125.pdf

Go knock yourself out.

Looks like a lot of complicated math to work through, but that’s to be expected. I am interested in how your first link does perspective rectangle detection, but it doesn’t seem to include any mathematical descriptions. There are some vague mentions of finding the vanishing points and the unit vector field pointing in the direction of vanishing lines, but nothing specific.

There’s currently a rectangle processing VI or something like that for LabVIEW programmers. I’m hoping that could be ported to Java (and/or C++) soon, since we moved on from LabVIEW a long while back. Is there any word on this?

I was also wondering how other teams accomplished tracking of retro-reflective tape (in circular and rectangular shapes) for Logomotion in Java. From an electronic point of view, a cluster of LEDs around the camera seems necessary. However, programmatically, was it necessary to do Hough Transforms? If so, is there a more concise description of these transforms we can access? Perhaps how to take the transformed image and use it to determine the edges of a rectangle? Thanks as always.

I can’t find the white paper on the NI site yet, but one should be posted soon that covers several approaches. One approach uses simple particle analysis to identify the ones most like hollow rectangles. Another approach is to use the line or rectangle geometric fit routines – which are Hough implementations under the hood.
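For anyone looking for a more concise picture of what a Hough line transform does, here is a minimal sketch in plain Python. It is not NI's implementation; the function names and the one-degree angle step are choices made only for this illustration. Each edge pixel votes for every line (rho, theta) that could pass through it, and peaks in the vote table correspond to straight edges.

```python
import math

def hough_lines(points, width, height, theta_steps=180):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge pixels."""
    max_rho = int(math.ceil(math.hypot(width, height)))
    # rho can be negative, so row index = rho + max_rho
    acc = [[0] * theta_steps for _ in range(2 * max_rho + 1)]
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[rho + max_rho][t] += 1
    return acc, max_rho

def strongest_line(acc, max_rho):
    """Return (rho, theta_degrees, votes) for the cell with the most votes."""
    theta_steps = len(acc[0])
    votes, r, t = max(
        (v, r, t) for r, row in enumerate(acc) for t, v in enumerate(row)
    )
    return r - max_rho, math.degrees(math.pi * t / theta_steps), votes

# Example: 20 edge pixels on the vertical line x = 5 in a 20x20 image
edge_pixels = [(5, y) for y in range(20)]
acc, max_rho = hough_lines(edge_pixels, 20, 20)
rho, theta_deg, votes = strongest_line(acc, max_rho)
print(rho, theta_deg, votes)
```

A rectangle shows up as four peaks, two pairs of roughly parallel lines, and its corners are where lines from different pairs intersect.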

The paper actually uses NI Vision Assistant for most of the exploration, but does refer to the LV example when it comes to scoring and position/distance calculation. The LV example will also run directly on your computer, so your cRIO can run whatever, and the laptop can pull images directly from the camera that is on the switch.

Greg McKaskle

I posted a copy of Greg’s whitepaper here:

http://firstforge.wpi.edu/sf/docman/do/listDocuments/projects.wpilib/docman.root

This has a lot of good information about finding and tracking the 2012 vision targets.

Brad

Thank you Brad, this is perfect. Just three questions/comments for anyone:

  1. On the PDF under the “Measurements” section concerning distance (page 9), it says that the blue rectangle width is 11.4 ft but half the blue rectangle width is 6.7 ft. I don’t know who wrote this, but that seems like a typo.

  2. Does the particle processing method only accurately find rectangles when it encounters them head on? Is the edge detection method necessary to find rectangles distorted by perspective?

     • I’m assuming it’s possible to use the edge detection method in Java by taking the NI Vision Assistant’s generated C code and translating it (hopefully).

  3. Are there any pointers you can give on how to process camera images on the laptop instead of the cRIO? We’ve never tried this before, but it seems worth doing.

Thank you again for your help.

I have to wonder if you read these papers before you suggested them. Expecting somebody with high school education (assuming bhasinl has completed high school in the first place) to read through these papers without your own practical insight and reasoning for posting them seems a little pointless.

:frowning: I was just trying to help and to demonstrate the complexity of the issue. It is not merely a matter of telling the cRIO to find a rectangle; there are many things going on behind the scenes.

Personally, I find these types of papers fun to read. Even if you only understand a quarter of the math, the more you try, the more of the bigger picture you get. I think it is more beneficial to butt your head against the problem and try going through it this way first before using code already provided. It really builds character IMHO.

I love challenges too and it was interesting to read and actually (somewhat) understand the problem of image processing better. Of course, I appreciate that it is quite a big feat, and know through experience that you can’t tell the cRIO to just look for a rectangle. However, given that the NI Vision Assistant does a lot of what we need it to do in terms of image processing, it makes more sense to use that instead of writing code that does Hough transforms on images and looks for peaks. Thank you for the read though.

Going back to the issue, I was wondering if any teams could answer my third question from my post above: how is it possible to have the image processing happen on the laptop instead of the cRIO? We’re replacing our Classmate this year so the performance gain could be significant. I read in the PDF that making this switch requires no change in code (or something along those lines). What does it require then?

I don’t mean to attack in any sense; I just think you should’ve offered this kind of explanation in the first place. I’ve seen you post the exact same content on another thread too.

Completely agreed with this; reading papers is a skill that requires lots of practice. Another great thing to do is to read the WPILib code.

The code will be the same because the components you use in LabVIEW are abstracted for each library (IMAQ for the cRIO/laptop and OpenCV for the laptop). For instance, if you use some sort of rectangle detector block, LabVIEW will know whether to use IMAQ or OpenCV based on the target platform, so your code remains the same.

If you’re using your Classmate as the DS and want to do the processing on there, I assume you’d use the dashboard data (there should be examples in LabVIEW, there are in C++).

I’m still wondering how to do it between the cRIO and a laptop on the robot.

EDIT:

To take advantage of this and distribute the processing, you need to send the image from the robot to the dashboard laptop, process it, and then send the results back to the robot. Transmitting data and images also takes time, so the best location to process images depends on all of these factors. You may want to take measurements or experiment to determine the best approach for your team.

This is the quote from NI’s document, so yes you’ll have to use the dashboard data example.

I was once good at head-math, but I guess things change. The formula is correct: you take half of the blue rectangle. The example values are wrong: half of 11.4 is 5.7, not 6.7.

As for running on the laptop, the LV example project does both. An LV project can have code for multiple devices, or targets. For simplicity, the FRC projects tend to have only one. The rectangular target processing project has roughly the same code, with slight differences in how it is invoked, under both the My Computer section of the project and the RT cRIO section. The tutorial goes into detail about how to Save As the RT section to run on the cRIO, but if you prefer, you can pretty easily integrate the My Computer VI into your dashboard, do the processing, and arrange for the values to be sent back to the robot via UDP or TCP.
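To make the "send the values back via UDP" idea concrete, here is a rough sketch in Python. The dashboard in the LV example is LabVIEW, so this only illustrates the idea; the packet layout (two 32-bit floats in network byte order), the function names, and the address and port arguments are all assumptions for this example, not part of any FRC framework.

```python
import socket
import struct

# Hypothetical wire format: network byte order, two 32-bit floats
# (distance in inches, angle in degrees).
RESULT_FORMAT = "!ff"

def pack_result(distance_in, angle_deg):
    """Encode one vision result as an 8-byte payload."""
    return struct.pack(RESULT_FORMAT, distance_in, angle_deg)

def unpack_result(payload):
    """Decode a payload produced by pack_result (what the robot side would do)."""
    return struct.unpack(RESULT_FORMAT, payload)

def send_result(distance_in, angle_deg, robot_ip, port):
    """Send one UDP datagram; robot_ip and port are placeholders you choose."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(pack_result(distance_in, angle_deg), (robot_ip, port))
```

UDP fits here because each result replaces the previous one, so a lost packet just means the robot uses a slightly older value until the next one arrives.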

If you prefer to use OpenCV, it should theoretically run in both locations, but I’m not aware of any port of it for the PPC architecture. Both OpenCV and NI-Vision run on the laptop.

If I glossed over too many details, feel free to ask more detailed questions.

Greg McKaskle

Greg: Are there any examples out there of the dashboard sending data via UDP or TCP, and of the cRIO receiving UDP or TCP data using C++ code?

I’m pretty new to FRC programming as a whole. Sorry if this is a “dumb” question.

Thanks,

Jay

The framework examples do a bit of this already, but for a limited protocol.

If you drill into the dashboard code, you will find that the camera loop does TCP port 80 communications to the camera. The Kinect loop does UDP from a localhost Kinect Server, and even the other loop gets its data from a UDP port from the robot.

For the robot side, there are C++ classes for building up a LabVIEW binary type and submitting it for low or high priority user data. I’m not that familiar with other portions of the framework which may directly use UDP or TCP.

Greg McKaskle

The whitepaper is extremely useful but the part I needed help with is actually what’s glossed over the most. My understanding is that it’s fully possible to determine both angle and distance from the target by the skew of the rectangle and the size. Here is a quote from the whitepaper:

“Shown to the right, the contours are fit with lines, and with some work, it is possible to identify the shared points and reconstruct the quadrilateral and therefore the perspective rectangle”

Except it stops there. Do you have any other reading or direction you can point us to so we can take this the rest of the way? I’d really like our bot to be able to find its location on the floor with the vision targets, and unless we are straight-on, this is going to require handling the angle. Thanks!

-Mike

But the question is still: how do you get the robot to track the rectangle like it would a circle?

This varies depending on which language you use. If you aren’t using Java, you should have access to a “convex hull” function and a “find edge” function, which should do what you want. I haven’t tested this yet, as I am using Java and do not have these functions. I’m working on getting them implemented in Java, but I have bigger fish to fry at the moment.

In theory, the bounding rectangle should be enough, if you put your camera as high as possible, and are willing to tolerate a little error. The height would tell you how far away you are, and the width, after accounting for the height, would tell you how far “off center” you are, giving you your position in polar form relative to the target. The error would be greater the further off center you are (since the perspective transformation makes the rectangle taller than it should be), but I would need to test to see if it is a significant amount.
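Here is one rough way the bounding-rectangle idea could be turned into numbers, sketched in Python and untested on real images. It assumes the 24" by 18" target size and the 47 degree field of view quoted elsewhere in this thread, a 240-pixel-tall image, and (the big assumption) that the apparent width shrinks by the cosine of the off-center angle. The function name and parameters are made up for this example.

```python
import math

TARGET_W_IN, TARGET_H_IN = 24.0, 18.0  # assumed outer size of the target

def estimate_polar(box_w_px, box_h_px, image_h_px=240, fov_degrees=47.0):
    """Rough (distance_inches, off_center_degrees) from a bounding box."""
    # Distance from the apparent height of the target
    fov_height_in = image_h_px / box_h_px * TARGET_H_IN
    distance = (fov_height_in / 2.0) / math.tan(math.radians(fov_degrees / 2.0))
    # Width the target would have head-on at this apparent height
    expected_w_px = box_h_px * TARGET_W_IN / TARGET_H_IN
    # Assumption: observed width = expected width * cos(off-center angle)
    ratio = min(1.0, box_w_px / expected_w_px)
    angle = math.degrees(math.acos(ratio))
    return distance, angle

distance, angle = estimate_polar(box_w_px=40, box_h_px=60)
print(distance, angle)
```

This cannot tell left from right on its own, and as noted above the perspective stretch in height will bias the result more the further off center you are.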

There are a number of approaches, but I’ll show the one that I would use – not in code, but as an example. I’m also making a few simplifications to get things started – notably, I’m often assuming that the camera and target are in the same vertical plane.

  1. I open up the image shown in the paper into Vision Assistant (the one with the perspective distortion).
  2. I use the third tool, the Measure tool, to determine the lengths of the left and right vertical edges of the reflective strip. I measure 100 pixels and 134 pixels.

First image shows the measurements in red and green.

Since the edges are different pixel sizes, they are clearly different distances from the camera, but in the real-world, both are 18" long. The image is 320x240 pixels in size.

The FOV height where the red and green lines are drawn is found using …

240 / 100 x 18" -> 43.2" for green,
and
240 / 134 x 18" -> 32.2" for red.

These values may seem odd at first, but they say that if a tape measure were in the photo where the green line is drawn, taped to the backboard, you would see that from top to bottom in the camera photo, 43.2 inches would be visible on the left/green side, and since the red is closer, only 32.2 inches would be visible.
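The same arithmetic as a small Python function, just as a sketch. The function and parameter names are made up for illustration; the 240-pixel image height, the 18" edge length, and the 100 and 134 pixel measurements are the values from this post.

```python
def fov_height_inches(image_height_px, edge_px, edge_inches=18.0):
    """Real-world height visible top to bottom at the depth of the measured edge."""
    return image_height_px / edge_px * edge_inches

green = fov_height_inches(240, 100)  # left edge measured at 100 px
red = fov_height_inches(240, 134)    # right edge measured at 134 px
print(round(green, 1), round(red, 1))  # 43.2 32.2
```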

Next find the distance to the lines using theta of 47 degrees for the M1011…
(43.2 / 2) / tan( theta / 2) -> 49.7"
and
(32.2 / 2 ) / tan( theta / 2) -> 37.0"

This says that if you were to stretch a tape measure from the camera lens to the green line, it would read 49.7 inches, and to the red line would read 37 inches.
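As a Python sketch, the distance step looks like this. The function name is made up for illustration; the 47 degree angle and the 43.2" and 32.2" FOV heights are the numbers used in this post.

```python
import math

def distance_inches(fov_height_in, fov_degrees=47.0):
    """Distance from the lens to the plane where fov_height_in is visible."""
    half_angle = math.radians(fov_degrees / 2.0)
    return (fov_height_in / 2.0) / math.tan(half_angle)

print(round(distance_inches(43.2), 1))  # green edge: 49.7
print(round(distance_inches(32.2), 1))  # red edge: 37.0
```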

These measurements form two edges of a triangle from the camera to the red line and from the camera to the green line, and the third is the width of the retro-reflective rectangle, or 24". Note that this is not typically a right triangle.

I think the next step would depend on how you intend to shoot. One team may want to solve for the center of the hoop, another may want to solve for the center of the rectangle.

If you would like to measure the angles of the triangle described above, you may want to look up the law of cosines. It will allow you to solve for any of the unknown angles.
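A sketch of that law-of-cosines step in Python, using the three side lengths from this example (49.7" to the green edge, 37.0" to the red edge, and the 24" target width). The function and variable names are made up for illustration.

```python
import math

def triangle_angles(a, b, c):
    """Return the angles in degrees opposite sides a, b and c."""
    def angle_opposite(opp, s1, s2):
        cos_val = (s1 * s1 + s2 * s2 - opp * opp) / (2.0 * s1 * s2)
        # Clamp to [-1, 1] to guard against floating-point rounding
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_val))))
    return (angle_opposite(a, b, c),
            angle_opposite(b, a, c),
            angle_opposite(c, a, b))

to_green, to_red, target_width = 49.7, 37.0, 24.0
# The angle opposite a side sits at the vertex that side does not touch.
at_red, at_green, at_camera = triangle_angles(to_green, to_red, target_width)
print(at_red, at_green, at_camera)
```

The angle at the camera is how much of the view the target spans; the angles at the red and green edges describe how the backboard is turned relative to each line of sight.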

I’d encourage you to place yardsticks or tape measures on your backboard and walk to different locations on the field and capture photos through your camera. You can then do similar calculations by hand or with your program. You can then calculate many of the different unknown values and determine which are useful for determining a shooting solution.

As with the white paper, this is not intended to be a final solution, but a starting point. Feel free to ask followup questions or pose other approaches.

Greg McKaskle





This is incredibly helpful - I have no idea why the idea to use the camera as part of a triangle didn’t come to mind but it was the key piece I was missing. Thanks much!

-Mike

Would you be able to provide some raw images of the hoops through the Axis camera?