Rectangle identification

The math for finding the distance has 2 parts: finding the distance along the camera’s Z axis (straight out from the camera’s lens), and then finding the distance along the ground. We’ll start with the camera’s Z axis.

So one thing that’s true about all cameras, and for the most part all vision in general, is that things far away seem smaller. I found that both the width and height of an object in the image are inversely proportional to its distance from the camera. So for any height h and distance z:

h = A/z

Where A is some unknown constant. Luckily, we don’t actually need to know A, because we can eliminate it later. Now we need two h-z pairs: one from calibration, which we’ll name h0 and z0, and one for the current target, which will be h1 and z1. So now we have a couple more equations:

h0 = A/z0
h1 = A/z1
h1 = B*h0

B is a constant that we can easily calculate for each new h1, because it’s equal to h1/h0. Now we don’t actually know z1, but we know h1, h0 and z0, so we can solve for z1:

z1 = A/h1
z1 = A/(B*h0)
z1 = A/(B*(A/z0))
z1 = z0/B
z1 = z0/(h1/h0)
z1 = h0*z0/h1

Voila! Now to calibrate h0 and z0, I decided to make things easy for myself. When we get particles from WPILibJ’s image library, the x, y, width and height are all in pixels. Since B (h1/h0) doesn’t change as long as h1 and h0 are in the same units, and one dimension that always stays constant is the size of the image itself, I made h0 the image’s height in pixels and h1 the particle’s height in pixels. I then used a tape measure and a good view of the camera’s feedback (e.g. SmartDashboard’s camera widget or the default dashboard’s camera feed) to find the Z value, in meters, at which the vision target exactly fills the camera’s vertical view.
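
As a quick worked example (the numbers here are made up): suppose the image is 240 pixels tall, so h0 = 240, and you measure that the target fills the frame vertically at z0 = 1.5 meters. If a particle then comes back 60 pixels tall (h1 = 60), the target is z1 = h0*z0/h1 = 240*1.5/60 = 6 meters out along the camera’s Z axis.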

Finding the distance along the ground is pretty simple once you have the distance along the camera’s Z axis. If you know the angle between the ground and the camera’s view, it’s just a right angle trigonometry problem:

absoluteZ = cameraZ*cos(CAMERA_ANGLE_FROM_GROUND)
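
Here’s a minimal sketch of both steps in Java. The class and constant names are mine, the calibration numbers and camera angle are placeholders you’d measure for your own robot, and the particle height would come from whatever particle analysis report you’re using.

```java
public class DistanceEstimator {
    // Calibration constants -- example values, measure your own.
    private static final double H0 = 240.0; // image height in pixels (h0)
    private static final double Z0 = 1.5;   // meters at which the target fills the frame vertically (z0)
    private static final double CAMERA_ANGLE_FROM_GROUND =
            Math.toRadians(20.0);            // measured with the plumb-line protractor

    /** Distance along the camera's Z axis, from the particle's height in pixels (h1). */
    public static double cameraZ(double particleHeightPixels) {
        return H0 * Z0 / particleHeightPixels; // z1 = h0*z0/h1
    }

    /** Distance along the ground, projecting the camera-Z distance by the camera's tilt. */
    public static double groundDistance(double particleHeightPixels) {
        return cameraZ(particleHeightPixels) * Math.cos(CAMERA_ANGLE_FROM_GROUND);
    }
}
```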

To find the angle from the ground, I used an angle measure provided by one of the mentors on my team (who is also a physics teacher). It had a plumb line (basically just a string with a weight on the end) hanging over a protractor along the bottom.

Obtaining the horizontal angle of any point in an image is actually fairly simple, and only requires 2 pieces of data: the width of the image and the camera’s horizontal field of view (FOV). The horizontal FOV is the angle between the leftmost and rightmost points the camera can see. It’s usually found on the camera’s datasheet, but I happen to know that the Axis 206 has a 54 degree FOV, while the M1011 has a 47 degree FOV. Finding the absolute angle of any point (the angle from the left edge of the image) is pretty simple; in fact, it’s just a proportion:

absAngle = x/imageW*FOV

where x is the x value for the point, imageW is the image width, and FOV is the horizontal FOV of the camera.

Now you don’t actually care about this angle, because it doesn’t really tell you where to turn your robot, which is what really matters. What you want is the relative angle, or the angle relative to where the camera is facing. This angle is easily obtained from the absolute angle:

relAngle = absAngle-FOV/2

It should be noted that these angles go clockwise, so if you want to make them more technically correct by making them counterclockwise, you should negate relAngle.
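
Here’s a small sketch of both angle formulas in plain Java. The class and method names are mine, and the 54 degree FOV assumes the Axis 206, so swap in your own camera’s value.

```java
public class AngleEstimator {
    // Horizontal field of view in degrees (54 for the Axis 206, 47 for the M1011).
    private static final double FOV_DEGREES = 54.0;

    /** Angle from the left edge of the image to the given x coordinate. */
    public static double absoluteAngle(double x, double imageWidth) {
        return x / imageWidth * FOV_DEGREES;
    }

    /** Angle from where the camera is pointing to the given x coordinate.
     *  Positive is clockwise (to the right); negate it for a counterclockwise convention. */
    public static double relativeAngle(double x, double imageWidth) {
        return absoluteAngle(x, imageWidth) - FOV_DEGREES / 2.0;
    }
}
```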

Our code doesn’t actually track multiple targets (it only tracks the top one), but with some tricks you can track multiple targets with ease.

Now, when tracking multiple targets you may want to have more information, such as the relative height of the targets, and possibly their vertical angles. All of these things are fairly easy to calculate if you have the particle data, mostly following the same lines as the calculations above. Some of it will be a little more difficult, but I don’t want to put too much math into this post; if you really want me to, I can probably explain a good amount of it.

The most difficult part of tracking multiple targets is differentiating between them. But since we know the layout of the targets, we can do a few comparisons and figure them out.

Now we know the targets are laid out like this:

              --------
             |        |
             |        |
              --------
    --------            --------
   |        |          |        |
   |        |          |        |
    --------            --------
              --------
             |        |
             |        |
              --------

From this layout, we can find some useful comparisons:

The top target’s center has the lowest y value, and its x value is between the x values of the 2 middle targets.

The bottom target’s center has the highest y value, and its x value is between the x values of the 2 middle targets.

The middle targets’ centers have (almost) the same y value (with some fudge factor, I’d estimate that their difference would never be greater than 1/3 of their height, regardless of distortion), and have y values between the top and bottom targets.

I haven’t personally worked through the code to differentiate all the targets, but I think those comparisons are a good jumping-off point for finding them all.
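
For what it’s worth, here’s one way those comparisons could look in code. This is just a sketch under the assumption that you already have exactly four particles, one per target; the Target class is a stand-in for whatever center-point data your particle report gives you.

```java
public class TargetSorter {
    /** Stand-in for a particle's center, in pixels (y grows downward in image coordinates). */
    public static class Target {
        public double centerX, centerY;
        public Target(double x, double y) { centerX = x; centerY = y; }
    }

    public Target top, bottom, leftMiddle, rightMiddle;

    /** Assumes exactly four particles, one per vision target. */
    public void sortTargets(Target[] t) {
        // The top target has the smallest y, the bottom target the largest.
        top = t[0];
        bottom = t[0];
        for (int i = 1; i < t.length; i++) {
            if (t[i].centerY < top.centerY)    top = t[i];
            if (t[i].centerY > bottom.centerY) bottom = t[i];
        }
        // The remaining two are the middle targets; tell them apart by x.
        leftMiddle = null;
        rightMiddle = null;
        for (int i = 0; i < t.length; i++) {
            if (t[i] == top || t[i] == bottom) continue;
            if (leftMiddle == null) {
                leftMiddle = t[i];
            } else if (t[i].centerX < leftMiddle.centerX) {
                rightMiddle = leftMiddle;
                leftMiddle = t[i];
            } else {
                rightMiddle = t[i];
            }
        }
    }
}
```

Picking out the top and bottom by y first means the middle pair falls out for free; you’d probably still want to sanity-check that the middle pair’s y values really do sit between the top and bottom before trusting the result.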

A final note on optimization: don’t use the operations provided by BinaryImage, ColorImage, etc. Operations such as convexHull, removeSmallObjects, and others all create copies of their image before returning. Dynamic memory allocation and deallocation are 2 of the most expensive operations a computer can do, and if you’re allocating and deallocating a 640x480 image (possibly multiple times) every time you process an image, that’s 307 kilobytes of memory per image, and that’s for the smallest image type, BinaryImage! This may not sound like a lot, but the cRIO doesn’t have much processing power, and finding a 307 kB chunk of memory isn’t exactly a cakewalk for it.

When I initially tested my image processing, it was taking up to 5 seconds to process each image. In a 2 minute match, that’s terrible. By having a single ColorImage (for the camera image) and a single BinaryImage (for the processed image(s)) and reusing them between processing loops, I managed to reduce that time to half a second or less! I reused them by calling the image processing through the NIVision class instead of the provided Image classes. It’s a bit of a pain, but well worth it.

I’d like to provide you with my code, but through the many changes over this build season it has gotten a bit messy, and I think it would confuse you more than it would help you. I’ll do my best to clean it up and provide you with an example, but for now you’ll have to make do with this really long post containing almost all the concepts I figured out during that time. Good luck! :wink: