Does anyone know how to make an algorithm that can identify only the rectangle we are tracking, even at angles? See, our problem is that when we only use the threshold, we still pick up other targets such as the lights and windows. And when we use the algorithms in the whitepaper, it only works when we are straight in front of the target. So could someone post their code for identifying rectangles or explain what they did?
thanks
We didn’t track rectangles specifically - we tracked particles. With a good green LED ring around the lens of the camera, we found that it created a pretty distinct color range when reflected off the retroreflective tape (high blue, slightly higher green, low red). The algorithm went something like this:
-get image from camera
-perform threshold
-convex hull (makes particles a little clearer)
-remove small objects (removes noise and small gaps in particles, since gaps can make IMAQ think that they're separate particles)
-particle size filter (eliminates some more noise)
-get details (distance away & angle)
It calculates the distance based on the height that it sees, because if the camera is at a consistent height, the target’s height is unlikely to be distorted, while its width can easily appear to vary with the angle of view.
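Here's a rough sketch of that pipeline up through getting the particle reports (the distance and angle math comes later in this post). This isn't our actual code; it's a minimal example based on the 2012-era WPILibJ image classes (AxisCamera, ColorImage, BinaryImage, CriteriaCollection, ParticleAnalysisReport) as I remember them, so double-check the method names against your library version. Every threshold and size number in it is a placeholder you'd have to tune for your own LED ring and camera.

import edu.wpi.first.wpilibj.camera.AxisCamera;
import edu.wpi.first.wpilibj.camera.AxisCameraException;
import edu.wpi.first.wpilibj.image.BinaryImage;
import edu.wpi.first.wpilibj.image.ColorImage;
import edu.wpi.first.wpilibj.image.CriteriaCollection;
import edu.wpi.first.wpilibj.image.NIVision;
import edu.wpi.first.wpilibj.image.NIVisionException;
import edu.wpi.first.wpilibj.image.ParticleAnalysisReport;

public class SimpleTracker {
    // Example RGB threshold for a green LED ring (low red, high green, high blue).
    // These numbers are placeholders; tune them for your own lighting.
    private static final int R_LOW = 0,   R_HIGH = 80;
    private static final int G_LOW = 120, G_HIGH = 255;
    private static final int B_LOW = 100, B_HIGH = 255;

    private final CriteriaCollection sizeCriteria = new CriteriaCollection();

    public SimpleTracker() {
        // Particle size filter: ignore anything smaller than a plausible target.
        // The 30x40 pixel minimum is a made-up example.
        sizeCriteria.addCriteria(NIVision.MeasurementType.IMAQ_MT_BOUNDING_RECT_WIDTH, 30, 400, false);
        sizeCriteria.addCriteria(NIVision.MeasurementType.IMAQ_MT_BOUNDING_RECT_HEIGHT, 40, 400, false);
    }

    /** Returns the largest particle that survives the pipeline, or null if none did. */
    public ParticleAnalysisReport process() throws AxisCameraException, NIVisionException {
        ColorImage image = AxisCamera.getInstance().getImage();           // get image from camera
        BinaryImage thresholded = image.thresholdRGB(R_LOW, R_HIGH, G_LOW, G_HIGH, B_LOW, B_HIGH);
        BinaryImage hulled = thresholded.convexHull(false);               // close gaps in particles
        BinaryImage cleaned = hulled.removeSmallObjects(false, 2);        // erode away speckle noise
        BinaryImage filtered = cleaned.particleFilter(sizeCriteria);      // drop leftover small particles

        // Reports come back ordered by particle size, largest first.
        ParticleAnalysisReport[] reports = filtered.getOrderedParticleAnalysisReports();

        // Allocating and freeing every image each frame is expensive (see the note on
        // reusing images near the end of this post), but it keeps this sketch simple.
        filtered.free();
        cleaned.free();
        hulled.free();
        thresholded.free();
        image.free();

        return reports.length > 0 ? reports[0] : null;
    }
}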
Now I’m not sure that this is the best strategy, or even if it’s really a good one, but with some optimization I got it to process an image in half a second and provide a distance accurate to within 5 cm. Also, it runs in a separate thread, so that half second isn’t lagging the rest of the cRIO.
If you want to know anything more about the process (especially distance and angle calculations), just feel free to ask.
Thanks for your help. I am new this year and the only programmer on our team, so I would really appreciate it if you could explain your equation for distance at an angle and while straight in front. Also, how did you manage to track multiple objects? And if it wouldn’t be too much trouble, could you post your code so I could see how you did it? Also, how did you do the angle measurements?
Thanks a ton,
Dimitri
The math for finding the distance has 2 parts: finding the distance along the camera’s Z axis (straight out from the camera’s lens), and then finding the distance along the ground. We’ll start with the camera’s Z axis.
So one thing that’s true about all cameras, and for the most part all vision in general, is that things far away seem smaller. I found that both the width and height are inversely proportional to the distance away. So for any height h and distance z:
h = A/z
Where A is some unknown constant. Luckily, we don’t actually need to know A, because we can eliminate it later. Now we need two sets of h-z pairs: one for the calibration, and we’ll name them h0 and z0, and one for the current target; these will be h1 and z1. So now we have a couple more equations:
h0 = A/z0
h1 = A/z1
h1 = B*h0
B is a constant that we can easily calculate for each new h1, because it’s equal to h1/h0. Now we don’t actually know z1, but we know h1, h0 and z0, so we can solve for z1:
z1 = A/h1
z1 = A/(B*h0)
z1 = A/(B*(A/z0))
z1 = (A/A)/(B/z0)
z1 = 1/((h1/h0)/z0)
z1 = h0*z0/h1
Voila! Now, to calibrate h0 and z0, I decided to make things easy for myself. When we get particles from WPILibJ’s image library, the x, y, width and height are all in pixels. Since B (h1/h0) doesn’t change as long as h1 and h0 are in the same units, and one dimension that always stays constant is the size of the image itself, I made h0 the image’s height and h1 the particle’s height, both in pixels. I then found the Z value in meters required for the vision target to fill the camera’s vertical view, using a tape measure and a good view of the camera’s feedback (e.g. SmartDashboard’s Camera widget or the default dashboard’s camera feed).
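In code, the whole distance calculation boils down to a single line once you have that calibration pair. Here’s a minimal sketch (not our actual code); the H0_PIXELS and Z0_METERS values below are example numbers standing in for whatever you measure:

public class DistanceEstimator {
    // Calibration pair: h0 is the image height in pixels, and z0 is the measured
    // distance (meters) at which the target exactly fills the camera's vertical view.
    // Both numbers are examples; measure z0 yourself with a tape measure.
    public static final double H0_PIXELS = 240.0;
    public static final double Z0_METERS = 1.5;

    /** Distance along the camera's Z axis, given the particle's height in pixels. */
    public static double cameraZ(double particleHeightPixels) {
        // z1 = h0 * z0 / h1
        return H0_PIXELS * Z0_METERS / particleHeightPixels;
    }
}

With those example numbers, a particle 120 pixels tall (half the image height) works out to 240 * 1.5 / 120 = 3 meters along the camera’s Z axis. The particle height here would come from the boundingRectHeight field of the ParticleAnalysisReport (check the field name against your WPILibJ version).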
Finding the distance along the ground is pretty simple once you have the distance along the camera’s Z axis. If you know the angle between the ground and the camera’s view, it’s just a right angle trigonometry problem:
absoluteZ = cameraZ*cos(CAMERA_ANGLE_FROM_GROUND)
To find the angle from the ground, I used an angle measure provided by one of the mentors on my team (who is also a physics teacher). It had a plumb line (basically just a string with a weight on the end) hanging in front of a protractor along the bottom, so you can read the camera’s tilt straight off the protractor.
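Continuing the DistanceEstimator sketch from above, the projection onto the ground is one more method; the 20 degree value is only an example standing in for whatever you read off the protractor:

    // Angle between the camera's line of sight and the ground, measured with the
    // plumb-line protractor described above. 20 degrees is just an example.
    public static final double CAMERA_ANGLE_FROM_GROUND = 20.0;

    /** Distance along the ground, given the distance along the camera's Z axis. */
    public static double groundDistance(double cameraZ) {
        return cameraZ * Math.cos(Math.toRadians(CAMERA_ANGLE_FROM_GROUND));
    }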
Obtaining the horizontal angle of any point in an image is actually fairly simple, and only requires 2 pieces of data: the width of the image and the camera’s horizontal field of view (FOV). The horizontal FOV is the angle between the leftmost and rightmost points the camera can see. It’s usually listed in the camera’s datasheet, but I happen to know that the Axis 206 camera has a 54 degree FOV, while the M1011 has a 47 degree FOV. Finding the absolute angle of any point (the angle from the left edge of the image) is just a proportion:
absAngle = x/imageW*FOV
where x is the x value for the point, imageW is the image width, and FOV is the horizontal FOV of the camera.
Now you don’t actually care about this angle, because it doesn’t really tell you where to turn your robot, which is what really matters. What you want is the relative angle, or the angle relative to where the camera is facing. This angle is easily obtained from the absolute angle:
relAngle = absAngle-FOV/2
It should be noted that these angles increase clockwise, so if you want to make it more technically correct by making them counterclockwise, you should negate relAngle.
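Put together, the angle math is only a couple of lines. Another small sketch, which could live in the same DistanceEstimator class as above; it assumes the Axis 206’s 54 degree horizontal FOV (swap in your own camera’s value) and takes the particle’s center x in pixels:

    // Horizontal field of view in degrees: 54 for the Axis 206, 47 for the M1011.
    public static final double HORIZONTAL_FOV = 54.0;

    /**
     * Angle of an image point relative to where the camera is facing, in degrees.
     * Positive is clockwise (to the right); negate it if you want counterclockwise-positive.
     */
    public static double relativeAngle(double x, double imageWidth) {
        double absAngle = x / imageWidth * HORIZONTAL_FOV; // angle from the left edge of the image
        return absAngle - HORIZONTAL_FOV / 2.0;            // angle from the image center
    }

For a ParticleAnalysisReport r, you’d call something like relativeAngle(r.center_mass_x, r.imageWidth) (those field names are from the 2012 WPILibJ ParticleAnalysisReport, so check them against your version).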
Our code doesn’t actually track multiple targets (it only tracks the top one), but with some tricks you can track multiple targets with ease.
Now, when tracking multiple targets you may want to have more information, such as the relative height of the targets, and possibly their vertical angles. All of these things are fairly easy to calculate if you have the particle data, mostly along the same lines as the calculations above. Some of it will be a little more difficult, but I don’t want to put too much math into this post; if you really want me to, I can probably explain a good amount of it.
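For example, the vertical angle of a point works exactly like the horizontal one, just using the image height and the camera’s vertical FOV. I don’t remember the vertical FOV values off-hand, so the number below is only a placeholder you’d get from the datasheet or by measuring:

    // Vertical field of view in degrees. This is a placeholder value, not a real
    // spec for either Axis camera; look it up or measure it yourself.
    public static final double VERTICAL_FOV = 40.0;

    /** Vertical angle of an image point relative to the camera's centerline, in degrees. */
    public static double relativeVerticalAngle(double y, double imageHeight) {
        double absAngle = y / imageHeight * VERTICAL_FOV;
        // Negated so "up" comes out positive, since pixel y values increase downward.
        return -(absAngle - VERTICAL_FOV / 2.0);
    }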
The most difficult part of tracking multiple targets is differentiating between them. But since we know the layout of the targets, we can do a few comparisons and figure them out.
Now we know the targets are laid out like this:
--------
| |
| |
--------
-------- --------
| | | |
| | | |
-------- --------
--------
| |
| |
--------
From this image, we can find some useful comparisons:
The top target’s center has the lowest y value, and its x value is between the x values of the 2 middle targets.
The bottom target’s center has the highest y value, and its x value is between the x values of the 2 middle targets.
The middle targets’ centers have (almost) the same y value (with some fudge factor, I’d estimate that their difference would never be greater than 1/3 of their height, regardless of distortion), and have y values between the top and bottom targets.
I haven’t personally worked through the code to differentiate all the targets, but I think those comparisons are a good jumping-off point for finding them all.
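I haven’t tested this, but a sketch of one way to code those comparisons follows. It assumes four ParticleAnalysisReport objects and their center_mass_x / center_mass_y fields, and it leans on the simplest of the comparisons (top has the lowest y, bottom has the highest, and the remaining two get ordered left/right by x); you’d probably want to layer the "x between the middle targets" sanity checks on top of it:

import edu.wpi.first.wpilibj.image.ParticleAnalysisReport;

public class TargetSorter {
    public ParticleAnalysisReport top, bottom, left, right;

    /** Classifies four particle reports according to the target layout above. */
    public void classify(ParticleAnalysisReport[] reports) {
        top = bottom = left = right = null;

        // The top target has the smallest center y, the bottom target the largest.
        int topIndex = 0, bottomIndex = 0;
        for (int i = 1; i < reports.length; i++) {
            if (reports[i].center_mass_y < reports[topIndex].center_mass_y) topIndex = i;
            if (reports[i].center_mass_y > reports[bottomIndex].center_mass_y) bottomIndex = i;
        }
        top = reports[topIndex];
        bottom = reports[bottomIndex];

        // Whatever is left over are the middle targets; order them by center x.
        for (int i = 0; i < reports.length; i++) {
            if (i == topIndex || i == bottomIndex) continue;
            if (left == null) {
                left = reports[i];
            } else if (reports[i].center_mass_x < left.center_mass_x) {
                right = left;
                left = reports[i];
            } else {
                right = reports[i];
            }
        }
    }
}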
A final note on optimization: don’t use the operations provided by BinaryImage, ColorImage, etc. Operations such as convexHull, removeSmallObjects, and others all create copies of their image before returning. Dynamic memory allocation and deallocation are 2 of the most expensive operations a computer can do, and if you’re allocating and deallocating a 640x480 image (possibly multiple times) every time you process an image, that’s 307 kilobytes of memory per image, and that’s for the smallest image type, BinaryImage! This may not sound like a lot, but the cRIO doesn’t have much processing power, and finding a 307 KB chunk of memory isn’t exactly a cakewalk for it.
When I initially tested my image processing, it was taking up to 5 seconds to process each image. In a 2 minute match, that’s terrible. By having a single ColorImage (for the camera image) and a single BinaryImage (for the processed image(s)) and reusing them between processing loops, I managed to reduce that time to half a second or less! I reused them by calling the image processing through the NIVision class instead of through the provided Image classes. It’s a bit of a pain, but well worth it.
I’d like to provide you with my code, but through the many changes over this build season it has gotten a bit messy, and I think it would confuse you more than it would help you. I’ll do my best to clean it up and provide you with an example, but for now you’ll have to make do with this really long post containing almost all the concepts I figured out during that time. Good luck!
Thanks for your great explanation. And thanks in advance for posting your code.