Camera Pose Estimation help

Camera pose estimation is near the top of my ongoing list of things I want to learn. I feel like it would be nice if the computer could analyze a two-dimensional image and calculate the distance and angle to the target. However, I am stuck because I don’t have access to any mentors who have worked with image processing. I wasn’t able to find any OpenCV C++ examples implementing this, which leaves me with a list of functions that are basically jargon to me, since I don’t have a clue what they do.

It would be really helpful if a mentor would help me glean the pieces to build the mental jigsaw! Hopefully I could use this to have an extremely powerful vision system!

Thanks, and I hope you enjoyed this past season! :slight_smile:

By the way, I switched to GitHub, and I wrote an OpenCV template to get the programmer started with a threaded image grabber instantly!

Figuring out the angles between the camera coordinate frame and a given pixel in the image is something that is very useful and definitely worth learning. It does require some knowledge of matrix algebra and is typically a university level topic, but if you have the will to learn, there are some good resources online. You will want to read up on the following terms:

pinhole camera model
camera calibration
homogeneous coordinates
camera resectioning
intrinsic matrix
extrinsic matrix

Some good references are the OpenCV camera calibration page and the Wikipedia page for Camera Resectioning.

Basically, you need to find a function that converts an (x, y) pixel in the camera frame into a ray oriented in the direction of (X, Y, Z) in the world frame. A common way to figure out this transform is to perform calibration against a series of images of a pattern of known dimensions, like a checkerboard with precisely measured squares. You can then use these measurements to compute the “optimal” (least squares) intrinsic matrix (modeling things like the field of view of the camera) and extrinsic matrix (modeling things like the pose of the camera relative to the world).

Distance is trickier. Each pixel in a standard 2D camera frame actually represents a ray from the camera’s optical center out into the world, not a point at a known depth. There is no explicit way to measure distance, but there are tricks you can use (stereo cameras, assuming things about the size or position of the target, etc.) depending on the actual problem you are trying to solve. One reason the Microsoft Kinect and other depth cameras are so cool is that you get a measured distance value for each pixel!

Take a look at the dropbox-work-speedtest-kinect. It is my 2012 program that uses camera pose estimation. It is in C, but the concept is still the same.

For those of you who are not on the Dropbox, PM me an email address and I’ll add you. I am working on including every aspect of OpenCV that could plausibly be used in an FRC competition. This includes game piece recognition, cascade training, target recognition, and I was going to do human interaction, but 254 beat me to it.

I already understand distance calculations. My main purpose is to learn pose estimation! Thanks for that information, as it will help guide me through learning the basics. It turns out that I have a good friend who is quite gifted in calculus, so he will be there to help me understand all the complex mathematics!

In this document, should I try going through chapter six (P379, [P385 actually])?

For your purposes, I’d look into cvFindExtrinsicCameraParams2 (or its C++ equivalent, solvePnP), along with projectPoints, Rodrigues, and RQDecomp3x3.

Here is the paper from the authors of the solvePnP algorithm: A Complete Linear 4-Point Algorithm for Camera Pose Determination. I guarantee that you will not understand it (I barely do, and I’ve been studying it for two years) given your math background, but it has a stellar intro on what camera pose actually is.

Also, a good textbook is Multiple View Geometry in Computer Vision by Hartley and Zisserman.

(Both of these are in the dropbox)

I feel like I should probably use the Kinect for now! I believe it has built-in camera calibration. As for pose estimation, I am still crawling Google in search of a good tutorial!

By camera calibration, do you mean the intrinsic camera params are known? If so, then you can calibrate any camera with an OpenCV program found here: Camera calibration With OpenCV — OpenCV 2.4.13.7 documentation. That program returns the intrinsic camera matrix. I highly recommend running it on any camera you use and understanding what it does.

Like I said, get an example of a camera pose estimation to work, then go through the code and see how it accomplishes the task.

yash,
There are a number of calculations that need to be done once you have some known objects in the frame. If you are pointing the camera at a known object (size, shape, and orientation in space), then you have to know the focal length, how much of the visual field the known object occupies, and what orientation it might be in to make the calculations. In typical FIRST cameras, the focal length is known, so there is no variable that has to be entered for zoom lenses and focus.

In the cameras used for the virtual down line in football, there are sensors on the lens for focus and zoom, sensors for pan and tilt in the camera head, and positional info supplied by the down-marker flag. Each camera is calibrated for the field, and there are objects of obviously known size on the field, so a lot of calculation goes into that. Just to display the down line on program video, only three or four cameras are used, and they require a 40’ trailer of computers to do the calculations and supply the information to the computer that draws the overlay. If you watch, you will only see the line displayed on those cameras. Any of the other cameras will not show a down line, and those are the dead giveaways for the cameras that do not supply the needed info. Handheld and mobile cameras will never be the host camera for this display.

With a fixed lens and a known-size object, you might be able to calculate distance, as long as the object doesn’t change aspect ratio with respect to the camera.
There are a number of calculations that need to be done once you have some known objects in the frame. If you are pointing the camera at a known object (size, shape and orientation in space) then you have to know the focal length of the visual field, how much the known object occupies and what orientation it might be in to make the calculations. In typical FIRST cameras, the focal length is known so there is no variable that has to be entered for zoom lenses and focus. In the cameras that are used for the virtual down line in football, there is sensors on the lens for focus and zoom, there are sensors for pan and tilt in the camera head and then there is positional info that supplied by the down marker flag. Each camera is calibrated for the field and there is obvious known size objects on the field so there is a lot of calculations done for that. Just to display the down line on program video, only three or four cameras are used and they require a 40’ trailer of computers to make the calculations and supply the information to the computer that makes the display. If you watch, you will only see the line displayed on those cameras. Any of the other cameras will not show a down line and those are the dead give aways for the cameras that do not supply the info needed. Handheld and mobile cameras will never be the host camera for this display. with a fixed lense, and known size object you might be able to calculate distance if the object doesn’t change aspect ratio with respect to the camera.