We’re trying to get some of the vision tracking going that never quite worked during the season. I’m having a bit of a problem with localization. I’m wondering if anyone in the CD community can spot the flaw in what I am doing. I’m afraid this will be directed mostly to mentors here, but if a student happened to understand the question and provide an answer, I would be very pleased and impressed.

My goal is to find the position of the camera in world (model) coordinates by finding the vision targets. The basic approach is to find the target in the 2-D image using OpenCV functions, and then pass that information to solvePnP. The output of solvePnP gives translation and rotation vectors that provide the 3-D coordinates of the target object in the camera coordinate system. What I want is the position of the camera in the world coordinate system. I do that by inverting the resulting transformation.

It’s a bit hard to describe the actual problem I am having without the use of graphics, but I will try.

Suppose I have a vision target (a piece of reflective tape of known dimensions, for example). Now suppose I move the camera 10 inches in front of the target, then 10 inches to the left of the target, and then turn the camera 45 degrees so the camera is now facing the world coordinate system origin.

So, now I have the camera facing the target, 10 inches in front of the target, and 10 inches to the left. The world coordinate system is set up so that the X axis is horizontal to the right. The Y axis is vertically down. (Done that way because pixel counts typically increase as we go down.) By the right hand rule, that means the Z axis is into the wall.

What I expect when I calculate the position of the camera is for the camera to be at the point (-10, 0, -10). Instead, what I am getting is (10, 0, -10). The value for the X axis has the wrong sign. 10 inches on the positive X axis would put it to the right of the target, instead of the left. (Well, in the real world, I have slight modifications because I’m not at exactly 45 degrees, and the vision doesn’t get the corners of the rectangle perfect, but I’ve rounded it back to approximate “ideal” values.)

Here’s the approach in more detail. I call solvePnP. It gives me back a translation vector of (0, 0, 14.14) and a rotation vector of (0, -pi/4, 0). With X defined to the right and Y down, Z is toward the target, so it is telling me that the target is 10*sqrt(2) inches in front of the camera, rotated 45 degrees. That seems exactly right…except that I think it ought to be +pi/4 instead of -pi/4.

I can use those values to compute a transformation from world coordinates to camera coordinates. I’m using the homogeneous transformation style of coordinate conversion, where I create a matrix [R|T, 0|1], where R is the rotation matrix and T is the translation vector. T can be read directly from solvePnP. R can be computed by creating three 3x3 rotation matrices, one each for X, Y, and Z, using the values from solvePnP. Then I multiply the three matrices together to get the composite matrix, in the order Z*Y*X, because that’s the order I read I was supposed to use.

Now I have my 4x4 matrix that will translate from world to camera. I want camera to world, so all I have to do is invert the matrix, and I’m done. And since the camera location is what I want, that position is (0,0,0) in the camera coordinate system, which means the position of the camera is just the T portion of the inverted matrix. Performing the inversion, I get (10, 0, -10), which is correct except for the sign of the X coordinate. The value I expect is (-10, 0, -10). As a student, we always argued that sign errors shouldn’t be worth many points, but in the real world, a sign error means my robot turns left instead of right. Sign errors become much more important in that case.
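For concreteness, here is roughly what that construction and inversion look like in numpy. This is just a sketch: the function names are mine, and the sign conventions in the per-axis matrices below are one common choice (I may be getting a sign wrong somewhere in here).

import numpy as np

def rotation_zyx(rx,ry,rz):
    #per-axis rotation matrices (right-handed convention, angles in radians)
    Rx=np.array([[1,0,0],[0,np.cos(rx),-np.sin(rx)],[0,np.sin(rx),np.cos(rx)]])
    Ry=np.array([[np.cos(ry),0,np.sin(ry)],[0,1,0],[-np.sin(ry),0,np.cos(ry)]])
    Rz=np.array([[np.cos(rz),-np.sin(rz),0],[np.sin(rz),np.cos(rz),0],[0,0,1]])
    #composite matrix, multiplied in Z*Y*X order
    return Rz.dot(Ry).dot(Rx)

def homogeneous(R,T):
    #build the 4x4 matrix [R|T,0|1]
    M=np.eye(4)
    M[:3,:3]=R
    M[:3,3]=np.ravel(T)
    return M

rvec=(0.0,-np.pi/4,0.0)    #rotation values from solvePnP, treated here as Euler angles
tvec=(0.0,0.0,14.14)       #translation from solvePnP
M_world_to_cam=homogeneous(rotation_zyx(*rvec),tvec)
M_cam_to_world=np.linalg.inv(M_world_to_cam)
print(M_cam_to_world[:3,3])    #the T portion = camera position in world coordinates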
If I want to get the camera orientation in world coordinates, I have to extract the Euler angles from the computed rotation matrix, but I’m ignoring that for the moment.

I find that if I change the output of solvePnP from a -45 degree rotation to a +45 degree rotation, I get exactly the answer that I want, and it seems correct to me as well, but then why is solvePnP spitting out a -45 in the first place?

Well, I hope that can be deciphered, and someone can understand what I’m saying there. Any help would be appreciated.
It seems like the rotation vector that is returned by solvePnP is clockwise for whatever reason. You want the end result to be counterclockwise (Y axis down and rotation 45 degrees to the right is positive).
OpenCV has a built-in method Rodrigues() which takes a 3x1 rotation vector and turns it into a 3x3 rotation matrix. Documentation [here](http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#void Rodrigues(InputArray src, OutputArray dst, OutputArray jacobian)).
See if the rotation matrix (R) produced by the Rodrigues method is the same as the one your code produces. My suspicion is that the Rodrigues method also converts from clockwise to counterclockwise internally.
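A quick way to check (the rvec below is just the example value from your post):

import cv2
import numpy as np

rvec=np.array([0.0,-np.pi/4,0.0])   #the example rotation vector solvePnP returned
R,jacobian=cv2.Rodrigues(rvec)      #R is the corresponding 3x3 rotation matrix
print(R)
#compare this R against the matrix your own Z*Y*X construction produces for the same angles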
Do you have code you could share?
Yep.
Thanks. I signed on to let people know that I had solved the problem, and the solution was indeed using the Rodrigues() function.
I’ll be sharing the code later, but here is the basic approach for anyone interested.
Use calibrateCamera() to get a camera calibration matrix and distortion coefficients (a rough sketch of this step is included after these steps).
Find the object of interest, and get the 2D points of that object. (Typically, the corners of the reflective tape rectangle, but I’ll be working on some more ambitious recognition now that this is working.)
Knowing the 3D world coordinates of the object, use solvePnP() to get a rotation vector and translation vector.
Use Rodrigues() to take the returned 3x1 rotation vector and turn it into a 3x3 rotation matrix.
Form a 4x4 matrix from [R|T,0|1], using the homogeneous coordinate convention to combine a translation and rotation matrix. This matrix will translate the world centered coordinate system into the camera centered coordinate system.
Invert that 4x4 matrix. The T portion of the inverted matrix is now the position of the camera in the world coordinate system. The orientation can be derived from the R portion of the inverted matrix. I will be posting code after a bit of cleanup.
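For the calibration step, here is a rough sketch of what I mean (the chessboard dimensions and image folder below are just placeholders for whatever you actually use):

import glob
import cv2
import numpy as np

pattern_size=(9,6)    #inner corners of the calibration chessboard (placeholder value)
objp=np.zeros((pattern_size[0]*pattern_size[1],3),np.float32)
objp[:,:2]=np.mgrid[0:pattern_size[0],0:pattern_size[1]].T.reshape(-1,2)

objpoints=[]    #3D chessboard corner points, one set per image
imgpoints=[]    #matching 2D corner points found in each image
for fname in glob.glob('calibration_images/*.jpg'):    #placeholder folder of chessboard photos
    gray=cv2.cvtColor(cv2.imread(fname),cv2.COLOR_BGR2GRAY)
    found,corners=cv2.findChessboardCorners(gray,pattern_size)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

retval,camera_matrix,dist_coefs,rvecs,tvecs=cv2.calibrateCamera(objpoints,imgpoints,gray.shape[::-1],None,None)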
Thanks a lot for the offer of assistance.
Be cautious as to which OpenCV version you’re using with solvePnP (there is a GitHub issue about this). I haven’t been keeping up with the OpenCV project for quite some time, but it seems likely that this problem isn’t fixed.
Regardless, glad you figured it out! And welcome to the world of computer vision theory!
I haven’t gotten back to this in a while, and the code isn’t exactly as clean as I would like, but here it is, and it worked well for me.
#This code is excerpted from a program I wrote when working on finding the location of the camera (and therefore the robot) based on finding the location of the vision target. In this code, my vision target was a printed green square, not one of the FIRST vision targets. It was 2.6 inches on a side.
#Given a quadrilateral candidate, determine the location of the camera.
#Returns the camera location as a matrix; boolean True = candidates were found.
#Find the vector from the camera to the target, in camera centered coordinates. In those coordinates, the origin is the center of the camera. The x axis is to the right, and the y axis is down. That puts the Z axis projecting directly out of the camera plane. The name "findCameraLocation" is a bit misleading. It is really finding the location of the object. (The name comes from some earlier experimentation.)
import cv2
import numpy as np

def findCameraLocation(candidate,camera_matrix,dist_coefs):
    print('testing candidate')
    print('candidate is ',candidate)
    #I used a 2.6 inch square as my target
    objpoints1=np.array([(0,0,0),(2.6,0,0),(2.6,2.6,0),(0,2.6,0)],dtype=np.float64)
    print('obj points set')
    #objpoints1 is an array of the corners of the square in world coordinates
    retval,rvec1,tvec1=cv2.solvePnP(objpoints1,candidate,camera_matrix,dist_coefs)
    print('returning result')
    return True,tvec1,rvec1
#The return of true in the code above is a holdover from some old code. It is meaningless in this code.
#This is the code block for finding the location of the camera in world coordinates. In this case, “world coordinates” means centered on the upper left of the square that was the vision target.
#Before calling this function, find the corners of the vision target in pixel coordinates, and store those coordinates in rect.
found,tvec,rvec=findCameraLocation(rect,camera_matrix,dist_coefs)
if found:
    print('translation',tvec)
    print('rotation',rvec)
#tvec and rvec give the location of the target in camera centered coordinates. What we want is the location of the camera in world coordinates. In this case, the origin of the world coordinate system is the top left corner of the green square.
#to really use all this for a general purpose function that would find the position of the robot on the field, we would have to define a coordinate system for the field, where 0,0,0 might be the center of the field on the floor, and a transformation (rotation and translation) of the vision target in those coordinates. For now, we assume that the world coordinates are centered on the vision targets.
#The approach that we will take is to use rvec and tvec to form a transformation matrix using the Rodrigues algorithm. (look it up) Then, invert the matrix.
#Call Rodrigues using rvec. The algorithm creates a 3x3 rotation matrix out of the 3x1 rotation vector that findCameraLocation returned.
ZYX,jac=cv2.Rodrigues(rvec)
print('Rodrigues')
print(ZYX)
print('done with r')
#Now we have a 3x3 rotation matrix, and a translation vector. Form the 4x4 transformation matrix using homogeneous coordinates.
#There are probably numpy functions for array/matrix manipulations that would make this easier, but I don’t know them and this works.
totalrotmax=np.array([[ZYX[0,0],ZYX[0,1],ZYX[0,2],tvec[0,0]],
                      [ZYX[1,0],ZYX[1,1],ZYX[1,2],tvec[1,0]],
                      [ZYX[2,0],ZYX[2,1],ZYX[2,2],tvec[2,0]],
                      [0,0,0,1]])
#The resulting array is the transformation matrix from world coordinates (centered on the target) to camera coordinates. (Centered on the camera) We need camera to world. That is just the inverse of that matrix.
WtoC=np.mat(totalrotmax)
inverserotmax=np.linalg.inv(totalrotmax)
f=inverserotmax
print(inverserotmax)
#The inverserotmax is the 4x4 homogeneous transformation matrix for camera to world coordinates.
#The location of the camera in world coordinates is given by the elements of column 4, the translation vector. The rotation matrix is given by the 3x3 rotation submatrix.
#If you need the euler angles out of that rotation matrix, the Rodrigues function can also provide that, but we know that the rotation angles in the inverse transformation are just -1*rvec from the original matrix.
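#For example, the camera position and orientation can be pulled out like this (variable names are just illustrative):
camera_position=inverserotmax[0:3,3]
camera_rotation=inverserotmax[0:3,0:3].copy()    #copy so OpenCV gets a contiguous 3x3 array
#Rodrigues also works in reverse: give it a 3x3 rotation matrix and it returns the 3x1 rotation vector
camera_rvec,jac2=cv2.Rodrigues(camera_rotation)
print('camera position in world coordinates',camera_position)
print('camera rotation vector',camera_rvec)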