Vision following while coming in at an angle



Currently, we have decent vision tracking on an RPi (we can see the target from about 3 meters away at 12 FPS) and have been successful in sending the data over to the roboRIO through NetworkTables (using flush() to reduce latency). Up until now, we’ve been using the x difference (center of the Deep Space target minus center of the image) as the input for driving.

When starting roughly parallel to the target (see Case #1 in the attached image), we are successful in driving up to the target and lining up parallel. However, when we start at an angle (see Case #2), we end up having the target in the exact center of the image but the robot hits the wall at an angle (see Case #2 progression).

As a result, we’ve added another property that we’re measuring: the ratio of the inner distance between the two pieces of tape to the outer distance (inner/outer). We’ve seen that this value changes significantly depending on whether the robot is at an angle or straight on (while driving forward/backward does not affect the ratio). We’re thinking of using this property as an additional input when driving up to the target.

A mentor suggested that from that ratio we calculate the angle of the robot relative to the wall (would appreciate help here) and then use that to overshoot to a point in front of the target (aligned with the tape on the floor) and turn back. Has anyone done something similar and could help us? Also, do you think it would be possible to establish continuous motion in this fashion?
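For concreteness, under a simple pinhole-camera model the ratio-to-angle relation can be worked out in closed form, provided the distance D to the target is also known. Treating the tape edges as points on the target line at offsets ±a (outer) and ±b (inner) from the center, the projected widths give r = (b/a)·(D² − a²sin²θ)/(D² − b²sin²θ), which can be solved for θ. A sketch (the geometry and all names here are illustrative assumptions, not something from this thread):

```python
import math

def angle_from_ratio(r, D, a, b):
    """Estimate the robot's angle to the wall from the inner/outer width ratio.

    Pinhole model: tape edges sit at offsets +/-a (outer) and +/-b (inner)
    from the target center, with the camera a distance D from the center
    (a, b, D in the same units). Head-on, the ratio is exactly b/a; at an
    angle it drops slightly below that.
    """
    # Solve r = (b/a) * (D^2 - a^2 sin^2 t) / (D^2 - b^2 sin^2 t) for sin^2 t
    sin_sq = D * D * (b / a - r) / (b * (a - r * b))
    if sin_sq <= 0:
        return 0.0  # measured ratio at or above the head-on value
    return math.degrees(math.asin(min(1.0, math.sqrt(sin_sq))))
```

Note that the angle dependence gets weak as D grows (the ratio converges to b/a from any angle), so this estimate is noise-sensitive at long range.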

Open to other recommendations for how to tackle this problem.



OpenCV actually provides a fancy method called cv2.solvePnP() which does almost all of this for you. Similarly to how our eyes have a sense of perspective, solvePnP() works by finding the translation and rotation transformation vectors that describe where your camera is with respect to the target. Using it requires a camera matrix and distortion coefficients that accurately describe the internal characteristics of your camera (see here for details on how to find these values).

Next, you need the real-world coordinates of the target you’re going to be looking at. For this year’s game, here are the rough values we calculated (measured in inches, going clockwise around each vision target and centered at (0.0, 0.0, 0.0)), which we call the model_points:

    # Left target
    (-5.938, 2.938, 0.0), # top left
    (-4.063, 2.375, 0.0), # top right
    (-5.438, -2.938, 0.0), # bottom left
    (-7.375, -2.500, 0.0), # bottom right

    # Right target
    (3.938, 2.375, 0.0), # top left
    (5.875, 2.875, 0.0), # top right
    (7.313, -2.500, 0.0), # bottom left
    (5.375, -2.938, 0.0), # bottom right

The next step in the process is finding the 8 corners of your vision target on the screen and ordering them the same way as they are listed in your real-world point array. We’ll call this list of pixel locations your image_points.
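If you’re getting each rectangle’s corners from cv2.boxPoints(), they come back in an order that depends on the rectangle’s rotation, so it’s worth normalizing them before building image_points. One common trick uses coordinate sums and differences; a sketch (order_corners is a hypothetical helper, and the output order matches the top-left/top-right/bottom-left/bottom-right listing of the model points above):

```python
def order_corners(pts):
    """Order four (x, y) pixel corners as top-left, top-right,
    bottom-left, bottom-right (image y grows downward), matching
    the listing order of the real-world model points."""
    # Top-left has the smallest x+y, bottom-right the largest
    by_sum = sorted(pts, key=lambda p: p[0] + p[1])
    top_left, bottom_right = by_sum[0], by_sum[-1]
    # Bottom-left has the smallest x-y, top-right the largest
    by_diff = sorted(pts, key=lambda p: p[0] - p[1])
    bottom_left, top_right = by_diff[0], by_diff[-1]
    return [top_left, top_right, bottom_left, bottom_right]
```

This holds up for the modest ~14.5° tilt of the Deep Space tape, though it would break down for rectangles rotated near 45°.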

Finally, plug the values into the solvePnP function like so:

(ret, rvec, tvec) = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)

From here, there’s a little more math to do in order to find the exact X and Y distances and angles your robot must travel in order to get to the goal, all of which is really well explained in this paper by Team 2877 (specifically section 9):

In addition, here are a few more articles that may be helpful when playing around with solvePnP():

Feel free to reply with any other questions!


Thank you very much for the prompt and thorough reply! Great to hear that there’s already a function doing it for me:)

I’ll try it out today and post the results.

Orian 4338


I posted about a way that my team (4028 - the Beak Squad) solves this problem in what we found to be a more reliable, easier, and more robust solution than what is described here. All your team needs is a gyro (like a NavX) and a few lines of code depending on the target. If @andrewda’s solution doesn’t work for you, I suggest taking a look at what I wrote.



I am in the process of calibrating the camera, but I’m getting very large variation between trials with a checkerboard and I’m not sure if it’s normal or not. Should I just average them and see what happens? All of the trials below successfully mapped a 6x7 grid onto a chess board.

Trial 1
mtx = [[ 2.98093132e+03 0.00000000e+00 8.72738482e+02]
[ 0.00000000e+00 3.71181690e+02 1.91935009e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ 4.96571016e+01 -2.24170390e+03 -1.48731455e-01 -1.23020256e+00

Trial 2
mtx = [[ 2.53131003e+03 0.00000000e+00 1.02048597e+02]
[ 0.00000000e+00 4.94880023e+03 2.10678130e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ -2.10926802e+00 1.06675378e+02 2.39050308e-01 -1.30129055e+00

Trial 3
mtx = [[ 2.43004720e+03 0.00000000e+00 1.05836907e+02]
[ 0.00000000e+00 5.38876878e+03 2.03908481e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ -5.31562965e+00 1.44479933e+02 2.17696793e-01 -1.17801808e+00

Trial 4
mtx = [[ 2.60218633e+03 0.00000000e+00 3.28271548e+02]
[ 0.00000000e+00 1.34411205e+03 1.98160682e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ -4.68758374e+00 7.54100486e+03 -6.34671155e-01 3.00089758e+00

Trial 5
mtx = [[ 6.06621074e+03 0.00000000e+00 5.98997193e+02]
[ 0.00000000e+00 1.24820920e+03 1.27236898e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ 2.25936293e+02 -1.14838647e+04 -1.83767392e+00 1.93130943e-01

Trial 6
mtx = [[ 5.73484660e+03 0.00000000e+00 6.36178922e+02]
[ 0.00000000e+00 1.21447218e+03 1.13256125e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ 1.46961128e+02 -1.29520134e+03 -2.20487075e+00 2.28520514e-01

Trial 7
mtx = [[ 1.83820663e+02 0.00000000e+00 2.61376086e+02]
[ 0.00000000e+00 1.11947107e+03 -4.69885139e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ 1.58918791e+01 1.29223182e+02 -9.25433587e-01 -1.43185781e-01

Trial 8
mtx = [[ 1.73129592e+02 0.00000000e+00 2.36276017e+02]
[ 0.00000000e+00 1.84806309e+03 -4.14808605e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ 1.02539922e+01 -6.36859885e+01 1.19270009e-01 6.30808790e-02

Trial 9
mtx = [[ 8.08482913e+02 0.00000000e+00 2.40618280e+02]
[ 0.00000000e+00 6.68216380e+03 -1.39850106e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[ -1.63955733e+01 1.75488592e+03 -6.92927829e-02 -1.72192425e-01

Trial 10
mtx = [[ 1.43382694e+02 0.00000000e+00 2.77662633e+02]
[ 0.00000000e+00 2.96244305e+03 1.27420660e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[-0.17374505 -0.05484612 -0.01907556 -0.01018011 0.01552375]]

Trial 11
mtx = [[ 1.52791770e+02 0.00000000e+00 2.74250555e+02]
[ 0.00000000e+00 3.92429795e+03 2.26694429e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[-0.37285608 0.07102629 -0.02044474 -0.0036982 -0.00135812]]

Trial 12
mtx = [[ 1.55234778e+02 0.00000000e+00 2.64951489e+02]
[ 0.00000000e+00 2.20196588e+03 2.15214177e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist = [[-0.31961254 0.0273056 -0.03590567 0.00358836 0.00759577]]


How many images are you using in each trial? The more images you use, the more accurate your values will be; for reference, we used 20 images to calibrate each of our cameras. Also be sure that the checkerboards in the images you are using cover a variety of angles, distances, and locations, and that you calibrate at the same resolution you use in your code.

Here’s the script we use to calibrate. Point it to a directory of image files and it should take care of the rest, plus save a bunch of calibrated images so you can make sure it’s working correctly.

$ python /path/to/images/*.jpg
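For anyone who wants to see what’s inside such a script, the core is only a handful of OpenCV calls. A minimal sketch, assuming a board with 7×6 interior corners and a calib/ directory of saved images (both assumptions of mine, not details of the script linked above):

```python
import glob
import cv2
import numpy as np

PATTERN = (7, 6)  # interior corners per row/column of the checkerboard
SQUARE = 1.0      # square size; any consistent unit works

# One set of 3D board coordinates (z = 0 plane), reused for every image
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

object_points, image_points = [], []
shape = None
for path in glob.glob("calib/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    shape = gray.shape[::-1]  # (width, height)
    found, corners = cv2.findChessboardCorners(gray, PATTERN, None)
    if found:  # skip frames where the board wasn't detected
        object_points.append(objp)
        image_points.append(corners)

if not object_points:
    raise SystemExit("no checkerboards found in calib/")

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, shape, None, None)
print("RMS reprojection error:", ret)
print("mtx =", mtx)
print("dist =", dist)
```

The RMS reprojection error that calibrateCamera returns is a useful sanity check: well under a pixel usually means a good calibration, while a large value suggests bad detections slipped in.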


1…? I meant that each trial is just the processing of one image, is that not enough?

We are, thanks.

I tried to give it a variety of angles, but then for some images a value in the matrix was 2.98e+03 and for others it was 6.06e+03… Is that normal, or is there a problem here?

I’ve been using a live feed from the camera for processing, should I capture a few images and then run the program instead?

Thanks again for your advice, greatly appreciate it:)


I think if you mount your camera at a certain height above the vision target, the triangle you can get from that will give you all the angles and distances required.


Can you give an idea of the range of angles, distances etc that you used? Are we talking move a couple inches this way or that? feet? Would you be willing to share the 20 images you used to calibrate your cameras?


Sure! We varied our pictures from about 0.25-1.5 meters away and tried to get close to 75 degrees in every direction, as well as quite a few from straight on. Attached is a zip of the 24 images we took on one of our Lifecam HD-3000s.

camera (1.4 MB)


The more images you use, the more accurate your resulting matrices are going to be. I’m not 100% sure on the math OpenCV uses in the background, but I believe it’s more complex than just averaging the values together, though I believe doing so would get you a fairly similar result given enough images.

Yeah, I’d expect that if you’re just using one image at a time. After supplying more images and running them together, you should get something more stable and more accurate.

That’s how we’ve done it in the past. It’s nice to have replicable calibrations and be able to see what’s going on/if any of the frames are messing it up.


When you calibrate the camera, you want to first save the images to disk and process them afterwards. Run all the images through the routine at once, and output a processed image for each with the locations of the found vertices. It is not uncommon for the routine to find incorrect locations in the image, and that will screw up the calibration. Scan the output images by eye, throw out the bad ones, and run again with only the good ones.
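That inspection pass is only a couple of calls if you want to bolt it onto your own script. A sketch (the directory name and pattern size are assumptions):

```python
import glob
import cv2

PATTERN = (7, 6)  # interior corners; must match your board
for path in glob.glob("calib/*.jpg"):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN, None)
    if found:
        # Overlay the detected corner locations so bad detections stand out
        cv2.drawChessboardCorners(img, PATTERN, corners, found)
    cv2.imwrite(path.replace(".jpg", "_corners.jpg"), img)
```

Frames where no overlay appears (or where the colored chain zig-zags off the board) are the ones to delete before re-running the calibration.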

@andrewda’s routine has an output step, or you can use our routine here. Probably pretty equivalent code.


Thanks to all of your tips, we’ve been able to calibrate the camera using 50 saved images. However, when we run our solvePnP algorithm, we’re experiencing a peculiar problem: the outputs vary significantly for minor changes in the inputs…

Here is the code used for solvePnP along with the printed results for two similar frames (apologies for it not being that organized):

    # Compute real-world distances/angles based on the rectangles
    def compute_output_values(leftRect, rightRect, centerY):
        # Find the corner vertices of the left and right rectangles
        leftVertices = cv2.boxPoints(leftRect)
        rightVertices = cv2.boxPoints(rightRect)

        # Combine the left and right vertices into one image_points array,
        # shifted so the origin is at the image center with y pointing up
        image_points = np.concatenate((leftVertices, rightVertices))
        image_points[:, 0] -= visionConstants.width / 2
        image_points[:, 1] -= centerY
        image_points[:, 1] *= -1
        print(image_points)
        print(visionConstants.model_points)

        # Compute robot orientation
        (ret, rvec, tvec) = cv2.solvePnP(visionConstants.model_points, image_points,
                                         visionConstants.mat, visionConstants.dist_coeffs)

        # Compute the necessary output distance and angles
        x = tvec[0][0]
        y = tvec[1][0]
        z = tvec[2][0]
        # Distance in the horizontal plane between camera and target
        distance = math.sqrt(x**2 + z**2)
        # Horizontal angle between camera center line and target
        angle1 = math.atan2(x, z)
        rot, _ = cv2.Rodrigues(rvec)
        rot_inv = rot.transpose()
        pzero_world = np.matmul(rot_inv, -tvec)
        angle2 = math.atan2(pzero_world[0][0], pzero_world[2][0])
        print("Distance: %f, Angle1: %f, Angle2: %f, X: %f, Y: %f, Z: %f, CenterY: %f"
              % (distance, angle1, angle2, x, y, z, centerY))
        return distance, angle1, angle2

    ## Camera constants

    dist_coeffs = np.matrix([-5.10E-02, 9.89E-04, 3.08E-03, -4.16E-03, -6.69E-06])

    mat = np.matrix([[6.88E+01, 0, 2.44E+02], [0, 9.78E+01, 1.85E+02], [0, 0, 1]])

And printed results:

[[ -31.70584106  -35.67059326]
 [ -59.70584106  -28.67059326]
 [ -44.23526001   33.21173096]
 [ -16.23526001   26.21173096]
 [  82.90585327  -33.42352295]
 [  68.90585327   29.57649231]
 [  97.28231812   35.88238525]
 [ 111.28231812  -27.11761475]]
[[-5.37709  -3.199812  0.      ]
 [-6.69996  -2.699812  0.      ]
 [-5.32288   2.625     0.      ]
 [-4.        2.125     0.      ]
 [ 5.37709  -3.199812  0.      ]
 [ 4.        2.125     0.      ]
 [ 5.32288   2.625     0.      ]
 [ 6.69996  -2.699812  0.      ]]
Distance: 1.469959, Angle1: -1.423479, Angle2: 1.618986, X: -1.454037, Y: 0.494368, Z: 0.215768, CenterY: 280.152954

[[ -31.70584106  -35.75      ]
 [ -59.70584106  -28.75      ]
 [ -44.23526001   33.13232422]
 [ -16.23526001   26.13232422]
 [  84.11767578  -33.39706421]
 [  68.52941895   28.95588684]
 [  96.76470947   36.01473999]
 [ 112.35296631  -26.3381958 ]]
[[-5.37709  -3.199812  0.      ]
 [-6.69996  -2.699812  0.      ]
 [-5.32288   2.625     0.      ]
 [-4.        2.125     0.      ]
 [ 5.37709  -3.199812  0.      ]
 [ 4.        2.125     0.      ]
 [ 5.32288   2.625     0.      ]
 [ 6.69996  -2.699812  0.      ]]
Distance: 4.526347, Angle1: -0.658742, Angle2: 1.571939, X: -2.770679, Y: 0.855593, Z: 3.579267, CenterY: 280.073547

Any ideas? Thanks in advance

Edit: I noticed a non-FRC user of solvePnP experiencing a similar problem here, but I’m not sure how they solved it (if they did).

Edit: We realized that our problem was that the rectangles were not exact rectangles but rather had a strong blur to them. We isolated the cause to our use of IR lighting; it is possible that the camera cannot produce the same sharpness under IR. We decided to switch back to traditional lighting, and, with a new camera, initial testing is showing that this problem is significantly reduced. I’ll update on how it goes tomorrow.