OpenCV Pose Estimation For Vision


Did any team use 3D pose estimation (OpenCV’s solvePnP) to accurately determine the robot’s absolute position relative to the peg? In our case, I noticed that solvePnP has known problems with instability once you get even more than two feet from the target. We tried to improve the accuracy of the pixel locations of the tape corners, but it didn’t help much.

Our team uses solvePnP and our results only vary by about half an inch (we measure horizontal distance from the peg tip).
How are you detecting the corners of the target? How accurate is the tape detection (is your program picking up any noise)?
If the corners and detection are spot on, the only cause I can think of is a miscalibrated camera. How did you find the intrinsic matrix and/or distortion coefficients for whatever camera you’re using?

solvePnP becomes more numerically stable as you add more points. To start, try recalibrating your camera, then ditch solvePnP and use solvePnPRansac.

Yeah, we’re using solvePnPRansac, and the calibration’s average reprojection error is 1.12 pixels (using OpenCV’s sample calibration code). We’re detecting the corners with goodFeaturesToTrack (Shi–Tomasi) and refining with cornerSubPix (for sub-pixel accuracy). Overall, the corners look really stable and accurate, and our main issue is with solvePnPRansac. The strangest thing I’ve noticed is that one of the pose axes occasionally inverts itself by around 170 degrees (likely due to some sort of eigenvector computation?).

Do you guys use more than four points (say, adding the four midpoints of the rectangle’s sides), or do you use some sort of feature extractor like ORB or FAST?

It’s hard to really diagnose the issue with limited knowledge. Don’t worry about the axis flip, though: it’s a consequence of the mathematics of co-planar object points, which admit two mirror-image pose solutions.

When I used solvePnP many years ago, it worked fine. So I did some digging and found this; long story short:

As unfortunate as it seems, it appears that solvePnP is bugged in some way or another. Somehow the order of the points matters, and fixing it can improve accuracy considerably. There seems to be a consensus that cvPOSIT is the way to go with the current state of OpenCV.

In our experience, goodFeaturesToTrack gave us inconsistent points. We fixed this by implementing our own sketchy getCorners function.


  1. Sort all points by xpos+ypos: the first element (greatest xpos+ypos) is the first point; the last element (least xpos+ypos) is the fourth point.
  2. Sort all points by xpos-ypos: the first element is the third point; the last element is the second point.
  3. Return the corners.

With this, our points are ordered counter-clockwise from the top-right, and the object points passed to the PnP function are ordered the same way.
We used this approach only because we got inconsistent points from goodFeaturesToTrack; if you’re sure your points are consistent, this may not be the best method to experiment with.
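The ordering steps above can be sketched as a small, pure-Python function (the name get_corners and the rectangle below are illustrative; image coordinates, with y growing downward):

```python
# Sketch of the corner-ordering trick described above: among the
# candidate points, the extreme values of x+y and x-y pick out the
# four corners in a fixed order.
def get_corners(points):
    """points: list of (x, y); returns [p1, p2, p3, p4]."""
    by_sum = sorted(points, key=lambda p: p[0] + p[1], reverse=True)
    p1, p4 = by_sum[0], by_sum[-1]        # greatest / least x+y
    by_diff = sorted(points, key=lambda p: p[0] - p[1], reverse=True)
    p3, p2 = by_diff[0], by_diff[-1]      # greatest / least x-y
    return [p1, p2, p3, p4]

# Axis-aligned 10x5 rectangle in image coordinates:
print(get_corners([(0, 0), (10, 0), (10, 5), (0, 5)]))
```

Note this only works for roughly axis-aligned rectangles: under heavy perspective or rotation, two corners can tie on x+y or x-y and the ordering breaks down.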

Yeah, I used to use a sketchy method similar to that, where I found the points closest to each of the four corners, but it wasn’t robust enough for my liking. goodFeaturesToTrack actually works really well if you tune it appropriately and reorder the points based on their polar coordinates (least to greatest theta). Our problem isn’t in extracting corners but in solvePnP. Thanks Loveless, we’ll try cvPOSIT as those links described.
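A sketch of that polar-coordinate reordering (the helper name order_by_angle is hypothetical): sort the detected corners by their angle about the centroid, least to greatest theta, which is robust to moderate rotation of the target.

```python
import math

# Reorder detected corners by polar angle about their centroid.
def order_by_angle(points):
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

# Scrambled rectangle corners come back in a consistent order:
print(order_by_angle([(10, 0), (0, 5), (10, 5), (0, 0)]))
```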

We got all kinds of wrong answers from solvePnP. Doing the math manually was correct (correct = acceptable answers), but solvePnP would just be wrong, even though the camera matrix was close to what the manufacturer would have us believe. Somehow a change of ±10 degrees on the bot would correlate to whatever change solvePnP wanted. I’d tell you to do the relevant math by hand.
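For anyone wanting to try that by-hand math, here is one sketch of what it might look like under a plain pinhole model: distance from the target’s apparent width, bearing from its horizontal pixel offset. The focal length, principal point, and target width below are made-up numbers, not anyone’s real calibration.

```python
import math

def distance_and_bearing(px_width, px_center_x, fx, cx, real_width):
    """Pinhole-model estimates: distance from the target's apparent
    width, bearing from its horizontal pixel offset. Distance comes
    out in the same unit as real_width; fx and cx are the focal
    length and principal point (pixels) from camera calibration."""
    distance = fx * real_width / px_width
    bearing = math.atan2(px_center_x - cx, fx)  # radians, + = right
    return distance, bearing

# Hypothetical numbers: 700 px focal length, target 10 in wide,
# appearing 70 px wide and centered 35 px right of the optical axis.
d, b = distance_and_bearing(70.0, 355.0, 700.0, 320.0, 10.0)
print(d, math.degrees(b))
```

This only gives distance and bearing, not full 3D pose, but it has far fewer failure modes than a PnP solve and makes a good cross-check.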

We used solvePnPRansac in 2016, and it almost worked fine. However, I couldn’t figure out how to properly use the rotation vectors it returns. Instead, we computed multiple translation vectors and did vector projections and such on them. It worked, once we figured out that the translation vector is expressed in the camera’s frame: two axes in the plane of the lens, and one axis normal to it. Probably not the best solution, but it could be worth checking the raw translation values.
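To make that frame convention concrete: solvePnP’s (rvec, tvec) take points from the target’s frame into the camera’s frame (x right, y down, z out through the lens), i.e. p_cam = R·p_target + t, so the camera sits at −Rᵀt in the target’s frame. A numpy-only sketch (Rodrigues written out by hand so it doesn’t need cv2):

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector -> rotation matrix (Rodrigues' formula)."""
    rvec = np.asarray(rvec, float).ravel()
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    kx, ky, kz = rvec / theta  # unit rotation axis
    K = np.array([[0, -kz, ky],
                  [kz, 0, -kx],
                  [-ky, kx, 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def camera_position(rvec, tvec):
    """Camera's position in the target's coordinate frame."""
    R = rodrigues(rvec)
    return -R.T @ np.asarray(tvec, float).ravel()

# 90 degrees of yaw about the target's y axis, 5 units out along z:
print(camera_position([0.0, np.pi / 2, 0.0], [0.0, 0.0, 5.0]))
```

Once the pose is in the target’s frame like this, the translation components map directly onto field-relative offsets, which is usually what the drive code wants anyway.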