For the last few years we have been using solvePnP() from OpenCV in our computer vision code to get the “pose” of the robot with respect to the vision targets. The “full” pose calculation gives the distance plus 2 different angles: the angle of the robot w.r.t. the robot-target line, and the angle of the target w.r.t. that line. Depending on the game, that 2nd angle (the target angle) is very useful; for instance, in 2019 it told us how to turn our robot to be perpendicular to the wall where we were picking up/placing hatches.

However, the pose calculation, or at least solvePnP, can be very sensitive to the locations of the corners fed to the algorithm. From what I have seen, the target angle in particular can fluctuate by 10-20 degrees with what seems like a “random” change of a couple of pixels in the image (the distance seems to be less sensitive, and the robot angle does not seem to care). The 2020 high goal target seems particularly challenging (IMHO) because the upper, outer corners are particularly sharp and can appear rounded in the image (and it is even worse if you are going for a long shot).

What techniques/tricks have people been using to get better corners to feed to solvePnP, or do you have a better method to get the “full” robot pose? Note that I am interested in getting all 3 pose values, not just distance or robot angle (the target angle is not always needed, but partly I am curious and partly stubborn).

Note that I do understand why the pose is sensitive in the way it is. Also, I do have some tricks, and I will share those, but I want to see what advice others have first.

This year we had an extended Kalman filter that fused latency-compensated solvePnP pose estimates with VSLAM and encoder pose estimates. This allowed us to run solvePnP slowly at a very high resolution and use encoders and VSLAM to fill in the gaps (and ignore erroneous measurements). Wrappers that do the latency compensation and sensor fusion with encoders will hopefully be in WPILib next year.

I’m going to jump off of what you said. Kalman filters provide the minimum-variance estimate of a linear system’s states (a robot is nonlinear, but an unscented Kalman filter provides similar performance). Basically, the more sensors you have, the more information you can extract, provided you know how the robot behaves physically. You can take the velocities of the robot’s wheels and predict where it will move next, then correct given new velocities, positions, gyro angles, accelerations, solvePnP data, or whatever combination you want (it’s good to have some vision sensor, though; our team was planning on using distance to target, but then the season got suspended). I’ll just link this thread where I’ve already talked about KFs and provided links to learn about them: State Estimation using Kalman Filters. I’m not sure if there’s an easier way to fuse solvePnP with other sensors that doesn’t require the rigor and understanding of a KF, though; I’d be curious to find out.

I appreciate the responses. I know a bit about Kalman filters, but we have not used them in our code (yet). However, I was kind of thinking about techniques to improve the vision-only results. After all, feeding poor results into a Kalman filter does not really help.

So, I said I had some insight, so I guess I should post that.

My first insight is: approxPolyDP() is not good. It is quick, but it makes plenty of obvious, stupid mistakes in terms of finding a reasonable polygon around a contour. I believe the fundamental problem is that the output of approxPolyDP() is explicitly a subset of the points on the input contour. So, for example, if you feed it a square with slightly rounded/truncated corners, you will not get back a square with the sides along the obvious edges. Typically you will get 1 or 2 sides which cut off some chunk of the curve. This is obviously bad as input into solvePnP, since at least one of the corners is wrong.

To improve on this, I wrote a routine that uses HoughLines(). It is a bit messy in that you need to draw the contour onto a blank image and then process that, plus picking the right lines from the candidates is tricky. However, the results look decent and are more reliable than using approxPolyDP(). Unfortunately, I don’t have quantitative data; I have a reasonable number of test images, but I don’t have ground-truth distance/angle numbers to go along with them. The number of times the target angle flips by ~10-20 degrees is definitely lower, but not actually 0. (Our code is on GitHub, and I am happy to send a link.)

Finally, I assume it is generally done, but this year we ran our targeting camera at 848x480, which is 2x what we “normally” do. We definitely needed the higher resolution for the long trench shot.

Anyone have any better tricks? Better algorithms? Do people feel it is important to run the camera at even higher res?

The best thing we did was run at very high resolutions (1080p). Because bandwidth and processing limitations make these video modes slow on the Pi camera, we needed the Kalman filter to fill in the gaps, and we also needed latency compensation. You don’t necessarily need a Kalman filter if you want to fill in the gaps and latency-compensate in a less sophisticated manner.

To add to what’s being said here, high resolution seemed to be the key to success this year, at least for our team. We ran our vision pipeline (with solvePnP) on a Raspberry Pi this year at 1080p and got angles within about ±0.5 deg from behind the control panel. It came at the cost of frame rate (our solution ran at about 7 fps), but it was well worth it since we only needed to take a couple of samples.
Code: https://github.com/RobotCasserole1736/RobotCasserole2020/blob/master/piVisionProc/casseroleVision.py

If you don’t need your full pose from solvePnP, estimating goal position using target elevation seems to be a lot more robust. We ran it at ~800x450 on a LifeCam with our “targeting region” set to top middle in Chameleon and didn’t get any of the jitter that solvePnP() gave us. Higher resolution 100% makes solvePnP and approxPolyDP better, though.
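For reference, the elevation method is a one-liner once you know your camera mounting. The mounting numbers below are made up (the 98.25" outer-port center height is from the 2020 game manual); it is robust because it only depends on the vertical pixel position of the target center, not on four noisy corners:

```python
import math

# Hypothetical mounting: camera lens 23" off the floor, pitched up 25 degrees.
# The 2020 outer-port target center is 98.25" off the floor.
CAMERA_HEIGHT = 23.0
CAMERA_PITCH = math.radians(25.0)
TARGET_HEIGHT = 98.25

def distance_from_elevation(target_pitch):
    """Horizontal distance to the goal from the pitch angle (radians) of
    the target center above the camera's optical axis."""
    return (TARGET_HEIGHT - CAMERA_HEIGHT) / math.tan(CAMERA_PITCH + target_pitch)
```

The target pitch itself comes from the target's y pixel via the camera's vertical field of view, which is why only one well-measured point is needed.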