Trajectory from camera

Is it possible to create a trajectory in autonomous from a 3D pose estimate from a camera running solvePnP? If so, is it viable, and is there anything we should know before trying this?


It’s possible. We attempted it in 2019.

The basic cycle was:

  1. Driver gets robot about 15 feet away from target, roughly pointed at it
  2. Driver hits “align” button
  3. solvePnP results transferred from camera/coprocessor to RIO
  4. solvePnP results used to generate a path which drives the robot from its current position to a fixed position/orientation relative to the solvePnP result
  5. Robot executes the path once generation finishes
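
Step 4 above amounts to turning the target pose into a goal pose for the path generator. A minimal sketch of that calculation in plain Python, assuming the solvePnP output has already been reduced to a 2D target pose in the robot's frame (x forward, y left, heading of the target's outward normal). `Pose2d`, `goal_pose_from_target`, and the 2 ft standoff are hypothetical names and values for illustration, not what was run in 2019:

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2d:
    x: float        # feet, forward of the robot
    y: float        # feet, left of the robot
    heading: float  # radians, counterclockwise positive


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def goal_pose_from_target(target: Pose2d, standoff_ft: float) -> Pose2d:
    """Return a pose standoff_ft in front of the target, facing it.

    target.heading is assumed to be the direction of the target's outward
    normal (pointing away from the wall), expressed in the robot frame.
    """
    goal_x = target.x + standoff_ft * math.cos(target.heading)
    goal_y = target.y + standoff_ft * math.sin(target.heading)
    # Facing the target means pointing opposite to its outward normal.
    goal_heading = wrap_angle(target.heading + math.pi)
    return Pose2d(goal_x, goal_y, goal_heading)


# Example: target 15 ft straight ahead, squarely facing the robot.
target = Pose2d(x=15.0, y=0.0, heading=math.pi)
goal = goal_pose_from_target(target, standoff_ft=2.0)
print(goal)
```

The path generator would then get the robot's current pose (the origin of this frame) as the start and `goal` as the end.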

The biggest challenge we had was tuning our drivetrain path follower to work well in all cases. Our robot required +/- 1 inch precision to work well, and we could only get the combination of camera/drivetrain within 2 or 3 inches.

We also didn’t spend a ton of time on it, so I’m fairly certain there’s a bug or two in our coordinate transforms. Using the newly built-in WPILib coordinate transform functions would be advisable.
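
For reference, the core transform involved is small. A sketch in plain Python of composing a robot pose on the field with a target pose measured in the robot frame. This illustrates the math only and is not the WPILib API; the function name and tuple layout are assumptions for illustration:

```python
import math


def robot_to_field(robot_in_field, target_in_robot):
    """Express a pose measured in the robot frame in field coordinates.

    Each pose is an (x, y, heading) tuple with heading in radians,
    counterclockwise positive.
    """
    rx, ry, rtheta = robot_in_field
    tx, ty, ttheta = target_in_robot
    # Rotate the robot-relative offset by the robot heading, then translate.
    fx = rx + tx * math.cos(rtheta) - ty * math.sin(rtheta)
    fy = ry + tx * math.sin(rtheta) + ty * math.cos(rtheta)
    return (fx, fy, rtheta + ttheta)


# Robot at (10, 5) on the field, facing +y; target seen 3 ft straight ahead.
print(robot_to_field((10.0, 5.0, math.pi / 2), (3.0, 0.0, 0.0)))
```

A common source of bugs here is mixing sign conventions (OpenCV camera frames are x right, y down, z forward), so it is worth checking a few hand-computed cases like the one above.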


OK, thank you. Did you find that the solvePnP results were accurate enough for the trajectory, or were they a major contributing factor in the lack of precision?

With high resolution, low framerate, and sub-pixel feature identification, from about 15 feet away, we were seeing ~1 inch accuracy in solvePnP itself. The bigger error we saw came from flex in the camera mount and from the pneumatic rubber wheels.


Alright thank you! Do you remember what resolution you were using?

The max we could push the camera to.

In 2019, I believe that was 1280x960.

In 2020 we ran a similar pipeline at 1920x1080. In this case, the high resolution was mostly to get accurate angles from 3/4 of the way down the field.

That being said, it really depends on your hardware and software assumptions. For these applications, I believe the answer is to use higher resolution: slower processing, but higher spatial accuracy in the results.

This isn’t the only method, of course. A faster pipeline could be used to drive a more continuous-update algorithm (as opposed to ours, which was very “one-shot” in nature).
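
To illustrate the difference, a continuous-update approach recomputes the drive command from each new vision result instead of planning once. A minimal sketch, assuming a differential drive commanded with forward/turn values in [-1, 1]. The function name, gains, and stop distance are hypothetical and would need tuning on a real robot:

```python
def drive_command(bearing_rad, distance_ft,
                  kp_turn=1.5, kp_forward=0.3, stop_distance_ft=2.0):
    """Compute (forward, turn) from the latest target measurement.

    bearing_rad: angle to the target, positive when the target is to the left.
    distance_ft: range to the target.
    """
    def clamp(value):
        return max(-1.0, min(1.0, value))

    # Turn toward the target; drive forward until the stop distance is reached.
    turn = clamp(kp_turn * bearing_rad)
    forward = clamp(kp_forward * (distance_ft - stop_distance_ft))
    return forward, turn


# Called once per processed camera frame with the newest measurement.
for bearing, distance in [(0.20, 10.0), (0.05, 5.0), (0.0, 2.0)]:
    print(drive_command(bearing, distance))
```

Because each frame corrects the previous one, this style should in principle be less sensitive to wheel slip, at the cost of needing low-latency vision results.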


We did this last year to align to both the hatch pickup and the hatch placement stations. We actually ran at 424x240! Generally, the driver only engaged that code from 4-6ft away, though.

We probably should have run at a higher frame size because we were getting occasional problems with the angle of the target w.r.t. the robot-target line. However, we solved that well enough at the existing frame size.

We had an H-Drive that year (probably “never” again ;-), so the drive-to-target algorithm was a little tricky.

One thing to think about, in terms of resolution: do you really need it? You need to evaluate this for your own situation, but consider the following:

If you are 15ft away, and the idea is to drive to a position, you probably only need maybe 1ft or 10deg accuracy. As you approach the target, you need more accuracy. However, the target in the image is also getting bigger, with more pixels, so the accuracy of solvePnP() will also be getting better. Alternatively, it is usually pretty easy for the driver to get the robot near the target (like we did in 2019), so maybe you only need to be accurate when the robot is 4-6ft from the target.
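
As a rough back-of-the-envelope check of that argument, the width on the target plane covered by one pixel scales with distance divided by horizontal resolution. A sketch assuming an ideal pinhole camera with a 60 degree horizontal field of view (an assumed value, not taken from either team's camera):

```python
import math


def inches_per_pixel(distance_ft, width_px, hfov_deg=60.0):
    """Approximate width, in inches, that one pixel covers at a distance.

    Assumes an ideal pinhole camera looking straight at a flat target plane.
    hfov_deg is an assumed horizontal field of view.
    """
    view_width_in = 2.0 * distance_ft * 12.0 * math.tan(math.radians(hfov_deg) / 2.0)
    return view_width_in / width_px


# Resolutions and distances mentioned in this thread.
print(inches_per_pixel(5.0, 424))    # 424x240 from about 5 ft
print(inches_per_pixel(15.0, 1280))  # 1280x960 from about 15 ft
```

Under these assumptions both cases work out to roughly 0.16 inch per pixel, which fits the point that a low resolution up close can see the target about as well as a high resolution from far away. This is only the pixel footprint, not the accuracy of solvePnP itself.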

Shooting is a bit different, in that you want an accurate distance/angle from far away (if you are going for a long shot). However, solvePnP is very accurate for the shooting angle and usually decent for distance (the 3rd angle of the target is the hard one). Also, there are other methods to get distance and angle which don’t use solvePnP. Finally, if you are really 10ft away, the difference between 9 and 11ft may not be important (unless you are really going for the inner goal).
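
One commonly used alternative (not necessarily the one meant above) estimates distance from the vertical angle to the target, given known camera and target heights. A minimal sketch; the function name and all numbers are hypothetical:

```python
import math


def distance_from_pitch(target_height_ft, camera_height_ft,
                        camera_pitch_deg, target_pitch_deg):
    """Estimate horizontal distance to a target from its vertical angle.

    camera_pitch_deg: upward tilt of the camera mount.
    target_pitch_deg: vertical angle of the target above the image center.
    """
    total_pitch = math.radians(camera_pitch_deg + target_pitch_deg)
    return (target_height_ft - camera_height_ft) / math.tan(total_pitch)


# Hypothetical numbers: target 8 ft up, camera 2 ft up and tilted up 20 degrees,
# target seen 10 degrees above the image center.
print(distance_from_pitch(8.0, 2.0, 20.0, 10.0))
```

The horizontal angle can be read from the target's pixel offset in a similar way. This method degrades when the target is close to camera height, since the vertical angle then changes very little with distance.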

For next year we are planning on using two separate cameras, one for normal vision and one for solvePnP. Do you think that would work, or is it a bad idea?

What is your reasoning? I assume it has something to do with the driver cam not needing the resolution of the vision cam?

We want to have fast data for aiming, but we also want to use solvePnP for pose estimation.

That statement, to me, suggests some confusion about what you mean by “normal vision”.

Maybe others disagree, but to me “normal vision” is streaming. Your 2nd statement is suggesting that “normal vision” means finding the target and providing some distance/angle to the Rio.

Anyway, yes, it can work well. My team has run 2 cameras for 4-5 years. We have used a coprocessor (Odroid-XU4) for the last 3 years. We use the 2 cameras to have different views around the robot, and at least one of those cameras provides targeting info (sometimes distance/angle, and sometimes full 3-quantity “pose”). You can also switch a camera between different modes; we allow the driver to switch certain cameras between image streaming and target-finding (with the exposure set really low).

Yes, sorry for the bad phrasing/improper terminology but yes, one for pose and one for targeting. Thank you for the info!