I’ve been trying to implement a filter to fuse odometry with vision for our swerve. I understand that SwerveDrivePoseEstimator exists, but I am opting for something more custom as a fun project and so that I have more flexibility. I am unsure, however, what the prediction (process model) should look like. A mentor told me that it should be the odometry, but isn’t that a measurement? Let’s say we use the odometry as the process model. If we then implemented a UKF, all sigma points would be offset by the same odometry twist, and the weighted average would be the same as the previous pose plus the odometry twist. Similarly, the Limelight measurements are already in state space, so the measurement function is also linear. Isn’t this just a linear model? Couldn’t we just implement a linear Kalman filter with some control input for the odometry?
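To make that concrete, here’s a minimal sketch of the linear filter described above (hypothetical, in Python/NumPy rather than robot code): the state is the 2D pose (x, y, θ), the odometry twist is the control input u, and both the process and measurement models are identity. The noise values are placeholders, and the twist is assumed to already be rotated into the field frame.

```python
import numpy as np

# Hypothetical linear-KF sketch: state x = (x, y, theta), control input
# u = field-relative odometry twist (dx, dy, dtheta), vision measurement
# z = a pose already in state space. F, B, H are all identity.
F = np.eye(3)   # state transition: the pose carries over
B = np.eye(3)   # control model: add the odometry twist directly
H = np.eye(3)   # measurement model: vision reports a pose in state space

def predict(x, P, u, Q):
    """Predict step: previous pose plus the odometry twist."""
    x = F @ x + B @ u
    P = F @ P @ F.T + Q
    return x, P

def correct(x, P, z, R):
    """Correct step with a vision pose measurement z."""
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(3) - K @ H) @ P
    return x, P
```

With F, B, and H all identity, the predict step really is just "previous pose + twist," which is the collapse of the unscented transform described above.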

From what I can see, Mechanical Advantage uses some version of a linear Kalman filter, but Citrus Circuits uses an extended Kalman filter.

The more accurate way to model this is with factor graphs, but it’s more computationally intensive and requires a coprocessor: GitHub - mcm001/gtsam-playground. This program consumes AprilTag corners and a camera model, and produces robot poses. The main benefits are:

- it more accurately represents the true probability distribution, via tag corner pixel noise propagating through a camera model, so you no longer have to manually fudge the measurement covariance matrices based on distance

- because it finds the maximum likelihood trajectory over time (kind of like structure from motion), tag pose ambiguity is avoided

A simpler/cheaper way to do this would probably be to use the same odometry, tag corners, and camera model in a UKF. The odometry is the model, the camera corners are the measurements, and the camera model is the nonlinear measurement model h(x, u). The main problem is that pose ambiguity comes back and has to be handled, and a UKF isn’t computationally cheap either (though to be fair, there’s a lot of object allocation optimization we could do, short of just wrapping C++'s whole UKF class in a JNI).
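For reference, the core of that UKF measurement update is the unscented transform: generate 2n+1 sigma points from the state mean and covariance, push each through the nonlinear h, and recover a predicted measurement mean and covariance. A rough sketch (not WPILib’s implementation; kappa and this weight scheme are one common choice among several):

```python
import numpy as np

# Minimal unscented-transform sketch (hypothetical, simplified weights).
def sigma_points(x, P, kappa=1.0):
    n = len(x)
    L = np.linalg.cholesky((n + kappa) * P)   # matrix square root of scaled P
    pts = [x] + [x + L[:, i] for i in range(n)] + [x - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def unscented_transform(pts, w, h):
    Z = np.array([h(p) for p in pts])   # each sigma point through h
    z_mean = w @ Z                      # weighted measurement mean
    dZ = Z - z_mean
    S = sum(wi * np.outer(d, d) for wi, d in zip(w, dZ))
    return z_mean, S
```

For a linear h this recovers the linear Kalman filter's prediction exactly, which is why the odometry-only process update collapses to "pose + twist."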

I’ve only experimented with the out of the box Limelight for localization. There the measurement function would just return the predicted pose, right?

For using OpenCV libraries and analyzing tag corners, since tag ambiguity is different depending on orientation, would the measurement noise matrix be different every time (how do you account for pose ambiguity)?

Also, if you detect collisions, would you just increase the elements of the covariance matrix of the process model?

On a side note, doesn’t pose ambiguity mean that the tag can be interpreted in two different orientations, leading to two poses? In that case, can’t you eliminate that by choosing the pose which is closest to your current pose?

Sorry for all the questions - I have nobody else to ask

Edit: Is there any noticeable difference in performance between factor graphs and a UKF?

That’s how the WPILib pose estimator works at least. The problem with that is the true uncertainty isn’t gaussian on pose; it’s gaussian on tag corner pixels. The measurement model I suggested in my previous post would include pose in the state x and return corner pixel locations.

You basically just have to be aggressive about rejecting pose measurements that don’t make physical sense based on where you just were.
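One simple way to be aggressive about this is a Mahalanobis-distance gate on the innovation. A hypothetical sketch, assuming the vision measurement is already a pose in state space (the gate threshold is a made-up chi-squared-style cutoff you’d tune):

```python
import numpy as np

# Hypothetical measurement gate: reject a vision pose whose Mahalanobis
# distance from the current estimate exceeds a threshold, instead of
# letting an ambiguity flip yank the filter around.
def accept_measurement(x, P, z, R, gate=9.0):
    """Return True if the innovation passes the gate."""
    innovation = z - x     # both are poses in state space here
    S = P + R              # innovation covariance for an identity H
    d2 = innovation @ np.linalg.solve(S, innovation)
    return d2 < gate
```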

Or throw out the pose estimate entirely depending on how badly the drivetrain slipped, yeah.

There’s several ways to resolve it, and none are perfect. PhotonLib offers a few different strategies if you want inspiration.

Like I said in my previous post, a UKF is cheaper than factor graphs, but still expensive on a roboRIO.

If I had to use a coprocessor - wouldn’t the added latency make the pose estimate always lag, so that we have an estimate of where the robot was, say 80 ms ago?

Although I’m pretty sure that if I’m using Krakens or Talons with a CANivore, the encoder measurements can go directly to a coprocessor without having to go through the Rio…

Thanks a lot for the help! Your state space control book rocks by the way.

If you sync your coprocessor’s clock with the Rio’s FPGA, you can call AddVisionMeasurement() with the synced timestamp and it will apply the pose correctly, even if it is “back in time”.

I understand that. But I’m saying that it takes time for a measurement to go from the Rio to the coprocessor and then the result to go back to the Rio through NT. So it “lags behind”.

Only if you’re using SolvePnP or similar methods. If you’re using a more pinhole-type model based just on the center of the tag, the error in pixel space is easy to project into relative yaw and pitch.
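As a sketch of that pinhole relationship (the intrinsics fx, fy, cx, cy here are made-up values, not from any particular camera): a tag-center pixel maps to relative yaw and pitch angles, so pixel noise maps nearly linearly to angular noise.

```python
import math

# Hypothetical pinhole conversion: tag-center pixel -> relative angles.
# fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
def pixel_to_angles(px, py, fx, fy, cx, cy):
    yaw = math.atan2(px - cx, fx)     # horizontal angle off the camera axis
    pitch = math.atan2(cy - py, fy)   # vertical angle (image y grows downward)
    return yaw, pitch
```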

As long as you’re tracking odometry over time (which WPILib’s odometry does), it doesn’t actually matter that your pose estimates are lagging behind.

The big idea here is that you’re not trying to estimate the current pose of the robot directly; instead you’re estimating the error in your odometry from the true pose. This error accumulates over time but doesn’t have very large jumps, so an 80 ms old estimate of your odometry error can be added to your up-to-date raw odometry pose to determine a real-time best estimate of your current position.
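A toy numeric example of that idea, with made-up poses (x, y only, to keep it short):

```python
# Hypothetical numbers: a vision result computed for an 80 ms old frame
# still applies now, because the odometry *error* it measured drifts slowly.
odom_at_capture = (2.00, 1.00)    # raw odometry pose 80 ms ago
vision_at_capture = (2.10, 1.05)  # vision estimate for that same instant

# Error between vision and odometry at the capture time:
err = (vision_at_capture[0] - odom_at_capture[0],
       vision_at_capture[1] - odom_at_capture[1])

odom_now = (2.50, 1.20)           # up-to-date raw odometry pose
# Real-time estimate = current odometry + the (old but slow-moving) error:
pose_now = (odom_now[0] + err[0], odom_now[1] + err[1])
```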

Does the Limelight provide an accurate uncertainty covariance matrix on the poses it generates? Because fundamentally, when estimating a pose from an AprilTag, the uncertainty in each axis is heavily dependent on your position, and the axes are correlated with each other, which is why you need all six unique entries of the covariance matrix, not just the independent variance of each state variable.

If limelight provides this, then yeah you could try a standard linear kalman filter as you describe. You’re correct in that odometry is a linear process.

However, the key thing you need here is that the process noise is NOT constant, and once again, there are large correlations between each variable. You’ll need to calculate a Q matrix as a function of the odometry at each update step.

You’ll want to model the fact that a slight error in your heading when driving over a long distance results in a large error in your position. You also need to model that this noise scales more with distance traveled, not time. So uncertainty shouldn’t grow at all while at rest, and it should grow more quickly when traveling faster.
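A sketch of what such a distance-scaled Q could look like (the coefficients are invented, and a real version would also rotate the noise into the field frame and fill in the off-diagonal correlations described above):

```python
import math
import numpy as np

# Hypothetical Q that grows with distance traveled rather than time.
# `twist` is this update step's odometry delta, so Q is zero at rest.
# `theta_var` is the current heading variance, which leaks into position
# noise in proportion to how far we drove with that uncertain heading.
def process_noise(twist, theta_var, k_xy=0.02, k_theta=0.01):
    dx, dy, dtheta = twist
    dist = math.hypot(dx, dy)
    q_xy = k_xy * dist + theta_var * dist ** 2
    q_theta = k_theta * (dist + abs(dtheta))
    return np.diag([q_xy, q_xy, q_theta])
```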

Fair, though at that point, you don’t need the pose estimator class, cuz you aren’t using the pose. I was mainly trying to point out that the model in the pose estimator’s Kalman filter is very wrong.

From my understanding, the locations of the 3D corners of the tag, projected into camera space, are the measurements. In that case, for a UKF, how would the measurement function that goes from state to measurement work? I know that you said that it should use the camera model, but I was hoping you could elaborate a little bit. Aren’t there an infinite number of 4-tag-corner locations that could correspond to a pose? What would the residual even look like?
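For what it’s worth, given the known field-relative 3D positions of a tag’s corners and the camera’s mounting on the robot, a robot pose determines the corner pixels uniquely, so h(x) is well-defined and the residual is the 8-vector of (observed − predicted) pixel coordinates. Here is one heavily simplified sketch of such an h(x), under assumptions not from the thread: a 2D robot pose, the camera at the robot origin looking along +x, and made-up intrinsics.

```python
import numpy as np

# Hypothetical h(x): robot pose -> predicted corner pixels.
# Assumptions: 2D pose (x, y, theta), camera at the robot origin looking
# along +x, made-up pinhole intrinsics, no lens distortion.
def h(pose, tag_corners_field, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    x, y, theta = pose
    c, s = np.cos(theta), np.sin(theta)
    pixels = []
    for (tx, ty, tz) in tag_corners_field:
        # Transform the corner into the camera frame:
        rx = c * (tx - x) + s * (ty - y)    # forward distance
        ry = -s * (tx - x) + c * (ty - y)   # leftward offset
        u = cx - fx * ry / rx               # left in the world -> left in the image
        v = cy - fy * tz / rx               # higher in the world -> higher in the image
        pixels.extend([u, v])
    return np.array(pixels)
```

With four corners this returns eight numbers, and the UKF residual is just the observed corner pixels minus this prediction.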