Kalman Filter Question

When using a Kalman filter, for example to combine the odometry measurements from your wheels with the pose estimates from your vision measurements, does the filter take into account how long it has been since the last vision measurement?

In other words, if I were to tell my robot its position with an AprilTag at t=0, drive around, and then update its position with another AprilTag, would it trust that tag more if it saw it at t=10 as opposed to t=2?

I think that would make sense logically, at least: the longer the robot goes between AprilTag sightings, the more time it has had to build up odometry error, so the vision measurement would be comparatively more accurate than the odometry. Is this how the Kalman filters in WPILib work?

The short answer is yes. As time goes on without a measurement, you trust measurements more and more.

I ended up writing a lot more than I originally thought I would, so I hope this helps. If you have any questions about something I wrote, or anything else, feel free to ask. I haven’t taken any college courses on the topic, so I might not know everything, but I will answer to the best of my ability.

The Kalman filter has a model which it uses to calculate where it is given the inputs you gave it (in this case the model is odometry). On top of that, you can add measurements to correct errors created by the model.

The Kalman filter has a number P (technically this is a matrix) which tells it how much to trust the model (odometry). If P is 0, the model can be trusted completely and its position is exactly correct. If P is small, we can trust the model relatively well, because the position it reports reflects our real position pretty well. If P is large, we cannot trust the model much, because its position is far from our real one. What is important to understand is that P is a representation of how far away our model’s estimate is from our actual position.

Now the Kalman filter can do 2 things:

  1. Use our model to predict where we will be given certain inputs
  2. Correct our model using certain measurements

We will try to understand how each of these is affected by P and how each of these affects P.

Prediction
When we predict where our system is going to be, we use our model to calculate where we are going to be in the future. In a world where we had a perfect model, this is all we would need to do, because the model could calculate exactly where we are. Unfortunately, we do not live in a perfect world, so our model cannot calculate exactly where we will be. There are 2 contributing factors to this. First, the actual model for our system is very complex, so we use a simplified one that can predict our system pretty well but not perfectly. Second, there is process noise (disturbances we cannot model, such as friction or air resistance) in our system, and that makes our model worse as well. So in the end our model can predict where we will be in the future (in code, the next loop cycle), but not with perfect accuracy.

Therefore, each time we predict, the distance between where the model thinks we are and where we actually are statistically grows. Now remember the relationship we said there is between P and this distance between the model and our real position: when we predicted, the distance grew, and therefore P must also grow.

In summary, each time we predict our next position, P grows and therefore we trust our model less.
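To make that concrete, here is a minimal one-dimensional sketch of the predict step; the numbers are made up for illustration, not anything from WPILib:

```python
# Minimal 1D sketch of the predict step (made-up numbers): every
# prediction adds process noise q to P, so the model is trusted less.
P = 0.1   # current error covariance (our "distrust" in the model)
q = 0.05  # process noise added per prediction

for step in range(5):
    P = P + q  # 1D version of P = A*P*A^T + Q with A = 1
    print(f"after predict {step + 1}: P = {P:.2f}")
```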

Measurement
When we add a measurement, we are telling our system where it should actually be. Effectively we are telling our system that its model may be incorrect and this new externally measured position may be more correct. Of course, the measurement is also not necessarily completely correct, so we should not trust it entirely either. Therefore we need to find some way to combine the position the system thinks it is in and the external measurement, based on how much we trust each of them.

So how much do we actually trust each of them? Well, earlier we said that P represents how much we trust our model, so we will use it when combining. We also need to know how much we trust the measurement, but this changes based on the type of measurement (camera, lidar, etc.) and the situation of the measurement (how fast we were moving when we took the frame, distance from the target, etc.). To take this into account, along with every measurement we add a number R (technically this is also a matrix) which tells us how much we trust the measurement. You probably know this as the standard deviations you put in your code; standard deviation is a slightly simplified version of R (it still works very well for what we are doing).

So how do we combine these two? There is some math behind this, but the easiest way to think about it is as a weighted average. We say that

x_{corrected} = \frac{x_{model} * P + x_{measurement} * R}{P + R}

Now this isn’t exactly correct, because it assumes we trust P and R more when they are larger, which is the opposite of the truth. It would be closer to correct with 1/P and 1/R as the weights, but the point is that you combine the two values using P and R as weights.
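As a tiny numeric sketch of that weighted average, using the corrected 1/P and 1/R weights (all values hypothetical):

```python
# Weighted average with 1/P and 1/R as weights:
# smaller P or R means more trust, so a larger weight.
x_model = 2.0        # position the model (odometry) believes
x_measurement = 3.0  # position the vision measurement reports
P = 0.4              # model uncertainty
R = 0.1              # measurement uncertainty (vision is trusted more here)

w_model = 1 / P
w_measurement = 1 / R
x_corrected = (x_model * w_model + x_measurement * w_measurement) / (w_model + w_measurement)
print(x_corrected)  # 2.8: pulled mostly toward the trusted measurement
```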

Now that we have combined the two, we can plug the corrected x into the model and continue any other calculations (such as predicting, and using this value in our code) with this new x. On top of this, because we have now added a measurement, this new x should statistically be closer to our real position, and therefore P gets smaller.

In summary, when we add a measurement we do 2 things:

  1. We find a more correct position based on a combination of our model and the measurement. We know how much to trust each of these based on P and our standard deviations for the measurement.
  2. Because we now trust our position more, having more data for it, we decrease P and trust our model more from now on.

So in general we have 2 things we can do:

  1. Predict where we are going to be, which increases P and decreases how much we trust the model
  2. Add a measurement, which corrects our position based on P, and decreases P, causing us to trust our model more

So now to answer your initial question. We are periodically predicting where we will be to keep our model up to date. The more we predict, the larger P gets, the less we trust our model, and the more we trust measurements. Therefore, if it has been 0.1 seconds since the last measurement, P will be relatively small and we won’t trust the new measurement very much compared to our model. If it has been 2 seconds since the last measurement, P will be larger and we will trust measurements more. If it has been 10 seconds since the last measurement, P will be very large, as we have predicted a lot, and we will trust measurements almost completely.
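Here is a rough sketch of that behavior, assuming a 50 Hz loop and made-up noise values (not WPILib’s defaults); the gain K = P / (P + R) is how much of the measurement we accept:

```python
# Rough sketch with made-up noise values (not WPILib's defaults).
# At a 50 Hz loop, 5/100/500 predicts ~ 0.1 s / 2 s / 10 s.
q = 0.001  # process noise added per 20 ms predict
R = 0.5    # vision measurement noise

for steps in (5, 100, 500):
    P = 0.0
    for _ in range(steps):
        P += q  # each predict grows P
    K = P / (P + R)  # fraction of the measurement we accept
    print(f"{steps * 0.02:5.1f} s without vision: P = {P:.3f}, K = {K:.2f}")
```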


I think this thread has what you want.

TLDR: State uncertainty is constant, pending a faster controller.

But vision uncertainty is easy to adjust, and everybody does.

I think that would definitely be the ideal implementation. This post, however, says that at least the way the Kalman filter is currently implemented, it doesn’t build up uncertainty (P).

On another note, thank you for the amazing explanation of Kalman filters!

WPILib’s KalmanFilter class does, but the pose estimator classes don’t because the roboRIO is a potato. Also, a linear Kalman filter is the wrong kind of filter for the true probability distribution of this problem anyway, so we just went with what’s simple and Good Enough™ instead of chasing high accuracy the roboRIO can’t achieve.

Until the new robot controller is out, is there any recommendation against decreasing vision stddevs over time (as long as you don’t see targets, resetting the stddevs once you see targets again)?

Don’t do that; the vision stddevs are time invariant. The proper way to model it is via the error covariance P increasing over time (Pₖ₊₁ = APₖAᵀ + Qₖ). The problem is applying laggy vision measurements at the proper place in a pose backlog and replaying the odometry measurements gets prohibitively expensive.
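For reference, a minimal sketch of that propagation for a 3-state (x, y, heading) pose with A = I and placeholder Q values:

```python
import numpy as np

# Sketch of P_{k+1} = A P_k A^T + Q_k; the Q values here are
# placeholders for illustration, not WPILib's.
A = np.eye(3)
Q = np.diag([1e-3, 1e-3, 5e-4])

P = np.zeros((3, 3))  # fully confident right after a reset
for _ in range(100):  # 100 odometry-only predict steps
    P = A @ P @ A.T + Q

print(np.diag(P))  # per-axis variance grows the longer we go without vision
```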


I’m aware that’s not the proper way to do it, but I’m not sure from your explanation what you expect will happen if the stddevs are scaled by time without a target (as we already scale them based on distance, etc.).

That method will give you an answer regardless because math is math, but it violates assumptions the Kalman filter makes (namely, a linear measurement model with additive noise with a constant spectral density), so there’s no guarantee the answer will be good. Outside those assumptions, a nonlinear filter likely offers better performance.

The main reason to use a Kalman filter is mathematically rigorous optimal filtering, so if that’s not your goal, it’s probably the wrong tool for your use case. As for what the exact behavior is, you’d have to run some simulations.

What if a pose estimator just stored a single covariance matrix representing the present error covariance, even though the vision measurements likely correspond to a state that is a few odometry updates older? That would circumvent the need for propagation while still modeling uncertainty in 3 axes to some extent.
It wouldn’t be totally accurate, but I would imagine this would be “Good (better?) Enough” as well, no?

I’m having trouble understanding your suggestion because it doesn’t follow the standard terminology.

Is it a covariance or a state? Those are mutually exclusive terms in Kalman filters.

The covariance of what? There’s error covariance, process noise covariance, and measurement noise covariance.

If you correct an old state estimate with an old vision measurement, you still have to propagate that forward to the current state estimate somehow if you don’t want the robot acting on laggy state estimates. The WPILib pose estimators do this by finding a delta with respect to the old pose estimate, then applying that delta to the latest pose estimate (or at least they used to; it changes every year).

That doesn’t make sense to me. If you only have one covariance, the only way that can apply to 3 axes is if the uncertainty is always the same for some states (e.g., x-y-z translation or rotation, with the appropriate measurement model), and it’s unclear which states you mean.

Apologies, I’ll do my best to use proper/consistent terminology as I can definitely see why my post was confusing.

The base premise is that state (pose estimate) propagation is done via a simple transform rather than by replaying odometry.
This is effectively the paradigm ROS uses for localization. There’s an odometry “frame”, where odometry is tracked independently of sensors. So if odometry says the robot has moved 10 meters forward and 5 meters to the right by t=2, then the odom->base_link transform at t=2 is (10, 5). Even when a sensor measurement correcting that comes in, we don’t modify this.

Then there’s a “map” or “world” frame, which is where we make external measurements (AprilTags with known poses). This transform is map->base_link. So at t=2 we measure that we’re actually at pose (11, 4.5) instead. We use this to determine our total odometry drift/error as the difference, (1, -0.5). This can be thought of as the map->odom transform, completing the loop.

However, we might not receive this measurement until t=5.
The trick is that we assume the odometry drift (map->odom) has 0-mean noise. So we estimate the current t=5 map->base_link pose to be the t=5 odom->base_link pose added to the t=2 (most recent) map->odom “offset/error/drift” value.
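A translation-only sketch of that bookkeeping (hypothetical numbers; real transforms also carry rotation):

```python
# Translation-only sketch of the transform bookkeeping above (real
# transforms also include rotation; omitted here to keep it short).
odom_at_t2 = (10.0, 5.0)             # odom->base_link at t=2
map_measurement_at_t2 = (11.0, 4.5)  # tag measurement, received at t=5

# map->odom drift at t=2 is the difference between the two
drift = (map_measurement_at_t2[0] - odom_at_t2[0],
         map_measurement_at_t2[1] - odom_at_t2[1])  # (1.0, -0.5)

odom_at_t5 = (14.0, 6.0)  # hypothetical odom->base_link at t=5

# Assuming zero-mean drift since t=2, the current map-frame pose is
# the latest odom pose shifted by the most recent drift estimate.
map_at_t5 = (odom_at_t5[0] + drift[0], odom_at_t5[1] + drift[1])
print(map_at_t5)  # (15.0, 5.5)
```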

So no replay of odometry is required. However, I’ve made no mention of uncertainty yet. I believe in ROS this is usually calculated by the pose estimator that incorporates the odometry in real time, as opposed to our paradigm in FRC, where Limelight and PhotonVision provide global pose estimates without incorporating odometry on their end.
The problem there is that the prediction uncertainty steps and the measurement update step do need to be compounded in order to be accurate.

So let’s say in the example I mentioned above we performed odometry at t = {1, 2, 3, 4, 5}; we would then have the state error covariance P at those times as well: P_1, P_2, P_3, P_4, P_5, where P_t = F_t P_{t-1} F_t^T + Q_t, F_t is the linearized process evolution matrix at t, and Q_t is the process noise matrix based on some model at t.

Upon receiving the measurement for t=2, you now have a P_2' which is calculated using the Kalman gain K_t and incorporates the measurement noise R_t.
P_t' = (I - K_t) P_t (I - K_t)^T + K_t R_t K_t^T
K_t = P_t (P_t + R_t)^{-1}

I believe for fusing pose estimates from other sources, the observation matrix H is just the identity, so it’s omitted.
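A minimal numpy sketch of that correct step, with H = I and placeholder P and R values:

```python
import numpy as np

# Correct step matching the equations above with H = I (Joseph form);
# P and R here are placeholders for illustration.
P = np.diag([0.2, 0.2, 0.1])  # prior error covariance P_t
R = np.diag([0.5, 0.5, 0.3])  # measurement noise covariance R_t
I = np.eye(3)

K = P @ np.linalg.inv(P + R)                         # Kalman gain K_t
P_corrected = (I - K) @ P @ (I - K).T + K @ R @ K.T  # P_t'

print(np.diag(P_corrected))  # each entry is smaller than in P
```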

But now you have to compute P_3' = F_3 P_2' F_3^T + Q_3 and perform those error covariance updates up to the present, ending at P_5', which is a lot of matrix operations at once and therefore expensive (this is my understanding of the rationale against doing this currently).

But I’m wondering if it would be “Good Enough” to simply apply the measurement covariance update to P_5 using R_2. The Kalman gain would then be applied to the error found at t=2 to determine the offset to apply to the current state.
This is what I meant by only tracking the up-to-date estimate covariance instead of maintaining a full backlog of them. To me that seems like a very simple algorithm to implement for the basic pose estimator class, which would still express the growth of uncertainty as the robot moves without a sensor measurement.

It’s possible you even disagree with the initial premise of tracking pose errors using a transform like this, as I know the current WPILib pose estimator does not do it this way, but I’m curious about your thoughts on both parts.

@Jared_Russell I saw you also talked about the need to reflect growing odometry uncertainty over time, yet I saw in your team’s code you didn’t account for that. Have you experimented with time-based stddev scaling? Do you share Tyler’s concerns about it?


Yea, that’s what KalmanFilterLatencyCompensator does, which was way too slow.

You mean the error covariance update? R isn’t updated by this process. Assuming that, the suggested process would underestimate the true value of P_5 since the replay from P_2 would normally cause growth. This means K would be underestimated and the measurement wouldn’t correct the state estimate as much: plot p/(p+5) from p = 0 to 100 - Wolfram|Alpha.
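Plugging a few values into that gain, K = p / (p + r) with r = 5, shows why underestimating p weakens the correction:

```python
# K = p / (p + r) from the linked plot, with r = 5: a smaller p
# directly means a smaller gain, so less correction is applied.
r = 5.0
for p in (1.0, 10.0, 100.0):
    print(f"p = {p:5.1f} -> K = {p / (p + r):.2f}")  # 0.17, 0.67, 0.95
```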

The transform graph thing ROS does sounds more complicated than what we do now. I’m of the opinion that if we’re not going to be mathematically rigorous, we shouldn’t bother overcomplicating the pose estimator, because that makes it harder to maintain. Every time we touch the current pose estimator implementation, it becomes a major operation that takes months to review and merge, because (1) it’s complex, and (2) subtle bugs creep in that are hard to find and fix.

Yes, I meant calculating P_5' in terms of the prior prediction P_5 and the sensor covariance for that measurement, R_2 (which you may decide to model as a function of your distance to the AprilTag, skew, whatever).

This growth would already be incorporated in the prior estimate of P_5.

As a naive simplification: if we imagine the prediction step grows our variance by 10% each step and the measurement update shrinks it by 20%, then either way we end up with 1.1^5 * 0.8 in this example; we applied the same transformations to the covariance, just out of order.
However, the order of these operations likely does matter; my question is by how much.

From memory, the pose estimator was replaying odometry updates from the time of the vision update up to real time in a for loop, which to me felt much more complicated than needed.
However, it seems that code has changed to do exactly what I’m describing, which is to just apply the Kalman-gained innovation twist to the latest odometry estimate.

In that case I think I agree that anything I’m suggesting would add unneeded complication to what’s already there.

I missed this the first time I read your post. I think this is the source of my confusion, as it didn’t seem to work this way when I read the code last season. I was under the impression odometry updates were being replayed without covariance propagation.

There’s no covariance involved in the current implementation. The covariance stuff was in the 2020 implementation, then removed later. Like I said earlier, we’ve had rewrites almost every year.

That simplification isn’t valid, though, so you can’t draw conclusions from it. Firstly, the error covariance experiences quadratic growth, not exponential growth. Secondly, it doesn’t accurately describe the predict and correct steps of the Kalman filter. I’d recommend writing a small Python script to simulate the two scenarios.
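A rough scalar version of such a script (with made-up q and R values) might look like:

```python
# Rough 1D comparison (made-up q and R) of (A) replay -- correct at
# t=2, then redo the predicts up to t=5 -- with (B) the shortcut --
# keep predicting, then correct at t=5.
q, R = 0.1, 0.5

def predict(P, steps):
    return P + q * steps  # scalar P_{t+1} = P_t + Q

def correct(P):
    K = P / (P + R)                       # Kalman gain
    return (1 - K) ** 2 * P + K ** 2 * R  # Joseph form with H = 1

P2 = predict(0.0, 2)                   # predicted up to t=2
P5_replay = predict(correct(P2), 3)    # (A): ~0.44
P5_shortcut = correct(predict(P2, 3))  # (B): 0.25
print(P5_replay, P5_shortcut)          # the shortcut underestimates P_5
```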


Right, that’s exactly what I said: “without”.

Up until the PR I linked from 6 months ago, step 7 of addVisionMeasurement replayed odometry inputs in a loop despite not needing to do so to propagate an error covariance.