I've been asked by a friend recently to implement a Kalman filter for a side project. Among many others, I've read this website and still have no idea how to implement it. They all seem to be in the style of 'draw the rest of the owl'.
Can anyone point to anything that explains it in a manner a programmer would explain (eg array iteration instead of sigma notation) ?
As far as i understand, Kalman Filters seem to be a moving average of position, velocity, and (perhaps) acceleration, and use those 3 values to estimate the 'real value' as opposed to what the sensor is telling you.
As far as resources, I’m not sure. It really only takes a few lines of code if you have a linear algebra library like NumPy or similar. However you need to write down the math describing your dynamics first.
The Kalman filter is really just Bayes Theorem applied to a particular set of system dynamics. Really it’s two applications of Bayes rule: given my current state estimate, how do I update my estimate when I receive a measurement? Then, after receiving that measurement, how will the system dynamics impact my uncertainty before the next measurement? You iterate applying these two every time you receive a measurement.
Do you have a dynamics model? I.e., a set of equations that tell you how what ever you are trying to track will update in time? That’s the first thing you need and if you don’t have that, I could see you being quite confused about the implementation.
It might help you to either read about particle filters or just Bayes filters generally. The Kalman filter is a Bayes filter for a specific set of assumptions that leads to a lot of linear algebra. If you understand the Bayes filter, the Kalman filter will make more sense, and you can learn about the Bayes filter without all the matrices. The particle filter is another algorithm worth reading about. Like the Kalman filter, it is a Bayes filter but it doesn’t require the linear algebra.
Aside: I always get frustrated when I see people saying that you don’t need to know math to be a software developer. Maybe that’s true if you want to make web pages, but the more math you know, the better your ability to model problems and either solve them yourself or find solutions others have created. The scope of problems you can solve via programming becomes so much bigger the more math you know.
> As far as i understand, Kalman Filters seem to be a moving average of position, velocity, and (perhaps) acceleration, and use those 3 values to estimate the 'real value' as opposed to what the sensor is telling you.
Sort of. The actual variables (v, r, a in your case) vary depending on your use case. What matters is you make an estimate based on that state, then you take a measurement and there’s an iterative way of adjusting your next state based on those two. It’s actually pretty easy to code once you understand formulas.
> They all seem to be in the style of 'draw the rest of the owl'.
You need a linear discrete-time dynamical system model of your process. With the possible additions of external control action, and process noise. Your dynamical system has an internal state, described as a vector of values. Then you also need a linear model, of how the internal state translates to your observations, plus observation noise.
x – model state
F – linear state transition model: x(t+1) = F x(t)
Q – covariance matrix for process noise, so actually x(t+1) = F x(t) + N(0,Q)
H – linear observation model
R – covariance matrix for observation noise
u, B – external control vector u, and B for how u acts on the model state. Optional.
N(0,Q) is a normal distribution with 0 mean and covariance Q.
For example, in the case of a moving physical object, the model state is
[ x ]
[ v ]
where x is location and v is velocity. (Earlier I used x for the entire model state. Here I use x just for the location.) Then with radar or whatever, you usually only observe the location, and current velocity is unobservable. So your observation model is
[ 1 0 ]
[ 0 0 ]
so that
[ 1 0 ] [ x ] = [ x ]
[ 0 0 ] [ v ] [ 0 ].
This is the owl you need to have, before doing anything with a Kalman filter. You really need to come up with these things on your own, based on what kind of a process you have.
Then Kalman filter tells you, how to best estimate the real state vector at each time step, when both your state transition model, and your observation contains noise.
You are correct, many Kalman filter tutorials are a mess, where the explanation of the parts of the chosen process model, and parts of the Kalman filter, are blended together and the reader has a hard time telling them apart. And quite often they start with the simplest possible example, where the model state is 1-dimensional, i.e. just one number. And where the process model is a constant (no change): x(t+1) = x(t). So F = 1. This can lead even further confusion, what is the dynamical model and what are the parts of the Kalman filter, when the model is so small that it sort of disappears in the equations.
> x – model state F – linear state transition model: x(t+1) = F x(t) Q – covariance matrix for process noise, so actually x(t+1) = F x(t) + N(0,Q) H – linear observation model R – covariance matrix for observation noise u, B – external control vector u, and B for how u acts on the model state. Optional.
Better formatting for this part:
x – model state
F – linear state transition model: x(t+1) = F x(t)
Q – covariance matrix for process noise, so actually x(t+1) = F x(t) + N(0,Q)
H – linear observation model
R – covariance matrix for observation noise
u, B – external control vector u, and B for how u acts on the model state. Optional.
I can recommend a textbook called Probabilistic Robotics by Dieter Fox, Sebastian Thrun, and Wolfram Burgard. There's good coverage of Kalman and information filters in the first few chapters.
Hoping I can help you grok the problem a bit more. Typing on phone, so please excuse the lack of decent formatting.
A moving average is a good place to start thinking about it. Think about a case where you would use a moving average, and why. You are probably using it because you have some measurement (sensor) that you know is noisy, and so by averaging out the noise (taking the mean of a number of samples) you try to get a better estimate of the true value. If you know how noisy the sensor is, you can get an idea of how many samples you should average over to get a good measurement. You can also take the standard deviation and report both the mean and variance of your measurement over multiple samples if you wanted.
For purposes of this example moving forward -- we are going to estimate all of our sensed or inferred values as a gaussian, parameterized by mean and variance. It's a simple way to give a measurement with some uncertainty around it.
This can be a good start if you know nothing little about the system you are measuring, other then the sensor / measurement is noisy. However for many systems you may have multiple quantities you are interested in estimating, and you have some idea of how they relate to each other.
Take your example of a physical system with acceleration, velocity and position. Basic physics will tell you that if at time t, you are at position p, moving at velocity v, then at time t+dt, your position should be roughly p+(v*dt). Similarly, you can update your velocity estimate using your estimate of acceleration. If this is a system under your control, then you can also take things like a force you commanded to update your acceleration model. This is great, by using physics, without any measuring after time 0, we can just figure everything out forward in time forever, by simply using our process model! However, because your initial estimates had some uncertainty, what you will find is that, if you just keep doing this, the uncertainty grows larger and larger with each time step, and eventually become so large as to be useless.
Enter tha kalman filter. What the kalman filter does is tries to combine the information given by your sensors and combine it with your process model to give you a better estimate of the quantities you are interested in than you could get from either technique alone.
Every time step the filter will make an state estimate using the process model based on your previous state estimate, and then use your current sensor measurements to update that state estimate, both in terms of the mean and uncertainty. In the basic kalman filter you assume your process model is linear, all of your estimates are simple gaussians, and then decide how much you want to weigh your model vs. your sensors via a simple multiplying factor, "the kalman gain"
Sorry again I couldn't write this out as a program, and the likely horrible run-on sentences that come from typing on a phone, but I hope that a quick overview of what the technique is trying to do will help make it a little easier to fill in the owl!
Wow. Thanks. That was a wonderful explanation. I’ve been wondering if a Kalman filter could be used in a distributed system to estimate latencies between pairs of processes for the purpose of computing a near optimal dissemination tree.
Thanks a lot! That does explain some of the ideas behind it nicely. Sometimes when looking at equations it's easy to miss the forest for the trees so to speak, so this helps.
Can anyone point to anything that explains it in a manner a programmer would explain (eg array iteration instead of sigma notation) ?
As far as i understand, Kalman Filters seem to be a moving average of position, velocity, and (perhaps) acceleration, and use those 3 values to estimate the 'real value' as opposed to what the sensor is telling you.