The part of Kalman filtering that's always frustrated me is that everyone loves to roll up their sleeves and show off the fancy theory, but nobody ever seems interested in the practical aspects of how to implement it - in particular, how to come up with good covariance matrices in the first place, especially for a dynamical system where the covariance might change significantly over time. It's not easy [1], and tutorials seem to provide very little, if any, guidance on how to do it properly for anything more complicated than a chicken crossing a road. You can't run the algorithm if you don't know what inputs to give it!
The heuristic I learned in a Bayesian filtering class is to use a diagonal matrix and inflate the prior predictive variance by a small amount each iteration (1-5%) so that the data can easily tamp it down if there’s any kind of signal. The problem then becomes model selection over those inflation factors.
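For a scalar state, that heuristic looks roughly like the sketch below. The 3% inflation factor and the noise values are made-up illustrative numbers, not recommendations:

```python
import numpy as np

def inflate_and_update(x, P, z, R, inflation=0.03):
    """One scalar predict/update cycle where the prior variance is inflated
    by a small factor instead of hand-tuning a process noise term."""
    # Predict: inflate the prior predictive variance so the filter never
    # becomes overconfident and the data can still pull the estimate around.
    P_prior = P * (1.0 + inflation)

    # Update: standard scalar Kalman gain and correction.
    K = P_prior / (P_prior + R)
    x_post = x + K * (z - x)
    P_post = (1.0 - K) * P_prior
    return x_post, P_post

# Example: noisy measurements of a roughly constant signal.
rng = np.random.default_rng(0)
x, P = 0.0, 1.0
for z in 5.0 + rng.normal(0.0, 0.5, size=50):
    x, P = inflate_and_update(x, P, z, R=0.25)
```

The model-selection part then amounts to sweeping the inflation factor (and measurement variance) and scoring the results on held-out data.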
In deep learning it's usually hard enough to get past installing the 2 billion dependencies you need to run anything, before you realize you don't have enough VRAM anyhow.
That is why, even in systems where it seems straightforward how to choose the system variables, it is important to do a system identification analysis before committing to any form of Kalman processing. It provides performance and stability assurances, and is what separates the amateurs from the pros.
I've tried to frame it in terms of standard Propagation of Uncertainty rules. This ignores any correlated errors between inputs, but it makes it easier to understand.
Things clicked fairly fast for me when I tried this. It's just a weighted average between a prediction (which you can do easily using the error-propagation rules) and a new measurement.
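In the scalar case that weighted average is just inverse-variance weighting, which is exactly what the propagation-of-uncertainty rules give you. A sketch with made-up numbers:

```python
def fuse(pred, var_pred, meas, var_meas):
    """Combine a prediction and a measurement, each with its own variance,
    via inverse-variance weighting -- the scalar Kalman update."""
    w_pred = 1.0 / var_pred
    w_meas = 1.0 / var_meas
    estimate = (w_pred * pred + w_meas * meas) / (w_pred + w_meas)
    variance = 1.0 / (w_pred + w_meas)  # always smaller than either input variance
    return estimate, variance

# Prediction says 10.0 with variance 4.0, sensor says 12.0 with variance 1.0:
print(fuse(10.0, 4.0, 12.0, 1.0))  # -> (11.6, 0.8)
```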
For the past two years, I've been working on and off on a basic, open source scientific computing library in C (basic linear algebra on embedded devices, etc.), and hired a student this summer to work on adding some orientation algorithms.
It's still a work in progress, but we did dig up a couple other filter options to try out beyond the usual Extended Kalman (contributors very much welcome!) [1].
As another poster mentioned, finding the right parameters for the filters is essential, and involves a fair amount of testing and some trial and error to narrow them down to the right range for your application. There isn't a lot of info out there on that, either.
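For what it's worth, when I've done this it ended up being a brute-force sweep against recorded data with known ground truth, roughly like the sketch below. `run_filter`, the scale factors, and the search ranges are all placeholders for whatever your setup looks like:

```python
import itertools
import numpy as np

def rmse(estimates, truth):
    return float(np.sqrt(np.mean((np.asarray(estimates) - np.asarray(truth)) ** 2)))

def tune(run_filter, measurements, truth):
    """Grid-search hypothetical q/r scale factors and keep the best-scoring pair."""
    best_params, best_err = None, np.inf
    for q, r in itertools.product(np.logspace(-4, 1, 6), np.logspace(-2, 1, 4)):
        err = rmse(run_filter(measurements, q_scale=q, r_scale=r), truth)
        if err < best_err:
            best_params, best_err = (q, r), err
    return best_params, best_err
```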
When you implement a Kalman filter, make sure to use the actual timesteps (i.e. delta time = now - last) instead of a theoretical fixed timestep (e.g. delta time = 0.1s or similar) or averaged timestep.
Makes all the difference
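Concretely, for something like a constant-velocity model both the state transition and the process noise depend on dt, so the elapsed time has to come from the clock rather than a nominal constant. A rough sketch (the model and noise scaling here are illustrative, not from the comment above):

```python
import time
import numpy as np

def predict(x, P, last_t, q=0.1):
    """Constant-velocity prediction step using the measured elapsed time."""
    now = time.monotonic()
    dt = now - last_t                          # actual delta time, not a fixed 0.1 s
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])                 # transition for state [position, velocity]
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt       ]]) # process noise also scales with dt
    return F @ x, F @ P @ F.T + Q, now
```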
This is also true for post-processing data, like applying FFT filters. Oftentimes real-world data does not have constant timestep sizes, but most library functions for things like the FFT assume that it does. So I always resample to a constant timestep size before doing any filtering. This of course doesn't apply to real-time analysis.
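In practice that's just interpolating onto a uniform grid before handing the data to the FFT. A minimal sketch with linear interpolation (fancier resampling schemes exist; the sample rate and signal are made up):

```python
import numpy as np

def resample_uniform(t, y, fs):
    """Resample irregularly sampled y(t) onto a uniform grid at rate fs,
    so FFT-based processing sees the constant timestep it assumes."""
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs)
    y_uniform = np.interp(t_uniform, t, y)   # linear interpolation
    return t_uniform, y_uniform

# Example: jittery ~100 Hz sample times, then an FFT on the resampled signal.
rng = np.random.default_rng(1)
t = np.cumsum(0.01 + 0.002 * rng.standard_normal(1000))
y = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(t.size)
t_u, y_u = resample_uniform(t, y, fs=100.0)
spectrum = np.fft.rfft(y_u)
```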
I actually had to submit my thesis with - let's say - not the best results, because I did not use the actual timesteps. My inertial navigation system for an autonomous RC model car would estimate paths that were kinda right, but always strongly distorted.
For weeks and months after the thesis defence it really bothered me, and I regularly dove back into the system to find the cause.
One day I fiddled with the timesteps, reran the evaluation, and boom, all the curves were perfectly aligned with the actual driving trajectory that we used to generate the data. I was amazed at how well it worked now, and kinda sad that I could not present _these_ results for my thesis. Should have figured it out sooner. Too bad that all the literature I had read up to that point only ever talked about fixed timesteps...
Does 'filter' (as it is used here) have a precise mathematical definition? The only other place where I've seen this word used is the 'Bloom filter', which according to Wikipedia is not an algorithm like the Kalman filter but a data structure.
In general, filters do what their names suggest -- they filter out noise or unwanted components of signals. The simple Kalman filter filters out white noise to estimate parameters that describe a system; other filters use things like the Fourier or wavelet transform to, say, select certain frequencies and remove others.
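As a blunt illustration of the frequency-selection kind (not how you'd design a real filter, but it shows the idea):

```python
import numpy as np

def lowpass_fft(y, fs, cutoff_hz):
    """Zero out all FFT bins above cutoff_hz -- a crude way to
    'select certain frequencies and remove others'."""
    spectrum = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(y.size, d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=y.size)
```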
IIRC, the Kalman filter has a state-space formulation in which it is clear that it is equivalent to an LTI filter (https://en.m.wikipedia.org/wiki/Linear_filter). However, I believe updating the statistics of the system in response to new measurements breaks the "time invariant" part.
This is precisely the classic concept of an audio filter, which is, I believe, the inspiration for the term "filter" in other contexts. Bloom filters and selfie filters don't have much in common in a technical sense, but if you squint you can see the connection.
"Filter" has a very established definition in signal processing: essentially, something that can manipulate how energy or power is distributed over a set of frequencies.
A common analogy is to imagine it as something that turns a block of stone into a sculpture by removing material.
It is a filter in the sense that it can track the system state from one (or more) noisy measurements. The filter part thus reflects the input denoising.
[1] https://en.wikipedia.org/wiki/Kalman_filter#Estimation_of_th...