Background Subtraction with Dirichlet Processes
The European Conference on Computer Vision (ECCV) 2012 is coming up, and I have a paper in it, so, without further ado:

Background Subtraction with Dirichlet Processes, by Tom SF Haines and Tao Xiang, ECCV 2012.

There is also supplemental material, and you can now find the code for it in the video module of my code repository.

If you just want to use my background subtraction code then there is a file in the video module that takes an AVI file on the command line and kicks out a set of masks as .png files, one for each frame. It also supports a whole bunch of options - run it without a video file to find out more. (Note that it has only been tested under Linux - if you try it elsewhere and it doesn't work, please get in contact with me!)

Edit: You can also download the poster and the 30-second video they have looping.

2014 PAMI Paper: There is now a journal version, which you can download from here (also: supplementary material). It has a few tweaks and compares against the 2012 data set. I have been delaying putting it online because I wanted to submit to the 2014 version of that data set at the same time, but I have basically given up - I just don't have time to go back and add support for PTZ cameras (I know how, but it's complicated...), meaning I get zero on that category :-(

This algorithm was developed in 2010, over Christmas whilst I was ill, as a learning exercise for Dirichlet processes, and because the background subtraction algorithm I was using at the time sucked - several papers I have already published have in fact used the new one. Regardless, for a long time I just used it, but then a synthetic test of background subtraction algorithms was published, and I wondered how good it actually was. On the very first run, without any parameter tuning, it beat all the competitors. Some tuning later the gap had become rather large, at which point it seemed like it might be time to publish...

The algorithm is in some ways not that special - it's the same basic idea of a per-pixel model followed by a regularisation step, as used by so many other approaches. The difference is in the use of a Dirichlet process mixture model at each pixel, which happens to work extremely well. This is in part because it's great at learning exactly how much noise exists at a location in the video stream, and at dealing with multi-modal models, so it implicitly selects optimal thresholds between background and foreground. Additionally, a learning rate in the normal sense is not needed, as the rigorous way it handles new components allows it to forget an old model gracefully. There is also a certain amount of engineering going on - the regularisation step, whilst not new, is basically the best available; same for the lighting correction, which certainly helps matters.
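
To make the per-pixel idea a little more concrete, here is a toy sketch in Python. It is not the code from the repository and not the exact model from the paper: it handles a single greyscale pixel with values in [0, 1], assigns each new value to the most likely component rather than doing proper inference, and skips the regularisation and lighting correction entirely. The class name `PixelDPModel` and the parameters `alpha`, `prior_var` and `cap` are all made up for illustration. What it does show is the general flavour: a new component is created whenever no existing one explains the value better than the concentration term, each component learns its own noise level, and capping the total evidence is one simple way to let old modes fade without a conventional learning rate.

```python
import math


class PixelDPModel:
    """Toy 1-D Dirichlet-process-style mixture for ONE pixel (illustrative only)."""

    def __init__(self, alpha=0.1, prior_var=0.01, cap=100.0):
        self.alpha = alpha          # concentration: weight of "start a new component"
        self.prior_var = prior_var  # assumed variance for a component with little data
        self.cap = cap              # assumed cap on total evidence, so old modes fade
        self.comps = []             # each component is [count, mean, sum_sq_dev]

    def _var(self, comp):
        count, _, m2 = comp
        # Blend the prior variance (worth one pseudo-observation) with the data.
        return (self.prior_var + m2) / (1.0 + count)

    def _weights(self, x):
        # Unnormalised weight of each component: count times Gaussian density.
        out = []
        for comp in self.comps:
            var = self._var(comp)
            dens = math.exp(-0.5 * (x - comp[1]) ** 2 / var)
            dens /= math.sqrt(2.0 * math.pi * var)
            out.append(comp[0] * dens)
        return out

    def background_probability(self, x, prior_bg=0.5):
        # Density of x under the learned components, then Bayes' rule against
        # a uniform foreground density on [0, 1] (an assumption of this sketch).
        total = sum(c[0] for c in self.comps)
        p_bg = sum(self._weights(x)) / (total + self.alpha)
        p_fg = 1.0
        return p_bg * prior_bg / (p_bg * prior_bg + p_fg * (1.0 - prior_bg))

    def update(self, x):
        weights = self._weights(x)
        new_weight = self.alpha * 1.0  # concentration times uniform base density
        if weights and max(weights) >= new_weight:
            # Assign to the best existing component (weighted Welford update).
            comp = self.comps[weights.index(max(weights))]
            comp[0] += 1.0
            delta = x - comp[1]
            comp[1] += delta / comp[0]
            comp[2] += delta * (x - comp[1])
        else:
            # Nothing explains x well enough, so start a new component.
            self.comps.append([1.0, x, 0.0])
        # Cap the total evidence by rescaling, which slowly forgets old modes.
        total = sum(c[0] for c in self.comps)
        if total > self.cap:
            scale = self.cap / total
            for c in self.comps:
                c[0] *= scale
                c[2] *= scale


if __name__ == "__main__":
    model = PixelDPModel()
    # A flickering pixel: two background modes with a little noise on top.
    for t in range(200):
        base = 0.2 if t % 2 == 0 else 0.8
        model.update(base + 0.01 * math.sin(t))
    for value in (0.2, 0.8, 0.5):
        print(value, round(model.background_probability(value), 3))
```

Run as a script, this trains on a pixel that flickers between two intensities and then prints the background probability for each of those intensities and for a value in between, which is the multi-modal case a single Gaussian per pixel handles badly. Thresholding that probability directly would give a noisy mask; the regularisation step is what turns the per-pixel probabilities into something usable.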