In this post, I discuss cross-correlation. Although commonly used in signal processing, cross-correlation can be useful in a Quantified Self context. I’ll present a bit of the mathematics behind cross-correlation, demonstrate a quick example, and briefly explain where you might use this in analyzing your personal data.
The Inspiration
I was going through my Google Reader queue this morning and came across this talk by Jeff Zira, a product manager at Lark Technologies. The talk asks a simple question:
Do Jeff and his fiancée influence each other’s sleep patterns?
He presents raw time-series sleep data collected using larklife, then attempts to answer this question in a couple of different ways. He first displays a timeline visualization of peak overnight activity.
Since his peaks often occur slightly after her peaks, he uses this as evidence that she’s waking him up. He also shows the difference signal between their sleep patterns, but finds this less than conclusive.
After watching this talk, I immediately thought:
Is there a more precise way to answer this question?
The Mathematics
Note the term "difference signal" above. Any time-series dataset is a signal, which means the powerful tools of signal processing can be applied!
Let the sleep patterns of Jeff and his fiancée be the signals $ S(\tau) $ and $ T(\tau) $, respectively. Let $ f(S(\tau), T(\tau)) $ be the similarity between those signals. Ignoring (for now) the fact that $ f $ remains undefined, I’m looking for the time shift $ t $ that maximizes
$$ f(S(\tau), T(\tau + t)) $$
(As a side note, the difference signal is a new signal $ R(\tau) = S(\tau) - T(\tau) $.)
First, however, I need a reasonable similarity function $ f $. The answer lies in cross-correlation:
In signal processing, cross-correlation is a measure of similarity of two waveforms as a function of a time-lag applied to one of them.
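Perfect! The core of cross-correlation is an integral that looks suspiciously like convolution, except that it contains the term $ T(\tau + t) $ instead of $ T(t - \tau) $ (for real-valued signals):
$$ (S \star T)(t) = \int_{-\infty}^{\infty} S(\tau) \, T(\tau + t) \, d\tau $$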
The desired $ t $ is the location of the global maximum of this cross-correlation function.
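One way to see why cross-correlation is a sensible choice of $ f $ is to tie it back to the difference signal. Assuming $ S $ and $ T $ are real-valued with finite energy, measure how far $ S $ is from a shifted copy of $ T $ by the energy of their difference (at $ t = 0 $ this is just the difference signal $ R $):
$$ \int_{-\infty}^{\infty} \big(S(\tau) - T(\tau + t)\big)^2 \, d\tau = \int_{-\infty}^{\infty} S(\tau)^2 \, d\tau + \int_{-\infty}^{\infty} T(\tau + t)^2 \, d\tau - 2 \int_{-\infty}^{\infty} S(\tau) \, T(\tau + t) \, d\tau $$
The first integral does not involve $ t $, and substituting $ u = \tau + t $ turns the second into $ \int T(u)^2 \, du $, which does not either. So the shift that makes the shifted difference signal smallest (in the squared sense) is exactly the shift that maximizes the cross-correlation integral. The same argument holds for periodic signals integrated over one period.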
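Given two discrete periodic signals $ S_1 $ and $ S_2 $ of equal length $ N $, the integral becomes a sum over one period, which can easily be computed:
$$ (S_1 \star S_2)[t] = \sum_{n=0}^{N-1} S_1[n] \, S_2[(n + t) \bmod N] $$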
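For discrete periodic signals of length N, the integral reduces to the circular sum c[t] = Σ S1[n]·S2[(n + t) mod N], which can be sketched directly in Python. This is a minimal illustration under the assumption that each list holds exactly one period of a real-valued signal; the function names `circular_cross_correlation` and `best_shift` are hypothetical.

```python
def circular_cross_correlation(s1, s2):
    """Return [c[0], ..., c[N-1]] with c[t] = sum_n s1[n] * s2[(n + t) % N].

    Assumes s1 and s2 each hold exactly one period of a real-valued signal.
    """
    n = len(s1)
    if len(s2) != n:
        raise ValueError("signals must have equal length")
    return [sum(s1[i] * s2[(i + t) % n] for i in range(n)) for t in range(n)]


def best_shift(s1, s2):
    """Shift t (0 <= t < N) at which the circular cross-correlation peaks."""
    c = circular_cross_correlation(s1, s2)
    return max(range(len(c)), key=c.__getitem__)


# s2 is s1 delayed by two samples, so the peak should sit at t = 2.
s1 = [0, 1, 3, 1, 0, 0, 0, 0]
s2 = [0, 0, 0, 1, 3, 1, 0, 0]
print(circular_cross_correlation(s1, s2))  # [1, 6, 11, 6, 1, 0, 0, 0]
print(best_shift(s1, s2))  # 2
```

Calling `best_shift(s2, s1)` instead gives 6, which is -2 modulo 8: swapping the arguments flips the sign of the lag. This direct version costs O(N²) operations; for long signals an FFT-based computation is the usual alternative.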
It can be hard to visualize what this is doing, though, so I’ve provided a quick demo below.
An Interactive Example
If you’re viewing this in an RSS reader, check out the example on my blog.
You can see the code for this demo here.
Use the select boxes to change the red and blue functions. Click and drag on the chart at the top to see how sliding the blue function affects the cross-correlation. Try different combinations of functions and see where the cross-correlation is maximized!
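As a rough, non-interactive stand-in for the demo, the sketch below computes the circular cross-correlation of a few fixed "red" and "blue" functions and reports the shift of the blue function at which it peaks. It is not the demo's code: the Python/NumPy choice, the `FUNCTIONS` menu, the 200-sample period, and the helper names `circular_xcorr_fft` and `peak_shift` are all assumptions for illustration. It relies on the standard identity that, for real signals, the DFT of the circular cross-correlation equals the conjugated DFT of the first signal times the DFT of the second.

```python
import numpy as np

N = 200  # samples per period (an arbitrary choice)
x = np.arange(N) / N  # one period of the domain, [0, 1)

# Hypothetical stand-in for the demo's select boxes.
FUNCTIONS = {
    "sine": np.sin(2 * np.pi * x),
    "cosine": np.cos(2 * np.pi * x),
    "square": np.where(x < 0.5, 1.0, -1.0),
}


def circular_xcorr_fft(red, blue):
    """c[t] = sum_n red[n] * blue[(n + t) % N], computed in O(N log N).

    For real `red`, the DFT of c is conj(FFT(red)) * FFT(blue).
    """
    return np.fft.ifft(np.conj(np.fft.fft(red)) * np.fft.fft(blue)).real


def peak_shift(red_name, blue_name):
    """Shift of the blue function (in samples, and as a fraction of a period)
    at which the cross-correlation is largest."""
    c = circular_xcorr_fft(FUNCTIONS[red_name], FUNCTIONS[blue_name])
    t = int(np.argmax(c))
    return t, t / N


for red, blue in [("sine", "cosine"), ("cosine", "sine"), ("square", "square")]:
    t, frac = peak_shift(red, blue)
    print(f"red={red}, blue={blue}: peak at t={t} ({frac:.2f} of a period)")
# red=sine, blue=cosine: peak at t=150 (0.75 of a period)
# red=cosine, blue=sine: peak at t=50 (0.25 of a period)
# red=square, blue=square: peak at t=0 (0.00 of a period)
```

Swapping red and blue flips the sign of the lag: 150 and 50 are -50 and +50 modulo 200, a quarter period either way, consistent with cos(x) = sin(x + π/2). A function compared with itself always peaks at zero shift.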
Back To The Original Motivation
Given the two sleep signals $ S, T $ above, cross-correlation makes it possible to answer these questions:
- Who wakes up first? By how long?
- Accounting for the time shift in awakening, how closely do the sleep patterns match?
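This gives a more rigorous sense of whether the peaks in nighttime activity actually coincide: the location of the cross-correlation peak identifies who wakes up first and how much earlier, while the height of that peak (once the signals are normalized) measures how closely the patterns match after accounting for that lag.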
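To make the two questions concrete, here is a minimal sketch in Python with NumPy. It assumes both activity series are sampled on the same clock (one value per minute here) and, unlike the periodic case, does not wrap around: at each lag up to a chosen `max_lag` it computes the Pearson correlation over the overlapping samples only. The function `lagged_correlation`, the helper `bump`, the synthetic `fiancee` and `jeff` series, and the 3-minute delay are all hypothetical, not real larklife data.

```python
import numpy as np


def lagged_correlation(s, t, max_lag):
    """Pearson correlation of s[k] with t[k + lag] for each lag in [-max_lag, max_lag].

    A positive lag means features in t occur `lag` samples after the matching
    features in s. Only overlapping samples are used at each lag, so the
    series are not treated as periodic.
    """
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    if s.shape != t.shape:
        raise ValueError("series must have the same length")
    n = len(s)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = s[: n - lag], t[lag:]
        else:
            a, b = s[-lag:], t[: n + lag]
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags, r


# Hypothetical data: one 8-hour night at 1-minute resolution.
minutes = np.arange(480)


def bump(center, width=4.0, height=10.0):
    """A Gaussian-shaped burst of movement centered at `center` minutes."""
    return height * np.exp(-0.5 * ((minutes - center) / width) ** 2)


fiancee = bump(90) + bump(200) + bump(330, height=6.0)
# Jeff reacts 3 minutes later, on a different scale and baseline.
jeff = 5.0 + 0.5 * np.roll(fiancee, 3)

# Fiancée first, so a positive lag means Jeff's activity trails hers.
lags, r = lagged_correlation(fiancee, jeff, max_lag=15)
best = int(lags[np.argmax(r)])
print(f"peak at lag {best} min, r = {r.max():.3f}")  # peak at lag 3 min, r = 1.000
```

The lag of the peak answers "who wakes first, and by how long", and the peak value answers "how closely do the patterns match". Normalizing matters here: Jeff's series has a different scale and baseline, which would change the height of a raw cross-correlation but leaves the Pearson value at 1. Real, noisy sleep data would of course give a peak correlation well below 1.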
While simply looking at the data can be very effective, rigorous analysis has definite value if you plan to carry out further experiments. Armed with cross-correlation data, you can answer questions like:
Okay, I switched to a separately coiled mattress. How well does that prevent us from waking each other up?
In general, signal processing techniques can be highly useful in examining time-series data.
Up Next
This was a slight diversion from my plan to talk about upcoming experiments, which I’ll return to in my next few posts. If you just can’t wait, here’s a quick summary:
- Persistent location tracking: by constantly tracking my location, I’ll have an additional dataset to correlate against.
- Diet: by taking meal photos, tagging foods, and measuring stress levels after meals, I’ll get a better idea of how different foods affect me.
- Finances: by tracking where Valkyrie and I spend our money, we’ll hopefully be able to better control our discretionary spending.
- Loss aversion: by experimenting with tracking methods, I’ll see if this is something that can be meaningfully tracked over time.