Self-Tracking for Panic: A Bash-ful Look at Some Data

In this post, I perform initial exploratory analysis on my panic recovery journal data using basic UNIX/bash commands.

UNIX? bash? You’re not serious, right?

Most of the data-centric Quantified Self talks I’ve seen focus on more complicated methods, including:

• linear regression, which identifies gradual trends;
• FFT, which identifies periodic effects;
• Pearson’s r, which measures correlation between datasets;
• the t-test, which tests whether two datasets differ.

These are extremely powerful tools to have at your disposal. Better yet, many languages have community-contributed libraries that provide these tools out-of-the-box. For instance, Python’s scipy offers linregress for performing linear regression.

That said, these tools rely on mathematics that is opaque to many software developers. Even if you don’t need to know how they work to use them, you need some knowledge of what they do and where they are most appropriate. Statistical tests in particular often have strong preconditions for use:

Each of the two populations being compared should follow a normal distribution.

Even if you pick the right tool, there’s still fear associated with losing control. These tools are not hammers and screwdrivers but magic wands, and we are terrible magicians.

A Word On Exploratory Analysis

I mentioned that this post would demonstrate exploratory analysis. This is a mode of analysis where you explore your data, play around with it a bit, grab some low-hanging analytical fruit. You don’t necessarily need higher mathematics. Regular counts and averages will do. You’re not looking for ironclad proof, but rather for suggestions.

What does this data suggest?

This is an important question. Put this way, there is no “right” or “wrong” way to analyze your data. UNIX tools fit in nicely here, because you can piece them together and pretty quickly get some useful insights. Better yet, since you understand what you just did, you can explain it to someone else. Analysis becomes a demystified and shareable process.

Exploratory analysis is also a great entry point to deeper and more directed analysis. As you work with the data, you ask more complicated questions. Eventually these questions exceed the sophistication of your tools, so you look for better tools. You might not deeply understand the better tools, but at least you’ve worked with the data a bit. You can perform basic sanity checks when these better tools turn up results you don’t expect.

The Data

I took my paper recovery journal logs:

and manually converted them to handy CSV files:

Where did all those different treatments go? I didn’t end up using most of them. Making nine parallel habit changes is difficult, so I rapidly converged on a subset of four:

• relaxation breathing;
• daily exercise;
• dietary modifications; and
• vitamin supplements.

Why manual input? There wasn’t enough data to make OCR worthwhile:

Common Operations

These operations appear several times in the UNIX one-liners below, so let’s go over them quickly.

To lop off the CSV column name header:
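A sketch of the usual idiom, `tail -n +2`, demonstrated on a hypothetical two-row log:

```shell
# print from line 2 onward, dropping the header row
printf 'date,drinks\n2012-03-01,2\n2012-03-02,4\n' | tail -n +2
```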

To extract field $n from a CSV file:
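A sketch using `cut`, assuming comma-delimited fields and a hypothetical field number stored in $n:

```shell
n=2                                        # hypothetical field number
printf 'date,drinks\n2012-03-01,2\n' | cut -d, -f"$n"   # → drinks, then 2
```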

To tabulate counts in descending order:
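The classic `sort | uniq -c | sort -rn` pipeline does this; here it runs on a hypothetical list of activities:

```shell
# count duplicate lines, then order the counts from largest to smallest
printf 'soccer\ngym\nsoccer\nwalk\nsoccer\n' | sort | uniq -c | sort -rn
```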

To sum a series of numbers:
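One way is a tiny `awk` accumulator:

```shell
# add up one number per line
printf '2\n4\n1\n' | awk '{ s += $1 } END { print s }'   # → 7
```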

To get the day before $ds:
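With GNU `date` (the BSD/macOS flags differ), date arithmetic is a one-liner; $ds holds a hypothetical ISO date:

```shell
ds=2012-03-01                        # hypothetical date variable
date -d "$ds - 1 day" +%Y-%m-%d      # GNU date; prints 2012-02-29 (leap year)
```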

And Now, The Main Show

Let’s start by looking at my weekly practice record:
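Assuming a hypothetical log with columns date,breathing,exercise,diet,vitamins, where 1 marks a day the treatment was followed, counting the all-four days is a short awk filter:

```shell
# hypothetical rows: date,breathing,exercise,diet,vitamins
printf '%s\n' \
  '2012-03-01,1,1,1,1' \
  '2012-03-02,1,0,0,1' \
  '2012-03-03,1,1,1,1' \
  | awk -F, '$2 && $3 && $4 && $5 { n++ } END { print n }'   # → 2
```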

I tracked myself for 45 days. During that time, I followed all four treatments on 14 days. In order from most to least regular:

• vitamin supplements (43 days);
• relaxation breathing (36 days);
• daily exercise (32 days);
• dietary modifications (22 days).

I followed the exercise and diet treatments together on only 16 of 45 days! Right away, I have a question for further inquiry:

What was so hard about those two treatments?

Exercise

My most common exercise times were 4pm and 8pm. What was I doing at those times?
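The tabulation behind this chains the common operations together; a sketch, assuming a hypothetical exercise log whose second field is the start time:

```shell
# tabulate start times: drop header, extract field 2, count, rank
printf 'date,time,activity\n2012-03-01,16:00,gym\n2012-03-08,20:00,soccer\n2012-03-05,16:00,gym\n' \
  | tail -n +2 | cut -d, -f2 | sort | uniq -c | sort -rn
```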

Aha! 4pm was my scheduled gym time at work, and 8pm was when I went for weekly pickup soccer. Both were regularly scheduled activities.

I rarely exercised in the morning, which might be okay: some research suggests physical performance peaks in the afternoon.

It’s not surprising to see gym conditioning sets and soccer as my top activities, but walking and cycling aren’t far behind.

I most commonly exercised for 30-60 minutes, with infrequent longer blocks of activity. What was I doing in those longer blocks?
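Pulling out the long blocks is a one-line awk filter; a sketch, with hypothetical date,minutes rows:

```shell
# keep only sessions longer than 60 minutes
printf '%s\n' '2012-03-01,45' '2012-03-03,120' '2012-03-10,150' \
  | awk -F, '$2 > 60 { print $0 }'
```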

When else was I dancing?

Looking at my calendar, these blocks are easily identified:

Having fun is great for my health!

Diet

I nearly eliminated caffeine during this period! By the time I started keeping the log, I’d already started to reduce my consumption. On average, I had just over one sweet per day. More troubling is alcohol, with an average of 3.1 drinks/day. Let’s take a closer look at my drinking patterns.
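An average like that 3.1 drinks/day figure comes from a sum-and-count pipeline; this sketch assumes a hypothetical diet.csv layout whose fourth field is the day's drink count:

```shell
# mean of field 4 (drinks) across all logged days
printf 'date,caffeine,sweets,alcohol\n2012-03-01,0,1,4\n2012-03-02,0,2,2\n' \
  | tail -n +2 | cut -d, -f4 \
  | awk '{ s += $1; n++ } END { printf "%.1f\n", s / n }'   # → 3.0
```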

My most common daily drinking amounts were 1, 2, and 4 drinks per day. It was very rare for me to go a day without any alcohol. More alarmingly, binge drinking accounts for over 40% of my alcohol consumption!

I drank most on Wednesdays and Saturdays; Mondays were also major drinking days, which is surprising! By contrast, I drank much less than average on Thursdays. When I home in on binge drinking, the pattern shifts slightly:
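Grouping by weekday relies on `date`'s `%a` format (GNU `date`, C locale; the dates below are hypothetical):

```shell
# map each date to its weekday, then tabulate
for ds in 2012-03-03 2012-03-07 2012-03-10; do
  LC_ALL=C date -d "$ds" +%a
done | sort | uniq -c | sort -rn
```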

Wednesday is still an offender, but the weekends are clear culprits. 80% of my binge drinking days fell on weekends.

Among binge days (5 or more drinks), I had an average of 2.7 drinks the next day.

Among days where I had fewer than 2 drinks, I had consumed an average of 3.6 drinks the previous day. This suggests a see-saw pattern: I would drink too much one day, back off the next, and repeat.
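That previous-day comparison combines the day-before trick with a lookup. A sketch under an assumed diet.csv schema of date,caffeine,sweets,alcohol, using GNU date (a linear scan per day, which is fine at 45 rows):

```shell
# hypothetical sample log
cat > diet.csv <<'EOF'
date,caffeine,sweets,alcohol
2012-03-01,0,1,5
2012-03-02,0,0,1
2012-03-03,0,1,4
2012-03-04,0,0,0
EOF

# average previous-day drinks across the light (< 2 drink) days
tail -n +2 diet.csv | awk -F, '$4 < 2 { print $1 }' | while read -r ds; do
  prev=$(date -d "$ds - 1 day" +%Y-%m-%d)
  grep "^$prev," diet.csv | cut -d, -f4
done | awk '{ s += $1; n++ } END { printf "%.1f\n", s / n }'   # → 4.5
```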

Panic

All of this skirts the real question:

What caused me to have panic attacks?

I had no data for 2012-02-28. Other than that, on days where I reported panic attacks, my current- and previous-day consumption patterns were:

• alcohol: 3.7 drinks that day, 3.8 the previous day (overall average is 3.1);
• sweets: 1.5 sweets that day, 1.0 the previous day (overall average is 1.0);
• caffeine: 0.3 caffeinated beverages that day, 0.0 the previous day (overall average is 0.1).

This suggests that reducing alcohol and sweets consumption does help. The data is less clear on caffeine; as previously mentioned, I had mostly cut out caffeine by the time I started tracking.

Up Next

In the next post, I’ll run some of the statistical tests and transformations mentioned previously on this same data. I’ll also compare this dataset with another dataset gathered through qs-counters, a simple lightweight tracking utility I built to reduce friction in the recording process.