Quantified Savagery

Where Personal Data Runs Wild

Financial Tracking: Mint Bubbles

In this post I present Mint Bubbles, a force-directed bubble chart visualization of exported Mint data. I explain how to use force-directed layouts to produce awesome interactive visualizations with d3, and also provide details on some of the specific tricks used to create Mint Bubbles.

Getting Your Data

Exporting your data from Mint is easy. Log in to Mint and go to the Transactions tab:

Scroll to the bottom pagination section. In barely legible, super-tiny type at the bottom right, there’s a link to export all your transactions:

Clicking that link will download a file called transactions.csv:

Mint Bubbles

If you’re viewing this in an RSS reader, check out the example on my blog. You will need a browser that supports the HTML5 File API.

You can see the code for this demo here.

To see a visualization of your data, drag the transactions.csv file from Mint onto the “drop your data here” area below. You can also use my data from the last three months or so.

drop your data here

Behind The Bubbles

Inspiration

This visualization was inspired by the NYT 2013 Budget Proposal Graphic, which uses d3.js to bring Obama’s 2013 budget proposal to life as an interactive bubble chart.

I’d just started using Mint for financial tracking, and this seemed like an awesome way to visualize my personal spending patterns. To help figure out the mechanics of the NYT visualization, I consulted this article by Jim Vallandingham. He explains in detail how to create similar visualizations using d3’s force-directed layouts, which model your data as a set of particles moving about in space.

Importing Data

Unlike my previous visualizations, this one lets you play with your own data. Enter the HTML5 File API, which lets JavaScript read files that the user selects or drops onto the page. First, I set up the drag-and-drop listeners on div#drop_zone:

/*
 * Octopress bundles ender.js, which provides $() for DOM access; mootools
 * tries to play nice, so it won't install its $() over that. I'm using
 * document.id() instead.
 */
var dropZone = document.id('drop_zone');
function trapEvent(evt) {
  evt.stopPropagation();
  evt.preventDefault();
}
dropZone.addEventListener('dragenter', trapEvent, false);
dropZone.addEventListener('dragleave', trapEvent, false);
dropZone.addEventListener('dragover', function(evt) {
  trapEvent(evt);
  // This makes a copy icon appear during the drag operation.
  evt.dataTransfer.dropEffect = 'copy';
}, false);
dropZone.addEventListener('drop', handleFileSelect, false);

dragenter, dragleave, and dragover are roughly analogous to mouseenter, mouseleave, and mousemove: dragover keeps firing while a drag hovers over the target. For those events, it suffices to call trapEvent(), which stops the event from propagating and cancels the browser’s default action; cancelling dragenter and dragover is also what marks div#drop_zone as a valid drop target. Without this, Chrome on Mac OS, for instance, will just download the transactions.csv file if you drag it into a browser tab, which is not what I want here.
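To see what trapEvent() changes, here is a small sketch that runs outside the browser. It assumes a runtime with the standard EventTarget and Event globals (any modern browser, or Node.js 15 and later); a bare EventTarget stands in for div#drop_zone, and the events are synthetic, so no actual file is dragged.

```javascript
// Same helper as above: stop propagation and cancel the default action.
function trapEvent(evt) {
  evt.stopPropagation();
  evt.preventDefault();
}

// A plain EventTarget stands in for div#drop_zone (an assumption for this demo).
const zone = new EventTarget();
zone.addEventListener('dragover', trapEvent);

// Synthetic drag events; cancelable must be true for preventDefault() to count.
const trapped = new Event('dragover', { cancelable: true });
const trappedResult = zone.dispatchEvent(trapped);

// No listener is registered for this one, so nothing cancels it.
const untrapped = new Event('dragleave', { cancelable: true });
const untrappedResult = zone.dispatchEvent(untrapped);

// dispatchEvent() returns false when a listener canceled the event.
console.log(trappedResult, trapped.defaultPrevented);     // false true
console.log(untrappedResult, untrapped.defaultPrevented); // true false
```

For real drag events, that cancellation is what tells the browser to skip its own handling of the drop.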

drop is the interesting event:

function handleFileSelect(evt) {
  trapEvent(evt);
  var f = evt.dataTransfer.files[0];
  // NOTE: you might want to filter out large or invalid files here.
  var reader = new FileReader();
  reader.onloadstart = function(e) {
    if (e.lengthComputable) {
      document.id('progress').removeClass('hidden');
      document.id('progress_bar').set('value', 0);
      document.id('progress_bar').set('max', e.total);
    }
  };
  reader.onprogress = function(e) {
    if (e.lengthComputable) {
      document.id('progress_bar').value = e.loaded;
    }
  };
  reader.onload = function(e) {
    document.id('caption').removeClass('hidden').addClass('chart-active');
    document.id('progress').addClass('hidden');
    document.id('drop_zone').addClass('hidden');
    document.id('chart').addClass('chart-active');
    buildChart(d3.csv.parse(e.target.result));
  };
  reader.readAsText(f);
}

This uses FileReader.readAsText() to read in the transactions.csv file, and d3.csv.parse() to turn its contents into an array of JavaScript objects, one per transaction, keyed by the column names in the CSV header row. Parsing happens in the onload handler, which fires once the file has been read successfully.
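d3.csv.parse() takes the first row as the header and returns one object per remaining row, with every value still a string. The sketch below is a simplified, hypothetical stand-in (not d3’s implementation) that produces the same shape; unlike d3, it does not handle line breaks inside quoted fields. The column names mirror the ones this post uses (Category, Amount), and the sample row is made up.

```javascript
// Split one CSV line into fields, honoring double quotes and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Use the header row as keys; every value stays a string.
function parseCsv(text) {
  const lines = text.split(/\r?\n/).filter(function (l) { return l.length > 0; });
  const header = parseCsvLine(lines[0]);
  return lines.slice(1).map(function (line) {
    const values = parseCsvLine(line);
    const row = {};
    header.forEach(function (name, i) { row[name] = values[i]; });
    return row;
  });
}

const rows = parseCsv(
  '"Date","Description","Amount","Category"\n' +
  '"1/02/2013","Coffee, Inc.","4.50","Coffee Shops"\n'
);
console.log(rows[0].Description);                    // Coffee, Inc.
console.log(typeof rows[0].Amount, +rows[0].Amount); // string 4.5
```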

onloadstart and onprogress are used to monitor file I/O progress via the HTML5 progress element document.id('progress_bar'). Since transactions.csv files are typically small, and since the “uploading” is actually a client-local copy into browser memory, you’ll probably never see that progress bar.
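The NOTE in handleFileSelect() leaves file filtering open. One possible check, sketched below with a hypothetical helper name and a made-up 5 MB limit, rejects missing, oversized, or non-.csv files before handing them to FileReader. It tests the file name rather than file.type, because the MIME type reported for CSV files varies across platforms. The objects in the example only mimic the name and size properties of a real File.

```javascript
// Hypothetical size cap; tune it to your own exports.
const MAX_BYTES = 5 * 1024 * 1024;

// Hypothetical helper: a cheap sanity check before FileReader.readAsText().
function isPlausibleMintExport(file, maxBytes = MAX_BYTES) {
  if (!file) return false;                // nothing was dropped
  if (file.size > maxBytes) return false; // too large to parse comfortably
  return /\.csv$/i.test(file.name);       // e.g. transactions.csv
}

// In handleFileSelect(), this could guard the read:
//   if (!isPlausibleMintExport(f)) { return; }
console.log(isPlausibleMintExport({ name: 'transactions.csv', size: 120000 })); // true
console.log(isPlausibleMintExport({ name: 'photo.png', size: 120000 }));        // false
console.log(isPlausibleMintExport(undefined));                                  // false
```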

Grouping Transactions

I group the transactions by category:

var cs = {};
data.each(function(tx) {
  var c = tx['Category'];
  if (!(c in cs)) {
    cs[c] = {
      amount: 0,
      txs: []
    };
  }
  cs[c].amount += +(tx['Amount']);
  cs[c].txs.push(tx);
});

amount accumulates the category’s total; note the use of +(tx['Amount']) to convert the CSV string value into a number, since d3.csv.parse() leaves every field as a string. txs collects the category’s individual transactions.

I then convert these into nodes to be used by d3.layout.force():

var nodes = [];
for (var c in cs) {
  nodes.push({
    R: Math.max(2, Math.sqrt(cs[c].amount)),
    category: c,
    amount: cs[c].amount,
    txs: cs[c].txs
  });
}
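Taking R = Math.sqrt(amount) makes each bubble’s area (πR²) proportional to the category’s total, so a category with four times the spending gets twice the radius; the Math.max(2, ...) keeps tiny categories visible. Below is a self-contained sketch of the grouping and node-building steps, assuming rows shaped like d3.csv.parse() output. It uses the standard Array.prototype.forEach instead of MooTools’ each, and a Map instead of a plain object so that a category name such as “constructor” can’t collide with inherited properties; buildNodes is a hypothetical name. One caveat it makes visible: if a category’s total were ever negative, Math.sqrt would return NaN, and so would Math.max(2, NaN).

```javascript
// Hypothetical helper combining the grouping and node-building steps.
function buildNodes(rows) {
  // category -> { amount, txs }; a Map has no inherited keys to trip over.
  const cs = new Map();
  rows.forEach(function (tx) {
    const c = tx['Category'];
    if (!cs.has(c)) cs.set(c, { amount: 0, txs: [] });
    const entry = cs.get(c);
    entry.amount += +tx['Amount']; // CSV values arrive as strings
    entry.txs.push(tx);
  });

  const nodes = [];
  cs.forEach(function (entry, c) {
    nodes.push({
      R: Math.max(2, Math.sqrt(entry.amount)), // area proportional to amount
      category: c,
      amount: entry.amount,
      txs: entry.txs
    });
  });
  return nodes;
}

// Made-up sample rows.
const nodes = buildNodes([
  { Category: 'Groceries', Amount: '60.00' },
  { Category: 'Groceries', Amount: '40.00' },
  { Category: 'Coffee Shops', Amount: '1.00' }
]);
// Groceries: amount 100, R 10. Coffee Shops: amount 1, R clamped up to 2.
```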

Defining The Layout

Before building the visualization itself, I define a color gradient based on bubble radius, picking the colors using the excellent Color Scheme Designer:

var Rs = nodes.map(function(d) { return d.R; });
var minR = d3.min(Rs),
    maxR = d3.max(Rs);
var fill = d3.scale.linear()
  .domain([minR, maxR])
  .range(['#7EFF77', '#067500']);
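Given a color range, d3’s linear scale interpolates between the two colors. The sketch below approximates that for this gradient by interpolating each RGB channel linearly and rounding; the helper names and the example radii (2 and 30) are hypothetical, and d3’s own interpolator may differ in details.

```javascript
// Hypothetical helpers for "#rrggbb" strings.
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

function rgbToHex(rgb) {
  return '#' + rgb.map(function (v) {
    return v.toString(16).padStart(2, '0');
  }).join('');
}

// Map a radius in [minR, maxR] to a color between fromHex and toHex by
// interpolating each channel linearly (no clamping outside the domain).
function makeFill(minR, maxR, fromHex, toHex) {
  const a = hexToRgb(fromHex);
  const b = hexToRgb(toHex);
  return function (r) {
    const t = maxR === minR ? 0 : (r - minR) / (maxR - minR);
    return rgbToHex(a.map(function (v, i) {
      return Math.round(v + (b[i] - v) * t);
    }));
  };
}

// Made-up radius range for illustration.
const fill = makeFill(2, 30, '#7EFF77', '#067500');
console.log(fill(2));  // #7eff77 (lightest: smallest bubbles)
console.log(fill(16)); // #42ba3c (halfway)
console.log(fill(30)); // #067500 (darkest: largest bubbles)
```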

Now on to the visualization. First, I need to create the SVG element:

var w = 960, h = 480;
var vis = d3.select('#chart').append('svg:svg')
  .attr('width', w)
  .attr('height', h);

Next, I define the behavior and styling of the bubbles:

var node = vis.selectAll('circle.node')
  .data(nodes)
  .enter().append('svg:circle')
  .attr('class', 'node')
  .attr('cx', function(d) { return d.x; })
  .attr('cy', function(d) { return d.y; })
  .attr('r', function(d) { return d.R; })
  .style('fill', function(d) { return fill(d.R); })
  .style('stroke', function(d) { return d3.rgb(fill(d.R)).darker(1); })
  .style('stroke-width', 1.5);

fill(d.R) uses the color gradient fill to make smaller bubbles lighter and larger bubbles darker.

As for the force-directed layout, I start with some basic properties:

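var force = d3.layout.force()
  .nodes(nodes)
  .links([])          // no edges between bubbles!
  .size([w, h])
  .gravity(0.05)      // pull toward the center; weaker than the 0.1 default
  .friction(0.95);    // fraction of velocity kept per tick (1 = frictionless)
// .charge() is left at its default of -30, so bubbles also repel each other.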

Tick Handler

The d3 documentation describes force.tick() simply as the method that “runs the force layout simulation one step.”

Force-directed layouts model your data as a set of particles in space. Those particles are subject to various forces:

  • Gravity: in d3, this is actually an attractive force pulling particles towards the center of the visualization.
  • Friction: this damps movement; in d3 it is the fraction of velocity a particle keeps on each step.
  • Tension: if nodes are connected via links (edges), the links act like springs pulling their endpoints toward a target distance.
  • Charge: every node carries a charge; in d3, a negative charge pushes other nodes away and a positive charge pulls them in.

A layout can use some or all of these forces. Conceptually, resolving them is a simple iterative process:

while (true) {
  for (P in particles) {
    F = [0, 0];
    for (f in forcesActingOn(P)) {
      F[0] += f[0]; F[1] += f[1];
    }
    applyForceTo(P, F);
  }
}
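Here is the loop above turned into a small runnable sketch. It is not d3’s integrator (d3 v3 uses a Verlet-style update and scales gravity by alpha); it only shows the idea of summing the forces on each particle and then moving it. Forces are summed for all particles before any particle moves, so the result does not depend on iteration order. Friction is treated the way d3 describes it: the fraction of velocity kept on each step, where 1 is frictionless and 0 freezes everything. The particle shape, simulate(), and gravityToward() are hypothetical.

```javascript
// Hypothetical particle shape: { x, y, vx, vy }.
// forcesActingOn(p) returns an array of [fx, fy] force vectors for p.
function simulate(particles, forcesActingOn, friction, steps) {
  for (let s = 0; s < steps; s++) {
    // First pass: sum the forces on every particle.
    const totals = particles.map(function (p) {
      const F = [0, 0];
      for (const f of forcesActingOn(p)) {
        F[0] += f[0];
        F[1] += f[1];
      }
      return F;
    });
    // Second pass: apply them; friction is the fraction of velocity kept.
    particles.forEach(function (p, i) {
      p.vx = (p.vx + totals[i][0]) * friction;
      p.vy = (p.vy + totals[i][1]) * friction;
      p.x += p.vx;
      p.y += p.vy;
    });
  }
}

// A single force: a pull toward (cx, cy) proportional to the distance.
function gravityToward(cx, cy, strength) {
  return function (p) {
    return [[(cx - p.x) * strength, (cy - p.y) * strength]];
  };
}

// One particle starting in the top-left corner of a 960x480 chart.
const p = { x: 0, y: 0, vx: 0, vy: 0 };
simulate([p], gravityToward(480, 240, 0.05), 0.95, 500);
console.log(p.x.toFixed(1), p.y.toFixed(1)); // settles near the center
```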

In addition to the above forces, visualizations using d3.layout.force() can define their own by listening for the layout’s tick event with force.on('tick', ...). I use this to apply two effects:

  • Size Sorting: similar to granular convection, larger bubbles will rise while smaller bubbles sink.
  • Collision Detection: I prevent bubbles from intersecting, since that makes it easier to select them.
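var floatPoint = d3.scale.linear()
  .domain([minR, maxR])
  .range([h * 0.65, h * 0.35]);

force.on('tick', function(e) {
  // vertical size sorting
  nodes.each(function(d) {
    var dy = floatPoint(d.R) - d.y;
    d.y += 0.25 * dy * e.alpha;
  });

  // collision detection
  var q = d3.geom.quadtree(nodes);
  nodes.each(function(d1) {
    // Any bubble that can touch d1 has its center within d1.R + maxR of d1's
    // center, so quads outside this padded box can be skipped.
    var r = d1.R + maxR,
        nx1 = d1.x - r, nx2 = d1.x + r,
        ny1 = d1.y - r, ny2 = d1.y + r;
    q.visit(function(quad, x1, y1, x2, y2) {
      var d2 = quad.point;
      if (d2 && (d2 !== d1)) {
        var x = d1.x - d2.x,
            y = d1.y - d2.y,
            L = Math.sqrt(x * x + y * y),
            R = d1.R + d2.R;
        if (L < R) {
          // Push d1 and d2 apart, each by half the overlap.
          L = (L - R) / L * 0.5;
          var Lx = L * x,
              Ly = L * y;
          d1.x -= Lx; d1.y -= Ly;
          d2.x += Lx; d2.y += Ly;
        }
      }
      // Returning true skips this quad's children, which short-circuits
      // visit() for quadtree nodes that can't collide with d1, resulting in
      // roughly O(n log n) collision detection. The expression must start on
      // the same line as `return`; otherwise the callback returns undefined.
      return x1 > nx2 || x2 < nx1 || y1 > ny2 || y2 < ny1;
    });
  });
  node
    .attr('cx', function(d) { return d.x; })
    .attr('cy', function(d) { return d.y; });
});

// Start the simulation now that the tick handler is in place.
force.start();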
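A JavaScript pitfall to watch for in quadtree visit() callbacks like this one: d3 skips a quad’s children only when the callback returns a truthy value, and a return keyword followed by a line break returns undefined, because automatic semicolon insertion ends the statement right after return. The sketch below (hypothetical function names) shows the difference.

```javascript
// Line break after `return`: ASI turns this into `return;`, and the
// comparison below it is unreachable.
function brokenPrune(a, b) {
  return
    a > 0 ||
    b > 0;
}

// The expression starts on the same line as `return`, so it is returned.
function fixedPrune(a, b) {
  return a > 0 ||
    b > 0;
}

console.log(brokenPrune(1, 1)); // undefined
console.log(fixedPrune(1, 1));  // true
```

With the broken form, visit() never prunes anything, so every quad is visited for every bubble.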

Alpha and Size Sorting

What’s e.alpha? This is described cryptically in the d3.js documentation:

Internally, the layout uses a cooling parameter alpha which controls the layout temperature: as the physical simulation converges on a stable layout, the temperature drops, causing nodes to move more slowly.

A look at the code for d3.layout.force() provides some insight into what’s happening here:

force.tick = function() {
  // simulated annealing, basically
  if ((alpha *= .99) < .005) {
    event.end({type: "end", alpha: alpha = 0});
    return true;
  }
  // ...
}
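To put numbers on this: assuming the layout starts at alpha = 0.1 (the value d3 v3’s force.start() uses), the rule above shrinks alpha by 1% per tick and ends the simulation once it falls below 0.005, i.e. after roughly 300 ticks. The hypothetical countTicks() below replays just that rule.

```javascript
// Replays only the cooling rule from force.tick(): alpha shrinks by `decay`
// each tick, and the run ends as soon as it drops below `threshold`.
function countTicks(startAlpha, decay, threshold) {
  let alpha = startAlpha;
  let ticks = 0;
  while ((alpha *= decay) >= threshold) {
    ticks++; // a tick event would fire here with this alpha
  }
  return ticks;
}

// Assumes the layout starts at alpha = 0.1.
console.log(countTicks(0.1, 0.99, 0.005)); // 298
```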

Let’s look at the size sorting code again:

nodes.each(function(d) {
  var dy = floatPoint(d.R) - d.y;
  d.y += 0.25 * dy * e.alpha;
});

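floatPoint(d.R) computes a “desired height” for the node d. Because SVG y-coordinates grow downward, the range [h * 0.65, h * 0.35] sends the smallest bubbles toward 65% of the way down the chart and the largest toward 35%, so bigger bubbles float up. On each tick, the d.y adjustment closes a fraction 0.25 * e.alpha of the remaining gap, so the sorting slows down as the layout “cools” into its final state.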
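A quick numeric check, assuming the 480-pixel chart height used above, made-up radii of 2 and 30 for minR and maxR, and the cooling schedule from the excerpt above with alpha assumed to start at 0.1. linear() is a hypothetical stand-in for d3.scale.linear(). Applying only the size-sorting nudge to the largest bubble, starting from the top edge, roughly a tenth of the original gap to its target is still left when the layout stops, so the nudge alone closes most, but not all, of the distance.

```javascript
// Hypothetical stand-in for d3.scale.linear().domain([d0, d1]).range([r0, r1]).
function linear(d0, d1, r0, r1) {
  return function (x) {
    return r0 + (x - d0) / (d1 - d0) * (r1 - r0);
  };
}

const h = 480;
const minR = 2, maxR = 30; // made-up example radii
const floatPoint = linear(minR, maxR, h * 0.65, h * 0.35);

console.log(floatPoint(minR), floatPoint(maxR)); // 312 168

// Apply only the size-sorting nudge to the largest bubble, starting at the
// top edge (y = 0), over the cooling schedule (alpha assumed to start at 0.1).
function remainingGapFraction() {
  const target = floatPoint(maxR);
  let y = 0;
  let alpha = 0.1;
  while ((alpha *= 0.99) >= 0.005) {
    y += 0.25 * (target - y) * alpha; // same update as the tick handler
  }
  return (target - y) / target;
}

console.log(remainingGapFraction()); // roughly 0.09
```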

Collision Detection

The collision detection code is cribbed from this page, which is part of a talk given by Mike Bostock on d3.
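For a small number of bubbles, the quadtree is an optimization rather than a necessity. The sketch below resolves overlaps with the same arithmetic as the tick handler but checks every pair directly, in O(n²) time; resolveCollisions() and the {x, y, R} node shape are hypothetical. The dist > 0 guard is an addition that skips exactly coincident centers, where the original formula would divide by zero. For two overlapping circles, one pass moves each by half the overlap, leaving them exactly touching.

```javascript
// Brute-force overlap resolution: every pair is checked, no quadtree pruning.
function resolveCollisions(nodes) {
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const d1 = nodes[i], d2 = nodes[j];
      const x = d1.x - d2.x, y = d1.y - d2.y;
      const dist = Math.sqrt(x * x + y * y);
      const R = d1.R + d2.R;
      if (dist > 0 && dist < R) {
        // Each circle moves half the overlap along the line between centers.
        const k = (dist - R) / dist * 0.5;
        d1.x -= k * x; d1.y -= k * y;
        d2.x += k * x; d2.y += k * y;
      }
    }
  }
}

const a = { x: 0, y: 0, R: 10 };
const b = { x: 12, y: 0, R: 10 };
resolveCollisions([a, b]);
// a moves to about x = -4 and b to about x = 16: centers now 20 apart.
```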

Up Next

I’m currently working on a post for the main Quantified Self blog, in which I’m planning to feature another cool visualization for personal data. Aside from that, I’m hoping to use an upcoming post to dissect my Mint data in more detail. Stay tuned!