Building a Podcast Analytics Dashboard with Highcharts

Building a Podcast Analytics Dashboard with Highcharts

My wife and I recently started a podcast called Breaking Español. Being the data geek I am, having good analytics on our audience and viewership was a high priority. After looking around the web most of what I found were third-party services that would collect the data for a fee and then provide really poor visualization options. Much less something interactive or that I could embed on my site.

So, I did what I often do, broke open my data toolbox and went to work. Here is what I ended up building in the end, follow along below for step-by-step on how you can build it yourself. Enjoy!

Breaking Español Analytics.png

Live Version: http://breakingespanol.com/analytics/

1. COLLECTING DATA

Podcasting feels a lot like building web pages in the late 90’s. Uploading them is easy, but figuring out what is going on with them is near impossible. The basic flow of hosting a podcast is
Host > RSS Feed > Podcast app
So first you need somewhere to host your app. For us, we went with Soundcloud as they seem to have the most modern and comprehensive tools for this. In Soundcloud they have some built-in stats but it’s only for people listening on Soundcloud or in their app, and about 70% of all podcasts are consumed via iTunes.

Luckily, Soundcloud offers a way to post messages to a third-party stats service. I might build my own version of this in the future, but for now we decided to use Podtrac.com as our third-party stats collector.

Podtrac collects data from any app and parses out some things like user location and source (eg. iTunes v Soundcloud). Beyond that they provide some basic visualizations, none that are worth writing home about 🙁

2. EXTRACTING THE DATA

With the data being collected on Podtrac it’s time to extract it from their site and convert to something that is easy to visualize. For this I fired up my favorite web scraping tool, ParseHub, and went to work. I am lucky enough to have a former colleague there who walked me through the initial setup.

ParseHub works by browsing to web pages with data you want to extract, selecting it, and formatting in either JSON or CSV. After some trial and error I was able to build some jobs that will extract the viewership data from Podtrac and download into a CSV file.

Now I have a series of ParseHub jobs I can fire up each week to automagically go download all my data including daily downloads, by source, and by location. In the future I’d like to automate this but for now 3 clicks per week isn’t too taxing on my schedule.

The data coming out of Podtrac via ParseHub isn’t ideal and includes lots of stuff we need to remove before we have something ready to visualize.

3. PROCESSING DATA

From here I copied the data into a Google Spreadsheet I have setup which includes formulas to identify which rows to keep in my data set. Using those formulas I then created a pivot table that aggregates and sorts my data into exactly what I want to use for my charts.

This is a key step as for me anyway, trying to cleanse data in Javascript can be cumbersome and generally wrought with errors. So for me a spreadsheet is a lot easier to do these simple data cleansing, aggregation, and sorting steps.

The last step with the data all cleaned and aggregated is to get publish my Google Spreadsheet and find the sheet ID’s to use for pulling the data into Highcharts.

  1. Publish your spreadsheet so it is public (not ideal for sensitive data!)
    1. File > Publish to Web > Entire Document > Web Page
  2. Now get the ID for the Google Sheet
    1. https://docs.google.com/spreadsheets/d/{key}/edit#gid=
  3. Now get the ID of the sheets you setup
    1. https://spreadsheets.google.com/feeds/worksheets/{key}/public/basic
    2. Pretty print the XML however you like (example)
    3. Look for your sheet with the aggregated data and get the last section of the URL in the < id > tag

https://spreadsheets.google.com/feeds/worksheets/{key}/public/basic/{spreadsheet-id}

4. BUILDING CHARTS

Now to the fun part…building the actual charts. At this stage I merely wanted to put up some simple graphs that gave my wife and I basic information about our audience and viewership numbers so I went with the classic line and bar charts.

I went with a simple layout for my dashboard containing one big trend (spline) chart on top, with two bar charts below containing information about the top locations and sources. With my layout in mind, it was time to build the charts.

I built them locally on my laptop and pulled in the data from the Google Spreadsheet using the key and worksheet id from above. I used the examples on Highcharts.com to find exactly what I was looking for and just pasted in my spreadsheet key values then added some customization such as a Title and hiding the Y axis title.

5. PUBLISHING THE DASHBOARD

The last step was to push this to our site. I am using Jekyll and Github Pages to host our site for our podcast. So the way I ended up doing this was to create a separate file for each chart that had just the code needed to build it, then combining them in another file using Liquid tags. You can view the entire repo here and below is the code for our dashboard page including each chart. Enjoy!
analytics.html

---
layout: page
title:Analytics
permalink:/analytics/
published:true
---
<style>
.half {
width:50%;
float: left;
}
</style>
<scriptsrc="//code.jquery.com/jquery-1.11.3.min.js"></script>
<scriptsrc="//code.highcharts.com/highcharts.js"></script>
<scriptsrc="//code.highcharts.com/modules/data.js"></script>
{% include analytics_downloads.html %}
{% include analytics_sources.html %}
{% include analytics_locations.html %}

analytics_downloads.html

<script>
	//setup chart definition
	var options = {
	   chart: {
	       renderTo: 'downloads',
	       type: 'spline',
				 height: 300
	   },
	   data:{
		googleSpreadsheetKey: '{key}',
		googleSpreadsheetWorksheet: '{id}'
	   },
	   title: {
	       text: 'Daily Downloads'
	   },
	   tooltip: {
	       enabled: true
	   },
	   yAxis: {
	       title: {
	           text: '',
	           margin: 10
	       },
	       labels: {
	           enabled: true
	       },
	       gridelineWidth: 0
	   },
	   series: [{
	       showInLegend: false,
	       dataLabels: {
	           enabled: true
	       },
	       data: []
	   }]
	}

	//build chart
	var chart = new Highcharts.Chart(options);
</script>
<div id="downloads"></div>

 analytics_locations.html

<script>
	//setup chart definition
	var options = {
	   chart: {
	       renderTo: 'locations',
	       type: 'bar'
	   },
	   data:{
		googleSpreadsheetKey: '{key}',
		googleSpreadsheetWorksheet: '{id}'
	   },
	   title: {
	       text: 'Top Locations'
	   },
	   tooltip: {
	       enabled: true
	   },
	   yAxis: {
	       title: {
	           text: '',
	           margin: 10
	       },
	       labels: {
	           enabled: true
	       },
	       gridelineWidth: 0
	   },
	   series: [{
	       showInLegend: false,
	       dataLabels: {
	           enabled: true
	       },
	       data: []
	   }]
	}
	var chart = new Highcharts.Chart(options);
</script>
<div id="locations" class="half"></div>

 analytics_sources.html

<script>
	//setup the chart
	var options = {
	   chart: {
	       renderTo: 'sources',
	       type: 'bar'
	   },
	   data:{              
		googleSpreadsheetKey: '{key}',
		googleSpreadsheetWorksheet: '{id}'
	   },
	   title: {
	       text: 'Top Sources'
	   },
	   tooltip: {
	       enabled: true
	   },
	   yAxis: {
	       title: {
	           text: '',
	           margin: 10
	       },
	       labels: {
	           enabled: true
	       },
	       gridelineWidth: 0
	   },
	   series: [{
	       showInLegend: false,
	       dataLabels: {
	           enabled: true
	       },
	       data: []
	   }]
	}
	//build the chart
	var chart = new Highcharts.Chart(options);
</script>
<div id="sources" class="half"></div>