Small multiple with box plot and jitter scatter charts

Small multiples, or grid charts, may be the ideal choice when you want to visualize large datasets. This method separates your data series into multiple individual charts with identical scales and axes.

In this tutorial, we will look at how to use the small multiples chart type to display complex data such as the 2012 Olympic athletes data.

First of all, let’s look at a simple small multiple chart like the demo below, which displays the seats won per party in the 2015 Canadian federal election.

 

This chart is straightforward; each quadrant has one simple pie chart that displays the voting result in one province.
Now, what about a large set of data? Take the scatter demo below, which displays the weight and height of over 10,000 athletes (each point represents one athlete) in various sporting disciplines at the 2012 Olympic Games:

 

 

Remark
The data is hosted at this Git link. I display only the 24 Olympic disciplines for which both the height and weight of the athletes are reported.

Unless you are an AI, there is no way to get any useful insights from this representation. The data sets overlap each other, which makes it extremely challenging to compare heights and weights across the different sport disciplines.
One way to solve this issue is to use the small multiple technique, where each quadrant shows one discipline. The small multiple demo below allows me to visualize not only two variables per sport (height and weight) but three (height, weight, and gender):

 

[Chart: Heights/weights of 2012 Olympic athletes by sport. Height on the xAxis in meters, weight on the yAxis in kg. Legend: Male, Female.]
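
To make the idea concrete, here is a minimal sketch of how the data could be arranged for such a grid: one panel per sport, one list of points per gender, plus a shared axis range so that every panel uses identical scales. The `Athlete` record shape and the function names are assumptions made for illustration; they are not taken from the dataset or from the demo's source.

```typescript
// Hypothetical record shape; the field names are assumptions,
// not the actual columns of the dataset.
interface Athlete {
  sport: string;
  gender: "male" | "female";
  height: number; // meters
  weight: number; // kilograms
}

type Point = [number, number]; // [height, weight]

interface Panel {
  sport: string;
  male: Point[];
  female: Point[];
}

// Build one panel per sport, with one point list per gender.
function buildPanels(athletes: Athlete[]): Panel[] {
  const bySport: Record<string, Panel> = {};
  for (const a of athletes) {
    if (!bySport[a.sport]) {
      bySport[a.sport] = { sport: a.sport, male: [], female: [] };
    }
    bySport[a.sport][a.gender].push([a.height, a.weight]);
  }
  // Sort alphabetically so the order of the grid is stable.
  return Object.keys(bySport)
    .sort((p, q) => p.localeCompare(q))
    .map((name) => bySport[name]);
}

// Smallest and largest value over all panels for one coordinate
// (0 = height, 1 = weight), so every panel can share the same axis range.
function sharedExtent(panels: Panel[], index: 0 | 1): [number, number] {
  let min = Infinity;
  let max = -Infinity;
  for (const panel of panels) {
    for (const point of panel.male.concat(panel.female)) {
      min = Math.min(min, point[index]);
      max = Math.max(max, point[index]);
    }
  }
  return [min, max];
}
```

Each panel can then be drawn as its own small scatter chart with two series, and the shared extents can be applied as the axis minimum and maximum of every chart.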

 

Let’s pause for a moment to check how this chart is set up.
Even though the small multiple has many advantages, it also comes with challenges. One of the main ones is the use of space; it is a real headache to fit all this data into a compact layout. To do that, I had to eliminate redundancy and remove any unnecessary elements:

  1. Set up one main title (at the top) instead of the same title for each chart. The only title I kept for each chart is the name of the sport.
  2. Set up one legend (at the top) for all the charts.
  3. Remove the export server icon, the credits, and the yAxis and xAxis titles.
  4. Reduce the font size of the title and labels in each chart.
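
The steps above could be expressed per chart roughly as follows. This is a sketch, not the demo's actual configuration: the option names follow the Highcharts options structure (`title`, `legend`, `credits`, `exporting`, `xAxis`, `yAxis`), while the function name `panelOptions`, the font sizes, and the axis ranges are assumptions chosen for illustration.

```typescript
// Sketch of the options for one small chart, following steps 1-4 above.
// panelOptions, the font sizes, and the ranges are illustrative assumptions.
function panelOptions(
  sport: string,
  xRange: [number, number],
  yRange: [number, number]
) {
  const smallLabels = { style: { fontSize: "9px" } };
  return {
    chart: { type: "scatter" },
    // Step 1: the only title kept in each chart is the name of the sport.
    // Step 4: a reduced font size for that title.
    title: { text: sport, style: { fontSize: "11px" } },
    // Step 2: no legend per chart; a single legend sits at the top of the grid.
    legend: { enabled: false },
    // Step 3: no credits, no exporting icon, no axis titles.
    credits: { enabled: false },
    exporting: { enabled: false },
    xAxis: {
      min: xRange[0],
      max: xRange[1],
      title: { text: null },
      labels: smallLabels, // Step 4: smaller axis labels.
    },
    yAxis: {
      min: yRange[0],
      max: yRange[1],
      title: { text: null },
      labels: smallLabels,
    },
  };
}
```

An object like this could be created for each sport and passed, together with that sport's series, to its own chart container. Using the same ranges for every call keeps the scales identical across the grid.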

By now, I have resolved the issue of a cluttered single chart. But I wanted to push the possibilities of the small multiple a little further by displaying statistical data for the three variables: height, weight, and gender. So I came up with another small multiple using two chart types in each quadrant: box plot and jitter (see below).

 

[Chart: Heights of 2012 Olympic athletes by sport (in meters). Legend: Male, Female.]

 

Like the first small multiple chart, I had to overcome the challenge of optimizing the use of the space, so I went through the same steps as the previous small multiple.

The combination of a box plot with a jitter chart is very effective at showing, at a glance, the spread, the number of points, and some statistical information. Thanks to the box plot, readers can see each sport’s heights summarized in five values: the maximum and minimum heights of the group, the first quartile, the median, and the third quartile. And thanks to the jitter chart, readers can see how the athletes (participants) in each sport are spread out and how many there are. For example, readers can quickly notice that aquatics and athletics have many participants compared to the other disciplines, whereas golf and modern pentathlon have the fewest.
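
For readers who want to see what goes into each quadrant, here is a minimal sketch of the two computations: the five values of the box plot and the horizontal jitter of the points. The function names are hypothetical, and the quartiles use linear interpolation, which is only one of several common conventions; the demo itself may compute them differently.

```typescript
// Quantile by linear interpolation on an array sorted in ascending order.
// This is one common convention among several, assumed here for illustration.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// The five values drawn by the box plot: [min, Q1, median, Q3, max].
function fiveNumberSummary(
  values: number[]
): [number, number, number, number, number] {
  if (values.length === 0) {
    throw new Error("need at least one value");
  }
  const s = values.slice().sort((a, b) => a - b);
  return [
    s[0],
    quantile(s, 0.25),
    quantile(s, 0.5),
    quantile(s, 0.75),
    s[s.length - 1],
  ];
}

// Spread points horizontally around a center position so that athletes
// with the same height do not hide each other. The y value is unchanged.
function jitter(
  values: number[],
  center: number,
  width: number,
  random: () => number = Math.random
): [number, number][] {
  return values.map(
    (v): [number, number] => [center + (random() - 0.5) * width, v]
  );
}
```

Calling `fiveNumberSummary` once per gender gives the box plot values, and `jitter` turns the same heights into scatter points placed around the box. The `random` parameter is there only so the spread can be made reproducible.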

The small multiple is among the most effective techniques for visualizing a chart with many data sets, and it becomes even more effective when you combine it with the right chart types.