Marker clusters with Highcharts

by

Mustapha Mekhatria

Marker clusters are an effective method for simplifying the visualization of a huge number of data points on a chart (typically a scatter chart or a map): nearby data points are grouped into clusters, which makes the chart easier to read.

In this tutorial, we will show you two examples where the marker clusters technique is particularly effective; then, we will dive into the different configuration options, i.e., the clustering algorithms, offered by the Highcharts library.

Let’s get started 🙂

1. Maps

The map below displays all earthquakes with a magnitude of 4.5 and above recorded on the US mainland (and around its borders) from 2000 to 2019.

See the Pen <a href='https://codepen.io/mushigh/pen/gObwqrK'>USA earthquake 2000 – 2019 (mag 4.5+)</a> by mustapha mekhatria (<a href='https://codepen.io/mushigh'>@mushigh</a>) on <a href='https://codepen.io'>CodePen</a>.

According to the map, the number of earthquakes on the west coast is far higher than on the east coast. Many western states have recorded a high number of earthquakes in the last 19 years, California in particular, whereas the Midwest has almost no quakes recorded. But the map doesn’t show other details that could give us better insight, such as: How many earthquakes occurred in each state? Which states are the most earthquake-prone besides California, and which are the safest? From this map, it is challenging to count the earthquakes, since the dots overlap each other and patterns are hidden under layers of piled markers. One way to solve this issue is by using marker clusters (see demo below):

See the Pen <a href='https://codepen.io/mushigh/pen/dyPvezL'>USA earthquake 2000 – 2019 (mag 4.5+) with Marker Clusters</a> by mustapha mekhatria (<a href='https://codepen.io/mushigh'>@mushigh</a>) on <a href='https://codepen.io'>CodePen</a>.

The marker clusters concept allows us to aggregate close markers into many clusters using different types of algorithms. For more visibility, a gradient of red color is used to visualize the clusters’ size.
Now we can get more insight from our map by reading the number of earthquakes in each state and area. In the last 19 years, more than 150 earthquakes were recorded in California alone, and more than 30 in Nevada. Oklahoma recorded 13 earthquakes, followed by Idaho (11) and Washington (8). The border area between Wyoming, Montana, and Idaho recorded 19 earthquakes over the same period. The Upper Midwest seems to be the safest region in the United States, with almost no earthquakes of magnitude 4.5 or above recorded since 2000.
Displaying such a map with clustered data could help leaders and researchers make better decisions and set up plans with higher chances of success.

Remark
Another benefit of marker clusters is the ability to zoom in and locate a specific marker (point) in each cluster or group. To zoom in, either click directly on the cluster of your choice or use the map navigation buttons (top left).
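To make the idea concrete, here is a minimal sketch of how a clustered map point series might be configured. It is not the configuration used in the demo above: the sample points, zone boundaries, colors, and the `zoneForSize` helper are assumptions made for illustration. The `cluster` options themselves (`enabled`, `allowOverlap`, `drillToCluster`, `layoutAlgorithm`, `zones`) come from the Highcharts marker-clusters module (`modules/marker-clusters.js`), which has to be loaded in addition to the map library.

```javascript
// Sketch of a clustered mappoint series. The data points, zone limits, and
// colors are placeholders, not the values used in the demo.
const earthquakeSeries = {
  type: 'mappoint',
  name: 'Earthquakes',
  // Hypothetical sample points: each one needs a latitude and a longitude.
  data: [
    { name: 'Sample A', lat: 36.1, lon: -117.9 },
    { name: 'Sample B', lat: 36.3, lon: -117.6 },
    { name: 'Sample C', lat: 35.5, lon: -97.4 }
  ],
  cluster: {
    enabled: true,
    allowOverlap: false,
    // Clicking a cluster zooms in to the points it contains.
    drillToCluster: true,
    layoutAlgorithm: { type: 'grid', gridSize: 70 },
    // Larger clusters get a darker red and a bigger radius.
    zones: [
      { from: 1, to: 9, marker: { fillColor: '#fcae91', radius: 13 } },
      { from: 10, to: 49, marker: { fillColor: '#fb6a4a', radius: 16 } },
      { from: 50, to: 1000, marker: { fillColor: '#cb181d', radius: 20 } }
    ]
  }
};

// Illustration-only helper (not part of Highcharts): find the zone that a
// cluster with the given number of points falls into.
function zoneForSize(size, zones) {
  return zones.find(z => size >= z.from && size <= z.to) || null;
}
```

With zones like these, a cluster of 150 points would be drawn in the darkest red, while a cluster of 13 points would get the middle shade.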

2. Scatter plots

Here is a scatter chart, where markers are abundant. The chart displays the relationship between the height and weight of athletes competing in the 2012 Summer Olympics.

See the Pen <a href='https://codepen.io/mushigh/pen/WNNYpLd'>Olympics 2012 by height and weight</a> by mustapha mekhatria (<a href='https://codepen.io/mushigh'>@mushigh</a>) on <a href='https://codepen.io'>CodePen</a>.

From the chart, athletes’ height and weight increase together; the pattern is consistent, with few outliers. The trend is increasing and roughly linear, and the association between height and weight is moderate. All of these observations are easy to make with a quick glance at the chart. However, what is not clear is how many athletes share similar characteristics, and whether there is a practical way to group them. For example, which combinations of weight and height are the most common? Yet again, the overwhelming number of markers makes it a real visual challenge to see essential information. Well, you guessed it: marker clusters are a good way to reveal these groups (see demo below):

See the Pen <a href='https://codepen.io/mushigh/pen/JjobjEB'>Olympics 2012 by height and weight</a> by mustapha mekhatria (<a href='https://codepen.io/mushigh'>@mushigh</a>) on <a href='https://codepen.io'>CodePen</a>.

From this demo, we can still see an increasing, linear trend and a correlation between the two variables. But thanks to the marker clusters technique, it is clear that the majority of athletes are between 170 cm and 180 cm tall and weigh between 60 kg and 80 kg. The chart also shows a clear outlier on the right, drawn as a blue dot.

Now that you have a good idea of how practical the marker clusters technique is, let’s look at the different ways to cluster data in the most meaningful way for your dataset.

You have three algorithms to choose from when setting up marker clusters on your charts: grid, kmeans, and optimizedKmeans. All you have to do is set the type option under layoutAlgorithm. For example, here is a piece of code that selects the K-means algorithm:

layoutAlgorithm: {
  type: 'kmeans',
  distance: '7%'
},
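For context, the fragment above sits inside the `cluster` options of a series (or of `plotOptions` for a series type). The following is a minimal sketch of a complete scatter chart configuration under that assumption. The `clusterOptions` helper, the `'container'` element id, the sample data, and the numeric `gridSize` value are made up for illustration, and the marker-clusters module is assumed to be loaded alongside Highcharts.

```javascript
// Build cluster options for one of the three built-in algorithm names.
// This helper is hypothetical; it only assembles a plain options object.
function clusterOptions(type) {
  const allowed = ['grid', 'kmeans', 'optimizedKmeans'];
  if (!allowed.includes(type)) {
    throw new Error('Unknown layout algorithm: ' + type);
  }
  const layoutAlgorithm = { type: type };
  if (type === 'grid') {
    // The grid algorithm is sized with gridSize (here in pixels).
    layoutAlgorithm.gridSize = 50;
  } else {
    // The k-means variants use distance, as in the fragment above.
    layoutAlgorithm.distance = '7%';
  }
  return { enabled: true, layoutAlgorithm: layoutAlgorithm };
}

const chartOptions = {
  chart: { type: 'scatter' },
  title: { text: 'Height vs. weight (sample data)' },
  plotOptions: {
    scatter: {
      cluster: clusterOptions('kmeans')
    }
  },
  series: [{
    name: 'Athletes',
    // Placeholder [height in cm, weight in kg] pairs.
    data: [[170, 60], [172, 64], [175, 70], [180, 78], [198, 105]]
  }]
};

// Render only in a page where Highcharts has been loaded and an element
// with the id "container" exists.
if (typeof Highcharts !== 'undefined' && typeof document !== 'undefined') {
  Highcharts.chart('container', chartOptions);
}
```

Switching algorithms then comes down to passing `'grid'` or `'optimizedKmeans'` to the helper instead of `'kmeans'`.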

In addition, you can assign a custom clustering algorithm, thanks to the flexibility of the Highcharts library. Which algorithm to choose, and the pros and cons of each, are beyond the scope of this tutorial. Still, here are three demos with the same data, one per clustering algorithm, to give you a visual idea of the results. To make it easier to see how each algorithm works, we have superimposed two series with the same data: one displays the clusters produced by the algorithm (in blue), and the other displays the plain scatter points (in red).
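Before looking at the demos, it may help to see the general idea behind grid-based clustering in a few lines of code. The sketch below is a generic illustration, not Highcharts' internal implementation: it works on raw data values rather than pixel positions, and the `gridCluster` function and the sample data are made up for this example. Points that fall into the same grid cell are merged into one cluster, placed at their mean position.

```javascript
// Group 2D points by the grid cell they fall into.
function gridCluster(points, cellSize) {
  const cells = new Map();
  for (const [x, y] of points) {
    // Points sharing the same cell key belong to the same cluster.
    const key = Math.floor(x / cellSize) + ':' + Math.floor(y / cellSize);
    if (!cells.has(key)) {
      cells.set(key, []);
    }
    cells.get(key).push([x, y]);
  }
  // Summarize each cell as a cluster: mean position and number of points.
  return Array.from(cells.values()).map(members => {
    const sumX = members.reduce((sum, p) => sum + p[0], 0);
    const sumY = members.reduce((sum, p) => sum + p[1], 0);
    return {
      x: sumX / members.length,
      y: sumY / members.length,
      count: members.length
    };
  });
}

// Placeholder [height in cm, weight in kg] pairs.
const sample = [[171, 62], [173, 66], [178, 69], [181, 74], [199, 104]];
const clusters = gridCluster(sample, 10);
console.log(clusters);
```

With a cell size of 10, the first three sample points share a cell and form a single cluster of three, while the last two points each stay alone in their own cell.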

Grid algorithm demo

See the Pen <a href='https://codepen.io/mushigh/pen/yLyVxzq'>Olympics 2012 by height and weight (comparing data with Grid type marker clusters)</a> by mustapha mekhatria (<a href='https://codepen.io/mushigh'>@mushigh</a>) on <a href='https://codepen.io'>CodePen</a>.

K-means demo

See the Pen <a href='https://codepen.io/mushigh/pen/gObgxGM'>Olympics 2012 by height and weight (comparing data with k-means type marker clusters)</a> by mustapha mekhatria (<a href='https://codepen.io/mushigh'>@mushigh</a>) on <a href='https://codepen.io'>CodePen</a>.

Optimized K-means demo

See the Pen <a href='https://codepen.io/mushigh/pen/dyPNzZb'>Olympics 2012 by height and weight (comparing data with Optimized K-means type marker clusters)</a> by mustapha mekhatria (<a href='https://codepen.io/mushigh'>@mushigh</a>) on <a href='https://codepen.io'>CodePen</a>.

As you can see, each demo produces its own cluster structure. The K-means demo includes more points in clusters than the grid demo. The optimized K-means demo looks almost identical to the K-means demo, but it is faster at processing and rebuilding the clusters when zooming in and out.
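To see why K-means can group points differently from a fixed grid, here is a minimal sketch of the textbook K-means loop: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. This is a generic illustration, not Highcharts' implementation (which, among other things, takes the distance option shown earlier into account); the `kMeans` function, the starting centers, and the sample data are assumptions made for this example.

```javascript
// Textbook K-means on 2D points, starting from the given centers.
function kMeans(points, initialCenters, maxIterations = 20) {
  let centers = initialCenters.map(c => c.slice());
  let labels = new Array(points.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: label each point with the index of its nearest center.
    const newLabels = points.map(p => {
      let best = 0;
      let bestDist = Infinity;
      centers.forEach((c, i) => {
        const d = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2;
        if (d < bestDist) {
          bestDist = d;
          best = i;
        }
      });
      return best;
    });

    // Stop as soon as no point switches cluster.
    const changed = newLabels.some((label, i) => label !== labels[i]);
    labels = newLabels;
    if (!changed) {
      break;
    }

    // Update step: move each center to the mean of its assigned points.
    centers = centers.map((c, i) => {
      const members = points.filter((_, j) => labels[j] === i);
      if (members.length === 0) {
        return c; // Keep an empty cluster's center where it is.
      }
      const sumX = members.reduce((sum, p) => sum + p[0], 0);
      const sumY = members.reduce((sum, p) => sum + p[1], 0);
      return [sumX / members.length, sumY / members.length];
    });
  }
  return { centers: centers, labels: labels };
}

// Placeholder [height in cm, weight in kg] pairs and two starting centers.
const athletes = [[170, 60], [172, 64], [174, 62], [190, 90], [192, 94]];
const result = kMeans(athletes, [[170, 60], [192, 94]]);
console.log(result.centers, result.labels);
```

Because the centers move toward the data instead of staying on fixed cell boundaries, nearby points that a grid would split across two cells can end up in the same cluster.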

Marker clusters are a practical concept that gives us a better and quicker understanding of the data without losing the accuracy of any marker. Don’t be afraid to experiment with this awesome technique in your next chart, and feel free to share your favorite chart using the marker clusters feature in the comment section below.