Data science and Highcharts: Kernel density estimation

by Mustapha Mekhatria

Kernel density estimation

Kernel density estimation (KDE) is a statistical method for estimating the overall shape of a random variable’s distribution. In other words, KDE helps us “smooth” and explore data that doesn’t follow a standard probability distribution, such as the normal or binomial distribution.

In this tutorial, we will show you how to compute an interactive kernel density estimate in JavaScript and plot the result using the Highcharts library.

Let’s first explore the KDE plot; then we will dive into the code.

The demo below displays a Gaussian kernel density estimate of a random dataset:

This chart helps us to estimate the probability distribution of our random data set, and we can see that the data are concentrated mainly at the beginning and at the end of the chart.

Basically, for each data point (in red), we plot a Gaussian kernel function (in orange), then we sum all the kernel functions to create the density estimate (in blue) (see demo):

By the way, there are many kernel function types, such as Gaussian, uniform, and Epanechnikov. We use the Gaussian kernel, as it produces a smooth curve.

The mathematical representation of the Gaussian kernel is:

$$K(x_i, x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_i - x)^2}{2}\right)$$
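
The summing step described earlier can also be written as a formula. This is a sketch that follows the code later in this article, where N is assumed to be the number of observations and x^{(j)} is notation introduced here for the j-th observation:

```latex
\hat{f}(x_i) = \frac{1}{N} \sum_{j=1}^{N} K\left(x_i, x^{(j)}\right)
```

Each term of the sum is one orange kernel curve evaluated at x_i, and the factor 1/N scales the total so that the blue curve integrates to 1.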

Now that you have an idea of what a kernel density estimate looks like, let’s take a look at the code behind it.
There are four main steps in the code:

  1. Create the Gaussian kernel function.
  2. Process the density estimate points.
  3. Process the kernel points.
  4. Plot all the data points.

Gaussian kernel function

The following code represents the Gaussian kernel function:

function GaussKDE(xi, x) {
  return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(Math.pow(xi - x, 2) / -2);
}

Here, x represents an observation from the main data, and xi represents a point in the range over which the kernels and the density estimate are plotted. In our case, the xi range runs from 88 to 107, so that it covers the range of the observation data, which runs from 93 to 102.
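
As a quick illustration, the sketch below builds the plotting range and evaluates the kernel at a few points. The names `startPoint` and `range` are borrowed from the author's reply in the comments; the step of 1 between points is an assumption made for this example.

```javascript
// Gaussian kernel, as defined in the article
function GaussKDE(xi, x) {
  return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(Math.pow(xi - x, 2) / -2);
}

// Assumed setup: 20 integer points starting at 88 (88, 89, ..., 107)
const startPoint = 88;
const range = 20;
const xiData = Array.from({ length: range }, (_, i) => startPoint + i);

// The kernel peaks where xi equals the observation and decays symmetrically
const atPeak = GaussKDE(95, 95); // equals 1 / sqrt(2 * PI)
const oneAway = GaussKDE(96, 95);
const otherSide = GaussKDE(94, 95);

console.log(xiData[0], xiData[xiData.length - 1], atPeak, oneAway, otherSide);
```

Padding the range by a few units on each side of the observations lets the tails of the outermost kernels fall close to zero before the chart ends.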

Density estimate points

The following loop creates the density estimate points using the GaussKDE() function and the range represented by the array xiData:

// Create the density estimate
// (N is assumed to be the number of observations, i.e. dataSource.length)
for (let i = 0; i < xiData.length; i++) {
  let temp = 0;
  kernel.push([]);
  for (let j = 0; j < dataSource.length; j++) {
    // Evaluate the kernel once, then reuse the value
    const k = GaussKDE(xiData[i], dataSource[j]);
    temp += k;
    kernel[i][j] = k;
  }
  data.push([xiData[i], (1 / N) * temp]);
}
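
The same computation can be wrapped in a function, which also makes it easy to experiment with smoothness. The sketch below is not the article's code: the bandwidth parameter `h`, the helper names, and the sample values are all assumptions added for illustration. With `h = 1` it reduces to the formula used above.

```javascript
// Standard Gaussian kernel evaluated at the scaled distance u
function gaussian(u) {
  return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
}

// Density estimate at each grid point. h is the bandwidth (an assumed
// extra parameter); h = 1 reproduces the article's formula.
function densityEstimate(grid, observations, h = 1) {
  const n = observations.length;
  return grid.map((xi) => {
    let sum = 0;
    for (const x of observations) {
      sum += gaussian((xi - x) / h);
    }
    return [xi, sum / (n * h)];
  });
}

// Hypothetical observations inside the 93 to 102 range mentioned earlier
const sample = [93, 93.5, 94, 95, 98, 100, 101, 101.5, 102];

// Fine grid from 80 to 115 so the curve covers essentially all of the mass
const step = 0.1;
const grid = Array.from({ length: 351 }, (_, i) => 80 + i * step);

const smooth = densityEstimate(grid, sample, 2); // wider kernels
const sharp = densityEstimate(grid, sample, 0.5); // narrower kernels

// Riemann sum of a curve; a valid density should give a value close to 1
const area = (points) => points.reduce((acc, [, y]) => acc + y * step, 0);

console.log(area(smooth).toFixed(3), area(sharp).toFixed(3));
```

A larger `h` spreads each kernel out and gives a smoother curve, while a smaller `h` produces sharper bumps around the individual observations.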

Kernel points

This step is required only if you would like to display the individual kernels (the orange curves); otherwise, the density estimate step is all you need. Here is the code to process the data points for each kernel:

// Create the kernels: one array of [xi, scaled kernel value] points per observation
for (let i = 0; i < dataSource.length; i++) {
  kernelChart.push([]);
  for (let j = 0; j < kernel.length; j++) {
    kernelChart[i].push([xiData[j], (1 / N) * kernel[j][i]]);
  }
}
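
For comparison, here is a minimal sketch that builds the kernel series directly with `map` and then checks that the scaled kernels add up to the density estimate. The observation values are hypothetical, and `N` is assumed to be the number of observations.

```javascript
// Gaussian kernel, as defined in the article
function GaussKDE(xi, x) {
  return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(Math.pow(xi - x, 2) / -2);
}

// Hypothetical inputs, for illustration only
const dataSource = [93, 95, 98, 101, 102];
const xiData = Array.from({ length: 20 }, (_, i) => 88 + i);
const N = dataSource.length;

// One series per observation: [[xi, scaled kernel value], ...]
const kernelChart = dataSource.map((x) =>
  xiData.map((xi) => [xi, GaussKDE(xi, x) / N])
);

// Adding the kernel curves point by point gives the density estimate
const data = xiData.map((xi, j) => [
  xi,
  kernelChart.reduce((sum, series) => sum + series[j][1], 0)
]);

console.log(kernelChart.length, kernelChart[0].length, data.length);
```

Because every kernel is already divided by N, the orange curves can be stacked visually to reconstruct the blue curve.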

Basically, this loop just pairs each xiData value with the kernel values that were already computed in the density estimate step.

Plot the points

Once all the data points are processed, it is time to use Highcharts to render the series. The density estimate and the kernels are spline chart types, whereas the observations are plotted as a scatter plot:

Highcharts.chart("container", {
  chart: {
    type: "spline",
    animation: true
  },
  title: {
    text: "Gaussian Kernel Density Estimation (KDE)"
  },
  yAxis: {
    title: { text: null }
  },
  tooltip: {
    valueDecimals: 3
  },
  plotOptions: {
    series: {
      marker: {
        enabled: false
      },
      dashStyle: "shortdot",
      color: "#ff8d1e",
      pointStart: xiData[0],
      animation: {
        duration: animationTime
      }
    }
  },
  series: [
    {
      type: "scatter",
      name: "Observation",
      marker: {
        enabled: true,
        radius: 5,
        fillColor: "#ff1e1f"
      },
      data: dataPoint,
      tooltip: {
        headerFormat: "{series.name}:",
        pointFormat: "<b>{point.x}</b>"
      },
      zIndex: 9
    },
    {
      name: "KDE",
      dashStyle: "solid",
      lineWidth: 2,
      color: "#1E90FF",
      data: data
    },
    {
      name: "k(" + dataSource[0] + ")",
      data: kernelChart[0]
    }
    // ... (one series per remaining kernel)
  ]
});
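
Rather than writing one series entry per kernel by hand, the series array can be generated. The sketch below uses a hypothetical helper, `buildSeries`, which is not part of Highcharts; it only assembles plain option objects using the same series options as the configuration above. The inputs are tiny made-up values, and placing the observations at y = 0 is an assumption.

```javascript
// Hypothetical helper: builds the series array from precomputed points
function buildSeries(dataPoint, data, kernelChart, dataSource) {
  const observation = {
    type: "scatter",
    name: "Observation",
    marker: { enabled: true, radius: 5, fillColor: "#ff1e1f" },
    data: dataPoint,
    zIndex: 9
  };
  const kde = {
    name: "KDE",
    dashStyle: "solid",
    lineWidth: 2,
    color: "#1E90FF",
    data: data
  };
  // One series per kernel, named after the observation it is centered on
  const kernels = kernelChart.map((points, i) => ({
    name: "k(" + dataSource[i] + ")",
    data: points
  }));
  return [observation, kde, ...kernels];
}

// Tiny made-up inputs to show the resulting shape
const dataSource = [93, 95];
const dataPoint = dataSource.map((x) => [x, 0]); // assumed: observations drawn at y = 0
const data = [[92, 0.1], [94, 0.2]];
const kernelChart = [
  [[92, 0.05], [94, 0.05]],
  [[92, 0.05], [94, 0.15]]
];

const series = buildSeries(dataPoint, data, kernelChart, dataSource);
console.log(series.map((s) => s.name));
```

The returned array can then be passed as the `series` option in the `Highcharts.chart()` call, with the shared styling still coming from `plotOptions.series`.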

Now you are ready to explore your own data using the power of the kernel density estimation plot.
Feel free to share your comments or questions in the comment section below.

Comments

  1. João Vitor

    Where do I put the bandwidth so I can control the line smoothness


    1. Mustapha Mekhatria

      Hi,

      Feel free to explore these parameters:
      let animationDuration = 4000;
      let range = 20,
      startPoint = 88;

      Take care

