{"id":19535,"date":"2020-04-21T13:23:17","date_gmt":"2020-04-21T12:23:17","guid":{"rendered":"http:\/\/www.highcharts.com\/blog\/?p=19535"},"modified":"2026-01-12T11:39:55","modified_gmt":"2026-01-12T11:39:55","slug":"data-science-and-highcharts-linear-regression","status":"publish","type":"post","link":"https:\/\/www.highcharts.com\/blog\/tutorials\/data-science-and-highcharts-linear-regression\/","title":{"rendered":"Data science and Highcharts: linear regression"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>In this tutorial, we will learn how to calculate and plot a linear regression line, and use it to visualize a large number of points without cluttering a chart. We will also look at the limitations of the linear regression line.<\/p>\n<p><i><b>Remark<\/b><br \/>\nThe <a href=\"https:\/\/www.highcharts.com\/blog\/products\/stock\/\">Highcharts Stock<\/a> package has built-in support for <a href=\"https:\/\/www.highcharts.com\/docs\/stock\/custom-technical-indicators\">advanced technical indicators<\/a>, including <a href=\"https:\/\/api.highcharts.com\/highstock\/plotOptions.linearregression\">linear regressions<\/a> and more. This blog article, however, focuses on how you can apply custom statistical analysis to the chart data and render the result using Highcharts.<\/p>\n<p>I am using the JavaScript Statistical Library (<a href=\"http:\/\/www.jstat.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">jStat<\/a>) to do all the statistical heavy lifting, such as the calculation of the mean, the standard deviation, and the population correlation coefficient.<\/i><\/p>\n<p>If you are not familiar with linear regression, here is a quick summary:<br \/>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Linear_regression\" target=\"_blank\" rel=\"noopener noreferrer\">Linear regression<\/a> is one of the most widely used <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regression_analysis\" target=\"_blank\" rel=\"noopener noreferrer\">regression analysis techniques<\/a>. 
It helps us make predictions and explore possible cause-and-effect relationships by modeling the relationship (correlation) between a continuous dependent variable and continuous or discrete independent variables. For example, the demo below visualizes the relationship between the height and the weight of the football athletes at the 2012 Olympic Games:<br \/>\n&nbsp;<\/p>\n<p class=\"demo-container\">\n<iframe height=\"700\" style=\"width: 100%;\" scrolling=\"no\"  src=\"https:\/\/codepen.io\/mushigh\/embed\/MWwxKWR?height=265&#038;theme-id=light&#038;default-tab=result\" frameborder=\"no\" allowtransparency=\"true\" allowfullscreen=\"true\" loading=\"lazy\" title=\"A line chart and a scatter chart visualize a relationship between the football athletes\u2019 weight and the height of the 2012 Olympic event. By Mustapha Mekhatria\"><br \/>\n  See the Pen <a href='https:\/\/codepen.io\/mushigh\/pen\/MWwxKWR'>Linear regression and scatter of football<\/a> by mustapha mekhatria<br \/>\n  (<a href='https:\/\/codepen.io\/mushigh'>@mushigh<\/a>) on <a href='https:\/\/codepen.io'>CodePen<\/a>.<br \/>\n<\/iframe>\n<\/p>\n<p>&nbsp;<br \/>\nThe regression line (the black line) represents the relationship (model) between the football athletes\u2019 height and weight.<\/p>\n<p>Technical note: Simple linear regression is represented by the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Simple_linear_regression\" target=\"_blank\" rel=\"noopener noreferrer\">equation<\/a> Y = B*X + A. B is the slope, equal to r*(Sy\/Sx), where r is the correlation coefficient, Sy is the standard deviation of the y values, and Sx is the standard deviation of the x values. 
A (the intercept) is equal to meanY - (B*meanX), where meanY and meanX are the means of the y values and x values, respectively.<\/p>\n<p>Thanks to the jStat library, all I had to do was write a few lines of code to calculate the regression formula and use a simple line series to visualize the linear regression:<br \/>\n&nbsp;<\/p>\n<pre>  \/\/ x values are heights, y values are weights\r\n  function regression(arrWeight, arrHeight) {\r\n    \/\/ Correlation coefficient between height and weight\r\n    const r = jStat.corrcoeff(arrHeight, arrWeight);\r\n    \/\/ Standard deviations and means of the y and x values\r\n    const sy = jStat.stdev(arrWeight);\r\n    const sx = jStat.stdev(arrHeight);\r\n    const meanY = jStat(arrWeight).mean();\r\n    const meanX = jStat(arrHeight).mean();\r\n    \/\/ Slope (b) and intercept (a) of Y = B*X + A\r\n    const b = r * (sy \/ sx);\r\n    const a = meanY - meanX * b;\r\n    \/\/ Two end points are enough to draw the line\r\n    const x1 = jStat.min(arrHeight);\r\n    const x2 = jStat.max(arrHeight);\r\n    const y1 = a + b * x1;\r\n    const y2 = a + b * x2;\r\n    return {\r\n      line: [\r\n        [x1, y1],\r\n        [x2, y2]\r\n      ],\r\n      r\r\n    };\r\n  }\r\n<\/pre>\n<p>&nbsp;<br \/>\nThe equation of the line above is Y = -86.60 + 88.79*X. The correlation coefficient, r, is 0.85, which means there is a strong positive relationship between height and weight. This coefficient also tells us how well the regression line fits the measured values: r squared (about 0.72) is the share of the variation in weight that the line accounts for. In our case, r = 0.85 means our model is a good representation of the measured values.<\/p>\n<p>Now you have a good idea of what a linear regression is and how to visualize it. 
Let\u2019s see how we can use it as a smart way to visualize many data points and still have an easy-to-read chart.<\/p>\n<p>Below is a chart with thousands of data points, representing the <a href=\"https:\/\/raw.githubusercontent.com\/mekhatria\/demo_highcharts\/master\/olympic2012.json?callback=?\" target=\"_blank\" rel=\"noopener noreferrer\">2012 Olympic athletes\u2019 height and weight<\/a> for the top 10 most popular disciplines:<br \/>\n&nbsp;<\/p>\n<p class=\"demo-container\">\n<iframe height=\"695\" style=\"width: 100%;\" scrolling=\"no\" src=\"https:\/\/codepen.io\/mushigh\/embed\/yLNoXvJ?height=265&#038;theme-id=light&#038;default-tab=result\" frameborder=\"no\" allowtransparency=\"true\" allowfullscreen=\"true\" loading=\"lazy\" title=\"A scatter chart with thousands of data points, representing the 2012 Olympic athletes\u2019 height and weight for the top 10 most popular disciplines. By Mustapha Mekhatria\"><br \/>\n  See the Pen <a href='https:\/\/codepen.io\/mushigh\/pen\/yLNoXvJ'>Olympics 2012 by height and weight for each sport<\/a> by mustapha mekhatria<br \/>\n  (<a href='https:\/\/codepen.io\/mushigh'>@mushigh<\/a>) on <a href='https:\/\/codepen.io'>CodePen<\/a>.<br \/>\n<\/iframe>\n<\/p>\n<p>&nbsp;<br \/>\nEven though I am using different colors, it is challenging to get insights from such a chart, as the data sets overlap each other. There are many ways to solve this issue, such as the small multiples technique that I discussed in detail in a <a href=\"https:\/\/www.highcharts.com\/blog\/tutorials\/small-multiple-with-boxplot-jitter-scatter-in-progress\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous article<\/a>.<br \/>\nAnother option for decluttering a scatter chart is the <a href=\"https:\/\/www.highcharts.com\/blog\/tutorials\/marker-clusters-with-highcharts\/\" target=\"_blank\" rel=\"noopener noreferrer\">clustered scatter plot<\/a>. 
But in our case, there are so many series and data points that a clustered scatter plot doesn\u2019t help much (see the <a href=\"https:\/\/codepen.io\/mushigh\/full\/jObWXzX\" target=\"_blank\" rel=\"noopener noreferrer\">demo below<\/a>):<br \/>\n&nbsp;<\/p>\n<p class=\"demo-container\">\n<iframe height=\"695\" style=\"width: 100%;\" scrolling=\"no\" src=\"https:\/\/codepen.io\/mushigh\/embed\/jObWXzX?height=265&#038;theme-id=light&#038;default-tab=result\" frameborder=\"no\" allowtransparency=\"true\" allowfullscreen=\"true\" loading=\"lazy\" title=\"A clustered scatter chart with thousands of data points, representing the 2012 Olympic athletes\u2019 height and weight for the top 10 most popular disciplines. By Mustapha Mekhatria\"><br \/>\n  See the Pen <a href='https:\/\/codepen.io\/mushigh\/pen\/jObWXzX'>Olympics 2012 by height and weight for each sport<\/a> by mustapha mekhatria<br \/>\n  (<a href='https:\/\/codepen.io\/mushigh'>@mushigh<\/a>) on <a href='https:\/\/codepen.io'>CodePen<\/a>.<br \/>\n<\/iframe>\n<\/p>\n<p>&nbsp;<br \/>\nAnother way to overcome this challenge is to use a mathematical representation, or model, for each discipline, such as a linear regression (see the <a href=\"https:\/\/codepen.io\/mushigh\/full\/bGdONaG\" target=\"_blank\" rel=\"noopener noreferrer\">chart below<\/a>):<br \/>\n&nbsp;<\/p>\n<p class=\"demo-container\">\n<iframe height=\"695px\" style=\"width: 100%;\" scrolling=\"no\" src=\"https:\/\/codepen.io\/mushigh\/embed\/bGdONaG?height=265&#038;theme-id=light&#038;default-tab=result\" frameborder=\"no\" allowtransparency=\"true\" allowfullscreen=\"true\" loading=\"lazy\" title=\"A chart with mathematical representation or model for each discipline using linear regression. 
By Mustapha Mekhatria\"><br \/>\n  See the Pen <a href='https:\/\/codepen.io\/mushigh\/pen\/bGdONaG'>Linear regression and scatter per sport<\/a> by mustapha mekhatria<br \/>\n  (<a href='https:\/\/codepen.io\/mushigh'>@mushigh<\/a>) on <a href='https:\/\/codepen.io'>CodePen<\/a>.<br \/>\n<\/iframe>\n<\/p>\n<p>&nbsp;<br \/>\nThe chart looks much cleaner with line series (the mathematical models) instead of scatter series. I kept the scatter series available on the same chart for further exploration and comparison between the disciplines.<br \/>\nAnother benefit of this solution is that the chart is now more accessible, as it is easier to see the overall pattern of each series.<\/p>\n<p>One major drawback of using linear regression is that it is a model, not the actual data. The model is just the straight line that best fits the measured values. Another drawback is that linear regression is highly sensitive to outliers.<\/p>\n<p>I hope this taught you something about how to prepare your data through statistical analysis, and how to combine the results with the appropriate chart type to get the most out of your data.<\/p>\n<p>Let me know in the comment section below if you have another favorite JavaScript statistical library, and feel free to share your experience with it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to create a regression line with Highcharts to visualize the relationship between a dependent variable and an explanatory variable (or an independent 
variable)<\/p>\n","protected":false},"author":32,"featured_media":19555,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"meta_title":"","meta_description":"","hc_selected_options":[],"footnotes":""},"categories":[210],"tags":[1063,1094],"coauthors":[699],"class_list":["post-19535","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","tag-data-visualization","tag-highcharts-core"],"_links":{"self":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/19535","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/users\/32"}],"replies":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/comments?post=19535"}],"version-history":[{"count":1,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/19535\/revisions"}],"predecessor-version":[{"id":29232,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/19535\/revisions\/29232"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media\/19555"}],"wp:attachment":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media?parent=19535"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/categories?post=19535"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/tags?post=19535"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/coauthors?post=19535"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}