{"id":9494,"date":"2016-12-29T14:37:16","date_gmt":"2016-12-29T13:37:16","guid":{"rendered":"http:\/\/www.highcharts.com\/blog\/?p=9494"},"modified":"2026-01-12T09:14:53","modified_gmt":"2026-01-12T09:14:53","slug":"214-histogram-when-why-how-part2","status":"publish","type":"post","link":"https:\/\/www.highcharts.com\/blog\/tutorials\/214-histogram-when-why-how-part2\/","title":{"rendered":"Histograms: When, Why &#038; How (Part 2)"},"content":{"rendered":"<p><b><a href=\"blog\/213-histogram-when-why-how\">In part 1 <\/a><\/b>of the Histograms post, we talked about what a Histogram can do for you, why you might want to use them, and looked at some examples of Histograms that gave us some insight into the data that they represented.<br \/>\nIn this post, we\u2019ll move on to when and how to implement Histograms, using Highcharts.<\/p>\n<h2>WHEN TO USE A HISTOGRAM<\/h2>\n<p>Histograms are useful when exploring any set of continuous numeric data, with enough data points to make useful inferences.<\/p>\n<p>How many data points are enough? That\u2019s a topic of debate, with many suggestions ranging between 50 and 100 data points as a minimum size.<\/p>\n<p>The bottom line is, if you have too few data points, a Histogram isn\u2019t going to tell you anything useful. There is no danger in plotting a Histogram with too few data points &#8211; unless you try to base statistical conclusions on the result!<\/p>\n<h2>BUILDING A HISTOGRAM IN HIGHCHARTS<\/h2>\n<p>One question that has been asked many times is how to go about creating a Histogram with Highcharts. Highcharts is a great tool for displaying Histograms, with a wide variety of options available to format the chart as needed.<\/p>\n<h3>FORMATTING<\/h3>\n<p>On the formatting side, a Histogram is just a column chart. The primary difference is that gaps between columns are removed in a Histogram. 
This is done for one simple reason: each part of the x axis is a bin, and each bin covers part of a continuous range along the axis.<br \/>\nIf there is a range on the x axis with no column, that means that no data from the dataset fell into that range.<br \/>\nRemoving gaps between the bars makes it clear that each bin starts where the previous bin ended, and that any gap that does appear says something about your data.<\/p>\n<h3>DATA PROCESSING<\/h3>\n<p>But what about the data?<\/p>\n<p>Users are often looking to the charting library to do the work &#8211; they have a dataset, and want the chart to process and bin the data into a Histogram.<\/p>\n<p>But Highcharts is not a statistical analysis tool, it\u2019s a charting tool! Sometimes the line between those things is blurry, but honestly I would prefer that a library stick to what it does best, and do it well, rather than try to do everything.<\/p>\n<p>Fortunately, processing data to build a Histogram is not very difficult, and we can build a JavaScript function to do this for us.<br \/>\nIncluded here is a function that I use when I need to process the data on the client side.<\/p>\n<p>When possible, I process the data before it ever gets to the client, using PHP, Python, R, or whatever is available in the setup at hand. But sometimes you need to do it client side, and as long as the dataset is not incredibly large, this is not a problem. If you are new to Histograms, work through the code and explanation below to understand the mechanics of this kind of data preparation.<br \/>\nThe function:<\/p>\n<pre>function binData(data) {\r\n\r\n  var hData = new Array(), \/\/the output array\r\n  \tsize = data.length, \/\/how many data points\r\n  \tbins = Math.round(Math.sqrt(size)); \/\/determine how many bins we need\r\n       bins = bins &gt; 50 ? 
50 : bins; \/\/cap the number of bins at 50\r\n  var max = Math.max.apply(null, data), \/\/highest data value\r\n  \tmin = Math.min.apply(null, data), \/\/lowest data value\r\n  \trange = max-min, \/\/total range of the data\r\n  \twidth = range\/bins, \/\/size of the bins\r\n  \tbin_bottom, \/\/place holders for the bounds of each bin\r\n  \tbin_top;\r\n\r\n  \/\/loop through the bins\r\n  for(var i = 0; i &lt; bins; i++) {\r\n\r\n\t\/\/set the upper and lower limits of the current bin\r\n\tbin_bottom = min + (i * width) ;\r\n\tbin_top = bin_bottom + width;\r\n\r\n\t\/\/create the bin and set its x value to the bin's midpoint\r\n\tif(!hData[i]) {\r\n  \t    hData[i] = new Array();\r\n    \t    hData[i][0] = bin_bottom + (width \/ 2);\r\n\t}\r\n\r\n\t\/\/loop through the data to see if it fits in this bin\r\n\tfor(var j = 0; j &lt; size; j++) {\r\n  \t    var x = data[j];\r\n\r\n  \t    \/\/on the very first pass, lower the first bin's bottom so the minimum value is counted\r\n  \t    i == 0 &amp;&amp; j == 0 ? bin_bottom -= 1 : bin_bottom = bin_bottom;\r\n\r\n  \t    \/\/if it fits in the bin (above the bottom, at or below the top), count it\r\n  \t    if(x &gt; bin_bottom &amp;&amp; x &lt;= bin_top) {\r\n    \t        !hData[i][1] ? hData[i][1] = 1 : hData[i][1]++;\r\n  \t    }\r\n\t}\r\n  }\r\n  \/\/cleanup: give empty bins a null count (uses jQuery's $.each)\r\n  $.each(hData, function(i, point) {\r\n\tif(typeof point[1] == 'undefined') {\r\n  \thData[i][1] = null;\r\n\t}\r\n  });\r\n  return hData;\r\n}\r\n<\/pre>\n<p>And a Fiddle to experiment with: <b><a href=\"https:\/\/jsfiddle.net\/jlbriggs\/gud4bp66\/\">https:\/\/jsfiddle.net\/jlbriggs\/gud4bp66\/<\/a><\/b><br \/>\n<iframe title=\"Histogram Calculator\" style=\"width: 100%; height: 700px; border: none;\" src=\"https:\/\/www.highcharts.com\/samples\/embed\/highcharts\/blog\/histogram-calculator\" allow=\"fullscreen\"><\/iframe><\/p>\n<h4>FIRST, WHAT THIS FUNCTION DOES NOT DO:<\/h4>\n<p>It is not a fully robust, error-proof function. 
It\u2019s a quick-and-dirty example that can serve as a useful tool in conjunction with adequate safeguards around the data being sent to it, or as a foundation for building a more robust function or class.<\/p>\n<h4>WHAT THE FUNCTION DOES DO:<\/h4>\n<p>First the function determines the <b><strong>size<\/strong><\/b> of the data &#8211; how many data points we have &#8211; and the number of <b><strong>bins<\/strong><\/b> to create.<br \/>\n<b><strong>Histograms are all about the bins.<\/strong><\/b><br \/>\nWhile determining the number of bins is a topic with a wide variety of opinions, a good rule of thumb is to use the square root of the number of data points, rounded to the nearest whole number. A dataset of 100 points, for example, gets 10 bins.<\/p>\n<p><i><em>The function puts a hard stop at 50 bins, however, as beyond a certain number of bins, a Histogram can often be less useful. 50 is an arbitrary number, based on observational experience &#8211; set it to any number that makes sense for your data, or remove the limit altogether if you wish.<\/em><\/i><\/p>\n<p>Next the function determines the <b><strong>range<\/strong><\/b> of the data set (the highest value minus the lowest), and divides the range by the number of bins to determine the <b><strong>width<\/strong><\/b> that each bin needs to be.<\/p>\n<p>All bins are the same width. You will find people who advocate for, or wish for, bins to be different sizes. I strongly caution against this, because it 1) causes unnecessary complexity in the chart, which forces the user to work harder to decode the information, and 2) is not usually based on anything statistically valid. 
Further arguments for or against this assertion are beyond the scope of this post!<br \/>\nOnce we have these variables, we can loop through the bins, loop through the data, and count each data point in the bin where it belongs.<br \/>\nThe output returned is a ready-to-use array of [x, y] pairs, where x is the midpoint of a bin and y is the number of data points in that bin (or null if the bin is empty).<\/p>\n<h2>A FINAL EXAMPLE<\/h2>\n<p>This is a Histogram for a dataset of a little more than 33,000 data points: the weights of cartons that shipped from a distribution center.<br \/>\n<iframe title=\"Right Skewed Histogram\" style=\"width: 100%; height: 700px; border: none;\" src=\"https:\/\/www.highcharts.com\/samples\/embed\/highcharts\/blog\/histogram-right-skewed\" allow=\"fullscreen\"><\/iframe><br \/>\nOnce again, we notice several important features of this data immediately:<\/p>\n<ul>\n<li>It is definitely not normally distributed<\/li>\n<li>It is zero-bounded on the left<\/li>\n<li>It is right-skewed<\/li>\n<li>It is multi-modal<\/li>\n<\/ul>\n<p>In particular, the spike at the 40lb range jumps out immediately. Aside from that point, we have a lot of values clustered at the low end, in the 2-10lb range, and a steady decrease up into the higher weights, as might be expected for the type of product being sold.<br \/>\nBut why so many at 40lbs? Do they have a lot of products that weigh 40lbs? The answer was no.<br \/>\nBut what they did have was a limit on how much weight their software would tell the workers to put into a single carton. So when a larger order needed more than one carton, the first carton would often be filled to, or near, the 40lb mark, and a second carton would be used for what was left.<br \/>\nIn that case, why are there any cartons more than 40lbs? 
Because some of the items weigh in at more than 40lbs for a single unit.<br \/>\nThis is a fairly simple chain of reasoning about the data in this dataset, but what it highlights is very important: A Histogram might not give you the answers that you need, but it will give you the questions.<\/p>\n<p>And there is nothing more important when analysing data than asking the right questions!<\/p>\n<h2>SO HOW ABOUT IT?<\/h2>\n<p>Are you a reader who has worked with Histograms already? Is the idea new to you?<br \/>\nIf you\u2019re new to Histograms, why not try them out? Leave a comment, and let us know how it worked out &#8211; post a link, tell a story, ask a question.<br \/>\nIf you work with Histograms a lot, let us know what you love about them. Leave a comment with a story of insights you\u2019ve gained through a Histogram that you might not have otherwise, or your own tips on using them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Second part of why, how and when to use a Histogram to make sense of your 
data.<\/p>\n","protected":false},"author":43,"featured_media":10701,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"meta_title":"","meta_description":"","hc_selected_options":[],"footnotes":""},"categories":[210],"tags":[1063,1094],"coauthors":[733],"class_list":["post-9494","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","tag-data-visualization","tag-highcharts-core"],"_links":{"self":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9494","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/users\/43"}],"replies":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/comments?post=9494"}],"version-history":[{"count":1,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9494\/revisions"}],"predecessor-version":[{"id":28510,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9494\/revisions\/28510"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media\/10701"}],"wp:attachment":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media?parent=9494"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/categories?post=9494"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/tags?post=9494"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/coauthors?post=9494"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}