{"id":9462,"date":"2016-08-15T11:34:44","date_gmt":"2016-08-15T11:34:44","guid":{"rendered":"http:\/\/www.highcharts.com\/blog\/?p=9462"},"modified":"2026-01-12T09:06:44","modified_gmt":"2026-01-12T09:06:44","slug":"226-get-your-data-ready-for-charts-with-python","status":"publish","type":"post","link":"https:\/\/www.highcharts.com\/blog\/integration\/226-get-your-data-ready-for-charts-with-python\/","title":{"rendered":"Get your data ready for charts with Python"},"content":{"rendered":"<h2>PYTHON DATES AND TIMES<\/h2>\n<p>Python is my data analysis language of choice. After using <a href=\"https:\/\/www.parsehub.com\/web-scraping-for-data-science\">ParseHub<\/a> to collect huge databases of JSON from online sources, I then use Python\u2019s great libraries like <a href=\"http:\/\/pandas.pydata.org\/\">pandas<\/a> and <a href=\"http:\/\/matplotlib.org\/\">matplotlib<\/a> to get the data ready to visualize. In this tutorial, I\u2019ll give you a few examples of how you can use Python\u2019s <a href=\"https:\/\/docs.python.org\/2\/library\/datetime.html\">datetime<\/a> library to convert extracted dates and times into a format that is understood by your computer. With this tool you will be able to make visuals that change over the past, present, and future.<\/p>\n<p>This example will use data from the table found on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Member_states_of_the_United_Nations\">this Wikipedia page<\/a>, which includes the date that each one of the current members joined the United Nations. I will show you how to get it ready for visualizing by converting all of the strings in the date columns into datetimes.<\/p>\n<p>I <a href=\"https:\/\/help.parsehub.com\/hc\/en-us\/articles\/218181287-Create-your-first-project\">scraped the table<\/a> using ParseHub, which allows me to download the extracted data as JSON.<\/p>\n<p>The next step is to import the libraries you will need into <a style=\"font-size: 1em;\" href=\"http:\/\/jupyter.org\/\">Jupyter notebook<\/a><span style=\"font-size: 1em;\">.<\/span><\/p>\n<pre><code class=\"language-sh hljs haskell\"><span class=\"hljs-import\"><span class=\"hljs-keyword\">import<\/span> math<\/span>\r\n<span class=\"hljs-import\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd<\/span>\r\n<span class=\"hljs-import\"><span class=\"hljs-keyword\">import<\/span> pylab <span class=\"hljs-keyword\">as<\/span> pl<\/span>\r\n<span class=\"hljs-import\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np<\/span>\r\n<span class=\"hljs-import\"><span class=\"hljs-keyword\">import<\/span> json<\/span>\r\n<span class=\"hljs-import\"><span class=\"hljs-keyword\">import<\/span> datetime<\/span><\/code><\/pre>\n<p>Python\u2019s pandas library has a function read_json to import JSON into a pandas data structure. I used it to first import the data oriented as one column:<\/p>\n<pre><code class=\"language-sh hljs haskell\"><span class=\"hljs-typedef\"><span class=\"hljs-keyword\">data<\/span> = pd.read_json<span class=\"hljs-container\">(\u2018<span class=\"hljs-type\">UN_members<\/span>.<span class=\"hljs-title\">json'<\/span>, <span class=\"hljs-title\">orient<\/span>='<span class=\"hljs-title\">columns'<\/span>)<\/span><\/span><\/code><\/pre>\n<p>Next, each cell will be read as an index:<\/p>\n<pre><code class=\" hljs haskell\"><span class=\"hljs-typedef\"><span class=\"hljs-keyword\">data<\/span> = pd.read_json<span class=\"hljs-container\">(<span class=\"hljs-title\">data<\/span>['<span class=\"hljs-title\">states'<\/span>].<span class=\"hljs-title\">to_json<\/span>()<\/span>, orient='index')<\/span><\/code><\/pre>\n<p>For this example, I would like to calculate how long each country has been a member of the UN, so I add a blank column called \u201ctime\u201d to my DataFrame.<\/p>\n<pre><code class=\" hljs haskell\"><span class=\"hljs-title\">time<\/span> = pd.<span class=\"hljs-type\">Series<\/span>(<span class=\"hljs-type\">None<\/span>, index = <span class=\"hljs-typedef\"><span class=\"hljs-keyword\">data<\/span>.index, name = 'time')<\/span> <span class=\"hljs-typedef\"><span class=\"hljs-keyword\">data<\/span> = <span class=\"hljs-keyword\">data<\/span>.join<span class=\"hljs-container\">(<span class=\"hljs-title\">time<\/span>)<\/span><\/span><\/code><\/pre>\n<p><img decoding=\"async\" src=\"http:\/\/highsoftpictures.s3.amazonaws.com\/image_1.png\" alt=\"\" border=\"0\" \/><\/p>\n<p>You can\u2019t calculate anything with the dates found in the \u201cadmission\u201d column just yet, because they are still being stored as strings. Luckily, pandas has a function to_datetime that will be able to convert them to a computer readable datetime variable, as long as you tell it what the format of each date is.<\/p>\n<p>In this case, the format is the <strong>day<\/strong>, followed by a space, then the name of the <strong>month<\/strong>, then another space, then the<strong>year<\/strong>. Using <a href=\"http:\/\/strftime.org\/\">this reference<\/a>, I know that the format is written: \u201c%d %B %Y\u201d<\/p>\n<p>I\u2019ll first find today\u2019s date with the datetime <strong>today<\/strong> function. Then will use a for loop, which will convert each date string into datetimes and calculate the number of days it has been since that date, one index at a time. Indexing a pandas DataFrame can be done in many <a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/stable\/indexing.html\">different ways<\/a>, but for this loop I used: dataframe.loc[<em>index<\/em>, <em>column name<\/em>]<\/p>\n<pre><code class=\"language-sh hljs perl\"><span class=\"hljs-keyword\">for<\/span> i in data.<span class=\"hljs-keyword\">index<\/span> :\r\n\r\n    data.loc[i, <span class=\"hljs-string\">'admission'<\/span>] = pd.to_datetime(data.loc[i, <span class=\"hljs-string\">'admission'<\/span>], <span class=\"hljs-keyword\">format<\/span>=<span class=\"hljs-string\">\"<span class=\"hljs-variable\">%d<\/span> <span class=\"hljs-variable\">%B<\/span> <span class=\"hljs-variable\">%Y<\/span>\"<\/span>)\r\n    data.loc[i, <span class=\"hljs-string\">'time'<\/span>] = date - data.loc[i, <span class=\"hljs-string\">'admission'<\/span>]<\/code><\/pre>\n<p>I then sorted my data frame by length of membership, found in the \u201ctime\u201d column, and alphabetical order of the country names.<\/p>\n<pre><code class=\"language-sh hljs haskell\"><span class=\"hljs-typedef\"><span class=\"hljs-keyword\">data<\/span> = <span class=\"hljs-keyword\">data<\/span>.sort_values<span class=\"hljs-container\">(['<span class=\"hljs-title\">time'<\/span>, '<span class=\"hljs-title\">name'<\/span>])<\/span><\/span><\/code><\/pre>\n<p><img decoding=\"async\" src=\"http:\/\/highsoftpictures.s3.amazonaws.com\/image_2.png\" alt=\"\" border=\"0\" \/><\/p>\n<p>Now my data is ready to be exported, analyzed and visualized! I decided to visualize this particular data as a highly interactive map using HighChart\u2019s <a style=\"font-size: 1em;\" href=\"https:\/\/www.highcharts.com\/products\/maps\/\">Highmaps<\/a>.<\/p>\n<p>One way to associate a country to its location on a map using Highmaps is to use its ISO country code. I found a <a href=\"https:\/\/www.highcharts.com\/samples\/data\/jsonp.php?filename=world-population-density.json&amp;callback=?\">set of data<\/a> from one of the Highmaps examples that had the name of every country along with their associated country code.<\/p>\n<p>I imported this data as a list, converted it to json using python\u2019s dumps function and then to a pandas DataFrame using the read_json function.<\/p>\n<pre class=\"prettyprint\"><code class=\"language-sh hljs vala\"><span class=\"hljs-preprocessor\"># converts lists into json<\/span>\r\ncountry_codes = json.dumps(country_codes)\r\n<span class=\"hljs-preprocessor\"># converts json into dataframe<\/span>\r\ncountry_codes = pd.read_json(country_codes)<\/code><\/pre>\n<p>In the UN DataFrame, I had to change some of the country names, because the two sources didn\u2019t always use the exact same names. There could have been an elegant way to do this with regex, but I simply hard coded the changes.<\/p>\n<pre><code class=\"language-sh hljs vala\">data.loc[data[<span class=\"hljs-string\">'name'<\/span>] == <span class=\"hljs-string\">'East Timor'<\/span>, <span class=\"hljs-string\">'name'<\/span>] = <span class=\"hljs-string\">'Timor-Leste'<\/span>\r\ndata.loc[data[<span class=\"hljs-string\">'name'<\/span>] == <span class=\"hljs-string\">'Macedonia'<\/span>, <span class=\"hljs-string\">'name'<\/span>] = <span class=\"hljs-string\">'Macedonia, FYR'<\/span>\r\ndata.loc[data[<span class=\"hljs-string\">'name'<\/span>] == <span class=\"hljs-string\">'Slovakia'<\/span>, <span class=\"hljs-string\">'name'<\/span>] = <span class=\"hljs-string\">'Slovak Republic'<\/span>\r\n<span class=\"hljs-preprocessor\"># etc...<\/span><\/code><\/pre>\n<p>After that I was able to join the two DataFrames, giving me one with the time data as well as the country codes.<\/p>\n<pre class=\"prettyprint\"><code class=\"language-sh hljs livecodeserver\">joined = pd.<span class=\"hljs-built_in\">merge<\/span>(data, country_codes, how=<span class=\"hljs-string\">'inner'<\/span>, <span class=\"hljs-command\"><span class=\"hljs-keyword\">on<\/span>=<span class=\"hljs-string\">'name'<\/span>)<\/span><\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2017\/03\/28175618\/image03.png\" alt=\"\" border=\"0\" \/><br \/>\nAll I need for the visualisation is country code and times, so I selected those columns and then changed the column names. I changed their names to \u201chc-key\u201d and \u201cvalue\u201d.<\/p>\n<pre>sh\r\nnew_data = pd.concat([joined['code'].str.lower(), joined['time']], axis=1)\r\nnew_data.columns = ['hc-key', 'value']\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2017\/03\/28175620\/image04.png\" alt=\"\" border=\"0\" \/><br \/>\nUsing the datetime days function, I was able to turn each datetime object into an integer number of days.<\/p>\n<pre class=\"prettyprint\"><code class=\"language-sh hljs delphi\"><span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> new_data.<span class=\"hljs-keyword\">index<\/span>:   \r\n    new_data.loc[i, <span class=\"hljs-string\">'value'<\/span>] = new_data.loc[i, <span class=\"hljs-string\">'value'<\/span>].days<\/code><\/pre>\n<p>I exported the DataFrame as JSON into my python directory to save it.<\/p>\n<pre class=\"prettyprint\"><code class=\"language-sh hljs bash\"><span class=\"hljs-keyword\">export<\/span> = new_data.to_json(<span class=\"hljs-string\">'UN_data.json'<\/span>, orient=<span class=\"hljs-string\">'records'<\/span>)\r\n<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2017\/03\/28175621\/image05.png\" alt=\"\" border=\"0\" \/><\/p>\n<p>From there it was ready to bring into a Highmaps visual. I copied and pasted the saved JSON, but it could have just as easily been imported from a URL.<\/p>\n<p>I gave my map navigation buttons bottom left and a color center bottom center of the chart.<\/p>\n<p>It also has a hover state, so that the countries change color when hovered over, as well as a tool tip that explicitly states the number of days. You can see the full code in the JavaScript tab below and see how easily it comes together.<br \/>\n<iframe height=\"620\" style=\"width: 100%;\" scrolling=\"no\" title=\"Days since joining UN\" src=\"https:\/\/codepen.io\/mushigh\/embed\/zYBOLdZ?height=265&#038;theme-id=light&#038;default-tab=result\" frameborder=\"no\" loading=\"lazy\" allowtransparency=\"true\" allowfullscreen=\"true\"><br \/>\n  See the Pen <a href='https:\/\/codepen.io\/mushigh\/pen\/zYBOLdZ'>Days since joining UN<\/a> by mustapha mekhatria<br \/>\n  (<a href='https:\/\/codepen.io\/mushigh'>@mushigh<\/a>) on <a href='https:\/\/codepen.io'>CodePen<\/a>.<br \/>\n<\/iframe><br \/>\nOf course, this just displays the number of days until the day the data is extracted. ParseHub allows you to schedule runs every day so that your data stays up to date. The data can also be collected using ParseHub\u2019s API options using HTTP GETs to make integration onto your website completely automated.<\/p>\n<p>This example is just one of the many ways that plotting over time can be made possible with datetimes, but every project is a little different \u2013 let me know which date and time data you visualize!<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Prepare your data for visualization in Highmaps using Python&#8217;s datetime and pandas.<\/p>\n","protected":false},"author":38,"featured_media":11892,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"meta_title":"","meta_description":"","hc_selected_options":[],"footnotes":""},"categories":[1105],"tags":[876,885],"coauthors":[744],"class_list":["post-9462","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-integration","tag-highcharts-maps","tag-python"],"_links":{"self":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/users\/38"}],"replies":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/comments?post=9462"}],"version-history":[{"count":1,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9462\/revisions"}],"predecessor-version":[{"id":29073,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9462\/revisions\/29073"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media\/11892"}],"wp:attachment":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media?parent=9462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/categories?post=9462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/tags?post=9462"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/coauthors?post=9462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}