{"id":16647,"date":"2018-06-18T16:24:09","date_gmt":"2018-06-18T14:24:09","guid":{"rendered":"http:\/\/www.highcharts.com\/blog\/?p=16647"},"modified":"2020-06-19T12:08:02","modified_gmt":"2020-06-19T11:08:02","slug":"visualize-wikipedia-data-with-nodejs-and-highcharts","status":"publish","type":"post","link":"https:\/\/www.highcharts.com\/blog\/tutorials\/visualize-wikipedia-data-with-nodejs-and-highcharts\/","title":{"rendered":"Visualize Wikipedia Data with NodeJS and Highcharts"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>Wikipedia is a great source of information and data, with a rate of over 10 edits per second. The <a href=\"https:\/\/en.wikipedia.org\/wiki\/English_Wikipedia\">English Wikipedia<\/a> alone gets 600 new articles per day. But Wikipedia also offers many tools for exploring pages\u2019 statistics, such as <a href=\"https:\/\/tools.wmflabs.org\/pageviews\/?project=en.wikipedia.org&amp;platform=all-access&amp;agent=user&amp;range=latest-20&amp;pages=Cat|Camel\">Pageviews Analysis<\/a>, <a href=\"http:\/\/wikirank-2018.di.unimi.it\/\">Wikipedia Ranking<\/a>, <a href=\"https:\/\/www.mediawiki.org\/wiki\/API:Main_page\">Wikipedia API<\/a>, etc. 
And if you are a DataViz enthusiast like me, this is a treasure trove of data!<\/p>\n<p>In this tutorial, I will show you how to extract and visualize the Pageviews Analysis data using the Wikipedia API, NodeJS, and Highcharts.<\/p>\n<p>The good news is that MediaWiki provides an easy and straightforward Wikipedia API, with no need for an API key.<\/p>\n<p>Let\u2019s get started!<\/p>\n<p>I will extract the <b>dates<\/b> and the <b>users\u2019 views<\/b> of the Wikipedia webpage <a href=\"https:\/\/en.wikipedia.org\/wiki\/International_Space_Station\">International Space Station<\/a> from July 1, 2017 to June 3, 2018, then plot the trends in an interactive chart (see GIF below):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-16648 aligncenter\" src=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18150509\/wikipediaAPINodejsHS.gif\" alt=\"\" width=\"948\" height=\"441\" \/><\/p>\n<p><i><b>Remark<\/b><\/i><\/p>\n<p>You can download the code used in this article from the following <a href=\"https:\/\/github.com\/mekhatria\/WikipediaAPINodeJSHS\">GitHub link<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p>I use the following Wikipedia API request URL: https:\/\/wikimedia.org\/api\/rest_v1\/metrics\/pageviews\/per-article\/en.wikipedia\/all-access\/user\/International_Space_Station\/daily\/2017070100\/2018060300. Notice that the name of the page (International_Space_Station) is its own path segment, and that the start and end dates are the last two segments, written as YYYYMMDDHH. For more details about the Wikipedia API, see the <a href=\"https:\/\/www.mediawiki.org\/wiki\/API:Main_page\">API documentation<\/a>.<\/p>\n<p>To handle the API call, I use the <a href=\"https:\/\/www.npmjs.com\/package\/request-promise\">request-promise package<\/a>.<\/p>\n<p>First, let\u2019s create a folder to save the code. 
Browse to the folder you created and install the request-promise package:<\/p>\n<p><code>npm install --save request<\/code><br \/>\n<code>npm install --save request-promise<\/code><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-16651 aligncenter\" src=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152259\/install-request-and-promise.png\" alt=\"\" width=\"676\" height=\"313\" srcset=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152259\/install-request-and-promise.png 676w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152259\/install-request-and-promise-560x259.png 560w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152259\/install-request-and-promise-360x167.png 360w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/p>\n<p>As I am using the highcharts library, I need to install it as well with this command line:<br \/>\n<code>npm install highcharts<\/code><br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-16654 aligncenter\" src=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152417\/Highcharts.png\" alt=\"\" width=\"780\" height=\"97\" srcset=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152417\/Highcharts.png 780w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152417\/Highcharts-560x70.png 560w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152417\/Highcharts-768x96.png 768w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152417\/Highcharts-760x95.png 760w, 
https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152417\/Highcharts-360x45.png 360w\" sizes=\"auto, (max-width: 780px) 100vw, 780px\" \/><\/p>\n<p>The last package to install is Browserify.<br \/>\n<code>npm install browserify<\/code><br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-16656 aligncenter\" src=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152513\/browserify.png\" alt=\"\" width=\"780\" height=\"211\" srcset=\"https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152513\/browserify.png 780w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152513\/browserify-560x151.png 560w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152513\/browserify-768x208.png 768w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152513\/browserify-760x206.png 760w, https:\/\/wp-assets.highcharts.com\/www-highcharts-com\/blog\/wp-content\/uploads\/2018\/06\/18152513\/browserify-360x97.png 360w\" sizes=\"auto, (max-width: 780px) 100vw, 780px\" \/><\/p>\n<p>Browserify allows me to bundle all the code (including the Highcharts library) into a single JavaScript file that I can include as a script in the HTML page.<\/p>\n<p>I will first show the complete code (feel free to copy and paste it) and how to run it; then I will walk through it step by step.<\/p>\n<h2>The code<\/h2>\n<p>Create a new JavaScript file (e.g., code.js) and copy\/paste the code below:<\/p>\n<pre>var rp = require('request-promise');\r\nvar Highcharts = require('highcharts');\r\n\r\n\/\/ Request options for the Wikipedia pageviews API\r\nvar options = {\r\n  method: 'GET',\r\n  uri: 'https:\/\/wikimedia.org\/api\/rest_v1\/metrics\/pageviews\/per-article\/en.wikipedia\/all-access\/user\/International_Space_Station\/daily\/2017070100\/2018060300',\r\n  json: true,\r\n};\r\n\r\nrp(options)\r\n  
.then((parseBody) =&gt; {\r\n    var arrData = [];\r\n    var year, month, day;\r\n\r\n    \/\/ Build one [date label, views] pair per day\r\n    for (var i = 0; i &lt; parseBody.items.length; i++) {\r\n      year = parseBody.items[i].timestamp.slice(0, 4);\r\n      month = parseBody.items[i].timestamp.slice(4, 6);\r\n      day = parseBody.items[i].timestamp.slice(6, 8);\r\n      arrData.push([new Date(year + '-' + month + '-' + day).toDateString(), parseBody.items[i].views]);\r\n    }\r\n\r\n    \/\/ Use the date of the first item as the start of the series\r\n    year = parseBody.items[0].timestamp.slice(0, 4);\r\n    month = parseBody.items[0].timestamp.slice(4, 6);\r\n    day = parseBody.items[0].timestamp.slice(6, 8);\r\n\r\n    \/\/ Create the chart\r\n    Highcharts.chart('container', {\r\n      title: {\r\n        text: 'Views of the International Space Station Wikipedia webpage'\r\n      },\r\n      subtitle: {\r\n        useHTML: true,\r\n        text: 'Source: &lt;a href=\"https:\/\/www.mediawiki.org\/wiki\/API:Main_page\"&gt;Wikipedia&lt;\/a&gt;'\r\n      },\r\n      xAxis: {\r\n        type: 'datetime',\r\n        dateTimeLabelFormats: {\r\n          day: '%y\/%b\/%e'\r\n        }\r\n      },\r\n      yAxis: {\r\n        title: {\r\n          text: 'Number of views'\r\n        }\r\n      },\r\n      series: [{\r\n        name: 'views',\r\n        data: arrData,\r\n        pointStart: Date.UTC(year, month - 1, day), \/\/ Date.UTC months are zero-based\r\n        pointInterval: 24 * 3600 * 1000 \/\/ one day\r\n      }]\r\n    });\r\n  });<\/pre>\n<p>Don\u2019t forget to also create an HTML file (e.g., chart.html), then copy\/paste the code below:<\/p>\n<pre>&lt;html&gt;\r\n    &lt;head&gt;\r\n        &lt;script src=\"bundle.js\"&gt;&lt;\/script&gt;\r\n    &lt;\/head&gt;\r\n    &lt;body&gt;\r\n        &lt;div id=\"container\"&gt;&lt;\/div&gt;\r\n    &lt;\/body&gt;\r\n&lt;\/html&gt;<\/pre>\n<h2>Run the code<\/h2>\n<p>To run the code, execute <code>browserify code.js &gt; bundle.js<\/code> in the terminal (if Browserify is installed locally rather than globally, prefix the command with <code>npx<\/code>), then open the HTML file in a browser to see the result.<\/p>\n<h2>Explanations<\/h2>\n<p>I 
create the <code>options<\/code> object that holds all the information needed to make the request. This route does not require any authentication, so it is pretty simple.<\/p>\n<pre>var options = {\r\n  method: 'GET',\r\n  uri: 'https:\/\/wikimedia.org\/api\/rest_v1\/metrics\/pageviews\/per-article\/en.wikipedia\/all-access\/user\/International_Space_Station\/daily\/2017070100\/2018060300',\r\n  json: true,\r\n};<\/pre>\n<p>The object includes:<\/p>\n<ul>\n<li>The method\/type of the request (GET, POST, PUT, DELETE). In this case, I use GET, as I request data from Wikipedia.<\/li>\n<li>The URL to request, given by <code>uri<\/code>.<\/li>\n<li>The expected data type of the response, in this case JSON: <code>json: true<\/code> tells request-promise to parse the response body as JSON.<\/li>\n<\/ul>\n<p>The following code launches the whole data-fetching process:<\/p>\n<pre>rp(options)\r\n  .then((parseBody) =&gt; {\r\n\u2026.\r\n});<\/pre>\n<p><code>parseBody<\/code> holds the data fetched from Wikipedia:<\/p>\n<pre>...{\"project\":\"en.wikipedia\",\"article\":\"International_Space_Station\",\"granularity\":\"daily\",\"timestamp\":\"2018021700\",\"access\":\"all-access\",\"agent\":\"user\",\"views\":4549},{\"project\":\"en.wikipedia\",\"article\":\"International_Space_Station\",\"granularity\":\"daily\",\"timestamp\":\"2018021800\",\"access\":\"all-access\",\"agent\":\"user\",\"views\":4896},{\"project\":\"en.wikipedia\",\"article\":\"International_Space_Station\",\"granularity\":\"daily\",\"timestamp\":\"2018021900\",\"access\":\"all-access\",\"agent\":\"user\",\"views\":4634},{\"project\":\"en.wikipedia\",\"article\":\"International_Space_Station\",\"granularity\":\"daily\",\"timestamp\":\"2018022000\",\"access\":\"all-access\",\"agent\":\"user\",\"views\":4701} ...,<\/pre>\n<p><code>parseBody<\/code> contains a lot of information, but I am only interested in the number of views and the dates. 
To extract this data, I use the following loop:<\/p>\n<pre>for (var i = 0; i &lt; parseBody.items.length; i++) {\r\n      year = parseBody.items[i].timestamp.slice(0, 4);\r\n      month = parseBody.items[i].timestamp.slice(4, 6);\r\n      day = parseBody.items[i].timestamp.slice(6, 8);\r\n\r\n      arrData.push([new Date(year + '-' + month + '-' + day).toDateString(), parseBody.items[i].views]);\r\n    }<\/pre>\n<p>Notice that I use three variables to handle the dates: year, month, and day. This is because the API returns each timestamp as a string in the form YYYYMMDDHH (for example, 2018021700). I would have preferred a Unix timestamp, as it is much easier to manage. Oh, well&#8230;<\/p>\n<p>Once the data is extracted, I build the chart using Highcharts. Because each data point is a [date label, views] pair whose first value is a string, Highcharts takes the x values from <code>pointStart<\/code> and <code>pointInterval<\/code>. Note that <code>Date.UTC<\/code> expects a zero-based month, hence <code>month - 1<\/code>:<\/p>\n<pre>Highcharts.chart('container', {\r\n      title: {\r\n        text: 'Views of the International Space Station Wikipedia webpage'\r\n      },\r\n      subtitle: {\r\n        useHTML: true,\r\n        text: 'Source: &lt;a href=\"https:\/\/www.mediawiki.org\/wiki\/API:Main_page\"&gt;Wikipedia&lt;\/a&gt;'\r\n      },\r\n      xAxis: {\r\n        type: 'datetime',\r\n        dateTimeLabelFormats: {\r\n          day: '%y\/%b\/%e'\r\n        }\r\n      },\r\n      yAxis: {\r\n        title: {\r\n          text: 'Number of views'\r\n        }\r\n      },\r\n      series: [{\r\n        name: 'views',\r\n        data: arrData,\r\n        pointStart: Date.UTC(year, month - 1, day), \/\/ Date.UTC months are zero-based\r\n        pointInterval: 24 * 3600 * 1000 \/\/ one day\r\n      }]\r\n    });<\/pre>\n<p>So that\u2019s how you can visualize Wikipedia Pageviews Analysis data using NodeJS and Highcharts. I really enjoyed setting up this project, as the Wikipedia API is easy to use. 
I have barely scratched the surface, and I encourage you to play around with the code and API to visualize other data and trends in this amazing collection of data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to extract and visualize the Pageviews Analysis data using Wikipedia API, NodeJS, and Highcharts.<\/p>\n","protected":false},"author":32,"featured_media":16665,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"meta_title":"","meta_description":"","hc_selected_options":[],"footnotes":""},"categories":[224,210],"tags":[818],"coauthors":[699],"class_list":["post-16647","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-post","category-tutorials","tag-nodejs"],"_links":{"self":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/16647","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/users\/32"}],"replies":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/comments?post=16647"}],"version-history":[{"count":0,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/16647\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media\/16665"}],"wp:attachment":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media?parent=16647"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/categories?post=16647"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/tags?post=16647"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/coautho
rs?post=16647"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}