{"id":9471,"date":"2016-07-27T12:46:13","date_gmt":"2016-07-27T12:46:13","guid":{"rendered":"http:\/\/www.highcharts.com\/blog\/?p=9471"},"modified":"2026-01-12T09:02:46","modified_gmt":"2026-01-12T09:02:46","slug":"223-visualizing-the-gender-of-us-senators","status":"publish","type":"post","link":"https:\/\/www.highcharts.com\/blog\/tutorials\/223-visualizing-the-gender-of-us-senators\/","title":{"rendered":"Visualizing the gender of US senators with R and Highmaps"},"content":{"rendered":"<p>I like interactive visualizations. They give me, as a number cruncher, unique ways to analyze and play with the data. For the reader, they also make data more engaging and easier to grasp than words or numbers alone.<\/p>\n<p>I have tried several charting libraries, and Highcharts is the one I like the most. Why? It is elegant, very well documented, and above all, provides an extensive set of examples and demos that helps me quickly figure out how to implement my ideas.<\/p>\n<p>My tool of choice for statistical analysis is R. To my great joy, R and Highcharts play really nicely together with the help of <a href=\"http:\/\/jkunst.com\/highcharter\/\">Highcharter<\/a>, an open source R wrapper for the Highcharts JavaScript library and its modules. Highcharter allows R programmers like me to create interactive, web-ready charts really easily.<\/p>\n<p>One feature I love in particular is the ability to generate interactive maps using Highcharter&#8217;s built-in support for Highmaps (Highcharts&#8217; sister product). In this article, I\u2019ll give you a short example of how to use the map feature to bring geo-data to life.<\/p>\n<h2>SOURCING AND PARSING THE DATA<\/h2>\n<p>The data for this example is the number of female United States senators in the 114th Congress. I sourced the data from <b><a href=\"http:\/\/www.senate.gov\/\">here<\/a><\/b> via an XML file.<\/p>\n<p>As you all know, the quality of input data determines output quality. 
Sometimes the most laborious task may be preparing data for analysis. In this case, the task was pretty straightforward: after extracting the data, I cleaned up the first names a little and then used the <b><a href=\"https:\/\/genderize.io\/\">Genderize.io<\/a><\/b> API to append gender data to each record. Each gender assignment includes a probability measure, which I used to single out records that could benefit from manual review and correction. (For example, names like Pat and Rand are gender neutral, so neither Genderize nor I have complete confidence in the result\u2026)<\/p>\n<p>Next, I needed to tabulate the number of male and female senators by state (see the code below):<\/p>\n<pre>#Load required libraries\r\nlibrary(XML)\r\nlibrary(genderizeR)\r\nlibrary(stringr)\r\nlibrary(dplyr)\r\nlibrary(highcharter)\r\n#Read XML file with senators info from www.senate.gov\r\nurl=\"http:\/\/www.senate.gov\/general\/contact_information\/senators_cfm.xml\"\r\ndata=xmlToDataFrame(url)\r\n#Create names variable with cleaned first names and genderize using genderize.io API\r\ndata %&gt;% \r\n  mutate(name=str_to_lower(word(first_name, 1))) %&gt;%\r\n  select(name) %&gt;% \r\n  findGivenNames() -&gt; names\r\n#Join output with original data\r\ndata %&gt;% select(first_name, last_name, member_full, state) %&gt;% \r\n  mutate(name=str_to_lower(word(first_name, 1))) %&gt;% \r\n  mutate(name=str_replace_all(name,\"[[:punct:]]\",\"\")) %&gt;% \r\n  na.omit() %&gt;% left_join(names, by=\"name\")-&gt;data\r\n#Inspect cases with low probability\r\ndata %&gt;% filter(probability &lt; 0.75)\r\ndata %&gt;% mutate(gender = ifelse(name %in% c(\"rand\", \"pat\"), \"male\", gender))-&gt;data\r\n#Summarize data by state\r\ndata %&gt;% \r\n  group_by(state, gender) %&gt;% \r\n  summarise(senators=n()) %&gt;% \r\n  tidyr::spread(gender, senators) %&gt;% \r\n  ungroup()-&gt;data\r\ndata[is.na(data)] &lt;- 0\r\n#Load geojson with US states 
boundaries\r\ndata(\"usgeojson\")\r\n#Map colors\r\ncolfunc &lt;- colorRampPalette(c(\"white\", \"darkviolet\"))\r\nn=max(data$female)\r\n#Assign one color to each output (# of female senators)\r\ncolstops &lt;- data.frame(q = 0:n, c = colfunc(n+1)) %&gt;%\r\n  list.parse2()\r\n#Create highmap with the previous info using highcharter package\r\nhighchart() %&gt;%\r\n  hc_title(text = \"Current Women Senators of the 114th Congress\") %&gt;%\r\n  hc_subtitle(text = \"Source: http:\/\/www.senate.gov\/\") %&gt;%\r\n  hc_add_series_map(usgeojson, data, name = \"Women Senators\",\r\n                    value = \"female\", joinBy = c(\"postalcode\", \"state\"),\r\n                    dataLabels = list(enabled = TRUE,\r\n                                      format = '{point.properties.postalcode}')) %&gt;%\r\n  hc_colorAxis(stops = colstops) %&gt;%\r\n  hc_mapNavigation(enabled = TRUE)-&gt;m\r\n#Save map to local\r\nlibrary(htmlwidgets)\r\nsaveWidget(m, file=\"m.html\")\r\n<\/pre>\n<p>Next, I join my freshly prepared gender data with a GeoJSON dataset of the United States, which gives me all the data I need to generate the map visualization and assign values to each state.<\/p>\n<p>The result is what you see below. 
You can hover over each state to see additional data.<br \/>\n<iframe style=\"border: none; width: 100%; height: 518px;\" data-cookieconsent=\"marketing\" data-src=\"https:\/\/s3.amazonaws.com\/highsoftpictures\/visualizing_the_gender_of_us_senators.html\" width=\"320\" height=\"240\"><\/iframe><br \/>\nIf you are a statistician (or data scientist, as the fashionable term goes), realize that you are a storyteller. As such, data visualization is perhaps your most important vehicle for conveying information, not just a way to make data easier to analyze for your own needs. If data visualization is not a key part of your analysis workflow, it\u2019s time to pick up some new tricks!<\/p>\n<p>If you like R, you\u2019ll love Highcharter and Highcharts. Give them a try.<\/p>\n<p><i><em>(Editor&#8217;s note: There is currently a promotion with a steep discount on commercial use of Highcharts and Highmaps in conjunction with <\/em><\/i>Highcharter<i><em>. <b><a href=\"http:\/\/shop.highcharts.com\/foss\/\">Click here to apply.<\/a><\/b>)<\/em><\/i><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to parse public data and create interactive maps with Highcharts and the R 
language.<\/p>\n","protected":false},"author":40,"featured_media":11888,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"meta_title":"","meta_description":"","hc_selected_options":[],"footnotes":""},"categories":[210],"tags":[876,793],"coauthors":[739],"class_list":["post-9471","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","tag-highcharts-maps","tag-r"],"_links":{"self":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9471","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/comments?post=9471"}],"version-history":[{"count":1,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9471\/revisions"}],"predecessor-version":[{"id":29070,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/posts\/9471\/revisions\/29070"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media\/11888"}],"wp:attachment":[{"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/media?parent=9471"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/categories?post=9471"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/tags?post=9471"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.highcharts.com\/blog\/wp-json\/wp\/v2\/coauthors?post=9471"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}