Undoubtedly, the pandemic has brought with it a multitude of problems while enhancing the pre-existing ones. Many businesses, schools, or government procedures assimilated a digital model, while a large part of them closed their doors. Accessing those digital models became an almost inaccessible task. Years of unnoticed growth of this problem now stand out clearly. The issue of the broadband gap is now impossible to ignore. ?
You are a curious person. ✨ Microsoft’s The Verge designed an article about the bandwidth gap problem in the United States. They generated a map that highlights bivalence between counties that do not have a good internet connection.
In this blog, you will learn how to use highcharter and the tidyverse ecosystem to create an essential replica of the map created by The Verge. Are you ready? Let’s get started! ?
Libraries
You will use:
- ? highcharter for charts.
- ? janitor to clean the data.
- ? tidyTuesdayR to download the data.
- ? tidyverse for full data manipulation.
- ? widgetframe to transform the chart into an iframe.
library(highcharter) library(janitor) library(tidytuesdayR) library(tidyverse) library(widgetframe)
Get the data
The data used in this article is about the United States Broadbank Usage Percentage Dataset
tuesday_data <- tidytuesdayR::tt_load(2021, week = 20)
Extract data to use
A full description of the data can be found in this map of America's broadband problem Github link
broadband <- tuesday_data$broadband %>% glimpse()
Download map data
In the broadband
table, you will find five columns, of which, COUNTY ID
represents the identifier of the county in which the broadband usage metrics are being performed.
Therefore, COUNTY ID
can be used to locate the broadband usage values of each county on a map. Here are several alternatives that you could make use of.
However, in this blog, you will use a map of the USA provided by the Highchart
team.
In particular, highcharter
provides two utility functions for accessing and browsing different common maps:
download_map_data()
: Download the geojson data from the highchartsJS collection.get_data_from_map()
: Get the properties for each region in the map, as the keys from the map data.
map_layout_data <- download_map_data("countries/us/us-all-all.js")
If you look askance at the columns of the United States map, you will notice that there is a column named fips
; these are unique 5-digit codes to identify counties in the United States. So, if you consider this, the COUNTY ID
and fips
columns are equivalent. Yes, you may have to make a few adjustments later, but stick with that idea for now. ??
get_data_from_map(map_layout_data) %>% clean_names() %>% glimpse()
General preprocessing
To replicate the map of
The Verge by Microsoft (i.e., hereinafter referred as target map
) you will need to focus only on the two following variables:
|variable |class |description | |:------------------------------|:---------|:-----------| |county_id |double | County ID | |broadband_usage |character | Percent of people per county that use the internet at broadband speeds based on the methodology explained above. Data is from November 2019. |
But wait... maybe you should be asking yourself why you do not need other indicator variables like ST
or COUNTY NAME
?
Well, that's because in a few moments, you will use one of the tables provided by Highcharts
(described in the section above) to carry out the name mapping of each county. However, in case you don't want to use these provided maps, you could keep the other columns and use packages like zipcodeR
or tigris
to help you with the correct mappings. ?
Now, getting back to general data processing, this is a relatively straightforward task. First, and as a good practice, you will use clean_names()
provided by the janitor
package to convert the names of the data columns to a standard one. In this way, typing the columns will be simple, easy to remember, and visibly appropriate. ?
Then, you will convert the broadban_usage
column from a character
type to a numeric
type. While you might initially consider using the R base as.numeric()
or type.convert()
, type_convert()
from the reader
package provides a useful interface if you need to do manual work. Select, clean, and let type_converter()
take care of the parsing of your variables. ??
broadband_processed <- broadband %>% clean_names() %>% select(county_id, broadband_usage) %>% type_convert( col_types = cols( broadband_usage = col_double() ), na = "-" ) %>% glimpse()
Prepare data for plot
Our data is clean, however, to build the target map
with highcharter
, you will need to transform it a bit. To begin with, the data does not have information of broadband usage on certain places (i.e., data with NA
) and, the county_id
are not five-character codes, but four-character. Therefore, you must add a zero to those under four characters. ?✅
Besides, the target map
shows two categories to highlight the problem of broadband use. Specifically, those counties that use a broadband speed less than or greater than and equal to 15%. You must apply this condition to categorize the data in the broadband
table.
Finally, it is time to transform the data into a format that highcharter
can understand. To do this, you will create a list of series, one serie for each category of the map you want to display. Hence, you will use group_nest()
to create a subset of data per group. Then, you can add metadata for each series, for example, a color variable. Now, you can transform each subset of data to a list format that highcharter
can understand with list_parse()
.
Importantly, each subset has columns named exactly as the base highcharter
chart needs with some metadata. For example, to recreate the target map
, each category has the columns: name, value, and fips. Of which, only fips
is a metadata variable. ???
broadband_series_data <- broadband_processed %>% filter(!is.na(broadband_usage)) %>% transmute( # Transform to a fips format ------------------------------------------ fips = str_pad( string = county_id, width = 5, side = "left", pad = "0" ), # Categorize data ----------------------------------------------------- value = broadband_usage * 100, name = if_else( condition = value < 15, true = "< 15%", false = ">= 15%" ) ) %>% # Transform data so that highcharter accepts it. ------------------------- group_nest(name) %>% mutate( color = c("#0B0073", "#B6B8B8"), data = map(data, list_parse) ) %>% glimpse()
I need to tell you that there is another way to create choropleths of categorical areas using dataClasses() ?
However, honestly, the process of creating the dataClasses
does not seem as intuitive to me as the version used in this blog (i.e., using group_nest()
). ?
Who knows. Observe, discuss and leave in the comments how you create them! ??
United States Broadband Usage Percentages figure
Well, you've done a ton of stunts to get here, but now is the time to create the target map
. Are you ready? Let's do it! ???
The first step is to map the data to a graph. For this, you will use highchart()
to define that the desired graph is of the map
type.
Next, you will define how you want to display each of your data groups.For instance, groups below or above 15% broadband usage.
Using hc_plotOptions()
, you can configure the behavior of each series when plotting. In turn, you can also overwrite the default values by adding more columns of metadata (_e.g._, color) in the data created with list_parse()
.
So when you run hc_add_series_list()
, they will all share the parameters used in hc_plotOptions()
, but you can always overwrite some as you like ?
Although you built a similar graph to the `target map` with the previous step, it is time for you to add the corresponding text annotations. Normally, HTML within the highcharter
charts is not necessary, much less mandatory. However, if required, you can use the useHTML = TRUE
parameter within each display text call (_e.g._, hc_title()
, hc_subtitle()
, etc.). In this way, all the strings that you define will be treated as HTML, and you will have greater control over how and where to display specific pieces of text. ?
The result so far should look almost identical; only one thing is missing; the navigation buttons. By default, the navigation buttons are disabled, so you must activate them with hc_mapNavigation()
. To position it in a specific place, use the buttonOptions
parameter. With it, you can change the alignment with the parameters align
, verticalAlign
. To mention, you can also find these parameters in the positioning of the legend or texts. ?
It would be best to use frameWidget()
to display highcharter
plots within static pages. Inside a Rmarkdown or a Shiny app, it might not be necessary. ?
highchart( type = "map" ) %>% # How do you want to display the data? ---------------------------------- hc_plotOptions( map = list( allAreas = FALSE, joinBy = "fips", value = "value", mapData = map_layout_data ), series = list( states = list( inactive = list( opacity = 0.7 ) ) ) ) %>% hc_add_series_list(broadband_series_data) %>% # Display text as in original source. ----------------------------------- hc_title( text = " <p style='text-align:center;'> <b>THIS IS A MAP OF AMERICAN'S BROADBAND PROBLEM</b> <br> A county-by-county look at the broadband gap </p> ", align = "center", useHTML = TRUE ) %>% hc_subtitle( text = " By Russell Brandom and William Joel | May 10, 2021, 9:00am EDT ", align = "center", useHTML = TRUE ) %>% hc_caption( text = " Map: The Verge. The database does not include broadband rates for Oglala Lakota County, South Dakota, or Kusilvak Census Area, Alaska, because of a coding irregularity. Source: <a href='https://github.com/microsoft/USBroadbandUsagePercentages'> Microsoft </a> - <a href='https://www.theverge.com/22418074/broadband-gap-america-map-county-microsoft-data'> Get the data </a>. ", useHTML = TRUE ) %>% hc_legend( title = list( text = " Percentage of people using the internet at 25Mbps or above per county. " ), align = "left", symbolHeight = 12, symbolWidth = 12, symbolRadius = 0, squareSymbol = FALSE ) %>% hc_tooltip( pointFormat = " <b>{point.name}</b><br> <br> <b>{point.value}%</b><br> Percent using broadband speed ", useHTML = TRUE ) %>% hc_credits( text = " <b>Replica made by: </b> <a href='https://twitter.com/jvelezmagic'>@jvelezmagic</a> ", enabled = TRUE, useHTML = TRUE ) %>% # Customize zoom options ------------------------------------------------ hc_mapNavigation( enabled = TRUE, enableMouseWheelZoom = FALSE, buttonOptions = list( align = "right", verticalAlign = "bottom" ) ) %>% # Include your favorite theme! ----------------------------------------- hc_add_theme( hc_thm = hc_theme_elementary() ) %>% # Display as iframe ----------------------------------------------------- frameWidget() %>% identity()
Congratulations, you have learned to create a handy map to share and show a problem of interest. I hope to see you soon with another blog. ?✨✨✨
Feel free to share your comments in the comment section below.
Comments
There are no comments on this post
Want to leave a comment?
Comments are moderated. They will publish only if they add to the discussion in a constructive way. Please be polite.