Using Highcharter in Tidytuesday Internet Access


Blog Posts Data Journalism Data Science Highcharts Highmaps R Tutorials0 comments

Featured image

 
Undoubtedly, the pandemic has brought with it a multitude of problems while enhancing the pre-existing ones. Many businesses, schools, or government procedures assimilated a digital model, while a large part of them closed their doors. Accessing those digital models became an almost inaccessible task. Years of unnoticed growth of this problem now stand out clearly. The issue of the broadband gap is now impossible to ignore. 🧐

You are a curious person. ✨ Microsoft’s The Verge designed an article about the bandwidth gap problem in the United States. They generated a map that highlights bivalence between counties that do not have a good internet connection.

In this blog, you will learn how to use highcharter and the tidyverse ecosystem to create an essential replica of the map created by The Verge. Are you ready? Let’s get started! πŸ™Œ

Libraries

You will use:

  • πŸ“ˆ highcharter for charts.
  • πŸͺ£ janitor to clean the data.
  • πŸ“₯ tidyTuesdayR to download the data.
  • πŸ“ tidyverse for full data manipulation.
  • πŸͺž widgetframe to transform the chart into an iframe.
library(highcharter)
library(janitor)
library(tidytuesdayR)
library(tidyverse)
library(widgetframe)

Get the data

The data used in this article is about the United States Broadbank Usage Percentage Dataset

tuesday_data <- tidytuesdayR::tt_load(2021, week = 20)

Extract data to use

A full description of the data can be found in this map of America's broadband problem Github link

broadband <- tuesday_data$broadband %>%
  glimpse()

Download map data

In the broadband table, you will find five columns, of which, COUNTY ID represents the identifier of the county in which the broadband usage metrics are being performed.

Therefore, COUNTY ID can be used to locate the broadband usage values of each county on a map. Here are several alternatives that you could make use of.
However, in this blog, you will use a map of the USA provided by the Highchart team.

In particular, highcharter provides two utility functions for accessing and browsing different common maps:

  • download_map_data(): Download the geojson data from the highchartsJS collection.
  • get_data_from_map(): Get the properties for each region in the map, as the keys from the map data.
map_layout_data <- download_map_data("countries/us/us-all-all.js")

If you look askance at the columns of the United States map, you will notice that there is a column named fips; these are unique 5-digit codes to identify counties in the United States. So, if you consider this, the COUNTY ID and fips columns are equivalent. Yes, you may have to make a few adjustments later, but stick with that idea for now. πŸ‘€πŸ‘Œ

get_data_from_map(map_layout_data) %>%
  clean_names() %>% 
  glimpse()

General preprocessing

To replicate the map of
The Verge by Microsoft (i.e., hereinafter referred as target map) you will need to focus only on the two following variables:

|variable                       |class     |description |
|:------------------------------|:---------|:-----------|
|county_id                      |double    | County ID |
|broadband_usage                |character | Percent of people per county that use the internet at broadband speeds based on the methodology explained above. Data is from November 2019. |

But wait... maybe you should be asking yourself why you do not need other indicator variables like ST or COUNTY NAME?

Well, that's because in a few moments, you will use one of the tables provided by Highcharts (described in the section above) to carry out the name mapping of each county. However, in case you don't want to use these provided maps, you could keep the other columns and use packages like zipcodeR or tigris to help you with the correct mappings. πŸ€”

Now, getting back to general data processing, this is a relatively straightforward task. First, and as a good practice, you will use clean_names() provided by the janitor package to convert the names of the data columns to a standard one. In this way, typing the columns will be simple, easy to remember, and visibly appropriate. πŸ˜‡

Then, you will convert the broadban_usage column from a character type to a numeric type. While you might initially consider using the R base as.numeric() or type.convert(), type_convert() from the reader package provides a useful interface if you need to do manual work. Select, clean, and let type_converter() take care of the parsing of your variables. πŸ’»πŸ™Œ

broadband_processed <- broadband %>%
  clean_names() %>%
  select(county_id, broadband_usage) %>%
  type_convert(
    col_types = cols(
      broadband_usage = col_double()
    ),
    na = "-"
  ) %>%
  glimpse()

Prepare data for plot

Our data is clean, however, to build the target map with highcharter, you will need to transform it a bit. To begin with, the data does not have information of broadband usage on certain places (i.e., data with NA) and, the county_id are not five-character codes, but four-character. Therefore, you must add a zero to those under four characters. πŸ˜¬βœ…

Besides, the target map shows two categories to highlight the problem of broadband use. Specifically, those counties that use a broadband speed less than or greater than and equal to 15%. You must apply this condition to categorize the data in the broadband table.

Finally, it is time to transform the data into a format that highcharter can understand. To do this, you will create a list of series, one serie for each category of the map you want to display. Hence, you will use group_nest() to create a subset of data per group. Then, you can add metadata for each series, for example, a color variable. Now, you can transform each subset of data to a list format that highcharter can understand with list_parse().

Importantly, each subset has columns named exactly as the base highcharter chart needs with some metadata. For example, to recreate the target map, each category has the columns: name, value, and fips. Of which, only fips is a metadata variable. πŸ˜‹πŸ‘‰πŸ‘ˆ

broadband_series_data <- broadband_processed %>%
  filter(!is.na(broadband_usage)) %>%
  transmute(
    # Transform to a fips format ------------------------------------------
    fips = str_pad(
      string = county_id,
      width = 5,
      side = "left",
      pad = "0"
    ),
    # Categorize data -----------------------------------------------------
    value = broadband_usage * 100,
    name = if_else(
      condition = value < 15,
      true = "< 15%",
      false = ">= 15%"
    )
  ) %>%
  # Transform data so that highcharter accepts it. -------------------------
  group_nest(name) %>%
  mutate(
    color = c("#0B0073", "#B6B8B8"),
    data = map(data, list_parse)
  ) %>%
  glimpse()

I need to tell you that there is another way to create choropleths of categorical areas using dataClasses() 😬
However, honestly, the process of creating the dataClasses does not seem as intuitive to me as the version used in this blog (i.e., using group_nest()). πŸ˜…
Who knows. Observe, discuss and leave in the comments how you create them! 🧐πŸ₯΄

United States Broadband Usage Percentages figure

Well, you've done a ton of stunts to get here, but now is the time to create the target map. Are you ready? Let's do it! 🀭🀭🀭

The first step is to map the data to a graph. For this, you will use highchart() to define that the desired graph is of the map type.

Next, you will define how you want to display each of your data groups.For instance, groups below or above 15% broadband usage.
Using hc_plotOptions(), you can configure the behavior of each series when plotting. In turn, you can also overwrite the default values by adding more columns of metadata (_e.g._, color) in the data created with list_parse().
So when you run hc_add_series_list(), they will all share the parameters used in hc_plotOptions(), but you can always overwrite some as you like πŸ˜‹

Although you built a similar graph to the `target map` with the previous step, it is time for you to add the corresponding text annotations. Normally, HTML within the highcharter charts is not necessary, much less mandatory. However, if required, you can use the useHTML = TRUE parameter within each display text call (_e.g._, hc_title(), hc_subtitle(), etc.). In this way, all the strings that you define will be treated as HTML, and you will have greater control over how and where to display specific pieces of text. πŸ€—

The result so far should look almost identical; only one thing is missing; the navigation buttons. By default, the navigation buttons are disabled, so you must activate them with hc_mapNavigation(). To position it in a specific place, use the buttonOptions parameter. With it, you can change the alignment with the parameters align, verticalAlign. To mention, you can also find these parameters in the positioning of the legend or texts. πŸ€“

It would be best to use frameWidget() to display highcharter plots within static pages. Inside a Rmarkdown or a Shiny app, it might not be necessary. 😬

highchart(
  type = "map"
) %>%
  # How do you want to display the data? ----------------------------------
  hc_plotOptions(
    map = list(
      allAreas = FALSE,
      joinBy = "fips",
      value = "value",
      mapData = map_layout_data
    ),
    series = list(
      states = list(
        inactive = list(
          opacity = 0.7
        )
      )
    )
  ) %>% 
  hc_add_series_list(broadband_series_data) %>%
  # Display text as in original source. -----------------------------------
  hc_title(
    text = "
      <p style='text-align:center;'>
        <b>THIS IS A MAP OF AMERICAN'S BROADBAND PROBLEM</b>
        <br>
        A county-by-county look at the broadband gap
      </p>
    ",
    align = "center",
    useHTML = TRUE
  ) %>%
  hc_subtitle(
    text = "
      By Russell Brandom
      and William Joel | May 10, 2021, 9:00am EDT
    ",
    align = "center",
    useHTML = TRUE
  ) %>%
  hc_caption(
    text = "
      Map: The Verge. The database does not include broadband rates for
      Oglala Lakota County, South Dakota, or Kusilvak Census Area, Alaska,
      because of a coding irregularity. Source: 
      <a href='https://github.com/microsoft/USBroadbandUsagePercentages'>
        Microsoft
      </a> 
       - 
      <a href='https://www.theverge.com/22418074/broadband-gap-america-map-county-microsoft-data'>
      Get the data
      </a>.
    ",
    useHTML = TRUE
  ) %>%
  hc_legend(
    title = list(
      text = "
        Percentage of people using the internet at 25Mbps or above per county.
        "
      ),
    align = "left",
    symbolHeight = 12,
    symbolWidth = 12,
    symbolRadius = 0,
    squareSymbol = FALSE
  ) %>%
  hc_tooltip(
      pointFormat = "
        <b>{point.name}</b><br>
        <br>
        <b>{point.value}%</b><br>
        Percent using broadband speed
        ",
      useHTML = TRUE
  ) %>%
  hc_credits(
    text = "
      <b>Replica made by: </b> 
      <a href='https://twitter.com/jvelezmagic'>@jvelezmagic</a>
    ",
    enabled = TRUE,
    useHTML = TRUE
  ) %>%
  # Customize zoom options ------------------------------------------------
  hc_mapNavigation(
    enabled = TRUE,
    enableMouseWheelZoom = FALSE,
    buttonOptions = list(
      align = "right",
      verticalAlign = "bottom"
    )
  ) %>%
  # Include your favorite theme! -----------------------------------------
  hc_add_theme(
    hc_thm = hc_theme_elementary()
  ) %>%
  # Display as iframe -----------------------------------------------------
  frameWidget() %>% 
  identity()

A map of American's Broadband Problem

Congratulations, you have learned to create a handy map to share and show a problem of interest. I hope to see you soon with another blog. πŸ˜‹βœ¨βœ¨βœ¨

Feel free to share your comments in the comment section below.

Consent for marketing cookies needs to be given to post comments