10 Maps in R

In this chapter I want to focus on mapping data in R, and given how many ways we could do this, I have to narrow the list of what we want to build to just a few things. In particular, most often what you will want is a map showing the administrative regions of interest to you, perhaps with a pin or two thrown in to flag some vital feature. I’ve had students build these types of maps, for example, when showing the mouths of various streams being sampled and sampling locations per stream, with simple terrain maps. Another common use is to build a heat-map, showing the distribution of poverty, infant mortality, income, etc by Census Tract, place, county, and so on. Finally, there is the beauty of {leaflet}, {mapboxapi}, {mapdeck}, {mapview}, {highcharter} and other vehicles for creating some highly functional map-driven visualizations. As we work through these techniques, please remember: We are not geographers; A reliable GIS professional would make an excellent reviewer of any maps we generate for a client or for publication.

10.1 Mapping in R

There are several libraries that will allow you to do everything from drawing a simple map to complicated spatial analyses but we want just the basics so let us start with seeing how ggmap() works. We can then switch to {tmap}, {urbnmapr}, and {tigris} before we explore {highcharter}, {leaflet}, {mapview}, {mapboxapi} and {mapdeck} for some interactive maps and other spatial functionality.

10.1.1 Using {ggmap}

Let us load a few libraries that we know we will need:

library(tidyverse)
library(ggmap)
library(maps)
library(mapdata)
library(maptools)
library(ggthemes)

The {maps} package will allow us to pull data-frames with pre-populated information to generate a map of USA counties, states, the world, some European states, New Zealand, etc. We start with a simple data-frame for all USA counties. We then subset this to just the 88 counties in Ohio.

map_data(
  "county"
  ) -> usa  
usa %>%
  filter(
    region == "ohio"
    ) -> oh 
names(oh)
#> [1] "long"      "lat"       "group"     "order"     "region"    "subregion"

If you look at the variables in oh, note what they refer to:

  • long = longitude … Lines of longitude, or meridians, run between the North and South Poles and measure east-west positions. While prime meridian is assigned the value of 0 degrees, and runs through Greenwich (England), meridians to the west of the prime meridian are measured in degrees west (up to 180 degrees) and those to the east of the prime meridian are measured to in degrees east (up to 180 degrees). Athens, Ohio has a longitude of -82.101255
  • lat = latitude … Lines of latitude measure north-south position between the poles with the equator defined as 0 degrees, the North Pole defined as 90 degrees north, and the South Pole defined as 90 degrees south. Athens, Ohio has a latitude of 39.329240
  • group = an identifier that is unique for each sub-region (here the counties)
  • order = an identifier that indicates the order in which the boundary lines should be drawn
  • region = string indicator for regions (here the states)
  • subregion = string indicator for sub-regions (here the county names)

{ggplot2} will draw the map for you with geom_polygon, provided you put longitude on the x-axis, latitude on the y-axis, and indicate the grouping variable (since it will use this and the order variable to trace the pen as it draws the map). If you do not specify group you will get an incorrect map.

ggplot() + 
  geom_polygon(
    data = oh, 
    aes(
      x = long, 
      y = lat
      ), 
    fill = "white", 
    color = "black"
    ) + 
  labs(
    title = "Woah! What happened here??"
    ) 

Well, when we ask for the borders to be sketched, we have to specify how the pen should trace the border. This is done by specifying the group variable.

ggplot() + 
  geom_polygon(
    data = oh, 
    aes(
      x = long, 
      y = lat,
      group = group
      ), 
    fill = "white", 
    color = "black"
    ) + 
  labs(
    title = "This is an improvement!!"
    ) 

Ohio looks a little “spatchcocked” (flattened), doesn’t it? We can correct that with a simple coord_fixed(...) command.

ggplot() + 
  geom_polygon(
    data = oh, 
    aes(
      x = long, 
      y = lat,
      group = group
      ), 
    fill = "white", 
    color = "black"
    ) + 
  coord_fixed(1.3) +
  labs(
    title = "Much better!!"
    ) 

Most of the times you see the maps filled with some color, and while 88 unique colors would require a lot of prep-work, we can ignore such niceties for now and just look at the code of how to fill by some variable, here subregion. We can also eliminate the latitude/longitude axis labels and text, and the default grey theme.

ggplot() + 
  geom_polygon(
    data = oh, 
    aes(
      x = long, 
      y = lat,
      group = group,
      fill = subregion
      ), 
    color = "white"
    ) + 
  coord_fixed(1.3) +
  labs(
    title = "What a work of art!!"
    ) +
  ggthemes::theme_map() + 
  theme(
    legend.position = "hide"
  )

The coord_fixed() switch specifies the aspect ratio, basically indicating the relationship between the x-axis and the y-axis. The ggplot2 documentation says: “A fixed scale coordinate system forces a specified ratio between the physical representation of data units on the axes. The ratio represents the number of units on the y-axis equivalent to one unit on the x-axis. The default, ratio = 1, ensures that one unit on the x-axis is the same length as one unit on the y-axis. Ratios higher than one make units on the y axis longer than units on the x-axis, and vice-versa.” If I set it to 1.3 that is telling R to always plot the y-axis to be 1.3 times whatever is the x-axis. Why 1.3? Because that seems to be used a lot by mapmakers working with {ggmap}. Below you see the four plots.

10.1.1.1 Labeling

Of course, there are no labels on the counties so we may want county names tossed in. We may also want to pinpoint Ohio University campuses. Let us do both next. In using the county names I know they exist in subregion but are all lowercase so we will have to capitalize the first letter of each county’s name. I’ll rely on str_to_title() from the {stringr} library.

oh %>%
  mutate(
    county = stringr::str_to_title(subregion)
    ) -> oh

I am now tossing in a small data-frame that has each Campus’ latitude and longitude, and then using geom_text_repel(), geom_label_repel(), and geom_point() to add these layers to the existing colorless map. You can geocode or reverse geocode via:

load("data/latlongs.RData")
library(ggrepel)
ggplot() + 
  geom_polygon(
    data = oh, 
    aes(
      x = long, 
      y = lat, 
      group = group
      ), 
    fill = "white", 
    color = "gray"
    ) + 
  coord_fixed(1.3)  + 
  geom_label_repel(
    data = latlongs, 
    aes(
      x = lon, 
      y = lat, 
      label = Campus
      ), 
    color = "darkblue"
    ) + 
  geom_point(
    data = latlongs, 
    aes(
      x = lon, 
      y = lat
      ), 
    size = 1, 
    color = "red", 
    alpha = 0.5
    ) + 
  ggthemes::theme_map() +
  labs(title = "With geom_label_repel()")
Adding Locations of Ohio University Campuses

Figure 10.1: Adding Locations of Ohio University Campuses


ggplot() + 
  geom_polygon(
    data = oh, 
    aes(
      x = long, 
      y = lat, 
      group = group
      ), 
    fill = "white", 
    color = "gray"
    ) + 
  coord_fixed(1.3)  + 
  geom_text_repel(
    data = latlongs, 
    aes(
      x = lon, 
      y = lat, 
      label = Campus
      ), 
    color = "darkblue"
    ) + 
  geom_point(
    data = latlongs, 
    aes(
      x = lon, 
      y = lat
      ), 
    size = 1, 
    color = "red", 
    alpha = 0.5
    ) + 
  ggthemes::theme_map() +
  labs(title = "With geom_text_repel()")
Adding Locations of Ohio University Campuses

Figure 10.2: Adding Locations of Ohio University Campuses

And now finally the map with county names superimposed, but to do this I have to create a small data-frame that has the center of each county and the county name. For the centroids I’ll use the code from the linked stackoverflow post. Note that simply taking the median or the mean of lat/long for each county will be a poor approximation of the center of each county; hence the use of the code. Also note the switch to geom_text() for labeling each centroid since we want the names falling within county boundaries as much as possible and geom_..._repel() would nudge them right/left.

library(sp)
getLabelPoint <- # Returns a county-named list of label points
function(county) {Polygon(county[c('long', 'lat')])@labpt}
centroids = by(oh, oh$county, getLabelPoint) # Returns list
centroids2 <- do.call("rbind.data.frame", centroids) # Convert to Data Frame
centroids2$county = rownames(centroids)
names(centroids2) <- c('clong', 'clat', "county") # Appropriate Header
ggplot() + 
  geom_polygon(
    data = oh, 
    aes(
      x = long, 
      y = lat, 
      group = group
      ), 
    fill = "white", 
    color = "gray"
    ) + 
  coord_fixed(1.3)  + 
  geom_text(
    data = centroids2, 
    aes(
      x = clong, 
      y = clat, 
      label = county
      ), 
    color = "darkblue", 
    size = 2.25
    ) + 
  theme_map()
Adding County Names

Figure 10.3: Adding County Names

10.1.1.2 Fill with another variable

What if we wanted to build a choropleth – whereby data values would be represented by a color scheme? For instance, we may have population density, population size or poverty for each county and want to show how any of these differ across the counties by using gradations of color. Let us read in a data-set that has overall and child poverty estimates for two periods – 2007-2011 and 2012-2016.

readxl::read_excel(
  "data/acpovertyOH.xlsx", 
  sheet = "counties"
  ) -> acpovertyOH 
colnames(acpovertyOH) = c(
  "ranking", "county", "child1216", 
  "child0711", "all1216", "all0711"
  )

I’ve cleaned up the data-frame and made sure I called the county name variable county since I will have to use it to merge these poverty estimates with the oh data used for mapping.

merge(
  oh, 
  acpovertyOH[, c(2:3)], 
  by = "county", 
  all.x = TRUE, 
  sort = FALSE
  ) -> my.df 
my.df[order(my.df$order), ] -> my.df 
ggplot() + 
  geom_polygon(
    data = my.df, 
    aes(
      x = long, 
      y = lat, 
      group = group, 
      fill = child1216
      ), 
    color = "black"
    ) + 
  coord_fixed(1.3) + 
  geom_text(
    data = centroids2, 
    aes(
      x = clong, 
      y = clat, 
      label = county
      ), 
    color = "black", 
    size = 2.25
    ) + 
  scale_fill_distiller(palette = "Spectral") + 
  labs(fill = "Child Poverty %") + 
  theme_map() +
  theme(legend.position = "top")
Using Child Poverty (% in 2012-16) to fill map

Figure 10.4: Using Child Poverty (% in 2012-16) to fill map

Maybe I could improve upon this map by creating groups of child poverty? Let us see:

ApplyQuintiles <- function(x) {
  cut(
    x, 
    breaks = c(quantile(
      my.df$child1216, probs = seq(0, 1, by = 0.20))
      ), 
    labels = c("0-20","20-40","40-60","60-80","80-100"),
    include.lowest = TRUE)
  }
my.df$grouped_poverty <- sapply(
  my.df$child1216, 
  ApplyQuintiles
  )
ggplot() + 
  geom_polygon(
    data = my.df, 
    aes(
      x = long, 
      y = lat, 
      group = group, 
      fill = grouped_poverty
      ), 
    color = "black"
    ) + 
  coord_fixed(1.3) + 
  geom_text(
    data = centroids2, 
    aes(
      x = clong, 
      y = clat, 
      label = county
      ), 
    color = "white", 
    size = 2.25
    ) + 
  scale_fill_brewer(
    palette = "Set1", 
    direction = -1
    ) + labs(
      fill = "Poverty Quntiles"
      ) + 
  theme_map() + 
  theme(legend.position = "top")

Hmmm, maybe an improvement if we want to show the quantiles but the reason I am never happy with this color map is because it attenuates the differences between counties. Take the most egregious instance I see: Mahoning and Jackson get tossed in the same bin even though the former has a child poverty rate of 20.2% and Jackson has the highest rate, 34.6%. So I suppose it depends how nuanced a narrative we want to broadcast, and how important a map is to the story. My eyes look for a spatial pattern, some sort of spatial clustering, because that is really why we map, don’t we, to look for spatial relationships? These relationships would likely jump out at us if we used Census tract data since the poor tend to live with poor folks and the wealthy distance themselves from everyone else as much as possible. Well, once we turn to building more maps we will have have a chance to play around with those data too.

What if I wanted to make a nationwide map, say county-level?

usa %>%
  ggplot() +
  geom_polygon(
    aes(
      x = long,
      y = lat, 
      group = group,
      fill = subregion
      ),
    color = "white"
    ) +
  coord_fixed(1.3) + 
  ggthemes::theme_map() + 
  theme(legend.position = "hide")

Aha! Wait a minute, wait a minute. Where is Alaska? Hawaii? They are both absent, as is Puerto Rico, and the territories. This is a major limitation of {ggmap}. Consequently, what I recommend for simple maps is to use the urbnmapr package. You have two data-frames to work with – statea and counties – to generate maps. Note that Alaska and Hawaii have been “shifted” for display purposes.

#install.packages("devtools")
#devtools::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr)
names(states)
#> [1] "long"       "lat"        "order"      "hole"       "piece"      "group"     
#> [7] "state_fips" "state_abbv" "state_name"
names(counties)
#>  [1] "long"        "lat"         "order"       "hole"        "piece"      
#>  [6] "group"       "county_fips" "state_abbv"  "state_fips"  "county_name"
#> [11] "fips_class"  "state_name"
ggplot() +
  geom_polygon(
    data = states,
    aes(
      x = long,
      y = lat,
      group = group
      )
    ) +
  geom_polygon(
    data = counties,
    aes(
      x = long,
      y = lat,
      group = group,
      fill = state_abbv
      ),
    color = "grey90"
    ) +
  coord_fixed(1.3) +
  ggthemes::theme_map() +
  theme(legend.position = "hide")

10.2 Using {tigris}

One of my favorite #rstats developers is Kyle Walker because he has developed some very impressive packages to work with Census geographies and data. tigris happens to be one of the best packages to get US-centric shapefiles.

#install.packages("tigris")
#devtools::install_github('walkerke/tigris')
library(tigris)
options(tigris_use_cache = TRUE)

Say I want all the places in California. The default, without specifying the year, will be whatever boundaries are on file as the most recent batch. This is a good default unless you are really looking for a specific vintage.

places(state = "CA") -> caplaces
caplaces %>%
  ggplot() +
  geom_sf() +
  theme_void() +
  labs(title = "Places in California")

What about Census Tracts in Ohio?

tracts(state = "OH") -> ohtracts
ohtracts %>%
  ggplot() +
  geom_sf() +
  theme_void() +
  labs(title = "Census Tracts in Ohio")

School districts in Ohio, West Virginia, Kentucky, Pennsylvania, and Indiana?

school_districts(state = "OH") -> ohsds
ohsds %>%
  ggplot() +
  geom_sf() +
  theme_void() +
  labs(title = "School Districts in Ohio")

Isn’t it beautiful? The ease with which you can get a specific geography? I am always blown away by the magic every time I use {tigris}. What about roads?

roads(
  state = "OH", 
  county = "Cuyahoga"
  ) -> cuyroads
cuyroads %>%
  ggplot() +
  geom_sf(
    aes(
      color = RTTYP)
    ) +
  theme_void() +
  theme(legend.position = "top") + 
  labs(
    title = "Roads in Cuyahoga, Ohio",
    color = "")

This is all well and good but what if I want to build a map of another country? Maybe, Nepal, for example? In that case, the gadm website is a good source to manually download the files that give me the boundaries I am looking for. Here are the files for Nepal … level 0, level 1, level 2, level 3, and level 4

readRDS(
  "data/nepal/gadm36_NPL_0_sf.rds"
  ) -> nepal0
nepal0 %>%
  ggplot() +
  geom_sf() +
  labs(title = "Nepal's Country Border")

readRDS(
  "data/nepal/gadm36_NPL_1_sf.rds"
  ) -> nepal1
nepal1 %>%
  ggplot() +
  geom_sf() +
  labs(title = "Nepal's Development Regions")

readRDS(
  "data/nepal/gadm36_NPL_2_sf.rds"
  ) -> nepal2
nepal2 %>%
  ggplot() +
  geom_sf() +
  labs(title = "Nepal's Administrative Zones")

readRDS(
  "data/nepal/gadm36_NPL_3_sf.rds"
  ) -> nepal3
nepal3 %>%
  ggplot() +
  geom_sf() +
  labs(title = "Nepal's Districts")

readRDS(
  "data/nepal/gadm36_NPL_4_sf.rds"
  ) -> nepal4
nepal4 %>%
  ggplot() +
  geom_sf(
    aes(
      fill = NAME_1
    )
  ) +
  labs(
    title = "Nepal's Villages",
    fill = "Development Regions"
    )

And there you have it! Five geographies, starting from the nation to the lowest administrative region. If we had data to fill each geography with, all we would need to do is to merge/join it with these data-frames. The trick to a successful merge/join will be having a common merge key … could be the geography’s unique name, a code, whatever.

10.2.1 Using {leaflet}

Leaflet is a tremendously popular open-source JavaScript library for interactive mapping. It is mind-boggling to think about the possibilities, especially when you look at the ever-growing number of exemplars out there on the web. Here, I want to show you the basics, starting with a simple map that has tiles and markers and then moving on to adding data from data-frames. Here is the basic setup:

library(htmltools)
library(htmlwidgets)
library(leaflet) 
leaflet(width = "100%") %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(
    lng = -82.107102, 
    lat = 39.319744, 
    popup = "The Voinovich School"
    ) 

I specified the coordinates for a marker that will show up on the map, and as soon as I set these, the map is built using these coordinates as the center of the map. The map tiles are coming from OpenStreetMap but you can switch tile providers.

Well, this is all well and good, but what if I have a substantive data-frame and want to map it with {leaflet}? For example, say I have data on parking tickets issues in Philadelphia. The original data-set has 1.2 million tickets but that will overwhelm our machines so I will draw a 1% random sample.

readr::read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv"
  ) -> phltix
set.seed(13579)
phltix %>%
  sample_frac(size = 0.01) %>%
  leaflet(width = "100%") %>%
  addTiles() %>%
  addCircles(
    lng = ~ lon,
    lat = ~ lat,
    opacity = 0.10,
    radius = 1,
    color = "darkred"
    ) 

You can zoom in and see the underlying streets, etc where tickets were issued. However, wouldn’t it be nice if you could get some details when you click on a dot?

library(htmltools)
set.seed(13579)
phltix %>%
  sample_frac(size = 0.01) %>%
  mutate(
    label1 = paste("Fine: US$", fine, sep = ""),
    label2 = paste("Date-Time:", issue_datetime, sep = " "),
    mylabel = paste(label1, label2, sep = "; ")
    ) %>%
  leaflet(width = "100%") %>%
  setView(
    lat = 39.984791, 
    lng = -75.128336,
    zoom = 11
    ) %>%
  addTiles() %>%
  addCircles(
    lng = ~ lon,
    lat = ~ lat,
    opacity = 0.05,
    radius = 1,
    color = "red",
    popup = ~ mylabel
    ) 

I tweaked the earlier map by centering and zooming via setView(...) after pulling the latitude/longitude from Google maps in a browser.

Here, each circle caries the same weight, but what if we wanted the circles to represent the zip-codes, with the radius varying by how many tickets were issued in that zip-code?

library(htmltools)
set.seed(13579)
phltix %>%
  sample_frac(size = 0.01) %>%
  mutate(
    label1 = paste("Fine: US$", fine, sep = ""),
    label2 = paste("Date-Time:", issue_datetime, sep = " "),
    mylabel = paste(label1, label2, sep = "; ")
    ) %>%
  group_by(zip_code) %>%
  summarize(
    mlat = median(lat, na.rm = TRUE),
    mlon = median(lon, na.rm = TRUE),
    ntix = n(),
    tfine = paste(
      "Total Fine Collected: US$", 
      sum(fine, na.rm = TRUE), 
      sep = ""
      )
    ) %>% 
  leaflet(width = "100%") %>%
  setView(
    lat = 39.984791, 
    lng = -75.128336,
    zoom = 11
    ) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~ mlon,
    lat = ~ mlat,
    opacity = 0.05,
    radius = ~ sqrt(ntix),
    color = "red",
    popup = ~ htmlEscape(tfine)
    ) 

Voila! Note what we did here. First we grouped-by zip-code and then created the median latitude and median longitude for the zip-code. Simultaneously, we calculated ntix – how many tickets were issued in each zip-code. Then we mapped the resulting data-frame, making sure to set the radius of the circleMarkers to be driven by the square-root of the number of tickets. The popup gives us the total fine collected in that zip-code.

10.2.1.1 Tiles and tilesets

What are these tiles? With the rise of web-based mapping it became important to deliver maps speedily to millions of users, but that meant finding a way to not have to draw the map of a particular location every single time. Tiles were thus born to be static, uneditable images available with specific pre-built zoom levels. Once you pulled a tile you could use it as the base-map on which to add layers – points, polylines, polygons, etc. {leaflet} uses OpenStreetMap tiles by default, but you can use some other tile-sets if you want to. The full list of available tile-set providers is here but note that some require an API key, and some may not be free.

leaflet(width = "100%") %>% 
  setView(
    lng = -82.107102, 
    lat = 39.319744, 
    zoom = 9
    ) %>% 
  addProviderTiles(
    providers$Stamen.TonerLite
    ) 
leaflet(width = "100%") %>% 
  setView(
    lng = -82.107102, 
    lat = 39.319744, 
    zoom = 9
    ) %>% 
  addProviderTiles(
    providers$OpenTopoMap
    ) 
leaflet(width = "100%") %>% 
  setView(
    lng = -82.107102, 
    lat = 39.319744, 
    zoom = 9
    ) %>% 
  addProviderTiles(
    providers$Esri.NatGeoWorldMap
    ) 
leaflet(width = "100%") %>% 
  setView(
    lng = -82.107102, 
    lat = 39.319744, 
    zoom = 9
    ) %>% 
  addProviderTiles(
    providers$CartoDB
    ) 

10.2.1.2 Markers and labels

We could add markers, for example, perhaps the OU campuses that we pinned earlier. The Athens campus is the small red dot, and each dot will show a label if you hover the mouse over it.

load("data/latlongs.RData")
colorFactor(
  c("darkblue", "red"), 
  domain = c("Yes", "No")
  ) -> pal 
leaflet(
  latlongs, 
  width = "100%"
  ) %>% 
  addTiles() %>% 
  addCircleMarkers(
    radius = ~ifelse(Main == "Yes", 6, 10), 
    color = ~pal(Main), 
    stroke = FALSE, 
    fillOpacity = 0.15, 
    label = ~as.character(Campus)
    ) 

There are other markers available besides circles (one is shown below) and you can also create your own icons; see here for details.

icons(
  iconUrl = ifelse(
    latlongs$Main == "Yes",
    "http://leafletjs.com/examples/custom-icons/leaf-green.png",
    "http://leafletjs.com/examples/custom-icons/leaf-red.png"
    ),
  iconWidth = 19, 
  iconHeight = 47,
  iconAnchorX = 11, 
  iconAnchorY = 47,
  shadowUrl = "http://leafletjs.com/examples/custom-icons/leaf-shadow.png",
  shadowWidth = 25, 
  shadowHeight = 32,
  shadowAnchorX = 2, 
  shadowAnchorY = 31
  ) -> leafIcons 
leaflet(
  latlongs, 
  width = "100%"
  ) %>% 
  addTiles() %>% 
  addMarkers(
    label = ~as.character(Campus), 
    icon = leafIcons
    ) 

Labels can be tweaked as well. For example, below we allow the name to popup if you click the leaf and the popup will stay visible until the user closes it. The point here is that this popup can be tweaked to be as brief as a name or an address to other details (such as clicking on a school district and seeing some key information from its latest Report Card, for instance).

library(htmltools)
leaflet(
  latlongs, 
  width = "100%"
  ) %>% 
  addTiles() %>% 
  addMarkers(
    icon = leafIcons, 
    popup = ~htmlEscape(Campus)
    ) 

10.2.1.3 Polygons, Circles, etc

We have the total population size of the city that is home to each campus and so this could be used to draw a circle around each campus. Note that the radius argument is specifying what should be used. This is a mathematical function so you could use the square-root or any other transformation to adjust how much space the circle is going to occupy. Of course your choice will be dictated by some logic. These circles are not the same as the circle markers used earlier; those are drawn in pixels so as you zoom in or out, their size stays fixed. These circles have the specified radius (that you so kindly provided) converted and drawn in meters, and hence will change size as you zoom in or out because they scale (increase/decrease in size) with the zoom factor.

leaflet(
  latlongs, 
  width = "100%"
  ) %>% 
  addTiles() %>% 
  addCircles(
    lng = ~lon, 
    lat = ~lat, 
    weight = 1, 
    radius = ~Popn, 
    popup = ~htmlEscape(Campus)
    ) 

We could also draw polygons showing Ohio’s 613 school districts, given the shapefile. Say this is precisely what I’d like to highlight, with the fill = color of each school district dictated by its land area (ALAND) and the district name showing up with popup = ~NAME.

library(rgdal)
readOGR(
  "data/tl_2017_39_unsd/", 
  "tl_2017_39_unsd", 
  GDAL1_integer64_policy = TRUE
  ) -> sds 
leaflet(
  sds,
  width = "100%"
) %>%
  setView(
    lat = 40.419896, 
    lng = -82.906952,
    zoom = 6
  ) %>%
  addTiles() %>%
  addPolygons(
    color = "#444444", 
    weight = 1, 
    smoothFactor = 0.5,
    opacity = 0.5, 
    fillOpacity = 0.5,
    fillColor = ~colorQuantile("YlOrRd", ALAND)(ALAND),
    highlightOptions = highlightOptions(
      color = "white", 
      weight = 2,
      bringToFront = TRUE
      ), 
    popup = ~NAME
    ) 

Note that ~colorQuantile(“YlOrRd”, ALAND)(ALAND) is asking for five colors, based on the built-in YlOrRd palette, with the colors attached to ALAND quantiles.

This is not very a informative map, of course, and we could try to do better by using a different fill color for, say, the grade a district earned last year, student population size, student poverty, etc. Or we could throw some of these data into the popup. Here is a small example with a map of Moldova’s administrative areas. Here, I have combined the area’s name with its type (is it a city, a District, etc). To do this I extracted the mda@data fields as a data-frame to create mypopup; mda is a SpatialPolygonsDataFrame (Google it).

library(sp)
readRDS("data/MDA_adm1.rds") -> mda
mda@data -> mdadf 
mdadf %>%
  unite(
    "mypopup", 
    c(NAME_1, TYPE_1), 
    sep = ", Type: ", 
    remove = FALSE
    ) -> mdadf
colorFactor(
  topo.colors(4), 
  mda$ENGTYPE_1, 
  alpha = TRUE
  ) -> mypal
leaflet(
  mda, 
  width = "100%"
  ) %>%
  addTiles() %>%
  addPolygons(
    color = "#444444", 
    weight = 1, 
    smoothFactor = 0.5,
    opacity = 1.0, 
    fillOpacity = 0.5,
    fillColor = ~mypal(ENGTYPE_1),
    highlightOptions = highlightOptions(
      color = "white", 
      weight = 2,
      bringToFront = TRUE
      ), 
    popup = ~c(mdadf$mypopup)
    ) 

In the tweak that follows, we are essentially formatting three elements NAME_1, TYPE_1, and ENGTYPE_1 into a single popup, forcing a paragraph break between NAME_1 and the rest via the <p></p> piece of html code.

lapply(
  seq(nrow(mdadf)), 
  function(i) {
   paste0( '<p>', mdadf[i, "NAME_1"], '<p></p>', 
           mdadf[i, "TYPE_1"], ', i.e.,', 
           mdadf[i, "ENGTYPE_1"], '</p>')
    }
  ) -> mylabs 
leaflet(
  mda, 
  width = "100%"
  ) %>% 
  addTiles() %>%
  addPolygons(
    color = "#444444", 
    weight = 1, 
    smoothFactor = 0.5, 
    opacity = 1.0, 
    fillOpacity = 0.5,
    fillColor = ~mypal(ENGTYPE_1), 
    highlightOptions = highlightOptions(
      color = "white", 
      weight = 2,
      bringToFront = TRUE
      ), 
    popup = ~lapply(mylabs, HTML)
    ) 

Next we modify the popup to have the stems Name: and Type: in bold font-faces.

paste0(
  "<strong>Name: </strong>", 
  mdadf$NAME_1, "<br>",
  "<strong>Type: </strong>", 
  mdadf$TYPE_1, "<br>", 
  "(in English): ", mdadf$ENGTYPE_1
  ) -> polygon_popup 
leaflet(
  mda, 
  width = "100%"
  ) %>% 
  addTiles() %>%
  addPolygons(
    color = "#444444", 
    weight = 1, 
    smoothFactor = 0.5, 
    opacity = 1.0, 
    fillOpacity = 0.5,
    fillColor = ~mypal(ENGTYPE_1),
    highlightOptions = highlightOptions(
      color = "white", 
      weight = 2,
      bringToFront = TRUE
      ), 
    popup = ~lapply(
      polygon_popup, 
      HTML)
    ) %>% 
  addProviderTiles(
    providers$Stamen.TonerLite
    ) %>%
  setView(
    28.846153, 
    47.023193, 
    zoom = 6
    ) 

Here we switch the ProviderTiles to CartoDB.Positron

paste0(
  "<strong>Name: </strong>", mdadf$NAME_1, "<br>",
  "<strong>Type: </strong>", mdadf$TYPE_1, "<br>", 
  "(in English): ", mdadf$ENGTYPE_1
  ) -> polygon_popup 
leaflet(
  mda, 
  width = "100%"
  ) %>% 
  addPolygons(
    color = "#444444", 
    weight = 1, 
    smoothFactor = 0.5, 
    opacity = 1.0, 
    fillOpacity = 0.5,
    fillColor = ~mypal(ENGTYPE_1),
    highlightOptions = highlightOptions(
      color = "white", 
      weight = 2,
      bringToFront = TRUE
      ), 
    popup = ~lapply(
      polygon_popup, 
      HTML
      )
    ) %>% 
  addProviderTiles(
    "CartoDB.Positron"
    ) %>% 
  setView(
    28.846153, 
    47.023193, 
    zoom = 6
    ) 

10.2.1.4 Showing/hiding layers

{leaflet} allows you to very easily cede some control to the viewer – enabling the user to show or hide layers you have built – so long as you assign the layers to some group. For example, say we have a lot of information on Starbucks located in the USA. I’ll create smaller data-frames that contain specific content – ownership type, products offered, services offered, stations, and venue type. For ease of exposition I’ll also restrict the data-set to just Ohio.

load("data/starbucks2.RData")
starbucks2 %>%
  filter(State == "OH") %>%
  rename(
    Ownership = `Ownership Type`,
    Products = `Features - Products`,
    Service = `Features - Service`,
    Stations = `Features - Stations`
    ) -> starbucks.oh 

We have created four layers and now we create four versions of the data-set, one per layer. This is done below via base-R (so don;t let that throw you off).

starbucks.a = starbucks.oh[, c(3:4, 21:22)]
starbucks.b = starbucks.oh[, c(3, 6, 21:22)]
starbucks.c = starbucks.oh[, c(3, 7, 21:22)]
starbucks.d = starbucks.oh[, c(3, 8, 21:22)]

Having created these, now we assemble the layers and we build the map. You can toggle the layers on-and off as needed.

leaflet(
  starbucks.a, 
  width = "100%"
  ) %>% 
  addTiles() %>% 
  addMarkers(
    data = starbucks.a, 
    lng = ~ Longitude, 
    lat = ~ Latitude, 
    label = ~ Name, 
    popup = ~ Ownership
    ) %>%
  addMarkers(
    data = starbucks.b, 
    lng = ~ Longitude, 
    lat = ~ Latitude, 
    label = ~ Products, 
    group = ~ Products, 
    popup = ~ Products
    ) %>%
  addMarkers(
    data = starbucks.c, 
    lng = ~ Longitude, 
    lat = ~ Latitude, 
    label = ~ Service, 
    group = ~ Service, 
    popup = ~ Service
    ) %>%  
  addMarkers(
    data = starbucks.d, 
    lng = ~ Longitude, 
    lat = ~ Latitude, 
    label = ~ Stations, 
    group = ~ Stations, 
    popup = ~ Stations
    ) %>% 
  addLayersControl(
    baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
    overlayGroups = c("Products", "Service", "Stations"),
    options = layersControlOptions(collapsed = FALSE)
    ) 

10.3 Using {mapview}

I am a big fan of the {mapview} package because it allows you to interactively plot a host of information. For example, take our Starbucks data-set. Before we use our data with {mapview}, however, we will have to convert it to a spatial object. We do this with the {sf} package, creating mystars in the process.

sf::st_as_sf(
  starbucks.oh, 
  coords = c("Longitude", "Latitude"), 
  crs = 4326
  ) -> mystars 
library(mapview)
mapview(mystars) 

If you hover your mouse cursor over a point you will see the feature identifier pop-up (this is essentially the row number in the present case). But if you click a point you will see all the columns in mystars reflected as a table on the screen. You have probably realized you can change the tiles using the radio buttons, and this is a pretty neat feature as well. Now, we could also restrict the columns to be displayed as layers in the table, as shown below.

mapview(
  mystars, 
  zcol = c("Stations", "City", "Street Address"),
  legend = FALSE
  ) 

What if we wanted to display polygons representing each county?

counties(state = "OH") -> ohcts
sf::st_as_sf(ohcts) -> octs
mapview(
  list(mystars, octs),
  layer.name = c("Starbucks", "Counties")
  ) 

10.4 Using {mapboxapi} and {mapdeck}

We will look at a single example here but you should trawl the web, starting with Kyle’s site here to see key features of {mapboxapi} in action. The simple example below shows the time needed to get to The Boston Commons, from each Census Tract in Boston (MA).

library(tigris)
tracts(
  "MA", 
  "Suffolk", 
  cb = TRUE, 
  class = "sf"
  ) -> bostracts
library(mapboxapi)
mb_geocode(
  "Boston Commons, Boston MA"
  ) -> toBC
mb_matrix(bostracts, toBC) -> bctime
bostracts$time <- bctime
library(mapdeck)
mapdeck(
  style = mapdeck_style("light"),
  location = c(-71.065683, 42.355296),
  zoom = 10
  ) %>%
  add_polygon(
    data = bostracts, 
    fill_colour = "time",
    fill_opacity = 0.3,
    legend = TRUE,
    update_view = FALSE
    ) 

How far can you walk in 5, 10, 15, 20, and 25 minutes from Faneuil Hall? Well, here we lean on isochrones. Say what?

Isochrones are shapes that represent the reachable area around one or more locations within a given travel time. Isochrones can be computed for driving, walking, or cycling routing profiles, and can optionally be set to return distances rather than times.

mb_isochrone(
  "Faneuil Hall, Boston, MA", 
  time = c(5, 10, 15, 20, 25),
  profile = "walking"
  ) -> fhdistance 
mapdeck(
  style = mapdeck_style("light"),
  location = c(-71.055050, 42.360351),
  zoom = 13
  ) %>%
  add_polygon(
    data = fhdistance, 
    fill_colour = "time",
    fill_opacity = 0.20,
    legend = TRUE,
    update_view = FALSE
    ) 

What if we were driving instead of walking? How far could we get?

mb_isochrone(
  "Faneuil Hall, Boston, MA", 
  time = c(5, 10, 15, 20, 25),
  profile = "driving"
  ) -> fhdistance 
mapdeck(
  style = mapdeck_style("light"),
  location = c(-71.055050, 42.360351),
  zoom = 10
  ) %>%
  add_polygon(
    data = fhdistance, 
    fill_colour = "time",
    fill_opacity = 0.20,
    legend = TRUE,
    update_view = FALSE
    ) 

Isochrones are not the only feature that is of value here but one that I find very useful in my work. One of the other valuable functions I use a lot is the route optimization ability. For example, what would be the optimal route to get to the Logan International Airport from Faneuil Hall?

library(sf)
mb_directions(
  origin = "Faneuil Hall, Boston, MA",
  destination = "Logan International Airport, Boston, MA",
  profile = "driving",
  steps = TRUE,
  language = "en"
  ) -> mybase_route
leaflet(mybase_route) %>%
  addMapboxTiles(
    style_id = "light-v9",
    username = "mapbox"
    ) %>%
  addPolylines() 

What if there are various sites you will be stopping at on some other trip?

library(sf)
data.frame(
    X = c(-0.209307, -0.185875, -0.216877, -0.233511, -0.234541),
    Y = c(5.556019, 5.58031, 5.582528, 5.566771, 5.550209)
  ) %>%
  st_as_sf(
    coords = c("X", "Y"), 
    crs = 4326
    ) -> to_visit
mb_optimized_route(
  to_visit,
  profile = "driving-traffic"
  ) -> optimized_route 
static_mapbox(
  style_id = "light-v9",
  username = "mapbox",
  overlay_sf = optimized_route$route,
  overlay_style = list(stroke = "#144708"),
  overlay_markers = prep_overlay_markers(
    data = optimized_route$waypoints,
    marker_type = "pin-l",
    label = optimized_route$waypoints$waypoint_index,
    color = "144708"
  ),
  height = 400,
  width = 1200
  ) 

10.5 Using {highcharter}

This is one of my favorite libraries because it makes it very easy to create interactive maps, both the usual spatial maps but also heat-maps. Let us start with a heatmap, and the example I will use is the stock example from highcharter. Say we have data on vaccines and infectious diseases

The data are contained in data(vaccines) so be sure to load it and review the contents. You have three variables – year, state, and count. The data span 1928 through 2003. fnltp is a Javascript-based functional tooltip that guides what happens when you click on a point in the graphic. The nice thing about this function is that it allows you to smoothly control how and what data will be displayed.

The plotline is essentially drawing a vertical line at 1963 to indicate the year vaccines were introduced.

library(highcharter)
data(vaccines)
JS(
  "function(){
  return this.point.x + ' ' +
  this.series.yAxis.categories[this.point.y] + ': ' +
  Highcharts.numberFormat(this.point.value, 2);
  }"
  ) -> fntltp 
list(
  color = "#fde725", 
  value = 1963, 
  width = 2, 
  zIndex = 5,
  label = list(
    text = "Vaccine Intoduced", 
    verticalAlign = "top",
    style = list(color = "#606060"), 
    textAlign = "left",
    rotation = 0, y = -5
    )
  ) -> plotline 

Now we build the actual plot via hchart(...) with the usual specifications. For example, what data to use data = vaccines, what kind of a graphic to generate via heatmap and then the aesthetics via hcaes(...) You can guide how many colors are to be used in the graphic and hence the legend by specifying the number of “stops” and the color palette.

hchart(
  vaccines, 
  "heatmap", 
  hcaes(
    x = year,
    y = state, 
    value = count
    )
  ) %>%
  hc_colorAxis(
    stops = color_stops(
      10, 
      viridisLite::viridis(10, direction = -1)),
    type = "logarithmic"
    ) %>%
  hc_yAxis(
    title = list(text = ""),
    reversed = TRUE, 
    offset = -20,
    tickLength = 0,
    gridLineWidth = 0, 
    minorGridLineWidth = 0,
    labels = list(
      style = list(fontSize = "9px")
      )
  ) %>%
  hc_tooltip(
    formatter = fntltp
    ) %>%
  hc_xAxis(
    plotLines = list(plotline)
    ) %>%
  hc_title(
    text = "Infectious Diseases and Vaccines"
    ) %>%
  hc_subtitle(
    text = "Number of cases per 100,000 people"
  ) %>% 
  hc_legend(
    layout = "horizontal",
    verticalAlign = "top",
    align = "left",
    valueDecimals = 0
  ) %>%
  hc_size(height = 1000) 

Heat-maps are well and good but what if we want a simple interactive spatial map, of Spain, for example?

hcmap(
  "https://code.highcharts.com/mapdata/countries/es/es-all.js"
  )

What is this? Just a map of the country and regions/provinces but no interactivity. Exactly. We will build from here but before we do that, let us note where this javascript file was sourced … from the main collection of maps Let us download the data so we can explore the contents. You will notice that once the file downloads, I am reading in specific elements in espana and as you work through the features you will see a total of 53 features. Each a region. So maybe we want the names to be displayed when we click on a point.

download_map_data(
  "https://code.highcharts.com/mapdata/countries/es/es-all.js"
  ) -> espana 
espana$features[[1]]$properties$name
#> [1] "Baleares"
espana$features[[2]]$properties$name
#> [1] "Valladolid"

Could we make reading the map data easier? You bet.

get_data_from_map(
  espana
  ) -> esp.df
glimpse(esp.df)
#> Rows: 53
#> Columns: 20
#> $ `hc-group`    <chr> "admin1", "admin1", "admin1", "admin1", "admin1", "admin1", …
#> $ `hc-middle-x` <dbl> 0.56, 0.49, 0.48, 0.43, 0.43, 0.43, 0.49, 0.40, 0.50, 0.41, …
#> $ `hc-middle-y` <dbl> 0.34, 0.67, 0.49, 0.52, 0.44, 0.39, 0.45, 0.34, 0.52, 0.43, …
#> $ `hc-key`      <chr> "es-pm", "es-va", "es-le", "es-me", "es-p", "es-s", "es-na",…
#> $ `hc-a2`       <chr> "PM", "VA", "LE", "ME", "PA", "CA", "NA", "CE", "CU", "VI", …
#> $ labelrank     <chr> "3", "3", "3", "9", "3", "3", "3", "9", "3", "3", "3", "3", …
#> $ hasc          <chr> "ES.PM", "ES.CL", "ES.CL", "ES.CE", "ES.CL", "ES.CB", "ES.NA…
#> $ `woe-id`      <chr> "12602088", "12602122", "12602117", "55862984", "12602118", …
#> $ fips          <chr> "SP07", "SP85", "SP85", "SP00", "SP85", "SP85", "SP88", "SP0…
#> $ `postal-code` <chr> "PM", "VA", "LE", "ME", "P", "S", "NA", "CE", "CU", "VI", "S…
#> $ name          <chr> "Baleares", "Valladolid", "León", "Melilla", "Palencia", "Ca…
#> $ country       <chr> "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain…
#> $ `type-en`     <chr> "Autonomous Community", "Autonomous Community", "Autonomous …
#> $ region        <chr> "Islas Baleares", "Castilla y León", "Castilla y León", "Mel…
#> $ longitude     <chr> "2.99156", "-4.83256", "-5.75872", "-2.94015", "-4.59868", "…
#> $ `woe-name`    <chr> "Islas Baleares", "Castilla y León", "Castilla y León", "Mel…
#> $ latitude      <chr> "39.6162", "41.5715", "42.6006", "35.2934", "42.3386", "43.2…
#> $ `woe-label`   <chr> "Balearic Islands, ES, Spain", "Castille and Leon, ES, Spain…
#> $ type          <chr> "Comunidad Autónoma", "Comunidad Autónoma", "Comunidad Autón…
#> $ `alt-name`    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

There you have it! Okay, now let us get some data for the map. I have the 2021 population for Spain’s autonomous communities in 2021. We will build it up such that the names displayed are those of the region but if you click on a place you see the name of the autonomous community and the population size. The color axis is also being driven by the population.

readxl::read_excel(
  "data/spain-regions.xlsx"
  ) -> spain
highchart() %>%
  hc_title(
    text = "2021 Population of Spanish Communities"
    ) %>%
  hc_subtitle(
    text = "Source: Wikipedia"
    ) %>%
  hc_add_series_map(
    espana, spain,
    name = "City & Population",
    value = "population", 
    joinBy = c("name", "name"),
    dataLabels = list(
      enabled = TRUE,
      format = "{point.properties.region}"
    )
  ) %>%
  hc_colorAxis(
    stops = color_stops()
    ) %>%
  hc_legend(
    layout = "vertical",
    verticalAlign = "top",
    align = "left",
    valueDecimals = 0,
    valueSuffix = ""
    ) %>%
  hc_mapNavigation(
    enabled = TRUE
    ) %>%
  hc_size(height = 500) 

If we don’t really care about the regions, well then we could tweak this as follows.

highchart() %>%
  hc_title(
    text = "2021 Population of Spanish Communities"
    ) %>%
  hc_subtitle(
    text = "Source: Wikipedia"
    ) %>%
  hc_add_series_map(
    espana, spain,
    name = "2021 Population Size:",
    value = "population", 
    joinBy = c("name", "name"),
    dataLabels = list(
      enabled = TRUE,
      format = "{point.properties.name} ({point.properties.region})"
    )
  ) %>%
  hc_tooltip(
    useHTML = TRUE, 
    headerFormat = "",
    pointFormat = "2021 Population of {point.name} was {point.properties.population}"
    ) %>%
  hc_colorAxis(
    stops = color_stops()
    ) %>%
  hc_legend(
    layout = "vertical",
    verticalAlign = "top",
    align = "left",
    valueDecimals = 0,
    valueSuffix = ""
    ) %>%
  hc_mapNavigation(
    enabled = TRUE
    ) %>%
  hc_size(height = 500) 

Let us close with something other than a map … a donut chart of the same data! It will shows us for each region, the relative population share of each autonomous community in that region. We can follow that up with a treemap.

data_to_hierarchical(
  spain,
  c(region, name),
  population
  ) -> esp_donut
hchart(
  esp_donut, 
  type = "sunburst"
  )
hchart(
  esp_donut, 
  type = "treemap"
  )