8 Working with Census Data in R

If you work with Census data, you can explore it via data.census.gov, tap other Census Bureau resources to find and download the data you need, or rely on R packages (the more efficient option). Let us see some of these packages in action, starting with censusapi, tidycensus, and ipumsr.

8.1 Using {censusapi}

Authored by Hannah Recht, this package lets you access nearly all Census products through the Census Bureau's APIs. First get a Census API key, then install the package and store the key as shown below. The Census API key is easily obtained from here.

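install.packages("devtools")
devtools::install_github("hrecht/censusapi")
# Set the key for the current session (the key must be a quoted string)
Sys.setenv(CENSUS_KEY = "YOURKEYHERE")
# Reload .Renviron, in case the key is also stored there
readRenviron("~/.Renviron")
# Check that the expected key is printed
Sys.getenv("CENSUS_KEY")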
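Note that Sys.setenv() only sets the key for the current session. One way to keep it across sessions is to write a CENSUS_KEY=... line to .Renviron yourself. The sketch below is a minimal illustration: the helper name save_census_key is hypothetical (it is not part of {censusapi}), and the demonstration writes to a temporary file so that nothing in your home directory is touched.

```r
# Hypothetical helper (not part of censusapi): append a CENSUS_KEY line to an
# .Renviron-style file and reload it into the current session.
save_census_key <- function(key, path = "~/.Renviron") {
  path <- path.expand(path)
  old <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  # Drop any earlier CENSUS_KEY entry so the file holds a single value
  old <- old[!grepl("^CENSUS_KEY=", old)]
  writeLines(c(old, paste0("CENSUS_KEY=", key)), path)
  readRenviron(path)
  invisible(path)
}

# Demonstration with a placeholder key and a temporary file
demo_path <- tempfile(fileext = ".Renviron")
save_census_key("YOURKEYHERE", path = demo_path)
Sys.getenv("CENSUS_KEY")
```

To use it for real, call the helper with your own key and leave path at its default.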

Now we can get to work. Let us start by loading the library and then seeing what Census APIs are available.

library(censusapi)
listCensusApis() -> apis

Each API could have slightly varying parameters so Hannah recommends checking the API documentation available here but points out some common parameters – name (the API’s name), vintage (the year of the dataset), vars (the variables you want to access), and region (state, county, etc).
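To see how these parameters fit together, it can help to assemble a request URL by hand. The sketch below is an illustration only, not the package's internal code; build_census_url is a hypothetical helper, and the URL layout (vintage and name in the path, variables in get, region in for, the enclosing region in in) is assumed from the Census Bureau's published API examples.

```r
# Illustration only: assemble a Census API request URL by hand to show where
# each argument ends up. This is not censusapi's internal code.
build_census_url <- function(name, vars, region, regionin = NULL,
                             vintage = NULL, key = "YOURKEYHERE") {
  base <- "https://api.census.gov/data"
  # Time-series endpoints have no vintage in the path
  path <- if (is.null(vintage)) name else paste(vintage, name, sep = "/")
  query <- c(
    paste0("get=", paste(vars, collapse = ",")),
    paste0("for=", region),
    if (!is.null(regionin)) paste0("in=", regionin),
    paste0("key=", key)
  )
  paste0(base, "/", path, "?", paste(query, collapse = "&"))
}

build_census_url(
  name = "dec/sf1", vintage = 2010,
  vars = c("NAME", "P001001"),
  region = "county:*", regionin = "state:39"
)
```

In practice getCensus() builds and sends the request for you; the sketch only shows which argument lands where.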

It is easy to find variable names via the built-in function listCensusMetadata(). Below we retrieve the variable lists for the Bureau's Small Area Health Insurance Estimates (SAHIE) and its Small Area Income and Poverty Estimates (SAIPE).

listCensusMetadata(
  name = "timeseries/healthins/sahie", 
  type = "variables"
  ) -> sahie_vars
listCensusMetadata(
  name = "timeseries/poverty/saipe",
  type = "variables"
  ) -> saipe_vars 

The SAHIE variables are now stored in sahie_vars and the SAIPE variables in saipe_vars; print either object to browse them.
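Variable lists can be long, so it is often quicker to search the labels than to scroll. The sketch below assumes the metadata data frame has name and label columns; the rows in sahie_vars_demo are made up for illustration so that the example runs without an API call.

```r
# Stand-in for the data frame returned by listCensusMetadata(); the rows and
# label wording are made up for illustration.
sahie_vars_demo <- data.frame(
  name  = c("NAME", "IPRCAT", "PCTUI_PT", "NUI_PT"),
  label = c("Geography name", "Income-to-poverty ratio category",
            "Percent uninsured, estimate", "Number uninsured, estimate"),
  stringsAsFactors = FALSE
)

# Keep rows whose label mentions "uninsured", ignoring case
hits <- sahie_vars_demo[grepl("uninsured", sahie_vars_demo$label,
                              ignore.case = TRUE), ]
hits$name
```

The same subsetting should work on the real sahie_vars or saipe_vars objects, provided the column names match.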

If you want to see what geographies are available, switch type = to geography:

listCensusMetadata(
  name = "timeseries/healthins/sahie",
  type = "geography"
  ) -> sahie_geos 
listCensusMetadata(
  name = "timeseries/poverty/saipe",
  type = "geography"
  ) -> saipe_geos

Similarly, the SAHIE geographies are now stored in sahie_geos and the SAIPE geographies in saipe_geos; print either object to inspect them.

Let us get county-level data for Ohio (state FIPS code 39), requesting the SAHIE for 2019 and the SAIPE for 2020.

# The key is read from the CENSUS_KEY environment variable set earlier
getCensus(
  name = "timeseries/healthins/sahie",
  vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"),
  region = "county:*",
  regionin = "state:39",
  time = 2019,
  key = Sys.getenv("CENSUS_KEY")
  ) -> sahie_counties
getCensus(
  name = "timeseries/poverty/saipe",
  vars = c("NAME", "SAEPOVRTALL_PT", "SAEMHI_PT"),
  region = "county:*",
  regionin = "state:39",
  YEAR = 2020,
  key = Sys.getenv("CENSUS_KEY")
  ) -> saipe_counties

Almost everything is read in as a character (see the chr flag in the data frame), so we’ll have to convert the variables we want to numeric.

library(tidyverse)
saipe_counties %>%
  mutate(
    prate = as.numeric(SAEPOVRTALL_PT),
    mdhinc = as.numeric(SAEMHI_PT)
    ) -> saipe_counties
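When several columns need converting, naming each one in its own mutate() line gets tedious. One alternative is dplyr's across(), sketched below on a small made-up data frame (saipe_demo and its values are illustrative, not real estimates). Unlike the code above, this overwrites the original columns rather than creating new ones.

```r
library(dplyr)

# Made-up stand-in for saipe_counties: every column arrives as character
saipe_demo <- data.frame(
  NAME = c("County A", "County B"),
  SAEPOVRTALL_PT = c("12.5", "30.1"),
  SAEMHI_PT = c("41000", "38500"),
  stringsAsFactors = FALSE
)

# Convert the two estimate columns in place; NAME stays character
saipe_demo <- saipe_demo %>%
  mutate(across(c(SAEPOVRTALL_PT, SAEMHI_PT), as.numeric))

str(saipe_demo)
```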

If you want data for many geographies, the package can fetch them with a short loop. For example, to get tract-level data for every tract in the country, loop over the state FIPS codes as shown below:

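# fips is censusapi's built-in vector of state FIPS codes
fips
tracts <- NULL
for (f in fips) {
  # Tracts are nested within states, so request one state at a time
  stateget <- paste("state:", f, sep = "")
  temp <- getCensus(
    name = "sf3",
    vintage = 1990,
    vars = c("P0070001", "P0070002", "P114A001"),
    region = "tract:*",
    regionin = stateget,
    key = Sys.getenv("CENSUS_KEY")
  )
  tracts <- rbind(tracts, temp)
}
head(tracts)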

Notice that you had to specify the region, and since tracts are nested within states, you had to add regionin = stateget.
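Growing a data frame with rbind() inside a loop works, but it copies the accumulated data on every pass. A common alternative is to collect the pieces in a list and bind them once at the end. The sketch below shows that pattern with a hypothetical stand-in function, fetch_state, that returns made-up rows in place of a real getCensus() call, so it runs without an API key.

```r
# Hypothetical stand-in for a per-state getCensus() call, so the pattern can
# be run without an API key; it returns a tiny made-up data frame.
fetch_state <- function(f) {
  data.frame(state = f, tract = c("000100", "000200"),
             stringsAsFactors = FALSE)
}

state_codes <- c("01", "02", "04")  # a few state FIPS codes

# Collect one data frame per state, then bind them in a single step
pieces <- lapply(state_codes, fetch_state)
tracts_demo <- do.call(rbind, pieces)

nrow(tracts_demo)
```

To use the pattern with real data, replace the body of fetch_state with the getCensus() call and pass the package's fips vector instead of state_codes.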

The 2010 Decennial Census summary file 1 requires you to specify a state and county to retrieve block-level data. Use region to request block level data, and regionin to specify the desired state and county.

Here we run it for Athens County, Ohio.

getCensus(
  name = "dec/sf1",
  vintage = 2010,
  vars = c("P001001", "P003001"),
  # Block-level data: all blocks in Athens County (009), Ohio (39)
  region = "block:*",
  regionin = "state:39+county:009",
  key = Sys.getenv("CENSUS_KEY")
  ) -> oh2010sf1a

For the 2000 Decennial Census summary file 1, tract is also required to retrieve block-level data. This example requests data for all blocks within Census tract 972600 in county 009 (Athens County) of state 39 (Ohio).

getCensus(
  name = "dec/sf1",
  vintage = 2000,
  vars = c("P001001", "P003001"),
  region = "block:*",
  # Nested geographies are joined with "+" and no spaces
  regionin = "state:39+county:009+tract:972600",
  key = Sys.getenv("CENSUS_KEY")
  ) -> oh2000sf1a
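Because regionin strings depend on zero-padded FIPS codes, they are easy to mistype. The sketch below builds them from plain numbers; make_regionin is a hypothetical helper (not part of {censusapi}) that assumes the usual widths of two digits for states, three for counties, and six for tracts.

```r
# Hypothetical helper: build a regionin string from numeric or character
# codes, padding to the usual FIPS widths (state 2, county 3, tract 6).
make_regionin <- function(state, county = NULL, tract = NULL) {
  pad <- function(x, width) formatC(as.integer(x), width = width, flag = "0")
  parts <- c(
    paste0("state:", pad(state, 2)),
    if (!is.null(county)) paste0("county:", pad(county, 3)),
    if (!is.null(tract)) paste0("tract:", pad(tract, 6))
  )
  paste(parts, collapse = "+")
}

make_regionin(39, 9)
make_regionin("39", "009", "972600")
```

The result can be passed straight to the regionin argument of getCensus().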

There is a lot more that you can do with the package, and to get a sense of the possible, look at the voluminous example list.

8.2 Using {tidycensus}

The package’s author, Kyle Walker, describes it thus:

tidycensus is an R package that allows users to interface with the US Census Bureau’s decennial Census and five-year American Community APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.

Install the Census API key as shown below.

library(tidyverse)
install.packages("tidycensus")
library(tidycensus)
census_api_key(
  "YOUR API KEY GOES HERE", 
  install = TRUE
  )

The install = TRUE switch will add the key to your .Renviron. The key should then be automatically read in future sessions with {tidycensus} so be sure to not use that switch for subsequent runs. Otherwise you will be repeatedly asked if you want to overwrite the previous setting in .Renviron.
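To avoid reinstalling the key by accident, you can first check whether one is already available. {tidycensus} reads the key from the CENSUS_API_KEY environment variable (a different name from the CENSUS_KEY variable that {censusapi} uses). The helper has_census_key below is hypothetical, a minimal sketch rather than anything the package provides.

```r
# tidycensus looks for the key in the CENSUS_API_KEY environment variable.
# Hypothetical helper: report whether a key is available in this session.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (!has_census_key()) {
  message("No key found; run census_api_key(\"...\", install = TRUE) once.")
}
```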

The two Census products of most value to most of us are the decennial censuses and the American Community Survey (ACS). One can choose the product via get_decennial() or get_acs() as shown below. We start with a simple look at the variables available to us in the 2013-2017 five-year ACS; the year argument of load_variables() names the last year of the five-year window, so year = 2017 selects 2013-2017.

load_variables(
  year = 2017, 
  dataset = "acs5", 
  cache = TRUE
  ) -> my.vars.acs5

Let us look at the variables in the database.

DT::datatable(my.vars.acs5)
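ACS variable names combine a table ID and a line number, separated by an underscore (B19013_001 is line 1 of table B19013). Splitting off the table ID makes it easy to see how many variables each table holds. The sketch below uses a tiny made-up stand-in, vars_demo, in place of the real my.vars.acs5 object, and assumes the variable names sit in a column called name.

```r
# Made-up stand-in for the tibble returned by load_variables(); only the
# name column is needed here.
vars_demo <- data.frame(
  name = c("B01001_001", "B01001_002", "B19013_001"),
  stringsAsFactors = FALSE
)

# The table ID is everything before the underscore
vars_demo$table_id <- sub("_.*$", "", vars_demo$name)

# Number of variables per table
table(vars_demo$table_id)
```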