7 Health Data with/without APIs

Over the past few years the volume of health-related data available for public use has grown exponentially. Much of it comes to us from CDC WONDER. WONDER is an acronym for Wide-ranging Online Data for Epidemiologic Research – an easy-to-use, menu-driven system that makes the information resources of the Centers for Disease Control and Prevention (CDC) available to public health professionals and the public at large. Then there is CDC’s Open Data API, the Substance Abuse and Mental Health Data Archive (SAMHDA), the County Health Rankings data, the National Survey of Children’s Health data, the 500 Cities and PLACES data, HealthData.gov, and too many other sites to enumerate in a calendar year.

7.0.1 CDC’s Open Data API with {RSocrata}

So what data should we focus on? Let us start with the CDC’s Open Data portal, not only because we have a wide array of data to explore here but also because it will force us to work with {RSocrata} – “Provides easier interaction with Socrata open data portals http://dev.socrata.com. Users can provide a ‘Socrata’ data set resource URL, or a ‘Socrata’ Open Data API (SoDA) web query, or a ‘Socrata’ ‘human-friendly’ URL, returns an R data frame. Converts dates to ‘POSIX’ format. Manages throttling by ‘Socrata’.” (GitHub source). You may need to create and verify a Socrata account and create an app token. You will need the token, email, and password to push more queries to the API.

#install.packages("devtools")
#devtools::install_github("Chicago/RSocrata")
library(RSocrata)
library(tidyverse)

What data-sets are available on CDC’s Open Data portal? A convenient ls.socrata(...) function will pull the metadata for us.

ls.socrata(
  "https://data.cdc.gov"
  ) -> cdc_catalog

At the time of writing this chapter the latest data-set was issued on 2021-12-29, https://data.cdc.gov/api/views/8xkx-amqh, pertaining to COVID-19 vaccinations in the United States. The catalog describes the contents as “Overall US COVID-19 Vaccine administration and vaccine equity data at county level. Data represents all vaccine partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.” Let us grab these data by using the URL of the landing page; be patient, with 1.2 million records the file may take some time to download.

read.socrata(
  "https://data.cdc.gov/api/views/8xkx-amqh"
  ) -> cdc_df01

One of the nice things about Socrata is that if you go look at the actual landing page for these data, you will find the API documentation shows you the query code for various programming languages, including R. This is fast becoming, though it is not yet, a standard feature of organizations that provide APIs.

Note that we could have also run the following; note the slightly altered URL and the file-extension.

read.socrata(
  "https://data.cdc.gov/resource/8xkx-amqh.csv"
  ) -> cdc_df02

read.socrata(
  "https://data.cdc.gov/resource/8xkx-amqh.json",
  app_token = "YOUR_APP_TOKEN",   # replace with your Socrata app token
  email     = "you@example.com",  # replace with your Socrata account email
  password  = "YOUR_PASSWORD"     # replace with your Socrata account password
  ) -> cdc_df03

That is it! Of course you could also skip this process and manually download the data but that defeats the whole purpose of the API. Again, the value lies in the fact that since these files are updated daily, weekly, etc., you can set your code to run automatically on the same frequency, grabbing, cleaning, and analyzing the data.
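That scheduled-pull pattern can be sketched as follows. The folder and file name are hypothetical stand-ins, and a dummy data frame takes the place of the read.socrata() result so the sketch runs without hitting the API:

```r
# Sketch: save each run's pull under a date-stamped name so that a
# scheduled job (cron, Task Scheduler, GitHub Actions, etc.) builds
# up an archive over time. A dummy data frame stands in for the
# result of read.socrata(...).
todays_pull <- data.frame(fips = "01001", doses = 123)  # pretend API result

outfile <- file.path(
  tempdir(),  # swap in your own project folder
  paste0("cdc_vaccinations_", Sys.Date(), ".rds")
)
saveRDS(todays_pull, outfile)

# A later run can check for today's file and skip the download if present
file.exists(outfile)
```

The date-stamped name means yesterday's file is never overwritten, which is handy when you want to track how the source data are revised over time.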

7.0.2 CDC WONDER Data with {wonderapi}

What if we wanted to work with the CDC WONDER data? Well, we could use the {wonderapi} package to grab the data: “This package makes it easier to use the CDC Wonder API. It does so by employing hidden default query lists and look-up tables, allowing users to focus only on the variables they’re interested in obtaining, and to write queries using human readable names rather than numeric codes.” However, the package taps only five data-sets and is still in development. You will need {wondr} as well.

# devtools::install_github("socdataR/wonderapi")
# devtools::install_github("hrbrmstr/wondr")
library(wonderapi)
library(tidyverse)
show_databases() -> cdcwonder
list(
  list(
    "And By", "Gender"
    )
  ) -> mylist
getData(
  agree = TRUE,
  "Detailed Mortality", 
  mylist
  ) -> d76

Joyce Robbins has a wonderful blog post you should read about the travails of working with CDC WONDER API calls and how she went about solving them to build the five data-sets – D10, D27, D66, D76, and D104.

Given the inherent challenges of trying to write API queries for CDC WONDER, my habit has been to download the data manually and then process them in R. On the bright side, these data are not updated very often, so the automation power of an API call is not really being surrendered by going the manual route. Keep in mind that CDC WONDER limits how many rows of data you can pull in a single instance, so it often pays to break up your download requests into reasonable pieces that you then concatenate in R.
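That chunk-and-concatenate step might look like the sketch below. The wonder_chunk_*.txt files here are throwaway stand-ins written on the fly, not real WONDER exports, but the read-and-stack logic is the same:

```r
# Sketch: WONDER exports arrive as tab-delimited .txt files; read each
# chunk and stack them into one data frame. Two tiny stand-in files are
# written to a temp folder so the sketch is self-contained.
chunk_dir <- tempdir()
write.table(data.frame(state = "Ohio", deaths = 10),
            file.path(chunk_dir, "wonder_chunk_1.txt"),
            sep = "\t", row.names = FALSE)
write.table(data.frame(state = "Iowa", deaths = 5),
            file.path(chunk_dir, "wonder_chunk_2.txt"),
            sep = "\t", row.names = FALSE)

# Find every chunk, read each one, then stack them row-wise
chunk_files <- list.files(chunk_dir, pattern = "^wonder_chunk_.*\\.txt$",
                          full.names = TRUE)
chunk_list  <- lapply(chunk_files, read.delim)
wonder_df   <- do.call(rbind, chunk_list)
```

With real exports you would point chunk_dir at the folder holding your downloads; a consistent file-naming scheme is what makes the list.files() pattern work.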

7.0.3 RWJF’s County Health Rankings Data

For over a decade now the Robert Wood Johnson Foundation has worked with researchers from the University of Wisconsin Population Health Institute on amassing a treasure trove of health data indicators. Called the County Health Rankings and Roadmaps program, this initiative compiles and releases, every year, national, state-, and county-level data on a variety of health behaviors, chronic health conditions, health outcomes, healthcare access, unmet needs, and social determinants of health.

For any given year you have access to a variety of data files. The National file summarizes how states and counties are ranked and includes the rankings but not necessarily all the details. The Analytics file includes some 600-700 variables and provides the granular data that build up into the state and county rankings. The Trends file allows you to make some comparisons over time. There is a lot of documentation that accompanies each of these files, including a file that documents instances where the measures have changed over time. It is absolutely critical to start with these technical documents because the data are not always what you think they are. The biggest limitation, in terms of what is made available, is that the measures are not all drawn from the same data sources or years; some of the data in the 2021 report are from 2015-2019, others are from 2013, or 2014 and 2016, and yet others from 2010 or 2020. Ignoring this has led many analysts to jump to the conclusion that the data are all from 2020 or 2021.

There is no API so the calls will be as shown below. The Excel version of the data dictionary is also read below but the code-book that shows granular details of each measure is only available as a PDF here.

readr::read_csv(
  "https://www.countyhealthrankings.org/sites/default/files/media/document/analytic_data2021.csv",
  skip = 1
  ) -> analytic_data_2021
rio::import(
  "https://www.countyhealthrankings.org/sites/default/files/media/document/DataDictionary_2021.xlsx"
  ) -> data_documentation_2021

7.0.4 U.S. Census Populations With Bridged Race Categories

“Race bridging refers to making data collected using one set of race categories consistent with data collected using a different set of race categories, to permit estimation and comparison of race-specific statistics at a point in time or over time. More specifically, race bridging is a method used to make multiple-race and single-race data collection systems sufficiently comparable to permit estimation and analysis of race-specific statistics.

The National Center for Health Statistics (NCHS) releases bridged-race population estimates of the resident population of the United States for use in calculating vital rates. These estimates result from bridging the 31 race categories used in Census 2000 and Census 2010, as specified in the 1997 Office of Management and Budget (OMB) standards for the collection of data on race and ethnicity, to the four race categories specified in the 1977 OMB standards. Many data systems, such as vital statistics, are continuing to use the 1977 OMB standards during the transition to full implementation of the 1997 OMB standards. The bridged-race population estimates are produced under a collaborative arrangement with the U. S. Census Bureau.” Source

Besides the obvious NCHS use, these data are invaluable for creating unusual age-groups by race, hispanicity, and gender, at the state- and county-level. Why? Because you get single-year estimates that can be aggregated as needed. The most recent file, the July 1, 2020 vintage, was released in September 2021 and is shown below. The technical documentation is available here.

myfile = "https://www.cdc.gov/nchs/nvss/bridged_race/pcen_v2020_y20_sas7bdat.zip"
tempfile() -> tmp
curl::curl_download(
  myfile,
  tmp
  )
haven::read_sas(
  unz(
    tmp, 
    "pcen_v2020_y20.sas7bdat"
    )
  ) %>%
  janitor::clean_names() -> bracey20_df
unlink(tmp)
glimpse(bracey20_df)
#> Rows: 4,324,768
#> Columns: 10
#> $ age     <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …
#> $ hisp    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ racesex <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ vintage <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, …
#> $ race4   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ pop     <dbl> 211, 230, 256, 249, 270, 242, 260, 235, 254, 252, 259, 271, 292, 2…
#> $ year    <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, …
#> $ month   <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, …
#> $ st_fips <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ co_fips <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …

Note that for hisp you have 1 = Not Hispanic and 2 = Hispanic value labels provided. For racesex you have the following:

1 = White male
2 = White female
3 = Black or African American male
4 = Black or African American female
5 = American Indian or Alaska Native male
6 = American Indian or Alaska Native female
7 = Asian or Pacific Islander male
8 = Asian or Pacific Islander female

and for race4 you have 1 = White; 2 = Black; 3 = American Indian, Eskimo or Aleutian; 4 = Asian or Pacific Islander.

Let us clean this up a bit, creating a new variable for binary_gender, and a new race_ethnicity variable.

bracey20_df %>%
  mutate(
    binary_gender = case_when(
      racesex %in% c(1, 3, 5, 7) ~ "Male",
      racesex %in% c(2, 4, 6, 8) ~ "Female"
    ),
    race_ethnicity = case_when(
      hisp == 1 & racesex %in% c(1, 2) ~ "Non-Hispanic White",
      hisp == 1 & racesex %in% c(3, 4) ~ "Non-Hispanic Black",
      hisp == 1 & racesex %in% c(5, 6) ~ "Non-Hispanic AIAN",
      hisp == 1 & racesex %in% c(7, 8) ~ "Non-Hispanic API",
      hisp == 2 ~ "Hispanic"
    ),
    st_fips = stringr::str_pad(
      st_fips, width = 2, side = "left", pad = "0"
      ),
    co_fips = stringr::str_pad(
      co_fips, width = 3, side = "left", pad = "0"
      ),
    fips = paste(
      st_fips, co_fips, sep = ""
      )
    ) -> bracey20_df
bracey20_df %>%
  select(
    st_fips, fips, age, binary_gender, race_ethnicity, pop
    ) -> brace20
glimpse(brace20)
#> Rows: 4,324,768
#> Columns: 6
#> $ st_fips        <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",…
#> $ fips           <chr> "01001", "01001", "01001", "01001", "01001", "01001", "0100…
#> $ age            <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
#> $ binary_gender  <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Ma…
#> $ race_ethnicity <chr> "Non-Hispanic White", "Non-Hispanic White", "Non-Hispanic W…
#> $ pop            <dbl> 211, 230, 256, 249, 270, 242, 260, 235, 254, 252, 259, 271,…

This data file is missing the state and county names but those can be easily inserted.

urbnmapr::counties %>%
  select(county_fips, state_abbv, state_name, county_name) %>%
  distinct() -> cdf
brace20 %>%
  left_join(
    cdf, by = c("fips" = "county_fips")
  ) -> brace20
glimpse(brace20)
#> Rows: 4,324,768
#> Columns: 9
#> $ st_fips        <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",…
#> $ fips           <chr> "01001", "01001", "01001", "01001", "01001", "01001", "0100…
#> $ age            <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
#> $ binary_gender  <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Ma…
#> $ race_ethnicity <chr> "Non-Hispanic White", "Non-Hispanic White", "Non-Hispanic W…
#> $ pop            <dbl> 211, 230, 256, 249, 270, 242, 260, 235, 254, 252, 259, 271,…
#> $ state_abbv     <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL",…
#> $ state_name     <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Ala…
#> $ county_name    <chr> "Autauga County", "Autauga County", "Autauga County", "Auta…
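The payoff of the single-year age estimates is that you can build whatever age bands you need. A small sketch with a toy stand-in for brace20 (the real code would run on the full data frame, adding race_ethnicity, binary_gender, and fips to the grouping):

```r
# Sketch: collapse single years of age into custom bands with cut(),
# then sum pop within each band. The toy data mimic brace20's layout.
toy <- data.frame(
  age = c(0, 5, 12, 17, 30, 64, 80),
  pop = c(100, 90, 80, 70, 60, 50, 40)
)
toy$age_band <- cut(
  toy$age,
  breaks = c(-1, 14, 17, 64, Inf),            # custom cutpoints
  labels = c("0-14", "15-17", "18-64", "65+")
)
aggregate(pop ~ age_band, data = toy, FUN = sum) -> band_totals
```

Because the breaks are yours to choose, you can match whatever unusual groupings a particular vital-rate denominator calls for.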

7.0.5 Health Survey Data with asdfree

Anthony Damico has a treasure chest of code for working with a wide variety of survey data. Make sure to install the {lodown} package, and then the {convey} and {srvyr} packages.

#install.packages( "devtools" , repos = "http://cran.rstudio.com/" )
#devtools::install_github( "ajdamico/lodown" , dependencies = TRUE )
#install.packages(c( "convey", "srvyr", "survey" ), repos = "http://cran.rstudio.com/" )

One thing to pay attention to: for more recent data, web development and design changes have broken several data-hosting sites’ URLs, resulting in code failures. If this happens you should download the data directly from the hosting agency. Let us first look at the Behavioral Risk Factor Surveillance System (BRFSS) data.

7.0.5.1 BRFSS

“The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.”

Say we want the 2020 data. We start by building up the catalog of BRFSS data.

library(lodown)
get_catalog(
  "brfss" ,
  output_dir = file.path(
    path.expand( "~" ) , "BRFSS"
    ) 
  ) -> brfss_cat 
subset(
  brfss_cat , 
  year == 2020
  ) -> brfss_cat_2020
lodown(
  "brfss",
  brfss_cat_2020
  )

You should end up with a folder called BRFSS in your root directory, containing the data file you asked for. In my case, the file is 2020 main.rds and we will read it in now.

readRDS( 
  file.path(
    path.expand( "~" ),
    "BRFSS", 
    "2020 main.rds"
    )
  ) -> brfss_df
brfss_df %>%
    select( 
      'one' , 'xpsu' , 'xststr' , 'xllcpwt' , 
      'genhlth' , 'medcost' , 'xstate' , 'xage80' , 
      'nummen' , 'numadult' , 'hlthpln1' 
      ) -> brfss20

Once the file is read, the sheer size will overwhelm you, and unless you really need all the variables it is best to trim them to the ones you plan on working with. We have done this and then saved the result as brfss20. Because of the complex sampling scheme that underlies the BRFSS, you have to weight the data. This is accomplished by creating a svydesign(...) object, as shown below. All analysis will now run off this object, which I have named brfss_design.

library(survey)
options(
  survey.lonely.psu = "adjust"
  )
svydesign(
        ids = ~ xpsu ,
        strata = ~ xststr ,
        data = brfss20 ,
        weights = ~ xllcpwt ,
        nest = TRUE
    ) -> brfss_design
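To see what the design object buys you, recall that a survey-weighted mean is just sum(w * x) / sum(w): each respondent’s answer counts in proportion to the number of people that respondent represents. A toy hand computation (the numbers are made up, not BRFSS values):

```r
# Toy illustration of a survey-weighted mean: four respondents reporting
# fair/poor health (1) or not (0), each carrying a different weight.
x <- c(1, 0, 0, 1)              # indicator for fair/poor health
w <- c(500, 1200, 2500, 800)    # persons each respondent represents

weighted_mean <- sum(w * x) / sum(w)

# Compare with the unweighted mean(x) = 0.5; heavily weighted
# respondents pull the estimate toward their answers.
```

Functions like svymean(...) compute exactly this kind of quantity, plus standard errors that respect the clustering and stratification encoded in the design object.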

Here is Anthony’s code for updating brfss_design so you have meaningful value labels.

update( 
  brfss_design ,
  fair_or_poor_health = ifelse( 
    genhlth %in% c(1:5) , as.numeric( genhlth > 3 ) , NA 
    ) ,
    couldnt_see_doc_due_to_cost = factor( 
      medcost , 
      levels = c( 1 , 2 , 7 , 9 ) , 
      labels = c( "yes" , "no" , "dk" , "rf" )
      ) ,
  state_name = factor(
      xstate ,
      levels = c(
        1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 
        21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 
        37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 
        55, 56, 66, 72, 78) ,
      labels = c(
        "ALABAMA", "ALASKA", "ARIZONA", "ARKANSAS", "CALIFORNIA", 
        "COLORADO", "CONNECTICUT", "DELAWARE", "DISTRICT OF COLUMBIA", 
        "FLORIDA", "GEORGIA", "HAWAII", "IDAHO", "ILLINOIS", "INDIANA",
        "IOWA", "KANSAS", "KENTUCKY", "LOUISIANA", "MAINE", "MARYLAND",
        "MASSACHUSETTS", "MICHIGAN", "MINNESOTA", "MISSISSIPPI", 
        "MISSOURI", "MONTANA", "NEBRASKA", "NEVADA", "NEW HAMPSHIRE",
        "NEW JERSEY", "NEW MEXICO", "NEW YORK", "NORTH CAROLINA", 
        "NORTH DAKOTA", "OHIO", "OKLAHOMA", "OREGON", "PENNSYLVANIA",
        "RHODE ISLAND", "SOUTH CAROLINA", "SOUTH DAKOTA", "TENNESSEE",
        "TEXAS", "UTAH", "VERMONT", "VIRGINIA", "WASHINGTON",
        "WEST VIRGINIA", "WISCONSIN", "WYOMING", "GUAM", "PUERTO RICO",
        "U.S. VIRGIN ISLANDS")
      )
  ) -> brfss_design 

Now we can generate estimates of, for example, the proportion who could not see a doctor due to cost.

svymean(
  ~ couldnt_see_doc_due_to_cost , 
  brfss_design , 
  na.rm = TRUE 
  )
#>                                    mean SE
#> couldnt_see_doc_due_to_costyes 0.106090  0
#> couldnt_see_doc_due_to_costno  0.890957  0
#> couldnt_see_doc_due_to_costdk  0.002411  0
#> couldnt_see_doc_due_to_costrf  0.000542  0

I am not running the code chunk below but you should; it will generate state-level estimates of the proportion saying they could not see a doctor due to cost.

svyby(
  ~ couldnt_see_doc_due_to_cost , 
  ~ state_name , 
  brfss_design , 
  svymean , 
  na.rm = TRUE 
  )

For more detailed code showing how to generate uncertainty estimates, run a regression, etc., please see here. For detailed instructions on using the {survey} and {srvyr} packages, please see here and here, respectively.

7.0.6 USAID’s Demographic and Health Surveys (DHS)

For a couple of decades now the United States Agency for International Development (USAID) has been running periodic surveys in many developing countries. This program, known as the DHS, has now fielded over 400 surveys in 90 countries, on subjects like childhood mortality, maternal mortality, family planning, nutrition, HIV, gender equality, and more. Before you can access the data you will have to register here.

There are several ways to access the data, including downloading specific files, using Anthony’s {lodown}, using the {rdhs} package, or going through IPUMS. I will opt for the {rdhs} route since this gives us a chance to see a new package in action. Make sure you run the set_rdhs_config(...) command specifying, at minimum, the folder you want the data saved to, and then your email and password.

#devtools::install_github("ropensci/rdhs")
library(rdhs)
# set_rdhs_config()
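A minimal configuration call might look like the one below; the email, project name, and cache folder are placeholders you must replace with your own DHS account details, and the call will prompt for your password:

```r
# Placeholders only -- substitute your registered DHS email, the exact
# project name you registered with DHS, and a folder for cached downloads.
set_rdhs_config(
  email      = "you@example.com",
  project    = "Your registered DHS project name",
  cache_path = "~/dhs_cache"
)
```

Once configured, subsequent rdhs calls in any session will reuse these credentials and cache downloads in that folder.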

We start by looking at what indicators are available, and end up with a fairly detailed catalog. What I will do is pull the details for the Total fertility rate (short_name == "TFR 15-49").

dhs_indicators() %>%
  janitor::clean_names() -> dhs_cat
dhs_cat %>%
  filter(
    short_name == "TFR 15-49"
    )
#>                                                                                              definition
#> 1 Total fertility rate for the three years preceding the survey for age group 15-49 expressed per woman
#>   number_scale indicator_type measurement_type is_quick_stat short_name
#> 1            1              I             Rate             1  TFR 15-49
#>    indicator_id    level1 indicator_total_id          level2 level3      sdrid
#> 1 FE_FRTR_W_TFR Fertility                    Fertility rates  Women FEFRTRWTFR
#>   indicator_old_id   tag_ids denominator_weighted_id                      label
#> 1         19170000 74, 0, 77                         Total fertility rate 15-49
#>   indicator_order denominator quick_stat_order indicator_special1id
#> 1        11763080   Per woman               10                     
#>   denominator_unweighted_id indicator_special2id
#> 1

Say I want this measure for a few countries. How can I download the data?

dhs_data(
  tagIds = 74,
#  countryIds = c("EG", "GH"),
  breakdown = "subnational",
  surveyYearStart = 2019
  ) -> tfr_df

And voila, you have the data. I disabled the countryIds = switch to show you how to grab every sub-national estimate for any country with TFR data since 2019. For more details on how the package works please see the package vignette.

7.0.7 The National Survey on Drug Use and Health (NSDUH) Data

“The NSDUH series, formerly the National Household Survey on Drug Abuse, is the leading source of statistical information on the use of illicit drugs, alcohol, and tobacco and mental health issues in the United States. The survey tracks trends in specific substance use and mental illness measures and assesses substance use disorders and treatment for these disorders.

The population of the NSDUH series is the general civilian population aged 12 and older. Questions include age at first use, as well as lifetime, annual, and past-month use of the following drugs: alcohol, marijuana, cocaine (including crack), hallucinogens, heroin, inhalants, tobacco, pain relievers, tranquilizers, stimulants, and sedatives. The survey covers substance use treatment history and perceived need for treatment, and it includes questions from the “Diagnostic and Statistical Manual (DSM) of Mental Disorders” (DSM) that allow diagnostic criteria to be applied.

Respondents are also asked about personal and family income, health care access and coverage, illegal activities and arrest records, problems resulting from the use of drugs, and perceptions of risks. Demographic data include gender, race, age, ethnicity, educational level, employment status, income level, veteran status, household composition, and population density.” Source

At the time of writing this section the latest data available are for 2019, and so let us grab these data. Note that you get data in various formats and I will pull the R format data-set.

myurl = "https://www.datafiles.samhsa.gov/sites/default/files/field-uploads-protected/studies/NSDUH-2019/NSDUH-2019-datasets/NSDUH-2019-DS0001/NSDUH-2019-DS0001-bundles-with-study-info/NSDUH-2019-DS0001-bndl-data-r.zip"
tempfile() -> tmp
curl::curl_download(
  myurl,
  tmp
  )
load(
  unz(
    tmp,
    "NSDUH_2019.RData"
    )
  ) 
unlink(tmp)
PUF2019_100920 %>%
  janitor::clean_names() -> PUF2019_100920

With 2,741 variables it will be absolutely critical to review the code-book and other accompanying technical documentation to add value labels to select variables of interest. I will skip this for now but point out that you could also lean on Anthony’s asdfree code to grab, weight, and analyze the data. The code below is going to weight the data and create our survey design object.

library(survey)
svydesign( 
        ids = ~ verep , 
        strata = ~ vestr , 
        data = PUF2019_100920 , 
        weights = ~ analwt_c , 
        nest = TRUE 
        ) -> nsduh_df
rm(PUF2019_100920)
update(
  nsduh_df , 
  one = 1 ,
  health = factor(
    health , 
    levels = 1:5 , 
    labels = c(
      "excellent" , "very good" , "good" , "fair" , "poor" 
      )
    ) ,
  age_tried_first_cigarette = ifelse( cigtry > 99 , NA , cigtry ) ,
  age_tried_cocaine = ifelse( cocage > 99 , NA , cocage ) ,
  ever_used_marijuana = as.numeric( mjever == 1 ) ,
  county_type = factor(
    coutyp4 ,
    levels = 1:3 ,
    labels = c( "large metro" , "small metro" , "nonmetro" )
    )
  ) -> nsduh_design

Let us look at health, first by itself and then by county_type.

svymean(
  ~ health , 
  nsduh_design , 
  na.rm = TRUE 
  )
#>                   mean SE
#> healthexcellent 0.2179  0
#> healthvery good 0.3603  0
#> healthgood      0.2902  0
#> healthfair      0.1092  0
#> healthpoor      0.0224  0
svyby(
  ~ health , 
  ~ county_type , 
  nsduh_design , 
  svymean , 
  na.rm = TRUE 
  )
#>             county_type healthexcellent healthvery good healthgood healthfair
#> large metro large metro          0.2322          0.3650     0.2796     0.1022
#> small metro small metro          0.2114          0.3561     0.2973     0.1144
#> nonmetro       nonmetro          0.1741          0.3501     0.3176     0.1262
#>             healthpoor se.healthexcellent se.healthvery good se.healthgood
#> large metro    0.02094           0.003483           0.004208      0.003670
#> small metro    0.02073           0.005280           0.006216      0.005918
#> nonmetro       0.03199           0.006067           0.006425      0.006923
#>             se.healthfair se.healthpoor
#> large metro      0.003264      0.001495
#> small metro      0.004901      0.001979
#> nonmetro         0.006229      0.003254