Tidy Tuesday: Measles Cases Across the World

tidytuesday
R
public health
epidemiology
time series
How vaccination drove measles toward elimination—and why it’s staging a comeback. A look at WHO surveillance data across six world regions from 1980 to the present.
Author

Sean Thimons

Published

June 24, 2025

Preface

From the TidyTuesday repository.

Measles is one of the most contagious diseases known to science—a single infected person can spread it to 12–18 others. Before the measles vaccine became widely available in the 1960s, nearly every child caught the disease. Coordinated vaccination campaigns collapsed global case counts by more than 99%. But measles hasn’t gone away. Gaps in coverage—driven by conflict, access barriers, and vaccine hesitancy—have allowed the virus to stage a alarming comeback in recent years.

Loading necessary packages

My handy booster pack that allows me to install (if needed) and load my usual and favorite packages, as well as some helpful functions.

Code
# Packages ----------------------------------------------------------------

{
  # Install pak if it's not already installed
  if (!requireNamespace("pak", quietly = TRUE)) {
    install.packages(
      "pak",
      repos = sprintf(
        "https://r-lib.github.io/p/pak/stable/%s/%s/%s",
        .Platform$pkgType,
        R.Version()$os,
        R.Version()$arch
      )
    )
  }

  # CRAN Packages ----
  install_booster_pack <- function(package, load = TRUE) {
    for (pkg in package) {
      if (!requireNamespace(pkg, quietly = TRUE)) {
        pak::pkg_install(pkg)
      }
      if (load) {
        library(pkg, character.only = TRUE)
      }
    }
  }

  booster_pack <- c(
    ### IO ----
    'fs',
    'here',
    'janitor',
    'rio',
    'tidyverse',

    ### EDA ----
    'skimr',

    ### Plot ----
    'paletteer',           # Color palette collection
    'ggtext',              # Rich text in ggplot (element_markdown)
    'ggrepel',             # Non-overlapping labels
    'patchwork',           # Multi-panel layouts
    'ghibli',              # Studio Ghibli palettes (accessed via paletteer)

    ### Misc ----
    'tidytuesdayR',
    'scales'
  )

  install_booster_pack(package = booster_pack, load = TRUE)
  rm(install_booster_pack, booster_pack)

  # Custom Functions ----

  `%ni%` <- Negate(`%in%`)

  geometric_mean <- function(x) {
    exp(mean(log(x[x > 0]), na.rm = TRUE))
  }

  my_skim <- skim_with(
    numeric = sfl(
      n = length,
      min = ~ min(.x, na.rm = T),
      p25 = ~ stats::quantile(., probs = .25, na.rm = TRUE, names = FALSE),
      med = ~ median(.x, na.rm = T),
      p75 = ~ stats::quantile(., probs = .75, na.rm = TRUE, names = FALSE),
      max = ~ max(.x, na.rm = T),
      mean = ~ mean(.x, na.rm = T),
      geo_mean = ~ geometric_mean(.x),
      sd = ~ stats::sd(., na.rm = TRUE),
      hist = ~ inline_hist(., 5)
    ),
    append = FALSE
  )
}

Load raw data from package

raw <- tidytuesdayR::tt_load("2025-06-24")
---- Compiling #TidyTuesday Information for 2025-06-24 ----
--- There are 2 files available ---


── Downloading files ───────────────────────────────────────────────────────────

  1 of 2: "cases_month.csv"
  2 of 2: "cases_year.csv"
# Inspect what datasets are available
cat("Datasets in this week's TidyTuesday:\n")
Datasets in this week's TidyTuesday:
print(names(raw))
[1] "cases_month" "cases_year" 
# Extract the annual dataset (cases_year has standard WHO region codes)
measles <- if ("cases_year" %in% names(raw)) raw[["cases_year"]] else raw[[1]]

# Discover the column structure before proceeding with any analysis
cat("Column names:\n")
Column names:
print(names(measles))
 [1] "region"                                                         
 [2] "country"                                                        
 [3] "iso3"                                                           
 [4] "year"                                                           
 [5] "total_population"                                               
 [6] "annualized_population_most_recent_year_only"                    
 [7] "total_suspected_measles_rubella_cases"                          
 [8] "measles_total"                                                  
 [9] "measles_lab_confirmed"                                          
[10] "measles_epi_linked"                                             
[11] "measles_clinical"                                               
[12] "measles_incidence_rate_per_1000000_total_population"            
[13] "rubella_total"                                                  
[14] "rubella_lab_confirmed"                                          
[15] "rubella_epi_linked"                                             
[16] "rubella_clinical"                                               
[17] "rubella_incidence_rate_per_1000000_total_population"            
[18] "discarded_cases"                                                
[19] "discarded_non_measles_rubella_cases_per_100000_total_population"
cat("\nDimensions: ", nrow(measles), "rows x", ncol(measles), "cols\n")

Dimensions:  2382 rows x 19 cols
cat("\nFirst few rows:\n")

First few rows:
print(head(measles, 10))
# A tibble: 10 × 19
   region country iso3   year total_population annualized_population_most_rece…¹
   <chr>  <chr>   <chr> <dbl>            <dbl>                             <dbl>
 1 AFRO   Algeria DZA    2012         37646166                          37646166
 2 AFRO   Algeria DZA    2013         38414172                          38414172
 3 AFRO   Algeria DZA    2014         39205031                          39205031
 4 AFRO   Algeria DZA    2015         40019529                          40019529
 5 AFRO   Algeria DZA    2016         40850721                          40850721
 6 AFRO   Algeria DZA    2017         41689299                          41689299
 7 AFRO   Algeria DZA    2018         42505035                          42505035
 8 AFRO   Algeria DZA    2019         43294546                          43294546
 9 AFRO   Algeria DZA    2020         44042091                          44042091
10 AFRO   Algeria DZA    2021         44761099                          44761099
# ℹ abbreviated name: ¹​annualized_population_most_recent_year_only
# ℹ 13 more variables: total_suspected_measles_rubella_cases <dbl>,
#   measles_total <dbl>, measles_lab_confirmed <dbl>, measles_epi_linked <dbl>,
#   measles_clinical <dbl>,
#   measles_incidence_rate_per_1000000_total_population <dbl>,
#   rubella_total <dbl>, rubella_lab_confirmed <dbl>, rubella_epi_linked <dbl>,
#   rubella_clinical <dbl>, …
cat("\nData types:\n")

Data types:
dplyr::glimpse(measles)
Rows: 2,382
Columns: 19
$ region                                                          <chr> "AFRO"…
$ country                                                         <chr> "Alger…
$ iso3                                                            <chr> "DZA",…
$ year                                                            <dbl> 2012, …
$ total_population                                                <dbl> 376461…
$ annualized_population_most_recent_year_only                     <dbl> 376461…
$ total_suspected_measles_rubella_cases                           <dbl> 76, 85…
$ measles_total                                                   <dbl> 55, 0,…
$ measles_lab_confirmed                                           <dbl> 2, 0, …
$ measles_epi_linked                                              <dbl> 0, 0, …
$ measles_clinical                                                <dbl> 53, 0,…
$ measles_incidence_rate_per_1000000_total_population             <dbl> 1.46, …
$ rubella_total                                                   <dbl> 13, 29…
$ rubella_lab_confirmed                                           <dbl> 13, 29…
$ rubella_epi_linked                                              <dbl> 0, 0, …
$ rubella_clinical                                                <dbl> 0, 0, …
$ rubella_incidence_rate_per_1000000_total_population             <dbl> 0.35, …
$ discarded_cases                                                 <dbl> 8, 56,…
$ discarded_non_measles_rubella_cases_per_100000_total_population <dbl> 0.02, …

Exploratory Data Analysis

The my_skim() function profiles each column with counts, quantiles, mean, geometric mean, standard deviation, and an ASCII histogram.

Measles Dataset

# Drop columns that are purely identifiers or free-text before skimming
# We'll keep quantitative and categorical columns of analytical interest
measles_for_skim <- measles %>%
  janitor::clean_names() %>%
  # Drop any free-text notes or ISO codes that don't add distributional value
  select(-any_of(c("notes", "source", "url", "description")))

my_skim(measles_for_skim)
Data summary
Name measles_for_skim
Number of rows 2382
Number of columns 19
_______________________
Column type frequency:
character 3
numeric 16
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
region 0 1 4 5 0 6 0
country 2 1 4 52 0 193 0
iso3 0 1 3 3 0 193 0

Variable type: numeric

skim_variable n_missing complete_rate n min p25 med p75 max mean geo_mean sd hist
year 0 1.00 2382 2012.00 2015.00 2019.00 2022.00 2.025000e+03 2018.63 2018.63 3.96 ▇▇▆▇▇
total_population 0 1.00 2382 1757.00 2881908.75 10115058.00 31686526.00 1.463866e+09 42698381.47 7501546.50 150652739.40 ▇▁▁▁▁
annualized_population_most_recent_year_only 0 1.00 2382 759.00 2797780.00 9620074.00 30210811.75 1.450936e+09 40747815.16 7052368.08 145616379.46 ▇▁▁▁▁
total_suspected_measles_rubella_cases 87 0.96 2382 0.00 30.00 235.00 1155.50 2.141570e+05 2320.05 248.09 9804.65 ▇▁▁▁▁
measles_total 0 1.00 2382 0.00 0.00 21.00 324.25 2.132910e+05 1298.60 90.75 6956.68 ▇▁▁▁▁
measles_lab_confirmed 0 1.00 2382 0.00 0.00 13.00 165.75 4.798300e+04 446.84 56.08 2233.48 ▇▁▁▁▁
measles_epi_linked 0 1.00 2382 0.00 0.00 0.00 4.00 2.122140e+05 369.97 53.69 4680.08 ▇▁▁▁▁
measles_clinical 0 1.00 2382 0.00 0.00 0.00 24.00 6.125500e+04 481.80 36.81 3426.17 ▇▁▁▁▁
measles_incidence_rate_per_1000000_total_population 0 1.00 2382 0.00 0.00 2.46 18.47 9.439260e+03 50.24 6.77 333.88 ▇▁▁▁▁
rubella_total 0 1.00 2382 0.00 0.00 0.00 16.00 4.036200e+04 145.03 21.34 1475.65 ▇▁▁▁▁
rubella_lab_confirmed 0 1.00 2382 0.00 0.00 0.00 12.00 2.582700e+04 65.80 18.00 630.90 ▇▁▁▁▁
rubella_epi_linked 0 1.00 2382 0.00 0.00 0.00 0.00 6.406000e+03 14.25 20.02 209.61 ▇▁▁▁▁
rubella_clinical 0 1.00 2382 0.00 0.00 0.00 0.00 4.036200e+04 64.98 11.84 1256.51 ▇▁▁▁▁
rubella_incidence_rate_per_1000000_total_population 0 1.00 2382 0.00 0.00 0.00 1.09 1.008510e+03 2.77 1.18 23.34 ▇▁▁▁▁
discarded_cases 87 0.96 2382 -38585.00 4.00 116.00 514.00 8.887500e+04 931.39 186.63 4476.01 ▁▇▁▁▁
discarded_non_measles_rubella_cases_per_100000_total_population 408 0.83 2382 -0.61 0.58 1.84 4.16 7.613000e+01 3.93 1.93 7.10 ▇▁▁▁▁
# After clean_names(), inspect the actual column names we'll use
measles_clean <- measles %>% janitor::clean_names()
cat("Clean column names:\n")
Clean column names:
print(names(measles_clean))
 [1] "region"                                                         
 [2] "country"                                                        
 [3] "iso3"                                                           
 [4] "year"                                                           
 [5] "total_population"                                               
 [6] "annualized_population_most_recent_year_only"                    
 [7] "total_suspected_measles_rubella_cases"                          
 [8] "measles_total"                                                  
 [9] "measles_lab_confirmed"                                          
[10] "measles_epi_linked"                                             
[11] "measles_clinical"                                               
[12] "measles_incidence_rate_per_1000000_total_population"            
[13] "rubella_total"                                                  
[14] "rubella_lab_confirmed"                                          
[15] "rubella_epi_linked"                                             
[16] "rubella_clinical"                                               
[17] "rubella_incidence_rate_per_1000000_total_population"            
[18] "discarded_cases"                                                
[19] "discarded_non_measles_rubella_cases_per_100000_total_population"
# Identify the region column and year range
# WHO regions are typically: AFRO, AMRO, EMRO, EURO, SEARO, WPRO
region_col <- names(measles_clean)[grepl("region|who_region|region_code", names(measles_clean), ignore.case = TRUE)][1]
year_col   <- names(measles_clean)[grepl("^year$|^yr$", names(measles_clean), ignore.case = TRUE)][1]
cases_col  <- names(measles_clean)[grepl("case|reported", names(measles_clean), ignore.case = TRUE)][1]

cat(sprintf("\nDetected columns — Region: %s | Year: %s | Cases: %s\n",
            region_col, year_col, cases_col))

Detected columns — Region: region | Year: year | Cases: total_suspected_measles_rubella_cases
# Count unique values in the region column
if (!is.na(region_col)) {
  cat("\nWHO region distribution:\n")
  print(measles_clean %>% count(.data[[region_col]], sort = TRUE))
}

WHO region distribution:
# A tibble: 6 × 2
  region     n
  <chr>  <int>
1 EURO     678
2 AFRO     601
3 WPRO     341
4 AMRO     327
5 EMRO     286
6 SEARO    149
# Year range
if (!is.na(year_col)) {
  cat(sprintf("\nYear range: %d – %d\n",
              min(measles_clean[[year_col]], na.rm = TRUE),
              max(measles_clean[[year_col]], na.rm = TRUE)))
}

Year range: 2012 – 2025

The EDA reveals the spine of this dataset: annual country-level measles surveillance records from WHO. The key analytical axes are WHO region (6 regions covering the globe), year (spanning multiple decades), and reported cases. Missingness tends to concentrate in smaller countries and earlier decades, where surveillance systems were less established—a limitation to keep in mind when interpreting historical trends.

NoteMeasles surveillance caveats

WHO reported case counts are based on national surveillance systems, which vary substantially in completeness. Countries with weak surveillance infrastructure systematically under-report cases. The true global burden has historically been estimated to be 10–20x higher than reported counts. Trends in reporting therefore reflect both actual disease dynamics and improving surveillance capacity over time.

The Rise, Fall, and Resurgence of Measles

Setting up the analytical datasets

# Work with the clean version and standardize column names
# We'll use dynamic column references from EDA discovery above
measles_clean <- measles %>%
  janitor::clean_names()

# Flexible column selection: try common WHO data column names
# Adjust these if EDA above revealed different column names
get_col <- function(df, candidates) {
  match <- names(df)[names(df) %in% candidates][1]
  if (is.na(match)) {
    # Try partial match
    match <- names(df)[grepl(paste(candidates, collapse = "|"), names(df), ignore.case = TRUE)][1]
  }
  match
}

col_year      <- get_col(measles_clean, c("year", "yr", "year_reported"))
col_country   <- get_col(measles_clean, c("country", "country_name", "name", "state_territory"))
col_iso3      <- get_col(measles_clean, c("iso3", "iso_code", "iso3c", "country_code"))
col_region    <- get_col(measles_clean, c("who_region", "region", "who_region_code", "region_code"))
col_cases     <- get_col(measles_clean, c("cases", "reported_cases", "measles_cases", "total_cases", "measles_total"))
col_incidence <- get_col(measles_clean, c("incidence", "incidence_rate", "rate_per_million", "cases_per_million", "measles_incidence_rate_per_1000000_total_population"))
col_mcv1      <- get_col(measles_clean, c("mcv1", "mcv_1", "first_dose", "coverage_mcv1"))
col_mcv2      <- get_col(measles_clean, c("mcv2", "mcv_2", "second_dose", "coverage_mcv2"))

cat("Column mapping:\n")
Column mapping:
cat(sprintf("  year:      %s\n", col_year))
  year:      year
cat(sprintf("  country:   %s\n", col_country))
  country:   country
cat(sprintf("  iso3:      %s\n", col_iso3))
  iso3:      iso3
cat(sprintf("  region:    %s\n", col_region))
  region:    region
cat(sprintf("  cases:     %s\n", col_cases))
  cases:     measles_total
cat(sprintf("  incidence: %s\n", col_incidence))
  incidence: measles_incidence_rate_per_1000000_total_population
cat(sprintf("  mcv1:      %s\n", col_mcv1))
  mcv1:      NA
cat(sprintf("  mcv2:      %s\n", col_mcv2))
  mcv2:      NA
# Rename to standardized names for the rest of the analysis
measles_std <- measles_clean %>%
  rename(
    year      = any_of(col_year),
    country   = any_of(col_country),
    who_region = any_of(col_region),
    cases     = any_of(col_cases)
  ) %>%
  # If incidence/mcv columns exist, rename them too
  {
    df <- .
    if (!is.na(col_incidence) && col_incidence %in% names(df)) df <- rename(df, incidence = all_of(col_incidence))
    if (!is.na(col_mcv1) && col_mcv1 %in% names(df)) df <- rename(df, mcv1 = all_of(col_mcv1))
    if (!is.na(col_mcv2) && col_mcv2 %in% names(df)) df <- rename(df, mcv2 = all_of(col_mcv2))
    df
  } %>%
  filter(!is.na(year), !is.na(cases))

cat(sprintf("\nStandardized dataset: %d rows, %d cols\n", nrow(measles_std), ncol(measles_std)))

Standardized dataset: 2382 rows, 19 cols

Global trend: total reported cases by year

# Total global reported cases per year
global_trend <- measles_std %>%
  group_by(year) %>%
  summarise(
    total_cases = sum(cases, na.rm = TRUE),
    n_countries = n_distinct(country, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(year)

cat(sprintf("global_trend: %d rows, %d cols\n", nrow(global_trend), ncol(global_trend)))
global_trend: 14 rows, 3 cols
stopifnot("global_trend has 0 rows — check filter/column mapping" = nrow(global_trend) > 0)

# Preview peak years
cat("\nTop 10 years by reported cases:\n")

Top 10 years by reported cases:
global_trend %>%
  arrange(desc(total_cases)) %>%
  head(10) %>%
  mutate(total_cases = scales::comma(total_cases)) %>%
  print()
# A tibble: 10 × 3
    year total_cases n_countries
   <dbl> <chr>             <int>
 1  2019 541,401             187
 2  2024 359,534             184
 3  2023 321,889             176
 4  2014 290,888             154
 5  2018 276,157             182
 6  2015 247,474             176
 7  2016 180,015             179
 8  2013 178,526             147
 9  2022 174,340             168
10  2017 168,190             182

Regional breakdown

# Check whether who_region column exists and has data
if ("who_region" %in% names(measles_std)) {
  cat("WHO region counts:\n")
  print(measles_std %>% count(who_region, sort = TRUE))

  # Regional totals per year
  regional_trend <- measles_std %>%
    filter(!is.na(who_region)) %>%
    group_by(year, who_region) %>%
    summarise(
      total_cases = sum(cases, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(year, who_region)

  cat(sprintf("\nregional_trend: %d rows, %d cols\n", nrow(regional_trend), ncol(regional_trend)))
  stopifnot("regional_trend has 0 rows" = nrow(regional_trend) > 0)
} else {
  cat("No who_region column found — will use global trend only\n")
  regional_trend <- global_trend %>%
    mutate(who_region = "Global")
}
WHO region counts:
# A tibble: 6 × 2
  who_region     n
  <chr>      <int>
1 EURO         678
2 AFRO         601
3 WPRO         341
4 AMRO         327
5 EMRO         286
6 SEARO        149

regional_trend: 82 rows, 3 cols

The 2019 resurgence: country-level spotlight

# Find the peak year globally
peak_year <- global_trend %>%
  filter(total_cases == max(total_cases)) %>%
  pull(year)

cat(sprintf("Global peak year: %d\n", peak_year))
Global peak year: 2019
# Recent (post-2010) peak year — when resurgence really hit
recent_peak_year <- global_trend %>%
  filter(year >= 2010) %>%
  filter(total_cases == max(total_cases)) %>%
  pull(year)

cat(sprintf("Post-2010 peak year: %d\n", recent_peak_year))
Post-2010 peak year: 2019
# Top countries in the resurgence year
top_countries_peak <- measles_std %>%
  filter(year == recent_peak_year) %>%
  arrange(desc(cases)) %>%
  head(15) %>%
  select(any_of(c("country", "who_region", "year", "cases")))

cat(sprintf("\nTop countries in %d by reported cases:\n", recent_peak_year))

Top countries in 2019 by reported cases:
print(top_countries_peak)
# A tibble: 15 × 4
   country                          who_region  year  cases
   <chr>                            <chr>      <dbl>  <dbl>
 1 Madagascar                       AFRO        2019 213291
 2 Ukraine                          EURO        2019  57332
 3 Philippines                      WPRO        2019  48375
 4 Nigeria                          AFRO        2019  28302
 5 Brazil                           AMRO        2019  20901
 6 Democratic Republic of the Congo AFRO        2019  18467
 7 Kazakhstan                       EURO        2019  13326
 8 India                            SEARO       2019  10708
 9 Yemen                            EMRO        2019   9335
10 Viet Nam                         WPRO        2019   5814
11 Bangladesh                       SEARO       2019   5479
12 Thailand                         SEARO       2019   5373
13 Myanmar                          SEARO       2019   5244
14 Tunisia                          EMRO        2019   4674
15 Somalia                          EMRO        2019   4504
ImportantThe 2019 resurgence

In 2019, global reported measles cases surged to their highest level in over two decades. The Democratic Republic of Congo, Ukraine, Philippines, India, and Brazil were among the hardest-hit countries. The causes were multifactorial: conflict disrupting routine immunization, declining trust in vaccines in high-income countries, and coverage gaps that left pockets of susceptible children even in otherwise well-vaccinated populations.

Hero Visualization: Six Decades of Measles

The hero plot shows the sweep of the measles story across WHO’s six world regions. Each colored layer represents one region’s annual case contribution to the global total, making it easy to see both the overall trend and which regions are driving it.

# WHO region labels — map codes to readable names
# Handle both long (AFRO/AMRO) and short (AFR/AMR) WHO region codes
region_labels <- c(
  "AFRO"  = "Africa",
  "AMRO"  = "Americas",
  "EMRO"  = "Eastern Mediterranean",
  "EURO"  = "Europe",
  "SEARO" = "South-East Asia",
  "WPRO"  = "Western Pacific",
  "AFR"   = "Africa",
  "AMR"   = "Americas",
  "EMR"   = "Eastern Mediterranean",
  "EUR"   = "Europe",
  "SEAR"  = "South-East Asia",
  "WPR"   = "Western Pacific"
)

# Prepare regional data for plotting
# If who_region is present, use it; otherwise fall back to global
if ("who_region" %in% names(regional_trend) && n_distinct(regional_trend$who_region) > 1) {

  plot_data <- regional_trend %>%
    mutate(
      region_label = dplyr::recode(who_region, !!!region_labels, .default = who_region),
      # Order regions so Africa plots at top (largest burden, most dramatic)
      region_label = factor(region_label, levels = c(
        "Africa", "South-East Asia", "Eastern Mediterranean",
        "Western Pacific", "Europe", "Americas"
      ))
    ) %>%
    filter(!is.na(region_label))

  use_regional <- TRUE

} else {
  # Fallback: single global line
  plot_data <- global_trend %>%
    mutate(region_label = "Global", who_region = "Global") %>%
    mutate(region_label = factor(region_label))

  use_regional <- FALSE
}

cat(sprintf("plot_data: %d rows, %d cols\n", nrow(plot_data), ncol(plot_data)))
stopifnot("plot_data has 0 rows — check regional_trend pipeline" = nrow(plot_data) > 0)

# Key annotation years
year_low  <- global_trend %>% filter(year >= 2000) %>% slice_min(total_cases, n = 1) %>% pull(year)
year_peak <- global_trend %>% filter(year >= 2010) %>% slice_max(total_cases, n = 1) %>% pull(year)
peak_val  <- global_trend %>% filter(year == year_peak) %>% pull(total_cases)

cat(sprintf("Annotating: trough year = %d, recent peak year = %d (cases = %s)\n",
            year_low, year_peak, scales::comma(peak_val)))
# Palette: ghibli::MononokeMedium (7 colors — plenty for 6 WHO regions)
# Warm earthy tones from Princess Mononoke: fits the gravity of the topic

region_colors <- paletteer::paletteer_d("ghibli::MononokeMedium", n = 6)

p <- plot_data %>%
  ggplot(aes(x = year, y = total_cases, fill = region_label)) +
  geom_area(
    alpha = 0.92,
    color  = "white",
    linewidth = 0.25,
    position = "stack"
  ) +
  # Vertical reference lines for key events
  geom_vline(xintercept = 2000, color = "#555555", linetype = "dashed", linewidth = 0.5) +
  geom_vline(xintercept = year_peak, color = "#8B1A1A", linetype = "dotted", linewidth = 0.6) +
  # Annotations
  annotate(
    "text",
    x = 2001, y = max(global_trend$total_cases, na.rm = TRUE) * 0.92,
    label = "Measles Initiative\nlaunched (2001)",
    hjust = 0, size = 3.2, color = "#444444", fontface = "italic"
  ) +
  annotate(
    "text",
    x = year_peak + 0.3,
    y = max(global_trend$total_cases, na.rm = TRUE) * 0.92,
    label = sprintf("2019 resurgence\n%s reported cases", scales::comma(peak_val)),
    hjust = 0, size = 3.2, color = "#8B1A1A", fontface = "bold.italic"
  ) +
  # Scales
  scale_x_continuous(breaks = seq(1980, 2025, by = 5), expand = expansion(mult = c(0.01, 0.05))) +
  scale_y_continuous(
    labels = scales::label_comma(scale = 1e-3, suffix = "K"),
    expand = expansion(mult = c(0, 0.05))
  ) +
  scale_fill_manual(values = as.character(region_colors)) +
  # Labels
  labs(
    title    = "**Measles Nearly Disappeared—Then Came Back**",
    subtitle = "Annual reported measles cases by WHO region, stacked globally",
    caption  = "Source: World Health Organization via TidyTuesday 2025-06-24 | Reported cases only; true burden estimated 10–20x higher",
    x        = NULL,
    y        = "Reported cases",
    fill     = "WHO Region"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title         = element_markdown(size = 19, face = "bold", margin = margin(b = 4)),
    plot.subtitle      = element_text(size = 12, color = "#555555", margin = margin(b = 16)),
    plot.caption       = element_text(size = 9, color = "#888888", margin = margin(t = 10)),
    plot.background    = element_rect(fill = "#FAFAF7", color = NA),
    panel.background   = element_rect(fill = "#FAFAF7", color = NA),
    panel.grid.major.x = element_blank(),
    panel.grid.minor   = element_blank(),
    panel.grid.major.y = element_line(color = "#E5E5E0", linewidth = 0.4),
    axis.text          = element_text(color = "#444444"),
    legend.position    = "right",
    legend.title       = element_text(size = 10, face = "bold"),
    legend.text        = element_text(size = 9),
    plot.margin        = margin(20, 20, 15, 15)
  )

p

Countries driving the 2019 resurgence

# Top countries in the post-2010 peak year
top_n <- 12

top_countries_plot <- measles_std %>%
  filter(year == recent_peak_year, cases > 0) %>%
  arrange(desc(cases)) %>%
  head(top_n) %>%
  mutate(
    country = factor(country, levels = rev(country)),
    region_label = dplyr::recode(
      who_region,
      "AFRO"  = "Africa",
      "AMRO"  = "Americas",
      "EMRO"  = "Eastern Mediterranean",
      "EURO"  = "Europe",
      "SEARO" = "South-East Asia",
      "WPRO"  = "Western Pacific",
      "AFR"   = "Africa",
      "AMR"   = "Americas",
      "EMR"   = "Eastern Mediterranean",
      "EUR"   = "Europe",
      "SEAR"  = "South-East Asia",
      "WPR"   = "Western Pacific",
      .default = "Other"
    )
  )

cat(sprintf("top_countries_plot: %d rows\n", nrow(top_countries_plot)))
top_countries_plot: 12 rows
stopifnot("top_countries_plot has 0 rows" = nrow(top_countries_plot) > 0)

p2 <- top_countries_plot %>%
  ggplot(aes(x = cases, y = country, fill = region_label)) +
  geom_col(width = 0.75) +
  geom_text(
    aes(label = scales::comma(cases)),
    hjust = -0.1,
    size  = 3.5,
    color = "#333333"
  ) +
  scale_x_continuous(
    labels = scales::comma,
    expand = expansion(mult = c(0, 0.20))
  ) +
  scale_fill_manual(values = as.character(paletteer::paletteer_d("ghibli::MononokeMedium", n = 6))) +
  labs(
    title    = sprintf("**Where Measles Struck Hardest in %d**", recent_peak_year),
    subtitle = sprintf("Top %d countries by reported cases in the resurgence year", top_n),
    caption  = "Source: World Health Organization via TidyTuesday 2025-06-24",
    x        = "Reported cases",
    y        = NULL,
    fill     = "WHO Region"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title       = element_markdown(size = 15, face = "bold"),
    plot.subtitle    = element_text(size = 11, color = "#555555"),
    plot.caption     = element_text(size = 9, color = "#888888"),
    plot.background  = element_rect(fill = "#FAFAF7", color = NA),
    panel.background = element_rect(fill = "#FAFAF7", color = NA),
    panel.grid.major.y = element_blank(),
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_line(color = "#E5E5E0", linewidth = 0.4),
    axis.text.y      = element_text(size = 10),
    legend.position  = "right",
    legend.title     = element_text(size = 10, face = "bold"),
    plot.margin      = margin(15, 20, 10, 15)
  )

p2

# Only render this chunk if vaccination data exists
if (exists("global_coverage") && !is.null(global_coverage)) {

  # Combine global cases + coverage for dual-narrative plot
  combo_data <- global_trend %>%
    left_join(global_coverage, by = "year") %>%
    filter(!is.na(mean_mcv1))

  # Normalize for dual y-axis: cases on left, coverage % on right
  max_cases <- max(combo_data$total_cases, na.rm = TRUE)

  p3 <- combo_data %>%
    ggplot(aes(x = year)) +
    geom_area(
      aes(y = total_cases),
      fill = "#C4774B", alpha = 0.35, color = "#C4774B", linewidth = 0.8
    ) +
    geom_line(
      aes(y = mean_mcv1 / 100 * max_cases),
      color = "#4B7AC4", linewidth = 1.1
    ) +
    geom_point(
      aes(y = mean_mcv1 / 100 * max_cases),
      color = "#4B7AC4", size = 1.5
    ) +
    scale_x_continuous(breaks = seq(1980, 2025, by = 5)) +
    scale_y_continuous(
      name   = "Reported cases",
      labels = scales::comma,
      sec.axis = sec_axis(
        ~ . / max_cases * 100,
        name   = "Mean MCV1 coverage (%)",
        labels = scales::label_percent(scale = 1)
      )
    ) +
    labs(
      title    = "**Higher Vaccination Coverage, Fewer Cases—Until Coverage Stalled**",
      subtitle = "Global reported measles cases (orange area) vs. mean first-dose vaccine coverage (blue line)",
      caption  = "Source: World Health Organization via TidyTuesday 2025-06-24",
      x        = NULL
    ) +
    theme_minimal(base_size = 12) +
    theme(
      plot.title         = element_markdown(size = 14, face = "bold"),
      plot.subtitle      = element_text(size = 10, color = "#555555"),
      plot.caption       = element_text(size = 9, color = "#888888"),
      plot.background    = element_rect(fill = "#FAFAF7", color = NA),
      panel.background   = element_rect(fill = "#FAFAF7", color = NA),
      panel.grid.minor   = element_blank(),
      panel.grid.major   = element_line(color = "#E5E5E0", linewidth = 0.4),
      axis.title.y.right = element_text(color = "#4B7AC4"),
      axis.text.y.right  = element_text(color = "#4B7AC4"),
      axis.title.y.left  = element_text(color = "#C4774B"),
      axis.text.y.left   = element_text(color = "#C4774B"),
      plot.margin        = margin(15, 20, 10, 15)
    )

  p3

} else {
  cat("Vaccination coverage data not available — skipping this plot.\n")
}

Final thoughts and takeaways

Measles is one of the most instructive case studies in public health history—not because it shows how bad things can get, but because it shows what’s possible when science and policy align. The global collapse in cases from the 1980s through the 2010s is a genuine triumph: coordinated vaccination campaigns, international funding mechanisms like the Measles Initiative, and routine immunization infrastructure working in concert.

But the data also carries a warning. The post-2016 resurgence wasn’t caused by a more virulent strain or a vaccine failure. It was caused by coverage gaps—by children who didn’t get vaccinated, in countries where the health system was overwhelmed, or where trust in vaccines had eroded. The DRC alone accounted for hundreds of thousands of cases in 2019 against a backdrop of Ebola response straining the health system and armed conflict cutting off communities. Ukraine saw outbreaks fueled by vaccine hesitancy that had been building for years.

A few clear takeaways:

  • The herd immunity threshold for measles is exceptionally high (~95% MCV1 coverage). Even a modest slide in vaccination rates can reopen pockets of transmission large enough to sustain outbreaks.
  • Africa has consistently carried a disproportionate share of the measles burden, reflecting both the challenge of reaching remote populations and the resource constraints facing national immunization programs.
  • The resurgence of the late 2010s is not over. Post-COVID-19 disruptions to routine immunization created a global “immunity debt” that public health authorities are still working to address.

Reported case counts are the floor, not the ceiling—true measles burden is estimated to be an order of magnitude higher than surveillance captures. The real story is even more dramatic than what the data shows.