Tidy Tuesday: Allrecipes

tidytuesday
R
food
nutrition
cuisine
visualization
What does the macronutrient composition of recipes reveal about culinary traditions? Analyzing 2,200+ cuisine-tagged Allrecipes dishes to map the fat-carb-protein fingerprint of 35 global food cultures.
Author

Sean Thimons

Published

September 16, 2025

Preface

From the TidyTuesday repository.

This week’s dataset features recipe data from Allrecipes.com, sourced through the tastyR R package. Two complementary tables are provided: all_recipes — 14,426 recipes with nutritional facts (calories, fat, carbs, protein), cooking times, ratings, and review counts — and cuisines — 2,218 recipes tagged by country or regional origin with the same nutritional and timing fields. Together they offer a rare window into how culinary traditions differ across macronutrient composition, cooking effort, and community reception.

Suggested questions from the dataset authors:

  • Which authors are most prolific, and do top recipe creators achieve higher ratings?
  • Is there a correlation between preparation time and average rating?
  • Which cuisines receive the highest average ratings and review engagement?
  • Which recipes are most “actionable” — high ratings with minimal total prep and cook time?

Loading necessary packages

My handy booster pack installs (if needed) and loads my usual favorite packages, and defines a few helper functions.

# Packages ----------------------------------------------------------------

{
  # Install pak if it's not already installed
  if (!requireNamespace("pak", quietly = TRUE)) {
    install.packages(
      "pak",
      repos = sprintf(
        "https://r-lib.github.io/p/pak/stable/%s/%s/%s",
        .Platform$pkgType,
        R.Version()$os,
        R.Version()$arch
      )
    )
  }

  # CRAN Packages ----
  install_booster_pack <- function(package, load = TRUE) {
    for (pkg in package) {
      if (!requireNamespace(pkg, quietly = TRUE)) {
        pak::pkg_install(pkg)
      }
      if (load) {
        library(pkg, character.only = TRUE)
      }
    }
  }

  booster_pack <- c(
    ### IO ----
    'fs',
    'here',
    'janitor',
    'rio',
    'tidyverse',

    ### EDA ----
    'skimr',

    ### Plot ----
    'paletteer',           # Color palette collection
    'ggtext',              # Rich text in ggplot (markdown titles/labels)
    'ggrepel',             # Non-overlapping labels

    ### Misc ----
    'tidytuesdayR'
  )

  install_booster_pack(package = booster_pack, load = TRUE)
  rm(install_booster_pack, booster_pack)

  # Custom Functions ----

  `%ni%` <- Negate(`%in%`)

  # Geometric mean of the strictly positive values; zeros, negatives, and NAs are dropped
  geometric_mean <- function(x) {
    exp(mean(log(x[x > 0]), na.rm = TRUE))
  }

  my_skim <- skim_with(
    numeric = sfl(
      n = length,
      min = ~ min(.x, na.rm = TRUE),
      p25 = ~ stats::quantile(.x, probs = 0.25, na.rm = TRUE, names = FALSE),
      med = ~ median(.x, na.rm = TRUE),
      p75 = ~ stats::quantile(.x, probs = 0.75, na.rm = TRUE, names = FALSE),
      max = ~ max(.x, na.rm = TRUE),
      mean = ~ mean(.x, na.rm = TRUE),
      geo_mean = ~ geometric_mean(.x),
      sd = ~ stats::sd(.x, na.rm = TRUE),
      hist = ~ inline_hist(.x, 5)
    ),
    append = FALSE
  )
}

Load raw data from package

raw <- tidytuesdayR::tt_load("2025-09-16")

all_recipes <- raw$all_recipes %>% janitor::clean_names()
cuisines    <- raw$cuisines    %>% janitor::clean_names()

Exploratory Data Analysis

The my_skim() function profiles the numeric columns of each data frame with a row count (n, which includes missing values), min, quartiles, max, mean, geometric mean, standard deviation, and an inline Unicode histogram.
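To illustrate why a geometric mean is reported next to the arithmetic mean, here is a small sketch on made-up numbers (the vector x is hypothetical, not taken from the dataset). It reuses the geometric_mean() helper from the setup chunk, which drops zeros and missing values before averaging on the log scale.

```r
# Same helper as in the setup chunk: geometric mean of strictly positive values
geometric_mean <- function(x) {
  exp(mean(log(x[x > 0]), na.rm = TRUE))
}

# Hypothetical right-skewed vector with a zero and a missing value
x <- c(1, 10, 100, 1000, 0, NA)

arith <- mean(x, na.rm = TRUE)    # keeps the zero: (1 + 10 + 100 + 1000 + 0) / 5
med   <- median(x, na.rm = TRUE)  # middle of 0, 1, 10, 100, 1000
geo   <- geometric_mean(x)        # zero and NA dropped, so this equals 10^1.5

round(c(mean = arith, median = med, geo_mean = geo), 2)
```

On this toy vector the arithmetic mean (222.2) is dragged far above the median (10) by a single large value, while the geometric mean (about 31.6) stays much closer to the bulk of the data. The same pattern shows up in the skewed count and time columns below.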

all_recipes

all_recipes %>%
  dplyr::select(-name, -url, -author, -ingredients) %>%
  my_skim()
Data summary
Name Piped data
Number of rows 14426
Number of columns 12
_______________________
Column type frequency:
Date 1
numeric 11
________________________
Group variables None

Variable type: Date

skim_variable   n_missing  complete_rate  min         max         median      n_unique
date_published          0              1  2004-11-24  2025-07-31  2024-06-12      1542

Variable type: numeric

skim_variable  n_missing  complete_rate      n  min    p25    med    p75    max    mean  geo_mean      sd  hist
calories             200           0.99  14426    1  181.0  307.0  453.0   9538  344.88    271.76  250.02  ▇▁▁▁▁
fat                  356           0.98  14426    0    7.0   14.0   24.0    612   17.84     13.38   16.68  ▇▁▁▁▁
carbs                214           0.99  14426    0   14.0   29.0   45.0    746   32.86     23.05   27.59  ▇▁▁▁▁
protein              248           0.98  14426    0    3.0    8.0   22.0    939   14.42      8.70   17.53  ▇▁▁▁▁
avg_rating           972           0.93  14426    1    4.4    4.6    4.8      5    4.53      4.50    0.41  ▁▁▁▂▇
total_ratings        972           0.93  14426    1    5.0   26.0  112.0    997  102.62     24.25  172.98  ▇▁▁▁▁
reviews             1073           0.93  14426    1    5.0   24.0  100.0    999   94.42     22.75  164.52  ▇▁▁▁▁
prep_time              0           1.00  14426    0   10.0   15.0   20.0   2160   17.35     14.94   24.87  ▇▁▁▁▁
cook_time              0           1.00  14426    0   10.0   20.0   45.0   4325   42.52     28.26   96.82  ▇▁▁▁▁
total_time             0           1.00  14426    0   30.0   55.0  100.0  60485  144.07     61.06  874.25  ▇▁▁▁▁
servings              21           1.00  14426    1    4.0    8.0   12.0    300   11.03      7.87   13.03  ▇▁▁▁▁

A few things stand out immediately. Ratings cluster high: the median avg_rating is 4.6 out of 5.0, with the 25th percentile already at 4.4. This is typical of self-selected recipe sites, where unpopular recipes disappear and the survivors skew positive. Rating counts are highly right-skewed: the median recipe has just 26 ratings but the mean is 103, pulled up by a long tail of viral hits. Cooking times show enormous spread: median total time is 55 minutes, but the 90th percentile is 4+ hours, with slow-cooker braises and overnight doughs pulling the tail out.

cuisines

cuisines %>%
  dplyr::select(-name, -url, -author, -ingredients) %>%
  my_skim()
Data summary
Name Piped data
Number of rows 2218
Number of columns 13
_______________________
Column type frequency:
character 1
Date 1
numeric 11
________________________
Group variables None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
country                0              1    4   28      0        49           0

Variable type: Date

skim_variable   n_missing  complete_rate  min         max         median      n_unique
date_published          0              1  2009-02-09  2025-07-29  2024-07-14       751

Variable type: numeric

skim_variable  n_missing  complete_rate     n  min    p25    med    p75    max    mean  geo_mean      sd  hist
calories              32           0.99  2218    3  190.0  319.5  477.0   2266  358.41    276.11  240.04  ▇▃▁▁▁
fat                   55           0.98  2218    0    7.0   15.0   26.0    225   18.76     13.62   16.96  ▇▁▁▁▁
carbs                 35           0.98  2218    1   13.0   26.0   45.0    264   31.96     21.83   26.06  ▇▂▁▁▁
protein               39           0.98  2218    0    4.0   11.0   25.0    159   16.61     10.11   16.30  ▇▁▁▁▁
avg_rating            97           0.96  2218    1    4.3    4.6    4.8      5    4.51      4.49    0.40  ▁▁▁▂▇
total_ratings         97           0.96  2218    1    6.0   24.0   87.0    997   85.25     23.18  148.24  ▇▁▁▁▁
reviews              108           0.95  2218    1    6.0   21.0   74.0    975   76.93     20.87  142.07  ▇▁▁▁▁
prep_time              0           1.00  2218    0   10.0   15.0   25.0   1800   21.50     16.61   60.72  ▇▁▁▁▁
cook_time              0           1.00  2218    0   10.0   25.0   45.0    600   41.75     29.59   63.18  ▇▁▁▁▁
total_time             0           1.00  2218    0   35.0   60.0  120.0  14440  170.98     70.86  641.73  ▇▁▁▁▁
servings               2           1.00  2218    1    4.0    8.0   12.0    240   10.48      7.48   13.42  ▇▁▁▁▁

The cuisines table mirrors the all_recipes structure but adds the country column. It spans 49 distinct cuisine labels with between 6 and 67 recipes each. Nutritional distributions look similar to the main table: median calories around 320, with more grams of fat than protein in the typical recipe. The avg_rating distribution is essentially the same as in the general recipes dataset (median 4.6 in both; mean 4.51 here versus 4.53).

cuisines %>%
  dplyr::count(country, sort = TRUE) %>%
  print(n = 49)
# A tibble: 49 × 2
   country                          n
   <chr>                        <int>
 1 Brazilian                       67
 2 Canadian                        67
 3 Filipino                        66
 4 Australian and New Zealander    65
 5 Chinese                         65
 6 Cuban                           65
 7 French                          65
 8 Indian                          65
 9 Russian                         65
10 Italian                         64
11 Cajun and Creole                63
12 Japanese                        63
13 Soul Food                       63
14 German                          62
15 Greek                           62
16 Thai                            62
17 Vietnamese                      62
18 Amish and Mennonite             61
19 Jewish                          61
20 Polish                          61
21 Spanish                         61
22 Puerto Rican                    60
23 Korean                          56
24 Tex-Mex                         55
25 Portuguese                      53
26 Lebanese                        51
27 Southern Recipes                50
28 Persian                         45
29 Jamaican                        43
30 Peruvian                        38
31 Scandinavian                    38
32 Turkish                         36
33 Danish                          33
34 Swedish                         31
35 Argentinian                     30
36 Norwegian                       26
37 Pakistani                       25
38 Indonesian                      24
39 Malaysian                       24
40 Israeli                         23
41 Austrian                        22
42 Chilean                         22
43 Dutch                           22
44 South African                   19
45 Finnish                         18
46 Bangladeshi                     12
47 Colombian                       11
48 Swiss                           10
49 Belgian                          6

The 49 cuisines are not evenly sampled: just over half (27) have 50–67 recipes, and counts thin out toward the bottom (Israeli: 23, Belgian: 6). For the main analysis I’ll restrict to the 35 cuisines with at least 30 tagged recipes to keep the medians reasonably stable.
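One subtlety with this threshold: it is applied to the raw recipe count per cuisine, before rows with missing nutrition values are dropped, so a cuisine can pass the cut and still end up with fewer usable recipes (Argentinian has 30 tagged recipes but shows n = 29 in the cuisine-level table further down). A toy sketch of the effect, using a made-up data frame and a hypothetical threshold of 3 standing in for 30:

```r
# Made-up data: three cuisines, one missing fat value in cuisine "B"
toy <- data.frame(
  country = c(rep("A", 3), rep("B", 3), rep("C", 2)),
  fat     = c(1, 2, 3, 4, NA, 6, 7, 8)
)

min_n <- 3  # hypothetical threshold (the post uses 30)

# Step 1: keep cuisines whose raw count meets the threshold
keep <- names(which(table(toy$country) >= min_n))

# Step 2: drop rows with missing nutrition values afterwards
complete <- toy[toy$country %in% keep & !is.na(toy$fat), ]

# Usable recipes per retained cuisine
table(complete$country)
```

Cuisine "B" passes the count filter with 3 rows but contributes only 2 usable rows once the NA is removed. Whether to apply the threshold before or after cleaning is a design choice; the post applies it before.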

Macronutrient Fingerprints of Global Cuisines

The central question: does cuisine identity translate into nutritionally distinct recipes? Each culinary tradition carries embedded constraints — available ingredients, historical influences, climate, and cultural preferences around protein sources, starchy staples, and cooking fats. If those patterns are real, they should appear in the macronutrient composition of typical dishes.

To test this, I convert raw grams of fat, carbohydrates, and protein into caloric contributions using standard Atwater factors (fat: 9 kcal/g; carbohydrate and protein: 4 kcal/g each), then express each macro as a share of total macro-derived calories per recipe. Taking the median within each cuisine gives a stable central tendency that resists individual outliers (e.g., one unusually rich French pastry won’t define French cuisine).
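As a worked example of the conversion, take a hypothetical recipe with 10 g fat, 30 g carbohydrate, and 15 g protein per serving (made-up numbers, not a row from the dataset). A minimal base-R sketch:

```r
# Atwater factors in kcal per gram
atwater <- c(fat = 9, carbs = 4, protein = 4)

# Hypothetical recipe, grams per serving
grams <- c(fat = 10, carbs = 30, protein = 15)

# Calories contributed by each macronutrient
kcal <- grams * atwater

# Share of macro-derived calories, in percent
share <- kcal / sum(kcal) * 100

round(share, 1)
```

Fat supplies 90 of the 270 macro-derived kcal (33.3%), carbohydrate 120 (44.4%), and protein 60 (22.2%). Because the shares are taken over macro-derived calories rather than the listed calories field, they always sum to 100 for a single recipe.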

# Filter to cuisines with at least 30 recipes
top_cuisines <- cuisines %>%
  dplyr::count(country, sort = TRUE) %>%
  dplyr::filter(n >= 30) %>%
  dplyr::pull(country)

# Compute macro caloric percentages per recipe
macro_by_recipe <- cuisines %>%
  dplyr::filter(country %in% top_cuisines) %>%
  dplyr::filter(
    !is.na(fat), !is.na(carbs), !is.na(protein),
    !is.na(calories), calories > 50, calories < 2500
  ) %>%
  dplyr::mutate(
    cal_from_fat     = fat * 9,
    cal_from_carb    = carbs * 4,
    cal_from_protein = protein * 4,
    cal_total        = cal_from_fat + cal_from_carb + cal_from_protein
  ) %>%
  dplyr::filter(cal_total > 0) %>%
  dplyr::mutate(
    pct_fat     = cal_from_fat     / cal_total * 100,
    pct_carb    = cal_from_carb    / cal_total * 100,
    pct_protein = cal_from_protein / cal_total * 100
  )

cat(sprintf("Recipes after filtering: %d across %d cuisines\n",
    nrow(macro_by_recipe), length(unique(macro_by_recipe$country))))
stopifnot("Zero-row filtered dataset" = nrow(macro_by_recipe) > 0)

# Sanity check: proportions should vary across cuisines
if (length(unique(round(macro_by_recipe$pct_protein, 1))) == 1) {
  warning("All protein percentages identical — check grouping logic")
}

# Aggregate to cuisine-level medians
cuisine_macros <- macro_by_recipe %>%
  dplyr::group_by(country) %>%
  dplyr::summarise(
    n               = dplyr::n(),
    med_pct_fat     = median(pct_fat),
    med_pct_carb    = median(pct_carb),
    med_pct_protein = median(pct_protein),
    med_calories    = median(calories),
    .groups         = "drop"
  ) %>%
  dplyr::arrange(dplyr::desc(med_pct_protein))

cat(sprintf("Cuisine-level rows: %d\n", nrow(cuisine_macros)))
stopifnot("Zero cuisine rows" = nrow(cuisine_macros) > 0)
# Global medians across all qualifying recipes
global_fat  <- median(macro_by_recipe$pct_fat)
global_carb <- median(macro_by_recipe$pct_carb)
global_prot <- median(macro_by_recipe$pct_protein)

cat(sprintf(
  "Global macro medians:\n  Fat: %.1f%%\n  Carbohydrate: %.1f%%\n  Protein: %.1f%%\n",
  global_fat, global_carb, global_prot
))
Global macro medians:
  Fat: 43.9%
  Carbohydrate: 37.9%
  Protein: 14.9%
# All 35 cuisines, ordered by median protein share (descending)
cuisine_macros %>%
  dplyr::select(country, n, med_pct_fat, med_pct_carb, med_pct_protein, med_calories) %>%
  dplyr::mutate(dplyr::across(dplyr::starts_with("med_pct"), ~ round(.x, 1))) %>%
  print(n = 35)
# A tibble: 35 × 6
   country               n med_pct_fat med_pct_carb med_pct_protein med_calories
   <chr>             <int>       <dbl>        <dbl>           <dbl>        <dbl>
 1 Jamaican             38        39           28.9            26.4         346.
 2 Cuban                63        46.4         28.4            24.2         301 
 3 Chinese              63        37.2         36.4            24           319 
 4 Vietnamese           59        38.5         39.9            22.3         411 
 5 Tex-Mex              50        44.7         29.8            22.1         394 
 6 Cajun and Creole     59        55.6         19.3            21.4         480 
 7 Indian               61        46.9         31.7            20.3         299 
 8 Peruvian             35        37.8         38.2            19.4         401 
 9 Soul Food            62        49.5         28.9            19.2         382.
10 Filipino             62        47.6         32.3            19           344.
11 Portuguese           50        41.8         36.1            18.3         397 
12 Italian              61        45.1         32              17.7         458 
13 Spanish              52        40.6         36.7            17.1         354.
14 Turkish              32        46.6         39.4            16.6         394.
15 Korean               52        43.2         31.6            16.3         312.
16 Canadian             66        41.6         43.7            16.1         344 
17 Greek                56        55.9         26.7            14.9         340.
18 Thai                 59        45.8         34.9            14.8         383 
19 Japanese             54        38.6         39              14.8         290.
20 Russian              64        43.4         40.9            14.3         294.
21 Polish               59        48.4         39.1            14.2         356 
22 Persian              40        43.4         42.6            13.7         328 
23 Lebanese             44        44           43.3            13.3         267 
24 French               64        53.6         28.2            13           329 
25 German               57        42.9         37.3            12.5         349 
26 Puerto Rican         56        38.2         43.9            12.2         386.
27 Southern Recipes     48        44.2         42.5            11.7         454 
28 Brazilian            63        42           47.6            10.2         303 
29 Jewish               61        41.5         43.4            10.1         278 
30 Swedish              30        47.7         44.8             9.6         286.
31 Danish               30        44.6         47.2             8.8         260.
32 Argentinian          29        54.4         36.2             8.8         266 
33 Scandinavian         32        47.7         45.1             8.3         290.
34 Amish and Mennon…    59        38           52.5             7.2         252 
35 Australian and N…    62        41.3         53.2             4.9         216.
Note

A note on methodology. These figures represent the median recipe in each cuisine category, not the average diet of people in those countries. Allrecipes skews toward home cooking, special occasion dishes, and recipes submitted by users in English-speaking markets. The data reflects what Allrecipes users associate with each culinary label — a useful proxy, but not a direct measure of traditional foodways.

The spread is striking. Jamaican, Cuban, and Chinese recipes sit at the top for protein share (26%, 24%, and 24% respectively), driven by meat-forward dishes like jerk chicken, ropa vieja, and stir-fries. At the other extreme, Australian/New Zealand and Amish/Mennonite cuisines are the most carbohydrate-dominant (53% of calories from carbs), reflecting traditions rich in baked goods, pies, and grain-based staples. Cajun/Creole and Greek recipes stand out as the most fat-heavy, consistent with generous use of butter, cream, olive oil, and rich cuts of meat.

Hero Visualization

# Build factor order: cuisines sorted by protein % (ascending so top protein = top of chart)
protein_order <- cuisine_macros %>%
  dplyr::arrange(med_pct_protein) %>%
  dplyr::pull(country)

# Pivot to long format for stacked bar
macro_long <- cuisine_macros %>%
  dplyr::select(country, med_pct_fat, med_pct_carb, med_pct_protein) %>%
  tidyr::pivot_longer(
    cols      = dplyr::starts_with("med_pct"),
    names_to  = "macro",
    values_to = "pct"
  ) %>%
  dplyr::mutate(
    macro = dplyr::case_when(
      macro == "med_pct_fat"     ~ "Fat",
      macro == "med_pct_carb"    ~ "Carbohydrate",
      macro == "med_pct_protein" ~ "Protein"
    ),
    macro   = factor(macro, levels = c("Fat", "Carbohydrate", "Protein")),
    country = factor(country, levels = protein_order)
  )

cat(sprintf("macro_long: %d rows, %d cols\n", nrow(macro_long), ncol(macro_long)))
macro_long: 105 rows, 3 cols
stopifnot("Plot data is empty" = nrow(macro_long) > 0)

# Palette: IslamicArt::samarqand — earthy brown (fat), soft aqua (carbs), deep indigo (protein)
samarqand_pal <- paletteer::paletteer_d("IslamicArt::samarqand")
macro_colors <- c(
  "Fat"          = as.character(samarqand_pal[2]),   # #907A58 warm earth
  "Carbohydrate" = as.character(samarqand_pal[5]),   # #A2D2D4 soft aqua
  "Protein"      = as.character(samarqand_pal[6])    # #475286 deep indigo
)

# Annotation data: label highest- and lowest-protein cuisines
label_cuisines <- c("Jamaican", "Cuban", "Chinese",
                    "Amish and Mennonite", "Australian and New Zealander")

annotation_df <- cuisine_macros %>%
  dplyr::filter(country %in% label_cuisines) %>%
  dplyr::mutate(
    country = factor(country, levels = protein_order),
    label   = sprintf("%.0f%% protein", round(med_pct_protein, 0))
  )

p <- macro_long %>%
  ggplot2::ggplot(ggplot2::aes(x = pct, y = country, fill = macro)) +
  ggplot2::geom_col(position = "fill", width = 0.72, colour = "white", linewidth = 0.2) +
  # Reference line: global median protein share
  ggplot2::geom_vline(
    xintercept = 1 - (global_fat + global_carb) / (global_fat + global_carb + global_prot),
    colour = "grey30", linetype = "dashed", linewidth = 0.5
  ) +
  # Protein % labels on highlighted cuisines
  ggrepel::geom_text_repel(
    data         = annotation_df,
    ggplot2::aes(x = 1, y = country, label = label),
    inherit.aes  = FALSE,
    hjust        = 0,
    nudge_x      = 0.02,
    size         = 3.2,
    colour       = "#475286",
    fontface     = "bold",
    direction    = "y",
    segment.size = 0.3,
    segment.colour = "grey50"
  ) +
  ggplot2::scale_x_continuous(
    labels = scales::percent_format(accuracy = 1),
    expand = ggplot2::expansion(mult = c(0, 0.12)),
    breaks = seq(0, 1, 0.25)
  ) +
  ggplot2::scale_fill_manual(values = macro_colors) +
  ggplot2::labs(
    title    = "The Macronutrient Fingerprint of Global Cuisines",
    subtitle = paste0(
      "Median share of calories from fat, carbohydrate, and protein across 35 culinary traditions on Allrecipes.\n",
      "Cuisines ordered by protein share (highest at top). Dashed line = median protein share across all recipes (~15%)."
    ),
    x        = "Share of macro-derived calories",
    y        = NULL,
    fill     = NULL,
    caption  = "Source: Allrecipes via tastyR · TidyTuesday 2025-09-16 · n ≥ 30 tagged recipes per cuisine before filtering"
  ) +
  ggplot2::theme_minimal(base_size = 11.5) +
  ggplot2::theme(
    plot.title         = ggtext::element_markdown(face = "bold", size = 15, margin = ggplot2::margin(b = 4)),
    plot.subtitle      = ggplot2::element_text(size = 9.5, colour = "grey35", lineheight = 1.35,
                                               margin = ggplot2::margin(b = 12)),
    plot.caption       = ggplot2::element_text(size = 8, colour = "grey50",
                                               margin = ggplot2::margin(t = 10)),
    legend.position    = "top",
    legend.key.size    = ggplot2::unit(0.9, "lines"),
    legend.text        = ggplot2::element_text(size = 10),
    axis.text.y        = ggplot2::element_text(size = 9.5),
    axis.text.x        = ggplot2::element_text(size = 9, colour = "grey40"),
    panel.grid.minor   = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_blank(),
    panel.grid.major.x = ggplot2::element_line(colour = "grey88"),
    plot.margin        = ggplot2::margin(16, 24, 12, 12)
  )

p
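One detail of the chart is worth spelling out. The three medians are computed independently, so they need not sum to 100%, and position = "fill" rescales each bar to a full width of 1. The dashed line is rescaled the same way. A quick check using the global medians printed earlier (43.9, 37.9, 14.9), assuming ggplot2's default stacking order, which places the last factor level (Protein) nearest zero:

```r
# Global medians as printed earlier in this post (percent of macro-derived calories)
global_fat  <- 43.9
global_carb <- 37.9
global_prot <- 14.9

# Independent medians do not have to add up to 100
total <- global_fat + global_carb + global_prot

# Same expression used for xintercept in the plot code
xintercept <- 1 - (global_fat + global_carb) / total

# Equivalent, more direct form: protein's share of the summed medians
direct <- global_prot / total

round(c(total = total, xintercept = xintercept, direct = direct), 3)
```

The medians sum to 96.7 rather than 100, so the line lands at roughly 0.154 on the rescaled axis instead of 0.149, which is consistent with the "~15%" in the subtitle.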

Final thoughts and takeaways

The macronutrient fingerprint of a cuisine does not look like random noise; the ordering tracks recognizable differences in culinary logic. The high-protein cluster at the top (Jamaican, Cuban, Chinese, Vietnamese) skews heavily toward meat-centric main courses: jerk preparations, braised pork, stir-fried proteins, and pho. These traditions put the protein source at the center of the plate and build everything else around it.

The carbohydrate-dominant end of the spectrum tells a different story. Australian and New Zealand recipes on Allrecipes lean disproportionately toward baked goods — pavlova, lamingtons, Anzac biscuits — categories where flour, sugar, and oats dominate. Amish and Mennonite cooking reflects a tradition grounded in stretching protein-scarce ingredients with starches: potato dishes, noodle casseroles, and pies. Neither cuisine is “less healthy” in some absolute sense, but the carb emphasis is a real feature of what gets submitted and rated in those categories.

Cajun/Creole and Greek stand out as the most fat-heavy cuisines, both exceeding 55% of calories from fat. For Cajun cooking this tracks with the heavy use of butter and roux in étouffée, gumbo, and jambalaya. For Greek cuisine it reflects olive oil used generously in everything from phyllo pastry to moussaka.

A few caveats worth noting:

  • Allrecipes users tend to submit and rate celebratory or indulgent dishes more than everyday meals, which likely inflates calorie counts and fat percentages across the board.
  • Cuisine tags are assigned by Allrecipes editors, not by cultural experts; the “Australian and New Zealander” category almost certainly oversamples European-descended baking traditions relative to Indigenous or Pacific Island foodways.
  • The dataset captures a snapshot of user-submitted content through mid-2025; trending dietary styles (keto, plant-based) may skew toward recent submission dates.

Despite these limitations, the broad patterns are consistent: in this dataset, culinary tradition does leave a legible macronutrient signature, and that signature separates the 35 cuisines in a way that largely matches intuition about each food culture’s defining characteristics.

Render note: Render locally once with quarto render before committing — the _freeze/ directory captures executed output and CI will not re-execute.