Long-term monitoring of Hermann’s tortoises in the western Balkans reveals a near-perfect relationship between body size and clutch size—and that females consistently outweigh their own shells.
This week’s dataset comes from a long-term mark-recapture study of Hermann’s tortoises (Testudo hermanni) monitored across three localities in the western Balkans. Two datasets are included: body condition index (BCI) measurements collected annually from 2008–2023, and clutch size records paired with morphometric measurements from nesting females. The BCI captures how an individual’s actual body mass compares to the expected mass given its shell length — a proxy for nutritional condition and fat reserves.
Loading necessary packages
My handy booster pack that allows me to install (if needed) and load my usual and favorite packages, as well as some helpful functions.
raw <- tidytuesdayR::tt_load('2026-03-03')# Clutch size: nesting females with egg counts and morphometricsclutch <- raw$clutch_size_cleaned# Body condition index: repeated annual measurements, 2008–2023body <- raw$tortoise_body_condition_cleaned
Exploratory Data Analysis
The my_skim() function is a modified version of skimr::skim() that returns the number of missing data points as well as the inverse, the count, minimum, 25%, median, 75%, max, mean, geometric mean, and standard deviation. It also generates a little ASCII histogram.
Clutch size data
# Drop: individual ID and date (not analytically useful for profiling)clutch %>%select(-individual, -date) %>%my_skim()
Data summary
Name
Piped data
Number of rows
53
Number of columns
5
_______________________
Column type frequency:
character
1
numeric
4
________________________
Group variables
None
Variable type: character
skim_variable
n_missing
complete_rate
min
max
empty
n_unique
whitespace
locality
0
1
5
7
0
3
0
Variable type: numeric
skim_variable
n_missing
complete_rate
n
min
p25
med
p75
max
mean
geo_mean
sd
hist
age
0
1
53
15
20
22
30
30
24.09
23.53
5.19
▃▅▅▁▇
eggs
0
1
53
1
3
5
6
11
4.79
4.31
2.11
▇▇▇▁▁
body_mass_grams
0
1
53
541
1116
1494
1712
2920
1505.04
1404.26
564.60
▆▆▇▃▂
straight_carapace_length_mm
0
1
53
142
176
192
206
260
191.15
189.25
27.33
▅▅▇▂▂
The clutch dataset is small — 53 observations from nesting females across three localities (Beach, Konjsko, Plateau), ages 15–30 years. Egg counts range from 1 to 11, with a median of 5. Body mass spans from 541 g to nearly 3 kg, and carapace length from 142 mm to 260 mm. The wide ranges across both size measures hint at strong individual variation — and possibly a tight relationship between size and clutch size.
cat("Localities:\n")
Localities:
print(count(clutch, locality))
# A tibble: 3 × 2
locality n
<chr> <int>
1 Beach 13
2 Konjsko 31
3 Plateau 9
# Drop: year_recode (redundant with year)body %>%select(-year_recode) %>%my_skim()
Data summary
Name
Piped data
Number of rows
10174
Number of columns
8
_______________________
Column type frequency:
character
4
numeric
4
________________________
Group variables
None
Variable type: character
skim_variable
n_missing
complete_rate
min
max
empty
n_unique
whitespace
individual
0
1
1
6
0
2139
0
season
0
1
6
6
0
2
0
locality
0
1
5
7
0
3
0
sex
0
1
1
1
0
2
0
Variable type: numeric
skim_variable
n_missing
complete_rate
n
min
p25
med
p75
max
mean
geo_mean
sd
hist
year
0
1
10174
2008.00
2011.00
2016.00
2018.00
2023.0
2014.90
2014.90
4.51
▇▂▆▃▃
body_mass_grams
0
1
10174
528.00
962.00
1092.00
1248.00
2913.0
1133.48
1106.36
263.52
▅▇▁▁▁
body_condition_index
0
1
10174
3.52
5.66
6.19
6.78
11.7
6.31
6.24
0.98
▁▇▂▁▁
straight_carapace_length_mm
0
1
10174
150.00
169.00
177.00
185.00
250.0
177.77
177.27
13.36
▅▇▂▁▁
The longitudinal body condition dataset contains 10,174 measurements across 16 years (2008–2023). No missing values. The BCI ranges from 3.5 to ~11.7, centered around 6.1–6.3. The split between Spring and Summer observations is roughly 55/45. Sex coding is "f" and "m" — important to note for filtering later.
cat("Sex values:\n")
Sex values:
print(count(body, sex))
# A tibble: 2 × 2
sex n
<chr> <int>
1 f 1435
2 m 8739
cat("\nLocalities:\n")
Localities:
print(count(body, locality))
# A tibble: 3 × 2
locality n
<chr> <int>
1 Beach 618
2 Konjsko 1616
3 Plateau 7940
cat("\nSeasons:\n")
Seasons:
print(count(body, season))
# A tibble: 2 × 2
season n
<chr> <int>
1 Spring 5747
2 Summer 4427
Sex gap: Females average BCI 7.48 vs. males 6.12 — more than a full unit difference, with females showing substantially wider spread (SD 1.37 vs. 0.74).
Locality gap: Konjsko individuals score highest (BCI 7.46), well above Beach (5.80) and Plateau (6.12).
These patterns motivate the two analytical threads below.
Body condition: who’s in the best shape, and where?
The body condition index here is a residual measure — it captures how much an individual’s actual body mass deviates from the expected mass given its shell length. Higher BCI means the animal is carrying more mass relative to its structural size, a proxy for fat reserves and nutritional status.
Note
In reptiles, elevated fat reserves in females are not simply a sign of good health — they’re a reproductive necessity. Female tortoises must accumulate substantial lipid stores prior to egg formation. The ~22% BCI advantage females show over males in this dataset is likely a direct reflection of that investment.
# Verify sex × locality combination counts before plottingbci_plot_data <- body %>%mutate(sex_label =ifelse(sex =="f", "Female", "Male"),sex_label =factor(sex_label, levels =c("Female", "Male")) )cat(sprintf("bci_plot_data: %d rows, %d cols\n", nrow(bci_plot_data), ncol(bci_plot_data)))
bci_plot_data: 10174 rows, 10 cols
stopifnot("Plot data has 0 rows"=nrow(bci_plot_data) >0)# Check actual BCI range is sensiblebci_range <-range(bci_plot_data$body_condition_index)cat(sprintf("BCI range: %.2f – %.2f\n", bci_range[1], bci_range[2]))
BCI range: 3.52 – 11.70
locality_colors <- paletteer::paletteer_d("MetBrewer::Kandinsky", n =3)names(locality_colors) <-c("Beach", "Konjsko", "Plateau")p_bci <- ggplot2::ggplot( bci_plot_data, ggplot2::aes(x = body_condition_index,y = locality,fill = locality,color = locality )) + ggridges::geom_density_ridges(alpha =0.7,scale =0.9,quantile_lines =TRUE,quantiles =2, # median linejittered_points =FALSE ) + ggplot2::facet_wrap(~sex_label, ncol =2) + ggplot2::scale_fill_manual(values = locality_colors) + ggplot2::scale_color_manual(values = locality_colors) + ggplot2::scale_x_continuous(breaks =seq(4, 12, by =2)) + ggplot2::labs(title ="Female tortoises maintain higher body condition across all localities",subtitle ="Body Condition Index (BCI) distributions by sex and locality, 2008–2023\nVertical lines mark median BCI per group",x ="Body Condition Index",y =NULL,fill ="Locality",color ="Locality",caption ="Source: TidyTuesday 2026-03-03 | Golem Grad Tortoise Data" ) + ggplot2::theme_minimal(base_size =13) + ggplot2::theme(plot.title = ggplot2::element_text(face ="bold", size =15),plot.subtitle = ggplot2::element_text(color ="gray40", size =11),plot.caption = ggplot2::element_text(color ="gray55", size =9),strip.text = ggplot2::element_text(face ="bold", size =13),legend.position ="none",panel.grid.minor = ggplot2::element_blank(),panel.grid.major.y = ggplot2::element_blank() )p_bci
The ridge distributions tell a clear story. For females, the Konjsko population is shifted markedly rightward — a heavy right tail reaching past BCI 10. Beach and Plateau females overlap considerably but trail Konjsko by a full unit at the median. For males, distributions are tighter and more uniform across localities, all clustering around BCI 5.5–6.5. The male Konjsko advantage disappears in the tight BCI range.
This locality × sex interaction suggests that Konjsko habitat uniquely enables females to achieve the elevated fat deposits necessary for robust egg production — a pattern we can test directly with the clutch data.
Body size predicts reproductive output
# Verify actual locality values in the clutch datasetcat("Localities in clutch data:\n")
Localities in clutch data:
print(sort(unique(clutch$locality)))
[1] "Beach" "Konjsko" "Plateau"
# Confirm these match the BCI datasetstopifnot(all(clutch$locality %in%c("Beach", "Konjsko", "Plateau")))cat(sprintf("clutch: %d rows, %d cols\n", nrow(clutch), ncol(clutch)))
clutch: 53 rows, 7 cols
stopifnot("clutch has 0 rows"=nrow(clutch) >0)# Correlation statisticscor_mass <-cor.test(clutch$body_mass_grams, clutch$eggs)cor_scl <-cor.test(clutch$straight_carapace_length_mm, clutch$eggs)cat(sprintf("\nBody mass × eggs: r = %.3f, p < 2.2e-16\n", cor_mass$estimate))
Body mass × eggs: r = 0.896, p < 2.2e-16
cat(sprintf("Carapace × eggs: r = %.3f, p < 2.2e-16\n", cor_scl$estimate))
Carapace × eggs: r = 0.882, p < 2.2e-16
# Sanity check: are eggs proportional values or counts?cat(sprintf("\nEgg count range: %d – %d (should be counts, not proportions)\n",min(clutch$eggs), max(clutch$eggs)))
Egg count range: 1 – 11 (should be counts, not proportions)
if (max(clutch$eggs) <=1) {warning("Eggs column looks like proportions, not counts — check data")}
The Pearson correlation between body mass and clutch size is r = 0.896 (p < 2.2e–16). For a biological dataset with only 53 observations spanning wild-caught individuals across multiple years, this is a remarkably tight relationship. Carapace length shows nearly identical predictive power (r = 0.882).
Important
What r = 0.896 means in practice: roughly 80% of the variance in egg count is accounted for by body mass alone. The largest female in the sample (2,920 g) laid 11 eggs; the smallest (541 g) laid 1. That’s an 11-fold range in reproductive output explained almost entirely by how big the animal is.
# Label outliers: largest clutch per locality, and the minimumlabel_pts <-bind_rows( clutch %>%group_by(locality) %>%slice_max(order_by = eggs, n =1, with_ties =FALSE), clutch %>%slice_min(order_by = eggs, n =1, with_ties =FALSE)) %>%distinct(individual, .keep_all =TRUE) %>%mutate(label =paste0(eggs, " eggs\n", body_mass_grams, " g"))cat(sprintf("label_pts: %d rows\n", nrow(label_pts)))
The scatter makes the relationship almost uncomfortable in its linearity — these points want to fall on a line. Konjsko individuals (amber) tend to cluster in the upper-right corner (high mass, high clutch size), consistent with the BCI advantage we saw in the ridge plots. Beach tortoises (teal) appear mostly at mid-mass; the single Plateau point near the top right is a notable outlier.
Log palette
Final thoughts and takeaways
This dataset from a Balkan Hermann’s tortoise study turns out to be a compact illustration of two intertwined ecological principles:
Body condition is sex-stratified by design. The ~22% BCI gap between females and males isn’t a sign that males are unhealthy — it reflects the asymmetric energetic demands of reproduction. Females must mobilize fat reserves to produce calcified eggs; males don’t. Konjsko amplifies this effect, suggesting better vegetation cover or food availability that enables females there to achieve the highest pre-nesting condition in the dataset. This locality-level effect has real conservation implications: habitat degradation at Konjsko would disproportionately harm female reproductive success.
Size is reproductive destiny, at r = 0.896. This correlation is extremely high for a wild biological dataset. It means that a female tortoise’s body mass is, in practice, a near-sufficient predictor of how many eggs she will lay — more than age, year, or locality alone. Since tortoises grow slowly and live for decades, this finding elevates early life conditions (juvenile growth rates, food availability) as critical determinants of lifetime reproductive output. A tortoise that grows slowly in its first decade carries that reproductive penalty for life.
Limitations: The clutch size dataset is small (n = 53) and restricted to females who were observed nesting — likely biased toward better-condition individuals who successfully nested. The BCI data is far richer but tracks a narrow three-locality window of a much wider species range. Generalizing these patterns to other T. hermanni populations requires caution.
The overall picture, though, is coherent: take care of the habitat, the females grow bigger, and they lay more eggs. That’s a tractable conservation lever.