Mapping the linguistic diversity of Africa: which language families dominate, which countries are most multilingual, and how many speakers they represent.
This dataset explores popular languages spoken across the African continent, sourced from the Wikipedia page “Languages of Africa.” The collection captures linguistic diversity with estimates of between 1,250 and 3,000+ languages natively spoken in Africa, depending on how language versus dialect distinctions are made.
Which African country has the largest number of spoken languages?
Which language family demonstrates the highest speaker density?
Do any languages span multiple countries?
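The first and third questions both come down to counting distinct values per group. Here is a minimal sketch, assuming the `language` and `country` columns used in the plotting code later in this post. The `toy` data frame and its placeholder rows are invented for illustration and are not real figures.

```r
library(dplyr)

# Toy stand-in for the `africa` data frame; every row is an invented placeholder
toy <- data.frame(
  language = c("Lang A", "Lang A", "Lang B", "Lang C", "Lang C", "Lang C"),
  country  = c("Country 1", "Country 2", "Country 1",
               "Country 1", "Country 2", "Country 3")
)

# Question 1: number of distinct languages recorded per country
langs_per_country <- toy %>%
  group_by(country) %>%
  summarize(n_languages = n_distinct(language), .groups = "drop") %>%
  arrange(desc(n_languages))

# Question 3: languages that appear in more than one country
cross_border <- toy %>%
  group_by(language) %>%
  summarize(n_countries = n_distinct(country), .groups = "drop") %>%
  filter(n_countries > 1) %>%
  arrange(desc(n_countries))

print(langs_per_country)
print(cross_border)
```

Swapping `toy` for `africa` should give the real answers, provided the column names match.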
Loading necessary packages
My handy booster pack installs (if needed) and loads my usual favorite packages, along with some helpful functions.
raw <- tidytuesdayR::tt_load('2026-01-13')
africa <- raw$africa
Exploratory Data Analysis
The my_skim() function is a modified version of skimr::skim(). For each variable it returns the number of missing data points (cells that are NA) and its inverse (the number of rows that are not NA), along with the count, minimum, 25th percentile, median, 75th percentile, maximum, mean, geometric mean, and standard deviation. It also generates a little ASCII histogram. Neat!
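To make those statistics concrete, here is a rough base-R approximation for a single numeric vector. This is not the my_skim() source: `summarize_numeric()` is a hypothetical helper, the input values are invented, and treating "count" as the total vector length is my assumption.

```r
# Hypothetical helper, NOT the author's my_skim(): computes the statistics
# listed above for one numeric vector
summarize_numeric <- function(x) {
  ok <- x[!is.na(x)]  # non-missing values only
  qs <- quantile(ok, probs = c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(
    n_missing = sum(is.na(x)),
    n_not_na  = length(ok),
    count     = length(x),  # assumption: count = total length, NAs included
    min       = min(ok),
    p25       = qs[1],
    median    = qs[2],
    p75       = qs[3],
    max       = max(ok),
    mean      = mean(ok),
    # The geometric mean is only defined for strictly positive values
    geo_mean  = if (all(ok > 0)) exp(mean(log(ok))) else NA_real_,
    sd        = sd(ok)
  )
}

# Invented values for illustration only
stats <- summarize_numeric(c(1, 10, 100, NA))
print(stats)
```

The geometric mean is worth including for skewed variables like speaker counts, where a few very large values pull the arithmetic mean upward.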
The hero plot shows the top languages by native speakers, colored by language family, with annotations for cross-border languages.
# Warm, earthy African-inspired palette for language families
family_cols <- c(
  "#D4A373", # warm sand
  "#588157", # savanna green
  "#BC6C25", # terracotta
  "#344E41", # deep forest
  "#DDA15E", # golden
  "#606C38", # olive
  "#9B2226", # deep red
  "#005F73", # teal
  "#AE2012", # rust
  "#CA6702"  # amber
)

# Get top 20 languages by native speakers
top_langs <- africa %>%
  group_by(language, family) %>%
  summarize(
    total_speakers = sum(native_speakers, na.rm = TRUE),
    n_countries = n_distinct(country),
    .groups = "drop"
  ) %>%
  arrange(desc(total_speakers)) %>%
  head(20)

# Mark cross-border languages
top_langs <- top_langs %>%
  mutate(cross_border = ifelse(n_countries > 1, paste0(n_countries, " countries"), ""))

ggplot(top_langs, aes(x = reorder(language, total_speakers), y = total_speakers, fill = family)) +
  geom_col(width = 0.7) +
  geom_text(
    aes(label = ifelse(
      cross_border != "",
      paste0(scales::comma(total_speakers), "\n(", cross_border, ")"),
      scales::comma(total_speakers)
    )),
    hjust = -0.05,
    size = 3.2,
    lineheight = 0.85
  ) +
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    expand = expansion(mult = c(0, 0.25))
  ) +
  scale_fill_manual(values = family_cols, name = "Language Family") +
  coord_flip() +
  labs(
    title = "Most Spoken Languages of Africa",
    subtitle = "Top 20 languages by native speakers, with cross-border reach annotated",
    x = NULL,
    y = "Native Speakers",
    caption = "Source: TidyTuesday 2026-01-13 | Wikipedia Languages of Africa"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 18, color = "#344E41"),
    plot.subtitle = element_text(size = 12, color = "#555555"),
    plot.caption = element_text(size = 9, color = "#888888"),
    legend.position = "bottom",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  guides(fill = guide_legend(nrow = 2))
Final thoughts and takeaways
Africa is one of the most linguistically diverse continents on Earth, and this dataset, even as a curated subset of the most popular languages, showcases that richness. The dominance of the Niger-Congo and Afro-Asiatic families in terms of both language count and total speakers reflects deep historical patterns of migration and cultural development across the continent.
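A claim about family dominance can be checked with a per-family summary of language count and total speakers. Here is a minimal sketch, assuming the `family`, `language`, and `native_speakers` columns from the plotting code above. The `toy` rows are invented placeholders, not real speaker counts.

```r
library(dplyr)

# Toy stand-in for `africa`; families, languages, and counts are invented
toy <- data.frame(
  language        = c("Lang A", "Lang A", "Lang B", "Lang C"),
  family          = c("Family X", "Family X", "Family X", "Family Y"),
  native_speakers = c(100, NA, 30, 40)
)

# Per family: how many distinct languages, and how many native speakers in total
family_summary <- toy %>%
  group_by(family) %>%
  summarize(
    n_languages    = n_distinct(language),
    total_speakers = sum(native_speakers, na.rm = TRUE),  # treat NA as unknown, skip it
    .groups = "drop"
  ) %>%
  arrange(desc(total_speakers))

print(family_summary)
```

Because `na.rm = TRUE` silently drops missing speaker counts, families with many NA rows will look smaller than they are, so it is worth checking missingness per family before reading too much into the ranking.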
The cross-border language data is particularly revealing. Languages like Arabic, Swahili, and Hausa don’t respect national boundaries drawn by colonial powers in the 19th and 20th centuries. These lingua francas serve as vital connectors for trade, culture, and communication across regions where dozens of local languages coexist.
Note
This dataset captures only “popular” languages — the full linguistic picture of Africa is far richer. Many languages with fewer speakers are endangered, and the ongoing tension between lingua francas (which enable economic participation) and local languages (which carry cultural heritage) is one of the defining sociolinguistic challenges of the continent.