Tidy Tuesday: Brazilian Companies

tidytuesday
R
economics
brazil
Exploring Brazil’s open CNPJ company registry — how capital stock distributes across legal structures, company sizes, and owner qualifications.
Author

Sean Thimons

Published

January 27, 2026

Preface

From the TidyTuesday repository.

This dataset explores Brazil’s open CNPJ (Cadastro Nacional da Pessoa Jurídica) records from the Brazilian Ministry of Finance. The raw company records were cleaned and enriched with lookup tables (legal nature, owner qualification, and company size), then filtered to retain firms above a share-capital threshold. Three questions guide the analysis:

  • Which legal nature categories have the highest total and average capital stock?
  • How does company size relate to capital stock distribution?
  • Which owner qualification groups dominate high-capital companies?

Loading necessary packages

My handy booster pack installs (if needed) and loads my usual favorite packages, and defines a few helper functions.

# Packages ----------------------------------------------------------------

{
  if (!requireNamespace("pak", quietly = TRUE)) {
    install.packages(
      "pak",
      repos = sprintf(
        "https://r-lib.github.io/p/pak/stable/%s/%s/%s",
        .Platform$pkgType,
        R.Version()$os,
        R.Version()$arch
      )
    )
  }

  install_booster_pack <- function(package, load = TRUE) {
    for (pkg in package) {
      if (!requireNamespace(pkg, quietly = TRUE)) {
        pak::pkg_install(pkg)
      }
      if (load) {
        library(pkg, character.only = TRUE)
      }
    }
  }

  if (file.exists('packages.txt')) {
    packages <- read.table('packages.txt')
    # Load as well as install: the helpers below call skimr functions unqualified
    install_booster_pack(package = packages$Package, load = TRUE)
    rm(packages)
  } else {
    booster_pack <- c(
      ### IO ----
      'fs',
      'here',
      'janitor',
      'rio',
      'tidyverse',

      ### EDA ----
      'skimr',

      ### Plot ----
      'ggrepel',
      'scales',

      ### Misc ----
      'tidytuesdayR'
    )

    install_booster_pack(package = booster_pack, load = TRUE)
    rm(install_booster_pack, booster_pack)
  }

  # Custom Functions ----

  `%ni%` <- Negate(`%in%`)

  geometric_mean <- function(x) {
    exp(mean(log(x[x > 0]), na.rm = TRUE))
  }

  my_skim <- skim_with(
    numeric = sfl(
      n = length,
      min = ~ min(.x, na.rm = TRUE),
      p25 = ~ stats::quantile(.x, probs = 0.25, na.rm = TRUE, names = FALSE),
      med = ~ median(.x, na.rm = TRUE),
      p75 = ~ stats::quantile(.x, probs = 0.75, na.rm = TRUE, names = FALSE),
      max = ~ max(.x, na.rm = TRUE),
      mean = ~ mean(.x, na.rm = TRUE),
      geo_mean = ~ geometric_mean(.x),
      sd = ~ stats::sd(.x, na.rm = TRUE),
      hist = ~ inline_hist(.x, 5)
    ),
    append = FALSE
  )
}

Load raw data from package

raw <- tidytuesdayR::tt_load('2026-01-27')

companies <- raw$companies

Exploratory Data Analysis

The my_skim() function is a modified version of skimr::skim(). For every column it reports the number of missing values (NA cells) and the complete rate (the proportion of rows that are not NA). For numeric columns it adds the count, minimum, 25th percentile, median, 75th percentile, maximum, mean, geometric mean, and standard deviation. It also generates a little inline text histogram. Neat!
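
To show why the geometric mean is worth reporting next to the arithmetic mean for a skewed variable, here is a minimal sketch using the same geometric_mean() helper from the setup chunk. The vector `x` is invented illustration data, not values from the registry.

```r
# Same helper as in the setup chunk, repeated so this snippet runs on its own.
geometric_mean <- function(x) {
  exp(mean(log(x[x > 0]), na.rm = TRUE))
}

# Hypothetical, deliberately skewed values (not from the CNPJ data).
x <- c(1, 10, 100, 1000, 1e6)

arith <- mean(x)            # pulled upward by the single large value
geo   <- geometric_mean(x)  # averages on the log scale instead

c(arithmetic = arith, geometric = geo)
```

The skim output below shows the same kind of gap for capital_stock: a mean of about 354 million against a geometric mean of about 633 thousand.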

Companies

companies %>%
  select(-company_id, -company_name) %>%
  my_skim(.)
Data summary
Name Piped data
Number of rows 141332
Number of columns 4
_______________________
Column type frequency:
character 3
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
legal_nature 0 1 11 47 0 22 0
owner_qualification 0 1 10 61 0 13 0
company_size 0 1 5 16 0 3 0

Variable type: numeric

skim_variable n_missing complete_rate n min p25 med p75 max mean geo_mean sd hist
capital_stock 0 1 141332 150000.1 210000 4e+05 1e+06 1e+12 353697048 633219.4 14646156959 ▇▁▁▁▁

Let’s get a sense of the categorical breakdowns:

companies %>%
  count(company_size, sort = TRUE)
# A tibble: 3 × 2
  company_size         n
  <chr>            <int>
1 micro-enterprise 66202
2 other            42520
3 small-enterprise 32610
companies %>%
  count(legal_nature, sort = TRUE) %>%
  head(15)
# A tibble: 15 × 2
   legal_nature                                         n
   <chr>                                            <int>
 1 Limited Liability Business Company (LLC)        119288
 2 Sole Proprietorship                              15209
 3 Privately Held Corporation                        2897
 4 Silent Partnership                                2585
 5 Simple Limited Partnership                         550
 6 Individual Limited Liability Company (Business)    182
 7 Simple Innovation Company                          144
 8 Sole Member Law Firm                               123
 9 Simple Partnership (Pure)                          114
10 Cooperative                                         78
11 Publicly Traded Corporation                         52
12 Individual Real Estate Company                      50
13 Mixed-Capital Company                               21
14 State-Owned Enterprise                              15
15 General Partnership                                  8
companies %>%
  count(owner_qualification, sort = TRUE) %>%
  head(15)
# A tibble: 13 × 2
   owner_qualification                                                n
   <chr>                                                          <int>
 1 Managing Partner / Partner-Administrator                      107027
 2 Administrator / Manager                                        15236
 3 Entrepreneur / Business Owner                                  15201
 4 Director / Officer                                              1634
 5 President / Chair                                               1343
 6 Beneficial Owner (individual) resident or domiciled in Brazil    442
 7 Judicial Administrator (Court-appointed)                         302
 8 Sole Owner of an Individual Real Estate Company                   50
 9 Ostensible Partner (Managing partner in a silent partnership)     32
10 Liquidator                                                        29
11 Executor / Estate Administrator                                   22
12 Attorney-in-fact / Legal Representative (Power of Attorney)       13
13 Intervenor / Court-appointed Administrator                         1

Capital Stock Analysis

Distribution by Company Size

The skim output already shows that capital stock is heavily right-skewed: the mean (about R$354 million) sits far above the median (R$400,000), the signature of a few massive firms alongside many small ones. Let’s see how the size categories compare.

companies %>%
  filter(!is.na(company_size), !is.na(capital_stock), capital_stock > 0) %>%
  group_by(company_size) %>%
  summarize(
    n = n(),
    median_capital = median(capital_stock),
    mean_capital = mean(capital_stock),
    total_capital = sum(capital_stock),
    .groups = "drop"
  ) %>%
  arrange(desc(total_capital))
# A tibble: 3 × 5
  company_size         n median_capital mean_capital total_capital
  <chr>            <int>          <dbl>        <dbl>         <dbl>
1 small-enterprise 32610         350000   837193374.       2.73e13
2 other            42520        1037169   500429583.       2.13e13
3 micro-enterprise 66202         300000    21291946.       1.41e12
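
The same summary could be grouped by legal_nature to address the first question from the preface. Below is a minimal sketch under stated assumptions: `summarize_capital()` is a hypothetical helper name, and `toy` is an invented stand-in for `companies` so the snippet runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for `companies`; all values are invented.
toy <- tibble(
  legal_nature  = c("LLC", "LLC", "LLC", "Corporation", "Corporation"),
  capital_stock = c(2e5, 3e5, 4e5, 5e7, 1.5e8)
)

# Summarize capital stock by any grouping column (hypothetical helper).
summarize_capital <- function(data, group_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarize(
      n = n(),
      median_capital = median(capital_stock),
      mean_capital = mean(capital_stock),
      total_capital = sum(capital_stock),
      .groups = "drop"
    ) %>%
    arrange(desc(total_capital))
}

by_nature <- summarize_capital(toy, legal_nature)
by_nature
```

On the real data the call would be summarize_capital(companies, legal_nature); the same helper also reproduces the company_size table above when given company_size.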

Owner Qualifications in High-Capital Firms

Which owner qualifications show up most often among the top-capital firms, defined here as firms at or above the 90th percentile of capital stock?

capital_p90 <- quantile(companies$capital_stock, 0.9, na.rm = TRUE)

companies %>%
  filter(capital_stock >= capital_p90) %>%
  count(owner_qualification, sort = TRUE) %>%
  head(10)
# A tibble: 10 × 2
   owner_qualification                                               n
   <chr>                                                         <int>
 1 Managing Partner / Partner-Administrator                       6874
 2 Administrator / Manager                                        4256
 3 Entrepreneur / Business Owner                                  1289
 4 Director / Officer                                              966
 5 President / Chair                                               568
 6 Judicial Administrator (Court-appointed)                        128
 7 Beneficial Owner (individual) resident or domiciled in Brazil    26
 8 Liquidator                                                       12
 9 Ostensible Partner (Managing partner in a silent partnership)     6
10 Executor / Estate Administrator                                   5
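
Raw counts in the top decile mostly mirror how common each role is overall, so it can help to also look at the share of each role that lands in the top decile. The sketch below is one way to do that, using an invented 20-firm `toy` table in place of `companies`; the column names match the dataset, but the values do not come from it.

```r
library(dplyr)

# Hypothetical stand-in for `companies`: 20 invented firms, two owner roles.
toy <- tibble(
  owner_qualification = rep(c("Managing Partner", "Director"), times = c(16, 4)),
  capital_stock = c(seq(1e5, 1.6e6, by = 1e5), 5e6, 6e6, 7e6, 8e6)
)

# 90th-percentile cutoff, as in the analysis above.
cutoff <- quantile(toy$capital_stock, 0.9, na.rm = TRUE)

# For each role: how many firms exist, how many clear the cutoff, and the ratio.
top_share <- toy %>%
  group_by(owner_qualification) %>%
  summarize(
    n_total = n(),
    n_top = sum(capital_stock >= cutoff),
    share_in_top = n_top / n_total,
    .groups = "drop"
  ) %>%
  arrange(desc(share_in_top))

top_share
```

A role with a share well above 10% is over-represented among high-capital firms relative to its overall frequency.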

Visualizing Capital Stock Distribution

The hero plot shows the log-scaled capital stock density for each company size category, with a dashed line and a text label marking each group’s median.

# Brazilian flag-inspired palette
brazil_cols <- c(
  "#009739",  # green
  "#FEDD00",  # yellow
  "#002776"   # blue
)

plot_data <- companies %>%
  filter(!is.na(company_size), !is.na(capital_stock), capital_stock > 0)

# Calculate medians for annotation
size_medians <- plot_data %>%
  group_by(company_size) %>%
  summarize(
    med = median(capital_stock),
    n = n(),
    .groups = "drop"
  )

ggplot(plot_data, aes(x = capital_stock, fill = company_size)) +
  geom_density(alpha = 0.6, color = NA) +
  geom_vline(
    data = size_medians,
    aes(xintercept = med, color = company_size),
    linetype = "dashed",
    linewidth = 0.8
  ) +
  geom_text(
    data = size_medians,
    aes(
      x = med,
      y = Inf,
      label = paste0("Median: R$", scales::comma(med)),
      color = company_size
    ),
    vjust = 1.5,
    hjust = -0.1,
    size = 3.5,
    fontface = "bold",
    show.legend = FALSE
  ) +
  scale_x_log10(
    labels = scales::label_dollar(prefix = "R$", big.mark = ","),
    breaks = 10^(seq(0, 12, by = 2))
  ) +
  scale_fill_manual(values = brazil_cols, name = "Company Size") +
  scale_color_manual(values = brazil_cols, name = "Company Size") +
  labs(
    title = "Capital Stock Distribution of Brazilian Companies",
    subtitle = "Log-scaled density by company size category — CNPJ open registry data",
    x = "Capital Stock (BRL, log scale)",
    y = "Density",
    caption = "Source: TidyTuesday 2026-01-27 | Brazilian Ministry of Finance via dados.gov.br"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 18, color = "#002776"),
    plot.subtitle = element_text(size = 12, color = "#555555"),
    plot.caption = element_text(size = 9, color = "#888888"),
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )
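
One optional tweak, offered as a sketch rather than a correction: because brazil_cols is unnamed, the manual scales assign colors in the sorted order of the company_size values. Naming the vector makes that mapping explicit. The pairing below is an assumption that simply preserves the alphabetical assignment the plot already gets.

```r
# Named palette: manual scales match names to data values,
# so the mapping no longer depends on level order.
brazil_cols <- c(
  "micro-enterprise" = "#009739",  # green
  "other"            = "#FEDD00",  # yellow
  "small-enterprise" = "#002776"   # blue
)

brazil_cols
```

The named vector can be passed to scale_fill_manual(values = brazil_cols) and scale_color_manual(values = brazil_cols) exactly as before.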

Final thoughts and takeaways

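Brazil’s open CNPJ registry is a remarkable transparency tool; few countries publish their corporate registrations this openly. The capital stock distribution is heavily right-skewed: most firms in this filtered dataset declare modest capital (the median is R$400,000), while a small number of very large entities pull the mean up to roughly R$354 million and dominate total capital. That shape suggests a heavy-tailed distribution, though a power law was not formally tested here. Size category turns out to be a weak guide to capital: the small-enterprise group has the largest total (about 2.73e13) despite a median of only R$350,000, which suggests a few extreme declared values.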
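
One way to put a number on that concentration is the share of total capital held by the top fraction of firms. The following is a minimal sketch: `top_capital_share()` is a hypothetical helper name, and `toy_capital` is invented data rather than registry values.

```r
# Share of total capital held by the top `p` fraction of firms.
# Hypothetical helper, not part of the original analysis.
top_capital_share <- function(x, p = 0.01) {
  x <- sort(x[!is.na(x)], decreasing = TRUE)
  k <- max(1, ceiling(length(x) * p))  # always include at least one firm
  sum(x[seq_len(k)]) / sum(x)
}

# Invented values: 99 firms at 200,000 and a single firm at 1 billion.
toy_capital <- c(rep(2e5, 99), 1e9)

share <- top_capital_share(toy_capital, p = 0.01)
share
```

On the real data the call would be top_capital_share(companies$capital_stock), with `p` adjusted to whatever slice of firms is of interest.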

The legal nature breakdown is particularly interesting for understanding Brazil’s business landscape. Limited liability companies (Ltda.) vastly outnumber other forms, accounting for 119,288 of the 141,332 firms (about 84%), which tracks with their flexibility and lower compliance burden compared to corporations (S.A.). Total capital stock by legal nature was not computed above, so whether the picture inverts, with a small number of S.A. entities commanding disproportionate capital, remains a hypothesis to check with a grouped summary.

Note

Capital stock (capital social) in Brazil’s registry represents the declared investment by owners at incorporation or amendment. It’s a useful proxy for firm size but doesn’t capture retained earnings, debt, or market value — so the largest firms by capital stock aren’t necessarily the largest by revenue.

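The owner qualification data adds another dimension. Managing Partners are still the largest group among top-decile firms (6,874), but corporate roles are over-represented there: 966 of the 1,634 Director / Officer records (about 59%) and 568 of the 1,343 President / Chair records (about 42%) fall in the top decile, compared with about 6% of Managing Partners. That pattern fits the idea that the largest firms tend to be run through formal corporate offices rather than by partner-administrators.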