Make many header rows into column names

Usage

mash_colnames(
  df,
  n_name_rows,
  keep_names = TRUE,
  sliding_headers = FALSE,
  sep = "_"
)

Arguments

df: A data.frame or tibble object in which the names are broken up across the top n rows.
n_name_rows: Number of rows at the top of the data to be used to create the new variable (column) names. Must be >= 1.
keep_names: If TRUE, existing names will be included when building the new variable names. Defaults to TRUE.
sliding_headers: If TRUE, empty values in the first (topmost) header row will be filled from left to right with the nearest preceding value. Defaults to FALSE. See Details.
sep: Character string to separate the unified values (default is underscore).

Value

The original data frame with new column names and without the top n_name_rows rows that held the broken-up names.

Details

Tables are often shared with the column names broken up across the first few rows. This function takes the number of rows at the top of a table that hold the broken-up names, together with a flag for whether to include the existing column names, and mashes the values column-wise into a single string for each column. Setting keep_names to FALSE can be helpful for tables that were imported with a skip argument, where the existing names are automatically generated placeholders. If keep_names is set to FALSE, adjust the value of n_name_rows accordingly.
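
The column-wise mashing described above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: mash_sketch() and the toy data frame are hypothetical names, NA parts are assumed to be dropped before pasting (as the Examples output suggests), and the warning and sliding_headers behavior are not reproduced.

```r
# Hypothetical base R sketch of column-wise name mashing.
# Not the real mash_colnames(); it omits warnings and sliding_headers.
mash_sketch <- function(df, n_name_rows, keep_names = TRUE, sep = "_") {
  # Rows at the top of the table that hold the broken-up names
  header <- df[seq_len(n_name_rows), , drop = FALSE]
  new_names <- vapply(seq_along(df), function(j) {
    parts <- as.character(header[[j]])
    # Optionally put the existing column name in front
    if (keep_names) parts <- c(names(df)[j], parts)
    # Assumption: missing pieces are skipped rather than pasted as "NA"
    parts <- parts[!is.na(parts)]
    paste(parts, collapse = sep)
  }, character(1))
  # Drop the header rows and apply the unified names
  out <- df[-seq_len(n_name_rows), , drop = FALSE]
  names(out) <- new_names
  out
}

toy <- data.frame(
  Age = c("in", "months", "11"),
  Weight = c("kg", NA, "2"),
  stringsAsFactors = FALSE
)
names(mash_sketch(toy, n_name_rows = 2))
#> [1] "Age_in_months" "Weight_kg"
```

With keep_names = FALSE the same toy table would yield "in_months" and "kg", which shows why the existing names are worth keeping when they carry part of the label.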

This function issues a warning when NA values may end up in the variable names. sliding_headers can be used for tables with ragged names, in which not every column has a value in the very first row. In these cases attribution by adjacency is assumed: when sliding_headers is set to TRUE, empty values in the topmost row are filled from left to right with the nearest preceding value. This can be useful for tables reporting survey data or experimental designs in an untidy manner.
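
The idea of attribution by adjacency can be illustrated with a small base R sketch. The helper fill_right() is a hypothetical name introduced here for illustration, and the function may fill the topmost row differently internally; the sketch only shows how a ragged top row, such as the one in the fish_experiment example, would look after each missing value takes the value to its left.

```r
# Hypothetical helper: carry the last non-missing value forward
# along a header row (left to right). Illustration only.
fill_right <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Topmost header row of a ragged table
top_row <- c("Sample", "Larvae", NA, NA, "Adult", NA, NA)
fill_right(top_row)
#> [1] "Sample" "Larvae" "Larvae" "Larvae" "Adult"  "Adult"  "Adult"
```

Once the top row is filled this way, each column has a group label that can be mashed with the row below it.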

Author

This function was originally contributed by Jarrett Byrnes through a GitHub issue.

Examples

babies <-
  data.frame(
    stringsAsFactors = FALSE,
    Baby = c(NA, NA, "Angie", "Yean", "Pierre"),
    Age = c("in", "months", "11", "9", "7"),
    Weight = c("kg", NA, "2", "3", "4"),
    Ward = c(NA, NA, "A", "B", "C")
  )
# Including the object names
mash_colnames(babies, n_name_rows = 2, keep_names = TRUE)
#>     Baby Age_in_months Weight_kg Ward
#> 3  Angie            11         2    A
#> 4   Yean             9         3    B
#> 5 Pierre             7         4    C

babies_skip <-
  data.frame(
    stringsAsFactors = FALSE,
    X1 = c("Baby", NA, NA, "Jennie", "Yean", "Pierre"),
    X2 = c("Age", "in", "months", "11", "9", "7"),
    X3 = c("Hospital", NA, NA, "A", "B", "A")
  )
# Discarding the automatically generated names (X1, X2, etc.)
mash_colnames(babies_skip, n_name_rows = 3, keep_names = FALSE)
#>     Baby Age_in_months Hospital
#> 4 Jennie            11        A
#> 5   Yean             9        B
#> 6 Pierre             7        A

fish_experiment <-
  data.frame(
    stringsAsFactors = FALSE,
    X1 = c("Sample", NA, "Pacific", "Atlantic", "Freshwater"),
    X2 = c("Larvae", "Control", "12", "11", "10"),
    X3 = c(NA, "Low Dose", "11", "12", "8"),
    X4 = c(NA, "High Dose", "8", "7", "9"),
    X5 = c("Adult", "Control", "13", "13", "8"),
    X6 = c(NA, "Low Dose", "13", "12", "7"),
    X7 = c(NA, "High Dose", "10", "10", "9")
  )
# Ragged names
mash_colnames(fish_experiment,
  n_name_rows = 2,
  keep_names = FALSE, sliding_headers = TRUE
)
#>       Sample Larvae_Control Larvae_Low Dose Larvae_High Dose Adult_Control
#> 3    Pacific             12              11                8            13
#> 4   Atlantic             11              12                7            13
#> 5 Freshwater             10               8                9             8
#>   Adult_Low Dose Adult_High Dose
#> 3             13              10
#> 4             12              10
#> 5              7               9