Make many header rows into column names
Arguments
- df
A data.frame or tibble object in which the names are broken up across the top n rows.
- n_name_rows
Number of rows at the top of the data to be used to create the new variable (column) names. Must be >= 1.
- keep_names
If TRUE, existing names will be included when building the new variable names. Defaults to TRUE.
- sliding_headers
If TRUE, empty values in the first (topmost) header row will be filled with the nearest non-empty value to their left. Defaults to FALSE. See details.
- sep
Character string to separate the unified values (default is underscore).
Value
The original data frame, but with new column names and without the top n rows that held the broken up names.
Details
Tables are often shared with the column names broken up across the
first few rows. This function takes the number of rows at the top of a
table that hold the broken-up names and whether or not to include the
existing names, and mashes the values column-wise into a single string
for each column. The keep_names argument can be helpful for tables that
were imported using a skip argument. If keep_names is set to FALSE,
adjust the value of n_name_rows accordingly.
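The mashing step described above can be sketched in base R. This is only an illustration, not the package's implementation: mash_sketch is a hypothetical helper, and dropping NA values before pasting is an assumption inferred from the example output further down.

```r
# Hypothetical helper (not part of the package): paste the top
# n_name_rows values of each column into a single name.
mash_sketch <- function(df, n_name_rows = 1, keep_names = TRUE, sep = "_") {
  stopifnot(n_name_rows >= 1)
  header <- df[seq_len(n_name_rows), , drop = FALSE]
  new_names <- vapply(seq_along(df), function(j) {
    parts <- as.character(header[[j]])
    # Optionally put the existing column name in front of the header values
    if (keep_names) parts <- c(names(df)[j], parts)
    # Assumption: NA pieces are skipped rather than pasted in
    paste(parts[!is.na(parts)], collapse = sep)
  }, character(1))
  # Drop the rows that held the broken-up names
  out <- df[-seq_len(n_name_rows), , drop = FALSE]
  names(out) <- new_names
  out
}

babies_small <- data.frame(
  Baby = c(NA, NA, "Angie"),
  Age = c("in", "months", "11"),
  stringsAsFactors = FALSE
)
names(mash_sketch(babies_small, n_name_rows = 2))
#> [1] "Baby"          "Age_in_months"
```

With keep_names = FALSE the existing names are left out, so the header rows alone have to supply every piece of the new name.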
This function will throw a warning when possible NA values end up in the
variable names. sliding_headers can be used for tables with ragged
names in which not every column has a value in the very first row. In these
cases attribution by adjacency is assumed, and when sliding_headers is set
to TRUE the empty names in the topmost row are filled row-wise, each taking
the nearest non-empty value to its left. This can be useful for tables
reporting survey data or experimental designs in an untidy manner.
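The fill applied to the topmost row can be pictured with a small base R sketch. fill_right is a hypothetical helper written for illustration and is not the function's actual code; it simply carries the last seen value forward from left to right.

```r
# Hypothetical helper: replace each NA with the value to its left.
fill_right <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Topmost header row of a ragged table
top_row <- c("Sample", "Larvae", NA, NA, "Adult", NA, NA)
fill_right(top_row)
#> [1] "Sample" "Larvae" "Larvae" "Larvae" "Adult"  "Adult"  "Adult"
```

A leading NA has nothing to its left and stays NA in this sketch.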
Examples
babies <-
  data.frame(
    stringsAsFactors = FALSE,
    Baby = c(NA, NA, "Angie", "Yean", "Pierre"),
    Age = c("in", "months", "11", "9", "7"),
    Weight = c("kg", NA, "2", "3", "4"),
    Ward = c(NA, NA, "A", "B", "C")
  )
# Including the object names
mash_colnames(babies, n_name_rows = 2, keep_names = TRUE)
#>     Baby Age_in_months Weight_kg Ward
#> 3  Angie            11         2    A
#> 4   Yean             9         3    B
#> 5 Pierre             7         4    C
babies_skip <-
  data.frame(
    stringsAsFactors = FALSE,
    X1 = c("Baby", NA, NA, "Jennie", "Yean", "Pierre"),
    X2 = c("Age", "in", "months", "11", "9", "7"),
    X3 = c("Hospital", NA, NA, "A", "B", "A")
  )
# Discarding the automatically-generated names (X1, X2, etc.)
mash_colnames(babies_skip, n_name_rows = 3, keep_names = FALSE)
#>     Baby Age_in_months Hospital
#> 4 Jennie            11        A
#> 5   Yean             9        B
#> 6 Pierre             7        A
fish_experiment <-
  data.frame(
    stringsAsFactors = FALSE,
    X1 = c("Sample", NA, "Pacific", "Atlantic", "Freshwater"),
    X2 = c("Larvae", "Control", "12", "11", "10"),
    X3 = c(NA, "Low Dose", "11", "12", "8"),
    X4 = c(NA, "High Dose", "8", "7", "9"),
    X5 = c("Adult", "Control", "13", "13", "8"),
    X6 = c(NA, "Low Dose", "13", "12", "7"),
    X7 = c(NA, "High Dose", "10", "10", "9")
  )
# Ragged names
mash_colnames(fish_experiment,
  n_name_rows = 2,
  keep_names = FALSE, sliding_headers = TRUE
)
#>       Sample Larvae_Control Larvae_Low Dose Larvae_High Dose Adult_Control
#> 3    Pacific             12              11                8            13
#> 4   Atlantic             11              12                7            13
#> 5 Freshwater             10               8                9             8
#>   Adult_Low Dose Adult_High Dose
#> 3             13              10
#> 4             12              10
#> 5              7               9