Aligning strings with regex.

Usage

regex_valign(stringvec, regex_ai, sep_str = "")

Arguments

stringvec

A character vector with one element for each line.

regex_ai

A regular expression matching the position for alignment.

sep_str

Optional character vector inserted at the positions matched by the regular expression. Defaults to "" (nothing is inserted).

Value

A character vector with one element for each line, with padding inserted at the matched positions so that elements are vertically aligned across lines.

Details

Written mainly for reading fixed-width files, text, or tables parsed from PDFs.
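
The alignment idea can be illustrated with a minimal base R sketch. This is not the package's implementation: align_at_match() is a hypothetical helper written for illustration only. It assumes that only the first match in each line matters and that lines without a match are returned unchanged.

```r
# Hypothetical helper for illustration; NOT the source of regex_valign().
align_at_match <- function(lines, pattern, sep = "") {
  # Position of the first match in each line (-1 when there is no match).
  # perl = TRUE allows zero-width lookaheads such as "\\b(?=[A-Z])".
  pos <- as.integer(regexpr(pattern, lines, perl = TRUE))
  hit <- pos > 0
  if (!any(hit)) return(lines)

  # Every matched position is pushed right to the furthest match.
  target <- max(pos[hit])

  left  <- substr(lines[hit], 1, pos[hit] - 1)          # text before the match
  right <- substr(lines[hit], pos[hit], nchar(lines[hit]))  # match onwards
  pad   <- strrep(" ", target - pos[hit])               # spaces needed to align

  out <- lines
  out[hit] <- paste0(left, pad, sep, right)
  out
}

align_at_match(c("a = 1", "bbb = 2"), "=")
#> [1] "a   = 1" "bbb = 2"
```

Padding to the right-most match means no line ever loses characters; shorter prefixes simply gain spaces before the optional separator.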

See also

This function is based loosely on textutils::valign().

Examples

guests <-
  unlist(strsplit(c("6       COAHUILA        20/03/2020
7       COAHUILA             20/03/2020
18 BAJA CALIFORNIA     16/03/2020
109       CDMX      12/03/2020
1230   QUERETARO       21/03/2020"), "\n"))

# align at the first uppercase word boundary, inserting a separator
regex_valign(guests, "\\b(?=[A-Z])", " - ")
#> [1] "6          - COAHUILA        20/03/2020"     
#> [2] "7          - COAHUILA             20/03/2020"
#> [3] "18         - BAJA CALIFORNIA     16/03/2020" 
#> [4] "109        - CDMX      12/03/2020"           
#> [5] "1230       - QUERETARO       21/03/2020"     
# align dates at end of string
regex_valign(guests, "\\b(?=[0-9]{2}[\\/]{1}[0-9]{2}[\\/]{1}[0-9]{4}$)")
#> [1] "6       COAHUILA             20/03/2020"
#> [2] "7       COAHUILA             20/03/2020"
#> [3] "18 BAJA CALIFORNIA           16/03/2020"
#> [4] "109       CDMX               12/03/2020"
#> [5] "1230   QUERETARO             21/03/2020"