Read delimited data for optimal use with daiquiri

Popular file readers such as readr::read_delim() perform datatype conversion by default, which can interfere with daiquiri's ability to detect non-conformant values. Use this function instead to ensure optimal compatibility with daiquiri's features.

Usage

read_data(
  file,
  delim = NULL,
  col_names = TRUE,
  quote = "\"",
  trim_ws = TRUE,
  comment = "",
  skip = 0,
  n_max = Inf,
  show_progress = TRUE
)

Arguments

file: A string containing the path of the file to load, or a URL starting http://, file://, etc. Compressed files with extension .gz, .bz2, .xz and .zip are supported.
delim: Single character used to separate fields within a record, e.g. "," or "\t".
col_names: Either TRUE, FALSE or a character vector of column names. If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically. Default = TRUE
quote: Single character used to quote strings.
trim_ws: Should leading and trailing whitespace be trimmed from each field?
comment: A string used to identify comments. Any text after the comment characters will be silently ignored.
skip: Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.
n_max: Maximum number of lines to read.
show_progress: Display a progress bar? Default = TRUE

Value

A data frame

Details

This function is aimed at non-expert users of R, and operates as a restricted implementation of readr::read_delim(). If you prefer to use read_delim() directly, ensure you set the following parameters so that every column is read as character and no strings are converted to NA: col_types = readr::cols(.default = "c") and na = character().
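
As a rough illustration of the direct read_delim() route described above, the sketch below writes a small made-up file to a temporary location and reads it with the two recommended settings. The file contents and column names are assumptions chosen only for the illustration; they are not part of daiquiri.

```r
# Sketch: call readr::read_delim() directly with the settings recommended above.
# The data written here is invented purely for illustration.
library(readr)

tmp <- tempfile(fileext = ".csv")
writeLines(
  c(
    "PatientID,Dose,DoseUnit",
    "007,500,mg",
    "008,NA,NULL"
  ),
  tmp
)

raw_data <- read_delim(
  tmp,
  delim = ",",
  col_types = cols(.default = "c"), # read every column as character
  na = character()                  # do not convert any strings to NA
)

# Check that every column was read as character, so values such as
# "007", "NA" and "NULL" are kept exactly as they appear in the file.
vapply(raw_data, is.character, logical(1))

unlink(tmp)
```

With these settings nothing is converted on the way in, which leaves it to daiquiri to decide which values are non-conformant.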

Examples

raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)

head(raw_data)
#> # A tibble: 6 × 8
#>   PrescriptionID PrescriptionDate   AdmissionDate Drug  Dose  DoseUnit PatientID
#>   <chr>          <chr>              <chr>         <chr> <chr> <chr>    <chr>    
#> 1 6000           2021-01-01 00:00:… 2020-12-31    Ceft… 500   mg       4993679  
#> 2 6001           NULL               2020-12-31    Fluc… 1000  mg       819452   
#> 3 6002           NULL               2020-12-30    Teic… 400   mg       275597   
#> 4 6003           2021-01-01 01:00:… 1800-01-01    Fluc… 1000  NULL     819452   
#> 5 6004           2021-01-01 02:00:… 1800-01-01    Fluc… 1000  NULL     528071   
#> 6 6005           2021-01-01 03:00:… 2020-12-30    Co-a… 1.2   g        1001434  
#> # ℹ 1 more variable: Location <chr>
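
The remaining arguments can be combined in the same way. The following is a minimal sketch, assuming a tab-delimited file with one preamble line before the header; the temporary file and its contents are hypothetical and exist only for this example.

```r
# Sketch: read a tab-delimited file, skipping a preamble line and
# limiting the number of data rows. The file contents are made up.
library(daiquiri)

tmp <- tempfile(fileext = ".tsv")
writeLines(
  c(
    "Exported from a hypothetical source system",
    "PrescriptionID\tDose\tDoseUnit",
    "6000\t500\tmg",
    "6001\t1000\tmg",
    "6002\t400\tmg"
  ),
  tmp
)

raw_subset <- read_data(
  tmp,
  delim = "\t",
  col_names = TRUE,
  skip = 1,              # ignore the preamble line before the header
  n_max = 2,             # read at most two data rows
  show_progress = FALSE  # suppress the progress bar
)

raw_subset

unlink(tmp)
```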