Popular file readers such as readr::read_delim() perform datatype
conversion by default, which can interfere with daiquiri's ability to detect
non-conformant values. Use this function instead to ensure optimal
compatibility with daiquiri's features.
Usage
read_data(
  file,
  delim = NULL,
  col_names = TRUE,
  quote = "\"",
  trim_ws = TRUE,
  comment = "",
  skip = 0,
  n_max = Inf,
  show_progress = TRUE
)
Arguments
- file
A string containing the path of the file to load, or a URL starting http://, file://, etc. Compressed files with extension .gz, .bz2, .xz and .zip are supported.
- delim
Single character used to separate fields within a record. E.g. "," or "\t"
- col_names
Either TRUE, FALSE or a character vector of column names. If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically. Default = TRUE
- quote
Single character used to quote strings.
- trim_ws
Should leading and trailing whitespace be trimmed from each field?
- comment
A string used to identify comments. Any text after the comment characters will be silently ignored.
- skip
Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.
- n_max
Maximum number of lines to read.
- show_progress
Display a progress bar? Default = TRUE
Details
This function is aimed at non-expert users of R, and operates as a restricted
implementation of readr::read_delim(). If you prefer to use read_delim()
directly, ensure you set the following parameters:
col_types = readr::cols(.default = "c") and na = character()
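As a rough illustration of the two settings named above, the following sketch calls readr::read_delim() directly on a small temporary file. The file contents and column names (id, value) are invented for the example and are not part of daiquiri. It assumes the readr package is installed, and it does not claim to reproduce everything read_data() does internally.

```r
# Illustrative only: the file contents and column names are made up.
library(readr)

# Write a small delimited file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,NA", "3,NULL"), tmp)

# Read every column as character and treat no strings as missing,
# using the two parameters recommended in the Details text
raw_data <- read_delim(
  tmp,
  delim = ",",
  col_types = cols(.default = "c"),
  na = character()
)

# Inspect the column classes and the untouched values
sapply(raw_data, class)
raw_data$value

# Clean up the temporary file
unlink(tmp)
```

With these settings every column is kept as character and strings such as "NA" or "NULL" are left as literal text rather than converted to missing values, so they remain available for later checks.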
Examples
raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)
head(raw_data)
#> # A tibble: 6 × 8
#> PrescriptionID PrescriptionDate Admis…¹ Drug Dose DoseU…² Patie…³ Locat…⁴
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 6000 2021-01-01 00:00:00 2020-1… Ceft… 500 mg 4993679 SITE1
#> 2 6001 NULL 2020-1… Fluc… 1000 mg 819452 SITE1
#> 3 6002 NULL 2020-1… Teic… 400 mg 275597 SITE1
#> 4 6003 2021-01-01 01:00:00 2020-1… Fluc… 1000 NULL 819452 SITE1
#> 5 6004 2021-01-01 02:00:00 2020-1… Fluc… 1000 NULL 528071 SITE1
#> 6 6005 2021-01-01 03:00:00 2020-1… Co-a… 1.2 g 1001434 SITE1
#> # … with abbreviated variable names ¹AdmissionDate, ²DoseUnit, ³PatientID,
#> # ⁴Location