This function reads a DHS recode dataset from the zipped Stata dataset. By default (`mode = "haven"`), it reads the Stata dataset using read_dta.

read_dhs_dta(zfile, mode = "haven", all_lower = TRUE, ...)

Arguments

zfile

Path to `.zip` file containing Stata dataset, usually ending in filename `XXXXXXDT.zip`

mode

Read mode for Stata `.dta` file. Defaults to "haven", see 'Details' for other options.

all_lower

Logical indicating whether all value labels should be lower case. Defaults to `TRUE`.

...

Other arguments to be passed to read_zipdata. Here these are arguments passed on to either read_dta or read.dta, depending on the mode provided.

Value

A data frame. If mode = 'map', value labels for each variable are stored as the `labelled` class from `haven`.

Details

The default `mode="haven"` uses read_dta to read in the dataset. We have chosen this option as it is more consistent with respect to variable labels and descriptions than the others. The other options either use read.dta or use the `.MAP` dictionary file provided with the DHS Stata datasets to reconstruct the variable labels and value labels. In the latter case, value labels are stored using the `labelled` class from `haven`. See `?haven::labelled` for more information. Variable labels are stored in the "label" attribute of each variable, the same as `haven::read_dta()`.
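As a rough illustration of what the `labelled` class looks like, the sketch below builds a small labelled vector by hand instead of reading a DHS file. The values and labels are made up to mimic `mv213` and are an assumption for illustration only. The sketch requires the `haven` package.

```r
library(haven)

# Hand-built stand-in for one labelled survey variable (illustrative values only)
x <- labelled(
  c(NA, 0, 0, 1, 8),
  labels = c(no = 0, yes = 1, unsure = 8),
  label  = "partner currently pregnant"
)

attr(x, "label")    # variable label
attr(x, "labels")   # value labels: named vector of codes

# Convert codes to a factor using the value labels
x_fac <- as_factor(x)
levels(x_fac)

# Drop the value labels and keep only the underlying codes
x_num <- zap_labels(x)
```

Keeping the codes and labels together in this way means the original value coding is never lost, and conversion to a factor can be deferred until it is actually needed.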

Currently, `mode="map"` is only implemented for 111-character fixed-width `.MAP` files, which make up the vast majority of recode data files from DHS Phases V, VI, and VII and some from Phase IV. Parsers for other `.MAP` formats will be added in future.

The available modes read labels from the Stata dataset using various options available in R:

* `mode="map"`: use the `.MAP` dictionary file provided with the DHS Stata datasets to reconstruct the variable labels and value labels. In this case, value labels are stored using the `labelled` class from `haven`. See `?haven::labelled` for more information. Variable labels are stored in the "label" attribute of each variable, the same as `haven::read_dta()`.

* `mode="haven"`: use `haven::read_dta()` to read dataset. This option retains the native value codings with value labels affixed with the 'labelled' class.

* `mode="foreign"`: use `foreign::read.dta()`, with the default option `convert.factors=TRUE` to add value labels. Note that value labels will not be added if labels are not present for all values, but value labels are available via the "val.labels" attribute.

* `mode="foreignNA"`: use `foreign::read.dta(..., convert.factors=NA)`, which converts any values without labels to 'NA'. This risks data loss if labelling is incomplete in Stata datasets.

* `mode="raw"`: use `foreign::read.dta(..., convert.factors=FALSE)`, which simply loads underlying value coding. Variable labels and value labels are still available through dataset attributes (see examples).
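The sketch below illustrates how labels can be recovered from dataset attributes when the underlying value coding is loaded without conversion. It does not read a real DHS file: it builds a small mock data frame that carries the "var.labels", "val.labels" and "label.table" attributes that `foreign::read.dta()` attaches. The label-set name `MV213`, the example values, and the helper `lookup_labels()` are assumptions made for illustration.

```r
# Mock of a data frame as returned with convert.factors = FALSE (illustrative only)
mr_raw <- data.frame(mv213 = c(NA, 0L, 0L, 1L, 8L, 9L))
attr(mr_raw, "var.labels")  <- "partner currently pregnant"
attr(mr_raw, "val.labels")  <- "MV213"  # label-set name per column (assumed)
attr(mr_raw, "label.table") <- list(MV213 = c(no = 0L, yes = 1L, unsure = 8L))

# Hypothetical helper: map the codes of one variable to their value labels
lookup_labels <- function(df, var) {
  idx <- match(var, names(df))              # column position
  set <- attr(df, "val.labels")[idx]        # name of the label set
  tab <- attr(df, "label.table")[[set]]     # named vector: label -> code
  names(tab)[match(df[[var]], tab)]         # NA where a code has no label
}

# Variable label for mv213
var_label <- attr(mr_raw, "var.labels")[match("mv213", names(mr_raw))]

# Value labels; the unlabelled code 9 maps to NA
labs <- lookup_labels(mr_raw, "mv213")
```

In this mock, code 9 has no entry in the label table, so it maps to `NA`. This is the same kind of loss that `mode="foreignNA"` risks when labelling is incomplete, whereas the raw codes in `mr_raw$mv213` remain intact.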

See also

read.dta, labelled, read_dta.

For more information on the DHS filetypes and contents of distributed dataset .ZIP files, see https://dhsprogram.com/data/File-Types-and-Names.cfm#CP_JUMP_10334.

Examples

mrdt_zip <- tempfile()
download.file("https://dhsprogram.com/data/model_data/dhs/zzmr61dt.zip",
              mrdt_zip, mode="wb")

mr <- rdhs::read_dhs_dta(mrdt_zip,mode="map")
attr(mr$mv213, "label")
#> [1] "partner currently pregnant"
class(mr$mv213)
#> [1] "haven_labelled" "vctrs_vctr"     "integer"       
head(mr$mv213)
#> <labelled<integer>[6]>: partner currently pregnant
#> [1] NA  0  0 NA  0 NA
#> 
#> Labels:
#>  value   label
#>      0      no
#>      1     yes
#>      8  unsure
#>      9 missing
table(mr$mv213)
#> 
#>    0    1    8 
#> 1766  239   57 
table(haven::as_factor(mr$mv213))
#> 
#>      no     yes  unsure missing 
#>    1766     239      57       0 

## If Stata file codebook is complete, `mode="map"` and `"haven"`
## should be the same.
mr_hav <- rdhs::read_dhs_dta(mrdt_zip, mode="haven")
attr(mr_hav$mv213, "label")
#> [1] "partner currently pregnant"
class(mr_hav$mv213)
#> [1] "haven_labelled" "vctrs_vctr"     "double"        
head(mr_hav$mv213)  # "9=missing" omitted from .dta codebook
#> <labelled<double>[6]>: partner currently pregnant
#> [1] NA  0  0 NA  0 NA
#> 
#> Labels:
#>  value  label
#>      0     no
#>      1    yes
#>      8 unsure
table(mr_hav$mv213)
#> 
#>    0    1    8 
#> 1766  239   57 
table(haven::as_factor(mr_hav$mv213))
#> 
#>     no    yes unsure 
#>   1766    239     57 

## Parsing codebook when using foreign::read.dta()
# foreign::read.dta() has issues with duplicated factor levels.
# Specifying "foreignNA" can help, but often will not, as below.
# Thus we recommend using either mode = "haven" or mode = "raw".
if (FALSE) { # \dontrun{
mr_for <- rdhs::read_dhs_dta(mrdt_zip, mode="foreign")
mr_for <- rdhs::read_dhs_dta(mrdt_zip, mode = "foreignNA")
} # }
## Don't convert factors
mr_raw <- rdhs::read_dhs_dta(mrdt_zip, mode="raw")
table(mr_raw$mv213)
#> 
#>    0    1    8 
#> 1766  239   57