Imagine the situation where we have a dataframe with a factor variable and we
have stored it with
write_vc(optimize = TRUE). The raw data file contains
the factor indices and the metadata contains the link between the factor
index and the corresponding label. See
vignette("version_control", package = "git2rdata"). In such a case,
relabelling a factor can be fast and lightweight by updating the metadata.
relabel(file, root = ".", change)
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path.
file is a
path relative to the
file must point to a location within
The root of a project. Can be a file path or a
Defaults to the current working directory (
list or a
data.frame. In case of a
list is a
list with named
vectors. The names of list elements must match the
names of the variables. The names of the vector elements must match the
existing factor labels. The values represent the new factor labels. In case
data.frame it needs to have the variables
factor (name of the
old (the old) factor label and
new (the new factor label).
relabel() ignores all other columns.
# initialise a git repo using git2r repo_path <- tempfile("git2rdata-repo-") dir.create(repo_path) repo <- git2r::init(repo_path) git2r::config(repo, user.name = "Alice", user.email = "email@example.com") # Create a dataframe and store it as an optimized git2rdata object. # Note that write_vc() uses optimization by default. # Stage and commit the git2rdata object. ds <- data.frame( a = c("a1", "a2"), b = c("b2", "b1"), stringsAsFactors = TRUE ) junk <- write_vc(ds, "relabel", repo, sorting = "b", stage = TRUE) cm <- commit(repo, "initial commit") # check that the workspace is clean status(repo) #> working directory clean # Define new labels as a list and apply them to the git2rdata object. new_labels <- list( a = list(a2 = "a3") ) relabel("relabel", repo, new_labels) # check the changes read_vc("relabel", repo) #> a b #> 1 a3 b1 #> 2 a1 b2 # relabel() changed the metadata, not the raw data status(repo) #> Unstaged changes: #> Modified: relabel.yml #> git2r::add(repo, "relabel.*") cm <- commit(repo, "relabel using a list") # Define new labels as a dataframe and apply them to the git2rdata object change <- data.frame( factor = c("a", "a", "b"), old = c("a3", "a1", "b2"), new = c("c2", "c1", "b3"), stringsAsFactors = TRUE ) relabel("relabel", repo, change) # check the changes read_vc("relabel", repo) #> a b #> 1 c2 b1 #> 2 c1 b3 # relabel() changed the metadata, not the raw data status(repo) #> Unstaged changes: #> Modified: relabel.yml #>