z - Advanced topic: Reproducible Analytical Pipelines with Nix
Introduction
Isolated environments are great for running pipelines safely and reproducibly. This vignette details how to build a reproducible analytical pipeline in an environment built with Nix that contains the right versions of R and of the required packages.
An example of a reproducible analytical pipeline using Nix
Suppose that you’ve used targets to build a pipeline for a project, and that you did so using a tailor-made Nix environment. Here is the call to rix() that you could have used to build that environment:
library(rix)
path_default_nix <- tempdir()
rix(
  r_ver = "4.2.2",
  r_pkgs = c("targets", "tarchetypes", "rmarkdown"),
  system_pkgs = NULL,
  git_pkgs = list(
    package_name = "housing",
    repo_url = "https://github.com/rap4all/housing/",
    commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"
  ),
  ide = "other",
  project_path = path_default_nix,
  overwrite = TRUE
)
This call to rix() generates the following default.nix file:
#> # This file was generated by the {rix} R package v0.11.0 on 2024-09-16
#> # with following call:
#> # >rix(r_ver = "8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8",
#> # > r_pkgs = c("targets",
#> # > "tarchetypes",
#> # > "rmarkdown"),
#> # > system_pkgs = NULL,
#> # > git_pkgs = list(package_name = "housing",
#> # > repo_url = "https://github.com/rap4all/housing/",
#> # > commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"),
#> # > ide = "other",
#> # > project_path = path_default_nix,
#> # > overwrite = TRUE)
#> # It uses nixpkgs' revision 8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8 for reproducibility purposes
#> # which will install R version 4.2.2.
#> # Report any issues to https://github.com/ropensci/rix
#> let
#> pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz") {};
#>
#> rpkgs = builtins.attrValues {
#> inherit (pkgs.rPackages)
#> rmarkdown
#> tarchetypes
#> targets;
#> };
#>
#> git_archive_pkgs = [
#> (pkgs.rPackages.buildRPackage {
#> name = "housing";
#> src = pkgs.fetchgit {
#> url = "https://github.com/rap4all/housing/";
#> rev = "1c860959310b80e67c41f7bbdc3e84cef00df18e";
#> sha256 = "sha256-s4KGtfKQ7hL0sfDhGb4BpBpspfefBN6hf+XlslqyEn4=";
#> };
#> propagatedBuildInputs = builtins.attrValues {
#> inherit (pkgs.rPackages)
#> dplyr
#> ggplot2
#> janitor
#> purrr
#> readxl
#> rlang
#> rvest
#> stringr
#> tidyr;
#> };
#> })
#> ];
#>
#> system_packages = builtins.attrValues {
#> inherit (pkgs)
#> glibcLocales
#> nix
#> R;
#> };
#>
#> in
#>
#> pkgs.mkShell {
#> LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
#> LANG = "en_US.UTF-8";
#> LC_ALL = "en_US.UTF-8";
#> LC_TIME = "en_US.UTF-8";
#> LC_MONETARY = "en_US.UTF-8";
#> LC_PAPER = "en_US.UTF-8";
#> LC_MEASUREMENT = "en_US.UTF-8";
#>
#> buildInputs = [ git_archive_pkgs rpkgs system_packages ];
#>
#> }
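As the header comment states, reproducibility here rests on a single nixpkgs revision, embedded in the fetchTarball URL, which pins R 4.2.2 and every package. As a sketch assuming only base R, the hypothetical helper nixpkgs_rev() below (not part of rix) extracts that revision from a default.nix, for instance to record it next to your pipeline's results:

```r
# Hypothetical helper, not part of {rix}: return the 40-character nixpkgs
# commit pinned in the fetchTarball() URL of a default.nix file.
nixpkgs_rev <- function(nix_file) {
  lines <- readLines(nix_file, warn = FALSE)
  # regmatches() keeps only the lines that contain ".../nixpkgs/archive/<sha>"
  hits <- regmatches(lines, regexpr("nixpkgs/archive/[0-9a-f]{40}", lines))
  if (length(hits) == 0) stop("No pinned nixpkgs revision found in ", nix_file)
  # Use the first match and drop the URL prefix
  sub("nixpkgs/archive/", "", hits[[1]], fixed = TRUE)
}
```

Applied to the file generated above, nixpkgs_rev(file.path(path_default_nix, "default.nix")) should return "8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8".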
The environment that gets built from this default.nix file contains R version 4.2.2, the targets, tarchetypes and rmarkdown packages, as well as the {housing} package, a GitHub-only package with some data and useful functions for the project. Because it is hosted on GitHub, it gets installed using the buildRPackage function from nixpkgs (pkgs.rPackages.buildRPackage). You can use this environment to work on your project, or to launch a targets pipeline. This GitHub repository contains the finalized project.
On your local machine, you could execute the pipeline in the environment by running this in a terminal:
cd /absolute/path/to/housing/ && nix-shell default.nix --run "Rscript -e 'targets::tar_make()'"
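If you prefer to launch the same command from an R session (say, from a script that drives several projects), the sketch below wraps it with base R's system2(). The function name run_pipeline_in_nix() and its dry_run argument are hypothetical, not part of rix; the sketch assumes nix-shell is on your PATH and that _targets.R sits next to default.nix:

```r
# Hypothetical helper, not part of {rix}: run targets::tar_make() inside the
# Nix shell defined by <project_path>/default.nix.
run_pipeline_in_nix <- function(project_path, dry_run = FALSE) {
  nix_file <- file.path(project_path, "default.nix")
  if (!file.exists(nix_file)) stop("No default.nix found in ", project_path)
  # Same arguments as the terminal command: nix-shell default.nix --run "..."
  args <- c("default.nix", "--run", shQuote("Rscript -e 'targets::tar_make()'"))
  # With dry_run = TRUE, only return the command that would be executed
  if (dry_run) return(paste(c("nix-shell", args), collapse = " "))
  # tar_make() looks for _targets.R in the working directory, so run from
  # the project root and restore the previous directory afterwards
  old_wd <- setwd(project_path)
  on.exit(setwd(old_wd), add = TRUE)
  status <- system2("nix-shell", args = args)
  if (status != 0) stop("nix-shell exited with status ", status)
  invisible(status)
}
```

For example, run_pipeline_in_nix("/absolute/path/to/housing/", dry_run = TRUE) returns the command string without starting Nix, which is handy for checking the quoting before a real run.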
If you wish to run the pipeline whenever you drop into the Nix shell, you can pass a shell hook through the shell_hook argument of rix(), which adds it to the generated default.nix file:
path_default_nix <- tempdir()
rix(
  r_ver = "4.2.2",
  r_pkgs = c("targets", "tarchetypes", "rmarkdown"),
  system_pkgs = NULL,
  git_pkgs = list(
    package_name = "housing",
    repo_url = "https://github.com/rap4all/housing/",
    commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"
  ),
  ide = "other",
  shell_hook = "Rscript -e 'targets::tar_make()'",
  project_path = path_default_nix,
  overwrite = TRUE
)
Now, each time you drop into the Nix shell for that project using nix-shell, the pipeline gets executed automatically.
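The hook only exists if default.nix was regenerated with shell_hook, so it can be worth checking the file before relying on it. A minimal base-R sketch, assuming the hook is written as the standard shellHook attribute of Nix's mkShell; has_shell_hook() is a hypothetical name, not part of rix:

```r
# Hypothetical check, not part of {rix}: does a default.nix set a shellHook?
has_shell_hook <- function(nix_file) {
  lines <- readLines(nix_file, warn = FALSE)
  # Drop Nix comment lines (starting with #) so header comments never match
  code <- lines[!grepl("^\\s*#", lines, perl = TRUE)]
  any(grepl("\\bshellHook\\s*=", code, perl = TRUE))
}
```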
rix also features a function called tar_nix_ga() that adds a GitHub Actions workflow file to the project, so that the pipeline runs automatically on GitHub Actions. The GitHub repository linked above has such a file: each time changes get pushed, the pipeline runs on GitHub Actions and the results are automatically pushed to a branch called targets-runs. See the workflow file here.
This feature is heavily inspired by, and adapted from, the targets::github_actions() function.