Chapter 1 Packaging Guide

rOpenSci accepts packages that meet our guidelines via a streamlined onboarding process. To ensure a consistent style across all of our tools we have written this chapter highlighting our guidelines for package development. Please also read and apply our chapter about continuous integration (CI). Further guidance for after onboarding is provided in the third section of this book starting with a chapter about collaboration.

We strongly recommend that package developers read Hadley Wickham’s concise but thorough book on package development, which is available for free online (and in print). Our guide is partially redundant with other resources but highlights rOpenSci’s guidelines.

To read why submitting a package to rOpenSci is worth the effort to meet guidelines, have a look at reasons to submit.

1.1 Package name and metadata

1.1.1 Naming your package

  • We strongly recommend short, descriptive names in lower case. If your package deals with one or more commercial services, please make sure the name does not violate branding guidelines. You can check if your package name is available, informative and not offensive by using the available package. In particular, do not choose a package name that’s already used on CRAN or Bioconductor.

  • A more distinctive package name might be easier to track (for you and us to assess package use) and to search for (for users to find it and to google their questions). On the other hand, a name that is too unusual might make the package less discoverable (which might, for example, be an argument for naming your package geojson).

  • Find other interesting aspects of naming your package in this blog post by Nick Tierney, and in case you change your mind, find out how to rename your package in this other blog post of Nick’s.

1.1.2 Creating metadata for your package

We recommend using the codemetar package to create and update a JSON CodeMeta metadata file for your package via codemetar::write_codemeta(). It will automatically include all useful information, including GitHub topics. CodeMeta uses Schema.org terms, so as it gains popularity the JSON metadata of your package might be used by third-party services, maybe even search engines.

1.2 Package API

1.3 Function and argument naming

  • Function and argument names should be chosen to work together to form a common, logical programming API that is easy to read and auto-complete.

  • Consider an object_verb() naming scheme for functions in your package that take a common data type or interact with a common API. object refers to the data/API and verb to the primary action. This scheme helps avoid namespace conflicts with packages that may have similar verbs, and makes code readable and easy to auto-complete. For instance, in stringi, functions starting with stri_ manipulate strings (stri_join(), stri_sort()), and in googlesheets, functions starting with gs_ are calls to the Google Sheets API (gs_auth(), gs_user(), gs_download()).

  • For functions that manipulate an object/data and return an object/data of the same type, make the object/data the first argument of the function so as to enhance compatibility with the pipe operator (%>%).

  • We strongly recommend snake_case over all other styles unless you are porting over a package that is already in wide use.

  • Avoid function name conflicts with base packages or other popular ones (e.g. ggplot2, dplyr, magrittr, data.table)

  • Argument naming and order should be consistent across functions that use similar inputs.

  • Package functions importing data should not import data to the global environment, but instead must return objects. Assignments to the global environment are to be avoided in general.
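
As a minimal sketch of the conventions in this list: the gb_ prefix, the function names and the sample data below are hypothetical and only serve as illustration. The functions share a prefix, use snake_case, take the data as their first argument, and return objects rather than assigning into the global environment.

```r
# Hypothetical package working with "goober" sighting records.
# Every function shares the gb_ prefix (object_verb scheme) and uses snake_case.

# Returns the imported data instead of assigning it into the global environment.
gb_read <- function(path) {
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Data first, same type in and out, so calls can be chained with a pipe.
gb_filter <- function(sightings, min_count = 1) {
  sightings[sightings$count >= min_count, , drop = FALSE]
}

# Same first argument name and position as gb_filter().
gb_sort <- function(sightings, decreasing = FALSE) {
  sightings[order(sightings$count, decreasing = decreasing), , drop = FALSE]
}

# Usage: write a small example file, then read, filter and sort it.
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(site = c("a", "b", "c"), count = c(0, 5, 2)),
  path,
  row.names = FALSE
)
sightings <- gb_read(path)
result <- gb_sort(gb_filter(sightings, min_count = 1), decreasing = TRUE)
result$site
```

With magrittr attached, the last step could equally be written as sightings %>% gb_filter(min_count = 1) %>% gb_sort(decreasing = TRUE), which is the benefit of putting the data first.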

1.3.1 Console messages

  • Use message() and warning() to communicate with the user in your functions. Please do not use print() or cat() unless it’s for a print.*() method, as these methods of printing messages are harder for the user to suppress.
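
A short sketch of why this matters, using a hypothetical function gb_summarise(): output produced with message() and warning() can be silenced by the caller, which is not the case for print() or cat().

```r
# Hypothetical function that reports progress with message() and
# flags a recoverable problem with warning().
gb_summarise <- function(counts) {
  message("Summarising ", length(counts), " counts")
  if (anyNA(counts)) {
    warning("Missing counts were dropped", call. = FALSE)
  }
  mean(counts, na.rm = TRUE)
}

# The caller decides how much console output to see.
quiet_result <- suppressWarnings(suppressMessages(gb_summarise(c(1, NA, 3))))
quiet_result
```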

1.3.2 Interactive/Graphical Interfaces

If providing a graphical user interface (GUI) (such as a Shiny app) to facilitate workflow, include a mechanism to automatically reproduce steps taken in the GUI. This could include auto-generation of code to reproduce the same outcomes, output of intermediate values produced in the interactive tool, or simply clear and well-documented mapping between GUI actions and scripted functions. (See also “Testing” below.)

For example, the tabulizer package has an interactive workflow to extract tables, but can also extract only the coordinates so that one can re-run things as a script. Two examples of Shiny apps that generate code are https://gdancik.shinyapps.io/shinyGEO/ and https://github.com/wallaceEcoMod/wallace/

1.4 Code Style

  • For more information on how to style your code, name functions, and R scripts inside the R/ folder, we recommend reading the code chapter in Hadley’s book. We recommend the styler package for automating part of the code styling.

1.5 README

  • All packages should have a README file, named README.md, in the root of the repository. The README should include, from top to bottom:

    • The package name
    • Badges for continuous integration and test coverage, the badge for rOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges
    • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
    • Installation instructions
    • Any additional setup required (authentication tokens, etc)
    • Brief demonstration usage
    • If applicable, how the package compares to other similar packages and/or how it relates to other packages
    • Citation information

If you use another repo status badge such as a lifecycle badge, please also add a repostatus.org badge. Example of a repo README with two repo status badges.

  • Once you have submitted a package and it has passed editor checks, add a peer-review badge via
[![](https://badges.ropensci.org/<issue_id>_status.svg)](https://github.com/ropensci/onboarding/issues/<issue_id>)

where issue_id is the number of the issue in the onboarding repository. For instance, the badge for rtimicropem review uses the number 126 since it’s the review issue number. The badge will first indicate “under review” and then “peer-reviewed” once your package has been onboarded (issue labelled “approved” and closed), and will link to the review issue.

  • If your README has many badges consider ordering them in an html table to make it easier for newcomers to gather information at a glance. See examples in drake repo and in qualtRics repo. Possible sections are
  • If your package connects to a data source or online service, or wraps other software, consider that your package README may be the first point of entry for users. It should provide enough information for users to understand the nature of the data, service, or software, and provide links to other relevant data and documentation. For instance, a README should not merely read, “Provides access to GooberDB,” but also include, “…, an online repository of Goober sightings in South America. More information about GooberDB, and documentation of database structure and metadata can be found at link”.

  • We recommend not creating README.md directly, but from a README.Rmd file (an R Markdown file) if you have any demonstration code. The advantage of the .Rmd file is you can combine text with code that can be easily updated whenever your package is updated.

  • Extensive examples should be kept for a vignette. If you want to make the vignettes more accessible before installing the package, we suggest creating a website for your package

  • Consider using usethis::use_readme_rmd() to get a template for a README.Rmd file and to automatically set up a pre-commit hook to ensure that README.md is always newer than README.Rmd.

  • After a package is accepted but before transfer, the rOpenSci footer should be added to the bottom of the README file with the following markdown line:

[![ropensci_footer](http://ropensci.org/public_images/github_footer.png)](https://ropensci.org)

1.6 Documentation

  • All exported package functions should be fully documented with examples.

  • We request all submissions to use roxygen2 for documentation. roxygen2 is an R package that automatically compiles .Rd files to your man folder in your package from simple tags written above each function.

  • More information on using roxygen2 documentation is available in the R packages book.

  • One key advantage of using roxygen2 is that your NAMESPACE will always be automatically generated and up to date.

  • All functions should document the type of object returned under the @return heading.

  • We recommend using the @family tag in the documentation of functions to allow their grouping in the documentation of the installed package and potentially in the package’s website, see this section of Hadley Wickham’s book and this section of the present chapter for more details.

  • The package should contain top-level documentation for ?foobar, (or ?foobar-package if there is a naming conflict). Optionally, you can use both ?foobar and ?foobar-package for the package level manual file, using @aliases roxygen tag. usethis::use_package_doc() adds the template for the top-level documentation.

  • The package should contain at least one vignette providing a substantial coverage of package functions, illustrating realistic use cases and how functions are intended to interact. If the package is small, the vignette and the README can have the same content.

  • As is the case for a README, top-level documentation or vignettes may be the first point of entry for users. If your package connects to a data source or online service, or wraps other software, it should provide enough information for users to understand the nature of the data, service, or software, and provide links to other relevant data and documentation. For instance, a vignette intro or documentation should not merely read, “Provides access to GooberDB,” but also include, “…, an online repository of Goober sightings in South America. More information about GooberDB, and documentation of database structure and metadata can be found at link”. Any vignette should state upfront the prerequisite knowledge needed to understand it.

  • The general vignette should present a series of examples progressing in complexity from basic to advanced usage.

  • Functionality likely to be used only by more advanced users or developers might be better put in a separate vignette (e.g. programming/NSE with dplyr).

  • The vignette(s) should include citations to software and papers where appropriate.

  • Add #' @noRd to internal functions.

  • Only use package startup messages when necessary (function masking for instance). Avoid package startup messages like “This is foobar 2.4-0” or citation guidance because they can be annoying to the user. Rely on documentation for such guidance.

  • You can choose to have a README section about use cases of your package (other packages, blog posts, etc.), example.

  • If you prefer not to clutter up code with extensive documentation, place further documentation/examples in files in a man-roxygen folder in the root of your package, and those will be combined into the manual file by the use of @template <file name>, for example.
    • Put any documentation for an object in a .R file in the man-roxygen folder (at the root of your package). For example, this file. Link to that template file from your function (e.g.) with the @template keyword (e.g.). The contents of the template will be inserted when documentation is built into the resulting .Rd file that users will see when they ask for documentation for the function.
    • Note that if you are using markdown documentation, markdown currently doesn’t work in template files, so make sure to use LaTeX formatting.
    • In most cases you can ignore templates and man-roxygen, but there are two cases in which leveraging them will greatly help:
      1. When you have a lot of documentation for a function/class/object separating out certain chunks of that documentation can keep your .R source file tidy. This is especially useful when you have a lot of code in that .R file.
      2. When you have the same documentation parts used across many .R functions it’s helpful to use a template. This reduces duplicated text, and helps prevent mistakenly updating documentation for one function but not the other.
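
To illustrate the tags mentioned in this list, here is a minimal roxygen2 sketch. The functions gb_count() and check_counts() and the family name are hypothetical; only the tags themselves (@param, @return, @family, @export, @examples, @noRd) come from roxygen2.

```r
#' Count positive sightings
#'
#' @param counts A numeric vector of sighting counts.
#' @return A single integer: the number of elements of \code{counts}
#'   that are greater than zero.
#' @family counting functions
#' @export
#' @examples
#' gb_count(c(0, 2, 5))
gb_count <- function(counts) {
  check_counts(counts)
  sum(counts > 0)
}

#' Validate input (internal helper, so no manual page is generated)
#' @noRd
check_counts <- function(counts) {
  if (!is.numeric(counts)) {
    stop("`counts` must be numeric", call. = FALSE)
  }
  invisible(counts)
}
```

Because the roxygen lines are ordinary R comments, the file still runs as plain R; the .Rd files and the NAMESPACE entries are only produced when the documentation is built.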

1.7 Documentation website

We recommend creating a documentation website for your package using pkgdown. Here is a good tutorial to get started with pkgdown, and unsurprisingly pkgdown has its own documentation website.

There are a few tips we’d like to underline here.

1.7.1 Grouping functions in the reference

When your package has many functions, use grouping in the reference, which you can do more or less automatically.

If you use roxygen above version 6.0.1.9000 (as of July 2018, development version to be installed via remotes::install_github("klutometis/roxygen")) you should use the @family tag in your functions’ documentation to indicate grouping. This will give you links between functions in the local documentation of the installed package (“See also” section) and allow you to use the pkgdown has_concept function in the config file of your website. Non-rOpenSci example courtesy of optiRum: family tag, pkgdown config file and resulting reference section.

Less automatically, see the example of drake website and associated config file.

1.7.2 Automatic deployment of the documentation website

You could use the tic package for automatic deployment of the package’s website, see this example repo. This would save you the hassle of running (and remembering to run) pkgdown::build_site() yourself every time the site needs to be updated. First refer to our chapter on continuous integration if you’re not familiar with continuous integration/Travis.

1.7.3 Branding of authors

You can make the names of (some) authors clickable by adding their URL, and you can even replace their names with a logo (think rOpenSci… or your organisation/company!). See pkgdown documentation and this example in the wild: pkgdown config file, resulting website.

1.8 Authorship

The DESCRIPTION file of a package should list package authors and contributors to a package, using the Authors@R syntax to indicate their roles (author/creator/contributor etc.) if there is more than one author. See this section of “Writing R Extensions” for details. If you feel that your reviewers have made a substantial contribution to the development of your package, you may list them in the Authors@R field with a Reviewer contributor type ("rev"), like so:

    person("Bea", "Hernández", role = "rev",
    comment = "Bea reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/116>"),

Only include reviewers after asking for their consent. Read more in this blog post “Thanking Your Reviewers: Gratitude through Semantic Metadata”. Note that ‘rev’ will raise a CRAN NOTE unless the package is built using R v3.5. As of June 2018 you need to use the roxygen2 dev version for the list of authors in the package-level documentation to be compiled properly with the “rev” role (because this is a MARC role not yet included in the roxygen2 CRAN version from February 2017).

Please do not list editors as contributors. Your participation in and contribution to rOpenSci is thanks enough!
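
For context, a complete Authors@R value is an R expression combining several person() entries with c(). The sketch below uses hypothetical names and a placeholder email address, and can be evaluated in an R session to check that the roles are accepted (the "rev" role assumes R 3.5 or later, as noted above).

```r
# Hypothetical people; replace names, roles, email and comment with real values.
authors <- c(
  person("Jane", "Doe", role = c("aut", "cre"), email = "jane@example.com"),
  person("John", "Smith", role = "ctb"),
  person("Ana", "Lopez", role = "rev",
         comment = "Ana reviewed the package for rOpenSci")
)

# Show each person with their roles.
format(authors, include = c("given", "family", "role"))
```

In the DESCRIPTION file, the same c(...) expression goes after the Authors@R: field name.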

1.9 Testing

  • All packages should pass R CMD check/devtools::check() on all major platforms.

  • All packages should have a test suite that covers major functionality of the package. The tests should also cover the behavior of the package in case of errors.

  • It is good practice to write unit tests for all functions, and all package code in general, ensuring key functionality is covered. Test coverage below 75% will likely require additional tests or explanation before being sent for review.

  • We recommend using testthat for writing tests. Strive to write tests as you write each new function. This serves the obvious need to have proper testing for the package, but also allows you to think about various ways in which a function can fail, and to defensively code against those. More information.

  • Packages with shiny apps should use a unit-testing framework such as shinytest to test that interactive interfaces behave as expected.

  • Once you’ve set up CI, use your package’s code coverage report (cf this section of our book) to identify untested lines, and to add further tests.

  • testthat has a function skip_on_cran() that you can use to not run tests on CRAN. We recommend using this on all tests that make API calls since they are quite likely to fail on CRAN. These tests will still run on Travis.

  • Even if you use continuous integration, we recommend that you run tests locally prior to submitting your package, as some tests are often skipped (you may need to set Sys.setenv(NOT_CRAN="true") in order to ensure all tests are run). In addition, we recommend that prior to submitting your package, you use MangoTheCat’s goodpractice package to check your package for likely sources of errors, and run spelling::spell_check_package() to find spelling errors in documentation.
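
A minimal testthat sketch of the points above, assuming testthat is installed. The function gb_count() is hypothetical and defined inline so the example is self-contained; in a package it would live in R/ and the tests in tests/testthat/.

```r
library(testthat)

# Hypothetical package function, defined here only for the sketch.
gb_count <- function(counts) {
  if (!is.numeric(counts)) {
    stop("`counts` must be numeric", call. = FALSE)
  }
  sum(counts > 0)
}

# Major functionality.
test_that("gb_count() counts positive values", {
  expect_equal(gb_count(c(-1, 2, 3)), 2)
})

# Behaviour in case of errors.
test_that("gb_count() fails informatively on bad input", {
  expect_error(gb_count("a"), "must be numeric")
})

# A test that would call a web API.
test_that("the remote service responds", {
  skip_on_cran()  # skipped on CRAN; runs when NOT_CRAN is "true"
  # A real test would query the service here; this is a placeholder.
  expect_true(TRUE)
})
```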

1.10 Examples

  • Include extensive examples in the documentation. In addition to demonstrating how to use the package, these can act as an easy way to test package functionality before there are proper tests. However, keep in mind we require tests in contributed packages.

  • You can run examples with devtools::run_examples(). Note that when you run R CMD check or equivalent (e.g., devtools::check()) your examples that are not wrapped in \dontrun{} or \donttest{} are run.
  • In addition to running examples locally on your own computer, we strongly advise that you run examples on one of the CI systems, e.g. Travis-CI. Again, examples that are not wrapped in \dontrun{} or \donttest{} will be run, but for those that are you can add r_check_args: "--run-dontrun" to run examples wrapped in \dontrun{} in your .travis.yml (and/or --run-donttest if you want to run examples wrapped in \donttest{}).
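
The following roxygen2 sketch shows how the two kinds of examples can sit side by side. The function gb_url() and the example.org address are hypothetical placeholders.

```r
#' Build the URL of a sightings endpoint
#'
#' @param endpoint Name of the endpoint, a single string.
#' @param base Base URL of the service (a placeholder here).
#' @return A single string with the full URL.
#' @export
#' @examples
#' # Fast and offline, so it is run on every check.
#' gb_url("sightings")
#'
#' \dontrun{
#' # Needs network access, so it is only run with --run-dontrun.
#' readLines(gb_url("sightings"))
#' }
gb_url <- function(endpoint, base = "https://example.org/api") {
  paste(base, endpoint, sep = "/")
}
```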

1.11 Package dependencies

  • Use Imports instead of Depends for packages providing functions from other packages. Make sure to list packages used for testing (testthat), and documentation (knitr, roxygen2) in your Suggests section of package dependencies. If you use any package in the examples or tests of your package, make sure to list it in Suggests, if not already listed in Imports.

  • For most cases where you must expose functions from dependencies to the user, you should import and re-export those individual functions rather than listing them in the Depends field. For instance, if functions in your package produce raster objects, you might re-export only printing and plotting functions from the raster package.

  • If your package uses a system dependency, you should
  • When considering depending on a new package, think about whether you really need to use the package. First, if you can easily write the code yourself to do what the package does, and it’s relatively simple, you probably don’t need to import a package. Second, consider “how heavy” the package is that you would import to your package. A “heavy” package is one with a lot of dependencies itself and/or with one or more dependencies that are hard to install. The more dependencies are required, the more problems users may have, either with installation or with breaking changes.

  • Consider if any dependencies have overlapping functionality, and if so, if either of the dependencies can be removed.

1.13 Miscellaneous CRAN gotchas

This is a collection of CRAN gotchas that are worth avoiding at the outset.

  • Make sure your package title is in Title Case.
  • Do not put a period on the end of your title.
  • Avoid starting the description with the package name or “This package …”.
  • Make sure you include links to websites if you wrap a web API, scrape data from a site, etc. in the Description field of your DESCRIPTION file.
  • Avoid long running tests and examples. Consider testthat::skip_on_cran in tests to skip things that take a long time but still test them locally and on Travis.
  • Include top-level files such as paper.md, .travis.yml in your .Rbuildignore file.
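
Some of these gotchas can be checked with a few lines of base R. The sketch below is only an illustration: the helper check_description_text() and the sample texts are hypothetical, and the .Rbuildignore file is written to a temporary directory rather than to a real package.

```r
# .Rbuildignore entries are regular expressions matched against file paths,
# so literal dots are escaped and the patterns are anchored.
ignore_patterns <- c("^paper\\.md$", "^\\.travis\\.yml$")
buildignore <- file.path(tempdir(), ".Rbuildignore")  # temporary location for the sketch
writeLines(ignore_patterns, buildignore)

# Hypothetical helper flagging two of the gotchas listed above.
check_description_text <- function(package, title, description) {
  c(
    title_ends_with_period = grepl("\\.$", title),
    description_starts_badly =
      startsWith(tolower(description), tolower(package)) ||
      startsWith(description, "This package")
  )
}

check_description_text(
  package = "goober",
  title = "Access GooberDB Sighting Records",
  description = "Client for GooberDB, an online repository of Goober sightings."
)
```

Inside a real package, usethis::use_build_ignore() can add such entries and takes care of the escaping.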

1.14 Further guidance

  • Hadley Wickham’s R Packages is an excellent, readable resource on package development, which is available for free online (and in print).

  • Writing R Extensions is the canonical, usually most up-to-date, reference for creating R packages.

  • If you are submitting a package to rOpenSci via the onboarding repo, you can direct further questions to the rOpenSci team in the issue tracker, or in our discussion forum.

  • Before submitting a package use the goodpractice package (goodpractice::gp()) as a guide to improve your package, since most exceptions to it will need to be justified. E.g. the use of foo might be generally bad and therefore flagged by goodpractice but you had a good reason to use it in your package.
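
The local checks recommended in this chapter could be bundled as follows. This is a sketch, not an official rOpenSci tool: the wrapper pre_submission_checks() is hypothetical, and it assumes that devtools, goodpractice and spelling are installed and that pkg points to the root directory of your package.

```r
# Hypothetical convenience wrapper around the checks named in this chapter.
pre_submission_checks <- function(pkg = ".") {
  # Make sure tests guarded by skip_on_cran() are run, and restore the
  # previous value of NOT_CRAN when the function exits.
  old <- Sys.getenv("NOT_CRAN", unset = NA)
  Sys.setenv(NOT_CRAN = "true")
  on.exit(
    if (is.na(old)) Sys.unsetenv("NOT_CRAN") else Sys.setenv(NOT_CRAN = old),
    add = TRUE
  )

  list(
    check = devtools::check(pkg),                  # R CMD check
    good_practice = goodpractice::gp(pkg),         # likely sources of errors
    spelling = spelling::spell_check_package(pkg)  # typos in documentation
  )
}
```

Calling pre_submission_checks() from the package root returns the three results in a list, so that any goodpractice findings you decide to keep can be justified in your submission.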