Chapter 5 Onboarding policies
This chapter contains the policies of rOpenSci onboarding system of packages.
In particular, you’ll read our policies regarding onboarding itself: the review submission process, and the aims and scope of the onboarding system. This chapter also features our policies regarding package ownership and maintenance.
Last but not least, you’ll find the code of conduct of rOpenSci onboarding here.
5.1 Review process
- For a package to be considered for the rOpenSci suite, package authors must initiate a request on the ropensci/onboarding repository.
- Packages are reviewed for quality, fit, documentation, clarity and the review process is quite similar to a manuscript review (see our packaging guide and reviewing guide for more details). Unlike a manuscript review, this process will be an ongoing conversation.
- Once all major issues and questions, and those addressable with reasonable effort, are resolved, the editor assigned to a package will make a decision (accept, hold, or reject). Rejections are usually done early (before the review process begins, see the aims and scope section), but in rare cases a package may also be not onboarded after review & revision. It is ultimately editor’s decision on whether or not to reject the package based on how the reviews are addressed.
- Communication between authors, reviewers and editors will first and foremost take place on GitHub, although you can choose to contact the editor by email or Slack for some issues. When submitting a package, please make sure your GitHub notification settings make it unlikely you will miss a comment.
- The author can choose to have their submission put on hold (editor applies the holding label). The holding status will be revisited every 3 months, and after one year the issue will be closed.
- If the author hasn’t requested a holding label, but is simply not responding, we should close the issue within one month after the last contact intent. This intent will include a comment tagging the author, but also an email using the email address listed in the DESCRIPTION of the package which is one of the rare cases where the editor will try to contact the author by email.
- If a submission is closed and the author wishes to re-submit, they’ll have to start a new submission. If the package is still in scope, the author will have to respond to the initial reviews before the editor starts looking for new reviewers.
5.1.1 Publishing in other Venues
- We strongly suggest submitting your package for review before publishing on CRAN or submitting a software paper describing the package to a journal. Review feedback may result in major improvements and updates to your package, including renaming and breaking changes to functions. We do not consider previous publication on CRAN or in other venues sufficient reason to not adopt reviewer or editor recommendations.
- Do not submit your package for review while it or an associated manuscript is also under review at another venue, as this may result on conflicting requests for changes.
5.2 Aims and Scope
rOpenSci aims to support packages that enable reproducible research and managing the data lifecycle for scientists. Packages submitted to rOpenSci should fit into one or more of the categories outlined below. If you are unsure whether your package fits into one of these categories, please open an issue as a pre-submission inquiry (Examples).
As this is a living document, these categories may change through time and not all previously onboarded packages would be in-scope today. While we strive to be consistent, we evaluate packages on a case-by-case basis and may make exceptions.
Please note that rOpenSci package development happens in several different spaces: unconferences, internal development by staff and onboarding, which can be confusing. The emphasis of the unconference is on experimentation, exploration and community-building – participants work on all kinds of things that wouldn’t be in scope for onboarding. Similarly, some of the software developed by rOpenSci staff is focused on filling gaps in R infrastructure and wouldn’t be in-scope for onboarding.
5.2.1 Package categories
data retrieval: Packages for accessing and downloading data from online sources with scientific applications. Our definition of scientific applications is broad, including data storage services, journals, and other remote servers, as many data sources may be of interest to researchers. However, retrieval packages should be focused on data sources / topics, rather than services. For example a general client for Amazon Web Services data storage would not be in-scope. (Examples: rotl, gutenbergr)
data extraction: Packages that aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment. Statistical/ML libraries for modeling or prediction are typically not included in this category, but trained models that act as utilities (e.g., for optical character recognition), may qualify. (Examples: tabulizer, robotstxt, genbankr)
database access: Bindings and wrappers for generic database APIs (Example: rrlite)
data munging: Packages for processing data from formats above. This area does not include broad data manipulations tools such as reshape2 or tidyr, but rather tools for handling data in specific scientific formats. (Example: plateR)
data deposition: Packages that support deposition of data into research repositories, including data formatting and metadata generation. (Example: EML)
reproducibility: Tools that facilitate reproducible research. This includes packages that facilitate use of version control, provenance tracking, automated testing of data inputs and statistical outputs, citation of software and scientific literature. It does not include general tools for literate programming (e.g., R markdown extensions not under the previous topics). (Example: assertr)
In addition, we have some specialty topics with a slightly broader scope.
text analysis: We are currently piloting a sub-specialty area for text analysis that includes implementation of statistical/ML methods for analyzing or extracting text data. This does not include packages with new methods, but only implementation or wrapping of previously published methods. As this is a pilot, the scope for this topic is not fully defined and we are still developing a reviewer base and process for this area. Please open a pre-submission inquiry if you are considering submitting a package that falls under this topic.
5.2.2 Other scope considerations
Packages should be general in the sense that they should solve a problem as broadly as possible while maintaining a coherent user interface and code base. For instance, if several data sources use an identical API, we prefer a package that provides access to all the data sources, rather than just one.
Packages that include interactive tools to facilitate researcher workflows (e.g., shiny apps) must have a mechanism to make the interactive workflow reproducible, such as code generation or a scriptable API.
Here are some types of packages we are unlikely to accept:
- Packages that wrap or implement statistical or machine learning methods. We are not organized so as to review the correctness of these methods. (But see “text analysis”, above)
- Exploratory data analysis packages that visualize or summarize data.
- General workflow or package development support packages
For packages that are not in the scope of rOpenSci, we encourage submitting them to CRAN, BioConductor, as well as other R package development initiatives (e.g., cloudyr), and software journals such as JOSS, JSS, or the R journal.
Note that the packages developed internally by rOpenSci, through our events or through collaborations are not all in-scope for our onboarding process.
5.2.3 Package overlap
rOpenSci encourages competition among packages, forking and re-implementation as they improve options of users overall. However, as we want packages in the rOpenSci suite to be our top recommendations for the tasks they perform, we aim to avoid duplication of functionality of existing R packages in any repo without significant improvements. An R package that replicates the functionality of an existing R package may be considered for inclusion in the rOpenSci suite if it significantly improves on alternatives in any repository (RO, CRAN, BioC) by being:
- More open in licensing or development practices
- Broader in functionality (e.g., providing access to more data sets, providing a greater suite of functions), but not only by duplicating additional packages
- Better in usability and performance
- Actively maintained while alternatives are poorly or no longer actively maintained
These factors should be considered as a whole to determine if the package is a significant improvement. A new package would not meet this standard only by following our package guidelines while others do not, unless this leads to a significant difference in the areas above.
We recommend that packages highlight differences from and improvements over overlapping packages in their README and/or vignettes.
We encourage developers whose packages are not accepted due to overlap to still consider submittal to other repositories or journals.
5.3 Package ownership and maintenance
5.3.1 Role of the rOpenSci team
Authors of contributed packages essentially maintain the same ownership they had prior to their package joining the rOpenSci suite. Package authors will continue to maintain and develop their software after acceptance into rOpenSci. Unless explicitly added as collaborators, the rOpenSci team will not interfere much with day to day operations. However, this team may intervene with critical bug fixes, or address urgent issues if package authors do not respond in a timely manner (see the section about maintainer responsiveness).
5.3.2 Maintainer responsiveness
If package maintainers do not respond in a timely manner to requests for package fixes from CRAN or from us, we will remind the maintainer a number of times, but after 3 months (or shorter time frame, depending on how critical the fix is) we will make the changes ourselves.
The above is a bit vague, so the following are a few areas of consideration.
- Examples where we’d want to move quickly:
foois imported by one or more packages on CRAN, and
foois broken, and thus would break its reverse dependencies.
barmay not have reverse dependencies on CRAN, but is widely used, thus quickly fixing problems is of greater importance.
- Examples where we can wait longer:
hellois not on CRAN, or on CRAN, but has no reverse dependencies.
worldneeds some fixes. The maintainer has responded but is simply very busy with a new job, or other reason, and will attend to soon.
We urge package maintainers to make sure they are receiving GitHub notifications, as well as making sure emails from rOpenSci staff and CRAN maintainers are not going to their spam box. Authors of onboarded packages will be invited to the rOpenSci Slack to chat with the rOpenSci team and the greater rOpenSci community. Anyone can also discuss with the rOpenSci community on the rOpenSci discussion forum.
Should authors abandon the maintenance of an actively used package in our suite, we will consider petitioning CRAN to transfer package maintainer status to rOpenSci.
5.3.3 Quality commitment
rOpenSci strives to develop and promote high quality research software. To ensure that your software meets our criteria, we review all of our submissions as part of the onboarding process, and even after acceptance will continue to step in with improvements and bug fixes.
Despite our best efforts to support contributed software, errors are the responsibility of individual maintainers. Buggy, unmaintained software may be removed from our suite at any time.
5.3.4 Package removal
In the unlikely scenario that a contributor of a package requests removal of their package from the suite, we retain the right to maintain a version of the package in our suite for archival purposes.
5.4 Code of Conduct
- We are committed to providing a friendly, safe and welcoming environment for all, regardless of gender, sexual orientation, disability, ethnicity, religion, or similar personal characteristic.
- Please avoid using openly sexual nicknames or other nicknames that might detract from a friendly, safe and welcoming environment for all.
- Please be kind and courteous. There’s no need to be mean or rude.
- Respect that people have differences of opinion and that every design or implementation choice carries a trade-off and numerous costs. There is seldom a right answer.
- Please keep unstructured critique to a minimum. If you have solid ideas you want to experiment with, make a fork and see how it works.
- We will exclude you from interaction if you insult, demean or harass anyone. That is not welcome behavior. We interpret the term “harassment” as including the definition in the Citizen Code of Conduct; if you have any lack of clarity about what might be included in that concept, please read their definition. In particular, we don’t tolerate behavior that excludes people in socially marginalized groups.
- Private harassment is also unacceptable. No matter who you are, if you feel you have been or are being harassed or made uncomfortable by a community member, please contact us at firstname.lastname@example.org immediately with a capture (log, photo, email) of the harassment if possible. Whether you’re a regular contributor or a newcomer, we care about making this community a safe place for you and we’ve got your back.
- Likewise, any spamming, trolling, flaming, baiting or other attention-stealing behavior are not welcome.
- Avoid the use of personal pronouns in code comments or documentation. There is no need to address persons when explaining code (e.g. “When the developer”)