3  Modern R Package Development Guide

This guide consolidates R package development practice, drawing on modern recommendations from the tidyverse ecosystem, especially R Packages (2nd edition) by Hadley Wickham and Jenny Bryan.

3.1 Package Structure

A standard R package typically contains:

  • R/: function source files (one primary function family per file).
  • man/: generated Rd help files (do not edit manually).
  • DESCRIPTION: package metadata and dependency declarations.
  • NAMESPACE: exports and imports, usually generated by roxygen2.
  • tests/testthat/: automated tests.
  • vignettes/: long-form user guides.
  • data/: user-facing datasets (.rda), lazy-loaded when DESCRIPTION sets LazyData: true.
  • data-raw/: scripts that build derived datasets (not shipped to users directly).
  • inst/extdata/: raw or auxiliary files users may access via system.file().
  • src/: compiled code (for example C/C++), with LinkingTo when relevant.
  • .Rbuildignore: files excluded from package build.
  • .gitignore: files ignored by Git.
  • NEWS.md: notable user-facing changes.
  • README.Rmd and built README.md: project overview and installation instructions.

3.2 Project Setup (End-to-End)

  1. Create the package project:

    usethis::create_package("~/path/to/pkgname")
  2. Validate naming conventions (letters, numbers, and . only; at least two characters; start with a letter; do not end with a .).

  3. Activate common infrastructure:

    usethis::use_git()
    usethis::use_github()
    usethis::use_mit_license("Your Name")
    usethis::use_readme_rmd()
    usethis::use_news_md()
  4. Add continuous checks early:

    usethis::use_testthat(3)
    usethis::use_github_action("check-standard")
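  5. During development, iterate with a fast feedback loop and finish with a full check:

    devtools::load_all()   # simulate installing and attaching the package
    devtools::test()       # run the testthat suite
    devtools::document()   # regenerate man/ and NAMESPACE from roxygen comments
    devtools::check()      # full R CMD check (slower; run before committing)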
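
The naming rule in step 2 can be checked before creating the project. The following is a minimal sketch in base R; `is_valid_pkg_name()` is a hypothetical helper written for this guide, not part of usethis. It checks only the syntactic rules (letters, numbers, and periods; at least two characters; starts with a letter; does not end with a period), not whether the name is already taken.

```r
# Hypothetical helper: checks the syntactic rules for an R package name.
is_valid_pkg_name <- function(name) {
  # Starts with a letter, continues with letters, digits, or periods,
  # and ends with a letter or digit (so at least two characters).
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

is_valid_pkg_name(c("pkgname", "my.pkg2", "2fast", "my_pkg", "pkg."))
# TRUE TRUE FALSE FALSE FALSE
```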

3.3 Writing Functions

  • Put package code in R/; avoid source() inside a package.
  • Do not call library() or require() in package code.
  • Prefer explicit namespace calls (pkg::fun) unless a function is used so often that importing it with @importFrom is clearer.
  • Keep side effects local; for temporary global option/state changes, use withr.
  • Use snake_case naming and tidyverse style conventions.
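
As an illustration of keeping side effects local, the sketch below changes a global option only for the duration of one function call. It assumes the withr package is installed; `format_plain()` is a hypothetical function used only for this example.

```r
# Assumes withr is installed; format_plain() is a hypothetical example function.
format_plain <- function(x) {
  # Raise the scientific-notation penalty only until this function exits;
  # withr restores the previous value automatically.
  withr::local_options(list(scipen = 999))
  format(x)
}

old_scipen <- getOption("scipen")
format_plain(1e10)

# The caller's option is unchanged after the call.
identical(getOption("scipen"), old_scipen)
```

The same pattern applies to working directories, environment variables, and locale settings via the corresponding withr::local_*() functions.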

3.4 Dependencies and Namespaces

Use DESCRIPTION fields correctly:

  • Imports: packages required at runtime.
  • Suggests: optional packages used in tests, examples, or vignettes.
  • LinkingTo: headers needed for compiled code interfaces.
  • SystemRequirements: non-R system dependencies.

Recommended workflow:

usethis::use_package("dplyr", type = "Imports")
usethis::use_package("testthat", type = "Suggests")

In roxygen comments:

  • @export exposes functions to users.
  • @importFrom pkg fun imports specific functions where justified.

Prefer selective imports over broad @import directives.
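
The sketch below shows how these choices might look in package code. All three functions are hypothetical, and jsonlite stands in for any optional package listed under Suggests. When run as a plain script the roxygen comments are inert, so the code executes outside a package as well.

```r
#' Centre a numeric vector on its median
#'
#' @param x A numeric vector.
#' @return `x` minus its median.
#' @export
centre_on_median <- function(x) {
  # Explicit namespace call: no import directive is needed in NAMESPACE.
  x - stats::median(x, na.rm = TRUE)
}

#' Scale a numeric vector by its standard deviation
#'
#' @param x A numeric vector.
#' @return `x` divided by its standard deviation.
#' @importFrom stats sd
#' @export
scale_by_sd <- function(x) {
  # sd() can be called unqualified because of the @importFrom tag above.
  x / sd(x, na.rm = TRUE)
}

# Optional dependency (Suggests): check availability at run time
# and fail with a clear message if the package is missing.
to_json <- function(x) {
  if (!requireNamespace("jsonlite", quietly = TRUE)) {
    stop("Package 'jsonlite' is required for to_json().", call. = FALSE)
  }
  jsonlite::toJSON(x)
}

centre_on_median(c(1, 2, 3))
scale_by_sd(c(1, 2, 3))
```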

3.5 Documentation with roxygen2

Document functions directly above their definitions and regenerate docs with:

devtools::document()

Core tags:

  • @param: input arguments.
  • @return: output object.
  • @examples: runnable examples (wrap slow examples in \donttest{}; reserve \dontrun{} for code that genuinely cannot run, and use it only when necessary).
  • @seealso: links to related functions.
  • @family: groups related functions.
  • @rdname or @describeIn: combine multiple functions on one help page when appropriate.
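
A minimal sketch of how the core tags fit together is shown below. The two temperature functions are hypothetical examples, not part of any existing package.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' @param celsius A numeric vector of temperatures in degrees Celsius.
#' @return A numeric vector of the same length, in degrees Fahrenheit.
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#' @seealso [fahrenheit_to_celsius()] for the inverse conversion.
#' @family temperature converters
#' @export
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

#' Convert temperatures from Fahrenheit to Celsius
#'
#' @param fahrenheit A numeric vector of temperatures in degrees Fahrenheit.
#' @return A numeric vector of the same length, in degrees Celsius.
#' @examples
#' fahrenheit_to_celsius(c(32, 212))
#' @family temperature converters
#' @export
fahrenheit_to_celsius <- function(fahrenheit) {
  (fahrenheit - 32) * 5 / 9
}

celsius_to_fahrenheit(c(0, 100))
```

Because both functions share the same @family value, roxygen2 adds cross-links between their help pages when devtools::document() is run.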

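For package-level documentation, create R/<pkg>-package.R (for example with usethis::use_package_doc()) containing a roxygen block with:

  • @keywords internal
  • the "_PACKAGE" sentinel string as the documented object (this replaces the older pattern of @name <pkg>-package followed by a trailing NULL; roxygen2 derives the <pkg>-package topic automatically)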

3.6 Testing with testthat

Initialise and run tests:

usethis::use_testthat(3)
devtools::test()

Guidelines:

  • Store tests in tests/testthat/.
  • Name files as test-*.R.
  • Group related expectations in test_that() blocks.
  • Use the most specific expectation available (expect_identical(), expect_equal(), expect_match(), etc.).
  • Use snapshot/reference testing where outputs are large or complex.
  • Skip selectively (skip_if_not_installed(), skip_on_cran()) rather than disabling broad test suites.
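
The guidelines above can be illustrated with a small test file. This is a sketch that assumes testthat is installed; `add()` is a hypothetical function under test. The library(testthat) call is needed only when running the snippet as a standalone script, since devtools::test() loads testthat itself.

```r
library(testthat)  # only needed outside devtools::test()

# Hypothetical function under test; in a package it would live in R/.
add <- function(x, y) x + y

# Contents of a file such as tests/testthat/test-add.R
test_that("add() sums numeric vectors", {
  # Exact comparison, including type, for integer results.
  expect_identical(add(1L, 2L), 3L)
  # Tolerance-based comparison for floating-point results.
  expect_equal(add(0.1, 0.2), 0.3)
})

test_that("add() rejects non-numeric input", {
  expect_error(add("a", 1))
})
```

expect_identical() is used where the type must match exactly, and expect_equal() where floating-point rounding makes exact comparison unreliable.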

3.7 Data in Packages

Use the right location for each data type:

  • data/: user-facing, documented datasets.
  • R/sysdata.rda: internal package data not exposed to users.
  • inst/extdata/: external files users can access via system.file().
  • data-raw/: scripts to build reproducible datasets.

Helpful commands:

usethis::use_data_raw()
usethis::use_data(my_dataset, overwrite = TRUE)
usethis::use_data(my_internal_object, internal = TRUE, overwrite = TRUE)
tools::checkRdaFiles("data")

Document datasets with roxygen @format and @source.
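
The sketch below shows one way the pieces might fit together for a single dataset. The dataset name `daily_temps` and its contents are invented for illustration, and the usethis call is left commented out because it only works inside a package project.

```r
# --- data-raw/daily_temps.R (hypothetical build script) ---
daily_temps <- data.frame(
  day = as.Date("2024-01-01") + 0:2,
  temp_c = c(1.5, 2.0, -0.5)
)
# usethis::use_data(daily_temps, overwrite = TRUE)  # writes data/daily_temps.rda

# --- R/data.R (documentation for the exported dataset) ---
#' Daily temperature readings (illustrative data)
#'
#' @format A data frame with 3 rows and 2 variables:
#' \describe{
#'   \item{day}{Date of the reading.}
#'   \item{temp_c}{Temperature in degrees Celsius.}
#' }
#' @source Simulated for illustration.
"daily_temps"
```

The dataset is documented by naming it in a string rather than by assigning an object, because the object itself lives in data/ and not in R/.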

3.8 Build, Check, and Release

Before release:

  • Run devtools::check() locally and in CI.

  • Confirm no unintended notes/warnings/errors.

  • Keep NAMESPACE generated (do not hand-edit unless absolutely necessary).

  • Verify encoding and avoid non-ASCII issues for CRAN where possible.

  • Increment versions with semantic intent:

    usethis::use_version("patch")
  • Maintain a clear NEWS.md.

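For CRAN submission, follow current CRAN policies and perform final checks with devtools::check(remote = TRUE, manual = TRUE). Note that cran = TRUE is already the default for devtools::check(), so passing it explicitly adds nothing.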
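
To make the version increments mentioned above concrete, the following sketch reproduces the arithmetic for the "patch", "minor", and "major" increments in base R. `bump_version()` is a hypothetical helper written for illustration and assumes a three-component major.minor.patch version; in a real package, usethis::use_version() updates DESCRIPTION and NEWS.md directly.

```r
# Hypothetical helper illustrating major/minor/patch increments.
bump_version <- function(version, which = c("patch", "minor", "major")) {
  which <- match.arg(which)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))

  idx <- match(which, c("major", "minor", "patch"))
  parts[idx] <- parts[idx] + 1L
  # Components to the right of the bumped one reset to zero.
  if (idx < 3) parts[(idx + 1):3] <- 0L

  paste(parts, collapse = ".")
}

bump_version("1.2.3", "patch")  # "1.2.4"
bump_version("1.2.3", "minor")  # "1.3.0"
bump_version("1.2.3", "major")  # "2.0.0"
```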

3.9 Documenting Multiple Functions on One Help Page

Default recommendation: one function family per file and one principal help topic per user concept.

Use:

  • @seealso and @family for cross-links across related pages.
  • @rdname when several functions should share one help page.
  • @describeIn for closely related variants/methods with a shared conceptual contract.

Example:

#' Basic arithmetic
#'
#' @param x,y Numeric vectors.
#' @name arith
NULL

#' @rdname arith
add <- function(x, y) x + y

#' @rdname arith
times <- function(x, y) x * y
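
For comparison, a sketch of the @describeIn variant follows. Both functions are hypothetical; the tag takes the destination topic followed by a short description, which roxygen2 lists on the shared help page.

```r
#' Measures of centre
#'
#' @param x A numeric vector.
#' @return A single numeric value.
#' @export
centre <- function(x) mean(x)

#' @describeIn centre Median-based variant, more robust to outliers.
#' @export
centre_robust <- function(x) stats::median(x)

centre(c(1, 2, 9))         # 4
centre_robust(c(1, 2, 9))  # 2
```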

3.10 Parallel Computing in R (Practical Notes)

Check available cores:

parallel::detectCores()

Local multicore workflows:

  • parallel::mclapply() (Unix-like systems).
  • foreach with %dopar% plus doParallel.
  • Use reproducible RNG streams for parallel simulations (for example RNGkind("L'Ecuyer-CMRG") with a fixed seed).

Cluster workflows (PSOCK clusters), local or remote:

  1. Create cluster with parallel::makeCluster().
  2. Export required objects via parallel::clusterExport() when needed.
  3. Evaluate with parallel::parLapply() or related functions.
  4. Stop cleanly using parallel::stopCluster().

Always close clusters and backend registrations to avoid orphan workers.
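
The four cluster steps can be sketched as follows, using only the parallel package that ships with R. The object and function names (`sample_size`, `simulate_mean`) are hypothetical, and two local workers are assumed. parallel::clusterSetRNGStream() is used here to give each worker its own L'Ecuyer-CMRG stream, and tryCatch(..., finally = ) ensures the cluster is stopped even if a task fails.

```r
library(parallel)

# Hypothetical inputs for the simulation.
sample_size <- 100
simulate_mean <- function(i) mean(rnorm(sample_size))

# 1. Create a PSOCK cluster with two local workers.
cl <- makeCluster(2)

result <- tryCatch({
  # 2. Export objects the workers need.
  clusterExport(cl, "sample_size", envir = environment())

  # Seed independent, reproducible RNG streams on the workers.
  clusterSetRNGStream(cl, iseed = 123)
  # 3. Evaluate tasks in parallel.
  first <- parLapply(cl, 1:4, simulate_mean)

  # Re-seeding with the same value should reproduce the same draws.
  clusterSetRNGStream(cl, iseed = 123)
  second <- parLapply(cl, 1:4, simulate_mean)

  list(first = first, second = second)
}, finally = {
  # 4. Stop the cluster whether or not the tasks succeeded.
  stopCluster(cl)
})

identical(result$first, result$second)
```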

3.11 R Package Development Lifecycle

The following diagrams summarise package construction, build, installation, and loading.

3.11.1 High-level package lifecycle

flowchart LR
    A["Source package<br/>editable files<br/>nothing compiled"] --> B["Bundle (tar.gz)<br/>single compressed archive<br/>distribution artefact"]
    B --> C["Binary package<br/>platform-specific build<br/>compiled where needed"]
    C --> D["Installed package<br/>copied into local library<br/>ready to attach or namespace-load"]
    D --> E["Loaded package in memory<br/>namespace loaded / attached<br/>functions available in session"]

3.11.2 What changes between source, bundle, and binary

flowchart LR
    subgraph S["Source"]
      S1["DESCRIPTION"]
      S2["NAMESPACE"]
      S3["README.md"]
      S4["man/"]
      S5["R/ (plain .R files)"]
      S6["src/"]
      S7["tests/"]
      S8["vignettes/"]
      S9["inst/"]
      S10["dev-only files<br/>.Rbuildignore, devtools config, CRAN notes"]
    end

    subgraph B["Bundle"]
      B1["DESCRIPTION"]
      B2["NAMESPACE"]
      B3["README.md"]
      B4["man/"]
      B5["R/"]
      B6["src/"]
      B7["tests/"]
      B8["inst/doc (built vignettes)"]
      B9["inst/"]
    end

    subgraph Y["Binary"]
      Y1["DESCRIPTION (+ parsed metadata cache)"]
      Y2["NAMESPACE"]
      Y3["README.md"]
      Y4["help system<br/>Meta/, html/, help/, INDEX"]
      Y5["R/ (lazy-load database)"]
      Y6["libs/ (compiled shared objects)"]
      Y7["doc/"]
      Y8["top-level files from inst/"]
    end

    S1 --> B1 --> Y1
    S2 --> B2 --> Y2
    S3 --> B3 --> Y3
    S4 --> B4 --> Y4
    S5 --> B5 --> Y5
    S6 --> B6 --> Y6
    S7 --> B7
    S8 --> B8 --> Y7
    S9 --> B9 --> Y8

3.11.3 Command-driven workflow (construction to load)

flowchart TD
    A["Create scaffolding<br/>usethis::create_package()"] --> B["Write code + docs<br/>R/, roxygen comments"]
    B --> C["Generate docs + namespace<br/>devtools::document()"]
    C --> D["Run tests<br/>devtools::test()"]
    D --> E["Run full checks<br/>devtools::check()"]
    E --> F["Build source bundle<br/>devtools::build()"]
    F --> G["Install locally<br/>devtools::install()"]
    G --> H["Load in session<br/>library(pkg) or pkg::fun"]