Introduction to projrsimple

library(projrsimple)

Introduction

The goal of projrsimple is to make it easy to run a clean project workflow.

It helps you initialise a reproducible project structure with pre‐defined directories, run analysis scripts, and ensure clean output directories with minimal effort.

Installation

You can install the development version of projrsimple like so:

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("MiguelRodo/projrsimple")

TL;DR

Install projrsimple (see above).
Open an R session in your project directory (folder where you want to work).
Run projr_init() to initialise your project.
Write code in scripts (.R, .Rmd, .qmd and/or Quarto projects) in the project directory.
Run projr_run to execute all scripts. Save outputs to _output directory.
View rendered documents in the docs directory.

To connect to GitHub automatically, run projr_init(init_github = TRUE) (step 3) after the following steps:

Create a GitHub account (link).
Set up a personal access token (PAT) in R (instructions).

Details

Project Initialisation with `projr_init`

The function projr_init is designed to help you quickly set up a standard project
structure. It will (all of these features are optional):

Create Standard Directories:

By default, it creates directories for:
- Raw data: _raw_data
- Rendered documents: docs
- Final outputs: _output
- Temporary/cache files: _tmp
- Raw (non‐rendered) docs: _reference
You can supply your own values (provide NULL to skip a directory).
Initialise a Git Repository:

projr_init can initialise a Git repository in your project. It will automatically add the cache directory (by default _tmp) to your .gitignore to avoid committing
temporary files, and (if enabled) will add and commit any changes.
Note: To push your repository to GitHub, you’ll need a GitHub account and must
ensure that R is configured with your personal access token (PAT).
Create a README File:

A default README is created with instructions on how to run your analyses using projrsimple, as well as a brief description of the project, its structure and empty fields for links and project details.
Connect to GitHub:

projr_init can connect your local Git repository to GitHub. The repository is private by default.

Running Analyses with `projr_run`

After you have initialised your project, you can use projr_run to execute your
analyses. By default:

Script Detection and Execution:

All scripts in your working directory with extensions .R, .Rmd/.rmd, and
.qmd are run automatically. If a Quarto project is detected (_quarto.yml exists), then a Quarto project is rendered.
Copying Generated Documents:

The function will copy any generated documents (typically .html, .pdf, or
.docx) to the docs directory.
Output Directory Management:

Optionally, the output directory (by default _output) is cleared before running
all the scripts. By default, this is FALSE (to avoid unintended deletions), but should be set to TRUE to ensure clean runs.
Selective Execution:

You can specify a subset of scripts to run, rather than all found in the scripts directory.

Multiple Pipelines

For long-running and/or complex workflows, you can run projr_run multiple times to create
separate “pipelines”. For example, you might run one pipeline that processes raw data
and another that generates reports, each using different script selections and
output/document directories. This modular approach allows you to flexibly manage
complex projects with multiple stages.

For example:

# processed data
projr_run(
  scripts = "process-data.R",
  dir_output = "_output/processed_data",
  dir_docs = "docs/processed_data",
  clear_output_and_docs = TRUE
)
# analyse data
projr_run(
  scripts = "analyse-data.R",
  dir_output = "_output/analysis",
  dir_docs = "docs/analysis",
  clear_output_and_docs = TRUE
)

Since separate output and docs directories are used for the different pipelines, the analysis pipeline can be run cleanly (clear output and docs directories) without affecting the processed data pipeline.

Execution Directory

Keeping scripts in a sub-directory (e.g. src/ or scripts/) can improve organisation, but may complicate the execution directory when running Quarto and RMarkdown documents.

The execution directory is the directory where R thinks it is when running scripts.

For Quarto and RMarkdown, the execution directory is the directory of the .Rmd/.qmd file. So, if we run quarto::quarto_render("scripts/analysis.qmd") from the project directory, the execution directory is actually scripts/. For example, if we run quarto::quarto_render("scripts/analysis.qmd") from the project directory, the execution directory is scripts/.

This has two possible problems:

The user needs to be aware of this, and specify all paths within the Quarto/RMarkdown doc relative to scripts/, rather than the project directory. This is not typically what people expect.
This is not the same behaviour as R scripts. If we run source("scripts/analysis.R"), the execution directory is not scripts/ but the project directory.

For this reason, to prevent surprises, if you use Quarto or RMarkdown it is recommended to keep all scripts in the project directory.

Workarounds

However, if you would like to keep Quarto/RMarkdown docs in a subdirectory but use the project directory as the execution directory, you have the following two options.

Option 1 (preferred): Use `knitr` and specify the execution directory

To ensure consistency between projr_run and manual knitting, place the following at the very top of your .Rmd/.qmd file (within a setup chunk, of course):

knitr::opts_knit$set(root.dir = "/your/desired/path")

For example, if the Quarto doc is in the scripts directory but you want the execution directory to be the project directory, replace "/your/desired/path" with ".." (note two dots - this means “one level up”).

Option 2: Specify the execution directory in `projr_run`

You can specify the dir_exec argument in projr_run to set the execution directory. For example:

projr_run(
  scripts = c("scripts/data_processing.R", "scripts/analysis.qmd"),
  dir_exec = "." # note only one dot
)

This forces the execution directory to be the project directory for both the R script and Quarto doc, even though the scripts are in a subdirectory.

The problem with this approach is that the Knit/Render button in RStudio will still render the Quarto/RMarkdown docs with the parent directory as the execution directory (scripts/, in this case). This is the reason Option 1 is preferred.

On the other hand, if you wanted the R script scripts/data_processing.R to have scripts/ as the execution directory, then using projr_run with dir_exec = "scripts/" would be a good approach.

Citation

To cite projrsimple in publications, use:

Miguel Rodo (2024). projrsimple: Initialise and run a simple project workflow. Version 1.0.0. Available at: https://github.com/MiguelRodo/projrsimple.

Alternatively, in BibTeX format:

@Misc{rodo,
  title = {projrsimple: Initialise and run a simple project workflow},
  author = {Miguel Rodo},
  url = {https://github.com/MiguelRodo/projrsimple/#readme},
  abstract = {Initialise and run a simple R project workflow},
  version = {1.0.0},
}