Introduction
The goal of projrsimple is to make it easy to run a
clean project workflow.
It helps you initialise a reproducible project structure with pre‐defined directories, run analysis scripts, and ensure clean output directories with minimal effort.
Installation
You can install the development version of projrsimple
like so:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("MiguelRodo/projrsimple")
TL;DR
- Install projrsimple (see above).
- Open an R session in your project directory (the folder where you want to work).
- Run projr_init() to initialise your project.
- Write code in scripts (.R, .Rmd, .qmd and/or Quarto projects) in the project directory.
- Run projr_run() to execute all scripts. Save outputs to the _output directory.
- View rendered documents in the docs directory.
To connect to GitHub automatically, run
projr_init(init_github = TRUE) (step 3) after the
following steps:
- Create a GitHub account (link).
- Set up a personal access token (PAT) in R (instructions).
Details
Project Initialisation with projr_init
The function projr_init helps you quickly set up a standard project
structure. It can do the following (each feature is optional):
- Create Standard Directories:
  By default, it creates directories for:
  - Raw data: _raw_data
  - Rendered documents: docs
  - Final outputs: _output
  - Temporary/cache files: _tmp
  - Raw (non‐rendered) docs: _reference
  You can supply your own values (provide NULL to skip a directory).
- Initialise a Git Repository:
  projr_init can initialise a Git repository in your project. It will automatically add the cache directory (by default _tmp) to your .gitignore to avoid committing temporary files, and (if enabled) will add and commit any changes.
  Note: To push your repository to GitHub, you’ll need a GitHub account and must ensure that R is configured with your personal access token (PAT).
- Create a README File:
  A default README is created with instructions on how to run your analyses using projrsimple, as well as a brief description of the project, its structure and empty fields for links and project details.
- Connect to GitHub:
  projr_init can connect your local Git repository to GitHub. The repository is private by default.
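To make the default layout concrete, the following sketch recreates it by hand with base R in a temporary directory. It is an illustration of the resulting structure only, not how projr_init is implemented; the project_dir name is an assumption made for the example.

```r
# Illustration only: build the default layout by hand in a temporary
# directory. This is NOT the projr_init implementation.
project_dir <- file.path(tempdir(), "example-project")  # hypothetical location
dir.create(project_dir, showWarnings = FALSE)

# Default directory names as listed above.
default_dirs <- c("_raw_data", "docs", "_output", "_tmp", "_reference")
for (d in default_dirs) {
  dir.create(file.path(project_dir, d), showWarnings = FALSE)
}

# Keep the cache directory out of version control, as described for the Git step.
writeLines("_tmp", file.path(project_dir, ".gitignore"))

# Immediate sub-directories of the project.
created <- list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
print(created)
```

In practice a single call to projr_init() replaces all of this.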
Running Analyses with projr_run
After you have initialised your project, you can use
projr_run to execute your
analyses. By default:
- Script Detection and Execution:
  All scripts in your working directory with extensions .R, .Rmd/.rmd, and .qmd are run automatically. If a Quarto project is detected (_quarto.yml exists), then the Quarto project is rendered.
- Copying Generated Documents:
  The function will copy any generated documents (typically .html, .pdf, or .docx) to the docs directory.
- Output Directory Management:
  Optionally, the output directory (by default _output) is cleared before running all the scripts. By default, this is FALSE (to avoid unintended deletions), but should be set to TRUE to ensure clean runs.
- Selective Execution:
  You can specify a subset of scripts to run, rather than all found in the scripts directory.
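As a rough picture of what script detection involves, the sketch below lists files with the extensions named above and checks for a _quarto.yml file, using base R only. The helpers find_scripts and is_quarto_project are hypothetical names introduced for this example. They are not part of projrsimple, and the package may detect scripts differently.

```r
# Hypothetical helpers (not part of projrsimple): a base-R sketch of the
# detection rules described above.
find_scripts <- function(dir = ".") {
  # Match .R, .Rmd, .rmd and .qmd files in the given directory.
  list.files(dir, pattern = "\\.(R|Rmd|rmd|qmd)$")
}

is_quarto_project <- function(dir = ".") {
  # A Quarto project is identified by a _quarto.yml file.
  file.exists(file.path(dir, "_quarto.yml"))
}

# Try it on a throwaway directory containing a few empty files.
demo_dir <- file.path(tempdir(), "detect-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("clean.R", "report.Rmd", "paper.qmd", "notes.txt")))

scripts <- find_scripts(demo_dir)
quarto <- is_quarto_project(demo_dir)
print(scripts)
print(quarto)
```

Here notes.txt is ignored because its extension is not in the list, and no Quarto project is reported because the directory has no _quarto.yml.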
Multiple Pipelines
For long-running and/or complex workflows, you can run
projr_run multiple times to create
separate “pipelines”. For example, you might run one pipeline that
processes raw data
and another that generates reports, each using different script
selections and
output/document directories. This modular approach allows you to
flexibly manage
complex projects with multiple stages.
For example:
# processed data
projr_run(
  scripts = "process-data.R",
  dir_output = "_output/processed_data",
  dir_docs = "docs/processed_data",
  clear_output_and_docs = TRUE
)
# analyse data
projr_run(
  scripts = "analyse-data.R",
  dir_output = "_output/analysis",
  dir_docs = "docs/analysis",
  clear_output_and_docs = TRUE
)
Since separate output and docs directories
are used for the different pipelines, the analysis pipeline can be run
cleanly (clear output and docs directories)
without affecting the processed data pipeline.
Execution Directory
Keeping scripts in a sub-directory (e.g. src/ or
scripts/) can improve organisation, but may complicate the
execution directory when running Quarto and
RMarkdown documents.
The execution directory is the directory where R thinks
it is when running scripts.
For Quarto and RMarkdown, the execution
directory is the directory of the .Rmd/.qmd
file. For example, if we run
quarto::quarto_render("scripts/analysis.qmd") from the
project directory, the execution directory is scripts/.
This has two possible problems:
- The user needs to be aware of this, and specify all paths within the Quarto/RMarkdown doc relative to scripts/, rather than the project directory. This is not typically what people expect.
- This is not the same behaviour as R scripts. If we run source("scripts/analysis.R"), the execution directory is not scripts/ but the project directory.
For this reason, to prevent surprises, if you use Quarto
or RMarkdown it is recommended to keep all scripts in the
project directory.
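The behaviour of source() described above can be checked with base R alone. The sketch below writes a one-line script into a scripts/ sub-directory of a temporary project and sources it from the project directory. The directory and variable names are assumptions made for the example.

```r
# Set up a throwaway project with a scripts/ sub-directory.
project_dir <- file.path(tempdir(), "exec-dir-demo")  # hypothetical location
dir.create(file.path(project_dir, "scripts"), recursive = TRUE, showWarnings = FALSE)

# The script records the working directory it sees when it runs.
writeLines("seen_dir <- getwd()", file.path(project_dir, "scripts", "analysis.R"))

# Source the script from the project directory (chdir = FALSE is the default).
old_wd <- setwd(project_dir)
source("scripts/analysis.R")
setwd(old_wd)

# The script ran in the project directory, not in scripts/.
same_as_project <- normalizePath(seen_dir) == normalizePath(project_dir)
print(same_as_project)
```

A Quarto or RMarkdown document in the same sub-directory would instead see scripts/ as its working directory when rendered, which is the mismatch this section warns about.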
Workarounds
However, if you would like to keep
Quarto/RMarkdown docs in a subdirectory but
use the project directory as the execution directory, you have the
following two options.
Option 1 (preferred): Use knitr and specify the
execution directory
To ensure consistency between projr_run and manual
knitting, place the following at the very top of your
.Rmd/.qmd file (within a setup chunk, of
course):
knitr::opts_knit$set(root.dir = "/your/desired/path")
For example, if the Quarto doc is in the
scripts directory but you want the execution directory to
be the project directory, replace "/your/desired/path" with
".." (note two dots - this means “one level up”).
Option 2: Specify the execution directory in
projr_run
You can specify the dir_exec argument in
projr_run to set the execution directory. For example:
projr_run(
  scripts = c("scripts/data_processing.R", "scripts/analysis.qmd"),
  dir_exec = "." # note only one dot
)
This forces the execution directory to be the project directory for
both the R script and Quarto doc, even though
the scripts are in a subdirectory.
The problem with this approach is that the
Knit/Render button in RStudio
will still render the Quarto/RMarkdown docs
with the document's own directory as the execution directory
(scripts/, in this case). This is the reason
Option 1 is preferred.
On the other hand, if you wanted the R script
scripts/data_processing.R to have scripts/ as
the execution directory, then using projr_run with
dir_exec = "scripts/" would be a good approach.
Citation
To cite projrsimple in publications, use:
Miguel Rodo (2024). projrsimple: Initialise and run a simple project workflow. Version 1.0.0. Available at: https://github.com/MiguelRodo/projrsimple.
Alternatively, in BibTeX format: