6  Reproducible Research Workflows

6.1 Introduction to Reproducible Research in Clinical Settings

Reproducibility is fundamental to clinical research integrity, particularly in regulated environments. In this chapter, we explore how to implement robust reproducible research workflows using R.

6.1.1 The Value of Reproducibility in Clinical Research

Reproducible research provides several key benefits in clinical settings:

  1. Regulatory compliance: Meeting FDA and other regulatory requirements
  2. Error reduction: Minimizing mistakes through automation and validation
  3. Transparency: Enabling review and verification of methods and results
  4. Efficiency: Streamlining updates when data or requirements change
  5. Knowledge transfer: Facilitating collaboration and continuity

6.1.2 Key Elements of Reproducible Research

Each element of a reproducible workflow is supported by specific tools. The following code builds a summary table of the elements, what each one covers, and the tools commonly used for it:

Code
library(knitr)
library(tidyverse)

# Create a table of reproducibility elements
reproducibility_elements <- tribble(
  ~Element, ~Description, ~R_Tools,
  "Version control", "Tracking changes to code and documents", "git, GitHub, GitLab",
  "Environment management", "Capturing software dependencies", "renv, packrat, Docker",
  "Code organization", "Structuring analysis code", "R packages, targets, drake",
  "Documentation", "Recording methods and decisions", "roxygen2, knitr, quarto",
  "Data management", "Tracking data provenance", "DataPackageR, pins, arrow",
  "Workflow automation", "Orchestrating analysis steps", "Make, targets, drake",
  "Validation", "Verifying analysis correctness", "testthat, validate, riskmetric"
)

# Display the table
kable(reproducibility_elements)
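
Of the elements above, environment management is often the easiest to start with. As a minimal sketch using only base R, the helper below writes the R version and session details to a log file so that a later reader can see which software produced a result. The function name `record_session()` and the log file name are assumptions made for illustration; dedicated tools such as renv go further by also pinning and restoring package versions.

```r
# Hypothetical helper: write the R version and session details to a text file.
record_session <- function(log_file) {
  info <- sessionInfo()
  lines <- c(
    paste("Recorded:", format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z")),
    paste("R version:", info$R.version$version.string),
    "",
    capture.output(print(info))  # full listing of platform and packages
  )
  writeLines(lines, log_file)
  invisible(log_file)
}

# Example usage: write to a temporary file and show the first lines
log_file <- record_session(file.path(tempdir(), "session_info.txt"))
cat(head(readLines(log_file), 3), sep = "\n")
```

In a real project the log would more naturally live under a project directory such as output/logs rather than a temporary directory.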

6.2 Setting Up a Reproducible Project Structure

6.2.1 Project Organization

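A consistent directory layout is the foundation of reproducibility: it keeps raw data separate from derived outputs and makes every file easy to locate. The function below creates such a layout: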

Code
# Function to create a standardized project structure
create_clinical_project <- function(project_name, 
                                   base_dir = "~/projects") {
  
  # Create main project directory
  project_dir <- file.path(base_dir, project_name)
  dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)
  
  # Create standard subdirectories
  dirs <- c(
    "data/raw",                    # Original unmodified data
    "data/processed",              # Cleaned and processed data
    "data/external",               # External reference data
    "R",                           # R functions and scripts
    "analysis",                    # Analysis scripts
    "reports/figures",             # Generated figures
    "reports/tables",              # Generated tables
    "docs",                        # Documentation
    "output/logs",                 # Log files
    "renv/library"                 # Isolated package library (populated by renv)
  )
  
  # Create directories
  for (d in dirs) {
    dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
  }
  
  # Create files like README.md, .gitignore, and setup.R
  # (Code omitted for brevity - see online repository for full example)
  
  # Return the project directory path
  return(project_dir)
}

# Example usage
# project_path <- create_clinical_project("clinical_trial_2023")
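
A layout is only useful if projects actually follow it. The sketch below shows one way to check an existing directory against a list of expected subdirectories. The helper name `check_project_structure()` and its default list of required directories are assumptions for illustration, not part of the function above.

```r
# Hypothetical helper: report which expected directories are missing.
check_project_structure <- function(project_dir,
                                    required = c("data/raw", "data/processed",
                                                 "R", "analysis", "reports",
                                                 "docs", "output")) {
  present <- dir.exists(file.path(project_dir, required))
  required[!present]  # character(0) when nothing is missing
}

# Example usage with a throwaway project in a temporary directory
demo_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(demo_dir, "data", "raw"), recursive = TRUE,
           showWarnings = FALSE)
dir.create(file.path(demo_dir, "R"), showWarnings = FALSE)

# Only data/raw and R exist, so the remaining entries are reported
missing_dirs <- check_project_structure(demo_dir)
print(missing_dirs)
```

Returning the missing paths, rather than stopping with an error, leaves it to the caller to decide whether a gap is fatal or merely worth a warning.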

6.2.2 Using the here Package

The here package builds file paths relative to the project root, so the same code works regardless of the current working directory or operating system:

Code
library(here)

# Avoid absolute paths like this one (it only works on one machine):
# baseline <- read.csv("C:/Users/username/projects/clinical_trial/data/raw/baseline.csv")

# Build the path relative to the project root instead
baseline <- read.csv(here("data", "raw", "baseline.csv"))

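# Create a function that uses reproducible paths
save_analysis_result <- function(result, filename) {
  output_path <- here("output", filename)
  # Create the output directory if it does not exist yet
  dir.create(dirname(output_path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, output_path)
  return(output_path)
}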

# Example usage
# Assumes clinical_data is a data frame with outcome, treatment, age, and sex columns
model_result <- lm(outcome ~ treatment + age + sex, data = clinical_data)
save_analysis_result(model_result, "primary_analysis_model.rds")
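
Saving a fitted model is only useful if it can be reloaded unchanged. The following sketch illustrates a simple round-trip check with saveRDS() and readRDS(). Because this section does not define `clinical_data`, it uses the built-in ToothGrowth dataset as a stand-in, and it writes to a temporary directory rather than the project's output folder; both choices are assumptions made to keep the example self-contained.

```r
# Fit a model on a built-in dataset (stand-in for clinical_data)
fit <- lm(len ~ supp + dose, data = ToothGrowth)

# Save to a temporary file, then reload
rds_path <- file.path(tempdir(), "example_model.rds")
saveRDS(fit, rds_path)
reloaded <- readRDS(rds_path)

# Compare the estimated coefficients of the original and reloaded model
same_coefs <- isTRUE(all.equal(coef(fit), coef(reloaded)))
print(same_coefs)
```

A check of this kind can be added to an analysis script so that a corrupted or mismatched result file is caught before it reaches a report.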

For further details on structuring clinical research projects, see the additional resources section at the end of this chapter.

6.3 References