6 Reproducible Research Workflows

6.1 Introduction to Reproducible Research in Clinical Settings

Reproducibility is fundamental to clinical research integrity, particularly in regulated environments. In this chapter, we explore how to implement robust reproducible research workflows using R.

6.1.1 The Value of Reproducibility in Clinical Research

Reproducible research provides several key benefits in clinical settings:

1. Regulatory compliance: Meeting FDA and other regulatory requirements
2. Error reduction: Minimizing mistakes through automation and validation
3. Transparency: Enabling review and verification of methods and results
4. Efficiency: Streamlining updates when data or requirements change
5. Knowledge transfer: Facilitating collaboration and continuity

6.1.2 Key Elements of Reproducible Research

library(knitr)
library(tidyverse)

# Create a table of reproducibility elements
reproducibility_elements <- tribble(
  ~Element, ~Description, ~R_Tools,
  "Version control", "Tracking changes to code and documents", "git, GitHub, GitLab",
  "Environment management", "Capturing software dependencies", "renv, packrat, Docker",
  "Code organization", "Structuring analysis code", "R packages, targets, drake",
  "Documentation", "Recording methods and decisions", "roxygen2, knitr, quarto",
  "Data management", "Tracking data provenance", "DataPackageR, pins, arrow",
  "Workflow automation", "Orchestrating analysis steps", "Make, targets, drake",
  "Validation", "Verifying analysis correctness", "testthat, valtools, riskmetric"
)

# Display the table
kable(reproducibility_elements)
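
As one illustration of the "Environment management" row, the following is a minimal sketch that uses only base R to record the R version and loaded packages next to the analysis output. The helper name `record_session_info` and the file name `session_info.txt` are assumptions made for this example; a project that uses renv would typically rely on `renv::snapshot()` and its lockfile instead.

```r
# Hypothetical helper: write the current session details to a text file
record_session_info <- function(log_dir = tempdir()) {
  log_file <- file.path(log_dir, "session_info.txt")
  info <- c(
    paste("Recorded:", format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z")),
    capture.output(print(sessionInfo()))
  )
  writeLines(info, log_file)
  invisible(log_file)
}

# Write to a temporary directory here; a real project might use output/logs
log_file <- record_session_info()
head(readLines(log_file), 3)
```

Keeping such a file with each set of results gives a reviewer a quick record of the software versions behind them, although it does not by itself make the environment restorable.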

6.2 Setting Up a Reproducible Project Structure

6.2.1 Project Organization

Creating a well-organized project structure is the foundation of reproducibility:

# Function to create a standardized project structure
create_clinical_project <- function(project_name, 
                                   base_dir = "~/projects") {
  
  # Create main project directory
  project_dir <- file.path(base_dir, project_name)
  dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)
  
  # Create standard subdirectories
  dirs <- c(
    "data/raw",                    # Original unmodified data
    "data/processed",              # Cleaned and processed data
    "data/external",               # External reference data
    "R",                           # R functions and scripts
    "analysis",                    # Analysis scripts
    "reports/figures",             # Generated figures
    "reports/tables",              # Generated tables
    "docs",                        # Documentation
    "output/logs",                 # Log files
    "renv/library"                 # Project package library (normally managed by renv itself)
  )
  
  # Create directories
  for (d in dirs) {
    dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
  }
  
  # Create files like README.md, .gitignore, and setup.R
  # (Code omitted for brevity - see online repository for full example)
  
  # Return the project directory path
  return(project_dir)
}

# Example usage
# project_path <- create_clinical_project("clinical_trial_2023")
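
Once a project has been created, it can be useful to confirm that the expected layout is actually in place, for example after cloning the project onto another machine. The sketch below is one way to do that with base R; `check_project_structure` is a hypothetical helper introduced for this example, and the demo project is built in a temporary directory rather than under `~/projects`.

```r
# Hypothetical helper: report which expected subdirectories are missing
check_project_structure <- function(project_dir, expected_dirs) {
  present <- dir.exists(file.path(project_dir, expected_dirs))
  expected_dirs[!present]
}

# Build a small throwaway project in a temporary directory
demo_dir <- file.path(tempdir(), "demo_project")
for (d in c("data/raw", "data/processed", "R")) {
  dir.create(file.path(demo_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# "analysis" is deliberately not created, so it should be reported as missing
expected <- c("data/raw", "data/processed", "R", "analysis")
missing_dirs <- check_project_structure(demo_dir, expected)
print(missing_dirs)
```

An empty result means every expected directory exists; anything else lists the paths that still need to be created.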

6.2.2 Using the `here` Package

The `here` package builds file paths relative to the project root, so the same code resolves to the correct files on any machine or operating system:

library(here)

# Instead of this (an absolute path that only exists on one machine;
# commented out so the script does not fail elsewhere)
# data <- read.csv("C:/Users/username/projects/clinical_trial/data/raw/baseline.csv")

# Use this (works the same on any system)
data <- read.csv(here("data", "raw", "baseline.csv"))

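# Create a function that uses reproducible paths
save_analysis_result <- function(result, filename) {
  output_path <- here("output", filename)
  # Make sure the output directory exists before writing
  dir.create(dirname(output_path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, output_path)
  return(output_path)
}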

# Example usage (assumes clinical_data is a data frame with the columns
# outcome, treatment, age, and sex)
model_result <- lm(outcome ~ treatment + age + sex, data = clinical_data)
save_analysis_result(model_result, "primary_analysis_model.rds")
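
The usage example above depends on a `clinical_data` data frame that is not defined in this section. The sketch below shows the same save-and-restore pattern end to end with simulated data. The variable values are invented for illustration, and the file is written to a temporary directory with `file.path()` as a stand-in for `here("output", ...)`, so the example runs without a project root.

```r
set.seed(2023)  # fixed seed so the simulated data are reproducible

# Simulated stand-in for clinical_data (hypothetical values, not real patients)
n <- 100
clinical_data <- data.frame(
  treatment = factor(sample(c("placebo", "active"), n, replace = TRUE)),
  age = round(rnorm(n, mean = 55, sd = 10)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)
clinical_data$outcome <- 10 + 2 * (clinical_data$treatment == "active") +
  0.1 * clinical_data$age + rnorm(n)

# Fit the same model formula used in the text
model_result <- lm(outcome ~ treatment + age + sex, data = clinical_data)

# Save to a temporary directory and read the object back
output_path <- file.path(tempdir(), "primary_analysis_model.rds")
saveRDS(model_result, output_path)
restored <- readRDS(output_path)

# The restored model should carry the same coefficients as the original
all.equal(coef(model_result), coef(restored))
```

Reading the saved object back and comparing it with the original is a cheap check that the stored result is the one a later report will actually load.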

For further details on structuring clinical research projects, see the additional resources section at the end of this chapter.

6.3 References
