7  Data Visualization for Clinical Research

7.1 Introduction to Effective Data Visualization in Clinical Settings

Data visualization is a critical component of clinical research, serving as the bridge between complex statistical analyses and clear, actionable insights. In this chapter, we explore how to create effective visualizations tailored specifically for clinical data using R.

7.1.1 The Importance of Visualization in Clinical Research

Effective data visualization in clinical settings provides several key benefits:

  1. Pattern recognition: Detecting trends, outliers, and relationships that may not be apparent in tables
  2. Communication: Facilitating understanding across multidisciplinary teams
  3. Decision-making: Supporting evidence-based clinical and regulatory decisions
  4. Quality control: Identifying data issues or inconsistencies visually
  5. Stakeholder engagement: Making results accessible to patients, clinicians, and non-statistical audiences

7.1.2 Visualization Principles for Clinical Data

Code
library(knitr)
library(tidyverse)

# Create a table of visualization principles
viz_principles <- tribble(
  ~Principle, ~Description, ~Clinical_Relevance,
  "Accuracy", "Represent data faithfully without distortion", "Essential for regulatory compliance and scientific integrity",
  "Clarity", "Create visualizations that are easy to understand", "Ensures correct interpretation by clinicians and reviewers",
  "Efficiency", "Use minimal visual elements to convey the message", "Reduces cognitive load in complex clinical contexts",
  "Consistency", "Apply uniform visual styles across related graphics", "Facilitates comparison across trials or time points",
  "Accessibility", "Design for all viewers including those with visual impairments", "Ensures equitable access to clinical findings",
  "Context", "Include reference values and clinical thresholds", "Connects statistical results to clinical relevance"
)

# Display the table
kable(viz_principles)

7.2 The ggplot2 Framework for Clinical Visualization

7.2.1 Why ggplot2 for Clinical Research

The ggplot2 package has become the standard for statistical visualization in R, offering several advantages for clinical research:

  1. Grammar-based approach: Provides a systematic way to build complex visualizations
  2. Reproducibility: Integrates well with the reproducible workflows discussed in Chapter 6
  3. Customization: Allows tailoring to specific clinical and regulatory requirements
  4. Consistency: Supports shared themes and palettes that keep related graphics visually uniform
  5. Extensions: Packages built on ggplot2, such as survminer and ggpubr (both used later in this chapter), target common clinical plots
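
To make the grammar-based approach concrete, the sketch below builds a plot one component at a time from a small made-up data frame. The column names and values are illustrative assumptions, not trial data. Each `+` adds a component, and every intermediate result is an ordinary R object that can be stored and reused.

```r
library(ggplot2)

# Made-up example data: two arms measured at three visits (illustrative only)
toy <- data.frame(
  visit_week = rep(c(0, 4, 8), times = 2),
  score      = c(10, 11, 12, 10, 14, 18),
  arm        = rep(c("Placebo", "Active"), each = 3)
)

# 1. Data and aesthetic mappings: no geometry yet
base_plot <- ggplot(toy, aes(x = visit_week, y = score, color = arm))

# 2. Geometric layers: points, then a line connecting each arm
with_geoms <- base_plot + geom_point() + geom_line()

# 3. Scales, labels, and a theme complete the specification
final_plot <- with_geoms +
  scale_x_continuous(breaks = c(0, 4, 8)) +
  labs(x = "Study Week", y = "Score", color = "Arm") +
  theme_minimal()

# The number of layers grows as geoms are added
length(base_plot$layers)
length(final_plot$layers)
```

Because `base_plot` carries only data and mappings, the same object could be combined with different geoms to produce several related figures from one specification.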

7.2.2 Essential ggplot2 Elements for Clinical Visualization

Code
library(tidyverse)
library(survival)
library(here)

# Load sample clinical data (using reproducible path from Chapter 6)
clinical_data <- read_csv(here("data", "processed", "clinical_trial_data.csv")) %>%
  mutate(
    treatment = factor(treatment_group, 
                      levels = c("Placebo", "Low Dose", "High Dose")),
    response = factor(response_status, 
                     levels = c("Non-responder", "Partial", "Complete"))
  )

# Basic ggplot2 structure for clinical visualization
ggplot(clinical_data, aes(x = visit_week, y = efficacy_score, color = treatment)) +
  # Add geometric elements
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", se = TRUE) +
  # Add clinical context
  geom_hline(yintercept = 15, linetype = "dashed", color = "darkred") +
  annotate("text", x = 0, y = 16, label = "Clinical threshold", hjust = 0) +
  # Customize aesthetics
  scale_color_brewer(palette = "Set1") +
  # Add informative labels
  labs(
    title = "Efficacy Score Over Time by Treatment Arm",
    subtitle = "Dashed line indicates clinically meaningful improvement threshold",
    x = "Study Week",
    y = "Efficacy Score (0-30)",
    color = "Treatment Group",
    caption = "Data source: Clinical Trial XYZ-123"
  ) +
  # Apply clinical theme
  theme_minimal() +
  theme(
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    axis.title = element_text(face = "bold")
  )
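
The CSV file read above is not reproduced here. As a stand-in, the following sketch simulates a data frame with the columns this chapter's examples rely on (`patient_id`, `treatment_group`, `visit_week`, `baseline_score`, `efficacy_score`). The sample size, visit schedule, and effect sizes are invented purely so the plotting code has something to run on.

```r
set.seed(123)  # fixed seed so the simulated data are reproducible

arms      <- c("Placebo", "Low Dose", "High Dose")
visits    <- c(0, 2, 4, 8, 12)
n_per_arm <- 20

# One row per patient with an invented baseline score
patients <- data.frame(
  patient_id      = sprintf("P%03d", seq_len(n_per_arm * length(arms))),
  treatment_group = rep(arms, each = n_per_arm),
  baseline_score  = round(rnorm(n_per_arm * length(arms), mean = 10, sd = 2), 1),
  stringsAsFactors = FALSE
)

# Invented improvement per week for each arm (score points per week)
slope <- c("Placebo" = 0.2, "Low Dose" = 0.5, "High Dose" = 0.8)

# Cross join: one row per patient per visit
sim_data <- merge(patients, data.frame(visit_week = visits))

# Baseline plus arm-specific trend plus noise, clamped to the 0-30 scale
trend <- unname(slope[as.character(sim_data$treatment_group)]) * sim_data$visit_week
sim_data$efficacy_score <- sim_data$baseline_score + trend +
  rnorm(nrow(sim_data), sd = 1.5)
sim_data$efficacy_score <- pmin(pmax(sim_data$efficacy_score, 0), 30)

# Ordered factor matching the levels used in the chapter
sim_data$treatment <- factor(sim_data$treatment_group, levels = arms)
```

Assigning `clinical_data <- sim_data` would let the plotting code above run without the file, though the resulting figure reflects the invented parameters rather than any real trial.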

7.3 Specialized Visualizations for Clinical Data

7.3.1 Patient Flow Diagrams (CONSORT)

CONSORT diagrams are essential for reporting clinical trial results:

Code
library(ggplot2)
library(ggdag)

# Creating a simplified CONSORT diagram with ggdag
# This is a conceptual example - in practice, this would use actual trial data

consort_data <- dagify(
  randomized ~ screened,
  allocated_control ~ randomized,
  allocated_treatment ~ randomized,
  followed_control ~ allocated_control,
  followed_treatment ~ allocated_treatment,
  analyzed_control ~ followed_control,
  analyzed_treatment ~ followed_treatment,
  coords = list(
    x = c(screened = 0, randomized = 0, 
          allocated_control = -1, allocated_treatment = 1,
          followed_control = -1, followed_treatment = 1,
          analyzed_control = -1, analyzed_treatment = 1),
    y = c(screened = 0, randomized = -1, 
          allocated_control = -2, allocated_treatment = -2,
          followed_control = -3, followed_treatment = -3,
          analyzed_control = -4, analyzed_treatment = -4)
  ),
  labels = c(
    screened = "Assessed for eligibility\n(n=350)",
    randomized = "Randomized\n(n=300)",
    allocated_control = "Allocated to control\n(n=150)",
    allocated_treatment = "Allocated to treatment\n(n=150)",
    followed_control = "Completed follow-up\n(n=140)",
    followed_treatment = "Completed follow-up\n(n=145)",
    analyzed_control = "Analyzed\n(n=138)",
    analyzed_treatment = "Analyzed\n(n=142)"
  )
)

# Plot the CONSORT diagram
ggdag(consort_data, text = FALSE, use_labels = "label") +
  theme_dag() +
  theme(
    panel.background = element_rect(fill = "white", color = NA),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(title = "CONSORT Flow Diagram", caption = "Trial ID: XYZ-123")
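
Because the counts in a flow diagram are typed in as labels, they can drift out of sync with each other. A small check such as the one below, which reuses the numbers from the labels above, can catch that before plotting. The function name `check_consort_counts` and the list layout are assumptions made for this sketch.

```r
# Counts copied from the diagram labels above
counts <- list(
  screened   = 350,
  randomized = 300,
  allocated  = c(control = 150, treatment = 150),
  followed   = c(control = 140, treatment = 145),
  analyzed   = c(control = 138, treatment = 142)
)

# Return a character vector describing any inconsistencies (empty if none)
check_consort_counts <- function(x) {
  problems <- character(0)
  if (x$randomized > x$screened) {
    problems <- c(problems, "randomized exceeds screened")
  }
  if (sum(x$allocated) != x$randomized) {
    problems <- c(problems, "allocated arms do not sum to randomized")
  }
  if (any(x$followed > x$allocated)) {
    problems <- c(problems, "followed exceeds allocated in at least one arm")
  }
  if (any(x$analyzed > x$allocated)) {
    problems <- c(problems, "analyzed exceeds allocated in at least one arm")
  }
  problems
}

check_consort_counts(counts)

# A deliberately inconsistent variant for comparison
bad_counts <- counts
bad_counts$randomized <- 310
check_consort_counts(bad_counts)
```

The analyzed count is compared against the allocated count rather than the follow-up count, since an intention-to-treat analysis may include participants who did not complete follow-up.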

7.3.2 Kaplan-Meier Survival Curves

Survival analysis is fundamental to many clinical trials:

Code
library(survival)
library(survminer)
library(tidyverse)
library(here)

# Load sample survival data
survival_data <- read_csv(here("data", "processed", "survival_data.csv")) %>%
  mutate(treatment = factor(treatment_group, 
                           levels = c("Placebo", "Active")))

# Fit survival model
surv_fit <- survfit(Surv(time, event) ~ treatment, data = survival_data)

# Create publication-quality survival curve
ggsurvplot(
  surv_fit,
  data = survival_data,
  risk.table = TRUE,              # Add risk table
  pval = TRUE,                    # Add p-value
  conf.int = TRUE,                # Add confidence intervals
  palette = c("#E7B800", "#2E9FDF"),
  legend.labs = c("Placebo", "Active Treatment"),
  risk.table.height = 0.25,       # Adjust risk table height
  tables.theme = theme_cleantable(),
  xlab = "Time (Months)",
  ylab = "Overall Survival Probability",
  title = "Kaplan-Meier Estimate of Overall Survival",
  ggtheme = theme_bw() +
    theme(
      plot.title = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
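
For readers who want to see what the plotted curve estimates, the Kaplan-Meier (product-limit) estimator can be written in a few lines of base R: at each event time, the running survival probability is multiplied by one minus the fraction of at-risk patients who had the event. The tiny data set is invented, and `km_estimate` is a name chosen for this sketch, not a function from the survival package.

```r
# Product-limit estimate: S(t) = prod over event times t_i <= t of (1 - d_i / n_i)
km_estimate <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))
  # n_i: patients still at risk just before each event time
  n_risk <- vapply(event_times, function(t) sum(time >= t), integer(1))
  # d_i: events observed at each event time
  n_event <- vapply(event_times, function(t) sum(time == t & event == 1), integer(1))
  data.frame(
    time    = event_times,
    n_risk  = n_risk,
    n_event = n_event,
    surv    = cumprod(1 - n_event / n_risk)
  )
}

# Invented data: five patients, event = 1 (event observed) or 0 (censored)
toy <- data.frame(
  time  = c(2, 3, 3, 5, 8),
  event = c(1, 1, 0, 1, 0)
)

km <- km_estimate(toy$time, toy$event)
km
```

Censored patients contribute to the at-risk count up to their censoring time but never trigger a step in the curve, which is why the risk table beneath the plot is a useful companion to the curve itself.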

7.3.3 Forest Plots for Subgroup Analysis

Forest plots provide a clear visualization of treatment effects across subgroups:

Code
library(tidyverse)
library(forestplot)
library(here)

# Load subgroup analysis data
subgroup_data <- read_csv(here("data", "processed", "subgroup_analysis.csv"))

# Prepare data for forest plot
forest_data <- subgroup_data %>%
  mutate(
    subgroup = factor(subgroup, 
                     levels = c("Overall", "Age < 65", "Age ≥ 65", 
                               "Male", "Female", 
                               "Prior Therapy: Yes", "Prior Therapy: No")),
    label = paste0(subgroup, " (n = ", n, ")"),
    effect = hazard_ratio,
    lower = ci_lower,
    upper = ci_upper
  ) %>%
  arrange(subgroup) %>%
  select(label, effect, lower, upper, p_value)

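# Build the text columns, with a header row above the subgroup rows
label_text <- cbind(
  c("Subgroup", forest_data$label),
  c("Hazard Ratio (95% CI)",
    sprintf("%.2f (%.2f, %.2f)",
            forest_data$effect, forest_data$lower, forest_data$upper)),
  c("p-value", format.pval(forest_data$p_value, digits = 2, eps = 0.001))
)

# Create the forest plot; the header row has no estimate, hence the leading NA
forestplot(
  labeltext = label_text,
  mean = c(NA, forest_data$effect),
  lower = c(NA, forest_data$lower),
  upper = c(NA, forest_data$upper),
  # Header row and "Overall" row are drawn in summary style
  is.summary = c(TRUE, TRUE, rep(FALSE, nrow(forest_data) - 1)),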
  zero = 1,
  boxsize = 0.2,
  lineheight = unit(0.8, "cm"),
  clip = c(0.3, 3),
  xlog = TRUE,
  col = fpColors(
    box = "royalblue",
    line = "darkblue",
    summary = "royalblue"
  ),
  title = "Treatment Effect by Subgroup",
  xlab = "Hazard Ratio (95% CI)\n<- Favors Treatment | Favors Control ->",
  txt_gp = fpTxtGp(
    ticks = gpar(cex = 0.9),
    xlab = gpar(cex = 0.9),
    title = gpar(cex = 1.1)
  )
)
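
The `xlog = TRUE` setting reflects how hazard ratio intervals are usually constructed: symmetric on the log scale, then exponentiated. The sketch below shows that conversion from a log hazard ratio and its standard error, as might come out of a Cox model. The numeric inputs are invented, and `hr_with_ci` is a hypothetical helper; its output column names mirror those assumed in the subgroup file above.

```r
# Convert log hazard ratios and standard errors to HRs with Wald intervals
hr_with_ci <- function(log_hr, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  data.frame(
    hazard_ratio = exp(log_hr),
    ci_lower     = exp(log_hr - z * se),
    ci_upper     = exp(log_hr + z * se)
  )
}

# Invented inputs for two subgroups
est <- hr_with_ci(log_hr = log(c(0.75, 0.90)), se = c(0.10, 0.20))
est

# The interval is symmetric around the estimate on the log scale
log(est$ci_upper) - log(est$hazard_ratio)
log(est$hazard_ratio) - log(est$ci_lower)
```

On a linear axis the same intervals would look lopsided, which is one reason a log axis tends to be the default for ratio measures.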

7.4 Customizing Visualizations for Regulatory Submissions

7.4.1 Implementing Visual Style Guides

Maintaining consistency across visualizations is crucial for regulatory submissions:

Code
# Create a custom theme for clinical visualizations
theme_clinical <- function(base_size = 12, base_family = "sans") {
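  # Use + rather than %+replace% so that partially specified elements
  # (such as text) keep the properties they inherit from theme_minimal()
  theme_minimal(base_size = base_size, base_family = base_family) +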
    theme(
      # Typography
      text = element_text(color = "black"),
      plot.title = element_text(face = "bold", size = rel(1.2), hjust = 0),
      plot.subtitle = element_text(size = rel(0.9), hjust = 0, margin = margin(b = 10)),
      axis.title = element_text(face = "bold", size = rel(0.9)),
      
      # Gridlines and borders
      panel.grid.major = element_line(color = "grey85"),
      panel.grid.minor = element_blank(),
      axis.line = element_line(color = "black", linewidth = 0.5),
      
      # Legend
      legend.position = "bottom",
      legend.title = element_text(face = "bold"),
      
      # Background
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      
      # Margins
      plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
    )
}

# Define a consistent color palette for treatment groups
clinical_colors <- c(
  "Placebo" = "#999999",
  "Low Dose" = "#E69F00",
  "Medium Dose" = "#56B4E9",
  "High Dose" = "#009E73"
)

# Example plot with custom theme
ggplot(clinical_data, aes(x = visit_week, y = efficacy_score, color = treatment)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "loess", se = TRUE) +
  scale_color_manual(values = clinical_colors) +
  labs(
    title = "Efficacy Score Over Time by Treatment Arm",
    x = "Study Week",
    y = "Efficacy Score",
    color = "Treatment Group"
  ) +
  theme_clinical()
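
A named palette only helps if every level in the data has an entry. The survival example earlier in this chapter, for instance, uses an "Active" arm that does not appear in `clinical_colors`. The following sketch lists any levels without a palette entry. `missing_palette_levels` is a hypothetical helper name, and the palette is repeated here so the block runs on its own.

```r
# Palette as defined above, repeated so this block is self-contained
clinical_colors <- c(
  "Placebo"     = "#999999",
  "Low Dose"    = "#E69F00",
  "Medium Dose" = "#56B4E9",
  "High Dose"   = "#009E73"
)

# Return the levels of x that have no named entry in the palette
missing_palette_levels <- function(x, palette) {
  lv <- if (is.factor(x)) levels(x) else unique(as.character(x))
  setdiff(lv, names(palette))
}

# All levels covered: returns an empty character vector
missing_palette_levels(
  factor(c("Placebo", "High Dose"), levels = c("Placebo", "Low Dose", "High Dose")),
  clinical_colors
)

# "Active" has no entry in the palette
missing_palette_levels(c("Placebo", "Active"), clinical_colors)
```

Running such a check before `scale_color_manual()` gives an explicit message instead of a silently miscolored group.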

7.4.2 Adding Key Statistical Information

Statistical annotations, such as the results of pairwise and overall tests, can be layered directly onto a plot:

Code
library(ggpubr)

# Create boxplot with statistical comparisons
# The title refers to Week 12, so restrict the data to that visit
# (assumes the Week 12 visit is coded as visit_week == 12)
week12_data <- clinical_data %>% filter(visit_week == 12)

ggplot(week12_data, aes(x = treatment, y = efficacy_score, fill = treatment)) +
  geom_boxplot(width = 0.7, outlier.shape = 1) +
  # Add individual points for transparency
  geom_jitter(width = 0.2, alpha = 0.5) +
  # Add pairwise comparisons (Wilcoxon rank-sum tests by default)
  stat_compare_means(comparisons = list(c("Placebo", "Low Dose"),
                                       c("Placebo", "High Dose"),
                                       c("Low Dose", "High Dose")),
                    label = "p.signif") +
  # Add the overall test across all arms (Kruskal-Wallis by default)
  stat_compare_means(label.y = max(week12_data$efficacy_score, na.rm = TRUE) + 5) +
  # Apply custom theme and colors
  scale_fill_manual(values = clinical_colors) +
  theme_clinical() +
  labs(
    title = "Primary Endpoint: Efficacy Score at Week 12",
    subtitle = "Comparisons show statistical significance (* p<0.05, ** p<0.01, *** p<0.001)",
    x = "Treatment Group",
    y = "Efficacy Score",
    caption = "Analysis based on ITT population (N=240)"
  )
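
The significance symbols in the subtitle can be reproduced by hand, which also makes it easy to see the effect of a multiplicity adjustment across the three pairwise comparisons. The sketch below follows the thresholds stated in the subtitle rather than ggpubr's internal rules. The p-values are invented, and `p_to_stars` is a hypothetical helper.

```r
# Map p-values to the symbols described in the subtitle
p_to_stars <- function(p) {
  as.character(cut(
    p,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("***", "**", "*", "ns"),
    right  = FALSE  # intervals are [lower, upper), so p = 0.05 is "ns"
  ))
}

# Invented raw p-values for the three pairwise comparisons
raw_p <- c(
  "Placebo vs Low Dose"   = 0.030,
  "Placebo vs High Dose"  = 0.0004,
  "Low Dose vs High Dose" = 0.200
)

# Holm adjustment for three comparisons
adj_p <- p.adjust(raw_p, method = "holm")

data.frame(
  comparison   = names(raw_p),
  raw_p        = unname(raw_p),
  raw_sig      = p_to_stars(raw_p),
  adjusted_p   = unname(adj_p),
  adjusted_sig = p_to_stars(adj_p)
)
```

Whether the plotted symbols should reflect raw or adjusted p-values is a decision for the statistical analysis plan; the figure caption should state which was used.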

7.5 Interactive Visualizations for Clinical Data Exploration

While static visualizations are typically required for regulatory submissions, interactive tools can enhance data exploration and communication among research teams:

Code
library(plotly)
library(DT)

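# Create an interactive scatterplot
# (assumes the Week 12 visit is coded as visit_week == 12)
efficacy_plot <- ggplot(filter(clinical_data, visit_week == 12),
                      aes(x = baseline_score, y = efficacy_score,
                          color = treatment)) +
  # Map text only in the point layer: a discrete text aesthetic set globally
  # would split geom_smooth() into one group per patient
  geom_point(aes(text = patient_id), size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +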
  scale_color_manual(values = clinical_colors) +
  labs(
    title = "Relationship Between Baseline and Week 12 Efficacy Scores",
    x = "Baseline Score",
    y = "Week 12 Efficacy Score",
    color = "Treatment Group"
  ) +
  theme_clinical()

# Convert to interactive plotly object
interactive_plot <- ggplotly(efficacy_plot, tooltip = "text") %>%
  layout(hoverlabel = list(bgcolor = "white"))

# Display the interactive plot
interactive_plot
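
The tooltip above shows only the patient identifier. A common refinement is to assemble a multi-line label before plotting and map that to `text`; plotly renders `<br>` as a line break in hover labels. The helper name `make_tooltip` and the example values are assumptions for this sketch.

```r
# Build a multi-line hover label from patient-level fields
make_tooltip <- function(patient_id, treatment, baseline, week12) {
  paste0(
    "Patient: ", patient_id,
    "<br>Arm: ", treatment,
    "<br>Baseline: ", formatC(baseline, format = "f", digits = 1),
    "<br>Week 12: ", formatC(week12, format = "f", digits = 1)
  )
}

# Invented example values
tip <- make_tooltip("P001", "High Dose", 9.5, 18.3)
tip
```

The function is vectorized, so it could be applied to whole columns to create a label variable that is then mapped with `aes(text = ...)` in the point layer.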

For more advanced interactive visualization options for clinical data, see Chapter 10 on Interactive Elements.

7.6 Visualization Best Practices for Specific Clinical Data Types

7.6.1 Laboratory Data Visualization

Code
# Create a function for lab data visualization
plot_lab_data <- function(data, lab_param, 
                          reference_low = NULL, reference_high = NULL,
                          log_scale = FALSE) {
  
  p <- ggplot(data, aes(x = visit_week, y = .data[[lab_param]], 
                       group = patient_id, color = treatment)) +
    # Add individual patient lines
    geom_line(alpha = 0.3) +
    # Add treatment group means with error bands
    # linewidth replaces size for line geoms as of ggplot2 3.4.0
    stat_summary(aes(group = treatment), fun = mean, geom = "line", 
                linewidth = 1.5) +
    stat_summary(aes(group = treatment, fill = treatment), 
                fun.data = mean_se, geom = "ribbon", 
                alpha = 0.2, color = NA) +
    # Apply colors and labels
    scale_color_manual(values = clinical_colors) +
    scale_fill_manual(values = clinical_colors) +
    labs(
      title = paste("Change in", lab_param, "Over Time"),
      x = "Study Week",
      y = lab_param,
      color = "Treatment Group",
      fill = "Treatment Group"
    ) +
    theme_clinical()
  
  # Add reference ranges if provided
  if (!is.null(reference_low)) {
    p <- p + geom_hline(yintercept = reference_low, 
                       linetype = "dashed", color = "darkred")
  }
  
  if (!is.null(reference_high)) {
    p <- p + geom_hline(yintercept = reference_high, 
                       linetype = "dashed", color = "darkred")
  }
  
  # Apply log scale if requested
  if (log_scale) {
    p <- p + scale_y_log10()
  }
  
  return(p)
}

# Example usage
plot_lab_data(clinical_data, "alkaline_phosphatase", 
             reference_low = 35, reference_high = 105)
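
Alongside the plot, it is often useful to tabulate how many values fall outside the reference range. The sketch below classifies values against the same limits used in the example call (35 and 105). The function name `flag_reference_range` and the sample values are assumptions for illustration.

```r
# Classify lab values relative to a reference range (limits are inclusive)
flag_reference_range <- function(value, low, high) {
  flag <- ifelse(value < low, "Low",
                 ifelse(value > high, "High", "Normal"))
  factor(flag, levels = c("Low", "Normal", "High"))
}

# Invented alkaline phosphatase values, including a missing result
alp <- c(30, 35, 80, 105, 140, NA)
alp_flag <- flag_reference_range(alp, low = 35, high = 105)

# Missing results stay missing rather than being counted as normal
table(alp_flag, useNA = "ifany")
```

Values exactly on a limit are treated as normal here; a laboratory's own convention should take precedence if it differs.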

7.6.2 Adverse Event Visualization

Code
library(tidyverse)
library(here)

# Load adverse event data
ae_data <- read_csv(here("data", "processed", "adverse_events.csv"))

# Prepare data for visualization
ae_summary <- ae_data %>%
  group_by(treatment, ae_term) %>%
  summarize(count = n(), .groups = "drop") %>%
  group_by(treatment) %>%
  mutate(percent = count / sum(count) * 100) %>%
  ungroup() %>%
  # Select top 10 most common AEs
  group_by(ae_term) %>%
  mutate(total = sum(count)) %>%
  ungroup() %>%
  arrange(desc(total)) %>%
  filter(ae_term %in% unique(ae_term)[1:10])

# Create adverse event dot plot
ggplot(ae_summary, aes(x = percent, y = reorder(ae_term, total), 
                      color = treatment, size = count)) +
  geom_point() +
  scale_color_manual(values = clinical_colors) +
  scale_size_continuous(range = c(2, 8)) +
  labs(
    title = "Most Common Adverse Events by Treatment Group",
    subtitle = "Point size represents the number of events",
    x = "Share of Reported Events Within Treatment Group (%)",
    y = NULL,
    color = "Treatment Group",
    size = "Event Count"
  ) +
  theme_clinical() +
  theme(
    panel.grid.major.y = element_line(color = "grey90"),
    panel.grid.minor = element_blank()
  )
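
The summary above divides each term's event count by the total number of events in that arm. A patient-level incidence instead counts each patient at most once per term and divides by the number of patients in the arm, including those with no events. The sketch below uses invented data; the arm sizes in `arm_n` are assumptions and would normally come from the safety population.

```r
# Invented event-level data: one row per reported event
ae_events <- data.frame(
  patient_id = c("P01", "P01", "P02", "P03", "P04", "P04"),
  treatment  = c("Placebo", "Placebo", "Placebo", "Active", "Active", "Active"),
  ae_term    = c("Headache", "Headache", "Nausea", "Headache", "Headache", "Nausea"),
  stringsAsFactors = FALSE
)

# Assumed number of patients per arm (denominator for incidence)
arm_n <- c(Placebo = 10, Active = 10)

# Keep one row per patient and term so repeat events are not double counted
unique_pt <- unique(ae_events[, c("patient_id", "treatment", "ae_term")])

# Count distinct patients per arm and term
incidence <- aggregate(patient_id ~ treatment + ae_term,
                       data = unique_pt, FUN = length)
names(incidence)[names(incidence) == "patient_id"] <- "n_patients"

# Percentage of patients in each arm with at least one event of that term
incidence$percent <- 100 * incidence$n_patients /
  unname(arm_n[as.character(incidence$treatment)])

incidence
```

With denominators of this kind, an x-axis labeled as a percentage of patients would be accurate.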

7.7 Integrating Visualizations into Reproducible Workflows

Building on Chapter 6, let’s explore how to integrate visualization into reproducible research workflows:

Code
library(tidyverse)
library(targets)
library(here)

# Write a _targets.R pipeline that includes visualization targets
tar_script({
  library(targets)
  library(here)

  # Packages loaded for every target
  tar_option_set(packages = c("tidyverse", "here"))

  # Load project functions such as clean_clinical_data() and the plot helpers
  source(here("R", "visualization_functions.R"))

  # A targets script must end with a list of target objects
  list(
    # Data processing (simplified example)
    tar_target(raw_data, read_csv(here("data", "raw", "clinical_data.csv"))),
    tar_target(processed_data, clean_clinical_data(raw_data)),

    # Primary analysis model
    tar_target(efficacy_model, analyze_efficacy(processed_data)),

    # Visualization targets
    tar_target(
      efficacy_plot,
      create_efficacy_plot(processed_data, efficacy_model)
    ),
    tar_target(ae_plot, create_ae_plot(processed_data)),
    tar_target(km_plot, create_survival_plot(processed_data)),

    # Save visualizations to standard locations; ggsave() returns the file
    # path, and format = "file" tells targets to track the saved file
    tar_target(
      save_efficacy_plot,
      ggsave(here("reports", "figures", "efficacy_plot.png"), 
            plot = efficacy_plot, width = 8, height = 6, dpi = 300),
      format = "file"
    ),
    tar_target(
      save_ae_plot,
      ggsave(here("reports", "figures", "ae_plot.png"), 
            plot = ae_plot, width = 10, height = 7, dpi = 300),
      format = "file"
    ),
    tar_target(
      save_km_plot,
      ggsave(here("reports", "figures", "km_plot.png"), 
            plot = km_plot, width = 8, height = 6, dpi = 300),
      format = "file"
    )
  )
})

7.8 Conclusion

Effective data visualization is a critical skill in clinical research, bridging the gap between complex statistical analyses and clear, actionable insights. By applying the principles and techniques outlined in this chapter, researchers can create visualizations that not only meet regulatory requirements but also enhance understanding and decision-making.

In the next chapter, we’ll explore detailed case studies that bring together all the elements we’ve covered so far, from data preparation to visualization, in real-world clinical research scenarios.

7.9 References