7 Data Visualization for Clinical Research

7.1 Introduction to Effective Data Visualization in Clinical Settings

Data visualization is a critical component of clinical research, serving as the bridge between complex statistical analyses and clear, actionable insights. In this chapter, we explore how to create effective visualizations tailored specifically for clinical data using R.

7.1.1 The Importance of Visualization in Clinical Research

Effective data visualization in clinical settings provides several key benefits:

Pattern recognition: Detecting trends, outliers, and relationships that may not be apparent in tables
Communication: Facilitating understanding across multidisciplinary teams
Decision-making: Supporting evidence-based clinical and regulatory decisions
Quality control: Identifying data issues or inconsistencies visually
Stakeholder engagement: Making results accessible to patients, clinicians, and non-statistical audiences

7.1.2 Visualization Principles for Clinical Data

Code

library(knitr)
library(tidyverse)

# Create a table of visualization principles
viz_principles <- tribble(
  ~Principle, ~Description, ~Clinical_Relevance,
  "Accuracy", "Represent data faithfully without distortion", "Essential for regulatory compliance and scientific integrity",
  "Clarity", "Create visualizations that are easy to understand", "Ensures correct interpretation by clinicians and reviewers",
  "Efficiency", "Use minimal visual elements to convey the message", "Reduces cognitive load in complex clinical contexts",
  "Consistency", "Apply uniform visual styles across related graphics", "Facilitates comparison across trials or time points",
  "Accessibility", "Design for all viewers including those with visual impairments", "Ensures equitable access to clinical findings",
  "Context", "Include reference values and clinical thresholds", "Connects statistical results to clinical relevance"
)

# Display the table
kable(viz_principles)

7.2 The ggplot2 Framework for Clinical Visualization

7.2.1 Why ggplot2 for Clinical Research

The ggplot2 package has become the standard for statistical visualization in R, offering several advantages for clinical research:

Grammar-based approach: Provides a systematic way to build complex visualizations
Reproducibility: Integrates well with the reproducible workflows discussed in Chapter 6
Customization: Allows tailoring to specific clinical and regulatory requirements
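Consistency: Supports shared themes and scales, so the same visual standards can be applied across figures
Extensions: Extension packages such as survminer and ggpubr, both used later in this chapter, add plot types common in clinical reporting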
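
The grammar-based approach can be sketched as building a plot one layer at a time. This is a minimal illustration on simulated data; the data frame `demo` and its columns are assumptions made for the example, not part of any trial dataset.

```r
library(ggplot2)

# Simulated data for illustration only (not from any trial)
set.seed(42)
demo <- data.frame(
  week  = rep(c(0, 4, 8, 12), times = 30),
  arm   = rep(c("Placebo", "Active"), each = 60),
  score = rnorm(120, mean = 12, sd = 3)
)

# Start with data and aesthetic mappings, then add one component at a time
p <- ggplot(demo, aes(x = week, y = score, color = arm))
p <- p + geom_point(alpha = 0.5)                  # layer 1: raw observations
p <- p + stat_summary(fun = mean, geom = "line")  # layer 2: arm means per week
p <- p + labs(x = "Study Week", y = "Score", color = "Arm")

# Each geom or stat call adds one entry to the plot's layer list
length(p$layers)
```

Because every component is added explicitly, a figure can be reviewed and modified piece by piece, which suits the audit requirements of clinical work.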

7.2.2 Essential ggplot2 Elements for Clinical Visualization

Code

library(tidyverse)
library(survival)
library(here)

# Load sample clinical data (using reproducible path from Chapter 6)
clinical_data <- read_csv(here("data", "processed", "clinical_trial_data.csv")) %>%
  mutate(
    treatment = factor(treatment_group, 
                      levels = c("Placebo", "Low Dose", "High Dose")),
    response = factor(response_status, 
                     levels = c("Non-responder", "Partial", "Complete"))
  )

# Basic ggplot2 structure for clinical visualization
ggplot(clinical_data, aes(x = visit_week, y = efficacy_score, color = treatment)) +
  # Add geometric elements
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", se = TRUE) +
  # Add clinical context
  geom_hline(yintercept = 15, linetype = "dashed", color = "darkred") +
  annotate("text", x = 0, y = 16, label = "Clinical threshold", hjust = 0) +
  # Customize aesthetics
  scale_color_brewer(palette = "Set1") +
  # Add informative labels
  labs(
    title = "Efficacy Score Over Time by Treatment Arm",
    subtitle = "Dashed line indicates clinically meaningful improvement threshold",
    x = "Study Week",
    y = "Efficacy Score (0-30)",
    color = "Treatment Group",
    caption = "Data source: Clinical Trial XYZ-123"
  ) +
  # Apply clinical theme
  theme_minimal() +
  theme(
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    axis.title = element_text(face = "bold")
  )
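
The examples in this chapter read files from a project directory. For readers without those files, a simulated stand-in can be sketched as follows. The sample size, effect sizes, and noise levels are arbitrary assumptions; only the column names follow the code above.

```r
library(dplyr)
library(tidyr)

set.seed(2024)

# One row per patient; 30 patients per arm is an arbitrary choice
n_patients <- 90
patients <- tibble(
  patient_id = sprintf("P%03d", seq_len(n_patients)),
  treatment = factor(
    rep(c("Placebo", "Low Dose", "High Dose"), each = n_patients / 3),
    levels = c("Placebo", "Low Dose", "High Dose")
  ),
  baseline_score = round(rnorm(n_patients, mean = 10, sd = 2), 1)
)

# Hypothetical weekly improvement per arm (illustrative values only)
weekly_gain <- c("Placebo" = 0.1, "Low Dose" = 0.4, "High Dose" = 0.7)

# Expand to one row per patient per visit and simulate scores on a 0-30 scale
clinical_data <- patients %>%
  crossing(visit_week = c(0, 4, 8, 12)) %>%
  mutate(
    gain = unname(weekly_gain[as.character(treatment)]),
    efficacy_score = baseline_score + gain * visit_week + rnorm(n(), sd = 2),
    efficacy_score = pmin(pmax(efficacy_score, 0), 30)
  ) %>%
  select(-gain)
```

This object has the columns used by the efficacy plots in this chapter (`patient_id`, `treatment`, `visit_week`, `efficacy_score`, `baseline_score`); laboratory and adverse event columns would need to be simulated separately.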

7.3 Specialized Visualizations for Clinical Data

7.3.1 Patient Flow Diagrams (CONSORT)

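CONSORT diagrams document the flow of participants through each stage of a trial, from eligibility assessment to analysis, and are a standard element of randomized trial reports. The example that follows lays out a simplified diagram with ggdag, a package designed for causal diagrams rather than flow charts: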

Code

library(ggplot2)
library(ggdag)

# Creating a simplified CONSORT diagram with ggdag
# This is a conceptual example - in practice, this would use actual trial data

consort_data <- dagify(
  randomized ~ screened,
  allocated_control ~ randomized,
  allocated_treatment ~ randomized,
  followed_control ~ allocated_control,
  followed_treatment ~ allocated_treatment,
  analyzed_control ~ followed_control,
  analyzed_treatment ~ followed_treatment,
  coords = list(
    x = c(screened = 0, randomized = 0, 
          allocated_control = -1, allocated_treatment = 1,
          followed_control = -1, followed_treatment = 1,
          analyzed_control = -1, analyzed_treatment = 1),
    y = c(screened = 0, randomized = -1, 
          allocated_control = -2, allocated_treatment = -2,
          followed_control = -3, followed_treatment = -3,
          analyzed_control = -4, analyzed_treatment = -4)
  ),
  labels = c(
    screened = "Assessed for eligibility\n(n=350)",
    randomized = "Randomized\n(n=300)",
    allocated_control = "Allocated to control\n(n=150)",
    allocated_treatment = "Allocated to treatment\n(n=150)",
    followed_control = "Completed follow-up\n(n=140)",
    followed_treatment = "Completed follow-up\n(n=145)",
    analyzed_control = "Analyzed\n(n=138)",
    analyzed_treatment = "Analyzed\n(n=142)"
  )
)

# Plot the CONSORT diagram
ggdag(consort_data, text = FALSE, use_labels = "label") +
  theme_dag() +
  theme(
    panel.background = element_rect(fill = "white", color = NA),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(title = "CONSORT Flow Diagram", caption = "Trial ID: XYZ-123")
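
The counts in the diagram above are typed in by hand. In practice they would normally be derived from the trial's disposition data, so that the labels cannot drift from the analysis. A minimal sketch, assuming a hypothetical `disposition` table with one row per screened participant (the column names are invented for this example):

```r
# Hypothetical disposition table: 20 screened, 16 randomized, 15 completed
disposition <- data.frame(
  id         = seq_len(20),
  randomized = rep(c(TRUE, FALSE), times = c(16, 4)),
  arm        = c(rep(c("control", "treatment"), each = 8), rep(NA, 4)),
  completed  = c(rep(TRUE, 7), FALSE, rep(TRUE, 8), rep(NA, 4))
)

# Format one box label in the style used in the diagram
consort_label <- function(text, n) sprintf("%s\n(n=%d)", text, n)

# Derive each count from the data
n_screened   <- nrow(disposition)
n_randomized <- sum(disposition$randomized)
n_by_arm     <- table(disposition$arm[disposition$randomized])
n_completed  <- tapply(disposition$completed, disposition$arm, sum)

consort_labels <- c(
  screened            = consort_label("Assessed for eligibility", n_screened),
  randomized          = consort_label("Randomized", n_randomized),
  allocated_control   = consort_label("Allocated to control", n_by_arm[["control"]]),
  allocated_treatment = consort_label("Allocated to treatment", n_by_arm[["treatment"]]),
  followed_control    = consort_label("Completed follow-up", n_completed[["control"]]),
  followed_treatment  = consort_label("Completed follow-up", n_completed[["treatment"]])
)
```

A named vector built this way can be passed to the `labels` argument of `dagify()` in place of the hard-coded strings.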

7.3.2 Kaplan-Meier Survival Curves

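Kaplan-Meier curves display the estimated survival probability over time for each treatment arm. The example adds a number-at-risk table, confidence bands, and a p-value for the comparison between arms: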

Code

library(survival)
library(survminer)
library(tidyverse)
library(here)

# Load sample survival data
survival_data <- read_csv(here("data", "processed", "survival_data.csv")) %>%
  mutate(treatment = factor(treatment_group, 
                           levels = c("Placebo", "Active")))

# Fit survival model
surv_fit <- survfit(Surv(time, event) ~ treatment, data = survival_data)

# Create publication-quality survival curve
ggsurvplot(
  surv_fit,
  data = survival_data,
  risk.table = TRUE,              # Add risk table
  pval = TRUE,                    # Add p-value
  conf.int = TRUE,                # Add confidence intervals
  palette = c("#E7B800", "#2E9FDF"),
  legend.labs = c("Placebo", "Active Treatment"),
  risk.table.height = 0.25,       # Adjust risk table height
  tables.theme = theme_cleantable(),
  xlab = "Time (Months)",
  ylab = "Overall Survival Probability",
  title = "Kaplan-Meier Estimate of Overall Survival",
  ggtheme = theme_bw() +
    theme(
      plot.title = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
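
The numbers usually quoted alongside a Kaplan-Meier figure, median survival per group and the log-rank p-value, can be extracted directly from the survival package. The sketch below uses the package's built-in `lung` dataset as a stand-in for trial data, with `sex` playing the role of the treatment arm; that substitution is an assumption made so the example runs without external files.

```r
library(survival)

# Built-in lung data as a stand-in; 'sex' takes the place of treatment arm
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Median survival time per group with 95% confidence limits
medians <- summary(fit)$table[, c("median", "0.95LCL", "0.95UCL")]

# Log-rank test comparing the two groups
logrank <- survdiff(Surv(time, status) ~ sex, data = lung)
p_value <- 1 - pchisq(logrank$chisq, df = length(logrank$n) - 1)

medians
p_value
```

Reporting these values in the figure caption or accompanying text lets readers check the plot against the tabulated results.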

7.3.3 Forest Plots for Subgroup Analysis

Forest plots provide a clear visualization of treatment effects across subgroups:

Code

library(tidyverse)
library(forestplot)
library(here)

# Load subgroup analysis data
subgroup_data <- read_csv(here("data", "processed", "subgroup_analysis.csv"))

# Prepare data for forest plot
forest_data <- subgroup_data %>%
  mutate(
    subgroup = factor(subgroup, 
                     levels = c("Overall", "Age < 65", "Age ≥ 65", 
                               "Male", "Female", 
                               "Prior Therapy: Yes", "Prior Therapy: No")),
    label = paste0(subgroup, " (n = ", n, ")"),
    effect = hazard_ratio,
    lower = ci_lower,
    upper = ci_upper
  ) %>%
  arrange(subgroup) %>%
  select(label, effect, lower, upper, p_value)

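# Format the text columns shown beside the plot
hr_text <- sprintf("%.2f (%.2f, %.2f)", forest_data$effect,
                   forest_data$lower, forest_data$upper)
p_text <- format.pval(forest_data$p_value, digits = 2, eps = 0.001)

# Build the label matrix; the first row is the column header
label_matrix <- cbind(
  c("Subgroup", forest_data$label),
  c("Hazard Ratio (95% CI)", hr_text),
  c("p-value", p_text)
)

# Create the forest plot (NA leaves the header row without a box or line)
forestplot(
  labeltext = label_matrix,
  mean = c(NA, forest_data$effect),
  lower = c(NA, forest_data$lower),
  upper = c(NA, forest_data$upper),
  is.summary = c(TRUE, TRUE, rep(FALSE, nrow(forest_data) - 1)),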
  zero = 1,
  boxsize = 0.2,
  lineheight = unit(0.8, "cm"),
  clip = c(0.3, 3),
  xlog = TRUE,
  col = fpColors(
    box = "royalblue",
    line = "darkblue",
    summary = "royalblue"
  ),
  title = "Treatment Effect by Subgroup",
  xlab = "Hazard Ratio (95% CI)\n<- Favors Treatment | Favors Control ->",
  txt_gp = fpTxtGp(
    ticks = gpar(cex = 0.9),
    xlab = gpar(cex = 0.9),
    title = gpar(cex = 1.1)
  )
)
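
The forest plot above reads precomputed estimates from a file. One way such a table might be produced is to fit a Cox model within each subgroup, sketched here with the survival package's built-in `lung` data. Using `sex` as the "treatment" variable and an age split as the subgroup are assumptions for illustration; the output column names mirror those expected by the plotting code.

```r
library(survival)

# Stand-in data: 'sex' plays the role of treatment, age defines the subgroups
dat <- transform(lung, age_group = ifelse(age < 65, "Age < 65", "Age >= 65"))

# Fit a Cox model on one subset and return a single-row summary
fit_hr <- function(d, label) {
  fit <- coxph(Surv(time, status) ~ sex, data = d)
  ci <- exp(confint(fit))
  data.frame(
    subgroup     = label,
    n            = fit$n,
    hazard_ratio = unname(exp(coef(fit))),
    ci_lower     = ci[1, 1],
    ci_upper     = ci[1, 2],
    p_value      = summary(fit)$coefficients[1, "Pr(>|z|)"]
  )
}

# Overall estimate first, then one row per subgroup
subgroup_data <- rbind(
  fit_hr(dat, "Overall"),
  do.call(rbind, lapply(split(dat, dat$age_group),
                        function(d) fit_hr(d, d$age_group[1])))
)
rownames(subgroup_data) <- NULL
subgroup_data
```

Subgroup estimates of this kind are typically treated as exploratory, since each one rests on a smaller sample than the overall analysis.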

7.4 Customizing Visualizations for Regulatory Submissions

7.4.1 Implementing Visual Style Guides

Maintaining consistency across visualizations is crucial for regulatory submissions:

Code

# Create a custom theme for clinical visualizations
theme_clinical <- function(base_size = 12, base_family = "sans") {
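  # Use + rather than %+replace%: %+replace% discards every unspecified
  # property of an element, which leaves the root `text` element incomplete
  theme_minimal(base_size = base_size, base_family = base_family) +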
    theme(
      # Typography
      text = element_text(color = "black"),
      plot.title = element_text(face = "bold", size = rel(1.2), hjust = 0),
      plot.subtitle = element_text(size = rel(0.9), hjust = 0, margin = margin(b = 10)),
      axis.title = element_text(face = "bold", size = rel(0.9)),
      
      # Gridlines and borders
      panel.grid.major = element_line(color = "grey85"),
      panel.grid.minor = element_blank(),
      axis.line = element_line(color = "black", linewidth = 0.5),  # `size` is deprecated for lines since ggplot2 3.4.0
      
      # Legend
      legend.position = "bottom",
      legend.title = element_text(face = "bold"),
      
      # Background
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      
      # Margins
      plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
    )
}

# Define a consistent color palette for treatment groups
clinical_colors <- c(
  "Placebo" = "#999999",
  "Low Dose" = "#E69F00",
  "Medium Dose" = "#56B4E9",
  "High Dose" = "#009E73"
)

# Example plot with custom theme
ggplot(clinical_data, aes(x = visit_week, y = efficacy_score, color = treatment)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "loess", se = TRUE) +
  scale_color_manual(values = clinical_colors) +
  labs(
    title = "Efficacy Score Over Time by Treatment Arm",
    x = "Study Week",
    y = "Efficacy Score",
    color = "Treatment Group"
  ) +
  theme_clinical()
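
The accessibility principle from Section 7.1.2 can be checked for a palette like `clinical_colors`. The hex codes used above all appear in the Okabe-Ito palette, which was designed with color vision deficiency in mind and ships with base R (version 4.0.0 or later) through `palette.colors()`. A short sketch of that check:

```r
# Treatment palette as defined in this section
clinical_colors <- c(
  "Placebo"     = "#999999",
  "Low Dose"    = "#E69F00",
  "Medium Dose" = "#56B4E9",
  "High Dose"   = "#009E73"
)

# Okabe-Ito palette shipped with base R (grDevices)
okabe_ito <- palette.colors(palette = "Okabe-Ito")

# TRUE for each treatment color that belongs to the Okabe-Ito palette
in_palette <- toupper(clinical_colors) %in% toupper(okabe_ito)
names(in_palette) <- names(clinical_colors)
in_palette
```

Membership in a well-tested palette is only a starting point; figures that rely on color alone should also be reviewed in grayscale or with a color vision simulator.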

7.4.2 Adding Key Statistical Information

Statistical annotations, such as p-values for group comparisons, can be added directly to a plot:

Code

library(ggpubr)

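# Restrict to the Week 12 visit so the data match the endpoint named in the title
week12_data <- clinical_data %>%
  filter(visit_week == 12)

# Create boxplot with statistical comparisons
ggplot(week12_data, aes(x = treatment, y = efficacy_score, fill = treatment)) +
  # Hide boxplot outliers: the jittered points already show every observation
  geom_boxplot(width = 0.7, outlier.shape = NA) +
  # Add individual points for transparency
  geom_jitter(width = 0.2, alpha = 0.5) +
  # Add pairwise comparisons (Wilcoxon rank-sum tests by default)
  stat_compare_means(comparisons = list(c("Placebo", "Low Dose"),
                                       c("Placebo", "High Dose"),
                                       c("Low Dose", "High Dose")),
                    label = "p.signif") +
  # Add the global test across all arms (Kruskal-Wallis by default)
  stat_compare_means(label.y = max(week12_data$efficacy_score, na.rm = TRUE) + 5) +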
  # Apply custom theme and colors
  scale_fill_manual(values = clinical_colors) +
  theme_clinical() +
  labs(
    title = "Primary Endpoint: Efficacy Score at Week 12",
    subtitle = "Comparisons show statistical significance (* p<0.05, ** p<0.01, *** p<0.001)",
    x = "Treatment Group",
    y = "Efficacy Score",
    caption = "Analysis based on ITT population (N=240)"
  )
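
When several pairwise comparisons are displayed, it is worth checking whether the p-values have been adjusted for multiplicity. The sketch below computes Holm-adjusted pairwise Wilcoxon tests with base R on simulated scores; the data and effect sizes are invented for illustration, and the choice of the Holm method is an assumption rather than a recommendation from this chapter.

```r
# Simulated Week 12 scores for three arms (illustrative values only)
set.seed(7)
scores <- data.frame(
  treatment = factor(
    rep(c("Placebo", "Low Dose", "High Dose"), each = 40),
    levels = c("Placebo", "Low Dose", "High Dose")
  ),
  efficacy_score = c(rnorm(40, 12, 3), rnorm(40, 14, 3), rnorm(40, 17, 3))
)

# Pairwise Wilcoxon rank-sum tests with Holm adjustment
pairwise <- pairwise.wilcox.test(
  scores$efficacy_score,
  scores$treatment,
  p.adjust.method = "holm"
)

# Lower-triangular matrix of adjusted p-values
pairwise$p.value
```

The adjusted values can then be reported in the caption or a companion table, with the adjustment method stated explicitly.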

7.5 Interactive Visualizations for Clinical Data Exploration

While static visualizations are typically required for regulatory submissions, interactive tools can enhance data exploration and communication among research teams:

Code

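library(plotly)

# Use Week 12 records so the y-axis matches its label
week12_data <- clinical_data %>%
  filter(visit_week == 12)

# Create an interactive scatterplot
# The tooltip text is mapped on the point layer only: a plot-level `text`
# mapping can change how geom_smooth() groups the data
efficacy_plot <- ggplot(week12_data, 
                      aes(x = baseline_score, y = efficacy_score, 
                          color = treatment)) +
  geom_point(aes(text = paste("Patient:", patient_id)), size = 3, alpha = 0.7) +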
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_manual(values = clinical_colors) +
  labs(
    title = "Relationship Between Baseline and Week 12 Efficacy Scores",
    x = "Baseline Score",
    y = "Week 12 Efficacy Score",
    color = "Treatment Group"
  ) +
  theme_clinical()

# Convert to interactive plotly object
interactive_plot <- ggplotly(efficacy_plot, tooltip = "text") %>%
  layout(hoverlabel = list(bgcolor = "white"))

# Display the interactive plot
interactive_plot

For more advanced interactive visualization options for clinical data, see Chapter 10 on Interactive Elements.

7.6 Visualization Best Practices for Specific Clinical Data Types

7.6.1 Laboratory Data Visualization

Code

# Create a function for lab data visualization
plot_lab_data <- function(data, lab_param, 
                          reference_low = NULL, reference_high = NULL,
                          log_scale = FALSE) {
  
  p <- ggplot(data, aes(x = visit_week, y = .data[[lab_param]], 
                       group = patient_id, color = treatment)) +
    # Add individual patient lines
    geom_line(alpha = 0.3) +
    # Add treatment group means with error bands
    stat_summary(aes(group = treatment), fun = mean, geom = "line", 
                linewidth = 1.5) +
    stat_summary(aes(group = treatment, fill = treatment), 
                fun.data = mean_se, geom = "ribbon", 
                alpha = 0.2, color = NA) +
    # Apply colors and labels
    scale_color_manual(values = clinical_colors) +
    scale_fill_manual(values = clinical_colors) +
    labs(
      title = paste("Change in", lab_param, "Over Time"),
      x = "Study Week",
      y = lab_param,
      color = "Treatment Group",
      fill = "Treatment Group"
    ) +
    theme_clinical()
  
  # Add reference ranges if provided
  if (!is.null(reference_low)) {
    p <- p + geom_hline(yintercept = reference_low, 
                       linetype = "dashed", color = "darkred")
  }
  
  if (!is.null(reference_high)) {
    p <- p + geom_hline(yintercept = reference_high, 
                       linetype = "dashed", color = "darkred")
  }
  
  # Apply log scale if requested
  if (log_scale) {
    p <- p + scale_y_log10()
  }
  
  return(p)
}

# Example usage
plot_lab_data(clinical_data, "alkaline_phosphatase", 
             reference_low = 35, reference_high = 105)
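
Alongside the plot, values outside the reference range can be flagged in the data itself, for example to color points or to build a shift table. A minimal sketch, where `flag_lab_values()` is a hypothetical helper introduced for this example and the range limits are treated as part of the normal range:

```r
# Classify lab values against a reference range (limits count as normal)
flag_lab_values <- function(value, reference_low, reference_high) {
  flag <- ifelse(value < reference_low, "Low",
                 ifelse(value > reference_high, "High", "Normal"))
  factor(flag, levels = c("Low", "Normal", "High"))
}

# Example with the alkaline phosphatase range used above
alp <- c(20, 35, 70, 105, 130, NA)
flags <- flag_lab_values(alp, reference_low = 35, reference_high = 105)
table(flags, useNA = "ifany")
```

Missing values stay missing rather than being assigned a category, which keeps unrecorded results visible in later summaries.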

7.6.2 Adverse Event Visualization

Code

library(tidyverse)
library(here)

# Load adverse event data
ae_data <- read_csv(here("data", "processed", "adverse_events.csv"))

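# Denominators: patients per arm
# (assumes clinical_data covers the safety population and ae_data has patient_id)
arm_sizes <- clinical_data %>%
  distinct(patient_id, treatment) %>%
  count(treatment, name = "n_arm") %>%
  mutate(treatment = as.character(treatment))

# Prepare data for visualization
ae_summary <- ae_data %>%
  group_by(treatment, ae_term) %>%
  summarize(
    count = n(),                          # number of events
    n_patients = n_distinct(patient_id),  # patients with at least one event
    .groups = "drop"
  ) %>%
  left_join(arm_sizes, by = "treatment") %>%
  # Percentage of patients in the arm who experienced the event
  mutate(percent = n_patients / n_arm * 100) %>%
  # Select top 10 most common AEs
  group_by(ae_term) %>%
  mutate(total = sum(count)) %>%
  ungroup() %>%
  arrange(desc(total)) %>%
  filter(ae_term %in% head(unique(ae_term), 10))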

# Create adverse event dot plot
ggplot(ae_summary, aes(x = percent, y = reorder(ae_term, total), 
                      color = treatment, size = count)) +
  geom_point() +
  scale_color_manual(values = clinical_colors) +
  scale_size_continuous(range = c(2, 8)) +
  labs(
    title = "Incidence of Common Adverse Events by Treatment Group",
    subtitle = "Size represents the number of events",
    x = "Percentage of Patients (%)",
    y = NULL,
    color = "Treatment Group",
    size = "Event Count"
  ) +
  theme_clinical() +
  theme(
    panel.grid.major.y = element_line(color = "grey90"),
    panel.grid.minor = element_blank()
  )

7.7 Integrating Visualizations into Reproducible Workflows

Building on Chapter 6, let’s explore how to integrate visualization into reproducible research workflows:

Code

library(targets)

# Write a _targets.R pipeline that includes visualization
tar_script({
  # Packages needed by the targets
  library(tidyverse)
  library(here)

  # Load project functions (clean_clinical_data(), create_efficacy_plot(), etc.)
  source(here("R", "visualization_functions.R"))

  # A targets script must end with a list of target objects
  list(
    # Data processing (simplified example)
    tar_target(raw_data, read_csv(here("data", "raw", "clinical_data.csv"))),
    tar_target(processed_data, clean_clinical_data(raw_data)),

    # Primary analysis model
    tar_target(efficacy_model, analyze_efficacy(processed_data)),

    # Visualization targets
    tar_target(
      efficacy_plot,
      create_efficacy_plot(processed_data, efficacy_model)
    ),
    tar_target(ae_plot, create_ae_plot(processed_data)),
    tar_target(km_plot, create_survival_plot(processed_data)),

    # Save visualizations to standard locations
    # format = "file" makes targets track the saved file; ggsave() returns its path
    tar_target(
      save_efficacy_plot,
      ggsave(here("reports", "figures", "efficacy_plot.png"), 
            plot = efficacy_plot, width = 8, height = 6, dpi = 300),
      format = "file"
    ),
    tar_target(
      save_ae_plot,
      ggsave(here("reports", "figures", "ae_plot.png"), 
            plot = ae_plot, width = 10, height = 7, dpi = 300),
      format = "file"
    ),
    tar_target(
      save_km_plot,
      ggsave(here("reports", "figures", "km_plot.png"), 
            plot = km_plot, width = 8, height = 6, dpi = 300),
      format = "file"
    )
  )
})
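
Once a `_targets.R` file exists, the pipeline is run with `tar_make()` and results are retrieved with `tar_read()`. The sketch below demonstrates that cycle on a deliberately tiny pipeline written to a temporary directory, so it does not depend on the project files or helper functions referenced above. The target names and values are invented for the example.

```r
library(targets)

# Work in a throwaway directory so no project files are touched
pipeline_dir <- file.path(tempdir(), "targets-demo")
dir.create(pipeline_dir, showWarnings = FALSE)
old_wd <- setwd(pipeline_dir)

# A two-step pipeline: store some scores, then summarize them
tar_script({
  list(
    tar_target(scores, c(12, 15, 18, 21)),
    tar_target(mean_score, mean(scores))
  )
}, ask = FALSE)

tar_make()                       # builds scores, then mean_score
result <- tar_read(mean_score)   # retrieve a stored result
outdated <- tar_outdated()       # targets that would rerun on the next tar_make()

# Return to the original working directory
setwd(old_wd)
```

After a successful run, a second call to `tar_make()` skips every target whose code and upstream data are unchanged, which is what keeps figures in step with the analysis without rebuilding everything.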

7.8 Conclusion

Effective data visualization is a critical skill in clinical research, bridging the gap between complex statistical analyses and clear, actionable insights. By applying the principles and techniques outlined in this chapter, researchers can create visualizations that not only meet regulatory requirements but also enhance understanding and decision-making.

In the next chapter, we’ll explore detailed case studies that bring together all the elements we’ve covered so far—from data preparation to visualization—in real-world clinical research scenarios.
