Introducing the Software Engineering working group

Who we are and what we build together

Daniel Sabanes Bove, Ya Wang

2023-01-19

What is the SWE WG?

ASA BIOP Software Engineering working group (SWE WG)

  • An official working group of the ASA Biopharmaceutical Section
  • Formed in August 2022
  • Cross-industry collaboration with more than 30 members from over 20 organizations
  • Home page at rconsortium.github.io/asa-biop-swe-wg
  • Open for new members!

Motivation

  • The importance of reliable software for statistical analysis cannot be overstated
  • In the past, many statistical analyses were performed with proprietary software
  • Open-source software has gained increasing popularity in Biostatistics over the last two decades
    • Pros: rapid uptake of novel methods and great opportunities for collaboration and innovation
    • Cons: users face huge variability in software quality (reliability, efficiency and maintainability)

Motivation (cont’d)

  • Current repositories (GitHub, CRAN):
    • Do not require any statistical quality assurance
    • Include packages without good documentation and maintenance, which are harder to adopt
  • Developing high-quality software is critical to inform clinical and regulatory decisions:
    • Good coding practices
    • Reproducible outputs
    • Self-sufficient documentation

Goals

  • Primary Goal: Collaborate to engineer R packages that implement important statistical methods to fill in critical gaps

  • Secondary Goal: Develop and disseminate best practices for engineering high-quality open-source statistical software

SWE WG Activities

  • First R package mmrm was published on CRAN in October 2022 and updated in December
    • We aim to establish this package as a new standard for fitting mixed models for repeated measures (MMRM)
    • We develop and adopt software engineering best practices in the mmrm package, which is open source at github.com/openpharma/mmrm
    • Currently under active development to add more features

SWE WG Activities (cont’d)

  • Prepare public training materials to disseminate best practices for software engineering in the Biostatistics community
    • Face-to-face workshop at the beginning of February in Basel, Switzerland, with a focus on open-source software for clinical trials
    • Conference sessions with a focus on statistical software engineering at CEN, JSM and the ASA/FDA Workshop
    • YouTube series with a focus on best practices for software engineering

Best Practices for Software Engineering

  • User interface design
  • Version control with git
  • Unit and integration tests
  • Consistent and readable code style
  • Documentation
  • Continuous integration setup
  • Publication on website and repositories
  • etc.

mmrm package example

Motivation

  • Mixed Models for Repeated Measures (MMRM) is a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials
  • No fully satisfactory R package existed: we initially thought the problem was solved by using lme4 with lmerTest
  • But this approach failed on large data sets (slow, did not converge)
  • nlme does not give Satterthwaite-adjusted degrees of freedom, has convergence issues, and with emmeans the degrees of freedom are only approximate

Before creating a new package

  • First try to improve an existing package
    • Here we tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom
    • But it did not work
  • Think about long-term maintenance and responsibility

mmrm Package overview

  • Linear model for dependent observations within independent subjects
  • Multiple covariance structures available for the dependent observations
  • REML or ML estimation, using multiple optimizers if needed
  • emmeans interface for least squares means, tidymodels interface for easy model fitting
  • Satterthwaite and Kenward-Roger adjustments
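
As a minimal sketch of these features, the call below fits an MMRM with an unstructured covariance matrix. It assumes the mmrm package is installed and uses its bundled fev_data example data; the covariates in the formula are illustrative only, not a recommended analysis.

```r
library(mmrm)

# fev_data ships with mmrm: FEV1 measured at several visits per subject.
fit <- mmrm(
  # us(AVISIT | USUBJID): unstructured covariance over visits within subject.
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data,
  reml = TRUE,              # restricted maximum likelihood (FALSE gives ML)
  method = "Satterthwaite"  # degrees of freedom adjustment
)

# Coefficient table with adjusted degrees of freedom and covariance estimate.
summary(fit)
```

Setting method = "Kenward-Roger" instead requests the Kenward-Roger adjustment for the same model.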

Why it’s not just another package

  • Ongoing maintenance and support from the pharmaceutical industry
    • Five companies are involved in the development; on track to become a standard package
  • Development using best practices as a showcase for a high-quality package
    • Thorough unit and integration tests (also comparing with SAS results) to ensure accurate results

Covariance structures

Covariance structures               {mmrm}  PROC MIXED
Unstructured (Unweighted/Weighted)  X/X     X/X
Toeplitz (hetero/homo)              X/X     X/X
Compound symmetry (hetero/homo)     X/X     X/X
Auto-regressive (hetero/homo)       X/X     X/X
Ante-dependence (hetero/homo)       X/X     X
Spatial exponential                 X       X
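
To make the hetero/homo distinction concrete, here is a small base R sketch that builds first-order auto-regressive and compound symmetry covariance matrices by hand. The helpers ar1_cov() and cs_cov() are hypothetical names introduced for illustration and are not part of mmrm.

```r
# Hypothetical helpers (not part of mmrm): build a covariance matrix from
# per-visit standard deviations `sd` and a correlation parameter `rho`.
ar1_cov <- function(sd, rho) {
  n <- length(sd)
  # Correlation decays with the distance between visits: rho^|i - j|.
  corr <- rho^abs(outer(seq_len(n), seq_len(n), "-"))
  diag(sd, nrow = n) %*% corr %*% diag(sd, nrow = n)
}

cs_cov <- function(sd, rho) {
  n <- length(sd)
  # Same correlation between every pair of distinct visits.
  corr <- matrix(rho, n, n)
  diag(corr) <- 1
  diag(sd, nrow = n) %*% corr %*% diag(sd, nrow = n)
}

# Homogeneous: one common variance across all visits.
ar1_cov(sd = rep(2, 3), rho = 0.5)

# Heterogeneous: a separate variance for each visit.
ar1_cov(sd = c(1, 2, 3), rho = 0.5)
```

In mmrm itself the structure is chosen inside the model formula, for example us(), toep()/toeph(), cs()/csh(), ar1()/ar1h(), ad()/adh() and sp_exp().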

Degrees of Freedom Methods

Method                    {mmrm}  PROC MIXED
Contain                   X*      X
Between/Within            X*      X
Residual                  X*      X
Satterthwaite             X       X
Kenward-Roger             X       X
Kenward-Roger (Linear)**  X       X

*Available through the emmeans package.

**This is not equivalent to the KR2 setting in PROC MIXED

Contrasts/LSMEANS

Contrasts and least squares means estimates are available in mmrm using:

  • df_1d() and df_md() for one- and multi-dimensional contrasts
  • S3 method that is compatible with emmeans
  • LS means difference can be produced through emmeans (pairs method)
  • Degrees of freedom method is passed from mmrm to emmeans
  • By default PROC MIXED and mmrm do not adjust for multiplicity, whereas emmeans does.
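
The points above can be sketched as follows, assuming the mmrm and emmeans packages are installed and again using the fev_data example data bundled with mmrm. The model formula and the choice of contrast are illustrative assumptions.

```r
library(mmrm)
library(emmeans)

fit <- mmrm(
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

# Least squares means per treatment arm within each visit.
lsm <- emmeans(fit, ~ ARMCD | AVISIT)

# LS means differences at each visit (reverse = TRUE gives TRT - PBO).
pairs(lsm, reverse = TRUE)

# One-dimensional contrast tested directly in mmrm: select the ARMCDTRT
# coefficient, i.e. the treatment effect at the reference visit.
contrast <- as.numeric(names(coef(fit)) == "ARMCDTRT")
df_1d(fit, contrast)
```

With only two arms there is a single comparison per visit, so the default multiplicity adjustment in emmeans has no effect here; with more arms, passing adjust = "none" to pairs() turns the adjustment off, which matches the unadjusted default of PROC MIXED and mmrm.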

Demo

Closing and Next Steps

Long-term perspective

  • Software engineering is a critical competence in producing high-quality statistical software
  • A lot of work needs to be done regarding the establishment, dissemination and adoption of best practices for engineering open-source software
  • Improving the way software engineering is done will help improve the efficiency, reliability and innovation within Biostatistics

New packages coming up

  • sasr
  • HTA packages
  • Bayesian MMRM

Thank you! Questions?