Introducing the Software Engineering Working Group and {mmrm}

Ben Arancibia and Yoni Sidi

Agenda

  • Introduction and Overview of SWE WG

  • Mixed Models for Repeated Measures - Why is it a Problem?

  • MMRM Package

    • Why this is not “yet another package”

    • Long Term Perspective

    • Comparing MMRM R Package to SAS - Demo

  • Closing and Next Steps

Who we are

  • Ben Arancibia: Director of Data Science at GSK, within the Statistical Data Sciences Innovation Hub
  • Yoni Sidi: Director of Modeling and Simulation at Sage Therapeutics

Software Engineering in Biostatistics

  • Open-source software has gained increasing popularity in Biostatistics over the last two decades
    • Pros: rapid uptake of novel statistical methods and unprecedented opportunities for collaboration and innovation
    • Cons: users face huge variability in software quality, in particular reliability, efficiency and maintainability
  • Developing high-quality software with good coding practices, reproducible outputs, and self-sufficient documentation is critical to inform clinical and regulatory decisions

How to deal with these issues?

  • To address statistical quality assurance for R packages and the creation of high-quality statistical software, a group came together to form the “Software Engineering Working Group” (SWE WG), which looks at R packages from a statistical point of view

Software Engineering Working Group (SWE WG)

  • An official working group of the ASA Biopharmaceutical Section
  • Formed in August 2022
  • Cross-industry collaboration with more than 30 members from over 20 organizations
  • Home page at rconsortium.github.io/asa-biop-swe-wg

Goals

  • Primary Goal: Collaborate to engineer R packages that implement important statistical methods to fill in critical gaps

  • Secondary Goal: Develop and disseminate best practices for engineering high-quality open-source statistical software

SWE WG Activities

  • First R package mmrm was published on CRAN in October 2022 and updated in December
    • We aim to establish this package as a new standard for fitting mixed models for repeated measures (MMRM)
    • We have been developing and adopting best practices for software in the mmrm package, and open sourced it at github.com/openpharma/mmrm
    • Currently under active development to add more features

Why do we need a package for MMRM?

  • Mixed Models for Repeated Measures (MMRM) is a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials
  • No great R package existed: we initially thought that the MMRM problem was solved by combining lme4 and lmerTest
  • We learned that this approach failed on large data sets (slow, did not converge)
  • nlme does not give Satterthwaite adjusted degrees of freedom, has convergence issues, and with emmeans the degrees of freedom are only approximate
  • We tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom

Before creating a new package

  • First try to improve an existing package
    • Here we tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom
    • But it did not work
  • Think about long term maintenance and responsibility

Idea with some Details

  • Because glmmTMB always uses a random effects representation, we cannot fit a truly unstructured model (it uses the \(\sigma = \varepsilon > 0\) trick)
  • We only want to fit a fixed effects model with a structured covariance matrix for each subject
  • The idea is then to use the Template Model Builder (TMB) directly, which also underlies glmmTMB, but code the exact model we want
  • We do this by implementing the log-likelihood in C++ using the TMB provided libraries
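
As a sketch of what gets coded, the standard MMRM and its log-likelihood can be written as follows. The notation (\(y_i\), \(X_i\), \(\beta\), \(\theta\), \(\Sigma_i\), \(n_i\), \(N\)) is assumed here for illustration and is not taken from the package source.

\[
y_i = X_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim N\bigl(0, \Sigma_i(\theta)\bigr), \qquad i = 1, \dots, N
\]

\[
\ell(\beta, \theta) = -\frac{1}{2} \sum_{i=1}^{N} \Bigl[ n_i \log(2\pi) + \log\lvert \Sigma_i(\theta) \rvert + (y_i - X_i \beta)^\top \Sigma_i(\theta)^{-1} (y_i - X_i \beta) \Bigr]
\]

Here \(y_i\) holds the \(n_i\) observed outcomes of subject \(i\), and \(\theta\) collects the variance parameters of the chosen covariance structure. There are no random effects, only fixed effects and a structured covariance matrix per subject. This is the maximum likelihood form; REML adds a correction term that is not shown.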

Advantages of TMB

  • Fast C++ framework for defining objective functions (Rcpp would have been an alternative interface)
  • Automatic differentiation of the log-likelihood as a function of the variance parameters
  • We get the gradient and Hessian exactly and without additional coding
  • This can be used from the R side with the TMB interface and plugged into optimizers
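
To illustrate what "exact gradient without additional coding" means, here is a toy forward-mode automatic differentiation sketch in Python using dual numbers. It is not how TMB is implemented (TMB's C++ machinery is far more general), and the names `Dual`, `dual_log`, `neg_log_lik` and `value_and_gradient` are hypothetical. The objective is the negative log-likelihood, up to a constant, of a single \(N(0, v)\) residual as a function of the variance \(v\).

```python
import math


class Dual:
    """Number of the form value + deriv * eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative zero).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = Dual.lift(other)
        # Quotient rule.
        return Dual(self.value / other.value,
                    (self.deriv * other.value - self.value * other.deriv)
                    / other.value ** 2)

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def dual_log(x):
    # Chain rule for the natural logarithm.
    return Dual(math.log(x.value), x.deriv / x.value)


def neg_log_lik(variance, residual):
    """Negative log-likelihood (up to a constant) of one N(0, variance) residual."""
    return 0.5 * (dual_log(variance) + residual ** 2 / variance)


def value_and_gradient(variance, residual):
    # Seed the derivative with 1.0 to differentiate with respect to the variance.
    out = neg_log_lik(Dual(variance, 1.0), residual)
    return out.value, out.deriv
```

Calling `value_and_gradient(4.0, 2.0)` returns a gradient of zero, because the variance that minimizes this objective equals the squared residual. The derivative was never written by hand: it falls out of the arithmetic rules, and an optimizer can consume it directly.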

Why it’s not just another package

  • Ongoing maintenance and support from the pharmaceutical industry, backed by the American Statistical Association
  • The package is part of the mission, but our broader goal is to disseminate best practices for engineering high-quality open-source statistical software

Comparing SAS and R

To fit an MMRM in SAS it is recommended to use either PROC MIXED or PROC GLM.

  • Fewer model assumptions are applied in PROC MIXED, primarily in how missingness is treated.

  • We will compare PROC MIXED to the {mmrm} package on the following attributes:

  • Documentation
  • Unit Testing
  • Estimation Methods
  • Covariance structures
  • Degrees of Freedom
  • Contrasts

Documentation

Both {mmrm} and PROC MIXED have online documentation covering the technical details of the estimation methods, the degrees of freedom methods, and the available covariance structures.

{mmrm}

PROC MIXED

Unit Testing

One major advantage of {mmrm} over PROC MIXED is that the unit testing in {mmrm} is transparent. It uses the {testthat} framework with {covr} to communicate the testing coverage. Unit tests can be found in the GitHub repository under ./tests.

Note

The integration tests in {mmrm} are set to a tolerance of \(10^{-3}\) when compared to SAS outputs.

Estimation Methods

| Method | {mmrm} | PROC MIXED |
|--------|--------|------------|
| ML     | X      | X          |
| REML   | X      | X          |

Covariance structures

  • SAS has 23 non-spatial covariance structures, while mmrm has 10.
    • 9 structures intersect with SAS
    • Ante-dependence (homogeneous) is only in the mmrm package.
  • SAS has 14 spatial covariance structures compared to the spatial exponential one available in mmrm.

Tip

For users who need more structures, {mmrm} is easily extensible via feature requests in the GitHub repository.

Covariance structures details

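| Covariance structures               | {mmrm} | PROC MIXED             |
|-------------------------------------|--------|------------------------|
| Unstructured (Unweighted/Weighted)  | X/X    | X/X                    |
| Toeplitz (hetero/homo)              | X/X    | X/X                    |
| Compound symmetry (hetero/homo)     | X/X    | X/X                    |
| Auto-regressive (hetero/homo)       | X/X    | X/X                    |
| Ante-dependence (hetero/homo)       | X/X    | X (heterogeneous only) |
| Spatial exponential                 | X      | X                      |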
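
To make a few of these structures concrete, the following Python sketch builds small covariance matrices as nested lists. It follows the usual textbook definitions and is not taken from {mmrm} or SAS; the function names and the parameterization (`sigma` for a standard deviation, `rho` for a correlation) are assumptions made for illustration.

```python
def compound_symmetry(n, sigma, rho):
    """Homogeneous compound symmetry: one variance, one common correlation."""
    return [[sigma ** 2 * (1.0 if i == j else rho) for j in range(n)]
            for i in range(n)]


def ar1(n, sigma, rho):
    """Homogeneous first-order auto-regressive: correlation rho**|i - j|."""
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


def toeplitz(n, sigma, rhos):
    """Homogeneous Toeplitz: one correlation per lag, rhos[k - 1] for lag k."""
    corr = [1.0] + list(rhos)
    return [[sigma ** 2 * corr[abs(i - j)] for j in range(n)]
            for i in range(n)]


def heterogeneous(corr_matrix, sigmas):
    """Scale a correlation matrix by one standard deviation per visit."""
    n = len(sigmas)
    return [[sigmas[i] * sigmas[j] * corr_matrix[i][j] for j in range(n)]
            for i in range(n)]
```

For example, `ar1(3, 2.0, 0.5)` gives `[[4.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 4.0]]`. Passing a correlation matrix such as `ar1(n, 1.0, rho)` to `heterogeneous` yields the "hetero" variant, where each visit has its own variance. An unstructured matrix, by contrast, has a free parameter for every variance and covariance.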

Degrees of Freedom Methods

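| Method                     | {mmrm} | PROC MIXED |
|----------------------------|--------|------------|
| Contain                    | X*     | X          |
| Between/Within             | X*     | X          |
| Residual                   | X*     | X          |
| Satterthwaite              | X      | X          |
| Kenward-Roger              | X      | X          |
| Kenward-Roger (Linear)**   | X      | X          |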

*Available through the emmeans package.

**This is not equivalent to the KR2 setting in PROC MIXED

Contrasts/LSMEANS

Contrasts and LSMeans estimates are available in mmrm through:

  • mmrm::df_1d, mmrm::df_md
  • An S3 method that is compatible with emmeans
  • LS means differences can be produced through emmeans (pairs method)
  • The degrees of freedom method is passed from mmrm to emmeans
  • By default PROC MIXED and mmrm do not adjust for multiplicity, whereas emmeans does.

Note

These are comparable to the LSMEANS statement in PROC MIXED.

SWE WG Long Term Perspective

  • Software engineering is a critical competence in producing high-quality statistical software
  • A lot of work needs to be done regarding the establishment, dissemination and adoption of best practices for engineering open-source software
  • Improving the way software engineering is done will help improve the efficiency, reliability and innovation within Biostatistics

What’s next with {mmrm} & SWE WG

  • Prepare public training materials to disseminate best practices for software engineering in the Biostatistics community
    • At the beginning of February, a face-to-face workshop will take place in Basel, Switzerland with a focus on open-source software for clinical trials
    • Organize conference sessions with a focus on statistical software engineering at CEN, JSM and ASA/FDA Workshop
    • Video series on best practices for software engineering

New packages the SWE WG is working on

  • sasr
  • HTA
  • Bayesian MMRM

Note

Have an interest in working on these topics? Come work with us. Information on the SWE WG can be found on the ASA BIOP SWE WG home page (rconsortium.github.io/asa-biop-swe-wg).

Thank you! Questions?