Introducing the Software Engineering Working Group and {mmrm}

Ben Arancibia and Yoni Sidi

Agenda

Introduction and Overview of SWE WG
Mixed Models for Repeated Measures - Why is it a Problem?
MMRM Package
- Why this is not “yet another package”
- Long Term Perspective
- Comparing MMRM R Package to SAS - Demo
Closing and Next Steps

Who we are

Ben Arancibia: Director of Data Science at GSK sitting within Statistical Data Sciences Innovation Hub
Yoni Sidi: Director of Modeling and Simulation at Sage Therapeutics

Software Engineering in Biostatistics

Open-source software has gained increasing popularity in Biostatistics over the last two decades
- Pros: rapid uptake of novel statistical methods and unprecedented opportunities for collaboration and innovation
- Cons: users face huge variability in software quality, in particular reliability, efficiency and maintainability
Developing high-quality software with good coding practices, reproducible outputs, and self-sufficient documentation is critical to inform clinical and regulatory decisions

How to deal with these issues?

To address statistical quality assurance for R packages and the creation of high-quality statistical software, a group came together as the “Software Engineering Working Group” (SWE WG) to look at R packages from a statistical point of view

Software Engineering Working Group (SWE WG)

An official working group of the ASA Biopharmaceutical Section
Formed in August 2022
Cross-industry collaboration with more than 30 members from over 20 organizations
Home page at rconsortium.github.io/asa-biop-swe-wg

Goals

Primary Goal: Collaborate to engineer R packages that implement important statistical methods to fill in critical gaps
Secondary Goal: Develop and disseminate best practices for engineering high-quality open-source statistical software

SWE WG Activities

First R package mmrm was published on CRAN in October 2022 and updated in December
- We aim to establish this package as a new standard for fitting mixed models for repeated measures (MMRM)
- We have been developing and adopting best practices for software in the mmrm package, and open sourced it at github.com/openpharma/mmrm
- Currently under active development to add more features

Why do we need a package for MMRM?

Mixed Models for Repeated Measures (MMRM) is a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials
No great R package existed: we initially thought that the MMRM problem was solved by using a combination of lme4 and lmerTest

We learned that this approach failed on large data sets (slow, did not converge)
nlme does not give Satterthwaite adjusted degrees of freedom, has convergence issues, and its degrees of freedom via emmeans are only approximate
We tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom

Before creating a new package

First try to improve an existing package
- Here we tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom
- But it did not work
Think about long term maintenance and responsibility

Idea with some Details

Because glmmTMB is always using a random effects representation, we cannot have a real unstructured model (uses \(\sigma = \varepsilon > 0\) trick)
We only want to fit a fixed effects model with a structured covariance matrix for each subject
The idea is then to use the Template Model Builder (TMB) directly - as it is also underlying glmmTMB - but code the exact model we want
We do this by implementing the log-likelihood in C++ using the TMB provided libraries
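
In symbols, the model described above can be sketched as follows. The notation (\(y_i\), \(X_i\), \(\beta\), \(e_i\), \(\Sigma_i\), \(\theta\), \(m_i\)) is chosen here for illustration and does not come from the slides. This is the standard Gaussian log-likelihood for a fixed effects model with a structured covariance matrix per subject, and not necessarily the exact parameterization coded in the package.

\[
y_i = X_i \beta + e_i, \qquad e_i \sim N\bigl(0, \Sigma_i(\theta)\bigr), \qquad i = 1, \dots, n
\]

\[
\ell(\beta, \theta) = -\frac{1}{2} \sum_{i=1}^{n} \Bigl[ m_i \log(2\pi) + \log\lvert \Sigma_i(\theta) \rvert + (y_i - X_i \beta)^\top \Sigma_i(\theta)^{-1} (y_i - X_i \beta) \Bigr]
\]

Here \(m_i\) is the number of observations for subject \(i\) and \(\theta\) collects the variance parameters. For fixed \(\theta\), the maximizing coefficients are the generalized least squares estimate

\[
\hat\beta(\theta) = \Bigl( \sum_{i=1}^{n} X_i^\top \Sigma_i(\theta)^{-1} X_i \Bigr)^{-1} \sum_{i=1}^{n} X_i^\top \Sigma_i(\theta)^{-1} y_i ,
\]

so the optimization only needs to run over \(\theta\).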

Advantages of `TMB`

Fast C++ framework for defining objective functions (Rcpp would have been an alternative interface)
Automatic differentiation of the log-likelihood as a function of the variance parameters
We get the gradient and Hessian exactly and without additional coding
This can be used from the R side with the TMB interface and plugged into optimizers
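
To make "exact gradient without additional coding" concrete, here is a toy sketch of automatic differentiation in Python using forward-mode dual numbers. It is an illustration only: the class `Dual` and the functions `dexp` and `neg_log_lik` are hypothetical names invented for this example, the objective is a one-parameter Gaussian likelihood rather than an MMRM, and it does not show how TMB itself is implemented.

```python
import math
from dataclasses import dataclass


def _lift(x):
    """Wrap a plain number as a Dual with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))


@dataclass(frozen=True)
class Dual:
    """Number val + der * eps with eps**2 == 0; der carries the derivative."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _lift(other)
        # Quotient rule
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return _lift(other) / self


def dexp(x):
    """exp() for dual numbers (chain rule)."""
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def neg_log_lik(log_sigma, residuals):
    """Negative Gaussian log-likelihood as a function of log(sigma)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = dexp(2.0 * log_sigma)
    return 0.5 * n * math.log(2 * math.pi) + n * log_sigma + rss / (2.0 * sigma2)


residuals = [1.0, -2.0, 0.5]
# Seed the derivative with 1 to differentiate with respect to log_sigma.
out = neg_log_lik(Dual(0.0, 1.0), residuals)
# Hand-derived gradient at sigma = 1: n - rss / sigma^2
analytic = len(residuals) - sum(r * r for r in residuals)
print(out.val, out.der, analytic)
```

The likelihood is written once, as ordinary arithmetic, and the derivative falls out of the same evaluation; it agrees with the hand-derived expression up to floating point error rather than up to a finite-difference approximation.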

Why it’s not just another package

Ongoing maintenance and support from the pharmaceutical industry, backed by the American Statistical Association
The package is part of the mission, but our broader goal is to disseminate practices for engineering high-quality open-source statistical software

Comparing SAS and R

To run an MMRM model in SAS it is recommended to use either the PROC MIXED or PROC GLM procedures.

Fewer model assumptions are applied in PROC MIXED, primarily in how missingness is treated.
We will compare the PROC MIXED procedure to the {mmrm} package on the following attributes:

Documentation
Unit Testing
Estimation Methods
Covariance structures
Degrees of Freedom
Contrasts

Documentation

Both tools have online documentation of the technical details of the estimation and degrees of freedom methods and of the different covariance structures available.

{mmrm}

PROC MIXED

Unit Testing

One major advantage of {mmrm} over PROC MIXED is that the unit testing in {mmrm} is transparent. It uses the {testthat} framework with {covr} to communicate the testing coverage. Unit tests can be found in the GitHub repository under ./tests.

Note

The integration tests in {mmrm} are set to a tolerance of \(10^{-3}\) when compared to SAS outputs.

Estimation Methods

Method	`{mmrm}`	`PROC MIXED`
ML	X	X
REML	X	X

Covariance structures

SAS has 23 non-spatial covariance structures, while mmrm has 10.
- 9 structures intersect with SAS
- Ante-dependence (homogeneous) is only in the mmrm package.
SAS has 14 spatial covariance structures compared to the spatial exponential one available in mmrm.

Tip

For users who need more structures, {mmrm} can be extended via feature requests in the GitHub repository.

Covariance structures details

Covariance structures	`{mmrm}`	`PROC MIXED`
Unstructured (Unweighted/Weighted)	X/X	X/X
Toeplitz (hetero/homo)	X/X	X/X
Compound symmetry (hetero/homo)	X/X	X/X
Auto-regressive (hetero/homo)	X/X	X/X
Ante-dependence (hetero/homo)	X/X	X
Spatial exponential	X	X
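
As a rough illustration of what the structures in the table mean, the sketch below builds a few of them as plain matrices in Python. The function names and the parameterization by standard deviations and correlations are assumptions made for this example; they are not the {mmrm} or PROC MIXED interfaces, and the packages may parameterize the same structures differently internally.

```python
from math import prod


def cov_from_corr(sds, corr):
    """Covariance matrix from per-visit SDs and a correlation rule corr(i, j), i < j."""
    m = len(sds)
    return [[sds[i] * sds[j] * (1.0 if i == j else corr(min(i, j), max(i, j)))
             for j in range(m)] for i in range(m)]


def compound_symmetry(sds, rho):
    # Same correlation between any two visits
    return cov_from_corr(sds, lambda i, j: rho)


def ar1(sds, rho):
    # Correlation decays geometrically with the lag between visits
    return cov_from_corr(sds, lambda i, j: rho ** (j - i))


def toeplitz(sds, rhos):
    # rhos[k - 1] is the correlation at lag k
    return cov_from_corr(sds, lambda i, j: rhos[j - i - 1])


def ante_dependence(sds, rhos):
    # rhos[k] is the correlation between adjacent visits k and k + 1;
    # non-adjacent correlations are products of the ones in between
    return cov_from_corr(sds, lambda i, j: prod(rhos[i:j]))


# Heterogeneous: one SD per visit. Homogeneous: the same SD repeated.
hetero_ar1 = ar1([1.0, 2.0, 3.0], 0.5)
homo_ar1 = ar1([2.0] * 3, 0.5)
ante = ante_dependence([1.0, 1.0, 1.0], [0.5, 0.2])
toep = toeplitz([1.0, 1.0, 1.0], [0.5, 0.2])
cs = compound_symmetry([1.0, 2.0, 3.0], 0.3)
```

The homogeneous variants use fewer variance parameters than the heterogeneous ones, while an unstructured matrix over m visits uses m(m + 1)/2 free parameters, the most of any structure shown here.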

Degrees of Freedom Methods

Method	`{mmrm}`	`PROC MIXED`
Contain	X*	X
Between/Within	X*	X
Residual	X*	X
Satterthwaite	X	X
Kenward-Roger	X	X
Kenward-Roger (Linear)**	X	X

*Available through the emmeans package.

**This is not equivalent to the KR2 setting in PROC MIXED
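
For orientation, the Satterthwaite approximation for a one-dimensional contrast \(c\) is commonly written as below. The symbols (\(c\), \(V\), \(g\), \(A\), \(\hat\theta\), \(\hat\nu\)) are introduced here for illustration and are not taken from the slides; see the package and SAS documentation for the exact definitions each tool uses.

\[
t = \frac{c^\top \hat\beta}{\sqrt{c^\top V(\hat\theta)\, c}}, \qquad
\hat\nu = \frac{2 \bigl( c^\top V(\hat\theta)\, c \bigr)^2}{g^\top A\, g}, \qquad
g = \left. \frac{\partial \bigl( c^\top V(\theta)\, c \bigr)}{\partial \theta} \right|_{\theta = \hat\theta}
\]

Here \(V(\theta) = \bigl( \sum_i X_i^\top \Sigma_i(\theta)^{-1} X_i \bigr)^{-1}\) is the covariance matrix of the coefficient estimates, \(\Sigma_i(\theta)\) is the covariance matrix of subject \(i\), and \(A\) is the asymptotic covariance matrix of the variance parameter estimates \(\hat\theta\). The statistic \(t\) is then compared against a t distribution with \(\hat\nu\) degrees of freedom. Both \(g\) and \(A\) require derivatives with respect to \(\theta\).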

Contrasts/LSMEANS

Contrasts and LSMeans estimates are available in mmrm using

mmrm::df_1d, mmrm::df_md
S3 method that is compatible with emmeans
LS means differences can be produced through emmeans (pairs method)
Degrees of freedom method is passed from mmrm to emmeans
By default PROC MIXED and mmrm do not adjust for multiplicity, whereas emmeans does.

Note

These are comparable to the LSMEANS statement in PROC MIXED.

SWE WG Long Term Perspective

Software engineering is a critical competence in producing high-quality statistical software
A lot of work needs to be done regarding the establishment, dissemination and adoption of best practices for engineering open-source software
Improving the way software engineering is done will help improve the efficiency, reliability and innovation within Biostatistics

What’s next with {mmrm} & SWE WG

Prepare public training materials to disseminate best practice for software engineering in the Biostatistics community
- At the beginning of February, a face-to-face workshop will take place in Basel, Switzerland with a focus on open-source software for clinical trials
- Organize conference sessions with a focus on statistical software engineering at CEN, JSM and ASA/FDA Workshop
- Video series on best practices for software engineering

New packages SWE WG is Working On

sasr
HTA
Bayesian MMRM

Note

Have an interest in working on these topics? Come work with us. Information on the SWE WG can be found on the ASA BIOP SWE WG home page: rconsortium.github.io/asa-biop-swe-wg

Thank you! Questions?