Workshops

Promover la Equidad Científica: Una Introducción al uso de R para la programación en Bioestadística y Ciencia de Datos, en Español.

Description

Despite the abundance of resources available for learning R, most of these materials are accessible mainly to English speakers. This language barrier significantly restricts access for people who do not speak English fluently. As a result, Spanish-speaking communities often face considerable challenges in accessing software training opportunities. This disparity leads to inequalities in the distribution and use of scientific technologies, which is particularly concerning given the growing importance of digital skills today. To mitigate these challenges and promote inclusion, we propose holding a programming workshop in Spanish during the conference. This initiative aims to close the gap by giving Spanish-speaking participants equal opportunities to engage with and benefit from technological advances. In doing so, we not only improve individual capabilities but also contribute to a more equitable distribution of educational resources in the scientific community.

This workshop will equip attendees with basic R skills. Our main goal is to familiarize participants with RStudio and its key features for generating reproducible reports. We will guide attendees through the process of creating and managing projects in RStudio and introduce them to creating reproducible manuscripts using Quarto documents. The workshop will use a publicly available dataset from the CDC, containing information on drug use and suicidal ideation among adolescents, as a practical example of using R for academic public health research. We will explain how to use functions such as filter, mutate, summarize, and select from the tidyverse collection of packages. We will conclude by demonstrating how to use ggplot2 to create visualizations in R.

By the end of the workshop, participants will have created a reproducible document in HTML format detailing the data-cleaning steps and the analysis of an important contemporary social problem. This presentation aims to close the gap in programming literacy among Spanish-speaking researchers and to promote methods for reproducible scientific research.

Catalina Canizares-Escobar and Francisco Cardozo

Catalina Cañizares is a passionate data scientist and Ph.D. candidate in social work. She is dedicated to using data to gain insight into emotional disorders. She has been delving deeper into data analysis, especially with R, and her focus is on making data understandable and useful. She specializes in cleaning and merging data, and she loves exploring data with tools such as tidyverse, table1, gtsummary, and skimr, among others. She is also interested in using machine learning models, with the tidymodels package, to better understand emotional disorders.

Francisco Cardozo is a PhD candidate in Prevention Science and Community Health. He specializes in applying quantitative techniques to evaluate the effectiveness of prevention programs, focusing on understanding the dynamics of how and for whom these programs are most effective. He is dedicated to developing measurements and analyses that inform decisions about how to operate programs. Francisco is passionate about translating scientific findings into practical applications.

Introduction to R for Clinical Data

Description

A gentle introduction to R and data science for healthcare professionals and clinical researchers.

Course GitHub repo: https://github.com/skadauke/intro-to-r-for-clinical-data-rmed2025

Course website: https://stephan-kadauke.quarto.pub/intro-to-r-for-clinical-data-rmed2025/

Stephan Kadauke and Rich Hanna

Stephan Kadauke is the Associate Director of the Cell Based Therapy Laboratory in the Department of Pathology at the Children’s Hospital of Philadelphia, and the Medical Director of the Cell and Gene Therapy Informatics Team. He leads efforts to build and deploy predictive models and other data products to improve the care of children receiving bone marrow transplants and other cell therapies. Stephan developed a curriculum in Reproducible Clinical Data Analysis tailored for physicians and other healthcare professionals. He is passionate about using data to improve the care of children.

Rich Hanna is a Data Scientist with the Cell and Gene Therapy Informatics Team at the Children’s Hospital of Philadelphia. He develops software tools and data-driven solutions to support cell therapy research and enhance patient care in pediatric medicine. Rich has a background in biomedical and mechanical engineering and specializes in automating clinical research workflows through advanced analytics and machine learning. He is passionate about leveraging data science to streamline healthcare operations and improve patient outcomes.

R package development with GitHub Pages and `pkgdown`

Description

Creating clear, professional documentation is key to making your R package useful to other people, but building a documentation website can be tricky. The pkgdown package makes this process much easier, but deploying via GitHub Pages can present its own set of challenges, especially if you want a highly customized website.

In this workshop, we’ll build a simple R package together, and use GitHub Actions and GitHub Pages to create a pkgdown documentation website for the package. By the end of this workshop, you’ll have:

🌎 Your package available on GitHub for others to install
📄 A polished pkgdown website for your package, deployed via GitHub Pages
🎨 An understanding of how to customize the trickier components of pkgdown websites

Workshop participants should have experience using Git, but don’t need prior experience with GitHub Actions or pkgdown.

Please see the pre-workshop setup steps here.

Melissa Van Bussel

Melissa Van Bussel is a Senior Data Analyst at Statistics Canada, where she develops and delivers training about open-source languages and tools. She also helps run a large R and Python User Group at Statistics Canada. Outside of work, she creates educational videos on YouTube (@ggnot2) about R and data science, and is an organizer for R-Ladies Ottawa. Melissa is an Accredited Associate Statistician. She received her M.Sc. in Statistics from Carleton University and her B.Sc.H. in Mathematics and Computing Systems from Trent University.

The power of {targets} package for reproducible data science

Description

Reproducibility is a cornerstone of credible and robust data science. This talk delves into the powerful targets package, showcasing how it streamlines and enhances reproducibility in data science workflows. The targets package in R provides a comprehensive framework for pipeline management, enabling efficient dependency tracking, automated pipeline execution, and clear documentation of the entire data analysis process. It ensures execution of complex pipelines in consistent and isolated environments.

Combined with tools like {renv} and Docker, this approach eliminates the “it works on my machine” problem. Through real-world examples, attendees will learn how to leverage these tools to create reproducible, scalable, and maintainable data science projects, ensuring that their analyses can be reliably replicated and shared across diverse computational environments.

This workshop is designed for data scientists and analysts who are looking to enhance their ability to manage and scale up their analytical pipelines. By the end of the session, attendees will have a deeper understanding of the targets package, its capabilities, and how to apply this package to their workflows, from exploration, to model building, to plotting and report generation.

Attendees should follow the pre-workshop setup instructions found here.

Rahul Sangole

Rahul is a Sr. Data Science Manager at Apple, where his work focuses on building scalable end-to-end data-science production solutions in R. He is interested in the areas of time-series analyses & forecasting, anomaly detection, and reproducible data science.

Survival analysis with tidymodels

Description

Survival analysis is now supported across the tidymodels framework, a collection of R packages for modeling and machine learning using tidyverse principles. It covers the entire predictive modeling workflow from data splitting, resampling, feature engineering, model fitting, and performance evaluation to tuning. It provides a consistent interface with composable functions that allow beginners a safe start and advanced users access to more specialized techniques such as feature engineering on text data or tuning via racing methods. The addition of dedicated performance metrics has enabled us to support tuning of survival models and unlock the entire framework for survival analysis. This workshop focuses on the core components of tidymodels to get you up and running with predictive survival analysis.

This workshop is for you if you

are familiar with basic survival analysis such as censoring of time-to-event data, Kaplan-Meier curves, proportional hazards models
are familiar with the basic predictive modeling workflow such as splitting data into training and test sets, resampling, tuning via grid search
want to learn how to leverage the tidymodels framework for survival analysis

Hannah Frick

Hannah Frick is a software engineer on the tidymodels team at Posit. She holds a PhD in statistics and has worked in interdisciplinary research and data science consultancy. She is a co-founder of R-Ladies Global.

First Steps with SQL in R: Making Data Talk

Description

Curious about SQL but not ready to dive into full-blown databases or external connections? This hands-on, beginner-friendly workshop is the perfect starting point. Using the lightweight and intuitive {sqldf} package in R, you’ll learn how to write SQL queries directly on your existing R data frames—no database setup required.

SQL (Structured Query Language) is a powerful tool for querying and transforming structured data, and it’s widely used in clinical research, data science, and industry analytics. In this 3-hour introductory session, we’ll bridge the gap between R and SQL in the most approachable way possible—by working with the data frames you already use in R.

We’ll cover:

What SQL is and why it’s useful alongside R
The basics of SQL syntax: SELECT, FROM, WHERE, ORDER BY, GROUP BY, and JOIN
How to use the {sqldf} package to run SQL queries on R data frames
Comparing SQL and dplyr for common data tasks
Writing readable, reusable SQL queries inside R scripts

This session is geared toward R users with little to no SQL experience. You’ll learn through guided examples and live coding, with time to practice writing your own queries. There’s no need for databases or complicated setup—just bring your laptop with R and {sqldf} installed, and we’ll take it from there.

By the end of the workshop, you’ll be able to:

Write SQL queries to filter, sort, group, and join data frames
Use {sqldf} to integrate SQL smoothly into your R scripts
Decide when SQL might be more effective than tidyverse functions (and vice versa)
Gain confidence in querying data more efficiently—even with large or complex datasets

This workshop is especially helpful for analysts, students, and researchers who want to become more versatile in handling data but prefer to stay within the comfort of the R environment. It’s also a great stepping stone for those who may later work with external databases (e.g., REDCap, EDC systems, or clinical trial platforms).

Come explore the power of SQL in a simple, approachable way—and learn to make your data talk.

Chris Battiston

Chris is responsible for the REDCap (Research Electronic Data Capture) platforms at Women’s College Hospital in Toronto, Canada. His work includes everything from maintaining the system (ensuring regulatory compliance) and building REDCap projects for clinical trials, to supporting teams with complex data management questions. The studies he supports range from complex longitudinal projects with tens of thousands of participants to one-time surveys. Chris prides himself on simplifying and explaining how complex ideas or features might work best for a specific project. His thoughtful work ensures that the EDI principles that Women’s is known for shine through in our surveys. Chris is a fan of a huge variety of music, from heavy metal to opera, and has been meditating for at least an hour every day since he was 16.

“Visualise, Optimise, Parameterise!” - Writing dataviz code that your future self will thank you for

Description

You’ve collected the initial data, explored the patterns and settled on a few graphs you’ll want to create now and in the future as you revisit the dataset from different angles and with new data as it comes in. How can you make sure that the graphs will be as memorable as possible, while also making life easier for your future self?

In this code-along workshop, we’ll work together on our datasets to:

Settle on the right types of graphs and find colours, fonts and additional styling which complement our storytelling
Apply these colours, fonts and additional styling to the graphs by creating reusable functions and vectors, avoiding all the copy-pasting we’d end up doing otherwise
Add annotations straight from the data to highlight patterns and points of interest
Make the graphs interactive (because why not?)
Adapt our code to create a function with parameters, which will allow us to apply the same plotting function to any number of future datasets, with a few extra options.

To get the most out of our workshop, I encourage you to bring along a dataset and a basic graph (built with ggplot) that you’ve been working on for it. That should allow you to leave the workshop having made good progress towards a finished publishable product! If you don’t have a dataset you can use, don’t worry, I’ll provide one.

Cara Thompson

Cara is a data visualisation consultant with an academic background, specialising in helping research teams and data-driven organisations turn their data insights into clear and compelling visualisations.

Following her PhD in Psychology and a spell teaching research methods at Edinburgh Uni, she embarked on a career in psychometrics at the Royal College of Surgeons of Edinburgh. After ten years of helping surgeons and other medical professionals understand complex patterns in exam data, she set out as an independent data visualisation consultant and launched her business “Building Stories with Data”, to continue crafting innovative dataviz solutions for a range of different organisations.

She lives in Edinburgh, Scotland, with her husband and two young daughters. Cara regularly shares coding tips for dataviz online, and genuinely enjoys helping others level up their dataviz skills through talks, bespoke toolkits, organisational training, and one-to-one coaching.

Rix: reproducible data science environments with Nix

Description

Reproducibility is a critical aspect of modern research, ensuring that results can be consistently replicated and verified by others. In this workshop, Bruno Rodrigues will introduce participants to Nix, a package manager that focuses on reproducible builds. Unlike other solutions, Nix takes care of all the layers of reproducibility by ensuring that not only R packages are correctly versioned, but also R itself, as well as any other system-level dependency. With Nix, it is essentially possible to replace containerization tools such as Docker, and Nix can be used on any operating system, as well as on CI/CD platforms. To make Nix more accessible to beginners, Bruno wrote an R package called {rix}, which he will cover with hands-on exercises.

Bruno Rodrigues

Bruno Rodrigues is a statistician at Luxembourg’s Ministry of Research and Higher Education. He has also worked as a senior data scientist and managed the data science team at PwC Luxembourg, following a period as a research assistant at STATEC Research.

Bruno is keenly interested in reproducibility and works on developing tools and writing tutorials that help others improve their research practices. He is the author of Building Reproducible Analytical Pipelines with R and contributes to the R ecosystem for Nix, a package manager focused on reproducible builds.

Personal R Administration

Description

Does the release of a new R version fill you with dread?
Are there passwords in your R code?
Do you look at the output of a failed package installation and think to yourself, “WTF?!”

If you said yes to any of those questions, then you need Personal R Administration. You’ll come away with tips, tricks, tweaks, and some hacks for building data science dev environments that you won’t be afraid to come back to in a year.

David Aja and Shannon Pileggi

E. David Aja is a Software Engineer at Posit. Before joining Posit, he worked as a data scientist in the public sector.

Shannon Pileggi (she/her) is a Lead Data Scientist at The Prostate Cancer Clinical Trials Consortium, a frequent blogger, and a member of the R-Ladies Global leadership team. She enjoys automating data wrangling and data outputs, and making both data insights and learning new material digestible.

Demystifying LLMs with Ellmer

Description

Today’s best LLMs are incredibly powerful–but you’re only scratching the surface of their capabilities if your use is limited to ChatGPT or Copilot. Accessing LLMs programmatically opens up a whole new world of possibilities, letting you integrate LLMs into your own apps, scripts, and workflows. In this workshop, we’ll cover:

A practical introduction to LLM APIs
Configuring R to access LLMs via the ellmer package
Customizing LLM behavior using system prompts and tool calling
Creating Shiny apps with integrated chatbots
Using LLMs for natural language processing

Attendees will leave the workshop armed with ready-to-run examples, and the confidence and inspiration to run their own experiments with LLMs.

Attendees should be familiar with the basics of R and have a working R installation.

Note that to avoid any potential firewall issues, it is recommended that participants use a personal computer for this workshop.

Joe Cheng

Joe Cheng is the CTO of Posit, PBC. He’s the original creator of the Shiny web framework and co-creator of ellmer.

teal Mastery: From Pre-built Modules to Custom Module Creation

Description

This session provides a comprehensive introduction to teal programming, starting with creating a simple teal application from scratch. You’ll learn the fundamentals of building a basic teal app and understand its core components. Next, we will explore the practical use of pre-built modules from teal.modules.general and teal.modules.clinical, demonstrating how these ready-to-use components can streamline the development of robust teal applications. Participants will gain hands-on experience in integrating these modules into their projects. The workshop will then focus on building upon this foundation by learning how to create custom teal modules that meet specific project needs. You’ll learn how to leverage the core features of the teal framework to develop tailored solutions and take your skills to the next level. This session will provide practical insights and coding examples, empowering you to extend and customize your teal applications beyond the capabilities of pre-built modules. By the end of this workshop, you will have a comprehensive understanding of both pre-built and custom module development in teal, making it an ideal choice for beginners and intermediate learners looking to expand their R skills with teal.

Dony Unardi

Dony Unardi is a Principal Data Scientist at Genentech, and the Engineering Team Lead in the development effort of an open-source R product called teal, a Shiny-based R package focused on interactive and reproducible data analysis and visualization in clinical trials.

Demos

A framework for cohort building in R: the CohortConstructor package for data mapped to the OMOP Common Data Model

Description

In this demo, we’ll introduce CohortConstructor, an R package that helps you build and manage patient cohorts using real-world health data mapped to the OMOP Common Data Model. The package makes it easier to apply both common and complex inclusion criteria, combine cohorts, update cohort entry and exit dates, and track groups of patients based on age, diagnoses, or time periods—all without writing complicated code. It also keeps track of the clinical codes used and the operations performed, making cohort work more transparent and reproducible. We’ll walk through the framework the package is built on, the typical workflow for creating cohorts, and the full set of cohort-curation tools included. Everything will be shown using examples you can also run locally on your own computer. This session is designed for anyone working with healthcare data in R—no OMOP expertise required. You can find the full abstract, setup instructions, and demo slides on the GitHub page.

Nuria Mercade-Besora and Edward Burn

Ed is a Senior Researcher in Epidemiology and Health Economics at the University of Oxford. Their research is focused on using routinely collected health care data to inform medical decision making. The foundation for much of this is the use of the OMOP Common Data Model to transform disparate sources of health care data into a standard format. Ed has led the development of various R packages that work with data standardised into this format, including packages created in collaboration with the European Medicines Agency for the Data Analysis and Real-World Interrogation Network (DARWIN) EU® initiative.

Nuria is a PhD student in Clinical Epidemiology and Medical Statistics at the University of Oxford. Their research interest is in investigating and applying methods to use real-world healthcare data to answer clinical causal questions. In addition, Nuria works part-time as a research assistant, contributing to the development of R packages for the analysis of data mapped to the OMOP Common Data Model, and to research studies conducted in collaboration with the European Medicines Agency for the DARWIN EU® initiative, in the role of data scientist.

Visualising data for patients: create accessible charts

Description

Visualising data for patients or other stakeholders may be difficult. Generally, our audience does not have a scientific or medical background, so our duty is to present data in a way that is understandable for them. To create clear and accessible charts, there are several factors to consider, such as decluttering, accessibility of the colour palette, and fonts. During this demo, we will see how to create accessible and clear charts in R. Attendees will learn how to create a new palette in R that is colourblind-friendly, and also how to create an autism-friendly palette. We will check if the colours in the chart are colourblind friendly in R using colourblind, and we will see how to transform a brand palette into a colourblind friendly palette. Moreover, we will look for fonts that have high readability and see how to include them in our R code.

Rita Giordano

Rita Giordano is an independent data visualisation consultant, with more than 15 years of experience in statistics and data science. She’s also the founder of Clarum, a digital health start-up based in Cambridge, UK. Her work focuses on accessibility and data visualisation in healthcare. She is a physicist, and she holds a PhD in statistics applied to crystallography. She has also created online courses on data science and data visualisation for Pearson UK and LinkedIn. Most recently, she taught statistics and data visualisation as a panel tutor at the University of Cambridge Institute of Continuing Education. She also teaches scientists who want to start their businesses how to use data to tell interesting stories. Currently, she is working on a book on statistics and data visualisation for crystallography, for the publisher CRC Press.

Quarto Dashboards: from zero to publish in one hour

Description

You already analyze and summarize your data with R and Quarto. What’s next? You can share your insights, or let others draw their own conclusions, in eye-catching Quarto Dashboards that are straightforward to author, design, and deploy. With Quarto Dashboards, you can create elegant and production-ready dashboards using a variety of components, including static graphics, interactive widgets, tabular data, value boxes, text annotations, and more. Additionally, with intelligent resizing of components, your Quarto Dashboards look great on devices of all sizes. And importantly, you can author Quarto Dashboards without leaving the comfort of your “home” – in plain text markdown with any text editor. In this one-hour demo we will build and publish a Quarto Dashboard – you can code along or sit back and enjoy the show!

Mine Çetinkaya-Rundel

Mine Çetinkaya-Rundel is Professor of the Practice at Duke University and Developer Educator at Posit. Mine’s work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education as well as pedagogical approaches for enhancing retention of women and under-represented minorities in STEM. Mine works on the OpenIntro project, whose mission is to make educational products that are free and transparent and that lower barriers to education. As part of this project she co-authored four open-source introductory statistics textbooks, the latest of which is the 2nd edition of Introduction to Modern Statistics. She is also a co-author on R for Data Science, the creator and maintainer of Data Science in a Box, and she teaches popular data analysis and data science with R courses on Coursera. Mine is a Fellow of the ASA and Elected Member of the ISI as well as a Waller and Hogg award winner for teaching excellence. In 2024, she was elected as Vice President of the International Association for Statistical Education (IASE).

R You Compliant? Validating Packages for Regulatory Readiness

Description

As R continues to gain momentum in clinical research, regulatory agencies are paying closer attention to how it’s used in regulated environments. Whether supporting statistical analysis, data visualization, or automation pipelines, R packages must meet stringent validation requirements to be considered fit for use in clinical trials and other regulated activities. But what does validation actually mean in this context, and how can researchers, data scientists, and system administrators ensure their R-based tools hold up under scrutiny?

This talk will walk through practical strategies for validating R packages in alignment with regulatory expectations, including those from FDA, EMA, and under ICH guidelines such as E6(R3), E9, and M10. We’ll explore the intersection of open-source software and GxP compliance, unpacking the nuances of how to assess, document, and justify the use of community-developed or custom-built R packages.

Attendees will learn:

What regulators expect when R is used in a clinical or GxP-regulated environment.
How to categorize R packages based on risk and intended use.
Validation approaches for both CRAN packages and internally developed tools.
How to document package selection, qualification, and performance testing.
Best practices for ongoing change control, version tracking, and reproducibility.
Real-world examples of validation frameworks and testing workflows using tools like {testthat}, {renv}, and {devtools}.

Whether you’re part of a small academic research team or a large sponsor organization, this session offers practical guidance for creating defensible validation packages that support transparency, reproducibility, and regulatory compliance. We’ll also touch on tools and templates that can help streamline validation documentation and collaborate with Quality Assurance teams more effectively.

By the end of this talk, you’ll walk away with a clearer understanding of what it takes to “make R compliant,” how to integrate validation into your development workflows, and how to future-proof your R environment in a regulated research setting. Because in clinical research, compliance isn’t just a checkbox; it’s part of building trust in our data, our methods, and ultimately, the science we support.

Chris Battiston

Panels

Supporting R learners on the job during interesting times: A panel of R educators

Description

Join R Educators Ray Balise, Silvia Canelón, Meghan Harris, Ted Laderas, and Joy Payton as they discuss the challenges of supporting and mentoring R learners outside of the traditional university classroom. Whether you mentor physician fellows, onboard research assistants, are the “go-to” person in your department for informal help and training, provide learning and development courses in your workplace, participate in user groups, write blog posts, or work issue queues on GitHub, there’s a place for you to participate in R education. Panelists will share their perspectives working with diverse populations of R learners, share lessons learned and favorite resources, and answer questions about best practices for capacity building. This panel discussion will benefit from your insight and questions, so come prepared for a lively interaction aimed at cross-pollination and building a network of engaged educators!
