Design specification • S7

This document presents a broad overview of the S7 object-oriented programming toolkit. (Please note that S7 is a working name and may change in the future.)

Objects

We define an S7 object as any R object with:

A class object attribute, a reference to the class object, and retrieved with classObject().
For S3 compatibility, a class attribute, a character vector of class names.
Additional attributes storing properties defined by the class, accessible with @/prop().

Classes

S7 classes are first class objects (Req2) with the following components:

Name, a human-meaningful descriptor for the class. This is used for the default print method and in error messages; it does not identify the class.
Parent, the class object of the parent class. This implies single inheritance (Req6).
A constructor, an user-facing function used to create new objects of this class. It always ends with a call to newObject() to initialize the class. This the function (wrapped appropriately) that represents the class.
A validator, a function that takes the object and returns NULL if the object is valid, otherwise a character vector of error messages (like the methods package).
Properties, a list of property objects (see below) that define the data that objects can possess.

Classes are constructed by supplying these components to a call to newClass(). newClass() returns a class object that can be called to construct an object of that class.

newClass(
  name,
  parent = Object,
  constructor = function(...) newObject(...),
  validator = function(x) NULL,
  properties = list()
)

For example:

Range <- newClass("Range",
  Vector,
  constructor = function(start, end) {
    stopifnot(is.numeric(start), is.numeric(end), end >= start)
    newObject(start = start, end = end)
  },
  validator = function(x) {
    if (x@end < x@start) {
      "end must be greater than or equal to start"
    }
  },
  properties = c(start = "numeric", end = "numeric")
)

Range(start = 1, end = 10)

Initialization

The constructor uses newObject() to initialize a new object. This:

Inspects the enclosing scope to find the “current” class.
Creates the prototype, by either by calling the parent constructor or by creating a base type and adding class and classObject attributes to it.
Validates properties then adds to prototype.
Validates the complete object.

Steps 2 and 3 are similar to calling structure(), except that property values are initialized and validated through the property system.

Shortcuts

By convention, any argument that takes a class object can instead take the name of a class object as a string. The name will be used to find a class object in the calling frame.

Similarly, instead of providing a list of property objects, you can instead provide a named character vector. For example, c(name = "character", age = "integer") is shorthand for list(newProperty("name", "character"), newProperty("age", "integer")).

Validation

Objects will be validated on initialization and every time a property is modified. To temporarily opt-out of validation (e.g. in order to transition through a temporarily invalid state), S7 provides eventuallyValid():

eventuallyValid <- function(object, fun) {
  object$internal_validation_flag <- FALSE
  out <- fun(object)
  out$internal_validation_flag <- TRUE
  validate(out)
}

For example, if you wanted to move a Range object to the right, you could write:

move_right <- function(x, y) {
  eventuallyValid(x, function(x) {
    x@start <- x@start + y
    x@end <- x@end + y
    x
  })
}

This ensures that the validation will not trigger if x@start + y is greater than x@end.

S7 also provides implicitlyValid() for expert use. This is similar to eventuallyValid() but does not check for validity at the end. This can be used in performance critical code where you can ascertain that a sequence of operations can never make a valid object invalid.

(This can be quite hard: for example, in the move_right() example above, you might think that that if x@start < x@end is true at the beginning, then x@start + y < x@end + y will still be true at the end, and you don’t need to re-validate the object. But that’s not necessarily true: if x@start == 1, x@end == 2, and abs(y) > 2e16 then x@start + y == x@end + y!)

Unions

A class union represents a list of possible classes. It is used in properties to allow a property to be one of a set of classes, and in method dispatch as a convenience for defining a method for multiple classes.

ClassUnion <- defineClass("ClassUnion",
  properties = list(classes = "list"),
  validator = function(x) {
    # Checks that all elements of classes are Class object
  },
  constructor = function(...) {
    classes <- list(...)
    # look up Class object from any class vectors
    newObject(classes = classes)
  }
)

ClassUnion("character", "NULL")
ClassUnion(Range, NULL)

Properties

Properties store the data needed by an object. Properties of an object can be accessed using @/@<- or prop()/prop<-. Setting the properties of an object always triggers validation.

Properties are less encapsulated than their equivalents in most languages because R users expect transparency and want to get to the actual data inside an object and directly manipulate it to get their work done. Properties this support typical usage while still providing some ability to encapsulate data and hide implementation details.

Every property definition has a:

A name, used to label output for humans.
An optional class (or class union).
An optional accessor function that overrides getting and setting, much like an active binding (by default, the value is stored as attribute, like S3/S4).

Property objects are created by newProperty():

newProperty(
  name,
  class = NULL,
  accessor = NULL
)

Compared to S3 attributes, properties are considerably stricter: a class defines the names and types of its properties. Compared to S4 slots, properties enable an object to change its internals without breaking existing usage because it can provides a custom accessor that redirects to the new representation. There will be built-in support for emitting deprecation messages.

While it would be tempting to support public vs. private scoping on properties, it is not obvious how to do so, because no code is more privileged than any another. Nor is it clear whether the package should define the boundary, when packages sometimes define methods on classes defined in other packages and want to take advantage of their internal representations. We believe that the encapsulation afforded by properties is a good compromise.

Generics

A generic separates function interface from implementation: the generic defines the interface, and methods provide the implementation. For introspection purposes, it knows its name and the names of the arguments in its signature (the arguments considered during dispatch).

Calling newGeneric() defines a new generic. It has the signature:

newGeneric(name, FUN, signature)

The signature would default to the first argument, i.e. formals(FUN)[1]. The body of FUN would resemble S3 and S4 generics. It might just call UseMethod().

By convention, any argument that takes a generic function, can instead take the name of a generic function supplied as a string. The name will be used to find the class object in the calling frame.

Methods

Creation

Methods are defined by calling method<-(generic, signature, method):

method(generic, signature) <- function(x, ...) {}

generic is a generic function.
signature is a single class object, a class union, list of class objects/unions, or a character vector.
method is a compatible function, i.e. all arguments before ... have the same names in the same order; if the generic doesn’t have ... all arguments must be same.

Documentation will discuss the risks of defining a method when you don’t own either the generic or the class.

method<- is designed to work at run-time (not just package build-time) so that methods can be defined when packages that define classes or generics are loaded after the package that defines the methods. This typically occurs when providing methods for generics or classes in suggested packages.

whenLoaded("pkg", {
  method(mean, pkg::A) <- function() 10
  method(sum, pkg::A) <- function() 5
})

Dispatch

Dispatch is nested, meaning that if there are multiple arguments in the generic signature, it will dispatch on the first argument, then the second. Nested dispatch is likely easier to predict and understand compared to treating all arguments with equal precedence. Nested dispatch is also easier to implement efficiently, because classes would effectively inherit methods, and we could implement that inheritance using environments.

For example, a plot() generic dispatching on x could be implemented like this:

plot <- function(x) {
  method(plot, classObject(x))(x)
}

While a publish() that publishes an object x to a destination y, dispatching on both arguments, could be implemented as:

publish <- function(x, y, ...) {
  sig <- list(classObject(x), classObject(y))
  method(publish, sig)(x, y, ...)
}

Because method dispatch is nested, this is presumably equivalent to something like:

publish <- function(x, y, ...) {
  publish_x <- method(publish, classObject(x))
  publish_xy <- method(publish_x, classObject(y))

  publish_xy(x, y, ...)
}

Alternatively, the generics could just call UseMethod(), which would gain support for nested dispatch.

Compatibility

S3

Since the class attribute has the same semantics as S3, S3 dispatch should be fully compatible. The new generics should also be able to handle legacy S3 objects.

S4

In principle, we could modify the methods package so that S4 generics can operate on the new class definitions. Since the new generics will fallback to S3 dispatch, they should support S4 objects just as S3 generics support them now.

Documentation

The primary challenge is that the availability of methods and classes depends on which packages are installed, so we will need to generate index pages that list the methods for a generic or the methods with a particular class in their signature. The documentation blurbs could be lazily scraped at runtime to build an index, similar to how the help search index is created now. We would then generate help pages from the index.

The index of installed methods could provide additional benefits, including introspection. A generic could prompt the user to load a package if it fails to find an applicable method, while there is an installed, unloaded method available. This case would arise when a method is defined outside of the packages that define the class and generic.