This document presents a broad overview of the S7 object-oriented programming toolkit. (Please note that S7 is a working name and may change in the future.)
Objects
We define an S7 object as any R object with:
- A class object attribute, a reference to the class object, retrieved with classObject().
- For S3 compatibility, a class attribute, a character vector of class names.
- Additional attributes storing properties defined by the class, accessible with @/prop().
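As a rough illustration, the three kinds of attributes can be mimicked in base R with structure(). This is only a sketch, not the S7 implementation: the attribute name classObject is taken from the text above, while Range_class and the Object entry in the class vector are hypothetical placeholders.

```r
# Hypothetical stand-in for a class object; a real class object is richer.
Range_class <- list(name = "Range")

# A bare object carrying the three kinds of attributes described above.
x <- structure(
  list(),
  classObject = Range_class,     # reference to the class object
  class = c("Range", "Object"),  # S3-compatible character vector of class names
  start = 1,                     # properties stored as plain attributes
  end = 10
)

attr(x, "classObject")$name
inherits(x, "Range")
attr(x, "end")
```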
Classes
S7 classes are first-class objects (Req2) with the following components:
- Name, a human-meaningful descriptor for the class. This is used for the default print method and in error messages; it does not identify the class.
- Parent, the class object of the parent class. This implies single inheritance (Req6).
- A constructor, a user-facing function used to create new objects of this class. It always ends with a call to newObject() to initialize the object. This is the function (wrapped appropriately) that represents the class.
- A validator, a function that takes the object and returns NULL if the object is valid, otherwise a character vector of error messages (like the methods package).
- Properties, a list of property objects (see below) that define the data that objects can possess.
Classes are constructed by supplying these components to a call to newClass(). newClass() returns a class object that can be called to construct an object of that class.
newClass(
  name,
  parent = Object,
  constructor = function(...) newObject(...),
  validator = function(x) NULL,
  properties = list()
)
For example:
Range <- newClass("Range",
  Vector,
  constructor = function(start, end) {
    stopifnot(is.numeric(start), is.numeric(end), end >= start)
    newObject(start = start, end = end)
  },
  validator = function(x) {
    if (x@end < x@start) {
      "end must be greater than or equal to start"
    }
  },
  properties = c(start = "numeric", end = "numeric")
)
Range(start = 1, end = 10)
Initialization
The constructor uses newObject() to initialize a new object. This:
1. Inspects the enclosing scope to find the “current” class.
2. Creates the prototype, either by calling the parent constructor or by creating a base type and adding class and classObject attributes to it.
3. Validates the properties, then adds them to the prototype.
4. Validates the complete object.
Steps 2 and 3 are similar to calling structure(), except that property values are initialized and validated through the property system.
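The later initialization steps can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the S7 implementation: init_object() and Range_cls are hypothetical names, the class object is a plain list passed in explicitly (so the scope inspection of the first step is skipped), and property types are checked with methods::is().

```r
# Hypothetical helper; `cls` is a plain list standing in for a class object.
init_object <- function(cls, ...) {
  props <- list(...)
  # Create the prototype with class and classObject attributes.
  obj <- structure(list(), class = cls$name, classObject = cls)
  # Validate each property against its declared type, then attach it.
  for (nm in names(props)) {
    if (!nm %in% names(cls$properties)) stop("unknown property: ", nm)
    expected <- cls$properties[[nm]]
    if (!methods::is(props[[nm]], expected)) {
      stop(nm, " must be <", expected, ">")
    }
    attr(obj, nm) <- props[[nm]]
  }
  # Validate the complete object; NULL means valid.
  msg <- cls$validator(obj)
  if (!is.null(msg)) stop(paste(msg, collapse = "; "))
  obj
}

Range_cls <- list(
  name = "Range",
  properties = c(start = "numeric", end = "numeric"),
  validator = function(x) {
    if (attr(x, "end") < attr(x, "start")) {
      "end must be greater than or equal to start"
    }
  }
)

r <- init_object(Range_cls, start = 1, end = 10)
```

Calling init_object(Range_cls, start = 10, end = 1) stops with the validator's message, and a non-numeric start is rejected before the validator runs.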
Shortcuts
By convention, any argument that takes a class object can instead take the name of a class object as a string. The name will be used to find a class object in the calling frame.
Similarly, instead of providing a list of property objects, you can provide a named character vector. For example, c(name = "character", age = "integer") is shorthand for list(newProperty("name", "character"), newProperty("age", "integer")).
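One way such a shorthand could be expanded is sketched below. The helpers new_property_sketch() and as_properties() are hypothetical names introduced for illustration; the first merely stands in for newProperty() and returns a plain list.

```r
# Hypothetical stand-in for newProperty(); returns a plain list.
new_property_sketch <- function(name, class = NULL) {
  list(name = name, class = class)
}

# Expand the c(name = "class") shorthand; pass anything else through unchanged.
as_properties <- function(x) {
  if (is.character(x)) {
    Map(new_property_sketch, names(x), unname(x))
  } else {
    x
  }
}

props <- as_properties(c(name = "character", age = "integer"))
names(props)
props$age$class
```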
Validation
Objects will be validated on initialization and every time a property is modified. To temporarily opt out of validation (e.g. in order to transition through a temporarily invalid state), S7 provides eventuallyValid():
eventuallyValid <- function(object, fun) {
  object$internal_validation_flag <- FALSE  # suspend validation on modification
  out <- fun(object)
  out$internal_validation_flag <- TRUE      # re-enable validation
  validate(out)                             # validate once, at the end
}
For example, if you wanted to move a Range object to the right, you could write:
move_right <- function(x, y) {
  eventuallyValid(x, function(x) {
    x@start <- x@start + y
    x@end <- x@end + y
    x
  })
}
This ensures that the validation will not trigger if x@start + y is greater than x@end.
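The idea can be tried out in base R with a small sketch. Everything here is an assumption made for illustration: properties are stored as plain attributes, the validation flag is an attribute named validate, and validate_range(), set_prop(), and eventually_valid() are hypothetical helpers rather than S7 functions.

```r
# Hypothetical validator for a Range-like object stored as attributes.
validate_range <- function(x) {
  if (attr(x, "end") < attr(x, "start")) stop("end must be >= start")
  x
}

# Setter that validates unless validation has been switched off on the object.
set_prop <- function(x, name, value) {
  attr(x, name) <- value
  if (!isFALSE(attr(x, "validate"))) validate_range(x)
  x
}

eventually_valid <- function(object, fun) {
  attr(object, "validate") <- FALSE  # suspend validation
  out <- fun(object)
  attr(out, "validate") <- NULL      # re-enable validation
  validate_range(out)                # check once at the end
}

r <- structure(list(), start = 1, end = 2)
moved <- eventually_valid(r, function(x) {
  x <- set_prop(x, "start", attr(x, "start") + 5)  # start > end for a moment
  set_prop(x, "end", attr(x, "end") + 5)
})
```

Setting start directly with set_prop(r, "start", 6) fails, because the intermediate state is validated immediately.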
S7 also provides implicitlyValid() for expert use. This is similar to eventuallyValid() but does not check for validity at the end. This can be used in performance-critical code where you can ascertain that a sequence of operations can never make a valid object invalid.
(This can be quite hard: for example, in the move_right() example above, you might think that if x@start < x@end is true at the beginning, then x@start + y < x@end + y will still be true at the end, and you don’t need to re-validate the object. But that’s not necessarily true: if x@start == 1, x@end == 2, and abs(y) > 2e16, then x@start + y == x@end + y!)
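The effect is easy to reproduce with ordinary double-precision arithmetic. The value 1e17 below is chosen for illustration; at that magnitude adjacent doubles are 16 apart, so adding 1 or 2 changes nothing.

```r
start <- 1
end <- 2
y <- 1e17  # much larger than the precision available near 1 and 2

before <- start < end              # TRUE: the range starts out valid
after <- (start + y) < (end + y)   # FALSE: both sums round to the same double
collapsed <- (start + y) == (end + y)
```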
Unions
A class union represents a list of possible classes. It is used in properties to allow a property to be one of a set of classes, and in method dispatch as a convenience for defining a method for multiple classes.
ClassUnion <- newClass("ClassUnion",
  properties = list(classes = "list"),
  validator = function(x) {
    # Checks that all elements of classes are Class objects
  },
  constructor = function(...) {
    classes <- list(...)
    # look up Class object from any class vectors
    newObject(classes = classes)
  }
)
ClassUnion("character", "NULL")
ClassUnion(Range, NULL)
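To make the intended use in property checks concrete, here is a minimal base R sketch of testing a value against a union. It is an illustration only: class_union_sketch() and is_member() are hypothetical helpers, classes are represented by their names, and membership is tested with methods::is().

```r
# Hypothetical: a union is just a character vector of class names.
class_union_sketch <- function(...) {
  structure(list(classes = c(...)), class = "class_union_sketch")
}

# TRUE if `x` is an instance of any class in the union.
is_member <- function(x, union) {
  if (is.null(x)) return("NULL" %in% union$classes)
  candidates <- setdiff(union$classes, "NULL")
  any(vapply(candidates, function(cl) methods::is(x, cl), logical(1)))
}

u <- class_union_sketch("character", "NULL")
is_member("a", u)
is_member(NULL, u)
is_member(1, u)
```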
Properties
Properties store the data needed by an object. Properties of an object can be accessed using @/@<- or prop()/prop<-. Setting the properties of an object always triggers validation.
Properties are less encapsulated than their equivalents in most languages because R users expect transparency and want to get to the actual data inside an object and directly manipulate it to get their work done. Properties thus support typical usage while still providing some ability to encapsulate data and hide implementation details.
Every property definition has:
- A name, used to label output for humans.
- An optional class (or class union).
- An optional accessor function that overrides getting and setting, much like an active binding (by default, the value is stored as an attribute, like S3/S4).
Property objects are created by newProperty():
newProperty(
  name,
  class = NULL,
  accessor = NULL
)
Compared to S3 attributes, properties are considerably stricter: a class defines the names and types of its properties. Compared to S4 slots, properties enable an object to change its internals without breaking existing usage because it can provide a custom accessor that redirects to the new representation. There will be built-in support for emitting deprecation messages.
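The redirection idea can be sketched in base R. Suppose a Range-like object stops storing end and stores width instead; an accessor keeps end usable. The shape of the accessor (a list with get and set functions) and the helpers prop_get() and prop_set() are assumptions made for this illustration; the text above only says that an accessor overrides getting and setting.

```r
# Hypothetical property spec whose accessor redirects `end` to start + width.
end_prop <- list(
  name = "end",
  accessor = list(
    get = function(x) attr(x, "start") + attr(x, "width"),
    set = function(x, value) {
      attr(x, "width") <- value - attr(x, "start")
      x
    }
  )
)

# Getter and setter that defer to the accessor when one is supplied and
# otherwise fall back to plain attribute storage.
prop_get <- function(x, prop) {
  if (is.null(prop$accessor)) {
    attr(x, prop$name, exact = TRUE)
  } else {
    prop$accessor$get(x)
  }
}
prop_set <- function(x, prop, value) {
  if (is.null(prop$accessor)) {
    attr(x, prop$name) <- value
    x
  } else {
    prop$accessor$set(x, value)
  }
}

r <- structure(list(), start = 1, width = 9)  # new internal representation
old_end <- prop_get(r, end_prop)              # computed from start and width
r <- prop_set(r, end_prop, 20)                # updates width behind the scenes
```

Code written against end keeps working even though the object no longer has an end attribute.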
While it would be tempting to support public vs. private scoping on properties, it is not obvious how to do so, because no code is more privileged than any other. Nor is it clear whether the package should define the boundary, when packages sometimes define methods on classes defined in other packages and want to take advantage of their internal representations. We believe that the encapsulation afforded by properties is a good compromise.
Generics
A generic separates function interface from implementation: the generic defines the interface, and methods provide the implementation. For introspection purposes, it knows its name and the names of the arguments in its signature (the arguments considered during dispatch).
Calling newGeneric() defines a new generic. It has the signature:
newGeneric(name, FUN, signature)
The signature would default to the first argument, i.e. formals(FUN)[1]. The body of FUN would resemble S3 and S4 generics. It might just call UseMethod().
By convention, any argument that takes a generic function can instead take the name of a generic function supplied as a string. The name will be used to find the generic in the calling frame.
Methods
Creation
Methods are defined by calling method<-(generic, signature, method):
method(generic, signature) <- function(x, ...) {}
- generic is a generic function.
- signature is a single class object, a class union, a list of class objects/unions, or a character vector.
- method is a compatible function, i.e. all arguments before ... have the same names in the same order; if the generic doesn’t have ..., all arguments must be the same.
Documentation will discuss the risks of defining a method when you don’t own either the generic or the class.
method<- is designed to work at run-time (not just package build-time) so that methods can be defined when packages that define classes or generics are loaded after the package that defines the methods. This typically occurs when providing methods for generics or classes in suggested packages.
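A run-time registration mechanism of this kind can be sketched with a replacement function and an environment used as a method table. All names below (new_generic_sketch(), method_sketch(), describe) are hypothetical, and dispatch uses only the first element of class(x); this is not how S7 itself is implemented.

```r
# Hypothetical generic: a function carrying an environment as its method table.
new_generic_sketch <- function(name, FUN) {
  attr(FUN, "name") <- name
  attr(FUN, "methods") <- new.env(parent = emptyenv())
  FUN
}

# Replacement function, so `method_sketch(g, "cls") <- f` registers f on g.
`method_sketch<-` <- function(generic, signature, value) {
  assign(signature, value, envir = attr(generic, "methods"))
  generic
}

# Look up a previously registered method.
method_sketch <- function(generic, signature) {
  get(signature, envir = attr(generic, "methods"), inherits = FALSE)
}

describe <- new_generic_sketch("describe", function(x) {
  method_sketch(describe, class(x)[[1]])(x)
})

# Methods can be added at any time after the generic exists.
method_sketch(describe, "numeric") <- function(x) "a number"
method_sketch(describe, "character") <- function(x) "a string"

describe(1)
describe("a")
```

Because the table is an environment, registration mutates shared state, which is what allows a package loaded later to contribute methods to an existing generic.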
Dispatch
Dispatch is nested, meaning that if there are multiple arguments in the generic signature, it will dispatch on the first argument, then the second. Nested dispatch is likely easier to predict and understand compared to treating all arguments with equal precedence. Nested dispatch is also easier to implement efficiently, because classes would effectively inherit methods, and we could implement that inheritance using environments.
For example, a plot() generic dispatching on x could be implemented like this:
plot <- function(x) {
  method(plot, classObject(x))(x)
}
While a publish() that publishes an object x to a destination y, dispatching on both arguments, could be implemented as:
publish <- function(x, y, ...) {
  sig <- list(classObject(x), classObject(y))
  method(publish, sig)(x, y, ...)
}
Because method dispatch is nested, this is presumably equivalent to something like:
publish <- function(x, y, ...) {
  publish_x <- method(publish, classObject(x))
  publish_xy <- method(publish_x, classObject(y))
  publish_xy(x, y, ...)
}
Alternatively, the generics could just call UseMethod(), which would gain support for nested dispatch.
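The environment-based implementation hinted at above might look roughly like the following base R sketch, with one nested environment per argument level. The names methods_tbl, register(), and lookup() are hypothetical, classes are represented by S3 class vectors, and falling back to a less specific class of the first argument when the second argument finds no match is an assumption of this sketch rather than something the text specifies.

```r
# Hypothetical nested method table: one environment per argument level.
methods_tbl <- new.env(parent = emptyenv())

register <- function(tbl, sig, fun) {
  # Walk (and create) one nested environment per leading class in `sig`.
  for (cl in sig[-length(sig)]) {
    if (!exists(cl, envir = tbl, inherits = FALSE)) {
      assign(cl, new.env(parent = emptyenv()), envir = tbl)
    }
    tbl <- get(cl, envir = tbl, inherits = FALSE)
  }
  assign(sig[[length(sig)]], fun, envir = tbl)
}

lookup <- function(tbl, class_chains) {
  # `class_chains` is a list of class vectors, most specific class first.
  for (cl in class_chains[[1]]) {
    if (!exists(cl, envir = tbl, inherits = FALSE)) next
    found <- get(cl, envir = tbl, inherits = FALSE)
    if (length(class_chains) == 1) {
      if (is.function(found)) return(found)
    } else if (is.environment(found)) {
      hit <- lookup(found, class_chains[-1])
      if (!is.null(hit)) return(hit)
    }
  }
  NULL
}

register(methods_tbl, c("report", "website"), function(x, y) "report -> website")
register(methods_tbl, c("document", "printer"), function(x, y) "document -> printer")

x <- structure(list(), class = c("report", "document"))
y <- structure(list(), class = "printer")
result <- lookup(methods_tbl, list(class(x), class(y)))(x, y)
```

Here the first argument is tried as a report first; since no report method accepts a printer, the lookup falls back to the document level and finds the method registered there.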
Compatibility
Documentation
The primary challenge is that the availability of methods and classes depends on which packages are installed, so we will need to generate index pages that list the methods for a generic or the methods with a particular class in their signature. The documentation blurbs could be lazily scraped at runtime to build an index, similar to how the help search index is created now. We would then generate help pages from the index.
The index of installed methods could provide additional benefits, including introspection. A generic could prompt the user to load a package if it fails to find an applicable method, while there is an installed, unloaded method available. This case would arise when a method is defined outside of the packages that define the class and generic.