
proposal: spec: read-only types #22876

Open
@jba


I propose adding read-only types to Go. Read-only types have two related benefits:

  1. The compiler guarantees that values of read-only type cannot be changed, eliminating unintended modifications that can cause subtle bugs.
  2. Copying as a defense against modification can be reduced, improving efficiency.

An additional minor benefit is the ability to take the address of constants.

This proposal makes significant changes to the language, so it is intended for Go 2.

All new syntax in this proposal is provisional and subject to bikeshedding.

Basics

All types have one of two permissions: read-only or read-write. Permission is a property of types, but I sometimes write "read-only value" to mean a value of read-only type.

A type preceded by ro is a read-only type. The identifier ro is pronounced row. It is a keyword. There is no notation for the read-write permission; any type not marked with ro is read-write.

The ro modifier can be applied to slices, arrays, maps, pointers, structs, channels and interfaces. It cannot be applied to any other type, including a read-only type: ro ro T is illegal.

It is a compile-time error to

  • modify a value of read-only type,
  • pass a read-only slice as the first argument of append,
  • use slicing to extend the length of a read-only slice,
  • or send to or receive from a read-only channel.

A value of read-only type is not necessarily immutable, because it may also be referenced through another type that is not read-only.
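The aliasing caveat can be seen in today's Go, which has no ro keyword. In this sketch the hypothetical helper firstByte stands in for a function that would take an ro []byte under the proposal: it only reads its argument, yet it observes a write made through another reference to the same backing array.

```go
package main

import "fmt"

// firstByte (hypothetical helper) only reads its argument. Under the proposal
// its parameter could be declared ro []byte; here it is a plain []byte.
func firstByte(view []byte) byte {
	return view[0]
}

func main() {
	data := []byte("hello")
	view := data[:] // a second reference to the same backing array

	before := firstByte(view)
	data[0] = 'j' // write through the read-write reference
	after := firstByte(view)

	// The reader never modified view, but the value it sees has changed.
	fmt.Printf("%c %c\n", before, after) // h j
}
```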

Examples:

  1. A function can assert that it will not modify its argument.
func transmit(data ro []byte) { ... }

The compiler guarantees that the bytes of data will not be altered by transmit.

  2. A method can return an unexported field of its type without fear that it will be changed by the caller.
type BufferedReader struct {
  buf []byte
}

func (b *BufferedReader) Buffer() ro []byte {
  return b.buf
}
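Without ro, the usual defense in current Go is to return a copy. The sketch below contrasts the two options on a simplified stand-in for BufferedReader; the method names BufferAlias and BufferCopy are hypothetical. The copy is the cost the proposal aims to remove.

```go
package main

import "fmt"

// BufferedReader is a simplified stand-in for the type in the example above.
type BufferedReader struct {
	buf []byte
}

// BufferAlias (hypothetical) returns the internal slice; callers can corrupt it.
func (b *BufferedReader) BufferAlias() []byte {
	return b.buf
}

// BufferCopy (hypothetical) returns a defensive copy: safe, but it allocates and copies.
func (b *BufferedReader) BufferCopy() []byte {
	return append([]byte(nil), b.buf...)
}

func main() {
	r := &BufferedReader{buf: []byte("abc")}

	c := r.BufferCopy()
	c[0] = 'X' // modifies only the copy
	fmt.Println(string(r.buf)) // abc

	a := r.BufferAlias()
	a[0] = 'X' // modifies the reader's internal state
	fmt.Println(string(r.buf)) // Xbc
}
```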

This proposal is concerned exclusively with avoiding modifications to values, not variables. Thus it allows assignment to variables of read-only type.

var EOF ro error = errors.New("EOF")
...
EOF = nil

One could imagine a companion proposal that also used ro, but to restrict assignment:

ro var EOF = ... // cannot assign to EOF

I don't pursue that idea here.

Conversions

There is an automatic conversion from T to ro T. For instance, an actual parameter of type []int can be passed to a formal parameter of type ro []int. This conversion operates at any level: a [][]int can be converted to a []ro []int for example.

There is an automatic conversion from string to ro []byte. It does not apply to nested occurrences: there is no conversion from []string to []ro []byte, for example.

(Rationale: ro does not change the representation of a type, so there is no cost to adding ro to any type, at any depth. A constant-time change in representation is required to convert from string to ro []byte because the latter is one word larger. Applying this change to every element of a slice, array or map would require a complete copy.)

Transitivity

Permissions are transitive: a component retrieved from a read-only value is treated as read-only.

For example, consider var a ro []*int. It is not only illegal to assign to a[i]; it is also illegal to assign to *a[i].

Transitivity increases safety, and it can also simplify reasoning about read-only types. For example, what is the difference between ro *int and *ro int? With transitivity, the first is equivalent to ro *ro int, so the difference is just the permission of the full type.
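Current Go has no transitive protection, which is why defensive copies often have to be deep. This sketch shows that copying a []*int protects the slice elements but not the ints they point to; ro []*int covers both cases by forbidding assignment to *a[i] as well as to a[i].

```go
package main

import "fmt"

func main() {
	x, y := 1, 2
	orig := []*int{&x, &y}

	// A shallow defensive copy duplicates the pointers, not the pointees.
	shallow := append([]*int(nil), orig...)
	shallow[0] = nil // does not affect orig
	*shallow[1] = 20 // does affect orig: both slices point at y

	fmt.Println(orig[0] != nil, *orig[1]) // true 20
}
```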

The Address Operator

If v has type ro T, then &v has type *ro T.

If v has type T, then ro &v has type ro *T. This bit of syntax simplifies constructing read-only pointers to struct literals, like ro &S{a: 1, b: 2}.

Taking the address of constants is permitted, including constant literals. If c is a constant of type T, then &c is of type ro *T and is equivalent to

func() ro *T { v := c; return &v }()

Read-Only Interfaces

Any method of an interface may be preceded by ro. This indicates that the receiver of the method must have read-only type.

type S interface {
   ro Marshal() ([]byte, error)
   Unmarshal(ro []byte) error
}

If I is an interface type, then ro I is effectively the sub-interface that contains just the read-only methods of I. If type T implements I, then type ro T implements ro I.

Read-only interfaces can prevent code duplication that might otherwise result from the combination of read-only types and interfaces. Consider the following code from the sort package:

type Interface interface {
	Less(i, j int) bool
	Len() int
	Swap(i, j int)
}
 
func Sort(data Interface) {
	… code using Less, Len, and Swap …
}
 
func IsSorted(data Interface) bool {
	… code using only Less and Len …
}

type IntSlice []int
func (x IntSlice) Less(i, j int) bool { return x[i] < x[j] }
func (x IntSlice) Len() int { return len(x) }
func (x IntSlice) Swap(i, j int) { x[i], x[j] = x[j], x[i] }
 
func Ints(a []int) { // invoked as sort.Ints
	Sort(IntSlice(a))
}
 
func IntsAreSorted(a []int) bool {
	return IsSorted(IntSlice(a))
}

We would like to allow IntsAreSorted to accept a read-only slice, since it does not change its argument. But we cannot convert ro []int to IntSlice, because the Swap method modifies its receiver. It seems we must copy code somewhere.

The solution is to mark the first two methods of the interface as read-only:

type Interface interface {
	ro Less(i, j int) bool
	ro Len() int
	Swap(i, j int)
}

func (x ro IntSlice) Less(i, j int) bool { return x[i] < x[j] }
func (x ro IntSlice) Len() int { return len(x) }

Now we can write IsSorted in terms of the read-only sub-interface:

func IsSorted(data ro Interface) bool {
	… code using only Less and Len …
}

and call it on a read-only slice:

func IntsAreSorted(a ro []int) bool {
	return IsSorted(ro IntSlice(a))
}
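In current Go the closest approximation is to declare the read-only sub-interface by hand and have IsSorted accept it. The sketch below does that with the hypothetical names lessLen, isSorted and intsAreSorted. Nothing prevents an implementation of Less or Len from writing to its receiver, which is exactly the guarantee ro would add.

```go
package main

import "fmt"

// lessLen (hypothetical) plays the role of ro Interface: only the non-mutating methods.
type lessLen interface {
	Less(i, j int) bool
	Len() int
}

// Interface embeds the read-only part and adds the mutating method.
type Interface interface {
	lessLen
	Swap(i, j int)
}

type IntSlice []int

func (x IntSlice) Less(i, j int) bool { return x[i] < x[j] }
func (x IntSlice) Len() int           { return len(x) }
func (x IntSlice) Swap(i, j int)      { x[i], x[j] = x[j], x[i] }

// isSorted (hypothetical) needs only Less and Len, so it asks for the smaller interface.
func isSorted(data lessLen) bool {
	for i := data.Len() - 1; i > 0; i-- {
		if data.Less(i, i-1) {
			return false
		}
	}
	return true
}

// intsAreSorted (hypothetical) mirrors IntsAreSorted; its argument is still writable.
func intsAreSorted(a []int) bool { return isSorted(IntSlice(a)) }

func main() {
	var _ Interface = IntSlice(nil) // IntSlice still satisfies the full interface
	fmt.Println(intsAreSorted([]int{1, 2, 3}), intsAreSorted([]int{3, 1})) // true false
}
```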

Permission Genericity

One of the problems with read-only types is that they lead to duplicate functions. For example, consider this trivial function, ignoring its obvious problem with zero-length slices:

func tail1(x []int) []int { return x[1:] }

We cannot call tail1 on values of type ro []int, but we can take advantage of the automatic conversion to write

func tail2(x ro []int) ro []int { return x[1:] }

Thanks to the conversion from read-write to read-only types, tail2 can be passed an []int. But it loses type information, because the return type is always ro []int. So the first of these calls is legal but the second is not:

var a = []int{1,2,3}
a = tail1(a)
a = tail2(a) // illegal: attempt to assign ro []int to []int

If we had to write two variants of every function like this, the benefits of read-only types would be outweighed by the pain they cause.

To deal with this problem, most programming languages rely on overloading. If Go had overloading, we would name both of the above functions tail, and the compiler would choose which to call based on the argument type. But we do not want to add overloading to Go.

Instead, we can add generics to Go—but just for permissions. Hence permission genericity.

Any type inside a function, including a return type, may be preceded by ro? instead of ro. If ro? appears in a function, it must appear in the function's argument list.

A function with an ro? argument a must type-check in two ways:

  • a has type ro T and ro? is treated as ro.
  • a has type T and ro? is treated as absent.

In calls to a function with a return type ro? T, the effective return type is T if the ro? argument a is a read-write type, and ro T if a is a read-only type.

Here is tail using this feature:

func tail(x ro? []int) ro? []int { return x[1:] }

tail type-checks because:

  • With x declared as ro []int, the slice expression can be assigned to the effective return type ro []int.
  • With x declared as []int, the slice expression can be assigned to the effective return type []int.

This call succeeds because the effective return type of tail is ro []int when the argument is ro []int:

var a = ro []int{1,2,3}
a = tail(a)

This call also succeeds, because tail returns []int when its argument is []int:

var b = []int{1,2,3}
b = tail(b)

Multiple, independent permissions can be expressed by using ro?, ro??, etc. (If the only feasible type-checking algorithm is exponential, implementations may restrict the number of distinct ro?... forms in the same function to a reasonable maximum, like ten.)

In an interface declaration, ro? may be used before the method name to refer to the receiver.

type I interface {
  ro? Tail() ro? I
}

There are no automatic conversions from function signatures using ro? to signatures that do not use ro?. Such conversions can be written explicitly. Examples:

func tail(x ro? []int) ro? []int { return x[1:] }

var (
    f1 func(x ro? []int) ro? []int = tail  // legal: same type
    f2 func(ro []int) ro []int = tail      // illegal: attempted automatic conversion
    f3 = (func(ro []int) ro []int)(tail)   // legal: explicit conversion
)

Permission genericity can be implemented completely within the compiler. It requires no run-time support. A function annotated with ro? requires only a single implementation.

Strengths of This Proposal

Fewer Bugs

The use of ro should reduce the number of bugs where memory is inadvertently modified. There will be fewer race conditions where two goroutines modify the same memory. One goroutine can still modify the memory that another goroutine reads, so not all race conditions will be eliminated.

Less Copying

Returning a reference to a value's unexported state can safely be done without copying the state, as shown in Example 2 above.

Many functions take []byte arguments. Passing a string to such a function requires a copy. If the argument can be changed to ro []byte, the copy won't be necessary.

Clearer Documentation

Function documentation often promises that the function doesn't modify its argument, or asks the caller not to modify a return value. If ro arguments and return types are used, the compiler enforces those conditions, so they can be deleted from the documentation. Furthermore, readers know that in a well-designed function, a non-ro argument will be written on at least one code path.

Better Static Analysis Tools

Read-only annotations will make it easier for some tools to do their job. For example, consider a tool that checks whether a piece of memory is modified by a goroutine after it sends it on a channel, which may indicate a race condition. Of course if the value is itself read-only, there is nothing to do. But even if it isn't, the tool can do its job by checking for writes locally, and also observing that the value is passed to other functions only via read-only arguments. Without ro annotations, the check would be difficult (requiring examining the code of functions not in the current package) or impossible (if the call was through an interface).

Less Duplication in the Standard Library

Many functions in the standard library can be removed, or implemented as wrappers over other functions. Many of these involve the string and []byte types.

If the io.Writer.Write method's argument becomes read-only, then io.WriteString is no longer necessary.

Functions in the strings package that do not return strings can be eliminated if the corresponding bytes method uses ro. For example, strings.Index(string, string) int can be eliminated in favor of (or can trivially wrap) bytes.Index(ro []byte, ro []byte) int. This amounts to 18 functions (including Replacer.WriteString). Also, the strings.Reader type can be eliminated.

Functions that return string cannot be eliminated, but they can be implemented as wrappers around the corresponding bytes function. For example, bytes.ToLower would have the signature func ToLower(s ro? []byte) ro? []byte, and the strings version could look like

func ToLower(s string) string {
    return string(bytes.ToLower(s))
}

The conversion to string involves a copy, but the current strings.ToLower already contains a conversion from []byte to string, so there is no change in efficiency.

Not all strings functions can wrap a bytes function with no loss of efficiency. For instance, strings.TrimSpace currently does not copy, but wrapping it around bytes.TrimSpace would require a conversion from []byte to string.

Adding ro to the language without some sort of permission genericity would result in additional duplication in the bytes package, since functions that returned a []byte would need a corresponding function returning ro []byte. Permission genericity avoids this additional duplication, as described above.

Pointers to Literals

Sometimes it's useful to distinguish the absence of a value from the zero value. For example, in the original Google protobuf implementation (still used widely within Google), a primitive-typed field of a message may contain its default value, or may be absent.

The best translation of this feature into Go is to use pointers, so that, for example, an integer protobuf field maps to the Go type *int. That works well except for initialization: without pointers to literals, one must write

i := 3
m := &Message{I: &i}

or use a helper function.
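Such a helper is usually a one-liner. The sketch below assumes Go 1.18 or later for a generic version; Message and ptr are hypothetical names. Unlike the ro *int that &3 would produce under the proposal, the returned pointer is read-write.

```go
package main

import "fmt"

// Message (hypothetical) stands in for a generated protobuf struct with an optional field.
type Message struct {
	I *int // nil means "absent"; non-nil may still point at 0
}

// ptr (hypothetical) returns a pointer to a fresh copy of v, the same shape as
// the func() ro *T { v := c; return &v }() expansion, minus the ro.
func ptr[T any](v T) *T { return &v }

func main() {
	m := &Message{I: ptr(3)}
	absent := &Message{}
	fmt.Println(*m.I, absent.I == nil) // 3 true
}
```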

In Go as it currently stands, an expression like &3 cannot be permitted because assignment through the resulting pointer would be problematic. But if we stipulate that &3 has type ro *int, then assignment is impossible and the problem goes away.

Weaknesses of This Proposal

Loss of Generality

Having both T and ro T in the language reduces the opportunities for writing general code. For example, an interface method with a []int parameter cannot be satisfied by a concrete method that takes ro []int. A function variable of type func() ro []int cannot be assigned a function of type func() []int. Supporting these cases would start Go down the road of covariance/contravariance, which would be another large change to the language.

Problems Going from string to ro []byte

When we change an argument from string to ro []byte, we may eliminate copying at the call site, but it can reappear elsewhere because the guarantee is weaker: the argument is no longer immutable, so it is subject to change by code outside the function. For example, os.Open returns an error that contains the filename. If the filename were not immutable, it would have to be copied into the error message. Data structures like caches that need to remember their methods' arguments would also have to copy.
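The os.Open case can be sketched with a hypothetical error constructor. If the name arrives as a []byte (mutable, or read-only but still aliased elsewhere), converting it to string at construction time copies it, so later changes by the caller cannot alter the stored message. With a string parameter, no copy is needed because strings are immutable.

```go
package main

import "fmt"

// openError (hypothetical) is an error that remembers a file name.
type openError struct {
	name string
}

func (e *openError) Error() string { return "open " + e.name + ": no such file" }

// newOpenError (hypothetical) must copy: the caller may still hold a writable reference to name.
func newOpenError(name []byte) error {
	return &openError{name: string(name)} // the string conversion copies the bytes
}

func main() {
	name := []byte("a.txt")
	err := newOpenError(name)
	name[0] = 'b' // the caller reuses its buffer
	fmt.Println(err) // open a.txt: no such file
}
```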

Also, replacing string with ro []byte would mean that implementers could no longer compare via operators, range over Unicode runes, or use values as map keys.

Subsumed by Generics

Permission genericity could be subsumed by a suitably general design for generics. No such design for Go exists today. All known constraints on generic types use interfaces to express that satisfying types must provide all the interface's methods. The only other form of constraint is syntactic: for instance, one can write []T, where T is a generic type variable, enforcing that only slice types can match. What is needed is a constraint of the form "T is either []S or ro []S", that is, permission genericity. A generics proposal that included permissions would probably drop the syntax of this proposal and use identifiers for permissions, e.g.

gen <T, perm Ro> func tail(x Ro []T) Ro []T { return x[1:] }
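Go later gained type parameters (Go 1.18). They capture the "same type in, same type out" half of ro? but have no permission constraint. In this sketch the constraint S ~[]E makes tail return whatever slice type it was given, much as ro? []int yields ro []int for an ro []int argument. The named type IntView is hypothetical and only stands in for a read-only slice; it is not actually protected against writes.

```go
package main

import "fmt"

// IntView (hypothetical) is a named slice type standing in for ro []int.
type IntView []int

// tail returns x without its first element, preserving x's exact type.
// Like the proposal's example, it assumes len(x) > 0.
func tail[S ~[]E, E any](x S) S { return x[1:] }

func main() {
	a := []int{1, 2, 3}
	a = tail(a) // a stays []int

	v := IntView{4, 5, 6}
	v = tail(v) // v stays IntView, no conversion needed

	fmt.Println(a, v) // [2 3] [5 6]
}
```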

Missing Immutability

This proposal lacks a permission for immutability. Such a permission has obvious charms: immutable values are goroutine-safe, and conversion between strings and immutable byte slices would work in both directions.

The problem is how to construct immutable values. Literals of immutable type would only get one so far. For example, how could a program construct an immutable slice of the first N primes, where N is a parameter? The two easy answers—deep copying, or letting the programmer assert immutability—are both unpalatable. Other solutions exist, but they would require additional features on top of this proposal. Simply adding an im keyword would not be enough.

Does Not Prevent Data Races

A value cannot be modified through a read-only reference, but there may be other references to it that can be modified concurrently. So this proposal prevents some but not all data races. Modern languages like Rust, Pony and Midori have shown that it is possible to eliminate all data races at compile time. But the cost in complexity is high, and the value unclear—there would still be many opportunities for race conditions. If Go wanted to explore this route, I would argue that the current proposal is a good starting point.

References

Brad Fitzpatrick's read-only slice proposal

Russ Cox's evaluation of the proposal. This document identifies the problem with the sort package discussed above, and raises the problem of loss of generality as well as the issues that arise in moving from string to ro []byte.

Discussion on golang-dev
