proposal: spec: read-only types

I propose adding read-only types to Go. Read-only types have two related benefits:

1. The compiler guarantees that values of read-only type cannot be changed, eliminating unintended modifications that can cause subtle bugs. 
2. Copying as a defense against modification can be reduced, improving efficiency.

An additional minor benefit is the ability to take the address of constants.

This proposal makes significant changes to the language, so it is intended for Go 2.

All new syntax in this proposal is provisional and subject to bikeshedding.

**Basics**

All types have one of two _permissions_: read-only or read-write. Permission is a property of types, but I sometimes write "read-only value" to mean a value of read-only type.

A type preceded by `ro` is a read-only type. The identifier `ro` is pronounced _row_. It is a keyword. There is no notation for the read-write permission; any type not marked with `ro` is read-write.

The `ro` modifier can be applied to slices, arrays, maps, pointers, structs, channels and interfaces. It cannot be applied to any other type, including a read-only type: `ro ro T` is illegal.

It is a compile-time error to
* modify a value of read-only type,
* pass a read-only slice as the first argument of `append`,
* use slicing to extend the length of a read-only slice,
* or send to or receive from a read-only channel.

A value of read-only type is not necessarily immutable, because it may also be referenced through another type that is not read-only.
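The difference between read-only and immutable can be approximated in today's Go with a hand-written view type. The sketch below is only an emulation: `roBytes` and `view` are hypothetical names, not part of this proposal. The view offers no way to write, yet its contents change when the original slice is written.

```go
package main

import "fmt"

// roBytes is a hypothetical read-only view of a byte slice.
// It emulates `ro []byte` by exposing no mutating methods.
type roBytes struct{ b []byte }

func (r roBytes) Len() int      { return len(r.b) }
func (r roBytes) At(i int) byte { return r.b[i] }

// view wraps a slice without copying it, so the view aliases b.
func view(b []byte) roBytes { return roBytes{b} }

func main() {
	data := []byte("abc")
	v := view(data)

	before := v.At(0)
	data[0] = 'X' // write through the read-write reference
	after := v.At(0)

	fmt.Printf("%c %c\n", before, after) // prints "a X"
}
```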

Examples:

1. A function can assert that it will not modify its argument.
  ```
  func transmit(data ro []byte) { ... }
  ```
  The compiler guarantees that the bytes of `data` will not be altered by `transmit`.

2. A method can return an unexported field of its type without fear that it will be changed by the caller.
  ```
  type BufferedReader struct {
    buf []byte
  }
  
  func (b *BufferedReader) Buffer() ro []byte {
    return b.buf
  }
  ```
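For comparison, the usual way to get the same protection in today's Go is to return a copy. The sketch below reuses the `BufferedReader` type from the example. It pays for an allocation and a copy on every call, which is the cost a read-only return type would avoid.

```go
package main

import "fmt"

type BufferedReader struct {
	buf []byte
}

// Buffer returns a copy of the internal buffer, so callers cannot
// modify the reader's state through the result.
func (b *BufferedReader) Buffer() []byte {
	out := make([]byte, len(b.buf))
	copy(out, b.buf)
	return out
}

func main() {
	r := &BufferedReader{buf: []byte("data")}
	got := r.Buffer()
	got[0] = 'X'               // modifies only the copy
	fmt.Println(string(r.buf)) // prints "data"
}
```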

This proposal is concerned exclusively with avoiding modifications to _values_, not _variables_. Thus it allows assignment to variables of read-only type.
```
var EOF ro error = errors.New("EOF")
...
EOF = nil
```
One could imagine a companion proposal that also used `ro`, but to restrict assignment:
```
ro var EOF = ... // cannot assign to EOF
```
I don't pursue that idea here.

**Conversions**

There is an automatic conversion from `T` to `ro T`. For instance, an actual parameter of type `[]int` can be passed to a formal parameter of type `ro []int`. This conversion operates at any level: a `[][]int` can be converted to a `[]ro []int` for example. 

There is an automatic conversion from `string` to `ro []byte`. It does not apply to nested occurrences: there is no conversion from `[]string` to `[]ro []byte`, for example.

(Rationale: `ro` does not change the representation of a type, so there is no cost to adding `ro` to any type, at any depth. A constant-time change in representation is required to convert from `string` to `ro []byte` because the latter is one word larger. Applying this change to every element of a slice, array or map would require a complete copy.)
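The size difference mentioned in the rationale can be observed in today's Go. This is a small sketch, assuming the standard gc toolchain, where a string header is a pointer and a length and a slice header adds a capacity. The helper name `headerWords` is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports the size of a string header and of a []byte
// header, measured in machine words.
func headerWords() (str, slice uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	str, slice := headerWords()
	fmt.Println(str, slice) // 2 and 3 with the gc toolchain
}
```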

**Transitivity**

Permissions are transitive: a component retrieved from a read-only value is treated as read-only.

For example, consider `var a ro []*int`. It is not only illegal to assign to `a[i]`; it is also illegal to assign to `*a[i]`. 

Transitivity increases safety, and it can also simplify reasoning about read-only types. For example, what is the difference between `ro *int` and `*ro int`? With transitivity, the first is equivalent to `ro *ro int`, so the difference is just the permission of the full type.

**The Address Operator**

If `v` has type `ro T`, then `&v` has type `*ro T`. 

If `v` has type `T`, then `ro &v` has type `ro *T`. This bit of syntax simplifies constructing read-only pointers to struct literals, like `ro &S{a: 1, b: 2}`.

Taking the address of constants is permitted, including constant literals. If `c` is a constant of type `T`, then `&c` is of type `ro *T` and is equivalent to
```
func() ro *T { v := c; return &v }()
```

**Read-Only Interfaces**

Any method of an interface may be preceded by `ro`. This indicates that the receiver of the method must have read-only type. 

```
type S interface {
   ro Marshal() ([]byte, error)
   Unmarshal(ro []byte) error
}
```

If `I` is an interface type, then `ro I` is effectively the sub-interface that contains just the read-only methods of `I`. If type `T` implements `I`, then type `ro T` implements `ro I`.

Read-only interfaces can prevent code duplication that might otherwise result from the combination of read-only types and interfaces. Consider the following code from the `sort` package:
```
type Interface interface {
	Less(i, j int) bool
	Len() int
	Swap(i, j int)
}
 
func Sort(data Interface) {
	… code using Less, Len, and Swap …
}
 
func IsSorted(data Interface) bool {
	… code using only Less and Len …
}

type IntSlice []int
func (x IntSlice) Less(i, j int) bool { return x[i] < x[j] }
func (x IntSlice) Len() int { return len(x) }
func (x IntSlice) Swap(i, j int) { x[i], x[j] = x[j], x[i] }
 
func Ints(a []int) { // invoked as sort.Ints
	Sort(IntSlice(a))
}
 
func IntsAreSorted(a []int) bool {
	return IsSorted(IntSlice(a))
}
```
We would like to allow `IntsAreSorted` to accept a read-only slice, since it does not change its argument. But we cannot convert `ro []int` to `IntSlice`, because the `Swap` method modifies its receiver. It seems we must copy code somewhere.

The solution is to mark the first two methods of the interface as read-only:
```
type Interface interface {
	ro Less(i, j int) bool
	ro Len() int
	Swap(i, j int)
}

func (x ro IntSlice) Less(i, j int) bool { return x[i] < x[j] }
func (x ro IntSlice) Len() int { return len(x) }
```
Now we can write `IsSorted` in terms of the read-only sub-interface:
```
func IsSorted(data ro Interface) bool {
	… code using only Less and Len …
}
```
and call it on a read-only slice:
```
func IntsAreSorted(a ro []int) bool {
	return IsSorted(ro IntSlice(a))
}
```
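In today's Go, the closest approximation is to split the interface by hand, which is roughly what `ro Interface` would derive automatically. In the sketch below, `ReadOnlyInterface` is a hypothetical name, and the loop in `IsSorted` is one simple way to fill in the elided body. Note that nothing here prevents a method from writing to its receiver; only the method set is narrowed.

```go
package main

import "fmt"

// ReadOnlyInterface is a hypothetical hand-written stand-in for
// `ro Interface`: only the methods that do not modify the receiver.
type ReadOnlyInterface interface {
	Less(i, j int) bool
	Len() int
}

// Interface adds the mutating method.
type Interface interface {
	ReadOnlyInterface
	Swap(i, j int)
}

type IntSlice []int

func (x IntSlice) Less(i, j int) bool { return x[i] < x[j] }
func (x IntSlice) Len() int           { return len(x) }
func (x IntSlice) Swap(i, j int)      { x[i], x[j] = x[j], x[i] }

// IsSorted needs only the read-only methods.
func IsSorted(data ReadOnlyInterface) bool {
	for i := data.Len() - 1; i > 0; i-- {
		if data.Less(i, i-1) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(IsSorted(IntSlice{1, 2, 3})) // true
	fmt.Println(IsSorted(IntSlice{3, 1, 2})) // false
}
```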

**Permission Genericity**

One of the problems with read-only types is that they lead to duplicate functions. For example, consider this trivial function, ignoring its obvious problem with zero-length slices:
```
func tail1(x []int) []int { return x[1:] }
```
We cannot call `tail1` on values of type `ro []int`, but we can take advantage of the automatic conversion to write
```
func tail2(x ro []int) ro []int { return x[1:] }
```
Thanks to the conversion from read-write to read-only types, `tail2` can be passed an `[]int`. But it loses type information, because the return type is always `ro []int`. So the first of these calls is legal but the second is not:
```
var a = []int{1,2,3}
a = tail1(a)
a = tail2(a) // illegal: attempt to assign ro []int to []int
```
If we had to write two variants of every function like this, the benefits of read-only types would be outweighed by the pain they cause.

To deal with this problem, most programming languages rely on overloading. If Go had overloading, we would name both of the above functions `tail`, and the compiler would choose which to call based on the argument type. But we do not want to add overloading to Go.

Instead, we can add generics to Go&mdash;but just for permissions. Hence _permission genericity_.

Any type inside a function, including a return type, may be preceded by `ro?` instead of `ro`. If `ro?` appears in a function, it must appear in the function's argument list.

A function with an `ro?` argument `a` must type-check in two ways:
* `a` has type `ro T` and `ro?` is treated as `ro`.
* `a` has type `T` and `ro?` is treated as absent.

In calls to a function with a return type `ro? T`, the effective return type is `T` if the `ro?` argument `a` is a read-write type, and `ro T` if `a` is a read-only type.

Here is `tail` using this feature:
```
func tail(x ro? []int) ro? []int { return x[1:] }
```
`tail` type-checks because:
* With `x` declared as `ro []int`, the slice expression can be assigned to the effective return type `ro []int`.
* With `x` declared as `[]int`, the slice expression can be assigned to the effective return type `[]int`.

This call succeeds because the effective return type of `tail` is `ro []int` when the argument is `ro []int`:
```
var a = ro []int{1,2,3}
a = tail(a)
```
This call also succeeds, because `tail` returns `[]int` when its argument is `[]int`:
```
var b = []int{1,2,3}
b = tail(b)
```

Multiple, independent permissions can be expressed by using `ro?`, `ro??`, etc. (If the only feasible type-checking algorithm is exponential, implementations may restrict the number of distinct `ro?...` forms in the same function to a reasonable maximum, like ten.)

In an interface declaration, `ro?` may be used before the method name to refer to the receiver.
```
type I interface {
  ro? Tail() ro? I
}
```
  
There are no automatic conversions from function signatures using `ro?` to signatures that do not use `ro?`. Such conversions can be written explicitly. Examples:
```
func tail(x ro? []int) ro? []int { return x[1:] }

var (
    f1 func(x ro? []int) ro? []int = tail  // legal: same type
    f2 func(ro []int) ro []int = tail      // illegal: attempted automatic conversion
    f3 = (func(ro []int) ro []int)(tail)   // legal: explicit conversion
)
```

Permission genericity can be implemented completely within the compiler. It requires no run-time support. A function annotated with `ro?` requires only a single implementation.

**Strengths of This Proposal**

***Fewer Bugs***

The use of `ro` should reduce the number of bugs where memory is inadvertently modified. There will be fewer race conditions where two goroutines modify the same memory. One goroutine can still modify the memory that another goroutine reads, so not all race conditions will be eliminated.

***Less Copying***

Returning a reference to a value's unexported state can safely be done without copying the state, as shown in Example 2 above.

Many functions take `[]byte` arguments. Passing a string to such a function requires a copy. If the argument can be changed to `ro []byte`, the copy won't be necessary.
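The copy is visible in today's Go: the result of converting a string to `[]byte` must not alias the string, because the slice is writable. In this sketch, `checksum` is a hypothetical function standing in for any API that only reads its `[]byte` argument.

```go
package main

import "fmt"

// checksum only reads p, but because it takes []byte a caller that
// holds a string must convert, and the conversion copies the bytes.
func checksum(p []byte) (sum int) {
	for _, c := range p {
		sum += int(c)
	}
	return sum
}

func main() {
	s := "hello"
	b := []byte(s) // copies the bytes of s
	b[0] = 'j'
	fmt.Println(s, string(b))        // prints "hello jello"
	fmt.Println(checksum([]byte(s))) // prints 532
}
```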

***Clearer Documentation***

Function documentation often states conditions that promise that the function doesn't modify its argument, or that extract a promise from the caller not to modify a return value. If `ro` arguments and return types are used, those conditions are enforced by the compiler, so they can be deleted from the documentation. Furthermore, readers know that in a well-designed function, a non-`ro` argument will be written along at least one code path.

***Better Static Analysis Tools***

Read-only annotations will make it easier for some tools to do their job. For example, consider a tool that checks whether a piece of memory is modified by a goroutine after it sends it on a channel, which may indicate a race condition. Of course if the value is itself read-only, there is nothing to do. But even if it isn't, the tool can do its job by checking for writes locally, and also observing that the value is passed to other functions only via read-only arguments. Without `ro` annotations, the check would be difficult (requiring examining the code of functions not in the current package) or impossible (if the call was through an interface).

***Less Duplication in the Standard Library***

Many functions in the standard library can be removed, or implemented as wrappers over other functions. Many of these involve the `string` and `[]byte` types.

If the `io.Writer.Write` method's argument becomes read-only, then `io.WriteString` is no longer necessary.

Functions in the `strings` package that do not return strings can be eliminated if the corresponding `bytes` function uses `ro`. For example, `strings.Index(string, string) int` can be eliminated in favor of (or can trivially wrap) `bytes.Index(ro []byte, ro []byte) int`. This amounts to 18 functions (including `Replacer.WriteString`). Also, the `strings.Reader` type can be eliminated.

Functions that return `string` cannot be eliminated, but they can be implemented as wrappers around the corresponding `bytes` function. For example, `bytes.ToLower` would have the signature `func ToLower(s ro? []byte) ro? []byte`, and the `strings` version could look like
```
func ToLower(s string) string {
    return string(bytes.ToLower(s))
}
```
The conversion to `string` involves a copy, but `strings.ToLower` already contains a conversion from `[]byte` to `string`, so there is no change in efficiency.

Not all `strings` functions can wrap a `bytes` function with no loss of efficiency. For instance, `strings.TrimSpace` currently does not copy, but wrapping it around `bytes.TrimSpace` would require a conversion from `[]byte` to `string`.

Adding `ro` to the language without some sort of permission genericity would result in additional duplication in the `bytes` package, since functions that returned a `[]byte` would need a corresponding function returning `ro []byte`. Permission genericity avoids this additional duplication, as described above.

***Pointers to Literals***

Sometimes it's useful to distinguish the absence of a value from the zero value. For example, in the original Google protobuf implementation (still used widely within Google), a primitive-typed field of a message may contain its default value, or may be absent. 

The best translation of this feature into Go is to use pointers, so that, for example, an integer protobuf field maps to the Go type `*int`. That works well except for initialization: without pointers to literals, one must write
```
i := 3
m := &Message{I: &i}
```
or use a helper function.
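A sketch of such a helper, written with type parameters (which Go gained in version 1.18, after this proposal was written). The names `ptr` and `Message` are hypothetical. Unlike the `ro *int` proposed here, the pointer returned by the helper is read-write.

```go
package main

import "fmt"

// Message stands in for a generated protobuf struct in which a nil
// pointer means that the field is absent.
type Message struct {
	I *int
}

// ptr returns a pointer to a fresh copy of v.
func ptr[T any](v T) *T { return &v }

func main() {
	m := &Message{I: ptr(3)}
	fmt.Println(*m.I)                  // prints 3
	fmt.Println((&Message{}).I == nil) // prints true: the field is absent
}
```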

In Go as it currently stands, an expression like `&3` cannot be permitted because assignment through the resulting pointer would be problematic. But if we stipulate that `&3` has type `ro *int`, then assignment is impossible and the problem goes away.

**Weaknesses of This Proposal**

***Loss of Generality***

Having both `T` and `ro T` in the language reduces the opportunities for writing general code. For example, an interface method with a `[]int` parameter cannot be satisfied by a concrete method that takes `ro []int`. A function variable of type `func() ro []int` cannot be assigned a function of type `func() []int`. Supporting these cases would start Go down the road of covariance/contravariance, which would be another large change to the language.

***Problems Going from `string` to `ro []byte`***

When we change an argument from `string` to `ro []byte`, we may eliminate copying at the call site, but it can reappear elsewhere because the guarantee is weaker: the argument is no longer immutable, so it is subject to change by code outside the function. For example, `os.Open` returns an error that contains the filename. If the filename were not immutable, it would have to be copied into the error message. Data structures like caches that need to remember their methods' arguments would also have to copy.
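The cache case can be illustrated in today's Go with an ordinary `[]byte` argument, which has the same aliasing behavior a `ro []byte` argument would have. The type `cache` and its methods are hypothetical names used only for this sketch.

```go
package main

import "fmt"

// cache remembers the most recent key it was given.
type cache struct{ last []byte }

// rememberAlias stores the caller's slice directly.
func (c *cache) rememberAlias(key []byte) { c.last = key }

// rememberCopy stores a private copy, which is what a method taking
// a read-only (but not immutable) argument would have to do.
func (c *cache) rememberCopy(key []byte) {
	c.last = append([]byte(nil), key...)
}

func main() {
	key := []byte("a.txt")
	var c1, c2 cache
	c1.rememberAlias(key)
	c2.rememberCopy(key)

	key[0] = 'b' // the caller reuses its buffer

	fmt.Println(string(c1.last), string(c2.last)) // prints "b.txt a.txt"
}
```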

Also, replacing `string` with `ro []byte` would mean that implementers could no longer compare via operators, range over Unicode runes, or use values as map keys. 

***Subsumed by Generics***

Permission genericity could be subsumed by a suitably general design for generics. No such design for Go exists today. All known constraints on generic types use interfaces to express that satisfying types must provide all the interface's methods. The only other form of constraint is syntactic: for instance, one can write `[]T`, where `T` is a generic type variable, enforcing that only slice types can match. What is needed is a constraint of the form "`T` is either `[]S` or `ro []S`", that is, permission genericity. A generics proposal that included permissions would probably drop the syntax of this proposal and use identifiers for permissions, e.g.
```
gen <T, perm Ro> func tail(x Ro []T) Ro []T { return x[1:] }
```
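Go has since gained type parameters (in version 1.18), though not with a permission constraint of this kind. As a rough analogy only, the sketch below emulates a permission variable with an otherwise unused ("phantom") type parameter on a wrapper type. Every name in it (`RW`, `RO`, `Ints`, `Freeze`, and so on) is hypothetical, and the wrapper is an emulation rather than what this proposal specifies.

```go
package main

import "fmt"

// RW and RO are hypothetical marker types standing in for permissions.
type (
	RW struct{}
	RO struct{}
)

// Ints wraps a []int. The type parameter P records the permission
// and is otherwise unused.
type Ints[P any] struct{ s []int }

func New(v ...int) Ints[RW] { return Ints[RW]{v} }

// Freeze converts read-write to read-only without copying, like the
// conversion from T to ro T.
func Freeze(x Ints[RW]) Ints[RO] { return Ints[RO]{x.s} }

// At and Tail accept either permission; Tail preserves it.
func At[P any](x Ints[P], i int) int { return x.s[i] }
func Tail[P any](x Ints[P]) Ints[P]  { return Ints[P]{x.s[1:]} }

// Set accepts only the read-write permission.
func Set(x Ints[RW], i, v int) { x.s[i] = v }

func main() {
	a := New(1, 2, 3)
	a = Tail(a) // still Ints[RW]
	Set(a, 0, 20)

	r := Freeze(a)
	r = Tail(r) // still Ints[RO]
	// Set(r, 0, 9) // does not compile: Ints[RO] is not Ints[RW]

	fmt.Println(At(a, 0), At(r, 0)) // prints "20 3"
}
```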

***Missing Immutability***

This proposal lacks a permission for immutability. Such a permission has obvious charms: immutable values are goroutine-safe, and conversion between strings and immutable byte slices would work in both directions. 

The problem is how to construct immutable values. Literals of immutable type would only get one so far. For example, how could a program construct an immutable slice of the first N primes, where N is a parameter? The two easy answers, deep copying or [letting the programmer assert immutability](https://dlang.org/spec/const3.html#creating_immutable_data), are both unpalatable. Other solutions exist, but they would require additional features on top of this proposal. Simply adding an `im` keyword would not be enough.

***Does Not Prevent Data Races***

A value cannot be modified through a read-only reference, but there may be other references to it that can be modified concurrently. So this proposal prevents some but not all data races. Modern languages like [Rust](https://www.rust-lang.org), [Pony](https://www.ponylang.org) and [Midori](http://joeduffyblog.com/2015/11/03/blogging-about-midori) have shown that it is possible to eliminate all data races at compile time. But the cost in complexity is high, and the value unclear&mdash;there would still be many opportunities for race conditions. If Go wanted to explore this route, I would argue that the current proposal is a good starting point.

**References**

[Brad Fitzpatrick's read-only slice proposal](https://docs.google.com/document/d/1UKu_do3FRvfeN5Bb1RxLohV-zBOJWTzX0E8ZU1bkqX0/edit#heading=h.2wzvdd6vdi83)

[Russ Cox's evaluation of the proposal](https://docs.google.com/document/d/1-NzIYu0qnnsshMBpMPmuO21qd8unlimHgKjRD9qwp2A/edit). This document identifies the problem with the `sort` package discussed above, and raises the problem of loss of generality as well as the issues that arise in moving from `string` to `ro []byte`.

[Discussion on golang-dev](https://groups.google.com/d/topic/golang-dev/Y7j4B2r_eDw/discussion)

proposal: spec: read-only types #22876
