Skip to content

'isolated' data for Nim #244

@Araq

Description

@Araq

Proposal to add the concept of 'isolated' data to Nim

This is the evolution of owned ref. A new name, Isolated was chosen
in order to avoid confusions and also because it can be done almost entirely
as a library without bloating Nim's core type system further.

Motivation

We want to be able to pass subgraphs to threads, safely and easily. All
data races should be prevented at compile-time. We are not willing to pay
the price of atomic reference counting in order to do so. We also seek to
avoid Rust's and C++'s many different pointer types. Nim uses ref everywhere
and ref should remain the default pointer type to use. It is safe, efficient
and Nim's optimizers understand it well.

Yet, ref has no concept of unique ownership which is required for effective
message passing without copies. Hence we wrap it inside an Isolated[T]:

type
  Isolated*[T] = object ## Isolated data can only be moved, not copied.
    value: T

proc `=`*[T](dest: var Isolated[T]; src: Isolated[T]) {.error.}

proc `=sink`*[T](dest: var Isolated[T]; src: Isolated[T]) {.inline.} =
  # delegate to value's sink operation
  `=sink`(dest.value, src.value)

proc `=destroy`*[T](dest: var Isolated[T]) {.inline.} =
  # delegate to value's destroy operation
  `=destroy`(dest.value)

Isolated[T] is what a channel should use, comparable to
Rust's "sendable" trait:

proc send*[T](c: var Channel[T]; msg: sink Isolated[T])
proc recv*[T](c: var Channel[T]): T
  ## Note: Returns T, not Isolated[T] for convenience.

proc recvIso*[T](c: var Channel[T]): Isolated[T]
  ## remembers the data is Isolated[T].

How to construct isolated subgraphs

Construction must ensure that the invariant holds, namely that the wrapped T
is free of external aliases into it. To ensure it, we propose a Pony-inspired
recover construct, but named isolate for clarity:

func isolate(x: sink T): Isolated[T] {.magic: "Isolate".}

As you can see, this is a new builtin because the check it performs on x is non-trivial:

If T does not contain a ref or closure type, it is isolated. Else the syntactic structure of x is analyzed:

  • Atoms like nil, 4, "abc" are isolated.
  • An array constructor [x...] is isolated if every element x is isolated.
  • An object constructor Obj(x...) is isolated if every element x is isolated.
  • An if or case expression is isolated if all possible values the expression
    may return are isolated.
  • A type conversion C(x) is isolated if x is isolated. Analogous for cast
    expressions.
  • A field access x.field is isolated if x is isolated.
  • An array/seq access x[i] is not isolated if x is isolated as otherwise
    code like a.send x[i]; b.send x[i] (send to two different channels) might
    compile.
  • A function call f(x...) is isolated if f is .noSideEffect and for every argument x:
    • x is isolated or
    • f's return type cannot alias x's type. This is checked via a form of alias analysis as explained in the next paragraph.

Note: Previously the spec said that f must be .gcsafe, this is not sufficient, we cannot guarantee isolation for a .threadvar location.

Alias analysis

We start with an important, simple case that must be valid: Sending the result
of parseJson to a channel. Since the signature
is func parseJson(input: string): JsonNode it is easy to see that JsonNode
can never simply be a view into input which is a string.

A different case is the identity function id, send id(myJsonGraph) must be
invalid because we do not know how many aliases into myJsonGraph exist
elsewhere.

In general type A can alias type T if:

  • A and T are the same types.
  • A is a distinct type derived from T.
  • A is a field inside T if T is a final object type.
  • T is an inheritable object type. (An inherited type could always contain
    a field: A).
  • T is a closure type. Reason: T's environment can contain a field of
    type A.
  • A is the element type of T if T is an array, sequence or pointer type.

These rules ensure the freedom of potential data races but they can be quite
limiting. It remains to be seen if they suffice in practice. Pony's recover
is actually not a function, but a block of code. It is unclear at this point
if that really adds expressivity or not over this proposed builtin function.

Sugar

We expect the pattern send(isolate(f())) to be very common so we add a
template overload to the channel:

template send*[T](c: var Channel[T]; msg: sink T) =
  c.send(isolate(msg))

Example: send JsonNodes to a worker thread

var ch: Channel[JsonNode]
open(ch)

proc worker {.thread.} =
  while true:
    let t = ch.recv()
    if t == nil: break
    download(t["url"])

var thr: Thread[void]
createThread(thr, worker)

proc prepareTasks(fileWithUrls: string): seq[Isolated[JsonNode]] =
  result = @[]
  for line in lines(fileWithUrls):
    result.add isolate(parseJson(line))

proc spawnCrawlers =
  var tasks = prepareTasks("todo_urls.txt")
  for t in mitems tasks:
    ch.send move t

Next steps

There is now an implementation available in std/isolation. The channel type should use Isolated[T] for ARC/ORC and we need to see how it works.

Future directions

Isolated graphs can also be checked for isolation at runtime via something like func assertIsolated[T](graph: sink T): Isolated[T]. This would use ORC's mechanism for graph traversal and the involved reference counts to determine that no external pointers into graph exist. Since this is a runtime check and not a compile-time guarantee, we will first do without such a mechanism.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions