Bug description
I am trying to get a singleton scheduled on a cluster. I am creating a persistent scheduler (backed by a database), to allow grains to schedule messages at a later date, and the cluster to survive node outages, including full cluster outages.
When experimenting with it, I often notice that my cluster is left without the singleton running. It could very well be that I am doing something wrong when initializing the singleton.
It seems this is a composition of several bugs on top of each other, so I am just going to write out what I was trying and what I observed. Hopefully you can make something out of that :D If not, let me know, and I will see if I can write small test applications :) But first, the main issue I was running into.
How to reproduce it?
Run two nodes in a cluster (with a quorum of 2 and relocation enabled), with the following snippet on startup:
n.actorSystem.SpawnSingleton(ctx, "scheduler", scheduler.New())
Now when I shut down the node running the scheduler, I get this message:
{"level":"error","ts":"2025-12-27T10:00:14.898023+0100","caller":"actor/relocator.go:106","msg":"cluster rebalancing failed: spawn error: singleton already exists"}
And the cluster is left without a scheduler running.
Expected behavior
Always have the singleton running somewhere on the cluster.
Library Version:
- Go-Akt version: 1b11cfd
- Go version: 1.25.3
Additional context
Initially I was running my cluster with WithoutRelocation. In my use case, I don't need grains to relocate when a node is shut down. They will just spin up a new instance on the next invocation. There is no need for migration; it only wastes CPU.
However, for singletons, if the leader goes away, the singleton isn't restarted on the new leader either. That surprised me somewhat: even though I use WithoutRelocation, I expected singletons to be the exception here. But this was an assumption on my side :D
So now the question becomes: how do I keep a singleton running on a cluster with WithoutRelocation? For grains, it is exactly what I want. Singletons, however, are the exception in my use case. I also can't specify, when using client.TellGrain, that the grain shouldn't be relocated on node issues.
Another thing I ran into, but this really is a "me" problem: I had a hard time getting singletons working. Initially I just called n.actorSystem.SpawnSingleton(ctx, "scheduler", scheduler.New()) when each node starts. This errors on the second node (correctly, if you ask me). The error, however, was difficult to deal with: internal: actor=(scheduler) actor already exists. This is not singleton already exists, so I couldn't differentiate between "there is another error" and "this singleton already started".
In the end I just added an IsLeader check in front of the SpawnSingleton, once I figured out that "oldest node" means "is-leader" :) So my code became:
if isLeader, _ := n.actorSystem.IsLeader(ctx); isLeader {
	n.actorSystem.SpawnSingleton(ctx, "scheduler", scheduler.New())
}
But there might be better ways to ensure a singleton is running on start of a cluster?