Process Groups

Process groups let processes join named groups and receive broadcasts addressed to a group, with membership tracked across every node in the cluster. The model follows Erlang/OTP pg: groups are created on first join, a process can belong to many groups (and can join the same group more than once), and there is no central registry; each node maintains its own state and reconciles with peers through gossip.
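
The pg-style join/leave semantics can be illustrated with a map from group name to a member list that holds one entry per join. This is a minimal sketch, not the scope's actual data structure: the PID type and the Groups, Join and Leave names are hypothetical, and deleting a group when its last member leaves is an assumption carried over from Erlang/OTP pg.

```go
package main

import "fmt"

// PID is a hypothetical stand-in for a process identifier.
type PID string

// Groups maps a group name to its members. A PID appears once per join,
// so a process that joined twice must leave twice before it is gone.
type Groups map[string][]PID

// Join appends pid to group, creating the group on first join.
func (g Groups) Join(group string, pid PID) {
	g[group] = append(g[group], pid)
}

// Leave removes one occurrence of pid and deletes the group once it is
// empty (assumed, as in Erlang/OTP pg). It reports whether pid was a member.
func (g Groups) Leave(group string, pid PID) bool {
	members := g[group]
	for i, m := range members {
		if m == pid {
			members = append(members[:i], members[i+1:]...)
			if len(members) == 0 {
				delete(g, group)
			} else {
				g[group] = members
			}
			return true
		}
	}
	return false
}

func main() {
	g := Groups{}
	g.Join("workers", "p1")
	g.Join("workers", "p1") // second join of the same process
	g.Leave("workers", "p1")
	fmt.Println(len(g["workers"])) // one membership remains
}
```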

The Lua API is documented in Process Groups; this page covers the scope entry kind and its configuration. See the Cluster Guide for the surrounding membership model.

Entry Kind

| Kind | Description |
|---|---|
| pg.scope | An independent process-group namespace with its own membership state and cluster mesh |

Each scope is isolated: groups and members in one scope are invisible to another. A process opens a scope by its entry ID (pg.open("app:pg")) and operates within it.

- name: pg
  kind: pg.scope
  lifecycle:
    auto_start: true

Configuration

All fields are optional and have defaults tuned for a typical cluster.

| Field | Type | Default | Description |
|---|---|---|---|
| protocol_timeout | duration | 5s | Timeout for inter-node sync and discover operations |
| broadcast_timeout | duration | 5s | Timeout for delivering a broadcast to a single member |
| anti_entropy_interval | duration | 30s | Interval of the reconcile loop; one peer is synced per tick (0 disables the loop) |
| circuit_breaker_failures | int | 3 | Consecutive send failures to a node before its circuit opens |
| circuit_breaker_reset_time | duration | 10s | Time an open circuit waits before moving to half-open for a test send |
| max_retries | int | 3 | Retry attempts for a failed broadcast (0 disables retries) |
| retry_base_delay | duration | 100ms | Initial backoff delay between retries |
| retry_max_delay | duration | 1s | Maximum backoff delay |
| action_queue_size | int | 256 | Queue depth at which an "approaching capacity" warning is logged |
| action_queue_max_size | int | 1024 | Hard capacity of the internal event-loop queue; operations are dropped when it is full |
| monitor_buffer | int | 64 | Event channel capacity per subscription; a subscriber whose buffer fills misses events |
| max_groups | int | 0 | Maximum number of distinct groups (0 = unlimited) |
| max_members_per_group | int | 0 | Maximum members per group, counting every join of a multi-joined process (0 = unlimited) |

An entry that sets some of these fields explicitly:

- name: pg
  kind: pg.scope
  anti_entropy_interval: 30s      # default
  circuit_breaker_failures: 3     # default
  max_members_per_group: 10000    # counts every join, including repeat joins
  lifecycle:
    auto_start: true
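
The two limit fields can be read as a check performed before each join, with 0 meaning unlimited and repeat joins counting toward max_members_per_group. The sketch below is illustrative only: the Limits type, the CheckJoin name and the choice to reject an over-limit join with an error are assumptions, since this page does not say how a scope reports a limit being hit.

```go
package main

import (
	"errors"
	"fmt"
)

// Limits mirrors max_groups and max_members_per_group; 0 means unlimited.
type Limits struct {
	MaxGroups          int
	MaxMembersPerGroup int
}

// Hypothetical errors; the real scope may signal limits differently.
var (
	errTooManyGroups = errors.New("max_groups reached")
	errGroupFull     = errors.New("max_members_per_group reached")
)

// CheckJoin reports whether one more join into group is allowed. groups
// holds one slice entry per join, so multi-joins count toward the limit.
func (l Limits) CheckJoin(groups map[string][]string, group string) error {
	members, exists := groups[group]
	if !exists && l.MaxGroups > 0 && len(groups) >= l.MaxGroups {
		return errTooManyGroups
	}
	if l.MaxMembersPerGroup > 0 && len(members) >= l.MaxMembersPerGroup {
		return errGroupFull
	}
	return nil
}

func main() {
	l := Limits{MaxGroups: 1, MaxMembersPerGroup: 2}
	groups := map[string][]string{"a": {"p1", "p1"}} // p1 joined twice
	fmt.Println(l.CheckJoin(groups, "a"))
	fmt.Println(l.CheckJoin(groups, "b"))
}
```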

How It Works

Single-writer state. Each scope runs a single-goroutine event loop, following the gen_server pattern. All mutations are serialized through that loop; reads of members and groups are served from atomically published snapshots, so they never block it.

Join/leave propagation. A local join or leave is applied by the event loop and then fanned out to the union of the live membership peers and any previously discovered remote nodes. Sending to that union, rather than only to gossip-discovered peers, ensures that a node which has just joined or has not yet converged still receives the change.

Broadcast. broadcast snapshots the full cross-cluster member list inside the loop, then delivers to each member outside it, so a slow recipient cannot stall the scope; each delivery is bounded by broadcast_timeout. broadcast_local does the same for members on the local node only.

Monitor and events. Subscribing and snapshotting the current members happen in the same event-loop tick, so a subscriber never misses or double-counts a change that races the subscription. Subscribers receive member.joined and member.left events; a leave for a process that joined N times reports the PID N times, preserving multiplicity. Each subscription buffers up to monitor_buffer events, and events are dropped for a subscriber whose buffer is full.
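
The subscription and event rules can be sketched as state owned by the event loop: subscribe registers a channel and copies the members in the same call, publish never blocks, and removing all of a process's memberships emits one member.left per join. All names here are hypothetical, and the sketch has no locking only because it assumes every method runs on the single event-loop goroutine.

```go
package main

import "fmt"

// Event mirrors the member.joined / member.left events.
type Event struct {
	Kind string // "member.joined" or "member.left"
	PID  string
}

// monitorState is assumed to be touched only by the event loop goroutine.
type monitorState struct {
	members []string     // one entry per join
	subs    []chan Event // each sized like monitor_buffer
}

// subscribe registers a channel and returns the current members in the same
// step, so no change can land between the snapshot and the subscription.
func (s *monitorState) subscribe(buffer int) (chan Event, []string) {
	ch := make(chan Event, buffer)
	s.subs = append(s.subs, ch)
	return ch, append([]string(nil), s.members...)
}

// publish sends without blocking; a subscriber with a full buffer misses the event.
func (s *monitorState) publish(ev Event) {
	for _, ch := range s.subs {
		select {
		case ch <- ev:
		default:
		}
	}
}

// leaveAll removes every membership of pid and emits one member.left per join.
func (s *monitorState) leaveAll(pid string) {
	kept := s.members[:0]
	for _, m := range s.members {
		if m == pid {
			s.publish(Event{Kind: "member.left", PID: pid})
		} else {
			kept = append(kept, m)
		}
	}
	s.members = kept
}

func main() {
	s := &monitorState{members: []string{"p1", "p1", "p2"}}
	ch, current := s.subscribe(64)
	fmt.Println(current) // [p1 p1 p2]
	s.leaveAll("p1")
	fmt.Println(len(ch), s.members) // 2 [p2]
}
```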

Anti-entropy and discovery. On start, a scope sends discover messages to a small random subset of peers; the subset is capped to avoid an N² message storm when many nodes restart at once. A node that joins the cluster receives a full state sync. After that, the anti-entropy loop pushes a full sync to one peer per anti_entropy_interval tick, so membership changes a peer missed eventually converge. The receiver applies the sync differentially: only members that were actually added or removed emit events.

Circuit breakers. Each remote node has a circuit breaker that counts consecutive send failures. After circuit_breaker_failures consecutive failures the circuit opens, and sends to that node are skipped until circuit_breaker_reset_time has elapsed; the circuit then moves to half-open and allows one test send. Join/leave broadcasts that hit an open breaker are retried with exponential backoff, starting at retry_base_delay and capped at retry_max_delay, up to max_retries times.

Observability

A liveness health check (pg.broadcast_recent.<scope>) reports a scope as unhealthy when it has seen no broadcast traffic for an extended period, which surfaces a wedged event loop or a persistent partition. See the Observability Guide.

See Also