# Supervision
The supervisor manages service lifecycles, handling startup ordering, automatic restarts, and graceful shutdown. Services with `auto_start: true` are started when the application boots.
## Lifecycle Configuration
Services register with the supervisor using a `lifecycle` block. For processes, use `process.service` to wrap a process definition:
```yaml
# Process definition (the code)
- name: worker_process
  kind: process.lua
  source: file://worker.lua
  method: main

# Supervised service (wraps the process with lifecycle management)
- name: worker
  kind: process.service
  process: app:worker_process
  host: app:processes
  lifecycle:
    auto_start: true
    start_timeout: 30s
    stop_timeout: 10s
    stable_threshold: 5s
    depends_on:
      - app:database
    restart:
      initial_delay: 2s
      max_delay: 60s
      max_attempts: 10
```
| Field | Default | Description |
|-------|---------|-------------|
| `auto_start` | `false` | Start the service automatically when the supervisor starts |
| `start_timeout` | `10s` | Maximum time allowed for startup |
| `stop_timeout` | `10s` | Maximum time allowed for graceful shutdown |
| `stable_threshold` | `5s` | Time a service must run before it is considered stable |
| `depends_on` | `[]` | Services that must be running before this one starts |
## Dependency Resolution
The supervisor resolves dependencies from two sources:
1. **Explicit dependencies** declared in `depends_on`
2. **Registry-extracted dependencies** from entry references (e.g., `database: app:db` in your config)
```mermaid
graph LR
    A[HTTP Server] --> B[Router]
    B --> C[Handler Function]
    C --> D[Database]
    C --> E[Cache]
```
Dependencies start before dependents. If service C depends on services A and B, both A and B must reach the `Running` state before C starts.
You don't need to declare infrastructure entries such as databases in `depends_on`. The supervisor automatically extracts dependencies from registry references in your entry configuration.
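The rule above can be sketched in plain Lua. This is an illustration only, not the supervisor's implementation: the function names `merge_dependencies` and `can_start`, the `states` table, and the `app:db` reference are assumptions made for the example.

```lua
-- Hypothetical sketch: names and data shapes are assumptions, not the supervisor's API.

-- Merge explicit `depends_on` entries with references extracted from the
-- entry's configuration, dropping duplicates and preserving order.
local function merge_dependencies(explicit, extracted)
  local seen, merged = {}, {}
  for _, list in ipairs({ explicit, extracted }) do
    for _, id in ipairs(list) do
      if not seen[id] then
        seen[id] = true
        merged[#merged + 1] = id
      end
    end
  end
  return merged
end

-- A service may start only when every dependency is in the Running state.
local function can_start(dependencies, states)
  for _, id in ipairs(dependencies) do
    if states[id] ~= "Running" then
      return false, id -- report the first dependency that is not ready
    end
  end
  return true
end

local deps = merge_dependencies({ "app:database" }, { "app:db", "app:database" })
local states = { ["app:database"] = "Running", ["app:db"] = "Starting" }

local ok, blocked_by = can_start(deps, states)
print(ok, blocked_by) -- not ready yet: blocked by app:db
```

Once `states["app:db"]` becomes `"Running"`, the same call returns `true` and the dependent service can be started.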
## Restart Policy
When a service fails, the supervisor retries with exponential backoff:
```yaml
lifecycle:
  restart:
    initial_delay: 1s     # First retry wait
    max_delay: 90s        # Maximum delay cap
    backoff_factor: 2.0   # Delay multiplier per attempt
    jitter: 0.1           # ±10% randomization
    max_attempts: 0       # 0 = infinite retries
```
| Attempt | Base Delay | With Jitter (±10%) |
|---------|------------|-------------------|
| 1 | 1s | 0.9s - 1.1s |
| 2 | 2s | 1.8s - 2.2s |
| 3 | 4s | 3.6s - 4.4s |
| 4 | 8s | 7.2s - 8.8s |
| ... | ... | ... |
| N | 90s | 81s - 99s (capped) |
When a service runs longer than `stable_threshold`, the retry counter resets. This prevents transient failures from permanently escalating delays.
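The delays in the table can be reproduced with a small calculation. The sketch below is plain Lua and assumes the delay is `initial_delay * backoff_factor^(attempt - 1)`, capped at `max_delay`, with jitter applied as a uniform factor afterwards; the function names are hypothetical and the supervisor's exact formula may differ.

```lua
-- Sketch only: field names mirror the `restart` block; the formula is an assumption.
local policy = {
  initial_delay = 1,    -- seconds
  max_delay = 90,       -- seconds
  backoff_factor = 2.0,
  jitter = 0.1,
}

-- Delay before retry number `attempt` (1-based), without jitter.
local function base_delay(p, attempt)
  local delay = p.initial_delay * p.backoff_factor ^ (attempt - 1)
  return math.min(delay, p.max_delay)
end

-- Apply symmetric jitter: a uniform factor in [1 - jitter, 1 + jitter).
local function jittered_delay(p, attempt)
  local factor = 1 + p.jitter * (2 * math.random() - 1)
  return base_delay(p, attempt) * factor
end

-- Attempt number to use after a failure. If the run that just failed lasted
-- longer than the stable threshold, the counter starts over at 1.
local function next_attempt(attempt, uptime, stable_threshold)
  if uptime > stable_threshold then
    return 1
  end
  return attempt + 1
end

-- Print the base delay for the first eight attempts; the eighth hits the cap.
for attempt = 1, 8 do
  print(attempt, base_delay(policy, attempt))
end
```

With these values the uncapped delay for attempt 8 would be 128s, so the 90s cap applies from that attempt onward.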
### Terminal Errors
These errors stop retry attempts:
- Context cancellation
- Explicit termination request
- Errors marked as non-retryable
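As a rough illustration, the retry decision could look like the following. The error shape (`kind`, `retryable`), the kind names, and the way attempts are counted against `max_attempts` are all assumptions made for this sketch, not the supervisor's actual types.

```lua
-- Hypothetical error shape: { kind = "...", retryable = boolean or nil }.
local TERMINAL_KINDS = {
  context_cancelled = true, -- context cancellation
  terminated = true,        -- explicit termination request
}

-- Returns true if another restart attempt should be scheduled.
local function should_retry(err, attempts, max_attempts)
  if TERMINAL_KINDS[err.kind] or err.retryable == false then
    return false
  end
  -- max_attempts = 0 means retry indefinitely.
  return max_attempts == 0 or attempts < max_attempts
end

print(should_retry({ kind = "crash" }, 3, 10))                    -- true
print(should_retry({ kind = "context_cancelled" }, 1, 0))         -- false
print(should_retry({ kind = "crash", retryable = false }, 1, 0))  -- false
```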
## Security Context
Services can run with a specific security identity:
```yaml
# Process definition
- name: admin_worker_process
  kind: process.lua
  source: file://admin_worker.lua
  method: main

# Supervised service with security context
- name: admin_worker
  kind: process.service
  process: app:admin_worker_process
  host: app:processes
  lifecycle:
    auto_start: true
    security:
      actor:
        id: "service:admin-worker"
        meta:
          role: admin
      groups:
        - app:admin_policies
      policies:
        - app:data_access
```
The security context sets:
| Field | Description |
|-------|-------------|
| `actor.id` | Identity string for this service |
| `actor.meta` | Key-value metadata (role, permissions, etc.) |
| `groups` | Policy groups to apply |
| `policies` | Individual policies to apply |
Code running in the service inherits this security context. The `security` module can then check permissions:
```lua
local security = require("security")
if security.can("delete", "users") then
    -- allowed
end
```
When no security context is configured, the service runs without an actor. In strict mode (the default), security checks then fail, so configure a security context for any service that needs authorization.
## Service States
```mermaid
stateDiagram-v2
    [*] --> Inactive
    Inactive --> Starting
    Starting --> Running
    Running --> Stopping
    Stopping --> Stopped
    Stopped --> [*]
    Running --> Failed
    Starting --> Failed
    Failed --> Starting : retry
```
The supervisor transitions services through these states:
| State | Description |
|-------|-------------|
| `Inactive` | Registered but not started |
| `Starting` | Startup in progress |
| `Running` | Operating normally |
| `Stopping` | Graceful shutdown in progress |
| `Stopped` | Cleanly terminated |
| `Failed` | Error occurred, may retry |
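The diagram can also be read as a transition table. The sketch below encodes the arrows from the state diagram in plain Lua and rejects any transition the diagram does not show; `can_transition`, `transition`, and the `svc` table are hypothetical helpers for illustration, not part of the supervisor.

```lua
-- Allowed transitions, taken from the state diagram.
local TRANSITIONS = {
  Inactive = { Starting = true },
  Starting = { Running = true, Failed = true },
  Running  = { Stopping = true, Failed = true },
  Stopping = { Stopped = true },
  Stopped  = {},
  Failed   = { Starting = true }, -- retry
}

local function can_transition(from, to)
  local allowed = TRANSITIONS[from]
  return allowed ~= nil and allowed[to] == true
end

-- Move a service to a new state, or return nil plus an error message.
local function transition(service, to)
  if not can_transition(service.state, to) then
    return nil, "invalid transition: " .. service.state .. " -> " .. to
  end
  service.state = to
  return service
end

local svc = { name = "worker", state = "Inactive" }
assert(transition(svc, "Starting"))
assert(transition(svc, "Failed"))   -- startup failed
assert(transition(svc, "Starting")) -- retry
assert(transition(svc, "Running"))

-- Running -> Inactive is not in the diagram, so this reports an error.
print(transition(svc, "Inactive"))
```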
## Startup and Shutdown Order
**Startup**: Dependencies first, then dependents. Services at the same dependency level can start in parallel.
**Shutdown**: Dependents first, then dependencies. This ensures dependent services finish before their dependencies stop.
```
Startup: database → cache → handler → http_server
Shutdown: http_server → handler → cache → database
```
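One way to derive such an ordering is to group services into dependency levels and reverse the levels for shutdown. The following is a minimal sketch in plain Lua, assuming a simple in-memory dependency map with made-up service names; it is not how the supervisor is implemented.

```lua
-- deps[service] = list of services it depends on (hypothetical names).
local deps = {
  database = {},
  cache = {},
  handler = { "database", "cache" },
  http_server = { "handler" },
}

-- Group services into levels: level 1 has no dependencies, and each later
-- level depends only on earlier ones. Services within a level can start in parallel.
local function startup_levels(graph)
  local level_of, levels, remaining = {}, {}, 0
  for _ in pairs(graph) do remaining = remaining + 1 end

  while remaining > 0 do
    local ready = {}
    for name, needs in pairs(graph) do
      if not level_of[name] then
        local satisfied = true
        for _, dep in ipairs(needs) do
          if not level_of[dep] then
            satisfied = false
            break
          end
        end
        if satisfied then ready[#ready + 1] = name end
      end
    end
    if #ready == 0 then
      return nil, "dependency cycle or unknown dependency"
    end
    table.sort(ready) -- deterministic order within a level
    for _, name in ipairs(ready) do level_of[name] = #levels + 1 end
    levels[#levels + 1] = ready
    remaining = remaining - #ready
  end
  return levels
end

-- Shutdown walks the same levels in reverse: dependents stop first.
local function shutdown_levels(levels)
  local reversed = {}
  for i = #levels, 1, -1 do
    reversed[#reversed + 1] = levels[i]
  end
  return reversed
end

local levels = assert(startup_levels(deps))
for i, group in ipairs(levels) do
  print("start level " .. i .. ": " .. table.concat(group, ", "))
end
for i, group in ipairs(shutdown_levels(levels)) do
  print("stop level " .. i .. ": " .. table.concat(group, ", "))
end
```

Here `cache` and `database` land in the same level because neither depends on the other, so they are candidates for parallel startup.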
## See Also
- [Process Model](concepts/process-model.md) - Process lifecycle
- [Configuration](guides/configuration.md) - YAML configuration format
- [Security Module](lua/security/security.md) - Permission checks in Lua