
Architecture
Distributed heartbeat monitoring and automatic OCI instance recovery
How It Works
Monitor Mode (Oracle Nodes)
- Each Oracle node runs the monitor as a systemd service
- Monitor creates a Consul session with a 30-second TTL and delete behavior
- A KV pair is written at oracle-watchdog/nodes/{nodename}, locked to the session
- The session is renewed every 10 seconds; if renewal fails, the monitor reconnects automatically
- If a node becomes unresponsive (reclaimed by Oracle), the session expires and the KV pair is deleted
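The session lifecycle above maps onto three calls in Consul's HTTP API: create a session, write the key with the `acquire` parameter, and renew the session. The following is a minimal sketch of how those requests could be built with the Go standard library. It is not the project's implementation; the session name and the helper function names are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/url"
)

// kvPrefix is the key prefix named in the description above.
const kvPrefix = "oracle-watchdog/nodes/"

// sessionRequest holds the fields sent to PUT /v1/session/create.
type sessionRequest struct {
	Name     string
	TTL      string
	Behavior string
}

// sessionBody encodes a session with a 30-second TTL and delete behavior.
func sessionBody(node string) ([]byte, error) {
	return json.Marshal(sessionRequest{
		Name:     "oracle-watchdog-" + node, // hypothetical naming scheme
		TTL:      "30s",
		Behavior: "delete",
	})
}

// newSessionRequest builds the session-create call.
func newSessionRequest(consulAddr, node string) (*http.Request, error) {
	body, err := sessionBody(node)
	if err != nil {
		return nil, err
	}
	return http.NewRequest(http.MethodPut, consulAddr+"/v1/session/create", bytes.NewReader(body))
}

// newAcquireRequest writes the node key and locks it to the session, so the
// key is deleted when the session expires.
func newAcquireRequest(consulAddr, node, sessionID string) (*http.Request, error) {
	q := url.Values{"acquire": {sessionID}}
	u := consulAddr + "/v1/kv/" + kvPrefix + url.PathEscape(node) + "?" + q.Encode()
	return http.NewRequest(http.MethodPut, u, bytes.NewReader([]byte(node)))
}

// newRenewRequest extends the session TTL; the monitor would send this
// every 10 seconds.
func newRenewRequest(consulAddr, sessionID string) (*http.Request, error) {
	return http.NewRequest(http.MethodPut, consulAddr+"/v1/session/renew/"+sessionID, nil)
}
```

In this sketch the caller would send each request with an `http.Client`, read the session ID from the create response, and treat a failed renewal as the signal to start over with a new session.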
Agent Mode
- The agent runs on infrastructure separate from the monitored nodes (Docker, Nomad, or any host that can reach Consul and the OCI API)
- On each check interval (default 30s), it polls Consul for missing node KV pairs
- When a node has been absent longer than the timeout (default 5m), it triggers a restart:
  - Issues an OCI stop command
  - Polls instance state until STOPPED (10s intervals, 5m max wait)
  - Issues an OCI start command
  - Polls instance state until RUNNING
- Consecutive restart attempts are tracked per node and reset when the node recovers
- Duplicate restart prevention ensures only one restart is in-flight per node at a time
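The restart sequence and the one-in-flight rule can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual code: `InstanceClient` is a hypothetical stand-in for the OCI calls (it is not the OCI SDK's interface), and `Restarter`, `ErrInFlight`, and `fakeClient` are names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// InstanceClient is a hypothetical abstraction over the OCI compute calls.
type InstanceClient interface {
	Stop(id string) error
	Start(id string) error
	State(id string) (string, error)
}

// ErrInFlight is returned when a restart for the node is already running.
var ErrInFlight = errors.New("restart already in flight")

// Restarter runs stop/start cycles, allowing one in-flight restart per node.
type Restarter struct {
	Client  InstanceClient
	Poll    time.Duration // 10s in the description above
	MaxWait time.Duration // 5m in the description above

	mu       sync.Mutex
	inFlight map[string]bool
}

// waitFor polls the instance state until it equals want or MaxWait elapses.
func (r *Restarter) waitFor(id, want string) error {
	deadline := time.Now().Add(r.MaxWait)
	for {
		state, err := r.Client.State(id)
		if err != nil {
			return err
		}
		if state == want {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("instance %s: timed out waiting for %s (last state %s)", id, want, state)
		}
		time.Sleep(r.Poll)
	}
}

// Restart performs stop, wait for STOPPED, start, wait for RUNNING.
func (r *Restarter) Restart(node, id string) error {
	r.mu.Lock()
	if r.inFlight == nil {
		r.inFlight = make(map[string]bool)
	}
	if r.inFlight[node] {
		r.mu.Unlock()
		return ErrInFlight
	}
	r.inFlight[node] = true
	r.mu.Unlock()
	defer func() {
		r.mu.Lock()
		delete(r.inFlight, node)
		r.mu.Unlock()
	}()

	if err := r.Client.Stop(id); err != nil {
		return err
	}
	if err := r.waitFor(id, "STOPPED"); err != nil {
		return err
	}
	if err := r.Client.Start(id); err != nil {
		return err
	}
	return r.waitFor(id, "RUNNING")
}

// fakeClient is an in-memory InstanceClient for trying the sketch out.
type fakeClient struct {
	state string
	calls []string
}

func (f *fakeClient) Stop(string) error {
	f.calls = append(f.calls, "stop")
	f.state = "STOPPED"
	return nil
}

func (f *fakeClient) Start(string) error {
	f.calls = append(f.calls, "start")
	f.state = "RUNNING"
	return nil
}

func (f *fakeClient) State(string) (string, error) { return f.state, nil }
```

Guarding the in-flight map with a mutex lets the check loop call `Restart` from a goroutine per node without two cycles overlapping on the same instance.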
Safety Features
- Configurable max restart attempts per node (0 = unlimited)
- Dry-run mode for testing (-dry-run flag)
- Connection health tracking with consecutive failure thresholds for both Consul and OCI
- Automatic connection state machine transitions - never crashes, always retries
Optional Features
Each mode ships with one optional subsystem that runs in the same process when enabled in the config file. Both subsystems are disabled by default and independent of the core OCI-restart flow.
- WireGuard Endpoint Resolver (monitor) - re-resolves a configured WG peer hostname on an interval and refreshes the kernel peer endpoint via wgctrl when the resolved IP changes. Forces an immediate re-resolve when the most recent peer handshake exceeds the staleness threshold.
- Cloudflare WAN-IP DDNS Updater (agent) - detects the host’s public IPv4 via configurable HTTP providers and PATCHes a Cloudflare A record when the value changes. IPv4 only. The Cloudflare API token is read once at startup from a configurable env var.
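The decisions at the heart of both subsystems are small enough to sketch with the standard library alone. The code below is an illustration, not the project's code: all three function names are hypothetical, the wgctrl calls that read handshake times and set the endpoint are left out, and the Cloudflare request assumes the v4 API's PATCH endpoint for a DNS record with a bearer token.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"time"
)

// shouldResolveNow reports whether the resolver should skip waiting for the
// next interval because the last peer handshake is older than the threshold.
func shouldResolveNow(sinceHandshake, staleAfter time.Duration) bool {
	return sinceHandshake > staleAfter
}

// endpointChanged reports whether the freshly resolved IP differs from the
// one currently configured on the peer.
func endpointChanged(currentIP, resolvedIP string) bool {
	return !net.ParseIP(currentIP).Equal(net.ParseIP(resolvedIP))
}

// newARecordPatch builds the request that updates an A record's content.
// It rejects anything that is not an IPv4 address.
func newARecordPatch(token, zoneID, recordID, ip string) (*http.Request, error) {
	if parsed := net.ParseIP(ip); parsed == nil || parsed.To4() == nil {
		return nil, fmt.Errorf("not an IPv4 address: %q", ip)
	}
	body, err := json.Marshal(map[string]string{"type": "A", "content": ip})
	if err != nil {
		return nil, err
	}
	u := "https://api.cloudflare.com/client/v4/zones/" + zoneID + "/dns_records/" + recordID
	req, err := http.NewRequest(http.MethodPatch, u, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

In this sketch the updater would remember the last IP it published and call `newARecordPatch` only when the detected address differs, which keeps API traffic to one request per actual change.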
See the README and the package godoc for configuration reference.