# Observability
Containment Chamber provides three observability pillars: Prometheus metrics, OpenTelemetry OTLP tracing, and structured JSON logging. The metrics endpoint runs on a separate port from the signing API, so you can expose metrics to your monitoring stack without exposing the signing surface.
## Prometheus Metrics

Metrics are served on a dedicated HTTP endpoint, separate from the signing API (which listens on port 9000).

```yaml
metrics:
  listen_address: "0.0.0.0"
  listen_port: 3000
  refresh_interval_seconds: 30
```

| Option | Default | Description |
|---|---|---|
| listen_address | 0.0.0.0 | Bind address for the metrics server |
| listen_port | 3000 | Port for the metrics endpoint |
| refresh_interval_seconds | 30 | How often metrics are refreshed, in seconds |
Verify metrics are working:
```sh
curl http://localhost:3000/metrics
```

## Metrics Reference

All metrics are exposed at /metrics:
### Signing

| Name | Type | Description |
|---|---|---|
| containment_signing_requests_total | counter | Total signing requests by status and operation |
| containment_signing_duration_seconds | histogram | Duration of signing operations in seconds |
| containment_slashing_rejections_total | counter | Total signing requests rejected by slashing protection |
| containment_signing_semaphore_available | gauge | Available signing semaphore permits |
| containment_signing_concurrency_limit | gauge | Configured signing concurrency limit |
| containment_canary_signing_total | counter | Number of times a canary key has signed |
### Chamber Lifecycle

| Name | Type | Description |
|---|---|---|
| containment_chamber_init_total | counter | Number of chamber init ceremonies performed |
| containment_chamber_seal_total | counter | Number of emergency seal operations |
| containment_chamber_unseal_total | counter | Number of completed unseal ceremonies |
| containment_chamber_unseal_shares_total | counter | Number of unseal share submissions by operator |
| containment_chamber_rotation_total | counter | Number of rotation operations by type (kms, unseal, mode) |
### Key Loading

| Name | Type | Description |
|---|---|---|
| containment_keys_active | gauge | Number of active validator keys by source |
| containment_key_loading_duration_seconds | gauge | Duration of key loading operations in seconds |
| containment_key_load_failures_total | counter | Total validator keys that failed to load |
| containment_key_refresh_total | counter | Total keys added via background refresh |
### Key Management API

| Name | Type | Description |
|---|---|---|
| containment_key_requests_total | counter | Total Key Manager API requests by method |
| containment_key_imports_total | counter | Total validator keys imported via Key Manager API |
| containment_key_deletions_total | counter | Total validator keys deleted via Key Manager API |
| containment_key_import_duration_seconds | histogram | Duration of Key Manager API import operations in seconds |
### Keygen

| Name | Type | Description |
|---|---|---|
| containment_keygen_total | counter | Total validator keys generated via keygen endpoint |
| containment_keygen_duration_seconds | histogram | Duration of keygen operations in seconds |
### Anti-Slashing

| Name | Type | Description |
|---|---|---|
| containment_antislashing_check_duration_seconds | histogram | Duration of anti-slashing checks in seconds |
| containment_antislashing_errors_total | counter | Total anti-slashing backend errors |
| containment_antislashing_pg_pool | gauge | PostgreSQL connection pool state by status |
### Authentication

| Name | Type | Description |
|---|---|---|
| containment_auth_rejections_total | counter | Total authentication rejections by reason |
### HTTP Errors

| Name | Type | Description |
|---|---|---|
| containment_http_errors_total | counter | Total HTTP error responses by status code |
### AWS/KMS

| Name | Type | Description |
|---|---|---|
| containment_aws_keystore_errors_total | counter | Total AWS keystore errors by operation |
| containment_kms_operations_total | counter | Total KMS operations by action and status |
| containment_kms_operation_duration_seconds | histogram | Duration of KMS operations in seconds |
### System

| Name | Type | Description |
|---|---|---|
| containment_build_info | gauge | Build information (version, commit, timestamp) |
| containment_network_info | gauge | Ethereum network configuration info gauge |
| containment_healthy | gauge | Health status of the signer (1 = healthy, 0 = unhealthy) |
| containment_uptime_seconds | gauge | Uptime in seconds since process start |
| containment_process_resident_memory_bytes | gauge | Resident memory usage in bytes (Linux only) |
| containment_process_open_fds | gauge | Number of open file descriptors (Linux only) |
### Backpressure

| Name | Type | Description |
|---|---|---|
| containment_queue_rejected_total | counter | Total requests rejected due to backpressure |
The operation label uses the signing operation names: AGGREGATION_SLOT, AGGREGATE_AND_PROOF, ATTESTATION, BLOCK_V2, RANDAO_REVEAL, SYNC_COMMITTEE_CONTRIBUTION_AND_PROOF, SYNC_COMMITTEE_MESSAGE, SYNC_COMMITTEE_SELECTION_PROOF, VALIDATOR_REGISTRATION, VOLUNTARY_EXIT.
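The counters and histograms above compose in the usual PromQL ways for dashboards and alerts. A few illustrative queries (metric and label names as listed above; the _bucket suffix is the standard Prometheus histogram convention):

```promql
# Signing request rate per operation over the last 5 minutes
sum by (operation) (rate(containment_signing_requests_total[5m]))

# 99th-percentile signing latency
histogram_quantile(
  0.99,
  sum by (le) (rate(containment_signing_duration_seconds_bucket[5m]))
)

# Alert condition: signer reporting unhealthy
containment_healthy == 0
```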
Process metrics (containment_process_resident_memory_bytes and containment_process_open_fds) are only available on Linux.
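If you scrape with a plain Prometheus server rather than the Operator, a minimal scrape job could look like the sketch below; the job name and target host are placeholders, and the port matches the metrics listen_port above:

```yaml
scrape_configs:
  - job_name: containment-chamber       # job name is arbitrary
    metrics_path: /metrics              # the default path, shown for clarity
    static_configs:
      - targets: ["signer-host:3000"]   # replace with your metrics listen address/port
```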
## OpenTelemetry OTLP Tracing

Containment Chamber can export distributed traces via gRPC OTLP to any OpenTelemetry-compatible collector: Jaeger, Grafana Tempo, Honeycomb, Datadog, and others.

```yaml
opentelemetry:
  enabled: true
  endpoint: "http://otel-collector:4317"
  service_name: "containment-chamber"
```

| Option | Default | Description |
|---|---|---|
| enabled | false | Enable OTLP trace export |
| endpoint | http://localhost:4317 | gRPC OTLP collector endpoint |
| service_name | containment-chamber | Service name in traces |
Traces include the full request lifecycle — from HTTP ingestion through authorization, slashing protection checks, and BLS signing.
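On the receiving side, a minimal OpenTelemetry Collector configuration that accepts these traces over gRPC could look like the following sketch; the debug exporter just prints spans, and you would swap in the exporter for your real backend:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # matches the signer's configured endpoint

exporters:
  debug: {}                      # replace with your tracing backend's exporter

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```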
## Grafana Dashboards

Two pre-built Grafana dashboards are included in the repository under k8s/dashboards/:

- containment-chamber-classic.json: a standalone dashboard suitable for any deployment model (bare metal, Docker, Kubernetes).
- containment-chamber-kubernetes.json: a Kubernetes-native dashboard with namespace and pod selector variables, designed for multi-replica deployments where you need to filter by specific pods.

Import either dashboard via Grafana → Dashboards → Import → Upload JSON file.
## Kubernetes ServiceMonitor

If you use the Prometheus Operator, the Helm chart includes a ServiceMonitor resource for automatic scrape target discovery.

Enable it in your Helm values:

```yaml
serviceMonitor:
  enabled: true
  scrapeInterval: "15s"
  additionalLabels:
    release: prometheus
```

All available ServiceMonitor options:

| Option | Default | Description |
|---|---|---|
| enabled | false | Create a ServiceMonitor resource |
| scrapeInterval | 60s | Prometheus scrape interval |
| additionalLabels | {} | Labels added to the ServiceMonitor |
| namespace | "" | Namespace for the ServiceMonitor (defaults to release namespace) |
| namespaceSelector | {} | Namespace selector (use any: true to scrape all namespaces) |
| targetLabels | [] | Labels to transfer from the Kubernetes Service to scraped metrics |
| metricRelabelings | [] | Metric relabeling rules |
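To illustrate the less obvious options, here is a values fragment combining cross-namespace scraping, a carried-over Service label, and a relabeling rule. This is a sketch: the label name and the dropped metric are only examples.

```yaml
serviceMonitor:
  enabled: true
  namespaceSelector:
    any: true                        # scrape matching Services in all namespaces
  targetLabels:
    - app.kubernetes.io/instance     # copied from the Service onto each scraped series
  metricRelabelings:
    - action: drop                   # example: drop a high-cardinality histogram
      sourceLabels: [__name__]
      regex: containment_signing_duration_seconds_bucket
```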
## Logging

By default, Containment Chamber outputs human-readable text logs with ANSI colors (when connected to a terminal). Switch to JSON for production log aggregation.

### Configuration

```yaml
logging:
  # Log level filter — supports tracing EnvFilter syntax
  # Examples: "info", "debug", "containment_chamber=debug,hyper=info"
  level: "info"      # default: "info"

  # Output format: "text" (human-readable) or "json" (structured)
  format: text       # default: "text"

  # ANSI colors in text output — auto-detects TTY by default
  log_color: null    # default: auto-detect (true if TTY, false otherwise)
```

| Option | Type | Default | Description |
|---|---|---|---|
| logging.level | string | "info" | Log level filter (EnvFilter syntax) |
| logging.format | enum | text | text for human-readable, json for structured JSON |
| logging.log_color | boolean | auto | ANSI colors; auto-detects TTY when unset |
### Log levels

```yaml
# Via config
logging:
  level: "containment_chamber=debug,hyper=info"
```

```sh
# Or via environment variable (overrides config)
RUST_LOG=containment_chamber=debug
```

### JSON output

Enable JSON format for structured log aggregation (Datadog, Loki, CloudWatch, etc.):

```yaml
logging:
  format: json
  log_color: false  # disable ANSI escape codes in JSON output
```

Each JSON log line includes timestamp, level, target, span context, and message fields.
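The exact field layout depends on the JSON formatter, but a line will look roughly like this (illustrative values only; field names beyond those listed above are assumptions):

```json
{"timestamp":"2024-01-01T00:00:00.000000Z","level":"INFO","target":"containment_chamber","span":{"name":"sign_request"},"fields":{"message":"signing request accepted"}}
```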
## Audit Logging

Security-relevant events are logged with target: "audit". This target is separate from the normal containment_chamber target, so you can route audit events to a dedicated sink without changing your general log level.
Events logged to the audit target:
| Event | When |
|---|---|
| signing request | Every signing attempt, including key and operation type |
| state transition | Seal machine state changes (e.g., Sealed → AwaitingUnseal) |
| unseal share submitted | When an operator submits an unseal share, including share index |
| signer sealed | When the signer is sealed, and by whom |
### Filtering audit events

```sh
# Include audit events alongside normal application logs
RUST_LOG=containment_chamber=info,audit=info

# Audit events only — suppress everything else
RUST_LOG=off,audit=info
```

In JSON mode, filter on "target":"audit" in your log aggregator (Datadog, Loki, CloudWatch, etc.) to build a dedicated audit trail.
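The same filter also works on captured log files with plain text tools. A minimal sketch (the file paths are hypothetical):

```shell
# Extract audit events from a captured JSON log into a dedicated audit trail.
# signer.log and audit-trail.jsonl are placeholder paths.
grep '"target":"audit"' signer.log > audit-trail.jsonl
```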