
Observability

Containment Chamber provides three observability pillars: Prometheus metrics, OpenTelemetry OTLP tracing, and structured JSON logging. The metrics endpoint runs on a separate port from the signing API, so you can expose metrics to your monitoring stack without exposing the signing surface.

Metrics are served on a dedicated HTTP endpoint, separate from the signing API (which listens on port 9000).

```yaml
metrics:
  listen_address: "0.0.0.0"
  listen_port: 3000
  refresh_interval_seconds: 30
```

| Option | Default | Description |
| --- | --- | --- |
| `listen_address` | `0.0.0.0` | Bind address for the metrics server |
| `listen_port` | `3000` | Port for the metrics endpoint |
| `refresh_interval_seconds` | `30` | How often metrics are refreshed |

Verify metrics are working:

```sh
curl http://localhost:3000/metrics
```
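Once the endpoint responds, point Prometheus at it. A minimal scrape configuration might look like the following sketch (the `signer-host` target and the job name are illustrative, not fixed by Containment Chamber):

```yaml
scrape_configs:
  - job_name: "containment-chamber"
    # Scrape the dedicated metrics port, not the signing API
    static_configs:
      - targets: ["signer-host:3000"]
```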

All metrics exposed at /metrics:

| Name | Type | Description |
| --- | --- | --- |
| `containment_signing_requests_total` | counter | Total signing requests by status and operation |
| `containment_signing_duration_seconds` | histogram | Duration of signing operations in seconds |
| `containment_slashing_rejections_total` | counter | Total signing requests rejected by slashing protection |
| `containment_signing_semaphore_available` | gauge | Available signing semaphore permits |
| `containment_signing_concurrency_limit` | gauge | Configured signing concurrency limit |
| `containment_canary_signing_total` | counter | Number of times a canary key has signed |

| Name | Type | Description |
| --- | --- | --- |
| `containment_chamber_init_total` | counter | Number of chamber init ceremonies performed |
| `containment_chamber_seal_total` | counter | Number of emergency seal operations |
| `containment_chamber_unseal_total` | counter | Number of completed unseal ceremonies |
| `containment_chamber_unseal_shares_total` | counter | Number of unseal share submissions by operator |
| `containment_chamber_rotation_total` | counter | Number of rotation operations by type (kms, unseal, mode) |

| Name | Type | Description |
| --- | --- | --- |
| `containment_keys_active` | gauge | Number of active validator keys by source |
| `containment_key_loading_duration_seconds` | gauge | Duration of key loading operations in seconds |
| `containment_key_load_failures_total` | counter | Total validator keys that failed to load |
| `containment_key_refresh_total` | counter | Total keys added via background refresh |

| Name | Type | Description |
| --- | --- | --- |
| `containment_key_requests_total` | counter | Total Key Manager API requests by method |
| `containment_key_imports_total` | counter | Total validator keys imported via Key Manager API |
| `containment_key_deletions_total` | counter | Total validator keys deleted via Key Manager API |
| `containment_key_import_duration_seconds` | histogram | Duration of Key Manager API import operations in seconds |

| Name | Type | Description |
| --- | --- | --- |
| `containment_keygen_total` | counter | Total validator keys generated via keygen endpoint |
| `containment_keygen_duration_seconds` | histogram | Duration of keygen operations in seconds |

| Name | Type | Description |
| --- | --- | --- |
| `containment_antislashing_check_duration_seconds` | histogram | Duration of anti-slashing checks in seconds |
| `containment_antislashing_errors_total` | counter | Total anti-slashing backend errors |
| `containment_antislashing_pg_pool` | gauge | PostgreSQL connection pool state by status |

| Name | Type | Description |
| --- | --- | --- |
| `containment_auth_rejections_total` | counter | Total authentication rejections by reason |

| Name | Type | Description |
| --- | --- | --- |
| `containment_http_errors_total` | counter | Total HTTP error responses by status code |

| Name | Type | Description |
| --- | --- | --- |
| `containment_aws_keystore_errors_total` | counter | Total AWS keystore errors by operation |
| `containment_kms_operations_total` | counter | Total KMS operations by action and status |
| `containment_kms_operation_duration_seconds` | histogram | Duration of KMS operations in seconds |

| Name | Type | Description |
| --- | --- | --- |
| `containment_build_info` | gauge | Build information (version, commit, timestamp) |
| `containment_network_info` | gauge | Ethereum network configuration info gauge |
| `containment_healthy` | gauge | Health status of the signer (1 = healthy, 0 = unhealthy) |
| `containment_uptime_seconds` | gauge | Uptime in seconds since process start |
| `containment_process_resident_memory_bytes` | gauge | Resident memory usage in bytes (Linux only) |
| `containment_process_open_fds` | gauge | Number of open file descriptors (Linux only) |

| Name | Type | Description |
| --- | --- | --- |
| `containment_queue_rejected_total` | counter | Total requests rejected due to backpressure |

The operation label uses the signing operation names: AGGREGATION_SLOT, AGGREGATE_AND_PROOF, ATTESTATION, BLOCK_V2, RANDAO_REVEAL, SYNC_COMMITTEE_CONTRIBUTION_AND_PROOF, SYNC_COMMITTEE_MESSAGE, SYNC_COMMITTEE_SELECTION_PROOF, VALIDATOR_REGISTRATION, VOLUNTARY_EXIT.

Process metrics (containment_process_resident_memory_bytes and containment_process_open_fds) are only available on Linux.
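These metrics compose into useful PromQL queries. A few starting points (metric and label names as documented above; the rate windows are a matter of taste):

```promql
# Signing request rate by operation
sum by (operation) (rate(containment_signing_requests_total[5m]))

# p99 signing latency across all operations
histogram_quantile(0.99, sum by (le) (rate(containment_signing_duration_seconds_bucket[5m])))

# Slashing-protection rejections over the last hour
increase(containment_slashing_rejections_total[1h])
```

The `_bucket` suffix is the standard Prometheus convention for querying histogram metrics such as `containment_signing_duration_seconds`.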

Containment Chamber can export distributed traces via gRPC OTLP to any OpenTelemetry-compatible collector — Jaeger, Grafana Tempo, Honeycomb, Datadog, and others.

```yaml
opentelemetry:
  enabled: true
  endpoint: "http://otel-collector:4317"
  service_name: "containment-chamber"
```

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Enable OTLP trace export |
| `endpoint` | `http://localhost:4317` | gRPC OTLP collector endpoint |
| `service_name` | `containment-chamber` | Service name in traces |

Traces include the full request lifecycle — from HTTP ingestion through authorization, slashing protection checks, and BLS signing.
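On the receiving side, any OpenTelemetry Collector listening for OTLP over gRPC will do. As a sketch, a minimal collector configuration that forwards traces to Grafana Tempo might look like this (the `tempo:4317` backend address and the insecure TLS setting are assumptions for a lab environment):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
```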

Two pre-built Grafana dashboards are included in the repository under k8s/dashboards/:

- `containment-chamber-classic.json` — A standalone dashboard suitable for any deployment model (bare metal, Docker, Kubernetes).

Import via: Grafana → Dashboards → Import → Upload JSON file

If you use the Prometheus Operator, the Helm chart includes a ServiceMonitor resource for automatic scrape target discovery.

Enable it in your Helm values:

```yaml
serviceMonitor:
  enabled: true
  scrapeInterval: "15s"
  additionalLabels:
    release: prometheus
```

All available ServiceMonitor options:

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Create a ServiceMonitor resource |
| `scrapeInterval` | `60s` | Prometheus scrape interval |
| `additionalLabels` | `{}` | Labels added to the ServiceMonitor |
| `namespace` | `""` | Namespace for the ServiceMonitor (defaults to the release namespace) |
| `namespaceSelector` | `{}` | Namespace selector (use `any: true` to scrape all namespaces) |
| `targetLabels` | `[]` | Labels to transfer from the Kubernetes Service to scraped metrics |
| `metricRelabelings` | `[]` | Metric relabeling rules |
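As a sketch combining several of these options, the following values fragment scrapes across all namespaces and drops the Linux-only process metrics (option names come from the table above; the relabeling rule itself is illustrative):

```yaml
serviceMonitor:
  enabled: true
  scrapeInterval: "15s"
  namespaceSelector:
    any: true
  metricRelabelings:
    # Drop the Linux-only process metrics if you don't need them
    - sourceLabels: [__name__]
      regex: "containment_process_.*"
      action: drop
```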

By default, Containment Chamber outputs human-readable text logs with ANSI colors (when connected to a terminal). Switch to JSON for production log aggregation.

```yaml
logging:
  # Log level filter — supports tracing EnvFilter syntax
  # Examples: "info", "debug", "containment_chamber=debug,hyper=info"
  level: "info" # default: "info"
  # Output format: "text" (human-readable) or "json" (structured)
  format: text # default: "text"
  # ANSI colors in text output — auto-detects TTY by default
  log_color: null # default: auto-detect (true if TTY, false otherwise)
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `logging.level` | string | `"info"` | Log level filter (EnvFilter syntax) |
| `logging.format` | enum | `text` | `text` for human-readable, `json` for structured JSON |
| `logging.log_color` | boolean | auto | ANSI colors — auto-detects TTY when unset |
```yaml
# Via config
logging:
  level: "containment_chamber=debug,hyper=info"
```

```sh
# Or via environment variable (overrides config)
RUST_LOG=containment_chamber=debug
```

Enable JSON format for structured log aggregation (Datadog, Loki, CloudWatch, etc.):

```yaml
logging:
  format: json
  log_color: false # disable ANSI escape codes in JSON
```

Each JSON log line includes timestamp, level, target, span context, and message fields.
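As an illustration only — the exact field layout depends on the JSON formatter in use, so treat this shape as an assumption rather than a contract — a line might look like:

```json
{"timestamp":"2025-01-01T12:00:00.000Z","level":"INFO","target":"containment_chamber","span":{"name":"sign_request"},"fields":{"message":"signing request accepted"}}
```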

Security-relevant events are logged with target: "audit". This target is separate from the normal containment_chamber target, so you can route audit events to a dedicated sink without changing your general log level.

Events logged to the audit target:

| Event | When |
| --- | --- |
| signing request | Every signing attempt, including key and operation type |
| state transition | Seal machine state changes (e.g., Sealed → AwaitingUnseal) |
| unseal share submitted | When an operator submits an unseal share, including share index |
| signer sealed | When the signer is sealed, and by whom |
```sh
# Include audit events alongside normal application logs
RUST_LOG=containment_chamber=info,audit=info

# Audit events only — suppress everything else
RUST_LOG=off,audit=info
```

In JSON mode, filter on "target":"audit" in your log aggregator (Datadog, Loki, CloudWatch, etc.) to build a dedicated audit trail.
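For example, if you ship logs with Vector, you could route audit events to a dedicated sink along these lines (the `signer_logs` source name, sink name, and file path are placeholders; only the `target == "audit"` condition comes from this page):

```yaml
transforms:
  audit_only:
    type: filter
    inputs: ["signer_logs"]
    # Keep only events emitted to the audit target
    condition: '.target == "audit"'

sinks:
  audit_trail:
    type: file
    inputs: ["audit_only"]
    path: "/var/log/containment-audit/%Y-%m-%d.log"
    encoding:
      codec: json
```

The same idea applies in any aggregator that can route on a structured field: match on `target` once at ingest, and the audit trail stays intact regardless of the application log level.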