
Observability

Containment Chamber provides three observability pillars: Prometheus metrics, OpenTelemetry OTLP tracing, and structured JSON logging. The metrics endpoint runs on a separate port from the signing API, so you can expose metrics to your monitoring stack without exposing the signing surface.

Metrics are served on a dedicated HTTP endpoint, separate from the signing API (which listens on port 9000).

```yaml
metrics:
  listen_address: "0.0.0.0"
  listen_port: 3000
  refresh_interval_seconds: 30
```

| Option | Default | Description |
| --- | --- | --- |
| `listen_address` | `0.0.0.0` | Bind address for the metrics server |
| `listen_port` | `3000` | Port for the metrics endpoint |
| `refresh_interval_seconds` | `30` | How often metrics are refreshed |

Verify metrics are working:

```sh
curl http://localhost:3000/metrics
```
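Once the endpoint responds, point Prometheus at it. A minimal scrape configuration might look like the following sketch (the `signer-host` target and the job name are illustrative, not fixed by Containment Chamber):

```yaml
scrape_configs:
  - job_name: "containment-chamber"
    # Scrape the dedicated metrics port, not the signing API
    static_configs:
      - targets: ["signer-host:3000"]
```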

All metrics exposed at /metrics:

| Name | Type | Description |
| --- | --- | --- |
| `containment_signing_requests_total` | counter | Total signing requests by status and operation |
| `containment_signing_duration_seconds` | histogram | Duration of signing operations in seconds |
| `containment_slashing_rejections_total` | counter | Total signing requests rejected by slashing protection |
| `containment_signing_semaphore_available` | gauge | Available signing semaphore permits |
| `containment_signing_concurrency_limit` | gauge | Configured signing concurrency limit |
| `containment_canary_signing_total` | counter | Number of times a canary key has signed |

| Name | Type | Description |
| --- | --- | --- |
| `containment_chamber_init_total` | counter | Number of chamber init ceremonies performed |
| `containment_chamber_seal_total` | counter | Number of emergency seal operations |
| `containment_chamber_unseal_total` | counter | Number of completed unseal ceremonies |
| `containment_chamber_unseal_shares_total` | counter | Number of unseal share submissions by operator |
| `containment_chamber_rotation_total` | counter | Number of rotation operations by type (kms, unseal, mode) |

| Name | Type | Description |
| --- | --- | --- |
| `containment_keys_active` | gauge | Number of active validator keys by source |
| `containment_key_loading_duration_seconds` | gauge | Duration of key loading operations in seconds |
| `containment_key_load_failures_total` | counter | Total validator keys that failed to load |
| `containment_key_refresh_total` | counter | Total keys added via background refresh |

| Name | Type | Description |
| --- | --- | --- |
| `containment_key_requests_total` | counter | Total Key Manager API requests by method |
| `containment_key_imports_total` | counter | Total validator keys imported via Key Manager API |
| `containment_key_deletions_total` | counter | Total validator keys deleted via Key Manager API |
| `containment_key_import_duration_seconds` | histogram | Duration of Key Manager API import operations in seconds |

| Name | Type | Description |
| --- | --- | --- |
| `containment_keygen_total` | counter | Total validator keys generated via keygen endpoint |
| `containment_keygen_duration_seconds` | histogram | Duration of keygen operations in seconds |

| Name | Type | Description |
| --- | --- | --- |
| `containment_antislashing_check_duration_seconds` | histogram | Duration of anti-slashing checks in seconds |
| `containment_antislashing_errors_total` | counter | Total anti-slashing backend errors |
| `containment_antislashing_pg_pool` | gauge | PostgreSQL connection pool state by status |

| Name | Type | Description |
| --- | --- | --- |
| `containment_auth_rejections_total` | counter | Total authentication rejections by reason |

| Name | Type | Description |
| --- | --- | --- |
| `containment_http_errors_total` | counter | Total HTTP error responses by status code |

| Name | Type | Description |
| --- | --- | --- |
| `containment_aws_keystore_errors_total` | counter | Total AWS keystore errors by operation |
| `containment_kms_operations_total` | counter | Total KMS operations by action and status |
| `containment_kms_operation_duration_seconds` | histogram | Duration of KMS operations in seconds |

| Name | Type | Description |
| --- | --- | --- |
| `containment_build_info` | gauge | Build information (version, commit, timestamp) |
| `containment_network_info` | gauge | Ethereum network configuration info gauge |
| `containment_healthy` | gauge | Health status of the signer (1 = healthy, 0 = unhealthy) |
| `containment_uptime_seconds` | gauge | Uptime in seconds since process start |
| `containment_process_resident_memory_bytes` | gauge | Resident memory usage in bytes (Linux only) |
| `containment_process_open_fds` | gauge | Number of open file descriptors (Linux only) |

| Name | Type | Description |
| --- | --- | --- |
| `containment_queue_rejected_total` | counter | Total requests rejected due to backpressure |

The operation label uses the signing operation names: AGGREGATION_SLOT, AGGREGATE_AND_PROOF, ATTESTATION, BLOCK_V2, RANDAO_REVEAL, SYNC_COMMITTEE_CONTRIBUTION_AND_PROOF, SYNC_COMMITTEE_MESSAGE, SYNC_COMMITTEE_SELECTION_PROOF, VALIDATOR_REGISTRATION, VOLUNTARY_EXIT.

Process metrics (containment_process_resident_memory_bytes and containment_process_open_fds) are only available on Linux.
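These metrics compose into useful PromQL queries. A few starting points (metric and label names as documented above; the rate windows are a matter of taste):

```promql
# Signing request rate by operation
sum by (operation) (rate(containment_signing_requests_total[5m]))

# p99 signing latency across all operations
histogram_quantile(0.99, sum by (le) (rate(containment_signing_duration_seconds_bucket[5m])))

# Slashing-protection rejections over the last hour
increase(containment_slashing_rejections_total[1h])
```

The `_bucket` suffix is the standard Prometheus convention for querying histogram metrics such as `containment_signing_duration_seconds`.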

Containment Chamber can export distributed traces via gRPC OTLP to any OpenTelemetry-compatible collector — Jaeger, Grafana Tempo, Honeycomb, Datadog, and others.

```yaml
opentelemetry:
  enabled: true
  endpoint: "http://otel-collector:4317"
  service_name: "containment-chamber"
```

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Enable OTLP trace export |
| `endpoint` | `http://localhost:4317` | gRPC OTLP collector endpoint |
| `service_name` | `containment-chamber` | Service name in traces |

Traces include the full request lifecycle — from HTTP ingestion through authorization, slashing protection checks, and BLS signing.
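On the receiving side, any OpenTelemetry Collector listening for OTLP over gRPC will do. As a sketch, a minimal collector configuration that forwards traces to Grafana Tempo might look like this (the `tempo:4317` backend address and the insecure TLS setting are assumptions for a lab environment):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
```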

Two pre-built Grafana dashboards are included in the repository under k8s/dashboards/:

- `containment-chamber-classic.json` — A standalone dashboard suitable for any deployment model (bare metal, Docker, Kubernetes).

Import via: Grafana → Dashboards → Import → Upload JSON file

If you use the Prometheus Operator, the Helm chart includes a ServiceMonitor resource for automatic scrape target discovery.

Enable it in your Helm values:

```yaml
serviceMonitor:
  enabled: true
  scrapeInterval: "15s"
  additionalLabels:
    release: prometheus
```

All available ServiceMonitor options:

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Create a ServiceMonitor resource |
| `scrapeInterval` | `60s` | Prometheus scrape interval |
| `additionalLabels` | `{}` | Labels added to the ServiceMonitor |
| `namespace` | `""` | Namespace for the ServiceMonitor (defaults to the release namespace) |
| `namespaceSelector` | `{}` | Namespace selector (use `any: true` to scrape all namespaces) |
| `targetLabels` | `[]` | Labels to transfer from the Kubernetes Service to scraped metrics |
| `metricRelabelings` | `[]` | Metric relabeling rules |
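As a sketch combining several of these options, the following values fragment scrapes across all namespaces and drops the Linux-only process metrics (option names come from the table above; the relabeling rule itself is illustrative):

```yaml
serviceMonitor:
  enabled: true
  scrapeInterval: "15s"
  namespaceSelector:
    any: true
  metricRelabelings:
    # Drop the Linux-only process metrics if you don't need them
    - sourceLabels: [__name__]
      regex: "containment_process_.*"
      action: drop
```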

By default, Containment Chamber outputs human-readable text logs with ANSI colors (when connected to a terminal). Switch to JSON for production log aggregation.

```yaml
logging:
  # Log level filter — supports tracing EnvFilter syntax
  # Examples: "info", "debug", "containment_chamber=debug,hyper=info"
  level: "info" # default: "info"
  # Output format: "text" (human-readable) or "json" (structured)
  format: text # default: "text"
  # ANSI colors in text output — auto-detects TTY by default
  log_color: null # default: auto-detect (true if TTY, false otherwise)
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `logging.level` | string | `"info"` | Log level filter (EnvFilter syntax) |
| `logging.format` | enum | `text` | `text` for human-readable, `json` for structured JSON |
| `logging.log_color` | boolean | auto | ANSI colors — auto-detects TTY when unset |
```yaml
# Via config
logging:
  level: "containment_chamber=debug,hyper=info"
```

```sh
# Or via environment variable (overrides config)
RUST_LOG=containment_chamber=debug
```

Enable JSON format for structured log aggregation (Datadog, Loki, CloudWatch, etc.):

```yaml
logging:
  format: json
  log_color: false # disable ANSI escape codes in JSON
```

Each JSON log line includes timestamp, level, target, span context, and message fields.
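As an illustration only — the exact field layout depends on the JSON formatter in use, so treat this shape as an assumption rather than a contract — a line might look like:

```json
{"timestamp":"2025-01-01T12:00:00.000Z","level":"INFO","target":"containment_chamber","span":{"name":"sign_request"},"fields":{"message":"signing request accepted"}}
```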

Security-relevant events are logged with target: "audit". This target is separate from the normal containment_chamber target, so you can route audit events to a dedicated sink without changing your general log level.

Events logged to the audit target:

| Event | When |
| --- | --- |
| signing request | Every signing attempt, including key and operation type |
| state transition | Seal machine state changes (e.g., Sealed → AwaitingUnseal) |
| unseal share submitted | When an operator submits an unseal share, including share index |
| signer sealed | When the signer is sealed, and by whom |
```sh
# Include audit events alongside normal application logs
RUST_LOG=containment_chamber=info,audit=info

# Audit events only — suppress everything else
RUST_LOG=off,audit=info
```

In JSON mode, filter on "target":"audit" in your log aggregator (Datadog, Loki, CloudWatch, etc.) to build a dedicated audit trail.
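For example, if you ship logs with Vector, you could route audit events to a dedicated sink along these lines (the `signer_logs` source name, sink name, and file path are placeholders; only the `target == "audit"` condition comes from this page):

```yaml
transforms:
  audit_only:
    type: filter
    inputs: ["signer_logs"]
    # Keep only events emitted to the audit target
    condition: '.target == "audit"'

sinks:
  audit_trail:
    type: file
    inputs: ["audit_only"]
    path: "/var/log/containment-audit/%Y-%m-%d.log"
    encoding:
      codec: json
```

The same idea applies in any aggregator that can route on a structured field: match on `target` once at ingest, and the audit trail stays intact regardless of the application log level.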