Command Center

Command Center provides a unified operations platform for your Crusoe GPU clusters, replacing fragmented monitoring tools with centralized observability, automated alerting, and integrated support workflows.

Why Command Center

Large-scale AI workloads require visibility into every resource in your cluster. Command Center delivers real-time telemetry across your infrastructure, so you no longer need to switch between SSH sessions, log dumps, and third-party dashboards.

Key Capabilities

View cluster topology — See health and utilization of every node, arranged by network topology.
Monitor metrics — Track GPU, CPU, memory, storage, and network performance. Ingest custom application metrics.
Access logs — Query Kubernetes pod logs and JournalD system logs without SSH.
Export telemetry — Export metrics to Grafana, Datadog, or Splunk via Prometheus-compatible endpoints.
Receive alerts — Get notified about hardware failures and cluster events via email, Slack, or webhooks.

Components

Command Center consists of the following components:

Component	Description	Availability
Topology	Visual cluster topology with GPU utilization, CPU utilization, and node health overlays	CMK only (VM support planned)
Metrics	Infrastructure and custom application metrics, available through a Prometheus-compatible API and the Crusoe Cloud Console	CMK and VM (custom metrics: CMK only)
Logs	Managed log collection and search for Kubernetes and system logs	CMK only (VM support planned)
Telemetry Relay	Export infrastructure metrics to external observability platforms	CMK and VM

Prerequisites

To use Command Center, you need:

Crusoe Cloud account with an active project
CMK cluster with the NVIDIA GPU Operator add-on (if using GPU nodes). CMK version 1.33.4-cmk.31 or higher is recommended because it includes the Crusoe Watch Agent.
Crusoe CLI installed and configured
kubectl configured with cluster access
helm installed
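
The tool prerequisites above can be checked before you start. The following is a minimal sketch, not part of any official tooling: the function name missing_tools is hypothetical, and it only confirms that each binary is on your PATH, not that it is configured or authenticated.

```shell
# Hypothetical helper: print each named command that is not on PATH.
# Returns 0 when every command is present, 1 otherwise.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: missing_tools crusoe kubectl helm
```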

Getting Started

Deploy a CMK cluster — Follow Managing your Clusters if needed.
Install the Crusoe Watch Agent — See Installing the Crusoe Watch Agent below.
Open Command Center — Navigate to Orchestration > select your cluster > Command Center tab.
Explore your cluster — Start with Topology, then drill into Metrics and Logs.

Installing the Crusoe Watch Agent

The Crusoe Watch Agent is a vector.dev-based DaemonSet that deploys one pod per node. It collects infrastructure metrics, logs, and custom application metrics.

Note: Starting with CMK version 1.33.4-cmk.31, the Crusoe Watch Agent (Helm chart version 0.3.7 or higher) is bundled and installed automatically during cluster creation. Each CMK version is associated with a specific Helm chart version. The agent is installed only at cluster creation and is not updated automatically when the CMK version is upgraded. For clusters on earlier CMK versions, follow the manual installation steps below.
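
To decide whether a cluster falls under the bundled or the manual path, you can compare its CMK version against 1.33.4-cmk.31. This is a rough sketch that assumes CMK version strings order the same way GNU sort -V orders them; the function name version_at_least and the sample version are hypothetical. A version at or above the minimum only means the agent was bundled if the cluster was created on that version, not upgraded to it.

```shell
# Assumption: CMK versions order the way GNU `sort -V` orders them.
# Succeeds (exit 0) when version $1 is at or above the minimum $2.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Hypothetical usage: replace the first argument with your cluster's CMK version.
if version_at_least "1.33.4-cmk.30" "1.33.4-cmk.31"; then
  echo "agent bundled if the cluster was created on this version"
else
  echo "manual installation required"
fi
```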

Step 1: Set your Kubernetes context

Target the cluster where you want to install the Crusoe Watch Agent:

crusoe kubernetes clusters get-credentials <cluster-name> --project-id <project-id>
kubectl config current-context

Step 2: Install the agent via Helm

Add the Helm repository, refresh it, and install the chart:

helm repo add crusoe-watch-agent https://crusoecloud.github.io/crusoe-watch-agent/k8s/helm-charts
helm repo update
helm install crusoe-watch-agent crusoe-watch-agent/crusoe-watch-agent --namespace crusoe-system
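
If you script the installation, a single idempotent command can be convenient. The sketch below wraps the same repository and chart in a hypothetical function, install_watch_agent, and uses the standard Helm flags --install (install when no release exists, upgrade otherwise) and --create-namespace. Whether the crusoe-system namespace already exists on your cluster is an assumption, so treat this as a starting point rather than the documented procedure.

```shell
# Sketch: install the agent, or upgrade it in place if a release already exists.
# --install and --create-namespace are standard Helm flags; the need for
# --create-namespace on your cluster is an assumption.
install_watch_agent() {
  helm repo add crusoe-watch-agent https://crusoecloud.github.io/crusoe-watch-agent/k8s/helm-charts &&
  helm repo update &&
  helm upgrade --install crusoe-watch-agent crusoe-watch-agent/crusoe-watch-agent \
    --namespace crusoe-system --create-namespace
}

# Usage: install_watch_agent
```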

To upgrade an existing installation to the latest version, refresh the repository index first so that the version search reflects the newest chart:

helm repo update
helm search repo crusoe-watch-agent/crusoe-watch-agent --versions | head -n 2  # check the latest agent version
helm upgrade crusoe-watch-agent crusoe-watch-agent/crusoe-watch-agent --namespace crusoe-system

Step 3: Verify the agent is running

kubectl get pods -n crusoe-system

Confirm that a Crusoe Watch Agent pod is running on every node in your cluster. On NVIDIA GPU-accelerated clusters, you will also see a pod with the prefix crusoe-log-collector, which is used only to collect diagnostic reports that you trigger from the Console.
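
The one-pod-per-node check can be scripted instead of counted by eye. This is a minimal sketch: count_running is a hypothetical helper, and the pod name prefix crusoe-watch-agent is an assumption based on the Helm release name, so adjust it to match the names you actually see.

```shell
# Count Running pods whose name starts with the given prefix.
# Reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
count_running() {
  awk -v prefix="$1" 'index($1, prefix) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (the pod name prefix is an assumption):
#   nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
#   agents=$(kubectl get pods -n crusoe-system --no-headers | count_running crusoe-watch-agent)
#   [ "$agents" -eq "$nodes" ] && echo "one agent pod per node"
```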

Disabling or Customizing the Agent

If you prefer not to use the Crusoe Watch Agent, you can uninstall it or customize its behavior.

To uninstall the agent:

helm uninstall crusoe-watch-agent -n crusoe-system

To collect only specific telemetry, configure the agent to collect only metrics or only logs through its Helm values. To collect only metrics, create a values.yaml file:

# Collect only metrics (disable logs)
metrics:
  enabled: true
logs:
  enabled: false

To collect only logs:

# Collect only logs (disable metrics)
metrics:
  enabled: false
logs:
  enabled: true

Then upgrade the agent with your custom configuration:

helm upgrade crusoe-watch-agent crusoe-watch-agent/crusoe-watch-agent --namespace crusoe-system -f values.yaml
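
If you switch between the two configurations often, the values file can be generated rather than edited by hand. The sketch below uses a hypothetical function, write_values, that emits the same metrics.enabled and logs.enabled keys shown in the values.yaml examples above; it does not assume any other chart options.

```shell
# Write a values file that enables exactly one telemetry type.
# $1 is the mode ("metrics" or "logs"); $2 is the output file path.
write_values() {
  mode=$1
  file=$2
  case "$mode" in
    metrics) metrics=true;  logs=false ;;
    logs)    metrics=false; logs=true  ;;
    *) echo "mode must be metrics or logs" >&2; return 1 ;;
  esac
  cat > "$file" <<EOF
metrics:
  enabled: $metrics
logs:
  enabled: $logs
EOF
}

# Usage:
#   write_values metrics values.yaml
#   helm upgrade crusoe-watch-agent crusoe-watch-agent/crusoe-watch-agent \
#     --namespace crusoe-system -f values.yaml
#   helm get values crusoe-watch-agent -n crusoe-system   # show the overrides in effect
```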

Integration with Crusoe Services

Command Center integrates with AutoClusters for automated hardware failure detection and node replacement. Remediation events appear in Notification Center.

What's Next

Topology — Monitor cluster health and utilization in a topology-aware view
Metrics — Configure and query infrastructure and custom metrics
Logs — Search and filter Kubernetes and system logs
Telemetry Relay — Export metrics to external platforms
Notification — Get notified about resource health via email and in-console, and set up alert routing to Slack or webhooks
