Logs
The Crusoe Watch Agent collects system logs automatically and makes them available in the Console, where you can search, filter, and inspect them. No SSH or manual log aggregation is required.
Managed Logs is available for both Crusoe Managed Kubernetes (CMK) clusters and Crusoe Virtual Machines (VMs).
Managed Logs is in production-ready preview. Please reach out to Crusoe Cloud Support to learn more.
Prerequisites
To use Logs, you need:
For CMK clusters:
- CMK cluster with Crusoe Watch Agent version 0.3.1 or above installed (see Installing the Crusoe Watch Agent)
- NVIDIA GPU Operator add-on (if using GPU nodes)
For VMs:
- Crusoe Watch Agent version vm-v1.0.3 or above installed (see Virtual Machines Metrics)
Log Sources
The Crusoe Watch Agent collects the following log sources:
| Log Source | Description | Availability |
|---|---|---|
| JournalD | System-level logs from journald: kernel messages (GPU XID errors, OOM events) and system services. CMK nodes also include kubelet and container runtime logs. | CMK and VM |
| crusoe-watch-agent | Crusoe Watch Agent service logs | CMK and VM |
| cwa-config-reloader | Crusoe Watch Agent config reloader logs | VM only |
Accessing Logs Using Console UI
You can access logs in the Console UI in two ways:
- Managed Logs page — Navigate to Managed Logs in the left navigation bar to search logs across all your CMK clusters and VMs in a unified view.
- Resource-specific view — Navigate to Orchestration > select your cluster > Logs tab.
Searching and Filtering
You can use the following filters to narrow your log search:
| Filter | Description |
|---|---|
| Instance name | Filter logs by specific node or VM name |
| Log source | Filter by log source (see Log Sources) |
| Severity | Filter by log severity level (see severity levels below) |
| Time window | Specify a start and end time to narrow results |
| Text search | Search log content using basic text matching |
Combine multiple filters to narrow results. For example, search for XID errors in JournalD logs from a specific node within the last 24 hours.
Log Severity Levels
Logs are normalized to the 8-tier RFC 5424 severity taxonomy:
| Level | Severity | Description |
|---|---|---|
| 0 | Emergency | System is unusable |
| 1 | Alert | Action must be taken immediately |
| 2 | Critical | Critical condition; application cannot continue |
| 3 | Error | Error handled, service continues |
| 4 | Warning | Unexpected situation, but handled gracefully |
| 5 | Notice | Normal but significant condition |
| 6 | Info | Normal operational events (startup, shutdown, config changes) |
| 7 | Debug | Detailed diagnostic information |
| — | Undefined | Log entry has no severity field |
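When scripting against logs, it can help to compare severities numerically, since a lower number means a more severe entry. The following is a minimal sketch that encodes the table above; `severity_number` and `at_least_as_severe` are hypothetical helper names, and the sketch assumes severity names are spelled as in the table (case-insensitive).

```shell
# Hypothetical helpers encoding the severity table above.
# Prints the numeric level for a severity name; fails for unknown names
# (including "Undefined", which has no numeric level).
severity_number() {
  case $(printf '%s' "$1" | tr '[:lower:]' '[:upper:]') in
    EMERGENCY) echo 0 ;;
    ALERT)     echo 1 ;;
    CRITICAL)  echo 2 ;;
    ERROR)     echo 3 ;;
    WARNING)   echo 4 ;;
    NOTICE)    echo 5 ;;
    INFO)      echo 6 ;;
    DEBUG)     echo 7 ;;
    *)         return 1 ;;
  esac
}

# Succeeds when severity $1 is at least as severe as threshold $2.
# Lower numbers are more severe, so the comparison is "less than or equal".
at_least_as_severe() {
  a=$(severity_number "$1") || return 1
  b=$(severity_number "$2") || return 1
  [ "$a" -le "$b" ]
}
```

For example, `at_least_as_severe CRITICAL ERROR` succeeds, while `at_least_as_severe INFO WARNING` fails.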
Querying Logs via API
Queries use LogsQL, VictoriaLogs' query language.
Authentication
Use the same monitoring token generated for metrics access (see Virtual Machines Metrics or CMK Metrics). Pass it as a bearer token:
Authorization: Bearer $monitoring_token
Conventions
- Time formats accepted by start, end, start_time, end_time, time, step, and offset: Unix epoch seconds, relative durations (5m, 1h, 6h), RFC3339 (2026-05-10T12:00:00Z), or the literal now.
- Default time window: defaults to the last 15 minutes (now-15m to now).
- Retention boundary: a start_time older than the 7-day retention window returns 400.
- Unknown query parameters return 400 with the list of accepted names.
- LogsQL queries (query parameter) are limited to 4096 characters and 10 pipe operations.
- Repeatable parameters (e.g. levels, instance_names, cluster_id) accept multiple occurrences: ?levels=ERROR&levels=WARNING.
- NDJSON responses contain one JSON object per line; JSON responses are a single object.
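The query limits above can be checked on the client before a request is sent. This is a rough sketch only: `check_logsql_query` is a hypothetical helper, and it assumes every `|` character counts as one pipe operation, which may differ from how the server counts (for example, a `|` inside a quoted string).

```shell
# Hypothetical pre-flight check against the documented LogsQL limits:
# at most 4096 characters and at most 10 pipe operations.
check_logsql_query() {
  query=$1
  length=${#query}
  # Delete every character except '|' and measure what is left.
  pipes_only=$(printf '%s' "$query" | tr -cd '|')
  pipe_count=${#pipes_only}
  if [ "$length" -gt 4096 ]; then
    echo "query too long: $length characters (limit 4096)" >&2
    return 1
  fi
  if [ "$pipe_count" -gt 10 ]; then
    echo "too many pipes: $pipe_count (limit 10)" >&2
    return 1
  fi
  return 0
}
```

A script could call `check_logsql_query "$query" || exit 1` before invoking curl, so that an oversized query fails locally with a readable message.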
Endpoints
| Endpoint | Purpose | Response |
|---|---|---|
| GET /logs | Structured log search using typed filters (no raw LogsQL needed) | NDJSON |
| GET /logs/query | Run a raw LogsQL query and stream matching log entries | NDJSON |
| GET /logs/tail | Live-stream new log entries matching a LogsQL query (up to 10 min) | NDJSON (stream) |
| GET /logs/count | Project-scoped total count | NDJSON |
| GET /logs/histogram | Project-scoped time-bucketed counts | NDJSON |
| GET /logs/facets | Project-scoped facet counts | NDJSON |
| GET /logs/fields | List field names present in matching logs, with hit counts | JSON |
| GET /logs/field_values | List distinct values of one field, with hit counts | JSON |
| GET /logs/streams | List log streams matching a LogsQL query | JSON |
| GET /logs/stats | Point-in-time stats query (query must contain a stats pipe) | JSON |
| GET /logs/stats_range | Range stats query over time (query must contain a stats pipe) | JSON |
GET /logs
Structured log search built from typed filters; no raw LogsQL required.
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| search_query | string | no | — | Free-text message filter |
| cluster_id | string | no | — | Single cluster |
| levels | string (repeatable) | no | — | e.g. INFO, ERROR |
| instance_names | string (repeatable) | no | — | VM/node names |
| sources | string (repeatable) | no | — | Log source identifiers |
| start_time | string (time) | no | — | |
| end_time | string (time) | no | — | |
| limit | integer | no | 100 | Capped at 5000 |
| offset | integer | no | 0 | Pagination offset |
| sort | string | no | — | Sort order |
Example — fetch the most recent 50 ERROR-level logs for a cluster:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "cluster_id=$cluster_id" \
--data-urlencode "levels=ERROR" \
--data-urlencode "limit=50" \
--data-urlencode "sort=desc"
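To retrieve more entries than a single request returns, the limit and offset parameters can be combined in a loop. The sketch below is illustrative only: `fetch_page` and `fetch_all` are hypothetical helpers, and it assumes that offset counts log entries, that each NDJSON line is one entry, and that a page shorter than the limit marks the end of the results.

```shell
# Hypothetical helper: fetch one page of GET /logs.
# $1 = offset, $2 = limit. Expects $project_id, $monitoring_token and
# $cluster_id to be set in the environment.
fetch_page() {
  curl -sS -G "https://api.crusoecloud.com/v1/projects/$project_id/logs" \
    -H "Authorization: Bearer $monitoring_token" \
    --data-urlencode "cluster_id=$cluster_id" \
    --data-urlencode "sort=desc" \
    --data-urlencode "limit=$2" \
    --data-urlencode "offset=$1"
}

# Print every page until a short or empty page is returned.
# $1 = page size (default 100, the documented default limit).
fetch_all() {
  limit=${1:-100}
  offset=0
  while :; do
    page=$(fetch_page "$offset" "$limit") || return 1
    [ -z "$page" ] && break
    printf '%s\n' "$page"
    # One NDJSON line per entry, so non-empty lines give the page size.
    count=$(printf '%s\n' "$page" | grep -c .)
    [ "$count" -lt "$limit" ] && break
    offset=$((offset + limit))
  done
}
```

Because new logs keep arriving, passing a fixed end_time in `fetch_page` is likely to keep pages from shifting between requests.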
GET /logs/query
Run a raw LogsQL query. If the query has no _time: filter, the time bounds from start/end are injected automatically.
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| query | string (LogsQL) | yes | — | Validated |
| start | string (time) | no | now-15m | |
| end | string (time) | no | now | |
| limit | integer | no | 5000 | Must be > 0; values above 5000 are capped to 5000 |
Example — query logs for a specific VM:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/query" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=crusoe_vm_id:$vm_id"
Example — limit results to 10 entries:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/query" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=crusoe_vm_id:$vm_id" \
--data-urlencode "limit=10"
Example — search for error logs in a specific VM:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/query" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=crusoe_vm_id:$vm_id AND error"
GET /logs/tail
Open a long-lived stream of new log entries matching a LogsQL query. The connection is automatically closed after 10 minutes.
| Parameter | Type | Required | Notes |
|---|---|---|---|
| query | string (LogsQL) | yes | Only query is accepted; other parameters return 400 |
Example — live-tail logs for a VM:
curl -N -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/tail" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=crusoe_vm_id:$vm_id"
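Because the stream is closed after 10 minutes, a longer tailing session needs to reconnect. The following is a minimal sketch under that assumption; `tail_once`, `tail_with_reconnect`, and the `TAIL_RETRY_DELAY` variable are hypothetical names, and entries emitted between two connections may be missed.

```shell
# Hypothetical helper: open one tail stream for the LogsQL query in $1.
# Expects $project_id and $monitoring_token to be set in the environment.
tail_once() {
  curl -sS -N -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/tail" \
    -H "Authorization: Bearer $monitoring_token" \
    --data-urlencode "query=$1"
}

# Reopen the stream each time it ends.
# $1 = LogsQL query, $2 = number of connections to open (default 6,
# roughly one hour if each stream lasts the full 10 minutes).
tail_with_reconnect() {
  query=$1
  attempts=${2:-6}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    tail_once "$query" || echo "tail stream ended with an error, reconnecting" >&2
    i=$((i + 1))
    # Short pause so a failing endpoint is not hammered in a tight loop.
    sleep "${TAIL_RETRY_DELAY:-1}"
  done
}
```

A call such as `tail_with_reconnect "crusoe_vm_id:$vm_id" 12` would keep tailing for about two hours.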
GET /logs/count
Return the total number of log entries in the project that match the given filters.
| Parameter | Type | Required |
|---|---|---|
| cluster_id | string (repeatable) | no |
| vm_id | string (repeatable) | no |
| search_query | string | no |
| levels | string (repeatable) | no |
| instance_names | string (repeatable) | no |
| sources | string (repeatable) | no |
| start_time | string (time) | no |
| end_time | string (time) | no |
Example — count error logs in the last hour:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/count" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "levels=ERROR" \
--data-urlencode "start_time=now-1h"
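Repeatable parameters such as levels are passed by repeating the parameter name once per value. As a small sketch, the hypothetical helper `repeat_param` below builds that part of the query string; it assumes the values are already URL-safe (as ERROR and WARNING are) and does no encoding itself.

```shell
# Hypothetical helper: print NAME=VALUE pairs joined with '&',
# one pair per value. $1 = parameter name, remaining arguments = values.
repeat_param() {
  name=$1
  shift
  out=
  for value in "$@"; do
    # Prefix with '&' only when something has already been added.
    out="${out:+$out&}$name=$value"
  done
  printf '%s\n' "$out"
}

# Count logs for several severities in the last hour.
# Expects $project_id and $monitoring_token to be set in the environment.
count_by_levels() {
  curl -sS \
    -H "Authorization: Bearer $monitoring_token" \
    "https://api.crusoecloud.com/v1/projects/$project_id/logs/count?$(repeat_param levels "$@")&start_time=now-1h"
}
```

Here `repeat_param levels ERROR WARNING` prints `levels=ERROR&levels=WARNING`, matching the form shown under Conventions. With curl -G, repeating `--data-urlencode "levels=..."` once per value achieves the same result and also handles encoding.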
GET /logs/histogram
Project-scoped histogram across multiple clusters and VMs.
| Parameter | Type | Required | Notes |
|---|---|---|---|
| interval | string (duration) | yes | Bucket size, e.g. 1m, 5m, 1h |
| cluster_id | string (repeatable) | no | |
| vm_id | string (repeatable) | no | |
| search_query | string | no | |
| levels | string (repeatable) | no | |
| instance_names | string (repeatable) | no | |
| sources | string (repeatable) | no | |
| start_time | string (time) | no | |
| end_time | string (time) | no | |
| group_by | string | no | Metadata key to group series by |
Example — hourly log volume over the last 24 hours, broken out by severity:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/histogram" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "interval=1h" \
--data-urlencode "start_time=now-24h" \
--data-urlencode "group_by=level"
GET /logs/facets
Project-scoped counts grouped by a metadata field. Supports multiple clusters and VMs.
| Parameter | Type | Required | Notes |
|---|---|---|---|
| field | string (enum) | yes | One of: instance_names, log_sources, levels |
| cluster_id | string (repeatable) | no | |
| vm_id | string (repeatable) | no | |
| search_query | string | no | |
| levels | string (repeatable) | no | |
| instance_names | string (repeatable) | no | |
| sources | string (repeatable) | no | |
| start_time | string (time) | no | |
| end_time | string (time) | no | |
Example — top instance names producing errors in the last 24 hours:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/facets" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "field=instance_names" \
--data-urlencode "levels=ERROR" \
--data-urlencode "start_time=now-24h"
GET /logs/fields
List the field names present in logs matching the query, with hit counts.
| Parameter | Type | Required | Default |
|---|---|---|---|
| query | string (LogsQL) | yes | — |
| start | string (time) | no | now-15m |
| end | string (time) | no | now |
Response:
{
"values": [
{ "value": "_msg", "hits": 1234 },
{ "value": "level", "hits": 1230 }
]
}
Example — list fields available in JournalD logs:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/fields" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=log_source:journald"
GET /logs/field_values
List distinct values of a single field, with hit counts.
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| field | string | yes | — | Internal/forbidden fields return 400 |
| query | string (LogsQL) | yes | — | |
| start | string (time) | no | now-15m | |
| end | string (time) | no | now | |
| limit | integer | no | 100 | Must be ≥ 1; values above 1000 are capped to 1000 |
Response:
{
"values": [
{ "value": "INFO", "hits": 8123 },
{ "value": "ERROR", "hits": 142 }
]
}
Example — list the distinct severity levels seen in the last hour:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/field_values" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "field=level" \
--data-urlencode "query=*" \
--data-urlencode "start=now-1h"
GET /logs/streams
List log streams (label-set identifiers) matching a LogsQL query.
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| query | string (LogsQL) | yes | — | |
| start | string (time) | no | now-15m | |
| end | string (time) | no | now | |
| limit | integer | no | 100 | Must be ≥ 1; values above 1000 are capped to 1000 |
Example — list streams emitting JournalD logs in the last hour:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/streams" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=log_source:journald" \
--data-urlencode "start=now-1h"
GET /logs/stats
Run a point-in-time LogsQL stats aggregation, e.g. * | stats count().
| Parameter | Type | Required | Notes |
|---|---|---|---|
| query | string (LogsQL) | yes | Must contain a stats pipe, otherwise 400 |
| time | string (time) | no | Point-in-time evaluation timestamp |
Example — total log count grouped by severity right now:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/stats" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=* | stats by (level) count() AS total"
GET /logs/stats_range
Run a LogsQL stats aggregation over a time range with stepping.
| Parameter | Type | Required | Notes |
|---|---|---|---|
| query | string (LogsQL) | yes | Must contain a stats pipe, otherwise 400 |
| start | string (time) | no | |
| end | string (time) | no | |
| step | string (duration) | no | Bucket size, e.g. 5m, 1h |
| offset | string (duration) | no | Time offset, e.g. 2h, 5h |
Example — error count per 5-minute bucket over the last 6 hours:
curl -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/stats_range" \
-H "Authorization: Bearer $monitoring_token" \
--data-urlencode "query=level:ERROR | stats count() AS errors" \
--data-urlencode "start=now-6h" \
--data-urlencode "step=5m"
Log Retention
Logs are retained for 7 days, after which they are automatically purged.
Common Troubleshooting Workflows
Diagnosing Storage Mount Issues
- Navigate to Logs and filter by node instance name.
- Set the log source to JournalD and search for Kubelet entries.
- Search for mount errors: MountVolume, nfs.
- Check for filesystem errors, RAID issues, or NFS connectivity problems.
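The same search can be approximated through the API. This is a sketch, not a confirmed recipe: `mount_error_query` and `find_mount_errors` are hypothetical names, the `log_source:journald` filter is borrowed from the examples earlier on this page, and the query assumes LogsQL matches MountVolume and nfs as whole words in the log message.

```shell
# LogsQL filter mirroring the steps above: JournalD entries that
# mention MountVolume or nfs. The OR grouping is an assumption about
# how the two search terms should be combined.
mount_error_query='log_source:journald AND (MountVolume OR nfs)'

# Hypothetical helper: run the query over a recent window.
# $1 = start time (default now-1h), $2 = max entries (default 100).
# Expects $project_id and $monitoring_token to be set in the environment.
find_mount_errors() {
  curl -sS -G "https://api.crusoecloud.com/v1/projects/$project_id/logs/query" \
    -H "Authorization: Bearer $monitoring_token" \
    --data-urlencode "query=$mount_error_query" \
    --data-urlencode "start=${1:-now-1h}" \
    --data-urlencode "limit=${2:-100}"
}
```

Running `find_mount_errors now-24h` would widen the window to the last day. Narrowing to a single node requires a field filter whose name is not documented here; GET /logs/fields can list the fields that are available.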
What's Next
- Topology — Identify unhealthy nodes and run diagnostics
- Metrics — Correlate log events with performance data
- Notifications — Get notified about resource health via email and in-console, and set up alert routing to Slack or webhooks