Managed Inference Metrics
Metrics are provided out of the box for models served through our Managed Inference service. Metrics are updated every minute and are available on the metrics page. They can also be integrated into Grafana dashboards via a Prometheus-compatible query API; instructions for using the query API are provided below.
Available Metrics
| Metrics | Definition | Metric Query |
|---|---|---|
| Request Rate | The rate of API requests served by the model | sum by (model_name) ( rate( crusoe_inference_request_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Input Token Rate | The rate of input tokens processed by the model over the query window | sum by (model_name) ( rate( crusoe_inference_input_token_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Output Token Rate | The rate of output tokens generated by the model over the query window | sum by (model_name) ( rate( crusoe_inference_output_token_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Time to First Token (TTFT) | The median (p50) time the model takes to generate the first token of a response, over the query window | histogram_quantile( 0.5, sum by (model_name, le) ( irate( crusoe_inference_histogram_first_token_latency_bucket{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) ) |
| Time per Output Token (TPOT) | The median (p50) time between successive output tokens, over the query window | histogram_quantile( 0.5, sum by (model_name, le) ( irate( crusoe_inference_histogram_output_token_latency_bucket{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) ) |
Each metric also has a service_tier label that can be used to segment the data. Set service_tier="pt" to select provisioned throughput metrics.
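As a sketch, the label filters in the queries above can be assembled programmatically before being sent to the query API. The helper function, project ID, and model name below are illustrative, not part of the Crusoe API; only the metric name, label names, and the "pt" tier value come from this page.

```python
# Build a PromQL query string like those in the table above,
# optionally adding the service_tier label filter.
def build_query(metric, project_id, model_name, service_tier=None, window="300s"):
    labels = [f'project_id="{project_id}"', f'model_name="{model_name}"']
    if service_tier:
        labels.append(f'service_tier="{service_tier}"')
    selector = f'{metric}{{{",".join(labels)}}}'
    return f"sum by (model_name) ( rate( {selector}[{window}] ) )"

# Request Rate for provisioned throughput, with placeholder IDs.
query = build_query(
    "crusoe_inference_request_count", "my-project", "my-model", service_tier="pt"
)
print(query)
```

The same pattern applies to the token-rate metrics; the histogram-based TTFT and TPOT queries additionally wrap the selector in histogram_quantile as shown in the table.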
Retrieving metrics using the PromQL API
You can directly query the metrics API endpoint to retrieve data for a single instant or a specific time range. The API endpoint is:
https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries
To find your project ID, select the project name in the top left corner of the Crusoe Console and copy the ID for the appropriate project.
Generating a monitoring token
To generate a monitoring token for querying metrics, run the following Crusoe CLI command:
crusoe monitoring tokens create
This command generates an API-Key that you'll use for authentication when querying the metrics API. Store the token in a secrets or key-management tool; you will not be able to retrieve it later.
Querying metrics
Here is an example curl command to retrieve the most recent data point for Time to First Token (TTFT) in your project:
curl -G 'https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries' \
  --data-urlencode 'query=crusoe_inference_first_token_latency' \
  -H 'Authorization: Bearer <API-Key>'
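The same instant query can be issued from a script. This is a minimal sketch using Python's standard library, assuming only the endpoint and bearer-token authentication shown above; `<project-id>` and `<API-Key>` remain placeholders to fill in.

```python
from urllib.parse import urlencode
from urllib.request import Request

project_id = "<project-id>"
base = f"https://api.crusoecloud.com/v1alpha5/projects/{project_id}/metrics/timeseries"

# URL-encode the PromQL query string, as curl's --data-urlencode does.
params = {"query": "crusoe_inference_first_token_latency"}
url = f"{base}?{urlencode(params)}"

# Attach the monitoring token as a bearer token.
req = Request(url, headers={"Authorization": "Bearer <API-Key>"})
# urllib.request.urlopen(req) would return the JSON response; not executed here.
print(req.full_url)
```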
Importing data into Grafana
To import data into your own Grafana instance, add a Prometheus data source with the following options:
Prometheus Server URL: https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries
Authentication → HTTP Headers:
Header: Authorization
Value: Bearer <API-Key>
Use the API-Key generated in the 'Generating a monitoring token' section.