Getting Started with Managed Inference
Crusoe's Managed Inference Service provides OpenAI-compatible endpoints for a number of popular open-source models. The models are hosted on Crusoe's inference engine with MemoryAlloy, a proprietary cluster-wide memory fabric with cache-aware routing that improves time to first token (TTFT) and throughput.
The instructions below walk through querying models with the OpenAI SDK. All models are accessible via the https://api.crusoe.ai base URL.
Retrieving your Inference API token
You can retrieve your Inference API token via the Crusoe Cloud console by following the steps below.
- Visit the Intelligence Foundry on the [Crusoe Cloud console](https://console.crusoecloud.com/foundry/models)
- From either the models or chat tab, use the "Get API Key" button to generate an API key
- Provide an optional alias and expiration date
- Click "Create" to view and save your API key
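
Once you have saved your key, a quick way to confirm it works is to list the models available to your account. The sketch below assumes the key is exported in a `CRUSOE_API_KEY` environment variable; `/v1/models` is part of the standard OpenAI-compatible API surface.

```python
import os

from openai import OpenAI

# Assumes the key generated in the console is exported as CRUSOE_API_KEY.
client = OpenAI(
    api_key=os.getenv("CRUSOE_API_KEY"),
    base_url="https://api.crusoe.ai/v1",
)

# List the models this key can access (standard OpenAI-compatible endpoint).
for model in client.models.list():
    print(model.id)
```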
Querying text models
After retrieving an API key from the Intelligence Foundry, you can use the OpenAI SDK to make requests. The example below uses the `meta-llama/Llama-3.3-70B-Instruct` model.
```python
import os

from openai import OpenAI

# Read the key generated in the Intelligence Foundry from the environment.
CRUSOE_API_KEY = os.getenv("CRUSOE_API_KEY")

# Point the OpenAI client at Crusoe's OpenAI-compatible base URL.
client = OpenAI(
    api_key=CRUSOE_API_KEY,
    base_url="https://api.crusoe.ai/v1",
)

# Standard chat completion request against a hosted text model.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful, concise assistant."},
        {"role": "user", "content": "Who is Robinson Crusoe?"},
    ],
)

print(completion.choices[0].message.content)
```
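
Because the endpoints are OpenAI-compatible, the SDK's other chat parameters should work as they do against the standard OpenAI API. As a sketch, the snippet below streams tokens as they are generated by passing `stream=True`, reusing the `client` configured in the previous example.

```python
# Stream tokens as they are generated (standard OpenAI SDK streaming;
# assumes the `client` configured in the example above).
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "user", "content": "Summarize Robinson Crusoe in two sentences."},
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta; content may be None on some chunks.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```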