Using S3 Tools
Once you have an Object Storage API key and at least one bucket, you can use standard S3-compatible tools to manage objects. All S3 operations are performed directly from your Crusoe Cloud VMs.
S3 Endpoint
Use the following endpoint format for your location:
https://object.<location>.crusoecloudcompute.com
For example:
https://object.us-east1-a.crusoecloudcompute.com
Configuring s3cmd
s3cmd is a command-line tool for interacting with S3-compatible storage.
Installation
# Ubuntu/Debian
sudo apt-get install s3cmd
Configuration
Create or edit ~/.s3cfg with the following:
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = object.<location>.crusoecloudcompute.com
host_bucket = object.<location>.crusoecloudcompute.com
use_https = True
signature_v2 = False
Set host_bucket to the same value as host_base (without a %(bucket)s prefix) because Crusoe Object Storage uses path-style URLs only.
Alternatively, run s3cmd --configure and manually set the endpoint values.
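If you provision VMs with scripts, the same configuration can be generated programmatically. A minimal sketch using Python's standard-library configparser — the key pair is a placeholder and the location value reuses the us-east1-a example from above:

```python
import configparser
from pathlib import Path

# Placeholder values -- substitute your real key pair and location.
ACCESS_KEY = "YOUR_ACCESS_KEY"
SECRET_KEY = "YOUR_SECRET_KEY"
LOCATION = "us-east1-a"

config = configparser.ConfigParser()
config["default"] = {
    "access_key": ACCESS_KEY,
    "secret_key": SECRET_KEY,
    "host_base": f"object.{LOCATION}.crusoecloudcompute.com",
    "host_bucket": f"object.{LOCATION}.crusoecloudcompute.com",
    "use_https": "True",
    "signature_v2": "False",
}

# s3cmd reads a plain INI file, which configparser emits directly.
cfg_path = Path.home() / ".s3cfg"
with cfg_path.open("w") as f:
    config.write(f)
```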
Common Operations
# List all buckets
s3cmd ls
# List objects in a bucket
s3cmd ls s3://my-training-data
# Upload a file
s3cmd put model-checkpoint.tar s3://my-training-data/checkpoints/
# Upload a directory recursively
s3cmd put --recursive ./dataset/ s3://my-training-data/datasets/
# Download a file
s3cmd get s3://my-training-data/checkpoints/model-checkpoint.tar ./
# Download a directory recursively
s3cmd get --recursive s3://my-training-data/datasets/ ./local-datasets/
# Delete an object
s3cmd del s3://my-training-data/checkpoints/old-checkpoint.tar
# Get object info (metadata)
s3cmd info s3://my-training-data/checkpoints/model-checkpoint.tar
# Multipart upload (automatic for files > 15 MB)
# Adjust chunk size if needed:
s3cmd put --multipart-chunk-size-mb=64 large-dataset.tar s3://my-training-data/
# List active multipart uploads
s3cmd multipart s3://my-training-data
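The chunk size determines how many parts a multipart upload is split into, which in turn bounds how much parallelism the upload can use. A quick sanity check (the object sizes here are hypothetical, not taken from the examples above):

```python
import math

def multipart_part_count(object_size_mb: int, chunk_size_mb: int) -> int:
    """Number of parts a multipart upload is split into for a given chunk size."""
    return math.ceil(object_size_mb / chunk_size_mb)

# A 1024 MiB (1 GiB) object with 64 MiB chunks uploads in 16 parts:
print(multipart_part_count(1024, 64))   # 16
# The same-order object with s3cmd's 15 MiB default produces far more parts:
print(multipart_part_count(1000, 15))   # 67
```

Larger chunks mean fewer parts and less per-part overhead; smaller chunks allow finer-grained retries on flaky links.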
Configuring rclone
rclone is a versatile tool for managing files on cloud storage, and is particularly useful for syncing data and migrating objects from other cloud providers into Crusoe.
Installation
# Ubuntu/Debian
sudo apt-get install rclone
Configuration
Run rclone config and create a new remote, or manually add the following to ~/.config/rclone/rclone.conf:
[crusoe]
type = s3
provider = Other
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = https://object.<location>.crusoecloudcompute.com
acl = private
force_path_style = true
The force_path_style = true setting is required because Crusoe Object Storage does not support virtual-hosted-style URLs.
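To see the difference between the two URL styles, compare how the same object is addressed under each; Crusoe serves only the path-style form (bucket and key names here are illustrative):

```python
def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    # Path-style: the bucket appears in the URL path -- the form Crusoe supports.
    return f"https://{endpoint}/{bucket}/{key}"

def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    # Virtual-hosted-style: the bucket appears in the hostname -- not supported.
    return f"https://{bucket}.{endpoint}/{key}"

endpoint = "object.us-east1-a.crusoecloudcompute.com"
print(path_style_url(endpoint, "my-training-data", "datasets/a.tar"))
print(virtual_hosted_url(endpoint, "my-training-data", "datasets/a.tar"))
```

Virtual-hosted-style requests would require a wildcard DNS entry and TLS certificate per bucket subdomain, which is why S3-compatible stores commonly support only path-style addressing.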
Common Operations
# List all buckets
rclone lsd crusoe:
# List objects in a bucket
rclone ls crusoe:my-training-data
# Upload a file
rclone copy ./model-checkpoint.tar crusoe:my-training-data/checkpoints/
# Upload a directory
rclone copy ./dataset/ crusoe:my-training-data/datasets/
# Download a file
rclone copy crusoe:my-training-data/checkpoints/model-checkpoint.tar ./
# Sync a local directory to a bucket (mirror)
rclone sync ./dataset/ crusoe:my-training-data/datasets/
# Check data integrity
rclone check ./dataset/ crusoe:my-training-data/datasets/
# Get file info
rclone lsl crusoe:my-training-data/checkpoints/
Migrating Data from AWS S3
rclone can transfer data directly between cloud providers. To copy data from AWS S3 into Crusoe Object Storage:
- Configure an AWS S3 remote in rclone (named aws in this example).
- Run:

rclone copy aws:source-bucket/path/ crusoe:my-training-data/path/ \
--transfers 16 \
--checkers 8 \
--s3-upload-concurrency 4
Adjust --transfers and related flags based on your available bandwidth and the number of files.
Configuring boto3 (Python)
boto3 is the AWS SDK for Python, widely used in ML pipelines and data processing scripts.
Installation
# Ubuntu/Debian
sudo apt-get install python3-boto3
Configuration
import boto3
s3 = boto3.client(
"s3",
endpoint_url="https://object.<location>.crusoecloudcompute.com",
aws_access_key_id="YOUR_ACCESS_KEY",
aws_secret_access_key="YOUR_SECRET_KEY",
)
The region_name parameter is not required. If your S3 client requires one, you can set it to any placeholder value (e.g., us-east-1). The Crusoe S3 endpoint handles routing internally.
Common Operations
# List buckets
response = s3.list_buckets()
for bucket in response["Buckets"]:
print(bucket["Name"])
# List objects in a bucket
response = s3.list_objects_v2(Bucket="my-training-data")
for obj in response.get("Contents", []):
print(obj["Key"], obj["Size"])
# Upload a file
s3.upload_file(
"model-checkpoint.tar",
"my-training-data",
"checkpoints/model-checkpoint.tar",
)
# Upload with multipart (automatic for large files)
from boto3.s3.transfer import TransferConfig
config = TransferConfig(
multipart_threshold=64 * 1024 * 1024, # 64 MB
multipart_chunksize=64 * 1024 * 1024,
max_concurrency=10,
)
s3.upload_file(
"large-dataset.tar",
"my-training-data",
"datasets/large-dataset.tar",
Config=config,
)
# Download a file
s3.download_file(
"my-training-data",
"checkpoints/model-checkpoint.tar",
"./model-checkpoint.tar",
)
# Delete an object
s3.delete_object(
Bucket="my-training-data",
Key="checkpoints/old-checkpoint.tar",
)
# Get object metadata
response = s3.head_object(
Bucket="my-training-data",
Key="checkpoints/model-checkpoint.tar",
)
print(f"Size: {response['ContentLength']}, Last Modified: {response['LastModified']}")
# Copy an object within the same bucket
s3.copy_object(
Bucket="my-training-data",
Key="checkpoints/model-checkpoint-backup.tar",
CopySource="my-training-data/checkpoints/model-checkpoint.tar",
)
Using boto3 with a Session
For scripts that interact with multiple buckets or need credential management:
import boto3
session = boto3.Session(
aws_access_key_id="YOUR_ACCESS_KEY",
aws_secret_access_key="YOUR_SECRET_KEY",
)
s3 = session.resource(
"s3",
endpoint_url="https://object.<location>.crusoecloudcompute.com",
)
# Upload using the resource interface
bucket = s3.Bucket("my-training-data")
bucket.upload_file("local-file.bin", "remote-key/local-file.bin")
# Iterate all objects
for obj in bucket.objects.all():
print(obj.key, obj.size)
Benchmarking Object Storage
You can use standard S3 benchmarking tools to measure performance from within your Crusoe Cloud VMs. Below is an example using Warp, a purpose-built S3 benchmarking tool.
Install Warp
# For Linux x86_64
wget https://dl.min.io/aistor/warp/release/linux-amd64/archive/warp.v1.4.0
# For Linux arm64
wget https://dl.min.io/aistor/warp/release/linux-arm64/archive/warp.v1.4.0
# Make the binary executable and rename it to match the commands below
chmod +x warp.v1.4.0
mv warp.v1.4.0 warp
Run a Write Benchmark
./warp put \
--host object.<location>.crusoecloudcompute.com \
--access-key YOUR_ACCESS_KEY \
--secret-key YOUR_SECRET_KEY \
--tls \
--bucket bench-bucket \
--obj.size 256MiB \
--concurrent 16 \
--duration 60s
Run a Read Benchmark
./warp get \
--host object.<location>.crusoecloudcompute.com \
--access-key YOUR_ACCESS_KEY \
--secret-key YOUR_SECRET_KEY \
--tls \
--bucket bench-bucket \
--obj.size 256MiB \
--concurrent 16 \
--duration 60s
Run a Mixed Workload Benchmark
./warp mixed \
--host object.<location>.crusoecloudcompute.com \
--access-key YOUR_ACCESS_KEY \
--secret-key YOUR_SECRET_KEY \
--tls \
--bucket bench-bucket \
--obj.size 64MiB \
--get-distrib 80 \
--put-distrib 15 \
--delete-distrib 5 \
--concurrent 32 \
--duration 120s
Create a dedicated bucket for benchmarking to avoid impacting production data.