Dynamic switching between GPU configurations based on workload needs.
## Overview

Rather than running a single fixed configuration, I dynamically switch between GPU configurations. A Python service manages the transitions via an HTTP API.
## Available Configurations

| Config | Model | Use Case |
|---|---|---|
| big-chat | Llama 3.3 70B (TP=2) | Maximum quality chat |
| coder | Qwen3 32B (TP=2) | Coding tasks with thinking mode |
| dual-chat | Qwen3 8B + 14B | Two models, different tasks |
| image-chat | Qwen 7B + ComfyUI | Text + image generation |
Configuration files are stored as compose YAML files.
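Because each configuration is just a compose file, the set of available names can be derived from a directory listing rather than hardcoded. A minimal sketch — the `configs` directory name and `.yaml` extension are assumptions, not necessarily what the service uses:

```python
from pathlib import Path

def list_configs(config_dir: Path = Path("configs")) -> list[str]:
    """Return available configuration names, one per compose YAML file."""
    return sorted(p.stem for p in config_dir.glob("*.yaml"))
```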
## Configuration Switching

The `gpu-state-service.py` service manages switching via an HTTP API:

```bash
# Check current state
curl http://localhost:9100/status

# Switch to a different configuration
curl -X POST -H "Content-Type: application/json" \
  -d '{"config":"big-chat"}' \
  http://localhost:9100/switch
```

### Switch Process
1. **Drain** - wait for in-flight requests to complete
2. **Stop** - tear down the current containers
3. **Start** - launch the new configuration via `nerdctl compose`
4. **Ready** - wait for the vLLM health check to pass
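The stop/start/ready steps can be sketched in Python. This is illustrative, not the service's actual code: the compose file paths, the vLLM health URL and port, and the timeouts are all assumptions, and the drain step is elided because its details are service-specific:

```python
import subprocess
import time
import urllib.request

def wait_for_health(url: str = "http://localhost:8000/health",
                    timeout: float = 600.0) -> bool:
    """Poll the vLLM health endpoint until it passes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # server not up yet; keep polling
        time.sleep(2)
    return False

def switch(current: str, target: str, config_dir: str = "configs") -> None:
    # 1. Drain: wait for in-flight requests (service-specific, elided here)
    # 2. Stop: tear down the current configuration's containers
    subprocess.run(
        ["nerdctl", "compose", "-f", f"{config_dir}/{current}.yaml", "down"],
        check=True)
    # 3. Start: launch the new configuration detached
    subprocess.run(
        ["nerdctl", "compose", "-f", f"{config_dir}/{target}.yaml", "up", "-d"],
        check=True)
    # 4. Ready: block until the health check passes
    if not wait_for_health():
        raise RuntimeError(f"{target} failed to become healthy")
```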
### State Machine
```text
┌─────────┐
│ stopped │
└────┬────┘
     │ switch
     ▼
┌─────────┐
│switching│
└────┬────┘
     │ containers started
     ▼
┌─────────┐
│ loading │──────┐
└────┬────┘      │ timeout/error
     │ health ok │
     ▼           ▼
┌─────────┐   ┌──────┐
│  ready  │   │failed│
└────┬────┘   └──────┘
     │ drain
     ▼
┌─────────┐
│draining │
└────┬────┘
     │ requests complete
     ▼
┌─────────┐
│switching│
└─────────┘
```
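The state machine translates naturally into a table of legal transitions. A sketch, not the service's actual implementation — the state names come from the diagram, everything else is assumed:

```python
from enum import Enum

class State(str, Enum):
    STOPPED = "stopped"
    SWITCHING = "switching"
    LOADING = "loading"
    READY = "ready"
    DRAINING = "draining"
    FAILED = "failed"

# Legal transitions, straight from the state diagram.
TRANSITIONS = {
    State.STOPPED: {State.SWITCHING},
    State.SWITCHING: {State.LOADING},
    State.LOADING: {State.READY, State.FAILED},
    State.READY: {State.DRAINING},
    State.DRAINING: {State.SWITCHING},
    State.FAILED: set(),  # terminal until manual intervention
}

def transition(current: State, new: State) -> State:
    """Move to `new` if the diagram allows it; otherwise raise."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```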
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Current status (JSON) |
| GET | `/status` | Current status (JSON) |
| GET | `/metrics` | Prometheus metrics |
| POST | `/drain` | Begin draining requests |
| POST | `/switch` | Switch configuration (requires `{"config": "name"}`) |
| POST | `/stop` | Stop all GPU services |
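Beyond curl, a small client for these endpoints needs only the standard library. A sketch — the shape of the JSON the service returns is not assumed here, only that the bodies are JSON:

```python
import json
import urllib.request

def get_status(base: str = "http://localhost:9100") -> dict:
    """GET /status and decode the JSON response body."""
    with urllib.request.urlopen(f"{base}/status") as resp:
        return json.load(resp)

def switch_config(name: str, base: str = "http://localhost:9100") -> dict:
    """POST /switch with a {"config": name} JSON body."""
    req = urllib.request.Request(
        f"{base}/switch",
        data=json.dumps({"config": name}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```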
## Creating New Configurations

1. Create a new YAML file in the configs directory:

   ```yaml
   # Configuration: My Custom Config
   # Use case: Description of what this is for
   services:
     vllm:
       image: nalanzeyu/vllm-gfx906:v0.11.2-rocm6.3
       container_name: vllm
       # ... device mappings, environment, command
     embeddings:
       image: michaelf34/infinity:latest
       # ... embeddings config
   ```

2. The new configuration is automatically available via the switch API.

3. Test it with:

   ```bash
   curl -X POST -d '{"config":"my-config"}' http://localhost:9100/switch
   ```