Dynamic switching between GPU configurations based on workload needs.
## Overview

Rather than running a single fixed configuration, I dynamically switch between GPU configurations. A Python service manages the transitions via an HTTP API.
## Available Configurations

| Config | Model | Use Case |
|---|---|---|
| big-chat | Llama 3.3 70B (TP=2) | Maximum quality chat |
| coder | Qwen3 32B (TP=2) | Coding tasks with thinking mode |
| dual-chat | Qwen3 8B + 14B | Two models, different tasks |
| image-chat | Qwen 7B + ComfyUI | Text + image generation |
Configuration files are stored as compose YAML files.
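Because each configuration is just a compose file, the set of available names can be derived from a directory listing rather than hardcoded. A minimal sketch — the `configs` directory name and `.yaml` extension are assumptions, not necessarily what the service uses:

```python
from pathlib import Path

def list_configs(config_dir: Path = Path("configs")) -> list[str]:
    """Return available configuration names, one per compose YAML file."""
    return sorted(p.stem for p in config_dir.glob("*.yaml"))
```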
## Configuration Switching

The `gpu-state-service.py` service manages switching via an HTTP API:

```bash
# Check current state
curl http://localhost:9100/status

# Switch to a different configuration
curl -X POST -H "Content-Type: application/json" \
  -d '{"config":"big-chat"}' \
  http://localhost:9100/switch
```

### Switch Process
1. **Drain** - wait for in-flight requests to complete
2. **Stop** - tear down the current containers
3. **Start** - launch the new configuration via `nerdctl compose`
4. **Ready** - wait for the vLLM health check to pass
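The stop/start/ready steps can be sketched in Python. This is illustrative, not the service's actual code: the compose file paths, the vLLM health URL and port, and the timeouts are all assumptions, and the drain step is elided because its details are service-specific:

```python
import subprocess
import time
import urllib.request

def wait_for_health(url: str = "http://localhost:8000/health",
                    timeout: float = 600.0) -> bool:
    """Poll the vLLM health endpoint until it passes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # server not up yet; keep polling
        time.sleep(2)
    return False

def switch(current: str, target: str, config_dir: str = "configs") -> None:
    # 1. Drain: wait for in-flight requests (service-specific, elided here)
    # 2. Stop: tear down the current configuration's containers
    subprocess.run(
        ["nerdctl", "compose", "-f", f"{config_dir}/{current}.yaml", "down"],
        check=True)
    # 3. Start: launch the new configuration detached
    subprocess.run(
        ["nerdctl", "compose", "-f", f"{config_dir}/{target}.yaml", "up", "-d"],
        check=True)
    # 4. Ready: block until the health check passes
    if not wait_for_health():
        raise RuntimeError(f"{target} failed to become healthy")
```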
### State Machine
```text
┌─────────┐
│ stopped │
└────┬────┘
     │ switch
     ▼
┌─────────┐
│switching│
└────┬────┘
     │ containers started
     ▼
┌─────────┐
│ loading │──────┐
└────┬────┘      │ timeout/error
     │ health ok │
     ▼           ▼
┌─────────┐   ┌──────┐
│  ready  │   │failed│
└────┬────┘   └──────┘
     │ drain
     ▼
┌─────────┐
│draining │
└────┬────┘
     │ requests complete
     ▼
┌─────────┐
│switching│
└─────────┘
```
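The state machine translates naturally into a table of legal transitions. A sketch, not the service's actual implementation — the state names come from the diagram, everything else is assumed:

```python
from enum import Enum

class State(str, Enum):
    STOPPED = "stopped"
    SWITCHING = "switching"
    LOADING = "loading"
    READY = "ready"
    DRAINING = "draining"
    FAILED = "failed"

# Legal transitions, straight from the state diagram.
TRANSITIONS = {
    State.STOPPED: {State.SWITCHING},
    State.SWITCHING: {State.LOADING},
    State.LOADING: {State.READY, State.FAILED},
    State.READY: {State.DRAINING},
    State.DRAINING: {State.SWITCHING},
    State.FAILED: set(),  # terminal until manual intervention
}

def transition(current: State, new: State) -> State:
    """Move to `new` if the diagram allows it; otherwise raise."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```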
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Current status (JSON) |
| GET | `/status` | Current status (JSON) |
| GET | `/metrics` | Prometheus metrics |
| POST | `/drain` | Begin draining requests |
| POST | `/switch` | Switch configuration (requires `{"config": "name"}`) |
| POST | `/stop` | Stop all GPU services |
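Beyond curl, a small client for these endpoints needs only the standard library. A sketch — the shape of the JSON the service returns is not assumed here, only that the bodies are JSON:

```python
import json
import urllib.request

def get_status(base: str = "http://localhost:9100") -> dict:
    """GET /status and decode the JSON response body."""
    with urllib.request.urlopen(f"{base}/status") as resp:
        return json.load(resp)

def switch_config(name: str, base: str = "http://localhost:9100") -> dict:
    """POST /switch with a {"config": name} JSON body."""
    req = urllib.request.Request(
        f"{base}/switch",
        data=json.dumps({"config": name}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```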
## Creating New Configurations

1. Create a new YAML file in the configs directory:

   ```yaml
   # Configuration: My Custom Config
   # Use case: Description of what this is for
   services:
     vllm:
       image: nalanzeyu/vllm-gfx906:v0.11.2-rocm6.3
       container_name: vllm
       # ... device mappings, environment, command
     embeddings:
       image: michaelf34/infinity:latest
       # ... embeddings config
   ```

2. The new configuration is automatically available via the switch API.

3. Test it with:

   ```bash
   curl -X POST -d '{"config":"my-config"}' http://localhost:9100/switch
   ```