Running local AI inference on AMD Instinct MI60 GPUs - a budget-friendly path to 70B parameter models at home.
## Overview
The AMD Instinct MI60 is a datacenter GPU from 2018 that remains surprisingly capable for local AI workloads. With 32GB of HBM2 memory per card, a dual-MI60 setup provides 64GB of VRAM, enough to run 70B parameter models with quantization.
This section documents my setup: hardware modifications for cooling, software configuration for ROCm and vLLM, and the workflows that make it practical for daily use.
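As a rough sanity check on the 70B claim: at 4-bit quantization the weights come to about 0.5 bytes per parameter, leaving headroom in 64GB for the KV cache and activations. A back-of-the-envelope calculation (the 0.5 bytes/parameter figure is an approximation; real quantized checkpoints carry some extra scale/zero-point data):

```shell
# Estimate weight memory for a 70B model at 4-bit quantization.
# 0.5 bytes/param is scaled by 10 to stay in shell integer math.
params=70000000000
weights_gib=$(( params * 5 / 10 / 1024 / 1024 / 1024 ))
echo "weights: ~${weights_gib} GiB"   # prints "weights: ~32 GiB"
```

That leaves roughly half the 64GB pool for KV cache, which is what makes longer contexts workable on this setup.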
## Hardware
| Spec | Value |
|---|---|
| Memory | 32GB HBM2 per GPU (64GB total with dual) |
| FP64 | 7.4 TFLOPS |
| FP32 | 14.7 TFLOPS |
| Interface | PCIe Gen4 (runs at Gen3 on typical desktop boards) |
| Architecture | gfx906 (Vega 20) |
| TDP | 300W per GPU |
The MI60 is passively cooled, designed for server chassis with high-velocity front-to-back airflow. For desktop use it requires a custom cooling solution. I use 3D-printed fan shrouds with 92mm fans - single-GPU STL and dual-GPU STL available on Thingiverse.
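With aftermarket fans, fan speed has to be managed in software. A minimal sketch of a temperature-to-duty curve is below; the thresholds and PWM values are assumptions, not tuned numbers, and on a real system the temperature would come from `rocm-smi --showtemp` while the duty value would be written to the fan header's hwmon `pwm` file:

```shell
# Map a GPU edge temperature (°C) to a PWM duty value (0-255).
# Thresholds are illustrative; tune them against your own thermals.
fan_pwm_for_temp() {
  local t=$1
  if   [ "$t" -lt 50 ]; then echo 80    # idle: quiet
  elif [ "$t" -lt 70 ]; then echo 160   # load: moderate
  else                       echo 255   # hot: full speed
  fi
}

fan_pwm_for_temp 65   # prints 160
```

A step curve like this is deliberately simple; hysteresis or interpolation can be layered on once the basic loop is stable.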
## Software Stack
- OS: Ubuntu 22.04 / Linux Mint 21.x
- Driver: ROCm 5.6 (newer versions dropped MI60 support)
- Inference: vLLM with tensor parallelism
- Image Gen: ComfyUI with ROCm 5.7
- Containers: containerd with nerdctl
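Tensor parallelism is what lets the two 32GB cards act as one 64GB pool. A hedged launch sketch follows; the model path and port are placeholders, and the exact server entrypoint varies across vLLM versions:

```shell
# Serve a quantized 70B model split across both MI60s.
# HIP_VISIBLE_DEVICES selects the GPUs under ROCm.
export HIP_VISIBLE_DEVICES=0,1

python -m vllm.entrypoints.openai.api_server \
  --model /models/llama-70b-gptq \
  --tensor-parallel-size 2 \
  --port 8000
```

With `--tensor-parallel-size 2`, each layer's weights are sharded across the two GPUs, so the model must fit in the combined VRAM rather than a single card's.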
## Setup Guides
Core documentation for the MI60 setup:
- MI60 Hardware Setup - Physical setup, cooling, fan control
- vLLM Inference on MI60 - Production inference with tensor parallelism
- ComfyUI on MI60 - Stable Diffusion with ROCm
- GPU Configuration Management - Dynamic switching between workloads
- GPU Metrics and Monitoring - Prometheus, Grafana, temperature alerts
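Since the stack runs under containerd with nerdctl, ROCm workloads inside containers need the kernel driver devices passed through. A sketch, assuming a stock ROCm base image (the tag is a placeholder):

```shell
# Expose the AMD GPU devices to a ROCm container and verify visibility.
# /dev/kfd is the compute interface; /dev/dri carries the render nodes.
sudo nerdctl run --rm -it \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  rocm/dev-ubuntu-22.04:5.6 \
  rocm-smi
```

If `rocm-smi` lists both cards inside the container, the passthrough is working; from there the same flags apply to the vLLM and ComfyUI containers.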
## Resources
- Scripts and code: github.com/dcruver/MI60
- Fan shroud STLs: Single GPU | Dual GPU