Running local AI inference on AMD Instinct MI60 GPUs - a budget-friendly path to 70B parameter models at home.

Overview#

The AMD Instinct MI60 is a datacenter GPU from 2018 that remains surprisingly capable for local AI workloads. With 32GB of HBM2 memory per card, a dual-MI60 setup provides 64GB of VRAM, enough to run 70B parameter models with quantization.
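A quick back-of-envelope calculation shows why 64GB is enough. This is a minimal sketch with an assumed ~20% overhead factor for KV cache and activations; the helper name and overhead value are illustrative, not from any library:

```python
def model_vram_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rough VRAM estimate in GB: weight storage at the quantized
    width, plus ~20% headroom for KV cache and activations
    (a coarse assumption, not a measured figure)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# 70B parameters at 4-bit quantization: ~35 GB of weights,
# ~42 GB with headroom -- comfortably inside the 64 GB pool,
# while FP16 (16 bits/weight) would need ~168 GB and not fit.
print(round(model_vram_gb(70, 4), 1))
print(round(model_vram_gb(70, 16), 1))
```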

This section documents my setup: hardware modifications for cooling, software configuration for ROCm and vLLM, and the workflows that make it practical for daily use.

Hardware#

Spec          Value
------------  ------------------------------------------
Memory        32GB HBM2 per GPU (64GB total with two cards)
FP64          7.4 TFLOPS
FP32          14.7 TFLOPS
Interface     PCIe Gen4 x16
Architecture  gfx906 (Vega 20)
TDP           300W per GPU

The MI60 is passively cooled, designed for server chassis with high-velocity front-to-back airflow. For desktop use it requires a custom cooling solution; I use 3D-printed fan shrouds with 92mm fans (single-GPU STL and dual-GPU STL available on Thingiverse).

Software Stack#

  • OS: Ubuntu 22.04 / Linux Mint 21.x
  • Driver: ROCm 5.6 (newer versions dropped MI60 support)
  • Inference: vLLM with tensor parallelism
  • Image Gen: ComfyUI with ROCm 5.7
  • Containers: containerd with nerdctl

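With this stack, splitting a quantized 70B model across both cards comes down to vLLM's `--tensor-parallel-size` flag. A sketch of the launch command, where the model name and quantization method are placeholder assumptions rather than a verified working configuration on gfx906:

```shell
# Serve a quantized 70B model across both MI60s via vLLM's OpenAI-
# compatible server. --tensor-parallel-size 2 shards the weights
# across the two GPUs; model and quantization are example values.
python -m vllm.entrypoints.openai.api_server \
    --model TheBloke/Llama-2-70B-AWQ \
    --quantization awq \
    --tensor-parallel-size 2 \
    --dtype float16
```

The server then exposes an OpenAI-style API on port 8000 by default, so existing client tooling can point at the local endpoint.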
Setup Guides#

Core documentation for the MI60 setup:

Resources#
