Running local AI inference on AMD Instinct MI60 GPUs - a budget-friendly path to 70B parameter models at home.
## Overview
The AMD Instinct MI60 is a datacenter GPU from 2018 that remains surprisingly capable for local AI workloads. With 32GB of HBM2 memory per card, a dual-MI60 setup provides 64GB of VRAM, enough to run 70B parameter models with quantization.
This section documents my setup: hardware modifications for cooling, software configuration for ROCm and vLLM, and the workflows that make it practical for daily use.
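As a rough sanity check on the 70B claim: at 4-bit quantization the weights come to about 0.5 bytes per parameter, leaving headroom in 64GB for the KV cache and activations. A back-of-the-envelope calculation (the 0.5 bytes/parameter figure is an approximation; real quantized checkpoints carry some extra scale/zero-point data):

```shell
# Estimate weight memory for a 70B model at 4-bit quantization.
# 0.5 bytes/param is scaled by 10 to stay in shell integer math.
params=70000000000
weights_gib=$(( params * 5 / 10 / 1024 / 1024 / 1024 ))
echo "weights: ~${weights_gib} GiB"   # prints "weights: ~32 GiB"
```

That leaves roughly half the 64GB pool for KV cache, which is what makes longer contexts workable on this setup.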
## Hardware
| Spec | Value |
|---|---|
| Memory | 32GB HBM2 per GPU (64GB total with dual) |
| FP64 | 7.4 TFLOPS |
| FP32 | 14.7 TFLOPS |
| Interface | PCIe Gen4 (runs at Gen3 on typical desktop boards) |
| Architecture | gfx906 (Vega 20) |
| TDP | 300W per GPU |
The MI60 is passively cooled, designed for server chassis with high-velocity front-to-back airflow. For desktop use it requires a custom cooling solution. I use 3D-printed fan shrouds with 92mm fans - single-GPU STL and dual-GPU STL available on Thingiverse.
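With aftermarket fans, fan speed has to be managed in software. A minimal sketch of a temperature-to-duty curve is below; the thresholds and PWM values are assumptions, not tuned numbers, and on a real system the temperature would come from `rocm-smi --showtemp` while the duty value would be written to the fan header's hwmon `pwm` file:

```shell
# Map a GPU edge temperature (°C) to a PWM duty value (0-255).
# Thresholds are illustrative; tune them against your own thermals.
fan_pwm_for_temp() {
  local t=$1
  if   [ "$t" -lt 50 ]; then echo 80    # idle: quiet
  elif [ "$t" -lt 70 ]; then echo 160   # load: moderate
  else                       echo 255   # hot: full speed
  fi
}

fan_pwm_for_temp 65   # prints 160
```

A step curve like this is deliberately simple; hysteresis or interpolation can be layered on once the basic loop is stable.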
## Software Stack
- OS: Ubuntu 22.04 / Linux Mint 21.x
- Driver: ROCm 5.6 (newer versions dropped MI60 support)
- Inference: vLLM with tensor parallelism
- Image Gen: ComfyUI with ROCm 5.7
- Containers: containerd with nerdctl
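Tensor parallelism is what lets the two 32GB cards act as one 64GB pool. A hedged launch sketch follows; the model path and port are placeholders, and the exact server entrypoint varies across vLLM versions:

```shell
# Serve a quantized 70B model split across both MI60s.
# HIP_VISIBLE_DEVICES selects the GPUs under ROCm.
export HIP_VISIBLE_DEVICES=0,1

python -m vllm.entrypoints.openai.api_server \
  --model /models/llama-70b-gptq \
  --tensor-parallel-size 2 \
  --port 8000
```

With `--tensor-parallel-size 2`, each layer's weights are sharded across the two GPUs, so the model must fit in the combined VRAM rather than a single card's.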
## Setup Guides
Core documentation for the MI60 setup:
- MI60 Hardware Setup - Physical setup, cooling, fan control
- vLLM Inference on MI60 - Production inference with tensor parallelism
- ComfyUI on MI60 - Stable Diffusion with ROCm
- GPU Configuration Management - Dynamic switching between workloads
- GPU Metrics and Monitoring - Prometheus, Grafana, temperature alerts
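Since the stack runs under containerd with nerdctl, ROCm workloads inside containers need the kernel driver devices passed through. A sketch, assuming a stock ROCm base image (the tag is a placeholder):

```shell
# Expose the AMD GPU devices to a ROCm container and verify visibility.
# /dev/kfd is the compute interface; /dev/dri carries the render nodes.
sudo nerdctl run --rm -it \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  rocm/dev-ubuntu-22.04:5.6 \
  rocm-smi
```

If `rocm-smi` lists both cards inside the container, the passthrough is working; from there the same flags apply to the vLLM and ComfyUI containers.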
## Resources
- Scripts and code: github.com/dcruver/MI60
- Fan shroud STLs: Single GPU | Dual GPU