VRAMify
Calculate VRAM requirements for LLM inference
Model Parameters
Model Preset: Custom, or one of the built-in presets:
Llama 2: 7B, 13B, 70B
Llama 3: 8B, 70B
Llama 3.1: 8B, 70B, 405B
Mistral 7B; Mixtral 8x7B, 8x22B (MoE)
Qwen 2: 0.5B, 1.5B, 7B, 72B
Phi-2 2.7B; Phi-3 Mini 3.8B, Phi-3 Medium 14B
Gemma: 2B, 7B; Gemma 2: 9B, 27B
Parameters (Billions)
Precision: FP32 (4 bytes), FP16 / BF16 (2 bytes), INT8 (1 byte), INT4 (0.5 bytes); see the sketch after this list.
Number of Layers
Hidden Dimension
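Each precision option simply fixes the number of bytes stored per parameter, so the whole choice reduces to a lookup table. A minimal sketch in Python (names are illustrative, not VRAMify's actual code):

```python
# Bytes of VRAM per parameter for each precision option.
# INT4 packs two 4-bit parameters into one byte, hence 0.5.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16/BF16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}
```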
Inference Settings
Sequence Length (context window size)
Batch Size
KV Cache Precision: FP32 (4 bytes), FP16 / BF16 (2 bytes), INT8 (1 byte)
VRAM Requirements
Model Weights: 14.00 GB
KV Cache: 0.50 GB
Total VRAM: 14.50 GB
Fits on RTX 4090 (24 GB)
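For illustration, these figures match a Llama 2 7B-style configuration (an assumption here, not stated by the tool): 7 × 10⁹ parameters at FP16 take 7 × 10⁹ × 2 bytes = 14.00 GB of weights, and with 32 layers, hidden dimension 4096, a 1,024-token sequence, batch size 1, and an FP16 KV cache, the cache takes 2 × 32 × 4096 × 1024 × 1 × 2 bytes ≈ 0.5 GB, per the formulae below.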
📐 Formulae Used
Model Weights: params × bytes_per_param
KV Cache: 2 × layers × hidden_dim × seq_len × batch_size × bytes_per_element
The leading 2 counts the key and value tensors stored at every layer. For models using grouped-query attention (e.g. Llama 3, Mistral), keys and values are narrower than the hidden dimension, so this estimate is an upper bound.
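Both formulae are one-liners; here is a minimal sketch in Python, assuming decimal gigabytes and standard multi-head attention (function names are illustrative, not VRAMify's actual code):

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Model weights: params x bytes_per_param, in decimal GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, hidden_dim: int, seq_len: int,
                batch_size: int, bytes_per_element: float) -> float:
    """KV cache: 2 (keys and values) x layers x hidden_dim x tokens x batch."""
    return (2 * layers * hidden_dim * seq_len * batch_size
            * bytes_per_element) / 1e9

# Example: 7B parameters at FP16, 32 layers, hidden dim 4096,
# 1,024-token sequence, batch 1, FP16 KV cache.
total = weights_gb(7, 2) + kv_cache_gb(32, 4096, 1024, 1, 2)
print(f"{total:.2f} GB")  # ~14.54 GB, within a 24 GB RTX 4090
```

The sketch reports the cache as ~0.54 GB rather than the 0.50 shown above; the gap is a units convention, since 2 × 32 × 4096 × 1024 × 2 bytes = 2²⁹ bytes is exactly 0.5 GiB, which suggests the tool rounds in binary units.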