Running Llama 405B on vLLM with Slurm and Multiple Nodes
Llama 405B is a very large model: the weights alone take hundreds of gigabytes of GPU memory, depending on how the model is quantized.
| Quantization Method | Weight Memory | Min. # of 80GB A100 GPUs (weights only) |
|---|---|---|
| FP16 | 810GB | 11 |
| INT8/FP8 | 405GB | 6 |
| INT4 | 202GB | 3 |
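These numbers are just the parameter count times bytes per parameter, divided into 80GB cards. A minimal sanity-check sketch, assuming weights only (KV cache, activations, and framework overhead are ignored):

```python
import math

# Rough weight-memory estimate for a 405B-parameter model (weights only).
PARAMS = 405e9
GPU_MEM_GB = 80  # A100 80GB

for name, bytes_per_param in [("FP16", 2), ("INT8/FP8", 1), ("INT4", 0.5)]:
    weight_gb = PARAMS * bytes_per_param / 1e9
    min_gpus = math.ceil(weight_gb / GPU_MEM_GB)
    print(f"{name:9s} {weight_gb:7.1f} GB -> {min_gpus} x A100 80GB")
```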
I have access to a 4-node Slurm cluster with 4 A100 80GB GPUs per node, i.e. 320GB of GPU memory per node and 1,280GB across the whole cluster. A quick fit check (sketched below) shows that anything above INT4 has to span more than one node.
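Here is that fit check as a small sketch, under the same weights-only assumption, comparing each quantization against one node (320GB) and the full cluster (1,280GB):

```python
# One node = 4 x A100 80GB; the cluster = 4 such nodes.
NODE_MEM_GB = 4 * 80
CLUSTER_MEM_GB = 4 * NODE_MEM_GB

# Weights-only numbers from the table above.
for name, weight_gb in [("FP16", 810), ("INT8/FP8", 405), ("INT4", 202.5)]:
    if weight_gb <= NODE_MEM_GB:
        verdict = "fits on a single node"
    elif weight_gb <= CLUSTER_MEM_GB:
        verdict = "needs more than one node"
    else:
        verdict = "does not fit even cluster-wide"
    print(f"{name:9s} {weight_gb:6.1f} GB -> {verdict}")
```

FP16 and INT8/FP8 both overflow a single node, so the model has to be sharded across nodes.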
So how do we get these to work together?