GPU Cluster
The GPU cluster is optimized for workloads requiring GPU acceleration, including machine learning, molecular dynamics simulations, and large-scale data analysis. The cluster supports CUDA, TensorFlow, PyTorch, and other GPU-accelerated frameworks. Users who do not require GPU resources are strongly encouraged to use the MPI or SMP clusters instead.
Specifications
| Partition | Nodes | GPU | VRAM | GPU/Node | --constraint | CPU | Cores/Node | Mem/Node | Scratch | Network | Node Names |
|---|---|---|---|---|---|---|---|---|---|---|---|
| a100 | 10 | NVIDIA A100-PCIE-40GB | 40 GB | 4 | a100,40g,amd | AMD EPYC 7742 | 64 | 512 GB | 1.92 TB NVMe | HDR200 IB | gpu-n[35-44] |
| a100 | 2 | NVIDIA A100-PCIE-40GB | 40 GB | 4 | a100,40g,intel | Intel Xeon Gold 5220R | 48 | 384 GB | 960 GB NVMe | 10GbE | gpu-n[33-34] |
| a100_multi | 10 | NVIDIA A100-PCIE-40GB | 40 GB | 4 | a100,40g,amd | AMD EPYC 7742 | 64 | 512 GB | 1.92 TB NVMe | HDR200 IB | gpu-n[45-54] |
| a100_nvlink | 3 | NVIDIA A100-SXM4-40GB | 40 GB | 8 | a100,40g,amd | AMD EPYC 7742 | 128 | 1 TB | 12 TB NVMe | HDR200 IB | gpu-n[28-30] |
| a100_nvlink | 2 | NVIDIA A100-SXM4-80GB | 80 GB | 8 | a100,80g,amd | AMD EPYC 7742 | 128 | 1 TB | 1.92 TB NVMe | HDR200 IB | gpu-n[31-32] |
| l40s | 19 | NVIDIA L40S | 48 GB | 4 | l40s,48g,intel | Intel Xeon Platinum 8462Y+ | 64 | 512 GB | 7.2 TB NVMe | 10GbE | gpu-n[55-73] |
| rtx6k | 9 | NVIDIA RTX PRO 6000 Blackwell Server Edition | 96 GB | 8 | rtx6k,96g,amd | AMD EPYC 9555 | 128 | 1.5 TB | 7.2 TB NVMe | HDR200 IB | gpu-n[74-82] |
| h200 | 2 | NVIDIA H200 | 141 GB | 8 | h200,141g,intel | Intel Xeon Platinum 8592+ | 128 | 3 TB | 7.2 TB NVMe | HDR200 IB | gpu-n[89-90] |
Partition Details
l40s: This partition is appropriate for AI, simulation, and 3D modeling workloads that require up to 4 GPUs on a single node and rely on single- or mixed-precision operations. (Note: this partition does not support double precision, FP64.)
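As an illustrative sketch using standard Slurm syntax, a single-node job on this partition might look like the following. The job name, walltime, and application command are placeholders, and whether this site prefers `--gres=gpu:N` or `--gpus-per-node=N` is an assumption:

```bash
#!/bin/bash
#SBATCH --job-name=l40s-example    # placeholder job name
#SBATCH --partition=l40s           # L40S partition (no FP64 support)
#SBATCH --nodes=1
#SBATCH --gres=gpu:4               # up to 4 GPUs per node on this partition
#SBATCH --time=04:00:00            # placeholder walltime

srun ./my_app                      # placeholder application
```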
a100: This is the default partition in the GPU cluster and is appropriate for workflows that require up to 4 GPUs on a single node. To request a particular feature (such as an Intel host CPU), add the following directive to your job script:
#SBATCH --constraint=intel
Multiple features can be specified in a comma-separated string.
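For example, assuming standard Slurm directive syntax, a job requesting an A100 node with an Intel host CPU could combine the partition and constraint requests as follows (job name and walltime are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=a100-example      # placeholder job name
#SBATCH --partition=a100             # default GPU partition
#SBATCH --constraint="a100,intel"    # multiple features, comma-separated
#SBATCH --gres=gpu:2                 # placeholder GPU count (max 4 per node)
#SBATCH --time=02:00:00              # placeholder walltime

srun ./my_app                        # placeholder application
```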
a100_multi: This partition supports multi-node GPU workflows. Jobs must request a minimum of 2 nodes and 4 GPUs on each node.
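A minimal multi-node sketch satisfying these requirements, assuming standard Slurm syntax (the task count and application command are placeholders, not site-mandated values):

```bash
#!/bin/bash
#SBATCH --job-name=multi-example   # placeholder job name
#SBATCH --partition=a100_multi
#SBATCH --nodes=2                  # minimum of 2 nodes required
#SBATCH --gres=gpu:4               # all 4 GPUs on each node required
#SBATCH --ntasks-per-node=4        # placeholder: one task per GPU
#SBATCH --time=08:00:00            # placeholder walltime

srun ./my_distributed_app          # placeholder multi-node application
```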
a100_nvlink: This partition supports multi-GPU computation on an NVIDIA HGX platform with eight A100 GPUs that are tightly coupled through an NVLink switch. To request a particular feature (such as an A100 with 80 GB of GPU memory), add the following directive to your job script:
#SBATCH --constraint=80g
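Putting this together, a sketch of a job requesting all eight 80 GB NVLink-coupled GPUs on one node might look like the following, assuming standard Slurm syntax (job name, walltime, and application are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=nvlink-example  # placeholder job name
#SBATCH --partition=a100_nvlink
#SBATCH --constraint=80g           # select the 80 GB A100-SXM4 nodes
#SBATCH --nodes=1
#SBATCH --gres=gpu:8               # all 8 NVLink-coupled GPUs
#SBATCH --time=08:00:00            # placeholder walltime

srun ./my_multi_gpu_app            # placeholder application
```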