# GPU Cluster Overview
The GPU cluster is optimized for computational tasks requiring GPU acceleration, such as artificial intelligence and machine learning workflows, molecular dynamics simulations, and large-scale data analysis.
## Key Features
- Designed for high-performance GPU workloads.
- Supports CUDA, TensorFlow, PyTorch, and other GPU-accelerated frameworks.
## Specifications
| Partition Name | Node Count | GPU Type | GPU/Node | --constraint | Host Architecture | Core/Node | Max Core/GPU | Mem/Node | Mem/Core | Scratch | Network | Node Names |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| l40s | 20 | L40S 48GB | 4 | l40s,48g,intel | Intel Xeon Platinum 8462Y+ | 64 | 16 | 512 GB | 8 GB | 7 TB NVMe | 10GbE | gpu-n[55-74] |
| a100 | 10 | A100 40GB PCIe | 4 | a100,40g,amd | AMD EPYC 7742 (Rome) | 64 | 16 | 512 GB | 8 GB | 2 TB NVMe | HDR200; 10GbE | gpu-n[35-44] |
| a100 | 2 | A100 40GB PCIe | 4 | a100,40g,intel | Intel Xeon Gold 5220R (Cascade Lake) | 48 | 12 | 384 GB | 8 GB | 1 TB NVMe | 10GbE | gpu-n[33-34] |
| a100_multi | 10 | A100 40GB PCIe | 4 | a100,40g,amd | AMD EPYC 7742 (Rome) | 64 | 16 | 512 GB | 8 GB | 2 TB NVMe | HDR200; 10GbE | gpu-n[45-54] |
| a100_nvlink | 2 | A100 80GB SXM | 8 | a100,80g,amd | AMD EPYC 7742 (Rome) | 128 | 16 | 1 TB | 8 GB | 2 TB NVMe | HDR200; 10GbE | gpu-n[31-32] |
| a100_nvlink | 3 | A100 40GB SXM | 8 | a100,40g,amd | AMD EPYC 7742 (Rome) | 128 | 16 | 1 TB | 8 GB | 12 TB NVMe | HDR200; 10GbE | gpu-n[28-30] |
## Partition Details
l40s: This partition is appropriate for AI, simulation, and 3D modeling workloads that require up to 4 GPUs on a single node and rely on single- or mixed-precision operations. (Note: this partition does not support double precision, FP64.)
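As a rough sketch, a single-node l40s job might look like the following (the `--gres` GPU request syntax, walltime, and application command are illustrative assumptions, not site-confirmed defaults):

```bash
#!/bin/bash
#SBATCH --partition=l40s        # L40S nodes: single/mixed precision only, no FP64
#SBATCH --nodes=1               # this partition is for single-node jobs
#SBATCH --gres=gpu:4            # up to 4 GPUs per node (GRES syntax assumed; check site docs)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64      # 4 GPUs x 16 cores/GPU, the per-GPU core limit
#SBATCH --time=04:00:00         # illustrative walltime

srun python train.py            # placeholder for your GPU-accelerated application
```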
a100: This is the default partition in the GPU cluster and is appropriate for workflows that require up to 4 GPUs on a single node. To request a particular feature (such as an Intel host CPU), add the following directive to your job script:
#SBATCH --constraint=intel
Multiple features can be specified in a comma-separated string.
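For instance, a sketch of a job that combines a GPU request with a host-CPU constraint on the a100 partition (the `--gres` syntax, core counts, and walltime are illustrative assumptions):

```bash
#!/bin/bash
#SBATCH --partition=a100           # default GPU partition
#SBATCH --nodes=1
#SBATCH --gres=gpu:2               # up to 4 GPUs per node (GRES syntax assumed)
#SBATCH --constraint=intel         # land on the Intel Xeon Gold 5220R hosts
##SBATCH --constraint=a100,40g,intel   # multiple features as a comma-separated string
#SBATCH --cpus-per-task=24         # 2 GPUs x 12 cores/GPU on the Intel hosts
#SBATCH --time=08:00:00            # illustrative walltime

srun python train.py               # placeholder application
```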
a100_multi: This partition supports multi-node GPU workflows. Your job must request a minimum of 2 nodes and 4 GPUs on each node.
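A minimal multi-node sketch that satisfies those requirements might look like this (the `--gres` syntax, task layout, and walltime are illustrative assumptions):

```bash
#!/bin/bash
#SBATCH --partition=a100_multi    # multi-node GPU partition
#SBATCH --nodes=2                 # at least 2 nodes are required
#SBATCH --gres=gpu:4              # all 4 GPUs on each node (GRES syntax assumed)
#SBATCH --ntasks-per-node=4       # e.g. one task per GPU
#SBATCH --cpus-per-task=16        # 16 cores per GPU, i.e. all 64 cores per node
#SBATCH --time=12:00:00           # illustrative walltime

srun python train_distributed.py  # placeholder multi-node, multi-GPU application
```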
a100_nvlink: This partition supports multi-GPU computation on an NVIDIA HGX platform with 8x A100 GPUs that are tightly coupled through an NVLink switch. To request a particular feature (such as an A100 with 80GB of GPU memory), add the following directive to your job script:
#SBATCH --constraint=80g
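Putting it together, a sketch of an 8-GPU job on the 80 GB HGX nodes (the `--gres` GPU request syntax and walltime are illustrative assumptions):

```bash
#!/bin/bash
#SBATCH --partition=a100_nvlink   # HGX nodes with 8x NVLink-connected A100s
#SBATCH --nodes=1
#SBATCH --gres=gpu:8              # all 8 GPUs on the node (GRES syntax assumed)
#SBATCH --constraint=80g          # select the 80 GB A100 SXM nodes
#SBATCH --cpus-per-task=128       # 8 GPUs x 16 cores/GPU
#SBATCH --time=06:00:00           # illustrative walltime

srun python train.py              # placeholder multi-GPU application
```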