Ollama

Ollama is a platform that enables users to interact with Large Language Models (LLMs) via an Application Programming Interface (API). It is a powerful tool for generating text, answering questions, and performing complex natural language processing tasks.

Running the Ollama server

Ollama version 0.9.2 is installed as a Singularity image at /software/build/ollama/ollama_0.9.2.sif on our cluster.

We have provided two batch job templates for running the Ollama server on CRCD's GPU cluster. /software/build/ollama/ollama_0.9.2_a100_80gb.slurm submits a job to the a100 partition (nodes with 80 GB of GPU memory), and /software/build/ollama/ollama_0.9.2_l40s.slurm submits a job to the l40s partition (nodes with 48 GB of GPU memory).

Use sbatch to run the Ollama server:

sbatch /software/build/ollama/ollama_0.9.2_a100_80gb.slurm

This will run the Ollama service on a GPU node with 125 GB of system memory, 16 cores, and an A100 GPU with 80 GB of GPU memory, which should be suitable for the various models provided by Ollama.

sbatch /software/build/ollama/ollama_0.9.2_l40s.slurm

This will run the Ollama service on a GPU node with 125 GB of system memory, 16 cores, and an L40S GPU with 48 GB of GPU memory.

After the job is submitted and running, you can look up the host name and the port number the server is listening on with the following commands:

[fangping@login3 ~]$ squeue -M gpu -u fangping
CLUSTER: gpu
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1230409      l40s ollama_0 fangping  R       0:03      1 gpu-n55

[fangping@login3 ~]$ squeue -M gpu --me --name=ollama_0.9.2_server_job --states=R -h -O NodeList,Comment
gpu-n55             45141
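
In this example the server is listening on host gpu-n55, port 45141: the second squeue command prints the node name followed by the job comment, which holds the port number. A quick way to verify that the server is reachable is to query its version endpoint. Below is a minimal sketch using only the Python standard library; the host and port are the example values from above, so replace them with your own:

from urllib.request import urlopen

# Host and port as reported by squeue; replace with your own values.
print(urlopen("http://gpu-n55:45141/api/version").read().decode())
# Prints something like: {"version":"0.9.2"}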

Connecting to the Ollama server from a client

You can connect to the Ollama server running on a GPU node through an R or Python client.

Using RStudio server to connect to the Ollama server

Log on to ondemand.htc.crc.pitt.edu and click RStudio server 2022.

You can use the rollama R package to connect to the Ollama server running on the GPU node; point it at the host name and port number reported by squeue (see the rollama documentation for how to set the server address).

Note that Ollama models are downloaded to ~/.ollama. Your home directory has a 75 GB quota.
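
If the quota becomes a concern, note that the Ollama server honors the OLLAMA_MODELS environment variable; you could, for example, copy the batch template and point OLLAMA_MODELS at a directory on a larger filesystem before submitting the job, so that models are stored there instead of in your home directory.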

[fangping@login3 ~]$ cd ~/.ollama
[fangping@login3 .ollama]$ ls -l
total 53
-rw------- 1 fangping sam 387 Jun  9 10:15 id_ed25519
-rw-r--r-- 1 fangping sam  81 Jun  9 10:15 id_ed25519.pub
drwxr-xr-x 4 fangping sam  50 Jul  1 12:27 models

Cancel the Ollama server job when you have finished testing:

[fangping@login3 ~]$ scancel -M gpu 1230409

Using Jupyter to connect to the Ollama server

We have installed the ollama Python package in a conda environment at /ix1/bioinformatics/python_envs/ollama. If you plan to install ollama into your own conda environment, these are the instructions:

[fangping@login3 ~]$ module load python/ondemand-jupyter-python3.11
[fangping@login3 ~]$ conda create --prefix=/ix1/bioinformatics/python_envs/ollama python=3.11
Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: done
...

[fangping@login3 ~]$ source activate /ix1/bioinformatics/python_envs/ollama
(/ix1/bioinformatics/python_envs/ollama) [fangping@login3 ~]$ pip install ollama
...
Successfully installed annotated-types-0.7.0 anyio-4.9.0 certifi-2025.6.15 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 ollama-0.5.1 pydantic-2.11.7 pydantic-core-2.33.2 sniffio-1.3.1 typing-extensions-4.14.0 typing-inspection-0.4.1
(/ix1/bioinformatics/python_envs/ollama) [fangping@login3 ~]$ conda deactivate
[fangping@login3 ~]$

Run the Ollama server and query the host name and port number:

[fangping@login3 ~]$ sbatch /software/build/ollama/ollama_0.9.2_l40s.slurm
Submitted batch job 1230448 on cluster gpu
[fangping@login3 ~]$ squeue -M gpu -u fangping
CLUSTER: gpu
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1230448      l40s ollama_0 fangping  R       0:03      1 gpu-n57
[fangping@login3 ~]$ squeue -M gpu --me --name=ollama_0.9.2_server_job --states=R -h -O NodeList,Comment
gpu-n57             48362

Log on to ondemand.htc.crc.pitt.edu and click Jupyter.

You can use the ollama Python package inside the conda environment to connect to the Ollama server running on the GPU node.
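
Below is a minimal sketch of a chat request from a Jupyter notebook using that conda environment. The host and port (gpu-n57:48362) are the example values reported by squeue above, and llama3.2 is a placeholder model name; replace all three with your own:

from ollama import Client

# Point the client at the GPU node and port reported by squeue.
client = Client(host="http://gpu-n57:48362")

# Ask a question; use a model that has already been pulled on the server.
response = client.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.message.content)

Alternatively, if OLLAMA_HOST is set in the environment before Python starts, the module-level functions such as ollama.chat() should pick up the server address without an explicit Client.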

Pulling new models

You need to get an interactive session on SMP to run the image, as follows:

srun -M smp -p smp -n4 --mem=16G -t0-04:00:00 --pty bash

If you are already in an interactive session on SMP or HTC, you can run the following directly from that session. Make sure you do not run this on the login node, or the process will be killed. Once you are on the compute node:

$ module load singularity/3.9.6
$ singularity shell /software/build/ollama/ollama_0.9.2.sif
singularity$ export OLLAMA_HOST=gpu-n58:44883
singularity$ ollama pull llama4:scout
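
Since pulling a model is just an API request to the running server, you can also trigger it from the Python client instead of the singularity shell; a sketch, again using the example node and port from above (replace with your own):

from ollama import Client

# The pull is performed by the server process on the GPU node; the model
# files are written to ~/.ollama/models, the same location as a CLI pull.
client = Client(host="http://gpu-n58:44883")
client.pull("llama4:scout")
print(client.list())  # the newly pulled model should now appear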