We noticed sacct (in SLURM 2.6.1) is making unindexed queries[1] on job tables, which take several seconds on an installation with ~2M job_table rows, even after tuning mysqld.
Adding a composite index across some of the more distinctive columns dropped query time to a few milliseconds:
ALTER TABLE ${clustername}_job_table ADD KEY `sacct` (`id_user`,`time_start`,`time_end`);
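To confirm the new index is actually picked up, an EXPLAIN on a query shaped like sacct's should report `sacct` in the `key` column. A sketch (cluster name, user id, and timestamps are illustrative, not from a real query):

```shell
# Hypothetical check -- substitute your cluster name, schema, and real ids/timestamps.
mysql slurmDB -e "
  EXPLAIN SELECT * FROM aicluster_job_table
  WHERE id_user = 1000
    AND time_start >= 1613600000
    AND time_end   <= 1613700000\G"
# The 'key' field of the EXPLAIN output should now show the new 'sacct' index.
```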
# User@Host: slurm[slurm] @  [172.20.0.3]
# Thread_id: 1318  Schema: slurmDB  QC_hit: No
# Query_time: 0.020722  Lock_time: 0.000069  Rows_sent: 0  Rows_examined: 2763
# Rows_affected: 2762  Bytes_sent: 60
#
# explain: id  select_type  table                  type   possible_keys  key      key_len  ref   rows  r_rows   filtered  r_filtered  Extra
# explain: 1   SIMPLE       aicluster_assoc_table  index  NULL           PRIMARY  4        NULL  2763  2763.00  100.00    99.96       Using where
SET timestamp=1613686302;
select distinct t1.id_wckey, t1.is_def, t1.wckey_name, t1.user
  from "aicluster_wckey_table" as t1
  where t1.deleted=0 && (t1.is_def=1) && (t1.user='lukevanbuskirk')
  order by wckey_name, user;
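For reference, log entries like the one above come from the slow query log. A minimal sketch of the relevant my.cnf settings (paths and thresholds are examples; `log_slow_verbosity` is the MariaDB option that adds the `# explain:` lines):

```
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 1
log_slow_verbosity  = query_plan,explain
```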
Create QOS:
root@fe01:~# sacctmgr -i add qos high set priority=1000
 Adding QOS(s)
  high
 Settings
  Description = high
  Priority    = 1000
root@fe01:~# sacctmgr -i add qos medium set priority=500
 Adding QOS(s)
  medium
 Settings
  Description = medium
  Priority    = 500
root@fe01:~# sacctmgr -i add qos low set priority=100
 Adding QOS(s)
  low
 Settings
  Description = low
  Priority    = 100
Create group:
root@fe01:~# sacctmgr create account jonaslab
 Adding Account(s)
  jonaslab
 Settings
  Description  = Account Name
  Organization = Parent/Account Name
 Associations
  A = jonaslab  C = aicluster
Would you like to commit changes? (You have 30 seconds to decide) (N/y): y
Set QOS and default QOS:
root@fe01:~# sacctmgr -i modify account jonaslab set qos=low
 Modified account associations...
  C = aicluster  A = jonaslab of root
root@fe01:~# sacctmgr -i modify account jonaslab set defaultqos=low
 Modified account associations...
  C = aicluster  A = jonaslab of root
Source: https://bugs.schedmd.com/show_bug.cgi?id=1613
This will give 'kauffman3' two user associations:
root@fe01:~# sacctmgr create user kauffman3 account=jonaslab
 Associations =
  U = kauffman3  A = jonaslab  C = aicluster
 Non Default Settings
Would you like to commit changes? (You have 30 seconds to decide) (N/y): y
root@fe01:~# sacctmgr show account withassoc kauffman3
   Account                Descr                  Org    Cluster   Par Name       User     Share Priority GrpJobs GrpNodes GrpCPUs  GrpMem GrpSubmit     GrpWall  GrpCPUMins MaxJobs MaxNodes MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- -------------------- -------------------- ---------- ---------- ---------- --------- -------- ------- -------- ------- ------- --------- ----------- ----------- ------- -------- ------- --------- ----------- ----------- -------------------- ---------
 kauffman3            kauffman3            kauffman3  aicluster       root                    1                                                                                                                                          normal
 kauffman3            kauffman3            kauffman3  aicluster             kauffman3        1                                                                                                                                           normal
root@fe01:~# sacctmgr show account withassoc jonaslab
   Account                Descr                  Org    Cluster   Par Name       User     Share Priority GrpJobs GrpNodes GrpCPUs  GrpMem GrpSubmit     GrpWall  GrpCPUMins MaxJobs MaxNodes MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- -------------------- -------------------- ---------- ---------- ---------- --------- -------- ------- -------- ------- ------- --------- ----------- ----------- ------- -------- ------- --------- ----------- ----------- -------------------- ---------
  jonaslab             jonaslab             jonaslab  aicluster       root                    1                                                                                                                                             low       low
  jonaslab             jonaslab             jonaslab  aicluster             kauffman3        1                                                                                                                                              low       low
normal == the default QOS (priority 0):
root@fe01:~# sacctmgr list qos Format=name,priority
      Name   Priority
---------- ----------
    normal          0
      high       1000
    medium        500
       low        100
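Users under the account pick up the account's default QOS automatically; a QOS can also be requested explicitly at submit time. A sketch (the script name is illustrative):

```shell
# Submit with an explicit QOS; omitting --qos uses the account's default (low here).
kauffman3@fe01:~/examples$ sbatch --qos=low job.sbatch
```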
Check prio on submitted job:
kauffman3@fe01:~/examples$ sacct -j 381 --format=JobID,JobName,MaxRSS,Elapsed,Qos
       JobID    JobName     MaxRSS    Elapsed        QOS
------------ ---------- ---------- ---------- ----------
381          two_gpu_p+             00:00:31        low
381.batch         batch      4092K   00:00:31
381.extern       extern          0   00:00:32
381.0              bash       520K   00:00:31
https://containers-at-tacc.readthedocs.io/en/latest/singularity/03.mpi_and_gpus.html#message-passing-interface-mpi-for-running-on-multiple-nodes
https://containers-at-tacc.readthedocs.io/en/latest/singularity/02.singularity_batch.html#how-do-hpc-systems-fit-into-the-development-workflow
TACC hasn't solved this problem either:
https://containers-at-tacc.readthedocs.io/en/latest/singularity/02.singularity_batch.html#how-do-hpc-systems-fit-into-the-development-workflow
Additionally, their SLURM cluster uses Singularity rather than Docker; the `build` step for a Docker container is expected to happen elsewhere.
https://containers-at-tacc.readthedocs.io/en/latest/singularity/03.mpi_and_gpus.html#singularity-and-gpu-computing
Based on the little I know about Singularity, it was designed to run on HPC clusters, so I don't think we'll have a problem deploying it everywhere.
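If we go that route, the usual pattern is to build the Docker image off-cluster, publish it to a registry, and pull/run it with Singularity on the cluster. A sketch (the image name and command are examples, not a tested deployment):

```shell
# Build happens off-cluster; on the cluster we only pull and run.
$ singularity pull docker://techstaff/ubuntu-20.04-chisubmit:2.1.0
$ singularity exec ubuntu-20.04-chisubmit_2.1.0.sif chisubmit --help
```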
Phil
On 2/9/21 9:37 AM: Ok, this is great, thank you for looking into this so much.
Phil, I think your “round-peg square-hole” comment might be correct, but
this is also the world we have woken up in.
Podman might actually work, although I'm vaguely worried that they appear
to use a version of fuse for their non-root userspace filesystem IO, which
may be a performance nightmare.
Heavily-multiuser systems like TACC (NSF supercomputer) and ALCF (Argonne)
are increasingly adopting containers for end users:
https://containers-at-tacc.readthedocs.io/en/latest/
I believe the “river” cluster here at UChicago (run by physics) also
supports running containers.
I'm still trying to figure out where the security contours lie between
“building” the container and “running” the container. For example,
cluster-level support for running containers (but not building them) could
conceivably be ok. This might be what TACC et al are doing.
I'm willing to table this for a bit, but let's be sure to revisit. I'll ask
Kyle what the River people are doing.
The conversation then turned to building docker images for different architectures.
https://docs.docker.com/docker-for-mac/apple-m1/
On my M1 MacBook Air:
Find the digest entry for amd64:
m1$ docker manifest inspect ubuntu:20.04
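The amd64 digest can also be pulled out of the manifest list mechanically (assumes `jq` is installed):

```shell
m1$ docker manifest inspect ubuntu:20.04 \
      | jq -r '.manifests[] | select(.platform.architecture=="amd64") | .digest'
```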
m1$ docker run -it docker.io/library/ubuntu:20.04@sha256:3093096ee188f8ff4531949b8f6115af4747ec1c58858c091c8cb4579c39cc4e
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@afc7a92aafeb:/# uname -a
Linux afc7a92aafeb 4.19.104-linuxkit #1 SMP PREEMPT Sat Feb 15 00:49:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
https://docs.docker.com/docker-for-mac/multi-arch/
Basically this:
m1$ docker buildx build --platform linux/amd64 .
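In full, with a tag and `--load` so the result lands in the local image list (the tag matches the one used below):

```shell
m1$ docker buildx build --platform linux/amd64 \
      -t techstaff/ubuntu-20.04-chisubmit:2.1.0 --load .
```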
I’ve built a container that Techstaff uses to deploy the `chisubmit` client to linux.cs (amd64) on my M1 MacBook (arm64).
Export the container:
m1$ docker save -o ubuntu-20.04-chisubmit-2.1.0.tar docker.io/techstaff/ubuntu-20.04-chisubmit:2.1.0
Go to an AMD64 machine and import it. Using Podman just to make this harder.
amd64-machine $ podman load < ubuntu-20.04-chisubmit-2.1.0.tar
amd64-machine $ podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/techstaff/ubuntu-20.04-chisubmit 2.1.0 05989787458d 5 minutes ago 628 MB
amd64-machine $ podman run -it 05989787458d /bin/bash
root@718a5928bc4a:/# uname -a
Linux 718a5928bc4a 5.8.0-36-generic #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I tried using the repo name to run the image but it didn’t work. Not sure why at the moment.
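My guess (unverified) is that Podman registered the loaded image under the `localhost/` prefix shown in `podman image ls`, so the fully-qualified name should work:

```shell
# Hypothetical fix: use the full localhost/-prefixed name rather than the short repo name.
amd64-machine $ podman run -it localhost/techstaff/ubuntu-20.04-chisubmit:2.1.0 /bin/bash
```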
Phil
This is going to be really interesting going forward when most scientific
users are no longer going to have the ability to build containers on their
laptops due to architectural issues. Sigh.