IT core infrastructure

CIIRC Computational Cluster Description

The CIIRC computational cluster serves CIIRC researchers, students, and their collaborators for intensive batch computations. It also offers acceleration on graphics cards. The cluster is being continuously extended and improved.

The CIIRC cluster is an XSEDE Compatible Basic Cluster based on the OpenHPC project. It runs the CentOS 7 operating system and uses Slurm as its workload manager and job scheduler. Cluster management and orchestration are handled by the Warewulf toolkit.

User software is managed by EasyBuild, a software build and installation framework, and is made available through the Lmod environment module system. The container platform on the CIIRC cluster is Singularity.

The core of the cluster consists of five NVIDIA DGX-1 nodes supplemented by other GPU and compute nodes. See the table below for further details:

Number of nodes                                   23
CPUs total                                        1488
RAM total                                         7 TB
Number of V100-32-MaxQ GPUs                       40
Number of GTX 1080 Ti (11 GB) GPUs                18
Theoretical CPU performance                       28 TFLOPS
Theoretical GPU performance (single precision)    750 TFLOPS

Nodes are interconnected by 10 Gbit Ethernet. 100 Gbit EDR InfiniBand (IB) is also available. All nodes contain a local SSD scratch disk. The whole cluster is connected to the 600 TB Isilon NAS storage. "compute*" denotes the default partition, intended for computations without GPU acceleration. "GRES" means generic resources in Slurm.
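To illustrate how the default partition and generic resources (GRES) show up in practice, here is a minimal sketch in Python that assembles the text of a Slurm batch script. The `#SBATCH` options used are standard Slurm options, but the helper `build_sbatch_script`, the partition name `gpu`, and all resource values are hypothetical and chosen only for illustration. Only the partition name `compute` comes from the description above. Consult the actual cluster documentation for real partition names and limits.

```python
def build_sbatch_script(command, partition="compute", gpus=0,
                        cpus=1, mem_gb=4, time_limit="01:00:00"):
    """Return the text of a Slurm batch script (illustrative helper only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",   # "compute" is the default partition
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # GPUs are requested as a generic resource (GRES).
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # CPU-only job on the default partition.
    print(build_sbatch_script("hostname"))
    # GPU job; the partition name "gpu" is an assumption, not taken from the source.
    print(build_sbatch_script("python train.py", partition="gpu", gpus=2,
                              cpus=8, mem_gb=32))
```

The generated text could be saved to a file and submitted with `sbatch`. A job that does not request any GRES gets no GPU, which matches the purpose of the default partition.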

The present Slurm configuration is fairly standard (simplified):

  • SelectType=select/cons_res
  • SelectTypeParameters=CR_CPU_Memory
  • SchedulerType=sched/backfill
  • PriorityType=priority/multifactor
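With select/cons_res and CR_CPU_Memory, Slurm treats both CPUs and memory as consumable resources, so several jobs can share a node as long as enough of each remains free. The following toy model in Python sketches that idea only. It is not Slurm's actual algorithm, and the `Node` class, the `first_fit` function, and the node sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: int
    free_mem_mb: int


def first_fit(nodes, cpus, mem_mb):
    """Place a job on the first node with enough free CPUs AND memory.

    Returns the node name, or None if the job has to wait.
    """
    for node in nodes:
        if node.free_cpus >= cpus and node.free_mem_mb >= mem_mb:
            # Both resources are consumed by the job.
            node.free_cpus -= cpus
            node.free_mem_mb -= mem_mb
            return node.name
    return None


if __name__ == "__main__":
    # Hypothetical node sizes, for illustration only.
    nodes = [Node("n1", 4, 8000), Node("n2", 16, 64000)]
    print(first_fit(nodes, 2, 4000))   # fits on n1
    print(first_fit(nodes, 4, 1000))   # n1 has too few free CPUs, goes to n2
    print(first_fit(nodes, 64, 1000))  # no node is large enough
```

In the real scheduler, a job that cannot be placed stays in the queue, where the multifactor priority plugin orders pending jobs and the backfill scheduler may start lower-priority jobs that do not delay higher-priority ones.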

Future extensions and plans:

  • BeeGFS distributed scratch space
  • Improved cluster throughput through better QOS settings

Funding acknowledgment:
Building and continuously extending the CIIRC computational cluster requires substantial financial resources. The funding comes mainly from national and European Commission projects, e.g. the Czech Government investment which created CIIRC; Josef Urban’s ERC Consolidator project AI4REASON and his project AI & Reasoning; Josef Šivic’s project IMPACT; Robert Babuška’s project R4I; etc.

Responsible: Jan Kreps; Last modified 2020-11-19