IBM, in collaboration with the DOE, launched the COVID-19 High Performance Computing Consortium. The Consortium will help aggregate computing capabilities from some of the most powerful and advanced computers in the world to help researchers everywhere better understand COVID-19, its treatments and potential cures.
The consortium brings together an unprecedented amount of supercomputing power (16 systems with more than 330 petaflops, 775,000 CPU cores, and 34,000 GPUs, and counting) to help researchers everywhere tackle this global challenge. These high-performance computing systems allow researchers to run very large numbers of calculations in epidemiology, bioinformatics, and molecular modeling in hours or days, not weeks, months, or years.
Two critically important applications of this compute power could include developing predictive models to assess how the disease is progressing, and modeling potential new therapies and a possible vaccine.
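To make the first of these applications concrete, here is a minimal sketch of a predictive epidemic model: a classic SIR (Susceptible-Infected-Recovered) system integrated with SciPy. The transmission and recovery rates and the initial conditions are hypothetical placeholders, not fitted COVID-19 estimates; consortium projects would use far richer, data-driven models.

```python
# Minimal SIR sketch; all parameter values are hypothetical placeholders.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    """Right-hand side of the SIR ordinary differential equations."""
    s, i, r = y
    ds = -beta * s * i            # new infections remove susceptibles
    di = beta * s * i - gamma * i # infected grow by infection, shrink by recovery
    dr = gamma * i                # recoveries
    return [ds, di, dr]

beta, gamma = 0.3, 0.1            # assumed transmission and recovery rates
y0 = [0.999, 0.001, 0.0]          # initial fractions: susceptible, infected, recovered

sol = solve_ivp(sir, (0, 160), y0, args=(beta, gamma), dense_output=True)
t = np.linspace(0, 160, 161)
s, i, r = sol.sol(t)
print(f"Peak infected fraction: {i.max():.3f} on day {t[i.argmax()]:.0f}")
```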
“These high-performance computing systems allow researchers to run very large numbers of calculations in epidemiology, bioinformatics, and molecular modeling,” writes Dario Gil, Director of IBM Research. “Now we must scale, and IBM will work with our consortium partners to evaluate proposals from researchers around the world and provide access to this supercomputing capacity for the projects that can have the most immediate impact.”
Partners within the consortium include: IBM, Lawrence Livermore National Laboratory (LLNL), Argonne National Laboratory (ANL), Oak Ridge National Laboratory (ORNL), Sandia National Laboratories (SNL), Los Alamos National Laboratory (LANL), the National Science Foundation (NSF), NASA, the Massachusetts Institute of Technology (MIT), Rensselaer Polytechnic Institute (RPI), Amazon, and Google. IBM, together with these partners, will coordinate efforts across the consortium to evaluate proposals from top institutions and provide access to HPC capacity for the projects that can have the most immediate impact.
This new effort builds on already promising uses of supercomputing to fight COVID-19, and could dramatically accelerate scientific discoveries to combat the pandemic.
Available Resources:
Researchers can apply now at XSEDE for the following supercomputing resources:
- U.S. Department of Energy (DOE) Advanced Scientific Computing Research (ASCR)
- U.S. DOE National Nuclear Security Administration (NNSA)
- U.S. National Science Foundation (NSF)
- NASA High-End Computing Capability
- Google Cloud
- Amazon Web Services
- MIT/Massachusetts Green HPC Center (MGHPCC)
- Rensselaer Polytechnic Institute
- IBM Research
U.S. Department of Energy (DOE) Advanced Scientific Computing Research (ASCR)
Supercomputing facilities at DOE offer some of the most powerful resources for scientific computing in the world. The Argonne Leadership Computing Facility (ALCF) and Oak Ridge Leadership Computing Facility (OLCF) may be used for modeling and simulation coupled with machine and deep learning techniques to study a range of areas, including examining underlying protein structure, classifying the evolution of the virus, understanding mutation, uncovering important differences and similarities with the 2002-2003 SARS virus, searching for potential vaccine and antiviral compounds, and simulating the spread of COVID-19 and the effectiveness of countermeasure options.
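As a toy illustration of the sequence-level comparisons behind classifying viral evolution and understanding mutation, the sketch below counts residue substitutions between two pre-aligned protein fragments. The fragments are hypothetical placeholders standing in for real SARS and SARS-CoV-2 data; production analyses would align full genomes and proteomes with dedicated bioinformatics tools on the facilities described here.

```python
# Toy mutation scan; the sequences are hypothetical placeholders, not real data.
def substitutions(seq_a: str, seq_b: str):
    """Return (position, residue_a, residue_b) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("fragments must be pre-aligned to equal length")
    return [(idx, a, b) for idx, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

frag_sars_2003 = "MFVFLVLLPLVSSQ"   # hypothetical stand-in for a 2002-2003 SARS fragment
frag_sars_cov2 = "MFVFLVLLPLVSSR"   # hypothetical stand-in for a SARS-CoV-2 fragment

diffs = substitutions(frag_sars_2003, frag_sars_cov2)
identity = 1 - len(diffs) / len(frag_sars_2003)
print(f"{len(diffs)} substitution(s), {identity:.1%} identity: {diffs}")
```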
U.S. DOE National Nuclear Security Administration (NNSA)
Established by Congress in 2000, NNSA is a semi-autonomous agency within the U.S. Department of Energy responsible for enhancing national security through the military application of nuclear science. NNSA resources at Lawrence Livermore National Laboratory (LLNL), Los Alamos National Laboratory (LANL), and Sandia National Laboratories (SNL) are being made available to the COVID-19 HPC Consortium.
- LLNL Lassen
- 23 PFLOPS, 788 compute nodes, IBM Power9/NVIDIA Volta GV100
- 28 TF per node
- 2 x IBM POWER9 CPUs (44 cores) per node
- 4 x NVIDIA Volta GPUs per node
- 256 GB DDR4 + 64 GB HBM2 (GPU memory) per node
- 1600 GB NVMe local storage per node
- 2 x Mellanox EDR IB (100Gb/s per adapter)
- 24 PB storage
- LLNL Quartz
- 3.2 PF, 3004 compute nodes, Intel Broadwell
- 1.2 TF per node
- 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
- 128 GB memory per node
- 1 x Intel Omni-Path IB (100Gb/s)
- 30 PB storage (shared with other clusters)
- LLNL Pascal
- 0.9 PF, 163 compute nodes, Intel Broadwell CPUs/NVIDIA Pascal P100
- 11.6 TF per node
- 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
- 2 x NVIDIA Pascal P100 GPUs per node
- 256 GB memory + 32 GB HBM2 (GPU memory) per node
- 1 x Mellanox EDR IB (100Gb/s)
- 30 PB storage (shared with other clusters)
- LLNL Ray
- 1.0 PF, 54 compute nodes, IBM Power8/NVIDIA Pascal P100
- 19 TF per node
- 2 x IBM Power8 CPUs (20 cores) per node
- 4 x NVIDIA Pascal P100 GPUs per node
- 256 GB memory + 64 GB HBM2 (GPU memory) per node
- 1600 GB NVMe local storage per node
- 2 x Mellanox EDR IB (100Gb/s per adapter)
- 1.5 PB storage
- LLNL Surface
- 506 TF, 158 compute nodes, Intel Sandy Bridge/NVIDIA Kepler K40m
- 3.2 TF per node
- 2 x Intel Xeon E5-2670 CPUs (16 cores) per node
- 3 x NVIDIA Kepler K40m GPUs per node
- 256 GB memory + 36 GB GDDR5 (GPU memory) per node
- 1 x Mellanox FDR IB (56Gb/s)
- 30 PB storage (shared with other clusters)
- LLNL Syrah
- 108 TF, 316 compute nodes, Intel Sandy Bridge
- 0.3 TF per node
- 2 x Intel Xeon E5-2670 CPUs (16 cores) per node
- 64 GB memory per node
- 1 x QLogic IB (40Gb/s)
- 30 PB storage (shared with other clusters)
- LANL Grizzly
- 1.8 PF, 1490 compute nodes, Intel Broadwell
- 1.2 TF per node
- 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
- 128 GB memory per node
- 1 x Intel Omni-Path IB (100Gb/s)
- 15.2 PB storage
- LANL Snow
- 445 TF, 368 compute nodes, Intel Broadwell
- 1.2 TF per node
- 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
- 128 GB memory per node
- 1 x Intel Omni-Path IB (100Gb/s)
- 15.2 PB storage
- LANL Badger
- 790 TF, 660 compute nodes, Intel Broadwell
- 1.2 TF per node
- 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
- 128 GB memory per node
- 1 x Intel Omni-Path IB (100Gb/s)
- 15.2 PB storage
- SNL Solo
- 460 TF, 374 compute nodes, Intel Broadwell
- 1.2 TF per node
- 2 x Intel Xeon E5-2695 CPUs (36 cores) per node
- 128 GB memory per node
- 1 x Intel Omni-Path IB (100Gb/s)
U.S. National Science Foundation (NSF)
The NSF Office of Advanced Cyberinfrastructure (OAC) supports and coordinates the development, acquisition, and provision of state-of-the-art cyberinfrastructure resources, tools, and services essential to the advancement and transformation of science and engineering. By fostering a vibrant ecosystem of technologies and a skilled workforce of developers, researchers, staff, and users, OAC serves the growing community of scientists and engineers across all disciplines. The most capable resources supported by NSF OAC are being made available to support the COVID-19 HPC Consortium.
- Frontera
Operated by the Texas Advanced Computing Center (TACC), Frontera provides a balanced set of capabilities that supports both capability and capacity simulation, data-intensive science, visualization, and data analysis, as well as emerging applications in AI and deep learning. Frontera has two computing subsystems: a primary computing system focused on double-precision performance, and a second subsystem focused on single-precision streaming-memory computing.
- Comet
Operated by the San Diego Supercomputer Center (SDSC), Comet is a nearly 3-petaflop cluster designed by Dell and SDSC. It features Intel next-generation processors with AVX2, Mellanox FDR InfiniBand interconnects, and Aeon storage.
- Stampede 2
Operated by TACC, Stampede 2 is a nearly 20-petaflop HPC national resource accessible to thousands of researchers across the country, enabling new computational and data-driven scientific, engineering, research, and educational discoveries and advances.
- Bridges
Operated by the Pittsburgh Supercomputing Center (PSC), Bridges provides an innovative HPC and data-analytic system, integrating advanced memory technologies to empower new modalities of artificial intelligence-based computation, bring desktop convenience to HPC, connect to campuses, and express data-intensive scientific and engineering workflows.
- Jetstream
Operated by a team led by the Indiana University Pervasive Technology Institute, Jetstream is a configurable large-scale computing resource that leverages both on-demand and persistent virtual machine technology to support a wide array of software environments and services through incorporating elements of commercial cloud computing resources with some of the best software in existence for solving important scientific problems.
NASA High-End Computing Capability
NASA’s High-End Computing Capability (HECC) Portfolio provides world-class high-end computing, storage, and associated services to enable NASA-sponsored scientists and engineers supporting NASA programs to broadly and productively employ large-scale modeling, simulation, and analysis to achieve successful mission outcomes.
NASA’s Ames Research Center in Silicon Valley hosts the agency’s most powerful supercomputing facilities. To help meet the COVID-19 challenge facing the nation and the world, HECC is offering access to NASA’s high-performance computing (HPC) resources for researchers requiring HPC to support their efforts to combat this virus.
Google Cloud
Transform research data into valuable insights and conduct large-scale analyses with the power of Google Cloud. As part of the COVID-19 HPC Consortium, Google is providing access to Google Cloud HPC resources for academic researchers.
Amazon Web Services
As part of the COVID-19 HPC Consortium, AWS is offering research institutions and companies technical support and promotional credits for the use of AWS services to advance research on diagnosis, treatment, and vaccine studies to accelerate our collective understanding of the novel coronavirus (COVID-19). Researchers and scientists working on time-critical projects can use AWS to instantly access virtually unlimited infrastructure capacity, and the latest technologies in compute, storage, and networking to accelerate time to results.
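As a hedged illustration of what programmatic access to that capacity can look like, the sketch below requests a small batch of EC2 instances with boto3. The region, AMI ID, instance type, and instance count are assumptions chosen for the example, not details specified by AWS or the consortium.

```python
# Illustrative sketch only; AMI ID, instance type, count, and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical placeholder AMI
    InstanceType="c5n.18xlarge",       # assumed compute/network-optimized type
    MinCount=1,
    MaxCount=8,                        # scale within the credits granted
)
for instance in response["Instances"]:
    print(instance["InstanceId"], instance["State"]["Name"])
```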
MIT/Massachusetts Green HPC Center (MGHPCC)
MIT is contributing two HPC systems to the COVID-19 HPC Consortium. The MIT Supercloud, a 7-petaflop Intel x86/NVIDIA Volta HPC cluster, is designed to support research projects that require significant compute, memory, or big-data resources. Satori is a 2-petaflop scalable AI-oriented hardware resource for research computing at MIT composed of 64 IBM Power9/Volta nodes. The MIT resources are installed at the Massachusetts Green HPC Center (MGHPCC), which operates as a joint venture between Boston University, Harvard University, MIT, Northeastern University, and the University of Massachusetts.
Rensselaer Polytechnic Institute
The Rensselaer Polytechnic Institute (RPI) Center for Computational Innovations is solving problems for next-generation research through the use of massively parallel computation and data analytics. The center supports researchers, faculty, and students across a diverse spectrum of disciplines. RPI is making its Artificial Intelligence Multiprocessing Optimized System (AiMOS) available to the COVID-19 HPC Consortium. AiMOS is an 8-petaflop IBM Power9/Volta supercomputer configured to enable users to explore new AI applications.
IBM Research WSC
The IBM Research WSC cluster consists of 56 compute nodes, each with two 22-core POWER9 CPUs and six NVIDIA V100 GPUs, plus seven additional nodes dedicated to management functions. The cluster is intended to be used for the following purposes: client collaboration, advanced research for government-funded projects, advanced research on Converged Cognitive Systems, and advanced research on Deep Learning. A brief usage sketch follows the configuration summary below.
- 56 AC922 nodes
- 2 x POWER9 CPU per node, 22 cores per CPU
- 6 x NVIDIA V100 GPUs per node (336 total)
- 512 GiB DRAM per node
- 1.4 TB NVMe per node
- 2 x EDR InfiniBand per node
- 2 PB GPFS distributed storage (No local disk)
- RHEL 7.6
- CUDA 10.1
- IBM PowerAI 1.6
- IBM HPC software stack (CSM/JSM/LSF job management; Spectrum MPI; XL C/C++/Fortran 16.1; GPFS)
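As a usage sketch for the stack above, the following minimal mpi4py program distributes a trivial partial-sum workload across MPI ranks; real consortium workloads would instead be molecular modeling, bioinformatics, or deep-learning jobs spread over the 56 nodes and 336 GPUs. The script is an illustrative assumption, not part of the WSC documentation; on a system like this it would typically run under Spectrum MPI via the LSF/JSM tooling listed above, with site-specific submission options.

```python
# Minimal MPI sketch (mpi4py) illustrating how work is spread across ranks.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's index within the job
size = comm.Get_size()   # total number of MPI ranks

# Each rank sums a strided slice of the range, then results are reduced to rank 0.
local = sum(range(rank, 1_000_000, size))
total = comm.reduce(local, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} ranks computed total = {total}")
```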