Leibniz HPC housing architecture. Courtesy: http://www.idsia.ch/~juergen/lrz.html
A good software solution that enables highly scalable codes is essential in the new era of multi- and many-core chips. Scalability will soon be the most limiting factor for application performance.
In terms of system architecture, two classes are expected to dominate over the next five years: homogeneous and accelerated clusters with ten thousand compute nodes, and massively parallel systems with several hundred thousand low-power compute nodes.
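To illustrate why scalability becomes the limiting factor at these node counts, here is a minimal sketch based on Amdahl's law. The source does not name this model; it is used here only as a common first-order assumption, and the serial fractions and function names are hypothetical illustration values, not measurements from any PRACE prototype.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup under Amdahl's law (an assumed model).

    serial_fraction: share of the runtime that cannot be parallelised (0..1).
    workers: number of compute nodes or cores working in parallel.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def parallel_efficiency(serial_fraction: float, workers: int) -> float:
    """Fraction of the ideal (linear) speedup that is actually achieved."""
    return amdahl_speedup(serial_fraction, workers) / workers


if __name__ == "__main__":
    # Hypothetical serial fractions; node counts follow the text above.
    for workers in (10_000, 300_000):
        for serial_fraction in (0.01, 0.001, 0.0001):
            speedup = amdahl_speedup(serial_fraction, workers)
            efficiency = parallel_efficiency(serial_fraction, workers)
            print(f"nodes={workers:>7} serial={serial_fraction:<7} "
                  f"speedup={speedup:10.1f} efficiency={efficiency:.4%}")
```

Under this simplified model, even a serial fraction of 0.1% caps the speedup below 1,000, regardless of how many nodes are added. This is one way to read the claim that scalability, rather than raw node count, limits application performance.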
In addition to the well-known PRACE prototypes for the first generation of European Tier-0 centers, the PRACE work package for “Future Petaflop/s computer technologies beyond 2010” has evaluated 12 additional prototypes.
The in-depth assessment of prototypes has been a perfect complement to the continuous technology survey established by PRACE.
The next-generation architecture prototypes investigated are full systems, system components or software prototypes. In addition, several research activities have been carried out.
Both the prototype assessments and the research results are summarised in the PRACE deliverable D8.3.2. The following hardware, software and research activities have been assessed:
Systems
- CINES and LRZ have jointly evaluated a hybrid system architecture containing thin nodes, fat nodes and compute accelerators within a shared file system with components from SGI and ClearSpeed/PetaPath.
- FZJ has extended the communication capabilities of their Cell-based QPACE system to estimate its suitability for a wider range of applications. This enhancement allowed Linpack to run on the full system, which made it No. 1 on the Green500 list of energy-efficient supercomputers.
- NCF has assessed a system composed of ClearSpeed/PetaPath accelerator boards together with the ClearSpeed programming language Cn.
Software
- CEA has studied the performance of GPUs using the CAPS Hybrid Multicore Parallel Programming (HMPP) workbench on NVIDIA Tesla.
- CSC studied the maturity of OpenCL and performance improvements for multi-GPU programming on NVIDIA Tesla and AMD Firestream cards.
- CSCS evaluated the ease of use of the PGAS programming model by using the Cray Compiler Environment for UPC and CAF.
- EPCC evaluated the HARWEST Compiling Environment for developing programs on their FPGA-based supercomputer “Maxwell”.
- LRZ assessed code and performance portability of the RapidMind multicore development platform across architectures (Cell, Tesla & Nehalem-EP).
Tools
- BSC carried out an in-depth performance analysis and performance prediction for full PRACE application codes to show the capabilities of their tools Paraver and Dimemas.
Components
- CINECA evaluated the performance of I/O and the Lustre file system, and assessed the advantages of SSD technology for metadata handling.
Energy efficiency
- PSNC and STFC have jointly assessed the power efficiency of different hardware solutions together with the power consumption profile of HPC servers.
- SNIC-KTH studied the achievable energy efficiency of commodity parts and commodity interconnects for cost efficiency and a minimal impact on the programming model.
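Energy efficiency in rankings such as the Green500 is commonly expressed as sustained performance per watt. The following minimal sketch shows that calculation. The function name and the input figures are hypothetical and do not come from any of the assessed prototypes.

```python
def mflops_per_watt(sustained_gflops: float, average_power_kw: float) -> float:
    """Performance per watt in MFLOPS/W.

    sustained_gflops: sustained benchmark performance in GFLOPS (e.g. Linpack).
    average_power_kw: average power drawn during the run, in kilowatts.
    """
    if average_power_kw <= 0:
        raise ValueError("average_power_kw must be positive")
    mflops = sustained_gflops * 1000.0   # GFLOPS -> MFLOPS
    watts = average_power_kw * 1000.0    # kW -> W
    return mflops / watts


if __name__ == "__main__":
    # Hypothetical systems, for illustration only.
    systems = {
        "system_a": (40_000.0, 60.0),    # (GFLOPS, kW)
        "system_b": (100_000.0, 400.0),
    }
    for name, (gflops, kw) in systems.items():
        print(f"{name}: {mflops_per_watt(gflops, kw):.1f} MFLOPS/W")
```

By this metric a smaller system can rank above a faster one if it draws proportionally less power, so absolute performance and energy efficiency are reported separately.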
The results show that some hardware accelerators do have the potential to substantially increase the performance and/or power efficiency of traditional HPC systems. However, the software environments for hardware accelerators are not yet tailored to the demands of the scientific computing community. They need to become more stable, easier to use and better supported by debugging and optimisation tools.
PRACE presentation
Full technical report