As architectures have evolved, the HPC market has undergone a paradigm shift. The adoption of low-cost, Linux-based clusters that offer significant computing performance and can run a wide array of applications has extended HPC's reach from its roots in modelling and simulation of complex physical systems to a broad range of industries, from biotechnology, cloud computing, computer analytics and big data challenges to manufacturing sectors such as aeronautics, automotive, and energy. These systems are expected to be highly distributed and to work on massive amounts of data, thus requiring high-bandwidth, low-latency interconnections and massive storage. However, current interconnections and storage devices together provide latencies on the order of hundreds of microseconds, which limits the scalability of such data-hungry application models.
HPC applications have traditionally worked on large data sets but, until recently, the main point of interest was floating-point operations per second rather than the data itself. Modern HPC technology promises true-fidelity scientific simulation, enabled by the integration of huge sets (or streams) of data coming from a variety of sources. As a result, the general consensus is that the problem of data management in HPC systems is growing rapidly, fueling a parallel shift towards data-centric HPC architectures. Because these architectures operate on massive amounts of data, they require low-latency access to fast storage.
Computing systems that target exascale performance must scale favorably along two orthogonal directions. They must scale up, by increasing the capacity of cores, CPUs and nodes; and they must scale out, by allocating a vast number of computing nodes across multiple cabinets and having them work collectively. The interconnection network is therefore a key enabling technology on the tough road towards exascale.
This proposal, driven by strong power and cost incentives, focuses on this important piece of the architecture, which is needed to unleash exascale computing. In this context, the availability of distributed low-latency devices brings new opportunities to data-centric systems, but also introduces new challenges for the underlying interconnect.