Development of heterogeneous many-core hardware system for next-generation ultra-high performance computers
The PCI-E type SiP optical interface board enables highly compatible inter-node interconnects. By minimizing the length of the electrical lines, it reduces electrical interference in the electrical interconnect, and it can support 200Gbps interconnection. We expect the board to improve the compatibility, scalability, and link-bandwidth characteristics of new HPC interconnect systems.
For 200Gbps silicon photonics, we need to develop silicon-photonics-based components that support a 200Gbps interconnection system. We also need to design and apply a novel modulation scheme to reduce the power consumption of the interconnection system.
With a low-diameter interconnect topology, we can provide a low-latency interconnect system for HPC application services. Building the interconnect system with virtual routers keeps its cost low. For fault tolerance, we need to develop a routing algorithm that provides routing flexibility. In addition, we need integrated interconnect monitoring tools with visualization, so that bottlenecks and defects can be analyzed easily.
1. Development of PCI-E type Silicon Photonics Advanced interface board
1.1 Design of the high-speed interconnect NIC
– Includes ConnectX-5 VPI OCP
– Built as a PCIe NIC card that can be mounted on a high-speed server board
– Equipped with a management block for implementing a new switch management structure
1.2 Applications of the NIC
Network and disk I/O classifications for supercomputing applications running in high-speed interconnect systems
– High I/O throughput requirements: snapshots taken for fault tolerance in large-scale calculations, etc.
– Low-latency requirements: object storage, distributed file systems, etc.
– Achieve high I/O throughput and minimize latency in response to the various network and I/O requirements of supercomputing applications
– The fabricated prototype NICs can be used to improve performance optimization techniques according to I/O requirements
2. Developing a low-diameter scalable topology
2.1 Characteristics of suggested low-diameter scalable topology: FlexibleX
– High-radix, low-diameter
– Required number of routers: 2NM or 3NM, N,M∈ℕ
– Multiple near-minimal routing paths exist
– Developed 3 configurations according to the cost/latency requirements
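The router-count formula and the low-diameter property listed above can be checked with a short sketch. The construction of FlexibleX is not given here, so the two graphs below (a ring and a fully connected graph) are placeholders chosen only to show how a diameter is measured; the function names `router_count` and `diameter` are hypothetical.

```python
from collections import deque


def router_count(n, m, factor):
    """Router count for the 2NM or 3NM case (factor must be 2 or 3)."""
    if factor not in (2, 3):
        raise ValueError("factor must be 2 or 3")
    if n < 1 or m < 1:
        raise ValueError("N and M must be positive integers")
    return factor * n * m


def diameter(adj):
    """Largest shortest-path hop count over all router pairs.

    adj maps each router to the list of its neighbours. A breadth-first
    search is run from every router.
    """
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected")
        worst = max(worst, max(dist.values()))
    return worst


# Placeholder graphs (assumptions, NOT the FlexibleX topology).
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
full = {i: [j for j in range(8) if j != i] for i in range(8)}

print(router_count(2, 4, 2), router_count(2, 4, 3))
print(diameter(ring), diameter(full))
```

With N=2 and M=4, the two cases require 16 and 24 routers. The 8-router ring has diameter 4 and the fully connected graph has diameter 1, which illustrates the trade-off between router radix and diameter that a high-radix, low-diameter topology targets.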
2.2 Analysis of FlexibleX topology
3. Event Collection Agent (ECA) design for data collection and processing at node level
3.1 Specification definition for event collection between ECA and OpenSM
– Definition of the metric to be generated by ECA based on packet and link information
– Design of connectivity between ECA and subnet manager that collects events from all ECAs in the interconnect system
– Definition of message specification for event collection between subnet manager and ECA
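As an illustration of a metric derived from link information and of a message carrying it, the following is a minimal sketch. The field names, the JSON encoding, and the `link_utilization` metric are all assumptions made for this example; they are not the message specification defined in this work and do not use any OpenSM interface.

```python
import json
from dataclasses import dataclass


@dataclass
class LinkSample:
    """One counter reading taken by an ECA (hypothetical fields)."""
    node_id: str
    port: int
    tx_bytes: int      # cumulative bytes transmitted on the port
    timestamp: float   # seconds


def utilization(prev, curr, link_gbps):
    """Fraction of link capacity used between two samples of one port."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    bits = (curr.tx_bytes - prev.tx_bytes) * 8
    return bits / (dt * link_gbps * 1e9)


def make_event(prev, curr, link_gbps):
    """Encode the metric as a JSON event for the collector (assumed format)."""
    return json.dumps({
        "node_id": curr.node_id,
        "port": curr.port,
        "timestamp": curr.timestamp,
        "metric": "link_utilization",
        "value": utilization(prev, curr, link_gbps),
    })


prev = LinkSample("node0", 1, 0, 0.0)
curr = LinkSample("node0", 1, 12_500_000_000, 1.0)
event = make_event(prev, curr, link_gbps=200)
print(event)
```

In this example, 12.5 GB sent in one second on a 200Gbps link gives a utilization of 0.5. A collector on the subnet manager side could parse such events from all ECAs and aggregate them per link.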