DGX-1 NVLink Network Topology for Efficient Application Scaling
DGX-1 includes eight NVIDIA Tesla V100 accelerators, providing the highest compute density available in an air-cooled 3U chassis. Application scaling across this many highly parallel GPUs can be hampered by today's PCIe interconnect; NVLink provides the communication performance needed to achieve good scaling on deep learning and other applications. Each Tesla V100 GPU has six NVLink connection points, each providing a point-to-point connection to another GPU at a peak bandwidth of 25 GB/s in each direction. Multiple NVLink connections can be aggregated, multiplying the available interconnect bandwidth between a given pair of GPUs. As a result, NVLink is a flexible interconnect that can be used to build a variety of network topologies among multiple GPUs. V100 also supports 16 lanes of PCIe 3.0; in DGX-1, these connect the GPUs to the CPUs and to high-speed InfiniBand network interface cards.
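The bandwidth figures above compose simply. As a minimal sketch using only the numbers stated in the text (the two-link aggregation case is a hypothetical pairing, not a claim about which GPU pairs are doubled in DGX-1):

```python
# Illustrative NVLink bandwidth arithmetic for Tesla V100.
LINKS_PER_GPU = 6
GB_PER_LINK_PER_DIR = 25          # GB/s, peak, each direction

# One link, counting both directions.
bidir_per_link = 2 * GB_PER_LINK_PER_DIR              # 50 GB/s

# All six links of one GPU active simultaneously, both directions.
total_bidir = LINKS_PER_GPU * bidir_per_link          # 300 GB/s

# Hypothetical example: two links aggregated between one pair of GPUs
# doubles the point-to-point bandwidth in each direction.
pair_bw_one_dir = 2 * GB_PER_LINK_PER_DIR             # 50 GB/s per direction

print(bidir_per_link, total_bidir, pair_bw_one_dir)   # 50 300 50
```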
The design of the NVLink network topology for DGX-1 aims to optimize a number of factors, including the bandwidth achievable for a variety of point-to-point and collective communications primitives, the ability to support a variety of common communication patterns, and the ability to maintain performance when only a subset of the GPUs is utilized.
The hybrid cube-mesh topology can be thought of as a cube with a GPU at each of its eight corners, with all twelve edges connected through NVLink (some edges carrying two NVLink connections), and with two of the six faces having their diagonals connected as well. Equivalently, the topology can be viewed as three interwoven rings of single NVLink connections.
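The cube-plus-diagonals view can be written down as a small adjacency structure. The sketch below is illustrative: the GPU numbering and the choice of which two faces carry diagonals are assumptions, and the doubling of some links is noted but not modeled.

```python
# Sketch of an 8-GPU hybrid cube-mesh as an edge list (illustrative labeling).
cube_edges = [
    (0, 1), (1, 2), (2, 3), (3, 0),   # one face of the cube
    (4, 5), (5, 6), (6, 7), (7, 4),   # opposite face
    (0, 4), (1, 5), (2, 6), (3, 7),   # edges joining the two faces
]
diagonals = [(0, 2), (1, 3), (4, 6), (5, 7)]  # the two mesh-connected faces

edges = cube_edges + diagonals        # 12 cube edges + 4 diagonals = 16

# Each GPU ends up with four distinct NVLink neighbors; since a V100 has
# six links, two of those four connections can be doubled.
degree = {gpu: 0 for gpu in range(8)}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

assert len(edges) == 16
assert all(d == 4 for d in degree.values())
```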
The cube-mesh topology provides the highest bandwidth of any 8-GPU NVLink topology for multiple collective communication primitives, including broadcast, gather, all-reduce, and all-gather, which are important to deep learning.
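The ring view of the topology matters because ring-based all-reduce maps directly onto it. As a sketch of the standard ring all-reduce cost model (a common analysis, not a description of any particular NVIDIA implementation; the function name is illustrative):

```python
# Standard cost model for a ring all-reduce over N GPUs: a reduce-scatter
# phase followed by an all-gather phase, each moving (N-1)/N of the buffer
# through every GPU.
def ring_allreduce_bytes_per_gpu(buffer_bytes: int, n_gpus: int) -> float:
    return 2 * (n_gpus - 1) / n_gpus * buffer_bytes

# With 8 GPUs, each GPU sends (and receives) 1.75x the buffer size per
# all-reduce -- nearly independent of N, which is why per-link bandwidth,
# not hop count, dominates for large messages.
print(ring_allreduce_bytes_per_gpu(1024, 8))  # 1792.0
```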
Figure: DGX-1 uses an 8-GPU hybrid cube-mesh interconnection network topology. The corners of the mesh-connected faces of the cube are connected to the PCIe tree network, which also connects to the CPUs and NICs.