CONTACT US: icc@iitr.ac.in | ckausik@iitr.ac.in

What is Docker?

A Docker container is a mechanism for bundling a Linux application with all of its libraries, data files, and environment variables so that the execution environment is always the same, on whatever Linux system it runs and between instances on the same host. Docker containers are user-mode only, so all kernel calls from the container are handled by the host system kernel.

Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries - anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.
Docker provides both hardware and software encapsulation by allowing multiple containers to run on the same system at the same time, each with its own set of resources (CPU, memory, etc.) and its own dedicated set of dependencies (library versions, environment variables, etc.). Docker also provides portable Linux deployment: Docker containers can run on any Linux system with kernel version 3.10 or later. All major Linux distributions have supported Docker since 2014. Encapsulation and portable deployment are valuable both to the developers creating and testing applications and to the operations staff who run them in data centers.

Docker provides several other important features:
• Docker's powerful command-line tool, 'docker build', creates Docker images from source code and binaries, using the description provided in a "Dockerfile".
• Docker's component architecture allows one container image to be used as a base for other containers.
• Docker provides automatic versioning and labeling of containers, with optimized assembly and deployment. Docker images are assembled from versioned layers, so only the layers missing on a server need to be downloaded.
• Docker Hub is a service that makes it easy to share Docker images publicly or privately.
• Containers can be constrained to a limited set of resources on a system (e.g., one CPU core and 1 GB of memory).
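The 'docker build' workflow described above can be sketched with a minimal Dockerfile. This is an illustrative sketch, not an official image definition; the base image tag, application file name (app.py), and paths are assumptions:

```dockerfile
# Start from a versioned base layer; 'docker build' reuses cached layers
FROM ubuntu:20.04

# Each instruction below adds a new layer to the image
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*

# Bundle the application code into the image's filesystem
COPY app.py /opt/app/app.py

# Environment variables travel with the image
ENV APP_ENV=production

CMD ["python3", "/opt/app/app.py"]
```

Building with 'docker build -t myapp:1.0 .' produces a versioned, layered image; rebuilding after changing only app.py reuses the cached base layers, which is how the layered file system conserves disk space and download time.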

Docker provides a layered file system that conserves disk space and forms the basis for extensible containers.


Why NVIDIA Docker?

Docker containers are not only platform-agnostic but also hardware-agnostic. This presents a problem when using specialized hardware such as NVIDIA GPUs, which require kernel modules and user-level libraries to operate. As a result, Docker does not natively support NVIDIA GPUs within containers.

One of the early work-arounds to this problem was to fully install the NVIDIA drivers inside the container and map in the character devices corresponding to the NVIDIA GPUs (e.g. /dev/nvidia0) on launch. This solution is brittle because the version of the host driver must exactly match the version of the driver installed in the container. This requirement drastically reduced the portability of these early containers, undermining one of Docker's more important features.
To enable portability in Docker images that leverage NVIDIA GPUs, NVIDIA developed nvidia-docker, an open-source project hosted on GitHub that provides the two critical components needed for portable GPU-based containers:

1. driver-agnostic CUDA images; and
2. a Docker command-line wrapper that mounts the user-mode components of the driver and the GPUs (character devices) into the container at launch.

nvidia-docker is essentially a wrapper around the docker command that transparently provisions a container with the necessary components to execute code on the GPU. It is strictly necessary only when using nvidia-docker run to execute a container that uses GPUs, but for simplicity in this post we use it for all Docker commands.
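What the wrapper does can be sketched in a few lines of Python. This is a simplified illustration of the idea described above (injecting the GPU character devices and the user-mode driver components into the 'docker run' command line), not the real nvidia-docker source; the volume name "nvidia_driver" and mount path are assumptions:

```python
# Simplified sketch of a nvidia-docker-style wrapper: it extends a
# 'docker run' argv with GPU character devices and a volume holding
# the host's user-mode driver libraries. Names and paths are
# illustrative assumptions, not the actual nvidia-docker internals.

def wrap_docker_run(docker_args, gpu_ids):
    """Return an argv list for 'docker run' with GPU devices injected."""
    extra = []
    # Control devices required by CUDA applications
    for dev in ("/dev/nvidiactl", "/dev/nvidia-uvm"):
        extra += ["--device", dev]
    # One character device per requested GPU (e.g. /dev/nvidia0)
    for i in gpu_ids:
        extra += ["--device", f"/dev/nvidia{i}"]
    # Mount the host's user-mode driver components read-only
    extra += ["--volume", "nvidia_driver:/usr/local/nvidia:ro"]
    return ["docker", "run"] + extra + docker_args

cmd = wrap_docker_run(["nvcr.io/nvidia/cuda", "nvidia-smi"], gpu_ids=[0])
print(" ".join(cmd))
```

Because the devices and driver libraries are mapped in from the host at launch rather than baked into the image, the image itself stays driver-agnostic, which is exactly the portability property the two components above provide.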


Software

The DGX-1 software has been built to run deep learning at scale. A key goal is to enable practitioners to deploy deep learning frameworks and applications on DGX-1 with minimal setup effort. The design of the platform software is centred around a minimal OS and driver install on the server, and provisioning of all application and SDK software in Docker containers through the DGX Container Registry maintained by NVIDIA. Containers available for DGX-1 include multiple optimized deep learning frameworks, the NVIDIA DIGITS deep learning training application, third-party accelerated solutions, and the NVIDIA CUDA Toolkit. Figure shows the DGX-1 deep learning software stack.

Docker Engine Utility for NVIDIA GPUs

Over the last few years there has been a dramatic rise in the use of software containers for simplifying deployment of data center applications at scale. Containers encapsulate an application's dependencies to provide reproducible and reliable execution of applications and services without the overhead of a full virtual machine.
Each Tensor Core provides a 4x4x4 matrix processing array which performs the operation D = A * B + C, where A, B, C and D are 4x4 matrices. The matrix multiply inputs A and B are FP16 matrices, while the accumulation matrices C and D may be FP16 or FP32 matrices.
Each Tensor Core performs 64 mixed-precision floating point FMA operations per clock (FP16 input multiply with full-precision product and FP32 accumulate), and the 8 Tensor Cores in an SM perform a total of 1024 floating point operations per clock. This is a dramatic 8X increase in throughput for deep learning applications per SM compared to Pascal GP100 using standard FP32 operations, resulting in a total of 12X increase in throughput for the Volta V100 GPU compared to the Pascal P100 GPU. Tensor Cores operate on FP16 input data with FP32 accumulation. The FP16 multiply produces a full-precision result that is accumulated in FP32 with the other products in a given dot product of the 4x4x4 matrix multiply.
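The operation counts above can be checked with a small pure-Python sketch of the Tensor Core operation D = A * B + C on 4x4 matrices. Each output element needs a length-4 dot product (4 FMAs), so the full 4x4x4 operation costs 4 * 4 * 4 = 64 FMAs, matching the 64 FMA operations per Tensor Core per clock. This sketch models only the arithmetic, not FP16/FP32 precision behavior:

```python
# Count the fused multiply-add (FMA) operations in one Tensor Core
# style operation: D = A * B + C on 4x4 matrices, with C used as the
# accumulator starting value for each dot product.

def tensor_core_op(A, B, C):
    """Return (D, fma_count) for D = A @ B + C on 4x4 matrices."""
    n = 4
    fmas = 0
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = C[i][j]                 # start from the accumulator C
            for k in range(n):
                acc += A[i][k] * B[k][j]  # one fused multiply-add
                fmas += 1
            D[i][j] = acc
    return D, fmas

# Identity * identity + zeros should give the identity back in 64 FMAs
I = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
Z = [[0.0] * 4 for _ in range(4)]
D, fmas = tensor_core_op(I, I, Z)
print(fmas)  # 64
```

Counting each FMA as 2 floating point operations, 8 Tensor Cores x 64 FMAs x 2 gives the 1024 operations per clock per SM cited above.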

Docker containers generally strive to be platform-agnostic and hardware-agnostic. They achieve this by separating user-mode code (in the container) from kernel-mode code. This separation presents a problem when using specialized hardware such as NVIDIA GPUs, since GPU drivers consist of a matched set of user-mode and kernel-mode modules. An early workaround to this problem was to fully install the NVIDIA drivers inside the container and map in the devices corresponding to the NVIDIA GPUs on launch. This solution is brittle because the version of the host driver must exactly match the version of the driver installed in the container. This requirement drastically reduced the portability of these early containers, undermining one of Docker's more important features.

Useful links:

DGX-1:

Click here

NGC:

Click here

DGX-1 User Guide:

1. Click here

2. Click here

DGX-1 V100:

Click here