Should I Use Docker With My Deep Learning Projects?

Docker and other containerization tools make software integration and deployment significantly easier.

Chris Kawalek, Sr. Product Marketing Manager, NVIDIA GPU Cloud at NVIDIA

Should I be using Docker with my deep learning projects? originally appeared on Quora, the place to gain and share knowledge, empowering people to learn from others and better understand the world.

Docker and other containerization tools make software integration and deployment significantly easier. Complex software, like deep learning frameworks or HPC applications, has dependencies not just at the operating system level, but on drivers, libraries, and runtimes, too. These parts all work together to determine how well the software runs, or whether it runs at all.

Containers are useful because they provide a lightweight, isolated environment for applications or other software. They run on top of a host computer but are independent of it. This is similar to a virtual machine, but containers focus on packaging the application with its dependencies and, because they share the host’s kernel rather than booting a full guest operating system, don’t incur the overhead that virtual machines do. And since, as their name implies, they are self-contained, the applications, user-space drivers and libraries, and even the operating system version can differ between containers and from the host.

This is useful for a few reasons:

* It avoids software conflicts. These come up often with deep learning and other advanced software because the components are updated frequently, and one piece of software might require a different version of a specific driver or library than another. Or a newer version of an application might not run if an older version is installed on the same system. There’s no easy way to deal with this if you’re installing software directly on a host system. But in a container it is easy: simply create two (or more) containers, each with a stack customized for its application or framework.

* Containers allow you to install and pre-configure everything an application needs and wrap it up in a single, easy-to-use package that’s simple to replicate. This allows you to start with a completely up-to-date, fresh copy of an environment every time you start a new project. Or you can version containers so, as you update components in new containers, earlier projects that rely on older components aren’t affected.

* Containers are portable. Containerization decouples the container from the host and bundles the required dependencies, which allows containers to run on a wide variety of systems, not just the system they were created on. This portability makes the notion of a container registry compelling.
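The first two points can be sketched in a few lines of Python. This is only an illustration: `render_dockerfile` is a hypothetical helper, and the base image tags and package versions are placeholders chosen for the example, not recommendations. It generates one Dockerfile per project so that each project gets its own pinned stack.

```python
from typing import Mapping


def render_dockerfile(base_image: str, pinned_packages: Mapping[str, str]) -> str:
    """Return Dockerfile text that installs exact package versions on a base image."""
    pins = " ".join(
        f"{name}=={version}" for name, version in sorted(pinned_packages.items())
    )
    return f"FROM {base_image}\nRUN pip install --no-cache-dir {pins}\n"


# Two hypothetical projects with incompatible stacks. Image tags and versions
# are illustrative placeholders only.
legacy_project = render_dockerfile("python:3.8-slim", {"tensorflow": "2.4.0"})
new_project = render_dockerfile("python:3.11-slim", {"torch": "2.2.0"})

print(legacy_project)
print(new_project)
```

Saving each result as a `Dockerfile` and building it with `docker build -t <name>:<version> .` would give two images that can live side by side on one host, and the version tag keeps older projects tied to the stack they were built with.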

A container registry is a place to store containers. You “pull” containers from a registry and run them on a system. You can run your own registry, or use a large public one like Docker Hub, which hosts a wide variety of containerized software.
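As a rough sketch of what pulling looks like, the Python below composes image references and the matching `docker pull` commands without running them. The helper names, the private registry host `registry.example.com:5000`, the `team/trainer` repository, and the tags are all assumptions made for illustration.

```python
import shlex
from typing import List, Optional


def image_reference(repository: str, tag: str = "latest",
                    registry: Optional[str] = None) -> str:
    """Compose an image reference; with no registry host, Docker uses Docker Hub."""
    prefix = f"{registry}/" if registry else ""
    return f"{prefix}{repository}:{tag}"


def pull_command(reference: str) -> List[str]:
    """Return the argv for pulling an image with the Docker CLI."""
    return ["docker", "pull", reference]


# A public image on Docker Hub and one on a hypothetical self-hosted registry.
hub_image = image_reference("tensorflow/tensorflow", "2.15.0")
private_image = image_reference("team/trainer", "1.0",
                                registry="registry.example.com:5000")

for ref in (hub_image, private_image):
    print(shlex.join(pull_command(ref)))
```

Passing one of these lists to `subprocess.run(..., check=True)` would perform the actual pull on a machine with Docker installed.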

An extremely useful registry to look into if you’re doing deep learning or other complex, GPU-accelerated work is NVIDIA GPU Cloud. It has containers for all the top deep learning software (TensorFlow, PyTorch, TensorRT, etc.) with tuned and tested software stacks that are ready to run on NVIDIA GPUs. The deep learning containers are updated every month with the latest framework releases and whatever changes are needed in the underlying stack to make sure the software continues to run optimally. And there is no charge to download and use the containers.
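A minimal sketch of running one of these containers, under a few assumptions: a recent Docker with the NVIDIA Container Toolkit installed on the host (which is what makes `--gpus all` work), images published under `nvcr.io/nvidia/<framework>`, and an example tag and project path that are placeholders. Check the NGC catalog for the tags that actually exist. The helper `ngc_run_command` is hypothetical and only builds the command line.

```python
import shlex
from typing import List

NGC_REGISTRY = "nvcr.io"  # registry host used by NVIDIA GPU Cloud


def ngc_run_command(framework: str, tag: str, project_dir: str) -> List[str]:
    """Build a `docker run` argv for an NGC framework container with GPU access."""
    image = f"{NGC_REGISTRY}/nvidia/{framework}:{tag}"
    return [
        "docker", "run",
        "--gpus", "all",  # expose all host GPUs (needs NVIDIA Container Toolkit)
        "--rm",           # remove the container when it exits
        "-it",            # interactive terminal
        "-v", f"{project_dir}:/workspace/project",  # assumed mount point
        image,
    ]


# Tag and path are illustrative placeholders.
command = ngc_run_command("pytorch", "24.01-py3", "/home/me/my-project")
print(shlex.join(command))
```

Mounting the project directory keeps code and data on the host, so the container itself stays disposable and can be swapped for a newer monthly release without touching the project files.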

Using containers is an excellent way to create easy-to-deploy, repeatable environments for your deep learning projects. It takes time to create containers initially, but they save a lot of effort and frustration in the long run. And to get started quickly, you can skip the do-it-yourself integration work and pull ready-to-run containers from an existing container registry like the ones mentioned above.

