What Are the Biggest Software Challenges in Machine Learning?


What are the biggest software challenges in machine learning and differentiable programming systems? originally appeared on Quora, the place to gain and share knowledge, empowering people to learn from others and better understand the world.

Machine learning has revived a significant amount of interest in techniques that were popular in the 70s and 80s but have not received mainstream attention since, outside of academia or certain fairly niche use cases. Among these techniques are polyhedral compilation, optimizing array compilers, automatic differentiation (reverse mode AD in particular) and a few others. Additionally, the hardware folks are reviving ideas that have been out of vogue for a while: systolic arrays, software scheduled architectures (with software-managed hazards), and innovation in ISA design and interconnects. Writing good compilers for these architectures is quite hard and not really a solved problem yet.

The reason a lot of these ideas failed last time around was that there turned out to be very little advantage in exploring them, because next year’s general purpose CPU would just be faster than any special purpose hardware you could possibly build. That is no longer the case: physical limitations are preventing CPUs from getting faster (I’m writing this answer on a six year old machine and the most powerful single machine I currently have SSH access to is about a decade old - both of these would have been mostly unthinkable even a decade ago), so we need to find innovation elsewhere.

The good news is that (if done properly) all this innovation being driven by machine learning will have significant benefits in other fields. Pervasive availability of production quality automatic differentiation systems (“differentiable programming systems”) will have impact far beyond machine learning. Wherever any sort of optimization process happens, being able to very quickly compute the derivative of your objective function is a crucial prerequisite to getting a good result. This shows up everywhere: finance, astrophysics, medical imaging, personalized medicine, logistics and many others. The same is true for some of the other techniques. Our thesis for building the machine learning stack in Julia is to build it as a set of general infrastructure (AD, compiler support, hardware backends, developer tools, etc.) and have machine learning just fall out as a special case of what’s well supported. That sometimes causes some friction because current-generation machine learning systems tend not to require the full generality that this infrastructure provides, but I think it’s a crucial ingredient for next generation systems [1].
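To make the point about derivatives and optimization concrete, here is a minimal sketch of reverse mode AD driving a plain gradient descent loop. It is written in Python purely for illustration and is not how the Julia stack (or any production AD system) is implemented; the names `Var`, `backward`, `gradient`, `minimize` and `objective` are all hypothetical, and the objective is a toy quadratic chosen so the answer is easy to check by hand.

```python
class Var:
    """A scalar that remembers how it was computed (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        # Each entry is (parent Var, d(self)/d(parent)).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = _lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers so they can take part in the recorded graph."""
    return x if isinstance(x, Var) else Var(x)


def backward(output):
    """Reverse sweep: push d(output)/d(node) from the output to the inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their children

    visit(output)
    output.grad = 1.0
    # Walk children before parents so each node's grad is complete
    # before it is propagated further back.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


def gradient(f, *args):
    """Return f(*args) and its gradient with respect to every argument."""
    inputs = [Var(a) for a in args]
    out = f(*inputs)
    backward(out)
    return out.value, [v.grad for v in inputs]


def minimize(f, start, lr=0.1, steps=100):
    """Plain gradient descent using the AD gradient above."""
    point = list(start)
    for _ in range(steps):
        _, grads = gradient(f, *point)
        point = [p - lr * g for p, g in zip(point, grads)]
    return point


def objective(x, y):
    # Toy objective with its minimum at (3, -1).
    dx = x - 3
    dy = y + 1
    return dx * dx + dy * dy
```

Calling `gradient(objective, 0.0, 0.0)` gives the value and both partial derivatives from a single backward sweep, and `minimize(objective, [0.0, 0.0])` walks toward the minimum at (3, -1). The property that matters for real workloads is that one reverse sweep yields the derivative with respect to every input at once, which is why reverse mode is the variant of interest when an objective has many parameters.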

I would be remiss not to point out some of the other great work that’s going on in this area. There’s some great work out of Google with Swift for TensorFlow (a very different approach from what we’re doing, but one that takes differentiable programming the furthest of any of the non-Julia systems I have seen) and MLIR (which, despite its name and with my enthusiastic support, is trying to build general purpose next-generation compiler infrastructure). There is also TVM from UW, which is doing some great work on ML-driven compiler heuristics and search space exploration for generating really high performance kernels on all kinds of architectures. I also liked the goals for Myia when it was announced, but I haven’t heard much recently.

[1] I recently gave a talk on this (https://juliacomputing.com/blog/...). It’s a good overview of the various things we’re doing at the compiler level to support machine learning and differentiable programming. There’s also a larger point on how all of these things have to work together that I sometimes think gets lost.

