Any data scientist or machine learning enthusiast who has tried to get good performance out of learning models at scale will at some point hit a cap and start to experience various degrees of processing lag.
Tasks that take minutes with smaller training sets may take hours, and in some cases weeks, as datasets grow. You’ll need better hardware, and while researching it you will come across CPUs, GPUs, and ASICs and may be confused by the differences among them.
A Short History
A central processing unit (CPU) is essentially the brain of any computing device, carrying out the instructions of a program by performing control, logical, and input/output (I/O) operations.
The first commercial single-chip CPU, the Intel 4004, was released in 1971. Most CPUs then were designed with one “core,” meaning that only one operation could be performed at a time. Years later, owing to vast improvements in chip design, research, and manufacturing, the computing market advanced to dual-core and multi-core CPUs, which were faster because they could perform two or more operations at a time.
Today’s CPUs, however, still have only a handful of cores, and their basic design and purpose, to process complex computations, have not really changed. Essentially, they are best suited to problems that require parsing or interpreting complex logic in code.
A graphics processing unit (GPU), on the other hand, has many more, smaller logical cores (arithmetic logic units or ALUs, control units, and memory cache) whose basic design is to process a set of simpler, more uniform computations in parallel.
Figure 1: CPU vs GPU
While dedicated graphics hardware has been around as long as gaming applications, since the 1970s, it wasn’t until 1999, when Nvidia released its GeForce processor line and marketed the chips as “graphics processing units,” or GPUs, that the term became popular.
Initially designed mainly as dedicated graphics rendering workhorses for computer games, GPUs were later enhanced to accelerate other geometric calculations (for instance, transforming polygons or rotating vertices into different coordinate systems, such as 3D).
Nvidia created a parallel computing architecture and platform for its GPUs called CUDA, which gave developers the ability to express simple processing operations in parallel through code.
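To make the geometric workload concrete, here is a minimal Python sketch of rotating vertices about the z-axis. The function name `rotate_z` and the triangle coordinates are hypothetical, chosen only for illustration. The point is that each vertex is transformed by the same small calculation, independently of every other vertex.

```python
import math

def rotate_z(vertex, angle_rad):
    """Rotate one (x, y, z) vertex about the z-axis by angle_rad radians."""
    x, y, z = vertex
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)

# Hypothetical triangle; the coordinates are illustrative only.
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 2.0)]

# The same arithmetic runs on every vertex and no vertex depends on
# another, so each one could be handed to a different core.
rotated = [rotate_z(v, math.pi / 2) for v in triangle]
```

This loop runs sequentially on a CPU. A GPU is built to run many such identical per-vertex calculations at once.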
Figure 2: GPU Parallel Architecture (Wikipedia)
Moreover, most of these computations involved matrix and vector operations, the same type of mathematics that is used in data science today. It wasn’t long before engineers and scientists outside of gaming began studying how GPUs might also be used for non-graphical calculations.
Accessing CPUs and GPUs
Both CPUs and GPUs are accessible today to data science practitioners in the cloud, through serverless microservices or “backend-as-a-service” (BaaS) architectures. Developers can add API-driven machine learning services to any application, with libraries for computer vision, speech, and language, as well as integration with modern tools such as data lakes and stream processing.
How would you choose between CPUs and GPUs? As with any data science project, it depends. There are tradeoffs to consider among speed, reliability, and cost. As a general rule, GPUs are a safer bet for fast machine learning because, at its heart, model training is composed of simple matrix math calculations, which can be sped up greatly if the computations are carried out in parallel.
In other words, CPUs are best at handling single, more complex calculations sequentially, while GPUs are better at handling multiple but simpler calculations in parallel.
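The matrix math in question can be sketched in a few lines of plain Python. The `matmul` helper and the `inputs` and `weights` values below are hypothetical and kept tiny for readability. What matters is the shape of the work: every output cell is its own dot product and never reads another output cell.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Cell (i, j) reads only from a and b, never from another
            # cell of result, so all cells could be computed at once.
            result[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return result

inputs = [[1.0, 2.0], [3.0, 4.0]]      # hypothetical batch of two samples
weights = [[0.5, -1.0], [0.25, 2.0]]   # hypothetical layer weights
outputs = matmul(inputs, weights)      # [[1.0, 3.0], [2.5, 5.0]]
```

Here the two nested loops visit the cells one after another. On a GPU, each cell (or block of cells) would typically be assigned to its own thread, which is where the speedup comes from.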
GPU compute instances typically cost 2-3x as much as CPU compute instances, so unless you’re seeing at least a 2-3x speedup in your GPU-based model training, I would suggest going with CPUs.
Application-specific integrated circuits, or ASICs, are next on the horizon for processor design: single-purpose chips customized for one type of function.
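The break-even reasoning is simple arithmetic, sketched below. The hourly rates and training times are made-up numbers, not real cloud prices, and the helper names are hypothetical. The comparison is just total cost per training run: hourly rate multiplied by hours.

```python
def cost_per_run(hourly_rate, hours):
    """Total cost of one training run on a given instance type."""
    return hourly_rate * hours

def gpu_is_cheaper(cpu_rate, cpu_hours, gpu_rate, gpu_hours):
    """True when the GPU run costs less in total than the CPU run."""
    return cost_per_run(gpu_rate, gpu_hours) < cost_per_run(cpu_rate, cpu_hours)

# Hypothetical prices and timings, for illustration only.
cpu_rate, gpu_rate = 1.00, 3.00   # GPU instance costs 3x per hour
cpu_hours = 12.0

# The GPU must be faster by more than the price ratio to win on cost.
break_even_speedup = gpu_rate / cpu_rate                             # 3.0

slow_gpu = gpu_is_cheaper(cpu_rate, cpu_hours, gpu_rate, 6.0)  # 2x speedup: 18 vs 12
fast_gpu = gpu_is_cheaper(cpu_rate, cpu_hours, gpu_rate, 3.0)  # 4x speedup: 9 vs 12
```

With these assumed numbers, a 2x speedup leaves the GPU run more expensive, while a 4x speedup makes it cheaper. This ignores the value of getting results sooner, which may justify a GPU even below break-even.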
One great example of ASICs is the class of processors developed for Bitcoin mining, which involves solving mathematically intensive “hash” computations to find blocks (in a blockchain). One of the more notable Bitcoin ASICs is BitFury’s, whose 16nm multi-core ASIC can achieve over 80 GH/s, or over 80 billion hashes per second.
In a recent paper, Deloitte Global predicts that by the end of 2018, over 25% of all chips used to accelerate machine learning will be ASICs (and FPGAs). When this happens, machine-learning-enabled applications are likely to cause big changes in industry while expanding to new areas.
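A toy version of the hash search gives a feel for why this workload suits a single-purpose chip. The sketch below is a simplification, not Bitcoin's actual block format: the header bytes, the `mine` function, and the `zero_bits` difficulty parameter are all hypothetical. It does use double SHA-256, the hash Bitcoin mining is built on, and tries nonces until the digest falls below a target.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as Bitcoin's proof-of-work does."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, zero_bits: int, max_nonce: int = 1_000_000):
    """Search for a nonce whose digest has zero_bits leading zero bits.

    Simplified sketch: real headers have a fixed layout and the
    network encodes the target differently.
    """
    target = 1 << (256 - zero_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no nonce found within max_nonce attempts

# Hypothetical header bytes and a deliberately easy difficulty.
result = mine(b"hypothetical block header", zero_bits=12)
```

Every attempt is the same fixed computation on slightly different input, with no branching logic to speak of. That is the kind of function an ASIC can hard-wire and repeat billions of times per second.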
Li, Yang, Feng, Zhou, Chakradhar, “Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs”, North Carolina State University, Department of Electrical and Computer Eng, October 12, 2016.
Schlegel, Daniel. “Deep Machine Learning on GPUs”, University of Heidelberg, January 28, 2015.
Tang, Shang, “A List of Chip/IP for Deep Learning”, Medium, August 11, 2017.
TensorRT Team at Nvidia, “TensorRT 4: What’s New”, Nvidia Developer, June 19, 2018.
Freund, Karl, “Will ASIC Chips Become the Next Big Thing in AI?”, Forbes, August 4, 2017.
“The Next Generation of Machine Learning Chips”, Deloitte Global, December, 2017.