Applications and infrastructure evolve in lock-step. That point has been amply made, and in the current era of AI, infrastructure is both enabling AI applications to make sense of the world and evolving to better serve their needs.
As things usually go, the new infrastructure stack to power AI applications has been envisioned and given a name -- Infrastructure 3.0 -- before it is fully fledged. We set off to explore both the obvious, here and now, and the less obvious, visionary parts of this stack.
In order to keep things manageable, we will limit ourselves to "specialized hardware with many computing cores and high bandwidth memory" and call it AI chips for short. We take a look at how these AI chips can benefit data-centric tasks, both in terms of operational databases and analytics as well as machine learning (ML).
Let us commence on the first part of this journey with the low-hanging fruit: GPUs and FPGAs.
GPUs
Graphics Processing Units (GPUs) have been around for a while. Initially designed for fast rendering, mainly for the gaming industry, the architecture of GPUs has proven a good match for machine learning.
Essentially, GPUs leverage parallelism. CPUs can do this as well, but unlike general-purpose CPUs, the specialized nature of GPUs has enabled them to keep evolving at a pace that keeps up with Moore's Law. Nvidia, the dominant player in the GPU scene, recently announced a new set of GPUs based on an architecture called Turing.
Lest we forget, the new Nvidia GPUs actually bring improvements for graphics rendering. But, more importantly for our purposes, they pack Tensor Cores, the company's specialized architecture for machine learning, and introduce NGX. NGX is a technology which, as Nvidia puts it, brings AI into the graphics pipelines: "NGX technology brings capabilities such as taking a standard camera feed and creating super slow motion like you'd get from a $100,000+ specialized camera."
That may not be all that exciting if you are interested in general-purpose ML, but the capabilities of the new Nvidia cards sure are. Their prices, however, definitely reflect their high-end nature, ranging from US$2.5K to $10K.
But it takes more than a hardware architecture to leverage GPUs -- it also takes software. And this is where things have gone right for Nvidia, and wrong for the competition, such as AMD. The reason Nvidia is so far ahead in the use of GPUs for machine learning applications lies in the libraries (CUDA and cuDNN) needed to use GPUs.
Although there is an alternative software layer that can work with AMD GPUs, called OpenCL, its maturity and support are not on par with Nvidia's libraries at this point. AMD is trying to catch up, and it also competes on the hardware front, but there is a bigger point to be made here.
Benefiting from AI chips requires investment that goes beyond the hardware. A software layer that sits on top of these chips and optimizes the code running on them is required; without it, they are practically unusable. And learning how to make use of this layer is an investment in itself.
We already mentioned how GPUs are currently the AI chip of choice for ML workloads. Most popular ML libraries support GPUs -- Caffe, CNTK, DeepLearning4j, H2O, MXNet, PyTorch, SciKit, and TensorFlow, to name just a few. In addition to learning the specifics of each library, building them for GPU environments is often needed too.
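To give a flavor of what this looks like in practice, here is a minimal sketch using PyTorch, one of the libraries mentioned above. It assumes a CUDA-enabled build of the library and a matching Nvidia driver; the tensor sizes are arbitrary.

import torch

# Fall back to the CPU if no CUDA-capable GPU is visible.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Running on:", device)

# The same tensor code runs on either device; only the placement changes.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b  # executed on the GPU when device == "cuda"
print(c.shape)

The other libraries expose similar switches; the common theme is that the GPU-specific details stay behind the library's API, provided the GPU-enabled build is in place.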
As for plain-old data operations and analytics -- one word: GPU databases. A new class of database systems has been developed with the goal of utilizing GPU parallelism under the hood, bringing the benefits of off-the-shelf hardware to mainstream application development. Some of the options in this space are BlazingDB, Brytlyt, Kinetica, MapD, PG-Strom, and SQream.
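Since some of these systems, PG-Strom in particular, plug into PostgreSQL, queries are issued exactly as they would be against a vanilla database, and any GPU offloading happens under the hood. Here is a hedged sketch using psycopg2; the connection details and the measurements table are placeholders, and it assumes a PostgreSQL instance with the PG-Strom extension installed.

import psycopg2

# Connect to a PostgreSQL instance; all connection details are placeholders.
conn = psycopg2.connect(host="localhost", dbname="analytics",
                        user="analyst", password="secret")
cur = conn.cursor()

# A plain aggregation query; with PG-Strom in place, the planner may
# offload the scan and aggregation to the GPU transparently.
cur.execute("""
    SELECT sensor_id, avg(value)
    FROM measurements
    GROUP BY sensor_id
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()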
FPGAs
Field Programmable Gate Arrays (FPGAs) are not really new either -- they have been around since the 80s. The main idea behind them is that, as opposed to other chips, they can be reconfigured on demand. You may wonder how this is possible, what makes them specialized, and what they are good for.
FPGAs can be simplistically thought of as boards containing low-level chip fundamentals, such as AND and OR gates. FPGA configuration is typically specified using a hardware description language (HDL). Using this HDL the fundamentals can be configured in a way that matches the requirements of specific tasks or applications, in essence mimicking application-specific integrated circuits (ASICs).
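To make the idea of reconfigurable fundamentals concrete without diving into an HDL, here is a toy Python sketch -- purely illustrative, not how FPGAs are actually programmed -- of a two-input lookup table, the kind of building block FPGAs use in place of fixed gates, configured first as an AND gate and then as an XOR gate.

class LUT2:
    """Toy model of a 2-input lookup table: the truth table is the configuration."""
    def __init__(self, truth_table):
        # truth_table maps (a, b) input pairs to a single output bit.
        self.table = truth_table
    def __call__(self, a, b):
        return self.table[(a, b)]

# "Configure" the same fabric element as an AND gate...
and_gate = LUT2({(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1})
# ...and then as an XOR gate, without any new silicon.
xor_gate = LUT2({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
print(and_gate(1, 1), xor_gate(1, 1))  # prints: 1 0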
Having to reprogram your chips via HDL for every different application sounds complex. So again, the software layer is crucial. According to Jim McGregor, principal analyst with Tirias Research, "the toolset to build FPGAs is still ancient. Nvidia has done well with the CUDA language to leverage GPUs. With FPGA it's still kind of a black art to build an algorithm efficiently."
But that may be changing. Intel showed early interest in FPGAs, acquiring Altera, one of the key FPGA manufacturers. This may be Intel's way of pushing into the increasingly important world of AI chips after having been left behind in the GPU battle. But, complexity aside, can FPGAs compete?
Intel recently published research evaluating emerging deep learning (DL) algorithms on two generations of Intel FPGAs (the Intel Arria 10 and Intel Stratix 10) against the Nvidia Titan X Pascal GPU. The gist of this research was that the Intel Stratix 10 FPGA outperforms the GPU when using pruned or compact data types rather than full 32-bit floating point data (FP32).
What this means in plain English is that Intel's FPGAs could compete with GPUs, as long as low-precision data types are used. That may sound bad, but it is actually an emerging trend in DL: the rationale is to simplify calculations while maintaining comparable accuracy.
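As a concrete illustration of the trend, here is a minimal PyTorch sketch that compares full-precision inference with dynamically quantized 8-bit inference. The model is an arbitrary toy example, and real low-precision deployments usually involve calibration or quantization-aware training, which this glosses over.

import torch
import torch.nn as nn

# A small example model; the architecture is arbitrary.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(32, 256)

with torch.no_grad():
    y_fp32 = model(x)  # full 32-bit floating point (FP32) inference

# Quantize the linear layers' weights to 8-bit integers: less memory and
# bandwidth, ideally with comparable accuracy.
model_int8 = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
with torch.no_grad():
    y_int8 = model_int8(x)

# The two outputs should be close -- which is the whole point of the trend.
print(torch.max(torch.abs(y_fp32 - y_int8)))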
That may well mean there is a bright future in using FPGAs for ML. Today, however, things do not look that bright. Confirming McGregor's statement, there does not seem to be a single ML library that supports FPGAs out of the box. There is work under way to make FPGAs usable with TensorFlow, but precious little else besides that.
Things are different when it comes to data operations and analytics, however. Intel recently presented some of the partners it works with on FPGA-accelerated analytics. Swarm64 looks like the most interesting among them, promising an immediate speedup of up to 12 times for PostgreSQL, MariaDB, and MySQL. Other options are rENIAC, offering what it says is a 13-times-faster version of Cassandra, and Algo-Logic, with its custom key-value store.
Hard choices, in the cloud and on-premises
As usual, there is an array of hard choices to be made with emerging technology, and hardware is no exception. Should you build your own infrastructure, or use the cloud? Should you wait until offerings become more mature, or jump on board now and reap the early adopter benefits? Should you go for GPUs, or FPGAs? And then, which GPU or FPGA vendor?
When we discussed GPU databases with fellow ZDNet contributor and analyst Tony Baer, for example, he opined that none of them has a future on its own. That is because, according to Baer, the economics of GPUs are such that only cloud providers will be able to accumulate them at scale, and GPU database vendors will therefore be eventual acquisition targets for cloud-based databases.
In fact, one such acquisition, that of Blazegraph by AWS, has already transpired. But while that does make sense, it's not the only plausible scenario. If we're talking about acquisitions, it's entirely possible that GPU databases could be acquired by non-cloud database vendors who will want to bring such capabilities to their products.
It is also possible that some GPU database vendors will come into their own. GPU databases may seem less mature compared to incumbents now, but the same could be said for many NoSQL solutions 10 years ago. GPU databases seem like a tempting option for everyday operations and analytics, although the question remains as to whether the cost of replacing existing systems is outweighed by the gains in performance.
Swarm64 and rENIAC, on the other hand, are FPGA offerings that promise to leave your existing infrastructure as untouched as possible, especially in the case of Swarm64. Although their maturity remains an open question, the idea of "simply" adding hardware to your existing database and getting much better performance out of it sounds promising.
As far as the GPU versus FPGA question is concerned, GPUs seem to have a wider and more mature ecosystem, but FPGAs offer superior flexibility. It has also been suggested that FPGAs may offer better performance per watt, and that going forward GPUs may have trouble keeping up with low-precision data types, as they would have to be redesigned extensively to support them.
In terms of which GPU or FPGA vendor to choose, the options are intertwined with the cloud versus on-premises question. GPUs are on offer on AWS, Azure, and Google Cloud, all of which use Nvidia for their GPU-enabled instances. FPGAs, on the other hand, are on offer on AWS (EC2 F1, powered by Xilinx) and Azure (Project Brainwave, powered by Intel), but not on Google Cloud.
AWS does not seem to provide ML-specific facilities for F1. Microsoft lets users deploy trained ML models, but there is not much information on how to train such models on FPGA-powered instances. Google, for its part, is throwing its weight behind its custom TPU chips.
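For completeness, here is a hedged sketch of spinning up a GPU-backed instance on AWS with boto3. The p3.2xlarge instance type is Nvidia-backed (an FPGA-backed f1.2xlarge could be requested the same way); the AMI id and key pair name are placeholders you would replace with your own.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a single GPU-backed instance; in practice you would pick a
# Deep Learning AMI available in your region.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI id
    InstanceType="p3.2xlarge",        # Nvidia GPU instance; "f1.2xlarge" for FPGA
    KeyName="my-keypair",             # placeholder key pair name
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])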
For the million-dollar question -- should you go cloud or build your own infrastructure -- the answer may not be that different from what applies in general: it depends.
If you will use your infrastructure heavily enough, it may make sense to invest in buying and installing your own hardware, but for occasional use the cloud seems like a better fit. In other cases, a mix-and-match approach may work best.
And a special note: if you have a Hadoop cluster, it may make sense to add GPU or FPGA capabilities to it, as Hadoop has just been upgraded to support both options.
Of course, we have not covered all options -- these are neither the only clouds, nor the only AI chips in town. This is a nascent area with many emerging players, and we will be revisiting it soon.