March 20, 2017
Although Google has developed its own custom machine learning chips, the company is well known as a user of GPUs internally, particularly for its deep learning efforts, in addition to offering GPUs in its cloud.
At last year’s Nvidia GPU Technology Conference, Jeff Dean, Senior Google Fellow, offered a vivid description of how the search giant has deployed GPUs for a large number of workloads, many centered around speech recognition and language-oriented research projects as well as various computer vision efforts. What was clear from Dean’s talk, and from watching other deep learning shops with large GPU clusters, is that the real innovation lies in tuning deep learning algorithms and frameworks to hum on GPU clusters.
The Google Brain team, which focuses on many of the areas cited above, is working on software tuning to keep pushing the limits of GPU-backed machine learning. Most recently, a group there has put together a detailed analysis of architectural hyperparameters for neural machine translation (NMT) systems, an effort that required more than 250,000 GPU hours on their in-house cluster. That cluster is based on Nvidia Tesla K40m and Tesla K80 GPUs, with training distributed over 8 parallel workers and 6 parameter servers. The new work should help push current hardware and software approaches to the NMT problem beyond their present limits.
“One shortcoming of current NMT architectures is the amount of compute required to train them. Training on real-world datasets of several million examples typically requires dozens of GPUs and convergence time is on the order of days to weeks. While sweeping across large hyper-parameter spaces is common in computer vision, such exploration would be prohibitively expensive for NMT models, limiting researchers to well-established architectures and hyper-parameter choices.”
Even though GPUs are best suited to the NMT task for the Google Brain team, making the models more comprehensive and richer requires co-design and optimization, something the researchers focused on during the architectural analysis. This analysis, while focused on architecture, goes far beyond hardware. The architectural conditions that provided the best performance have been packaged up and provided as an open source software framework based on TensorFlow to allow reproducible sequence-to-sequence model deployments.
In the new work, the team showed how small changes in the values of hyper-parameters and different initializations can affect results. “We purposely built this framework to enable reproducible state of the art implementations of Neural Machine Translation architectures. As part of our contribution we are releasing the framework and all configuration files needed to reproduce our results,” the Google Brain researchers explain. The code, found on GitHub, is “a modular software framework that allows researchers to explore novel architectures with minimal code changes, and define experimental parameters in a reproducible manner.” They note that while they have achieved good results with this approach, the same framework can extend to other problems, including summarization, conversational modeling and image-to-text.
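To make the idea of defining experimental parameters reproducibly more concrete, here is a minimal Python sketch of a sweep that pairs every hyper-parameter combination with several random seeds, so that sensitivity to initialization can be checked alongside sensitivity to the settings themselves. It does not use the team's framework or its configuration format. The search space, the seed list, and the `build_experiments` helper are all hypothetical names and values chosen purely for illustration.

```python
import itertools
import json

# Hypothetical search space: these names and values are illustrative only,
# not the settings used by the Google Brain team.
SEARCH_SPACE = {
    "embedding_dim": [128, 512],
    "encoder_depth": [2, 4],
    "attention": ["additive", "multiplicative"],
}

# Repeat each configuration with several seeds (different initializations).
SEEDS = [0, 1, 2]


def build_experiments(space, seeds):
    """Return one config dict per (hyper-parameter combination, seed)."""
    names = sorted(space)  # fixed key order keeps the sweep deterministic
    experiments = []
    for values in itertools.product(*(space[name] for name in names)):
        for seed in seeds:
            config = dict(zip(names, values))
            config["seed"] = seed
            experiments.append(config)
    return experiments


experiments = build_experiments(SEARCH_SPACE, SEEDS)
print(len(experiments), "runs")
print(json.dumps(experiments[0], sort_keys=True))
```

Each dictionary could be written to its own file and kept under version control, so that a given run can be repeated later from exactly the same settings. Even this toy grid yields 24 runs, which hints at why a full sweep over real NMT models is so expensive.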
On the WMT’14 English-to-French and English-to-German benchmarks, Google’s Neural Machine Translation achieves results competitive with the state of the art. In a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system.
Google’s internal neural machine translation work was made public at the end of 2016 and is the driving neural network force behind Google Translate. Using millions of training examples, the translation service is able to pick up on nuances that go far beyond simply providing word-by-word literal translations, capturing the semantics of full sentences. This is similar to the services Chinese search giant Baidu provides and, as at Google, GPUs are at the heart of much of the research work there, as we have described in depth in the past.