Google Says Its Machine Learning Chip Leaves CPUs, GPUs in the Dust

Without its custom-designed chip for machine learning, Google might have had to build many more data centers to support the intensive computing demands of services such as Image Search, Google Photos and more.

Yesterday Google revealed new performance details about its Tensor Processing Unit (TPU) during a National Academy of Engineering presentation at the Computer History Museum in Mountain View, Calif. The company said tests show the TPU can outperform CPUs and GPUs in data centers, enabling faster and more energy-efficient computing for demanding workloads.

As it applied machine learning capabilities to more of its products and applications over the past several years, Google said it realized it needed to supercharge its hardware as well as its software. It launched a "stealthy" project to develop a custom accelerator, and last May revealed that it had been successfully running Tensor Processing Units in its data centers for more than a year.

Faster Chip or Twice as Many Data Centers

"The need for TPUs really emerged about six years ago, when we started using computationally expensive deep learning models in more and more places throughout our products," Norm Jouppi, a distinguished hardware engineer at Google, wrote yesterday on Google's Cloud Platform Blog. "The computational expense of using these models had us worried."

According to Jouppi, the company calculated that Google users employing voice search for just three minutes a day would have overwhelmed existing data center processors running deep neural nets for speech recognition. Keeping up would have required Google to double its current number of data centers, he said.

Google currently operates 15 major data centers around the globe, with eight located in the U.S. and the rest scattered across Europe, Asia and South America.

The company launched its high-priority TPU development project with the goal of boosting computing performance in its data centers to 10 times that of existing GPUs (graphics processing units), according to a research study released by Google yesterday. Tests show the TPU is, on average, 15 to 30 times faster than a contemporary, server-class Intel Haswell CPU (central processing unit) or an Nvidia K80 GPU, according to the study.

Enabling Responses 'in Fractions of a Second'

Google's TPU is an application-specific integrated circuit (ASIC) tailored for TensorFlow, the company's open source software library for machine learning. When it announced an updated version of TensorFlow in mid-February, Google said the library is now being used in projects ranging from language translation to cancer detection.

In his blog post yesterday, Jouppi said Google's first-generation TPUs "allow us to make predictions very quickly, and enable products that respond in fractions of a second."

More than 70 authors contributed to the research report detailing the TPU's performance in Google's data centers. The report is scheduled to be presented this June at the International Symposium on Computer Architecture in Toronto. Jouppi noted that Google plans to provide more updates about its TPU in the coming weeks and months.

"TPUs are behind every search query; they power accurate vision models that underlie products like Google Image Search, Google Photos and the Google Cloud Vision API; they underpin the groundbreaking quality improvements that Google Translate rolled out last year; and they were instrumental in Google DeepMind's victory over Lee Sedol, the first instance of a computer defeating a world champion in the ancient game of Go," he said.