Machine Learning on Tensor Processing Unit

Google announced last year that they were going to build two hardware products designed around the Edge TPU (Tensor Processing Unit). An Edge TPU is Google’s purpose-built ASIC designed to run AI at the edge. It delivers high performance in a small physical and power footprint, enabling the deployment of high-accuracy AI at the edge. The Edge TPU itself is tiny: several of them can fit on a penny.

What’s different about the TPU?

During the TensorFlow Dev Summit 2019, we had the opportunity to get access to a TPU. The Coral USB Accelerator is an edge device developed by Google; it currently works only with Debian 6.0+ (or a derivative such as Ubuntu or Raspbian) and requires USB 3.0.

Unlike a CPU or GPU, which can run full-precision deep learning models, the Edge TPU only supports TensorFlow Lite models that are 8-bit quantized and then compiled for the Edge TPU. The following image illustrates the process of converting a normal TensorFlow model into an Edge TPU compiled model.

TensorFlow model
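To make the pipeline concrete, here is a minimal sketch of the conversion step using the TF 1.x converter API. The file name, tensor names, and input stats below are placeholders; substitute whatever your quantization-aware training produced.

    import tensorflow as tf

    # Convert a frozen graph trained with quantization-aware training
    # (fake-quant nodes) into an 8-bit TensorFlow Lite model.
    converter = tf.lite.TFLiteConverter.from_frozen_graph(
        "frozen_graph.pb",          # placeholder path
        input_arrays=["input"],     # placeholder tensor names
        output_arrays=["output"],
    )
    converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
    # (mean, std_dev) recorded during quantization-aware training:
    converter.quantized_input_stats = {"input": (128.0, 128.0)}

    with open("model_quant.tflite", "wb") as f:
        f.write(converter.convert())

    # The quantized model is then compiled for the Edge TPU on the host:
    #   edgetpu_compiler model_quant.tflite
    # which produces model_quant_edgetpu.tflite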
Coral provides several quantized models, along with label files, that can be used for image classification as well as object detection. The models can be downloaded from the following URL: https://coral.withgoogle.com/models/

There are some requirements for a model to run on the Edge TPU (a quick way to inspect a converted model is shown after the list):

  • Tensor parameters are quantized (8-bit fixed-point numbers). You must use quantization-aware training (post-training quantization is not supported).
  • Tensor sizes are constant at compile-time (no dynamic sizes).
  • Model parameters (such as bias tensors) are constant at compile-time.
  • Tensors are either 1-, 2-, or 3-dimensional. If a tensor has more than 3 dimensions, then only the 3 innermost dimensions may have a size greater than 1.
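To sanity-check the first two requirements on a converted model, you can inspect it with the standard TFLite Interpreter; the model path below is a placeholder.

    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")
    interpreter.allocate_tensors()

    # dtype should be 8-bit (uint8) and every shape fully defined.
    for detail in interpreter.get_input_details() + interpreter.get_output_details():
        print(detail["name"], detail["dtype"], detail["shape"], detail["quantization"])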

Setting up the device

For demonstration purposes, we used an Ubuntu 18.04 based device. The Coral USB Accelerator is quite easy to set up: run the following commands in a terminal.
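(These follow Coral's early-2019 getting-started guide; the download URL and installer may have changed since then.)

    # Download and run the Edge TPU runtime/API installer
    wget https://dl.google.com/coral/edgetpu_api/edgetpu_api_latest.tar.gz -O edgetpu_api.tar.gz --trust-server-names
    tar xzf edgetpu_api.tar.gz
    cd edgetpu_api
    bash ./install.sh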


During installation, the installer asks, “Would you like to enable the maximum operating frequency?” Enabling this option improves inferencing speed, but it also causes the USB Accelerator to become very hot to the touch during operation and could cause burn injuries. For safety, we set this option to No.
Coral also provides a few demo scripts for image classification and object detection, covering both still images and camera input. After installation, these scripts can be found at the following paths:
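(On a typical install they land inside the edgetpu Python package, e.g. under /usr/local/lib/python3.X/dist-packages/edgetpu/demo/, where X depends on your Python version.)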


The simplest way to test image classification is to download a quantized model (along with its labels file) and an image to classify or run object detection on. We downloaded the quantized bird-classification model and a sample macaw image.
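In code, the demo boils down to something like this; it assumes the 2019-era edgetpu Python API (ClassificationEngine), and the model and image paths are placeholders:

    from edgetpu.classification.engine import ClassificationEngine
    from PIL import Image

    # Load the Edge TPU compiled model and run one image through it.
    engine = ClassificationEngine("bird_model_edgetpu.tflite")  # placeholder path
    image = Image.open("macaw.jpg")                             # placeholder path
    for label_id, score in engine.ClassifyWithImage(image, top_k=3):
        print(label_id, score)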

The inference happens in 0.02 seconds with a confidence score of 0.77.

Comparing with the GPU

We then ran the same model without quantization on a GPU and found that inference took 0.2 seconds, about 10 times slower than the TPU. However, there was a significant increase in accuracy (from 0.77 to 0.96).

Thus, the Edge TPU is almost 10 times faster than the GPU but compromises on accuracy. The difference comes from the quantization of the models used in the TFLite version. Quantizing a model means converting all of the 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers, which makes the model smaller and faster. A visual representation of quantization is shown below:

Quantization
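Concretely, TFLite's quantization is an affine mapping, real ≈ scale × (quantized − zero_point). Here is a minimal sketch with made-up scale and zero-point values:

    import numpy as np

    def quantize(x, scale, zero_point):
        """Map float32 values onto uint8 fixed-point values."""
        q = np.round(x / scale) + zero_point
        return np.clip(q, 0, 255).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        """Recover approximate float32 values from uint8."""
        return scale * (q.astype(np.float32) - zero_point)

    weights = np.array([-1.0, -0.5, 0.0, 0.75, 1.0], dtype=np.float32)
    scale, zero_point = 2.0 / 255, 128       # covers roughly [-1, 1]
    q = quantize(weights, scale, zero_point)
    print(q)                                 # [  0  64 128 224 255]
    print(dequantize(q, scale, zero_point))  # close to weights, small rounding error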

Inference on a live feed on the TPU

Coral also supports live inference using the capture scripts present in the demo folder. Since the provided capture script works only with the Raspberry Pi camera, we tweaked it a bit to make it work with a laptop webcam using the imutils package. By default, the maximum number of objects detected in any frame is set to 3; this can be changed by altering the maxobjects parameter. We also altered the style and color of the object detection text.
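A hedged sketch of that adaptation, assuming the 2019-era edgetpu DetectionEngine API (the model path is a placeholder, and top_k stands in for the max-objects setting):

    from edgetpu.detection.engine import DetectionEngine
    from imutils.video import VideoStream
    from PIL import Image
    import cv2
    import time

    engine = DetectionEngine("detect_model_edgetpu.tflite")  # placeholder path
    vs = VideoStream(src=0).start()  # src=0: the laptop's built-in webcam
    time.sleep(2.0)                  # give the camera time to warm up

    while True:
        frame = vs.read()
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # top_k caps the number of detected objects per frame (3 by default)
        results = engine.DetectWithImage(Image.fromarray(rgb), threshold=0.5, top_k=3)
        h, w = frame.shape[:2]
        for obj in results:
            x0, y0, x1, y1 = obj.bounding_box.flatten().tolist()  # relative coords
            cv2.rectangle(frame, (int(x0 * w), int(y0 * h)),
                          (int(x1 * w), int(y1 * h)), (0, 255, 0), 2)
        cv2.imshow("Edge TPU detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    vs.stop()
    cv2.destroyAllWindows()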

Inference on live feed


TPU object detection

Summary

During the project, we were able to:

  • Understand TFLite models and the quantization process.
  • Run TPU-based models on an edge device and compare their performance with a GPU.
  • Understand on-device training of models, which is useful for an edge appliance since inference can be done on premises.
  • Hack the original scripts provided by Coral to work with a camera feed other than the Raspberry Pi's.

Want More Information?

If you are looking for more TensorFlow material, check out these additional resources:

If you still have questions drop us a line at info@SpringML.com or tweet us @springmlinc.