How to deploy/serve custom models on AI Platform (Unified)

A question arose while solutioning for an ML use case: “What is the best way to host and serve custom models on GCP?” Among the options was AI Platform (Unified), which was released for preview on November 16, 2020. I set out to test the platform with the requirements of this use case in mind: a platform to train, deploy, and serve mobile high-accuracy models on the cloud. AI Platform seemed to fit the bill perfectly.

What exactly is Google AI Platform (Unified)?

According to Google’s official documentation:

AI Platform (Unified) brings AutoML and AI Platform (Classic) together into a unified API, client library, and user interface. AutoML allows you to train models on image, video, and tabular datasets without writing code, while training in AI Platform (Classic) lets you run custom training code. With AI Platform (Unified), both AutoML training and custom training are available options. Whichever option you choose for training, you can save models, deploy models and request predictions with AI Platform (Unified).

AI Platform* can be used to manage the following stages in the ML workflow:

  1. Define and upload a dataset.
  2. Train an ML model:
    1. Train model
    2. Evaluate model accuracy
    3. Tune hyperparameters (custom training only)
  3. Upload and store your model in the AI Platform.
  4. Deploy your trained model and get an endpoint for serving predictions.
  5. Send prediction requests to your endpoint.
  6. Specify a prediction traffic split in your endpoint.
  7. Manage your models and endpoints.

This blog focuses on stages 3, 4, and 5 above. If you are interested in additional information, refer to the official documentation.

Model Deployment

Models can be deployed on AI Platform and served through an endpoint. Models that were not trained on AI Platform can also be deployed for prediction.
For the purposes of this demo, a model was trained with AutoML using the “Mobile Best Trade-Off” model type, and the resulting saved_model.pb was exported to Google Cloud Storage.

Deploying a model to the AI Platform

  1. Create a new Project
  2. From the Navigation Menu, select AI Platform (Unified) under Artificial Intelligence.
  3. Enable the AI Platform API.
  4. Select Models from the Navigation Menu.
  5. Click IMPORT to import a new model, and enter the model name.
  6. Select the model settings. For this demo, the TensorFlow 1.15 version was selected. Browse to the GCS path where the saved_model.pb file is stored. The predict schema can be left blank.
  7. Deploy the model to an endpoint. The model must be deployed to an endpoint in order to serve online predictions; batch predictions can be set up without an endpoint. This demo focuses on setting up online predictions. Select “Deploy to Endpoint”.
  8. Name the endpoint and select its settings:
    1. More than one model can be deployed to an endpoint, and the Traffic split field can be used to split prediction request traffic between the models. Each model can also be deployed to multiple endpoints. For this demo, we deploy one model to one endpoint, leaving the Traffic split at its default.
    2. Minimum compute nodes can be set to 1 (default) or more. These compute nodes run even when there is no traffic; the Deploy and undeploy models section below highlights a way to programmatically deploy models when there is demand and undeploy them when there is none.
    3. Maximum compute nodes is an optional field which enables autoscaling.
  9. Select “Deploy”. (For teams that prefer scripting over the console, a programmatic sketch of the model import step follows this list.)
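The import (upload) step can also be scripted with the AI Platform (Unified) Python client. Below is a minimal sketch rather than an official recipe: the display name, GCS path, and serving container image URI are placeholders and assumptions to replace with your own values (in particular, confirm the prebuilt TensorFlow 1.15 serving image URI against the documentation).

import base64
from google.cloud import aiplatform

# Regional API endpoint and parent resource (us-central1, as used in this demo).
client = aiplatform.gapic.ModelServiceClient(
    client_options=dict(api_endpoint="us-central1-aiplatform.googleapis.com")
)
model = {
    "display_name": "flowers-demo-model",         # hypothetical display name
    "artifact_uri": "gs://<bucket>/<model_dir>",  # GCS folder containing saved_model.pb
    # Assumed prebuilt TF 1.15 CPU serving container; verify the URI in the docs.
    "container_spec": {
        "image_uri": "gcr.io/cloud-aiplatform/prediction/tf-cpu.1-15:latest"
    },
}
# upload_model returns a long-running operation; result() blocks until the
# model resource exists and returns its resource name.
response = client.upload_model(
    parent="projects/<project_id>/locations/us-central1", model=model
).result()
print("model:", response.model)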

Sending an online prediction request

Deploying the model to an endpoint enables it to serve online predictions. A sample request is provided on the console for reference.

Request an online prediction by sending input data instances as a JSON string in a predict request. Below are gcloud and Python samples for formatting and sending prediction requests.

gcloud command

Convert the image file into JSON. For online prediction, send a JSON request of the form:

{"instances": [
  {"key": "first_key", "image_bytes": {"b64": …}},
  {"key": "second_key", "image_bytes": {"b64": …}}
]}

1. To format the request as above, convert the image using base64 and save the output as JSON:

import base64, json

with open("sample_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")
request = {"instances": [{"key": "first_key", "image_bytes": {"b64": encoded}}]}
with open("sample_image.json", "w") as f:
    json.dump(request, f)  # request file for the gcloud command below

2. Use the gcloud command to send the request to the endpoint:

gcloud beta ai endpoints predict <endpointID> --region=us-central1 --json-request=sample_image.json

The endpoint ID can be found in the sample request on the console.

Python

The code below showcases the prediction request in Python. The input image is converted into a payload and passed to the request.

import base64
import os

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value


def predict_custom_model_sample(endpoint: str, instance: dict, parameters_dict: dict):
    client_options = dict(api_endpoint="us-central1-prediction-aiplatform.googleapis.com")
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # The format of the parameters must be consistent with what the model expects.
    parameters = json_format.ParseDict(parameters_dict, Value())
    # The format of the instances must be consistent with what the model expects.
    instances_list = [instance]
    instances = [json_format.ParseDict(s, Value()) for s in instances_list]
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    predictions = response.predictions
    print("predictions")
    for prediction in predictions:
        print(" prediction:", dict(prediction))


os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credential.json"

with open("sample_image.jpg", "rb") as f:
    file_content = f.read()

payload = {"key": "first_key", "image_bytes": {"b64": base64.b64encode(file_content).decode("utf-8")}}
params = {}

predict_custom_model_sample(
    "projects/<project_id>/locations/us-central1/endpoints/<endpoint_id>",
    payload,
    params,
)

Response from online prediction

The response from the above request is depicted below. The prediction returns the labels and a confidence score for each label from the model. Unlike AutoML online predictions, the response contains confidence scores for all of the labels, which helps in defining more strategic pipeline logic. (A short post-processing sketch follows the sample response.)

response
 deployed_model_id: 3811910056476147712
predictions
 prediction: {'key': 'first_key',
  'labels': [string_value: "sunflowers", string_value: "tulips", string_value: "dandelion", string_value: "roses", string_value: "daisy"],
  'scores': [number_value: 0.0357748382, number_value: 0.0445411801, number_value: 0.0346666723, number_value: 0.0520200431, number_value: 0.973080397]}
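Because scores are returned for every label, downstream logic can rank the labels and apply its own confidence threshold instead of trusting only the top label. Below is a minimal post-processing sketch, assuming the labels and scores have been extracted from the prediction into plain Python lists (the threshold value is a hypothetical choice):

# Values taken from the sample response above.
labels = ["sunflowers", "tulips", "dandelion", "roses", "daisy"]
scores = [0.0357748382, 0.0445411801, 0.0346666723, 0.0520200431, 0.973080397]

THRESHOLD = 0.5  # hypothetical confidence cut-off; tune per use case

# Rank the labels by confidence, highest first.
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
best_label, best_score = ranked[0]
if best_score >= THRESHOLD:
    print("accepted:", best_label, round(best_score, 3))  # -> accepted: daisy 0.973
else:
    print("no label met the threshold; best candidate:", best_label)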

Deploy and undeploy models

Deploying models to endpoints consumes compute resources even when there is no traffic demand. To avoid the associated costs, models can be undeployed and redeployed as needed.

Deploy model

An endpoint can be created on the AI Platform console. Deploying the model to the endpoint can be accomplished by running the following gcloud command:

gcloud beta ai endpoints deploy-model <endpoint_id> \
--region=us-central1 \
--model=<model_id> \
--display-name=<display_name> \
--machine-type=<machine_type> \
--min-replica-count=1 \
--traffic-split=0=100

The endpoint ID and model ID can be found on the console. The same deployment can also be scripted, as sketched below.
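The deploy step can be performed in Python with the EndpointServiceClient, which is convenient when deployment is triggered on demand. This is a minimal sketch mirroring the gcloud flags above; the machine type is an assumption, and in traffic_split the key "0" refers to the model being deployed in this request.

from google.cloud import aiplatform

client = aiplatform.gapic.EndpointServiceClient(
    client_options=dict(api_endpoint="us-central1-aiplatform.googleapis.com")
)
endpoint = "projects/<project_id>/locations/us-central1/endpoints/<endpoint_id>"
deployed_model = {
    "model": "projects/<project_id>/locations/us-central1/models/<model_id>",
    "display_name": "<display_name>",
    "dedicated_resources": {
        "machine_spec": {"machine_type": "n1-standard-4"},  # assumed machine type
        "min_replica_count": 1,
    },
}
# deploy_model returns a long-running operation; result() blocks until done.
operation = client.deploy_model(
    endpoint=endpoint, deployed_model=deployed_model, traffic_split={"0": 100}
)
print("deployed model id:", operation.result().deployed_model.id)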

Undeploy model

gcloud beta ai endpoints describe <endpoint_id> \
--region=us-central1

Response from the above command:

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
createTime: '2020-12-09T17:01:58.195128Z'
deployedModels:
- createTime: '2020-12-09T17:01:58.195128Z'
  dedicatedResources:
    machineSpec:
      machineType: n1-standard-4
    maxReplicaCount: 1
    minReplicaCount: 1
  displayName: <model_display_name>
  id: <deployed_model_id>
  model: projects/<project_id>/locations/us-central1/models/<model_id>
displayName: <endpoint_display_name>
etag: AMEw9yNxgpwB03iUKZtjORl5fTuzmrEMHLJeqbTHh-flyKNWcsmoreNQdND9T8HlCwl6
name: projects/<project_id>/locations/us-central1/endpoints/<endpoint_id>
trafficSplit:
  <deployed_model_id>: 100
updateTime: '2020-12-09T17:10:07.852971Z'

Use the <deployed_model_id> from the above response to undeploy the model:

gcloud beta ai endpoints undeploy-model <endpoint_id> \
--region=us-central1 \
--deployed-model-id=<deployed_model_id>

Undeploying the model from the endpoint retains both the model and the endpoint for future deployments. The deployed model ID is not shown on the console; it can be retrieved by running the describe command on the endpoint, or programmatically as sketched below.
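The describe-and-undeploy flow can likewise be scripted, which makes it easy to schedule (for example, from a Cloud Function or cron job) so the endpoint only consumes compute nodes while traffic is expected. A minimal sketch, assuming a single model is deployed on the endpoint:

from google.cloud import aiplatform

client = aiplatform.gapic.EndpointServiceClient(
    client_options=dict(api_endpoint="us-central1-aiplatform.googleapis.com")
)
endpoint = "projects/<project_id>/locations/us-central1/endpoints/<endpoint_id>"

# Equivalent of `gcloud beta ai endpoints describe`: read the deployed model IDs.
for deployed in client.get_endpoint(name=endpoint).deployed_models:
    # Undeploy each model; result() blocks until the operation completes.
    client.undeploy_model(endpoint=endpoint, deployed_model_id=deployed.id).result()
    print("undeployed:", deployed.id)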

Summary

Our final takeaway is that AI Platform provides a single pane of glass for defining datasets, training models, importing custom models, deploying models to endpoints, and running online and batch predictions. Having all of these AI tools in one platform makes MLOps a breeze.

*AI Platform in this document refers to the Unified version unless otherwise noted.