Deploying models is always a headache. Developing even a simple model is hard enough, and deploying it is harder still. Fortunately, plenty of developer-friendly deployment tools already exist. There is no single right answer among them, but TorchServe is a highly useful option, so let's look at it and actually deploy a model with it. Before we get into the deployment process itself, here is an overview of TorchServe.
What is TorchServe?
TorchServe was developed by Facebook and AWS specifically for model serving. Since SageMaker itself relies on TorchServe to serve PyTorch models, using it directly gives you more control over model delivery without giving up reliability. TorchServe was created to make it easier to serve models developed with PyTorch, and it has a Python version requirement (Python >= 3.8). It works in both Docker and Anaconda environments. In addition to SageMaker, TorchServe is used by Kubeflow, MLflow, KServe, and Vertex AI.
You can learn more on the TorchServe GitHub.
APIs provided by TorchServe
TorchServe is a handy tool for deployment, so naturally it provides a number of APIs. Once you package a model in the format TorchServe requires and start a server from it, all communication between the model and the user goes through the APIs that TorchServe defines. These cover not only inference but also management and load-balancing functions, so learning to use them fluently greatly reduces the pain of deployment. You can see examples of actual API usage by following the links in the lists below.
1. Model Management API
TorchServe provides a full-featured API for managing models at runtime. This means that once TorchServe is running, you can change basic deployment-related settings, not just model parameters, through API calls in real time.
- Register a model - registers a new model
- Increase/decrease number of workers for specific model - controls the number of workers used for a specific model
- Describe Model Status - describes the running status of a model, focusing on the parameters that can be controlled in TorchServe
- Unregister a model - unregisters a model
- List registered models - lists the registered models
- Set default version of a model - sets which version of a model is served by default
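To make the list above more concrete, here is a minimal sketch of how each management call could be expressed as an HTTP request. It only builds the requests and does not send them, so it runs without a server. The address `http://localhost:8081` is assumed to be the default management address, the helper names are hypothetical, and `my_model.mar` is a placeholder archive name; check the endpoint paths and query parameters against the TorchServe management API documentation for your version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: TorchServe's management API listens here by default.
MANAGEMENT_URL = "http://localhost:8081"


def management_request(method, path, **params):
    """Build (but do not send) a request to the management API."""
    query = f"?{urlencode(params)}" if params else ""
    return Request(f"{MANAGEMENT_URL}{path}{query}", method=method)


def register_model(mar_url, initial_workers=1):
    # Register a new model from a model archive (.mar) location.
    return management_request(
        "POST", "/models", url=mar_url, initial_workers=initial_workers
    )


def scale_workers(model_name, min_worker):
    # Increase or decrease the number of workers for one model.
    return management_request(
        "PUT", f"/models/{quote(model_name)}", min_worker=min_worker
    )


def describe_model(model_name):
    # Describe the running status of a model.
    return management_request("GET", f"/models/{quote(model_name)}")


def unregister_model(model_name):
    # Unregister a model.
    return management_request("DELETE", f"/models/{quote(model_name)}")


def list_models(limit=100):
    # List registered models, up to `limit` entries per page.
    return management_request("GET", "/models", limit=limit)


def set_default_version(model_name, version):
    # Choose which version of a model is served by default.
    path = f"/models/{quote(model_name)}/{quote(str(version))}/set-default"
    return management_request("PUT", path)


if __name__ == "__main__":
    for req in (
        register_model("my_model.mar"),
        scale_workers("my_model", 3),
        describe_model("my_model"),
        set_default_version("my_model", "2.0"),
        list_models(),
        unregister_model("my_model"),
    ):
        print(req.get_method(), req.full_url)
```

Once a TorchServe instance is actually running, each of these requests could be sent with `urllib.request.urlopen(req)`.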
2. Inference API
The Inference API is available on port 8080 and is only accessible from localhost by default. You can change this default through the TorchServe configuration. As the name suggests, the Inference API is a collection of endpoints for inference, and it provides the following APIs.
- API Description - describes the available inference endpoints
- Health check API - returns the health of the model server; acts like a ping
- Predictions API - returns the model's predictions for the input you send
- Explanations API - returns explanations for the model's predictions rather than the predictions themselves (I'd need to see a more specific usage example)
- KServe Inference API - serves predictions through KServe-style endpoints
- KServe Explanations API - serves explanations through KServe-style endpoints
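The inference endpoints listed above can be sketched the same way. Again the code only builds requests, so nothing is sent. The address `http://localhost:8080` follows the default port mentioned earlier; the helper names, the model name `my_model`, and the payload are hypothetical, and the KServe path and `{"instances": [...]}` body follow my understanding of the KServe v1 protocol, so treat them as assumptions to verify against the official documentation.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Default inference address (port 8080, localhost only).
INFERENCE_URL = "http://localhost:8080"


def api_description_request():
    # Ask the server to describe its inference endpoints.
    return Request(f"{INFERENCE_URL}/", method="OPTIONS")


def ping_request():
    # Health check: reports whether the model server is up.
    return Request(f"{INFERENCE_URL}/ping", method="GET")


def prediction_request(model_name, payload, version=None):
    # Send raw input bytes to a model; optionally pin a model version.
    path = f"/predictions/{quote(model_name)}"
    if version is not None:
        path += f"/{quote(str(version))}"
    return Request(f"{INFERENCE_URL}{path}", data=payload, method="POST")


def explanation_request(model_name, payload):
    # Ask for an explanation of the prediction for the same kind of input.
    path = f"/explanations/{quote(model_name)}"
    return Request(f"{INFERENCE_URL}{path}", data=payload, method="POST")


def kserve_predict_request(model_name, instances):
    # Assumption: KServe v1 style path and JSON body.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return Request(
        f"{INFERENCE_URL}/v1/models/{quote(model_name)}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    sample = b"raw input bytes, for example an image file's contents"
    for req in (
        api_description_request(),
        ping_request(),
        prediction_request("my_model", sample),
        explanation_request("my_model", sample),
        kserve_predict_request("my_model", [[1.0, 2.0]]),
    ):
        print(req.get_method(), req.full_url)
```

With a server running, `urllib.request.urlopen(ping_request())` would be a natural first call, since it confirms the server is reachable before any predictions are attempted.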