Deploying models is always a headache. Developing even a simple model is hard enough, and deploying it is harder still. Fortunately, many friendly deployment tools have already been built for AI developers. In this post we look at TorchServe, which is not the only right answer but is still highly useful, and actually deploy a model with it. Before getting into the deployment process itself, let's start with an overview of TorchServe.
TorchServe was developed by Facebook and AWS specifically for serving models. Because Amazon SageMaker serves PyTorch models with TorchServe under the hood, using TorchServe directly gives you more autonomy over model delivery while relying on the same proven foundation. TorchServe was created to make it easier to serve models developed with PyTorch, and it requires Python 3.8 or later. It runs in both Docker and Anaconda environments. Besides SageMaker, TorchServe is used by Kubeflow, MLflow, KServe, and Vertex AI.
You can learn more on the TorchServe GitHub.
TorchServe is a handy deployment tool, so naturally it provides a number of APIs. Once you convert a model into the format TorchServe requires and start a server with it, the user communicates with the model through APIs predefined by TorchServe. These APIs cover not only inference but also management and load balancing, so once you learn them and can use them freely, much of the pain of deployment goes away. You can see examples of actual API usage through the links in the list below.
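To make "predefined APIs" concrete, here is a minimal sketch that maps each TorchServe API to its default address (inference on port 8080, management on 8081, metrics on 8082, all bound to 127.0.0.1 unless overridden in `config.properties`) and builds full endpoint URLs. The `DEFAULT_ADDRESSES` dictionary and the `endpoint` helper are hypothetical illustrations, not part of TorchServe itself.

```python
from urllib.parse import urljoin

# Default TorchServe listening addresses. These are the stock defaults;
# config.properties can override them (inference_address,
# management_address, metrics_address).
DEFAULT_ADDRESSES = {
    "inference": "http://127.0.0.1:8080",
    "management": "http://127.0.0.1:8081",
    "metrics": "http://127.0.0.1:8082",
}


def endpoint(api: str, path: str) -> str:
    """Hypothetical helper: return the full URL for `path` on one of the APIs."""
    base = DEFAULT_ADDRESSES[api]
    # Normalize slashes so both "/ping" and "ping" resolve under the base URL.
    return urljoin(base + "/", path.lstrip("/"))


print(endpoint("inference", "/ping"))      # http://127.0.0.1:8080/ping
print(endpoint("management", "/models"))   # http://127.0.0.1:8081/models
```

Keeping the addresses in one place means that if you later move an API to another host or port in the configuration, only the dictionary needs to change.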
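TorchServe provides a full-featured Management API for managing models at runtime. Once TorchServe is running, you can adjust deployment-related settings beyond the model itself, such as registering or unregistering models and scaling the number of workers, through real-time API calls. By default, the Management API listens on port 8081.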
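The sketch below builds (but does not send) three typical Management API calls with Python's standard library: registering a model archive (`POST /models?url=...`), scaling workers (`PUT /models/{name}?min_worker=...&max_worker=...`), and unregistering a model version (`DELETE /models/{name}/{version}`). It assumes the default management address; the model name `resnet-18`, the archive `squeezenet1_1.mar`, and the helper function names are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Default Management API address (assumption: not overridden in config.properties).
MANAGEMENT = "http://127.0.0.1:8081"


def register_model(mar_url: str, initial_workers: int = 1) -> Request:
    """Hypothetical helper: POST /models registers a model archive (.mar)."""
    query = urlencode({"url": mar_url, "initial_workers": initial_workers})
    return Request(f"{MANAGEMENT}/models?{query}", method="POST")


def scale_workers(model_name: str, min_worker: int, max_worker: int) -> Request:
    """Hypothetical helper: PUT /models/{name} changes the worker count at runtime."""
    query = urlencode({"min_worker": min_worker, "max_worker": max_worker})
    return Request(f"{MANAGEMENT}/models/{model_name}?{query}", method="PUT")


def unregister_model(model_name: str, version: str) -> Request:
    """Hypothetical helper: DELETE /models/{name}/{version} removes one version."""
    return Request(f"{MANAGEMENT}/models/{model_name}/{version}", method="DELETE")


req = scale_workers("resnet-18", 2, 4)
print(req.get_method(), req.full_url)
# PUT http://127.0.0.1:8081/models/resnet-18?min_worker=2&max_worker=4
```

Against a running server, each request could be sent with `urllib.request.urlopen(req)`; building the `Request` objects separately keeps the URL logic easy to inspect before anything touches the network.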
The Inference API is available on port 8080 and, by default, is accessible only from localhost. You can change this default through the TorchServe configuration. As the name suggests, the Inference API is the set of endpoints for running inference, and it provides the following APIs.
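As a minimal sketch, assuming the default address `http://127.0.0.1:8080` (to accept remote clients you would instead set something like `inference_address=http://0.0.0.0:8080` in `config.properties`), the code below builds a health check (`GET /ping`) and a prediction request (`POST /predictions/{model_name}`, optionally followed by `/{version}`). The model name `resnet-18` and the helper functions are hypothetical; the requests are only constructed, not sent.

```python
from typing import Optional
from urllib.request import Request

# Default Inference API address; reachable only from localhost unless
# inference_address is changed in config.properties.
INFERENCE = "http://127.0.0.1:8080"


def health_check() -> Request:
    """Hypothetical helper: GET /ping reports whether the server is healthy."""
    return Request(f"{INFERENCE}/ping", method="GET")


def predict(model_name: str, payload: bytes, version: Optional[str] = None) -> Request:
    """Hypothetical helper: POST raw input bytes to a model's prediction endpoint."""
    path = f"/predictions/{model_name}"
    if version is not None:
        # Target a specific registered version instead of the default one.
        path += f"/{version}"
    return Request(INFERENCE + path, data=payload, method="POST")


req = predict("resnet-18", b"<image bytes>", version="1.0")
print(req.get_method(), req.full_url)
# POST http://127.0.0.1:8080/predictions/resnet-18/1.0
```

The payload is passed through as raw bytes because how it is decoded (image, text, tensor) is decided by the model's handler on the server side.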