Now that we've had an introduction to TorchServe, let's look at how to actually use it with a PyTorch model. We'll walk through the deployment process and the two TorchServe components it relies on: handlers and .mar files.
When it comes to deploying your model via TorchServe, handlers are a must-have. A handler is the little piece of code you define, in a form TorchServe can understand, to serve your carefully crafted model. TorchServe provides basic built-in handlers, but as with most tools that wrap models, a built-in handler is often not a perfect fit for your model, so you'll need to create a custom handler.
If you work with AI models, you can usually guess what model-wrapping code does just from its function names, and handlers are no exception. The BaseHandler class in base_handler.py, provided by TorchServe, is an abstract class. Read together with its comments, it makes clear what each method does, so all we need to do is adapt it to our own needs. TorchServe has even been kind enough to provide a real-world example built on this base handler as a demo. (Link to the actual Transformer Handler code)
Let's take a look at the components of the BaseHandler and the methods that you'll most likely use.
initialize: This method can be overridden to load the model or set configuration. It plays the same role as __init__.
handle: This is the entry point for API calls once the model is deployed through TorchServe. It runs the preprocess, inference, and postprocess steps in turn, so you can freely customize it to serve your model in the form you want.
It's really simple because you only need to override initialize and handle to implement the desired deployment.
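To make the initialize/handle flow concrete, here is a minimal sketch of a custom handler written as a plain class. It is an illustration under assumptions, not TorchServe's own code: the class name SketchHandler, the "values" and "doubled" keys, and the stand-in context object are all hypothetical, and a plain Python function takes the place of a real model so the flow can be run without TorchServe installed.

```python
import json
from types import SimpleNamespace


class SketchHandler:
    """Hypothetical custom handler exposing the two methods TorchServe calls."""

    def __init__(self):
        self.initialized = False
        self.model = None
        self.model_dir = None

    def initialize(self, context):
        # TorchServe passes a context whose system_properties include the
        # directory that the .mar contents were unpacked into.
        self.model_dir = context.system_properties.get("model_dir")
        # A real handler would load weights here, for example with
        # torch.jit.load on a file inside model_dir. As an assumption for
        # this sketch, a plain function stands in for the model.
        self.model = lambda values: [v * 2 for v in values]
        self.initialized = True

    def preprocess(self, requests):
        # Each request in the batch carries its payload under "data" or "body".
        batch = []
        for request in requests:
            payload = request.get("data") or request.get("body")
            if isinstance(payload, (bytes, bytearray, str)):
                payload = json.loads(payload)
            batch.append(payload["values"])
        return batch

    def inference(self, batch):
        return [self.model(values) for values in batch]

    def postprocess(self, outputs):
        # One response entry per request in the batch.
        return [{"doubled": output} for output in outputs]

    def handle(self, data, context):
        # Entry point for every API call: preprocess -> inference -> postprocess.
        return self.postprocess(self.inference(self.preprocess(data)))


# Simulate a single request with a stand-in context object.
fake_context = SimpleNamespace(system_properties={"model_dir": "."})
handler = SketchHandler()
handler.initialize(fake_context)
responses = handler.handle([{"body": b'{"values": [1, 2, 3]}'}], fake_context)
print(responses)  # [{'doubled': [2, 4, 6]}]
```

In a real deployment you would more likely subclass BaseHandler and override only the steps that differ for your model, but the order of calls stays the same.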
TorchServe mar file
We now have a general idea of what TorchServe is and of the handlers we need for deployment, but we're not quite done yet. We still need to convert our model to TorchScript and assemble the necessary files into a .mar file.
What is TorchScript?
Many AI developers use PyTorch for model development, because it provides a lot of convenient features and allows for intuitive development. However, that human-friendly layer comes with a lot of wrapping, which adds overhead at run time. For deployment, you need to convert your PyTorch model into a form that can run in a high-performance environment, such as C++. We won't cover that process in depth here, but you can refer to the official TorchScript documentation or Hugging Face's examples for more information.
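For orientation only, the simplest form of the conversion might look like the sketch below, which traces a toy model with torch.jit.trace, saves it, and loads it back. The TinyClassifier module and the file name are hypothetical stand-ins for your own model, and tracing is just one route: models with data-dependent control flow usually need torch.jit.script instead.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical toy model standing in for your real network."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return torch.softmax(self.linear(x), dim=-1)


model = TinyClassifier().eval()
example_input = torch.rand(1, 4)

# Tracing runs the model once on the example input and records the operations.
traced = torch.jit.trace(model, example_input)

# Save the TorchScript module; this file is what gets packaged later.
output_path = os.path.join(tempfile.mkdtemp(), "tiny_classifier.pt")
traced.save(output_path)

# The saved file can be loaded without the TinyClassifier class definition.
reloaded = torch.jit.load(output_path)
with torch.no_grad():
    outputs_match = torch.allclose(model(example_input), reloaded(example_input))
print(outputs_match)
```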
Creating a mar file
You need a .mar file to run TorchServe. The idea is to write a handler, serialize the model, prepare any other necessary code or files, and then bundle them all into a .mar file. When you run TorchServe, it takes this .mar file and builds the server by itself. The command that creates a .mar file is torch-model-archiver. Of course, you'll first need to install torch-model-archiver by following the instructions in the official TorchServe git repository. Here are the options you'll want to use when running this command:
--model-name: The model name. Set it to whatever you like, and the archiver creates a file named <model-name>.mar.
--version: TorchServe can manage model versions, and this value works much like a tag. Just enter the version you want. Some releases of the archiver treat this flag as mandatory, so it is safest to always pass it.
--serialized-file: This is where your scripted or traced model file goes.
--handler: The handler you've written based on your needs.
--extra-files: Any external files needed to run the model go here, as a comma-separated list. If your handler uses a separate script, include that script here as well.
After adding the necessary options and running the torch-model-archiver command, it shouldn't take long to generate a .mar file.
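Putting those options together, the sketch below assembles a torch-model-archiver command line in Python and prints it instead of running it. The helper build_archiver_command and every file name in it are hypothetical placeholders, so substitute your own paths before executing the command.

```python
import shlex


def build_archiver_command(model_name, version, serialized_file, handler,
                           extra_files=()):
    """Assemble a torch-model-archiver command from the options above."""
    command = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", serialized_file,
        "--handler", handler,
    ]
    if extra_files:
        # --extra-files takes a single comma-separated list of paths.
        command += ["--extra-files", ",".join(extra_files)]
    return command


# All of these names are placeholders for illustration.
command = build_archiver_command(
    model_name="my_model",
    version="1.0",
    serialized_file="traced_model.pt",
    handler="handler.py",
    extra_files=["config.json", "vocab.txt"],
)
print(shlex.join(command))

# To actually build the archive once the files exist and the archiver is
# installed, you could run:
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.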