Tutorial for a Segment Anything Torchserve deployment with code
2023.06.07 by Jisu Yu
Let's take a look at deploying the Segment Anything model with Torchserve, with code. This post will be code-centric, so if you're interested in learning more about Torchserve itself, we recommend reading the Torchserve series on our blog. Let's get started.
Write a Torchserve handler
Let's write the config.properties and handler.py files we'll need for inference.
The inference_address port in config.properties is the one we'll forward when we run docker later, so that we can send inference requests from outside the container. The gpu_id and batch_size are values that arrive in the properties dict variable in handler.py.
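A minimal config.properties consistent with the commands below might look like this. Only the inference port 8070 is taken from this tutorial; the management/metrics addresses and the worker count are placeholder assumptions.

# Minimal sketch of config.properties; only port 8070 is from this tutorial
inference_address=http://0.0.0.0:8070
# Assumed defaults, adjust as needed
management_address=http://0.0.0.0:8071
metrics_address=http://0.0.0.0:8072
default_workers_per_model=1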
The basic structure of handler.py is the same as the example code provided by Torchserve. However, we've added base64 encoding of the inference result in the post-processing step. Returning it as a torch.Tensor or np.array would also work, but we export it as a string for convenient communication later.
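For reference, here is a minimal handler sketch consistent with that description: it loads the checkpoint from the model archive, runs the SAM image encoder, and base64-encodes the embedding in postprocess. The class name, the use of SamPredictor, and the preprocessing details are assumptions, not the exact code from this deployment.

import base64
import io
import os

import numpy as np
import torch
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry
from ts.torch_handler.base_handler import BaseHandler


class SamHandler(BaseHandler):
    def initialize(self, context):
        # gpu_id and batch_size arrive here via context.system_properties
        properties = context.system_properties
        self.device = torch.device(
            f"cuda:{properties.get('gpu_id')}" if torch.cuda.is_available() else "cpu"
        )
        # The serialized checkpoint is unpacked into model_dir by Torchserve
        model_path = os.path.join(properties.get("model_dir"), "sam_vit_l.pth")
        sam = sam_model_registry["vit_l"](checkpoint=model_path).to(self.device)
        self.predictor = SamPredictor(sam)
        self.initialized = True

    def preprocess(self, data):
        # Torchserve hands us a list of requests; image bytes are under "data" or "body"
        image_bytes = data[0].get("data") or data[0].get("body")
        image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        return np.array(image)

    def inference(self, image):
        # Run the image encoder once and return the embedding
        self.predictor.set_image(image)
        return self.predictor.get_image_embedding()  # shape (1, 256, 64, 64)

    def postprocess(self, embedding):
        # base64-encode the float32 embedding so it can be returned as a string
        raw = embedding.cpu().numpy().astype(np.float32).tobytes()
        return [base64.b64encode(raw).decode("utf-8")]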
First, let's create a Dockerfile. You can choose any version of the base image. We install a Java runtime, since Torchserve requires one, and then the necessary libraries. If you prefer, you can collect them in a requirements.txt file and install them with RUN pip install -r requirements.txt.
FROM pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime
WORKDIR /workspace
RUN mkdir datahunt_segment_anything

# Install OpenJDK 11 (JRE) for Torchserve
RUN apt-get update && \
    apt-get install -y git && \
    apt-get install -y openjdk-11-jre-headless && \
    apt-get clean

RUN apt-get install -y libgl1-mesa-glx && \
    apt-get install -y curl

ADD datahunt_segment_anything /workspace/datahunt_segment_anything

RUN pip install git+https://github.com/facebookresearch/segment-anything.git
RUN pip install opencv-python matplotlib onnxruntime onnx
RUN pip install torchserve torch-model-archiver torch-workflow-archiver nvgpu validators tensorflow-cpu
Now let's build it.
docker build -t segment-anything:v1 .
Next, we'll create a container and set up the ports. If needed, you can also mount a volume so that local changes show up inside the container in real time, but since we're using the finished code, we'll skip that. -p 8070:8070 maps the inference_address port we set in config.properties above.
docker run -it --gpus all --name segment-anything-v1 -p 8070:8070/tcp segment-anything:v1 /bin/bash
You should now be connected to the container. With everything in place for deployment, let's create a .mar file and run Torchserve.
Deploying the Segment Anything model
The current working directory is /workspace/datahunt_segment_anything, and the internal folder structure is as follows.
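Reconstructing from the paths used in the script and commands below (so the exact location of the model directory is an assumption, inferred from the relative path ../model/sam_vit_l.pth):

/workspace
├── model
│   └── sam_vit_l.pth
└── datahunt_segment_anything
    ├── config.properties
    ├── handler.py
    ├── torchserve_deploy.sh
    └── test
        └── sample
            └── sample.jpg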
Here's the deployment script. You could run the commands one at a time, but I'll write them as a shell script so it's easy to rerun. Since we'll reuse it for redeployment, it first stops the server if it's running and deletes any existing .mar archive; if you don't need that, feel free to remove those lines.
#!/bin/bash
torchserve --stop

version=1.0
model_name="SAM"
model_path='../model/sam_vit_l.pth'

if [ -e ${model_name}.mar ]; then
    rm ${model_name}.mar
    echo 'Removed existing model archive.'
fi

# Create mar file
torch-model-archiver --model-name ${model_name} --version ${version} --serialized-file ${model_path} --handler "handler.py"

# Deployment
torchserve --start --model-store . --models ${model_name}.mar --ts-config ./config.properties
Now run the torchserve_deploy.sh script we wrote above, and the model deployment is complete. Let's run inference on the model: prepare a sample image and send a request as shown below.
curl -X POST "http://127.0.0.1:8070/predictions/SAM/1.0" -T "test/sample/sample.jpg"
Then you can see the result is output in the format you set in handler.py, as shown below.
[Image: handler inference results (base64 encoded)]
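To sanity-check the response from Python, we can decode the string back into the embedding array. A minimal sketch, assuming the handler serialized a float32 embedding of shape (1, 256, 64, 64) as in the handler sketch above:

import base64

import numpy as np
import requests

# Same request as the curl command above
with open("test/sample/sample.jpg", "rb") as f:
    resp = requests.post("http://127.0.0.1:8070/predictions/SAM/1.0", data=f)

# Decode the base64 string back into the float32 embedding tensor
raw = base64.b64decode(resp.text)
image_embedding = np.frombuffer(raw, dtype=np.float32).reshape(1, 256, 64, 64)
print(image_embedding.shape)  # (1, 256, 64, 64)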
Segment Anything Promptable Task Inference Test
This is the process of fetching the precomputed embedding results and running the promptable task. I used fixed values for the point and bounding-box coordinates. I used Python code here, but the official GitHub demo has inference code in other languages as well.
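Here is a minimal sketch of that step, following the ONNX decoder example in the official segment-anything repository. The decoder file name sam_decoder.onnx, the image size, and the point/box coordinates are placeholder assumptions:

import numpy as np
import onnxruntime
from segment_anything.utils.transforms import ResizeLongestSide

# image_embedding: the (1, 256, 64, 64) float32 array fetched above
orig_h, orig_w = 1080, 1920          # placeholder original image size
transform = ResizeLongestSide(1024)  # SAM's input resolution

# One foreground point plus a bounding box, in original-image pixels
point = np.array([[500.0, 375.0]])
box = np.array([[425.0, 600.0], [700.0, 875.0]])  # top-left, bottom-right corners

# The ONNX decoder takes points and box corners concatenated;
# labels: 1 = foreground point, 2/3 = box corners
coords = np.concatenate([point, box], axis=0)[None, :, :]
labels = np.array([[1, 2, 3]], dtype=np.float32)
coords = transform.apply_coords(coords, (orig_h, orig_w)).astype(np.float32)

session = onnxruntime.InferenceSession("sam_decoder.onnx")
masks, iou_predictions, low_res_masks = session.run(None, {
    "image_embeddings": image_embedding.astype(np.float32),
    "point_coords": coords,
    "point_labels": labels,
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array([orig_h, orig_w], dtype=np.float32),
})
masks = masks > 0.0  # threshold the mask logits into a boolean mask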
So far, we've deployed Segment Anything's image encoder model with Torchserve and run inference against it. Next, we'll set up CircleCI so that we can automate the rest of the process. Stay tuned for the final part of this series, Exploring Datahunt's SAM features [3 of 3].