Welcome to CLIP-as-service!#

CLIP-as-service is a low-latency, high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.

⚑ Fast: Serve CLIP models with TensorRT, ONNX Runtime and PyTorch (without JIT) at 800 QPS[*]. Non-blocking duplex streaming of requests and responses, designed for large data and long-running tasks.

🫐 Elastic: Horizontally scale multiple CLIP models up and down on a single GPU, with automatic load balancing.

πŸ₯ Easy-to-use: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.

👒 Modern: Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression.

🍱 Integration: Smooth integration with the neural search ecosystem, including Jina and DocArray. Build cross-modal and multi-modal solutions in no time.

[*] with default config (single replica, PyTorch no JIT) on GeForce RTX 3090.

Try it!#

An always-online demo server running ViT-L/14-336px is available for you to try:

Via HTTPS:

curl \
-X POST https://demo-cas.jina.ai:8443/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"text": "First do it"},
    {"text": "then do it right"},
    {"text": "then do it better"},
    {"uri": "https://picsum.photos/200"}],
    "execEndpoint":"/"}'

Via gRPC, using the Python client:

# pip install clip-client
from clip_client import Client

c = Client('grpcs://demo-cas.jina.ai:2096')

# encode() takes a list of inputs and returns one embedding per item
r = c.encode(
    [
        'First do it',
        'then do it right',
        'then do it better',
    ]
)
print(r)
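
Once the embeddings come back, a common next step is to compare them by cosine similarity. The following is a minimal sketch using only the standard library. The three short vectors are made-up stand-ins for real embeddings (which are much longer), and `cosine_similarity` is a hypothetical helper, not part of `clip_client`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up stand-ins for individual embeddings returned by the service.
emb_first = [1.0, 0.0, 0.0]
emb_right = [1.0, 1.0, 0.0]
emb_other = [0.0, 0.0, 2.0]

print(cosine_similarity(emb_first, emb_right))  # partially aligned vectors
print(cosine_similarity(emb_first, emb_other))  # orthogonal vectors
```

Values close to 1 indicate similar inputs, and values near 0 indicate unrelated ones.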


Install#

The latest version is available on PyPI.

Make sure you have Python 3.7+. The client and server can be installed independently, so you don't need both on the same machine: for example, install clip_server on a GPU machine and clip_client on a local laptop.

# Client
pip install clip-client
# Server, PyTorch runtime (default)
pip install clip-server
# Server, ONNX runtime
pip install "clip_server[onnx]"
# Server, TensorRT runtime
pip install nvidia-pyindex
pip install "clip_server[tensorrt]"
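
To confirm which of the two packages ended up in the current environment, a small check along these lines may help. It is a sketch that assumes Python 3.8+ (where `importlib.metadata` is in the standard library); `installed_versions` is a hypothetical helper, and the distribution names are taken from the pip commands above.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as used in the pip commands.
for name, version in installed_versions(["clip-client", "clip-server"]).items():
    print(f"{name}: {version or 'not installed'}")
```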

Quick check#

After installation, run the following commands for a quick connectivity check.

Start the server#

# Option 1: PyTorch (default)
python -m clip_server
# Option 2: ONNX
python -m clip_server onnx-flow.yml
# Option 3: TensorRT
python -m clip_server tensorrt-flow.yml

On the first run, the server downloads the default pretrained model, which may take a minute. It then prints the following address information:

 🔗         Protocol                  GRPC   
 🏠     Local access   
 🔒  Private network   
 🌐   Public address   

This means the server is ready to serve. Note down the three addresses shown above; you will need them later.
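
Before connecting a client, it can be useful to confirm that something is listening on the server's port. The sketch below does a plain TCP check with the standard library; it does not verify that the listener actually speaks gRPC. `is_port_open` is a hypothetical helper, and the host and port in the example call are placeholders to replace with the values your server printed.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Placeholder values: substitute the host and port printed by your server.
print(is_port_open("127.0.0.1", 51000))
```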

Connect from client#


Depending on where the client and server are located, you may need to use different IP addresses:

  • Client and server are on the same machine: use the local address.

  • Client and server are behind the same router: use the private network address.

  • Server is on a public network: use the public address.
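
The three cases above differ only in the host part of the address passed to the client. As an illustration, the sketch below assembles address strings in the `grpc://` and `grpcs://` forms used elsewhere on this page. `build_address` is a hypothetical helper, and the `localhost`, `192.168.0.10` and `51000` values are placeholders, not addresses your server will necessarily report.

```python
def build_address(host: str, port: int, tls: bool = False) -> str:
    """Build a gRPC address string; 'grpcs' is used for a TLS-protected endpoint."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    scheme = "grpcs" if tls else "grpc"
    return f"{scheme}://{host}:{port}"


# Placeholder host/port pairs, one per scenario in the list above.
print(build_address("localhost", 51000))                  # same machine
print(build_address("192.168.0.10", 51000))               # same router
print(build_address("demo-cas.jina.ai", 2096, tls=True))  # public demo server
```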

Run the following Python script:

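from clip_client import Client

# Put one of the server addresses noted above after 'grpc://'
c = Client('grpc://')
c.profile()  # measure and print the roundtrip latency breakdown

This prints a latency breakdown like: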

 Roundtrip  16ms  100%
├──  Client-server network  12ms  75%
└──  Server  4ms  25%
    ├──  Gateway-CLIP network  0ms  0%
    └──  CLIP model  4ms  100%

It means the client and the server are now connected. Well done!
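
The percentages in this breakdown appear to be relative to the parent entry rather than to the whole roundtrip: the server's 4ms is 25% of the 16ms roundtrip, while the CLIP model's 4ms is 100% of the server's 4ms. Here is a minimal sketch that reproduces those figures from the timings shown above. The nested-tuple layout and the `shares` helper are illustrative assumptions, not the client's internal representation.

```python
from typing import List, Optional, Tuple

# (label, milliseconds, children), copied from the sample output above.
Node = Tuple[str, float, list]

profile: Node = (
    "Roundtrip", 16, [
        ("Client-server network", 12, []),
        ("Server", 4, [
            ("Gateway-CLIP network", 0, []),
            ("CLIP model", 4, []),
        ]),
    ],
)


def shares(node: Node, parent_ms: Optional[float] = None) -> List[Tuple[str, int]]:
    """Return (label, percent-of-parent) pairs in depth-first order."""
    label, ms, children = node
    base = ms if parent_ms is None else parent_ms  # the root is 100% of itself
    percent = round(100 * ms / base) if base else 0
    rows = [(label, percent)]
    for child in children:
        rows.extend(shares(child, ms))
    return rows


for label, percent in shares(profile):
    print(f"{label}: {percent}%")
```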


Join Us#

CLIP-as-service is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.
