ClipImageEncoder
ClipImageEncoder is a class that wraps the image embedding functionality from the CLIP model.
The CLIP model was originally proposed in Learning Transferable Visual Models From Natural Language Supervision.
ClipImageEncoder encodes images stored in the blob attribute of the Document and saves the encoding in the embedding attribute.
Prerequisites
None
Usages
Via JinaHub (🚧 W.I.P.)
Use the prebuilt images from JinaHub in your Python code:
from jina import Flow
f = Flow().add(
    uses='jinahub+docker://ClipImageEncoder',
    volumes='/your_home_folder/.cache/clip:/root/.cache/clip')
or in the .yml config:
jtype: Flow
pods:
  - name: encoder
    uses: 'jinahub+docker://ClipImageEncoder'
    volumes: '/your_home_folder/.cache/clip:/root/.cache/clip'
Via Pypi
- Install the jinahub-clip-image package:
pip install git+https://github.com/jina-ai/executor-clip-image.git
- Use jinahub-clip-image in your code:
from jinahub.encoder.clip_image import ClipImageEncoder
from jina import Flow
f = Flow().add(uses=ClipImageEncoder)
Via Docker
- Clone the repo and build the docker image:
git clone https://github.com/jina-ai/executor-clip-image.git
cd executor-clip-image
docker build -t jinahub-clip-image .
- Use jinahub-clip-image in your code:
from jina import Flow
f = Flow().add(
    uses='docker://jinahub-clip-image:latest',
    volumes='/your_home_folder/.cache/clip:/root/.cache/clip')
Example
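import numpy as np
from jina import Flow, Document

# Mount the local CLIP cache so the model weights are not downloaded on every run
f = Flow().add(uses='jinahub+docker://ClipImageEncoder',
               volumes='/Users/nanwang/.cache/clip:/root/.cache/clip')

def check_resp(resp):
    # Print the shape of the embedding of each returned Document
    for _doc in resp.data.docs:
        doc = Document(_doc)
        print(f'embedding shape: {doc.embedding.shape}')

with f:
    f.post(on='foo',
           inputs=Document(blob=np.ones((800, 224, 3), dtype=np.uint8)),
           on_done=check_resp)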
Inputs
Documents with blob of the shape Height x Width x 3. By default, the input blob must be an ndarray with dtype=uint8. The Height and Width can have arbitrary values. When setting use_default_preprocessing=False, the input blob must have the size of 224x224x3 with dtype=float32.
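The two input contracts described above can be checked before sending Documents to the encoder. The following is a minimal sketch using only NumPy; check_blob is a hypothetical helper written for illustration and is not part of ClipImageEncoder or Jina.

```python
import numpy as np


def check_blob(blob, use_default_preprocessing=True):
    """Hypothetical helper (not part of ClipImageEncoder): return True if
    `blob` matches the input contract described in this section."""
    # Both modes expect a 3-D array whose last axis holds 3 channels.
    if not isinstance(blob, np.ndarray) or blob.ndim != 3 or blob.shape[2] != 3:
        return False
    if use_default_preprocessing:
        # Default mode: any Height and Width, but dtype must be uint8.
        return bool(blob.dtype == np.uint8)
    # Preprocessing disabled: exactly 224x224x3 with dtype float32.
    return blob.shape == (224, 224, 3) and bool(blob.dtype == np.float32)


# An arbitrary-size uint8 image is valid only with default preprocessing.
raw = np.ones((800, 224, 3), dtype=np.uint8)
# A 224x224x3 float32 array is valid only when preprocessing is disabled.
prepared = np.zeros((224, 224, 3), dtype=np.float32)
```

With these arrays, check_blob(raw) and check_blob(prepared, use_default_preprocessing=False) both pass, while swapping the modes fails, which mirrors the rule that the caller is responsible for resizing and type conversion once default preprocessing is switched off.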
Returns
Documents with embedding fields filled with an ndarray of the shape 512 with dtype=float32.