/executor-clip-image

Executor for the pre-trained CLIP model. https://openai.com/blog/clip/


ClipImageEncoder

ClipImageEncoder is a class that wraps the image embedding functionality from the CLIP model.

The CLIP model was originally proposed in Learning Transferable Visual Models From Natural Language Supervision.

ClipImageEncoder encodes images stored in the blob attribute of a Document and saves the encoding in its embedding attribute.
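The blob-to-embedding contract can be sketched with a stand-in encoder (the `encode_stub` name and its random output are illustrative only; the real executor runs the CLIP image tower):

```python
import numpy as np

def encode_stub(blob: np.ndarray) -> np.ndarray:
    """Stand-in for the CLIP image tower: maps an H x W x 3 uint8
    image to a 512-dimensional float32 vector (random here)."""
    assert blob.ndim == 3 and blob.shape[-1] == 3
    rng = np.random.default_rng(0)
    return rng.standard_normal(512).astype(np.float32)

blob = np.ones((800, 224, 3), dtype=np.uint8)  # arbitrary Height x Width
embedding = encode_stub(blob)
print(embedding.shape, embedding.dtype)  # (512,) float32
```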

Prerequisites

None

Usages

Via JinaHub (🚧W.I.P.)

Use the prebuilt image from JinaHub in your Python code:

from jina import Flow
	
f = Flow().add(
        uses='jinahub+docker://ClipImageEncoder',
        volumes='/your_home_folder/.cache/clip:/root/.cache/clip')

or in the .yml config:

jtype: Flow
pods:
  - name: encoder
    uses: 'jinahub+docker://ClipImageEncoder'
    volumes: '/your_home_folder/.cache/clip:/root/.cache/clip'

Via PyPI

  1. Install jinahub-clip-image

    pip install git+https://github.com/jina-ai/executor-clip-image.git
  2. Use jinahub-clip-image in your code

    from jinahub.encoder.clip_image import ClipImageEncoder
    from jina import Flow
    
    f = Flow().add(uses=ClipImageEncoder)

Via Docker

  1. Clone the repo and build the docker image

    git clone https://github.com/jina-ai/executor-clip-image.git
    cd executor-clip-image
    docker build -t jinahub-clip-image .
  2. Use jinahub-clip-image in your code

    from jina import Flow
    
    f = Flow().add(
            uses='docker://jinahub-clip-image:latest',
            volumes='/your_home_folder/.cache/clip:/root/.cache/clip')

Example

import numpy as np

from jina import Document, Flow

f = Flow().add(uses='jinahub+docker://ClipImageEncoder',
               volumes='/Users/nanwang/.cache/clip:/root/.cache/clip')


def check_resp(resp):
    for _doc in resp.data.docs:
        doc = Document(_doc)
        print(f'embedding shape: {doc.embedding.shape}')


with f:
    f.post(on='foo',
           inputs=Document(blob=np.ones((800, 224, 3), dtype=np.uint8)),
           on_done=check_resp)

Inputs

Documents with a blob of shape Height x Width x 3. By default, the blob must be an ndarray with dtype=uint8; Height and Width can take arbitrary values. When setting use_default_preprocessing=False, the blob must instead have shape 224 x 224 x 3 with dtype=float32.
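For instance, valid input blobs for the two modes could be constructed like this (the scaling by 255 is illustrative; with the default preprocessing disabled you are responsible for whatever preprocessing CLIP expects):

```python
import numpy as np

# Default (use_default_preprocessing=True): any Height x Width x 3 uint8 image.
raw = np.random.randint(0, 256, size=(800, 600, 3), dtype=np.uint8)

# use_default_preprocessing=False: the caller supplies an already
# preprocessed 224 x 224 x 3 float32 tensor.
preprocessed = raw[:224, :224, :].astype(np.float32) / 255.0
```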

Returns

Documents with the embedding field filled with an ndarray of shape (512,) with dtype=float32.
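The 512-dimensional embeddings can then be compared directly, e.g. with cosine similarity. A minimal numpy sketch (not part of the executor itself):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.ones(512, dtype=np.float32)
b = np.ones(512, dtype=np.float32)
print(cosine_similarity(a, b))  # 1.0 for vectors pointing the same way
```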

Reference

Learning Transferable Visual Models From Natural Language Supervision: https://arxiv.org/abs/2103.00020