jina: A Python repository from EmAchieng

Cloud-Native Neural Search Framework for Any Kind of Data

Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.

⏱️ Save time - Jina implements the design pattern of neural search systems, with native support for PyTorch/Keras/ONNX/Paddle. Build solutions in just minutes.

🌌 All data types - Process, index, query, and understand videos, images, long/short text, audio, source code, PDFs, etc.

🌩️ Local & cloud friendly - Distributed architecture, scalable & cloud-native from day one. Same developer experience on both local and cloud.

🍱 Own your stack - Keep end-to-end stack ownership of your solution. Avoid the integration pitfalls that come with fragmented, multi-vendor, generic legacy tools.

Install

pip install -U jina

More install options, including Conda, Docker, and Windows, are covered in the installation docs.

Learning and Docs

Brand new to Jina? Check out our Learning Bootcamp to get up to speed.
Check out our comprehensive docs for deeper tutorials, more advanced topics, and the API reference.

Get Started

We promise you can build a scalable ResNet-powered image search service in 20 minutes or less, from scratch. If not, you can forget about Jina.

Basic Concepts

Document, Executor, and Flow are three fundamental concepts in Jina.

Document is the basic data type in Jina;
Executor is how Jina processes Documents;
Flow is how Jina streamlines and distributes Executors.
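To make the relationship between the three concepts concrete, here is a toy model in plain Python. `ToyDocument`, `ToyFlow`, and `fake_embed` are hypothetical stand-ins invented for illustration, not Jina's API: a Document carries data, an Executor transforms a batch of Documents, and a Flow chains Executors and runs them in order.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyDocument:
    """Toy unit of data: a URI plus an optional embedding."""
    uri: str
    embedding: Optional[List[float]] = None


# A toy "Executor" is any function that modifies a list of documents in place.
ToyExecutor = Callable[[List[ToyDocument]], None]


@dataclass
class ToyFlow:
    """Chains toy executors and runs them in order over the same documents."""
    executors: List[ToyExecutor] = field(default_factory=list)

    def add(self, executor: ToyExecutor) -> "ToyFlow":
        self.executors.append(executor)
        return self  # returning self enables .add(...).add(...) chaining

    def post(self, docs: List[ToyDocument]) -> List[ToyDocument]:
        for executor in self.executors:
            executor(docs)
        return docs


def fake_embed(docs: List[ToyDocument]) -> None:
    # Placeholder "model": the embedding is just the URI length.
    for d in docs:
        d.embedding = [float(len(d.uri))]


flow = ToyFlow().add(fake_embed)
result = flow.post([ToyDocument(uri='img/00021.jpg')])
print(result[0].embedding)  # [13.0]
```

The chaining style mirrors how the real `Flow(...).add(...)` calls read later in this README, but the toy runs everything in one process, whereas Jina distributes Executors.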

Leveraging these three components, let's build an app that finds similar images using ResNet50.

ResNet50 Image Search in 20 Lines

💡 Preliminaries: download the dataset and install PyTorch & Torchvision.

from jina import DocumentArray, Document

def preproc(d: Document):
    return (d.load_uri_to_image_blob()  # load
             .set_image_blob_normalization()  # normalize color 
             .set_image_blob_channel_axis(-1, 0))  # switch color axis
docs = DocumentArray.from_files('img/*.jpg').apply(preproc)

import torchvision
model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
docs.embed(model, device='cuda')  # embed on the GPU to speed things up

q = (Document(uri='img/00021.jpg')  # build query image & preprocess
     .load_uri_to_image_blob()
     .set_image_blob_normalization()
     .set_image_blob_channel_axis(-1, 0))
q.embed(model)  # embed
q.match(docs)  # find top-20 nearest neighbours, done!

Done! Now print q.matches and you'll see the URIs of the most similar images.
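Conceptually, `q.match(docs)` is a nearest-neighbour search over embeddings. The sketch below shows that idea with cosine similarity and a top-k cut in plain Python; `cosine_similarity` and `top_k_matches` are hypothetical helpers, not Jina functions. It ranks by similarity (higher is closer), whereas the `cosine` score Jina reports may be a distance, where smaller means closer.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_matches(query: Sequence[float],
                  index: List[Tuple[str, Sequence[float]]],
                  k: int = 20) -> List[Tuple[str, float]]:
    """Return (uri, similarity) pairs for the k most similar index entries."""
    scored = [(uri, cosine_similarity(query, emb)) for uri, emb in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Tiny 2-D "embeddings" standing in for ResNet50 outputs.
index = [('img/a.jpg', [1.0, 0.0]), ('img/b.jpg', [0.0, 1.0]), ('img/c.jpg', [1.0, 1.0])]
query = [1.0, 0.1]
print([uri for uri, _ in top_k_matches(query, index, k=2)])  # ['img/a.jpg', 'img/c.jpg']
```

This brute-force scan compares the query against every indexed vector, which is fine for a small image folder like the one in this example.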

Add three lines of code to visualize them:

for m in q.matches:
    m.set_image_blob_channel_axis(0, -1).set_image_blob_inv_normalization()
q.matches.plot_image_sprites()

Sweet! FYI, you can also use Keras, ONNX, or PaddlePaddle for the embedding model; Jina supports them natively.
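The preprocessing chain and the visualization loop are inverses of each other: one scales pixels and standardizes each color channel, then moves the channel axis to the front (height, width, channel to channel, height, width); the other moves it back and undoes the scaling. The plain-Python sketch below shows that round trip on a 1x2 image. The mean/std values are an assumption (the ImageNet statistics commonly used with torchvision's pretrained ResNet models); Jina's `set_image_blob_normalization` defaults may differ, so check its docs.

```python
from typing import List

# Assumption: ImageNet per-channel statistics (R, G, B).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

Image = List[List[List[float]]]


def normalize_hwc(img: Image) -> Image:
    """Scale 0-255 pixels to [0, 1], then standardize each channel."""
    return [[[(px[c] / 255.0 - MEAN[c]) / STD[c] for c in range(3)]
             for px in row] for row in img]


def denormalize_hwc(img: Image) -> Image:
    """Undo normalize_hwc, returning 0-255 pixel values."""
    return [[[(px[c] * STD[c] + MEAN[c]) * 255.0 for c in range(3)]
             for px in row] for row in img]


def hwc_to_chw(img: Image) -> Image:
    """Move the channel axis to the front: [H][W][C] -> [C][H][W]."""
    h, w = len(img), len(img[0])
    return [[[img[y][x][c] for x in range(w)] for y in range(h)] for c in range(3)]


def chw_to_hwc(img: Image) -> Image:
    """Inverse of hwc_to_chw: [C][H][W] -> [H][W][C]."""
    c, h, w = len(img), len(img[0]), len(img[0][0])
    return [[[img[k][y][x] for k in range(c)] for x in range(w)] for y in range(h)]


pixels = [[[255, 0, 0], [0, 128, 255]]]             # height 1, width 2, RGB
blob = hwc_to_chw(normalize_hwc(pixels))            # like the preprocessing step
restored = denormalize_hwc(chw_to_hwc(blob))        # like the visualization loop
print(len(blob), len(blob[0]), len(blob[0][0]))     # 3 1 2
```

Up to floating-point error, `restored` equals the original pixels, which is why the matches can be plotted after inverting both steps.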

As-a-Service in 10 Extra Lines

With a trivial refactoring and ten extra lines of code, you can turn the local script into a ready-to-serve service:

Import what we need.

from jina import Document, DocumentArray, Executor, Flow, requests

Copy-paste the preprocessing step and wrap it in an Executor:

class PreprocImg(Executor):
    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            (d.load_uri_to_image_blob()  # load
             .set_image_blob_normalization()  # normalize color
             .set_image_blob_channel_axis(-1, 0))  # switch color axis

Copy-paste the embedding step and wrap it in an Executor:

class EmbedImg(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        import torchvision
        self.model = torchvision.models.resnet50(pretrained=True)        

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        docs.embed(self.model)

Wrap the matching step into an Executor:

class MatchImg(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._da = DocumentArray()  # per-instance index, not shared via the class

    @requests(on='/index')
    def index(self, docs: DocumentArray, **kwargs):
        self._da.extend(docs)
        docs.clear()  # clear content to save bandwidth

    @requests(on='/search')
    def foo(self, docs: DocumentArray, **kwargs):
        docs.match(self._da)
        for d in docs.traverse_flat('r,m'):  # only required for visualization
            d.convert_uri_to_datauri()  # convert to datauri
            d.pop('embedding', 'blob')  # remove unnecessary fields to save bandwidth
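The `@requests(on=...)` decorator binds each method to an endpoint, while a bare `@requests` (as in `PreprocImg` and `EmbedImg`) handles every request. The sketch below imitates that routing idea with a hypothetical `endpoint` decorator and `ToyExecutor` base class; it is not how Jina implements `@requests`, only an illustration of decorator-based dispatch. It also keeps the index on the instance, so two matchers never share storage.

```python
from typing import Callable, Dict, List, Optional

DEFAULT = '__default__'  # key for handlers bound without an explicit path


def endpoint(on: Optional[str] = None):
    """Hypothetical decorator: tag a method with the endpoint it serves."""
    def decorator(func: Callable) -> Callable:
        func._endpoint = on or DEFAULT
        return func
    return decorator


class ToyExecutor:
    def __init__(self):
        # Collect tagged methods once, keyed by endpoint path.
        self._routes: Dict[str, Callable] = {}
        for name in dir(type(self)):
            route = getattr(getattr(self, name), '_endpoint', None)
            if route is not None:
                self._routes[route] = getattr(self, name)

    def handle(self, path: str, docs: List[str]):
        # Prefer an exact path match, then fall back to the default handler.
        handler = self._routes.get(path) or self._routes.get(DEFAULT)
        if handler is None:
            raise KeyError(f'no handler for {path}')
        return handler(docs)


class ToyMatcher(ToyExecutor):
    def __init__(self):
        super().__init__()
        self.index: List[str] = []  # per-instance storage

    @endpoint(on='/index')
    def add(self, docs):
        self.index.extend(docs)

    @endpoint(on='/search')
    def search(self, docs):
        # Exact membership stands in for real similarity matching here.
        return [d for d in docs if d in self.index]


class ToyUpper(ToyExecutor):
    @endpoint()  # no path: serves every endpoint, like a bare @requests
    def process(self, docs):
        return [d.upper() for d in docs]


matcher = ToyMatcher()
matcher.handle('/index', ['img/1.jpg', 'img/2.jpg'])
print(matcher.handle('/search', ['img/2.jpg', 'img/9.jpg']))  # ['img/2.jpg']
```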

Connect all Executors in a Flow, scaling the embedding Executor to three replicas:

f = (Flow(port_expose=12345, protocol='http')
     .add(uses=PreprocImg)
     .add(uses=EmbedImg, replicas=3)
     .add(uses=MatchImg))
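`replicas=3` runs three copies of `EmbedImg` so that embedding requests can be processed in parallel. One common way to spread requests over copies is round-robin, sketched below purely as an illustration; `RoundRobin` is a hypothetical name, and Jina's actual load-balancing policy may differ.

```python
import itertools
from typing import Dict, List


class RoundRobin:
    """Toy dispatcher: hand each incoming request to the next replica in turn."""

    def __init__(self, replicas: int):
        self._cycle = itertools.cycle(range(replicas))
        self.assigned: Dict[int, List[str]] = {i: [] for i in range(replicas)}

    def dispatch(self, request_id: str) -> int:
        replica = next(self._cycle)
        self.assigned[replica].append(request_id)
        return replica


rr = RoundRobin(replicas=3)
for i in range(7):
    rr.dispatch(f'req-{i}')
print(rr.assigned)
# {0: ['req-0', 'req-3', 'req-6'], 1: ['req-1', 'req-4'], 2: ['req-2', 'req-5']}
```

Because embedding is the most expensive step, scaling only that Executor keeps the cheap preprocessing and matching steps at a single copy.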

Plot it via f.plot('flow.svg') to get a diagram of the Flow.

Index the image data and serve REST queries publicly:

with f:
    f.post('/index', DocumentArray.from_files('img/*.jpg'), show_progress=True, request_size=8)
    f.block()
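The `request_size=8` argument controls how the indexed images are split into requests. Assuming it means the number of Documents per request (as its name suggests), the splitting works like the hypothetical `batched` helper below; Jina's client does this internally, so you never call such a function yourself.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar('T')


def batched(items: Sequence[T], request_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most request_size items."""
    if request_size <= 0:
        raise ValueError('request_size must be positive')
    for start in range(0, len(items), request_size):
        yield list(items[start:start + request_size])


files = [f'img/{i:05d}.jpg' for i in range(20)]
sizes = [len(batch) for batch in batched(files, request_size=8)]
print(sizes)  # [8, 8, 4]
```

Smaller requests give finer-grained progress and lower memory per request; larger ones reduce per-request overhead.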

Done! Now you can query it via curl to get the most similar images.
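A curl query boils down to a JSON POST against the `/search` endpoint. The stdlib-only sketch below builds an equivalent request; `build_search_request` is a hypothetical helper, and the payload shape (a `data` list of Document dicts) is an assumption you should confirm in the Swagger UI at `/docs` before relying on it.

```python
import json
import urllib.request


def build_search_request(uri: str, host: str = 'localhost',
                         port: int = 12345) -> urllib.request.Request:
    # Assumed payload shape: {"data": [<Document as dict>, ...]}.
    body = json.dumps({'data': [{'uri': uri}]}).encode('utf-8')
    return urllib.request.Request(
        url=f'http://{host}:{port}/search',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_search_request('img/00021.jpg')
# With the Flow running, send it and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
```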

Or go to http://0.0.0.0:12345/docs and test requests in the Swagger UI.

Or use a Python client to access the service:

from jina import Client, Document
from jina.types.request import Response

def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}] {d.scores["cosine"].value:.2f}: "{d.uri}"')

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(uri='img/00021.jpg'), on_done=print_matches)

At this point, it has probably taken you about 15 minutes, but here we are: an image search service with rich features:

✅ Solution as microservices
✅ Scale in/out any component
✅ Query via HTTP/WebSocket/gRPC/Client
✅ Distribute/Dockerize components
✅ Async/non-blocking I/O
✅ Extendable REST interface

Deploy to Kubernetes in 7 Minutes

Have another seven minutes? We'll show you how to bring your service to the next level by deploying it to Kubernetes.

Create a Kubernetes cluster and get credentials (example on GCP; other K8s providers are covered in the docs):

gcloud container clusters create test --machine-type e2-highmem-2 --num-nodes 1 --zone europe-west3-a
gcloud container clusters get-credentials test --zone europe-west3-a --project jina-showcase

Move each Executor class to a separate folder with one Python file in each:
- PreprocImg -> 📁 preproc_img/exec.py
- EmbedImg -> 📁 embed_img/exec.py
- MatchImg -> 📁 match_img/exec.py
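Each moved class needs its own imports, since it no longer shares a file with the rest of the script. The hypothetical `scaffold` helper below creates the three folders with an `exec.py` starting with the import line the Executors use; the header is an assumption about the minimum each file needs, and Jina Hub may require further files (for example a config or `requirements.txt`), so check the Hub docs before pushing.

```python
import tempfile
from pathlib import Path
from typing import List

# Folder names follow the list above; the header is an assumption.
EXECUTORS = {
    'PreprocImg': 'preproc_img',
    'EmbedImg': 'embed_img',
    'MatchImg': 'match_img',
}
HEADER = 'from jina import DocumentArray, Executor, requests\n\n\n'


def scaffold(root: Path) -> List[Path]:
    """Create <root>/<folder>/exec.py for each Executor, skipping existing files."""
    paths = []
    for class_name, folder in EXECUTORS.items():
        target = root / folder / 'exec.py'
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():  # never overwrite code you already moved
            target.write_text(HEADER + f'# Paste the {class_name} class here.\n')
        paths.append(target)
    return paths


# Demo in a throwaway directory; use scaffold(Path('.')) in your project.
with tempfile.TemporaryDirectory() as tmp:
    paths = scaffold(Path(tmp))
    demo = [p.relative_to(tmp).as_posix() for p in paths]
    first_line = paths[0].read_text().splitlines()[0]
print(demo)  # ['preproc_img/exec.py', 'embed_img/exec.py', 'match_img/exec.py']
```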
Push all Executors to Jina Hub:
```
jina hub push preproc_img
jina hub push embed_img
jina hub push match_img
```
You will get three Hub Executors that can be run as Docker containers.

Adjust the Flow a bit and open it:

f = (Flow(name='readme-flow', port_expose=12345, infrastructure='k8s')
     .add(uses='jinahub+docker://PreprocImg')
     .add(uses='jinahub+docker://EmbedImg', replicas=3)
     .add(uses='jinahub+docker://MatchImg'))
with f:
    f.block()

Intrigued? Find out more about Jina in our docs.

Run Quick Demo

👗 Fashion image search: jina hello fashion
🤖 QA chatbot: pip install "jina[demo]" && jina hello chatbot
📰 Multimodal search: pip install "jina[demo]" && jina hello multimodal
🍴 Fork the source of a demo to your folder: jina hello fork fashion ../my-proj/

Support

Join our Slack community to chat to our engineers about your use cases, questions, and support queries.
Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (see our public calendar/.ical/Meetup group) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel

Join Us

Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.
