Cloud-Native Neural Search Framework for Any Kind of Data
Jina is a neural search framework that empowers anyone to build state-of-the-art, scalable deep learning search applications in minutes.
Install
- via PyPI: pip install jina
- via Conda: conda install jina -c conda-forge
- via Docker: docker run jinaai/jina:latest
- More install options
Documentation
Run Quick Demo
👗 Fashion image search: jina hello fashion
🤖 QA chatbot: pip install "jina[demo]" && jina hello chatbot
📰 Multimodal search: pip install "jina[demo]" && jina hello multimodal
🍴 Fork the source of a demo to your folder: jina hello fork fashion ../my-proj/
Build Your First Jina App
Document, Executor, and Flow are three fundamental concepts in Jina.
📄 Document is the basic data type in Jina;
⚙️ Executor is how Jina processes Documents;
🔀 Flow is how Jina streamlines and distributes Executors.
Using these three components, let's build an app that finds the lines of a code snippet that are most similar to a query.
import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable char
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        docs.match(self._docs, metric='euclidean')

f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow, with 2 parallel CharEmbed, tho unnecessary
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of _this_ file
    f.block()  # block for listening request
Run the script, then open http://localhost:12345/docs (an extended Swagger UI) in your browser, click the /search tab, and input:
{"data": [{"text": "@requests(on=something)"}]}
This asks for the lines of the code snippet above that are most similar to @requests(on=something). Now click the Execute button!
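The same query can also be sent without a browser. The sketch below uses only the Python standard library and assumes that the HTTP gateway accepts a POST to /search with the same JSON body entered in the Swagger UI; that route is inferred from the /search tab, not confirmed here. The helper names build_search_request and search are made up for illustration.

```python
import json
import urllib.request

# Assumption: the Flow above is running with protocol='http' on port 12345 and
# its gateway accepts a POST to /search with the Swagger UI's JSON body.
SEARCH_URL = "http://localhost:12345/search"


def build_search_request(text, url=SEARCH_URL):
    """Build (but do not send) the POST request for a one-Document text query."""
    body = json.dumps({"data": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(text):
    """Send the query to the running Flow and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_request(text)) as resp:
        return json.load(resp)
```

Calling search('@requests(on=something)') while the Flow is running should return the response as a dictionary. Its layout is not documented in this README, so inspect it before relying on specific keys.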
Alternatively, query the service with the Python client:
from jina import Client, Document
from jina.types.request import Response

def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclidean"].value:2f}: "{d.text}"')

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)
This prints the following results:
Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.218218: "from jina import Document, DocumentArray, Executor, Flow, requests"
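These distances can be reproduced without Jina. Because CharEmbed averages one-hot character vectors, each embedding is a normalized character histogram, and the score is the Euclidean distance between two histograms. The following is a minimal pure-Python sketch of that computation, not Jina's implementation. The helper names char_embed, euclidean, and top_matches are made up for illustration, and the sketch assumes the 'euclidean' metric is the plain, unnormalized Euclidean distance.

```python
import math

OFFSET = 32             # same value as CharEmbed.offset (ASCII space)
DIM = 127 - OFFSET + 1  # 96 slots; the last one also catches unknown chars


def char_embed(text):
    """Mean-pool one-hot character vectors: a normalized character histogram."""
    vec = [0.0] * DIM
    for c in text:
        idx = ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1
        vec[idx] += 1.0
    return [v / len(text) for v in vec]  # assumes non-empty text


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_matches(query, lines, k=3):
    """Return (distance, line) pairs for the k lines nearest to the query."""
    q = char_embed(query)
    scored = [(euclidean(q, char_embed(line)), line) for line in lines]
    return sorted(scored)[:k]


lines = [
    "import numpy as np",
    "@requests(on='/index')",
    "@requests(on='/search')",
]
for dist, line in top_matches("request(on=something)", lines):
    print(f"{dist:.6f}: {line!r}")
```

For the query request(on=something), this sketch gives 0.168526 and 0.181676 for the two @requests lines, matching the first two results above.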
Support
- Join our Slack community to chat to our engineers about your use cases, questions, and support queries.
- Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
  - When? The second Tuesday of every month
  - Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
- Subscribe to the latest video tutorials on our YouTube channel
Join Us
Jina is backed by Jina AI. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.
Contributing
We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.