/executor-text-sentencizer

This executor splits long texts into sentence chunks.

Primary Language: Python

✨ Sentencizer

Sentencizer is a class that splits texts into sentences.

🌱 Prerequisites

None

🚀 Usages

🚚 Via JinaHub

using docker images

Use the prebuilt image from JinaHub in your Python code:

from jina import Flow
	
f = Flow().add(uses='jinahub+docker://Sentencizer')

or in the .yml config:

jtype: Flow
pods:
  - name: sentencizer
    uses: 'jinahub+docker://Sentencizer'

using source codes

Use the source code from JinaHub in your Python code:

from jina import Flow
	
f = Flow().add(uses='jinahub://Sentencizer')

or in the .yml config:

jtype: Flow
pods:
  - name: sentencizer
    uses: 'jinahub://Sentencizer'

📦️ Via PyPI

  1. Install the jinahub-text-sentencizer package.

    pip install git+https://github.com/jina-ai/executor-text-sentencizer.git
  2. Use jinahub-text-sentencizer in your code:

    from jina import Flow
    from jinahub.text.sentencizer import Sentencizer
    
    f = Flow().add(uses=Sentencizer)

🐳 Via Docker

  1. Clone the repo and build the Docker image:

    git clone https://github.com/jina-ai/executor-text-sentencizer.git
    cd executor-text-sentencizer
    docker build -t sentencizer .
  2. Use sentencizer in your code:

    from jina import Flow
    
    f = Flow().add(uses='docker://sentencizer:latest')

🎉️ Example

from jina import Flow, Document

f = Flow().add(uses='jinahub+docker://Sentencizer')

with f:
    resp = f.post(on='foo', inputs=Document(text='Hello. World.'), return_results=True)
    print(f'{resp}')

Inputs

A Document whose text contains two sentences, each ending with a period: 'Hello. World.'

Returns

A Document with two chunk Documents. The first chunk contains text='Hello.' and the second chunk contains text='World.'
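
To make the input-to-chunks mapping above concrete, here is a minimal standalone sketch of punctuation-based sentence splitting. It is an illustration only and not the executor's actual implementation: the helper name `split_sentences` and the `punct_chars` parameter are hypothetical, and it assumes sentences end at `.`, `!`, or `?`.

```python
import re


def split_sentences(text, punct_chars=".!?"):
    """Hypothetical helper (not part of the Sentencizer executor).

    Splits text into sentences and keeps the closing punctuation
    attached to each sentence.
    """
    escaped = re.escape(punct_chars)
    # One or more non-punctuation characters, followed by any
    # trailing run of punctuation characters.
    pattern = re.compile(r"[^{0}]+[{0}]*".format(escaped))
    sentences = []
    for match in pattern.finditer(text):
        sentence = match.group().strip()
        if sentence:  # skip whitespace-only matches
            sentences.append(sentence)
    return sentences


chunks = split_sentences("Hello. World.")
print(chunks)  # ['Hello.', 'World.']
```

In the executor, each resulting sentence would be stored as a chunk Document rather than a plain string; the sketch only shows the splitting step.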

🔍️ Reference