A hub of third-party NLP providers and tutorials to help you instantly handle your data iterators with no-string dependency apps.
The purpose of this project is to share third-party providers that can be combined into a single pipeline.
-
- Mistral.AI [provider] [🤖 models]
- OpenRouter.AI [provider] [🤖 models]
- Replicate.IO [provider] [🤖 models]
- OpenAI provider:
- ChatGPT [provider]
- Qwen-2.5-Max [bash-script]
- o1 [provider]
- Transformers:
- DeepSeek-R1-distill-7b [📙 qwen-notebook] [📙 llama3-notebook]
- LLaMA-3 [provider]
- Qwen-2 [provider]
- Microsoft-Phi-2 [provider]
- Mistral [provider]
- Gemma [provider]
- Flan-T5 [provider]
-
- DeepPavlov [provider] [📙 notebook]
- Flair [provider] [bash-script] [🤖 models]
- Spacy [provider] [bash-script] [🤖 models]
-
- GoogleTranslator [provider] [📙 notebook]
In this project, each provider is a wrapper over a third-party app that handles an iterator of data.
Each record of the data is represented as a Python dict.
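
The provider convention described above can be illustrated with a minimal sketch. It is not taken from this repository: the helper `make_provider`, the field names `text` and `result`, and the use of `str.upper` as a stand-in for a third-party model are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator

# Each record of the data is a plain Python dict.
Record = Dict[str, object]


def make_provider(handler: Callable[[str], str],
                  src: str = "text",
                  dst: str = "result") -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Hypothetical helper: wrap a third-party callable as a provider.

    The returned provider consumes an iterator of dict records and lazily
    yields new records with the handler output stored under `dst`.
    """
    def provider(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            out = dict(record)                      # copy, leave the input untouched
            out[dst] = handler(str(record[src]))    # call the wrapped third-party app
            yield out
    return provider


# `str.upper` stands in for a real third-party model call.
uppercase_provider = make_provider(str.upper)

rows = [{"text": "hello"}, {"text": "world"}]
results = list(uppercase_provider(rows))
```

Because the provider is a generator, records are processed one at a time and the input iterator is never fully loaded into memory.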
If you wish to use several third-party providers together over your data iterators, it is recommended to adopt the AREkit
framework as a no-string solution for deploying pipelines that support batching mode.
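
To make the idea of a batched pipeline concrete, here is a plain-Python sketch. It deliberately does not use the AREkit API: `iter_batches`, `run_pipeline`, and the two toy steps are hypothetical names chosen for illustration, and each step is assumed to take a list of dict records and return a list of the same length.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
BatchStep = Callable[[List[Record]], List[Record]]


def iter_batches(items: Iterable, batch_size: int) -> Iterator[list]:
    """Group any iterable into lists of at most `batch_size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_pipeline(records: Iterable[Record],
                 steps: List[BatchStep],
                 batch_size: int = 2) -> Iterator[Record]:
    """Pass every batch through all steps in order, then yield its records."""
    for batch in iter_batches(records, batch_size):
        for step in steps:
            batch = step(batch)
        yield from batch


# Two toy steps standing in for third-party providers.
def add_length(batch: List[Record]) -> List[Record]:
    return [{**r, "length": len(str(r["text"]))} for r in batch]


def add_upper(batch: List[Record]) -> List[Record]:
    return [{**r, "upper": str(r["text"]).upper()} for r in batch]


data = [{"text": "a"}, {"text": "bb"}, {"text": "ccc"}]
output = list(run_pipeline(data, [add_length, add_upper], batch_size=2))
```

Batching matters mostly for providers that call a model or a remote API, where handling several records per call amortizes the per-call overhead.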
- bulk-chain -- framework for reasoning over your tabular data rows with any provided LLM
- bulk-ner -- framework for quickly binding third-party models for entity extraction from cells of long tabular data
- bulk-translate -- framework for translating a massive stream of texts with native support for pre-annotated fixed spans that are invariant for the translator
- AREkit pipelines -- toolkit for handling your textual data iterators with various NLP providers