
🔍 Haystack is an open-source NLP framework for interacting with your data using Transformer models and LLMs (GPT-4, ChatGPT, and the like). Haystack offers production-ready tools for quickly building complex question answering, semantic search, and text generation applications, and more.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Haystack

Haystack is an end-to-end NLP framework for building applications powered by LLMs, Transformer models, vector search, and more. Whether you want to perform question answering, generate answers, search documents semantically, or build tools capable of complex decision making and query resolution, you can use state-of-the-art NLP models with Haystack to build end-to-end applications for your use case.

Core Concepts

🏃‍♀️ Pipelines: A Pipeline is the standard Haystack structure. It connects to your data and performs the NLP tasks you define on it. Data in a Pipeline flows from one Node to the next, and you define how the Nodes interact with each other and how each Node passes data to the next.

An example pipeline would consist of one Retriever Node and one Reader Node. When the pipeline runs with a query, the Retriever first retrieves the documents relevant to the query and then the Reader extracts the final answer.
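The Retriever-then-Reader flow above can be sketched in plain Python. This is a toy illustration of the concept only, not Haystack's Pipeline API: the classes `ToyRetriever`, `ToyReader`, and `ToyPipeline` are hypothetical, and both Nodes use simple word overlap where Haystack would use real retrieval and language models.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Document:
    content: str


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


class ToyRetriever:
    """Hypothetical Retriever Node: keeps the top_k documents sharing the most words with the query."""

    def __init__(self, documents: list[Document], top_k: int = 2):
        self.documents = documents
        self.top_k = top_k

    def run(self, query: str) -> dict:
        q = tokens(query)
        ranked = sorted(self.documents, key=lambda d: len(q & tokens(d.content)), reverse=True)
        return {"query": query, "documents": ranked[: self.top_k]}


class ToyReader:
    """Hypothetical Reader Node: extracts the single sentence that best matches the query."""

    def run(self, query: str, documents: list[Document]) -> dict:
        q = tokens(query)
        sentences = [s for d in documents for s in re.split(r"(?<=[.!?])\s+", d.content) if s]
        answer = max(sentences, key=lambda s: len(q & tokens(s)))
        return {"query": query, "documents": documents, "answer": answer}


class ToyPipeline:
    """Runs Nodes in order, passing each Node's output dict as keyword arguments to the next."""

    def __init__(self, nodes: list):
        self.nodes = nodes

    def run(self, query: str) -> dict:
        data = {"query": query}
        for node in self.nodes:
            data = node.run(**data)
        return data


docs = [
    Document("Berlin is the capital of Germany. It has many museums."),
    Document("Paris is the capital of France."),
    Document("The Rhine flows through Germany."),
]
pipeline = ToyPipeline([ToyRetriever(docs, top_k=2), ToyReader()])
result = pipeline.run("What is the capital of Germany?")
print(result["answer"])  # Berlin is the capital of Germany.
```

The Retriever narrows the whole collection down to a few candidates cheaply, so the more expensive Reader only has to look at those.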

⚛️ Nodes: Each Node does one thing, such as preprocessing documents, retrieving documents, or using language models to answer questions.
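As an example of a single-purpose preprocessing Node, the sketch below splits texts into overlapping word windows. It is a hypothetical stand-in, not Haystack's own preprocessing component; the class name `ToyWordSplitter` and its `split_length` / `split_overlap` parameters are illustrative assumptions.

```python
from __future__ import annotations


class ToyWordSplitter:
    """Hypothetical preprocessing Node: splits each text into overlapping word windows."""

    def __init__(self, split_length: int = 5, split_overlap: int = 2):
        if not 0 <= split_overlap < split_length:
            raise ValueError("split_overlap must be >= 0 and smaller than split_length")
        self.split_length = split_length
        self.split_overlap = split_overlap

    def split(self, text: str) -> list[str]:
        words = text.split()
        step = self.split_length - self.split_overlap  # how far each window moves forward
        chunks = []
        for start in range(0, len(words), step):
            chunks.append(" ".join(words[start : start + self.split_length]))
            if start + self.split_length >= len(words):
                break  # this window already reached the last word
        return chunks

    def run(self, texts: list[str]) -> dict:
        return {"documents": [chunk for text in texts for chunk in self.split(text)]}


splitter = ToyWordSplitter(split_length=5, split_overlap=2)
print(splitter.run(["a b c d e f g h"]))
# {'documents': ['a b c d e', 'd e f g h']}
```

The overlap keeps a sentence that straddles a window boundary visible in both chunks, so a later Retriever is less likely to miss it.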

🕵️ Agent: (since 1.15) An Agent is a component powered by an LLM, such as GPT-3. It decides on the next best course of action to reach the result of a query, using the Tools available to it. While a Pipeline has a clear start and end, an Agent can decide whether the query has been resolved. It can also use a Pipeline as a Tool.

🛠️ Tools: You can think of a Tool as an expert that does one thing really well, such as a calculator that is good at mathematics or a WebRetriever that is good at retrieving pages from the internet. An Agent uses Tools to resolve complex queries, and a Haystack Node or Pipeline can also serve as a Tool.
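The decide-act-observe loop that distinguishes an Agent from a fixed Pipeline can be sketched as follows. This is not Haystack's Agent API: `toy_decide` is a hypothetical rule-based stand-in for the LLM's reasoning step, and the two Tools (`calculator`, `lookup`) are deliberately tiny illustrations.

```python
from __future__ import annotations

import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
NUMBER = r"-?\d+(?:\.\d+)?"


def calculator(expression: str) -> str:
    """Tool: evaluates a single binary operation such as '7 * 3'."""
    match = re.fullmatch(rf"\s*({NUMBER})\s*([-+*/])\s*({NUMBER})\s*", expression)
    if match is None:
        return "error: unsupported expression"
    a, op, b = float(match[1]), match[2], float(match[3])
    if op == "/" and b == 0:
        return "error: division by zero"
    result = OPS[op](a, b)
    return str(int(result)) if result.is_integer() else str(result)


FACTS = {"days in a week": "7"}


def lookup(key: str) -> str:
    """Tool: looks up a fact in a tiny hard-coded knowledge base."""
    return FACTS.get(key.lower().strip(), "unknown")


def toy_decide(query: str, observations: list[str]) -> tuple[str, str]:
    """Hypothetical stand-in for the LLM: picks the next action from what it has seen so far."""
    if not observations:
        return "Lookup", "days in a week"
    if len(observations) == 1:
        weeks = re.search(r"(\d+) weeks", query)[1]
        return "Calculator", f"{observations[0]} * {weeks}"
    return "Final Answer", observations[-1]


def run_agent(query, tools, decide, max_steps=5):
    """Loop: ask the decider for an action, run the chosen Tool, record the observation."""
    observations = []
    for _ in range(max_steps):
        action, action_input = decide(query, observations)
        if action == "Final Answer":  # the decider judges the query resolved
            return action_input
        observations.append(tools[action](action_input))
    return None  # gave up after max_steps


TOOLS = {"Calculator": calculator, "Lookup": lookup}
print(run_agent("How many days are in 3 weeks?", TOOLS, toy_decide))  # 21
```

The `max_steps` cap matters in practice: because the decider, not the code, chooses when to stop, a loop without a cap could run indefinitely.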

🗂️ DocumentStores: A DocumentStore is a database where you store your text data for Haystack to access. Haystack offers DocumentStores for Elasticsearch, OpenSearch, Weaviate, Pinecone, FAISS, and more. For the full list of available DocumentStores, see our documentation.
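Real DocumentStores wrap databases such as Elasticsearch or FAISS, but the basic contract is simple: write documents, count them, and fetch them back by metadata. The sketch below shows that contract with a plain dict. `ToyInMemoryDocumentStore` and its method names are hypothetical and are not the signatures of Haystack's DocumentStore classes.

```python
from __future__ import annotations


class ToyInMemoryDocumentStore:
    """Hypothetical DocumentStore: keeps documents in a dict keyed by id."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def write(self, documents: list[dict]) -> None:
        for doc in documents:
            self._docs[doc["id"]] = doc  # writing an existing id overwrites it

    def count(self) -> int:
        return len(self._docs)

    def filter(self, **meta) -> list[dict]:
        """Return documents whose metadata matches every given key/value pair."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc["meta"].get(key) == value for key, value in meta.items())
        ]


store = ToyInMemoryDocumentStore()
store.write([
    {"id": "1", "content": "How to reset your password.", "meta": {"source": "faq"}},
    {"id": "2", "content": "Installation guide.", "meta": {"source": "manual"}},
])
store.write([{"id": "1", "content": "How to reset your password (updated).", "meta": {"source": "faq"}}])
print(store.count())  # 2
print([d["content"] for d in store.filter(source="faq")])  # ['How to reset your password (updated).']
```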

What to Build with Haystack

  • Perform Question Answering in natural language to find granular answers in your documents.
  • Generate answers or content such as articles, tweets, product descriptions, and more with LLMs; the sky is the limit 🚀
  • Perform semantic search and retrieve documents according to meaning.
  • Build applications that make complex decisions to answer complex queries, such as systems that resolve complex customer queries or perform knowledge search across many disconnected resources.
  • Use off-the-shelf models or fine-tune them to your data.
  • Use user feedback to evaluate, benchmark, and continuously improve your models.
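Semantic search ranks documents by meaning rather than shared keywords, typically by comparing embedding vectors. The sketch below uses cosine similarity over made-up 3-dimensional vectors; in a real system an embedding model would produce the vectors, and the function names here are hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_embedding, doc_embeddings, top_k=2):
    """Rank documents by embedding similarity to the query, not by shared keywords."""
    ranked = sorted(
        doc_embeddings,
        key=lambda doc: cosine_similarity(query_embedding, doc_embeddings[doc]),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up vectors for illustration; a real system would compute them with an embedding model.
DOC_EMBEDDINGS = {
    "How to reset a forgotten password": [0.9, 0.1, 0.0],
    "Shipping times for international orders": [0.0, 0.2, 0.9],
    "Changing your account login credentials": [0.8, 0.3, 0.1],
}
query_embedding = [0.85, 0.2, 0.05]  # pretend embedding of "I can't sign in"
print(semantic_search(query_embedding, DOC_EMBEDDINGS))
```

Note that the pretend query shares no words with either matching title; they rank highly only because their vectors point in a similar direction.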

Features

  • Latest models: Haystack allows you to use and compare models available from OpenAI, Cohere and Hugging Face, as well as your own local models. Use the latest LLMs or Transformer-based models (for example: BERT, RoBERTa, MiniLM).
  • Modular: Multiple choices to fit your tech stack and use case: a wide choice of DocumentStores to store your data, file conversion tools, and more.
  • Open: Integrated with Hugging Face's model hub, OpenAI, Cohere and various Azure services.
  • Scalable: Scale to millions of docs using retrievers and production-scale components like Elasticsearch and a FastAPI REST API.
  • End-to-End: All tooling in one place: file conversion, cleaning, splitting, training, eval, inference, labeling, and more.
  • Customizable: Fine-tune models to your domain or implement your custom Nodes.
  • Continuous Learning: Collect new training data from user feedback in production & improve your models continuously.

Resources

📒 Docs: Components, Pipeline Nodes, Guides, API Reference
💾 Installation: How to install Haystack
🎓 Tutorials: See what Haystack can do with our Notebooks & Scripts
🎉 Haystack Extras: A repository that lists extra Haystack packages and components that can be installed separately
🔰 Demos: A repository containing Haystack demo applications with Docker Compose and a REST API
🖖 Community: Discord, Twitter, Stack Overflow, GitHub Discussions
💙 Contributing: We welcome all contributions!
📊 Benchmarks: Speed and accuracy of Retrievers, Readers, and DocumentStores
🔭 Roadmap: Public roadmap of Haystack
📰 Blog: Learn about the latest with Haystack and NLP
☎️ Jobs: We're hiring! Have a look at our open positions

💾 Installation

For a detailed installation guide, see the official documentation. There you'll find instructions for custom installations, including on Windows and Apple Silicon.

Basic Installation

Use pip to install a basic version of Haystack's latest release:

pip install farm-haystack

This command installs everything needed for basic Pipelines that use an in-memory DocumentStore.
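To confirm which release pip installed, you can read the package metadata of the `farm-haystack` distribution. This small sketch uses only the standard library's `importlib.metadata` and does not import Haystack itself; the helper name `installed_haystack_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_haystack_version():
    """Return the installed farm-haystack version string, or None if it is not installed."""
    try:
        return version("farm-haystack")
    except PackageNotFoundError:
        return None


haystack_version = installed_haystack_version()
print(haystack_version or "farm-haystack is not installed")
```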

Full Installation

To use more advanced features, like certain DocumentStores, FileConverters, OCR, or Ray, you need to install further dependencies. The following command installs the latest release of Haystack and all its dependencies:

pip install 'farm-haystack[all]' ## or 'all-gpu' for the GPU-enabled dependencies

If you want to try out the newest features that are not in an official release yet, you can install the unstable version from the main branch with the following command:

pip install git+https://github.com/deepset-ai/haystack.git@main#egg=farm-haystack

To make changes to the Haystack code, first clone this repo:

git clone https://github.com/deepset-ai/haystack.git

Then move into the cloned folder and install the project with pip, including the development dependencies:

cd haystack && pip install -e '.[dev]'

If you want to contribute to the Haystack repo, check our Contributor Guidelines first.

See the list of dependencies to check which ones you want to install (for example, [all], [dev], or other).

Installing the REST API

Haystack comes packaged with a REST API so that you can deploy it as a service. Run the following command from the root directory of the Haystack repo to install the REST API:

pip install rest_api/
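Once the REST API is running, clients send queries to it over HTTP. The sketch below only builds such a request with the standard library and does not send it. It assumes the server listens on `http://localhost:8000` and exposes a `POST /query` endpoint that accepts a JSON body with `query` and `params` fields; check these against the REST API documentation for your version. The `"Retriever"` key in `params` is a hypothetical node name that must match a node in your deployed pipeline.

```python
import json
import urllib.request


def build_query_request(question, params=None, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request to the assumed /query endpoint."""
    payload = {"query": question, "params": params or {}}
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "What is the capital of Germany?",
    params={"Retriever": {"top_k": 5}},  # hypothetical node name; must match your pipeline
)
print(request.get_method(), request.full_url)

# Sending it requires a running REST API server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```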

You can find out more about our PyPI package on our PyPI page.

🔰 Demos

Some of our hosted demos, along with instructions for running them locally, are available in our haystack-demos repository:

💫 Reduce Hallucinations with Retrieval Augmentation - Generative QA with LLMs

🐥 Should I follow? - Summarizing tweets with LLMs

🌎 Explore The World - Extractive Question Answering

🖖 Community

If you have a feature request or a bug report, feel free to open an issue on GitHub. We check issues regularly, so you can expect a quick response. If you'd like to discuss a topic or get more general advice on making Haystack work for your project, start a thread in GitHub Discussions or on our Discord channel. We also check Twitter and Stack Overflow.

💙 Contributing

We are very open to the community's contributions - be it a quick fix of a typo, or a completely new feature! You don't need to be a Haystack expert to provide meaningful improvements. To learn how to get started, check out our Contributor Guidelines first.

Who Uses Haystack

Here's a list of projects and companies using Haystack. Want to add yours? Open a PR, add it to the list and let the world know that you use Haystack!