Simple, type-safe access to the ChatNoir search API.
Working with PyTerrier? Check out the chatnoir-pyterrier package.
Install the package from PyPI:
pip install chatnoir-api
The ChatNoir API offers two main features: search with BM25F and retrieving document contents.
You can use our Python client to search for documents.
The results object is an iterable wrapper around the search results that handles pagination for you.
List-style indexing is supported to access individual results or sub-lists of results:
from chatnoir_api.v1 import search
results = search("python library")
top10_results = results[:10]
print(top10_results)
result_1234 = results[1234]
print(result_1234)
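To illustrate how a paginated result list can support list-style indexing, here is a minimal, self-contained sketch. It is not the chatnoir_api implementation: the class `LazyPagedResults`, the `fetch_page` callback, and the fake backend are all hypothetical names introduced for this example. The idea is that a page is only requested when one of its items is first accessed.

```python
from typing import Callable, Dict, Iterator, List, Union


class LazyPagedResults:
    """Hypothetical lazily paginated result list (not the chatnoir_api class)."""

    def __init__(
        self,
        fetch_page: Callable[[int, int], List[str]],
        total: int,
        page_size: int = 10,
    ) -> None:
        self._fetch_page = fetch_page  # called as fetch_page(offset, size)
        self._total = total
        self._page_size = page_size
        self._pages: Dict[int, List[str]] = {}  # page number -> cached items

    def __len__(self) -> int:
        return self._total

    def _item(self, index: int) -> str:
        # Map the flat index to a page and fetch that page only once.
        page, offset = divmod(index, self._page_size)
        if page not in self._pages:
            self._pages[page] = self._fetch_page(
                page * self._page_size, self._page_size
            )
        return self._pages[page][offset]

    def __getitem__(self, key: Union[int, slice]) -> Union[str, List[str]]:
        if isinstance(key, slice):
            # slice.indices() clamps the slice to the known result count.
            return [self._item(i) for i in range(*key.indices(self._total))]
        if key < 0:
            key += self._total
        if not 0 <= key < self._total:
            raise IndexError("result index out of range")
        return self._item(key)

    def __iter__(self) -> Iterator[str]:
        return (self._item(i) for i in range(self._total))


# Fake backend standing in for a search API; it records requested offsets.
requested_offsets: List[int] = []


def fake_fetch_page(offset: int, size: int) -> List[str]:
    requested_offsets.append(offset)
    return [f"doc-{i}" for i in range(offset, min(offset + size, 5000))]


results = LazyPagedResults(fake_fetch_page, total=5000, page_size=10)
top10_results = results[:10]    # fetches only the first page
result_1234 = results[1234]     # fetches only the page containing item 1234
print(top10_results)
print(result_1234)
print(requested_offsets)
```

In this sketch, only two pages are requested from the fake backend, no matter how large the full result list is.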
To limit your search requests to a single index (e.g., ClueWeb22 category B), set the index parameter like this:
from chatnoir_api.v1 import search
results = search("python library", index="clueweb22/b")
To search for phrases, use the search_phrases function in the same way as normal search:
from chatnoir_api.v1 import search_phrases
results = search_phrases("python library")
The public, shared, default API key comes with a limited request budget. To use the ChatNoir API more extensively, please request a dedicated API key.
Then, use the api_key parameter to add it to your requests like this:
results = search("python library", api_key="<YOUR_API_KEY>")
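To avoid hard-coding a dedicated key in source files, one option is to read it from an environment variable. The following is a minimal sketch under stated assumptions: the helper names `load_api_key` and `search_with_env_key` are hypothetical, and reusing the variable name `CHATNOIR_API_KEY` at runtime is a convention chosen for this example, not documented library behavior.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "CHATNOIR_API_KEY",  # assumed variable name for this example
) -> Optional[str]:
    """Return the API key from the environment, or None if unset or blank."""
    if env is None:
        env = os.environ
    key = env.get(name, "").strip()
    return key or None


def search_with_env_key(query: str):
    """Hypothetical wrapper: search with a dedicated key when one is set."""
    # Imported lazily so the helper above works without the package installed.
    from chatnoir_api.v1 import search

    key = load_api_key()
    if key is None:
        # No dedicated key configured: fall back to the shared default key.
        return search(query)
    return search(query, api_key=key)
```

Calling `search_with_env_key("python library")` would then use the dedicated key whenever the variable is set and the shared default key otherwise.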
To generate text with the ChatNoir Chat API, you need to request an API key from the admins. With your API key, you can chat with the cat like this:
from chatnoir_api.chat import ChatNoirChatClient
chat_client = ChatNoirChatClient(api_key="<API_KEY>")
response = chat_client.chat("how are you?")
Often, the title and ID of a document are not enough to effectively re-rank a list of search results.
To retrieve the full content or plain text for a given document, you can use the cache_contents helper function.
The cache_contents function expects a ChatNoir-internal UUID, shorthand UUID, or a TREC ID and the index from which to retrieve the document.
You can retrieve a document by its TREC ID like this:
from chatnoir_api import cache_contents
contents = cache_contents(
"clueweb09-en0051-90-00849",
index="clueweb09",
)
print(contents)
plain_contents = cache_contents(
"clueweb09-en0051-90-00849",
index="clueweb09",
plain=True,
)
print(plain_contents)
For newer ChatNoir versions, you can also retrieve a document by its ChatNoir-internal short UUID like this:
from chatnoir_api import cache_contents, ShortUUID
contents = cache_contents(
ShortUUID("MzOlTIayX9ub7c13GLPr_g"),
index="clueweb22/b",
)
print(contents)
plain_contents = cache_contents(
ShortUUID("MzOlTIayX9ub7c13GLPr_g"),
index="clueweb22/b",
plain=True,
)
print(plain_contents)
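As an illustration of why the plain text is useful for re-ranking, here is a small sketch that re-scores documents by query-term overlap. This is a deliberately simple stand-in for a real re-ranker, not a method provided by this package. The names `tokenize` and `rerank_by_term_overlap` are hypothetical, and the text fetcher is passed in as a callable so the example runs without network access.

```python
import re
from typing import Callable, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rerank_by_term_overlap(
    query: str,
    doc_ids: Sequence[str],
    fetch_plain_text: Callable[[str], str],
) -> List[Tuple[str, float]]:
    """Score each document by the fraction of its tokens that are query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id in doc_ids:
        tokens = tokenize(fetch_plain_text(doc_id))
        matches = sum(1 for token in tokens if token in query_terms)
        score = matches / len(tokens) if tokens else 0.0
        scored.append((doc_id, score))
    # Highest score first; the sort is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up texts standing in for retrieved plain-text contents.
fake_texts = {
    "doc-a": "Cooking recipes for pasta and pizza.",
    "doc-b": "A Python library is a reusable collection of Python modules.",
    "doc-c": "The public library opens at nine.",
}

ranking = rerank_by_term_overlap(
    "python library", list(fake_texts), fake_texts.__getitem__
)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

With the real client, the fetcher could be something like `lambda doc_id: cache_contents(doc_id, index="clueweb09", plain=True)`, assuming that call returns the plain text as a string.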
To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages:
pip install build setuptools wheel
(On most systems, these packages are already pre-installed.)
Install the package and test dependencies:
pip install -e .[tests]
Configure the API keys for testing:
export CHATNOIR_API_KEY="<API_KEY>"
export CHATNOIR_API_KEY_CHAT="<API_KEY>"
Verify your changes against the test suite:
ruff check . # Code format and lint
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
Please also add tests for your newly developed code.
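If you prefer to run the four checks listed above in one go, a small helper script along these lines may be convenient. This is only a sketch: the function `run_checks` is a hypothetical name and not part of the repository. It runs each command in order and stops at the first failure.

```python
import subprocess
import sys
from typing import List, Sequence

# The same commands as listed above, in the same order.
CHECKS: List[List[str]] = [
    ["ruff", "check", "."],                         # Code format and lint
    ["mypy", "."],                                  # Static typing
    ["bandit", "-c", "pyproject.toml", "-r", "."],  # Security
    ["pytest", "."],                                # Unit tests
]


def run_checks(commands: Sequence[Sequence[str]]) -> int:
    """Run the commands in order; return the first non-zero exit code, else 0."""
    for command in commands:
        print("$", " ".join(command))
        returncode = subprocess.run(list(command)).returncode
        if returncode != 0:
            return returncode
    return 0
```

Saved as a script, it could end with `sys.exit(run_checks(CHECKS))` so that the exit code reflects the first failing check.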
Wheels for this package can be built with:
python -m build
If you hit any problems using this package, please file an issue. We're happy to help!
This repository is released under the MIT license.