In this example, we run a text similarity search over a set of GitHub issues to predict the labels for newly created tickets.
The data is gathered from the `quarkusio/quarkus` repository, which provides a dataset containing the `title` and `body` of each reported issue together with its labels (e.g. `area/devmode` or `kind/bug`).
SBERT sentence transformers are used to compute the embeddings, which are stored in a vector database (Qdrant in our case).
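
To make the idea concrete, here is a minimal, self-contained sketch of label prediction by similarity search. It is not the code from the notebooks: a toy bag-of-words vector stands in for the SBERT embedding, a plain Python list stands in for Qdrant, and the issue texts, the `area/docs` label, and the names `embed`, `cosine`, and `predict_labels` are all made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an SBERT encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical labeled issues (invented data, not taken from the real dataset).
ISSUES = [
    ("dev mode hot reload fails after change", ["area/devmode", "kind/bug"]),
    ("dev mode does not restart on config change", ["area/devmode"]),
    ("add documentation for hibernate extension", ["area/docs"]),
]

# In-memory "index": an assumption standing in for the vector database.
INDEX = [(embed(text), labels) for text, labels in ISSUES]


def predict_labels(text, k=2):
    """Return the labels of the k most similar issues, most frequent first."""
    query = embed(text)
    ranked = sorted(INDEX, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, labels in ranked[:k] for label in labels)
    return [label for label, _ in votes.most_common()]


if __name__ == "__main__":
    print(predict_labels("hot reload fails in dev mode"))
```

The shape of the pipeline stays the same with real components: embed the labeled issues once, store the vectors, then embed each new ticket and let the labels of its nearest neighbours vote.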
The code is split into several Jupyter notebooks that are meant to be run in order: