A text embedding encoder for Indonesian that uses IndoBERT as its model.
pip install indobert-embedding
from indo_bert_embedding import get_embedding
embedding = get_embedding("Saya belajar NLP di Neuversity.")
To get the similarity distance between two texts:
from indo_bert_embedding import text_similarity
distance = text_similarity("Saya belajar NLP di Neuversity.", "Aku belajar NLP di Universitas Indonesia.")
text_similarity uses cosine similarity to calculate the distance.
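For readers unfamiliar with the measure, here is a minimal sketch of cosine similarity on two plain vectors. It is not the package's actual implementation: the helper names cosine_similarity and cosine_distance are hypothetical, and whether text_similarity returns the similarity itself or a distance such as 1 - similarity is an assumption that should be checked against the package.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One common distance convention, 1 - similarity (assumed, not confirmed)."""
    return 1.0 - cosine_similarity(a, b)
```

Under this convention, vectors pointing in the same direction have a similarity close to 1 and a distance close to 0, while orthogonal vectors have a similarity of 0. In practice the two inputs would be the embeddings returned by get_embedding for each text.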
@inproceedings{koto2020indolem,
title={IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP},
author={Fajri Koto and Afshin Rahimi and Jey Han Lau and Timothy Baldwin},
booktitle={Proceedings of the 28th COLING},
year={2020}
}