- https://github.com/BM-K/KoSentenceBERT-ETRI
- https://github.com/BM-K/Sentence-Embedding-Is-All-You-Need
- https://github.com/BM-K/Sentence-Embedding-Is-All-You-Need/tree/main/KoSBERT
- https://github.com/UKPLab/sentence-transformers
- https://github.com/snunlp/KR-SBERT
- Quickstart: https://github.com/UKPLab/sentence-transformers/blob/master/docs/quickstart.md
- Training overview: https://github.com/UKPLab/sentence-transformers/blob/master/docs/training/overview.md
- Quora duplicate questions examples: https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/quora_duplicate_questions
MultipleNegativesRankingLoss is especially suitable for information retrieval / semantic search. A nice advantage of MultipleNegativesRankingLoss is that it only requires positive pairs, i.e., we only need examples of duplicate questions; the other positives in a batch implicitly serve as negatives.
- https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/quora_duplicate_questions/training_MultipleNegativesRankingLoss.py
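The in-batch negative idea can be sketched in plain numpy: score every anchor against every positive in the batch, then apply cross-entropy with the diagonal as the correct class. This is a minimal illustration of the objective, not the library's implementation; the `scale` factor mirrors sentence-transformers' default of 20, and the function name is illustrative.

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """Sketch of in-batch MultipleNegativesRankingLoss.

    anchors, positives: (batch, dim) L2-normalized embeddings,
    where positives[i] is the true pair for anchors[i]; every
    other positives[j] in the batch acts as a negative.
    """
    # Scaled cosine-similarity score matrix: (batch, batch).
    scores = scale * anchors @ positives.T
    # Softmax cross-entropy with the diagonal as the target class.
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When anchors and positives line up, the diagonal dominates and the loss is near zero; shuffling the positives against their anchors drives it up.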
Pretrained models
- https://www.sbert.net/docs/pretrained_models.html
- bert-base-uncased: https://huggingface.co/Graphcore/bert-base-uncased
- https://wikidocs.net/154530
- ko-sbert-multitask: https://huggingface.co/jhgan/ko-sbert-multitask
- MiniLM paper: https://arxiv.org/pdf/2002.10957.pdf
- all-MiniLM-L6-v2: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Text summarization
- https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/text-summarization/text-summarization.py
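The linked example does extractive summarization with sentence embeddings and LexRank. A minimal numpy sketch of the underlying idea, using simple degree centrality over the cosine-similarity graph instead of the full LexRank power iteration (function name is illustrative):

```python
import numpy as np

def central_sentences(embeddings, top_k=1):
    """Rank sentences by degree centrality over cosine similarities.

    embeddings: (n_sentences, dim) array of sentence embeddings.
    Returns indices of the top_k most central sentences, i.e. the
    ones most similar, on average, to the rest of the document.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T                 # pairwise cosine similarities
    centrality = sim.sum(axis=1)        # degree centrality per sentence
    return np.argsort(-centrality)[:top_k]
```

Feed it embeddings from any of the models above (e.g. ko-sbert-multitask for Korean) and take the top-k indices as the summary sentences.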
Datasets
- PAQ: https://github.com/facebookresearch/PAQ
- bert-base-multilingual-uncased: https://huggingface.co/bert-base-multilingual-uncased
- Multilingual BERT language list: https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages
Unsupervised learning
- https://www.sbert.net/examples/unsupervised_learning/README.html
- https://aclanthology.org/2021.findings-emnlp.23/
- SNCSE: https://arxiv.org/pdf/2201.05979.pdf
- A Repository of Conversational Datasets: https://arxiv.org/pdf/1904.06472.pdf