transformers2learn

Links

Quickstart: https://github.com/UKPLab/sentence-transformers/blob/master/docs/quickstart.md
Training overview: https://github.com/UKPLab/sentence-transformers/blob/master/docs/training/overview.md
Quora duplicate questions training examples: https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/quora_duplicate_questions

MultipleNegativesRankingLoss is especially suitable for Information Retrieval / Semantic Search. A nice advantage is that it only requires positive pairs, i.e., we only need examples of duplicate questions; the other pairs in the same batch act as negatives.
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/quora_duplicate_questions/training_MultipleNegativesRankingLoss.py
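A minimal training sketch with the classic sentence-transformers `fit` API, assuming a small list of duplicate-question pairs (the example texts, base model, and hyperparameters below are placeholders, not taken from the linked script):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Positive pairs only: each InputExample holds two questions known to be duplicates.
# Within a batch, the questions from every other pair serve as in-batch negatives.
train_examples = [
    InputExample(texts=["How can I learn Python quickly?",
                        "What is the fastest way to learn Python?"]),
    InputExample(texts=["How do I reset my password?",
                        "What are the steps to reset a password?"]),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any SBERT base model works here
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

# Larger batch sizes give more in-batch negatives and usually better retrieval quality.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```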

Pretrained models: https://www.sbert.net/docs/pretrained_models.html
bert-base-uncased: https://huggingface.co/Graphcore/bert-base-uncased https://wikidocs.net/154530

ko-sbert-multitask https://huggingface.co/jhgan/ko-sbert-multitask

MiniLM paper: https://arxiv.org/pdf/2002.10957.pdf
all-MiniLM-L6-v2: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
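A quickstart-style sketch of using all-MiniLM-L6-v2 for semantic similarity (the sentences are only illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
]

# Encode to 384-dimensional embeddings and compare with cosine similarity.
embeddings = model.encode(sentences, convert_to_tensor=True)
cosine_scores = util.cos_sim(embeddings, embeddings)
print(cosine_scores)
```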

text-summarization https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/text-summarization/text-summarization.py
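A simpler centroid-similarity sketch of extractive summarization with sentence embeddings (not the linked script's exact method; the model name and helper are assumptions for illustration):

```python
from sentence_transformers import SentenceTransformer, util

def extractive_summary(sentences, top_k=3, model_name="all-MiniLM-L6-v2"):
    """Pick the top_k sentences most similar to the document's mean embedding."""
    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences, convert_to_tensor=True)

    # Score each sentence against the centroid of all sentence embeddings.
    centroid = embeddings.mean(dim=0, keepdim=True)
    scores = util.cos_sim(embeddings, centroid).squeeze(1)

    # Return the highest-scoring sentences in their original document order.
    top_idx = scores.argsort(descending=True)[:top_k].tolist()
    return [sentences[i] for i in sorted(top_idx)]
```

Usage: split a document into sentences (e.g., with nltk's sentence tokenizer) and call `extractive_summary(sentences, top_k=3)`.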

Datasets: PAQ https://github.com/facebookresearch/PAQ
bert-base-multilingual-uncased: https://huggingface.co/bert-base-multilingual-uncased
List of languages: https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages

Unsupervised learning: https://www.sbert.net/examples/unsupervised_learning/README.html https://aclanthology.org/2021.findings-emnlp.23/
SNCSE: https://arxiv.org/pdf/2201.05979.pdf
A Repository of Conversational Datasets: https://arxiv.org/pdf/1904.06472.pdf
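For reference, a SimCSE-style unsupervised sketch: pair each unlabeled sentence with itself, so dropout noise produces two different encodings that act as positives for MultipleNegativesRankingLoss (sentences, base checkpoint, and settings below are placeholders; see the sbert unsupervised-learning README for the maintained recipes):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

# Build a sentence encoder from a plain transformer checkpoint plus mean pooling.
word_embedding_model = models.Transformer("bert-base-uncased", max_seq_length=64)
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling])

# Unlabeled sentences only: each example pairs a sentence with itself; dropout
# makes the two encodings differ, giving "free" positive pairs.
train_sentences = ["The first unlabeled sentence.", "Another unlabeled sentence."]
train_examples = [InputExample(texts=[s, s]) for s in train_sentences]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```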