intfloat's Stars
pwxcoo/chinese-xinhua
:orange_book: Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
catboost/catboost
A fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other machine learning tasks, with APIs for Python, R, Java, and C++. Supports computation on CPU and GPU.
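For reference, a minimal sketch of training a CatBoost classifier (assumes `pip install catboost`; the toy data and hyperparameters are illustrative, not recommendations):

```python
# Fit a small CatBoost classifier on toy data and predict one sample.
from catboost import CatBoostClassifier

X = [[1, 4], [2, 5], [3, 6], [4, 7]]
y = [0, 0, 1, 1]

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)
print(model.predict([[2, 5]]))
```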
AntixK/PyTorch-VAE
A Collection of Variational Autoencoders (VAE) in PyTorch.
mozillazg/python-pinyin
Convert Chinese characters to pinyin (pypinyin).
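A minimal usage sketch of pypinyin (assumes `pip install pypinyin`; the sample string and chosen styles are illustrative):

```python
# Convert a Chinese string to pinyin, with and without tone numbers.
from pypinyin import lazy_pinyin, pinyin, Style

print(lazy_pinyin("新华字典"))                # ['xin', 'hua', 'zi', 'dian']
print(pinyin("新华字典", style=Style.TONE3))  # [['xin1'], ['hua2'], ['zi4'], ['dian3']]
```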
facebookresearch/moco
PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
km1994/nlp_paper_study
This repository mainly collects study notes on top-conference papers relevant to NLP algorithm engineers.
princeton-nlp/SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
yuchenlin/rebiber
A simple tool to update bib entries with their official information (e.g., from DBLP or the ACL Anthology).
dragen1860/MAML-Pytorch
Elegant PyTorch implementation of the paper Model-Agnostic Meta-Learning (MAML).
facebookresearch/swav
PyTorch implementation of SwAV https://arxiv.org/abs/2006.09882
pykeen/pykeen
🤖 A Python library for learning and evaluating knowledge graph embeddings
DeepGraphLearning/KnowledgeGraphEmbedding
facebookresearch/BLINK
Entity linking solution.
shangjingbo1226/AutoPhrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
tangjianpku/LINE
LINE: Large-scale information network embedding
castorini/anserini
Anserini is a Lucene toolkit for reproducible information retrieval research
facebookresearch/KILT
Library for Knowledge Intensive Language Tasks
XiaoMi/MiNLP
XiaoMi Natural Language Processing Toolkits
facebookresearch/GENRE
Autoregressive Entity Retrieval
google-research/xtreme
XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 typologically diverse languages and includes nine tasks.
princeton-nlp/DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; [EMNLP 2021] Phrase Retrieval Learns Passage Retrieval, Too. https://arxiv.org/abs/2012.12624
wmayner/pyemd
Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric
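A minimal sketch of computing the Earth Mover's Distance with pyemd (assumes `pip install pyemd`; histograms and the pairwise ground-distance matrix are illustrative and must be float64 NumPy arrays):

```python
# Compute EMD between two 2-bin histograms given a ground-distance matrix.
import numpy as np
from pyemd import emd

first = np.array([0.0, 1.0], dtype=np.float64)
second = np.array([5.0, 3.0], dtype=np.float64)
distance = np.array([[0.0, 0.5],
                     [0.5, 0.0]], dtype=np.float64)

print(emd(first, second, distance))
```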
bojone/BERT-whitening
Simple vector whitening to improve sentence-embedding quality.
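A minimal sketch of the whitening idea: center the sentence embeddings and decorrelate them using an SVD of their covariance matrix. The toy data is illustrative, and the actual repository may differ in details such as dimensionality reduction:

```python
# Whiten embeddings so their sample covariance becomes (approximately) the identity.
import numpy as np

def whiten(embeddings):
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)          # d x d covariance
    u, s, _ = np.linalg.svd(cov)
    kernel = u @ np.diag(1.0 / np.sqrt(s))     # whitening transform W = U * S^(-1/2)
    return (embeddings - mu) @ kernel

vectors = np.random.randn(100, 8)              # stand-in sentence embeddings
print(np.cov(whiten(vectors).T).round(2))      # close to the identity matrix
```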
ChunyuanLI/Optimus
Optimus: the first large-scale pre-trained VAE language model
alexa/dialoglue
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
facebookresearch/multihop_dense_retrieval
Multi-hop dense retrieval for question answering
Eric-Wallace/interpretability-tutorial-emnlp2020
Materials for the EMNLP 2020 Tutorial on "Interpreting Predictions of NLP Models"
CZWin32768/XLM-Align
facebookresearch/bitext-lexind
Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear projections to align monolingual word embedding spaces. In this paper, we show it is possible to produce much higher quality lexicons with methods that combine (1) unsupervised bitext mining and (2) unsupervised word alignment. Directly applying a pipeline that uses recent algorithms for both subproblems significantly improves induced lexicon quality, and further gains are possible by learning to filter the resulting lexical entries, with both unsupervised and semi-supervised schemes. Our final approach outperforms the state of the art on the BUCC 2020 shared task by 14 F1 points averaged over 12 language pairs, while also providing a more interpretable approach that allows for rich reasoning of word meaning in context.
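A hypothetical sketch of the two-stage pipeline described above: mine bitext, word-align it, then count aligned word pairs and keep frequent ones as lexicon entries. `mine_bitext` and `word_align` are placeholder callables standing in for the mining and alignment components, and the frequency threshold stands in for the learned filtering step; none of this is the repository's actual API:

```python
# Induce a bilingual lexicon from mined, word-aligned bitext (illustrative only).
from collections import Counter

def induce_lexicon(src_corpus, tgt_corpus, mine_bitext, word_align, min_count=3):
    pair_counts = Counter()
    # Step 1: unsupervised bitext mining yields (src_sentence, tgt_sentence) pairs.
    for src_sent, tgt_sent in mine_bitext(src_corpus, tgt_corpus):
        src_tokens, tgt_tokens = src_sent.split(), tgt_sent.split()
        # Step 2: unsupervised word alignment yields (src_index, tgt_index) links.
        for i, j in word_align(src_tokens, tgt_tokens):
            pair_counts[(src_tokens[i], tgt_tokens[j])] += 1
    # Step 3: a simple frequency filter stands in for unsupervised/semi-supervised filtering.
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}
```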