intfloat's Stars
pwxcoo/chinese-xinhua
:orange_book: Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
catboost/catboost
A fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other machine learning tasks, with APIs for Python, R, Java, and C++. Supports computation on CPU and GPU.
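For reference, a minimal sketch of training a CatBoost classifier (assumes `pip install catboost`; the toy data and hyperparameters are illustrative, not recommendations):

```python
# Fit a small CatBoost classifier on toy data and predict one sample.
from catboost import CatBoostClassifier

X = [[1, 4], [2, 5], [3, 6], [4, 7]]
y = [0, 0, 1, 1]

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)
print(model.predict([[2, 5]]))
```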
AntixK/PyTorch-VAE
A Collection of Variational Autoencoders (VAE) in PyTorch.
mozillazg/python-pinyin
Convert Chinese characters to pinyin (pypinyin).
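A minimal usage sketch of pypinyin (assumes `pip install pypinyin`; the sample string and chosen styles are illustrative):

```python
# Convert a Chinese string to pinyin, with and without tone numbers.
from pypinyin import lazy_pinyin, pinyin, Style

print(lazy_pinyin("新华字典"))                # ['xin', 'hua', 'zi', 'dian']
print(pinyin("新华字典", style=Style.TONE3))  # [['xin1'], ['hua2'], ['zi4'], ['dian3']]
```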
facebookresearch/moco
PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
km1994/nlp_paper_study
This repository mainly collects study notes on top-conference papers relevant to NLP algorithm engineers.
princeton-nlp/SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
yuchenlin/rebiber
A simple tool to update bib entries with their official information (e.g., from DBLP or the ACL Anthology).
dragen1860/MAML-Pytorch
Elegant PyTorch implementation of the paper Model-Agnostic Meta-Learning (MAML).
facebookresearch/swav
PyTorch implementation of SwAV https://arxiv.org/abs/2006.09882
pykeen/pykeen
🤖 A Python library for learning and evaluating knowledge graph embeddings
DeepGraphLearning/KnowledgeGraphEmbedding
facebookresearch/BLINK
Entity linking solution.
shangjingbo1226/AutoPhrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
tangjianpku/LINE
LINE: Large-scale information network embedding
castorini/anserini
Anserini is a Lucene toolkit for reproducible information retrieval research
facebookresearch/KILT
Library for Knowledge Intensive Language Tasks
XiaoMi/MiNLP
XiaoMi Natural Language Processing Toolkits
facebookresearch/GENRE
Autoregressive Entity Retrieval
google-research/xtreme
XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 typologically diverse languages and includes nine tasks.
princeton-nlp/DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; [EMNLP 2021] Phrase Retrieval Learns Passage Retrieval, Too. https://arxiv.org/abs/2012.12624
wmayner/pyemd
Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric
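A minimal sketch of computing the Earth Mover's Distance with pyemd (assumes `pip install pyemd`; histograms and the pairwise ground-distance matrix are illustrative and must be float64 NumPy arrays):

```python
# Compute EMD between two 2-bin histograms given a ground-distance matrix.
import numpy as np
from pyemd import emd

first = np.array([0.0, 1.0], dtype=np.float64)
second = np.array([5.0, 3.0], dtype=np.float64)
distance = np.array([[0.0, 0.5],
                     [0.5, 0.0]], dtype=np.float64)

print(emd(first, second, distance))
```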
bojone/BERT-whitening
Simple vector whitening to improve sentence-embedding quality.
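A minimal sketch of the whitening idea: center the sentence embeddings and decorrelate them using an SVD of their covariance matrix. The toy data is illustrative, and the actual repository may differ in details such as dimensionality reduction:

```python
# Whiten embeddings so their sample covariance becomes (approximately) the identity.
import numpy as np

def whiten(embeddings):
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)          # d x d covariance
    u, s, _ = np.linalg.svd(cov)
    kernel = u @ np.diag(1.0 / np.sqrt(s))     # whitening transform W = U * S^(-1/2)
    return (embeddings - mu) @ kernel

vectors = np.random.randn(100, 8)              # stand-in sentence embeddings
print(np.cov(whiten(vectors).T).round(2))      # close to the identity matrix
```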
ChunyuanLI/Optimus
Optimus: the first large-scale pre-trained VAE language model
alexa/dialoglue
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
facebookresearch/multihop_dense_retrieval
Multi-hop dense retrieval for question answering
Eric-Wallace/interpretability-tutorial-emnlp2020
Materials for the EMNLP 2020 Tutorial on "Interpreting Predictions of NLP Models"
CZWin32768/XLM-Align
facebookresearch/bitext-lexind
Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear projections to align monolingual word embedding spaces. In this paper, we show it is possible to produce much higher quality lexicons with methods that combine (1) unsupervised bitext mining and (2) unsupervised word alignment. Directly applying a pipeline that uses recent algorithms for both subproblems significantly improves induced lexicon quality, and further gains are possible by learning to filter the resulting lexical entries, with both unsupervised and semi-supervised schemes. Our final approach outperforms the state of the art on the BUCC 2020 shared task by 14 F1 points averaged over 12 language pairs, while also providing a more interpretable approach that allows for rich reasoning of word meaning in context.
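A hypothetical sketch of the two-stage pipeline described above: mine bitext, word-align it, then count aligned word pairs and keep frequent ones as lexicon entries. `mine_bitext` and `word_align` are placeholder callables standing in for the mining and alignment components, and the frequency threshold stands in for the learned filtering step; none of this is the repository's actual API:

```python
# Induce a bilingual lexicon from mined, word-aligned bitext (illustrative only).
from collections import Counter

def induce_lexicon(src_corpus, tgt_corpus, mine_bitext, word_align, min_count=3):
    pair_counts = Counter()
    # Step 1: unsupervised bitext mining yields (src_sentence, tgt_sentence) pairs.
    for src_sent, tgt_sent in mine_bitext(src_corpus, tgt_corpus):
        src_tokens, tgt_tokens = src_sent.split(), tgt_sent.split()
        # Step 2: unsupervised word alignment yields (src_index, tgt_index) links.
        for i, j in word_align(src_tokens, tgt_tokens):
            pair_counts[(src_tokens[i], tgt_tokens[j])] += 1
    # Step 3: a simple frequency filter stands in for unsupervised/semi-supervised filtering.
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}
```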