Aaroniley

Aaroniley's Stars

facebookresearch/faiss
A library for efficient similarity search and clustering of dense vectors.
Language: C++ 30.6k 480 2.5k 3.6k
Lightning-AI/pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
Language: Python 28k 247 7.1k 3.4k
UKPLab/sentence-transformers
State-of-the-Art Text Embeddings
Language: Python 14.9k 140 2.1k 2.4k
flairNLP/flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Language: Python 13.8k 202 2.3k 2.1k
Embedding/Chinese-Word-Vectors
100+ Chinese Word Vectors: more than a hundred kinds of pre-trained Chinese word vectors
Language: Python 11.8k 286 1672.3k
brightmart/text_classification
All kinds of text classification models and more with deep learning
Language: Python 7.8k 299 1242.6k
stanfordnlp/GloVe
Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Language: C 6.8k 228 1621.5k
bentrevett/pytorch-seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Language: Jupyter Notebook 5.3k 66 1911.3k
649453932/Chinese-Text-Classification-Pytorch
Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, based on PyTorch and ready to use out of the box.
Language: Python 5.3k 35 1171.2k
nlp-with-transformers/notebooks
Jupyter notebooks for the Natural Language Processing with Transformers book
Language: Jupyter Notebook 3.8k 60 981.2k
stanford-futuredata/ColBERT
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Language: Python 2.9k 41 260 374
datawhalechina/learn-nlp-with-transformers
We want to create a repo to illustrate the usage of transformers in Chinese
Language: Shell 2.2k 16 21374
dmis-lab/biobert
Bioinformatics'2020: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Language: Python 1.9k 62 175 452
facebookresearch/DPR
Dense Passage Retriever (DPR) is a set of tools and models for open-domain Q&A tasks.
Language: Python 1.7k 23 210 299
castorini/pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
Language: Python 1.6k 18 542 361
beir-cellar/beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Language: Python 1.6k 23 138 187
zhuifengshen/DingtalkChatbot
A Python wrapper for DingTalk group custom robot messages
Language: Python 1.2k 26 54278
NTMC-Community/awesome-neural-models-for-semantic-match
A curated list of papers dedicated to neural text (semantic) matching.
Language: HTML 774 53 22122
ahangchen/torch_base
Quickly bring up your PyTorch project (a skeleton)
Language: Python 641 5 10116
xiaoqian19940510/text-classification-surveys
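A collection of text classification resources. It covers deep learning text classification models such as SpanBERT, ALBERT, RoBERTa, XLNet, MT-DNN, BERT, TextGCN, MGAN, TextCapsule, SGNN, SGM, LEAM, ULMFiT, DGCNN, ELMo, RAM, DeepMoji, IAN, DPCNN, TopicRNN, LSTMN, Multi-Task, HAN, CharCNN, Tree-LSTM, DAN, TextRCNN, Paragraph-Vec, TextCNN, DCNN, RNTN, MV-RNN, and RAE, as well as shallow learning models such as LightGBM, SVM, XGBoost, Random Forest, C4.5, CART, KNN, NB, and HMM. It also introduces text classification datasets such as MR, SST, MPQA, IMDB, Yelp, 20NG, AG, R8, DBpedia, Ohsumed, SQuAD, SNLI, MNLI, MSRP, MRDA, RCV1, and AAPD; evaluation metrics such as accuracy, Precision, Recall, F1, EM, MRR, HL, Micro-F1, Macro-F1, and P@K; and technical challenges, including multi-label text classification.
Language: Python 595 17 2104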
ahangchen/windy-afternoon
GitBook-based blog: Android, Linux, Deep Learning, Computer Vision
Language: CSS 363 22 966
thunlp/SOS4NLP
Survey of Surveys for Natural Language Processing (SOS4NLP)
326 16 138
UKPLab/gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Language: Python 321 6 3137
zhihao-chen/QASystemOnMedicalKG
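A tutorial on, and implementation of, a disease-centered medical knowledge graph and a QA system based on it. Covers knowledge graph construction, automatic question answering, and KG-based automatic question answering: a disease-centered medical knowledge graph of a certain scale, used to provide automatic question answering and analysis services.
Language: Python 297 6 0 89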
sfzhou5678/PolyEncoder
An unofficial implementation of Poly-encoder (Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring)
Language: Python 252 5 1036
Georgetown-IR-Lab/cedr
Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
Language: Python 156 16 3828
lijqhs/text-classification-cn
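Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pre-trained models
Language: Python 151 2 230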
RUCAIBox/PLMPapers
A paper list of pre-trained language models (PLMs).
137 5 0 18
pl8787/wsdm2021-beyond-prp-tutorial
WSDM2021 Tutorial: Beyond Probability Ranking Principle: Modeling the Dependencies among Documents
24 4 0 5
irgroup/trec-covid
As part of the TREC-COVID challenge the Information Retrieval Research Group at Technische Hochschule Köln develops search and retrieval algorithms to support the search for relevant information on COVID-19.
Language: Python 4 4 7 1