innerfirexy's Stars
karpathy/llm.c
LLM training in simple, raw C/CUDA
eugeneyan/open-llms
📋 A list of open LLMs available for commercial use.
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
statsmodels/statsmodels
Statsmodels: statistical modeling and econometrics in Python
karpathy/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
lancopku/pkuseg-python
The pkuseg toolkit for multi-domain Chinese word segmentation
CLUEbenchmark/CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
baidu/lac
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance
LnL7/nix-darwin
nix modules for darwin
ownthink/Jiagu
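Jiagu deep learning natural language processing toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering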
hankcs/pyhanlp
Chinese word segmentation
google/BIG-bench
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
EleutherAI/pythia
The hub for EleutherAI's work on interpretability and learning dynamics
thunlp/UltraChat
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
HillZhang1999/llm-hallucination-survey
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models"
rowanz/grover
Code for Defending Against Neural Fake News, https://rowanzellers.com/grover/
stanfordnlp/pyvene
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
mapull/chinese-dictionary
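Chinese Hanyu Pinyin dictionary: character pinyin dictionary, word dictionary, idiom dictionary, and a dictionary database of common and polyphonic characters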
likenneth/honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
jlko/semantic_uncertainty
Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments).
FreedomIntelligence/Huatuo-26M
The Largest-scale Chinese Medical QA Dataset: with 26,000,000 question answer pairs.
gmftbyGMFTBY/Copyisallyouneed
[ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM
ttzHome/AnchiBERT
AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation
gpoesia/minbert-default-final-project
CS 224N Winter 2023 Default Final Project: Multitask BERT
GongFuXiong/Chinese-Medical-Question-Answering-System
TensorFlow for Chinese Medical Question Answering (question-answer matching) by LSTM/CNN/LSTM_ATTENTION/IARNN-GATE
dayihengliu/a2m_chineseNMT
Dataset for TALLIP2019 paper "Ancient-Modern Chinese Translation with a New Large Training Dataset"
viking-sudo-rm/rusty-dawg
Rust library for indexing and quickly searching large pretraining corpora
Andrea-de-Varda/surprisal-across-languages
Code to calculate surprisal values from multilingual XGLM models.
bstee615/shared-hf-cache
tpimentelms/probability-of-a-word
Code to compute a word's probability using the fixes from "How to Compute the Probability of a Word"