Emanual20
3rd-year M.S. student at Gaoling School of Artificial Intelligence, Renmin University of China @RUC-GSAI
Renmin University of China · Haidian, Beijing
Emanual20's Stars
315386775/DeepLearing-Interview-Awesome-2024
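AIGC-interview/CV-interview/LLMs-interview: a collection of interview questions and answers, also covering new ideas, problems, resources, and projects encountered in work and research.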
mlfoundations/open_clip
An open source implementation of CLIP.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
NLP2CT/LLM-generated-Text-Detection
A survey and reflection on the latest research breakthroughs in LLM-generated text detection, including data, detectors, metrics, current issues, and future directions.
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
lyy1994/awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
openai/gpt-2-output-dataset
Dataset of GPT-2 outputs for research in detection, biases, and more
RUCAIBox/LLMBox
A comprehensive library for implementing LLMs, including a unified training pipeline and comprehensive model evaluation.
liyucheng09/LatestEval
Latest Evaluation Toolkit (LatestEval). Assessing language models with the latest, uncontaminated materials.
acl-org/acl-style-files
Official style files for papers submitted to venues of the Association for Computational Linguistics
srush/Tensor-Puzzles
Solve puzzles. Improve your pytorch.
DjangoPeng/LLM-quickstart
Quick Start for Large Language Models (Theoretical Learning and Practical Fine-tuning)
liyucheng09/llm-compressive
Longitudinal Evaluation of LLMs via Data Compression
meta-llama/llama3
The official Meta Llama 3 GitHub site
wangshusen/RecommenderSystem
QwenLM/Qwen
The official repo of Qwen (Tongyi Qianwen), the chat and pretrained large language model proposed by Alibaba Cloud.
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
OpenBMB/MiniCPM
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
ChenghaoMou/text-dedup
All-in-one text de-duplication
lixin4ever/Conference-Acceptance-Rate
Acceptance rates for the major AI conferences
RUC-GSAI/Yulan-GARDEN
Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models"
DaoD/ResearchFigure
Example code for drawing figures in research papers
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
noanabeshima/wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
openai/gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
ray-project/llm-numbers
Numbers every LLM developer should know
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
liyucheng09/Contamination_Detector
Lightweight tool to identify Data Contamination in LLMs evaluation
LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words