amenota's Stars
dorianbrown/rank_bm25
A Collection of BM25 Algorithms in Python
SupritYoung/RLHF-Label-Tool
A tool for manually annotating and ranking response data in the RLHF stage of large language model training.
junzeng-pluto/ChineseSquad
Chinese machine reading comprehension dataset
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
microsoft/DeepSpeedExamples
Example models using DeepSpeed
OpenMOSS/MOSS
An open-source tool-augmented conversational language model from Fudan University
hiyouga/ChatGLM-Efficient-Tuning
Efficient fine-tuning of ChatGLM-6B with PEFT
yangjianxin1/LLMPruner
yangjianxin1/Firefly
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
liucongg/ChatGLM-Finetuning
Fine-tuning of the ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B models for specific downstream tasks, covering Freeze, LoRA, P-tuning, full-parameter fine-tuning, and more
project-baize/baize-chatbot
Let ChatGPT teach your own chatbot in hours with a single GPU!
ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
LianjiaTech/BELLE
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)
yizhongw/self-instruct
Aligning pretrained language models with instruction data generated by themselves.
getcursor/cursor
The AI Code Editor
HarderThenHarder/transformers_tasks
⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
chenweiphd/LargeLanguageModel-and-GPT-4-ResourceMap
hgliyuhao/ActiveLearing4NER
budzianowski/multiwoz
Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)
juand-r/entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
eduosi/district
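Data for provinces/autonomous regions/municipalities, cities/autonomous prefectures, and districts/counties/banners, including names, pinyin, pinyin initials, administrative codes, and area codes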
facebookresearch/XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
google-research/xtreme
XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 typologically diverse languages and includes nine tasks.
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
songyingxin/NLPer-Interview
This repository mainly collects interview questions for NLP algorithm engineers
powerycy/Efficient-GlobalPointer
PyTorch Efficient GlobalPointer
tonngw/LeetCode021
🚀 LeetCode From Zero To One & curated problem lists & solution write-ups & algorithm templates & practice roadmap, continuously updated...
saffsd/langid.py
Stand-alone language identification system