Sunburst0614's Stars
Hoper-J/AI-Guide-and-Demos-zh_CN
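A step-by-step guide to getting started with AI and large language models (LLMs). Its tutorials and demo code take you from calling APIs to deploying and fine-tuning large models locally. Code files come with online Kaggle or Colab versions, so you can follow along even without a GPU. The project also includes a small code playground 🎡 where you can experiment with interesting AI scripts. It also contains complete Chinese mirror versions of the homework assignments for Hung-yi Lee's (李宏毅) 2024 course, Introduction to Generative AI.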
yuanzhoulvpi2017/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)
jsksxs360/How-to-use-Transformers
A quick-start tutorial for the Transformers library
hellotransformers/Natural_Language_Processing_with_Transformers
Chinese translation of Natural Language Processing with Transformers, billed as the most authoritative Transformers tutorial
nlp-with-transformers/notebooks
Jupyter notebooks for the Natural Language Processing with Transformers book
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
DSXiangLi/CTR
Code and study notes for CTR (click-through rate) models
hrwleo/dwnlpinterview
Datawhale NLP interview experience notes
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
datawhalechina/so-large-lm
Large model fundamentals: the basics of large models in a single read
zyxdtk/RecSys-Notes
Notes on recommender-system interview questions and optimization experience
hannawong/MLE-interview
Essential interview topics and papers for search and recommendation algorithm engineers
hongleizhang/RSPapers
RSTutorials: A Curated List of Must-read Papers on Recommender Systems.
eclipse-sumo/sumo
Eclipse SUMO is an open source, highly portable, microscopic and continuous traffic simulation package designed to handle large networks. It allows for intermodal simulation including pedestrians and comes with a large set of tools for scenario creation.
SakanaAI/evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
arcee-ai/mergekit
Tools for merging pretrained large language models.
sangmichaelxie/doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
BHOSC/BUAAthesis
LaTeX template for Beihang University (BUAA) undergraduate graduation theses
wdndev/llm_interview_note
Knowledge notes and interview questions for large language model (LLM) algorithm and application engineers
DLLXW/baby-llama2-chinese
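A repository for pretraining a small Chinese LLaMA2 model from scratch and then applying SFT; a single 24 GB GPU is enough to produce a chat-llama2 with basic Chinese question-answering ability.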
modelscope/data-juicer
Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
afatcoder/LeetcodeTop
A collection of high-frequency LeetCode problems commonly asked by major internet companies 🔥
microsoft/LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
datawhalechina/self-llm
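Open-Source LLM Usage Guide (《开源大模型食用指南》): tutorials tailored for ** beginners on quickly fine-tuning (full-parameter or LoRA) and deploying open-source large language models (LLMs) and multimodal large models (MLLMs), from China and abroad, in a Linux environment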
CASIA-LM/ChineseWebText
facebookresearch/SemDeDup
Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).
ChenghaoMou/text-dedup
All-in-one text de-duplication
p-lambda/dsir
DSIR large-scale data selection framework for language model training
google-research/deduplicate-text-datasets
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.