flydsc's Stars
yangjianxin1/Firefly
Firefly: 大模型训练工具,支持训练Qwen2.5、Qwen2、Yi1.5、Phi-3、Llama3、Gemma、MiniCPM、Yi、Deepseek、Orion、Xverse、Mixtral-8x7B、Zephyr、Mistral、Baichuan2、Llma2、Llama、Qwen、Baichuan、ChatGLM2、InternLM、Ziya2、Vicuna、Bloom等大模型
yoheinakajima/prettygraph
An experimental UI for text-to-knowledge-graph generation
D-Star-AI/dsRAG
High-performance retrieval engine for unstructured data
IR-LLM/Awesome-Information-Retrieval-in-the-Age-of-Large-Language-Model
A curated list of awesome papers about information retrieval(IR) in the age of large language model(LLM). These include retrieval augmented large language model, large language model for information retrieval, and so on.
datawhalechina/tiny-universe
《大模型白盒子构建指南》:一个全手搓的Tiny-Universe
CrazyBoyM/llama3-Chinese-chat
Llama3、Llama3.1 中文仓库(随书籍撰写中... 各种网友及厂商微调、魔改版本有趣权重 & 训练、推理、评测、部署教程视频 & 文档)
langchain-ai/rag-from-scratch
aws-samples/private-llm-qa-bot
houbb/sensitive-word
👮♂️The sensitive word tool for java.(敏感词/违禁词/违法词/脏词。基于 DFA 算法实现的高性能 java 敏感词过滤工具框架。请勿发布涉及政治、广告、营销、翻墙、违反国家法律法规等内容。高性能敏感词检测过滤组件,附带繁体简体互换,支持全角半角互换,汉字转拼音,模糊搜索等功能。)
misbahsy/RAGTune
Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon)
mistralai-sf24/hackathon
zhangzhao219/WSDM-Cup-2024
1st Solution For Conversational Multi-Doc QA Workshop & International Challenge @ WSDM'24 - Xiaohongshu.Inc
ckiplab/ckip-transformers
CKIP Transformers
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
linexjlin/GPTs
leaked prompts of GPTs
esbatmop/MNBVC
MNBVC(Massive Never-ending BT Vast Chinese corpus)超大规模中文语料集。对标chatGPT训练的40T数据。MNBVC数据集不但包括主流文化,也包括各个小众文化甚至火星文的数据。MNBVC数据集包括新闻、作文、小说、书籍、杂志、论文、台词、帖子、wiki、古诗、歌词、商品介绍、笑话、糗事、聊天记录等一切形式的纯文本中文数据。
jacksonllee/pycantonese
Cantonese Linguistics and NLP
charent/Phi2-mini-Chinese
Phi2-Chinese-0.2B 从0开始训练自己的Phi2中文小模型,支持接入langchain加载本地知识库做检索增强生成RAG。Training your own Phi2 small chat model from scratch.
mistralai/mistral-inference
Official inference library for Mistral models
CorentinJ/Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
SumanthRH/tokenization
A comprehensive deep dive into the world of tokens
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
microsoft/generative-ai-for-beginners
18 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
shane-kercheval/llm-workflow
create workflows with LLMs
datawhalechina/vced
VCED 可以通过你的文字描述来自动识别视频中相符合的片段进行视频剪辑。该项目基于跨模态搜索与向量检索技术搭建,通过前后端分离的模式,帮助你快速的接触新一代搜索技术。
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
danielgross/localpilot
easychen/botchan
基于微信测试号的ChatBot,对接OpenAI API
Wang-Shuo/A-Guide-to-Retrieval-Augmented-LLM
an intro to retrieval augmented large language model