Pinned Repositories
awesome-chinese-nlp
A curated list of resources for Chinese NLP (Natural Language Processing)
awesome-gulp-cn
A comprehensive collection of Gulp resources: getting started, plugins, packages, and more. Now complete (no longer updated).
ChatScript
Natural Language tool/dialog manager
insurance-clause-pdf-format
Structuring data extracted from insurance-clause PDFs
KeywordExtractor
A simple trie implemented in Python that supports adding, finding, and deleting keywords, used for Chinese keyword matching.
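The description above refers to a trie supporting keyword insertion, lookup, and deletion. A minimal illustrative sketch of such a structure (hypothetical code, not the repository's actual implementation):

```python
# Hypothetical sketch of a keyword trie with add/find/delete operations,
# in the spirit of the KeywordExtractor description; not the repo's code.

class KeywordTrie:
    _END = "__end__"  # marker key flagging the end of a complete keyword

    def __init__(self):
        self.root = {}

    def add(self, keyword: str) -> None:
        """Insert a keyword character by character."""
        node = self.root
        for ch in keyword:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def find(self, keyword: str) -> bool:
        """Return True if the exact keyword was previously added."""
        node = self.root
        for ch in keyword:
            if ch not in node:
                return False
            node = node[ch]
        return self._END in node

    def delete(self, keyword: str) -> bool:
        """Remove a keyword and prune now-empty branches."""
        path = []
        node = self.root
        for ch in keyword:
            if ch not in node:
                return False
            path.append((node, ch))
            node = node[ch]
        if self._END not in node:
            return False
        del node[self._END]
        for parent, ch in reversed(path):  # prune empty nodes bottom-up
            if parent[ch]:
                break
            del parent[ch]
        return True


if __name__ == "__main__":
    trie = KeywordTrie()
    trie.add("自然语言")
    trie.add("自然语言处理")
    print(trie.find("自然语言"))      # True
    print(trie.delete("自然语言"))    # True
    print(trie.find("自然语言"))      # False
    print(trie.find("自然语言处理"))  # True (longer keyword preserved)
```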
myflashtext
A small utility for fast Chinese string matching
rasa-nlu-trainer
GUI for editing rasa-nlu training data
rasa_core
Machine learning-based dialogue engine for conversational software
rasa_nlu
turn natural language into structured data
spark-ml-source-analysis
Analysis of Spark ML algorithm principles and their concrete source-code implementations
wuxiaobo's Repositories
wuxiaobo/alpaca-chinese-dataset
Alpaca Chinese Dataset -- a Chinese instruction fine-tuning dataset (continuously updated)
wuxiaobo/Alpaca-family-library
A summary of low-cost methods for replicating ChatGPT. As data quality and model fine-tuning techniques improve, small models suited to various specialized domains are expected to keep emerging with ever better performance.
wuxiaobo/BELLE
BELLE: Be Everyone's Large Language Model Engine (an open-source Chinese conversational large language model)
wuxiaobo/chatgpt-on-wechat
A chatbot built on large language models that can be connected to WeChat Official Accounts, WeCom (enterprise WeChat) apps, Feishu, DingTalk, and more. Supported backends include GPT-3.5 / GPT-4o / GPT-o1 / Claude / Wenxin Yiyan (文心一言) / iFlytek Spark (讯飞星火) / Tongyi Qianwen (通义千问) / Gemini / GLM-4 / Kimi / LinkAI. It handles text, voice, and images, can access the operating system and the internet, and supports building customized enterprise customer-service bots on top of your own knowledge base.
wuxiaobo/ChatGPTX-Uni
Implements a cross-model scheme combining single/multi LoRA-Fusion weight cross-fusion with Zero-Finetune (zero fine-tuning) enhancement: LLM-Base + LLM-X + Alpaca. In the initial phase, LLM-Base is the ChatGLM-6B base model and LLM-X is a LLaMA-based enhancement model. The approach is simple and efficient, aiming to deploy such language models widely at low energy cost and ultimately produce "emergent intelligence" on a small-model foundation, approaching the human-friendly behavior of ChatGPT, GPT-4, ChatRWKV, etc. at minimal computational cost. Later, it is intended to serve as the central "brain" agent that integrates and orchestrates execution models for CV object detection, text-to-image generation, voice-command interaction, and more. It currently handles summarization, question generation, Q&A, abstracting, rewriting, commenting, role-play, and other tasks.
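The entry above describes fusing LoRA weight deltas from multiple adapters into a single base model without further fine-tuning. A minimal PyTorch sketch of that general idea follows; the function name, merge rule, and scaling factors are illustrative assumptions, not the project's actual implementation:

```python
import torch


def merge_lora_adapters(base_weight, adapters):
    """Fuse several LoRA adapters into one dense weight matrix (illustrative).

    base_weight: frozen base-model weight W, shape (out_features, in_features).
    adapters:    list of (A, B, alpha) triples where A is (r, in_features),
                 B is (out_features, r), and alpha is a per-adapter scale.
    Returns W' = W + sum_i alpha_i * B_i @ A_i, i.e. the adapters "baked in"
    so no additional fine-tuning or runtime adapter math is required.
    """
    merged = base_weight.clone()
    for A, B, alpha in adapters:
        merged += alpha * (B @ A)
    return merged


if __name__ == "__main__":
    torch.manual_seed(0)
    out_f, in_f, rank = 8, 16, 4
    W = torch.randn(out_f, in_f)                   # base model weight
    adapter_1 = (torch.randn(rank, in_f) * 0.01,   # A_1
                 torch.randn(out_f, rank) * 0.01,  # B_1
                 1.0)                              # alpha_1
    adapter_2 = (torch.randn(rank, in_f) * 0.01,
                 torch.randn(out_f, rank) * 0.01,
                 0.5)
    W_fused = merge_lora_adapters(W, [adapter_1, adapter_2])
    print(W_fused.shape)  # torch.Size([8, 16])
```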
wuxiaobo/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local deployment (Chinese LLaMA & Alpaca LLMs)
wuxiaobo/ColossalAI
Making large AI models cheaper, faster and more accessible
wuxiaobo/dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
wuxiaobo/DeepLearningSystem
An introduction to the core principles of deep learning systems.
wuxiaobo/EACL2023_Tutorial_Dialogue_Summarization
wuxiaobo/hello-algo
Hello Algo (《Hello 算法》): a data structures and algorithms tutorial with animated illustrations and one-click runnable code, supporting Python, C++, Java, C#, Go, Swift, JS, TS, Dart, Rust, C, Zig, and more. English edition in progress.
wuxiaobo/InstructDS
EMNLP 2023: Instructive Dialogue Summarization with Query Aggregations
wuxiaobo/LKY_OfficeTools
A one-click tool that automatically downloads, installs, and activates Office.
wuxiaobo/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
wuxiaobo/llm-inference-benchmark
LLM Inference benchmark
wuxiaobo/lobe-chat
🤖 Lobe Chat - an open-source, high-performance chatbot framework that supports speech synthesis, multimodal interaction, and an extensible Function Call plugin system. Supports one-click free deployment of your private ChatGPT/LLM web application.
wuxiaobo/LongBench
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
wuxiaobo/NarratoAI
Uses large AI models to narrate and edit videos with a single click.
wuxiaobo/notebooks
Jupyter notebooks for the Natural Language Processing with Transformers book
wuxiaobo/Open-Llama
Complete training code for an open-source, high-performance Llama model, covering the full pipeline from pre-training to RLHF.
wuxiaobo/OpenRLHF
A Ray-based high-performance RLHF framework (supports 70B+ full-parameter tuning, LoRA, and Mixtral)
wuxiaobo/parler-tts
Inference and training library for high-quality TTS models.
wuxiaobo/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
wuxiaobo/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
wuxiaobo/surya
Accurate line-level text detection and recognition (OCR) in any language
wuxiaobo/task-decompose-quickstart
wuxiaobo/tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
wuxiaobo/trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
wuxiaobo/UER-py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
wuxiaobo/vector-search
The definitive guide to using Vector Search to solve your semantic search production workload needs.