sduxz
Medical AI, Natural language processing, Deep learning
University of Chinese Academy of Sciences
Los Angeles
sduxz's Stars
lancedb/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
lancedb/lancedb
Developer-friendly, serverless vector database for AI applications. Easily add long-term memory to your LLM apps!
NielsRogge/Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
microsoft/RecAI
Bridging LLM and Recommender System.
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
sz3/libcimbar
Optimized implementation for color-icon-matrix barcodes
NeoVertex1/SuperPrompt
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
mlabonne/llm-datasets
High-quality datasets, tools, and concepts for LLM fine-tuning.
eza-community/eza
A modern alternative to ls
mosaicml/llm-foundry
LLM training code for Databricks foundation models
project-numina/aimo-progress-prize
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
stopwords-iso/stopwords-en
English stopwords collection
goto456/stopwords
Commonly used Chinese stopword lists (Harbin Institute of Technology stopword list, Baidu stopword list, etc.)
baipengyan/Chinese-StopWords
Commonly used Chinese stopwords (including the Baidu, Harbin Institute of Technology, Sichuan University, and other lists)
stopwords-iso/stopwords-zh
Chinese stopwords collection
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
bilibili/Index-1.9B
A SOTA lightweight multilingual LLM
zou-group/textgrad
TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients.
mira-space/Mira
Psycoy/MixEval
The official evaluation suite and dynamic data release for MixEval.
embeddings-benchmark/mteb
MTEB: Massive Text Embedding Benchmark
2noise/ChatTTS
A generative speech model for daily dialogue.
multimodal-art-projection/MAP-NEO
lllyasviel/IC-Light
More relighting!
pytorch/torchtitan
A native PyTorch Library for large model training
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Macaronlin/LLaMA3-Quantization
A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.