Pinned Repositories
anything-llm
A multi-user ChatGPT for any LLM and vector database. Unlimited documents, messages, and storage in one privacy-focused app. Now available as a desktop application!
AVT
Code release for ICCV 2021 paper "Anticipative Video Transformer"
ChatTTS
A generative speech model for daily dialogue.
CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capability.
Diffree
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
mysqlc
MySQL C example
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
udptunnel-1.1
udptunnel-1.1
UDT
UDT: Breaking the Data Transfer Bottleneck. UDT is a reliable, UDP-based, application-level data transport protocol for distributed data-intensive applications over wide-area high-speed networks. UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. The protocol can transfer data at a much higher speed than TCP. UDT is also a highly configurable framework that can accommodate various congestion control algorithms.
UniTalker
gary109's Repositories
gary109/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
gary109/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
gary109/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
gary109/UniTalker
gary109/AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
gary109/airllm
AirLLM 70B inference with single 4GB GPU
gary109/axlearn
An Extensible Deep Learning Library
gary109/browser-use
Open-Source Web Automation library with any LLM
gary109/cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
gary109/Deep-Live-Cam
Real-time face swap and one-click video deepfake with only a single image
gary109/facefusion
Next generation face swapper and enhancer
gary109/FLAME-Universe
Summary of publicly available resources such as code, datasets, and scientific papers for the FLAME 3D head model
gary109/FruitNeRF
[IROS24] Official Code for "FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework" - Integrated into Nerfstudio
gary109/GenerativePhotomontage
gary109/insightface
State-of-the-art 2D and 3D Face Analysis Project
gary109/LongWriter
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
gary109/Medical-SAM2
Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2
gary109/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
gary109/notebooks
Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.
gary109/open-battery-information
gary109/openchat
OpenChat: Advancing Open-source Language Models with Imperfect Data
gary109/OpenResearcher
gary109/ovavss
Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].
gary109/PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
gary109/PPOCRLabel
PPOCRLabelv2 is a semi-automatic graphic annotation tool suitable for the OCR field, with a built-in PP-OCR model to automatically detect and re-recognize data.
gary109/pytorch3d
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
gary109/RAGFoundry
Framework for specializing LLMs for retrieval-augmented-generation tasks using fine-tuning.
gary109/sprite-decompose
Fast Sprite Decomposition from Animated Graphics [ECCV2024]
gary109/ultralytics
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
gary109/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling