GPTAlgoPro's Stars
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
opencv/opencv_contrib
Repository for OpenCV's extra modules
IntelRealSense/librealsense
Intel® RealSense™ SDK
kyutai-labs/moshi
microsoft/MixedRealityToolkit-Unity
This repository is for the legacy Mixed Reality Toolkit (MRTK) v2. For the latest version of the MRTK please visit https://github.com/MixedRealityToolkit/MixedRealityToolkit-Unity
cinder/Cinder
Cinder is a community-developed, free and open source library for professional-quality creative coding in C++.
fudan-generative-vision/champ
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
fudan-generative-vision/hallo2
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
mindee/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
apple/ml-depth-pro
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
PJLab-ADG/SensorsCalibration
OpenCalib: A Multi-sensor Calibration Toolbox for Autonomous Driving
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
AprilRobotics/apriltag
AprilTag is a visual fiducial system popular for robotics research.
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
pencilresearch/OpenScanner
Fast, reliable, and free document scanner app for iPhone
opendatalab/labelU
Data annotation toolbox that supports image, audio, and video data.
maiqingqiang/ChatMLX
🤖✨ ChatMLX is a modern, open-source, high-performance chat application for macOS based on large language models.
opendatalab/DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Nightmare-n/DepthAnyVideo
Depth Any Video with Scalable Synthetic Data
qian256/HoloLensARToolKit
Marker tracking using the front-facing camera of HoloLens (both 1 and 2) and Unity, with a wrapper of ARToolKit built for UWP (Universal Windows Platform)
zhanshijinwat/Steel-LLM
Train a 1B LLM on 1T tokens from scratch as an individual
Yaepiii/TRLO
YutaItoh/HMD-Calibration
Head-Mounted Display Calibration Toolbox (including Direct Linear Transform and eye localization-based methods)
YutaItoh/HMD-Light-Field-Correction
fatihksubasi/spaam
Python implementation of SPAAM (Single point active alignment method) for optical see-through HMD calibration for AR
KaikiFather/OpenVR-SpaceCalibrator-Continuous-Calibration
Use tracked VR devices from one company with any other.