TerryH-dog
My name is Jimmy Uccio. I am a college student and spend most of my time studying. I enjoy rap and coding, as well as playing football.
Beijing
TerryH-dog's Stars
amirbar/visual_prompting
Official implementation and data release of the paper "Visual Prompting via Image Inpainting".
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
OpenGVLab/VisionLLM
VisionLLM Series
rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
SkyworkAI/Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
bytedance/1d-tokenizer
This repo contains the code for 1D tokenizer and generator
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
CompVis/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
AFeng-x/PixWizard
mit-han-lab/vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
TencentARC/SEED-Story
SEED-Story: Multimodal Long Story Generation with Large Language Model
AILab-CVC/SEED-X
Multimodal Models in Real World
rongyaofang/PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
baaivision/Emu3
Next-Token Prediction is All You Need
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
myshell-ai/OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
ShieldMnt/invisible-watermark
python library for invisible image watermark (blind image watermark)
githubvpn007/proxy
proxy, proxy server, proxy protocols, proxy servers, proxy protocol documentation, ss, ssr, v2ray, trojan, Clash, V2rayN, Qv2ray, V2rayW, V2RayS, Mellow, V2rayX, V2rayU, ClashX, Kitsunebi, BifrostV, i2Ray, Quantumult, Surge 4, winXray, Qv2ray, Kitsunebi, Trojan-Qt5
githubvpn007/v2rayNvpn
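Firewall circumvention, free firewall circumvention, free unrestricted internet access, free nodes, free proxies, free ss/ssr/v2ray/trojan nodes, Lantern, Google Play Store, circumvention proxies, overseas-network games, foreign games, vpn, vpn recommendations, updated daily, accessing the overseas internet, overseas internet, V2rayN, Qv2ray, V2rayW, V2RayS, Mellow, V2rayX, V2rayU, ClashX, Kitsunebi, BifrostV, i2Ray, Quantumult, Surge 4, winXray, Qv2ray, Kitsunebi, Trojan-Qt5, proxy servers, proxy subscription providers, Mario, World of Warcraft, poshMark, Amazon, Shopee, Meilu (Mercari), Mercari, foreign trade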
teacherpeterpan/self-correction-llm-papers
This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback.
OpenInterpreter/open-interpreter
A natural language interface for computers
wilicc/gpu-burn
Multi-GPU CUDA stress test
kourgeorge/arxiv-style
A LaTeX style and template for paper preprints (based on NIPS style)
tatarchm/tangent_conv
Tangent Convolutions for Dense Prediction in 3D
drprojects/superpoint_transformer
Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"