Pinned Repositories
Count-Anything
This method uses Segment Anything and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.
f2-nerf
Fast neural radiance field training with free camera trajectories
GL-RG
Code for the IJCAI 2022 paper "GL-RG: Global-Local Representation Granularity for Video Captioning".
HAF
PyTorch implementation of the ICASSP 2021 paper "Hierarchical Attention Fusion for Geo-Localization".
smerf-3d.github.io
STC-Seg
Code for the TCSVT paper "Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration".
Unbounded-NeRF
UniAD
[CVPR 2023 Award Candidate] Planning-oriented Autonomous Driving
visprog
Visual Programming: Compositional visual reasoning without training (CVPR 2023)
ylqi.github.io
Homepage
ylqi's Repositories
ylqi/Count-Anything
This method uses Segment Anything and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.
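As a rough illustration of the SAM + CLIP idea described above (a minimal sketch, not this repository's actual code), the snippet below generates class-agnostic mask proposals with Segment Anything and counts those whose CLIP similarity to the text prompt clears a threshold; the checkpoint path, model variants, example image, prompt, and 0.25 threshold are all assumptions.

```python
# Illustrative sketch of text-prompted counting with SAM + CLIP
# (not Count-Anything's implementation; paths, models, and the
# 0.25 score threshold are assumptions).
import numpy as np
import torch
import clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Class-agnostic mask proposals from Segment Anything.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)

# 2) CLIP scores each proposal against the custom text prompt.
clip_model, preprocess = clip.load("ViT-B/32", device=device)
prompt = "a photo of an apple"  # hypothetical prompt
with torch.no_grad():
    text_feat = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

image = np.array(Image.open("example.jpg").convert("RGB"))  # hypothetical image
count = 0
for m in mask_generator.generate(image):
    x, y, w, h = [int(v) for v in m["bbox"]]            # proposal bounding box (XYWH)
    crop = Image.fromarray(image[y:y + h, x:x + w])      # crop the proposed region
    with torch.no_grad():
        img_feat = clip_model.encode_image(preprocess(crop).unsqueeze(0).to(device))
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        score = (img_feat @ text_feat.T).item()          # cosine similarity
    if score > 0.25:                                     # assumed threshold
        count += 1
print(f"Objects matching '{prompt}': {count}")
```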
ylqi/GL-RG
Code for the IJCAI 2022 paper "GL-RG: Global-Local Representation Granularity for Video Captioning".
ylqi/HAF
PyTorch implementation of the ICASSP 2021 paper "Hierarchical Attention Fusion for Geo-Localization".
ylqi/STC-Seg
Code for the TCSVT paper "Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration".
ylqi/ylqi.github.io
Homepage
ylqi/f2-nerf
Fast neural radiance field training with free camera trajectories
ylqi/smerf-3d.github.io
ylqi/Unbounded-NeRF
ylqi/UniAD
[CVPR 2023 Award Candidate] Planning-oriented Autonomous Driving
ylqi/visprog
Visual Programming: Compositional visual reasoning without training (CVPR 2023)
ylqi/alpha_visualizer
Visualize the Switch-NeRF radiance field as point clouds.
ylqi/clbrobot_project
Video language navigation client
ylqi/diffusionmagic
Easy-to-use Stable Diffusion workflows built with diffusers (WIP)
ylqi/Image2Paragraph
Transform an image into a unique paragraph with ChatGPT, BLIP-2, OFA, GRIT, Segment Anything, and ControlNet.
ylqi/Mask2Former
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
ylqi/NeRF-SLAM
NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields (https://arxiv.org/abs/2210.13641) + Sigma-Fusion: Probabilistic Volumetric Fusion for Dense Monocular SLAM (https://arxiv.org/abs/2210.01276)
ylqi/NeRF_RPN
ylqi/ODISE
ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
ylqi/Segment-and-Track-Anything
An open-source project for tracking and segmenting any objects in videos, either automatically or interactively. It combines the Segment Anything Model (SAM) for key-frame segmentation with Associating Objects with Transformers (AOT) for efficient tracking and propagation.
ylqi/text2room
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models.
ylqi/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
ylqi/X-Decoder
X-Decoder: generalized decoding for pixel, image, and language
ylqi/yolov8_tracking
Real-time multi-object tracking and segmentation using YOLOv8
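As a rough illustration only (a minimal sketch, not this repository's tracker integrations), the snippet below runs a YOLOv8 segmentation model with the ultralytics package's built-in ByteTrack tracker; the weights file, video path, and tracker config are assumptions.

```python
# Illustrative sketch of multi-object tracking + segmentation with YOLOv8,
# using ultralytics' built-in ByteTrack as a stand-in for this repository's
# own tracker integrations (weights, video path, and tracker config assumed).
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")  # segmentation-capable YOLOv8 weights

# stream=True yields one Results object per frame instead of buffering them all.
for result in model.track(source="example_video.mp4", stream=True,
                          tracker="bytetrack.yaml"):
    boxes = result.boxes
    if boxes is None or boxes.id is None:   # no detections / no active tracks
        continue
    track_ids = boxes.id.int().tolist()     # persistent IDs across frames
    classes = boxes.cls.int().tolist()      # class index per detection
    masks = result.masks                    # per-instance segmentation masks
    print(f"frame: {len(track_ids)} tracked instances, ids={track_ids}")
```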
ylqi/HC-STVG
The HC-STVG Dataset
ylqi/mcvd-pytorch
Official implementation of MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (https://arxiv.org/abs/2205.09853)
ylqi/Pathfinding-Algorithm
A pathfinding algorithm for self-driving delivery vehicles.
ylqi/prompt-to-prompt
ylqi/pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXt, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
ylqi/robodreamer
ylqi/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch