naykun's Stars
qier222/YesPlayMusic
A good-looking third-party NetEase Cloud Music player, supporting Windows / macOS / Linux :electron:
remy/nodemon
Monitor for any changes in your node.js application and automatically restart the server - perfect for development
guidance-ai/guidance
A guidance language for controlling large language models.
BlinkDL/RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer - great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
threestudio-project/threestudio
A unified framework for 3D content generation.
princeton-vl/infinigen
Infinite Photorealistic Worlds using Procedural Generation
tebelorg/RPA-Python
Python package for doing RPA
sanchit-gandhi/whisper-jax
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
xinyu1205/recognize-anything
Open-source and strong foundation image recognition models.
z-x-yang/Segment-and-Track-Anything
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.
One-2-3-45/One-2-3-45
[NeurIPS 2023] Official code of "One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization"
gitwatch/gitwatch
Watch a file or folder and automatically commit changes to a git repo easily.
NVIDIA/aistore
AIStore: scalable storage for AI applications
facebookresearch/home-robot
Mobile manipulation research tools for roboticists
Totoro97/f2-nerf
Fast neural radiance field training with free camera trajectories
allenai/mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
showlab/Image2Paragraph
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
booydar/recurrent-memory-transformer
[NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture.
StanfordVL/OmniGibson
OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine. Join our Discord for support: https://discord.gg/bccR5vGFEx
iejMac/video2dataset
Easily create large video datasets from video URLs
facebookresearch/LaViLa
Code release for "Learning Video Representations from Large Language Models"
JialianW/GRiT
GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
iejMac/video2numpy
Optimized library for large-scale extraction of frames and audio from video.
yilundu/cross_attention_renderer
CVPR 2023: Learning to Render Novel Views from Wide-Baseline Stereo Pairs
iejMac/clip-video-encode
Easily compute CLIP embeddings from video frames
JamesQFreeman/MicEye
Record radiologists' eye gaze when they are labeling images.
facebookresearch/vq2d_cvpr
This repo contains the code for the recipe of the winning entry to the Ego4d VQ2D challenge at CVPR 2022.
WikiChao/Ego-AV-Loc
[CVPR 2023] Egocentric Audio-Visual Object Localization