Pinned Repositories
100-Days-Of-ML-Code
Chinese edition of 100-Days-Of-ML-Code
2d-gaussian-splatting
[SIGGRAPH'24] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
chainer
A flexible framework of neural networks for deep learning
darknet
YOLOv4 - Neural Networks for Object Detection (Windows and Linux version of Darknet)
hand-gesture-recognition-mediapipe
A sample program that recognizes hand signs and finger gestures with a simple MLP using the detected key points. Hand pose is estimated using MediaPipe.
LLaVA
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
Mesh-Flow-Video-Stabilization
Online video stabilization using a novel MeshFlow motion model
RealBasicVSR
Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution"
VS-Net
VS-Net: Voting with Segmentation for Visual Localization
arctanbell's Repositories
arctanbell/LLaVA
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
arctanbell/act-plus-plus
Imitation Learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
arctanbell/Auto-GPT
An experimental open-source attempt to make GPT-4 fully autonomous.
arctanbell/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
arctanbell/build-openwrt
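Build OpenWrt firmware in the cloud with GitHub Actions; works with the official source as well as the lede, lienol, and immortalwrt sources, and supports x86, TV boxes, and many other devices.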
arctanbell/COSA
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
arctanbell/DirectionNet
Wide-Baseline Relative Camera Pose Estimation with Directional Learning (CVPR 2021)
arctanbell/docker-openwrt
OpenWrt running in Docker
arctanbell/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
arctanbell/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.
arctanbell/Grounded-Segment-Anything
Marrying Grounding DINO with Segment Anything & Stable Diffusion & Tag2Text & BLIP & Whisper & ChatBot - Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs
arctanbell/GroundingDINO
The official implementation of "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
arctanbell/GVL
Official implementation of the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"
arctanbell/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
arctanbell/LGT-Net
A PyTorch implementation of our paper "LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network" (CVPR'22)
arctanbell/llama-recipes
Examples and recipes for Llama 2 model
arctanbell/llama2.c
Inference Llama 2 in one file of pure C
arctanbell/mobile-aloha
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
arctanbell/neuralangelo
Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
arctanbell/ParlAI
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
arctanbell/PDVC
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
arctanbell/PythonRobotics
Python sample codes for robotics algorithms.
arctanbell/pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
arctanbell/stable-dreamfusion
A PyTorch implementation of text-to-3D DreamFusion, powered by Stable Diffusion.
arctanbell/text-generation-webui
A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
arctanbell/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
arctanbell/VALOR
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
arctanbell/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
arctanbell/video-pretrained-transformer
Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scratch on YouTube (YT-1B dataset).
arctanbell/VPGTrans
Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.