frank6200db

Singapore

frank6200db's Stars

modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
Language: Python, 4.4k stars, 389 forks
NVlabs/LITA
Language: Python, 147 stars, 10 forks
showlab/computer_use_ootb
An out-of-the-box (OOTB) version of Anthropic Claude Computer Use for Windows and macOS
Language: Python, 759 stars, 73 forks
showlab/MovieSeq
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
Language: Jupyter Notebook, 30 stars, 1 fork
mli/paper-reading
Paragraph-by-paragraph close readings of classic and new deep learning papers
27.3k stars, 2.5k forks
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
Language: Python, 665 stars, 70 forks
Coobiw/MPP-LLaVA
Personal project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports video, image, and multi-image inputs for SFT and conversations. Don't let poverty limit your imagination! Train your own LLaVA-style 8B/14B MLLM on an RTX 3090/4090 (24 GB).
Language: Jupyter Notebook, 385 stars, 20 forks
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
296 stars, 12 forks
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
Language: Python, 2.7k stars, 176 forks
SpaceGrey/assistgpt-new
Language: JavaScript, 1 star
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Language: Python, 3.6k stars, 243 forks
google-research/pix2struct
Language: Python, 609 stars, 54 forks
hkchengrex/Cutie
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
Language: Python, 742 stars, 73 forks
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Language: Python, 6.9k stars, 697 forks
z-x-yang/Segment-and-Track-Anything
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.
Language: Jupyter Notebook, 2.9k stars, 344 forks
gaomingqi/Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
Language: Python, 6.5k stars, 482 forks
