rohun-tripathi's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
mayubo2333/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
FuxiaoLiu/LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
ayaka14732/jax-smi
JAX Synergistic Memory Inspector
google-deepmind/geckonum_benchmark_t2i
GeckoNum Benchmark for T2I Model Eval.
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
mutonix/Vript
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
prometheus-eval/prometheus-eval
Evaluate your LLM's responses with Prometheus and GPT-4 💯
prometheus-eval/prometheus-vision
[ACL 2024 Findings & ICLR 2024 WS] An evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation against customized score rubrics, Prometheus-Vision is a good alternative to human evaluation and GPT-4V evaluation.
GAP-LAB-CUHK-SZ/MVImgNet
CVPR2023 | MVImgNet: A Large-scale Dataset of Multi-view Images
NVlabs/RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
cvdfoundation/google-landmark
Dataset with 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
mosaicml/diffusion
LLaVA-VL/LLaVA-NeXT
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
microsoft/SoM
Set-of-Mark Prompting for GPT-4V and LMMs
SivanDoveh/TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
google-research-datasets/wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
clip-italian/clip-italian
CLIP (Contrastive Language–Image Pre-training) for Italian