Drwaish
I work in Gen AI and am trying to enhance my skills with the help of computer science communities, building on the knowledge I have.
Pakistan
Drwaish's Stars
divelab/AIRS
Artificial Intelligence Research for Science (AIRS)
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion] [scaling laws in visual generation] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
KwaiVGI/SynCamMaster
[ARXIV'24] SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
ictnlp/Auto-RAG
This is the official repository for Auto-RAG.
YesianRohn/TextSSR
code for TextSSR paper
PKU-YuanGroup/ConsisID
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
IDEA-Research/ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
ChenHoy/DROID-Splat
End-to-End SLAM with camera calibration, monocular prior integration and dense Rendering
MIC-DKFZ/nnUNet
basf/mamba-tabular
Mambular is a Python package that simplifies tabular deep learning by providing a suite of models for regression, classification, and distributional regression tasks. It includes models such as Mambular, TabM, FT-Transformer, TabulaRNN, TabTransformer, and tabular ResNets.
jdh-algo/JoyVASA
lewis081/CCL-Net
Two-stage framework with cascaded contrastive learning for UIE
AlonzoLeeeooo/StableV2V
The official implementation of the paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing".
plageon/HtmlRAG
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems
DS4SD/docling
Get your documents ready for gen AI
getmaxun/maxun
Open-source no-code web data extraction platform. Turn websites to APIs and spreadsheets with no-code robots in minutes! [In Beta]
cvlab-kaist/PF3plat
Official Implementation of "PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting"
HelloVision/HelloMeme
The official HelloMeme GitHub site
Lakonik/MVEdit
3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation
rhymes-ai/Allegro
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
MCG-NJU/EMA-VFI
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
ptrvilya/blendify
Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.
ai4colonoscopy/IntelliScope
Frontiers in Intelligent Colonoscopy [ColonSurvey | ColonINST | ColonGPT]
microsoft/BitNet
Official inference framework for 1-bit LLMs
amjadraza/pandasai-app-gradio
adarshb3/Virtual-Try-On-Application-using-Flask-Twilio-and-Gradio
This repository contains the code for a virtual try-on application built using Flask, Twilio's WhatsApp API, and Gradio's virtual try-on model. Users can send images via WhatsApp to try on garments virtually, and the results are sent back to them.
hubertsiuzdak/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
cvg/depthsplat
DepthSplat: Connecting Gaussian Splatting and Depth
mit-han-lab/efficientvit
Efficient vision foundation models for high-resolution generation and perception.