wyczzy's Stars
Rapisurazurite/FFDN
Implementation for Enhancing Tampered Text Detection Through Frequency Feature Fusion and Decomposition
chuangchuangtan/C2P-CLIP-DeepfakeDetection
C2P-CLIP-DeepfakeDetection
grip-unina/AdversarialRobustnessCLIP
modelscope/awesome-deep-reasoning
A collection of awesome work on R1!
Horizonll/AI-Detector
2024 6th Global Campus Artificial Intelligence Algorithm Elite Competition: AI-generated face image identification
manjaryp/MCE-ViT
A Robust Approach Towards Distinguishing Natural and Computer Generated Images using Multi-Colorspace fused and Enriched Vision Transformer
dzhng/deep-research
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the simplest implementation of a deep research agent, i.e., an agent that can refine its research direction over time and dive deep into a topic.
open-webui/open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
VARGPT-family/VARGPT
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
JCruan519/PETRobustness
(ARXIV24) This is the official code repository for "Understanding Robustness of Parameter-Efficient Tuning for Image Classification".
freddiewalchwmf25/JieMa
Latest 2024 recommendations for overseas SMS verification-code receiving platforms (free + paid)
dwgoon/jpegio
A python package for accessing the internal variables of JPEG file format such as DCT coefficients and quantization tables
forever208/DCTdiff
Official code for the paper 'DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space'
PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
dummerchen/HFST
speedlab-git/SimCLIP
HashmatShadab/Robust-LLaVA
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
Deep-Agent/R1-V
Witness the aha moment of VLM for less than $3.
yoziru/nextjs-vllm-ui
Fully-featured, beautiful web interface for vLLM - built with NextJS.
lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
AILab-CVC/SEED-X
Multimodal Models in Real World
FoundationVision/Infinity
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
daixiangzi/VAR-CLIP
Implements VAR+CLIP for text-to-image (T2I) generation
baaivision/Emu3
Next-Token Prediction is All You Need
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
showlab/Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
gq-max/AdvDiffVLM