guoxinXiong's Stars
chatchat-space/Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and language models such as ChatGLM, Qwen, and Llama
THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
CLUEbenchmark/SuperCLUE
SuperCLUE: A Comprehensive Benchmark for General-Purpose Foundation Models in Chinese
ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
TencentARC/UMT
UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.
li-plus/DSNet
DSNet: A Flexible Detect-to-Summarize Network for Video Summarization
THUDM/VisualGLM-6B
Chinese and English multimodal conversational language model
VincentJYZhang/USTC_Lecture
Course-selection script for USTC graduate academic lectures
Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
danieljf24/awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
lllyasviel/ControlNet
Let us control diffusion models!
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
microsoft/VSE_Gradient
AAA-Zheng/Image-Text-Matching-Summary
Summary of Related Research on Image-Text Matching
QinYang79/DECL
Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval (ACM Multimedia 2022, PyTorch code)
kaixindelele/ChatPaper
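Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow by using ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses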
KaiyangZhou/CoOp
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
labyrinth7x/Deep-Cross-Modal-Projection-Learning-for-Image-Text-Matching
Deep Cross-Modal Projection Learning for Image-Text Matching
robi56/video-summarization-resources
Video Summarization Dataset, Papers, Codes
woodfrog/vse_infty
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)
LgQu/DIME
Dynamic Modality Interaction Modeling for Image-Text Retrieval. SIGIR'21
CompVis/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
openai/DALL-E
PyTorch package for the discrete VAE used for DALL·E.
lucidrains/DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
zhjohnchan/awesome-vision-and-language-pretraining
A curated list of vision-and-language pre-training (VLP). :-)
haofanwang/awesome-vision-language-modeling
Recent Advances in Vision-Language Pre-training!
yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
DirtyHarryLYL/Transformer-in-Vision
Recent Transformer-based CV and related works.
fawazsammani/awesome-vision-language-pretraining
Awesome Vision-Language Pretraining Papers
cshizhe/hgr_v2t
Code accompanying the paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning".