Pinned Repositories
Amadeus
Amadeus ver. 1.0.4
ATLA-Demo
Source code for "Adversarial Training for Layout-Aware Text-VQA".
ATS
[ICME 2024] The code for Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Co-Nav-Exp
Multi-agent goal navigation experiments
EfficientZero
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Optimizes the residual module.
FrankZxShen-Visual-Audio-Semantic-Navigation-TEST
GameAudioCrawler
A script for crawling audio files from the wikis of some common games.
habitat-installation
A simple procedure for installing habitat, using v0.2.2 as an example
MCoCoNav
[AAAI 2025] Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
visual-chatgpt-zh-vits
A Windows version of visual-chatgpt with Chinese support, integrating a VITS inference module
FrankZxShen's Repositories
FrankZxShen/MCoCoNav
[AAAI 2025] Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
FrankZxShen/visual-chatgpt-zh-vits
A Windows version of visual-chatgpt with Chinese support, integrating a VITS inference module
FrankZxShen/ATS
[ICME 2024] The code for Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
FrankZxShen/Amadeus
Amadeus ver. 1.0.4
FrankZxShen/ATLA-Demo
Source code for "Adversarial Training for Layout-Aware Text-VQA".
FrankZxShen/Co-Nav-Exp
Multi-agent goal navigation experiments
FrankZxShen/EfficientZero
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Optimizes the residual module.
FrankZxShen/FrankZxShen-Visual-Audio-Semantic-Navigation-TEST
FrankZxShen/GameAudioCrawler
A script for crawling audio files from the wikis of some common games.
FrankZxShen/habitat-installation
A simple procedure for installing habitat, using v0.2.2 as an example
FrankZxShen/latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answering (STVQA)
FrankZxShen/MNlexNet
A PyTorch repository for recognition on the MNIST dataset.
FrankZxShen/so-vits-svc-audio2audio
Replaces the vocals of a song to produce the target audio.
FrankZxShen/Attention-Efficientzero-Alpaca-Lora-Webui
A WebUI based on Alpaca-Lora + ChatGLM for visualizing the Atari game results of EfficientZero.
FrankZxShen/ChatGLM-webui
A WebUI for ChatGLM-6B
FrankZxShen/CogVLM2-API4
CogVLM API for softmax classification
FrankZxShen/depth_yolo
A combination of darknet_ros and iai_kinect2
FrankZxShen/echarts
Apache ECharts is a powerful, interactive charting and data visualization library for the browser
FrankZxShen/EdgeDiffusionNav-DEMO
ICCV25-1 code backup
FrankZxShen/efficient-vits-finetuning
Finetuning VITS Efficiently (Lora)
FrankZxShen/FrankZxShen
FrankZxShen/FrankZxshen.github.io
Blog, created casually
FrankZxShen/Grasscutter
A server software reimplementation for a certain anime game.
FrankZxShen/LATLA
LLM portion of ATLA. Used to bring llama2 external knowledge into Text-VQA.
FrankZxShen/Machine-Learning-Assignments
This project is intended only for SWJTU students to provide their assignments.
FrankZxShen/sklearn
from mofan python
FrankZxShen/TAP
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral). Adds a prompt for the LLM.
FrankZxShen/visual-chatgpt-zh
visual-chatgpt支持中文版本
FrankZxShen/VITS-fast-fine-tuning
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
FrankZxShen/vits-fast-fineturing-infer
For vits fine-tuning inference.