shuyansy
At HIT, researching machine learning and computer vision
Harbin Institute of Technology, China
shuyansy's Stars
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
BAAI-DCAI/SpatialBot
JUNJIE99/MLVU
BAAI-DCAI/Multimodal-Robustness-Benchmark
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
rese1f/MovieChat
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
ttengwang/Awesome_Long_Form_Video_Understanding
Awesome papers & datasets specifically focused on long-term videos.
kousw/experimental-consistory
md-mohaiminul/VideoRecap
TencentARC/SmartEdit
Official code of SmartEdit [CVPR-2024 Highlight]
shengliu66/ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
shuyansy/Efficient-Ambiguous-Text-Detector
The official project for the paper "Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector" (ACM MM 2023)
shuyansy/Survey-of-Visual-Text-Processing
The official project of the paper "Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing"
bahjat-kawar/time-diffusion
Official code repo for "Editing Implicit Assumptions in Text-to-Image Diffusion Models"
yeungchenwa/Recommendations-Diffusion-Text-Image
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.
zhoubolei/bolei_awesome_posters
CVPR and NeurIPS poster examples and templates. May we have in-person poster session soon!
UKPLab/sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
dali92002/OCR-TR
Optical Character Recognition (OCR / HTR) using Transformers
shuyansy/multilingual-machine-translation
This is some code for multilingual machine translation (English, Korean, Japanese, Arabic)
dali92002/SSL-OCR
Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023
shuyansy/Synthesis-multilingual-handwritten-text-data
This is a simple method for generating handwritten text datasets, which is beneficial for handwritten text detection and segmentation
advimman/lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
yeungchenwa/OCR-SAM
Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detect, recognize, and segment text instances, with several downstream tasks, e.g., Text Removal and Text Inpainting
RockeyCoss/Prompt-Segment-Anything
This is an implementation of zero-shot instance segmentation using Segment Anything.
shuyansy/Detect-and-read-meters
This is the first released system for detecting and reading complex meters, implemented with computer vision techniques.
KidsWithTokens/Medical-SAM-Adapter
Adapting Segment Anything Model for Medical Image Segmentation
ziqi-jin/finetune-anything
Fine-tune SAM (Segment Anything Model) for computer vision tasks such as semantic segmentation, matting, detection ... in specific scenarios
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.