Pinned Repositories
AdvancedEAST
AdvancedEAST is an algorithm for scene-image text detection. It is primarily based on EAST, with significant improvements that make long-text predictions more accurate.
CHINESE-OCR
[python3.6] Natural-scene text detection implemented in TensorFlow, and variable-length scene-text OCR recognition implemented with CTPN + CRNN + CTC in Keras/PyTorch.
CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
grounded-segment-any-parts
Grounded Segment Anything: From Objects to Parts
Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model with performance approaching GPT-4o.
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
objectdetection_script
Code for some improvement ideas for object detection scripts; see readme.md for details.
Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
jamesbondzhou's Repositories
jamesbondzhou/AdvancedEAST
AdvancedEAST is an algorithm for scene-image text detection. It is primarily based on EAST, with significant improvements that make long-text predictions more accurate.
jamesbondzhou/CHINESE-OCR
[python3.6] Natural-scene text detection implemented in TensorFlow, and variable-length scene-text OCR recognition implemented with CTPN + CRNN + CTC in Keras/PyTorch.
jamesbondzhou/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
jamesbondzhou/grounded-segment-any-parts
Grounded Segment Anything: From Objects to Parts
jamesbondzhou/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
jamesbondzhou/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
jamesbondzhou/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model with performance approaching GPT-4o.
jamesbondzhou/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
jamesbondzhou/objectdetection_script
Code for some improvement ideas for object detection scripts; see readme.md for details.
jamesbondzhou/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
jamesbondzhou/pytorch-fcn
PyTorch Implementation of Fully Convolutional Networks. (Training code to reproduce the original result is available.)
jamesbondzhou/pytorch_ctpn
This is a PyTorch implementation of CTPN (Detecting Text in Natural Image with Connectionist Text Proposal Network)
jamesbondzhou/TextBoxes
TextBoxes: A Fast Text Detector with a Single Deep Neural Network
jamesbondzhou/TextBoxes_plusplus
TextBoxes++: A Single-Shot Oriented Scene Text Detector
jamesbondzhou/yolov5
Chinese-localized version of YOLOv5, kept in sync with official updates.
jamesbondzhou/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
jamesbondzhou/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection