hj611's Stars
vikpe/vscode-theme-screenshots
Automate screenshots of Visual Studio Code themes.
abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
IMNearth/CoAT
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
njucckevin/SeeClick
The model, data, and code for the visual GUI agent SeeClick
OSU-NLP-Group/Mind2Web
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
google-research-datasets/screen_annotation
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text, and a short description. It was introduced in the paper "ScreenAI: A Vision-Language Model for UI and Infographics Understanding".
google-research-datasets/screen_qa
The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico, and is intended for training and evaluating models on screen content understanding via question answering.
openai/human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
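For orientation, the harness usage documented in the human-eval README fits in a few lines; this is a minimal sketch, with generate_one_completion as a placeholder for whatever code model is being benchmarked:

    # Minimal human-eval flow, following the repo README.
    from human_eval.data import read_problems, write_jsonl

    def generate_one_completion(prompt: str) -> str:
        # Placeholder: query the code model under test and return its completion.
        raise NotImplementedError

    problems = read_problems()  # task_id -> {"prompt", "test", "entry_point", ...}
    samples = [
        dict(task_id=task_id,
             completion=generate_one_completion(problems[task_id]["prompt"]))
        for task_id in problems
    ]
    write_jsonl("samples.jsonl", samples)

Scoring then runs the bundled CLI, evaluate_functional_correctness samples.jsonl, which executes untrusted model output, so the README's sandboxing notes apply.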
meta-llama/codellama
Inference code for CodeLlama models
jadecxliu/CodeQA
Dataset and code for Findings of EMNLP'21 paper "CodeQA: A Question Answering Dataset for Source Code Comprehension".
likaixin2000/MMCode
[EMNLP 2024] Multi-modal code generation problems.
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
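The models are served through standard Hugging Face transformers; here is a minimal chat sketch following the Qwen2.5 model cards (the checkpoint name is just one of the published sizes):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-7B-Instruct"  # one of several published variants
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    messages = [{"role": "user", "content": "Briefly explain multimodal LLMs."}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=256)
    # Drop the prompt tokens before decoding the reply.
    reply = tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    print(reply)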
InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
wangxiang1230/SSTAP
Code for our CVPR 2021 paper "Self-Supervised Learning for Semi-Supervised Temporal Action Proposal".
lllyasviel/ControlNet
Let us control diffusion models!
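The repo ships its own training and Gradio demo scripts; for a quick sense of the technique, this sketch instead uses the diffusers integration, which wraps the released weights (checkpoint names assumed from the Hub):

    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from PIL import Image

    # Canny-edge ControlNet conditioning a Stable Diffusion 1.5 base model.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

    # Assumed input: a precomputed Canny edge map as a PIL image.
    edges = Image.open("edges.png")
    out = pipe("a futuristic city at dusk", image=edges,
               num_inference_steps=30).images[0]
    out.save("controlled.png")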
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
Yujun-Shi/DragDiffusion
[CVPR 2024, Highlight] Official code for DragDiffusion
Jingkang50/OpenPSG
Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
xfhelen/MMBench
An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design
BradyFU/Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
sail-sg/EditAnything
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
showlab/Image2Paragraph
[A toolbox for fun.] Transform an image into a unique paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, and ControlNet.
ranjaykrishna/visual_genome_python_driver
A Python wrapper for the Visual Genome API
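A rough sketch of the driver's documented usage, assuming the package's api module and that the Visual Genome web endpoint is still reachable:

    from visual_genome import api as vg

    ids = vg.get_all_image_ids()            # every image id in the dataset
    image = vg.get_image_data(id=61512)     # metadata: url, width, height, ...
    regions = vg.get_region_descriptions_of_image(id=61512)
    print(image)
    print(regions[0])  # a region description: phrase plus bounding box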
jshilong/GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
cvdfoundation/open-images-dataset
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
om-ai-lab/RS5M
RS5M: a large-scale vision language dataset for remote sensing [TGRS]
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
phellonchen/X-LLM
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Vision-CAIR/MiniGPT-4
Open-sourced code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)