holen-zhang's Stars
sola-st/wasm-r3
Record-Reduce-Replay for Realistic and Standalone WebAssembly Benchmarks
iMeanAI/WebCanvas
Connect agents to live web environments for evaluation.
ultrafunkamsterdam/undetected-chromedriver
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
landing-ai/vision-agent
Vision agent
stanfordnlp/dspy
DSPy: The framework for programming—not prompting—foundation models
fuzz4all/fuzz4all
🌌️Fuzz4All: Universal Fuzzing with Large Language Models
seketeam/EvoCodeBench
An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories
THUDM/ChatGLM3
ChatGLM3 series: Open Bilingual Chat LLMs
phlippe/uvadlc_notebooks
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
Troyanovsky/Local-LLM-Comparison-Colab-UI
Compare the performance of different LLMs that can be deployed locally on consumer hardware. Run it yourself with Colab WebUI.
OSU-NLP-Group/Mind2Web
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
THUDM/AutoWebGLM
An LLM-based Web Navigating Agent (KDD'24)
shulin16/MMInA
Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"
VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
MinorJerry/WebVoyager
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
jun0wanan/awesome-large-multimodal-agents
mnotgod96/AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
cooelf/Auto-GUI
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
Leolty/repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
princeton-nlp/SWE-bench
[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world GitHub Issues?
NLP-Core-Team/RealCode_eval
FlagOpen/TACO
IBM/Project_CodeNet
This repository supports contributions of tools for the Project CodeNet dataset hosted in DAX.
daniel-furman/sft-demos
Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.
zzxslp/MM-Navigator
GPT-4V in Wonderland: LMMs as Smartphone Agents
FudanSELab/ClassEval
Benchmark ClassEval for class-level code generation.
magicgh/Self-MAP
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
xai-org/grok-1
Grok open release