ZhuohanX
Postdoc@MBZUAI | PhD@UniMelb
Mohamed bin Zayed University of Artificial Intelligence, NLP Department, Abu Dhabi
ZhuohanX's Stars
microsoft/JARVIS
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
OpenBMB/AgentVerse
🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It provides two primary frameworks: task-solving and simulation.
agiresearch/AIOS
AIOS: AI Agent Operating System
noahshinn/reflexion
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
ysymyth/ReAct
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
agiresearch/OpenAGI
OpenAGI: When LLM Meets Domain Experts
XueFuzhao/OpenMoE
A family of open-source Mixture-of-Experts (MoE) large language models
Libr-AI/OpenFactVerification
Loki: an open-source solution that automates factuality verification
taichengguo/LLM_MultiAgents_Survey_Papers
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
kaushikb11/awesome-llm-agents
A curated list of awesome LLM agents.
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for the paper "Long-form factuality in large language models".
scutcyr/SoulChat
SoulChat: a Chinese large language model for mental-health dialogue
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
anchen1011/FireAct
FireAct: Toward Language Agent Fine-tuning
Sahandfer/EMPaper
This is a repository for sharing papers in the field of empathetic conversational AI. The related source code for each paper is linked if available.
zou-group/avatar
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning (NeurIPS 2024)
IINemo/lm-polygraph
zjunlp/AutoAct
[ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning
THUNLP-MT/StableToolBench
A new tool-learning benchmark that balances stability and realism, based on ToolBench.
redotvideo/pluto
Synthetic Data for LLM Fine-Tuning
CUHK-ARISE/PsychoBench
Benchmarking LLMs' Psychological Portrayal
CUHK-ARISE/EmotionBench
Benchmarking LLMs' Emotional Alignment with Humans
Sahandfer/EmoBench
This is the official repository for the paper "EmoBench: Evaluating the Emotional Intelligence of Large Language Models"
CMMMU-Benchmark/CMMMU
yuxiaw/OpenFactCheck
bgalitsky/Truth-O-Meter-Making-ChatGPT-Truthful
Fact-checking for GPT and other LLMs
RUCAIBox/HaluAgent
PKU-ONELab/LLM-evaluator-reliability
The official repository for our ACL 2024 paper, "Are LLM-based Evaluators Confusing NLG Quality Criteria?"
Xiaoxue-xx/HaluAgent
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector