ZhuohanX
Postdoc@MBZUAI | PhD@UniMelb
Mohamed bin Zayed University of Artificial Intelligence, NLP Department, Abu Dhabi
ZhuohanX's Stars
microsoft/JARVIS
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
OpenBMB/AgentVerse
🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It provides two primary frameworks: task-solving and simulation.
agiresearch/AIOS
AIOS: AI Agent Operating System
noahshinn/reflexion
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
ysymyth/ReAct
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
agiresearch/OpenAGI
OpenAGI: When LLM Meets Domain Experts
XueFuzhao/OpenMoE
A family of open-source Mixture-of-Experts (MoE) large language models
Libr-AI/OpenFactVerification
Loki: an open-source solution that automates factuality verification
taichengguo/LLM_MultiAgents_Survey_Papers
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
kaushikb11/awesome-llm-agents
A curated list of awesome LLM agents.
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for the paper "Long-form factuality in large language models".
scutcyr/SoulChat
SoulChat: a Chinese large language model for mental-health dialogue
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
anchen1011/FireAct
FireAct: Toward Language Agent Fine-tuning
Sahandfer/EMPaper
This is a repository for sharing papers in the field of empathetic conversational AI. The related source code for each paper is linked if available.
zou-group/avatar
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning (NeurIPS 2024)
IINemo/lm-polygraph
zjunlp/AutoAct
[ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning
THUNLP-MT/StableToolBench
A new tool-learning benchmark that balances stability and realism, based on ToolBench.
redotvideo/pluto
Synthetic Data for LLM Fine-Tuning
CUHK-ARISE/PsychoBench
Benchmarking LLMs' Psychological Portrayal
CUHK-ARISE/EmotionBench
Benchmarking LLMs' Emotional Alignment with Humans
Sahandfer/EmoBench
This is the official repository for the paper "EmoBench: Evaluating the Emotional Intelligence of Large Language Models"
CMMMU-Benchmark/CMMMU
yuxiaw/OpenFactCheck
bgalitsky/Truth-O-Meter-Making-ChatGPT-Truthful
Fact-checking for GPT and other LLMs
RUCAIBox/HaluAgent
PKU-ONELab/LLM-evaluator-reliability
The official repository for our ACL 2024 paper, "Are LLM-based Evaluators Confusing NLG Quality Criteria?"
Xiaoxue-xx/HaluAgent
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector