junwucs's Stars
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
deepseek-ai/DreamCraft3D
[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
chuanyang-Zheng/Progressive-Hint
This is the official implementation of "Progressive-Hint Prompting Improves Reasoning in Large Language Models"
idavidrein/gpqa
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
fchollet/ARC-AGI
The Abstraction and Reasoning Corpus
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods on single- or multi-node GPUs. Supports default and custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
mlcommons/modelbench
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
pytorch/torchtune
PyTorch native finetuning library
meta-llama/PurpleLlama
Set of tools to assess and improve LLM security.
Magnetic2014/RoleEval
A Bilingual Role Evaluation Benchmark for Large Language Models
SalesforceAIResearch/AgentLite
SalesforceAIResearch/xLAM
llmeval/llmeval-3
Chinese Large Language Model Evaluation, Phase 3
Nanbeige/Nanbeige
deepseek-ai/DeepSeek-Math
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
xingyaoww/mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng and Heng Ji.
thunlp/LEGENT
Open Platform for Embodied Agents
thunlp/MatPlotAgent
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
AutonomousAgentsLab/curiousreplay
Implementations of Curious Replay for model-based adaptation.
LiveCodeBench/LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
huggingface/llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
mgramin/awesome-db-tools
Everything that makes working with databases easier
jsbroks/awesome-dataset-tools
🔧 A curated list of awesome dataset tools
awesomedata/awesome-public-datasets
A topic-centric list of HQ open datasets.
jianzhnie/awesome-instruction-datasets
A collection of awesome-prompt-datasets and awesome-instruction-dataset resources: a wide variety of instruction datasets for training ChatLLM models such as ChatGPT.
Value4AI/Awesome-LLM-in-Social-Science
Awesome papers involving LLMs in Social Science.
google-deepmind/concordia
A library for generative social simulation
zhaorw02/FlexiDreamer
An official implementation of FlexiDreamer: Single Image-to-3D Generation with FlexiCubes.
OpenBMB/Eurus