Pinned Repositories
LLM-RGB
LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.
Devon
Devon: An open-source pair programmer
promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
zhlmmc's Repositories
zhlmmc/bookinfo
zhlmmc/copilot-analysis
zhlmmc/experiments
Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
zhlmmc/human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
zhlmmc/nextjs-blog-theme
zhlmmc/node-docker-good-defaults
Sample Node app for Docker examples
zhlmmc/springboot_demo
Spring Boot demo for WeChat Cloud Hosting (微信云托管)