This repository contains experiments, methodologies, and results focused on improving our use of LLM Agents, particularly Claude via Claude Code. We systematically collect challenges, develop hypotheses, and evaluate solutions to enhance AI-assisted development workflows.
```
.
├── challenges/        # Specific problems and test cases
│   ├── direction-following/
│   ├── error-patterns/
│   ├── complex-refactoring/
│   └── ...
├── testbeds/          # Reusable code bases for experiments
│   ├── web-app/
│   ├── cli-tool/
│   ├── data-pipeline/
│   └── ...
├── methods/           # Approaches and techniques
│   ├── prompting-strategies/
│   ├── context-management/
│   ├── iterative-refinement/
│   └── ...
├── experiments/       # Experimental runs and results
│   ├── 2025-07-05-direction-following/
│   ├── 2025-07-06-error-recovery/
│   └── ...
├── evaluation/        # Evaluation scripts and criteria
│   ├── metrics/
│   ├── scripts/
│   └── rubrics/
└── reports/           # Analysis and findings
    ├── weekly/
    └── insights/
```
- Check open issues for ongoing discussions and experiments
- Review the contribution guidelines
- Pick a challenge or propose a new one
- Document your experiments following our experiment template
- Select or create a challenge in `challenges/`
- Choose an appropriate testbed from `testbeds/` or create a new one
- Apply a method from `methods/` or develop a new approach
- Document your experiment in `experiments/YYYY-MM-DD-descriptive-name/` (sessions are saved automatically by Claude Code)
- Evaluate results using tools in `evaluation/`
- Share findings via PR and/or issue discussion
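A minimal shell sketch of that flow, assuming a direction-following challenge run against the cli-tool testbed; the experiment name, date, and evaluation script below are illustrative placeholders, not fixed conventions:

```bash
# Hypothetical walk-through of the steps above (names are placeholders).
EXPERIMENT=experiments/2025-07-07-direction-following-cli
mkdir -p "$EXPERIMENT"/{sessions,artifacts}

# Work against a copy of the testbed so the shared testbed stays pristine.
cp -r testbeds/cli-tool "$EXPERIMENT/artifacts/cli-tool"

# Record configuration up front; observations and metrics go in results.org.
touch "$EXPERIMENT/setup.org" "$EXPERIMENT/results.org"

# After the Claude Code sessions, evaluate with tooling from evaluation/
# (score_run.py is a placeholder name; use whatever script fits the challenge).
python evaluation/scripts/score_run.py "$EXPERIMENT" > "$EXPERIMENT/artifacts/metrics.txt"
```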
Specific, reproducible problems we’ve encountered with Claude Code. Each challenge includes:
- Problem description (README.org)
- Success criteria
- Known failure modes
- Related issues/discussions
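For instance, a new challenge could be scaffolded along these lines; the challenge name is made up, and the org headings simply mirror the checklist above rather than a prescribed template:

```bash
# Hypothetical scaffold for a new challenge directory.
CHALLENGE=challenges/ambiguous-multi-step-instructions
mkdir -p "$CHALLENGE"

# README.org mirrors the checklist above: description, criteria, failure modes, links.
cat > "$CHALLENGE/README.org" <<'EOF'
* Problem description
* Success criteria
* Known failure modes
* Related issues/discussions
EOF
```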
Reusable codebases that serve as consistent environments for experiments. These are intentionally kept separate from challenges to enable testing multiple approaches on the same codebase.
Documented approaches for improving Claude’s performance. Methods can be:
- Prompting strategies
- Context management techniques
- Workflow patterns
- Tool configurations
Individual experimental runs combining a challenge, testbed, and method. Each experiment directory contains:
- `setup.org` - Configuration and parameters
- `sessions/` - References to Claude Code session UUIDs (sessions are automatically saved to `~/.claude/projects/`)
- `results.org` - Observations and metrics
- `artifacts/` - Generated code or outputs
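As a rough sketch of how an experiment might reference its sessions without copying large files into the repo (the exact layout under `~/.claude/projects/` and the file names below are assumptions, not part of this repo's conventions):

```bash
# Hypothetical example: record which Claude Code sessions back this experiment.
EXPERIMENT=experiments/2025-07-07-direction-following-cli

# Sessions are saved automatically under ~/.claude/projects/; list recent
# files to find the relevant UUIDs (the *.jsonl layout is an assumption).
ls -lt ~/.claude/projects/*/*.jsonl | head

# Keep lightweight references instead of the raw session files.
echo "session: 123e4567-e89b-12d3-a456-426614174000" >> "$EXPERIMENT/sessions/session-refs.org"
```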
We use GitHub Issues for:
- **Experiment Proposals** (label: `experiment`)
- **Challenge Documentation** (label: `challenge`)
- **Method Discussions** (label: `method`)
- **Results & Insights** (label: `findings`)
Issues remain open during active experimentation and link to relevant PRs and commits.
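Assuming the GitHub CLI (`gh`) is available, a labeled issue of this kind can be opened from the command line; the title and body below are placeholders:

```bash
# Hypothetical example: open an experiment proposal with the "experiment" label.
gh issue create \
  --title "Experiment: context-window optimization on the data-pipeline testbed" \
  --label experiment \
  --body "Hypothesis, challenge, testbed, method, and success criteria."
```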
Given our research-oriented workflow with 4-5 contributors:
- `main` - Stable, reviewed content
- `experiments/<username>/<description>` - Individual experiment branches
- `develop` - Integration branch for collaborative work
Example flow:
```bash
git checkout -b experiments/alice/context-window-optimization
# ... work on experiment ...
git push origin experiments/alice/context-window-optimization
# Create PR to develop for review
# After team review, merge to main
```

See CONTRIBUTING.org for detailed guidelines. Key points:
- Start discussions in issues before major experiments
- Use consistent naming conventions
- Document both successes and failures
- Include session files for reproducibility
- Tag relevant team members for review
- **Direction Following** - Improving Claude’s adherence to specific instructions
- **Error Recovery** - Handling and learning from Claude’s mistakes
- **Context Management** - Optimizing information provided to Claude
- **Complex Refactoring** - Multi-file, architectural changes