Add `--xml` flag to structure output for Claude's long context window
lexh opened this issue · 3 comments
Background
Anthropic has provided specific guidelines for optimally structuring prompts to take advantage of Claude's extended context window (up to 200K tokens for Claude 3 models).
In particular, they recommend wrapping long input documents in XML tags to clearly delineate the boundaries between the documents and the rest of the prompt. This allows Claude to more accurately process the information.
Proposal
To better support using files-to-prompt output with Claude, a new `--xml` flag should be added. When this flag is set, the tool should structure its output like this:
Here are some documents for you to reference for your task:
<documents>
<document path="path/to/file1.txt">
Contents of file1.txt
</document>
<document path="path/to/file2.txt">
Contents of file2.txt
</document>
</documents>
The `<documents>` and `<document>` tags clearly separate the file contents from any additional prompt text that may be provided after.
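A minimal Python sketch of how the proposed output could be assembled. The function name `render_documents` and its `(path, contents)` input shape are hypothetical and not taken from the files-to-prompt codebase; the only thing it mirrors is the layout shown above.

```python
from xml.sax.saxutils import quoteattr


def render_documents(files):
    """Build the proposed tagged output from (path, contents) pairs.

    Hypothetical helper for illustration only; not part of files-to-prompt.
    """
    lines = [
        "Here are some documents for you to reference for your task:",
        "<documents>",
    ]
    for path, contents in files:
        # quoteattr adds the surrounding quotes and escapes characters
        # that would otherwise break the attribute value.
        lines.append(f"<document path={quoteattr(path)}>")
        # File contents are emitted verbatim, as in the proposal above.
        lines.append(contents)
        lines.append("</document>")
    lines.append("</documents>")
    return "\n".join(lines)


example = render_documents(
    [
        ("path/to/file1.txt", "Contents of file1.txt"),
        ("path/to/file2.txt", "Contents of file2.txt"),
    ]
)
print(example)
```

The contents are deliberately left unescaped so the model sees the file text as written; whether the path attribute needs escaping at all is a design choice this sketch only assumes.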
Additional Considerations
- The `--xml` flag should be optional, preserving the default untagged output for other use as needed
- Documentation and usage examples should be updated to cover the new `--xml` option
- The XML tag output feature should be covered by new unit tests
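One way the first and third points could look in practice, sketched with the standard library's `argparse` so the example is self-contained (this is an assumption for illustration, not how files-to-prompt wires its options). The names `build_parser` and `format_file` are hypothetical, and the untagged branch is only a stand-in for whatever the default output is.

```python
import argparse


def build_parser():
    """Hypothetical parser: --xml is opt-in and defaults to off."""
    parser = argparse.ArgumentParser(prog="files-to-prompt-sketch")
    parser.add_argument("paths", nargs="*", help="files to include")
    parser.add_argument(
        "--xml",
        action="store_true",
        help="wrap each file in <document> tags",
    )
    return parser


def format_file(path, contents, xml=False):
    """Return one file's output, tagged only when xml is True."""
    if xml:
        return f'<document path="{path}">\n{contents}\n</document>'
    # Stand-in for the default untagged output (illustrative only).
    return f"{path}\n---\n{contents}\n---"


def test_xml_flag_is_optional():
    # Without the flag, output stays untagged.
    args = build_parser().parse_args(["a.txt"])
    assert args.xml is False
    assert "<document" not in format_file("a.txt", "hi", xml=args.xml)


def test_xml_flag_wraps_contents():
    # With the flag, the file is wrapped in <document> tags.
    args = build_parser().parse_args(["--xml", "a.txt"])
    assert args.xml is True
    tagged = format_file("a.txt", "hi", xml=args.xml)
    assert tagged.startswith('<document path="a.txt">')
    assert tagged.endswith("</document>")
```

The two test functions follow pytest naming so they would be collected automatically if dropped into a test module.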
This landed as `-c/--cxml` - it's documented here: https://github.com/simonw/files-to-prompt/blob/0.3/README.md#claude-xml-output
Tested this like so:
files-to-prompt ../datasette/tests/test_internal* -c | llm -m claude-3.5-sonnet \
--system 'summarize each of these documents, include references to the document'
Output:
Here's a summary of each document:
test_internal_db.py:
- Contains tests for the internal database functionality in Datasette
- Tests catalog tables for databases, tables, indexes, and foreign keys
- Uses pytest and asyncio for asynchronous testing
test_internals_database.py:
- Tests the Database class in Datasette
- Covers methods like execute, table_exists, view_exists, table_columns, foreign keys, and more
- Includes tests for write operations and memory databases
test_internals_datasette.py:
- Tests the Datasette class
- Covers methods like get_database, sign/unsign, settings, and permissions
- Includes tests for rendering templates and handling metadata
test_internals_datasette_client.py:
- Tests the Datasette client functionality
- Covers different HTTP methods (GET, POST, etc.) and handling of base URLs
test_internals_request.py:
- Tests the Request class in Datasette
- Covers parsing of POST variables, request arguments, and URL variables
- Includes tests for various request properties
test_internals_response.py:
- Tests the Response class in Datasette
- Covers different response types (HTML, JSON, redirect) and cookie handling
test_internals_urls.py:
- Tests URL generation in Datasette
- Covers methods for generating paths, static URLs, and database/table URLs
- Includes tests for handling base URLs and prefixes
These tests cover various internal components of Datasette, including database operations, request/response handling, URL generation, and client functionality. They use pytest for testing and include both synchronous and asynchronous tests.
Blogged with another example here: https://simonwillison.net/2024/Sep/9/files-to-prompt-03/