Add `--xml` flag to structure output for Claude's long context window

Question

lexh opened this issue 7 months ago · 3 comments

Background

Anthropic has provided specific guidelines for optimally structuring prompts to take advantage of Claude's extended context window (up to 200K tokens for Claude 3 models).

In particular, they recommend wrapping long input documents in XML tags to clearly delineate the boundaries between the documents and the rest of the prompt. This allows Claude to more accurately process the information.

Proposal

To better support using files-to-prompt output with Claude, a new --xml flag should be added. When this flag is set, the tool should structure its output like this:

Here are some documents for you to reference for your task:

<documents>
<document path="path/to/file1.txt">
Contents of file1.txt
</document>

<document path="path/to/file2.txt">
Contents of file2.txt
</document>
</documents>

The <documents> and <document> tags clearly separate the file contents from any additional prompt text that may be provided after.
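A minimal sketch of how the proposed layout could be generated, assuming a hypothetical helper named `render_documents` that takes a list of file paths. This is not the tool's actual implementation, and the option that eventually shipped may use a different tag layout; it only illustrates the structure described in this proposal.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr


def render_documents(paths):
    """Hypothetical helper: wrap each file in <document> tags inside one <documents> block."""
    blocks = []
    for path in paths:
        contents = Path(path).read_text()
        # quoteattr() quotes and escapes the path so it is safe as an attribute value.
        # File contents are left unescaped, as in the proposal, so this is not strict XML.
        blocks.append(
            f"<document path={quoteattr(str(path))}>\n"
            f"{contents.rstrip(chr(10))}\n"
            "</document>"
        )
    return "\n".join(
        [
            "Here are some documents for you to reference for your task:",
            "",
            "<documents>",
            "\n\n".join(blocks),
            "</documents>",
        ]
    )
```

Any additional prompt text would then be appended after the closing `</documents>` tag, for example `render_documents(["path/to/file1.txt"]) + "\n\nSummarize each document."`.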

Additional Considerations

The --xml flag should be optional, preserving the default untagged output for other uses
Documentation and usage examples should be updated to cover the new --xml option
The XML tag output feature should be covered by new unit tests

Answer 1 · 2024-09-09T05:46:43.000Z

This landed as -c/--cxml - it's documented here: https://github.com/simonw/files-to-prompt/blob/0.3/README.md#claude-xml-output

Answer 2 · 2024-09-09T06:10:29.000Z

Tested this like so:

files-to-prompt ../datasette/tests/test_internal* -c | llm -m claude-3.5-sonnet \
  --system 'summarize each of these documents, include references to the document'

Output:

Here's a summary of each document:

test_internal_db.py:

Contains tests for the internal database functionality in Datasette

Tests catalog tables for databases, tables, indexes, and foreign keys

Uses pytest and asyncio for asynchronous testing

test_internals_database.py:

Tests the Database class in Datasette

Covers methods like execute, table_exists, view_exists, table_columns, foreign keys, and more

Includes tests for write operations and memory databases

test_internals_datasette.py:

Tests the Datasette class

Covers methods like get_database, sign/unsign, settings, and permissions

Includes tests for rendering templates and handling metadata

test_internals_datasette_client.py:

Tests the Datasette client functionality

Covers different HTTP methods (GET, POST, etc.) and handling of base URLs

test_internals_request.py:

Tests the Request class in Datasette

Covers parsing of POST variables, request arguments, and URL variables

Includes tests for various request properties

test_internals_response.py:

Tests the Response class in Datasette

Covers different response types (HTML, JSON, redirect) and cookie handling

test_internals_urls.py:

Tests URL generation in Datasette

Covers methods for generating paths, static URLs, and database/table URLs

Includes tests for handling base URLs and prefixes

These tests cover various internal components of Datasette, including database operations, request/response handling, URL generation, and client functionality. They use pytest for testing and include both synchronous and asynchronous tests.

Answer 3 · 2024-09-09T06:10:44.000Z

Blogged with another example here: https://simonwillison.net/2024/Sep/9/files-to-prompt-03/
