simonw/files-to-prompt

Add `--xml` flag to structure output for Claude's long context window

lexh opened this issue · 3 comments

lexh commented

Background

Anthropic has provided specific guidelines for optimally structuring prompts to take advantage of Claude's extended context window (up to 200K tokens for Claude 3 models).

In particular, they recommend wrapping long input documents in XML tags to clearly delineate the boundaries between the documents and the rest of the prompt. This helps Claude tell the reference material apart from the instructions and process it more accurately.

Proposal

To better support using files-to-prompt output with Claude, a new --xml flag should be added. When this flag is set, the tool should structure its output like this:

Here are some documents for you to reference for your task:

<documents>
<document path="path/to/file1.txt">
Contents of file1.txt
</document>

<document path="path/to/file2.txt">
Contents of file2.txt
</document>
</documents>

The <documents> and <document> tags clearly separate the file contents from any additional prompt text that follows them.
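For illustration, here is a minimal Python sketch of a renderer that produces the layout above. `render_documents_xml` is a hypothetical name, not part of files-to-prompt; it assumes the files have already been read into `(path, contents)` string pairs, escapes the path so it is valid inside a double-quoted attribute, and leaves the file contents verbatim, as the proposal does.

```python
from xml.sax.saxutils import escape


def render_documents_xml(files):
    """Render (path, contents) pairs in the layout proposed for --xml.

    Hypothetical helper for illustration; not the tool's real code.
    """
    blocks = []
    for path, contents in files:
        # Escape &, <, > and double quotes so the path is a valid attribute value.
        attr = escape(path, {'"': "&quot;"})
        # Drop trailing newlines so each closing tag sits on its own line.
        body = contents.rstrip("\n")
        blocks.append(f'<document path="{attr}">\n{body}\n</document>')
    # One blank line between documents, matching the example in the proposal.
    return (
        "Here are some documents for you to reference for your task:\n\n"
        "<documents>\n"
        + "\n\n".join(blocks)
        + "\n</documents>\n"
    )
```

Because the contents are inserted unescaped, a file that itself contains `</document>` would blur the boundary. Escaping the contents would avoid that, but it would also change the text the model sees, so that trade-off is worth deciding explicitly when implementing the flag.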

Additional Considerations

  • The --xml flag should be optional; without it, the tool should keep producing the current untagged output
  • Documentation and usage examples should be updated to cover the new --xml option
  • The XML tag output feature should be covered by new unit tests
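The optional flag and its test coverage can be sketched with the standard library's `argparse`. This is a hypothetical stand-in, not files-to-prompt's actual CLI code: `main`, `render_plain`, `render_xml` and the `read` parameter are illustrative names, and the untagged path/`---`/contents layout is an assumption rather than the tool's exact current output. `read` is injectable so both modes can be unit-tested without touching the filesystem.

```python
import argparse
from pathlib import Path
from xml.sax.saxutils import escape


def render_plain(files):
    # Untagged layout used when --xml is absent. This path/---/contents
    # shape is an assumption for illustration only.
    chunks = []
    for path, contents in files:
        body = contents.rstrip("\n")
        chunks.append(f"{path}\n---\n{body}\n---\n")
    return "\n".join(chunks)


def render_xml(files):
    # Layout from the proposal: <documents> wrapping one <document> per file.
    blocks = []
    for path, contents in files:
        attr = escape(path, {'"': "&quot;"})
        body = contents.rstrip("\n")
        blocks.append(f'<document path="{attr}">\n{body}\n</document>')
    return (
        "Here are some documents for you to reference for your task:\n\n"
        "<documents>\n" + "\n\n".join(blocks) + "\n</documents>\n"
    )


def main(argv=None, read=lambda p: Path(p).read_text()):
    parser = argparse.ArgumentParser(prog="files-to-prompt-sketch")
    parser.add_argument("paths", nargs="+")
    # store_true makes the flag optional and off by default, so existing
    # invocations keep getting the untagged output.
    parser.add_argument("--xml", action="store_true",
                        help="wrap each file in <document> tags")
    args = parser.parse_args(argv)
    files = [(p, read(p)) for p in args.paths]
    return render_xml(files) if args.xml else render_plain(files)
```

Wired to a console entry point, `print(main(), end="")` would emit whichever layout was selected; in tests, passing a dictionary lookup as `read` keeps each case to a single call.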

Tested this like so:

files-to-prompt ../datasette/tests/test_internal* -c | llm -m claude-3.5-sonnet \
  --system 'summarize each of these documents, include references to the document'

Output:

Here's a summary of each document:

  1. test_internal_db.py:

    • Contains tests for the internal database functionality in Datasette
    • Tests catalog tables for databases, tables, indexes, and foreign keys
    • Uses pytest and asyncio for asynchronous testing
  2. test_internals_database.py:

    • Tests the Database class in Datasette
    • Covers methods like execute, table_exists, view_exists, table_columns, foreign keys, and more
    • Includes tests for write operations and memory databases
  3. test_internals_datasette.py:

    • Tests the Datasette class
    • Covers methods like get_database, sign/unsign, settings, and permissions
    • Includes tests for rendering templates and handling metadata
  4. test_internals_datasette_client.py:

    • Tests the Datasette client functionality
    • Covers different HTTP methods (GET, POST, etc.) and handling of base URLs
  5. test_internals_request.py:

    • Tests the Request class in Datasette
    • Covers parsing of POST variables, request arguments, and URL variables
    • Includes tests for various request properties
  6. test_internals_response.py:

    • Tests the Response class in Datasette
    • Covers different response types (HTML, JSON, redirect) and cookie handling
  7. test_internals_urls.py:

    • Tests URL generation in Datasette
    • Covers methods for generating paths, static URLs, and database/table URLs
    • Includes tests for handling base URLs and prefixes

These tests cover various internal components of Datasette, including database operations, request/response handling, URL generation, and client functionality. They use pytest for testing and include both synchronous and asynchronous tests.