Spike: Investigate changing other editor fields with AI integrations

Question

tomusher opened this issue a year ago · 2 comments

Wagtail AI is currently limited to a rich text integration. It would be useful if we were able to read from and make changes to other fields in the editor, e.g. adding StreamField blocks, adjusting multiple rich text fields at once, etc.

This may require some way to pass the whole editor state to the package, make some changes, and have those changes re-applied.
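
As a rough illustration of that round trip, here is a minimal sketch in plain Python. Everything in it is hypothetical: `EditorState`, `register_editor_hook`, `apply_hooks` and `diff_state` are names made up for this example, and neither Wagtail nor wagtail-ai currently exposes the editor state this way. It assumes the frontend could serialise the editor into a flat dict of field names to values.

```python
import copy
from typing import Any, Callable

# Assumption: the editor state arrives as a flat mapping of field name -> value.
EditorState = dict[str, Any]
Hook = Callable[[EditorState], EditorState]

_hooks: list[Hook] = []


def register_editor_hook(fn: Hook) -> Hook:
    """Register a user-defined hook (hypothetical API, not part of Wagtail)."""
    _hooks.append(fn)
    return fn


def apply_hooks(state: EditorState) -> EditorState:
    """Run every registered hook on a copy of the state and return the result."""
    current = copy.deepcopy(state)  # never mutate what the frontend sent
    for hook in _hooks:
        current = hook(current)
    return current


def diff_state(before: EditorState, after: EditorState) -> EditorState:
    """Return only the fields whose values changed, for re-applying in the editor."""
    return {name: value for name, value in after.items() if before.get(name) != value}


@register_editor_hook
def fill_search_description(state: EditorState) -> EditorState:
    # Example hook: copy the intro into an empty search description.
    if not state.get("search_description"):
        state["search_description"] = state.get("intro", "")
    return state
```

The backend would return the output of `diff_state` so that the frontend only has to re-apply the fields that actually changed.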

Relevant discussion from Wagtail Slack:

Tom Usher
Thinking about architecture for future wagtail-ai features where we might want to manipulate the current state of the editor as a whole rather than thinking in individual fields.
My first thoughts went to having some way to serialise the current state of the editor, ship it to the backend, run user-defined hooks on that state, return the updated state to the frontend and have those changes applied to the editor page.
I’d imagine something like this could be the basis for future features like collaborative editing too.
My questions are:

How close are we to this with how the preview system works at the moment?

Have there been any other discussions/issues I can reference around this?

Are there any other recommendations/plans for manipulating editor state as a whole like this?

sage
I suppose this might be related to what mattwestcott is planning to do with Telepath? Though I think the current Telepath setup is oriented around individual input widgets rather than the all-encompassing state.

sage
As for the preview panel, it works by POST-ing the form with FormData in the JS to the preview endpoint, which saves the form data in the user's session to be deserialized when accessing the preview endpoint with GET. It doesn't handle request.FILES though

Tom Usher
I suppose there are things about the state that FormData wouldn’t fully represent (order, field choices, etc.) so I guess this doesn’t have much crossover with preview.

mattwestcott
Yep, that sounds like a good fit for the forthcoming telepath work. The idea there is to expose a JS API for retrieving and setting form field values, as well as things like inserting new form elements (for InlinePanel etc)...

mattwestcott
Currently it's only oriented towards managing elements that have been created through telepath itself - so it works well for StreamField because we can go "here's some JSON data, construct a form for it". We can't currently point to an existing form on the page and go "telepath-ify this"

mattwestcott
but once we do, I think it'll make sense to have a JS object representing the form as a whole, so that we can do things like serialising it to pass to an API

Tom Usher
Great that makes sense, and sounds like a good foundation to build on for something like this

Answer 1 · 2023-12-01T09:53:21.000Z

If we’re talking about text fields, this overlaps quite a bit with RFC 60: Draftail Usage for General Text Entry #60, and RFC 46: Single-line rich text fields (which I took quite far towards completing in wagtail/wagtail#7249).

The purpose of that work wasn’t necessarily to be able to add rich text everywhere, but rather to have character-by-character content interaction APIs for most or all fields.

I’m not clear what kind of interaction pattern we’d go for here where someone would want to "AI-ify" a whole page in one go. We’ve been aiming for more granular field-by-field APIs for now, because we generally felt people wanted some fair amount of control over the specific field where any kind of "smart text" features would apply – so having a toolbar with controls, and ways to annotate specific elements within a field. We’d also not want to lose the form’s field-level undo-redo history, which Draft.js implements for rich text fields, but we’d have to implement ourselves if we auto-changed other field types.

Answer 2 · 2023-12-01T10:42:56.000Z

That makes sense, thanks @thibaudcolas

I think the patterns here fall into two categories:

Changes where one field (or content quality utility) needs to have the full editor context to make decisions; e.g. generate a TL;DR/Easy Read version of a page, suggesting changes according to some prompt (rather than applying them), generating a meta description for a page, etc.
Processes where the LLM may be able to make more sweeping changes; e.g. apply our tone of voice to every field in this page, complete all remaining fields on the page based on a prompt, etc.

The former is probably the most useful with the current state of LLMs and seems like it would be fine implemented with per-field APIs as long as we have a way for that field to receive the full serialised editor state.
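
A minimal sketch of what the first category could look like, assuming the full serialised editor state is available as a dict. `suggest_meta_description` and the `complete` callable are hypothetical names for this example, not part of wagtail-ai; `complete` stands in for whatever LLM backend is configured.

```python
from typing import Any, Callable

# Assumption: the serialised editor state is a flat mapping of field name -> value.
EditorState = dict[str, Any]


def suggest_meta_description(
    state: EditorState,
    complete: Callable[[str], str],
    target_field: str = "search_description",
    max_length: int = 160,
) -> str:
    """Return a suggestion for one field, using every other text field as context."""
    context = "\n".join(
        f"{name}: {value}"
        for name, value in state.items()
        if name != target_field and isinstance(value, str) and value.strip()
    )
    prompt = "Write a meta description for this page:\n" + context
    # The state is only read; the caller decides whether to apply the suggestion.
    return complete(prompt)[:max_length]
```

The function returns a suggestion rather than writing to the state, so the editor keeps control over the target field and its undo history.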

I raised collaborative editing in my Slack question because it seems like for this to be viable in Wagtail's future, we'd need to be able to represent and rebuild the editor state from a set of CRDTs, which might also make applying changes from an LLM easier.
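
To make the CRDT idea concrete, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins map keyed by field name. It is an illustration only: the `Entry`, `merge` and `materialise` names are made up for this example, and real collaborative text editing would need a sequence CRDT for the contents of each field rather than whole-value replacement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value: str
    timestamp: int  # logical clock, assumed to increase with each edit
    replica: str    # who made the edit; used only to break timestamp ties


def merge(a: dict[str, Entry], b: dict[str, Entry]) -> dict[str, Entry]:
    """Merge two replicas; for each field the entry with the highest stamp wins."""
    merged = dict(a)
    for name, entry in b.items():
        existing = merged.get(name)
        if existing is None or (entry.timestamp, entry.replica) > (
            existing.timestamp,
            existing.replica,
        ):
            merged[name] = entry
    return merged


def materialise(state: dict[str, Entry]) -> dict[str, str]:
    """Rebuild the plain field values the editor would display."""
    return {name: entry.value for name, entry in state.items()}
```

Because `merge` gives the same result regardless of argument order, an LLM's changes could be treated as edits from one more replica and merged in like any other collaborator's.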