Naming
domenic opened this issue · 4 comments
We are generally not very happy with the current naming of the API, for various reasons.
- The current API is centered around the concept of a "text session". This is problematic if, in the future, we have multi-modal models. A "language model session" might be more accurate, but it's a very long name.
- The most familiar public term is "large language model". This is very long, but could perhaps be abbreviated to "LLM". But this doesn't mesh well with some recent efforts, e.g. from Microsoft, to brand models small enough to run on-device as "small language models".
- Other APIs often use nouns like "chat" or "assistant". Those feel too specific to us, or might give the wrong impression that the on-device model is capable of fulfilling chat/assistant use cases, but perhaps we should go with the majority.
- We've found some sites are unable to use this API because minifiers already create global self.ai variables. Although we kind of like the idea of grouping all AI-related APIs (prompting, translation, etc.) under self.ai, maybe we should abandon that idea.
One thing to note is that other APIs often work by "creating a model", and then prompting that model. Given the explainer's discussion of "How many stages to reach a response?" and "Stateless or session-based", this doesn't seem to fit as well for us. We could have separate create-model, then create-session, then prompt steps, but it's not clear what the first one would add. Or we could rename "session" to "model" because it's a nicer and more-recognized name, but that seems confusing.
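For concreteness, here is a rough sketch of the two shapes. The createModel() step and its return value are hypothetical, included only to illustrate the question of what it would add; createTextSession() and the systemPrompt option follow the current API.

```js
// Hypothetical three-step shape (createModel() is an illustrative assumption):
const model = await self.ai.createModel();                         // what does this step add?
const sessionA = await model.createSession({ systemPrompt: "Be terse." });
const answerA = await sessionA.prompt("Summarize this page.");

// Versus the current two-step shape, where a session is created directly:
const sessionB = await self.ai.createTextSession({ systemPrompt: "Be terse." });
const answerB = await sessionB.prompt("Summarize this page.");
```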
Taking all this into consideration, our current best proposal for a possible rename is the following:
self.languageModel
languageModel.canCreateSession()
languageModel.createSession()
languageModel.ondownloadprogress
languageModel.info()
languageModel.prompt()
Does this seem better than the current naming to folks?
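For comparison, a minimal usage sketch under the proposed naming. The capability string returned by canCreateSession() and the systemPrompt option are assumptions for illustration, not settled details.

```js
const availability = await self.languageModel.canCreateSession();
if (availability !== "no") {                          // assuming a capability string such as "readily" / "no"
  self.languageModel.ondownloadprogress = (e) => {
    console.log("model download progress", e);        // fires if the model still needs to be fetched
  };
  const session = await self.languageModel.createSession({
    systemPrompt: "You are a concise assistant.",     // option name carried over from the current API
  });
  console.log(await session.prompt("Suggest three names for a pet hedgehog."));
}
```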
Also, "Prompt API" might be confused with the HTML spec's user prompts and window.prompt()
.
Any reason this API can't be exposed via the navigator interface? Not sure if that is a great idea or not, but it seems to be the dumping ground for lots of different APIs, and it would avoid the global clash issue.
As for naming, I think languageModel is much better than ai. Removing "text" from the name is good, as I hope this API will have vision one day too, via canvas, imgs, etc. The current Chrome implementation also has/had a createGenericSession, which was confusing as to when that would be chosen over the text session.
Any reason this API can't be exposed via the navigator interface?

Because it's not related to user agent data. See w3ctag/design-principles#448 (comment).
Well, I think languageModel goes along the lines of the LLM itself, while a session is more like each user session. Obvious, but hey, let me explain haha.
Sessions are:
- Capable of executing prompts
- Connected to models
- Dependent on models

Models are:
- Capable of providing a session inference area
- Capable of communicating with the hardware / window
I say this because I like the concept of sessions x models. Clarifying what a model is and what a session is opens up a more generic approach, and this model x session approach is also what is being used in sandboxed implementations with RCEs in other places out there for LLMs.
So I would be a big fan of:
session = await lm.createTextSession({ systemPrompt })
session.prompt() (an exposed inference session, over an in-browser model).
For loading the lm, window.lm.create() could be used if necessary, avoiding two steps. Up to anyone, to be honest. The emphasis is on keeping the "inference area" and the "language model" as two different things.
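If I'm reading this right, a hedged sketch of the split could look like the following. window.lm, create(), and createTextSession() here are this comment's suggestions and assumptions, not agreed-upon API.

```js
// The language model: the thing that talks to the browser / hardware; created once.
const lm = await window.lm.create();   // optional explicit step; skippable if one step is preferred

// The session: an "inference area" on top of that model; sessions execute prompts.
const session = await lm.createTextSession({ systemPrompt: "You are a tour guide." });
const reply = await session.prompt("What should I see in Lisbon?");
```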