Yammer provides asynchronous bindings to the Ollama API and the following CLI tools:
shellm
pass a file (or stdin if no file) to the generate endpoint and stream the result.oneshot
open a temporary file in an editor to be passed to the generate endpoint; stream the result.prompt
pass a prompt to the generate endpoint and stream the result.chat
chat with a model using the chat endpoint.chats
manage chat sessions.
$ cargo install yammer
The shellm tool multiplexes files over a model:
$ shellm --model llama3.2:3b << EOF
Why is the sky red?
EOF
I'm sorry. The sky is not red.
$ shellm --model llama3.2:3b foo bar
Response to foo...
Response to bar...
The oneshot tool is conceptually the same as editing a temporary file and passing it to shellm:
$ oneshot llama3.2:3b gemma2
Opens $EDITOR with a temporary file. Write your prompt and save the file.
Output of llama3.2:3b...
Output of gemma2....
The prompt tool is similar to shellm but takes prompts on the command line rather than files:
$ prompt llama3.2:3b "Why is the sky red?"
I'm sorry. The sky is not red.
The chat command is used to chat with a model:
$ chat
>>> Why is the sky red?
The sky often appears red at sunrise and sunset. ...
>>> :edit
>>> :model llama3.2:3b
>>> :retry
The sky often appears red at sunrise and sunset due to Rayleigh scattering. ....
>>> :param --num-ctx 4096
>>> :exit
The chats command is used to manage chat sessions:
$ chats
recent:
2024-12-01T18:26 FP8MC gemma2 Why is the sky red?
2024-12-01T17:34 H5HMV llama3.2:3b Hi there! Tell me about first and follow sets for parsers.
> pin FP8MC
> status
pinned:
2024-12-01T18:29 FP8MC gemma2 Why is the sky red?
recent:
2024-12-01T17:34 H5HMV llama3.2:3b Hi there! Tell me about first and follow sets for parsers.
> archive H5HMV
> status
pinned:
2024-12-01T18:29 FP8MC gemma2 Why is the sky red?
> chat FP8MC
>>> Why is the sky red?
The sky often appears red at sunrise and sunset. ...
>>> exit
> new "Act like Mario, the video game character."
>>> Hi!
Hiya! It'sa me, Mario!
>>> exit
> exit
$ shellm --help
USAGE: shellm [OPTIONS] [FILE]
Options:
-h, -help Print this help menu.
-ollama-host The host to connect to.
-model The model to use from the ollama library.
-suffix The suffix to append to the response.
-system The system to use in the template.
-template The template to use for the prompt.
-json Format the response in JSON. You must also ask the
model to do so.
-raw Whether to pass bypass formatting of the prompt.
-keep-alive Duration to keep the model in memory for after the
call.
-param-mirostat
Enable Mirostat sampling for controlling perplexity.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat
2.0)
-param-mirostat-eta
Influences how quickly the algorithm responds to
feedback from the generated text.
-param-mirostat-tau
Controls the balance between coherence and diversity
of the output.
-param-num-ctx The number of tokens worth of context to allocate.
-param-repeat-last-n
Sets how far back for the model to look back to
prevent repetition.
-param-repeat-penalty
Sets how strongly to penalize repetitions.
-param-temperature
The temperature of the model.
-param-seed Sets the random number seed to use for generation.
-param-tfs-z Tail free sampling is used to reduce the impact of
less probable tokens from the output.
-param-num-predict
Maximum number of tokens to predict when generating
text.
-param-top-k Reduces the probability of generating nonsense.
-param-top-p Works together with top-k.
-param-min-p Alternative to the top_p, and aims to ensure a balance
of quality and variety.
$ oneshot --help
USAGE: oneshot [OPTIONS] [MODEL]
Options:
-h, -help Print this help menu.
-ollama-host The host to connect to.
-suffix The suffix to append to the response.
-system The system to use in the template.
-template The template to use for the prompt.
-json Format the response in JSON. You must also ask the
model to do so.
-raw Whether to pass bypass formatting of the prompt.
-keep-alive Duration to keep the model in memory for after the
call.
-param-mirostat
Enable Mirostat sampling for controlling perplexity.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat
2.0)
-param-mirostat-eta
Influences how quickly the algorithm responds to
feedback from the generated text.
-param-mirostat-tau
Controls the balance between coherence and diversity
of the output.
-param-num-ctx The number of tokens worth of context to allocate.
-param-repeat-last-n
Sets how far back for the model to look back to
prevent repetition.
-param-repeat-penalty
Sets how strongly to penalize repetitions.
-param-temperature
The temperature of the model.
-param-seed Sets the random number seed to use for generation.
-param-tfs-z Tail free sampling is used to reduce the impact of
less probable tokens from the output.
-param-num-predict
Maximum number of tokens to predict when generating
text.
-param-top-k Reduces the probability of generating nonsense.
-param-top-p Works together with top-k.
-param-min-p Alternative to the top_p, and aims to ensure a balance
of quality and variety.
$ prompt --help
USAGE: prompt [OPTIONS] [PROMPT]
Options:
-h, -help Print this help menu.
-ollama-host The host to connect to.
-model The model to use from the ollama library.
-suffix The suffix to append to the response.
-system The system to use in the template.
-template The template to use for the prompt.
-json Format the response in JSON. You must also ask the
model to do so.
-raw Whether to pass bypass formatting of the prompt.
-keep-alive Duration to keep the model in memory for after the
call.
-param-mirostat
Enable Mirostat sampling for controlling perplexity.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat
2.0)
-param-mirostat-eta
Influences how quickly the algorithm responds to
feedback from the generated text.
-param-mirostat-tau
Controls the balance between coherence and diversity
of the output.
-param-num-ctx The number of tokens worth of context to allocate.
-param-repeat-last-n
Sets how far back for the model to look back to
prevent repetition.
-param-repeat-penalty
Sets how strongly to penalize repetitions.
-param-temperature
The temperature of the model.
-param-seed Sets the random number seed to use for generation.
-param-tfs-z Tail free sampling is used to reduce the impact of
less probable tokens from the output.
-param-num-predict
Maximum number of tokens to predict when generating
text.
-param-top-k Reduces the probability of generating nonsense.
-param-top-p Works together with top-k.
-param-min-p Alternative to the top_p, and aims to ensure a balance
of quality and variety.
$ chat --help
USAGE: chat [OPTIONS]
Options:
-h, -help Print this help menu.
-ollama-host The host to connect to.
-model The model to use from the ollama library.
-keep-alive Duration to keep the model in memory for after the
call.
-param-mirostat
Enable Mirostat sampling for controlling perplexity.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat
2.0)
-param-mirostat-eta
Influences how quickly the algorithm responds to
feedback from the generated text.
-param-mirostat-tau
Controls the balance between coherence and diversity
of the output.
-param-num-ctx The number of tokens worth of context to allocate.
-param-repeat-last-n
Sets how far back for the model to look back to
prevent repetition.
-param-repeat-penalty
Sets how strongly to penalize repetitions.
-param-temperature
The temperature of the model.
-param-seed Sets the random number seed to use for generation.
-param-tfs-z Tail free sampling is used to reduce the impact of
less probable tokens from the output.
-param-num-predict
Maximum number of tokens to predict when generating
text.
-param-top-k Reduces the probability of generating nonsense.
-param-top-p Works together with top-k.
-param-min-p Alternative to the top_p, and aims to ensure a balance
of quality and variety.
$ chats
> help
chats
=====
Commands:
status Show the status of all chats.
archive Archive a chat.
unarchive Unarchive a chat.
archived Show all archived chats.
pin Pin a chat.
unpin Unpin a chat.
pinned Show all pinned chats.
new Start a new chat.
chat Continue a chat.
editor Start a chat with a system message written in EDITOR.
Active development.
The latest documentation is always available at docs.rs.