# Audio Agents

Audio Agents is a Clojure-based project that demonstrates how to create an audible conversation system using OpenAI's GPT models. It is designed for Clojure enthusiasts who want to explore integrating audio input/output with AI-driven conversational agents.

## Features
- Audio Input: Capture audio from a microphone and transcribe it into text using OpenAI's transcription API.
- Audio Output: Convert text responses into speech using OpenAI's text-to-speech (TTS) capabilities.
- Conversational Agents: Create agents with customizable personas and system prompts.
- Dialogue System: Facilitate back-and-forth conversations between agents and users.
## Project Structure

```
├── deps.edn            ; Project dependencies
├── dev/                ; Development-specific files
│   ├── dev.clj
│   └── user.clj
├── resources/          ; Resources for prompts and audio
│   ├── output.wav
│   └── prompts/        ; These prompts are mostly illustrative, written by GPT
│       ├── persona-base.md
│       └── personas/
│           ├── flurbos-fan.md
│           ├── grumbos-fan.md
│           ├── mark-twain.md
│           └── sketch-artist.md
└── src/                ; Source code
    ├── audio/          ; Audio input/output utilities
    │   ├── microphone.clj
    │   └── playback.clj
    ├── ayyygents/      ; Workflow utilities for agents
    │   └── workflow.clj
    ├── examples/       ; Example usage
    │   ├── conversation.clj
    │   ├── debate.clj
    │   └── visual.clj
    ├── openai/         ; OpenAI API integration
    │   └── core.clj
    └── visual/         ; Tools for showing things
        ├── popup.clj   ; Sweet JFrame display
        └── viewer.clj  ; Show JFrames in a separate JVM
```
## Prerequisites

- Clojure CLI installed on your system.
- An OpenAI API key for transcription, TTS, and GPT functionality.
## Installation

- Clone the repository:

  ```shell
  git clone <repository-url>
  cd audio-agents
  ```
- Add your OpenAI API key to your environment. These examples use the official OpenAI Java SDK, which requires setting the `OPENAI_API_KEY` environment variable.
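For example, in a POSIX-compatible shell (the value below is a placeholder, not a real credential):

```shell
# Placeholder value; substitute your actual OpenAI API key.
export OPENAI_API_KEY="sk-your-key-here"
```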
You can verify the key is visible from the REPL:

```clojure
user=> (System/getenv "OPENAI_API_KEY") ;; make sure the key is in the env
```

## Usage

For now, examples can be run from the REPL. Usage notes can be found at the bottom of the files in `src/examples/*.clj`.
If you have the Clojure CLI installed, you can get running pretty quickly. `cd` into the checked-out project and run:

```shell
clj -A:dev
```

Then, from the REPL:

```clojure
user=> (dev)
dev=> (conversation) ;; or (debate) or (sketch-artist)
```
### Conversation

```clojure
examples.conversation=> (def ch (chat-with-gpt params))
```

Demonstrates a core.async flow for having an ongoing conversation with a GPT-fueled persona. It will request access to your microphone; see `src/audio/microphone.clj` for options. The flow tries to make conversation feel natural by automatically detecting periods of silence, so just speak until you're done! The conversation can be stopped by closing the core.async channel returned by `chat-with-gpt`.
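At the REPL, that lifecycle looks roughly like the following sketch. The `params` map here is illustrative only; the real options live in `src/examples/conversation.clj`:

```clojure
(require '[clojure.core.async :as async])

;; Illustrative params map -- not the exact schema.
(def params {:persona "resources/prompts/personas/mark-twain.md"})

;; Returns the core.async channel that controls the conversation.
(def ch (chat-with-gpt params))

;; Speak into the microphone; a period of silence triggers the agent's reply.

;; Close the channel to end the conversation.
(async/close! ch)
```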
Under the hood, the example:

- Captures audio input from the microphone and transcribes it into text.
- Creates a conversational agent with a customizable persona and system prompt.
- Adds text-to-speech capabilities to the conversational agent.
- Starts a dialogue with GPT using a specified persona and system prompt.
### Debate

Builds on the conversation example, but instead has two agents talking to one another audibly. Returns a dialogue channel that starts the conversation between the two agents; audio playback stops when the channel is closed.
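Putting that together at the REPL (assuming the `(debate)` entry point from the quick start above returns the dialogue channel):

```clojure
(require '[clojure.core.async :as async])

;; Starts two personas talking to each other out loud
;; and returns the controlling channel.
(def ch (debate))

;; Audio playback stops when the channel is closed.
(async/close! ch)
```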
### Sketch Artist (visual)

An experiment with a conversational agent that has the ability to show images. Returns a dialogue channel that causes a sketch artist to begin an audible interview with you. When the sketch artist has enough information, it displays an image generated via `dall-e-3`.
## Customization

- Personas: Add new persona prompts in the `resources/prompts/personas/` directory.
- System Prompts: Modify the base system prompt in `resources/prompts/persona-base.md`.
- Voice and Instructions: Customize the voice and playback instructions in the `chat-with-gpt` function.
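As an illustration, a new persona file such as `resources/prompts/personas/pirate.md` (a hypothetical name, not shipped with the repo) could contain plain prose instructions in the same style as the existing personas:

```markdown
You are a retired pirate captain. Speak in a salty, nautical register,
keep your answers short, and pepper replies with seafaring metaphors.
```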
## Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to improve the project.