Swift is a fast AI voice assistant.
- Groq is used for fast inference of OpenAI Whisper (for transcription) and Meta Llama 3 (for generating the text response).
- Cartesia's Sonic voice model is used for fast speech synthesis, which is streamed to the frontend.
- VAD (voice activity detection) is used to detect when the user is talking and to run callbacks on speech segments.
- The app is a Next.js project written in TypeScript and deployed to Vercel.
Thank you to the teams at Groq and Cartesia for providing access to their APIs for this demo!
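The VAD step above can be sketched as a toy energy-based detector: frames whose amplitude crosses a threshold are grouped into speech segments, and a callback fires for each segment. This is an illustrative sketch only (the function and type names are made up for this example); real VAD uses a trained model rather than a raw amplitude threshold.

```typescript
// Toy energy-based VAD: contiguous runs of frames above an
// amplitude threshold are treated as speech segments, and a
// callback is invoked for each completed segment.
type Segment = { start: number; end: number };

function detectSpeech(
  frames: number[],
  threshold: number,
  onSegment: (seg: Segment) => void,
): Segment[] {
  const segments: Segment[] = [];
  let start = -1; // index where the current speech run began, or -1

  frames.forEach((amp, i) => {
    if (amp >= threshold && start === -1) {
      start = i; // speech begins
    } else if (amp < threshold && start !== -1) {
      const seg = { start, end: i }; // speech ends
      segments.push(seg);
      onSegment(seg);
      start = -1;
    }
  });

  // Close out a segment still open at the end of the stream.
  if (start !== -1) {
    const seg = { start, end: frames.length };
    segments.push(seg);
    onSegment(seg);
  }
  return segments;
}

// Example: two speech runs in a short amplitude stream.
detectSpeech([0, 0.9, 0.8, 0, 0, 0.7], 0.5, (seg) =>
  console.log(`speech from frame ${seg.start} to ${seg.end}`),
);
```

In the app, the callback is where a detected speech segment would be sent off for transcription.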
- Clone the repository.
- Copy `.env.example` to `.env.local` and fill in the environment variables.
- Run `pnpm install` to install dependencies.
- Run `pnpm dev` to start the development server.
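Given the services listed above, `.env.local` will most likely need API keys for Groq and Cartesia. The variable names below are assumptions for illustration; `.env.example` in the repository is the authoritative list.

```
# Hypothetical variable names — check .env.example for the real ones.
GROQ_API_KEY=your_groq_api_key
CARTESIA_API_KEY=your_cartesia_api_key
```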