
GPT Video - Reproducing the Gemini demo using GPT-4 Vision

Screenshot of the App

🌌 Overview

After seeing the Gemini demo video, I asked myself: could the experience Google showcased be more than just a scripted demo? This project is a fun experiment exploring the feasibility of real-time AI interactions similar to those portrayed in the Gemini video.

🛠 Stack

  • Next.js with App Router.
  • Vercel AI SDK (the `ai` npm package).
  • OpenAI's Whisper and GPT APIs.
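As a rough illustration of how these pieces fit together, here is a minimal sketch using the official `openai` Node package. It is not code from this repository; the audio file name, the `frameDataUrl` placeholder, and the model choice are illustrative assumptions.

// Sketch: transcribe a recorded clip with Whisper, then send the transcript
// plus a captured frame to GPT-4 Vision. Illustrative only, not the repo's code.
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// 1. Speech-to-text with Whisper.
const transcription = await openai.audio.transcriptions.create({
  model: "whisper-1",
  file: fs.createReadStream("clip.webm"),
});

// 2. Send the transcript plus a captured frame to GPT-4 Vision.
const frameDataUrl = "data:image/jpeg;base64,..."; // placeholder for a captured screenshot
const completion = await openai.chat.completions.create({
  model: "gpt-4-vision-preview",
  max_tokens: 300,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: transcription.text },
        { type: "image_url", image_url: { url: frameDataUrl } },
      ],
    },
  ],
});

console.log(completion.choices[0].message.content);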

🚀 Getting Started

You can provide the `OPENAI_API_KEY` environment variable or let users provide their own API key in the UI.
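If you go the environment-variable route, the usual Next.js convention is a `.env.local` file at the project root (the value below is only a placeholder):

OPENAI_API_KEY=sk-...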

First, run the development server:

npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev

🔧 Constants

Some constants are defined at the top of /src/app/page.js. You may want to tweak these:

const INTERVAL = 250;
const IMAGE_WIDTH = 512;
const IMAGE_QUALITY = 0.6;
const COLUMNS = 4;
const MAX_SCREENSHOTS = 60;
const SILENCE_DURATION = 2500;
const SILENT_THRESHOLD = -30;
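
The sketch below is one plausible reading of these constants, inferred from their names rather than copied from the repo: frames are grabbed from the camera every INTERVAL milliseconds, downscaled to IMAGE_WIDTH, JPEG-encoded at IMAGE_QUALITY, and kept in a rolling buffer of at most MAX_SCREENSHOTS.

// Sketch only: an interpretation of the constants above, not the repo's actual code.
const frames = [];

function captureFrame(video) {
  const scale = IMAGE_WIDTH / video.videoWidth;
  const canvas = document.createElement("canvas");
  canvas.width = IMAGE_WIDTH;
  canvas.height = Math.round(video.videoHeight * scale);
  canvas.getContext("2d").drawImage(video, 0, 0, canvas.width, canvas.height);
  frames.push(canvas.toDataURL("image/jpeg", IMAGE_QUALITY)); // JPEG-encode at the configured quality
  if (frames.length > MAX_SCREENSHOTS) frames.shift(); // keep a rolling window of frames
}

// Grab a frame from the webcam <video> element every INTERVAL milliseconds.
setInterval(() => captureFrame(document.querySelector("video")), INTERVAL);

Presumably COLUMNS controls how the buffered frames are tiled into a grid image before being sent to the model, and a similar loop could watch the microphone level (e.g. via a Web Audio AnalyserNode), treating SILENCE_DURATION milliseconds below SILENT_THRESHOLD dB as the end of an utterance.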