Caption Helper is a simple tool designed to assist in creating, managing, and enhancing captions for images, particularly for use in Stable Diffusion training.
- Image upload: Drag and drop or select multiple images
- Caption editing: Manually edit captions for each image
- AI-powered caption enhancement:
  - Enhance: Improve existing captions
  - Extend: Add more details to captions
  - Interrogate: Generate new captions based on image content
- Image management: Delete unwanted images
- Image cropping: Adjust image framing
- Export: Save captions (and optionally images) as a ZIP file
- Clone the project or use the live version at https://sd-caption-helper.vercel.app/.
- Click the "Settings" button to add your API keys:
  - Groq API key (for the enhance and extend features)
  - OpenAI API key (for the interrogate feature)
- Drag and drop images onto the upload area, or click to select files from your device.
- Uploaded images will appear in the sidebar on the left.
- Click on an image in the sidebar to select it.
- The selected image will appear in the main view along with its caption.
- Edit the caption directly in the text box provided.
For each image, you can use the following AI-powered features and management actions:
- Enhance: Click to improve the existing caption.
- Extend: Click to add more details to the current caption.
- Interrogate: Click to generate a new caption based on the image content.
- Delete: Remove the current image from your collection.
- Crop: Adjust the framing of the current image.
Use the navigation buttons to move between images in your collection.
Click the "GPT Options" button to set:
- Custom Token: A specific token to include in generated captions.
- Custom Instruction: Additional instructions for the AI when generating captions.
- Inherent Attributes: Attributes to avoid in generated captions.
- Click the "Export" button.
- Choose whether to include images in the export.
- Optionally, choose to rename images sequentially and set a prefix.
- Click "Export" to download a ZIP file containing your captions (and images if selected).
This document provides technical details about the Caption Helper project, including setup instructions, architecture overview, and implementation details of key features.
- Next.js 13+ (App Router)
- React
- TypeScript
- NextUI for UI components
- Tailwind CSS for styling
- react-dropzone for file uploads
- react-easy-crop for image cropping
- JSZip for creating ZIP files
- Groq API for caption enhancement and extension
- OpenAI API (GPT-4o) for image interrogation
```
caption-helper/
📦app
┣ 📂api
┃ ┣ 📂gpt-interrogate
┃ ┃ ┗ 📜route.ts
┃ ┣ 📂groq-enhance
┃ ┃ ┗ 📜route.ts
┃ ┗ 📂groq-extend
┃ ┃ ┗ 📜route.ts
┣ 📂blank-page
┃ ┣ 📜layout.tsx
┃ ┗ 📜page.tsx
┣ 📜error.tsx
┣ 📜layout.tsx
┣ 📜page.tsx
┗ 📜providers.tsx
📦components
┣ 📜CaptionEditor.tsx
┣ 📜ExportOptionsModal.tsx
┣ 📜GptOptionsModal.tsx
┣ 📜ImageViewer.tsx
┣ 📜Navigation.tsx
┣ 📜Settings.tsx
┣ 📜Sidebar.tsx
┣ 📜icons.tsx
┣ 📜navbar.tsx
┣ 📜primitives.ts
┗ 📜theme-switch.tsx
📦config
┣ 📜fonts.ts
┗ 📜site.ts
📦lib
┣ 📜types.ts
┗ 📜utils.ts
```
- Clone the repository:

  ```bash
  git clone https://github.com/markuryy/caption-helper.git
  cd caption-helper
  ```

- Install dependencies:

  ```bash
  bun i
  ```

- Create a `.env.local` file in the root directory and add your API keys:

  ```
  GROQ_API_KEY=your_groq_api_key
  OPENAI_API_KEY=your_openai_api_key
  ```

- Run the development server:

  ```bash
  bun dev
  ```
- Uses `react-dropzone` for handling file uploads.
- Implemented in `app/page.tsx` within the `onDrop` function.
- Processes uploaded files using the `processUploadedFiles` utility function.
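To illustrate the flow, here is a minimal sketch of wiring `react-dropzone` into an upload area; the `ImageEntry` shape and the `onImages` callback are assumptions for the example, not the project's actual types (the real processing lives in `processUploadedFiles`).

```tsx
// Hypothetical sketch: the actual onDrop in app/page.tsx may differ.
import { useCallback } from "react";
import { useDropzone } from "react-dropzone";

// Assumed shape for an uploaded image entry; the real types live in lib/types.ts.
interface ImageEntry {
  name: string;
  dataUrl: string;
  caption: string;
}

export function UploadArea({ onImages }: { onImages: (images: ImageEntry[]) => void }) {
  const onDrop = useCallback(
    async (acceptedFiles: File[]) => {
      // Read each file as a data URL and start with an empty caption.
      const entries = await Promise.all(
        acceptedFiles.map(
          (file) =>
            new Promise<ImageEntry>((resolve) => {
              const reader = new FileReader();
              reader.onload = () =>
                resolve({ name: file.name, dataUrl: reader.result as string, caption: "" });
              reader.readAsDataURL(file);
            })
        )
      );
      onImages(entries);
    },
    [onImages]
  );

  const { getRootProps, getInputProps, isDragActive } = useDropzone({
    onDrop,
    accept: { "image/*": [] },
  });

  return (
    <div {...getRootProps()}>
      <input {...getInputProps()} />
      <p>{isDragActive ? "Drop the images here…" : "Drag and drop images, or click to select"}</p>
    </div>
  );
}
```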
- Implemented in the `CaptionEditor` component.
- Uses a controlled input for real-time updates.
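A controlled editor might look roughly like the sketch below; the props and the use of NextUI's `Textarea` are illustrative assumptions rather than the component's actual interface.

```tsx
// Hypothetical sketch of a controlled caption editor; the real props may differ.
import { Textarea } from "@nextui-org/react";

interface CaptionEditorProps {
  caption: string;
  onChange: (value: string) => void;
}

export function CaptionEditor({ caption, onChange }: CaptionEditorProps) {
  // Controlled input: the parent owns the caption state, so every keystroke
  // is reflected immediately in the selected image's data.
  return (
    <Textarea
      value={caption}
      onValueChange={onChange}
      placeholder="Enter a caption for this image"
    />
  );
}
```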
- Implemented in `app/page.tsx` within the `handleCaptionAction` function.
- Uses separate API routes for each action:
  - `app/api/groq-enhance/route.ts` for enhancing captions
  - `app/api/groq-extend/route.ts` for extending captions
  - `app/api/gpt-interrogate/route.ts` for generating captions from images
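As a rough sketch, `handleCaptionAction` could dispatch to the matching route like this; the request and response payload shapes are assumptions, not the actual contract of the routes.

```ts
// Hypothetical sketch of dispatching a caption action to its API route.
type CaptionAction = "enhance" | "extend" | "interrogate";

const routeForAction: Record<CaptionAction, string> = {
  enhance: "/api/groq-enhance",
  extend: "/api/groq-extend",
  interrogate: "/api/gpt-interrogate",
};

async function handleCaptionAction(
  action: CaptionAction,
  caption: string,
  imageDataUrl: string,
  apiKey: string
): Promise<string> {
  const res = await fetch(routeForAction[action], {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // The exact payload each route expects is an assumption.
    body: JSON.stringify({ caption, image: imageDataUrl, apiKey }),
  });
  if (!res.ok) throw new Error(`Caption action "${action}" failed: ${res.status}`);
  const data = await res.json();
  return data.caption;
}
```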
- Uses the `react-easy-crop` library.
- Implemented in the `ImageViewer` component.
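A minimal `react-easy-crop` usage sketch, assuming the component receives a data URL and reports the cropped area in pixels; the real `ImageViewer` wiring may differ.

```tsx
// Hypothetical sketch of embedding react-easy-crop; the real ImageViewer may differ.
import { useState } from "react";
import Cropper, { Area } from "react-easy-crop";

export function CropView({ image, onCropped }: { image: string; onCropped: (area: Area) => void }) {
  const [crop, setCrop] = useState({ x: 0, y: 0 });
  const [zoom, setZoom] = useState(1);

  return (
    <Cropper
      image={image}
      crop={crop}
      zoom={zoom}
      aspect={1} // illustrative aspect ratio; the app may allow other framings
      onCropChange={setCrop}
      onZoomChange={setZoom}
      // Fires with the cropped area in pixels once the user stops interacting.
      onCropComplete={(_, croppedAreaPixels) => onCropped(croppedAreaPixels)}
    />
  );
}
```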
- Implemented in `app/page.tsx` within the `handleExport` function.
- Uses `JSZip` to create ZIP files containing captions and, optionally, images.
- Export options are managed through the `ExportOptionsModal` component.
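A sketch of how such an export could be assembled with JSZip, assuming one `.txt` caption file per image (the convention most Stable Diffusion trainers expect); filenames and options here are illustrative.

```ts
// Hypothetical export sketch using JSZip; the real handleExport may differ.
import JSZip from "jszip";

interface ImageEntry {
  name: string;    // e.g. "photo.jpg"
  dataUrl: string; // "data:image/jpeg;base64,..."
  caption: string;
}

async function exportAsZip(images: ImageEntry[], includeImages: boolean): Promise<Blob> {
  const zip = new JSZip();

  for (const img of images) {
    const baseName = img.name.replace(/\.[^.]+$/, "");
    // One caption file per image, named after the image.
    zip.file(`${baseName}.txt`, img.caption);
    if (includeImages) {
      // Strip the data URL prefix and store the raw base64 payload.
      const base64 = img.dataUrl.split(",")[1];
      zip.file(img.name, base64, { base64: true });
    }
  }

  return zip.generateAsync({ type: "blob" });
}
```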
- Uses React's `useState` hook for local state management.
- Global state (such as API keys) is stored in localStorage and managed through the `Settings` component.
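A minimal sketch of persisting a key in localStorage via a small hook; the hook and the storage key names are assumptions for illustration, not the project's actual code.

```ts
// Hypothetical sketch of persisting an API key; the real Settings component may differ.
import { useEffect, useState } from "react";

export function useStoredApiKey(storageKey: string) {
  const [value, setValue] = useState("");

  // Load the saved key once on mount (localStorage only exists in the browser).
  useEffect(() => {
    const saved = window.localStorage.getItem(storageKey);
    if (saved) setValue(saved);
  }, [storageKey]);

  const save = (next: string) => {
    setValue(next);
    window.localStorage.setItem(storageKey, next);
  };

  return [value, save] as const;
}

// Usage (key name is an assumption): const [groqKey, setGroqKey] = useStoredApiKey("groqApiKey");
```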
- Used for caption enhancement and extension.
- API calls are made from the server-side API routes to protect API keys.
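For illustration only, a Groq-backed route handler could look roughly like this, assuming the `groq-sdk` package; the model name, prompt, and request shape are placeholders, not the project's actual choices.

```ts
// Hypothetical sketch of app/api/groq-enhance/route.ts; details are assumptions.
import Groq from "groq-sdk";
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const { caption, apiKey } = await req.json();

  // The key is only used server-side, so it never appears in client bundles.
  const groq = new Groq({ apiKey: apiKey ?? process.env.GROQ_API_KEY });

  const completion = await groq.chat.completions.create({
    // Illustrative model choice; the project may use a different one.
    model: "llama-3.1-8b-instant",
    messages: [
      { role: "system", content: "Improve the following image caption. Reply with the caption only." },
      { role: "user", content: caption },
    ],
  });

  return NextResponse.json({ caption: completion.choices[0]?.message?.content ?? caption });
}
```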
- Used for image interrogation (generating captions from images).
- Implemented in `app/api/gpt-interrogate/route.ts`.
- Uses the GPT-4o model for its omni-modal capabilities.
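A sketch of such a route using the official `openai` SDK with GPT-4o image input; the prompt and request shape are assumptions rather than the project's actual implementation.

```ts
// Hypothetical sketch of app/api/gpt-interrogate/route.ts; details are assumptions.
import OpenAI from "openai";
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const { image, apiKey } = await req.json(); // image: a data URL or https URL

  const openai = new OpenAI({ apiKey: apiKey ?? process.env.OPENAI_API_KEY });

  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image as a concise training caption." },
          { type: "image_url", image_url: { url: image } },
        ],
      },
    ],
  });

  return NextResponse.json({ caption: completion.choices[0]?.message?.content ?? "" });
}
```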
- Client-side image downscaling is implemented in `lib/utils.ts` using the `downscaleImage` function.
- This ensures that large images are properly handled when sent to the GPT-4o API.
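A canvas-based downscale could look like the sketch below; the signature of the real `downscaleImage` in `lib/utils.ts` is not documented here, so this is an assumption.

```ts
// Hypothetical canvas-based downscaling; the real downscaleImage may differ.
export function downscaleImage(dataUrl: string, maxDimension = 1024): Promise<string> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => {
      // Preserve aspect ratio while capping the longest side at maxDimension.
      const scale = Math.min(1, maxDimension / Math.max(img.width, img.height));
      const canvas = document.createElement("canvas");
      canvas.width = Math.round(img.width * scale);
      canvas.height = Math.round(img.height * scale);
      const ctx = canvas.getContext("2d");
      if (!ctx) return reject(new Error("Canvas 2D context unavailable"));
      ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
      resolve(canvas.toDataURL("image/jpeg", 0.9));
    };
    img.onerror = reject;
    img.src = dataUrl;
  });
}
```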
- Uses a combination of NextUI components and Tailwind CSS for styling.
- Global styles are defined in `styles/globals.css`.
- Consider implementing server-side session storage for better state management across page reloads.
- Explore options for batch processing of images for more efficient handling of large collections.
- Implement user authentication to allow for saved projects and user-specific settings.
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with descriptive commit messages.
- Push your changes to your fork.
- Submit a pull request to the main repository.
Please ensure that your code follows the existing style conventions and includes appropriate tests.
- Use clear, descriptive captions that accurately represent the image content.
- Include relevant details but avoid overly specific or unique identifiers.
- Experiment with the AI enhancement features to generate diverse captions.
- Use the custom token and instruction features to tailor captions to your specific training needs.
For issues, feature requests, or contributions, please visit the GitHub repository.