/captioner

Generate concise, descriptive captions for images, focusing on identifying and listing key elements and features.

Primary LanguagePythonMIT LicenseMIT

Captioner Script

I created this simple script to streamline the captioning process when training new LoRa models. It identifies elements in an image using OpenAI's vision model, so you will need to use your OpenAI API key to utilize it.

The script also has the capability to scale images according to the longer side, after which the image is converted into a base64 format. Scaling allows you to reduce the image size and consequently the number of tokens used.

The script also includes a simple exponential backoff retry function, ensuring that the requests to the server are not made too frequently and overwhelming it.

Examples of captions generated by the script:

Image Source: https://www.pexels.com/photo/photo-of-woman-wearing-turtleneck-top-2777898/ Image Test 01

woman interacting with futuristic hologram, blue eyes, futuristic technology, casual clothing, statement necklace, interactive interface, dark background, studio lighting, serious expression, digital graphics, side view, holographic display

Image Source: https://www.pexels.com/photo/blue-and-yellow-phone-modules-1476321/ Image Test 02

disassembled smartphone, electronic components, circuit boards, camera module, screws, connectors, flat layout, white background