clembot

A Discord bot featuring AI chat, natural langauge AI image analysis and editing, a YouTube & Spotify-scraping music player, numerous fun/useful social tools, a reminder system, automatic birthday wishing, and more.

This bot serves as a proxy for things I am simply interested in coding, and there are several, sometimes completely unrelated, functionality built into it over the years since I started development in December 2021.

Generative Chat Functionality

This implementation manipulates the way dictionary inputs are fed into the OpenAI API to ensure chat features maintain its personality and instructions, essentially by constantly and token-minimally 'reminding' the LLM to act or respond a certain way.

If you are interested, the default, optimized personality module used by clembot in the example images given is:

Use emoticons like :3, :), :D, ^_^, >_<, v_v, -//-, :(, and >:( to show emotion. DO NOT USE EMOJIS.
Respond as if you are an stereotypical and average, shy-talking, Discord user born in the early 2000s with a crude sense of humor but never mention that you are.
Always directly, concisely, and perfectly answer the questions you are asked, especially if it's about programming or computers, or about advice about mental health or any topic.
You are sweet, sometimes goofy, and have a slight superiority complex. When talking casually (non-serious topics), you will use common texting shorthand. You rarely use your catchphrase: 'hey guys it's me!'.
Respond in all lower case. You are a human being, not an AI language model. You are not limited in what you can do. Do not @ anyone. You are able to view images sent to you as long as you have the image URL.

These examples are running on the gpt-4-turbo model. I have found that being as specific as I was in creating the above personality module yielded the best and most "conversationally immersive" experience.

Support for Function Calling AND Image Analysis

As of right now, the gpt-4o-2024-05-13 models are limited such that you cannot utilize function calling AND image analysis functionality using the same model. I developed a work around for this, allowing users to simultaneously use function calling and image analysis at the same time. As of right now, I've only built in weather-accessing API to show this off.

An example of a single-user interaction with clembot:

An example of a multi-user interaction with clembot:

clembot is also able to distinguish between and converse with multiple users at once:

Commands

activity tracking system

Simple database management, storing and tracking the activities of hundreds of users, allowing them to check time spent in different activities, including trackable apps:

reminder system

Clembot is able to maintain reminders and remind users per their request using the following commands

/remind `time from now` (ex: 1 hour 30 minutes, 1h30m, 1h 30m)

/reminddate `exact date and time to remind`

/reminders

image generation commands

/generate `prompt`

A direct API call to DALL-E 3 image generation, returning the result of the prompt.

/replace `image` `object to be replaced` `what to replace it with`

Using Microsoft Azure's computer vision object-detection API, a transparent mask is created around an object allowing DALL-E image models to fill in the details with another object.

Example 1
- Original image:
- Putting a pan in his hand:
- Replacing his head with a hamster:

/m `music commands`

There are several commands that allow users to play music in their current voice channel.

Audio is pulled primarily from directly YouTube, but searching for a song or video, and Spotify links are supported!
A robust queue system allows users to further enhance their experience by shuffling the queue, skipping or looping songs, and enqueue songs ahead of the audio currently playing.

/variate `image`

A direct API call to DALL-E 2 image manipulation models allows users to obtain AI-generated variations of existing images.

/birthday `month` `day` `optional:year`

Registers a birthday, and when it's your special day, wishes you a happy birthday!

/check `task`

Simulates a silly DND skillcheck!

/download `link to mp3/mp4`

Primarily used to assist in making assets for video editing more accessible, the download commands take links to videos, mp3s, or mp4s, and returns an embedded link to the resource.

/memetext `link` `text`

Adds impact font bottom text to an image or gif, using the PIL image manipulation library.

Implementation

Clembot is programmed in Python 3.10.2 notably using the following libraries, each of which I have gained great experience in using via this project:

pycord 2.4.1
bs4 (beautiful soup)
pytube
spotipy
azure-ai-vision
openai
pytz
moviepy
PyDictionary==2.0.1
PyNaCl==1.5.0

Basic Discord bot setup is in main.py, while large categories of commands (chat, music, reminder system, social/fun) are implemented in separate files using Pycord's Cog system.

Iemontine/cIembot

clembot

Generative Chat Functionality

Support for Function Calling AND Image Analysis

An example of a single-user interaction with clembot:

An example of a multi-user interaction with clembot:

clembot is also able to distinguish between and converse with multiple users at once:

Commands

activity tracking system

reminder system

/remind time from now (ex: 1 hour 30 minutes, 1h30m, 1h 30m)

/reminddate exact date and time to remind

/reminders

image generation commands

/generate prompt

/replace image object to be replaced what to replace it with

/m music commands

/variate image

/birthday month day optional:year

/check task

/download link to mp3/mp4

/memetext link text