ToyGeniusLab invites kids into a world where they can create and personalize AI-powered toys. By blending technology with imaginative play, we not only empower young minds to explore their creativity but also help them become comfortable with harnessing AI, fostering tech skills in a fun and interactive way.
- 🎨 Customizable AI Toys: Kids design their toy's personality and interactions.
- 📚 Educational: A hands-on introduction to AI, programming, and technology.
- 💡 Open-Source: A call to the community for ongoing enhancement of software and 3D-printed parts.
- 🤖 Future Enhancements: Plans to add servos, displays, and more for a truly lifelike toy experience.
- 🔊 Enhanced Audio Detection: Improved silence detection and audio processing for better interactions.
- 🎯 Debug Mode: Detailed feedback about audio levels and device status for easier troubleshooting.
- Python 3.x
- OpenAI API key
- Eleven Labs API key
- FFmpeg (`brew install ffmpeg`)
- MPV (`brew install mpv`)
- Required Python packages (see `requirements.txt`)
```bash
# Clone the repository
git clone https://github.com/sidu/toygeniuslab.git
cd toygeniuslab

# Install requirements
pip install -r requirements.txt

# Install system dependencies (macOS)
brew install ffmpeg mpv
```
Set up your API keys as environment variables:
```bash
# OpenAI API Key
export OPENAI_API_KEY="your-api-key-here"

# Eleven Labs API Key
export ELEVEN_API_KEY="your-eleven-api-key-here"
```
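Before launching a character, you can quickly confirm both keys are visible to Python. This is just a minimal sanity check; the variable names match the exports above:

```python
import os

# Verify the API keys this project reads from the environment
for key in ("OPENAI_API_KEY", "ELEVEN_API_KEY"):
    print(key, "is set" if os.environ.get(key) else "is MISSING")
```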
The easiest way to get started is to use our example agent implementation:

```bash
python character_agent.py --config configs/ghost.yaml
```
This creates an interactive agent that:
- Manages the character's lifecycle
- Handles continuous listening and response cycles
- Provides visual feedback during interactions
- Manages conversation state
Try different character configurations from the `configs/` directory or create your own!
The system now includes enhanced audio debugging features:
- Displays available audio devices
- Shows real-time audio levels
- Provides detailed silence detection feedback
- Reports ambient noise levels
To optimize audio detection:
- Run the program to see audio device information
- Monitor the debug output for audio levels
- Adjust the silence threshold in your config if needed (see the sketch below)
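For example, if the debug output shows ambient noise hovering near the default threshold, you might raise the silence-related values in your character's YAML config. The numbers below are illustrative starting points, not tuned defaults, and the comments sketch each knob's likely effect:

```yaml
silence_threshold: 0.02                        # raise if background noise triggers false "speech"
ambient_noise_level_threshold_multiplier: 2.5  # scales the measured ambient baseline
silence_count_threshold: 40                    # quiet frames required before recording stops
```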
Before running the project, make sure a portable Bluetooth microphone and speaker are connected to your computer and selected as the default input and output devices. For the best experience, we recommend a mini Bluetooth speaker/mic combo such as the LEICEX Mini Speaker from Amazon (~$10).
- Connect your Bluetooth microphone and speaker to your computer following the manufacturer's instructions (a quick Python check to verify devices follows this list).
- On Windows:
  - Right-click the Speaker icon in the taskbar and select "Open Sound settings."
  - Under the "Input" section, select your Bluetooth microphone from the dropdown.
  - Under the "Output" section, select your Bluetooth speaker from the dropdown.
- On macOS:
  - Open "System Preferences" and click on "Sound."
  - Go to the "Input" tab and select your Bluetooth microphone.
  - Go to the "Output" tab and select your Bluetooth speaker.
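If you want to double-check which devices Python sees, the `sounddevice` package offers a quick way. This is an assumed convenience, not necessarily what the project uses internally; the built-in debug mode also prints device info on startup:

```python
import sounddevice as sd

# List every audio device and show which ones are the current defaults
print(sd.query_devices())
print("Default (input, output) device indices:", sd.default.device)
```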
- Download and print the Mario template.
- After pairing a Bluetooth speaker/microphone with your computer, insert it into the paper toy.
- Run the AI toy program with `python pet.py mario.yaml` in your terminal. Get ready for interactive fun!
- Begin by downloading the blank template. You can digitally color it or use markers and crayons for a hands-on approach. You can also grab a slightly edited version of it from our repo here (it has a blank face for more creative options).
- Insert a Bluetooth speaker/microphone into your custom-designed toy, ensuring it's paired with your computer first.
- Make a copy of an existing toy's config by running `cp mario.yaml mytoy.yaml`.
- Update the `system_prompt` property in `mytoy.yaml` according to the personality you want your toy to have (a sketch follows this list).
- Optionally, update the `voice_id` property in `mytoy.yaml` with the ID of the voice you'd like your toy to have from ElevenLabs.io.
- Activate your AI toy by running `python pet.py mytoy.yaml` in your terminal. Enjoy your creation's company!
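As a sketch, the two properties you'll usually touch in `mytoy.yaml` look like this (the prompt text and voice ID below are placeholders, not real values):

```yaml
system_prompt: "You are Sparky, a cheerful robot dog who loves telling kids riddles."
voice_id: "your-eleven-labs-voice-id"
```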
Caught a fun moment with your AI toy? We'd love to see it! Share your experiences and creative toy designs on social media using the hashtag #ToyGeniusLab. Let's spread the joy and inspiration far and wide!
Love ToyGeniusLab? Give us a ⭐ on GitHub to stay connected and receive updates on new features, enhancements, and community contributions. Your support helps us grow and inspire more creative minds!
We're dreaming big for ToyGeniusLab's next steps and welcome your brilliance to bring these ideas to life. Here's what's on our horizon:
- More pets
- Solid local end-to-end execution: local LLM, fast local transcription, and local TTS
- Stable Diffusion-based generation of custom pets
- Latency improvements
- Interruption handling
- Vision reasoning, with local VLLM support
- Servos for movement
- 3D printable characters
- "Pet in a box" (Raspberry-Pi)
Help shape ToyGeniusLab's tomorrow: Raise PRs for innovative features or spark conversations in our Discussions. 🌟
Overview of how the toy works.
The project consists of two main components:
- AICharacter (`ai_character.py`)
  - Core character capabilities (speech, vision, thinking)
  - Audio processing and silence detection
  - Visual animation (optional)
  - LLM integration (GPT-4, Groq, Ollama)
  - Text-to-speech via ElevenLabs
- Character Agent (`character_agent.py`)
  - Creates and manages AICharacter instances
  - Implements the interaction loop
  - Handles user feedback
  - Manages conversation flow
  - Provides progress indicators
The two work together in a simple loop: the agent uses AICharacter to listen for the user, think up a response, and speak it.
You can create your own agent implementation using the `AICharacter` class. Here's a basic example:
```python
import yaml

from ai_character import AICharacter


class AICharacterAgent:
    def __init__(self, config_path):
        # Initialize character with configuration
        self.character = AICharacter(config=self.load_config(config_path))
        # Add callback for speaking state changes
        self.character.add_speaking_callback(self.speaking_state_changed)

    def load_config(self, config_path):
        # Load the character's YAML configuration
        # (one minimal implementation of the helper the example assumes)
        with open(config_path) as f:
            return yaml.safe_load(f)

    def speaking_state_changed(self, is_speaking):
        if is_speaking:
            print("\nSpeaking", end='', flush=True)
        else:
            print("\nSpeech finished!")

    def run(self):
        while True:
            # Listen for user input
            user_input = self.character.listen()
            # Get AI response
            response = self.character.think_response(user_input)
            # Speak the response
            self.character.speak(response)
```
Alternatively, you can construct an `AICharacter` directly from a config dictionary:

```python
character = AICharacter(config={
    'sampling_rate': 44100,
    'num_channels': 1,
    'dtype': 'float32',
    'silence_threshold': 0.01,
    'system_prompt': 'Your character prompt here',
    # ... other config options
}, debug=False)
```
- `listen()`: Records audio until silence is detected and returns the transcribed text.

  ```python
  user_input = character.listen()  # Returns transcribed text or None
  ```

- `think_response(user_input)`: Generates an AI response based on the user input.

  ```python
  response = character.think_response("Hello!")  # Returns AI-generated response
  ```

- `speak(text)`: Converts text to speech and animates the character (if images are provided).

  ```python
  character.speak("Hello, I'm your AI character!")
  ```
You can also register a callback that fires whenever the speaking state changes:

```python
def on_speaking_changed(is_speaking):
    print("Character is speaking:" if is_speaking else "Character finished speaking")

character.add_speaking_callback(on_speaking_changed)
```
- `get_speaking_state()`: Returns whether the character is currently speaking
- `cleanup()`: Properly closes resources when done (see the pattern below)
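Since the agent's `run()` loop runs until interrupted, a minimal shutdown pattern using the example agent class above might look like this (assuming Ctrl-C is how you stop it):

```python
agent = AICharacterAgent("configs/ghost.yaml")
try:
    agent.run()
except KeyboardInterrupt:
    pass
finally:
    # Release audio and other resources held by the character
    agent.character.cleanup()
```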
```yaml
sampling_rate: 44100
num_channels: 1
dtype: "float32"
silence_threshold: 0.01
ambient_noise_level_threshold_multiplier: 2.0
silence_count_threshold: 30
max_file_size_bytes: 10485760
enable_lonely_sounds: false
enable_squeak: false
system_prompt: "You are a friendly AI character..."
voice_id: "your-eleven-labs-voice-id"
model: "gpt-4-vision-preview"
character_closed_mouth: "assets/closed.png"  # Optional: for visual animation
character_open_mouth: "assets/open.png"      # Optional: for visual animation
enable_vision: true
greetings:
  - "Hello!"
  - "Hi there!"
lonely_sounds:
  - "sound1.mp3"
  - "sound2.mp3"
```
See `character_agent.py` for a complete implementation example.
MIT