Create a virtual environment and activate it

python -m venv .venv
.venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Run the program

python main.py

To do

Keep a transcript of everything the user has heard (filter out the garbage).
Try an open-source TTS engine (or stick with ElevenLabs).
Use a voice converter, or is it too resource-heavy? I want Lilas Ikuta for Japanese and Kafka for English.
Plan the hardware needed to support the software (the focus is on staying lightweight; this is not AI, it relies solely on APIs).
Implement a deafen feature that activates only when you flick a switch or press a button.
Find a translation API.
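
Rough sketch for the transcript and deafen items above. Nothing here exists in main.py yet: the names Transcript, is_garbage, hear and toggle_deafen are all hypothetical, and the "garbage" rule (empty, shorter than two characters, or containing no letters or digits) is only a guess at what should be filtered. It uses the standard library only, so it should stay lightweight.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def is_garbage(text: str, min_chars: int = 2) -> bool:
    """Hypothetical filter: empty, very short, or symbol-only text counts as garbage."""
    stripped = text.strip()
    # Assumption: anything under min_chars is noise. Lower this if single
    # characters (for example a lone kanji) should be kept.
    if len(stripped) < min_chars:
        return True
    # No letters or digits at all (e.g. "..." or "???") is assumed to be noise.
    return not any(ch.isalnum() for ch in stripped)


@dataclass
class Transcript:
    """In-memory log of what the user has heard, with a manual deafen toggle."""

    entries: list = field(default_factory=list)
    deafened: bool = False

    def toggle_deafen(self) -> bool:
        """Flip the deafen state; meant to be called from a switch or button handler."""
        self.deafened = not self.deafened
        return self.deafened

    def hear(self, text: str) -> bool:
        """Record one utterance. Returns True if it was kept."""
        if self.deafened or is_garbage(text):
            return False
        self.entries.append((datetime.now(timezone.utc), text.strip()))
        return True

    def as_text(self) -> str:
        """Render the transcript as one timestamped line per utterance."""
        return "\n".join(f"[{ts:%H:%M:%S}] {text}" for ts, text in self.entries)
```

Possible usage: create one Transcript at startup, call hear() on each line of recognized or translated text, and bind toggle_deafen() to the physical switch or button. While deafened, nothing is recorded, so the transcript only contains what the user actually heard.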