
TalkTales


Table of Contents

  1. Introduction
  2. Motivation
  3. Technologies
  4. Getting Started
  5. Usage
  6. Roadmap
  7. Authors
  8. License

Introduction

TalkTales is a specialized tool designed to assist deaf and hard-of-hearing individuals in their daily interactions. Unlike conventional speech-to-text services, TalkTales goes a step further by providing contextual understanding through speaker differentiation.

Motivation

The primary objective of this project is to develop a specialized tool aimed at assisting deaf individuals in their day-to-day interactions. While the overarching goal is to convert spoken language into text, the unique aspect of this project lies in its approach to speaker differentiation. By highlighting changes in the speaker's voice within the transcribed text, our tool aims to offer an enhanced contextual understanding, a feature often missing in traditional speech-to-text services.

Technologies

We leverage the open-source Vosk model for the core speech-to-text translation. However, our methodology diverges from mainstream solutions, as we are intent on reducing our dependence on machine learning algorithms. The goal is not merely to create a functional tool but to deepen our understanding of sound and voice phenomena. Most of the concepts are explained in detail inside the docs directory.

Note: Development of this repository is currently on hold due to university workload.

Getting Started

Prerequisites

  • Python 3.10+ installed
  • git installed

Installation

First, clone the repository to your local machine:

git clone https://github.com/kryczkal/TalkTales.git && cd TalkTales

Before installing the dependencies, it is highly recommended to set up a virtual environment inside the project directory:

python -m venv myPythonEnv
source myPythonEnv/bin/activate

Then install the Python dependencies with pip:

pip install -r requirements.txt

You are ready to go.

Usage

Main Application

Run the application with

python main.py

or, on Linux:

./main.py

Utilities

Various utilities are also provided alongside the main app. These include:

DiarizationTester

A program for invoking the Diarizer components without the application frontend. It either loads an audio file or connects to a live stream, then writes speaker changes to stdout. If plotting is enabled (via the Settings file), it will also plot speakers over time.

Usage
python DiarizationTester.py [optional: filename]

or

./DiarizationTester.py [optional: filename]
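As an illustration of the kind of per-frame signal analysis a diarizer builds on, the sketch below flags speaker-change candidates by comparing the spectral centroid of consecutive audio frames. This is a simplified, hypothetical stand-in for the project's actual pipeline — the function names, threshold, and synthetic "two speakers" (two pure tones) are invented for the example.

```python
import math

def spectral_centroid(frame, sample_rate=16000):
    """Spectral centroid of one frame via a naive DFT (illustrative, O(n^2))."""
    n = len(frame)
    mags, freqs = [], []
    for k in range(1, n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
        freqs.append(k * sample_rate / n)
    total = sum(mags) or 1.0
    return sum(f * m for f, m in zip(freqs, mags)) / total

def detect_changes(samples, frame_len=256, threshold=200.0):
    """Indices of frames whose centroid jumps relative to the previous frame."""
    centroids = [spectral_centroid(samples[i:i + frame_len])
                 for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [i for i in range(1, len(centroids))
            if abs(centroids[i] - centroids[i - 1]) > threshold]

# Synthetic demo: a 250 Hz tone followed by a 1250 Hz tone ("two speakers").
sr = 16000
def tone(freq, n):
    return [math.sin(2 * math.pi * freq * t / sr) for t in range(n)]

audio = tone(250, 1024) + tone(1250, 1024)
print(detect_changes(audio))  # the tone switch is flagged at frame 4: [4]
```

A real diarizer would use richer features (e.g. MFCCs or speaker embeddings) and statistical change detection rather than a fixed threshold, but the frame-by-frame comparison structure is the same.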

Suggestions Library

The "Suggestions" folder contains an experimental solution to the artifacts produced by speech-to-text algorithms. These algorithms, while impressive, are not flawless, and even minor errors can significantly impede the flow of a conversation. To mitigate this, we implemented a secondary layer that scans the transcribed sentences for anomalies. When it identifies an 'unlikely' word or phrase, it flags it as such and offers a more probable alternative. This enhancement leverages the HerBERT language model. Unfortunately, this approach has a limitation: it is too slow for real-time use.

It can be tested with:

python src/suggestions/testing.py

or

./src/suggestions/testing.py

The script will prompt for input sentences, search them for improbable utterances, and suggest improvements.
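The flag-and-suggest pipeline described above can be sketched with a toy unigram vocabulary standing in for HerBERT's masked-language-model scores. Everything below — the vocabulary, the pseudo-frequencies, and the function names — is invented for illustration, not taken from the project's code.

```python
import difflib

# Toy vocabulary with pseudo-frequencies -- a crude stand-in for the
# language-model scores that HerBERT provides in the real pipeline.
VOCAB_FREQ = {
    "the": 1000, "cat": 50, "sat": 40, "on": 900, "mat": 30,
    "dog": 60, "ran": 45, "hat": 25,
}

def flag_unlikely(sentence, min_freq=10):
    """Return (word, suggestion) pairs for words deemed unlikely."""
    flags = []
    for word in sentence.lower().split():
        if VOCAB_FREQ.get(word, 0) >= min_freq:
            continue  # word is likely enough; leave it alone
        # Offer the closest in-vocabulary word as a more probable alternative.
        close = difflib.get_close_matches(word, list(VOCAB_FREQ), n=1)
        flags.append((word, close[0] if close else None))
    return flags

print(flag_unlikely("the cat sat on the mta"))  # [('mta', 'mat')]
```

A contextual model like HerBERT scores each word given its neighbors, which catches errors a frequency table cannot — at the cost of the inference latency noted above.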

Matlab Folder

The "Matlab" folder contains a suite of streamlined scripts designed specifically for the acquisition of sound samples across diverse settings. These scripts are integral to our broader project, which aims to analyze human speech in various environments and build a speech diarization tool. To this end, we generated a comprehensive range of auditory visualizations, including wave plots, spectrograms, and mel spectrograms. During the initial phases of our research and development, these visualizations functioned as foundational references, enhancing our understanding of the acoustics and patterns of human speech.
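For reference, the mel scale underlying those mel spectrograms can be computed directly. The sketch below uses the common HTK formula, mel = 2595 · log10(1 + f/700), and lays out the band edges of a small triangular mel filterbank; the function names and band count are illustrative, not the project's API.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale: compresses high frequencies like human hearing."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min=0.0, f_max=8000.0):
    """Edge frequencies (Hz) for n_bands triangular filters, evenly
    spaced on the mel scale between f_min and f_max."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

print(round(hz_to_mel(1000.0), 1))        # 1000.0 by construction of the scale
print([round(f) for f in mel_band_edges(4)])
```

Note how the edges cluster toward low frequencies: equal mel spacing means finer resolution where human hearing is most sensitive, which is why mel spectrograms are preferred over plain spectrograms for speech analysis.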

Roadmap

  • Simple diarization model
  • Multithreaded backend design
  • Diarization model tuning and upgrades
  • Multi-language support
  • More detailed examples
  • Android application front-end

Authors

License

Distributed under the MIT License. See LICENSE.txt for more information.