Speech_to_text_data_pipeline

Table of contents

Overview
Install
Data
Folders

Overview

The purpose of this week’s challenge is to build a data engineering pipeline that allows recording millions of Amharic and Swahili speakers reading digital texts on in-app and web platforms. There are a number of large text corpora we will use. We will design and build a robust, large-scale, fault-tolerant, highly available Kafka cluster that can be used to post a sentence and receive an audio file. By the end of this project, we will produce a tool that can be deployed to post and receive text and audio files to and from a data lake, apply transformations in a distributed manner, and load the results into a warehouse in a format suitable for training a speech-to-text model.
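The "post a sentence, receive an audio file" exchange described above could be sketched as follows. This is only an illustration of how a sentence might be packaged as a key/value message and how the matching audio file might be located in the data lake. The topic names, message fields, and the `audio/<language>/<id>.wav` path layout are assumptions, not taken from the repository, and the actual Kafka producer call is left out because it depends on the client library in use.

```python
import json
import uuid

# Hypothetical topic names; the repository does not specify them.
TEXT_TOPIC = "text-sentences"
AUDIO_TOPIC = "audio-recordings"


def encode_sentence(sentence, language):
    """Build the (key, value) byte pair for one sentence message."""
    sentence_id = uuid.uuid4().hex
    payload = {"id": sentence_id, "language": language, "text": sentence}
    # ensure_ascii=False keeps Amharic script readable in the payload.
    value = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return sentence_id.encode("utf-8"), value


def decode_sentence(value):
    """Inverse of encode_sentence for the value part of a message."""
    return json.loads(value.decode("utf-8"))


def audio_object_key(sentence_id, language):
    """Assumed data-lake path for the audio recorded for a sentence."""
    return f"audio/{language}/{sentence_id}.wav"
```

Keying each message by the sentence id means the audio file that comes back can be matched to the sentence it was recorded for.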

Install

git clone https://github.com/Reiten-10Academy/Speech_to_text_data_pipeline
cd Speech_to_text_data_pipeline
pip install -r requirements.txt

Data

Data can be found here

Description

Amharic news text classification dataset with baseline performance.

Folders

backend: a Flask server and Python scripts that process data in the pipeline.

frontend: a React application.

extra: contains notebooks, docs, and other development and testing files.

Authors

👤 Biniyam Belayneh
👤 Meron Abate
👤 Tewodros Kaderaleh
👤 Gezahegne Wondachew
👤 Hewan Mulu
👤 Titus Wachira
👤 Amal Abdallah

Show your support

Give a ⭐ if you like this project!
