ivrit-ai/ivrit.ai

Build a New Tagging App for Collecting Volunteer Readings


High-Level Characterization Warning:
This is a high-level characterization. If you plan to start working on this, please contact @yairl or @yanirmr for further details and coordination.

Is your feature request related to a problem? Please describe.
The current system focuses on transcribing existing audio. A new tagging app is needed to collect recordings of volunteers reading provided texts, which would broaden the variety of texts and speakers in the dataset.

Describe the solution you'd like
Develop a new tagging app, either based on a Telegram bot or a web interface, to collect recordings of volunteers reading texts. The app should handle the following requirements:

1. Text Curation:

  • Curate texts with appropriate licenses for volunteer reading.
  • Ensure the texts are diverse and suitable for linguistic analysis.

2. Volunteer Interaction:

  • Provide an interface for volunteers to receive and read the curated texts.
  • Allow volunteers to submit their recordings easily.

3. Data Storage:

  • Store the collected recordings in a structured and secure manner.
  • Ensure proper metadata tagging (e.g., text ID, volunteer ID, timestamp).
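To make the metadata requirement concrete, here is a minimal sketch of what one stored record might look like. The `RecordingMetadata` class, its field names, the example IDs, and the JSON Lines output are all assumptions for discussion, not an agreed schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RecordingMetadata:
    """Hypothetical metadata record for one volunteer recording."""
    text_id: str        # ID of the curated text that was read
    volunteer_id: str   # pseudonymous ID, not a name or phone number
    audio_path: str     # where the audio file is stored (assumed layout)
    recording_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(meta: RecordingMetadata) -> str:
    """Serialize one record as a single JSON line (for a JSON Lines manifest)."""
    return json.dumps(asdict(meta), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    meta = RecordingMetadata(
        text_id="text-0001",
        volunteer_id="vol-0042",
        audio_path="recordings/text-0001/vol-0042.ogg",
    )
    print(to_json_line(meta))
```

Keeping the record flat and append-only would let the manifest live next to the audio files, whichever storage backend is eventually chosen.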

4. Verification and Quality Control:

  • Implement a verification process to ensure the accuracy and quality of the recordings.
  • Conduct automated and/or manual checks for audio clarity, correctness of the read text, and proper metadata tagging.
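One possible automated check for "correctness of the read text" is to compare the expected text against a transcript of the recording and flag large mismatches for manual review. The sketch below assumes the transcript comes from some ASR system (not shown here). The function names and the 0.2 threshold are illustrative assumptions, not project decisions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0

    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def needs_manual_review(expected_text: str, transcript: str,
                        max_wer: float = 0.2) -> bool:
    """Flag a recording when the transcript strays too far from the text.

    max_wer is an assumed threshold and would need tuning on real data.
    """
    return word_error_rate(expected_text, transcript) > max_wer
```

A check like this can only route recordings to a human reviewer. Because ASR errors also inflate the score, a high word error rate does not by itself prove the volunteer misread the text.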