Speach (🐍🍑, formerly texttaglib) is a Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TTLIG, ISF, etc.).
Main functions are:
- Reading, editing, and writing ELAN transcriptions and related media files directly in ELAN Annotation Format (eaf)
- Cutting, converting, and merging audio/video files
- TTLIG (or TIG) - A human-friendly linguistic documentation format with interlinear gloss support
- Text corpus management using texttaglib format
- Multiple storage formats (text, CSV, JSON, SQLite databases)
- Speach documentation: https://speach.readthedocs.io/
- Source code: https://github.com/neocl/speach/
speach is available on PyPI.
pip install speach
Speach can extract annotations and metadata from ELAN transcripts directly, for example:
from speach import elan
# Test ELAN reader function in speach
eaf = elan.read_eaf('./test/data/test.eaf')
# accessing tiers & annotations
for tier in eaf:
    print(f"{tier.ID} | Participant: {tier.participant} | Type: {tier.type_ref}")
    for ann in tier:
        print(f"{ann.ID.rjust(4, ' ')}. [{ann.from_ts} :: {ann.to_ts}] {ann.text}")
Speach also provides command line tools for processing EAF files.
# this command converts an eaf file into csv
python -m speach eaf2csv input_elan_file.eaf -o output_file_name.csv
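To give a feel for what such a conversion involves, here is a minimal sketch that flattens tiers and annotations into CSV rows using only the standard library. It is not the implementation behind `eaf2csv`, and the column layout is an assumption made for illustration. The `Ann` and `Tier` stand-ins and the `annotations_to_csv` helper are hypothetical; they mimic only the attributes used in the reader example above (`tier.ID`, `ann.from_ts`, `ann.to_ts`, `ann.text`).

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-ins for illustration only. They mimic just the
# attributes used in the reader example: tier.ID, ann.from_ts, ann.to_ts, ann.text.
Ann = namedtuple("Ann", "ID from_ts to_ts text")


class Tier(list):
    """A list of annotations that also carries a tier ID."""

    def __init__(self, ID, annotations):
        super().__init__(annotations)
        self.ID = ID


def annotations_to_csv(tiers, stream):
    """Write one CSV row per annotation and return the number of rows written.

    The column layout (tier, from_ts, to_ts, text) is an assumption,
    not necessarily what the eaf2csv command produces.
    """
    writer = csv.writer(stream)
    writer.writerow(["tier", "from_ts", "to_ts", "text"])
    count = 0
    for tier in tiers:
        for ann in tier:
            writer.writerow([tier.ID, ann.from_ts, ann.to_ts, ann.text])
            count += 1
    return count


demo = [Tier("Speaker1", [Ann("a1", "00:00:01.000", "00:00:02.500", "hello, world")])]
buffer = io.StringIO()
rows_written = annotations_to_csv(demo, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With speach installed, the object returned by `elan.read_eaf(...)` could presumably be passed in place of `demo`, since the loop relies only on the attributes shown in the reader example.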
Processing media files
>>> from speach import media
>>> media.convert("~/Documents/test.wav", "~/Documents/test.ogg")
>>> media.cut("test.wav", "test_10-15.ogg", from_ts="00:00:10", to_ts="00:00:15")
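When cutting a recording into several consecutive clips, the `HH:MM:SS` strings for `from_ts` and `to_ts` can be generated rather than typed by hand. The sketch below is a small standard-library helper for that purpose; `to_timestamp` and `cut_plan` are hypothetical names introduced here and are not part of speach.

```python
def to_timestamp(seconds):
    """Format a non-negative number of whole seconds as HH:MM:SS."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def cut_plan(duration, step):
    """Return (from_ts, to_ts) pairs covering `duration` seconds in `step`-second clips.

    The last clip is shortened so that it never runs past `duration`.
    """
    return [
        (to_timestamp(start), to_timestamp(min(start + step, duration)))
        for start in range(0, duration, step)
    ]


segments = cut_plan(25, 10)
for from_ts, to_ts in segments:
    print(from_ts, to_ts)
```

Each pair could then be passed to `media.cut(...)` through the `from_ts` and `to_ts` keyword arguments, in the same way as the example above.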
Read the Speach documentation for more information.
- Le Tuan Anh (Maintainer)
- Victoria Chua
The Speach logo was created using the snake emoji (created by Selina Bauder) and the peach emoji (created by Marius Schnabel) from the OpenMoji project. License: CC BY-SA 4.0
Contributors are welcome! If you want to help develop speach, please visit the Contributing page.