Speach - Documenting Natural languages

Welcome to speach’s documentation! Speach, formerly texttaglib, is a Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.).

Main functions:

  • Text corpus management

  • Manipulating ELAN transcription files directly in ELAN Annotation Format (eaf)

  • TIG - A human-friendly interlinear gloss format for linguistic documentation

  • Multiple storage formats (text files, JSON files, SQLite databases)

Contributors are welcome! If you want to help develop speach, please visit the Contributing page.

Installation

Speach is available on PyPI.

pip install speach

ELAN support

Speach can be used to extract annotations as well as metadata from ELAN transcripts, for example:

from speach import elan

# Test ELAN reader function in speach
eaf = elan.read_eaf('./test/data/test.eaf')

# accessing tiers & annotations
for tier in eaf:
    print(f"{tier.ID} | Participant: {tier.participant} | Type: {tier.type_ref}")
    for ann in tier:
        print(f"{ann.ID.rjust(4, ' ')}. [{ann.from_ts} :: {ann.to_ts}] {ann.text}")
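For readers unfamiliar with the EAF format, the following is a minimal sketch of the structure that the loop above walks through: tiers containing time-aligned annotations. It uses only the Python standard library and is not how speach itself is implemented. The sample XML, the participant code, and the helper name list_annotations are made up for illustration; real EAF files contain more elements (headers, linguistic types, reference annotations) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written EAF-like document (illustrative data only).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="Utterance" PARTICIPANT="P001" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def list_annotations(eaf_xml):
    """Return (tier ID, annotation ID, start ms, end ms, text) tuples.

    Hypothetical helper: only time-aligned annotations are handled.
    """
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its value in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((
                tier.get("TIER_ID"),
                ann.get("ANNOTATION_ID"),
                slots.get(ann.get("TIME_SLOT_REF1")),
                slots.get(ann.get("TIME_SLOT_REF2")),
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
    return rows


for tier_id, ann_id, start, end, text in list_annotations(SAMPLE_EAF):
    print(f"{tier_id} | {ann_id} [{start} :: {end}] {text}")
```

In practice, elan.read_eaf spares you this bookkeeping; the sketch is only meant to show what a tier and an annotation correspond to in the file.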

Speach also provides command line tools for processing EAF files.

# this command converts an eaf file into csv
python -m speach eaf2csv input_elan_file.eaf -o output_file_name.csv
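When several transcripts need converting, the command above can be driven from a short Python script. The following is a sketch under assumptions: the helper names build_eaf2csv_command and convert_folder are hypothetical, the output file is simply named after the input with a .csv suffix, and the only command-line arguments used are the ones shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_eaf2csv_command(eaf_path):
    """Build the argument list for: python -m speach eaf2csv <input> -o <output>.

    Hypothetical helper; the output path reuses the input name with a .csv suffix.
    """
    eaf_path = Path(eaf_path)
    csv_path = eaf_path.with_suffix(".csv")
    # sys.executable runs speach with the same interpreter as this script.
    return [sys.executable, "-m", "speach", "eaf2csv",
            str(eaf_path), "-o", str(csv_path)]


def convert_folder(folder):
    """Convert every .eaf file in a folder, stopping at the first failure."""
    for eaf_file in sorted(Path(folder).glob("*.eaf")):
        subprocess.run(build_eaf2csv_command(eaf_file), check=True)


# Example call (assumes speach is installed and the folder exists):
# convert_folder("./test/data")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.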


Release Notes

Release notes are available here.

Contributors
