wav2textgrid

Lifecycle: experimental | CRAN status | R-CMD-check

The goal of wav2textgrid is to take a two-channel wav file of a conversation and turn it into a transcribed TextGrid ready for further analysis in R or Praat. To use the package, you’ll need R, Praat, and Python installed.

Installation

You can install the development version of wav2textgrid from GitHub with:

# install.packages("devtools")
devtools::install_github("TysonStanley/wav2textgrid")

Example

Python Environment Set Up

library(wav2textgrid)
py_setup(path)  # path: location of the miniconda installation used by reticulate below
# this runs the following
# wav2textgrid relies on Python (for the OpenAI Whisper model)
# reticulate::use_miniconda("path/r-miniconda-arm64")
# if the following Python packages are not installed, use:
# packages = c("ffmpeg-python", "numpy", "scipy", "setuptools-rust", "pydub", "llvmlite", "librosa", "numba",
#              "Cmake", "wheel", "pytorch", "torchvision")
# reticulate::conda_install(envname = "~/Desktop/Automation/r-py-automate", packages = packages)
# reticulate::py_install("openai-whisper", pip = TRUE, pip_options = "-U")
# reticulate::py_install("light-the-torch", pip = TRUE, pip_options = "-U")

Run auto_textgrid()

wav_file = "path/example.wav"
auto_textgrid(wav_file)
#> → Default prompt:
#> 'I was like, was like, I'm like, um, ah, huh, and so, so um, uh, and um, mm-hmm, like um, so like, like it's, it's like, i mean, yeah, uh-huh, hmm, right, ok so, uh so, so uh, yeah so, you know, it's uh, uh and, and uh'
#> ✔ Step 1 of 5 [2.6s]
#> ✔ Step 2 of 5 [1.8s]
#> ✔ Step 3 of 5 [5m 11.3s]                                                
#> ✔ Step 4 of 5 [262ms]
#> ✔ Step 5 of 5 [68ms]
#> ℹ Written to path/example_output.TextGrid

The output is a TextGrid that can be loaded directly into Praat or read into R via the readtextgrid package. For the wav file in the example above (found in the package's inst folder on GitHub), running the code above produces the following TextGrid, shown here after reading it in with readtextgrid.
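
For example, reading the generated file back in might look like the sketch below. Note that read_textgrid() also returns additional metadata columns (such as file and tier_name); the select() call is an assumption about how the printed columns below were chosen.

library(readtextgrid)
library(dplyr)

tg <- read_textgrid("path/example_output.TextGrid")
tg %>%
  select(tier_num, tier_xmin, tier_xmax, xmin, xmax, text)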

#> # A tibble: 254 × 6
#>    tier_num tier_xmin tier_xmax  xmin  xmax text                                
#>       <dbl>     <dbl>     <dbl> <dbl> <dbl> <chr>                               
#>  1        1         0      281.  0     1.49 n                                   
#>  2        1         0      281.  1.49  4.43 i am participant one on channel one 
#>  3        1         0      281.  4.43  8.31 n                                   
#>  4        1         0      281.  8.31  8.96 ok                                  
#>  5        1         0      281.  8.96  9.48 n                                   
#>  6        1         0      281.  9.48 11.7  um you want to start in the top lef…
#>  7        1         0      281. 11.7  15.4  n                                   
#>  8        1         0      281. 15.4  15.7  yes                                 
#>  9        1         0      281. 15.7  17.8  n                                   
#> 10        1         0      281. 17.8  19.4  mine says peggy sue                 
#> # ℹ 244 more rows
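
From there, ordinary tidyverse tools work for further analysis. As one illustrative sketch, reusing the tg object read in above and assuming the single-letter "n" intervals mark non-speech, total speaking time per tier (i.e., per channel/speaker) could be summarized like this:

tg %>%
  filter(text != "n") %>%                            # assumption: "n" marks non-speech intervals
  group_by(tier_num) %>%
  summarize(total_speaking_time = sum(xmax - xmin))  # seconds of speech per tier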

Importantly, we can control a number of features of the transcription to make it more accurate. In this case, we apply noise reduction, set the silence threshold to -30 dB, and use the larger “small” model from Whisper.

auto_textgrid(wav_file, noise_reduction = TRUE, threshold = -30, model_type = "small")

We do recommend the “small” model or larger for more accurate transcription, especially when using the default prompt.
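
For instance, assuming model_type accepts the standard Whisper model names, a larger model could be requested as below (a hypothetical value; expect Step 3 to take considerably longer):

# "medium" is a standard Whisper model size; check the package
# documentation for the values model_type actually accepts
auto_textgrid(wav_file, noise_reduction = TRUE, threshold = -30, model_type = "medium")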