cloudyr/aws.polly

Can't read in PCM from AWS

muschellij2 opened this issue · 4 comments

Can't read in the PCM from AWS in tuneR:

library(aws.polly)
library(text2speech)
library(tuneR)

res = get_synthesis("hey it's John", voice = "Joanna", format = "pcm", rate = 16000)
tmp_pcm = tempfile(fileext = ".pcm")
writeBin(res, tmp_pcm)
tuneR::readWave(tmp_pcm)
#> Error in readChar(con, 4): invalid UTF-8 input in readChar()

The pcm_to_wav conversion function I wrote (https://github.com/muschellij2/text2speech/blob/master/R/pcm_to_wav.R) may be useful. I'm not sure whether it belongs in this package, but it may be worthwhile for others even if it isn't incorporated here:

# https://github.com/muschellij2/text2speech/blob/master/R/pcm_to_wav.R
tmp_wav = pcm_to_wav(tmp_pcm)
tuneR::readWave(tmp_wav)
#> 
#> Wave Object
#>  Number of Samples:      14111
#>  Duration (seconds):     0.88
#>  Samplingrate (Hertz):   16000
#>  Channels (Mono/Stereo): Mono
#>  PCM (integer format):   TRUE
#>  Bit (8/16/24/32/64):    16

Created on 2019-08-14 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2019-08-14                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package       * version  date       lib source                           
#>  assertthat      0.2.1    2019-03-21 [1] CRAN (R 3.6.0)                   
#>  aws.polly     * 0.1.2    2016-12-08 [1] CRAN (R 3.6.0)                   
#>  aws.signature   0.5.2    2019-08-08 [1] CRAN (R 3.6.0)                   
#>  backports       1.1.4    2019-04-10 [1] CRAN (R 3.6.0)                   
#>  base64enc       0.1-3    2015-07-28 [1] CRAN (R 3.6.0)                   
#>  callr           3.3.1    2019-07-18 [1] CRAN (R 3.6.0)                   
#>  cli             1.1.0    2019-03-19 [1] CRAN (R 3.6.0)                   
#>  crayon          1.3.4    2017-09-16 [1] CRAN (R 3.6.0)                   
#>  curl            4.0      2019-07-22 [1] CRAN (R 3.6.0)                   
#>  desc            1.2.0    2019-07-10 [1] Github (muschellij2/desc@b0c374f)
#>  devtools        2.1.0    2019-07-06 [1] CRAN (R 3.6.0)                   
#>  digest          0.6.20   2019-07-04 [1] CRAN (R 3.6.0)                   
#>  evaluate        0.14     2019-05-28 [1] CRAN (R 3.6.0)                   
#>  fs              1.3.1    2019-05-06 [1] CRAN (R 3.6.0)                   
#>  glue            1.3.1    2019-03-12 [1] CRAN (R 3.6.0)                   
#>  highr           0.8      2019-03-20 [1] CRAN (R 3.6.0)                   
#>  htmltools       0.3.6    2017-04-28 [1] CRAN (R 3.6.0)                   
#>  httr            1.4.1    2019-08-05 [1] CRAN (R 3.6.0)                   
#>  jsonlite        1.6      2018-12-07 [1] CRAN (R 3.6.0)                   
#>  knitr           1.24     2019-08-08 [1] CRAN (R 3.6.0)                   
#>  magrittr        1.5      2014-11-22 [1] CRAN (R 3.6.0)                   
#>  MASS            7.3-51.4 2019-03-31 [1] CRAN (R 3.6.0)                   
#>  memoise         1.1.0    2017-04-21 [1] CRAN (R 3.6.0)                   
#>  pkgbuild        1.0.3    2019-03-20 [1] CRAN (R 3.6.0)                   
#>  pkgload         1.0.2    2018-10-29 [1] CRAN (R 3.6.0)                   
#>  prettyunits     1.0.2    2015-07-13 [1] CRAN (R 3.6.0)                   
#>  processx        3.4.1    2019-07-18 [1] CRAN (R 3.6.0)                   
#>  ps              1.3.0    2018-12-21 [1] CRAN (R 3.6.0)                   
#>  R6              2.4.0    2019-02-14 [1] CRAN (R 3.6.0)                   
#>  Rcpp            1.0.2    2019-07-25 [1] CRAN (R 3.6.0)                   
#>  remotes         2.1.0    2019-06-24 [1] CRAN (R 3.6.0)                   
#>  rlang           0.4.0    2019-06-25 [1] CRAN (R 3.6.0)                   
#>  rmarkdown       1.14     2019-07-12 [1] CRAN (R 3.6.0)                   
#>  rprojroot       1.3-2    2018-01-03 [1] CRAN (R 3.6.0)                   
#>  sessioninfo     1.1.1    2018-11-05 [1] CRAN (R 3.6.0)                   
#>  signal          0.7-6    2015-07-30 [1] CRAN (R 3.6.0)                   
#>  stringi         1.4.3    2019-03-12 [1] CRAN (R 3.6.0)                   
#>  stringr         1.4.0    2019-02-10 [1] CRAN (R 3.6.0)                   
#>  testthat        2.1.1    2019-04-23 [1] CRAN (R 3.6.0)                   
#>  text2speech   * 0.2.7    2019-08-14 [1] local                            
#>  tuneR         * 1.3.3    2018-07-08 [1] CRAN (R 3.6.0)                   
#>  usethis         1.5.1    2019-07-04 [1] CRAN (R 3.6.0)                   
#>  withr           2.1.2    2018-03-15 [1] CRAN (R 3.6.0)                   
#>  xfun            0.8      2019-06-25 [1] CRAN (R 3.6.0)                   
#>  yaml            2.2.0    2018-07-25 [1] CRAN (R 3.6.0)                   
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

In your text2speech package, have you observed the same issue with PCM files from other text-to-speech providers, or is it AWS-specific?

I generated a PCM file using your example, and it plays fine with ffplay -f s16le -ar 16k, so I assume the PCM file itself is fine.

I've only seen it with AWS, as the others provide WAV/MP3 files only, with no PCM option that I can tell. I see that ffplay can read it, but can you read it in using tuneR?

No, as you said, tuneR does not read it properly. Assuming the PCM file is fine, this is a bug (or just an unsupported file type) on the tuneR side.
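
For anyone landing here: the ffplay flags above (-f s16le -ar 16k) suggest the Polly output is headerless, signed 16-bit little-endian PCM at 16 kHz. tuneR::readWave expects a RIFF/WAVE container, which is likely why it fails on the first four bytes. A minimal sketch of a workaround follows, assuming that sample format and mono audio. read_raw_pcm() is a hypothetical helper written for this comment, not part of aws.polly, tuneR, or text2speech, and it is not necessarily how pcm_to_wav works.

```r
# Sketch: read headerless PCM samples with base R.
# Assumes signed 16-bit little-endian, mono (matching ffplay -f s16le).
# read_raw_pcm() is a hypothetical helper, not an existing package function.
read_raw_pcm <- function(path, bytes_per_sample = 2L) {
  n_samples <- file.size(path) %/% bytes_per_sample
  con <- file(path, open = "rb")
  on.exit(close(con))
  readBin(con, what = "integer", n = n_samples,
          size = bytes_per_sample, signed = TRUE, endian = "little")
}

# Self-contained demo: write a few known samples as raw PCM and read them back.
demo_samples <- c(0L, 1000L, -1000L, 32767L, -32768L)
demo_pcm <- tempfile(fileext = ".pcm")
writeBin(demo_samples, demo_pcm, size = 2L, endian = "little")
samples <- read_raw_pcm(demo_pcm)

# A raw PCM file does not start with the "RIFF" tag that a WAV reader looks for.
has_riff_header <- identical(readBin(demo_pcm, what = "raw", n = 4L),
                             charToRaw("RIFF"))

# Wrap the samples in a tuneR Wave object; 16000 is the rate requested from Polly.
if (requireNamespace("tuneR", quietly = TRUE)) {
  wav <- tuneR::Wave(left = samples, samp.rate = 16000, bit = 16, pcm = TRUE)
}
```

With the file from the original report, read_raw_pcm(tmp_pcm) would take the place of the demo file, and tuneR::writeWave(wav, "out.wav") could then write a proper WAV. The sample rate is not stored in the PCM file, so it has to match the rate passed to get_synthesis.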

As far as I can see, there is either something wrong with the PCM file from AWS (although it reads fine with ffplay), or something wrong with tuneR. In any case, resolving this issue is out of the scope of this package. Thanks for flagging it though.