rstudio/bookdown-demo

EPUB output does not have figure and table identifiers when compiling the bookdown-demo book

N0rbert opened this issue · 3 comments

Steps to reproduce

  1. Have Ubuntu 18.04.2 LTS installed

  2. Install latest pandoc-2.7.2-1-amd64.deb with

    cd ~/Downloads
    wget https://github.com/jgm/pandoc/releases/download/2.7.2/pandoc-2.7.2-1-amd64.deb
    sudo apt install ./pandoc-2.7.2-1-amd64.deb
    
  3. Install R with sudo apt-get install r-base-dev

  4. Launch R console to install bookdown package:

    $ R
    install.packages('bookdown')
    
  5. Clone bookdown-demo repository

    sudo apt-get install git
    git clone https://github.com/rstudio/bookdown-demo.git
    cd bookdown-demo
    
  6. Compile the demo book to EPUB format

    Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::epub_book')"
    
  7. Install epubcheck with sudo apt-get install epubcheck default-jre and launch it against the compiled epub document:

    epubcheck _book/bookdown-demo.epub
    

Expected result - the produced EPUB document is valid and does not have errors

Actual result - the produced EPUB document has errors:

Validating using EPUB version 3.0.1 rules.
WARNING(ACC-009): _book/bookdown-demo.epub/EPUB/text/ch001.xhtml(82,446): MathML should either have an 'alttext' attribute or 'annotation-xml' child element.
ERROR(RSC-005): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file 'value of attribute "width" is invalid; must be an integer'.
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined.
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,123): Fragment identifier is not defined.
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined.

Check finished with errors

epubcheck completed

In addition, the user cannot click on table and image references.
The problematic lines can be found by unzipping the EPUB and inspecting its contents:

cd _book
unzip bookdown-demo.epub
pluma EPUB/text/ch002.xhtml

$ awk 'NR==82' EPUB/text/ch002.xhtml
<p>You can label chapter and section titles using <code>{#label}</code> after them, e.g., we can reference Chapter <a href="#intro">2</a>. If you do not manually label them, there will be automatic labels anyway, e.g., Chapter <a href="#methods">4</a>.</p>

$ awk 'NR==92' EPUB/text/ch002.xhtml
<p>Reference a figure by its code chunk label with the <code>fig:</code> prefix, e.g., see Figure <a href="#fig:nice-fig">2.1</a>. Similarly, you can reference tables generated from <code>knitr::kable()</code>, e.g., see Table <a href="#tab:nice-tab">2.1</a>.</p>
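Judging by the column numbers, the three RSC-012 positions appear to correspond to the `#methods`, `#fig:nice-fig` and `#tab:nice-tab` links in the two lines quoted above. A minimal sketch for finding such links without epubcheck: collect every `href="#..."` target in one XHTML file and subtract the `id="..."` values defined in the same file. The helper name `undefined_fragments` is hypothetical, and the sketch assumes double-quoted attributes and ignores `id`s defined in other chapter files.

```shell
# Hypothetical helper (not part of epubcheck): print every fragment that a
# file links to via href="#..." but never defines via id="...".
undefined_fragments() {
  file="$1"
  refs=$(mktemp) || return 1
  defs=$(mktemp) || { rm -f "$refs"; return 1; }

  # Same-document link targets, e.g. href="#fig:nice-fig" -> fig:nice-fig
  grep -o 'href="#[^"]*"' "$file" | sed 's/^href="#//; s/"$//' | LC_ALL=C sort -u > "$refs"

  # Identifiers defined in the file, e.g. id="intro" -> intro
  grep -o ' id="[^"]*"' "$file" | sed 's/^ id="//; s/"$//' | LC_ALL=C sort -u > "$defs"

  # Keep lines that occur only in the first list: referenced but not defined
  LC_ALL=C comm -23 "$refs" "$defs"

  rm -f "$refs" "$defs"
}

# Usage (after unzipping the EPUB as shown above):
#   undefined_fragments EPUB/text/ch002.xhtml
```

A link such as `href="#methods"` whose target lives in another chapter file is reported as well, which matches how epubcheck treats a same-document fragment.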

Notes:

  1. The issue persists with Pandoc 1.19 from the distribution repository.
  2. I'm not sure which component this bug should be reported against: bookdown, rmarkdown, knitr, or pandoc?

I'm getting the same issues with our book on Leanpub, but it uploads OK. I think it's bookdown, but I don't know what's going on with the ePub rendering.

It seems the id needs to be defined on the target element. This already appears to work for references, but not for figures/tables.

When figures are generated you get something like <div class="figure" style="text-align: center">, but it should be <div class="figure" style="text-align: center" id="fig:myfig">
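A quick way to see how many figures are affected is to list the opening figure `<div>` tags that have no `id` attribute. This is only a sketch: `figures_without_id` is a hypothetical helper name, it assumes each opening tag sits on one line with `class="figure"` as its first attribute (as in the snippet above), and it does not look at tables.

```shell
# Hypothetical helper: list opening figure <div> tags that carry no id
# attribute, prefixed with file name and line number.
figures_without_id() {
  # -H prints the file name even for a single file; -n adds line numbers.
  # The second grep drops tags that already contain an id attribute.
  grep -Hn '<div class="figure"' "$@" | grep -v '<div class="figure"[^>]* id="'
}

# Usage (inside the unzipped EPUB directory):
#   figures_without_id EPUB/text/*.xhtml
```

The function prints nothing and returns a non-zero status when every figure `<div>` already has an `id`.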

Here's a reprex that demonstrates it (you need epubcheck):

Clone the Repo

library(git2r)
library(bookdown)
local_path = "bookdown-demo"
git2r::clone("https://github.com/rstudio/bookdown-demo.git",
             local_path = local_path)
#> cloning into 'bookdown-demo'...
#> Receiving objects:   1% (6/530),    9 kb
#> Receiving objects:  11% (59/530),   17 kb
#> Receiving objects:  21% (112/530),  121 kb
#> Receiving objects:  31% (165/530),  321 kb
#> Receiving objects:  41% (218/530),  409 kb
#> Receiving objects:  51% (271/530),  473 kb
#> Receiving objects:  61% (324/530),  545 kb
#> Receiving objects:  71% (377/530),  577 kb
#> Receiving objects:  81% (430/530),  585 kb
#> Receiving objects:  91% (483/530),  593 kb
#> Receiving objects: 100% (530/530),  723 kb, done.
#> Local:    master /private/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T/Rtmpe46Igu/reprex95b433c7d4f8/bookdown-demo
#> Remote:   master @ origin (https://github.com/rstudio/bookdown-demo.git)
#> Head:     [4e34630] 2018-10-22: Add now.json and Dockerfile for building HTML book and deploy to now.sh (#36)
setwd(local_path)
epub_file = bookdown::render_book(
  "index.Rmd",
  bookdown::epub_book())
#> processing file: bookdown-demo.Rmd
#> output file: bookdown-demo.knit.md
#> /usr/local/bin/pandoc +RTS -K512m -RTS bookdown-demo.utf8.md --to epub3 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output bookdown-demo.epub --number-sections --filter /usr/local/bin/pandoc-citeproc
#> 
#> Output created: _book/bookdown-demo.epub
epub_file = normalizePath(epub_file)

This is a function to fix one simple id, which is hard coded.

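fix_one_id = function(epub_file) {
  # Unpack the EPUB into a temporary directory
  epub_dir = tempfile()
  dir.create(epub_dir, recursive = TRUE)
  # Record the archive's file order so it can be re-zipped in the same order
  epub_files = unzip(epub_file, list = TRUE)$Name
  unzip(epub_file, exdir = epub_dir)
  
  all_xhtml = list.files(
    pattern = "\\.xhtml$", 
    path = file.path(epub_dir, "EPUB", "text"),
    recursive = FALSE, full.names = TRUE)
  
  # Hard-coded: only the second file in the list is patched
  # (ch002.xhtml in this book)
  ifile = all_xhtml[2]
  # for (ifile in all_xhtml) {
  x = readLines(ifile)
  # Overwrite the line just before the one containing "file0" (assumed to be
  # the figure's <img>) with an opening <div> that carries the missing id
  x[grep("file0", x)-1] = paste0(
    '<div class="figure" style="text-align: center" ', 
    'id="fig:nice-fig">')
  writeLines(x, ifile)
  # }
  # Restore the working directory when the function exits
  owd = getwd()
  on.exit({
    setwd(owd)
  })
  # Re-zip from inside the unpacked directory so archive paths stay relative
  setwd(epub_dir)
  new_epub = tempfile(fileext = ".epub")
  zip(new_epub, files = epub_files)
  # file.copy(new_epub, epub_file, overwrite = TRUE)
  return(new_epub)
}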

Simple epub checker function

The epubcheck() R function captures the output of the epubcheck command-line tool.

epubcheck = function(epub_file) {
  res = system2("epubcheck", epub_file, stdout = TRUE, stderr = TRUE)
  res
}

Then num_errors() extracts the error count from the "Messages" summary line.

num_errors = function(out) {
  out = grep("Messages", out, value = TRUE)
  out = sub(".* (.*) errors.*", "\\1", out)
  as.numeric(out)
}

Test output

Here we see that the original EPUB produces 5 errors.

result = epubcheck(epub_file)
#> Warning in system2("epubcheck", epub_file, stdout = TRUE, stderr
#> = TRUE): running command ''epubcheck' /private/var/folders/1s/
#> wrtqcpxn685_zk570bnx9_rr0000gr/T/Rtmpe46Igu/reprex95b433c7d4f8/bookdown-
#> demo/_book/bookdown-demo.epub 2>&1' had status 1
result
#>  [1] "Validating using EPUB version 3.2 rules."                                                                                                                                    
#>  [2] "ERROR(RSC-005): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/nav.xhtml(19,9): Error while parsing file: element \"ol\" incomplete; missing required element \"li\""         
#>  [3] "ERROR(RSC-005): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file: value of attribute \"width\" is invalid; must be an integer"
#>  [4] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined."                                                 
#>  [5] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,123): Fragment identifier is not defined."                                                 
#>  [6] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined."                                                 
#>  [7] ""                                                                                                                                                                            
#>  [8] "Check finished with errors"                                                                                                                                                  
#>  [9] "Messages: 0 fatals / 5 errors / 0 warnings / 0 infos"                                                                                                                        
#> [10] ""                                                                                                                                                                            
#> [11] "EPUBCheck completed"                                                                                                                                                         
#> attr(,"status")
#> [1] 1
num_errors(result)
#> [1] 5

Here we see we get only 4 errors (one fixed) after adding an id.

fixed = fix_one_id(epub_file)
new_result = epubcheck(fixed)
#> Warning in system2("epubcheck", epub_file, stdout = TRUE,
#> stderr = TRUE): running command ''epubcheck' /var/folders/1s/
#> wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub 2>&1'
#> had status 1
new_result
#>  [1] "Validating using EPUB version 3.2 rules."                                                                                                                                                                              
#>  [2] "ERROR(RSC-005): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/nav.xhtml(19,9): Error while parsing file: element \"ol\" incomplete; missing required element \"li\""         
#>  [3] "ERROR(RSC-005): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file: value of attribute \"width\" is invalid; must be an integer"
#>  [4] "ERROR(RSC-012): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined."                                                 
#>  [5] "ERROR(RSC-012): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined."                                                 
#>  [6] ""                                                                                                                                                                                                                      
#>  [7] "Check finished with errors"                                                                                                                                                                                            
#>  [8] "Messages: 0 fatals / 4 errors / 0 warnings / 0 infos"                                                                                                                                                                  
#>  [9] ""                                                                                                                                                                                                                      
#> [10] "EPUBCheck completed"                                                                                                                                                                                                   
#> attr(,"status")
#> [1] 1
num_errors(new_result)
#> [1] 4

Created on 2019-08-28 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2019-08-28                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                             
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 3.6.0)                     
#>  backports     1.1.4       2019-04-10 [1] CRAN (R 3.6.0)                     
#>  bookdown    * 0.11        2019-05-28 [1] CRAN (R 3.6.0)                     
#>  callr         3.3.1       2019-07-18 [1] CRAN (R 3.6.0)                     
#>  cli           1.1.0       2019-03-19 [1] CRAN (R 3.6.0)                     
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 3.6.0)                     
#>  curl          4.0         2019-07-22 [1] CRAN (R 3.6.0)                     
#>  desc          1.2.0       2019-07-10 [1] Github (muschellij2/desc@b0c374f)  
#>  devtools      2.1.0       2019-07-06 [1] CRAN (R 3.6.0)                     
#>  digest        0.6.20      2019-07-04 [1] CRAN (R 3.6.0)                     
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 3.6.0)                     
#>  fs            1.3.1       2019-05-06 [1] CRAN (R 3.6.0)                     
#>  git2r       * 0.26.1      2019-06-29 [1] CRAN (R 3.6.0)                     
#>  glue          1.3.1       2019-03-12 [1] CRAN (R 3.6.0)                     
#>  highr         0.8         2019-03-20 [1] CRAN (R 3.6.0)                     
#>  htmltools     0.3.6       2017-04-28 [1] CRAN (R 3.6.0)                     
#>  httr          1.4.1       2019-08-05 [1] CRAN (R 3.6.0)                     
#>  knitr         1.24        2019-08-08 [1] CRAN (R 3.6.0)                     
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 3.6.0)                     
#>  memoise       1.1.0       2017-04-21 [1] CRAN (R 3.6.0)                     
#>  mime          0.7         2019-06-11 [1] CRAN (R 3.6.0)                     
#>  pkgbuild      1.0.3       2019-03-20 [1] CRAN (R 3.6.0)                     
#>  pkgload       1.0.2       2018-10-29 [1] CRAN (R 3.6.0)                     
#>  prettyunits   1.0.2       2015-07-13 [1] CRAN (R 3.6.0)                     
#>  processx      3.4.1       2019-07-18 [1] CRAN (R 3.6.0)                     
#>  ps            1.3.0       2018-12-21 [1] CRAN (R 3.6.0)                     
#>  R6            2.4.0       2019-02-14 [1] CRAN (R 3.6.0)                     
#>  Rcpp          1.0.2       2019-07-25 [1] CRAN (R 3.6.0)                     
#>  remotes       2.1.0       2019-06-24 [1] CRAN (R 3.6.0)                     
#>  rlang         0.4.0       2019-06-25 [1] CRAN (R 3.6.0)                     
#>  rmarkdown     1.14        2019-07-12 [1] CRAN (R 3.6.0)                     
#>  rprojroot     1.3-2       2018-01-03 [1] CRAN (R 3.6.0)                     
#>  rstudioapi    0.10.0-9000 2019-07-30 [1] Github (rstudio/rstudioapi@31d1afa)
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.6.0)                     
#>  stringi       1.4.3       2019-03-12 [1] CRAN (R 3.6.0)                     
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 3.6.0)                     
#>  testthat      2.1.1       2019-04-23 [1] CRAN (R 3.6.0)                     
#>  usethis       1.5.1.9000  2019-08-15 [1] local                              
#>  withr         2.1.2       2018-03-15 [1] CRAN (R 3.6.0)                     
#>  xfun          0.8         2019-06-25 [1] CRAN (R 3.6.0)                     
#>  xml2          1.2.1       2019-07-29 [1] CRAN (R 3.6.0)                     
#>  yaml          2.2.0       2018-07-25 [1] CRAN (R 3.6.0)                     
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

I have received notification about issue 766, which is now locked.

So I'll post my test results here. The OS is Debian 12, R is 4.2, pandoc is 2.17.1.1, and the R packages are as follows:

> xfun::session_info('bookdown')
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 12 (bookworm)

Locale:
  LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
  LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
  LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
  LC_PAPER=en_US.UTF-8       LC_NAME=C                 
  LC_ADDRESS=C               LC_TELEPHONE=C            
  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

Package version:
  base64enc_0.1.3   bookdown_0.36     bslib_0.5.1       cachem_1.0.8     
  cli_3.6.1         digest_0.6.33     ellipsis_0.3.2    evaluate_0.23    
  fastmap_1.1.1     fontawesome_0.5.2 fs_1.6.3          glue_1.6.2       
  graphics_4.2.2    grDevices_4.2.2   highr_0.10        htmltools_0.5.7  
  jquerylib_0.1.4   jsonlite_1.8.7    knitr_1.45        lifecycle_1.0.3  
  magrittr_2.0.3    memoise_2.0.1     methods_4.2.2     mime_0.12        
  R6_2.5.1          rappdirs_0.3.3    rlang_1.1.1       rmarkdown_2.25   
  sass_0.4.7        stats_4.2.2       stringi_1.7.12    stringr_1.5.0    
  tinytex_0.48      tools_4.2.2       utils_4.2.2       vctrs_0.6.4      
  xfun_0.41         yaml_2.3.7       
> 

The result of epubcheck (validating against EPUB 3.2 rules) is the following:

$ epubcheck _book/*.epub
Validating using EPUB version 3.2 rules.
ERROR(RSC-005): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(152,74): Error while parsing file: value of attribute "width" is invalid; must be an integer
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(147,247): Fragment identifier is not defined.

Check finished with errors
Messages: 0 fatals / 2 errors / 0 warnings / 0 infos

EPUBCheck completed

So two issues with the EPUB output still exist.
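To compare runs across bookdown or pandoc versions, it can help to tally the messages by code instead of reading the full log. The sketch below reads epubcheck's text output from standard input and counts each `SEVERITY(CODE)` prefix. `tally_epubcheck` is a hypothetical helper name, and the pattern only assumes the message format visible in the logs in this thread.

```shell
# Hypothetical helper: count epubcheck messages per severity and code,
# reading epubcheck's text output from standard input.
tally_epubcheck() {
  # Keep only the leading "SEVERITY(CODE)" part of each message line,
  # e.g. "ERROR(RSC-012)", then count identical prefixes.
  grep -oE '^[A-Z]+\([A-Z]+-[0-9]+\)' | LC_ALL=C sort | uniq -c
}

# Usage (2>&1 merges stderr into the pipe in case messages are written there):
#   epubcheck _book/bookdown-demo.epub 2>&1 | tally_epubcheck
```

For the log above this gives one RSC-005 and one RSC-012, against one RSC-005 and three RSC-012 in the original report.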