EPUB output does not have figure and table identifiers when compiling the bookdown-demo book
N0rbert opened this issue · 3 comments
Steps to reproduce
- Have Ubuntu 18.04.2 LTS installed.
- Install the latest pandoc-2.7.2-1-amd64.deb with
    cd ~/Downloads
    wget https://github.com/jgm/pandoc/releases/download/2.7.2/pandoc-2.7.2-1-amd64.deb
    sudo apt install ./pandoc-2.7.2-1-amd64.deb
- Install R with
    sudo apt-get install r-base-dev
- Launch the R console to install the bookdown package:
    $ R
    > install.packages('bookdown')
- Clone the bookdown-demo repository:
    sudo apt-get install git
    git clone https://github.com/rstudio/bookdown-demo.git
    cd bookdown-demo
- Compile the demo book to EPUB format:
    Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::epub_book')"
- Install epubcheck with
    sudo apt-get install epubcheck default-jre
  and launch it against the compiled EPUB document:
    epubcheck _book/bookdown-demo.epub
Expected result - the produced EPUB document is correct and does not have errors
Actual result - the produced EPUB document has errors:
Validating using EPUB version 3.0.1 rules.
WARNING(ACC-009): _book/bookdown-demo.epub/EPUB/text/ch001.xhtml(82,446): MathML should either have an 'alttext' attribute or 'annotation-xml' child element.
ERROR(RSC-005): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file 'value of attribute "width" is invalid; must be an integer'.
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined.
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,123): Fragment identifier is not defined.
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined.
Check finished with errors
epubcheck completed
In addition, the user can't click on table and image references.
The problematic lines can be found by unzipping the EPUB and inspecting its contents:
cd _book
unzip bookdown-demo.epub
pluma EPUB/text/ch002.xhtml
$ awk 'NR==82' EPUB/text/ch002.xhtml
<p>You can label chapter and section titles using <code>{#label}</code> after them, e.g., we can reference Chapter <a href="#intro">2</a>. If you do not manually label them, there will be automatic labels anyway, e.g., Chapter <a href="#methods">4</a>.</p>
$ awk 'NR==92' EPUB/text/ch002.xhtml
<p>Reference a figure by its code chunk label with the <code>fig:</code> prefix, e.g., see Figure <a href="#fig:nice-fig">2.1</a>. Similarly, you can reference tables generated from <code>knitr::kable()</code>, e.g., see Table <a href="#tab:nice-tab">2.1</a>.</p>
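To avoid inspecting lines one by one, the check can be sketched as a small shell helper. This is only an illustration: `undefined_fragments` is a hypothetical name, and the matching is a plain text search for `href="#..."` and `id="..."`, not a real XML parse.

```shell
# Hypothetical helper: print same-file fragment links (href="#...") that have
# no matching id="..." attribute anywhere in the given XHTML file.
undefined_fragments() {
    file="$1"
    # Collect link targets, e.g. href="#fig:nice-fig" -> fig:nice-fig
    grep -o 'href="#[^"]*"' "$file" | sed 's/^href="#//; s/"$//' | sort -u |
    while IFS= read -r frag; do
        # Report the fragment when no element in the file declares it as an id
        grep -qF "id=\"$frag\"" "$file" || printf '%s\n' "$frag"
    done
}
```

Run as `undefined_fragments EPUB/text/ch002.xhtml` from the unzipped `_book` directory; the fragments it prints should correspond to the RSC-012 errors reported by epubcheck for that file.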
Notes:
- Issue persists with Pandoc 1.19 from the repository.
- I'm not sure which component this bug should be reported against. Should it be bookdown, rmarkdown, knitr or pandoc?
I'm getting the same issues with our book on Leanpub, but it uploads OK. I think it's bookdown, but I don't know what's going on with the ePub rendering.
It seems as though the id needs to be defined. This already works for references, but not for figures/tables.
When figures are generated you get something like <div class="figure" style="text-align: center">, but it should be <div class="figure" style="text-align: center" id="fig:myfig">.
Here's a reprex that demonstrates it (you need epubcheck):
Clone the Repo
library(git2r)
library(bookdown)
local_path = "bookdown-demo"
git2r::clone("https://github.com/rstudio/bookdown-demo.git",
local_path = local_path)
#> cloning into 'bookdown-demo'...
#> Receiving objects: 1% (6/530), 9 kb
#> Receiving objects: 11% (59/530), 17 kb
#> Receiving objects: 21% (112/530), 121 kb
#> Receiving objects: 31% (165/530), 321 kb
#> Receiving objects: 41% (218/530), 409 kb
#> Receiving objects: 51% (271/530), 473 kb
#> Receiving objects: 61% (324/530), 545 kb
#> Receiving objects: 71% (377/530), 577 kb
#> Receiving objects: 81% (430/530), 585 kb
#> Receiving objects: 91% (483/530), 593 kb
#> Receiving objects: 100% (530/530), 723 kb, done.
#> Local: master /private/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T/Rtmpe46Igu/reprex95b433c7d4f8/bookdown-demo
#> Remote: master @ origin (https://github.com/rstudio/bookdown-demo.git)
#> Head: [4e34630] 2018-10-22: Add now.json and Dockerfile for building HTML book and deploy to now.sh (#36)
setwd(local_path)
epub_file = bookdown::render_book(
"index.Rmd",
bookdown::epub_book())
#> processing file: bookdown-demo.Rmd
#> output file: bookdown-demo.knit.md
#> /usr/local/bin/pandoc +RTS -K512m -RTS bookdown-demo.utf8.md --to epub3 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output bookdown-demo.epub --number-sections --filter /usr/local/bin/pandoc-citeproc
#>
#> Output created: _book/bookdown-demo.epub
epub_file = normalizePath(epub_file)
This is a function to fix one simple id, which is hard coded.
fix_one_id = function(epub_file) {
  # Unpack the EPUB into a temporary directory
  epub_dir = tempfile()
  dir.create(epub_dir, recursive = TRUE)
  # With list = TRUE nothing is extracted; we only keep the member names
  epub_files = unzip(epub_file, exdir = epub_dir,
                     junkpaths = TRUE, list = TRUE)
  epub_files = epub_files$Name
  res = unzip(epub_file, exdir = epub_dir)
  all_xhtml = list.files(
    pattern = "\\.xhtml$",
    path = file.path(epub_dir, "EPUB", "text"),
    recursive = FALSE, full.names = TRUE)
  # Hard coded: only the second chapter file is patched
  ifile = all_xhtml[2]
  # for (ifile in all_xhtml) {
  x = readLines(ifile)
  # Hard coded: the line before each line matching "file0" becomes a <div> with the id
  x[grep("file0", x) - 1] = paste0(
    '<div class="figure" style="text-align: center" ',
    'id="fig:nice-fig">')
  writeLines(x, ifile)
  # }
  owd = getwd()
  on.exit({
    setwd(owd)
  })
  setwd(epub_dir)
  # Re-zip the patched files into a new EPUB
  new_epub = tempfile(fileext = ".epub")
  zip(new_epub, files = epub_files)
  # file.copy(new_epub, epub_file, overwrite = TRUE)
  return(new_epub)
}
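The same line replacement can be tried from the shell on an already unzipped chapter file. This is a rough sketch, not part of the reprex: `add_figure_id` is a hypothetical name, and unlike `fix_one_id` it only patches the line before the first match and writes the result to standard output instead of modifying the file.

```shell
# Hypothetical helper mirroring the replacement in fix_one_id(): the line that
# precedes the first line containing PATTERN is replaced by a figure <div>
# carrying the given id. The patched text goes to stdout.
add_figure_id() {
    file="$1"; pattern="$2"; id="$3"
    awk -v pat="$pattern" -v id="$id" '
        NR > 1 {
            # Current line matches: swap the buffered previous line for the new <div>
            if (!done && index($0, pat) > 0) {
                prev = "<div class=\"figure\" style=\"text-align: center\" id=\"" id "\">"
                done = 1
            }
            print prev
        }
        { prev = $0 }                      # buffer the current line
        END { if (NR > 0) print prev }     # flush the last line
    ' "$file"
}
```

For example, `add_figure_id EPUB/text/ch002.xhtml file0 fig:nice-fig > ch002.fixed.xhtml` would apply the same hard-coded fix as above, under the same assumption about where the figure `<div>` sits.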
Simple epub checker function
The epubcheck R function will get the output from epubcheck.
epubcheck = function(epub_file) {
  # Capture both stdout and stderr of the epubcheck command-line tool
  res = system2("epubcheck", epub_file, stdout = TRUE, stderr = TRUE)
  res
}
Then num_errors will count the number of errors.
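num_errors = function(out) {
  # Keep only the summary line, e.g. "Messages: 0 fatals / 5 errors / ..."
  out = grep("Messages", out, value = TRUE)
  # Extract the number that precedes " errors"
  out = sub(".* (.*) errors.*", "\\1", out)
  as.numeric(out)
}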
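The first report in this issue has no "Messages:" summary line in its epubcheck output, so a count based on the ERROR lines themselves may be more portable. A minimal shell sketch, assuming the raw epubcheck output is piped in and each error line starts with `ERROR(` (the function name is hypothetical):

```shell
# Hypothetical helper: count the lines of epubcheck output (read from stdin)
# that start with "ERROR(".
count_epubcheck_errors() {
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c '^ERROR(' || true
}
```

Usage would look like `epubcheck _book/bookdown-demo.epub 2>&1 | count_epubcheck_errors`.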
Test output
Here we see we get 5 errors from the result
result = epubcheck(epub_file)
#> Warning in system2("epubcheck", epub_file, stdout = TRUE, stderr
#> = TRUE): running command ''epubcheck' /private/var/folders/1s/
#> wrtqcpxn685_zk570bnx9_rr0000gr/T/Rtmpe46Igu/reprex95b433c7d4f8/bookdown-
#> demo/_book/bookdown-demo.epub 2>&1' had status 1
result
#> [1] "Validating using EPUB version 3.2 rules."
#> [2] "ERROR(RSC-005): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/nav.xhtml(19,9): Error while parsing file: element \"ol\" incomplete; missing required element \"li\""
#> [3] "ERROR(RSC-005): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file: value of attribute \"width\" is invalid; must be an integer"
#> [4] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined."
#> [5] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,123): Fragment identifier is not defined."
#> [6] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined."
#> [7] ""
#> [8] "Check finished with errors"
#> [9] "Messages: 0 fatals / 5 errors / 0 warnings / 0 infos"
#> [10] ""
#> [11] "EPUBCheck completed"
#> attr(,"status")
#> [1] 1
num_errors(result)
#> [1] 5
Here we see we get only 4 errors (one fixed) after adding an id.
fixed = fix_one_id(epub_file)
new_result = epubcheck(fixed)
#> Warning in system2("epubcheck", epub_file, stdout = TRUE,
#> stderr = TRUE): running command ''epubcheck' /var/folders/1s/
#> wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub 2>&1'
#> had status 1
new_result
#> [1] "Validating using EPUB version 3.2 rules."
#> [2] "ERROR(RSC-005): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/nav.xhtml(19,9): Error while parsing file: element \"ol\" incomplete; missing required element \"li\""
#> [3] "ERROR(RSC-005): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file: value of attribute \"width\" is invalid; must be an integer"
#> [4] "ERROR(RSC-012): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined."
#> [5] "ERROR(RSC-012): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined."
#> [6] ""
#> [7] "Check finished with errors"
#> [8] "Messages: 0 fatals / 4 errors / 0 warnings / 0 infos"
#> [9] ""
#> [10] "EPUBCheck completed"
#> attr(,"status")
#> [1] 1
num_errors(new_result)
#> [1] 4
Created on 2019-08-28 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2019-08-28
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version     date       lib source
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.4       2019-04-10 [1] CRAN (R 3.6.0)
#>  bookdown    * 0.11        2019-05-28 [1] CRAN (R 3.6.0)
#>  callr         3.3.1       2019-07-18 [1] CRAN (R 3.6.0)
#>  cli           1.1.0       2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 3.6.0)
#>  curl          4.0         2019-07-22 [1] CRAN (R 3.6.0)
#>  desc          1.2.0       2019-07-10 [1] Github (muschellij2/desc@b0c374f)
#>  devtools      2.1.0       2019-07-06 [1] CRAN (R 3.6.0)
#>  digest        0.6.20      2019-07-04 [1] CRAN (R 3.6.0)
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 3.6.0)
#>  fs            1.3.1       2019-05-06 [1] CRAN (R 3.6.0)
#>  git2r       * 0.26.1      2019-06-29 [1] CRAN (R 3.6.0)
#>  glue          1.3.1       2019-03-12 [1] CRAN (R 3.6.0)
#>  highr         0.8         2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.3.6       2017-04-28 [1] CRAN (R 3.6.0)
#>  httr          1.4.1       2019-08-05 [1] CRAN (R 3.6.0)
#>  knitr         1.24        2019-08-08 [1] CRAN (R 3.6.0)
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0       2017-04-21 [1] CRAN (R 3.6.0)
#>  mime          0.7         2019-06-11 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.3       2019-03-20 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2       2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2       2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.4.1       2019-07-18 [1] CRAN (R 3.6.0)
#>  ps            1.3.0       2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.0       2019-02-14 [1] CRAN (R 3.6.0)
#>  Rcpp          1.0.2       2019-07-25 [1] CRAN (R 3.6.0)
#>  remotes       2.1.0       2019-06-24 [1] CRAN (R 3.6.0)
#>  rlang         0.4.0       2019-06-25 [1] CRAN (R 3.6.0)
#>  rmarkdown     1.14        2019-07-12 [1] CRAN (R 3.6.0)
#>  rprojroot     1.3-2       2018-01-03 [1] CRAN (R 3.6.0)
#>  rstudioapi    0.10.0-9000 2019-07-30 [1] Github (rstudio/rstudioapi@31d1afa)
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3       2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.1.1       2019-04-23 [1] CRAN (R 3.6.0)
#>  usethis       1.5.1.9000  2019-08-15 [1] local
#>  withr         2.1.2       2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.8         2019-06-25 [1] CRAN (R 3.6.0)
#>  xml2          1.2.1       2019-07-29 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0       2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
I received a notification about issue 766, which is now locked, so I'll write my test results here. OS is Debian 12, R is 4.2, pandoc is 2.17.1.1, and the R packages are the following:
> xfun::session_info('bookdown')
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 12 (bookworm)
Locale:
LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8 LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
Package version:
base64enc_0.1.3 bookdown_0.36 bslib_0.5.1 cachem_1.0.8
cli_3.6.1 digest_0.6.33 ellipsis_0.3.2 evaluate_0.23
fastmap_1.1.1 fontawesome_0.5.2 fs_1.6.3 glue_1.6.2
graphics_4.2.2 grDevices_4.2.2 highr_0.10 htmltools_0.5.7
jquerylib_0.1.4 jsonlite_1.8.7 knitr_1.45 lifecycle_1.0.3
magrittr_2.0.3 memoise_2.0.1 methods_4.2.2 mime_0.12
R6_2.5.1 rappdirs_0.3.3 rlang_1.1.1 rmarkdown_2.25
sass_0.4.7 stats_4.2.2 stringi_1.7.12 stringr_1.5.0
tinytex_0.48 tools_4.2.2 utils_4.2.2 vctrs_0.6.4
xfun_0.41 yaml_2.3.7
>
The result of epubcheck 3.2 is the following:
$ epubcheck _book/*.epub
Validating using EPUB version 3.2 rules.
ERROR(RSC-005): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(152,74): Error while parsing file: value of attribute "width" is invalid; must be an integer
ERROR(RSC-012): _book/bookdown-demo.epub/EPUB/text/ch002.xhtml(147,247): Fragment identifier is not defined.
Check finished with errors
Messages: 0 fatals / 2 errors / 0 warnings / 0 infos
EPUBCheck completed
So 2 issues with EPUB still exist.
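To locate the source of the remaining RSC-005 error, one option is to list every `width` attribute whose value is not a plain integer. This is only a sketch: `non_integer_widths` is a hypothetical helper, and it does a simple text match on `width="..."` rather than parsing the XHTML.

```shell
# Hypothetical helper: print "line:width=..." for each width attribute in the
# given file whose value is not made up solely of digits.
non_integer_widths() {
    # First grep extracts every width="..." with its line number;
    # second grep drops the ones whose value is a plain integer.
    grep -n -o 'width="[^"]*"' "$1" | grep -v 'width="[0-9][0-9]*"$' || true
}
```

Run as `non_integer_widths EPUB/text/ch002.xhtml` inside the unzipped EPUB; any line it prints is a candidate for the "must be an integer" error.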