rstudio/bookdown.org

Accommodate to bs4_book

yihui opened this issue · 9 comments

yihui commented

For books generated with the bs4_book format, the diffs would look like this:

git diff content/archive/external/adv-r-hadley-nz.md
diff --git a/content/archive/external/adv-r-hadley-nz.md b/content/archive/external/adv-r-hadley-nz.md
index f731283..749ebaf 100644
--- a/content/archive/external/adv-r-hadley-nz.md
+++ b/content/archive/external/adv-r-hadley-nz.md
@@ -1,13 +1,11 @@
 ---
-title: "Advanced R"
+title: "Welcome | Advanced R"
 author: "Hadley Wickham"
 date: ""
 tags: [Advanced R, R Programming]
 link: "https://adv-r.hadley.nz/"
-length_weight: "35%"
-cover: "https://adv-r.hadley.nz/cover.png"
-repo: "hadley/adv-r"
+length_weight: "0%"
 pinned: true
 ---
  1. We probably need to extract the book title after |:
    title = xml_text(title)
    but before doing that, we need a relatively reliable way to tell if the page is generated from bs4_book (i.e., not accidentally truncating titles generated from other formats such as gitbook).
  2. We need to fetch the cover image and GitHub repo elsewhere (bs4_book may need to write this information to the HTML file if it hasn't already done so).
  3. The book length estimate for gitbook was calculated from the size of search_index.json:
    book_length = function(url) {
      x = httr::headers(httr::HEAD(paste0(url, 'search_index.json')))$`content-length`
      if (length(x) == 0) 0 else as.numeric(x)
    }
    I'm not sure how this should be done for bs4_book (presumably it also has a search index, but I don't know if it's comparable to gitbook's index in terms of file size).
  4. bs4_book doesn't seem to write out the date information.
cderv commented
  1. I was curious why we have this |. In fact, adding the chapter name to the book title is a bookdown feature, implemented in prepend_chapter_title(), which is called in split_chapters(). Usually, the prepending does not happen on the first page of the book because the book title is the same as the first header. But this is not the case with bs4_book() or any other template that does not add a title part in its template. I can deal with it in bookdown.org, but it feels like this prepend feature could evolve to not prepend anything on index.html?

  2. Metadata are missing for now in bs4_book(): rstudio/bookdown#1034

  3. The file for bs4_book() is called search.json. It returns:

httr::headers(httr::HEAD(paste0("https://adv-r.hadley.nz/", 'search.json')))$`content-length`
#> [1] "759493"
  4. No, bs4_book() does not have the date in the header. If a date field is passed in the YAML header, it is used to fill in text in the footer. See https://devguide.ropensci.org/

I think we could adapt the script here, but also suggest some additions to bs4_book().
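A minimal sketch of how the script could try both index filenames, assuming the two names mentioned in this thread. The helper name `index_size()` and its `fetch_headers` argument are hypothetical; the default fetcher also skips error responses, which is an addition not discussed above.

```r
# Hypothetical helper: report the size of the first index file that exposes
# a content-length header. `fetch_headers` is injectable so the logic can be
# tried without network access.
index_size = function(
  url, files = c('search_index.json', 'search.json'),
  fetch_headers = function(u) {
    r = httr::HEAD(u)
    # Assumption: ignore error pages (e.g. 404), which may report their own size
    if (httr::http_error(r)) list() else httr::headers(r)
  }
) {
  for (f in files) {
    x = fetch_headers(paste0(url, f))[['content-length']]
    if (length(x) > 0) return(as.numeric(x))
  }
  0  # same fallback as the original book_length()
}

# Offline usage with a fake fetcher that only "serves" search.json
fake = function(u) {
  if (grepl('search\\.json$', u)) list(`content-length` = '759493') else list()
}
index_size('https://example.org/', fetch_headers = fake)
```

Trying gitbook's filename first keeps the existing behavior for gitbook sites and only adds one extra request for the others.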

yihui commented
  1. Okay, I see. It was because of https://github.com/rstudio/bookdown/blob/6ae8ead/R/html.R#L1112. Let's just do title = gsub('.*\\|\\s*', '', title) for now.
  2. I see.
  3. I see. It will be nice if bs4_book could use the same filename as gitbook (i.e., search_index.json), so we don't need to try two locations here.
  4. It doesn't matter where the date is inserted on the page, as long as it can be found.

Thanks!
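For reference, a small sketch of what the `gsub()` call in item 1 does. The wrapper name `strip_chapter()` is hypothetical; the pattern is the one quoted above.

```r
# Drop everything up to (and including) the last "|" and any following whitespace
strip_chapter = function(title) gsub('.*\\|\\s*', '', title)

strip_chapter('Welcome | Advanced R')  # "Advanced R"
strip_chapter('Advanced R')            # no "|", so the title is unchanged
```

Because `.*` is greedy, only the part after the last `|` is kept when a title contains several.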

cderv commented

After running the new script as a test, it seems there is also an issue with the summary description.

diff --git a/content/archive/external/adv-r-hadley-nz.md b/content/archive/external/adv-r-hadley-nz.md
index f731283..adbcadc 100644
--- a/content/archive/external/adv-r-hadley-nz.md
+++ b/content/archive/external/adv-r-hadley-nz.md
@@ -4,10 +4,8 @@ author: "Hadley Wickham"
 date: ""
 tags: [Advanced R, R Programming]
 link: "https://adv-r.hadley.nz/"
-length_weight: "35%"
-cover: "https://adv-r.hadley.nz/cover.png"
-repo: "hadley/adv-r"
+length_weight: "32.8%"
 pinned: true
 ---

-The book is designed primarily for R users who want to improve their programming skills and understanding of the language. It should also be useful for programmers coming to R from other languages, as help you to understand why R works the way it does. [...] This is the website for the 2nd edition of “Advanced R”, a book in Chapman & Hall’s R Series. The book is designed primarily for R users who want to improve their programming skills and understanding of the language. It should also be useful for programmers coming to R from other languages, as it helps you to understand why R works the ...
+View book source This is the website for 2nd edition of “Advanced R”, a book in Chapman & Hall’s R Series. The book is designed primarily for R users who want to improve their programming skills and understanding of the language. It should also be useful for programmers coming to R from other languages, as help you to understand why R works the way it does. If you’re looking for the 1st edition, you can find it at http://adv-r.had.co.nz/. This work, as a whole, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The code contained in this book ...

"View book source" is button text that we should not retrieve.
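One possible way to drop it, sketched on the already extracted text rather than on the HTML. The helper name `drop_button_text()` and the idea of a label list are assumptions; only "View book source" was observed here.

```r
# Hypothetical cleanup: remove a known button label from the start of the
# scraped description, leaving other descriptions untouched.
drop_button_text = function(x, labels = 'View book source') {
  x = trimws(x)
  for (l in labels) {
    hit = startsWith(x, l)
    x[hit] = trimws(substring(x[hit], nchar(l) + 1))
  }
  x
}

drop_button_text('View book source This is the website for 2nd edition')
```

Removing the button node before extracting the text would be cleaner, but that depends on the bs4_book() HTML structure, which is not shown in this thread.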

cderv commented

It also seems that some JSON files do not provide content-length in the header.

str(httr::headers(httr::HEAD("https://advanced-r-solutions.rbind.io/search.json")))
#> List of 16
#>  $ date            : chr "Mon, 08 Mar 2021 16:48:21 GMT"
#>  $ content-type    : chr "application/json"
#>  $ connection      : chr "keep-alive"
#>  $ set-cookie      : chr "__cfduid=dbb4b62d94d5ea2fdac23cb9cba7f2fa61615222100; expires=Wed, 07-Apr-21 16:48:20 GMT; path=/; domain=.rbin"| __truncated__
#>  $ cache-control   : chr "public, max-age=0, must-revalidate"
#>  $ etag            : chr "W/\"cb7a3c92d216f158f69b287f9782556b-ssl\""
#>  $ age             : chr "2360"
#>  $ x-nf-request-id : chr "7c4cca55-d8a8-42bf-a4ea-2a326e13fe5b-65283787"
#>  $ cf-cache-status : chr "DYNAMIC"
#>  $ cf-request-id   : chr "08b4566b890000088332b1d000000001"
#>  $ expect-ct       : chr "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""
#>  $ report-to       : chr "{\"group\":\"cf-nel\",\"max_age\":604800,\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report?s="| __truncated__
#>  $ nel             : chr "{\"report_to\":\"cf-nel\",\"max_age\":604800}"
#>  $ server          : chr "cloudflare"
#>  $ cf-ray          : chr "62cd8cf27ca50883-CDG"
#>  $ content-encoding: chr "gzip"
#>  - attr(*, "class")= chr [1:2] "insensitive" "list"

Created on 2021-03-08 by the reprex package (v1.0.0.9002)

This will lead to a length of 0. Do we want to prevent this?

That would probably require downloading the file, and maybe we don't want that.
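A sketch of one alternative that avoids downloading anything: treat a missing content-length as unknown (`NA`) instead of 0, so such books can be skipped later. The helper name `content_length()` is hypothetical, and the header lists below are hand-written stand-ins for `httr::headers()` output.

```r
# Hypothetical helper: NA when the server does not report a size
content_length = function(h) {
  x = h[['content-length']]
  if (length(x) == 0) NA_real_ else as.numeric(x)
}

sizes = c(
  a = content_length(list(`content-length` = '759493')),
  b = content_length(list(`content-encoding` = 'gzip'))  # no content-length
)
sizes[!is.na(sizes)]  # keeps only the books with a known size
```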

cderv commented

Another note on book length: it seems the size of the bs4_book() JSON differs from that of gitbook(). As we normalize the length, I think it will reduce the length of all books.

That may be because the JSON is different. I don't know how we want to make this a useful and correct measure.
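If the two index formats turn out not to be comparable, one option would be to normalize within each format. This is only a sketch under that assumption: the sizes below are made up for illustration, and the script's actual normalization is not shown in this thread.

```r
# Illustrative data only: index sizes are invented, not measured
books = data.frame(
  format = c('gitbook', 'gitbook', 'bs4_book', 'bs4_book'),
  size   = c(400, 1000, 300, 600)
)

# Scale each size by the largest size among books of the same format
books$weight = books$size / ave(books$size, books$format, FUN = max)
books
```

With this scheme the longest book of each format gets a weight of 1, so a larger index format cannot shrink the weights of books built with the other one.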

cderv commented

For bs4_book(), I have dealt with:

  • The book cover, which is not in the meta tags but in the main page
  • The repo URL, which is in another place we can find
  • Creating a description if none is provided

This should cover most of the cases we have. I'll share this in a PR soon.

cderv commented

This is mainly taken into account by #64

yihui commented

It also seems that some JSON files do not provide content-length in the header.

Just ignore them.

Another note on book length: it seems the size of the bs4_book() JSON differs from that of gitbook(). As we normalize the length, I think it will reduce the length of all books.

That may be because the JSON is different. I don't know how we want to make this a useful and correct measure.

The measure doesn't need to be accurate. A rough estimator is fine.

cderv commented

Just sharing here, for reference, why bookdown prepends the header of the file to the title.
I think this is because users sometimes forget to set the title: field in index.Rmd, so the filename is used as the title. This leads to titles like this (source):

<title>Dissertating with RMarkdown and Bookdown | dissertating_rmd_presentation.utf8.md</title>

So, sometimes the second part of the title is not what we want...
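To make this concrete, here is a small sketch that splits the title above on `|`. The variable names are only for illustration.

```r
title = 'Dissertating with RMarkdown and Bookdown | dissertating_rmd_presentation.utf8.md'

# Split on a literal "|" and trim the surrounding whitespace
parts = trimws(strsplit(title, '|', fixed = TRUE)[[1]])

parts[1]              # the book title in this case
parts[length(parts)]  # the filename, which is what a "keep the last part" rule returns
```

So a rule that keeps the text after the last `|` works for bs4_book() pages such as "Welcome | Advanced R", but would pick the filename for a page like this one.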