Superfluous braces when citing an entry
MLopez-Ibanez opened this issue · 7 comments
As shown here: https://mlopez-ibanez.github.io/eaf/reference/whv_rect.html
Citing this entry:
@article{DiaLop2020ejor,
author = { Juan Esteban Diaz and Manuel L{\'o}pez-Ib{\'a}{\~n}ez },
title = {Incorporating Decision-Maker's Preferences into the Automatic
Configuration of Bi-Objective Optimisation Algorithms},
journal = {European Journal of Operational Research},
year = 2021,
volume = 289,
number = 3,
pages = {1209--1222},
doi = {10.1016/j.ejor.2020.07.059},
}
with \insertCite{DiaLop2020ejor;textual}{eaf}
renders as Diaz and López-Ibá{ñ}ez (2021)
but the bibliography entry does not show those superfluous braces.
I have tried various ways to encode the name and none works.
Thanks for the report. Initially I thought that this was a straightforward omission in the handling of tilde diacritics, but I couldn't find any such omission in Rdpack or rbibutils. After some digging in the R sources I narrowed the problem down to tools:::cleanupLatex. I may be able to sidestep this, but I am giving technical details below for easy reference in a report I will put on R-devel.
Inside tools:::cleanupLatex, the difference between the handling of the diacritics in your name eventually appeared after a call to tools::deparseLatex (via toRd). There is nothing special in deparseLatex about \~, but after looking at the source code of deparseLatex and the object it was processing (obtained from parseLatex), I realised that the code would indeed put the second one in braces. Exchanging the order of the consecutive accented letters in your name (sorry for playing with it) again leaves the second one in braces:
> e1 <- "Manuel L{\\'o}pez-Ib{\\'a}{\\~n}ez"
> e2 <- "Manuel L{\\'o}pez-Ib{\\~n}{\\'a}ez"
> tools:::cleanupLatex(e1)
## [1] "Manuel López-Ibá{ñ}ez"
> tools:::cleanupLatex(e2)
## [1] "Manuel López-Ibñ{á}ez"
Here is the source of deparseLatex. If dropBraces is TRUE, it strips the braces, but only if the preceding tag is "TEXT":
deparseLatex <- function(x, dropBraces = FALSE)
{
    result <- character()
    lastTag <- "TEXT"
    for (i in seq_along(x)) {
        a <- x[[i]]
        tag <- attr(a, "latex_tag")
        if (is.null(tag)) tag <- "NULL"
        switch(tag,
               VERB = ,
               TEXT = ,
               MACRO = ,
               COMMENT = result <- c(result, a),
               BLOCK = result <- c(result,
                                   if (dropBraces && lastTag == "TEXT")
                                       deparseLatex(a)
                                   else
                                       c("{", deparseLatex(a), "}")),
               ENVIRONMENT = result <- c(result,
                                         "\\begin{", a[[1L]], "}",
                                         deparseLatex(a[[2L]]),
                                         "\\end{", a[[1L]], "}"),
               MATH = result <- c(result, "$", deparseLatex(a), "$"),
               NULL = stop("Internal error, no tag", domain = NA)
               )
        lastTag <- tag
    }
    paste(result, collapse="")
}
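To make the effect of the lastTag check concrete, here is a minimal sketch that reproduces only the TEXT and BLOCK branches on a hand-built list. The helpers node() and deparse_sketch() are hypothetical names introduced for illustration and are not part of the tools package; the input list is an assumed stand-in for what the parser produces for "Ib{á}{ñ}ez" after conversion to UTF-8.

```r
# Hypothetical helper: attach a "latex_tag" attribute, as the parser does.
node <- function(x, tag) structure(x, latex_tag = tag)

# Assumed stand-in for the parsed form of "Ib{á}{ñ}ez"
# ("\u00e1" is á, "\u00f1" is ñ).
x <- list(
  node("Ib", "TEXT"),
  node(list(node("\u00e1", "TEXT")), "BLOCK"),
  node(list(node("\u00f1", "TEXT")), "BLOCK"),
  node("ez", "TEXT")
)

# Simplified deparser handling only TEXT and BLOCK, with the same
# "drop braces only when the previous tag is TEXT" rule as the source above.
deparse_sketch <- function(x, dropBraces = FALSE) {
  result <- character()
  lastTag <- "TEXT"
  for (a in x) {
    tag <- attr(a, "latex_tag")
    if (tag == "BLOCK") {
      inner <- deparse_sketch(a)
      keep <- !(dropBraces && lastTag == "TEXT")
      result <- c(result, if (keep) c("{", inner, "}") else inner)
    } else {
      result <- c(result, a)
    }
    lastTag <- tag
  }
  paste(result, collapse = "")
}

with_drop <- deparse_sketch(x, dropBraces = TRUE)   # second block keeps its braces
without_drop <- deparse_sketch(x)                   # both blocks keep their braces
```

In this sketch the first block follows a TEXT element and loses its braces, while the second block follows a BLOCK element and keeps them, which matches the output reported for cleanupLatex above.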
I saved your example to the file "issueRdpack25.bib" and read it in as in Rdpack (but the effect is the same as in the examples above). a1, a2 and a3a emulate the steps taken by cleanupLatex, to check where the difference between the handling of accents appears:
tmp <- readBib("issueRdpack25.bib", encoding = "utf8", direct=TRUE, extra=TRUE, texChars = "Rdpack")
a1 <- tools:::parseLatex(tmp$author)
> a1
## Juan Esteban Diaz
## Manuel L{\'o}pez-Ib{\'a}{\~n}ez
> a2 <- tools:::latexToUtf8(a1)
> a2
## Juan Esteban Diaz
## Manuel L{ó}pez-Ib{á}{ñ}ez
> a3a <- tools:::deparseLatex(a2, TRUE)
> a3a
## [1] "Juan Esteban Diaz\nManuel López-Ibá{ñ}ez"
Notice below that the first accented character is preceded by a "TEXT" element, as is the first of the two consecutive ones. But the accented characters themselves are in "BLOCK" components, so the second of the consecutive ones is preceded by a "BLOCK" rather than a "TEXT" element. Hence, the second is put in braces by deparseLatex.
> unclass(a2)
[[1]]
[1] "Juan Esteban Diaz\nManuel L"
attr(,"latex_tag")
[1] "TEXT"
[[2]]
[[2]][[1]]
[1] "ó"
attr(,"latex_tag")
[1] "TEXT"
attr(,"latex_tag")
[1] "BLOCK"
[[3]]
[1] "pez-Ib"
attr(,"latex_tag")
[1] "TEXT"
[[4]]
[[4]][[1]]
[1] "á"
attr(,"latex_tag")
[1] "TEXT"
attr(,"latex_tag")
[1] "BLOCK"
[[5]]
[[5]][[1]]
[1] "ñ"
attr(,"latex_tag")
[1] "TEXT"
attr(,"latex_tag")
[1] "BLOCK"
[[6]]
[1] "ez"
attr(,"latex_tag")
[1] "TEXT"
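A more compact way to inspect the same information is to list only the top-level tags of a parsed string. This is a sketch, assuming that tools::parseLatex returns a list whose elements carry a "latex_tag" attribute, as in the dump above; tag_of() is a hypothetical helper, and the exact tags may differ between R versions, so no expected output is shown.

```r
library(tools)

# Hypothetical helper: return the tag of one parsed element,
# using "NULL" for a missing tag as deparseLatex does.
tag_of <- function(a) {
  tag <- attr(a, "latex_tag")
  if (is.null(tag)) "NULL" else tag
}

# Parse a fragment with two consecutive braced accents.
parsed <- parseLatex("Ib{\\'a}{\\~n}ez")

# One tag per top-level element; two adjacent "BLOCK" entries would
# indicate the situation described above.
tags <- vapply(unclass(parsed), tag_of, character(1))
print(tags)
```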
I don't know why deparseLatex only drops the braces when the previous element is TEXT, but maybe parseLatex uses blocks for unrelated purposes. It is difficult to tell, since parseLatex is a formal parser with opaque code.
I posted a report about what I think is a bug in R to R-devel; see https://stat.ethz.ch/pipermail/r-devel/2022-April/081604.html.
If nothing happens I will look at how to circumvent this.
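One conceivable way to circumvent the problem on the calling side is to post-process the cleaned string and remove braces that enclose a single character. The sketch below is only an illustration of that idea and is not necessarily what Rdpack does; strip_single_char_braces() is a hypothetical name. Note the stated limitation: it would also remove braces placed deliberately around a single character, such as a protected capital letter.

```r
# Hypothetical post-processing step: drop braces around exactly one
# character that is not itself a brace or a backslash.
strip_single_char_braces <- function(s) {
  gsub("\\{([^{}\\\\])\\}", "\\1", s, perl = TRUE)
}

# "\u00f3" is ó, "\u00e1" is á, "\u00f1" is ñ.
cleaned <- "Manuel L\u00f3pez-Ib\u00e1{\u00f1}ez"
fixed <- strip_single_char_braces(cleaned)
print(fixed)
```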
Many thanks. I have circumvented it by using an explicit UTF8 "ñ". I tried to avoid this so that I could share the same bibtex files between R and other tools, some of which don't work well with UTF8.
I am not converting the accented characters to UTF8 when reading in the file with rbibutils::readBib, which offers this option, mainly because on Windows some characters get mangled if they are not available in the current Windows code page (another reason is that some UTF8 characters are problematic for LaTeX). The mangling on Windows should go away, though, (some time) after the release of R 4.2, which has a native UTF8 locale.
Yes, I didn't wish to imply that my "fix" is the right fix, only that I'm happy to wait for a proper solution in R because I found a workaround that works for my particular goal (having a nice webpage online).
(Many thanks for all your help. Next time I'm in Manchester, I will buy you a beer!)
I will be waiting for your call!
Fixed in Rdpack v2.3.1.