plk/biber

Biber makes substitutions for \textgamma that cause errors

[also asked at TeX StackExchange]

The following example produces an error:

\documentclass{article}
\usepackage{biblatex}
\usepackage{textgreek}
\begin{filecontents}{test.bib}
@article{Author2015,
    author  = "Imogene Mirabell Anne Author",
    title   = "Making {\textgamma}-Iron from Lead",
    journal = "Alchemist",
    volume  = 10,
    number  = 3,
    pages   = "121--134",
    year    = 2038
}
\end{filecontents}
\addbibresource{test.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}

Now run LaTeX, then Biber, then LaTeX again and you'll get an error:

! LaTeX Error: Unicode character ɣ (U+0263)
               not set up for use with LaTeX.

In the .bbl file, the sequence \textgamma has been replaced with the actual character ɣ (U+0263, Latin small letter gamma), which is not declared, hence the error.

Note that replacing \textgamma with \textalpha works fine.

Why does this occur? That is, why is Biber making this substitution? It seems like a bug in Biber to me.

Additional info: biber version 2.19

plk commented

It's not a bug: Biber always encodes to UTF-8 unless you tell it not to. The mapping it uses to do this can be modified as per the docs. However, the default for \textgamma wasn't ideal, as it was the Latin gamma. This has been changed in the linked commit, since there is a different mapping for Latin gamma anyway. You can fix your install by altering your recode_data.xml as per the fixing commit.

I did not see this documented anywhere. Encoding in UTF-8 is not the same thing as replacing TeX macros with Unicode characters on-the-fly, so it seems like this should be mentioned and the mechanism to turn it off advertised in the documentation. Perhaps I just missed it? I did not see \text anywhere in the Biber documentation.

plk commented

See section 3.6 of the Biber PDF documentation (texdoc biber in TeX Live).

Does it make sense to translate \textalpha to U+03B1 and so forth?

plk commented

Always open to suggestions for defaults for these; many were just best guesses made over a decade ago.

@hammondkd Biber always replaces LaTeX character commands with the character where possible, but \textgamma is intended for Greek, so the change in the commit above to map it to Greek gamma certainly looks right to me. The Latin gamma has some specialised uses but is pretty esoteric, and LaTeX does not define a mapping for it at all in the default setup for classic TeX systems. That is why the standard \textgamma command ended up producing an undefined-character error.
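
For installs that still carry the old mapping, the missing declaration can be supplied in the preamble as a stopgap. This is a minimal sketch, assuming pdfLaTeX with its default UTF-8 input handling and the textgreek package from the original example. Mapping U+0263 back to \textgamma is an illustrative choice here, not something Biber or the commit prescribes.

```latex
\documentclass{article}
\usepackage{biblatex}
\usepackage{textgreek}
\begin{filecontents}{test.bib}
@article{Author2015,
    author  = "Imogene Mirabell Anne Author",
    title   = "Making {\textgamma}-Iron from Lead",
    journal = "Alchemist",
    year    = 2038
}
\end{filecontents}
\addbibresource{test.bib}

% Workaround (assumes pdfLaTeX): tell LaTeX what to typeset when the .bbl
% contains U+0263 (Latin small letter gamma) in place of \textgamma.
\DeclareUnicodeCharacter{0263}{\textgamma}

\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

The same pattern should apply to any other character that triggers the "not set up for use with LaTeX" error, with one \DeclareUnicodeCharacter line per code point.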

@plk thanks for quick fix for gamma.

I think the whole Greek alphabet should be in the 03xx range, not 02xx, e.g.

    <map><from>textupsilon</from>                      <to hex="28A">ʊ</to></map>

is mapping to the Latin upsilon, which is possibly used in phonetics or elsewhere but isn't the intended interpretation of \textupsilon; U+028A has no mapping in LaTeX and will by default just give

! LaTeX Error: Unicode character ʊ (U+028A)
               not set up for use with LaTeX.

plk commented

Done: all of the Greek alphabet is available with --decodecharsset=full, but any text* defaults are also now Greek.

I still think it is unnecessary for situations like this to yield errors. The right solution, in my view, is to remember the actual text (LaTeX code) used in the bib entry. All processing (conversions, standardization and whatnot) would then produce a form used only for comparison, sorting, etc. When something is actually output, it would be the remembered value that the user wrote in the bib entry, not the standardized form used by the machinery.

Then it is the responsibility of the user to have something suitable there, and if there is a problem the user can easily change it and will not get a really hard to understand error message.

I suggested this in an issue in 2014, and since then I've seen several related issues about various problematic characters. I don't think we have seen the last such issue yet; this will continue to be a problem that appears for users now and then.

plk commented

It's just too messy to remember the original chars, and we wanted to support Unicode as a standard for such things. You can indeed select the output format as ASCII with Biber: it will sort internally with UTF-8 and output ASCII equivalents for characters not in the output encoding. This involves two conversions, however, not a "remember the original commands" approach.
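
One way to request ASCII output from within the document is sketched below. It relies on biblatex's texencoding option, which the biblatex manual describes as corresponding to Biber's output encoding setting. Which macro Biber writes back for a given character depends on its recode data, so the result for \textgamma is not guaranteed to match the command originally typed in the bib entry.

```latex
\documentclass{article}
% texencoding=ascii asks the backend (Biber) to write an ASCII-only .bbl,
% re-expressing non-ASCII characters as LaTeX macros where it can.
\usepackage[texencoding=ascii]{biblatex}
\usepackage{textgreek}
% test.bib is assumed to be the file from the original example.
\addbibresource{test.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}
```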