reutenauer/polyglossia

bidi disables all hyphenation in \text… and \foreignlanguage

logological opened this issue · 10 comments

With TeX Live 2024 updated to 2024-06-14, using the bidi package (or invoking \setotherlanguage with any language that loads bidi) disables hyphenation in the output of the \text… and \foreignlanguage commands. Hyphenation still works in \begin{language}…\end{language} environments.

Here's an example file demonstrating the problem:

\documentclass{article}

\usepackage{polyglossia}
\setdefaultlanguage{german}
\setotherlanguage{persian} % or just \usepackage{bidi}

\begin{document}
% Hyphenation works outside of any command or environment:
\parbox{0pt}{\hspace{0pt}Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}

% Hyphenation works in a {language} environment:
\parbox{0pt}{\hspace{0pt}\begin{german}Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\end{german}}

% Hyphenation doesn't work in \foreignlanguage:
\parbox{0pt}{\hspace{0pt}\foreignlanguage{german}{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}} 

% Hyphenation doesn't work in \text…:
\parbox{0pt}{\hspace{0pt}\textgerman{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}}

\end{document}

The discussion in latex3/latex2e#1368 indicates that a great many packages stopped working in conjunction with bidi following a recent update to the array package. Perhaps this is yet another example. I will also report this issue to the developer of bidi, though they do not seem to have been active in the last eight months.

Thanks for the report. This seems to be a bug in XeTeX. Here is a more minimal example demonstrating the problem:

%\font\foo="[lmroman10-regular]" at 10pt\foo
\TeXXeTstate=1
\vbox{\hsize=0pt\hskip0pt Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}
\vskip10pt
% With the font line above uncommented, hyphenation is lost inside \beginL…\endL:
\vbox{\hsize=0pt\hskip0pt\beginL Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\endL}
\bye

If I uncomment the first line (i.e. load a modern font), the bug occurs. The only thing we can do on polyglossia's side is to prevent the use of \beginL etc. when not needed, but that will not solve the problem in cases where such phrases are used inside an RTL paragraph.

I forgot to mention: I can reproduce the bug with an older TL, so it is probably not new, nor related to the latest update.

Thanks for investigating. I'll follow the XeTeX bug report.

I've pushed a commit to the branch I'm working on (named udi). Your example works there, but as I said, if the direction change is really needed then the problem still exists (for example, if persian were the main language in your example).

As mentioned by Jonathan Kew in the bug report, you can append \hskip0pt to the end of the text. I'm currently not sure whether that will have other side effects, so for now I will not add it to the package, but it might get added in the future.

@jspitz @reutenauer do you know if adding \hskip0pt in such cases can have an effect such as adding a possible line break, thus moving the \endL/\endR node to a new line, which could change the output?

I can confirm that adding \hskip0pt to the end of the text has the drawback that it allows a line break at that point. Such a line break is unwelcome if the \foreignlanguage command is followed by something like punctuation: with, say, \foreignlanguage{greek}{Ψυχοφθόρα}, I do not want TeX to break the line before the comma.

\kern0pt instead of \hskip0pt could work.

Yes, \kern0pt seems to work. I get the hyphenation without any unwanted line breaks.
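
For the record, here is a sketch of the same workaround applied to the minimal plain XeTeX example from earlier in this thread. It assumes that "the end of the text" means directly before \endL, so that the word is followed by an explicit kern rather than by the direction-ending node. The font line is uncommented because the bug only shows up with a modern font loaded. Compile with xetex; this has not been checked against every TeX Live release.

```latex
% Plain XeTeX sketch of the \kern0pt workaround (assumed placement: before \endL).
\font\foo="[lmroman10-regular]" at 10pt\foo
\TeXXeTstate=1
% Without the workaround: hyphenation is reported lost inside \beginL...\endL.
\vbox{\hsize=0pt\hskip0pt\beginL Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\endL}
\vskip10pt
% With the workaround: an explicit zero-width kern follows the word.
% Unlike \hskip0pt, a kern that is not followed by glue adds no break point.
\vbox{\hsize=0pt\hskip0pt\beginL Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz\kern0pt\endL}
\bye
```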

@u-fischer Thanks Ulrike. Just to be sure, could this lead to changes in rare cases where lastnode commands (e.g. \lastskip) are used just after a language skip? If so, we'll probably just have to add a hook or a new option.

I believe adding a zero-width kern can cause some problems, so it shouldn't always be done. Since this is not our bug, and a user can easily define a macro that adds a kern, I'll close for now.
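
For anyone landing here later, a minimal sketch of such a user-side macro, based on the example file from the original report. The name \kernedforeignlanguage is hypothetical and not part of polyglossia or bidi. The wrapper does not forward the optional argument of \foreignlanguage, and it assumes the appended kern ends up directly before the direction-ending node. Compile with xelatex.

```latex
\documentclass{article}

\usepackage{polyglossia}
\setdefaultlanguage{german}
\setotherlanguage{persian} % loads bidi, which triggers the problem

% Hypothetical helper: takes the same two mandatory arguments as
% \foreignlanguage, but appends an explicit zero-width kern to the text.
% The kern does not introduce a break point before following punctuation.
\newcommand{\kernedforeignlanguage}[2]{\foreignlanguage{#1}{#2\kern0pt}}

\begin{document}
% Same test as in the report, now through the wrapper:
\parbox{0pt}{\hspace{0pt}\kernedforeignlanguage{german}{Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz}}

\end{document}
```

Because the kern can interfere with code that inspects the last node (e.g. \lastskip), it seems safer to use a wrapper like this only where the missing hyphenation actually matters.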