jgm/citeproc

Inner quotes handled incorrectly for it-IT locale.

fiapps opened this issue · 7 comments

In Italian, «» are used for outer quotes, and "" for inner quotes, and the it-IT CSL locale (lines 74–77) specifies this. When a pandoc citation has a quotation with inner and outer quotes in its suffix, the inner quotes are changed from "" to «» if the locale is it-IT, whether this is specified by pandoc lang metadata or a default-locale="it-IT" attribute on the style element in the CSL file.

I assume this is a bug in citeproc because it does not happen to quotations in body text.

Test file (test-quotes.md):

Foo[@a 50: «Disse: "bar"»]. «Disse: "baz"»

---
suppress-bibliography: true
references:
- id: a
  author:
    - literal: Aristotele
  title: Metafisica
  type: book
...

Correct output from pandoc test-quotes.md --citeproc --csl universita-pontificia-salesiana.csl -t markdown-citations:

Foo[^1]. «Disse: "baz"»

[^1]: [Aristotele]{.smallcaps}, *Metafisica*, 50: «Disse: "bar"».

Incorrect output from pandoc test-quotes.md --citeproc --csl universita-pontificia-salesiana.csl -M lang=it-IT -t markdown-citations:

Foo[^1]. «Disse: "baz"»

[^1]: [Aristotele]{.smallcaps}, *Metafisica*, 50: «Disse: «bar»».

pandoc --version reports:

pandoc 2.14.1
Compiled with pandoc-types 1.22, texmath 0.12.3, skylighting 0.11,
citeproc 0.4.1, ipynb 0.1.0.1

jgm commented

We can handle quote alternation properly when the quotes are added by CSL processing.
But when they're just part of the suffix, it's hard to get it right. Pandoc doesn't parse the «...» as a Quoted element, so it thinks the inner ".." are outer quotes...
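
To make the failure concrete, here is a minimal sketch in Python of depth-based quote alternation. It is a toy model, not pandoc's or citeproc's actual code: the `Str` and `Quoted` classes, the `render` function, and the `IT_MARKS` table are assumptions made for illustration. It shows that when the guillemets are plain text and only the inner quotation is a Quoted node, that node sits at depth 0 and receives the outer marks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Str:
    text: str


@dataclass
class Quoted:
    content: List[object]


# (open, close) pairs ordered outer to inner, following the issue's
# description of the it-IT locale. Hypothetical table for this sketch.
IT_MARKS = [("«", "»"), ("“", "”")]


def render(inlines, marks, depth=0):
    """Render inlines, choosing quote marks by the nesting depth of Quoted nodes."""
    out = []
    for node in inlines:
        if isinstance(node, Quoted):
            open_q, close_q = marks[depth % len(marks)]
            out.append(open_q + render(node.content, marks, depth + 1) + close_q)
        else:
            out.append(node.text)
    return "".join(out)


# The suffix as parsed: the guillemets are plain text, only "bar" is Quoted.
suffix = [Str("«Disse: "), Quoted([Str("bar")]), Str("»")]
print(render(suffix, IT_MARKS))  # «Disse: «bar»»

# If the outer quotation were also a Quoted node, alternation would work.
nested = [Quoted([Str("Disse: "), Quoted([Str("bar")])])]
print(render(nested, IT_MARKS))  # «Disse: “bar”»
```

The first result reproduces the incorrect footnote text from the report; the second shows what recognizing the outer «...» as a Quoted element would give.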

Hard to think of a good solution short of having pandoc recognize all kinds of localized quotes as Quoted elements, which would require lots of changes.

As a workaround, I'd suggest backslash-escaping the inner quotes, so pandoc will just pass them through literally without changing them. (You probably want to use curly quotes instead of straight ones in this case.)

I’ve escaped my quotes for now, but I think there's a bug in that pandoc and citeproc don't currently agree on what a Quoted element is: pandoc thinks it should always be in "..", and citeproc thinks it should be localized quotes.

The best solution, in my view, would be to have pandoc recognize as a quoted element a run of text enclosed in the quotation marks defined for the locale. I don't personally benefit from the recognition of Quoted elements, but if pandoc is going to recognize them, and wants to work with non-English documents, it should recognize as a quote what is marked as such according to the locale. If that's the solution you choose, then I think this issue can be closed, and a pandoc issue for recognizing localized quotations should be opened in its place.

The other solution would be for pandoc to continue to recognize only ".." as a Quoted element, and to use citeproc in such a way that it does not wrap these elements in localized quotes. I see two ways to do this (not knowing anything about the internals of either program): (1) instead of passing a Quoted element to citeproc as such, pandoc could wrap the element's content in ".." and pass the result to citeproc; (2) pandoc could continue to pass a Quoted element to citeproc as such, but citeproc would have to know not to use localized quotes for these elements.

jgm commented

It's true, we could convert Quoted ils to Span ("",[],[]) (Str "“" : ils ++ [Str "”"]) before passing to citeproc, at least in prefixes and suffixes. (And maybe also in the bibliography fields? But that would mean we could get some failures in quotation flipflopping in cases where titles contain quotes.)
One would have to consider how this affects the code for moving punctuation (e.g. #33).
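
As a rough illustration of that conversion, here is a Python sketch over a toy inline tree. The classes and the `freeze_quotes` and `to_text` helpers are hypothetical names for this example only; they mirror the shape of the Haskell expression above but are not pandoc's API. Each Quoted node becomes a class-less Span whose content starts and ends with literal curly quotes, so a later localization pass has nothing left to rewrite.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Str:
    text: str


@dataclass
class Quoted:
    content: List[object]


@dataclass
class Span:
    classes: List[str]
    content: List[object]


def freeze_quotes(inlines):
    """Replace every Quoted node with a plain Span holding literal curly quotes."""
    result = []
    for node in inlines:
        if isinstance(node, Quoted):
            inner = freeze_quotes(node.content)
            result.append(Span([], [Str("“")] + inner + [Str("”")]))
        elif isinstance(node, Span):
            result.append(Span(node.classes, freeze_quotes(node.content)))
        else:
            result.append(node)
    return result


def to_text(inlines):
    """Flatten the toy tree to a string (Span and Quoted contribute only their content)."""
    parts = []
    for node in inlines:
        if isinstance(node, Str):
            parts.append(node.text)
        else:
            parts.append(to_text(node.content))
    return "".join(parts)


suffix = [Str("«Disse: "), Quoted([Str("bar")]), Str("»")]
frozen = freeze_quotes(suffix)
print(to_text(frozen))  # «Disse: “bar”»
```

Because the quote marks are now ordinary strings, any step that inspects Quoted nodes (such as punctuation moving) would no longer see them, which is the trade-off raised above.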

jgm commented

Another option would be for citeproc to use a custom Span element instead of Quoted to represent quoted sections, e.g. Span ("",["csl-quoted"],[]). Quoted elements in bibliography databases or prefixes or suffixes could be converted to these Spans, prior to passing to citeproc, if the localized quotation marks match those pandoc would normally use to render these, and otherwise left as they are.

Note also that the code for resolving Quoted elements to inlines with localized quotes is currently in pandoc, not citeproc: convertQuotes in T.P.Citeproc. Ideally this function would be moved to citeproc and be made part of the regular pipeline. E.g. as a method localizeQuotes :: Locale -> a -> a on CiteprocOutput a.
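
A minimal Python sketch of how such a pass might look, under stated assumptions: the toy `Str` and `Span` classes, the `IT_MARKS` table, and the `localize_quotes` function are all hypothetical and only echo the shape of the proposed localizeQuotes :: Locale -> a -> a. Only Spans carrying the csl-quoted class are given localized, alternating marks; literal quote characters in ordinary text pass through unchanged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Str:
    text: str


@dataclass
class Span:
    classes: List[str]
    content: List[object]


# (open, close) pairs ordered outer to inner; assumed values for it-IT.
IT_MARKS = [("«", "»"), ("“", "”")]


def localize_quotes(marks, inlines, depth=0):
    """Turn csl-quoted Spans into literal marks chosen by nesting depth."""
    out = []
    for node in inlines:
        if isinstance(node, Span) and "csl-quoted" in node.classes:
            open_q, close_q = marks[depth % len(marks)]
            inner = localize_quotes(marks, node.content, depth + 1)
            out.append(Span([], [Str(open_q)] + inner + [Str(close_q)]))
        elif isinstance(node, Span):
            out.append(Span(node.classes, localize_quotes(marks, node.content, depth)))
        else:
            out.append(node)
    return out


def to_text(inlines):
    """Flatten the toy tree to a string."""
    return "".join(
        n.text if isinstance(n, Str) else to_text(n.content) for n in inlines
    )


# A title the style wraps in quotes, containing a quoted word.
title = [Span(["csl-quoted"],
              [Str("Metafisica et "), Span(["csl-quoted"], [Str("Physica")])])]
print(to_text(localize_quotes(IT_MARKS, title)))  # «Metafisica et “Physica”»

# A suffix with literal quote characters is left untouched.
suffix = [Str("50: «Disse: “bar”»")]
print(to_text(localize_quotes(IT_MARKS, suffix)))  # 50: «Disse: “bar”»
```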

Thanks for the fix (though I haven't tested it). I think I had also seen this issue in a title that used quotes. As I understand it, your fix will deal with that too.

jgm commented

The current code will treat a Quoted element in a title (or other bibliography field) specially, doing flip-flopping and localization regardless of the lang. It seems to me that this is the proper behavior, but feel free to give a counterexample. I had thought of making this locale sensitive (so that Quoted elements were just left alone in languages that don't use single or double curly quotes for either inner or outer quotes), but I decided against this in the end -- and in any case it wouldn't make a difference for Italian which uses double curly quotes for inner quotes.

Here are two examples showing the current behavior in titles:

% pandoc -C -t plain -Mlang=en
---
references:
- id: a
  author:
    - literal: Aristotele
  title: Metafisica et "Physica"
  type: article-journal
...

Foo [@a 50].
^D
Foo (Aristotele, n.d., 50).

Aristotele. n.d. “Metafisica Et ‘Physica’.”

% pandoc -C -t plain -Mlang=it
---
references:
- id: a
  author:
    - literal: Aristotele
  title: Metafisica et "Physica"
  type: article-journal
...

Foo [@a 50].
^D
Foo (Aristotele, s.d., 50).

Aristotele. s.d. «Metafisica et “Physica”».

I think the problem I saw was double curly quotes in the title becoming guillemets in the output, which happens when it's a field that the style renders in italics rather than quotes. Italian bibliography styles I'm familiar with tend to use italics for the article title and quotes for the container title. So you get this:

% pandoc -C -t plain -Mlang=it --csl universita-pontificia-salesiana.csl
---
references:
- id: a
  author:
    - literal: Aristotele
  title: Metafisica et "Physica"
  type: article-journal
...

Foo [@a 50].
Foo (Aristotele, s.d., 50).
Foo.[1] Foo (Aristotele, s.d., 50).

ARISTOTELE, Metafisica et «Physica».

[1] ARISTOTELE, Metafisica et «Physica»: 50.

This looks weird if the article title is in a language, like English, that doesn't use guillemets. But I can see this is a difficult problem to solve: flip-flopping is important, and parsing quotes is generally a good way of doing that, but non-English documents and multilingual bibliographies complicate matters. For now, the best solution I see is escaping quotes that shouldn't be changed.