jgm/pandoc

Smart - quote types

jgm opened this issue · 23 comments

jgm commented

Currently only English-style quotes (both raised) are supported. An option like -Sq x (x = numeric ID of the quote type) would be nice, to allow e.g. Czech-style „abc“; or even -SqAB, where A and B are the characters for the opening and closing quote.

Google Code Info:
Issue #: 287
Author: svat...@mail2web.com
Created On: 2011-02-14T10:26:54.000Z
Closed On:

I would very much like to see this too.

I imagine a minimal implementation could be to support setting them explicitly as Unicode character pairs, like this:

pandoc -V doublequote=»« -V singlequote=›‹ -o output.html input.txt

More clever (e.g. as a later addition) would be a language lookup table that alters the defaults per language. That would switch to „this“ when setting -V lang=da (or autodetecting the language?), and might also support different quoting styles for passages in different languages within the same document.
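The lookup-table idea could be sketched roughly as below. This is only an illustration in Python, not pandoc code: the table name, the function names, and the choice of languages are all assumptions, and the Danish and Czech entries are simply the marks quoted in this thread.

```python
# Hypothetical table: primary language subtag ->
# (open double, close double, open single, close single).
QUOTES_BY_LANG = {
    "en": ("“", "”", "‘", "’"),
    "da": ("„", "“", "‚", "‘"),
    "cs": ("„", "“", "‚", "‘"),
}


def quotes_for(lang, default="en"):
    """Return the quote marks for a tag such as 'da' or 'cs-CZ'."""
    # Only the primary subtag is used, so 'cs-CZ' and 'cs_CZ' both map to 'cs'.
    primary = lang.replace("_", "-").split("-")[0].lower()
    return QUOTES_BY_LANG.get(primary, QUOTES_BY_LANG[default])


def wrap_double(text, lang):
    """Wrap text in the double quotes of the given language."""
    open_q, close_q, _, _ = quotes_for(lang)
    return f"{open_q}{text}{close_q}"
```

Unknown languages fall back to the English marks here; a real implementation would probably also need a way to pick between alternatives that are equally accepted in one language.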

For others reading this issue, here's what I currently use to postprocess HTML output to switch to Danish quotation style:

perl -i -pe 's/”(\w([^”]*\w)?)”/„$1“/g;s/’(\w([^’]*\w)?)’/‚$1‘/g' output.html

(To change to other quotation styles, change the characters around the two $1.)
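For those who prefer Python, roughly the same postprocessing could look like the sketch below. It assumes the HTML contains pandoc's English curly quotes, “…” and ‘…’, and the function name is made up for the example.

```python
import re

# Assumed input: English curly quotes as written by pandoc.
# A quoted span must start and end with a word character.
DOUBLE = re.compile(r"“(\w(?:[^”]*\w)?)”")
SINGLE = re.compile(r"‘(\w(?:[^’]*\w)?)’")


def danish_quotes(html):
    """Rewrite English curly quotes as Danish „…“ and ‚…‘."""
    html = DOUBLE.sub(r"„\1“", html)
    return SINGLE.sub(r"‚\1‘", html)
```

Because the single-quote pattern requires an opening ‘, an apostrophe inside a word such as it’s is left alone.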

jgm commented

To clarify: are you requesting support for configurable smart quotes on output, or also in input?

If you want to write Danish style quotes (or whatever) in input, they should pass through unchanged to the output.

So I gather you want to write "hello" in your markdown file and get „hello“ in the output. Correct?

On 12-05-30 at 07:19pm, John MacFarlane wrote:

So I gather you want to write "hello" in your markdown file and get
„hello“ in the output. Correct?

Yes, correct.

(I did wonder why you had notions of quoting style at all in input files
in source - now I understand that (to some degree of "understand")) :-)

  • Jonas

  • Jonas Smedegaard - idealist & Internet-arkitekt

  • Tlf.: +45 40843136 Website: http://dr.jones.dk/

    [x] quote me freely [ ] ask before reusing [ ] keep private

jgm commented

Jonas: In HTML 5 and LaTeX/PDF output, it is already possible to get national quote styles.

In HTML 5, you just need to add some CSS (which you can include using --css): something like this, but for your language:

 q { quotes: "“" "”" "‘" "’"; }

In LaTeX, add \usepackage[danish=quotes]{csquotes} to your template.

On 12-05-31 at 08:28pm, John MacFarlane wrote:

Jonas: In HTML 5 and LaTeX/PDF output, it is already possible to get
national quote styles.

I knew about LaTeX but not HTML5. Thanks for the hint!

Still, as I suspect is even written between the lines above: that is
little help for my current project for primary schools that use IE7.

I can possibly use Modernizr.js and/or IE7.js but in my experience those
often collide with other JavaScript messing with the DOM, e.g. Slidy and
Slideous.

Also, for the reference, above are not danish quotes. These are correct:

q { quotes: "„" "“" "‚" "‘"; }

...and (as an active translator made me aware when I tried to "correct"
him) these are equally correct (even if not my preference, as you might
guess from that incident):

q { quotes: "»" "«" "›" "‹"; }

More info here:
http://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks

  • Jonas

Happy to notice that Slideous has been merged into Pandoc now!


jgm commented

+++ Jonas Smedegaard [Jun 01 12 02:28 ]:

Also, for the reference, above are not danish quotes. These are correct:

q { quotes: "„" "“" "‚" "‘"; }

Yeah, I know. I just gave the English ones and added "but for your
language," because it's tough for me to type those.

...and (as an active translator made me aware when I tried to "correct"
him) these are equally correct (even if not my preference, as you might
guess from that incident):

q { quotes: "»" "«" "›" "‹"; }

By the way, LaTeX csquotes also has a danish=guillemets option.

On 12-06-01 at 08:49am, John MacFarlane wrote:

Yeah, I know. I just gave the English ones and added "but for your
language," because it's tough for me to type those.

Ahh, how lovely: you beat me in nitpicking: I missed that tiny "but"!
:-D

  • Jonas


Language dependent smart quotes would be very nice (HTML, EPUB writer) for me too. I use markdown as source with "-quotes for German, French and Russian texts.

The best would be for pandoc to adapt according to the lang variable.

Also, it would be great if pandoc could manage some typographic corrections. For example, in French, you should have a non-breaking space before signs like ! ? ; or :.
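As a rough idea of what such a correction could do, here is a small Python sketch for plain text. The function name and the regex are assumptions for illustration; it would need more care for HTML, where the ; of an entity followed by a space would also match.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE


def french_spacing(text, space=NBSP):
    """Put one no-break space before ! ? ; : when they end a word."""
    # The lookahead requires whitespace or end of text after the sign,
    # so 'http://' and '10:30' are left untouched.
    return re.sub(r"(?<=\w) *([!?;:])(?=\s|$)", lambda m: space + m.group(1), text)
```

The space is a parameter because some typographic conventions prefer a narrow no-break space (U+202F) before some of these signs.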

I currently do regexes on the resulting HTML to switch between American and English quotes, but a format independent way would be handy!

Hi. This seems to be a little related to issue #327 too.

I’d love to see such switches too, where one could simply pass the desired quote characters into which the ASCII quotes " around a sequence of words should be converted.

$ pandoc --version
pandoc 1.12.4.2
Compiled with texmath 0.6.6.1, highlighting-kate 0.5.8.5.
Syntax highlighting is supported for the following languages:
    actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog,
    clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d,
    diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang,
    fortran, fsharp, gcc, gnuassembler, go, haskell, haxe, html, ini, isocpp,
    java, javadoc, javascript, json, jsp, julia, latex, lex, literatecurry,
    literatehaskell, lua, makefile, mandoc, markdown, matlab, maxima, metafont,
    mips, modelines, modula2, modula3, monobasic, nasm, noweb, objectivec,
    objectivecpp, ocaml, octave, pascal, perl, php, pike, postscript, prolog,
    pure, python, r, relaxngcompact, restructuredtext, rhtml, roff, ruby, rust,
    scala, scheme, sci, sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, texinfo,
    verilog, vhdl, xml, xorg, xslt, xul, yacc, yaml
Default user data directory: /home/paul/.pandoc
Copyright (C) 2006-2014 John MacFarlane
Web:  http://johnmacfarlane.net/pandoc
This is free software; see the source for copying conditions.  There is no
warranty, not even for merchantability or fitness for a particular purpose.
$ more test.textile 
"test"
$ pandoc -o test.markdown test.textile
$ more test.markdown 
“test”

In the other direction, for transforming internationalized text to ASCII: https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/normalize-punctuation.perl is a script that normalizes all quotes and punctuation, for example guillemets to double quotation marks. Usage: normalize-punctuation.perl < file.md. In LaTeX (with csquotes), for example, the compiled PDF restores all the internationalization.
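If only the quotes need normalizing, a much smaller stand-in can be written in a few lines of Python. This is not the Moses script's rule set, just a sketch with an assumed character table that covers the quote marks mentioned in this thread.

```python
# Assumed mapping: typographic quote marks -> ASCII quotes.
ASCII_QUOTES = str.maketrans({
    "“": '"', "”": '"', "„": '"', "«": '"', "»": '"',
    "‘": "'", "’": "'", "‚": "'", "‹": "'", "›": "'",
})


def to_ascii_quotes(text):
    """Replace typographic quote marks with ASCII " and '."""
    return text.translate(ASCII_QUOTES)
```

Note that this also turns a typographic apostrophe (’) into the ASCII one.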

odkr commented

I wrote a simple filter that replaces ASCII quotes with typographic ones and that respects the lang metadata field.

It should be fairly easy to customise. However, it’s only intended for output formats that treat quotes as part of a document’s semantics (e.g., OpenOffice, Word), not output formats that treat quotes as part of a document’s syntax (e.g., HTML, LaTeX).

You can install it by: pip3 install pandoc_quotes

See https://github.com/odkr/pandoc-quotes for details.
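For readers curious how a filter of this kind can work, here is a minimal Python sketch that operates on pandoc's JSON AST using only the standard library. It is not the pandoc-quotes implementation: the table, the function names, and the fallback to English are assumptions, and it only sees Quoted elements, which the reader produces when smart is enabled.

```python
# Assumed table, keyed by primary language subtag.
MARKS = {
    "en": {"DoubleQuote": ("“", "”"), "SingleQuote": ("‘", "’")},
    "da": {"DoubleQuote": ("„", "“"), "SingleQuote": ("‚", "‘")},
    "cs": {"DoubleQuote": ("„", "“"), "SingleQuote": ("‚", "‘")},
}


def meta_lang(doc, default="en"):
    """Read the lang metadata field and return its primary subtag."""
    lang = doc.get("meta", {}).get("lang", {})
    if lang.get("t") == "MetaInlines":      # e.g. from a YAML block
        text = "".join(x["c"] for x in lang["c"] if x.get("t") == "Str")
    elif lang.get("t") == "MetaString":     # e.g. from -M lang=...
        text = lang["c"]
    else:
        text = default
    return text.split("-")[0].lower()


def walk(node, marks):
    """Return a copy of node with each Quoted replaced by Str, contents, Str."""
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Quoted":
                kind, inlines = item["c"]
                open_q, close_q = marks[kind["t"]]
                out.append({"t": "Str", "c": open_q})
                out.extend(walk(inlines, marks))
                out.append({"t": "Str", "c": close_q})
            else:
                out.append(walk(item, marks))
        return out
    if isinstance(node, dict):
        return {key: walk(value, marks) for key, value in node.items()}
    return node


def convert(doc):
    """Apply the language's quote marks to a whole pandoc JSON document."""
    marks = MARKS.get(meta_lang(doc), MARKS["en"])
    return dict(doc, blocks=walk(doc["blocks"], marks))
```

Wiring convert to stdin and stdout with json.load and json.dump would turn this into something usable with pandoc's --filter option.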

How can I turn off the quote translation? E.g., I wrote Chinese quotes “” and want them to stay “” in the final PDF file. But pandoc tries to be clever and translates “” to `` '', while in some blocks it does not translate them, as described in https://stackoverflow.com/questions/52052231/how-to-write-chinese-quotes-in-bookdown . This leads to chaos. Thank you.

chpio commented

How can I turn off the quote translation?

By not enabling smart.

@chpio I find that bookdown runs

/usr/bin/pandoc +RTS -K512m -RTS deepin-bible.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output deepin-bible.tex --table-of-contents --toc-depth 2 --template latex/template.tex --number-sections --highlight-style tango --pdf-engine xelatex --biblatex --listings --top-level-division=chapter --variable tables=yes --standalone

which does not contain smart.

Do you mean that -smart should be added on the command line, like this?

markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-smart

Thank you.

As described in https://stackoverflow.com/questions/52052231/how-to-write-chinese-quotes-in-bookdown

I tested this with the bookdown template and found that Chinese quotes “” are translated to `` ''. But if you write “” inside a block or other begin/end blocks, the Chinese quotes “” are not translated to `` ''. So you get two different kinds of Chinese quotes in the final PDF file. Is there a setting to turn off this translation? Thank you.

I added -smart, and the result is the same as above:

/usr/bin/pandoc +RTS -K512m -RTS deepin-bible.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-smart --output deepin-bible.tex --table-of-contents --toc-depth 2 --template latex/template.tex --number-sections --highlight-style tango --pdf-engine xelatex --biblatex --listings --top-level-division=chapter --variable tables=yes --standalone
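One thing that may be worth trying, as far as I read the description of the smart extension in the pandoc manual: the extension also applies to writers, and with smart enabled for LaTeX output, curly quotes are written as the `` and '' ligatures. So it would have to be disabled on the output format as well. The Python sketch below only builds such a command line; the function name is made up, most of the options from the command above are left out, and whether bookdown lets you pass these formats through is a separate question.

```python
def pandoc_args(source, target, keep_unicode_quotes=True):
    """Build a pandoc argument list (a sketch; most options omitted)."""
    reader = "markdown"
    writer = "latex"
    if keep_unicode_quotes:
        reader += "-smart"  # do not parse quotes into Quoted elements on input
        writer += "-smart"  # do not write quotes as `` and '' in the LaTeX output
    return ["pandoc", source, "--from", reader, "--to", writer, "--output", target]
```

The resulting list can be handed to subprocess.run, or joined with spaces to get a shell command.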

Hello! This might help. Assume you have this markdown doc:

---
lang: cs-CZ
csquotes: true
---

"Quotation test"

Using this command:

pandoc --pdf-engine=xelatex -o example.pdf example.md

You will get PDF with this quotation:

„Quotation test“

Thank you for the heads-up that 8031ac1 was included three years ago; I vaguely remember that one had to hide the activation of csquotes from pandoc (with --smart ?!) for this package to work correctly.