Smart - quote types
jgm opened this issue · 23 comments
Currently only English-style quotes (both up) are supported. An option like -Sq x (where x is a numeric ID for the quote type) would be nice, to allow e.g. (this is for Czech) „abc“; or even -SqAB, where A (B) represents the character for the opening (closing) quote.
Google Code Info:
Issue #: 287
Author: svat...@mail2web.com
Created On: 2011-02-14T10:26:54.000Z
Closed On:
Here are links to the proper Unicode code points for Czech (alternatives to the English double quotes):
opening: http://www.fileformat.info/info/unicode/char/201e/index.htm
closing: http://www.fileformat.info/info/unicode/char/201c/index.htm
Other useful types:
1.) English/single
o: http://www.fileformat.info/info/unicode/char/2018/index.htm
c: http://www.fileformat.info/info/unicode/char/2019/index.htm
2.) Czech/single
o: http://www.fileformat.info/info/unicode/char/201a/index.htm
c: http://www.fileformat.info/info/unicode/char/2018/index.htm
Google Code Info:
Author: svat...@mail2web.com
Created On: 2011-02-14T10:36:56.000Z
I would very much like to see this too.
I imagine a minimal implementation could support explicitly setting them as Unicode character pairs, like this:
pandoc -V doublequote=»« -V singlequote=›‹ -o output.html input.txt
A cleverer approach (e.g. as a later addition) would be a language lookup table that alters the defaults per language. That would switch to „this“ when setting -V lang=da (or when autodiscovering the language?), and might also support different quoting styles for pieces of text in different languages within the same document.
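Such a lookup table could look roughly like the Python sketch below. It only illustrates the idea: `QUOTES` and `quotes_for()` are hypothetical names, not pandoc features, and the table covers just a few languages using common national conventions (Danish also uses »…« and ›…‹, as noted further down in this thread).

```python
# Hypothetical table: primary language subtag -> quote characters as
# (open double, close double, open single, close single).
QUOTES: dict[str, tuple[str, str, str, str]] = {
    "en": ("\u201c", "\u201d", "\u2018", "\u2019"),  # “ ” ‘ ’
    "cs": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „ “ ‚ ‘
    "da": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „ “ ‚ ‘ (»« ›‹ also used)
    "de": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „ “ ‚ ‘
}


def quotes_for(lang: str, default: str = "en") -> tuple[str, str, str, str]:
    """Return the quote set for a language tag such as 'da' or 'cs-CZ'.

    Only the primary subtag is used ('cs-CZ' -> 'cs'); unknown languages
    fall back to `default`.
    """
    primary = lang.strip().lower().replace("_", "-").split("-")[0]
    return QUOTES.get(primary, QUOTES[default])


print(quotes_for("cs-CZ"))  # ('„', '“', '‚', '‘')
```

Falling back to the primary subtag keeps the table small, at the cost of not distinguishing regional variants (e.g. Swiss German, which prefers guillemets).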
For others reading this issue, here's what I currently use to postprocess HTML output to switch to Danish quotation style:
perl -i -pe 's/”(\w([^”]*\w)?)”/„$1“/g;s/’(\w([^’]*\w)?)’/‚$1‘/g' output.html
(To change to other quotation styles, change the characters around the two $1.)
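For readers who prefer Python, here is a rough equivalent of that postprocessing step. It is a sketch that assumes the HTML contains literal English curly quotes (“…” and ‘…’, as produced by pandoc's smart typography) rather than HTML entities; `to_danish()` is a hypothetical helper name.

```python
import re

# Like the Perl one-liner: a quoted span must start and end with a word
# character. The patterns look for English curly quotes “…” and ‘…’.
DOUBLE = re.compile(r"\u201c(\w(?:[^\u201d]*\w)?)\u201d")
SINGLE = re.compile(r"\u2018(\w(?:[^\u2019]*\w)?)\u2019")


def to_danish(text: str) -> str:
    """Swap English curly quotes for Danish „…“ and ‚…‘."""
    text = DOUBLE.sub("\u201e\\1\u201c", text)  # “x” -> „x“
    return SINGLE.sub("\u201a\\1\u2018", text)  # ‘x’ -> ‚x‘


print(to_danish("He said “hello” and ‘bye’."))  # He said „hello“ and ‚bye‘.
```

As with the Perl version, an apostrophe inside a single-quoted span (‘it’s’) ends the match early, so mixed text may need manual checking.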
To clarify: are you requesting support for configurable smart quotes on output, or also in input?
If you want to write Danish style quotes (or whatever) in input, they should pass through unchanged to the output.
So I gather you want to write "hello" in your markdown file and get „hello“ in the output. Correct?
On 12-05-30 at 07:19pm, John MacFarlane wrote:
So I gather you want to write "hello" in your markdown file and get „hello“ in the output. Correct?
Yes, correct.
(I did wonder why you had notions of quoting style at all in source input files - now I understand that (to some degree of "understand")) :-)
- Jonas
-
Jonas Smedegaard - idealist & Internet-arkitekt
-
Tlf.: +45 40843136 Website: http://dr.jones.dk/
[x] quote me freely [ ] ask before reusing [ ] keep private
Jonas: In HTML 5 and LaTeX/PDF output, it is already possible to get national quote styles.
In HTML 5, you just need to add some CSS (which you can include using --css): something like this, but for your language:
q { quotes: "“" "”" "‘" "’"; }
In LaTeX, add \usepackage[danish=quotes]{csquotes} to your template.
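A small Python helper could write such a rule per language, e.g. to build the stylesheet passed with --css. This is a sketch: `css_quotes_rule()` is a hypothetical name, the `:lang()` selector assumes the HTML carries a `lang` attribute, and the rule only affects quotes rendered as `<q>` elements (pandoc's --html-q-tags option).

```python
def css_quotes_rule(lang: str, quotes: tuple[str, str, str, str]) -> str:
    """Return a CSS rule setting the `quotes` property of <q> for one language.

    `quotes` is (open double, close double, open single, close single).
    """
    def css_string(s: str) -> str:
        # Escape backslashes and double quotes inside a CSS string literal.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    values = " ".join(css_string(q) for q in quotes)
    return f"q:lang({lang}) {{ quotes: {values}; }}"


# Danish quotes as given later in this thread: „…“ outside, ‚…‘ inside.
print(css_quotes_rule("da", ("\u201e", "\u201c", "\u201a", "\u2018")))
# q:lang(da) { quotes: "„" "“" "‚" "‘"; }
```

Scoping by `:lang()` lets one stylesheet serve several languages, which also covers mixed-language documents.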
On 12-05-31 at 08:28pm, John MacFarlane wrote:
Jonas: In HTML 5 and LaTeX/PDF output, it is already possible to get national quote styles.
I knew about LaTeX but not HTML5. Thanks for the hint!
Still, as I suspect is even written between the lines above: that is little help for my current project for primary schools that use IE7. I could possibly use Modernizr.js and/or IE7.js, but in my experience those often collide with other JavaScript that manipulates the DOM, e.g. Slidy and Slideous.
Also, for reference, the quotes above are not Danish quotes. These are correct:
q { quotes: "„" "“" "‚" "‘"; }
...and (as an active translator made me aware when I tried to "correct" him) these are equally correct (even if not my preference, as you might guess from that incident):
q { quotes: "»" "«" "›" "‹"; }
More info here:
http://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks
- Jonas
Happy to notice that Slideous has been merged into Pandoc now!
+++ Jonas Smedegaard [Jun 01 12 02:28 ]:
Also, for reference, the quotes above are not Danish quotes. These are correct:
q { quotes: "„" "“" "‚" "‘"; }
Yeah, I know. I just gave the English ones and added "but for your language," because it's tough for me to type those.
By the way, LaTeX csquotes also has a danish=guillemets option.
On 12-06-01 at 08:49am, John MacFarlane wrote:
Yeah, I know. I just gave the English ones and added "but for your language," because it's tough for me to type those.
Ahh, how lovely: you beat me at nitpicking: I missed that tiny "but"! :-D
- Jonas
Language-dependent smart quotes would be very nice for me too (HTML and EPUB writers). I use Markdown source with straight " quotes for German, French and Russian texts.
The best would be for pandoc to adapt according to the lang variable.
Also, it would be great if pandoc could handle some typographic corrections. For example, in French you should have a (non-breaking) space before signs like ! ? ; or :.
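The French spacing correction can be sketched as a regex pass in Python. This assumes one common convention (a narrow no-break space U+202F before ; ! ? and a full no-break space U+00A0 before :); style guides differ, and `french_spacing()` is a hypothetical helper, not something pandoc offers.

```python
import re

NNBSP = "\u202f"  # narrow no-break space
NBSP = "\u00a0"   # no-break space

# A mark after a letter/digit or », with any existing spaces, followed by
# whitespace or end of text. The lookahead leaves "10:30" and "http://" alone.
_MARK = re.compile(r"(?<=[\w\u00bb])[ \u00a0\u202f]*([;!?:])(?=\s|$)")


def french_spacing(text: str) -> str:
    """Normalise the space before ; ! ? and : to a no-break space."""
    def fix(m: re.Match) -> str:
        mark = m.group(1)
        return (NBSP if mark == ":" else NNBSP) + mark

    return _MARK.sub(fix, text)


print(repr(french_spacing("Vraiment ? Oui!")))  # 'Vraiment\u202f? Oui\u202f!'
```

Combinations such as "?!" are deliberately left untouched, since the lookahead requires whitespace after the mark.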
I currently do regexes on the resulting HTML to switch between American and English quotes, but a format independent way would be handy!
Hi. This seems to be a little related to issue #327 too.
I’d love to see such switches too, where one could simply pass the desired quote characters into which the ASCII quotes " around a sequence of words should be converted.
$ pandoc --version
pandoc 1.12.4.2
Compiled with texmath 0.6.6.1, highlighting-kate 0.5.8.5.
Syntax highlighting is supported for the following languages:
actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog,
clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d,
diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang,
fortran, fsharp, gcc, gnuassembler, go, haskell, haxe, html, ini, isocpp,
java, javadoc, javascript, json, jsp, julia, latex, lex, literatecurry,
literatehaskell, lua, makefile, mandoc, markdown, matlab, maxima, metafont,
mips, modelines, modula2, modula3, monobasic, nasm, noweb, objectivec,
objectivecpp, ocaml, octave, pascal, perl, php, pike, postscript, prolog,
pure, python, r, relaxngcompact, restructuredtext, rhtml, roff, ruby, rust,
scala, scheme, sci, sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, texinfo,
verilog, vhdl, xml, xorg, xslt, xul, yacc, yaml
Default user data directory: /home/paul/.pandoc
Copyright (C) 2006-2014 John MacFarlane
Web: http://johnmacfarlane.net/pandoc
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
$ more test.textile
"test"
$ pandoc -o test.markdown test.textile
$ more test.markdown
“test”
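The conversion shown above ("test" becoming “test”) could be parameterised with user-chosen characters. Below is a deliberately naive Python sketch of such a switch; `smarten_double()` is hypothetical, and its rule (a quote at the start of the text or after whitespace or an opening bracket opens, any other quote closes) is much simpler than pandoc's smart extension.

```python
def smarten_double(text: str, open_q: str, close_q: str) -> str:
    """Replace straight ASCII double quotes with a chosen pair of characters.

    Naive rule: a quote at the start of the text or after whitespace or an
    opening bracket opens; any other quote closes.
    """
    out = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
        elif i == 0 or text[i - 1].isspace() or text[i - 1] in "([{":
            out.append(open_q)
        else:
            out.append(close_q)
    return "".join(out)


print(smarten_double('"test"', "\u201e", "\u201c"))  # „test“
```

Deciding open versus close from the preceding character avoids having to pair quotes across the whole text, but it misjudges cases like a quote right after a dash.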
In the other direction, for transforming internationalized text to ASCII: https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/normalize-punctuation.perl
is a script that normalizes all quotes and punctuation, for example guillemets to double quotation marks: normalize-punctuation.pl < file.md. Then, for example in LaTeX (with csquotes), the compiled PDF restores all the internationalization.
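The quote part of that normalization is easy to reproduce. Here is a Python sketch that handles only quotation marks (the Moses script also normalizes other punctuation); `normalize_quotes()` is a hypothetical name.

```python
# Map typographic double quotes and guillemets to ", single ones to '.
_TO_ASCII = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',  # “ ” „
    "\u00ab": '"', "\u00bb": '"',                  # « »
    "\u2018": "'", "\u2019": "'", "\u201a": "'",  # ‘ ’ ‚
    "\u2039": "'", "\u203a": "'",                  # ‹ ›
})


def normalize_quotes(text: str) -> str:
    """Replace typographic quotation marks with their ASCII counterparts."""
    return text.translate(_TO_ASCII)


print(normalize_quotes("»Hej«, „ahoj“ and ‚x‘"))  # "Hej", "ahoj" and 'x'
```

This also turns the apostrophe ’ into ', and spaces inside French « » are kept, so « texte » becomes " texte ".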
I wrote a simple filter that replaces ASCII quotes with typographic ones and that respects the lang metadata field.
It should be fairly easy to customise. However, it’s only intended for output formats that treat quotes as part of a document’s semantics (e.g., OpenOffice, Word), not output formats that treat quotes as part of a document’s syntax (e.g., HTML, LaTeX).
You can install it by: pip3 install pandoc_quotes
See https://github.com/odkr/pandoc-quotes for details.
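To illustrate the general approach (this is not the code of pandoc_quotes), here is a minimal Python JSON filter sketch. It assumes pandoc's JSON AST, in which quotes recognized by the smart extension appear as `Quoted` nodes with content `[QuoteType, [inlines]]`, and replaces each one with literal quote characters in `Str` nodes. `doc_lang()`, `replace_quoted()` and the `ENGLISH`/`CZECH` tuples are illustrative names, and only Czech and English are covered.

```python
import json
import sys
from typing import Any

Quotes = tuple[str, str, str, str]  # open/close double, open/close single
ENGLISH: Quotes = ("\u201c", "\u201d", "\u2018", "\u2019")  # “ ” ‘ ’
CZECH: Quotes = ("\u201e", "\u201c", "\u201a", "\u2018")    # „ “ ‚ ‘


def doc_lang(doc: dict, default: str = "en") -> str:
    """Return the `lang` metadata value of a pandoc JSON document."""
    field = doc.get("meta", {}).get("lang")
    if field is None:
        return default
    if field["t"] == "MetaString":  # e.g. from -M lang=cs-CZ
        return field["c"]
    if field["t"] == "MetaInlines":  # e.g. from YAML `lang: cs-CZ`
        return "".join(x["c"] for x in field["c"] if x["t"] == "Str") or default
    return default


def replace_quoted(node: Any, quotes: Quotes) -> Any:
    """Recursively replace Quoted inlines with literal quotes in Str nodes."""
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Quoted":
                kind, inlines = item["c"]
                double = kind["t"] == "DoubleQuote"
                open_q, close_q = quotes[0:2] if double else quotes[2:4]
                out.append({"t": "Str", "c": open_q})
                out.extend(replace_quoted(inlines, quotes))
                out.append({"t": "Str", "c": close_q})
            else:
                out.append(replace_quoted(item, quotes))
        return out
    if isinstance(node, dict):
        return {key: replace_quoted(value, quotes) for key, value in node.items()}
    return node


# pandoc passes the target format as the first argument to JSON filters,
# so this block only runs when the script is invoked as a filter.
if __name__ == "__main__" and len(sys.argv) > 1:
    document = json.load(sys.stdin)
    chosen = CZECH if doc_lang(document).lower().startswith("cs") else ENGLISH
    json.dump(replace_quoted(document, chosen), sys.stdout)
```

Usage would be along the lines of `pandoc in.md --filter ./quotes.py -o out.docx`. As the pandoc_quotes author notes, baking literal quotes into the text suits formats like Word or OpenOffice rather than HTML or LaTeX.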
How can I turn off quote translation? For example, I wrote Chinese quotes “” and I want them to stay “” in the final PDF file. But pandoc tries to be clever and translates “” to `` '', while in some blocks it does not translate them, as described in https://stackoverflow.com/questions/52052231/how-to-write-chinese-quotes-in-bookdown. This leads to chaos. Thank you.
@chpio I found that bookdown runs
/usr/bin/pandoc +RTS -K512m -RTS deepin-bible.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output deepin-bible.tex --table-of-contents --toc-depth 2 --template latex/template.tex --number-sections --highlight-style tango --pdf-engine xelatex --biblatex --listings --top-level-division=chapter --variable tables=yes --standalone
which does not contain smart.
Do you mean I should add -smart to the --from option, like this?
markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-smart
Thank you.
As described in https://stackoverflow.com/questions/52052231/how-to-write-chinese-quotes-in-bookdown, I tested the bookdown template and found that the Chinese quotes “” are translated to `` and ''. But if you write “” inside a block or other begin/end blocks, the Chinese quotes “” are not translated to `` and ''. So you get inconsistent Chinese quotes in the final PDF file. Can I set something somewhere to turn off this translation? Thank you.
I added -smart, but it still does the same thing as described above:
/usr/bin/pandoc +RTS -K512m -RTS deepin-bible.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-smart --output deepin-bible.tex --table-of-contents --toc-depth 2 --template latex/template.tex --number-sections --highlight-style tango --pdf-engine xelatex --biblatex --listings --top-level-division=chapter --variable tables=yes --standalone
Hello! This might help. Assume you have this markdown doc:
---
lang: cs-CZ
csquotes: true
---
"Quotation test"
Using this command:
pandoc --pdf-engine=xelatex -o example.pdf example.md
You will get a PDF with this quotation:
„Quotation test“