hexojs/hexo-renderer-pandoc

How to preserve tabs in code blocks?

seekstar opened this issue · 8 comments

For pandoc, the command line option --preserve-tabs makes pandoc preserve tabs in code blocks when converting markdown to HTML. I tried to pass that option to the pandoc process used by hexo-renderer-pandoc in the following ways:

Adding it to _config.yml

pandoc:
  extensions:
    - -implicit_figures
    - +gfm_auto_identifiers+angle_brackets_escapable # Not available in pandoc 1.16
    - +pipe_tables+raw_html+fenced_code_blocks
    - -ascii_identifiers+backtick_code_blocks+autolink_bare_uris
    - +intraword_underscores+strikeout+hard_line_breaks+emoji
    - +shortcut_reference_links
  extra:
    - preserve-tabs

Adding it to node_modules/hexo-renderer-pandoc/index.js

  args.push("--preserve-tabs");

  var res = spawnSync('pandoc', args, {
    cwd: process.cwd(),
    env: process.env,
    encoding: "utf8",
    input: src
  });

But neither of them works. After hexo s -g, each tab in the generated blog page is still replaced with four spaces. So I'm wondering how to preserve tabs in code blocks with hexo-renderer-pandoc.

My suspicion is that your code block is hijacked by hexo's syntax highlight plugin and isn't processed by pandoc at all.

Can you try and see if either of the following two approaches works?

  1. If you don't want hexo's syntax highlighting, add the following to your _config.yml:
highlight:
  enable: false

to turn off hexo's syntax highlight, and use the following syntax to pass preserve-tabs to pandoc (note the colon!!)

pandoc:
  extra:
    - preserve-tabs:  # note this colon!!

Then your code block will be processed by pandoc and have tabs preserved.

  2. If you want hexo's syntax highlighting, then of course pandoc's --preserve-tabs won't take effect. In this case, add the following to _config.yml:
highlight:
  enable: true
  tab_replace: '	' # or put a tab character between the quotes instead
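The colon in the first approach matters because of how YAML parses the list item: `- preserve-tabs` is a plain string, while `- preserve-tabs:` is a one-key mapping whose value is null. The sketch below illustrates how a flag builder that only understands the mapping form would treat the two. `extraToArgs` is an invented name and the logic is an assumption for illustration, not code copied from hexo-renderer-pandoc.

```javascript
// Hypothetical helper: turn the parsed `pandoc.extra` list into CLI flags.
// Assumption: only mapping entries ({ key: value }) are understood.
function extraToArgs(extra) {
  const args = [];
  for (const entry of extra || []) {
    if (entry === null || typeof entry !== 'object') {
      // A bare string such as 'preserve-tabs' has no key to read, so skip it.
      continue;
    }
    for (const [key, value] of Object.entries(entry)) {
      args.push('--' + key);
      // A null value (from `- preserve-tabs:`) means a flag without an argument.
      if (value !== null && value !== undefined) {
        args.push(String(value));
      }
    }
  }
  return args;
}

// What a YAML parser yields for `- preserve-tabs:` (mapping with a null value)
const withColon = [{ 'preserve-tabs': null }];
// What it yields for `- preserve-tabs` (a plain string)
const withoutColon = ['preserve-tabs'];

console.log(extraToArgs(withColon));    // [ '--preserve-tabs' ]
console.log(extraToArgs(withoutColon)); // []
```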

None of them works:

1

highlight:
  enable: false
pandoc:
  extra:
    - preserve-tabs:  # note this colon!!

2

highlight:
  enable: false
  tab_replace: '	'

3

highlight:
  enable: true
  tab_replace: '	'

I suspect that hexo-theme-tree, the theme I use, hijacks the highlighting of the code blocks, because I have already uninstalled almost all unnecessary npm modules. The remaining modules are as follows:

➜  blog git:(master) ✗ npm list
hexo-site@0.0.0 /home/searchstar/git/blog
├── hexo-asset-image-fixed@0.0.6
├── hexo-deployer-git@3.0.0
├── hexo-generator-archive@1.0.0
├── hexo-generator-category@1.0.0
├── hexo-generator-index@2.0.0
├── hexo-generator-sitemap@2.1.0
├── hexo-generator-tag@1.0.0
├── hexo-renderer-ejs@2.0.0
├── hexo-renderer-pandoc@0.3.0
├── hexo-server@2.0.0
└── hexo@6.2.0

I don't think there is any other module that can hijack the highlighting of code blocks.

I removed all code related to highlighting in hexo-theme-tree: seekstar/blog@299bdce

And the code blocks become plain, e.g., http://localhost:4000/2022/09/30/c-panic/:

[screenshot]

But each tab is still replaced with four spaces.

I need your help to pinpoint the location of the issue.

Can you locate the following code in node_modules/hexo-renderer-pandoc/index.js

  var res = spawnSync(pandoc_path, args, {
    cwd: process.cwd(),
    env: process.env,
    encoding: "utf8",
    input: src
  });

and change it to

  console.log(args) // ADDED
  console.log(data) // ADDED
  var res = spawnSync(pandoc_path, args, {
    cwd: process.cwd(),
    env: process.env,
    encoding: "utf8",
    input: src
  });
  console.log(res) // ADDED

Then render a minimal working example, such as the one below:

---
title: test codeblock
---

begin

```
function () {
    // <- put a tab here before the slashes
}
```

Then post the content of stdout here. By looking at the output I may be able to see what's causing the issue.

Also, please don't post URLs that contain localhost (like your http://localhost:4000/2022/09/30/c-panic/). Those URLs are only accessible on your own machine.

I've already tried console.log. When it is added before pandocRenderer, it prints to stdout. But when it is added in pandocRenderer, it does not print anything, even when refreshing the blog page. I suspect that the output is printed elsewhere.
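If console output from inside the renderer is hard to find, one workaround is to append debug records to a file instead. This is a minimal sketch; `debugLog` and the log file name are made up for illustration and are not part of hexo-renderer-pandoc.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical log location; any writable path works.
const LOG_FILE = path.join(os.tmpdir(), 'hexo-pandoc-debug.log');

// Append one labelled JSON record per call, synchronously, so nothing is
// lost even if the process exits right after rendering.
function debugLog(label, value) {
  const line = `[${new Date().toISOString()}] ${label}: ${JSON.stringify(value)}\n`;
  fs.appendFileSync(LOG_FILE, line);
}

// Inside the renderer one could then write, for example:
//   debugLog('args', args);
//   debugLog('stdout', res.stdout);
debugLog('args', ['--preserve-tabs']);
```

Because `JSON.stringify` escapes a tab as `\t`, tabs in the logged strings stay visible. If the file stays empty after a page is generated, the renderer function was probably not called at all for that post.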

By the way, the URL http://localhost:4000/2022/09/30/c-panic/ was posted to indicate the source blog post of the screenshot. It is useful if you (or someone else) clone my blog repo and hexo g && hexo s to reproduce the problem.

I accidentally found that a blog post is only re-rendered when I save it (Ctrl+S in VS Code), and the tabs in the re-rendered posts are perfectly preserved. Do you have any suggestions about re-rendering all blog posts without discarding the deployment history?

I accidentally found that hexo clean && hexo g works fine.

In conclusion, one solution to this issue is to disable hexo's syntax highlighting and pass --preserve-tabs to pandoc:

highlight:
  enable: false
pandoc:
  extra:
    - preserve-tabs:  # note this colon!!

Then run hexo clean && hexo g, and optionally hexo s or hexo d, for the changes to take effect. This works out of the box with the hexo-theme-tree theme I use: there is no need to remove the highlight-related code, and the theme can still highlight the tab-preserving code blocks that pandoc outputs.
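The likely reason hexo clean helps is that, according to Hexo's documentation, it removes the cache file (db.json) and the generated files (public), so the next hexo g renders every post again with the new settings. Below is a rough sketch of an equivalent cleanup, assuming the default file names; `cleanHexoCache` is a hypothetical helper, not a Hexo API.

```javascript
const fs = require('fs');
const path = require('path');

// Remove Hexo's cache file and generated output so that the next `hexo g`
// re-renders every post. Assumes the default names db.json and public/.
function cleanHexoCache(siteDir) {
  const removed = [];
  for (const name of ['db.json', 'public']) {
    const target = path.join(siteDir, name);
    if (fs.existsSync(target)) {
      fs.rmSync(target, { recursive: true, force: true });
      removed.push(name);
    }
  }
  return removed; // names of the entries that were actually deleted
}
```

In practice running hexo clean itself is the simpler choice; the sketch only shows which files are involved.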

Thank you for your attention!