hexojs/hexo-renderer-pandoc

`hexo-renderer-pandoc` with `hexo-generator-feed` produces malformed HTML in the generated feed

ligen131 opened this issue · 2 comments

Problem Description

When posts are rendered with hexo-renderer-pandoc and a feed is generated with the hexo-generator-feed plugin, a line that is long and contains a link ends up as broken HTML in the generated feed.

For example, the original text of my Markdown is as follows:

```markdown
[test](https://012345678901234567890123456789012345678)
```

In the generated feed, the `<content>` field contains:

```xml
<content type="html"><![CDATA[<p><ahref="https://012345678901234567890123456789012345678">test</a></p>]]></content>
```

Note that there is no space between `<a` and `href`, so the tag is malformed and the link is not rendered correctly.

This problem does not occur on normal blog pages.

Since the problem does not occur with Hexo's default renderer, hexo-renderer-marked, I am filing the issue in this repository.

Reproduction Steps

My Hexo version information:

```
hexo: 6.3.0
hexo-cli: 4.3.0
os: win32 10.0.19044
node: 18.8.0
v8: 10.2.154.13-node.11
uv: 1.43.0
zlib: 1.2.11
brotli: 1.0.9
ares: 1.18.1
modules: 108
nghttp2: 1.47.0
napi: 8
llhttp: 6.0.7
openssl: 3.0.5+quic
cldr: 41.0
icu: 71.1
tz: 2022a
unicode: 14.0
ngtcp2: 0.1.0-DEV
nghttp3: 0.1.0-DEV
```

Node.js version is v18.8.0

npm version is 8.18.0

pandoc version is 3.1.2

Follow the steps below to reproduce the problem:

```shell
$ hexo init test
$ cd test
$ npm uninstall hexo-renderer-marked
$ npm install hexo-renderer-pandoc --save
$ npm install hexo-generator-feed
$ hexo new test
```

Insert the Markdown above into source/_posts/test.md:

```markdown
---
title: test
date: 2023-04-01 15:34:17
tags:
---

[test](https://012345678901234567890123456789012345678)
```

Save and quit.

To enable hexo-generator-feed, add the following to _config.yml:

```yaml
feed:
  enable: true
  type: atom
  path: atom.xml
```

Then generate the site:

```shell
$ hexo g
```

The generated feed in public/atom.xml contains:

```xml
<entry>
    <title>test</title>
    <link href="http://example.com/2023/04/01/test/"/>
    <id>http://example.com/2023/04/01/test/</id>
    <published>2023-04-01T07:34:17.000Z</published>
    <updated>2023-04-01T07:47:15.742Z</updated>
    
    <content type="html"><![CDATA[<p><ahref="https://012345678901234567890123456789012345678">test</a></p>]]></content>

    <summary type="html">&lt;p&gt;&lt;a
href=&quot;https://012345678901234567890123456789012345678&quot;&gt;test&lt;/a&gt;&lt;/p&gt;
</summary>

  </entry>
```
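
A quick way to check whether a generated feed is affected is to search it for anchor tags whose name runs straight into an attribute, as in `<ahref=` above. Below is a minimal sketch, assuming the feed XML has already been read into a string; `findGluedAnchorTags` is a hypothetical helper, not part of Hexo or either plugin, and it only checks a few common anchor attributes.

```javascript
// Hypothetical helper (not from Hexo or its plugins): finds <a> tags whose
// name has been glued to an attribute name, e.g. "<ahref=" instead of "<a href=".
// Assumption: only a handful of common anchor attributes are checked.
const GLUED_ANCHOR = /<a(href|title|class|id|name|rel|target)=/g;

function findGluedAnchorTags(xml) {
  const hits = [];
  // matchAll requires the /g flag; each match records where the tag starts
  // and which attribute it was glued to.
  for (const match of xml.matchAll(GLUED_ANCHOR)) {
    hits.push({ index: match.index, attribute: match[1] });
  }
  return hits;
}

// Usage with the <content> line from public/atom.xml shown above.
const content =
  '<content type="html"><![CDATA[<p><ahref="https://012345678901234567890123456789012345678">test</a></p>]]></content>';
console.log(findGluedAnchorTags(content).map((hit) => hit.attribute)); // [ 'href' ]
```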

This felt like an April Fool's Day joke to me, but the issue itself is real.

UPD:

Sorry, this does not seem to be the fault of hexo-renderer-pandoc or pandoc after all.

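I noticed that pandoc wraps output lines that are too long (with the default `--wrap=auto`, lines are wrapped at 72 columns, and on Windows the line ending is `\r\n`). Here the break landed on the space between `<a` and `href`. My guess is that a later step in hexo-generator-feed deletes these `\r\n` sequences without putting a space back.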

Below is some debug data.

```
pandoc parameter = [
  '-f',
  'markdown-smart',
  '-t',
  'html-smart',
  '--mathjax',
  '-M',
  'pagetitle=dummy',
  '-M',
  'standalone=True'
]

source = "[test](https://012345678901234567890123456789012345678)"

res.output = [
  null,
  '<p><a\r\n' +
    'href="https://012345678901234567890123456789012345678">test</a></p>\r\n',
  ''
]
```
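
To make the guess concrete, here is a minimal sketch that feeds the captured pandoc output through two hypothetical post-processing steps. Neither function is taken from hexo-generator-feed; they only illustrate why deleting the `\r\n` outright yields the `<ahref` seen in the feed, while replacing it with a space keeps the tag valid.

```javascript
// The pandoc output captured above (res.output[1]): the line was wrapped at
// the space between "<a" and "href", using \r\n as the line ending.
const pandocHtml =
  '<p><a\r\n' +
  'href="https://012345678901234567890123456789012345678">test</a></p>\r\n';

// Hypothetical step that deletes line breaks outright. If something in the
// feed pipeline behaves like this, "<a" and "href" are glued together.
function stripLineBreaks(html) {
  return html.replace(/\r?\n/g, '');
}

// Hypothetical safer variant: turn each line break into a single space,
// then trim the trailing space left by the final \r\n.
function collapseLineBreaks(html) {
  return html.replace(/\r?\n/g, ' ').trim();
}

console.log(stripLineBreaks(pandocHtml));
// <p><ahref="https://012345678901234567890123456789012345678">test</a></p>
console.log(collapseLineBreaks(pandocHtml));
// <p><a href="https://012345678901234567890123456789012345678">test</a></p>
```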

The fix is to turn off pandoc's line wrapping by adding the following to Hexo's _config.yml:

```yaml
pandoc:
  extra:
    - wrap: none
```
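
For reference, the `wrap: none` entry is assumed to end up as pandoc's `--wrap=none` option appended to the arguments from the debug output. The sketch below only illustrates that assumed mapping; `withExtraOptions` is a hypothetical helper, not hexo-renderer-pandoc's actual code.

```javascript
// Base arguments copied from the "pandoc parameter" debug output above.
const baseArgs = [
  '-f', 'markdown-smart',
  '-t', 'html-smart',
  '--mathjax',
  '-M', 'pagetitle=dummy',
  '-M', 'standalone=True',
];

// Hypothetical helper: turns `extra` entries such as { wrap: 'none' } into
// long options. This mirrors what the config is ASSUMED to do; it is not
// taken from the plugin's source.
function withExtraOptions(args, extra) {
  const out = [...args];
  for (const entry of extra) {
    for (const [key, value] of Object.entries(entry)) {
      // A null/undefined value is treated as a bare flag, e.g. --foo.
      out.push(value === null || value === undefined ? `--${key}` : `--${key}=${value}`);
    }
  }
  return out;
}

const args = withExtraOptions(baseArgs, [{ wrap: 'none' }]);
console.log(args.slice(-1)); // [ '--wrap=none' ]
```

With `--wrap=none`, pandoc does not wrap output lines, so the `<a href=...>` tag stays on a single line and there is no line break for a later step to remove.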

Closing this issue as solved.