wzpan/hexo-generator-search

Request for filter features.

siraisisatoru opened this issue · 0 comments

Hello. I am using this plugin on my site together with the Hexo theme orange, and it is a pretty handy tool. However, the generated XML file also contains things such as Hexo tags, inline JS scripts, and other material irrelevant to the main content. The following are examples:


<!-- more -->
[[toc]]
english testing
{% webp /post_image/ABC.JPG 16:9 %}
{% include_alt /asset/sketches/siteMatters/map_import.html %}
<!-- load geojson -->
<base> 
<script src="/asset/geojson/ABC.geojson"></script> 
<script src="/asset/geojson/DEF.geojson"></script>
</base> 
{% iMap  side_main=false mapid=mainMap geojsonPoint=... %}

The main content is the sentence "english testing" or other body text. I am not sure whether this makes sense to you, but the other tags and inline HTML also appear in the XML. Someone asked about this in a previous issue. I followed method 2 to remove the rendered HTML, but the tags typed directly in the .md file are still there.


Possible solution:

Edit 3:
I know it is not ideal to modify this issue so many times, but finding a better solution is necessary.
After much consideration, it is NOT necessary to add filter code to the source of this generator, which contradicts my first thought. What we need is some sort of filter for the content used for search. This project compiles its output through a Nunjucks template, and Nunjucks has a built-in replace filter. Therefore, we can simply apply replace to the content before it is passed to the generator.

The XML template now becomes:

<content type="html"><![CDATA[{{ post._content | noControlChars | safe 
  | replace(r/({([^}]+)})/ig, '') | replace(r/(<([^>]+)>)/ig, '') | replace('[[toc]]', '')
  | replace(r/(\$\$([^\$\$]+)\$\$)/ig, '') | replace(r/(\$([^\$]+)\$)/ig, '') | replace(r/#/ig , '') 
  | replace(r/(\[([^\]]+)\]\(([^\)]+)\))/ig, '')}}]]></content>

The output then looks like:

<content type="html"><![CDATA[english testing         ]]></content>
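To see what each step of the chain removes, the same patterns can be tried in plain JavaScript outside Hexo. This is only a sketch: the function name stripForSearch is made up for illustration, and the sample is a shortened version of the post source above. As far as I know, the Nunjucks replace filter removes every occurrence of a plain string, while JavaScript's String.replace removes only the first, so the [[toc]] step uses a global regex here.

```javascript
// Plain-JavaScript mirror of the Nunjucks replace chain from the template.
// stripForSearch is a hypothetical helper, not part of hexo-generator-search.
function stripForSearch(raw) {
  return raw
    .replace(/({([^}]+)})/gi, '')                  // Hexo tags such as {% webp ... %}
    .replace(/(<([^>]+)>)/gi, '')                  // HTML tags and comments such as <!-- more -->
    .replace(/\[\[toc\]\]/g, '')                   // [[toc]] markers
    .replace(/(\$\$([^\$\$]+)\$\$)/gi, '')         // display math $$...$$
    .replace(/(\$([^\$]+)\$)/gi, '')               // inline math $...$
    .replace(/#/gi, '')                            // heading markers
    .replace(/(\[([^\]]+)\]\(([^\)]+)\))/gi, '');  // Markdown links [text](url)
}

const samplePost = [
  '<!-- more -->',
  '[[toc]]',
  'english testing',
  '{% webp /post_image/ABC.JPG 16:9 %}',
  '<script src="/asset/geojson/ABC.geojson"></script>',
].join('\n');

console.log(JSON.stringify(stripForSearch(samplePost).trim()));
```

Note that the last pattern deletes the whole Markdown link, including its visible text, which may or may not be what you want in a search index.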

This is not a perfect approach, since the code in between <script></script > will still be there.
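One possible way around the leftover script code is to delete whole script elements, body included, before the generic tag pattern runs. The sketch below shows the idea in plain JavaScript; the pattern and names are my own assumption, the inline script in the sample is invented, and I have not tested the pattern inside the Nunjucks template. Regex-based HTML stripping is a heuristic and can be fooled by unusual markup.

```javascript
// Matches a complete <script ...>...</script> element, including its body.
// The \s* before the final > also accepts a closing tag written as </script >.
const SCRIPT_BLOCK = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;
// Same idea as the tag pattern used in the template.
const ANY_TAG = /<[^>]+>/g;

// Hypothetical helper: drop script elements first, then any remaining tags.
function stripScriptsThenTags(html) {
  return html.replace(SCRIPT_BLOCK, '').replace(ANY_TAG, '');
}

// Invented sample with an inline script body.
const sample = 'english testing <script>var map = L.map("mainMap");</script> more text';

console.log(stripScriptsThenTags(sample));
// For comparison, tag stripping alone leaves the script body in place.
console.log(sample.replace(ANY_TAG, ''));
```

The order matters: once the tag pattern has removed the opening and closing script tags, nothing marks the script body as code any more.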
I hope this can become a new feature in the next version. It could also be made universal, namely by allowing users to enter custom regexes in the _config.yml file.
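A rough sketch of how such a config-driven filter could work is below. Everything in it is hypothetical: the plugin has no filters option today, and the searchConfig object only stands in for whatever Hexo would hand over after parsing _config.yml. Each entry is turned into a RegExp and applied in order.

```javascript
// Hypothetical option shape; hexo-generator-search does not define `filters`.
const searchConfig = {
  filters: [
    { pattern: '\\{[^}]+\\}', flags: 'g' },     // Hexo tags
    { pattern: '<[^>]+>', flags: 'g' },         // HTML tags and comments
    { pattern: '\\[\\[toc\\]\\]', flags: 'g' }, // [[toc]] markers
  ],
};

// Compile the user-supplied pattern strings once, defaulting to the g flag.
function buildFilters(config) {
  return (config.filters || []).map(f => new RegExp(f.pattern, f.flags || 'g'));
}

// Remove every match of every pattern, in the order given.
function applyFilters(text, regexes) {
  return regexes.reduce((acc, re) => acc.replace(re, ''), text);
}

const regexes = buildFilters(searchConfig);
console.log(applyFilters('[[toc]]english testing{% webp /a.jpg %}<!-- more -->', regexes));
```

Keeping the patterns as an ordered list lets users decide which rules run first, which matters for cases like removing script elements before generic tags.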

Sorry for the poor English and any nonsensical code; I am new to JS.