gjtorikian/html-pipeline

convert_filter only executed when node_filters are present

Closed this issue · 3 comments

Hey hey, I'm playing around with html-pipeline and noticed that the following snippet does not run the convert_filter:

require "html_pipeline/convert_filter/markdown_filter"
require "selma"
markdown = "1. Foo\n2. Bar"
result = HTMLPipeline.new(convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new).call(markdown)
{
    :output => "1. Foo\n2. Bar"
}

vs. when defining any node_filter like so:

require "html_pipeline/convert_filter/markdown_filter"
require "html_pipeline/node_filter/https_filter"
require "selma"
markdown = "1. Foo\n2. Bar"
result = HTMLPipeline.new(
  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,
  node_filters: [HTMLPipeline::NodeFilter::HttpsFilter.new],
).call(markdown)
{
    :output => "<ol>\n<li>Foo</li>\n<li>Bar</li>\n</ol>"
}

AFAIU the html-result will not be used unless any @node_filters are present. See

html = @convert_filter.call(text) unless @convert_filter.nil?
unless @node_filters.empty?
payload = default_payload({
node_filters: @node_filters.map { |f| f.class.name },
context: context,
result: result,
})
instrument("call_node_filters.html_pipeline", payload) do
result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
end
end

Just let me know if you like to have a PR for that – only if this is unintended behaviour of course.

In practice this isn't an issue for me since I'll be using some node_filters.

Hey @grekko, I'm in the process of updating to v3 and just encountered the same issue. I was wondering if this was an intentional behavior. In my case, I only need the convert_filter.

Completely unintentional—thank you for the report. Will be fixed monetarily in 3.0.2!