Low-level API for creating a document/fragment without parsing an HTML string
joeldrapper opened this issue · 5 comments
I maintain a Ruby-based view component library called Phlex, which takes a Ruby structure and turns it into an HTML String. I’m wondering if it might be possible (and reasonable) to alternatively turn it into a Nokolexbor syntax tree directly, skipping the HTML rendering and parsing steps for performance.
Basically, instead of returning an HTML string, a Phlex component could optionally return a Nokolexbor::DocumentFragment, which could then be used for testing or further DOM manipulation.
It may not work, since we’d have to spend a lot of time in Ruby land calling lots of Ruby methods to build this document (they’d have to be faster than String#<< for this to make sense). And if it does work, it may not be worth it, since it sounds like Nokolexbor is already really fast at parsing HTML. I thought it might be an interesting idea to explore anyway.
What are your thoughts?
@joeldrapper If you mean building a Nokolexbor::DocumentFragment by manually creating elements, that's already supported. Example:
doc = Nokolexbor::Document.new
frag = Nokolexbor::DocumentFragment.new(doc)
div = doc.create_element('div', "Content", { class: 'a b c', style: 'e f g' })
div << doc.create_element('a', "Link text", { href: 'https://www.google.com' })
div << doc.create_element('span', "xxx", { class: 'some_class' })
frag << div
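For a tree-shaped template, the same calls could be driven by a recursive walk. The sketch below is only an illustration: the `[tag, attrs, text, children]` array shape, the `TREE` constant, and the `build_node` helper are hypothetical and not part of Phlex or Nokolexbor. It assumes only that `doc` responds to `create_element(name, content, attrs)` and that elements accept `<<`, as in the example above.

```ruby
# Hypothetical helper: turns a nested [tag, attrs, text, children] array into
# elements by calling doc.create_element once per node.
def build_node(doc, spec)
  tag, attrs, text, children = spec
  element = doc.create_element(tag, text.to_s, attrs || {})
  # Recurse into child specs and append each resulting element in order.
  (children || []).each { |child| element << build_node(doc, child) }
  element
end

# Hypothetical input mirroring the hand-built example above.
TREE = ['div', { class: 'a b c', style: 'e f g' }, 'Content', [
  ['a', { href: 'https://www.google.com' }, 'Link text', []],
  ['span', { class: 'some_class' }, 'xxx', []]
]].freeze

# With Nokolexbor loaded, usage would look like:
#   doc  = Nokolexbor::Document.new
#   frag = Nokolexbor::DocumentFragment.new(doc)
#   frag << build_node(doc, TREE)
```

Note that this still makes at least one call through the C bindings per node, so it is a convenience wrapper rather than a way to avoid that per-call cost.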
I also benchmarked creating a Nokolexbor::DocumentFragment by hand versus by parsing. Parsing is the clear winner.
require 'benchmark/ips'
require 'nokolexbor'

def create_fragment_by_hand
  doc = Nokolexbor::Document.new
  frag = Nokolexbor::DocumentFragment.new(doc)
  (1..50).each do |i|
    div = doc.create_element('div', "Content #{i}", { class: 'a b c', style: 'e f g' })
    div << doc.create_element('a', "Link text", { href: 'https://www.google.com' })
    (1..50).each do |j|
      div << doc.create_element('span', j, { class: 'some_class' })
    end
    frag << div
  end
  frag
end

@html = create_fragment_by_hand.to_html

def create_fragment_by_parsing
  doc = Nokolexbor::Document.new
  frag = doc.fragment(@html)
end

raise "HTML output not equal" if create_fragment_by_parsing.to_html != create_fragment_by_hand.to_html

Benchmark.ips do |x|
  x.warmup = 2
  x.time = 10
  x.report("Create fragment by hand") do
    create_fragment_by_hand
  end
  x.report("Create fragment by parsing") do
    create_fragment_by_parsing
  end
  x.compare!
end
Output:
Warming up --------------------------------------
Create fragment by hand        20.000 i/100ms
Create fragment by parsing     93.000 i/100ms
Calculating -------------------------------------
Create fragment by hand       293.214 (± 5.8%) i/s - 2.940k in 10.063928s
Create fragment by parsing      1.257k (±18.0%) i/s - 12.183k in 10.087883s
Comparison:
Create fragment by parsing: 1257.4 i/s
Create fragment by hand: 293.2 i/s - 4.29x slower
However, this benchmark may not be very helpful, because in practice you first have to build the HTML with String#<<, which takes time. You'd be better off benchmarking your specific scenario (create by hand vs. build HTML + parse).
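As a rough starting point for that comparison, the string-building half can be timed on its own with Ruby's standard library. This is a minimal sketch, not a measurement from this thread: `build_html` is a hypothetical stand-in for a component's render step, it emits markup of roughly the same shape as the benchmark above, and the iteration count is arbitrary.

```ruby
require 'benchmark'

# Hypothetical stand-in for rendering a component: builds 50 divs, each with
# one link and 50 spans, using only String#<<.
def build_html
  html = +''
  (1..50).each do |i|
    html << '<div class="a b c" style="e f g">' << "Content #{i}"
    html << '<a href="https://www.google.com">Link text</a>'
    (1..50).each do |j|
      html << '<span class="some_class">' << j.to_s << '</span>'
    end
    html << '</div>'
  end
  html
end

html = build_html
iterations = 200 # arbitrary; raise it for more stable numbers
seconds = Benchmark.realtime { iterations.times { build_html } }
puts format('String#<< only: %.1f i/s', iterations / seconds)

# With Nokolexbor loaded, the full "build HTML + parse" path to time would be:
#   Nokolexbor::Document.new.fragment(build_html)
```

Adding the parse step to the timed block gives the number to compare against building the fragment by hand.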
In C, the situation is the opposite: it is much faster to create a tree manually than to parse it.
A lot of the time here is spent in the bindings to the C functions.
I wondered if there might be a way for Phlex to create the primitive data structures for the tree directly, without going through the normal DOM manipulation API. Phlex templates are interpreted like a tree already, so there’s no need for tokenising or parsing steps.
Ideally, you would make only one call to get the desired Nokolexbor::DocumentFragment. In that case I'm afraid you might need to customize the C part: pass the Phlex structure into the C extension, iterate over it in C, and create the corresponding Lexbor structures. But the iteration will still be calling the Ruby C API, so I'm not sure how it will perform.
That makes sense. It sounds like this is not the low-hanging fruit I thought it could be.
Thanks for your help. Nokolexbor is awesome.