rushter/selectolax

How can I replace text in HTML more efficiently?

Closed this issue · 2 comments

Suppose I need to replace the text contained in a node, for example to swap the original text for its translation.
Node currently has a replace_with method, but it replaces the whole node; there is no direct way to replace only the text content.
This is my implementation:

from selectolax.parser import HTMLParser
parser = HTMLParser('<h1>Welcome to selectolax tutorial</h1>')
node = parser.body.child
translated_text = '欢迎使用 selectolax 教程'
# Serialize the node, swap the text in the HTML string, then re-parse it
clone_html = node.html.replace(node.text(), translated_text)
clone_node = HTMLParser(clone_html)
# Replace the original node with the re-parsed copy
node.replace_with(clone_node.body.child)
print("NEW HTML", parser.body.child.html)

But this implementation re-serializes and re-parses the node on every replacement, which is likely to be slow. Is there a better way to achieve this?

Why do you need to create a new node? Why not just call node.replace_with("text")? It's not clear to me.

I see. Maybe I should find the "-text" node and call node.replace_with('new_text') on it. Thanks for your reply.