
playground for memory-efficient, fast HTML processing


HTML Processing

This repository is a playground for testing different alternatives for HTML processing in Node.js.

parse5-html-rewriting-stream

see documentation

this module exposes a simple and efficient streaming SAX HTML parser

the business logic looks slightly harder to implement, so it will need a well-organized design to avoid spaghetti, but the streaming nature of this module should make it resource-light and efficient

NOTES

easy access to three main types of events

startTag

rewriter.on('startTag', tag => {
    ...
})

endTag

rewriter.on('endTag', tag => {
    ...
})

text

rewriter.on('text', (_, text) => {
    // the first argument is the text token; the second is the raw HTML of the text node
    ...
})
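
One possible way to keep the three handlers above from turning into spaghetti is to put them in a single table that shares one small state object. The sketch below is only an illustration: `collectTitleText` is a hypothetical helper, and a plain Node.js `EventEmitter` stands in for the rewriter so that it runs without the parser installed. It assumes tag tokens carry a `tagName` property and that the `text` event passes the raw string as its second argument.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: collects the text found inside <title> elements.
// `source` is anything with an `.on(event, handler)` method that emits
// 'startTag', 'endTag' and 'text' in document order.
function collectTitleText(source) {
    // All mutable state lives in one place instead of in scattered closures.
    const state = { depth: 0, chunks: [] };

    // One entry per event type, so the business logic is easy to scan.
    const handlers = {
        startTag: tag => {
            if (tag.tagName === 'title') state.depth += 1;
        },
        endTag: tag => {
            if (tag.tagName === 'title' && state.depth > 0) state.depth -= 1;
        },
        text: (_, raw) => {
            if (state.depth > 0) state.chunks.push(raw);
        },
    };

    for (const [event, handler] of Object.entries(handlers)) {
        source.on(event, handler);
    }

    // Returns a getter so the caller can read the result once the stream ends.
    return () => state.chunks.join('');
}

// Stand-in for the rewriter (assumption: same event names and token shapes).
const fake = new EventEmitter();
const getTitle = collectTitleText(fake);

fake.emit('startTag', { tagName: 'title' });
fake.emit('text', { text: 'Hello' }, 'Hello');
fake.emit('endTag', { tagName: 'title' });
fake.emit('startTag', { tagName: 'p' });
fake.emit('text', { text: 'ignored' }, 'ignored');
fake.emit('endTag', { tagName: 'p' });

console.log(getTitle()); // prints "Hello"
```

with the actual rewriter, each handler would presumably also need to re-emit its token (`emitStartTag`, `emitEndTag`, `emitRaw`) whenever the output stream is consumed; the module documentation should confirm this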

QUESTIONS

it remains to be seen whether this handles Unicode correctly and covers the cases we're interested in
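
A Unicode check would probably need to include input where a multi-byte character is split across two stream chunks. The sketch below does not exercise the parser itself. It only uses Node.js built-ins to show how such a test input could be built and what goes wrong if each chunk is decoded on its own before reaching the parser. The sample markup and the split position are chosen purely for illustration.

```javascript
const { StringDecoder } = require('node:string_decoder');

// 'é' is two bytes in UTF-8 (0xC3 0xA9); split it across two chunks.
const bytes = Buffer.from('<p>café</p>', 'utf8');
const splitAt = bytes.indexOf(0xa9); // index of the second byte of 'é'
const chunks = [bytes.subarray(0, splitAt), bytes.subarray(splitAt)];

// Naive decoding: each chunk is decoded independently, so the two halves
// of the split character become U+FFFD replacement characters.
const naive = chunks.map(chunk => chunk.toString('utf8')).join('');

// StringDecoder holds back the incomplete sequence until the rest arrives.
const decoder = new StringDecoder('utf8');
const safe = chunks.map(chunk => decoder.write(chunk)).join('') + decoder.end();

console.log(naive.includes('\uFFFD')); // true
console.log(safe === '<p>café</p>');   // true
```

feeding chunks like these through the rewriter and comparing the output with the original markup would be one way to answer the question above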