Wrong HTML output when parsing a file encoded as UTF-8 with BOM
faust21 opened this issue · 1 comments
faust21 commented
Environment: Node.js
The file: a.html
File encoding: UTF-8 with BOM
The parse and serialize code:
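const fs = require('fs');
// parse and serialize come from the HTML parser library in use

fs.readFile('a.html', 'utf8', (err, data) => {
  if (err) throw err;
  const dom = parse(data);
  const html = serialize(dom);
  // Write the serialized markup back over the original file
  fs.writeFile('a.html', Buffer.from(html, 'utf8'), (werr) => {
    if (werr) throw werr;
  });
});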
The source file's content looks like this:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>xxx</title>
</head>
<body>
<div></div>
</body>
</html>
But the serialized content became this:
<html><head></head><body>
<meta charset="UTF-8">
<title>xxx</title>
<div></div>
</body></html>
As you can see, the head's content is now in the body. If the source file is encoded as UTF-8 without a BOM, the issue disappears.
faust21 commented
Solved: I removed the BOM from the file.
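For anyone who cannot re-save the file without a BOM, the same fix can be applied in code before calling parse. Node's 'utf8' decoding keeps the BOM as a leading U+FEFF character in the string, and that character arriving before the doctype is most likely what moved the head content into the body. The sketch below is only an illustration under that assumption: stripBom is a hypothetical helper, not part of Node.js or of the parser, and the sample bytes stand in for a.html.

```javascript
// Hypothetical helper (not a library API): drop a leading U+FEFF if present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// Simulate a UTF-8 file with BOM: bytes EF BB BF followed by the markup.
const bytes = Buffer.concat([
  Buffer.from([0xEF, 0xBB, 0xBF]),
  Buffer.from('<!DOCTYPE html><html></html>', 'utf8'),
]);

// Decoding as 'utf8' keeps the BOM as the first character of the string.
const raw = bytes.toString('utf8');
const clean = stripBom(raw);

console.log(raw.charCodeAt(0) === 0xFEFF);  // whether the BOM survived decoding
console.log(clean.startsWith('<!DOCTYPE')); // whether the markup now comes first
```

In the original snippet this would mean calling parse(stripBom(data)) instead of parse(data). Note that the rewritten file would then no longer carry a BOM.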