Text loader doesn't remove byte order mark (BOM)

Question

Opened this issue 3 months ago · 4 comments

When loading files using the text loader, the loader doesn't strip a byte order mark from the beginning of the file. For HTML files, for instance, this can cause awkward problems, such as an entity like &#xFEFF; being inserted into the DOM inadvertently. Example here:

https://esbuild.github.io/try/#YgAwLjI0LjAALS1idW5kbGUKLS1mb3JtYXQ9ZXNtCi0tb3V0ZmlsZT1vdXQuanMKLS1zb3VyY2VtYXAKLS1kcm9wLWxhYmVsczpERUJVRwotLW1pbmlmeS1pZGVudGlmaWVycwotLWxvYWRlcjouaHRtbD10ZXh0AGUAZW50cnkudHMAaW1wb3J0IGZpbGVUZXh0IGZyb20gIi4vZXhhbXBsZS5odG1sIjsKCmNvbnNvbGUubG9nKGZpbGVUZXh0KTsAAGV4YW1wbGUuaHRtbAD+u788ZGl2PmhlbGxvIHdvcmxkPC9kaXY+

Bear in mind that the example shows the text content of the HTML file with the leading BOM included, whereas opening an HTML file that begins with a BOM in any reasonable text editor won't show that leading BOM.

I can work around it by ensuring that no files loaded with the text loader have BOMs, but it does seem reasonable for the text loader to strip a leading BOM.
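Another possible workaround is to strip the BOM in application code after the import. The sketch below assumes the text loader hands back a normal JavaScript string in which a UTF-8 BOM appears as the single code point U+FEFF. The `stripBom` helper is a hypothetical name, not part of esbuild.

```typescript
// After decoding, a byte order mark shows up as the single character U+FEFF.
const BOM = "\uFEFF";

// Hypothetical helper (not an esbuild API): drop one leading BOM if present.
function stripBom(text: string): string {
  return text.startsWith(BOM) ? text.slice(BOM.length) : text;
}

// Intended use with the text loader (assumes --loader:.html=text):
//   import fileText from "./example.html";
//   console.log(stripBom(fileText));
```

Only a leading BOM is removed. A U+FEFF that appears later in the string is left alone, since there it is content rather than an encoding signature.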

Answer 1 · 2024-10-08T00:17:14.000Z

And I'm also glad to make an attempt at a PR if the maintainer(s) agree that BOMs should be stripped by the text loader.

Answer 2 · 2024-10-08T13:56:16.000Z

I fixed it by simply converting the HTML file from UTF-8 with BOM to UTF-8 (without BOM).
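For anyone who wants to script that conversion instead of doing it in an editor, here is a minimal sketch that works on raw bytes. It assumes the file is UTF-8, whose BOM is the three bytes EF BB BF. The `withoutUtf8Bom` name is hypothetical.

```typescript
// Hypothetical helper: return the bytes without a leading UTF-8 BOM (EF BB BF).
// The result is a view over the same memory, not a copy.
function withoutUtf8Bom(bytes: Uint8Array): Uint8Array {
  const hasBom =
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf;
  return hasBom ? bytes.subarray(3) : bytes;
}
```

In Node, the bytes could come from `fs.readFileSync(path)` and the result could be written back with `fs.writeFileSync(path, result)`, which rewrites the file in place without the BOM.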

Answer 3 · 2024-12-20T01:25:56.000Z

I think this change makes sense (and is trivial, so no need for a PR) but I could see it breaking things with code that relies on this (or that works around it), so I'm going to wait until a breaking change release to do this.

Answer 4 · 2024-12-20T01:48:01.000Z

Could possibly go with opt-in behavior, and then change the default behavior (or remove the configurability altogether) on the next breaking change release?