Undeprecate <plaintext>, it is needed for portable text files
Closed this issue · 4 comments
What is the issue with the HTML Standard?
Please undeprecate <plaintext> ( https://developer.mozilla.org/en-US/docs/Web/HTML/Element/plaintext ). Let me describe why I think it is useful.
Suppose I want to send someone a text document in a non-English language, for example by handing them a USB stick. I don't know beforehand which OS they use or what its system locale is, so issues of encoding and line endings (CRLF vs LF) arise. Also, let's assume that the document should preserve all the benefits of plain text: the receiver should be able to edit it in a plain text editor, such as Windows Notepad, vim, or Emacs, and pass the document on. The document should also be greppable, i.e. it should be possible to search it using grep.
If I just send the text document as-is, they may see it in the wrong encoding. So the perfect way to handle this task, in my opinion, is to save the document as UTF-8 text with an .html extension and prepend this to it:
<!doctype html><html lang="..."><head><meta charset="utf-8"></head><body><plaintext>
This creates a "portable text document", which will be displayed in the correct encoding on any OS. The recipient just needs to click on it and the document will open in a browser. Also, by specifying the lang attribute, I solve the problem of language pairs that use the same Unicode characters but are rendered differently, such as Russian-Bulgarian (see https://tonsky.me/blog/unicode/ for details) and Chinese-Japanese.
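As a rough illustration of the "prepend this header" step, here is a minimal Python sketch. The function names make_portable and save_portable are hypothetical, and putting a newline after the header (so the original text starts on its own line) is a choice made by this sketch, not something the proposal specifies.

```python
from pathlib import Path

# Header copied from the proposal; the lang value is filled in by the caller.
HEADER_TEMPLATE = (
    '<!doctype html><html lang="{lang}"><head><meta charset="utf-8"></head>'
    "<body><plaintext>"
)


def make_portable(text: str, lang: str) -> str:
    """Return the text with the HTML header prepended as its first line.

    The text itself is not escaped or altered in any way.
    """
    return HEADER_TEMPLATE.format(lang=lang) + "\n" + text


def save_portable(src: Path, dst: Path, lang: str) -> None:
    """Read src as UTF-8 and write the portable version to dst.

    newline="" disables newline translation on both read and write,
    so the line endings already present in src are kept as they are.
    """
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(make_portable(text, lang))


if __name__ == "__main__":
    print(make_portable("a < b & c\n", "bg"))
```

Everything after the first line of the output file is byte-for-byte the original text, which is the property the proposal relies on.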
You may say: "Okay, but why do we need <plaintext> here? Why not just <pre>?" If we use <pre> here, then we have to escape & and <. This means that our document is not truly plain text anymore. If the receiver decides to edit it, they have to be careful to escape & and <. Also, the document will not be greppable anymore. We would lose all the benefits of plain text, and the situation would be no better than if we sent them full-blown HTML, PDF, or docx.
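To make the escaping cost concrete, here is a small Python sketch using the standard library's html.escape. The sample string and the grep helper (a plain substring search standing in for a fixed-string grep) are made up for illustration. Note that html.escape also escapes >, which is more than the & and < mentioned above.

```python
import html

# Sample text containing the characters that must be escaped inside <pre>.
original = "if a < b && c: print('R&D')"

# quote=False leaves quotation marks alone; &, < and > are still escaped.
escaped = html.escape(original, quote=False)


def grep(pattern: str, text: str) -> bool:
    """Plain substring search, standing in for a fixed-string grep."""
    return pattern in text


print(escaped)                    # the text as it would be stored inside <pre>
print(grep("a < b", original))    # searching the unescaped text
print(grep("a < b", escaped))     # searching the escaped text
```

The search that succeeds on the original text fails on the escaped version, because the stored bytes now contain entity references instead of the literal characters.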
My <plaintext>-based solution is the best of both worlds: it is the only solution I can think of that allows the recipient to view the document on any OS without losing the benefits of plain text.
You may say: "Why not just save the document as an ordinary text file in UTF-8 with a byte order mark? Then most software will recognize it as UTF-8 and render it properly." Well, this is a good idea, but it doesn't solve the issue of line endings (CRLF vs LF) or the issue of language pairs that share Unicode characters (Russian-Bulgarian and Chinese-Japanese).
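The byte order mark alternative can be sketched in Python as follows. The sample text is made up; the point is only that the BOM adds three bytes identifying the encoding, while the line-ending bytes stay whatever they were and nothing in the file records the language.

```python
import codecs

# Made-up sample text with LF line endings.
text = "Привет\nмир\n"

# The "utf-8-sig" codec writes the UTF-8 byte order mark before the data.
with_bom = text.encode("utf-8-sig")

# The same text with CRLF line endings, also with a BOM.
crlf_with_bom = text.replace("\n", "\r\n").encode("utf-8-sig")

print(with_bom[:3] == codecs.BOM_UTF8)   # the BOM is the first three bytes
print(b"\r\n" in with_bom)               # LF file: no CRLF sequences
print(b"\r\n" in crlf_with_bom)          # CRLF file: BOM did not change them
```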
You may say: "Okay, but if you open this document in some text editor, there is still a chance that the editor will not render it properly." Yes, this is totally true. I am not saying that my proposal is free of disadvantages. But it still has these two features: it allows the recipient to open the document for reading on any platform using just a web browser, and it preserves the benefits of plain text. Yes, the editing experience may still be sub-optimal. But I still believe that my proposal is better than the alternatives.
So I'm asking for <plaintext> to be undeprecated. This would let text editors add a "Save as portable text document" button to their interface, producing a document that is viewable on any platform.
Chrome (as of 129.0.6668.58) already implements <plaintext> properly. The current HTML spec already specifies how <plaintext> should be parsed in https://html.spec.whatwg.org/commit-snapshots/00b42be693bf53ef2990ccb4f4da9df22d1b3df8/#parsing-main-inbody . So the only thing we need to do is remove <plaintext> from "Non-conforming features" ( https://html.spec.whatwg.org/commit-snapshots/00b42be693bf53ef2990ccb4f4da9df22d1b3df8/#non-conforming-features ) and state that it is fully supported.
If you agree, I will write a PR.
I don't think you'll find agreement on this, as <plaintext> has, due to its nature, caused numerous security issues.
Perhaps @whatwg/i18n have suggestions for portable plain text, including language annotation and the like.
I've added this to the W3C I18N agenda for tomorrow (although we might not have a complete response tomorrow--much to do, so little year left...). My initial reaction is similar to @annevk's with the addition that such a file is no longer a "plain text file". It's an HTML document. Classical text operations (cat, etc.) and text editors (vi, for example) would all see the starting markup as text.
However, the problems enumerated are interesting and this is a creative solution to them. Stay tuned.
The I18N WG did discuss this in our teleconference of 2024-12-19 (unfortunately, because of a bug in the notes generator, the <plaintext> tag in this issue's description [😮] renders the transcript a bit difficult to read).
The general sense is that the proposed format is not a "plain-text" file, but rather a new specialized format. There are already plenty of these. While we are sympathetic to (and strongly encourage!) document formats to transmit language and direction metadata, plaintext files are not set up to do this. There are mechanisms at the MIME level (e.g. Content-Type, Content-Language, etc.) for sending such metadata. Storage in the file itself, however, is another matter.
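For reference, the MIME-level mechanism mentioned above might look like the following Python sketch. The comment names only the headers; using the standard library's email package, the wrap_as_mime function name, and the sample text and language tag are all assumptions made here for illustration.

```python
from email.message import EmailMessage


def wrap_as_mime(text: str, lang: str) -> EmailMessage:
    """Wrap plain text in a MIME message that carries charset and language.

    The metadata lives in the message headers, not in the text itself.
    """
    msg = EmailMessage()
    # Sets Content-Type to text/plain with a utf-8 charset parameter.
    msg.set_content(text, subtype="plain", charset="utf-8")
    # Content-Language is an ordinary header carrying a language tag.
    msg["Content-Language"] = lang
    return msg


if __name__ == "__main__":
    message = wrap_as_mime("Здравей, свят\n", "bg")
    print(message["Content-Type"])
    print(message["Content-Language"])
```

This keeps the text untouched, but the metadata is lost as soon as the body is saved to disk as a bare file, which is the storage gap the comment points out.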
@annevk notes the security and other issues that come from the <plaintext> element in HTML. Undeprecating it seems unsound for that reason.
Okay, thank you