Save what you can in new encoding instead of aborting

Question

Opened this issue a year ago · 0 comments

Issue

Xed will not save a file in another encoding if even one character in a (possibly massive) text file cannot be represented in that encoding. There is no option to save anyway and let that one character out of thousands get lost.
This problem affects me and others who make subtitles for smart TVs. Even if only one character in a large transcript of movie dialog (e.g., subtitles) would be corrupted, we cannot choose to save the file in the new encoding and then compare the old and new files to see whether anything substantial was lost.

Steps to reproduce

Load a text file created by ffmpeg or some other tool that added some weird whitespace.
Try to save the file in another text encoding format -- i.e., not UTF-8.
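
The failure can be illustrated outside Xed with a minimal Python sketch. It is not Xed's code: the sample text, the zero-width space (U+200B) standing in for the "weird whitespace", and the helper name `try_strict_save` are all assumptions made for illustration. UHC is addressed through Python's `cp949` codec.

```python
from typing import Optional

# Hypothetical sample: Korean subtitle text containing a zero-width space
# (U+200B), which has no mapping in UHC (Python codec name: "cp949").
sample = "물\u200b 한 잔 주세요\n"

def try_strict_save(text: str, encoding: str) -> Optional[bytes]:
    """Mimic an all-or-nothing save: one unencodable character aborts it."""
    try:
        return text.encode(encoding)  # strict error handling is the default
    except UnicodeEncodeError as err:
        bad = text[err.start:err.end]
        print(f"cannot encode {bad!r} at offset {err.start}")
        return None

result = try_strict_save(sample, "cp949")
```

Here `result` is `None` because of the single zero-width space, even though every visible character in the sample is representable in UHC.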

Expected behaviour

As in KWrite/Kate, there should be a pop-up that says something like "one or more characters do not fit the chosen encoding. Do you want to save anyway?" and then allows one to save the file. Looking for one unencodable character in thousands of lines is madness. I would rather save, lose a space or a line break somewhere, and deal with that. On playback, the mistake (if visible) will show itself and one can fix it later. Looking for it line by line in UTF-8 before even being able to export the text file and play it back on a device like a smart TV is not feasible.
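
A rough sketch of the requested "save anyway" behaviour, again in Python and not based on how Kate or Xed are implemented. The function name `save_anyway` and the choice of `?` as the replacement (Python's `errors="replace"` behaviour when encoding) are assumptions. It also reports where characters were lost, so nobody has to hunt for them line by line.

```python
from typing import List, Tuple

def save_anyway(text: str, encoding: str) -> Tuple[bytes, List[Tuple[int, int, str]]]:
    """Encode text lossily and report each character that could not be kept.

    Returns the encoded bytes plus a list of (line, column, code point)
    entries, both 1-based, for every character replaced by '?'.
    """
    lost = []
    # Split on "\n" only, so unusual line separators are inspected too.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                lost.append((line_no, col, f"U+{ord(ch):04X}"))
    return text.encode(encoding, errors="replace"), lost

# Hypothetical two-line input with two characters that UHC cannot represent.
data, lost = save_anyway("물\u200b\n둘째 줄\ufeff", "cp949")
for line_no, col, code in lost:
    print(f"line {line_no}, column {col}: {code} replaced")
```

An editor could show the `lost` list in the confirmation dialog and write `data` only if the user agrees.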

Other information

Kate/KWrite does this. The GTK-based editors do not handle this well. If I have a file with 물, then I want to be able to save it in another encoding like UHC to play back on my smart TV, which does not do UTF-8 (none seem to do this [sad face]). 물 exists in both UTF-8 and UHC, but the "non-normal" spaces do not exist in UHC. This makes me unable to save extracted subtitles in an encoding like UHC and use them. This hurts language learners and the hard of hearing, who rely on subtitles. Not all subtitles are in English and work with ASCII. Even ISO-8859-* encodings are still common, because some devices only support a font with a limited set of glyphs and use a 1- to 2-byte encoding instead of a variable-width encoding.
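
Until an editor offers this, one possible workaround is to normalise the unusual spaces before saving. The sketch below is only an assumption about what is acceptable for subtitles: it drops a small, hand-picked set of zero-width characters and turns any other space-like character (Unicode category `Zs`) into a plain space, but only when the target encoding cannot represent it. The names `ZERO_WIDTH` and `normalize_spaces` are made up for this example.

```python
import unicodedata

# Hand-picked zero-width characters to drop; this set is an assumption.
ZERO_WIDTH = {"\u200b", "\u2060", "\ufeff"}

def normalize_spaces(text: str, encoding: str) -> str:
    """Replace or drop only the whitespace the target encoding cannot hold."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch in ZERO_WIDTH:
                continue                 # invisible anyway, so drop it
            if unicodedata.category(ch) == "Zs":
                out.append(" ")          # unusual space becomes a plain space
            else:
                out.append(ch)           # leave other problems for the user
        else:
            out.append(ch)               # representable, keep unchanged
    return "".join(out)

# U+202F (narrow no-break space) and U+200B are not representable in UHC.
cleaned = normalize_spaces("물\u200b\u202f한 잔", "cp949")
encoded = cleaned.encode("cp949")        # strict encoding now succeeds
```

Characters that are neither zero-width nor spaces are left alone, so a strict save would still fail on them and the user would still see the problem.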

I made a similar bug report for XFCE's Mousepad: https://gitlab.xfce.org/apps/mousepad/-/issues/183