glasserc/ethan-wspace

Difference between dirty and clean

buhtz opened this issue · 5 comments

buhtz commented

I'm switching from ws-butler (which seems to be dead) to your nice package.

From the documentation, it is not clear to me what the difference between dirty and clean whitespace is.

I also have "empty" lines in a Python file that contain only whitespace, and that whitespace is highlighted by your package. Does this mean it is "dirty" and will not be cleaned on the next save?

The difference between dirty and clean whitespace is that dirty whitespace is highlighted, while clean whitespace is cleaned on save. When you are working on a project whose whitespace is already clean, cleaning on save prevents you from introducing bad whitespace. When you are working on a project whose whitespace is not clean, highlighting the dirty whitespace helps you see where the bad whitespace is. This is useful because you may need to reintroduce bad whitespace to minimize a diff, or you may be able to fix the whitespace on a line that you are about to change anyway.
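For illustration, here is a minimal Python sketch of that split on save. It is not ethan-wspace's actual Emacs Lisp code: the functions `clean_eol`, `clean_tabs` and `on_save` are hypothetical, and the category names `eol` and `tabs` are assumed to match the README. Categories that are clean get fixed; categories marked dirty are left exactly as they are (in the editor they would only be highlighted).

```python
def clean_eol(text: str) -> str:
    """Hypothetical 'eol' cleaner: strip trailing spaces and tabs from every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def clean_tabs(text: str, width: int = 4) -> str:
    """Hypothetical 'tabs' cleaner: expand tab characters to spaces."""
    return text.expandtabs(width)


# Assumed mapping from category name to its cleaner.
CLEANERS = {"eol": clean_eol, "tabs": clean_tabs}


def on_save(text: str, dirty: set[str]) -> str:
    """Run the cleaner for every clean category; skip the dirty ones."""
    for category, cleaner in CLEANERS.items():
        if category not in dirty:
            text = cleaner(text)
    return text


sample = "x = 1   \n\tif x:\n"
print(repr(on_save(sample, dirty=set())))    # 'x = 1\n    if x:\n'
print(repr(on_save(sample, dirty={"eol"})))  # 'x = 1   \n    if x:\n'
```

In this sketch a line containing only whitespace counts as trailing whitespace, so it would be emptied on save when `eol` is clean and left alone when `eol` is dirty.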

Yes, if the whitespace is highlighted, then that category of whitespace is understood to be dirty. ethan-wspace adds a small element to your modeline, something like ew:mnLt. The capital letters show which whitespace types are dirty.
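As a rough illustration of how to read that modeline element, here is a small Python sketch that renders it from a set of dirty categories. The letter assigned to each category (m = many-nls-eof, n = no-nl-eof, l = eol, t = tabs) is an assumption made for this example, not something confirmed by the package; `modeline` is a hypothetical helper.

```python
# Assumed letter for each category, in the order they appear in "ew:mnLt".
# This mapping is a guess for illustration only.
LETTERS = [("many-nls-eof", "m"), ("no-nl-eof", "n"), ("eol", "l"), ("tabs", "t")]


def modeline(dirty: set[str]) -> str:
    """Render the tag: an uppercase letter marks a dirty category."""
    return "ew:" + "".join(
        letter.upper() if category in dirty else letter
        for category, letter in LETTERS
    )


print(modeline({"eol"}))  # ew:mnLt
```

Under that assumed mapping, ew:mnLt would mean that only the eol category is dirty.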

Does that help? If you have ideas for explaining this better, I'm open to suggestions!

buhtz commented

Sorry, this doesn't answer my question, which was "what is the difference between dirty and clean whitespace".

I understand how your package treats these two types of whitespace. What I don't understand is how your package decides which type a given piece of whitespace is.

ethan-wspace-mode decides whether whitespace is clean or dirty when it is turned on: it examines the whitespace in the buffer, category by category. If the whitespace in a category is already clean, then that category is considered clean. If it is not already clean, then that category is considered dirty.
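A minimal Python sketch of that one-time decision, assuming one error detector per category. The category names are modeled on the README, but the exact detection rules and the helper names (`has_eol_error`, `classify`, and so on) are assumptions for illustration, not ethan-wspace's own code.

```python
import re


def has_eol_error(text: str) -> bool:
    """Assumed rule: any line that ends in a space or tab."""
    return re.search(r"[ \t]+$", text, re.MULTILINE) is not None


def has_tabs_error(text: str) -> bool:
    """Assumed rule: any tab character at all."""
    return "\t" in text


def has_no_nl_eof_error(text: str) -> bool:
    """Assumed rule: a non-empty file that does not end with a newline."""
    return text != "" and not text.endswith("\n")


def has_many_nls_eof_error(text: str) -> bool:
    """Assumed rule: a file that ends with more than one newline."""
    return text.endswith("\n\n")


CHECKS = {
    "eol": has_eol_error,
    "tabs": has_tabs_error,
    "no-nl-eof": has_no_nl_eof_error,
    "many-nls-eof": has_many_nls_eof_error,
}


def classify(text: str) -> dict[str, str]:
    """Run once when the mode starts: a category with no errors is clean."""
    return {name: ("dirty" if check(text) else "clean") for name, check in CHECKS.items()}


buffer = "def f():\n    return 1\n    \n"  # last line holds only spaces
print(classify(buffer))
# {'eol': 'dirty', 'tabs': 'clean', 'no-nl-eof': 'clean', 'many-nls-eof': 'clean'}
```

With these rules, a single whitespace-only line is enough to make the whole eol category dirty for that buffer, which matches the situation described in the question.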

buhtz commented

"If the whitespace is already clean, then the whitespace is considered to be clean. If the whitespace is not already clean, then it is considered to be dirty."

But what makes whitespace "clean" or "dirty"?

Oh! I think I understand your question now. You're not looking for a technical explanation of how the package works but more like a theoretical explanation of what "dirty" means, right?

The goal is that the contents of a file should have exactly one representation on disk. If the same content has multiple possible representations on disk, then an edit can switch the file from one representation to another, producing a diff without any semantic change. Any whitespace that can contribute to the file having multiple representations is considered an "error", and any file with errors is "dirty".

It's the same principle as automatic code formatting: by eliminating every representation except one, you eliminate discussions about formatting and style. When the whitespace is formatted correctly, the file is clean.

There are multiple categories of whitespace that ethan-wspace understands and tries to track as clean or dirty. Each category has a different definition of what an "error" is. For example, the eol category considers any whitespace at the end of a line an error. Going back to our definition of "error": if whitespace is allowed after text, any file can have multiple representations, because one or more spaces can be added to the end of any line. The categories are listed in the README, and hopefully it's clear how each of them relates to some situation where a file's whitespace can be changed without changing its other contents.
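To make the "one representation" argument concrete for the eol category, here is a small Python sketch. `normalize_eol` is a hypothetical helper that applies the eol rule described above; `difflib` is Python's standard diff module. Adding trailing spaces produces a real diff with no semantic change, and once both versions are normalized the diff disappears.

```python
import difflib


def normalize_eol(text: str) -> str:
    """Hypothetical eol rule: strip trailing spaces/tabs so each line has one form."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


before = "total = 0\nprint(total)\n"
after = "total = 0   \nprint(total)\n"  # same code, trailing spaces added

raw_diff = list(difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm=""))
clean_diff = list(difflib.unified_diff(
    normalize_eol(before).splitlines(), normalize_eol(after).splitlines(), lineterm=""))

print(len(raw_diff) > 0)  # True: a diff with no semantic change
print(clean_diff)         # []: only one representation remains

# Several on-disk variants of the same content collapse to a single one.
variants = [before, after, "total = 0\t\nprint(total) \n"]
print(len(set(variants)), len({normalize_eol(v) for v in variants}))  # 3 1
```

The other categories follow the same pattern: each rule picks one canonical form for a kind of whitespace that could otherwise vary freely.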