clj-commons/rewrite-clj

rewrite-clj does not preserve CRLF whitespaces

ikappaki opened this issue · 15 comments

Version
1.1.45

Platform
Operating System: any
Clojure version: any
JDK vendor and version: any

Symptom
crewrite-clj does not preserve CRLF whitespace when converted from zipper back to string,

Reproduction
On the REPL:

(require '[rewrite-clj.zip :as z])
;; => nil
(-> (z/of-string "(ns nl-test\r\n  (:require [rewrite-clj.zip :as z]))")
      z/root-string)
;; => "(ns nl-test\n  (:require [rewrite-clj.zip :as z]))"

The \r\n line endings have been converted to \n.

Actual behavior
The zipper converts CRLF line endings to LF.

Expected behavior
As per documentation whitespaces should be preserved:

Rewrite-clj is a library that can read, update and write Clojure, ClojureScript and EDN source code while preserving whitespace and comments.

Diagnosis
The CRLF to LF conversion was an intentional workaround as per f78856e:

Handle Windows line endings

While we await https://clojure.atlassian.net/browse/TRDR-65, I've
introduced a work-around.

See rewrite-clj.reader.cljc comments for details.

Fixes https://github.com/clj-commons/rewrite-clj/issues/93

Action
I'm not familiar with the codebase, but I might try to have a look at it at some point.

lread commented

Thanks for raising such a clear and easy-to-digest issue @ikappaki, very much appreciated!

This is currently as designed, but I think rewrite-clj docs should describe this behaviour clearly if this is not already the case.

We use clojure tools.reader and I think it wants to normalize \r\n to \n but does not always handle this well. I raised an issue about this.

Thanks for raising such a clear and easy-to-digest issue @ikappaki, very much appreciated!

This is currently as designed, but I think rewrite-clj docs should describe this behaviour clearly if this is not already the case.

We use clojure tools.reader and I think it wants to normalize \r\n to \n but does not always handle this well. I raised an issue about this.

Hi @lread, and thank you for the prompt reply! May I please argue that there shouldn't be any exceptions in the design doc that suggest that all newlines seq should be treated as LF. I think the goal that wtitespaces should be preserved is quite powerful, since it guarantees compatibility across all architecture, which should be part of the design.

This adversely affects window's users. Its native line ending format is CRLF. If a zipper pipeline is constructed on such a file, it is likely to convert the CRLF to LF even if there are no other changes, and the file stored on disk with incompatible line endings. Every rewrite-clj consumer has to implement workarounds for just that case to maintain compatibility on windows. Instead it is much wiser IMHO to maintain whitespaces in this library.

To exaggerate a bit, imagine the trouble it would have caused on *nix if rewrite-clj was converting any newlines to CRLF instead.

IMHO the tools reader issue is an incidental complexity that should be either prioritised upstream (the issue looks stale?) or a workaround should be devised in rewrite-clj instead.

The JIRA issue doesn't yet have a patch. Maybe it would help if someone wrote that patch and brought the issue to the attention of the core team either on Slack, ask.clojure, etc.

I haven't looked inside rewrite-clj where this could be patched temporarily, but some looking into this might also help the process of contemplating if a workaround in rewrite-clj is feasible and desired.

lread commented

My understanding: I think the clojure tools.reader does not intend to preserve \r\n line endings. I think it intends to normalize them to \n. This is not Windows-friendly, but a reality.

Before my tools.reader work-around in rewrite-clj, files with mixed \r\n and \n were causing issues for rewrite-clj and raised as an issue by the author of zprint.

At this point not normalizing to \n might be considered a breaking change for some rewrite-clj users and would likely have to be optional.

For now I'm willing to document the behaviour (if I haven't already done so).

lread commented

So @ikappaki, I don't disagree with your sentiment at all here, but Clojure is not entirely Windows-friendly. We, unfortunately, inherit that Windows unfriendly-ness.

lread commented

Related: #93

So @ikappaki, I don't disagree with your sentiment at all here, but Clojure is not entirely Windows-friendly. We, unfortunately, inherit that Windows unfriendly-ness.

Sure @lread we are both I think on the same page here and we are just having a friendly discussion from our personal perspectives :)

I have been using Clojure primarily on MS-Windows for a long time now, and I have not noticed anything at the language level that I would consider brought me to a disadvantage compared to working on other operating systems. The tooling perhaps is an exception where the user has to go via PowerShell and calling conventions and arguments passing do not match between unix and windows, but I think this area is not relevant to this case.

I believe there is a large user base who are primarily developing on windows, that should not be discounted as second class citizens, since this hinders Clojure's wider adaptation.

Thanks :)

lread commented

I'm not in disagreement, and very much appreciate this issue and discussion.

My guess: Clojure Windows support is not first-class because there are fewer Clojure Windows users, and there are fewer Clojure Windows users because Windows support is not first-class.

I'll document this rewrite-clj behaviour but leave this issue open for a bit to potentially capture thoughts/ideas from others.

Just for my understanding: rewrite-clj has a temporary workaround (normalizing \r\n to \n) for an issue in tools.reader, but tools.reader maintainers haven't responded in the issue, but word has it that they will never fix this?

Why does the workaround in rewrite-clj normalize and does not preserve the Windows newline?

lread commented

The current intent of clojure tools.reader is to normalize newlines to \n.
Rewrite-clj worked around a bug in that intent.

I was willing to work around a bug, but was not interested in changing the intended behaviour.
I felt that changing the intended behaviour would be riskier.

I do not know if the Clojure core team will address the issue I raised, or if they will change the newline normalization behaviour. There was interest on Slack when I raised the issue, but have not heard back since.

Note:

user=> (read-string "\"foo\r\nbar\"")
"foo\r\nbar"
user=> (require '[clojure.tools.reader :as rdr])
nil
user=> (rdr/read-string "\"foo\r\nbar\"")
"foo\r\nbar"

I.e. the clojure reader(s) do not mess with newlines within literal strings, but rewrite-clj currently does.

But I don't see why rewrite-clj would not preserve newlines as they are written: it's a rewriting tool with preservation for whitespace after all.

lread commented

Yeah, it took me a while, but I finally agree.

The fact that the Clojure readers normalize newlines is an internal detail of the readers.

One detail that we might want to remember is that:

(read-string "\"foo
bar\"")

Will always (appropriately) return "foo\nbar" regardless of CR or CRLF between foo and bar.

lread commented

I'm starting to think about this one a bit again.

New Newlines

When inserting a new newline, we have to decide the variant of that newline (CR or CRLF).

Ideally we'd just use whatever variant the source is already using but:

  1. the source might have mixed newline variants (here we could just use the first variant seen?)
  2. there might be no newlines in the source
  3. the user might be working on a subset of the source without newlines
  4. the user might be creating source from scratch

It would be nice if the newline variant selection could be automatic for new newlines, but I'm not sure if it can be.

Mixed Newlines

If a source has mixed newlines we'd preserve that oddity.

I remember mixed newlines causing the author of zprint some grief but that might have been due do issues with rewrite-clj and not anything else.
I'd keep this in mind.