VFileMessage generates invalid position if no position is given
remcohaszing opened this issue · 26 comments
Initial checklist
- I read the support docs
- I read the contributing guide
- I agree to follow the code of conduct
- I searched issues and couldn’t find anything (or linked relevant results below)
Affected packages and versions
3.1.2
Link to runnable example
No response
Steps to reproduce
import { VFileMessage } from 'vfile-message'
const message = new VFileMessage('reason')
console.log(message.position)
Expected behavior
The message position is undefined.
According to the type definitions it’s optional, but currently it’s always set.
Actual behavior
The message position is a unist position, but the start and end line and column are set to null
.
According to the unist types, start and end line and column can’t be null.
This causes various type errors. For example:
- No type errors should be suppressed in https://github.com/vfile/vfile-message/blob/main/index.js#L28-L30
- This check should not be necessary: https://github.com/unifiedjs/unified-language-server/blob/main/lib/index.js#L70-L74
This test explicitly asserts the incorrect behaviour:
Lines 63 to 66 in e5e87f2
Affected runtime and version
node@18.8.0
Affected package manager and version
npm@8.17.0
Affected OS and version
Pop!_OS 22.04
Build and bundle tools
No response
I think that’s just unist
types not matching reality.
Particularly I think your second link, https://github.com/unifiedjs/unified-language-server/blob/main/lib/index.js#L70-L74, is always a good idea. We’re dealing with users here, and should expect some garbage. Not everyone uses TS. And even if TS is used, NaN
could still be set. Or floats. Or floating point math could result in weird things?
I’d be more in favor of tools like unistPointToLspPosition
to, at runtime, check whether values actually match what TypeScript says they do (same as syntax-tree/unist-util-lsp#1 (comment))
position
could be null
if someone subclasses VFileMessage
without calling the constructor. It’s set to null
here:
Line 185 in e5e87f2
The unist types match the specification for Position
and Point
accurately. A Position
has a required start and end point and an optional indent. A Point
has a required line and column of type number and an optional offset. vfile-message
doesn’t follow this specification.
I won’t argue defensive programming is a good idea. People can indeed pass in incorrect data. But I do believe that official unified utilities shouldn’t create incorrect data.
The code is there to make things easier before ?
was in JS.
It’s there to make defensive coding easier.
It wasn‘t really meant to exactly match unist
, because it’s not a node. And this can also represent a single point instead of two points (position). This project can’t match unist when only one point is given?
Other than that, the downside of going with types is that it might break some folks. 😬
Friendly ping! :)
I think we’re on the same line that currently VFileMessage#position
is incorrectly typed as a unist Position
.
At this point, VFileMessage#position
can never be null. I don’t think we should take subclassing into account. Technically a subclass could change any property to anything, even without subclassing a user could change any value to anything.
The code is there to make things easier before
?
was in JS.
But this comes at the cost that checking if(message.position) {}
won’t work. Also it would be convenient to be able to pass message.position
into utilities that accept unist positions. If it would be typed correctly now, TypeScript will no longer allow this.
Also now that the oldest supported Node.js version supports optional chaining, perhaps this is a good time to change it. I think it would also be nice timing to combine this with a major unified release (which is necessary because of unifiedjs/unified#202)
The main problem I believe is that this project accepts a single point and exposes that information. The utilities typically support that starting point too as far as I am aware.
Is there any particular reason start
and end
in a position may not be the same (shallow copy)?
Interesting idea. Positions are supposed to be ranges. E.g., the end
point of a position in the whole document, if it were a starting point, would be a point outside the document.
- say
a
was the whole document, it could be represented as a whole as1:1-1:2 (0-1)
- the point
1:1 (0)
points to the byte/charactera
- the point
1:2 (1)
is EOF
In micromark/markdown-rs, events are never empty. This is carried through to mdast as well, and I believe every mdast/hast/nlcst/xast utility follows it too
This reminds me of the way Monaco editor handles TypeScript diagnostics. TypeScript text spans use the format { start: number, length: number }
. As can be seen here, TypeScript can return a text span with a length of zero, but when rendering a squiggly line to display the error, the editor requires a minimal length of 1 character to underline.
I think this applies the exact same way in unist, or any text format. A text range can have a length of 0 characters, but the best way to display a range of length 0 is often to point to the character at that index.
But then you’re suggesting to not support points at all, that users should pass a position of (pseudocode) start: start, end: start + 1
?
No, I suggest that a user can use:
vfile.message('hello', { line: 1, column: 1 })
assert.equal(vfile.position, { start: { line: 1, column: 1 }, end: { line: 1, column: 1 } })
assert.ok(vfile.position.start !== vfile.position.end)
But tools such as unist-util-stringify-position
should be able to handle this:
import { stringifyPosition } from 'unist-util-stringify-position'
assert.equal(stringifyPosition({ start: { line: 1, column: 1 }, end: { line: 1, column: 1 } }), '1:1')
assert.equal(stringifyPosition({ start: { line: 1, column: 1 }, end: { line: 1, column: 2 } }), '1:1-1:2')
Although it might look less nice, I don’t really see any practical problems with the following representation:
assert.equal(stringifyPosition({ start: { line: 1, column: 1 }, end: { line: 1, column: 1 } }), '1:1-1:1')
I still feel a bit weird about positions having the same start and end points. It feels like that shouldn’t occur. It feels “wrong”.
but when rendering a squiggly line to display the error, the editor requires a minimal length of 1 character to underline.
That’s a bit in line with what I feel as “wrong”. Same start and end is really empty. No line rendered.
Maybe I’m on my own here though, or have to get used to it?
For example slice(start, end)
in JavaScript also accepts a range. If the end equals start, then it returns an empty string / array.
Also cursors in a text editor represent a point, yet most terminal emulators display a cursor as a range of length 1.
Yeah, good “points”. I agree that that’s good on the “accepting” side of the API. But I feel weird about exposing it from an API. In this case, VfileMessage.position
being like that.
I’m leaning towards making it:
position?: {start: Point; end?: Point | undefined}
What do you think?
I feel pretty strongly about using a real full unist Positions
, where both start
and end
are required.
I still feel a bit weird about positions having the same start and end points. It feels like that shouldn’t occur. It feels “wrong”.
I understand this feeling, but I don’t think this is a strong argument to stick with optional values.
I’ve worked with LSP and Monaco editor, which deal with similar situations. They do use matching start
and end
points in a position to denote an empty range. This feels quirky (in a clever way) for a second, but it feels pretty natural once you get over it.
Another big downside of making start
or end
optional, is assignability. Given a function like this:
export function doSomethingWithPosition(position: Position): unknown
It would be really nice if we could do this:
for (const message = file.messages) {
doSomethingWithPosition(position)
}
The only libraries I can imagine are slightly negatively affected are those that represent the position as a string (only unist-util-stringify-position
that I’m aware of). However, such libraries could already get a position with a matching start and end anyway.
@ChristianMurphy I would love to hear your opinion on this as well. :)
even if TS is used, NaN could still be set. Or floats. Or floating point math could result in weird things?
We could tighten that down with better typings.
type-fest
for example has a NonNegativeInteger
type https://github.com/sindresorhus/type-fest#numeric
we’re dealing with users here, and should expect some garbage. Not everyone uses TS
we could validate the data as it is passed in, throwing exceptions invalid data on the JS side.
position?: {start: Point; end?: Point | undefined}
I would read a defined start
and an undefined end
as an unbounded length, start to infinity, rather than zero length.
They do use matching start and end points in a position to denote an empty range.
That makes sense, and lines up with what I've seen in other text editors as well.
slightly negatively affected are those that represent the position as a string
An even then, those libraries would just need to be aware that start and end could be a point, correct?
We could tighten that down with better typings.
throwing exceptions invalid data on the JS side.
I think there is a large runtime cost associated with all of this. But also a cost of how to type APIs that accept numbers currently. It’s a massive change in 500 repos. I don’t understand the benefits yet.
I feel pretty strongly about using a real full unist Positions, where both start and end are required.
Then it isn’t really about this project, but lots of projects, and it should be described in syntax-tree/unist
. And tests in lots of projects.
Another big downside of making start or end optional, is assignability.
It is indeed not assignable, this issue can be solved without it being assignable. We can use types than make end optional, and then you can use an if statement:
if (position.end) {
// Now it’s a full position, a range.
} else {
// There’s just one point.
}
There are several ideas floating around in this discussion.
Trying to make it actionable:
- the types here do not match the implementation
we can fix that, we can make the types match the implementation - the implementation should change
sure, we can do that, it’s a breaking change, some ways it can change:- we can expose
Position | Point | undefined
fine - we can expose
Position | undefined
and specify thatPosition
can be an empty range by having the same point as start and end
I’m not sure, needs specification, needs docs, needs tests, needs to be clear what it means i.e. how it is different from a position spanning one character. - feel free to propose other changes
- we can expose
- feel free to propose other ideas
- The unist docs need to be updated
According to the unist docsPosition
has required propertiesstart
andend
, both of typePoint
. APoint
hasline
andcolumn
, both required. Regardless of the solution here, it’s good to document explicitly how empty ranges are handled. - implementation changes
Inevitably there need to be implementation changes, and types need to match the implementation.
Options for the implementation changes:
- Make the
unist.Position
implementation match whatvfile-message
creates- Breaking change: This means
Position['start']
,Position['end']
,Point['line']
, andPoint['column']
would be optional / nullable.
- Breaking change: This means
- Make
unist.Position['end']
optional- Breaking change: This affects all packages that depend on
unist
- Breaking change: This affects all packages that depend on
- Allow
unist.Position['end']
to matchunist.Position['start']
, makeVFileMessage#position
adhere- The specification doesn’t explicitly state this is allowed, so in that sense it’s backwards compatible.
- Presentational libraries may yield slightly different output
unist-util-stringify-position
output already produces better results when given valid positions with am empty range IMO. Current behaviour:> stringifyPosition({start: {line: 2, column: 3}, end: {line: 2, column: 3}}) '2:3-2:3' > stringifyPosition({start: {line: 2, column: 3}, end: null}) '2:3-1:1'
- Maybe it affects a library that depends on the boolean value of a range length or string selection for that range. This is an edge case that shouldn’t occur in the first place.
vfile-message
needs to be updated to makeVFileMessage#position
matchunist.Position
.
- Keep
VFileMessage#position
in a format that’s not compatible withunist.Position
- Breaking change (type level only):
VFileMessage#position
is no longer assignable to utilities that expectunist.Position
.
- Breaking change (type level only):
I really don’t see any downsides to option 3. Both Christian and I have seen this works in practice in other places that use positional information.
According to the unist docs Position has required properties start and end, both of type Point. A Point has line and column, both required.
Regardless of the solution here, it’s good to document explicitly how empty ranges are handled.
The specification doesn’t explicitly state this is allowed, so in that sense it’s backwards compatible.
Look at the docs on Position
:
The start field of Position represents the place of the first character of the parsed source region. The end field of Position represents the place of the first character after the parsed source region, whether it exists or not.
— https://github.com/syntax-tree/unist#position
It explicitly states that “empty” is impossible.
If you want this, it’s breaking. I don’t think it’s trivial at all.
You’re right about it being documented. I needed to read more thoroughly.
I still don’t think this would cause any practical issues though. Personally I would prefer to update the spec to explicitly allow an empty Position
.
Considering upcoming breaking changes in the ecosystem: I think it would be an improvement if a nullish end
means it equals start
, and point line
and column
are always defined. If start is also unknown, the entire position should be null
. So that means a position could be one of the following:
{
start: { line: 1, column: 2 },
end: { line: 1, column: 2 }
}
{
start: { line: 1, column: 2 },
end: null
}
{
start: { line: 1, column: 2 },
}
null
Personally I would prefer to update the spec to explicitly allow an empty Position.
It still feels very weird to me to change unist in a way to support something that never actually occurs in unist. unist is supposed to represent ASTs in different programming languages. Why should it explain things that can never happen in an AST? Why should someone implementing unist in Swift or whatever have to support something that won’t ever happen?
Just because this one utility in JavaScript accepts other structures than Position
s?
Your code in this last comment seems to follow solution 2 in your earlier comment (end
optional), while you previously seemed to prefer your solution 3 (end
can be equal to start
)?
I personally prefer exposing Position | Point | undefined
here, which I mentioned earlier, perhaps at a place
field, which is the same as the input: https://github.com/vfile/vfile-message#vfilemessagereason-place-origin.
We could even allow Node
or Node[]
too.
Perhaps this can be done together with stack
? #16
There’s one place where start and end are the same: a root node representing an empty document (https://github.com/syntax-tree/mdast-util-from-markdown/blob/e30d9c8025d726360123ae95beb9cf011335b921/dev/lib/index.js#L380).
I’d think it’s better to explain this in the unist document though as an exception to the rule, rather than that all nodes can have same start/end points.
I actually think this is a good case that shows empty positions do exist, although they are rare.
Typically a position is part of an AST node from a parsed text document. In text documents, syntax is mostly defined by a non-zero length range of characters. Although empty ranges could have meaning.
Hypothetically a JavaScript string 'example'
could be represented as:
{
type: 'string',
position: { start: { line: 1, column: 1 }, end: { line: 1, column: 2 } },
children: [
{ type: 'stringQuote', value: "'", position: { start: { line: 1, column: 2 }, end: { line: 1, column: 3 } } },
{ type: 'stringContent', value: 'example', position: { start: { line: 1, column: 3 }, end: { line: 1, column: 10 } } },
{ type: 'stringQuote', valye: "'", position: { start: { line: 1, column: 10 }, end: { line: 1, column: 11 } } }
]
}
With such an example, an empty string (''
) would be represented as:
{
type: 'string',
position: { start: { line: 1, column: 1 }, end: { line: 1, column: 2 } },
children: [
{ type: 'stringQuote', value: "'", position: { start: { line: 1, column: 2 }, end: { line: 1, column: 3 } } },
{ type: 'stringContent', value: 'example', position: { start: { line: 1, column: 3 }, end: { line: 1, column: 3 } } },
{ type: 'stringQuote', valye: "'", position: { start: { line: 1, column: 3 }, end: { line: 1, column: 4 } } }
]
}
I also think there’s a difference between unist nodes and vfile message positions. Positions on unist nodes typically refer to parsed content, which is typically non-empty. Vfile message positions are constructed from a unist node / position / point.
Also I looked into the terminology of point / position a bit, as I find it confusing that Position
in unist means something other than in LSP. I think the term Position
in unist comes from a position vector. In that context a position may have length zero, it’s called a zero vector.
I think the term Position in unist comes from a position vector. In that context a position may have length zero, it’s called a zero vector.
The reason the field is called position
is because of https://github.com/reworkcss/css/blob/434aa1733f275a67ea700311451b98a14f8cc21a/lib/parse/index.js#L33. That project has no term for the things at start and end. Calling the whole thing Position and the start and end things points was added later.
Hypothetically a JavaScript string 'example' could be represented as:
This could be done like that, but it’s currently explicitly not done anywhere, and it’s explicitly documented that it doesn’t happen.
Like... Should there be an empty text node inside an paragraph node inside a root node, if the document is empty? Or an empty paragraph in a root?
OK, some examples to think about:
Empty document
Say there was a lint rule like https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/rules/no-empty-file.js. In a project unified-file-lint
or so.
The root node has {start: {line: 1, column: 1, offset: 0}, end: {line: 1, column: 1, offset: 0}}
How would that work? How could VS code or whatever highlight (“mark red or so”) that problem?
A tool that would auto-fix this would report an actual: ''
and an expected: '<!-- Empty -->'
or so, right?
No tabs
https://github.com/remarkjs/remark-lint/blob/main/packages/remark-lint-no-tabs/index.js. For a\tb
, it emits a warning at {line: 1, column: 2, offset: 1}
. An editor would mark the tab character as red. actual: '\t', expected: ' '
?
Final eol
https://github.com/remarkjs/remark-lint/blob/main/packages/remark-lint-final-newline/index.js. It currently doesn’t pass a point but it probably should? For a
, I’d expect a point of {line: 1, column: 2, offset: 1}
, actual: '', expected: '\n'
?
Hi! This was closed. Team: If this was fixed, please add phase/solved
. Otherwise, please add one of the no/*
labels.