Infer `position` info for replaced nodes

Question

danielberndt opened this issue 2 years ago · 8 comments

Problem

The nodes generated via remark-breaks do not contain position information. This might be something that the underlying library could take care of.

Solution

I'm not 100% sure how much remark-breaks represents typical usage of this library, but for the remark-breaks case this library should be able to automatically infer the position of the break and the subsequent text nodes.

Alternatives

Otherwise the replace() callback could contain some context which allows the caller to infer the correct position information.

Answer 1 · 2023-06-12T19:44:26.000Z

Hey @danielberndt! 👋
Thanks for reaching out.

My initial take is this probably should not be done.
By definition:

> `Position` represents the location of a node in a source file.
>
> The `start` field of `Position` represents the place of the first character of the parsed source region. The `end` field of `Position` represents the place of the first character after the parsed source region, whether it exists or not. The value of the `start` and `end` fields implement the `Point` interface.

source: https://github.com/syntax-tree/unist#position
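
To make the quoted definition concrete, here is a minimal sketch of the two shapes, restated locally instead of imported from a typings package. The names `UnistPoint`, `UnistPosition`, and `sliceSource` are hypothetical and exist only for this illustration; the helper assumes the optional `offset` field is present on both points.

```typescript
// Local restatement of the unist `Point` and `Position` shapes.
interface UnistPoint {
  line: number // 1-indexed
  column: number // 1-indexed
  offset?: number // 0-indexed character offset, optional in unist
}

interface UnistPosition {
  start: UnistPoint // first character of the parsed source region
  end: UnistPoint // first character after the parsed source region
}

// Hypothetical helper: recover the source text that a node was parsed from.
function sliceSource(
  source: string,
  position: UnistPosition
): string | undefined {
  const {start, end} = position
  if (start.offset === undefined || end.offset === undefined) return undefined
  return source.slice(start.offset, end.offset)
}

// The line ending between "alpha" and "bravo" occupies exactly one character.
const exampleSource = 'alpha\nbravo'
const lineEndingPosition: UnistPosition = {
  start: {line: 1, column: 6, offset: 5},
  end: {line: 2, column: 1, offset: 6}
}
const lineEndingSource = sliceSource(exampleSource, lineEndingPosition)
```

Because `end` is exclusive, a position only makes sense for a node that maps back to a contiguous region of the source, which is the point being made about replaced nodes.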

Here we are replacing nodes with something not in the original source.
I would think that it would and should not have a position.

> I'm not 100% sure how much remark-breaks represents typical usage of this library

It's used in a variety of ways.
A full list of (open source) uses is here: https://www.npmjs.com/browse/depended/mdast-util-find-and-replace

Answer 2 · 2023-06-13T08:39:48.000Z

Thanks @ChristianMurphy for sharing some context!

I understand that it probably would be wrong to automatically infer the position then. But it still might be beneficial to give callers of the library access to the position context, so that when they create, e.g., a `{type: "break"}` node, they can assign a position if the new node still corresponds to something in the original source.

To give you some context on where this request comes from:
I'm working on a plain-text editor with some more or less advanced markdown support. For this it's important to parse the current markdown and keep as much position information as possible. So I was curious why using remark-breaks loses position information, which led me to create this issue.

Answer 3 · 2023-06-13T08:59:03.000Z

- Why do you use remark-breaks if you make a text editor? I don’t think you need it.
- Positional info might not be there. mdast could be created from hast (HTML). ASTs could be generated from programming code, or some tool could line-wrap long lines and inject them. It’s important to handle this on your side: positional info might be missing.
- This project searches for partial matches inside text nodes. It has potential access to where that text node was (see `stack` in `RegExpMatchObject`), but it doesn’t know where the `\n`s would be.

> I was curious why using remark-breaks loses position information resulting in me creating this issue.

Because things that do not exist in the source file should not have positional info (ref https://github.com/syntax-tree/unist#node).

Answer 4 · 2023-06-13T10:10:18.000Z

Thanks for the quick responses you two :)

> Why do you use remark-breaks if you make a text editor? I don’t think you need it

True, I was just copying the existing processor for rendering the markdown which wasn't fully necessary for the editor.

There's one use case in our app where positional information in combination with remark-break would be very useful though:
When people click on a markdown-based HTML element, the click handler has access to the position object of the rendered element. It then opens the editor and positions the cursor at the end of the corresponding markdown code.

> > I was curious why using remark-breaks loses position information resulting in me creating this issue.
>
> because things that do not exist in the source file, should not have positional info (ref https://github.com/syntax-tree/unist#node)

It seems like I'm missing some context here or our definitions differ. To my understanding, the underlying `\n` does exist as a syntactical unit in the source file.

Besides the specifics of remark-breaks, how do you feel about adding position information to the `{type: "text"}` nodes created via this library (provided the original node has positional information associated with it)?

Answer 5 · 2023-06-13T10:28:38.000Z

> To my understanding, the underlying `\n` does exist as a syntactical unit in the source file.

A soft break exists in the markdown, in this case.
No hard break exists in the markdown.
It could be that no break exists in the markdown/HTML/etc., from the perspective of this utility or remark-breaks.

> When people click on a markdown-based HTML element, the click handler has access to the position object of the rendered element. It then opens the editor and positions the cursor at the end of the corresponding markdown code.

You are going to get nodes without positional info, so I think it’s a good idea to handle that case: when someone clicks somewhere inside an element, look for a parent node that has positional info set; if that parent does not have it, look at its parent, and so on.
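
The fallback described above could be sketched as follows. This is only an illustration under assumptions: `TreeNode`, `SourcePosition`, and `nearestPosition` are hypothetical local names, and the ancestor list is assumed to be ordered from the root down to the direct parent (the order in which a utility such as unist-util-visit-parents hands ancestors to its visitor).

```typescript
interface SourcePoint {
  line: number
  column: number
  offset?: number
}

interface SourcePosition {
  start: SourcePoint
  end: SourcePoint
}

interface TreeNode {
  type: string
  position?: SourcePosition
  children?: TreeNode[]
}

// Return the position of the node itself, or of its closest ancestor that
// has one. `ancestors` runs from the root down to the direct parent.
function nearestPosition(
  ancestors: TreeNode[],
  node: TreeNode
): SourcePosition | undefined {
  const innermostFirst = [node, ...[...ancestors].reverse()]
  for (const candidate of innermostFirst) {
    if (candidate.position) return candidate.position
  }
  return undefined
}

// A generated text node without a position, inside a parsed paragraph.
const generatedText: TreeNode = {type: 'text'}
const paragraphNode: TreeNode = {
  type: 'paragraph',
  position: {
    start: {line: 1, column: 1, offset: 0},
    end: {line: 2, column: 6, offset: 11}
  },
  children: [generatedText]
}
const rootNode: TreeNode = {
  type: 'root',
  position: {
    start: {line: 1, column: 1, offset: 0},
    end: {line: 2, column: 6, offset: 11}
  },
  children: [paragraphNode]
}

const fallbackPosition = nearestPosition([rootNode, paragraphNode], generatedText)
```

Here the generated text node falls back to the paragraph's position, so a click handler would still have a usable location.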

How do humans click, in a preview, on a line ending anyway?

> how do you feel about adding position information to the {type: "text"} nodes created via this library (provided the original node has positional information associated with it)?

I think it’s a bad idea because it can’t be done and it shouldn’t be done.

- The input could come from a different markup language (e.g., HTML, org mode).
- Markdown can have indents, e.g. `- > Alpha\n > bravo.`, and theoretically there could be exdents; we don’t know where things are in the original document if someone were replacing bravo with charlie.
- Markdown can have character escapes and references representing characters, which are part of the text node: an escaped or referenced spelling of alpha in the source is available as plain alpha to this utility.
- https://github.com/syntax-tree/unist#node
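
A small worked example of the character-reference problem, assuming a parser that decodes `&amp;` into `&` in the text node's value. All variable names are hypothetical; the strings are made up for illustration and do not come from the thread.

```typescript
// The markdown as written, and the decoded value a text node would expose.
const markdownSource = 'a &amp; b\nc'
const decodedValue = 'a & b\nc'

// Assume the text node starts at offset 0 in the source.
const textStartOffset = 0

// Where the line ending sits in the decoded value versus in the source.
const indexInValue = decodedValue.indexOf('\n') // 5
const naiveOffset = textStartOffset + indexInValue // 5
const actualOffset = markdownSource.indexOf('\n') // 9

// The naive offset lands inside the character reference, not on the line ending.
const characterAtNaiveOffset = markdownSource.charAt(naiveOffset) // 'p'
```

Adding the match index to the node's start offset is therefore only valid when the value and the source region happen to line up character for character.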

Lastly, if you use this utility and want positional info, you can return nodes yourself, which can have positional info
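
As a rough sketch of what returning your own positioned nodes might look like, the helper below builds a `break` node for a line ending found at a given index in a text node's value. It is a heuristic under stated assumptions, not the library's behavior: `TextLike`, `BreakLike`, and `breakAt` are hypothetical names, line endings are assumed to be a single `\n`, and a position is only attached when the value's length equals the length of the source region, which is used here as a cheap signal that there are no escapes, references, or indents in between. In a `replace` callback, the text node and index would presumably come from the match object mentioned earlier in the thread; that wiring is left out so the sketch stays self-contained.

```typescript
interface SourcePoint {
  line: number
  column: number
  offset?: number
}

interface SourcePosition {
  start: SourcePoint
  end: SourcePoint
}

interface TextLike {
  type: 'text'
  value: string
  position?: SourcePosition
}

interface BreakLike {
  type: 'break'
  position?: SourcePosition
}

// Hypothetical helper: create a break for the "\n" at `index` in `text.value`.
function breakAt(text: TextLike, index: number): BreakLike {
  const position = text.position
  const startOffset = position?.start.offset
  const endOffset = position?.end.offset

  // Without offsets, or when value and source region differ in length,
  // the mapping is not trustworthy: return a break without a position.
  if (
    !position ||
    startOffset === undefined ||
    endOffset === undefined ||
    endOffset - startOffset !== text.value.length
  ) {
    return {type: 'break'}
  }

  const before = text.value.slice(0, index)
  const newlinesBefore = before.split('\n').length - 1
  const lastNewline = before.lastIndexOf('\n')
  const line = position.start.line + newlinesBefore
  // On the first line, columns continue from the node's start column;
  // on later lines, they restart at 1 after the previous line ending.
  const column =
    lastNewline === -1 ? position.start.column + index : index - lastNewline

  return {
    type: 'break',
    position: {
      start: {line, column, offset: startOffset + index},
      end: {line: line + 1, column: 1, offset: startOffset + index + 1}
    }
  }
}

// Value and source line up one to one: a position can be derived.
const plainText: TextLike = {
  type: 'text',
  value: 'alpha\nbravo',
  position: {
    start: {line: 1, column: 1, offset: 0},
    end: {line: 2, column: 6, offset: 11}
  }
}
const positionedBreak = breakAt(plainText, 5)

// Source was `a &amp; b\nc` (11 characters) but the value has 7: no position.
const decodedText: TextLike = {
  type: 'text',
  value: 'a & b\nc',
  position: {
    start: {line: 1, column: 1, offset: 0},
    end: {line: 2, column: 2, offset: 11}
  }
}
const unpositionedBreak = breakAt(decodedText, 5)
```

The length check is deliberately conservative: it gives up in exactly the cases listed above rather than guessing, so callers still need a fallback for nodes without positions.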

Answer 6 · 2023-06-13T10:53:37.000Z

Thanks for the clarifications.

> How do humans click, in a preview, on a line ending anyway?

Yeah, they likely won't. For this use case the issue is that the generated `{type: "text"}` nodes lack position information. In that case the handler indeed falls back to setting the cursor at the end of the full paragraph. But it would be nicer to be able to set it at the end of the clicked line instead.

I now understand though that solving it for the general case is a tough ask.
So I'm fine creating my own remark-breaks variant that should return the correct position in most cases and doesn't fully consider all the edge cases.

In any case, thanks for your time and this fantastic resource you've created!
Feel free to close this issue

Answer 7 · 2023-06-13T10:56:30.000Z

If you want access to much more positional info, you might be interested in markdown-rs or micromark, which expose much more, in finer detail.

👋
