Vanderhoof/PyDBML

Note deteriorates with each load and save

Closed this issue · 7 comments

The behaviour of notes makes it hard to use pydbml for editing dbml, i.e. reading and saving a dbml file repeatedly.
Each time a multiline note is output, pydbml adds new line breaks before and after it and indents it one level.
My personal preference is to leave notes exactly as they are, to avoid messing up markdown and the like, so I made some ugly hacks to make that happen. I changed note.dbml to simply output the text and remove one level of indentation. Columns were even worse: I hacked the method that generates the inline note for columns so that it also removes one level of indentation from the note...
This does not keep the notes unmodified unless they are already indented in the original, but it stops the indentation from growing beyond one level.
It would make sense to me if the note contents were not modified at all by loading and saving the same file. Or at least not after the first load and save, when the file is "pydbml formatted". Notes are free text and maybe it should be on the developer to do the formatting rather than the package forcing it on the text.
I am putting this notice here while I am still developing; it is early days. I will try to revisit this when I have worked more on the editing. Maybe I'll have a clearer idea for a solution then.
Thanks again for doing this! I find pydbml very useful.

That's an interesting point. You are probably right and we shouldn't touch note formatting. Otherwise, we could remove all indentation from the note text during parsing and force a specific level of indentation during dbml rendering (since we already force the indentation of other structures when rendering dbml).

Anyway, the current behavior, where the note indentation incrementally grows on each save, is definitely wrong. Thanks for finding that! I'll fix it one way or another in the new version, but feel free to share your solution; maybe we can find the best one together.
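
For anyone following along, here is a minimal sketch of how the growth can come about. It is not PyDBML's actual code: `render_note` and `parse_note` are hypothetical stand-ins, assuming a renderer that pads the note with blank lines and indents it one level, and a parser that keeps the text between the quotes verbatim.

```python
import textwrap

INDENT = "    "  # assumed: one indentation level is four spaces


def render_note(text: str) -> str:
    # Hypothetical renderer: pad with a newline on each side, indent one level.
    return "\n" + textwrap.indent(text, INDENT) + "\n"


def parse_note(rendered: str) -> str:
    # Hypothetical parser: returns the text between the quotes unchanged.
    return rendered


def first_line_indent(text: str) -> int:
    # Leading spaces on the first non-blank line.
    line = next(l for l in text.splitlines() if l.strip())
    return len(line) - len(line.lstrip(" "))


note = "# Title\n- item"
history = []
for _ in range(3):
    # One "load and save" round trip.
    note = parse_note(render_note(note))
    history.append(note)

# Indentation of the first non-blank line after each round trip.
indents = [first_line_indent(h) for h in history]
print(indents)
```

Because nothing undoes the indentation on load, every save adds another level and another pair of blank lines.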

Yeah, if the note is formatted by pydbml, the result should be idempotent, so that pydbml fixes your note once and for all and then it stays exactly the same. This would work nicely for the simple cases.
If you use markdown notes you might want to disable the automatic formatting altogether. You might have indentation in the text. Being able to disable formatting also provides a catch-all workaround for any case where the formatting doesn't give the desired output. You can always manage it yourself.

I use comments on tables and columns to control behaviour: a comment of --upnote-- on a column or table indicates that the notes are managed by the program, and --no-- indicates that the column or table should not be touched at all, to protect manually curated tables and columns from being overwritten. It might be a simple way to disable formatting, although it does clutter the dbml file quite a lot.
I just realised I am going to implement the comments on the project tag as well, to control an entire file. That might be a good place to put hints for pydbml behaviour?

I got the "hint in comment" idea from sqlfluff, the dbt sql linter, where you can suppress the linting rules one by one or all of them in a comment.

@jens-koster I've tried my best to fix this issue, the result is not merged yet, it lives in this branch. Could you check if it will work for you?

What changed:

PyDBML now removes the main indentation level from the note text and strips empty lines from the beginning and end of it. I know you asked for an option to disable note reformatting altogether, but with the way I currently generate DBML that is unfortunately not possible. That said, removing the main indentation level and the empty lines shouldn't be a problem for your Markdown content, and it will make notes idempotent.
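
A rough sketch of that normalization, for illustration only (this is not the code from the branch; `normalize_note` and `render_note` are hypothetical names, and the four-space indent is an assumption):

```python
import textwrap


def normalize_note(text: str) -> str:
    # Remove the indentation common to all non-blank lines.
    lines = textwrap.dedent(text).split("\n")
    # Strip empty lines from the beginning and the end.
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


def render_note(text: str, indent: str = "    ") -> str:
    # Hypothetical renderer: pad with blank lines and indent one level.
    return "\n" + textwrap.indent(text, indent) + "\n"


source = "\n        # Title\n          - nested item\n\n        Some text\n\n"
clean = normalize_note(source)

# A full save-and-load round trip gives back the same text.
round_trip = normalize_note(render_note(clean))
print(round_trip == clean)
```

The relative indentation of the nested Markdown item survives, since only the common leading whitespace is removed.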

By default, PyDBML also splits long one-line notes into multiple lines. You can disable this behavior by parsing with the reformat_notes=False option:

PyDBML(dbml_source, reformat_notes=False)

Please tell me what you think

@Vanderhoof I think you have nailed it, or come very close. I realised my notes look fine at dbdiagram.io, and it turns out the dbml people actually documented how to deal with this: https://www.dbml.org/docs/#multi-line-string
The smallest number of spaces prefixing any line in the note is stripped from the beginning of every line when the dbml is parsed.
So, while on file, the notes are indented to look good and give flow in reading the file. Any parser will strip that away when loading the file for processing the textual content. I am guessing that is pretty much what you did?
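
As I read that rule, it could be sketched like this. The function name `strip_min_indent` is made up for illustration, and ignoring blank lines when looking for the minimum is my assumption, not something I have checked against the dbml parser.

```python
def strip_min_indent(text: str) -> str:
    lines = text.split("\n")
    # Count leading spaces on every non-blank line (assumption: blank
    # lines do not take part in finding the minimum).
    widths = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    n = min(widths, default=0)
    # Strip that many characters from each line; blank lines become empty.
    return "\n".join(l[n:] if l.strip() else "" for l in lines)


on_file = "      # Heading\n        - nested item\n\n      plain text"
parsed = strip_min_indent(on_file)
print(parsed)
```

Here the minimum is six spaces, so the nested item keeps its two extra spaces relative to the heading.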
Anyway, I will pull the code and come back to you when I have tried it out.

Wow, I completely missed that info about multiline strings. I'm also pretty sure PyDBML doesn't support the line continuation and escaping mentioned on that page. I'll add that to the TODO list.

But yeah, generally I tried to do the same thing with removing minimal indentation level. I'll also do more tests to see if it works correctly.

@jens-koster on second thought, I should probably remove the reformat_notes parameter and its functionality completely. Right now the only reformatting it does is adding line breaks to long lines of note text. I would rather leave this to the user: if they want to wrap the lines, they can easily do it themselves. Removing the reformatting will make the code simpler and will eliminate implicit editing of the user's content.

I agree completely. The reformatting is a slightly unexpected behaviour, and explicit is better than implicit :-)
I googled a bit and it turns out wrap() is actually implemented in the standard library's textwrap module.
It sits there together with indent() and dedent(), the latter of which will "Remove any common leading whitespace from every line in text." That would implement the dbdocs spec for how to handle Note text.
https://docs.python.org/3.9/library/textwrap.html
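
A small sketch of how a user could combine those functions themselves. The note text, the width of 40 and the four-space prefix are arbitrary choices for illustration; `textwrap.fill` is the variant of `wrap` that returns a single string instead of a list of lines.

```python
import textwrap

raw = (
    "\n"
    "        This note is a single long line that the user may or may not "
    "want wrapped at some column width.\n"
)

# dedent removes the common leading whitespace; drop the outer newlines.
text = textwrap.dedent(raw).strip("\n")

# Wrap to at most 40 columns (the user's choice, not the library's).
wrapped = textwrap.fill(text, width=40)

# Indent one level for writing back into a dbml file.
indented = textwrap.indent(wrapped, "    ")

print(indented)
```

Running `textwrap.dedent` on the indented result gives back the wrapped text, which is what makes the round trip stable.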