ge-ne/bibtool

Single newlines in annotate field discarded during printing


I want to use bibtool to cleanup my BibDesk library and created the following resource file to mimic the output format of BibDesk:

% Preserve BibDesk comments (bibtool discards them by default)
pass.comments = on
% Use same cased key format as BibDesk
symbol.type = cased
% This option would be better than 'symbol.type' because it also preserves the case of publication types but still translates all keys to lower case
%preserve.key.case = on
% BibDesk does not align anything
print.align = 0
print.align.key = 0
print.align.string = 0
print.align.preamble = 0
print.align.comment = 0
% Use whitespaces around equal signs '='
print.wide.equal = on
% BibDesk does not break lines. Simulate by using max value for signed 32 bit integer
print.line.length = 2147483647
% Suppress newline at the very beginning of the document
suppress.initial.newline = on

% Emulate BibDesk's single-tab indentation. The indent is emitted as spaces, not as a tab. The default is 2 spaces (https://github.com/ge-ne/bibtool/blob/99351329a17c88b751d7dba50fd066de408b7d8e/include/bibtool/resource.h#L146)
print.indent = 4
% Tabs only seem to apply to value alignment (https://github.com/ge-ne/bibtool/blob/99351329a17c88b751d7dba50fd066de408b7d8e/test/print_use_tab.t)
print.use.tab = on

Using a very long print.line.length discards all single newlines that were part of the original annotate field, which I use to store personal notes through BibDesk.

This example shows how all single newlines are discarded (left original, right bibtool output):
[Screenshot: original (left) and bibtool output (right)]

However, two consecutive newlines (i.e., "\n\n") are preserved as shown in this example:
[Screenshot: two consecutive newlines preserved]

The current behavior of discarding single but preserving two consecutive newlines appears counter-intuitive and unexpected to me.
Is there a way to preserve all newlines in BibTeX fields such as Annotate?

Attaching a small example file for reproducing the issue:
preserve_bibdesk.zip

ge-ne commented

BibTool assumes that the text is typeset with (La)TeX at the end. Thus the rules for this target are taken into account.
Multiple newlines separate paragraphs and single newlines have no significance at all. To honor the parameter print.line.length, the paragraphs are reformatted by eliminating single line breaks and inserting new newlines at appropriate places if required.

Three and more consecutive newlines are collapsed into a single one.

This behaviour is currently not configurable.
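
The reformatting described above can be illustrated with a small Python sketch. This is not BibTool's code: the function name `reflow` and the default width are made up for illustration, and it assumes that any run of two or more newlines counts as exactly one paragraph break.

```python
import re
import textwrap


def reflow(value: str, line_length: int = 77) -> str:
    """Illustrative re-wrapping of a field value (not BibTool's implementation).

    Assumption: blank lines separate paragraphs; single newlines inside a
    paragraph are treated as ordinary whitespace.
    """
    # Split on blank lines (two or more newlines, possibly with spaces between).
    paragraphs = re.split(r"\n\s*\n", value.strip())
    wrapped = []
    for paragraph in paragraphs:
        # Collapse every whitespace run, including single newlines, to one space.
        text = " ".join(paragraph.split())
        if text:
            # Insert new line breaks only where the width limit requires them.
            wrapped.append(textwrap.fill(text, width=line_length))
    return "\n\n".join(wrapped)


if __name__ == "__main__":
    note = "- first point\n- second point\n\nNext paragraph"
    print(reflow(note, line_length=9999))
```

With a very large `line_length`, the two list items end up on one line while the blank line between paragraphs survives, which matches the behaviour reported in this issue.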

That's a valid assumption. Thank you for your clarification.

Would it be possible to introduce a new option pass.newlines = OnOff (or alike) similar to the existing pass.comments option?

This would be very helpful for interoperability with other tools, such as BibDesk or JabRef. This is the only critical blocker for a smooth combination of BibDesk and bibtool in my workflow. One could argue I misuse the Annote field but it appears very convenient and useful when managing a large library. Further, other tools such as JabRef also save multi-line strings in a comments field:
[Screenshot: multi-line comments field saved by JabRef]

The pass.comments option has already proven useful for me and other users (see StackExchange: https://tex.stackexchange.com/questions/464602/preserve-comment-line-with-bibtool)


The remaining two differences in formatting I did not manage to mimic with bibtool are of minor nature and can be reverted by re-saving the bibtool output file with BibDesk:

  • bibtool cannot indent fields using tabs as print.use.tab = on only seems to apply for value alignment even when using print.indent = 4 for fields
  • bibtool cannot preserve the (BibDesk) capitalization of publication types
ge-ne commented

Would it be possible to introduce a new option pass.newlines = OnOff (or alike) similar to the existing pass.comments option?

I don't see how this option should interact with print.line.length; they seem to contradict each other. If you want to eliminate newlines then you can set print.line.length to a very large value. What good would it be to leave newlines distributed somewhere in the source?

I see your example. I might think about turning off the formatting of single attributes. But this appears to be rather complicated. I am not convinced that this is worth it.

bibtool cannot indent fields using tabs as print.use.tab = on only seems to apply for value alignment even when using print.indent = 4 for fields

The semantics of TABs are not clear. The width of tab positions varies according to the taste of the developer or even the user. I don't see a good reason to support them more than I already do.

bibtool cannot preserve the (BibDesk) capitalization of publication types

Right, it cannot preserve arbitrary writing variants. But BibTool is able to use uniform capitalization once you declare it. See new.entry.type.

Would it be possible to introduce a new option pass.newlines = OnOff (or alike) similar to the existing pass.comments option?

I don't see how this option should interact with print.line.length; they seem to contradict each other.

I think pass.newlines and print.line.length can complement each other:

  • print.line.length can ensure a maximal line length. Hence, it inserts additional newlines accordingly.
  • pass.newlines can preserve existing intentional line breaks for compatibility with other tools, such as BibDesk or JabRef.

If you want to eliminate newlines then you can set print.line.length to a very large value.

In that case, pass.newlines = on gives additional control to preserve existing intentional newlines. Of course, the default should be off to maintain the current behavior.
Sidenote: As a "very large value", I was using the max value for a signed 32-bit integer (2147483647). I tried -1 and 0 but there seems to be no option for unlimited (i.e., introduce no extra newlines).
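
To make the intended interaction concrete, here is a minimal Python sketch of how the two settings could complement each other. It is purely hypothetical: pass.newlines does not exist in bibtool, and the function name `wrap_keep_newlines` is invented for this illustration.

```python
import textwrap


def wrap_keep_newlines(value: str, line_length: int) -> str:
    """Hypothetical 'pass.newlines = on' behaviour (not an existing option).

    Existing line breaks are kept; extra breaks are inserted only where a
    line would exceed line_length.
    """
    out = []
    for line in value.split("\n"):
        if line.strip() == "":
            # Keep blank lines so paragraph breaks survive unchanged.
            out.append("")
        else:
            # Wrap each original line on its own instead of merging lines.
            out.extend(textwrap.wrap(line, width=line_length))
    return "\n".join(out)


if __name__ == "__main__":
    note = "# Notes\n- first point\n- second point"
    print(wrap_keep_newlines(note, line_length=80))
```

With a generous limit the text passes through unchanged; with a tight limit only the over-long lines are broken further.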

What good would it be to leave newlines distributed somewhere in the source?

Existing newlines as part of field values can be intentional as in the case of comment fields for JabRef and BibDesk:

Before bibtool:
[image: before]

After bibtool:
[image: after]

Discarding these newlines removes all manually structured comments in these cases and makes bibtool practically unusable for libraries with comments managed with tools such as JabRef or BibDesk. (I tried a workaround using git add --patch to selectively commit non-Annotate fields but this didn't scale.)

Implementation alternative

As you hinted, such an option could also accept a list of fields to limit the newline-preserving behavior to selected fields (similar to select.fields):

pass.newlines = {Annote,Comments}
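
A rough Python sketch of what such a per-field setting could mean. Everything here is hypothetical: the option is only a proposal, `format_fields` is an invented name, and as a simplification the non-selected fields are collapsed to a single wrapped paragraph.

```python
import textwrap


def format_fields(fields, keep_newlines_in, line_length=77):
    """Hypothetical per-field variant of the proposed option.

    fields: mapping of field name to raw value.
    keep_newlines_in: field names whose values are passed through untouched.
    """
    # BibTeX field names are case-insensitive, so compare in lower case.
    keep = {name.lower() for name in keep_newlines_in}
    result = {}
    for name, value in fields.items():
        if name.lower() in keep:
            # Selected fields keep every original line break.
            result[name] = value
        else:
            # Simplification: all other fields become one re-wrapped paragraph.
            result[name] = textwrap.fill(" ".join(value.split()), width=line_length)
    return result


if __name__ == "__main__":
    entry = {"Title": "A long\ntitle", "Annote": "# Notes\n- one\n- two"}
    print(format_fields(entry, {"Annote", "Comments"}, line_length=9999))
```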

Minor issues

Ok, the minor issues are not worth fixing then. I can live with them as BibDesk fixes them when re-saving.

ge-ne commented

Sidenote: As a "very large value", I was using the max value for a signed 32-bit integer (2147483647). I tried -1 and 0 but there seems to be no option for unlimited (i.e., introduce no extra newlines).

Right. I use large values myself, usually something like 9999.

Existing newlines as part of field values can be intentional as in the case of comment fields for JabRef and BibDesk:

There is a cheap fix to this: just insert two newlines where you want to have newlines preserved.

Implementation alternative

As you hinted, such an option could also accept a list of fields to limit the newline-preserving behavior to selected fields (similar to select.fields):

pass.newlines = {Annote,Comments}

I was thinking about a more general solution:

print.omit.formatting { @article # comments }
print.omit.formatting {  # remarks }

But as I said, this might be overdone. Maybe

print.formatting = false

to suppress linebreaking completely is enough.

It could also be fun to support some LaTeX constructs when formatting... when I have a lot of time.

Existing newlines as part of field values can be intentional as in the case of comment fields for JabRef and BibDesk:

There is a cheap fix to this: just insert two newlines where you want to have newlines preserved.

Well, manually inserting newlines for my 223 papers with Annote fields isn't that cheap and breaks the visual structure in other tools (I basically write Markdown comments).
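
For completeness, the doubling could be scripted rather than done by hand. The Python sketch below only transforms the text of a single field value; extracting the Annote values from the .bib file is left out, and `double_single_newlines` is a name made up here. It does not address the second objection: the extra blank lines still change how the notes look in other tools.

```python
import re


def double_single_newlines(value: str) -> str:
    """Turn every isolated newline into a blank line (two newlines).

    Newlines that already belong to a run of two or more are left alone.
    """
    # Match a newline that is neither preceded nor followed by another newline.
    return re.sub(r"(?<!\n)\n(?!\n)", "\n\n", value)


if __name__ == "__main__":
    note = "# Notes\n- first point\n- second point\n\nNext paragraph"
    print(double_single_newlines(note))
```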

I was thinking about a more general solution;

Sounds pretty fancy. Agree, this might be overkill.

A simple global toggle to preserve intentional linebreaks is sufficient for the described use case. I find linebreaking more intention-revealing than formatting. Formatting would need some explanation of what it includes and whether true/false maps to preserve/omit.