Auto detection of syntax fails if first line is empty
Closed this issue ยท 5 comments
I'm not sure if this is a bug or feature. The bug is not just for html but I use html as an example only:
Will highlight correct:
echo -e "<HTML>" | bat -ppWill not highlight:
echo -e "\n<HTML>" | bat -ppAt the moment (when stdin is used), batcat checks for the first line to determine the syntax. Instead, it shall check the first line that is not "empty".
Hi, thanks for bringing this up. The current behaviour matches Sublime Text, which invented the sublime-syntax highlighting grammars which bat is using. I'm a little bit afraid that changing it could cause other compatibility problems later on as well as complicate the code base / syntax detection logic. I personally have practically never encountered a file with a blank first line. But if it can be implemented cleanly, I would probably agree to merge such a change...
I've looked into this; it seems that InputReader has a first_line property which is just read up til the first newline of the file. This property seems to be used only to check the syntax with the first line (HighlightedAssets::get_first_line_syntax in assets.rs) and to check if the file has anything to print at all (Controller::print_file in controller.rs).
Implementing this would require bat to look ahead through whitespace-only lines until it finds non-whitespace. However, first_line is also used currently to avoid re-reading the input buffer; if it's not empty, the read_line function returns something from there instead of querying the input reader. I don't think it's acceptable to have this function return a vec with more than one newline, so we'd have to change how InputReader stores the data. It could be done efficiently if we stored lines in a Vec<Vec<u8>> , such that they could be popped (the lines would have to be stored in reverse order). Then the only change outside of InputReader would be in get_first_line_syntax, where we use the first element instead of the whole property.
I don't know how clean this would be. I can implement it, but is this over-engineering for such a small case? Or is this reasonable?
Thanks for investigating @apometta. It does sound like your solution would be clean enough, but it does indeed feel to me a bit like over-engineering, for a use-case which I have never personally encountered and can be trivially handled by a wrapper script to trim leading blank lines...
Let's see what the other maintainers think. What are your thoughts please @Enselic @eth-p ? ๐
As far as I know there is never a reason to keep the leading newline in a file, so simply requiring to removing any leading newline seems like a reasonable requirement to me.
I agree it feels like over engineering to handle this case.
Just to play devil's advocate, this issue only exists because @SkyperTHC did in fact encounter files with blank lines at the start. It's not just a hypothetical ๐