Bug: null chars in input stream
I am indexing the Mozilla MDN CSS web documents and generating tags for them. I have the documentation for the CSS properties and some other topics in a single file, localdocu.mdncssdan. Each topic starts with # followed by a newline and then the name of the topic.
The following is a simplified view of the document.
This page was last modified on Jul 7, 2023 by MDN contributors.
(...)
#
animation-composition
The animation-composition CSS property specifies the composite operation
to use when multiple animations affect the same property simultaneously.
(...)
This page was last modified on Jun 26, 2023 by MDN contributors.
#
animation-delay
The animation-delay CSS property specifies the amount of time to wait
from applying the animation to an element before beginning to perform
(...)
What I have is the following rule
--kinddef-mdncssdantags=t,topic,topics
--mline-regex-mdncssdantags=/^#\n(\w.*)$/\1/t/{mgroup=0}
This works in that the tags are detected, but the search pattern recorded as the destination of each tag is shifted some lines up:
animation-composition localdocu.mdncssdan /^This page was last modified on Jul 7, 2023 by MDN contributors.$/;" t
animation-delay localdocu.mdncssdan /^This page was last modified on Jun 26, 2023 by MDN contributors.$/;" t
So this is not accurate: when I open, say, animation-delay, it jumps 3 lines up to the "This page was last modified..." line. The biggest issue is that when this line is identical for several topics (say another topic also has "This page was last modified on Jun 26" 3 lines above it), the tag referencing system is completely broken.
$ ctags --version
Universal Ctags 5.9.0, Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Sep 3 2021, 18:12:18
URL: https://ctags.io/
Optional compiled features: +wildcards, +regex, +gnulib_regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript
It seems that your ctags is old.
Here is the output with ctags version 6.0.0 shipped as a binary package of Fedora 39.
$ ctags --version
Universal Ctags 6.0.0, Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Jul 19 2023, 00:00:00
URL: https://ctags.io/
Output version: 0.0
Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript
$ cat ./mdncssdantags.ctags
--langdef=mdncssdantags
--kinddef-mdncssdantags=t,topic,topics
--mline-regex-mdncssdantags=/^#\n(\w.*)$/\1/t/{mgroup=0}
$ ctags --options=NONE --options=./mdncssdantags.ctags --language-force=mdncssdantags -o - localdocu.mdncssdan
ctags: Notice: No options will be read from files or environment
animation-composition localdocu.mdncssdan /^#$/;" t
animation-delay localdocu.mdncssdan /^#$/;" t
If the pattern # is not what you want, use {mgroup=1} instead of {mgroup=0}.
$ sed -e 's/mgroup=0/mgroup=1/' mdncssdantags.ctags > mdncssdantags2.ctags
$ ctags --options=NONE --options=./mdncssdantags2.ctags --language-force=mdncssdantags -o - localdocu.mdncssdan
ctags: Notice: No options will be read from files or environment
animation-composition localdocu.mdncssdan /^animation-composition$/;" t
animation-delay localdocu.mdncssdan /^animation-delay$/;" t
See also https://docs.ctags.io/en/latest/man/ctags-optlib.7.html#flags-for-mline-regex-lang-option about the group flag.
I have updated to Universal Ctags 6.1 and used your parameters (which are identical to mine) with my file, and the misalignment was still there.
With the simplified version of the documentation (as I pasted it before) it works (provided {mgroup=1} is used).
The issue is that my files are generated by a bash script that appends content with >>, which somehow adds null characters between those topics.
A file full of null characters seems to confuse the Universal Ctags engine, which ends up misbehaving in the way I showed in my first post.
I will just pre-process the files to delete those null bytes, but should you want to look into it (I don't know whether you consider it an issue or a bug), I attach my documentation file so you can check what I said.
(I don't know why GitHub is not letting me attach text files, so I am using Google Drive.)
https://drive.google.com/file/d/18adrICMqptt5EBzUFIj3k20GMATtLvnT/view?usp=sharing
Thank you
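The pre-processing step mentioned above could look roughly like the sketch below. It is only an illustration, not the script actually used for these files: the file names are hypothetical, and the sample input is generated on the spot so the commands can be run in isolation. It counts the NUL bytes in a file with tr and wc, then writes a copy with those bytes removed.

```shell
#!/bin/sh
# Hypothetical file names in a scratch directory; substitute the real
# documentation file (e.g. localdocu.mdncssdan) when trying this out.
workdir=$(mktemp -d)
src="$workdir/sample.mdncssdan"
clean="$workdir/sample.clean.mdncssdan"

# Build a small sample with a NUL byte after each topic (assumed layout).
printf '#\nanimation-composition\n\000#\nanimation-delay\n\000' > "$src"

# Count NUL bytes: delete every byte except NUL, then count what is left.
# The arithmetic expansion strips any padding that wc may print.
before=$(( $(tr -cd '\000' < "$src" | wc -c) ))

# Write a copy with all NUL bytes removed; the original is left untouched.
tr -d '\000' < "$src" > "$clean"

after=$(( $(tr -cd '\000' < "$clean" | wc -c) ))
echo "NUL bytes before: $before, after: $after"
```

The cleaned copy, rather than the original file, would then be passed to ctags.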
Thank you for trying the newer version.
Oh, I see. I am not surprised that ctags cannot handle null chars in its input, because the internals of ctags depend heavily on C strings: byte sequences terminated with '\0'.
Ideally, this should be fixed, but we do not have enough time to fix it.
I will change the title of this issue.