universal-ctags/issues-we-will-not-fix-in-soon

Bug: null chars in input stream

I am indexing the Mozilla MDN CSS web documents and generating tags for them. The documentation for the CSS properties and some other topics is in one single file, localdocu.mdncssdan. Each topic starts with a # on its own line, followed by the name of the topic on the next line.

The following is a simplified view of the document.

This page was last modified on Jul 7, 2023 by MDN contributors.

(...)

#
animation-composition

The animation-composition CSS property specifies the composite operation
to use when multiple animations affect the same property simultaneously.
(...)

This page was last modified on Jun 26, 2023 by MDN contributors.



#
animation-delay

The animation-delay CSS property specifies the amount of time to wait
from applying the animation to an element before beginning to perform
(...)

I am using the following rule:

--kinddef-mdncssdantags=t,topic,topics
--mline-regex-mdncssdantags=/^#\n(\w.*)$/\1/t/{mgroup=0}

This works in that it detects the tags, but the pattern recorded as the destination of each tag is shifted some lines up:

animation-composition	localdocu.mdncssdan	/^This page was last modified on Jul 7, 2023 by MDN contributors.$/;"	t
animation-delay	localdocu.mdncssdan	/^This page was last modified on Jun 26, 2023 by MDN contributors.$/;"	t

So this is not accurate: when I open, say, animation-delay, it jumps 3 lines up to the "This page was last modified..." line. The biggest issue is that this line can be identical for several topics (say another topic also has "This page was last modified on Jun 26, 2023" three lines above it), in which case the tag references are completely messed up.

$ ctags --version
Universal Ctags 5.9.0, Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Sep  3 2021, 18:12:18
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +gnulib_regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript

It seems that your ctags is old.
Here is the output with ctags version 6.0.0 shipped as a binary package of Fedora 39.

$ ctags --version
Universal Ctags 6.0.0, Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Jul 19 2023, 00:00:00
  URL: https://ctags.io/
  Output version: 0.0
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript
$ cat ./mdncssdantags.ctags 
--langdef=mdncssdantags
--kinddef-mdncssdantags=t,topic,topics
--mline-regex-mdncssdantags=/^#\n(\w.*)$/\1/t/{mgroup=0}

$ ctags --options=NONE --options=./mdncssdantags.ctags --language-force=mdncssdantags -o - localdocu.mdncssdan
ctags: Notice: No options will be read from files or environment
animation-composition	localdocu.mdncssdan	/^#$/;"	t
animation-delay	localdocu.mdncssdan	/^#$/;"	t

If the pattern # is not what you want, use {mgroup=1} instead of {mgroup=0}.


$ sed -e 's/mgroup=0/mgroup=1/'  mdncssdantags.ctags > mdncssdantags2.ctags 
$ ctags --options=NONE --options=./mdncssdantags2.ctags --language-force=mdncssdantags -o - localdocu.mdncssdan
ctags: Notice: No options will be read from files or environment
animation-composition	localdocu.mdncssdan	/^animation-composition$/;"	t
animation-delay	localdocu.mdncssdan	/^animation-delay$/;"	t

See also https://docs.ctags.io/en/latest/man/ctags-optlib.7.html#flags-for-mline-regex-lang-option about the mgroup flag.

I have updated to Universal Ctags 6.1 and used your parameters (which are identical to mine) with my file, and the misalignment was still there.
With the simplified version of the document (as I pasted it before) it works, provided I use {mgroup=1}.
The issue is that my files are generated by a bash script that appends content with >>, which somehow adds null (NUL) characters between the topics.
A file full of these null characters seems to confuse the Universal Ctags engine, which ends up misbehaving in the way I showed in my first post.

I will just pre-process the files to delete those null bytes, but in case you want to look into it (I don't know whether you consider it an issue or a bug), I am attaching my documentation file so you can check what I described.
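
The pre-processing step mentioned above could look roughly like the following. This is a minimal sketch, not a tested fix for the attached file: it builds a small hypothetical sample in a temporary directory as a stand-in for the real document, counts the NUL bytes with tr and wc, and writes a cleaned copy that can then be passed to ctags. The names src, clean, nul_before and nul_after are made up for illustration.

```shell
# Hypothetical stand-in for the real document: one NUL byte between two topics.
tmpdir=$(mktemp -d)
src="$tmpdir/localdocu.mdncssdan"
clean="$tmpdir/localdocu.clean.mdncssdan"
printf 'footer line\n\000\n#\nanimation-delay\n\nbody\n' > "$src"

# Count NUL bytes: delete everything except NUL, then count what is left.
# The trailing tr removes the padding some wc implementations print.
nul_before=$(tr -cd '\000' < "$src" | wc -c | tr -d ' ')

# Write a copy with every NUL byte removed; the original file is left untouched.
tr -d '\000' < "$src" > "$clean"
nul_after=$(tr -cd '\000' < "$clean" | wc -c | tr -d ' ')

echo "NUL bytes before: $nul_before, after: $nul_after"
```

For the real file, only the two tr commands are needed: point src at the generated document and run ctags on the cleaned copy instead of the original.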

(I don't know why GitHub is not letting me attach text files, so I am using Google Drive.)
https://drive.google.com/file/d/18adrICMqptt5EBzUFIj3k20GMATtLvnT/view?usp=sharing

Thank you

Thank you for trying the newer version.

Oh, I see. I am not surprised that ctags cannot handle null chars in its input, because the internals of ctags depend heavily on C strings, that is, byte sequences terminated with '\0'.

Ideally, this should be fixed, but we do not have enough time to fix it.

I will change the title of this issue.