Maybe add completion of citation keys for `.bib` file in Markdown
lukeflo opened this issue · 24 comments
Not sure how much effort that would be, but at the moment neither a Markdown LSP nor Helix itself seems to support autocompletion of citation keys when writing papers/notes in Markdown. Thus, this would be a great feature.
E.g. if the YAML header of an `md`/`qmd` file contains the line `bibliography: /path/to/bibfile.bib`, it would be great to be able to trigger autocompletion of the included bib entries when typing `@...`, with a preview of the related entry from the bib file.
Hi! It's not difficult to solve.
- please give me example `md`/`qmd` and `.bib` files, plus links to standards/specifications/etc. if you have some
- must the list of citations be scoped to each document by its `bibliography: file.bib` header? In other words, does each document have its own list of citations, rather than one list for all docs in the "project"?
Hi,
thanks for the response. I'll provide the files tomorrow if possible.
Best
A typical Markdown file could look like this (it doesn't matter whether the file extension is `md` or `qmd`), where the file path to the `.bib` file can be absolute or relative:
---
author: lukeflo
date: 2024-08-13
keywords: [tag1]
title: A simple note
pdf-engine: lualatex
cite-method: biblatex
bibliography: "~/Documents/notes-db/test.bib" # could also be surrounded by brackets instead of quotation marks
---
# Heading
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
::: {#refs}
:::
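Because the path may be absolute, relative, or start with `~`, a server has to resolve it before reading the file. Below is a minimal Rust sketch of such resolution. It assumes that `~/` means the home directory and that relative paths are resolved against the Markdown file's own directory. Pandoc itself may resolve them differently, for example against the working directory. `resolve_bib_path` is a hypothetical helper, not scls code.

```rust
use std::path::{Path, PathBuf};

/// Resolve a `bibliography:` value relative to the document declaring it.
/// Assumptions (hypothetical helper): `~/` expands to `home`, and relative
/// paths are anchored at the Markdown file's directory.
fn resolve_bib_path(raw: &str, doc_path: &Path, home: Option<&Path>) -> PathBuf {
    if let Some(rest) = raw.strip_prefix("~/") {
        if let Some(home) = home {
            return home.join(rest);
        }
        // Without a known home directory, fall through and treat the
        // value like any other relative path.
    }
    let p = Path::new(raw);
    if p.is_absolute() {
        return p.to_path_buf();
    }
    // Relative path: join it onto the document's directory.
    match doc_path.parent() {
        Some(dir) => dir.join(p),
        None => p.to_path_buf(),
    }
}
```

The home directory is passed in explicitly so the function stays deterministic. A caller could supply it with `std::env::var_os("HOME").map(PathBuf::from)`.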
The corresponding `.bib` file can contain multiple entries in BibTeX format. For example, a `test.bib` file:
@online{irfanullah_open_acces_global_south_2021,
author = {Irfanullah, Haseeb},
title = {{Open Access and Global South}},
subtitle = {It is More Than a Matter of Inclusion},
date = {2021-02-08},
urldate = {2024-08-04},
language = {english},
url = {https://web.archive.org/web/20240303223926/https://scholarlykitchen.sspnet.org/2021/01/28/open-access-and-global-south-it-is-more-than-a-matter-of-inclusion/},
}
@article{brainard_pay-to-publ_model_open_acces_2024,
author = {Brainard, Jeffrey},
title = {{Is the pay-to-publish model for open access pricing scientists
out?}},
journal = {American Association for the Advancement of Science},
volume = {385},
issue = {6708},
date = {2024-08-01},
urldate = {2024-08-04},
doi = {10.1126/science.zp80ua9},
}
@article{brembs_replacing_academic_journals_2023,
author = {Brembs, Björn and Huneman, Philippe and Schönbrodt, Felix and
Nilsonne, Gustav and Susi, Toma and Siems, Renke and Perakakis,
Pandelis and Trachana, Varvara and Ma, Lai and Rodriguez-Cuadrado,
Sara},
title = {Replacing academic journals},
year = {2023},
month = may,
doi = {10.5281/zenodo.7974116},
}
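The completion candidates are the citation keys that follow `@type{` in each entry. scls parses bib files with the `biblatex` crate, as noted later in the thread. The std-only scanner below is only a simplified, hypothetical illustration of what gets extracted. It assumes every entry is written as `@type{key, ...}` and it skips `@comment`, `@string` and `@preamble` blocks.

```rust
/// Naively collect citation keys from BibTeX/BibLaTeX source.
/// Simplification: an entry is `@type{key,` (or `@type(key,`); anything
/// else that contains `@` (e.g. inside a field value) is ignored.
fn extract_citekeys(src: &str) -> Vec<String> {
    let mut keys = Vec::new();
    for (i, _) in src.match_indices('@') {
        let rest = &src[i + 1..];
        // The entry type runs up to the opening brace or parenthesis.
        let Some(open) = rest.find(|c: char| c == '{' || c == '(') else { continue };
        let entry_type = rest[..open].trim().to_ascii_lowercase();
        if entry_type.is_empty()
            || !entry_type.chars().all(|c| c.is_ascii_alphabetic())
            || matches!(entry_type.as_str(), "comment" | "string" | "preamble")
        {
            continue;
        }
        // The key runs up to the first comma after the opening delimiter.
        let body = &rest[open + 1..];
        let Some(comma) = body.find(',') else { continue };
        let key = body[..comma].trim();
        if !key.is_empty() && !key.contains(char::is_whitespace) {
            keys.push(key.to_string());
        }
    }
    keys
}
```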
Generally, every Markdown file has its own header with its particular `bibliography:` keyword. The given path can be surrounded by quotes `"`, brackets `[...]`, or just be plain (more info).
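All three spellings can be normalized into a list of paths. The Rust sketch below does this. `parse_bibliography_value` and `unquote` are hypothetical helpers. They ignore YAML escape sequences and block-style (`- item`) lists, and in plain values they treat ` #` as the start of a YAML comment.

```rust
/// Turn the raw text after `bibliography:` into a list of paths.
/// Handles "quoted", [bracketed, 'list'] and plain values (sketch only).
fn parse_bibliography_value(raw: &str) -> Vec<String> {
    let v = raw.trim();
    // Flow-style list: [a.bib, "b.bib"]
    if let Some(inner) = v.strip_prefix('[') {
        let inner = inner.split(']').next().unwrap_or("");
        return inner
            .split(',')
            .map(|item| unquote(item.trim()))
            .filter(|s| !s.is_empty())
            .collect();
    }
    let single = if v.starts_with('"') || v.starts_with('\'') {
        unquote(v)
    } else {
        // Plain scalar: a " #" starts a YAML comment.
        v.split(" #").next().unwrap_or("").trim().to_string()
    };
    if single.is_empty() { Vec::new() } else { vec![single] }
}

/// Strip matching single or double quotes; keep text up to the closing quote.
fn unquote(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(q @ ('"' | '\'')) => chars.as_str().split(q).next().unwrap_or("").to_string(),
        _ => s.to_string(),
    }
}
```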
How it should work:
When writing in the body of a Markdown file like the one above, typing `@` at the beginning of a new word (i.e. after a space character) should trigger autocompletion of the citekeys from the given `.bib` file:
- For the example above, typing `@bra` should suggest `brainard_pay-to-publ_model_open_acces_2024` as a completion candidate, while typing only `@b` should suggest `brainard_pay-to-publ_model_open_acces_2024` and `brembs_replacing_academic_journals_2023`.
- When moving over a candidate via Tab, a preview window could pop up showing the particular source code from the `.bib` file. But that's optional.
- When the selected candidate is inserted with Enter, the citekey should be inserted with the following formatting: `[@brembs_replacing_academic_journals_2023]`. There are more options (e.g. adding pre- and postnotes, or inserting multiple entries), but for now, just autocompleting single entries like this would be really great. More info on Markdown citation syntax can be found in the Pandoc manual or the Quarto docs.
As an example of how it should behave, take a look at the GIF from PandocCiter for VSCode.
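The matching and insertion behaviour requested above comes down to a prefix filter plus a formatting step. The sketch below assumes case-sensitive prefix matching and a cap on the number of results. `matching_keys` and `citation_insert_text` are hypothetical names.

```rust
/// Keys that start with the typed prefix, capped at `limit` items.
fn matching_keys<'a>(keys: &'a [String], prefix: &str, limit: usize) -> Vec<&'a str> {
    keys.iter()
        .map(String::as_str)
        .filter(|k| k.starts_with(prefix))
        .take(limit)
        .collect()
}

/// Text inserted for an accepted candidate: Pandoc's bracketed citation.
fn citation_insert_text(key: &str) -> String {
    format!("[@{key}]")
}
```

If the user has already typed `[@`, inserting the full `[@key]` over the typed `@prefix` would leave a doubled `[`. A real server would therefore need to decide which range the completion replaces.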
@lukeflo please try the first attempt at this feature on the branch https://github.com/estin/simple-completion-language-server/tree/citation-keys
$ cargo install --branch citation-keys --git https://github.com/estin/simple-completion-language-server.git
and enable the feature in `languages.toml`:
[language-server.scls]
command = "simple-completion-language-server"
[language-server.scls.config]
max_completion_items = 20
snippets_first = true
feature_words = true
feature_snippets = true
feature_unicode_input = true
feature_citations = true # enable it <--
Use case video
scls-citation-2024-08-14_13.38.28.mp4
How it works
- it doesn't follow any specifications, file formats, etc. (I don't know whether that's needed or not)
- if completion is triggered by the editor with a prefix containing `@`, scls searches for `bibliography: ...` paths via regex, reads the file, parses it, and looks for citation keys starting with the prefix
- on each completion action it searches the paths, reads, parses, and finds keys again, so it still needs some optimizations
- bib files are parsed by https://crates.io/crates/biblatex
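One obvious optimization for the per-request re-reading described above is to cache the parsed keys per file and re-read a file only when its modification time changes. The sketch below shows this idea. `BibCache` is hypothetical and not part of scls. `parse` stands in for whatever bib parser is used, and `reads` counts actual file reads only for illustration.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical cache: parsed keys per .bib path, keyed by modification time.
struct BibCache {
    entries: HashMap<PathBuf, (SystemTime, Vec<String>)>,
    reads: usize, // number of real file reads, for illustration
}

impl BibCache {
    fn new() -> Self {
        BibCache { entries: HashMap::new(), reads: 0 }
    }

    /// Return cached keys, re-reading the file only if its mtime changed.
    fn keys(&mut self, path: &Path, parse: fn(&str) -> Vec<String>) -> io::Result<&[String]> {
        let mtime = fs::metadata(path)?.modified()?;
        let stale = match self.entries.get(path) {
            Some((cached, _)) => *cached != mtime,
            None => true,
        };
        if stale {
            let src = fs::read_to_string(path)?;
            self.reads += 1;
            self.entries.insert(path.to_path_buf(), (mtime, parse(&src)));
        }
        Ok(self.entries[path].1.as_slice())
    }
}
```

Filesystem timestamps can be coarse, so a change within the same timestamp tick could go unnoticed. Comparing the file size as well would be a cheap extra check.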
Currently it works, and I think it's useful as is.
Sorry for my poor English
Great. I'll try it ASAP and come back to you with some feedback. Don't know if I've time today, but will try.
PS: Your English is fine. I'm also not a native speaker... 😉
Hey, I was just able to test it out. One piece of good news, one bad (unfortunately, the bad one is much more relevant):
First the good thing:
If I test it with a very small example file, as you did in the video, it works great. E.g. with a file:
---
bibliography: "/home/lukeflo/Documents/notes-db/literatur-lukeflo.bib
---
Test @bra...
`@bra` triggers the autocompletion!
But:
If I try it in a larger file containing a relevant YAML header, some paragraphs, etc., it doesn't work. E.g. in a file like the following, `[@bra...]` won't trigger autocompletion:
---
date: 2023-11-11
title: FAIR Principles
bibliography: "/home/lukeflo/Documents/notes-db/literatur-lukeflo.bib"
---
# FAIR data
FAIR Guiding Principles should be applied to the workflows too:[@bra...]
> "Importantly, it is our intent that the principles apply not only to
> 'data' in the conventional sense, but also to the algorithms, tools,
> and workflows that led to that data."
I tried different dirs, relative to my bib file or elsewhere, but the only relevant factor seems to be the content of the Markdown file itself.
@lukeflo please try to debug it; I can't reproduce the bug.
- Install the new version of scls from the related branch (added more logs)
$ cargo install --branch citation-keys --git https://github.com/estin/simple-completion-language-server.git
- Ensure scls is configured to write its log to `/tmp/completion.log`
[language-server.scls]
command = "simple-completion-language-server"
[language-server.scls.config]
max_completion_items = 20
snippets_first = true
feature_words = true
feature_snippets = true
feature_unicode_input = true
feature_citations = true # enable it
# write logs to /tmp/completion.log
[language-server.scls.environment]
RUST_BACKTRACE = "1"
RUST_LOG = "debug,simple-completion-language-server=trace"
LOG_FILE = "/tmp/completion.log"
- Run helix with `hx -vvv /tmp/doc.md` and check the log files for error entries:
  - scls: `/tmp/completion.log`
  - helix: `~/.cache/helix/helix.log`
In `/tmp/completion.log` there should be something like:
2024-08-16T07:32:33.841972Z DEBUG simple_completion_language_server: Citation word_prefix: bra, chars_prefix: too:[@bra
2024-08-16T07:32:33.842012Z DEBUG simple_completion_language_server: Citation try to read: /tmp/literatur-lukeflo.bib
2024-08-16T07:32:33.842088Z DEBUG simple_completion_language_server: Citation from file: /tmp/literatur-lukeflo.bib prefix: bra key: irfanullah_open_acces_global_south_2021 - match: false
2024-08-16T07:32:33.842093Z DEBUG simple_completion_language_server: Citation from file: /tmp/literatur-lukeflo.bib prefix: bra key: brainard_pay-to-publ_model_open_acces_2024 - match: true
2024-08-16T07:32:33.842107Z DEBUG simple_completion_language_server: Citation from file: /tmp/literatur-lukeflo.bib prefix: bra key: brembs_replacing_academic_journals_2023 - match: false
2024-08-16T07:32:33.842214Z DEBUG simple_completion_language_server: completion request took 0ms with 1 result items
- Try to save the doc file or reopen it to reset the internal state of scls. There may be a bug in scls when processing incremental doc changes.
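The `chars_prefix: too:[@bra` log line shows that the `@` is not always preceded by a space. The sketch below extracts the partial key from the text before the cursor. It assumes a citation starts at the beginning of the line or after whitespace, `[` or `;`, and that keys consist of alphanumerics plus `_ - : . /`, a subset of what Pandoc allows. `citation_prefix` is a hypothetical name.

```rust
/// Return the partial citation key after a citation-starting `@`, given the
/// current line's text up to the cursor (sketch; trigger rules are assumed).
fn citation_prefix(before_cursor: &str) -> Option<&str> {
    let at = before_cursor.rfind('@')?;
    let prefix = &before_cursor[at + 1..];
    // Everything typed after `@` must look like part of a key.
    if !prefix.chars().all(|c| c.is_alphanumeric() || "_-:./".contains(c)) {
        return None;
    }
    // The character before `@` separates citations from e.g. e-mail addresses.
    match before_cursor[..at].chars().next_back() {
        None => Some(prefix),
        Some(c) if c.is_whitespace() || c == '[' || c == ';' => Some(prefix),
        _ => None,
    }
}
```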
Hey, thanks for the fast response.
My logging was already set up. I just ran it with the `-vvv` flag and a clean `completion.log` (I removed the older file before opening Helix). An error occurs when trying to insert a citation key, just after the first line from your example log:
2024-08-16T10:17:58.529684Z DEBUG simple_completion_language_server: Citation word_prefix: mbem, chars_prefix: @mbem
2024-08-16T10:17:58.529695Z  WARN simple_completion_language_server: Failed to repr slice as str
2024-08-16T10:17:58.529823Z DEBUG simple_completion_language_server: completion request took 0ms with 0 result items
The full log (tried two citations, `@mbem...` and `@bra...`):
completion.log
Inside the (much longer) `helix.log` I can't find an error message related to this use case. But I might have overlooked something, since I have no Rust knowledge and don't know which kind of message to look for.
Here is the full log:
helix.log
I've created an even simpler file:
---
title: A great test file
author: lukeflo
bibliography: "/home/lukeflo/Documents/notes-db/literatur-lukeflo.bib"
---
# Heading
Lorem ipsum odor amet, consectetuer adipiscing elit. Tristique hendrerit
faucibus elementum sapien euismod gravida hendrerit orci. Litora litora
sociosqu torquent dignissim tortor a. Curae porttitor penatibus lorem odio
nisi. Sapien aliquam varius curabitur imperdiet in tincidunt. Ac bibendum
aenean dis vivamus sem purus cras eget. Tortor fermentum quam sodales sit ut in
neque. Duis mauris varius habitant mollis sollicitudin gravida ullamcorper. Est
potenti nec facilisi posuere arcu velit dictum lobortis. Tortor efficitur morbi
vitae in orci nibh ullamcorper habitant ex. Porta penatibus morbi odio magnis
volutpat felis felis tristique. @mbembe
Nisl nibh amet nam nascetur auctor. Euismod blandit ultrices litora conubia hac
habitant egestas. Tortor ut pretium cubilia litora parturient hendrerit nibh
posuere. Vel nam sed mollis sit molestie congue magnis lorem. Ipsum elementum
eget efficitur accumsan dis scelerisque. Donec velit volutpat ultrices purus
condimentum suscipit. Morbi elementum est bibendum; aliquam phasellus netus
diam in. Tempus et scelerisque dignissim lacinia pulvinar nunc. Curabitur magna
curae arcu; donec nullam tempus. Placerat habitant commodo finibus vel ex.
Cubilia metus eget primis venenatis metus ante. Tincidunt rutrum ante; class
montes aliquet odio consequat vivamus. Fames condimentum vivamus conubia nisi
diam porta hendrerit. Lectus neque felis rhoncus commodo quis cursus phasellus
pharetra. Purus finibus duis fringilla faucibus quam phasellus curabitur.
[@bra]
It's still only working with the short example from my post above. There can't be a typo or anything like that, since I copied the working short example and just extended it with the "Lorem Ipsum" text and some additional YAML fields.
@lukeflo you're right! The file size triggered a bug in the Rope logic (the internal scls text buffer). Reproduced and fixed.
@estin thanks once again for your fast reply. I just built the branch and now it works better, but unfortunately there are still some drawbacks.
First the good news: the longer "Lorem ipsum" example from my last comment now works... most of the time. But there seems to be a problem if the entered characters can match a citekey as well as a simple text completion of a word already typed in the buffer. For example, my bib file contains the key `grandsire_the_metafonttutorial_2004`, but the lorem ipsum text also contains the word `gravida`. Thus, when I type `@gra`, it only suggests the in-buffer word `gravida`, but not the key.
As long as I only type `@gr`, it matches:
2024-08-16T18:58:28.253348Z DEBUG simple_completion_language_server: Citation from file: test.bib prefix: gr key: grandsire_the_metafonttutorial_2004 - match: true
But when I add the `a`, only the in-buffer completion is shown as a candidate. And the log shows no entry for `@gra` or `prefix: gra`, as it did for `@gr`:
completion-lorem.log
Now, when I try it with an even bigger file, it's still not working at all. At the moment, for example, I'm writing a scientific paper on my current research. The text already runs to multiple A4 pages. If I open the respective Markdown file, which also contains a YAML header with more than 20 lines, and try to trigger the citekey completion somewhere in a paragraph, nothing happens.
The log doesn't even show `true`/`false` matches, as it did for the lorem example:
2024-08-16T19:02:54.841446Z DEBUG simple_completion_language_server: Citation word_prefix: gr, chars_prefix: @gr
2024-08-16T19:02:54.842162Z DEBUG simple_completion_language_server: Citation try to read: papersiz
2024-08-16T19:02:54.842182Z ERROR simple_completion_language_server: Failed to read file papersiz: No such file or directory (os error 2)
2024-08-16T19:02:54.843571Z DEBUG simple_completion_language_server: completion request took 2ms with 2 result items
Full file here:
completion-paper.log
Sorry for bothering you with this stuff. It would be totally OK if you have other things to do 😉
@lukeflo please try the new update: citation completion is no longer mixed with word completion.
In the logs for your bigger file, the file `papersiz` was not found:
Failed to read file papersiz: No such file or directory (os error 2)
completion request took 2ms with 2 result items
Please send the value of the `bibliography:` line in your bigger file.
Maybe the regex that extracts the file path is invalid.
I'll have a look asap. But probably not before tomorrow...
Hey, I finally had time to test it. Sorry for the wait.
The first issue is solved. As you say, citation completion is not interacting with word completion anymore. Great!
The `papersize` part in the log file for my article comes from the bigger YAML header of the file. The lines surrounding the `bibliography:` keyword are the following:
---
# some more lines
header-includes: |
  \setlist{nosep}
  \usepackage{blindtext}
  \DeclareFieldFormat[online]{shorthand}{\texttt{#1}}
  \newcommand{\origunderscore}{}
  \let\origunderscore\_
  \renewcommand{\_}{\allowbreak\origunderscore}
  \usepackage[htt]{hyphenat}\usepackage{emptypage}
  \setcounter{secnumdepth}{0}
bibliography: "/home/lukeflo/Documents/notes-db/literature-lukeflo.bib"
papersize: a4
urlcolor: articlecolour
---
The syntax is fine, as is the file path of the bibliography. I know this because I can process the document with `pandoc` without getting any error messages.
And again, please try the new update.
The bug was in extracting the path from the captured span.
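A more conservative extraction could avoid regex spans entirely. The sketch below looks only inside the front matter and treats only unindented lines as keys, so lines in block scalars such as `header-includes: |` are skipped. `front_matter_bibliography` is hypothetical. It returns the raw value, which still needs unquoting.

```rust
/// Raw value of the top-level `bibliography:` key inside a `---` front matter
/// block (sketch; indented lines are treated as block-scalar continuations).
fn front_matter_bibliography(doc: &str) -> Option<&str> {
    let mut lines = doc.lines();
    if lines.next()?.trim_end() != "---" {
        return None; // document has no front matter
    }
    for line in lines {
        let trimmed = line.trim_end();
        if trimmed == "---" || trimmed == "..." {
            break; // end of front matter
        }
        if line.starts_with(char::is_whitespace) {
            continue; // continuation of e.g. `header-includes: |`
        }
        if let Some(value) = line.strip_prefix("bibliography:") {
            return Some(value.trim());
        }
    }
    None
}
```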
It works! As far as I can see, it now works in all circumstances I've tested. GREAT, thank you!
Nice! Next week I will merge this feature into the master branch and make some changes to the tests.
How do you deal with spelling in Helix? Which tools do you use?
One minor thing that could be improved is the preview of the selected entry. Right now it is not very consistent. Sometimes the highlighting changes and sometimes it does not show the whole entry, especially with longer entries.
It's not a big deal, as I personally know most of my bibliographic entries well enough to identify them by citekey alone. But if someone uses a different scheme or is not as familiar with the database, they might have trouble recognizing which entry is selected.
The best use case would be to extract the value of the author/editor, the title, and the year/date field and only preview those values, for example, with the following format:
An author, Some kind of title, 2020
But that's fully optional, since it already works very well!
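Once the fields are available, the proposed preview is simple string formatting. The sketch below assumes the author, title, and date fields have already been extracted as plain strings. It shows only the first author's family name plus "et al.", which is a design choice of this sketch and not part of the request. `short_label` is a hypothetical name.

```rust
/// Compact "author, title, year" label for a completion item (sketch).
fn short_label(author: &str, title: &str, date: &str) -> String {
    // Collapse line breaks inside multi-line author fields.
    let author = author.split_whitespace().collect::<Vec<_>>().join(" ");
    // BibTeX separates multiple authors with " and ".
    let mut names = author.split(" and ");
    let first = names.next().unwrap_or("").trim();
    // "Family, Given" -> "Family"; other name forms are kept whole.
    let family = first.split(',').next().unwrap_or(first).trim();
    let who = if names.next().is_some() {
        format!("{family} et al.")
    } else {
        family.to_string()
    };
    // Keep only the year of an ISO date such as 2021-02-08.
    let year = date.split('-').next().unwrap_or(date);
    // Drop the extra braces BibTeX uses to protect capitalization.
    let title = title.trim_matches(|c: char| c == '{' || c == '}');
    format!("{who}, {title}, {year}")
}
```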
> How do you deal with spelling in Helix? Which tools do you use?
What do you mean exactly? Spell checking grammar or code?
> The best use case would be to extract the value of the author/editor, the title, and the year/date field and only preview those values, for example, with the following format:
It's easy to do.
> What do you mean exactly? Spell checking grammar or code?
Grammar in text docs, and in code docs such as string literals and comments.
I currently use `typos` for code, but want robust spell checking for note taking.
Nothing specific right now, since I'm still setting up Helix to work the way I need. I only recently switched from Emacs.
For prose there is `vale`, but I haven't tested it so far. `typos` also looks good. I have to try both.
Just tried `ltex-ls`. It works really well and detects many typing and spelling errors in prose text, at least in German.
Not much is needed, only a simple setup in `languages.toml`:
#ltex-ls
[language-server.ltex]
command = "/home/lukeflo/Documents/packages/ltex-ls-16.0.0/bin/ltex-ls"
[[language]]
name = "markdown"
roots = [".marksman.toml"]
language-servers = [ "marksman", "markdown-oxide", "scls", "ltex" ]
After setting it up, you can correct the misspelling under the cursor with `space a`.
- fixed the default regex used to extract the bib file path
- item highlighting (documentation, in LSP terms) is now formatted like
# {title:?}\n*{authors}*\n\n{entry_type}, {date}
- merged into the main branch
- to enable the feature, scls must be compiled with `--features citation`:
cargo install --features citation --git https://github.com/estin/simple-completion-language-server.git
@estin Sorry for the delay, I was busy the last few days.
I just built the updated main branch with your `--features` flag. Works great! And looks great! I'm already using your LSP on a daily basis and think others will appreciate this feature too, since no other Helix plugin/extension handles Markdown citations yet.
I guess this is done and can be closed!