inbo/checklist

Should we check spelling?

Closed this issue · 5 comments

The spelling package has spell_check_package() and spell_check_files(). It allows to add use a custom WORDLIST in a project to handle false positives.

Should we implement automated spell checking by checklist? It should be rather straight forward for packages. spell_check_package() picks the language based on the setting in DESCRIPTION. So checking a non-English package is not a problem.

Automated checking non-package repositories would be a bit more work. I think we should only handle .Rmd and .md files. Something to tackle would be on how to specify the language. And how to handle repositories with multiple languages (e.g. README.md in English and Rmd files in Dutch).

If we want to have our package published in Zenodo, we have to give the language code in ISO 639-3 format in the DESCRIPTION, and apparently this spell check feature requires RFC 5646 format. Will it be possible to combine both in checklist, and how will you implement this?

Also: what about multiple languages in packages? In Dutch packages, I tend to add the package description and the README.md in English (and if appropriate also an English vignette) to give foreign language speakers the chance to at least know what the package is about. Will it be possible to use checklist for future packages with English README.md and DESCRIPTION that have to be in Dutch on request of the client? (I don't need all spelling to be checked, but it would be a pity if using checklist wouldn't be possible because of a spell check, I think.)

In general, I think a spell check could be a nice to have, that should only be implemented in checklist if it doesn't cause problems for other functionality, or for users that use other languages (Dutch, and maybe French). Personally I wouldn't mind if only English language is checked, as long as we can still use checklist for its other functionalities in packages on other languages. (In non-package repositories, I guess the use of Dutch is more relevant to be implemented as well.)

56aa3d4 allows both ISO 639-3 and RFC 5646 formats in the DESCRIPTION. And it translates RFC 5646 into ISO 639-3 in .zenodo.json.

Maybe use an optional spellcheck.yml where the user can list the default language, files with a different language or files to exclude from spell check. Multiple languages within a file seems like a bridge to far at the moment.

The only requirement for other languages than British and US English, is that you need to install the dictionary. We could install Dutch and French in the Docker. Other languages can be installed via the aptget option in the checklist GHA.

Could this be an 'opt-in' feature? I'm a bit concerned about the false positives and the time / effort it will take to create a whitelist. Also, RStudio already has spell-checking. That is sufficient for me, although - like other word processing tools - it doesn't force you to do something with misspelled words. I don't know if the cost-benefits of adding this to checklist really are worth it - both with respect to you as the maintainer of the package and to the users of the package.

I'm a bit concerned about the false positives

I had the same thought while following the issue, although I think this tradeoff is personal. What is an annoyance for one person, may be helpful to another. In a reviewing context, not needing to manually correct typos from an author is a bonus for the reviewer. However don't expect too much of a spell checker alone; there's still grammar as well 🙂.

The case of a project context (reports, papers) may be even more relevant than the package case, although for packages correct language is especially relevant in the documentation (while reading some text, my focus does get distracted in case of language errors).

However, the more an author spell-checks (re-reads) his/her own writings 'by nature', the more this author is annoyed by spell-checkers due to false positives. This may lead him/her to disabling the spell check—so yes I agree this should not be something too insisting. E.g. I tend to 'allow' a spell-checker only in case it is super-easy (e.g. right-clicking) to add words to the custom dictionary, and if this custom dictionary applies in a cross-project manner while still machine or user specific.

Maybe use an optional spellcheck.yml where the user can list the default language, files with a different language or files to exclude from spell check. Multiple languages within a file seems like a bridge to far at the moment.

This seems indeed a practical solution from the point of view of a package (or repo) maintainer, but it sounds a lot of work for you to implement. And then I wonder: is it worth the effort, or are their more essential jobs to tackle (first)? On the whole I think the spell check is a nice to have, but it doesn't seem 'essential', so it shouldn't require too much effort, not to implement, and not for the user.

Multiple languages within a file is maybe not necessary to implement? It would be nice if lines (or files?) could be marked as 'do not check spelling' to ignore citations in other languages. But I wouldn't care if not all text blocks were checked for spelling, when using multiple languages in one file. I would already be happy to be able to pass the checklist tests without difficulties, with or without a spell check of the multiple languages file.