bids-standard/bids-specification

Allow CITATION.cff as alternative to Authors field in dataset_description

Remi-Gau opened this issue · 24 comments

CITATION.cff can be used for citing software or datasets.

Would it make sense to officially allow them in a BIDS dataset? What do you all think?

Its content would be in part redundant with dataset_description and thus might require validation for internal consistency.


Links

https://citation-file-format.github.io/

https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files

https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files#citing-a-dataset

tsalo commented

Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

I thought that is used for software only? I.e., we should have one in our BIDS repo.

created an example

https://github.com/Remi-Gau/cff_example_data

YOUR_NAME_HERE, Y., & Lisa, M. (2021). cff_example_data (Version 1.0.0) [Data set]. https://doi.org/10.5281/zenodo.1234

@misc{YOUR_NAME_HERE_cff_example_data_2021,
author = {YOUR_NAME_HERE, YOUR_NAME_HERE and Lisa, Mona},
doi = {10.5281/zenodo.1234},
month = {10},
title = {{cff_example_data}},
url = {https://github.com/Remi-Gau/cff_example_data},
year = {2021}
}

> Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

Testing things here

https://github.com/Remi-Gau/cff_example_software

YOUR_NAME_HERE, Y., & Lisa, M. (2021). cff_example_software (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.1234

@software{YOUR_NAME_HERE_cff_example_software_2021,
author = {YOUR_NAME_HERE, YOUR_NAME_HERE and Lisa, Mona},
doi = {10.5281/zenodo.1234},
month = {10},
title = {{cff_example_software}},
url = {https://github.com/Remi-Gau/cff_example_software},
version = {1.0.0},
year = {2021}
}

> Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

Updated the software example to use the preferred citation feature.
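
For anyone who has not seen the preferred citation feature, here is a rough sketch of the shape such a file takes, written as a Python dictionary that mirrors the YAML keys. The key names (`cff-version`, `message`, `type`, `title`, `authors`, `preferred-citation`) follow CFF 1.2.0 as I understand it; every value, including the data paper, is a made-up placeholder, and `cited_work` is a hypothetical helper, not part of any CFF tool.

```python
# Hypothetical content: only the key names follow CFF 1.2.0.
citation = {
    "cff-version": "1.2.0",
    "message": "If you use this dataset, please cite the data paper.",
    "type": "dataset",
    "title": "cff_example_data",
    "version": "1.0.0",
    "authors": [{"family-names": "Lisa", "given-names": "Mona"}],
    # When this key is present, the paper is cited instead of the dataset entry.
    "preferred-citation": {
        "type": "article",
        "title": "A placeholder data paper",
        "authors": [{"family-names": "Lisa", "given-names": "Mona"}],
        "journal": "Placeholder Journal",
        "year": 2021,
    },
}


def cited_work(cff: dict) -> dict:
    """Return the preferred citation if there is one, else the top-level entry."""
    return cff.get("preferred-citation", cff)


print(cited_work(citation)["title"])
```

So the versioned dataset and the data paper can live in the same file: the top level describes the dataset, and `preferred-citation` points at the paper.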

ok @Remi-Gau smarty pants you win :-)
so it's all possible - the questions are

  • what is the advantage over the current solution (all in dataset_description right?)
  • what is the technical support needed

> ok @Remi-Gau smarty pants you win :-) so it's all possible - the questions are
>
> * what is the advantage over the current solution (all in dataset_description right?)

Their schema does offer a few things we don't have.
https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#index

Could also allow a "division of labor": typical dataset info goes in CITATION.cff, BIDS-specific info goes in dataset_description.

This could also potentially integrate better with other non-BIDS tools and services (at the moment "only" GitHub, Zenodo, Zotero).

FYI I am not really convinced that this should be done. Just wanted to start a conversation to weigh the pros and cons. (And advertise CFF files in case it could interest people for other things).

> * what is the technical support needed

There is a Python validator for those files, and there is already a JSON schema that could be used for other validations.

https://github.com/citation-file-format/citation-file-format/blob/main/README.md#validation-heavy_check_mark

From the BIDS perspective we would have to ensure consistency between dataset_description and those .cff files.

My personal opinion on this is that we should wait how CITATION.CFF develops in the next months / year / years and then revive the discussion. If we see that it becomes very important and widespread (which I hope it does), we should officially adopt it. Until then, users can add it, and bids-ignore it ... as is already done for many BIDS datasets on GIN and the datacite.yml file there. E.g., https://gin.g-node.org/sappelhoff/mpib_sp_eeg/

Until then, one could also write a dataset_description.json to CITATION.CFF converter. I think I recently saw such a converter from BIDS to datacite.yml on Twitter. @adswa might know more about that :-)
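
A minimal sketch of what such a converter could look like, assuming the dataset description has already been loaded into a dictionary. The function names (`split_name`, `description_to_cff`) are hypothetical, the name splitting is a naive heuristic, and the default `message` text is a placeholder; only the CFF key names are taken from CFF 1.2.0.

```python
def split_name(full_name: str) -> dict:
    """Naively split 'Given Family' or 'Family, Given' into CFF name parts."""
    if "," in full_name:
        family, given = (part.strip() for part in full_name.split(",", 1))
    else:
        # Treat the last word as the family name (an assumption that will
        # be wrong for many real names).
        given, _, family = full_name.strip().rpartition(" ")
    person = {"family-names": family}
    if given:
        person["given-names"] = given
    return person


def description_to_cff(description: dict) -> dict:
    """Map a few dataset_description.json fields onto CFF 1.2.0 keys."""
    cff = {
        "cff-version": "1.2.0",
        "message": description.get(
            "HowToAcknowledge", "If you use this dataset, please cite it."
        ),
        "type": "dataset",
        "title": description["Name"],
        "authors": [split_name(a) for a in description.get("Authors", [])],
    }
    doi = description.get("DatasetDOI", "")
    # Strip common URI prefixes so only the bare identifier remains.
    for prefix in ("doi:", "https://doi.org/"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    if doi:
        cff["doi"] = doi
    return cff
```

Note that CFF needs at least one author, so a description without `Authors` would still produce an incomplete file, and `License` is left out here because the two formats do not necessarily accept the same values.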

> My personal opinion on this is that we should wait how CITATION.CFF develops in the next months / year / years and then revive the discussion.

yup I think that sums up why this is not a hill I want to die on just yet.

but we could still use one inside https://github.com/bids-standard/bids-specification with all relevant publications :-) so it renders nice on github (ie we don't support it for datasets, but use it for the repo)

Agreed!

I suggest we revisit this for the BIDS repo after the steering group election, because we'll have updated our list of contributors by then. A .cff file could also help us do that, but it will have to take into account the suggestions from #66 and #627.

adswa commented

> I think I recently saw such a converter from BIDS to datacite.yml on Twitter. @adswa might know more about that :-)

@christian-monch wrote one during a hackathon, I believe the most recent state of it can be found here :-)

I had forgotten about this WIP when I started creating a package to streamline the creation of datacite.yml files for BIDS datasets...

https://github.com/Remi-Gau/bids2cite

@Remi-Gau: Should BIDS support CITATION.cff files?

Yes.

@CPernet: what is the advantage over the current solution (all in dataset_description right?)

The Authors list is just a list of strings. There is a lot more nuance to authorship than just a name. Like a whole file-format's worth! And GitHub, Zenodo, and Zotero all support CITATION.cff. And there is a user-friendly tool to make CITATION.cff files.

@CPernet: what is the technical support needed

  1. A PR to the BIDS Specification to include language about using either a CITATION.cff or the Authors list, but not both.
  2. Work on the validator (I do not know how or what exactly) to say one or the other is allowed, but not both.

@Remi-Gau: FYI I am not really convinced that this should be done. Just wanted to start a conversation to weigh the pros and cons.

I think this should be done. The pros seem to outweigh the cons.

@sappelhoff commented on Oct 21, 2021
My personal opinion on this is that we should wait how CITATION.CFF develops in the next months / year / years and then revive the discussion.

It's been years and it looks good to me!

> 2. Work on the validator (I do not know how or what exactly) to say one or the other is allowed, but not both.

In the schema, we would write a rule like:

SingleSourceAuthors:
  issue:
    code: AUTHORS_AND_CITATION_FILE_MUTUALLY_EXCLUSIVE
    level: error
    message: |
      CITATION.cff file found. The "Authors" field of dataset_description.json
      should be removed to avoid inconsistency.
  selectors:
    - path == 'CITATION.cff'
  checks:
    - '!("Authors" in dataset_description)'

I would not be inclined to also implement this in the legacy validator.
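
To spell out what the rule above is meant to do, here is the same logic as a plain Python function. This is only an illustration: `check_single_source_authors` and its arguments are hypothetical names, and the real validator would evaluate the schema expressions rather than code like this.

```python
def check_single_source_authors(root_files, dataset_description):
    """Return a list of issues; the list is empty when the rule is satisfied."""
    issues = []
    # Selector: the rule only applies when CITATION.cff exists at the root.
    if "CITATION.cff" in root_files:
        # Check: "Authors" must not also be present in dataset_description.json.
        if "Authors" in dataset_description:
            issues.append(
                {
                    "code": "AUTHORS_AND_CITATION_FILE_MUTUALLY_EXCLUSIVE",
                    "level": "error",
                }
            )
    return issues


files = ["dataset_description.json", "CITATION.cff"]
print(check_single_source_authors(files, {"Name": "demo", "Authors": ["Mona Lisa"]}))
```

A dataset with only one of the two sources passes: either no CITATION.cff at all, or a CITATION.cff with no `Authors` field in the description.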

Unfortunately, CFF does not have a JavaScript validator, just Python. They do share JSON schemas though, so it wouldn't be awful to validate ourselves: https://github.com/citation-file-format/cff-converter-python/tree/main/cffconvert/schemas
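
As a very rough idea of where validating ourselves could start, the sketch below checks only the top-level keys that I understand CFF 1.2.0 to require. A real implementation would run the published JSON schema through a proper schema validator instead. `REQUIRED_KEYS` and `missing_required_keys` are hypothetical names.

```python
# Top-level keys assumed to be required by the CFF 1.2.0 schema.
REQUIRED_KEYS = ("authors", "cff-version", "message", "title")


def missing_required_keys(cff: dict) -> list:
    """Return the required top-level keys that are absent from a parsed CFF file."""
    return [key for key in REQUIRED_KEYS if key not in cff]


print(missing_required_keys({"cff-version": "1.2.0", "title": "demo"}))
```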

nellh commented

I agree this change would be very helpful for including more complete authorship information in BIDS datasets. It's an issue for OpenNeuro and a BIDS solution would let us add this to datasets in a way that allowed for reuse.

> Unfortunately, CFF does not have a Javascript validator, just Python. They do share JSON schemas though, so it wouldn't be awful to validate ourselves: https://github.com/citation-file-format/cff-converter-python/tree/main/cffconvert/schemas

The CFF Initializer tool @ericearl mentioned has a simple JavaScript validator implementation. https://github.com/citation-file-format/cff-initializer-javascript/blob/main/src/store/validation.ts

I had worked on a little package to help create citation files for BIDS datasets because they can also be ingested by datalad metadata tools.

Having the citation file take precedence, and not having to sync it with the dataset description, would make things even easier.

https://github.com/Remi-Gau/bids2cite

Looking at https://github.com/citation-file-format/citation-file-format/blob/main/README.md, we have additional overlaps with dataset_description.json:

| BIDS | CFF |
| --- | --- |
| HowToAcknowledge | message/preferred-citation |
| Name | title |
| Authors | authors |
| Version | version |
| ReferencesAndLinks | references |
| DatasetDOI | doi |
| License | license |

We may want to make more than just authors mutually exclusive with CITATION.cff. I think at least for name and version we should probably just duplicate and validate identity.
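
A minimal sketch of the "duplicate and validate identity" option, assuming both files have already been parsed into dictionaries. `SHARED_FIELDS` and `find_mismatches` are hypothetical names, and only the Name/title pair is compared here; other pairs could be added to the mapping if their values turn out to be directly comparable.

```python
# BIDS field -> CFF key, for fields expected to hold the same string in both files.
SHARED_FIELDS = {"Name": "title"}


def find_mismatches(description: dict, cff: dict) -> list:
    """Return (bids_field, cff_key) pairs that are set in both files but differ."""
    mismatches = []
    for bids_field, cff_key in SHARED_FIELDS.items():
        if bids_field in description and cff_key in cff:
            if description[bids_field] != cff[cff_key]:
                mismatches.append((bids_field, cff_key))
    return mismatches


print(find_mismatches({"Name": "My dataset"}, {"title": "my_dataset"}))
```

A field that appears in only one of the two files is not reported, since the point is to catch contradictions rather than to force duplication.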

Also, authors have no role at this point (citation-file-format/citation-file-format#112). While highly desirable, this is also not currently possible in BIDS, so CITATION.cff is still an upgrade.

Wait... Technically we only have BIDSVersion in dataset_description and not a dataset version, right?
The only "trace" of the version of the dataset is in the changelog, if it is present. Or maybe I missed it somewhere else?
So in that sense CITATION.cff would actually add a way to track this.

Ah, sorry, I didn't actually look it up. I guess I was thinking of it being part of DOIs in many cases.

Contribution roles will be included in the next release!

citation-file-format/citation-file-format#112 (comment)

I saw that and got all excited about it!

Please see #1525 for proposed text and validation rules.