Allow CITATION.cff as alternative to Authors field in dataset_description
Remi-Gau opened this issue · 24 comments
CITATION.cff can be used for citing software or datasets.
Would it make sense to officially allow it in a BIDS dataset? What do you all think?
Its content would partly duplicate dataset_description and might therefore require validation for internal consistency.
Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?
i thought that is used for software only? ie we should have one in our BIDS repo
> i thought that is used for software only? ie we should have one in our BIDS repo

created an example
https://github.com/Remi-Gau/cff_example_data
YOUR_NAME_HERE, Y., & Lisa, M. (2021). cff_example_data (Version 1.0.0) [Data set]. https://doi.org/10.5281/zenodo.1234
@misc{YOUR_NAME_HERE_cff_example_data_2021,
  author = {YOUR_NAME_HERE, YOUR_NAME_HERE and Lisa, Mona},
  doi = {10.5281/zenodo.1234},
  month = {10},
  title = {{cff_example_data}},
  url = {https://github.com/Remi-Gau/cff_example_data},
  year = {2021}
}
> Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

Testing things here
https://github.com/Remi-Gau/cff_example_software
YOUR_NAME_HERE, Y., & Lisa, M. (2021). cff_example_software (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.1234
@software{YOUR_NAME_HERE_cff_example_software_2021,
  author = {YOUR_NAME_HERE, YOUR_NAME_HERE and Lisa, Mona},
  doi = {10.5281/zenodo.1234},
  month = {10},
  title = {{cff_example_software}},
  url = {https://github.com/Remi-Gau/cff_example_software},
  version = {1.0.0},
  year = {2021}
}
> Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

Updated the software example to use the preferred citation feature.
ok @Remi-Gau smarty pants you win :-)
so it's all possible - the questions are
- what is the advantage over the current solution (all in dataset_description right?)
- what is the technical support needed
@tsalo It seems you can have several identifiers.
https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#identifiers
> ok @Remi-Gau smarty pants you win :-) so it's all possible - the questions are
> * what is the advantage over the current solution (all in dataset_description right?)

Their schema does offer a few things we don't have.
https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#index
Could also allow a "division of labor": typical dataset info goes in CITATION.cff, BIDS specific info goes in dataset.
This could also potentially better integrate with other non-BIDS tools and services (at the moment "only" github, zenodo, zotero).
FYI I am not really convinced that this should be done. Just wanted to start a conversation to weigh the pros and cons. (And advertise CFF files in case it could interest people for other things).
> * what is the technical support needed

There is a Python validator for those files, and there is already a JSON schema that could be used for other validations.
From the BIDS perspective we would have to ensure consistency between dataset_description and those .cff files.
My personal opinion on this is that we should wait to see how CITATION.CFF develops in the next months / year / years and then revive the discussion. If we see that it becomes very important and widespread (which I hope it does), we should officially adopt it. Until then, users can add it and bids-ignore it, as is already done with the datacite.yml file in many BIDS datasets on GIN. E.g., https://gin.g-node.org/sappelhoff/mpib_sp_eeg/
Until then, one could also write a dataset_description.json to CITATION.CFF converter. I think I recently saw such a converter from BIDS to datacite.yml on Twitter. @adswa might know more about that :-)
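
For what it's worth, such a converter could be quite small. Below is a minimal sketch in Python, not an existing tool: the function names `split_name`, `description_to_cff`, and `to_yaml` are made up for illustration, splitting author strings on the last space is a naive assumption (BIDS stores authors as free-form strings), and only `Name`, `Authors`, and `HowToAcknowledge` are mapped.

```python
import json


def split_name(full_name):
    """Naively split 'Given Names Family' on the last space (an assumption)."""
    parts = full_name.strip().rsplit(" ", 1)
    if len(parts) == 1:
        # Single token: fall back to CFF's entity-style "name" key.
        return {"name": parts[0]}
    return {"given-names": parts[0], "family-names": parts[1]}


def description_to_cff(description):
    """Map a parsed dataset_description.json (a dict) to a CFF-like dict."""
    authors = description.get("Authors", [])
    if not authors:
        # CFF expects at least one author, so refuse to guess.
        raise ValueError("dataset_description has no Authors to convert")
    return {
        "cff-version": "1.2.0",
        "message": description.get(
            "HowToAcknowledge",
            "If you use this dataset, please cite it as below.",
        ),
        "type": "dataset",
        "title": description["Name"],
        "authors": [split_name(author) for author in authors],
    }


def to_yaml(cff):
    """Tiny YAML emitter for the flat structure built above.

    Strings are quoted with json.dumps, whose output is also a valid
    YAML double-quoted scalar. This is not a general YAML writer.
    """
    lines = []
    for key, value in cff.items():
        if key == "authors":
            lines.append("authors:")
            for author in value:
                items = list(author.items())
                first_key, first_value = items[0]
                lines.append(f"  - {first_key}: {json.dumps(first_value)}")
                for other_key, other_value in items[1:]:
                    lines.append(f"    {other_key}: {json.dumps(other_value)}")
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"
```

Calling `to_yaml(description_to_cff(json.load(open("dataset_description.json"))))` would give text to write to CITATION.cff. Name splitting would still need a manual check, since "family name last" does not hold for every author.
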
> My personal opinion on this is that we should wait to see how CITATION.CFF develops in the next months / year / years and then revive the discussion.

yup I think that sums up why this is not a hill I want to die on just yet.
but we could still use one inside https://github.com/bids-standard/bids-specification with all relevant publications :-) so it renders nice on github (ie we don't support it for datasets, but use it for the repo)
> but we could still use one inside https://github.com/bids-standard/bids-specification with all relevant publications :-) so it renders nice on github (ie we don't support it for datasets, but use it for the repo)

Agreed!
> I think I recently saw such a converter from BIDS to datacite.yml on Twitter. @adswa might know more about that :-)

@christian-monch wrote one during a hackathon, I believe the most recent state of it can be found here :-)
> @christian-monch wrote one during a hackathon, I believe the most recent state of it can be found here :-)

I had forgotten about this WIP when I started creating a package to streamline the creation of datacite.yml files for BIDS datasets...
@Remi-Gau: Should BIDS support CITATION.cff files ?
Yes.
@CPernet: what is the advantage over the current solution (all in dataset_description right?)
The Authors list is just a list of strings. There is a lot more nuance to authorship than just a name. Like a whole file-format's worth! And GitHub, Zenodo, and Zotero are supporting CITATION.cff. And there is a user-friendly tool to make CITATION.cff files.
@CPernet: what is the technical support needed
- A PR to the BIDS Specification to include language about using either a CITATION.cff or the Authors list, but not both.
- Work on the validator (I do not know how or what exactly) to say one or the other is allowed, but not both.
@Remi-Gau: FYI I am not really convinced that this should be done. Just wanted to start a conversation to weigh the pros and cons.
I think this should be done. The pros seem to outweigh the cons.
@sappelhoff commented on Oct 21, 2021
> My personal opinion on this is that we should wait to see how CITATION.CFF develops in the next months / year / years and then revive the discussion.

It's been years and it looks good to me!
> - Work on the validator (I do not know how or what exactly) to say one or the other is allowed, but not both.

In the schema, we would write a rule like:
SingleSourceAuthors:
  issue:
    code: AUTHORS_AND_CITATION_FILE_MUTUALLY_EXCLUSIVE
    level: error
    message: |
      CITATION.cff file found. The "Authors" field of dataset_description.json
      should be removed to avoid inconsistency.
  selectors:
    - path == 'CITATION.cff'
  checks:
    - '!("Authors" in dataset_description)'
I would not be inclined to also implement this in the legacy validator.
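
To make the intent of that rule concrete outside the schema language, here is a rough Python equivalent. It is only a sketch: `check_single_source_authors` and its arguments are hypothetical names, and it assumes the caller already has the list of file names at the dataset root and the parsed dataset_description.json.

```python
RULE_CODE = "AUTHORS_AND_CITATION_FILE_MUTUALLY_EXCLUSIVE"


def check_single_source_authors(root_files, dataset_description):
    """Return a list of issues; the list is empty when the rule is satisfied.

    root_files: file names found at the dataset root (assumed input).
    dataset_description: parsed dataset_description.json as a dict.
    """
    # Selector: the rule only applies when CITATION.cff is present.
    if "CITATION.cff" not in root_files:
        return []
    # Check: "Authors" must not also be defined in dataset_description.
    if "Authors" in dataset_description:
        return [
            {
                "code": RULE_CODE,
                "level": "error",
                "message": (
                    'CITATION.cff file found. The "Authors" field of '
                    "dataset_description.json should be removed to avoid "
                    "inconsistency."
                ),
            }
        ]
    return []
```

A dataset with only one of the two sources passes, and so does a dataset with neither; only the combination is flagged.
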
Unfortunately, CFF does not have a Javascript validator, just Python. They do share JSON schemas though, so it wouldn't be awful to validate ourselves: https://github.com/citation-file-format/cff-converter-python/tree/main/cffconvert/schemas
I agree this change would be very helpful for including more complete authorship information in BIDS datasets. It's an issue for OpenNeuro and a BIDS solution would let us add this to datasets in a way that allowed for reuse.
> Unfortunately, CFF does not have a Javascript validator, just Python. They do share JSON schemas though, so it wouldn't be awful to validate ourselves: https://github.com/citation-file-format/cff-converter-python/tree/main/cffconvert/schemas

The CFF Initializer tool @ericearl mentioned has a simple JavaScript validator implementation. https://github.com/citation-file-format/cff-initializer-javascript/blob/main/src/store/validation.ts
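
As a rough idea of how little a first-pass check needs, here is a hand-rolled sketch in Python. It is not full JSON Schema validation and not how either linked validator works. The list of required top-level keys reflects my reading of the CFF 1.2.0 schema and should be verified against the published schema; `check_cff` is a hypothetical helper that takes an already-parsed CITATION.cff as a dict.

```python
# Required top-level keys, as I understand the CFF 1.2.0 schema (assumption).
REQUIRED_CFF_KEYS = ("authors", "cff-version", "message", "title")


def check_cff(cff):
    """Return a list of human-readable problems found in a parsed CFF dict."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_CFF_KEYS
        if key not in cff
    ]
    authors = cff.get("authors")
    # When present, authors should be a list with at least one entry.
    if authors is not None and (not isinstance(authors, list) or not authors):
        problems.append("authors must be a non-empty list")
    return problems
```

Anything beyond this (field types, allowed license identifiers, nested references) is where reusing the shared JSON schemas would pay off.
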
I had worked on a little package to help create citation files for BIDS datasets, because they can also be ingested by DataLad metadata tools.
Having the citation file take precedence, without having to sync it with the dataset description, would make things even easier.
Looking at https://github.com/citation-file-format/citation-file-format/blob/main/README.md, we have additional overlaps with dataset_description.json:

| BIDS | CFF |
|---|---|
| HowToAcknowledge | message / preferred-citation |
| Name | title |
| Authors | authors |
| Version | version |
| ReferencesAndLinks | references |
| DatasetDOI | doi |
| License | license |

We may want to make more than just authors mutually exclusive with CITATION.cff. I think at least for name and version we should probably just duplicate and validate identity.
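
A sketch of what "duplicate and validate identity" could look like, in Python. The field pairs are taken from the table above; `find_mismatches` and `COMPARABLE_FIELDS` are hypothetical names, and both inputs are assumed to be already-parsed dicts. A pair is only compared when both files define it.

```python
# BIDS key -> CFF key, for the fields proposed to be duplicated and compared.
COMPARABLE_FIELDS = {"Name": "title", "Version": "version"}


def find_mismatches(dataset_description, cff, fields=COMPARABLE_FIELDS):
    """Return (bids_key, cff_key, bids_value, cff_value) for each disagreement."""
    mismatches = []
    for bids_key, cff_key in fields.items():
        # Skip pairs where either side is absent: nothing to compare.
        if bids_key not in dataset_description or cff_key not in cff:
            continue
        bids_value = dataset_description[bids_key]
        cff_value = cff[cff_key]
        if bids_value != cff_value:
            mismatches.append((bids_key, cff_key, bids_value, cff_value))
    return mismatches
```

Other pairs from the table could be added through the `fields` argument, though values such as DOIs or licenses may need normalizing before a strict equality test makes sense.
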
Also, authors have no role at this point (citation-file-format/citation-file-format#112). While highly desirable, this is also not currently possible in BIDS, so CITATION.cff is still an upgrade.
Wait... Technically we only have the BIDS version in dataset_description and not a dataset version, right?
The only "trace" of the version of the dataset is in the changelog, if it is present. Or maybe I missed it somewhere else?
So in that sense citation.cff would actually add a way to track this.
Ah, sorry, I didn't actually look it up. I guess I was thinking of it being part of DOIs in many cases.
Contribution roles will be included in the next release!
> Contribution roles will be included in the next release!

I saw that and got all excited about it!