A human- and machine-readable, human-writable format for CITATION files. CITATION files provide reference and citation information for (research/scientific) software.

Citation File Format (CFF)

License: CC BY 4.0

The Citation File Format (CFF) is a human- and machine-readable file format in YAML 1.2 which provides citation metadata for software. The main website for CFF can be found at https://citation-file-format.github.io.

Example

If you want to make your software easily citable, you can put a file called CITATION.cff in the root of your repository. This file should provide at least the minimally necessary metadata to cite your software. An example:

cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Druskat
    given-names: Stephan
title: My Research Tool
version: 1.0.4
date-released: 2017-12-18

Specification

Format

CFF CITATION files must be named CITATION.cff.

CFF is implemented in YAML 1.2. For details, see the YAML 1.2 Specifications.

Formatting

CFF follows the formatting rules of YAML 1.2. One of the most important rules is that the colon (:) after a key must always be followed by a space.

Structure is determined by indentation, i.e., lines containing nested elements must be indented by at least one space. YAML does not allow tab characters for indentation, and using two spaces per level as a standard keeps files readable.

Value strings should be double-quoted, e.g. "string", especially when they contain YAML special characters, or special characters in general. These include:

: { } [ ] , & * # ? | - < > = ! % @ \

To check whether your YAML is correctly formatted, you can use http://www.yamllint.com/.
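The quoting rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of CFF: the helper names `needs_quoting`, `quote_value` and `format_pair` are hypothetical, and the sketch only covers the characters listed above (it does not handle other YAML corner cases such as empty strings or leading and trailing spaces). Quoting every value string is equally valid and simpler.

```python
# Characters the text lists as needing care inside YAML value strings.
SPECIAL_CHARS = set(':{}[],&*#?|-<>=!%@\\')


def needs_quoting(value: str) -> bool:
    """Return True if the value contains any of the listed special characters."""
    return any(ch in SPECIAL_CHARS for ch in value)


def quote_value(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_pair(key: str, value: str) -> str:
    """Render one `key: value` line, quoting the value only when it needs it."""
    rendered = quote_value(value) if needs_quoting(value) else value
    # The colon is always followed by a single space.
    return f"{key}: {rendered}"


print(format_pair("title", "My Research Tool"))
# title: My Research Tool
print(format_pair("abstract", "Tools: parsing & linting"))
# abstract: "Tools: parsing & linting"
```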

File structure

CITATION.cff files represent YAML 1.2 dictionaries ("maps") with the keys listed in the table below. Note that the order of the keys is arbitrary, and that some YAML tools may re-order the keys alphabetically.

The primary keys are used to specify

  • the version of CFF in use (cff-version);
  • a message which should be conveyed to the user of the software, along the lines of "If you use this software, please cite it as follows" (message);
  • the citation metadata for the software version itself, according to Smith et al., 2016, i.e., metadata that can be picked up in a CodeMeta JSON file;
  • optionally, a list of references which should be cited in different use cases or scopes, e.g., a software paper describing the abstract concepts of the software (references).

cff-version (required)

cff-version must specify the exact version of the Citation File Format that is used for the file.

cff-version: 1.1.0

message (required)

message must specify instructions to users on how to cite the software the CITATION.cff file is associated with.

message: "Please cite the following works when using this software."

Software citation metadata (required)

CFF provides the following keys for software citation metadata.

| CFF key | Required | CFF data type | Description |
|---|---|---|---|
| abstract | | String | A description of the software (version) |
| authors | yes | Collection of entity or person objects | The author(s) of the software |
| commit | | String | The commit hash or revision number of the software version |
| contact | | Collection of entity or person objects | The contact person, group, company, etc. for the software version |
| date-released | yes | Date | The release date of the software version |
| doi | | String | The DOI of the work (not the resolver URL, i.e., 10.5281/zenodo.1003150, not http://doi.org/10.5281/zenodo.1003150) |
| identifiers | | Collection of identifier objects | The persistent identifiers of the work (for identifiers that are not DOIs) |
| keywords | | Collection of strings | Keywords pertaining to the software version |
| license | | SPDX License List Identifier string | The license the software version is licensed under |
| license-url | | String (URL) | The URL of the license text under which the software version is licensed (only for non-standard licenses not included in the SPDX License List) |
| repository | | String (URL) | The URL to the software version in a repository (when the repository is neither a source code repository nor a build artifact repository) |
| repository-code | | String (URL) | The URL to the software version in a source code repository |
| repository-artifact | | String (URL) | The URL to the software version in a build artifact/binary/release repository |
| title | yes | String | The name of the software (may include a specific name for the software version) |
| url | | String (URL) | The URL to a landing page/website for the software version |
| version | yes | String | The version of the software |
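As a rough illustration of how these keys might be checked once a CITATION.cff file has been loaded into a Python dictionary, consider the sketch below. It rests on two assumptions: the set of required keys is taken from the minimal example near the top of this document, and the helper names `bare_doi` and `check_metadata` are hypothetical, not part of any CFF tooling.

```python
import re

# Keys used in the minimal example near the top of this document; treating
# them as the required set is an assumption of this sketch.
REQUIRED_KEYS = ("cff-version", "message", "authors", "title", "version", "date-released")

# Resolver prefixes that should not appear in a `doi` value.
_RESOLVER_PREFIX = re.compile(r"^https?://(dx\.)?doi\.org/", re.IGNORECASE)


def bare_doi(value: str) -> str:
    """Strip a doi.org resolver prefix, leaving only the DOI itself."""
    return _RESOLVER_PREFIX.sub("", value.strip())


def check_metadata(metadata: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in metadata]
    doi = metadata.get("doi")
    if doi is not None and bare_doi(doi) != doi:
        problems.append("doi should be the bare DOI, not a resolver URL")
    return problems


metadata = {
    "cff-version": "1.1.0",
    "message": "If you use this software, please cite it as below.",
    "authors": [{"family-names": "Druskat", "given-names": "Stephan"}],
    "title": "My Research Tool",
    "version": "1.0.4",
    "date-released": "2017-12-18",
    "doi": "http://doi.org/10.5281/zenodo.1003150",
}
print(check_metadata(metadata))
print(bare_doi(metadata["doi"]))
```

Here the only reported problem is the resolver URL in `doi`; replacing it with the output of `bare_doi` yields 10.5281/zenodo.1003150.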

references (optional)

Provides an optional list of references pertaining to the software version, or the software itself, e.g., a dependency of the software, a software paper describing the abstract concepts of the software, a paper describing an algorithm that has been implemented in the software version, etc.

A reference item, i.e., an item in the list under references, must at least specify values for the following mandatory keys: type, authors, title.

type must specify the type of the referenced work. For a list of available values, cf. reference types.

authors must specify a list of entity or person objects.

title must specify the title of the referenced work.

Example:

cff-version: 1.1.0
message: "Please cite the following works when using this software."
...
references:
  - type: book
    authors:
      - ...
    title: The science of citation
  - type: software
    authors:
      - ...
    title: Software Citation Tool

Additionally, a reference item can contain any of the reference keys defined below.
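The rule for mandatory keys can be checked mechanically. The following is a minimal sketch that assumes the references have already been loaded into Python dictionaries; `missing_reference_keys` is a hypothetical helper, and "Example Press" is an invented placeholder author.

```python
# The three keys every reference item must provide, per the text above.
MANDATORY_REFERENCE_KEYS = ("type", "authors", "title")


def missing_reference_keys(reference: dict) -> list:
    """Return the mandatory keys that are absent or empty in a reference item."""
    return [key for key in MANDATORY_REFERENCE_KEYS if not reference.get(key)]


references = [
    {
        "type": "book",
        "authors": [{"name": "Example Press"}],  # placeholder entity
        "title": "The science of citation",
    },
    {"type": "software", "title": "Software Citation Tool"},  # no authors
]

for index, reference in enumerate(references):
    print(index, missing_reference_keys(reference))
# 0 []
# 1 ['authors']
```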

Reference keys

CFF defines the following reference keys.

| CFF key | CFF data type | Description |
|---|---|---|
| abbreviation | String | The abbreviation of the work |
| abstract | String | The abstract of a work |
| authors | Collection of entity or person objects | The author of a work |
| collection-doi | String | The DOI of a collection containing the work |
| collection-title | String | The title of a collection or proceedings |
| collection-type | String | The type of a collection |
| commit | String | The (e.g., Git) commit hash or (e.g., Subversion) revision number of the work |
| conference | Entity object | The conference where the work was presented |
| contact | Collection of entity or person objects | The contact person, group, company, etc. for a work |
| copyright | String | The copyright information pertaining to the work |
| data-type | String | The data type of a data set |
| database | String | The name of the database where a work was accessed/is stored |
| database-provider | Entity object | The provider of the database where a work was accessed/is stored |
| date-accessed | Date | The date the work has been last accessed |
| date-downloaded | Date | The date the work has been downloaded |
| date-published | Date | The date the work has been published |
| date-released | Date | The date the work has been released |
| department | String | The department where a work has been produced |
| doi | String | The DOI of the work |
| edition | String | The edition of the work |
| editors | Collection of entity or person objects | The editors of a work |
| editors-series | Collection of entity or person objects | The editors of a series in which a work has been published |
| end | Integer | The end page of the work |
| entry | String | An entry in the collection that constitutes the work |
| filename | String | The name of the electronic file containing the work |
| format | String | The format in which a work is represented |
| identifiers | Collection of identifier objects | The persistent identifiers of the work (for identifiers that are not DOIs) |
| institution | Entity object | The institution where a work has been produced or published |
| isbn | String | The ISBN of the work |
| issn | String | The ISSN of the work |
| issue | Integer | The issue of a periodical in which a work appeared |
| issue-date | String | The publication date of the issue of a periodical in which a work appeared (see the note on issue-date below) |
| issue-title | String | The name of the issue of a periodical in which the work appeared |
| journal | String | The name of the journal/magazine/newspaper/periodical where the work was published |
| keywords | Collection of strings | Keywords pertaining to the work |
| languages | Collection of ISO 639 language strings | The language of the work |
| license | License string | The license under which a work is licensed |
| license-url | String (URL) | The URL of the license text under which a work is licensed |
| location | Entity object | The location of the work |
| loc-start | Integer | The line of code in the file where the work starts |
| loc-end | Integer | The line of code in the file where the work ends |
| medium | String | The medium of the work |
| month | Integer | The month in which a work has been published |
| nihmsid | String | The NIHMSID of a work |
| notes | String | Notes pertaining to the work |
| number | String | The accession number for a work |
| number-volumes | Integer | The number of volumes making up the collection in which the work has been published |
| pages | Integer | The number of pages of the work |
| patent-states | Collection of strings | The states for which a patent is granted |
| pmcid | String | The PMCID of a work |
| publisher | Entity object | The publisher who has published the work |
| recipients | Collection of entity or person objects | The recipient of a personal communication |
| repository | String (URL) | The repository where the work is stored |
| repository-code | String (URL) | The version control system where the source code of the work is stored |
| repository-artifact | String (URL) | The repository where the (executable/binary) artifact of the work is stored |
| scope | String | The scope of the reference, e.g., the section of the work it pertains to |
| section | String | The section of a work that is referenced |
| senders | Collection of entity or person objects | The sender of a personal communication |
| status | Status string | The publication status of the work |
| start | Integer | The start page of the work |
| term | String | The term being referenced if the work is a dictionary or encyclopedia |
| thesis-type | String | The type of the thesis that is the work |
| title | String | The title of the work |
| translators | Collection of entity or person objects | The translator of a work |
| type | Reference types string | The type of the work |
| url | String (URL) | The URL of the work |
| version | String | The version of the work |
| volume | Integer | The volume of the periodical in which a work appeared |
| volume-title | String | The title of the volume in which the work appeared |
| year | Integer | The year in which a work has been published |
| year-original | Integer | The year of the original publication |

Notable reference keys

conference, database-provider, institution, publisher

These keys take an entity object as value. Entity objects reference named entities and provide a fixed set of keys, such as name and contact information.

Example:

references:
  - type: book
    publisher:
      name: PeerJ
      city: London
      country: GB
      website: https://peerj.com/

authors, contact, editors, editors-series, recipients, senders, translators

These keys take a collection of entity objects or person objects as value. Person objects provide a fixed set of keys to reference individuals, including a detailed set for specifying personal names, an affiliation, etc.

Example:

references:
  - type: software
    authors:
      - family-names: Druskat
        given-names: Stephan
        orcid: https://orcid.org/0000-0003-4925-7248
        affiliation: "Humboldt-Universität zu Berlin"
        email: "mail@sdruskat.net"
        website: http://sdruskat.net
      - family-names: Beethoven
        name-particle: van
        given-names: Ludwig
      - family-names: Fernández de Córdoba
        given-names: Gonzalo
        name-suffix: Jr.
    ...

type, languages, status

These keys only take values from a defined set, cf. the respective sections Reference types, Language strings, and Status strings.

license-url, repository, repository-code, repository-artifact, url

These keys take URL strings as values. URLs will be validated by a regular expression, such as the one provided in a GitHub Gist by Diego Perini.
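For illustration only, a much simpler structural check than a full URL regular expression can be written with Python's standard library. This sketch is not the regular expression mentioned above and is far less strict; the restriction to http and https is an assumption made here, and `looks_like_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Schemes accepted by this sketch (an assumption, not a CFF rule).
ALLOWED_SCHEMES = ("http", "https")


def looks_like_url(value: str) -> bool:
    """Return True if the value has an allowed scheme and a host part."""
    if any(ch.isspace() for ch in value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


for candidate in ("https://peerj.com/", "http://sdruskat.net", "peerj.com"):
    print(candidate, looks_like_url(candidate))
```

The last candidate is rejected because it lacks a scheme, which is a common mistake in url and repository-code values.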

keywords

This key takes a collection of strings.

Example:

references:
  - type: software
    keywords:
      - linguistics
      - "multi-layer annotation"
      - web service
    ...

scope

A reference item can specify a more detailed scope for the reference, via the reference key scope. This key can be useful if certain references should only be cited under specific circumstances, e.g., only when a specific package of the software is used. In such a case, the package would ideally have its own CFF file, but if this is not possible for whatever reason, the scope key may come in handy.

For a discussion of this key, cf. issue citation-file-format/citation-file-format#15.

Example:

references:
  - scope: "Cite this paper when you run software X with flag --xy"
    type: article
    ...

issue-date

Specify the date of release of an issue. This key has been left as a plain string, rather than a formal date type, to allow for text values such as "November-December 2018".

For a discussion of this key, cf. issue citation-file-format/citation-file-format#48.

Exemplary uses

This section details exemplary use cases for some of the keys to avoid ambiguity/misuse.

abstract

  • If the work is a journal paper or other academic work: The abstract of the work.
  • If the work is a film, broadcast or similar: The synopsis of the work.

department

  • If the work is a thesis: The academic department where the thesis has been produced.
  • If the work is a government document: The governmental department which has issued the document.

format

  • If the work is a music file: The digital format in which a musical piece is saved, e.g., MP3.
  • If the work is a data set: The digital format in which the data set is saved.
  • If the work is a painting: The format of the painting, e.g., the width and height of the canvas.

institution

  • If the work is a report: The institution where the report has been produced.
  • If the work is a case: The court where a case has been held.
  • If the work is a blog post: The institution responsible for running the blog.
  • If the work is a patent, legal rule or similar: The issuing institution of the patent/rule.
  • If the work is a grant: The funding agency sponsoring the grant.
  • If the work is a thesis: The university where a thesis has been produced.
  • If the work is a statute: The institution or geographical unit to which the statute applies.
  • If the work is a conference: The organisation which held the conference.

languages

  • If the work is a book: The language in which the book is written.

location

  • If the work is an artwork: E.g., the museum holding the work.
  • If the work is a historical work, illuminated manuscript or similar: The library or archive where the work is held.

medium

  • If the work is an artwork: The medium of the artwork, e.g., "photograph", "painting", "oil on canvas", etc.
  • If the work is a book or similar: Whether it is a printed book or an ebook.

month

  • If the work is a conference: The month in which the conference has been held.
  • If the work is a magazine article: The month in which the magazine issue containing the article has been published.

number

  • If the work is a conference paper: E.g., the submission number of the paper.
  • If the work is a grant: The grant number provided by the funding agency.
  • If the work is a work of art: E.g., the catalogue number provided by a museum holding the artwork.
  • If the work is a report: The report number of a report.
  • If the work is a patent: The patent number of the work.
  • If the work is a historical work, illuminated manuscript or similar: The codex or folio number of a manuscript, or the library identifier for a manuscript.

term

  • If the work is a dictionary or encyclopedia: The term in the dictionary or encyclopedia that is being referenced.

title

  • If the work is a case: The name of the case (e.g., Name v. Name).

version

  • If the work is software: The version of the referenced software.

Reference types

| Reference type string | Description |
|---|---|
| art | A work of art, e.g., a painting |
| article | |
| audiovisual | |
| bill | A legal bill |
| blog | A blog post |
| book | A book or e-book |
| catalogue | |
| conference | |
| conference-paper | |
| data | A data set |
| database | An aggregated or online database |
| dictionary | |
| edited-work | An edited work, e.g., a book |
| encyclopedia | |
| film-broadcast | A film or broadcast |
| generic | The fallback type |
| government-document | |
| grant | A research or other grant |
| hearing | |
| historical-work | A historical work, e.g., a medieval manuscript |
| legal-case | |
| legal-rule | |
| magazine-article | |
| manual | A manual |
| map | A geographical map |
| multimedia | A multimedia file |
| music | A music file or sheet music |
| newspaper-article | |
| pamphlet | |
| patent | |
| personal-communication | |
| proceedings | Conference proceedings |
| report | |
| serial | |
| slides | Slides, i.e., a published slide deck |
| software | Software |
| software-code | Software source code |
| software-container | A software container (e.g., a Docker container) |
| software-executable | An executable software, i.e., a binary/artifact |
| software-virtual-machine | A virtual machine/VM image |
| sound-recording | |
| standard | |
| statute | |
| thesis | An academic thesis |
| unpublished | |
| video | A video recording |
| website | |

Objects

Entity objects

Entity objects can represent different types of entities, e.g., a publishing company, or conference. In CFF, they are realized as collections with a defined set of keys. Only the key name is mandatory.

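| Entity key | Entity data type | Optional |
|---|---|---|
| name | String | no |
| address | String | yes |
| city | String | yes |
| region | String | yes |
| post-code | String | yes |
| country | String | yes |
| orcid | String (ORCID URL) | yes |
| email | String | yes |
| tel | String | yes |
| fax | String | yes |
| website | String (URL) | yes |
| date-start | Date | yes |
| date-end | Date | yes |
| location | String | yes |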

Exemplary uses

address

  • To be used for street names and house numbers, etc.

region

  • To be used for, e.g., states (as in US states or German federal states).

post-code

  • The post code or zip code of an address.

country

  • The ISO 3166-1 alpha-2 country code for a country. A list of ISO 3166-1 alpha-2 codes can be found at Wikipedia:ISO 3166-1.

Example:

references:
  - type: book
    publisher:
      name: PeerJ
      city: London
      country: GB

date-start and date-end

  • The start and end date of, e.g., a conference. Both values must be formatted according to ISO 8601, e.g., as a date (YYYY-MM-DD, such as 2017-10-04) or as a date-time (2017-10-04T16:20:57+00:00).
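A minimal sketch of how such values could be checked in Python follows. It assumes the values are available as strings, and the helper names `parse_iso_date` and `period_is_ordered` are hypothetical. Note that `datetime.fromisoformat` only accepts a subset of ISO 8601 in Python versions before 3.11, although both forms shown above are covered.

```python
from datetime import date, datetime


def parse_iso_date(value: str) -> date:
    """Parse an ISO 8601 date or date-time string and return its calendar date.

    Raises ValueError if the string is not in a supported ISO 8601 form.
    """
    return datetime.fromisoformat(value).date()


def period_is_ordered(date_start: str, date_end: str) -> bool:
    """Return True if date-start does not fall after date-end."""
    return parse_iso_date(date_start) <= parse_iso_date(date_end)


print(parse_iso_date("2017-10-04T16:20:57+00:00"))
# 2017-10-04
print(period_is_ordered("2017-10-04", "2017-10-06"))
# True
```

Comparing calendar dates rather than full timestamps avoids mixing values with and without a UTC offset, which Python refuses to compare directly.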

orcid

The ORCID iD is expressed as an https URI, i.e., the 16-digit identifier is preceded by https://orcid.org/. A hyphen is inserted every 4 digits of the identifier to aid readability (see https://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier, section "Expressing the ORCID iD").

Example:

orcid: https://orcid.org/0000-0001-2345-6789
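The expected shape of an orcid value can be checked with a short Python sketch. Two details here go beyond the text above and come from ORCID's own identifier rules: the final character is a check character computed with ISO 7064 MOD 11-2, and it may be the letter X. The names `orcid_checksum` and `is_valid_orcid` are hypothetical helpers.

```python
import re

# https://orcid.org/ followed by four hyphen-separated groups of four
# characters; the very last character may be the check character X.
_ORCID_URL = re.compile(r"https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def orcid_checksum(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Return True if the value is an https ORCID URI with a correct check character."""
    match = _ORCID_URL.fullmatch(value)
    if match is None:
        return False
    characters = match.group(1).replace("-", "")
    return orcid_checksum(characters[:15]) == characters[15]


print(is_valid_orcid("https://orcid.org/0000-0001-2345-6789"))
# True
print(is_valid_orcid("0000-0001-2345-6789"))
# False (the https://orcid.org/ prefix is missing)
```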

Person objects

A person object represents a person. In CFF, person objects are realized as collections with a defined set of keys, of which only family-names and given-names are mandatory.

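| Person key | Person data type | Optional |
|---|---|---|
| family-names | String | no |
| given-names | String | no |
| name-particle | String | yes |
| name-suffix | String | yes |
| affiliation | String | yes |
| address | String | yes |
| city | String | yes |
| region | String | yes |
| post-code | String | yes |
| country | String | yes |
| orcid | String (ORCID URL) | yes |
| email | String | yes |
| tel | String | yes |
| fax | String | yes |
| website | String (URL) | yes |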

Exemplary uses

Name keys

CFF aims to implement a culturally neutral model for personal names, according to the suggestions on splitting personal names by the W3C and the implementation of personal name splitting in BibTeX (Hufflen, 2006).

To this end, CFF provides four generic keys to specify personal names:

  1. Values for family-names specify family names, including combinations of given and patronymic forms, such as Guðmundsdóttir or bin Osman; double names with or without hyphen, such as Leutheusser-Schnarrenberger or Sánchez Vicario. It can potentially also specify names that include prepositions or (nobiliary) particles, especially if they occur in between family names such as in Spanish- or Portuguese-origin names, such as Fernández de Córdoba.
  2. Values for given-names specify given and any other names.
  3. Values for name-particle specify nobiliary particles and prepositions, such as in Ludwig van Beethoven or Rafael van der Vaart.
  4. Values for name-suffix specify suffixes such as Jr. or III (as in Frank Edwin Wright III).

Note that these keys may still not be optimal for, e.g., Icelandic names, which do not have the concept of family names, or Chinese generation names. The alternative, however, would be highly localized customization, which would run counter to CFF's goal of being easily accessible. Thus, it is ultimately the task of CFF file authors to find the optimal name split in any given case.
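To show how the four name keys fit together, the sketch below assembles a display name from a person object held as a Python dictionary. The ordering used (given names, particle, family names, suffix) is an assumption of this example and matches the names quoted above; it is not prescribed by CFF, and `display_name` is a hypothetical helper.

```python
def display_name(person: dict) -> str:
    """Join the CFF name keys into a single display string.

    Order assumed here: given-names, name-particle, family-names, name-suffix.
    Keys that are absent or empty are skipped.
    """
    parts = [
        person.get("given-names"),
        person.get("name-particle"),
        person.get("family-names"),
        person.get("name-suffix"),
    ]
    return " ".join(part for part in parts if part)


people = [
    {"family-names": "Beethoven", "name-particle": "van", "given-names": "Ludwig"},
    {"family-names": "Fernández de Córdoba", "given-names": "Gonzalo", "name-suffix": "Jr."},
]
for person in people:
    print(display_name(person))
# Ludwig van Beethoven
# Gonzalo Fernández de Córdoba Jr.
```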

affiliation

  • To specify the affiliation of a person, e.g., a university, research centre, etc.

Address keys

orcid

Identifier objects

An identifier object represents a persistent identifier. In CFF, identifier objects are realized as collections with two defined keys, both mandatory.

| Identifier key | Identifier data type | Optional |
|---|---|---|
| type | String (Identifier type string) | no |
| value | String | no |

Exemplary uses

A Software Heritage identifier

identifiers:
  - type: "swh"
    value: "swh:1:rel:99f6850374dc6597af01bd0ee1d3fc0699301b9f"

An identifier unknown to CFF

identifiers:
  - type: "other"
    value: "my-custom-identifier-1234"

Specified value strings

The keys status, license, languages, and identifier:type can only take values from a fixed set of strings. These are specified below.

Status strings

Works can have a different status of publication, e.g., journal papers. CFF specifies the following value strings for the key status.

| Status (String) | Description |
|---|---|
| in-preparation | A work in preparation, e.g., a manuscript (covers drafts) |
| abstract | The abstract of a work |
| submitted | A work that has been submitted for publication |
| in-press | A work that has been accepted for publication but has not yet been published |
| advance-online | A work that has been published online in advance of publication in the target medium |
| preprint | A work that has been published as a preprint before peer review |

For a work that is complete and has been published, leave status unset.
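A small sketch of how the fixed set of status strings could be enforced, assuming a reference item is available as a Python dictionary; `status_is_valid` is a hypothetical helper name.

```python
# The status strings specified above.
STATUS_STRINGS = frozenset({
    "in-preparation",
    "abstract",
    "submitted",
    "in-press",
    "advance-online",
    "preprint",
})


def status_is_valid(reference: dict) -> bool:
    """Return True if status is unset (a published work) or one of the fixed strings."""
    status = reference.get("status")
    return status is None or status in STATUS_STRINGS


print(status_is_valid({"type": "article", "status": "in-press"}))
# True
print(status_is_valid({"type": "article", "status": "published"}))
# False ("published" is not a status string; leave status unset instead)
```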

License strings

License strings must conform with the SPDX Licenses list, i.e., a license must be specified via the short identifier from the list. If a license is not included in the SPDX Licenses list, the license-url should be provided as a fallback.

Example:

references:
  - type: software
    authors:
      - ...
    title: My Research Tool
    license: Apache-2.0
  - type: software
    authors:
      - ...
    title: Obscure Research Tool
    license-url: http://r3s34archs0ft.com/eula

Language strings

Natural languages as a value for the key languages are specified via their respective 3-character ISO 639-3 code. A list of ISO 639-3 codes is maintained at Wikipedia:List of ISO 639-3 codes. Alternatively, a language's 2-character ISO 639-1 code may be used. A list of ISO 639-1 codes is maintained at Wikipedia:List of ISO 639-1 codes.

Example for a work in both English and Daakaka:

references:
  - type: book
    ...
    languages:
      - en
      - bpa

Identifier type strings

The key

identifiers:
  - type

can only take the following values:

  • doi: Signifies that the value string of the identifier typed thus is a valid DOI.
  • url: Signifies that the value string of the identifier typed thus is a valid URL.
  • swh: Signifies that the value string of the identifier typed thus is a valid Software Heritage identifier.
  • other: Signifies that the value string of the identifier typed thus is a valid identifier not currently known to CFF.
    If you want to add an identifier type to CFF, please create a new issue on the CFF GitHub repository, and suggest a name for the identifier, and ideally also describe its format as a valid regex.

Examples for valid identifiers:

identifiers:
  - type: "other"
    value: "other-schema://abcd.1234.efgh.5678"
  - type: "swh"
    value: "swh:1:rel:99f6850374dc6597af01bd0ee1d3fc0699301b9f"
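The rules for identifier objects can be sketched as a small check in Python. The pattern used for Software Heritage identifiers covers only the core form (swh:1:, an object type, and a 40-character hexadecimal hash) without optional qualifiers; that simplification, and the helper name `identifier_problems`, are assumptions of this example.

```python
import re

# The identifier type strings listed above.
IDENTIFIER_TYPES = ("doi", "url", "swh", "other")

# Core Software Heritage identifier: swh:1:<object type>:<40 hex characters>.
_SWH_CORE = re.compile(r"swh:1:(cnt|dir|rel|rev|snp):[0-9a-f]{40}")


def identifier_problems(identifier: dict) -> list:
    """Return a list of problems found in one identifier object (empty if none)."""
    problems = []
    id_type = identifier.get("type")
    value = identifier.get("value")
    if id_type not in IDENTIFIER_TYPES:
        problems.append(f"unknown identifier type: {id_type!r}")
    if not value:
        problems.append("missing value")
    elif id_type == "swh" and _SWH_CORE.fullmatch(value) is None:
        problems.append("value is not a core Software Heritage identifier")
    return problems


identifiers = [
    {"type": "other", "value": "other-schema://abcd.1234.efgh.5678"},
    {"type": "swh", "value": "swh:1:rel:99f6850374dc6597af01bd0ee1d3fc0699301b9f"},
]
for identifier in identifiers:
    print(identifier["type"], identifier_problems(identifier))
# other []
# swh []
```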

Schema

CFF CITATION.cff files can be validated against a schema which is available at https://github.com/citation-file-format/citation-file-format/blob/master/schema.yaml.

Contributions

Contributions to the format specifications are welcome! For details on how to contribute, please refer to the contributing guidelines for CFF at https://github.com/citation-file-format/citation-file-format/blob/master/CONTRIBUTING.md.