Papers, reports, and other scholarly documents need basic metadata: title, authors, institutions, and so on. The metadata has two goals: to allow the paper to be formatted, and to supply information to machine readers of the document. Some of the requirements for Scholarly Markdown are discussed in a blog post by Martin Fenner, and they are elaborated further below.
The standard way of adding metadata to markdown is through a YAML metadata block at the start of the file. The core information is the title and authors; ideally, the metadata should specify enough information to recreate the citation of the article or report. The recommendation is that the accepted YAML follow the Citation Style Language (CSL) JSON schema, which covers a wide range of requirements.
Here is a paper on dolphin bycatch published in PLoS. A simple YAML header for a scholarly markdown version of this document might be:

    ---
    type: article
    title: Common dolphin *Delphinus delphis* bycatch in New Zealand commercial trawl fisheries
    author:
      - family: Thompson
        given: Finlay N.
      - family: Abraham
        given: Edward R.
      - family: Berkenbusch
        given: Katrin
    ---
Note that the title of this article needs formatting, to handle the italics required for the species name. When the article is typeset, these fields are treated as markdown. This is close to the format used in the blog post; however, we use the `type` field rather than the `layout` field used there, because `type` is in the CSL JSON specification.
As this particular article has now been published, the full citation could be added to the YAML, to give a record like the one below. The CSL specification is verbose, and some of the fields read strangely (for example, there is a `container-title` field rather than a common word such as `journal`, and the publication date is a structured list), so at this point some writeability is being lost. However, we now have a single file that holds both the publication metadata and the publication source. From structured data such as this, we would be able to build a publication database, as well as generate our documents.

    ---
    type: article-journal
    title: Common dolphin *Delphinus delphis* bycatch in New Zealand commercial trawl fisheries
    author:
      - family: Thompson
        given: Finlay N.
      - family: Abraham
        given: Edward R.
      - family: Berkenbusch
        given: Katrin
    container-title: PLoS ONE
    volume: 8
    issue: 5
    page: e64438
    DOI: 10.1371/journal.pone.0064438
    issued:
      date-parts:
        - - 2013
    keyword:
      - dolphin
      - bycatch
    ---
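As a rough illustration of recreating a citation from such a record, the following Python sketch formats a parsed record as a plain-text citation. It assumes the YAML has already been loaded into a dictionary by a YAML parser, and that the record uses the CSL variable names `issued`, `issue`, and `DOI`. The function name `format_citation` and the citation layout are hypothetical and not part of any specification.

```python
def format_citation(record):
    """Return a plain-text citation built from a CSL-style metadata dict."""
    authors = []
    for person in record.get("author", []):
        # Reduce the given names to initials: "Finlay N." -> "F. N."
        initials = " ".join(name[0] + "." for name in person.get("given", "").split())
        authors.append(f"{person['family']}, {initials}" if initials else person["family"])

    # CSL stores dates as nested lists: [[year, month, day]]
    year = record["issued"]["date-parts"][0][0]

    citation = f"{', '.join(authors)} ({year}). {record['title']}."
    if "container-title" in record:
        citation += f" {record['container-title']}"
        if "volume" in record:
            citation += f", {record['volume']}"
            if "issue" in record:
                citation += f"({record['issue']})"
        if "page" in record:
            citation += f", {record['page']}"
        citation += "."
    if "DOI" in record:
        citation += f" doi:{record['DOI']}"
    return citation


# The record from the text, written out as the dict a YAML parser would produce.
record = {
    "type": "article-journal",
    "title": "Common dolphin *Delphinus delphis* bycatch in New Zealand "
             "commercial trawl fisheries",
    "author": [
        {"family": "Thompson", "given": "Finlay N."},
        {"family": "Abraham", "given": "Edward R."},
        {"family": "Berkenbusch", "given": "Katrin"},
    ],
    "container-title": "PLoS ONE",
    "volume": 8,
    "issue": 5,
    "page": "e64438",
    "DOI": "10.1371/journal.pone.0064438",
    "issued": {"date-parts": [[2013]]},
}

print(format_citation(record))
```

The title is passed through unchanged, so its markdown italics survive into the citation. In practice a CSL processor and style file would do this formatting; the sketch only shows that the record carries enough information.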
The author information may be extended to include fields that are typically used in publications:
- email: email address
- tel: primary telephone number (the name is chosen from the hcard format)
- url: a URL giving further information on the author
- orcid: the ORCID of the author. If specified, this would provide a canonical source for all the other author information. Implementing this would require querying the ORCID API.
- affiliation: the id of the author's organization
The proposal here is that the organizations are listed as separate metadata blocks, with the `id` field used to link each author to the corresponding affiliation. When formatting the article, these may be turned into footnotes (depending on the journal style). The organization fields are:
- id (required): the id of the organization
- name: the name of the organization
- address: a text address
- url: website
There are more structured approaches to the address data in particular (the h-card microformat could be used as a guide here); however, more structure would make entering the data more cumbersome.
Some further document-level fields are also useful:
- abstract: CSL JSON has a field for the abstract. The simple approach is to put the entire markdown for the abstract into the YAML. This has the disadvantage of bulking up the metadata with multiline YAML, which may contain paragraph breaks. A suggestion is that this field instead specifies a section title, with the contents of that section being taken as the abstract. By default, this section is called 'abstract', and in that case the abstract field does not need to be specified.
- bibliography: filename of the bibliography (BibTeX, YAML, or another understood format), or else a YAML block that contains all the references.
- csl: filename of the Citation Style Language file that is used to specify the layout of the references
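The suggested handling of the `abstract` field can be sketched in Python as follows. This is a minimal sketch under several assumptions: the metadata has already been parsed into a dictionary, sections are marked with `#`-style headings, the title comparison ignores case, and any following heading ends the abstract. The function name `find_abstract` is hypothetical.

```python
import re

# A "#"-style heading: one or more '#', whitespace, then the title.
# Assumption: the body has no fenced code with lines that look like headings.
HEADING = re.compile(r"^(#+)\s+(.*?)\s*#*\s*$")


def find_abstract(metadata, body):
    """Return the text of the section named by the abstract field, or None."""
    # If the field is absent, the section is assumed to be called 'abstract'.
    wanted = str(metadata.get("abstract", "abstract")).strip().lower()
    collected = None
    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            if collected is not None:
                break  # the next heading closes the abstract section
            if match.group(2).strip().lower() == wanted:
                collected = []  # start collecting after the matching heading
            continue
        if collected is not None:
            collected.append(line)
    if collected is None:
        return None
    return "\n".join(collected).strip()


body = """# Abstract

First paragraph.

Second paragraph.

# Introduction

Text.
"""

print(find_abstract({}, body))
```

Keeping the abstract in the body means that paragraph breaks need no special YAML quoting, which is the motivation given in the list above.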
Putting this together, a YAML header suitable for a manuscript now looks as follows:

    ---
    type: article
    title: Common dolphin *Delphinus delphis* bycatch in New Zealand commercial trawl fisheries
    author:
      - family: Thompson
        given: Finlay N.
        email: email@example.com
        tel: +64 4 385 9285
        affiliation: 1
      - family: Abraham
        given: Edward R.
        affiliation: 1
      - family: Berkenbusch
        given: Katrin
        affiliation: 1
    organization:
      - id: 1
        name: Dragonfly Science
        address: PO Box 27535, Wellington 6141, New Zealand
        url: http://www.dragonfly.co.nz
    bibliography: dolphins.bib
    csl: plos.csl
    keyword:
      - dolphin
      - bycatch
    ---
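The link between each author's `affiliation` and an organization `id` can be checked with a short lookup. The Python sketch below assumes the header has already been parsed into a dictionary, and that ids are compared as strings so that `1` and `"1"` refer to the same organization. The function name `resolve_affiliations` is hypothetical.

```python
def resolve_affiliations(metadata):
    """Return (family name, organization or None) pairs, one per author."""
    # Index the organizations by id, compared as strings (an assumption).
    organizations = {str(org["id"]): org for org in metadata.get("organization", [])}
    resolved = []
    for author in metadata.get("author", []):
        key = author.get("affiliation")
        if key is None:
            # Authors without an affiliation are allowed.
            resolved.append((author["family"], None))
            continue
        if str(key) not in organizations:
            # A dangling reference is reported rather than silently dropped.
            raise KeyError(
                f"author {author['family']!r} references unknown organization id {key!r}"
            )
        resolved.append((author["family"], organizations[str(key)]))
    return resolved


metadata = {
    "author": [
        {"family": "Thompson", "given": "Finlay N.", "affiliation": 1},
        {"family": "Abraham", "given": "Edward R.", "affiliation": 1},
        {"family": "Berkenbusch", "given": "Katrin", "affiliation": 1},
    ],
    "organization": [
        {
            "id": 1,
            "name": "Dragonfly Science",
            "address": "PO Box 27535, Wellington 6141, New Zealand",
            "url": "http://www.dragonfly.co.nz",
        }
    ],
}

for family, organization in resolve_affiliations(metadata):
    print(family, "->", organization["name"] if organization else "(none)")
```

An `affiliation` value that matches no listed `id` raises a `KeyError`, so a mistyped reference is caught before the footnotes are typeset.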