Formulas in Markdown

Formula syntax is outside the scope of markdown itself. However, there are good tools available for adding math capabilities on top of markdown. The goal of Scholarly Markdown is to recommend both best practices and tools for this purpose.

Why are formulas challenging?

The de facto standard syntax for formulas is (La)TeX. However, TeX is a Turing-complete language, so parsing arbitrary TeX correctly is not feasible in general. Of course there are subsets of TeX that are much easier to parse and cover most use cases, but there is no consensus on what such a subset is. Instead of trying to fix such a grammar as part of markdown, it is better practice to leave this task to specialized tools for math typesetting.

To successfully compose markdown processors with external tools for handling formulas, however, it is important to have standard libraries available that recognize which parts of a markdown document constitute a formula. Unfortunately, even this task is not as simple as labeling "everything between $...$" as a formula. A practical example that shows some of the difficulties is the following:

$\left\{ x^*\in\mathbb{R}^n \middle| \sum_{i=1}^n x^*_i=1 \text{ and } x^*_i \leq x^*_j\text{ for all $i$ less than $j$}\right\}$

Note that unescaped $ characters appear inside the formula. Note also that ranges like *...*, which markdown processors typically treat as emphasis, appear inside the formula.
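One way to cope with formulas like this is to track brace depth while scanning, so that a $ nested inside \text{...} does not end the formula. The following Python sketch illustrates the idea on a shortened variant of the formula above. The function name find_math_spans is hypothetical, and the sketch only handles inline $...$ delimiters and backslash escapes. It is not a full TeX-aware recognizer.

```python
def find_math_spans(text):
    """Return (start, end) index pairs of inline $...$ formulas (end exclusive).

    Assumptions of this sketch: a backslash escapes the next character,
    and a $ inside braces belongs to the surrounding formula.
    """
    spans = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \$
            continue
        if ch != "$":
            i += 1
            continue
        # Found an opening $; look for the closing $ at brace depth 0.
        depth = 0
        j = i + 1
        end = None
        while j < n:
            c = text[j]
            if c == "\\":
                j += 2  # skip escaped characters such as \{ and \}
                continue
            if c == "{":
                depth += 1
            elif c == "}":
                depth = max(depth - 1, 0)
            elif c == "$" and depth == 0:
                end = j + 1
                break
            j += 1
        if end is None:
            i += 1  # no closing delimiter: treat this $ as literal text
            continue
        spans.append((i, end))
        i = end
    return spans


sample = r"Let $x^*_i \leq x^*_j\text{ for all $i$ less than $j$}$ hold; it costs \$5."
for start, end in find_math_spans(sample):
    print(sample[start:end])
```

A plain search for "everything between $...$" would cut this formula off at the $ before i, whereas the depth counter keeps the nested $i$ and $j$ inside the formula.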

What tools are available?

On the server side, Pandoc is an excellent markdown processor that handles formulas by way of LaTeX.

On the client side the situation is more difficult. There are a number of JavaScript markdown parsers available. Also, there are tools like MathJax and jqMath that typeset math inside the web browser. However, these currently do not compose well.

The following are some recommendations for converting markdown including TeX formulas into HTML.

  1. If LaTeX is available on the server and JavaScript is not available on the client, then a good practice is to use Pandoc and LaTeX on the server to produce static HTML pages with formulas embedded as images.
  2. If LaTeX is available on the server and JavaScript is available on the client, an alternative is to use Pandoc to convert markdown+TeX into HTML+MathML and use MathJax on the client to render the MathML in browsers that lack native MathML support. The advantage of this method over 1) is that formulas are more accessible.
  3. If markdown and TeX conversion are supposed to happen on the client, a good practice is to first typeset formulas using MathJax and then apply a JavaScript markdown library to turn markdown markup into HTML. While the output is sometimes not 100% correct, this generally produces much better results than doing it the other way around.
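As an illustration of recommendation 2, the server-side step could be scripted roughly as follows. This is a minimal sketch that assumes a pandoc executable is on the PATH. The file names paper.md and paper.html and both function names are hypothetical, and loading MathJax in the resulting page is left out.

```python
import subprocess


def pandoc_mathml_command(source, destination):
    """Build a pandoc command line that renders TeX math as MathML in HTML."""
    return [
        "pandoc",
        "--from", "markdown",
        "--to", "html",
        "--standalone",
        "--mathml",  # emit MathML for formulas in the HTML output
        "--output", destination,
        source,
    ]


def convert(source, destination):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_mathml_command(source, destination), check=True)


# Print the command that would be run for a hypothetical input file.
print(" ".join(pandoc_mathml_command("paper.md", "paper.html")))
```

Calling convert("paper.md", "paper.html") would then run the conversion itself.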

Future Steps

Future goals of Scholarly Markdown regarding the support of formulas are:

  • to provide a small library that can recognize TeX formulas inside of markdown documents to ease composability of different tools,
  • to provide reference implementations of markdown processors that integrate well with available mathematical software.
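To make the composability goal more concrete, here is one possible shape for such a library's glue code: once formula spans are known, they can be swapped for placeholders, the text handed to any markdown processor, and the formulas put back afterwards. This is only a sketch under assumptions. The names protect_math and restore_math and the MATHSPAN placeholder format are invented for illustration, and the one-line emphasis regex merely stands in for a real markdown processor.

```python
import re


def protect_math(text, spans):
    """Replace each formula span with a placeholder token.

    `spans` must be sorted, non-overlapping (start, end) pairs, end exclusive.
    Returns the protected text and the list of removed formulas.
    """
    pieces, formulas, last = [], [], 0
    for k, (start, end) in enumerate(spans):
        pieces.append(text[last:start])
        pieces.append(f"MATHSPAN{k}END")
        formulas.append(text[start:end])
        last = end
    pieces.append(text[last:])
    return "".join(pieces), formulas


def restore_math(rendered, formulas):
    """Put the original formulas back in place of their placeholders."""
    return re.sub(r"MATHSPAN(\d+)END",
                  lambda m: formulas[int(m.group(1))], rendered)


text = "Let $x^*$ be *optimal* and $y^*$ too."
first, second = text.index("$x^*$"), text.index("$y^*$")
protected, formulas = protect_math(text, [(first, first + 5), (second, second + 5)])

# Stand-in for a markdown processor: only handles *emphasis*.
rendered = re.sub(r"\*(.+?)\*", r"<em>\1</em>", protected)
print(restore_math(rendered, formulas))
```

Without the protection step, the stand-in emphasis rule would pair the * inside the first formula with the * before "optimal" and corrupt the formula.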

Block Environments

Block environments, e.g., for theorems, lemmas and proofs, are an important part of many mathematical documents. These block environments typically stretch over several paragraphs. Markdown syntax does not provide a standard facility for delimiting block environments (other than XML tags) and it is not the goal of Scholarly Markdown to define one at this point. Other file formats may be better suited to encode block structure.

Further Topics

  • Be able to label equations.
  • Be able to specify anchors.
  • Markdown should be able to recognize variables defined in a preamble.