The Mathematical Markup Language (MathML) is an XML language developed by the World Wide Web Consortium (W3C) that makes it possible to embed mathematical formulae in Web documents.
People who want to publish scientific documents on the Internet face a serious problem. The most common workarounds are ASCII art, which yields huge formulae; PDF or DVI documents, which are not suited to web browsing; and converting mathematical formulae to images, which has many drawbacks.
PDF/DVIs are not web documents and require a plugin or an external viewer that is more or less successfully integrated in the browser.
Moreover, it is hard for web servers to generate PDF or DVI files dynamically, whereas PHP, ASP, JSP or CGI can be set up in no time. PDFs are good for printing, not for browsing.
Creating images of equations was the simplest and most portable way of including mathematical formulae in web documents. But it makes pages very heavy to load and renders printing and editing impossible.
Words of wisdom from mcc: "PDF has the additional problem that since it generally has no content-describing tags whatsoever, and many symbols are just represented as vector shapes, mathematical PDFs can sometimes be literal gibberish to the screen readers used by the blind. MathML is great for the blind though, since it's all content tags."
A MathML formula is a set of XML tags that describe its structure. As with classic HTML, the rendering is done by the browser. There are no more images to save along with the document, since the code describing the formula is included in the document itself, and there is no need for an external viewer, since more and more browsers support MathML.
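As a minimal sketch of what "included in the document" can look like, the fragment below embeds a formula in an XHTML page. The page content and the formula (the area of a circle) are illustrative assumptions, not taken from the specification, and older browsers may only render it when the page is served as XHTML.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Inline MathML</title></head>
  <body>
    <p>The area of a circle is
      <!-- The math element carries the MathML namespace -->
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow>
          <mi>&#x3C0;</mi>        <!-- the identifier pi -->
          <mo>&#x2062;</mo>       <!-- invisible times -->
          <msup><mi>r</mi><mn>2</mn></msup>
        </mrow>
      </math>.
    </p>
  </body>
</html>
```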
Here is an overview of the Mathematical Markup Language (MathML) Version 2.0 - W3C Recommendation 21 February 2001.
Focus on Layout or on Ideas ?
Since the first public draft published by the W3C, MathML has offered two ways of representing formulae: Presentation Markup and Content Markup.
Presentation tags describe the structure of the expression, that is, how you wish to have things displayed. Typical tags are mrow, which means that elements should be displayed in a row; msup, which attaches a superscript to a base; and mfrac, which displays a fraction.
For example, the expression x² + y consists of three elements written in a row: a superscripted element (x²), an operator (+) and an identifier (y), and would thus be coded:
<mrow>
<msup>
<mi>x</mi>
<mn>2</mn>
</msup>
<mo>+</mo>
<mi>y</mi>
</mrow>
Content tags, on the other hand, focus on what the formula "means" and provide a set of mathematical function tags, such as plus or power, that are applied to an expression (apply tag).
When authors use presentation tags alone, the meaning of a particular expression may not be clear to a processor that needs to produce something other than a visual rendering, such as a computer algebra system, or an audio renderer for example. To unambiguously encode the meaning of elementary math expressions, authors can use MathML content tags.
WD-math-970515 - 2.2 Using MathML Content Tags
The above expression x² + y really means "x to the power 2, plus y", which in prefix notation would be written:
(plus (power x 2) y)
The content markup scheme is no more than an XML adaptation of the prefix notation used for example in LISP, ML or Scheme. The above example would be written:
<mrow>
<apply>
<plus/>
<apply>
<power/>
<ci>x</ci>
<cn>2</cn>
</apply>
<ci>y</ci>
</apply>
</mrow>
Those two ways of encoding the expression are fundamentally different because one focuses on the layout of the expression and the other on its meaning.
It is wrong to think that presentation markup is a poor way of expressing formulae, because notations are very important in mathematics: they convey implicit ideas. The most telling example is probably the Leibniz differential notation: (∂x/∂y)*(∂y/∂z) "suggests" that it is equal to ∂x/∂z.
Suppose you wish to display sin⁻¹. The presentation markup philosophy is to write "sin power -1":
<msup>
<mi>sin</mi>
<mrow>
<mo>-</mo>
<mn>1</mn>
</mrow>
</msup>
whereas the content markup philosophy is to write "inverse(sin)" :
<apply>
<inverse/>
<sin/>
</apply>
However, both presentation markup and content markup yield the same graphic representation of the expression. From a viewer's point of view they are strictly equivalent.
In practice the two representation schemes can be mixed, but this is rarely done, since you are usually not writing MathML code by hand but using one of the powerful mathematical formulae authoring tools available.
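One way the two schemes can be combined is sketched below, assuming the semantics and annotation-xml elements of MathML 2.0: the presentation form comes first and the content form is attached to it as an annotation. The expression x² + y is simply reused from the earlier example.

```xml
<semantics>
  <!-- Presentation markup: how the expression is laid out -->
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>+</mo>
    <mi>y</mi>
  </mrow>
  <!-- Content markup: what the expression means -->
  <annotation-xml encoding="MathML-Content">
    <apply>
      <plus/>
      <apply>
        <power/>
        <ci>x</ci>
        <cn>2</cn>
      </apply>
      <ci>y</ci>
    </apply>
  </annotation-xml>
</semantics>
```

A visual renderer can draw the first child, while a processor interested in meaning, such as a computer algebra system, can read the annotation instead.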
MathML bricks
Every MathML expression is built from three types of bricks: numbers, identifiers and operators. Distinguishing between them is important because they obey different typesetting rules. For instance, identifiers should be separated from operators by white space a bit larger than that between two consecutive letters.
An identifier is a variable name, a function name or a symbolic constant; for example 'a', 'Γ' and 'π' are typical identifiers. In presentation markup identifiers are represented by the mi tag. In content markup they are represented by the ci tag. The definition of an identifier differs slightly, though, between content markup and presentation markup.
A number is a numeric value. Typical numbers are '13', '6.55957', '12,345', '4/5', '0x13D' and 'XIII'.
In presentation markup numbers are represented by the mn tag. In content markup they are represented by the cn tag. Content markup considers rationals to be numbers whereas presentation markup does not. Thus the following code must be used to represent the rational number 3/4:
Content markup :
<cn type="rational">3<sep/>4</cn>
Presentation markup :
<mfrac>
<mn>3</mn>
<mn>4</mn>
</mfrac>
Operators are specific to presentation markup. There is no need for operators in content markup, since an operator is simply a (unary or binary) function. The MathML tag used to code operators is mo. Typical operators are '+', '-', 'XOR' and '('.
TeX also treats parentheses as operators. In MathML the same result can also be obtained with the fencing presentation tag (mfenced).
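The two ways of writing parentheses can be sketched as follows. The expression (a + b) is an illustrative assumption; the second form relies on mfenced using "(" and ")" as its default delimiters.

```xml
<!-- Parentheses written as explicit operators -->
<mrow>
  <mo>(</mo>
  <mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>
  <mo>)</mo>
</mrow>

<!-- The same expression using the mfenced shorthand -->
<mfenced>
  <mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>
</mfenced>
```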
Brief history
On 15 May 1997 the World Wide Web Consortium (W3C) made public the Mathematical Markup Language W3C Working Draft. This document sets out the goals of MathML and defines a basic framework for developers. The MathML W3C group mainly consisted of Stephen Buswell, Angel Diaz, Nico Poppelier, Bruce Smith, Neil Soiffer and Stephen Watt.
By January 1998 the MathML draft had become stable enough for the W3C group to propose it, on 24 February 1998, as a W3C Proposed Recommendation. The draft was finally adopted as a specification on 7 April 1998: the Mathematical Markup Language (MathML) 1.0 Specification became a W3C Recommendation, that is, a stable specification suited for widespread deployment. This specification was left unmodified until the 1.01 revision of 7 July 1999.
By September 1999 the W3C MathML group had over 45 members and had started developing version 2.0 of the specification. Some tags were deprecated (e.g. reln) and others added (e.g. csymbol, grad, xref...). But probably the biggest change is the behaviour of the apply tag. The Mathematical Markup Language (MathML) Version 2.0 W3C Working Draft was published on 1 December 1999, became a W3C Candidate Recommendation on 13 November 2000, a W3C Proposed Recommendation on 8 January 2001 and a W3C Recommendation on 21 February 2001.
The latest development version of MathML is the Mathematical Markup Language (MathML) Version 2.0 (2nd Edition) W3C Working Draft published 19 December 2002.
How to learn and use MathML ?
The W3C web site is probably the best place to start. It has links to the specifications, tutorials, papers, handbooks and articles on MathML.
Unlike many other specifications, W3C specifications are descriptive and do explain things. They do not only consist of a list of all tags with all the different attributes (this information is present in appendices) but contain many illustrated examples which make them suitable for the newcomer.
http://www.w3c.org/
Today most browsers support MathML (Amaya (of course!), Mozilla, E-Lite, IBM techexplorer) and symbolic mathematical software such as Maple or Mathematica can read and write MathML.
MathML isn't supported by Microsoft Internet Explorer before version 5.5. The only way to get MathML support in version 5.5 (and in later versions that do not support MathML natively) is to install the MathPlayer high-performance display engine plugin. This plugin is developed by Design Science (www.dessci.com) and is available for free.
MathML examples
Here is a short list of MathML examples to show how powerful this language is. You will need a MathML-compliant browser in order to see the examples. Otherwise you will see crap.
Note: For the moment E2 is stripping out all MathML tags, so you'll always see crap. I've contacted some gods here to see what they can do about it.
x²
x = (-b ± √(b² - 4ac)) / (2a)
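As a sketch of what the quadratic formula example, x = (-b ± √(b² - 4ac)) / (2a), might look like in presentation markup, the fragment below is one plausible encoding. The grouping into mrow elements and the use of invisible-times operators are choices made here for illustration.

```xml
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <!-- Numerator: -b ± sqrt(b^2 - 4ac) -->
      <mrow>
        <mrow><mo>-</mo><mi>b</mi></mrow>
        <mo>&#xB1;</mo>
        <msqrt>
          <mrow>
            <msup><mi>b</mi><mn>2</mn></msup>
            <mo>-</mo>
            <mrow>
              <mn>4</mn><mo>&#x2062;</mo><mi>a</mi><mo>&#x2062;</mo><mi>c</mi>
            </mrow>
          </mrow>
        </msqrt>
      </mrow>
      <!-- Denominator: 2a -->
      <mrow>
        <mn>2</mn><mo>&#x2062;</mo><mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>
```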
Closing words...
Content markup yields short code that is rather easy to read. However, presentation markup is sometimes more straightforward because it doesn't require functions to be written in prefix notation.
The rising solution for embedding formulae in web documents is MathML. Its syntax is well structured and supported by an increasing number of browsers. However, writing MathML by hand is horrible because it is far too verbose, so people have to rely on formula editing software that is either lacking in features or unaffordable for occasional use. Writing mathematical formulae using a TeX-like syntax is pretty straightforward and easy. It would be a great improvement to have a math HTML tag that supports this syntax.
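To make the verbosity point concrete, here is the quadratic formula in a TeX-like syntax. This is ordinary LaTeX math, shown only for comparison; the equivalent presentation markup runs to a couple of dozen lines.

```latex
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
```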
Sources
http://www.w3.org/
Mathematical Markup Language (MathML) Version 2.0
W3C Recommendation 21 February 2001
http://www.dessci.com