Embedding equations in HTML - Part II

  • Published 2005-02-15
  • Updated 2007-08-26

This is the second part of my article on how to easily embed mathematics in XHTML documents using LaTeX. In part one I discussed various aspects concerning browsers and markup. In part two I'll get technical and show you how I have implemented a solution in my CMS.

Prerequisites

I have used the following software:

  • A LaTeX distribution. LaTeX is included in most Linux/Unix distributions. For Windows users I recommend MiKTeX. MiKTeX has a very useful package manager that makes it easy to add utilities and LaTeX packages.
  • dvipng for converting DVI files to PNG or GIF. The program is included with the MiKTeX distribution.
  • Python for gluing it all together. I also use the excellent XML library ElementTree from effbot.org.

If you prefer, you can easily implement the techniques described in this article using your favorite language/tool. It should not be that difficult.

Note that I use XHTML for writing my documents. This allows me to process my documents as XML, which greatly simplifies the processing.

Outline of the process

In part one I decided to embed LaTeX code between ordinary <div> and <span> tags. The equations must then be extracted from the document and replaced by images. The process can be summarized in these steps:

  1. Read source document and extract equations.
  2. Generate equations and save them as images.
  3. Insert generated images in the document.
  4. Publish document.

I'll dwell a bit on items one and two. However, I'll let the source code do most of the talking.
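Step three gets little prose of its own, so here is a minimal sketch of it in isolation. It assumes Python 3 and the standard library's xml.etree.ElementTree (a descendant of the ElementTree package used in this article). The function name replace_with_images and the sample markup are made up for illustration. The image names follow the eq<basename><n>.png pattern used by the full script further down.

```python
import xml.etree.ElementTree as ET


def replace_with_images(equations, basename, imgpath=""):
    """Swap the LaTeX text of each equation element for an <img> child.

    Images are numbered from 1, matching the page numbers that dvipng
    substitutes for %d in its output file name.
    """
    for number, element in enumerate(equations, start=1):
        element.text = None  # drop the LaTeX source from the tree
        src = "%seq%s%d.png" % (imgpath, basename, number)
        ET.SubElement(element, "img", src=src, alt="")


# Hypothetical sample markup, small enough to inspect by eye.
root = ET.fromstring(
    '<p>Here <span class="eq">x^2</span> and <span class="eq">y^2</span>.</p>'
)
eqs = [el for el in root.iter() if el.get("class") == "eq"]
replace_with_images(eqs, "test", "img/")
result = ET.tostring(root, encoding="unicode")
print(result)
```

The surrounding text ("Here", "and", ".") is untouched because only the text inside each equation element is cleared.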

Extracting the equations

Extracting the equations is quite easy with a proper XML/XHTML library. Python has good support for HTML and XML in the standard library, but I'm a bit lazy and prefer to use a higher-level library like ElementTree. With ElementTree, extracting the equations can be done in a few lines of code:

from elementtree import ElementTree as et

source_filename = "test.xhtml"

# parse document
xhtmltree = et.parse(source_filename)

# find all elements with attribute class='eq' 
eqs = [element for element in xhtmltree.getiterator() 
       if element.get('class','')=='eq']
# equations are now available in the eqs[..].text variable

ElementTree also handles the encoding issues for us, and we can easily change the markup.
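The same extraction can be tried without a file on disk. This is a minimal sketch, assuming Python 3 and the standard library's xml.etree.ElementTree rather than the standalone elementtree package; the sample markup and the helper name find_equations are hypothetical. Note that the sample has no XHTML namespace declaration. With one, ElementTree returns namespace-qualified tag names, and a plain comparison against 'span' would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using the article's class="eq" convention.
SAMPLE = r"""<html>
  <body>
    <div class="eq">y = \int_0^\infty \cos(x) dx</div>
    <p>An inline equation <span class="eq">y^2 = x^2 + \alpha^2</span> here.</p>
    <p class="note">Not an equation.</p>
  </body>
</html>"""


def find_equations(root):
    """Return every element whose class attribute is exactly 'eq'."""
    return [el for el in root.iter() if el.get("class", "") == "eq"]


root = ET.fromstring(SAMPLE)
equations = find_equations(root)
for el in equations:
    # <span> marks an inline equation, anything else a display equation.
    kind = "inline" if el.tag == "span" else "display"
    print(kind, el.text.strip())
```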

Generating the equations

We now have a list of elements containing LaTeX code. In order to render the equations, we need to make a LaTeX document and compile it. The next problem is to save them as images. This is where dvipng comes to the rescue:

This program makes PNG and/or GIF graphics from DVI files as obtained from TeX and its relatives. It produces high-quality images while its internals are tuned for speed. It supports PK, VF, PostScript, and TrueType fonts, color and PostScript inclusion.

dvipng is a command-line utility with many options for tuning the output. See the documentation for a full list of features. It basically saves each page in a DVI document as an image, which means that we should put each equation on a page of its own.
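The one-equation-per-page idea can be sketched on its own before reading the full listing. The following assumes Python 3; the names build_tex and dvipng_command and the shortened preamble are illustrative only, and the dvipng flags are simply the ones used in the script below. Nothing is executed here: the sketch only builds the LaTeX source and the argument list, which could then be handed to something like subprocess.call.

```python
# Shortened preamble for illustration; the real script loads more packages.
PREAMBLE = r"""\documentclass{article}
\usepackage{amsmath}
\pagestyle{empty}
\begin{document}
"""


def build_tex(equations):
    """Build LaTeX source with one equation per page.

    `equations` is a list of (is_inline, latex_code) pairs.
    """
    parts = [PREAMBLE]
    for is_inline, code in equations:
        if is_inline:
            parts.append("$%s$\n" % code.strip())
        else:
            parts.append("\\[\n%s\n\\]\n" % code.strip())
        # A page break after every equation gives dvipng one image each.
        parts.append("\\newpage\n")
    parts.append("\\end{document}\n")
    return "".join(parts)


def dvipng_command(basename, imgpath=""):
    """Return the dvipng argument list; dvipng replaces %d by the page number."""
    output_pattern = "%seq%s%%d.png" % (imgpath, basename)
    return ["dvipng", "-T", "tight", "-x", "1200", "-z", "9",
            "-bg", "transparent", "-o", output_pattern, basename]


tex = build_tex([(True, "a^2"), (False, "b = c")])
cmd = dvipng_command("test", "img/")
print(tex)
print(" ".join(cmd))
```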

The final code

Below is the listing of the final program. It's also available for download as eqhtml.py.

"""A simple tool for embedding LaTeX in XHTML documents.

This script lets you embed LaTeX code between <div> and <span> tags. Example:
    <div class="eq">
      y = \int_0^\infty \gamma^2 \cos(x) dx 
    </div>
    <p> An inline equation <span class="eq">y^2=x^2+\alpha^2</span> here.</p>

The script extracts the equations, creates a temporary LaTeX document, 
compiles it, saves the equations as images and replaces the original markup 
with images.

Usage:
    python eqhtml.py source dest
    
Process source and save result in dest. Note that no error checking is 
performed. 
"""

from elementtree import ElementTree as et
import os, sys

# Include your favourite LaTeX packages and commands here
tex_preamble = r'''
\documentclass{article} 
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{bm}
\newcommand{\mx}[1]{\mathbf{\bm{#1}}} % Matrix command
\newcommand{\vc}[1]{\mathbf{\bm{#1}}} % Vector command 
\newcommand{\T}{\text{T}}                % Transpose
\pagestyle{empty} 
\begin{document} 
'''

imgpath = '' # path to generated equations, e.g. 'img/'

# get source and dest filenames from command line
sourcefn = sys.argv[1]
destfn = sys.argv[2]
sourcefn_base = os.path.splitext(os.path.basename(sourcefn))[0]
# change working directory to the same as source's
cwd = os.getcwd()
os.chdir(os.path.abspath(os.path.dirname(sourcefn)))
sourcefn = os.path.basename(sourcefn)
texfn = sourcefn_base+'.tex'

print "Processing %s" % sourcefn
# load and parse source document
f = open(sourcefn)
xhtmltree = et.parse(f)
f.close()

# find all elements with attribute class='eq' 
eqs = [element for element in xhtmltree.getiterator() 
       if element.get('class','')=='eq']
# equations are now available in the eqs[..].text variable

# create a LaTeX document and insert equations
f = open(texfn,'w')
f.write(tex_preamble)
counter = 1
for eq in eqs:
    if eq.tag == 'span': # inline equation
        f.write("$%s$ \n \\newpage \n" % eq.text)
    else:
        f.write("\\[\n%s \n\\] \n \\newpage \n" % eq.text)
    # delete LaTeX code from the document tree, and replace
    # them by image urls.
    eq.text = None
    imgname = "%seq%s%i.png" % (imgpath,sourcefn_base, counter)
    et.SubElement(eq,'img',src=imgname, alt='')
    counter += 1
# end LaTeX document    
f.write(r'\end{document}')
f.close()

# compile LaTeX document. A DVI file is created
os.system('latex %s' % texfn)

# Run dvipng on the generated DVI file. Use tight bounding box. 
# Magnification is set to 1200
cmd = "dvipng -T tight -x 1200 -z 9 -bg transparent " \
+ "-o %seq%s%%d.png %s" % (imgpath , sourcefn_base, sourcefn_base)
os.system(cmd) 

# Remove temporary files
os.remove(sourcefn_base+'.tex')
os.remove(sourcefn_base+'.log')
os.remove(sourcefn_base+'.aux')
os.remove(sourcefn_base+'.dvi')

os.chdir(cwd)

# Write processed source document to dest 
xhtmltree.write(destfn)

print "Done."

Update: A.M. Kuchling has written a Movable Type plugin, called mt-math, for writing equations in weblog entries. The work is derived from my code. He has added some interesting features to it, such as storing the images in the HTML using the data: URL scheme.

Update: Jay Edwards has written htmlatex, a cool tool that does on-the-fly rendering of LaTeX source in HTML documents. It's derived from A.M. Kuchling's MT plugin and my simple script.

Display issues

A disadvantage of using bitmaps is that they don't scale with your document's text size. If you find the generated equations too large or too small, you can tweak the images with dvipng's -x magnification setting. The inline equations may also look a bit out of place. This issue can be fixed by adjusting the vertical-align property of the image with CSS. This is how I style my equations:

/* Center block equations */
div.eq {text-align:center;} 
/* Align inline equations with parent's content area */
span.eq img{vertical-align:text-bottom;} 
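If the equations come out too large or too small, a rough starting value for the resolution can be worked out instead of guessed. The sketch below rests on several assumptions that are not from the script itself: the LaTeX document uses the article class default of 10pt, the surrounding web text is 16 pixels high, one image pixel is shown as one screen pixel, and dvipng's default resolution is 100 dpi (as stated in comment #3 below). The function names are made up for illustration, and the -x value is only equivalent in terms of pixel dimensions.

```python
TEX_POINTS_PER_INCH = 72.27  # one TeX point is 1/72.27 inch


def dpi_for_pixel_size(target_px, tex_font_pt=10.0):
    """Resolution at which tex_font_pt text has a nominal size of target_px pixels."""
    return target_px * TEX_POINTS_PER_INCH / tex_font_pt


def magnification_for_dpi(dpi, base_dpi=100.0):
    """Equivalent -x value (1000 means no magnification) at base_dpi."""
    return int(round(1000 * dpi / base_dpi))


dpi = dpi_for_pixel_size(16)  # assumed 16 px body text
print("-D %d" % round(dpi))
print("-x %d" % magnification_for_dpi(dpi))
```

By the same arithmetic, the -x 1200 setting used in the script corresponds to 120 dpi at the default resolution, which is slightly larger than 16 pixel text. Treat the result as a starting point and adjust by eye.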

Concluding remarks

That's it. I can now have fancy equations in my web pages. The technique can also be extended to embed arbitrary LaTeX code, which allows you to include EPS graphics and other (La)TeX goodies.

Comments

  • #1 David O Kazmer, September 25, 2006 at 2:40 p.m.

    Nice, but could you or somebody else just provide a server that converts tex into tiff images?

  • #2 Kjell Magne Fauske, September 25, 2006 at 3:05 p.m.

    You could try mimeTeX. They offer a public server for rendering equations on the fly.

  • #3 Jan-Åke Larsson, September 26, 2006 at 3:10 p.m.

    I'm the author of dvipng. One recommendation: don't use the -x switch. Use the -D option, which allows direct control of the output resolution. For 120 dpi output, use -D 120. The default output is 100 dpi.

  • #4 Kjell Magne Fauske, September 26, 2006 at 3:34 p.m.

    @Jan-Åke

    Last time I tried to use the -D option it would not work. Probably because of an outdated installation of dvipng and LaTeX. I will give it a new try.

  • #5 Kjell Magne Fauske, October 10, 2006 at 8:08 a.m.

    Due to large amounts of comment spam I have disabled comments on this page.

  • #6 Kjell Magne Fauske, April 23, 2007 at 6:13 p.m.

    Comments are now enabled again. I hope the spam bots have forgotten me now.

    I stumbled upon latexmath2png today:

    A versatile program and Python module to allow conversion of LaTeX math equations in to PNG images.

    The author has borrowed some ideas from my simple script. Nice work.

  • #7 octalina, November 19, 2007 at 2:55 a.m.

    Very useful code :P

    Thanks
