I've had this vague idea that it would be really cool to convert simple HTML pages into PDFs via XSL-FO. The only free processor worth looking at for XSL-FO is FOP, so I've been experimenting with that. I started off with a simple HTML page (my Dreaded command line article minus the page layout table) and tried to turn it into something reasonable in PDF.
I managed to get basic stuff like paragraphs and headings working, but then I ran up against the inadequacies of the tools available. Images aren't really supported well enough. JPEG bitmaps work, but PNG doesn't. EPS works too, except that Acroread can't display EPS images, although they do show up in printouts. I even tried using SVG now that I've discovered that recent versions of transfig can convert xfig diagrams to SVG, but that didn't work at all.
Tables are another big problem. FOP only supports the ‘fixed’ table layout mode, which means that you have to specify the width of each column in the table by hand. Come on now, even Netscape 4 could work out how wide to make the columns in tables!
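For anyone who hasn't seen what that means in practice, here is a minimal sketch of a fixed-layout table, written against the XSL 1.0 Recommendation rather than copied from my stylesheet. The page size, master name, column widths and cell text are all made up for illustration, and older FOP releases may expect slightly different attribute names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: an A4 page with 20mm margins, leaving 170mm of width. -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-height="297mm" page-width="210mm"
                           margin="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- table-layout="fixed" means the processor does no width calculation -->
      <fo:table table-layout="fixed" width="100%">
        <!-- Every column width is spelled out by hand: 40mm + 130mm = 170mm -->
        <fo:table-column column-width="40mm"/>
        <fo:table-column column-width="130mm"/>
        <fo:table-body>
          <fo:table-row>
            <fo:table-cell><fo:block>Option</fo:block></fo:table-cell>
            <fo:table-cell><fo:block>Description</fo:block></fo:table-cell>
          </fo:table-row>
        </fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

The point is the pair of fo:table-column elements: an HTML table carries no such widths, so a stylesheet converting from HTML has to invent them somehow.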
FOP seems to be riddled with bugs. There are many situations, some caused by bad input and some by out-and-out bugs, which just make it give up with a NullPointerException. In particularly extreme cases it just dies printing ‘null’. There are very few situations in which it actually tells you what went wrong. Worst of all, a lot of these bugs are affected by particular measurements in the page layout, with some values working perfectly and others resulting in a mysterious crash.
So, for example, I can have a bottom margin on my ‘region-body’ of 11mm or 16mm, but not anything in between. I can have 1.2em of space before my main heading, but not 1.3em. It's infuriating!
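To make it concrete, these are the two kinds of setting involved, shown as isolated fragments rather than a complete document. The measurements are the ones mentioned above; the font size and heading text are placeholders, and both elements are assumed to sit inside an fo:root that binds the usual fo prefix.

```xml
<!-- Inside fo:simple-page-master: 11mm and 16mm were accepted here,
     values in between were not. -->
<fo:region-body margin-bottom="16mm"/>

<!-- Inside fo:flow: 1.2em of space before the main heading was accepted,
     1.3em was not. The font-size is a placeholder. -->
<fo:block font-size="18pt" space-before="1.2em">Main heading</fo:block>
```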
Anyway, if anyone's interested, the stuff I've done so far is available as a tarball (html_to_pdf.tar.gz). The PDF output is also available (command_line.pdf). Hilariously, it takes 8–11 seconds for FOP to build this simple 5-page PDF on my relatively fast box.