Server-side PDF generation from HTML
I want to generate PDF documents from HTML on the server. Is that so much to ask? There are a few strategies to do this, but I found them all to be lacking in one way or another. Any tool that isn't using a robust rendering engine is right out. They just can't handle the complexity of the HTML that I'm working with. OK, so we need to use a real browser. The closest thing was probably khtml2png, but it's more of a cool hack than a usable tool (page size and length are actually important).
Firefox has supported printing to PostScript files for as long as I can remember. Exploiting this functionality seemed like the path of least resistance. And now that Firefox 3 has the ability to print to PDF natively, I thought I would give it another shot.
The only real question was how to interface nicely with Firefox. This weekend I found my solution: JSSh.
JSSh is a JavaScript shell server for Mozilla. Basically it allows you to remotely control a Mozilla session. Pretty cool. Using this along with a few tweaks to a Firefox profile gets me pretty close to a decent solution. So let me walk you through exactly what I did.
I'm running a Linux server without X. Since Firefox needs some graphical environment to run in, I'm using Xvfb. I would suggest trying this first on a machine running X so you can see what's happening.
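For the headless case, here is a minimal sketch of bringing up Xvfb so that Firefox has a display to draw into. The display number :99 and the 1024x768x24 screen are arbitrary choices of mine, and start_xvfb is a hypothetical helper name, not something from Xvfb itself.

```shell
# Start a virtual framebuffer X server and point DISPLAY at it.
# Assumptions: display :99 is free; the screen geometry is arbitrary.
start_xvfb() {
    display="${1:-:99}"
    # Screen 0 at 1024x768 with 24-bit colour, run in the background
    Xvfb "$display" -screen 0 1024x768x24 &
    # Remember the PID so the server can be stopped later
    XVFB_PID=$!
    # Anything launched from this shell afterwards uses the virtual display
    export DISPLAY="$display"
}
```

Call start_xvfb before launching Firefox from the same shell, and kill "$XVFB_PID" when you are done.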
The first step is to build Firefox with support for JSSh. There is a plugin, but it isn't compatible with FF3 and it didn't work for me in FF2. So let's build it from source. Get the latest Firefox source tarball from here: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/releases/. As of writing it is 3.0b5. After extracting the tarball you will need to create a .mozconfig file in the new mozilla directory. My .mozconfig looks like this (hopefully it will work for you):
. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/fx-jssh
mk_add_options MOZ_CO_PROJECT=browser,xulrunner
ac_add_options --enable-extensions=default,jssh,webservices
ac_add_options --enable-debug --disable-optimize
ac_add_options --enable-default-toolkit=cairo-gtk2
mk_add_options MOZ_MAKE_FLAGS=-j3
Make sure you have all the required dependencies installed. Alright, let's build it. In the mozilla directory:
$ make -f client.mk build
Time to give our newly built FF a try. (Don't do this if you're running in a headless environment.)
$ cd fx-jssh/dist/bin;
$ ./firefox -jssh -P -no-remote
This will start up Firefox and prompt you to create a new profile. If you have multiple versions of Firefox on the same machine, you will want different profiles for each of them. You can then specify which profile to use after the '-P' flag. I created a profile named jssh. In my profile's directory (~/.mozilla/firefox/jssh), I created a user.js:
user_pref("print.always_print_silent", true);
user_pref("print.print_bgcolor", true);
user_pref("print.print_bgimages", true);
user_pref("print.print_shrink_to_fit", false);
user_pref("print.print_to_file", true);
user_pref("print.print_to_filename", "/tmp/mozilla.pdf");
user_pref("browser.startup.homepage", "about:blank");
user_pref("browser.sessionstore.resume_from_crash", false);
This will cause Firefox to always print to /tmp/mozilla.pdf without going through the normal print dialogs. These settings work well for what I'm using this for, but I would suggest playing around with these and other settings to get something that will work well for you.
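Because every print job lands on the same fixed path, something has to move the file away before the next job overwrites it. A rough sketch of one way to do that follows; collect_pdf is a hypothetical helper of mine, and treating "size unchanged for one second" as "finished writing" is an assumption, not something Firefox guarantees.

```shell
# Wait for the fixed output file to appear and stop growing, then move it.
# Usage: collect_pdf SRC DEST [TRIES]   (one try per second, default 30)
collect_pdf() {
    src="$1"
    dest="$2"
    tries="${3:-30}"
    last=-1
    while [ "$tries" -gt 0 ]; do
        if [ -s "$src" ]; then
            size=$(wc -c < "$src" | tr -d ' ')
            # Same size on two consecutive checks: assume the write is done
            if [ "$size" = "$last" ]; then
                mv "$src" "$dest"
                return 0
            fi
            last="$size"
        fi
        sleep 1
        tries=$((tries - 1))
    done
    # Gave up: the file never appeared or never stopped growing
    return 1
}
```

For example, collect_pdf /tmp/mozilla.pdf /tmp/report-1.pdf after each print call.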
Alright, let's telnet in. By default, JSSh listens on port 9997.
$ telnet localhost 9997
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Welcome to the Mozilla JavaScript Shell!
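And to print our first PDF (give the page a moment to finish loading before calling print(), since loadURI returns right away):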
> var w0 = getWindows()[0];
> var browser = w0.getBrowser();
> browser.loadURI("http://google.com")
> var window = browser.contentWindow;
> window.print();
If all goes well, you should have a new PDF at /tmp/mozilla.pdf.
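The telnet session above can be scripted. Here is a minimal sketch that feeds the same commands to JSSh through nc instead of typing them. The function names, the fixed sleep used to wait for the page load, and the use of nc are all my assumptions; a fixed delay is crude, and nc variants differ in how they behave once their input ends. The URL is pasted straight into a JavaScript string, so it must not contain double quotes.

```shell
# Emit the JSSh commands that load a URL and print it.
# wait_secs is a crude pause so the page can finish loading before print().
jssh_print_commands() {
    url="$1"
    wait_secs="${2:-5}"
    printf 'var w0 = getWindows()[0];\n'
    printf 'var browser = w0.getBrowser();\n'
    printf 'browser.loadURI("%s");\n' "$url"
    sleep "$wait_secs"
    printf 'var window = browser.contentWindow;\n'
    printf 'window.print();\n'
}

# Pipe the commands to the JSSh port (9997 by default), then keep the
# connection open briefly so the print call has time to be processed.
render_pdf() {
    { jssh_print_commands "$1" "${2:-5}"; sleep 2; } | nc localhost 9997
}
```

With Firefox already running with -jssh, render_pdf "http://google.com" should leave the result at whatever path print.print_to_filename points to.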
JSSh seems to be fairly powerful. I wonder what other interesting things you could do with it.