PDF stands for Portable Document Format. It is a proprietary format, owned and maintained by Adobe Systems Incorporated, but it has become something of a lingua franca for document exchange on the web and elsewhere.

Its popularity is no accident.

PDF and PostScript

Adobe Systems also developed (and still owns) the PostScript language, which remains unrivaled as a Page Description Language for print applications. Adobe's experience with PostScript was an important proving ground for the concepts that have made PDF so successful, but the company has changed and extended those concepts in PDF:

  • Device Independence - Unlike HTML, which achieves device independence by leaving most of the duties of rendering the appearance of the content to the tender mercies of the client, the goal of PostScript was to create page descriptions that render as nearly identically as possible on whatever display device they are thrown at. PDF remains true to that ideal, but it allows document creators to trade some absolute fidelity for file size, based on the document's intended purpose (though still not on the target device).
  • Extensibility - PostScript is a full-featured programming language optimized for rendering text and graphics. PDF is 'just' a document format, but its specification expressly enables the document creator to include information of any type, and it provides rules for creating new types of content (including 'program' content) in a PDF.
  • Accessibility - Adobe Systems went against the common wisdom of the day when it made the PostScript Language Specification freely available, but this move allowed anyone to create PostScript, which required a PostScript interpreter to run. Adobe sells PostScript interpreters for large sums of money to makers of printers and other display systems. When it comes to PDF, the idea is flipped around - Adobe gives away a PDF interpreter (Acrobat Reader), but it markets the canonical PDF creation tool, Adobe Acrobat/Distiller, as a reasonably priced mass-market application.

Notwithstanding their considerable commonalities, PDF is not really a variant or an extension of PostScript, as is often claimed. For one thing, their structures are radically different - a PostScript document is a program whose instructions are executed sequentially. A PDF document is a collection of 'contents', not necessarily in any particular order, with a catalog that allows them to be looked up when needed. The master catalog contains pages, document information, and many other things, quite a few of which may also be catalogs in themselves. These hierarchical structures are referred to as 'dictionaries'; the construct is borrowed from PostScript, but PDF uses it for purposes unimaginable to PostScript.
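To make the lookup idea concrete, here is a toy model in Python. It is only a sketch of the structure described above, not the real file format: the object numbers, the ("ref", n) tuples, and the helper names resolve and get_page are all made up for illustration, and an actual PDF has many more required entries.

```python
# Toy model only: a simplified picture of a PDF as a pool of numbered
# objects reached through a catalog. All names here are illustrative.

# Objects are stored by number, deliberately out of order.
objects = {
    4: {"Type": "Page", "Contents": "text and graphics for page 2"},
    1: {"Type": "Catalog", "Pages": ("ref", 2)},
    3: {"Type": "Page", "Contents": "text and graphics for page 1"},
    2: {"Type": "Pages", "Kids": [("ref", 3), ("ref", 4)]},
}


def resolve(value):
    """Follow a ("ref", n) reference to the object it names."""
    if isinstance(value, tuple) and len(value) == 2 and value[0] == "ref":
        return objects[value[1]]
    return value


def get_page(catalog, index):
    """Look up one page through the catalog without touching the others."""
    pages = resolve(catalog["Pages"])   # a dictionary inside a dictionary
    return resolve(pages["Kids"][index])


catalog = objects[1]
print(get_page(catalog, 1)["Contents"])  # prints "text and graphics for page 2"
```

The point of the sketch is that nothing is executed in sequence: a viewer can jump straight to the second page by following references from the catalog, whereas a PostScript interpreter would have to run the program from the top.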

It is correct in a way to say that PDF is a superset of PostScript, because a PDF page content may be, for instance, an Encapsulated PostScript picture, and the viewer must be able to render it. However, the content may also be a bitmap or a movie or sound file, or a backup of E2. The viewer may have to pass the content off to some other application (or plugin) to handle, but from the document's viewpoint they are all on an equal footing.

Adobe's Free Viewer

Adobe makes available free PDF viewers for Windows, Mac, and many Unix-like operating systems, as well as corresponding plugins for the Big Browsers. This is an important factor in PDF's wide acceptance, but not enough to define a de facto standard. Toward that end, Adobe cleverly embedded its base 14 PostScript fonts in the reader, and made it easy for PDF creators to make font-free documents that would render reasonably (okay, extremely) well across platforms. Throw in painless downsampling of raster graphics, and you have a tightly controlled layout in an economical file size. Oh, by the way, it also has form fill-out capability and internal and external hyperlinking. All in all, an attractive choice for network document sharing where layout is paramount.

Adobe's PDF Suite

The reader is built to take maximum advantage of PDF's capabilities, but the hub of a PDF environment is the Acrobat suite. It includes Acrobat, which is like an Acrobat Reader with editing features, and Distiller, which is a PostScript to PDF converter.

Acrobat has never really caught on at the typical desktop level, but in print graphics and some e-biz circles, it's as common as MS Word. Of course, it's nothing like Word - you don't type up a PDF. Usually, original document creation is done in any application of choice, then "printed" to a PostScript file and converted to PDF using Distiller.

Distiller has settings that enable a PostScript file to be repurposed for different target uses. You may want to embed fonts, high-resolution graphics, and complex color information for printing, or you may want a stripped-down version for the web. Maybe both.

Once you have a PDF to work with, you can open it in Acrobat, where you can add form fields, digital signatures, media clips, annotations, JavaScript, bookmarks, and embedded comments; you can scan and OCR hardcopy, index directories of PDFs for full-text searching... whew, it's a Swiss Army Knife. The suite lists for US $249.00, and I know of no other application that drags in as much cool stuff for the price.

Still, the app's assumption is that you'll be doing all your document creation and heavy editing somewhere else, so there are only the most rudimentary tools for changing the page content. We can forgive Adobe somewhat for this, because they have also made available an extensive, thoroughly documented API for extending Acrobat.

Acrobat SDK

The PDF Reference and Acrobat Core API Reference comprise nearly 4000 pages of densely written material, and there are thousands more on ancillary features and programming interfaces. The SDK and all the documentation are free for downloading at adobe.com, and Adobe does not impose a license fee on applications or Acrobat plugins developed with them*. You have complete freedom to add menu items and tools into Acrobat, and to define your own actions in the application, and your own extensions to the PDF format. Adobe only asks that you register a four-character identifier with them so that your plugins and PDF content types don't end up with the same name as someone else's. There is no charge for registering an identifier.

Finding your way through all that documentation to begin writing Acrobat plugins is not very easy (or it wasn't for me), and the cross-platform imperative produces some oddities, but there are some very powerful Acrobat plugins on the market, and more on the horizon.

* There are license fees associated with plugins for the Acrobat Reader, but not for the full version of Acrobat.

Contrary Musings

Much of the PDF/Acrobat design seems oriented toward the web, almost as though Adobe thought of PDF as some kind of replacement for HTML. That's a kooky idea, and I really hate having to load that browser plugin to view content that would have been just as nifty in HTML.

It's handy to have documents in a compact, portable format to move them around, but the usual reason to exert fine control over a document's layout is because you expect it to be printed. Many of the byte-trimming features of PDF are deleterious to the quality of printed output. A default installation of Distiller will downsample raster graphics to 150 or 72 DPI, depending on the version. Your family album will look fine on the screen, but it will print like crap.
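A little arithmetic shows how much is thrown away. The sketch below assumes a 6 x 4 inch photo and a 300 DPI print-quality original; both figures are my own illustrative assumptions, while the 150 and 72 DPI targets are the Distiller defaults mentioned above. The function name pixels_for_print is hypothetical.

```python
def pixels_for_print(width_in, height_in, dpi):
    """Pixel dimensions needed to cover a printed area at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_count(size):
    """Total number of pixels in a (width, height) pair."""
    return size[0] * size[1]


# Assumed example: a 6 x 4 inch photo prepared at 300 DPI for print.
original = pixels_for_print(6, 4, 300)

for target_dpi in (150, 72):
    reduced = pixels_for_print(6, 4, target_dpi)
    kept = pixel_count(reduced) / pixel_count(original)
    print(f"{target_dpi} DPI: {reduced[0]} x {reduced[1]} pixels, "
          f"{kept:.1%} of the original data")
```

Because the pixel count scales with the square of the resolution, halving the DPI keeps only a quarter of the image data, and dropping to 72 DPI keeps less than six percent of it.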

Acrobat enforces TrueType font license protection rigidly. As a result, some fonts cannot be embedded in a PDF, including some that were never intended to be so restricted. If you intend to prepare print-ready documents using freeware fonts or those from Corel products, you should be especially aware of this issue.

Acrobat contains safeguards against malicious content, but PDF's extensibility has inevitably led to the advent of PDF-borne trojans. This is not a widespread problem yet, but Pandora's box is open.

Conclusion

I can't heap enough compliments on Adobe for their care, skill, and cleverness in the design of PDF and Acrobat. Much of that skill went toward the marketing of the product, but their strategy seems to have been to make it irresistible.

It's great for sharing business documents around, and the forms capability has some interesting possibilities. It really shines in a printing environment where, for instance, a high-res "real" version and a lightweight electronic proof can be made from the same source document. It's a big pain in the neck when graphic designers overuse it for reasons of ego, rather than functionality.

PDF has become ubiquitous, and it will probably stay that way, not only because of its intrinsic merits, but because it is maintained and marketed by such a deft company.


Author's Note (January 2008): The above was written in 2002, when Acrobat was shipping Version 5, which corresponds to PDF version 1.4. As I write this, Adobe is shipping Acrobat 8. Some of the details in this writeup are no longer accurate (for instance, the document format formerly known as PDF 1.7 is now managed not by Adobe, but by the ISO, under Standard 32000). I was going to delete this, because I'm no longer qualified to opine on current PDF internals, but I find that the overall flavor of the writeup has held up pretty well so far. I'll leave it here for now, but it's a fat target for a good superseding.