What they are, what they’re good for, and why I’m a fan

Zoom interfaces present information in a dynamic space, in which the user’s “camera” can move around: closer to see more detail, and farther away to see an overview. The benefits are maximized when the motion between camera positions is smoothly animated. The name comes from the resulting effect of “zooming” around the information.

While talk of such interfaces can be traced back in the HCI community as far as the mid-1980s, the term “zoom interface” was first used in a 1995 ACM paper by Lyn Bartram et al. titled “The continuous zoom: A constrained fisheye technique for viewing and navigating large information spaces”. Since then the HCI community has called these user interfaces zoom interfaces or Zooming User Interfaces (ZUIs). If you're doing research on the topic, a few sources refer to this as the “pan and zoom approach,” but ZUI is far more common.

ZUIs are an output and display technique, so I will speak to them in terms of presentation software like Microsoft's PowerPoint. ZUIs are useful tools when incorporated as full UIs, but the conversation around that gets much more complicated, so I'll stick to presentation.

Why do we need them? The core problem to solve is that we have a need to “chunk”.

Our need to chunk...

If everything we had to visually communicate fit comfortably into a 135-degree visual cone, information architects would have a lot less to do, because it would all fit easily into a (sighted) reader’s perception. But of course quite frequently the information we have to present needs to be divided into parts, whether those parts are pages or screens.

Once you divide information into such chunks, the problem for the designer becomes how to make those chunks easy for the reader to reconnect into a cohesive whole.

...provides our need for elegant reconstruction

Page-based presentation software doesn’t provide any natural relationship cues to its parts. With each new slide that pops up, the user has to ask, "How does this slide fit in with the prior ones?"

Is it a sublevel?
Is it a return to a superlevel?
Is it another component of the communication at the same level?
Or an example?

While this information can be gleaned from the context, the presenter’s speech, or text on the slide, doing so carries cognitive load and therefore makes the presentation harder to use. If there were some way to imply this sort of information through the design of the experience, it would be easier on the reader, letting her concentrate on the content and not its structure. ZUIs provide this through the motion between chunks:

If the camera moves away from our current location, we’re heading to a superlevel.
If it moves closer, we’re heading into more detail, into sublevels.
Lateral motion implies more information at the same level.

Humans easily map their experience of motion in the world to this kind of information motion, and so it bears very little cognitive friction.
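
The three motion cues above can be written down as a small rule. This is only an illustrative sketch: the `Camera` fields, the `describe_move` name, and the 5% tolerance are my own assumptions, not part of any particular ZUI toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera state over a flat information plane."""
    x: float     # horizontal position
    y: float     # vertical position
    zoom: float  # magnification; larger means closer to the content


def describe_move(start: Camera, end: Camera, tolerance: float = 0.05) -> str:
    """Return the structural cue a reader would infer from a camera move."""
    ratio = end.zoom / start.zoom
    if ratio > 1 + tolerance:
        return "sublevel"      # camera moved closer: more detail
    if ratio < 1 - tolerance:
        return "superlevel"    # camera moved away: back toward the overview
    if (start.x, start.y) != (end.x, end.y):
        return "same level"    # lateral motion: a sibling chunk
    return "no change"
```

A move that both pans and zooms is classified by its zoom change first, on the assumption that the change of scale is the stronger cue.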

This relies on a deliberate information hierarchy in which superlevels are larger and sublevels smaller. This has been done on paper since the days of the illuminated manuscript and was formalized into modern design by the Bauhaus school during the first half of the 20th century, and so is a recognized convention for members of Western media-literate societies.

Paper has a limited number of "levels"
The problem with paper is that the hierarchy runs out quickly. Under some arbitrary but reasonable restrictions, namely a 33% difference between super- and subcategories, there are only 5 levels of hierarchy between 72 point type, which at a 2-foot reading range admits only around 100 characters in the fixed-head reading field, and 12 point type, by convention the minimum size for reading. If the designer needs more levels, she’s largely out of luck. Furthermore, if there is an optimal reading size of text for the reader, only one level in a fixed-size hierarchy will be optimal, and everything else will be off.
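
The level count can be checked with a few lines of arithmetic. The sketch below assumes that "a 33% difference" means each sublevel is set 33% smaller than its parent; the function name and defaults are mine.

```python
def type_levels(largest_pt: float = 72.0, smallest_pt: float = 12.0,
                step: float = 0.33) -> list[float]:
    """List type sizes from largest to smallest, shrinking by `step` per level."""
    sizes = []
    size = largest_pt
    while size >= smallest_pt:
        sizes.append(round(size, 1))
        size *= 1 - step  # assumption: each sublevel is 33% smaller than its parent
    return sizes


print(type_levels())  # [72.0, 48.2, 32.3, 21.7, 14.5]
```

Under that reading there are five usable sizes; the next step down would be about 9.7 point, below the 12 point floor.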

Computers don't have this problem
Zoom interfaces help solve the problems of a limited number of levels in a fixed-size hierarchy and of suboptimal reading sizes. Because zoom information exists in “infinite space” it can have as many levels of hierarchy as it requires. Because its scale is dynamic, the reader can place the text in a way that is optimal for her.

Real Estate benefits

The dynamic scale also lets you get lots of information into a small amount of on-screen real estate. (For the purists: OK, really it doesn’t. After all, you could put the same information in simple hyperlinked documents. But cognitively the information is a single thing, connected by space relationships, which improves the perception of there being more information squeezed into a small space.)

Wow Factor

An additional benefit is the oft-discounted gee-whiz factor. It’s neat to use a zoom interface. People who saw the ones I produced in grad school for my presentations got excited about the information. This effect may fade if they become more common and people become habituated, but for now, cool is cool.

Overview, review, and reflection

All of the above benefits occur during the reading of the information. Zoom interfaces have an additional benefit afterwards, when the reader is reflecting on the information. To reflect on what she has just read, a reader needs to see the information as a whole, and to navigate around its parts as she reexamines it and clarifies questions she has. A zoom interface allows the user to back up all the way and see the information as a single thing. At this level, even though much of the text may be too small to read, the experience of having moved through the information leaves memories of the elements’ spatial relationships, which act as cues to their meaning. Well-designed information in a zoom interface also ensures that, at this viewing level, key graphic elements remain readable and act as further cues to the meaning.

One domain which benefits greatly from the act of overview is code. Portable code is optimally chunked into small logical bits, but the relationships between those parts are crucial to understanding how it works in toto. The Continuous Zoom Java interface does an admirable job of trying to display C++ functions in a zoom-ish interface that links directly to the code. Mike Heinrichs has posted interactive demos of CZoom applied to code for Checkers and Eliza.

An interesting note about the act of reviewing is that much of modern media doesn’t support this kind of reflection. The sole example that comes to mind is the table of contents in a book, and these are often used more as an index. This may be partially a matter of media literacy, but I suspect that if media supported reflection more, it would be more common.

Windowlessness, Iconlessness

One surprising result of a zoom interface is that you don't need windows. (The UI component, not the OS, but the implications scale.) Windows exist to solve the information/screen problem by cropping the information and providing means to scroll to other, cropped parts of the information. If you can "back up"—as in the ZUI—then there's no need to crop the information. This strategy limits dual-view capabilities that allow you to see multiple parts of a document at once. For this reason, even if ZUIs make it into common OS usage, they will probably utilize windows as tools in limited (not default) circumstances.

Another surprising result is that you don't really need icons to represent files in file management. The information itself is the icon, shrunk down. Many OSs already do this with images. The document is the icon. This doesn't work as well with audio or all-text documents, which look the same when small. For this reason some HCI practitioners advocate what's called semantic zooming, in which the information changes at different scales. For example, when your document gets down to around icon size, it is replaced with the title of the document instead. I think completely swapping out the content is clunky and adds cognitive load to the interpretation of the space. In my own work (I do this kind of thinking for a living) I support a blended approach: fading up an additional layer of semantic information over small, icon-sized information.
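
As a minimal sketch of that blended approach, the title layer's opacity can be tied to how large the document is currently drawn. The function name and the pixel thresholds are assumptions chosen for illustration, not measured values.

```python
def label_opacity(rendered_px: float, icon_px: float = 64.0,
                  readable_px: float = 256.0) -> float:
    """Opacity (0 to 1) of the title overlay for a document drawn rendered_px wide.

    At icon size or smaller the title is fully opaque; once the document is
    large enough to read on its own the title has faded out completely.
    The 64 px and 256 px thresholds are illustrative assumptions.
    """
    if rendered_px <= icon_px:
        return 1.0
    if rendered_px >= readable_px:
        return 0.0
    # Linear fade between the two thresholds.
    return (readable_px - rendered_px) / (readable_px - icon_px)
```

Because the document itself is never swapped out, the reader keeps the shrunken content as a landmark while the title fades in on top of it.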

Another effect of information-as-its-own-icon is that if the information has been designed in a way unique to its content, its "icon" takes on a single shape, like an information-age pictogram. Perhaps if this type of information becomes common, simplified versions of these pictograms could come to be their own symbols, and stand for the overall meaning of the information. Like a new type of word. Or infoglyph.

A Few Examples

Movies


Phone dialing in Johnny Mnemonic: Johnny dons his VR headgear. To make a call, he sees an image of the Earth. He selects a country, and he zooms in to that country (the USA in this case). He selects a state, zooms, selects a city, zooms, selects a borough, neighborhood, a house, and his call is connected. While there's an argument that two more levels are needed (person and device), his call was only 7 "digits", so no more memory-intensive than modern phone numbers.
Minority Report showed a more recent example as John Anderton zoomed in and out of the precogs' video feed.
It's worth noting that Jurassic Park's infamous "This is UNIX. I know this!" sequence was less a ZUI and more of a true 3D environment.

Computers

Jef Raskin believes in ZUIs like I do. He is working on what he calls The Humane Interface based on the arguments he made in his book of the same name. It incorporates ZUI and he has a simple demo available at http://humane.sourceforge.net/the/zoom.html.
NYU has been pursuing the PAD and PAD++ zoom code libraries for over a decade now. They're available on the web and visible as a Java applet at http://mrl.nyu.edu/~perlin/zoom/TestButton.html. It's also used as a site browser for their CAT department at http://mrl.nyu.edu/~perlin/zoom/SiteTour.html.

Challenges to zoom interfaces

OK. So these things are the greatest thing since the slice tool in Adobe Illustrator. Why aren't we all using them right now?

Navigation
The main problem with zoom interfaces is that while the motion is natural and informative, the controls are not. There is no common OS 3D experience at the time of this writing. The 2+ dimensional space of the hyperlinked web is the closest thing most people are familiar with. Despite this lack of navigational experience, I believe that the problem is not intractable. During reading, a linear presentation is often easiest, and simple forward and back controls suffice. During reflection, more control is needed. In the zoom interfaces I have built, I have found success using a strategy that allows users to click an object to zoom down to its level of detail, and click in border spaces to zoom back out. The up and down arrow keys common to keyboards are also handy tools in this regard.
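
The click-to-zoom strategy described above can be sketched as a tiny state machine over a hierarchy of chunks. The `Node` and `Navigator` names are hypothetical, and the sketch leaves out rendering and animation entirely; it only tracks which level currently has focus.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field contents
class Node:
    """One chunk of information; children are its sublevels."""
    name: str
    parent: Optional[Node] = None
    children: list[Node] = field(default_factory=list)

    def add(self, name: str) -> Node:
        child = Node(name, parent=self)
        self.children.append(child)
        return child


class Navigator:
    """Tracks which chunk the camera is currently framed on."""

    def __init__(self, root: Node) -> None:
        self.focus = root

    def click_object(self, target: Node) -> None:
        """Zoom down to a chunk, if it is visible at the current level."""
        if target in self.focus.children:
            self.focus = target

    def click_border(self) -> None:
        """Zoom back out one level; the overview has nowhere further to go."""
        if self.focus.parent is not None:
            self.focus = self.focus.parent
```

Mapping the down and up arrow keys onto `click_object` (for a selected child) and `click_border` would give the keyboard equivalent.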

I should also mention that the prevalence of PC video games has made keyboard navigation across a plane, e.g. a landscape where twitchy monsters are scrapping for a fight, a fairly common experience amongst video gamers. This experience should easily map to information space.

Construction
The other practical problem with zoom interfaces is that they are hard to build. In my opinion no good zoom space construction tools currently exist. CZoom is automated, which is good for the strictly logical world of code. But for the messier types of information most people deal with, automation is doubtful. Creating your own tools is beyond the abilities of most consumers. Even if good software can be built, it still requires a sophisticated design eye, or at least tools that support the easy construction of engaging and communicative information designs.

Bad examples
ZUIs work best when constrained to a linear space, in which little changes underneath the user during use. Some zoom interfaces have been constructed with "cooler" motion that confounds the reader's spatial skill, as in Earl Rennison's “Galaxy of News” visualizations. Since such motion undermines the universal human skill of natural-space processing, it only replaces the cognitive load of relationless chunking with a different load.

Similarly, fisheye lenses were a darling of information design for a short while around the turn of the 21st century for their ability to simultaneously show detail and context. The visuals and motion produced by the hyperbolic math are quite eye-catching. And there is something to reducing the load on short-term memory by keeping everything visible on screen at all times. But, like Rennison’s non-linear space, this distortion seems to miss the opportunity offered by the sorts of real-world spatial relationships people are good at.

Some sources and references (though this list is by no means exhaustive):

Continuous zoom (Java program) - CZoom interface http://members.rogers.com/mheinrichs/CZoom/checkers/
Bederson, B., Hollan, J., “Pad++: A Zooming Graphical Interface for Exploring Alternate Interface Physics”, Proc. UIST, 1994.
L. Bartram, A. Ho, J. Dill, and F. Henigman. The continuous zoom: A constrained fisheye technique for viewing and navigating large information spaces. In ACM Symposium on User Interface Software and Technology, pages 207–215, 1995
Fairchild, K.M., S.E. Poltrock, and G.W. Furnas. “SemNet: Three-dimensional graphic representations of large knowledge bases.” Cognitive Science and its Application for Human-Computer Interface, R. Guindon (Ed.), Elsevier, pp. 201-233, 1988.
Furnas, G., Bederson, B., “Space-Scale Diagrams: Understanding Multiscale Interfaces”, Proc. SIGCHI, 1995.
Furnas, G.W. “Generalized Fisheye Views.” Proceedings of ACM SIGCHI'86, pp. 16-23, April 1986.
Igarashi, T., Hinckley, K., “Speed-Dependent Automatic Zooming for Browsing Large Documents”, Proc. UIST 2000.
Rennison, Earl. Galaxy of news: An approach to visualizing and understanding expansive news landscapes. Proc. UIST'94, pages 3-12, Marina del Rey, California, November 1994. ACM.
Small, David, Suguru Ishizaki and Muriel Cooper. "Typographic Space." CHI'94 Companion, 1994.

