Grokking the Java CLASSPATH:

Java's use of the ClassPath is undoubtedly one of the most confusing aspects of the language for those just beginning to learn how to use it. It took me about 6 months of exposure to Java before I wrapped my head around the whole concept. I attribute a large part of this confusion to the fact that nobody ever explained the concept to me properly. Eventually, I came to understand it purely as a result of familiarity brought on by prolonged exposure. However, trying to write & run Java programs for those initial few months was a painful exercise. This is an attempt on my part to ensure that nobody else will ever have to go through that.

The primary reason that most people find the ClassPath so confusing is that it is peculiar to Java. Hence, even those with significant amounts of prior coding experience in various other languages tend to be thrown for a loop by it. The problem is intensified because this somewhat radical concept is absolutely necessary to do anything at all with Java. In truth, there is one concept that can be considered an ancestor of Java's ClassPath. If you have some familiarity with DOS or Unix, you may have had previous encounters with an entity known as the PATH (generally denoted by $PATH or %PATH%). If you've never heard of it before, don't worry. The Path is merely a list of places in which the computer system should expect to find runnable programs. That way, when you want to start an application, you can just provide its name and all the different places listed in the Path will automatically be searched, in order, until the program is found. While this may sound somewhat inefficient, bear in mind that the computer can do all this searching faster than you could possibly type out the exact location of the program.

Like the Path, Java's ClassPath is also a list of places to look for something. By the way, these 'places' are usually directories on your computer's hard drive (archive files, such as '.jar' or '.zip' files, can also be listed). The critical difference between the regular Path and the ClassPath lies in the way the Java platform works. Unlike programs written in most other languages, Java programs are not designed to run on a particular type of CPU (e.g. Apple's PowerPC, Intel's Pentium or Sun's UltraSparc) or a particular Operating System (e.g. MacOS, Linux, Windows, Solaris). Instead, Java programs are designed to run on what is essentially a CPU emulator program, the Java Virtual Machine (JVM for short). In place of a whole Operating System to provide basic functionality that all Java programs can use, there is a collection of 'core' classes (e.g. String & Vector) that come with the JVM and can always be used in any Java program you write.
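If you want to see the list for yourself, here is a minimal sketch that prints each entry of the ClassPath the current JVM was started with. The class name ShowClassPath and the helper 'entries' are just names I made up for illustration; the 'java.class.path' system property and File.pathSeparator (':' on Unix, ';' on Windows) are standard.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// ShowClassPath is a hypothetical name used only for this illustration.
public class ShowClassPath {

    // Splits a ClassPath-style string into its individual entries.
    // Empty entries are skipped here purely to keep the output tidy.
    static List<String> entries(String classPath, String separator) {
        List<String> result = new ArrayList<String>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "java.class.path" holds the ClassPath this JVM is using.
        String classPath = System.getProperty("java.class.path");
        for (String entry : entries(classPath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

What gets printed depends entirely on how the JVM was started, so your output will differ from anyone else's.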

While most applications you use (e.g. Solitaire, Netscape, Excel) are usually just one file that can be run, Java has a very different way of doing things - one that reflects the way in which you have written your programs. The Java compiler turns each Class that you have written code for into a separate file, which has the same name as the Class but with a '.class' extension. For instance, you may have noticed that when you compile a Class called Foo, which is in a file called 'Foo.java', you end up with a file called 'Foo.class'. These '.class' files are all in a format called Java bytecode, which is really just the native language of the CPU emulator. Now, the trick to actually running the program is to give the JVM (that's the CPU emulator, remember) the name of a class that you have defined a 'main()' method for (as in 'public static void main(String[] args) {...}'; note the lowercase 'm'). Once you do that, the JVM will just find the appropriate bytecode (.class) file and run that 'main()' method.
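To make that concrete, here is about the smallest Foo that can be compiled and run. The 'greeting' helper is something I added only so there is a value to print; the part that matters is the exact shape of the entry point, which has to be 'public static void main(String[] args)'.

```java
// Foo.java
//
// Compile:  javac Foo.java     (produces Foo.class in the same directory)
// Run:      java Foo           (give the class name, not the file name)
public class Foo {

    // Hypothetical helper, present only so main() has something to print.
    static String greeting() {
        return "Hello from Foo";
    }

    // The JVM looks for exactly this method when asked to run Foo.
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Note that you run it with 'java Foo' rather than 'java Foo.class': the JVM wants the name of the class and works out the file name by itself.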

You might wonder how the JVM knows where to find this bytecode file. This is where the notorious ClassPath comes in. However, before unveiling the secrets of the ClassPath, a word about another Javaism called 'packages'. A package is just a convenient way of grouping some related Classes together. These packages are laid out in a tree-like hierarchy, much like the directories on your hard drive (or the topics on usenet). The notation is slightly different, however: the different levels of the hierarchy are separated by a dot (instead of a slash or backslash). All the 'core' Java classes will be found in a package that starts with 'java' (e.g. the Vector class is in the 'java.util' package so its full name would be java.util.Vector).
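You can ask any class what its package and full name are. The following is a small sketch (FullNames and 'describe' are names I invented for it) that uses the standard Class methods getPackage(), getSimpleName() and getName() on two of the core classes.

```java
import java.util.Vector;

// FullNames is a hypothetical name used only for this illustration.
public class FullNames {

    // Builds a one-line description of where a class lives.
    static String describe(Class<?> c) {
        return "package=" + c.getPackage().getName()
             + ", class=" + c.getSimpleName()
             + ", full name=" + c.getName();
    }

    public static void main(String[] args) {
        // prints: package=java.util, class=Vector, full name=java.util.Vector
        System.out.println(describe(Vector.class));
        // prints: package=java.lang, class=String, full name=java.lang.String
        System.out.println(describe(String.class));
    }
}
```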

Using packages for your programs helps to keep them distinct from each other. This way, the classes you write for program Foo can be in 'myprograms.foostuff' (which may contain the classes Foo, FooHelper & SpecialFoo). Because classes within the same package are supposed to be related, they can address each other on a first-name basis. So, inside Foo, you can use FooHelper simply by referring to it by name (i.e. FooHelper). However, different packages could have classes with the same name so, to provide clarity when using classes in a different package, you refer to them by their full name. For instance, if you wanted to refer to class Foo inside another class, Bar, that is part of a different package, you have to call it 'myprograms.foostuff.Foo'. Now imagine that you use Foo a lot in Bar. It would get rather tiresome to use the whole package name every single time you need to refer to Foo. When this happens, you can save yourself some trouble by having Bar formally introduced to Foo. All this requires is a line at the top of Bar.java that says "import myprograms.foostuff.Foo;" and then Bar can refer to Foo directly (i.e. just say 'Foo' instead of 'myprograms.foostuff.Foo').
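Here is a rough sketch of the two ways of referring to a class in another package. Since a single listing can only hold one package, I have used the core class java.util.Vector as a stand-in for 'myprograms.foostuff.Foo'; the method names are made up for the example.

```java
import java.util.Vector;   // the "formal introduction"

// Bar here is only an illustration; java.util.Vector stands in for
// a class from another package such as myprograms.foostuff.Foo.
public class Bar {

    // Without relying on the import: spell out the full name each time.
    static int usingFullName() {
        java.util.Vector<String> names = new java.util.Vector<String>();
        names.add("Foo");
        return names.size();
    }

    // Thanks to the import at the top: the short name is enough.
    static int usingImport() {
        Vector<String> names = new Vector<String>();
        names.add("Foo");
        names.add("FooHelper");
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(usingFullName());  // prints 1
        System.out.println(usingImport());    // prints 2
    }
}
```

Both methods use exactly the same class; the import only changes how much you have to type.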

Naturally, the most logical way for packages to translate into a structure on your hard drive is to map them to a directory structure. Thus, you will have some top-level directories (e.g. myprograms) that contain other directories (e.g. foostuff & barstuff), which contain the actual bytecode (.class) files required to run the Java programs you write. In order for the JVM to find these bytecode files, it needs to know 2 things: where the top-level directories are located & what the full name (e.g. 'myprograms.barstuff.Bar') is of the class whose main() method you wish to run. That 1st bit of information is what the ClassPath lists. Yeah, that's all it is: a list of places in which the JVM should expect to find the directories that represent the topmost level of the package hierarchy. For instance, if the directory 'myprograms' is in something like '/home/duke/mycode' or 'D:\My Documents\mycode' then that is what will be inside the ClassPath. (The 'core' classes in the 'java' packages are a special case: they ship with the JVM, which finds them on its own without any help from the ClassPath.)
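The lookup described above can be sketched in a few lines. To be clear, this is my simplified model of the idea and not the JVM's actual implementation: it only handles directories, and ClassFileLocator, 'toRelativePath' and 'locate' are names I made up.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// A simplified model of the search; the real JVM does considerably more.
public class ClassFileLocator {

    // "myprograms.barstuff.Bar" -> "myprograms/barstuff/Bar.class"
    static String toRelativePath(String fullClassName) {
        return fullClassName.replace('.', '/') + ".class";
    }

    // Tries each ClassPath entry in order and returns the first matching
    // file, or null if none of the entries contains the class.
    static File locate(String fullClassName, List<String> classPathEntries) {
        String relativePath = toRelativePath(fullClassName);
        for (String entry : classPathEntries) {
            File candidate = new File(entry, relativePath);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The directory is the example location used in the text.
        List<String> entries = Arrays.asList("/home/duke/mycode");
        File found = locate("myprograms.barstuff.Bar", entries);
        System.out.println(found == null ? "not found" : found.getPath());
    }
}
```

So with '/home/duke/mycode' in the list, the class 'myprograms.barstuff.Bar' is expected at '/home/duke/mycode/myprograms/barstuff/Bar.class'.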

There are 2 ways of letting the JVM know what ClassPath to use. The more obvious one is to specify it when you start the JVM (using the '-classpath' option, or '-cp' for short), the same way you do with the name of the class whose main() method you want to run. However, if you have your bytecode scattered all over your hard drive, then the list of places in your ClassPath can get quite long and it would be tedious to specify it each time you want to run a program. Fortunately, there is a more convenient way of letting the JVM know where to look: you just set it once in a place that the JVM can always get it from. This involves setting an environment variable (much like 'PATH') called 'CLASSPATH' to the list of places that contain any top-level package directories on your hard drive. From then onward, the JVM will just get this information from there and you will not have to supply it yourself each time. If you do specify a ClassPath when starting the JVM, it takes precedence over the environment variable.
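Here is a small sketch of how the two approaches interact, as I understand the standard launcher's behaviour: a ClassPath given on the command line wins, otherwise the CLASSPATH environment variable is used, otherwise the current directory ('.') is assumed. The class WhichClassPath and its 'effective' method are hypothetical; the paths in the comments are the example locations from earlier.

```java
// Way 1, on the command line:
//   java -classpath /home/duke/mycode myprograms.barstuff.Bar
//
// Way 2, via the environment (set once, then just run):
//   export CLASSPATH=/home/duke/mycode            (Unix shells)
//   set CLASSPATH=D:\My Documents\mycode          (Windows)
//   java myprograms.barstuff.Bar
public class WhichClassPath {

    // Simplified model of the precedence: command line first, then the
    // environment variable, then the current directory as the default.
    static String effective(String commandLineValue, String environmentValue) {
        if (commandLineValue != null) {
            return commandLineValue;
        }
        if (environmentValue != null) {
            return environmentValue;
        }
        return ".";
    }

    public static void main(String[] args) {
        // What the environment says (null if CLASSPATH is not set).
        System.out.println("CLASSPATH variable: " + System.getenv("CLASSPATH"));
        // What this JVM actually ended up using.
        System.out.println("In effect: " + System.getProperty("java.class.path"));
    }
}
```

Running this once with and once without '-classpath' is a quick way to check which setting your JVM is really picking up.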

Well, now you know what the ClassPath is, why it is so important and how to use it. Have fun coding :-)

Copyright (c) Antonio M. D'souza, 2001.