Some or all of this information may be specific to Microsoft SQL Server 2000, and is not guaranteed accurate for other RDBMS implementations.
Like a nonclustered index, a clustered index is referenced by a b-tree that can be used to increase the performance of searches. Building a clustered index forces a physical sort of the data and the leaves of its b-tree contain the actual data rows. Because items can not be sorted two different ways at the same time, each table can only have a single clustered index. However, the clustered index can be built over several columns in the way that a telephone directory is sorted by last name and first name. The leaves of a nonclustered index built on a table having a clustered index will contain the clustering key value which is then used to traverse the clustered index. Because of this, using a large clustering key will increase the size of the nonclustered indexes on that table.
So What is a Clustering Key?
The clustering key is a value used to uniquely identify a single row in the clustered index. It is built from a composite of the columns the index uses. If the key values are not unique, SQL Server will append hidden values to its representation of the data to force internal uniqueness, unfortunately increasing the size of the key. Using the UNIQUE constraint will make this unnecessary.
While the costs of using a clustered index are not trival, there are several situations in which one can improve performance. Because the data is already sorted, queries requiring sequential rows in that order will benefit from this type of index. Performance increases reading sequential data is a direct result of the physical sort: SQL Server can read an entire extent (basically server’s idea of a disk block) at a time, rather than fetching each page from a random extent. This greatly boosts the effectiveness of readahead cache (aside from empty space within a page (which may later be used to mitigate the INSERT penalty), the table is essentially not fragmented at all). The ordering of data is also beneficial to aggregations such as MAX(), to selecting a range of values such as with the BETWEEN operator, and to queries which employ a JOIN or GROUP BY operation.
Performance impacts for data modification can be even more significant in a clustered index than in a nonclustered index because the data ordering must be maintained in addition to updating the tree. This penalty is less significant when the index is built on a table where the key value is monotonically increasing, such as an identity column.