garbage collection - Everything2.com

by viterbiSearcher

Tue May 02 2000 at 20:06:41

Garbage collection offers an alternative to Traditional Memory Management. Instead of the application task or Mutator explicitly de-allocating objects when they are no longer needed, the Garbage collector uses program state information to determine whether any valid sequence of Mutator actions could reference an object. If no valid sequence of program actions can reference a object, then it is unnecessary in the on-going computation, the object is Garbage and may be reclaimed by the collector. Conversely, all objects which can be referenced by such a sequence of actions are Live.

Garbage objects may be isolated, containing no references to other garbage, or they may form lists, trees or other structures of potential arbitrary complexity with other objects, both free and live.

Garbage collection also implications for the operation of execution stack. Studies indicate that the cost of allocating stack frames and associated objects on a garbage collected heap can be comparable the cost of traditional stack allocation. This technique is especially useful for languages with closures, such as Scheme, in which frames can have a lifetime independent in the lifetimes of their callers and callees (those immediately `above' and `below' them on the stack).

In many cases garbage collection is integrated into other system components, for example virtual memory/paging or the saving of the runtime heap to backing store in Smalltalk.

Garbage collection as a graph traversal problem

Objects which can be referenced by a future sequence of valid mutator actions are said to be reachable. Those which will be referenced in the future by the mutator are said to be live, that is, necessary to the ongoing computation.

A garbage collector can be thought of as performing a liveness analysis on heap objects, locating those objects which cannot be referenced either directly from a root set (the stack and global variables), or from a root set via other objects on the heap. Transforming garbage collection into a graph traversal problem enables traditional graph traversal algorithms such as depth- and breadth-first searching to be utilised when tracing the heap to decide whether a object is garbage.

Liveness analysis is, however, inherently undecidable, because the Halting problem prevents determination of the liveness of objects in some situations.

Consider, for example, the following Java class:

/** A class to illustrate the undecidability of reachability analysis */
class Undecidable {

  /** A method whose completion is undecidable */
  void anUndecidableMethod(){
    // ...
  }

  /** A method which illustrate undecidable reachability. */
  void anotherMethod(){

    // the object whose reachability is undecidable
    Object anObject = new Object();

    // the method call during which anObject reachability is undecidable
    anUndecidableMethod();

    // an access of anObject
    anObject.getClass();

    // another call method call
    anUndecidableMethod();
  }
}

anUndecidableMethod() is a method whose completion is an undecidable problem, in the Halting sense. anotherMethod() is another method, which creates a new Object and then calls anUndecidableMethod(). An anObject is only accessed in afterwards, which will never be executed if anUndecidableMethod() never returns. Since anUndecidableMethod()'s return is undecidable, so is the access of anObject().

In practice, all known garbage collectors assume that all method calls return. This is a conservative conservative assumption---it may lead to the retention of reachable but not live objects, but never to the freeing of reachable or live objects.

Most garbage collectors also assume that anObject is live throughout the call to anotherMethod(), even though no code in the last lines access it. This is also a conservative assumption, but it greatly increases the usefulness of debuggers, as variables hold their value until they pass out of scope, rather than having undefined value between their last executed reference and when they pass out of scope.

These two assumptions convert liveness into scope, a well understood notion fundamental to modern programming language design. Some work has been done on the effect of these two assumptions and it appears that in some cases, considerable memory is lost to them.

For example, consider the following Java class, which accepts a number of command-line arguments, reads them into variables and then performs some processing based on their values:

/** A class to illustrate retention of excessive memory */
class AClass {

  /* A number of state variables whose value is determined by the command line
     arguments to the program */
  static boolean _stateInformationA;
  static boolean _stateInformationB;
  //...

  /** A method to read the command line arguments into the state variables. */
  static private void handleCommandLineArgs(String argv){
    // ...
  }
  /** A method which actually does the programs work */
  static private void doStuff(){
    //...
  }
  /** The entry point into the program */
  static public void main(String argv){

    // handle the command line arguments
    handleCommandLineArgs(argv);

    while (true){
      doStuff();
    }
  }
}

The program spends the vast majority of it's time in the while loop, by which time the command-line arguments, stored in argv, are known to be redundant (unnecessary for ongoing computation). While retention of such objects are necessary when a debugger is running (or in a system, such as the Smalltalk runtime environment, where the system debugger may be invoked at any time), in the general case, they may be reclaimed.

It should be noted that any objects used (or potentially used) throughout the nonterminating function are not reclaimable and need not always impact of collector design or correctness.

A good review paper on memory management and garbage collection is: Paul R. Wilson. Uniprocessor garbage collection techniques. In Proc of International Workshop on Memory Management in the Springer-Verlag Lecture Notes in Computer Science series., St. Malo, France, September 1992.

The garbage collection FAQ is at: http://iecc.com/gclist/GC-faq.html

I like it!

3 C!s

(idea)

by shmOOnkie pOOnks

Tue Feb 06 2001 at 7:00:10

Who collects most of the garbage in post-election urban Philippines? The poor. People race for ownership of hanging cloth banners used for candidates' promotion. After cleaning, they sew the wind holes shut then use them to make clothing, curtains, and tablecloths. People also collect glass bottles and tin cans from the ground to sell to bottling companies and junkyards after rallies and marches.

In the more rural neighborhood in which my grandmother resides in the Philippines, no form of garbage collection exists, mostly due to fraudulent and irregular tax collection within municipalities. My relatives dispose of their household garbage by collecting it over a week or so in a small heap in their yard. Then they burn it, including plastic.

Since my relatives don't have regular garbage collections, they try to get things repaired as much as possible rather than just throw them out and get new ones. This probably has to do with inexpensive labor costs too, but watching a prized stereo or a string of Christmas lights burn in a fire is pretty saddening, to me at least. It's a little easier to just let them get taken away as is to a garbage dump...somewhere.

I like it!

1 C!

(thing)

by dzhezus

Mon Mar 04 2002 at 2:39:17

"Garbage Collecting" is a message displayed by Texas Instruments graphing calculators when they are "defragging" their Flash RAM. Memory on a TI-calc is divided up into sectors that can be re-arranged when each is too full.

I like it!

(thing)

by Lord Brawl

Sun Aug 25 2002 at 0:26:26

A simplified look at the automatic memory management technique known as garbage collection. Hopefully complementary, but less 'code-y' than viterbiSearcher's writeup above. A software development writeup.
Geek factor: high.

What is it?

Garbage collection is a technique for automatic memory management used in most modern programming languages (such as Java and Microsoft's C#). It was invented around 1958 for the programming language LISP. It differs from the manual memory management normally used in languages such as C and C++, in that programmers do not need to explicitly call a function (such as free()) to release allocated memory. Many of the techniques for automatic memory management were developed for the Smalltalk language.

The garbage collector is a service provided by the language (most commonly, by the virtual machine). The garbage collector's job is to identify areas of memory that are no longer in active use and make that space available for future allocation. Efficiently determining which areas of memory are in use and reclaiming the remaining memory is done in many different ways, and is an ongoing area of study.

The garbage collector may also reduce fragmentation. During program execution, objects are allocated and discarded in no particular order. Over time, this leads to active objects being scattered throughout memory, breaking up the free memory space. When the system needs to allocate a large object, there may not be enough free memory to do so, even if there is a lot of total free memory. Some garbage collectors pack the active objects together to make the maximum amount of contiguous free space.

A garbage collector must attempt to minimize its impact on the application. Common issues to be considered include:

Overhead - The amount of processing or CPU time taken by the garbage collector;
Pause time - The time (if any) that the application itself is stopped waiting for garbage collection;
Memory use - The amount of memory used to track information for the garbage collector.

How does it work?

The exact strategy used depends on the language and the strategy implemented.

Within Java, there are three main types of memory:

Global or static data - allocated at class load time (there being no global scope in Java), and invariant in size thereafter. Not managed by the garbage collector.
Stack allocated data - allocated locally during execution when a declaration is encountered within the scope of a function / routine / method etc. It is released when execution leaves the scope containing the declaration. Each thread will have a stack frame. These are not managed by the garbage collector.
Heap data - all objects that are dynamically allocated during execution. This is the memory managed by the garbage collector.

Heap objects in Java are created by the new operator. They are never explicitly released. Instead, the garbage collector automatically frees objects when it has determined that they are no longer referenced by the application.

Garbage collection is guaranteed to keep all accessible objects, but there are no guarantees about when, or even if, inaccessible objects will be released.

There is no official Java garbage collection technique, and the language specification does not require memory management at all. Without it, however, memory will fill and things will grind to a halt.

What are some techniques?

The most common type of garbage collector is the Tracing collector, of which there are many subtypes. An alternative strategy is to use Reference counting. Both approaches have their own challenges.

Ideally, a garbage collector would remove from memory any object that would never be used again in the execution of the program. The remaining objects are called "live" objects. It is exceedingly difficult, if not impossible, for the collector to determine liveness. Tracing collectors therefore rely on a concept called reachability (q.v.). In short, reachability is determined by starting with local and static variables and tracing all objects referenced by those variables, and then by those objects, and so on, and so on.

The original tracing collectors used a mark and sweep algorithm, sometimes called a two-phase collector. In this algorithm all objects contain (or are associated with) a 'reachable' flag. When the collector starts, the application and all threads (except the collector's) are suspended. The collector clears the reachable flag for all objects. It then executes the reachability test, setting the reachable flag for all reachable objects. (Duh!) This is the first phase. It then takes a second pass over memory, in address space order, and adds any objects without the flag set to the 'free list'. This is the second phase. In Java the finalizer methods are run during the sweep phase of a mark and sweep collector.

The simplest collectors then scavenge free memory by making all contiguous free blocks into single blobs of free emory. Some tracing collectors can also move the reachable objects and compact the heap. The simplest approach to this is to copy reachable objects to one end of the heap, leaving the free memory pooled at the other. An advance on this method is the copy collector. It divides the heap into two portions, only one of which is actively used at any one time. When the active half of the heap fills, the garbage collector runs, after (or as) reachable objects are identified, they are copied to the other half of the heap. Once all objects are copied, the formerly active half of the heap becomes inactive, and processing resumes with the 'new' half of the heap used for object allocation.

These types of collectors are sometimes called "stop-the-world" collectors. They are increasingly disparaged because their Pause time grows with application complexity and increasing heap size. Java virtual machine implementors are working to introduce incremental collectors. These work on small areas of memory, using an intermediary process called a mutator to handle memory allocation and access. The mutator seamlessly handles the relocation of objects as well as allocation. This allows the collector to work without blocking the application.

One type of incremental collector know in common use is the generational collector. The idea behind this algorithm is that a large percentage of the objects allocated by an application are short-lived. Relatively few objects remain reachable for a longer portion of the program's execution. Thus, memory is divided into an area for short-term objects (variously called the nursery or Eden), which is collected frequently, and an area for venerable objects, which is collected less often. The collectors run faster, and scan consistently reachable objects less often. Often the generational collector is implemented using a combination of the previous techniques: the new-object space is managed by a quick but conservative collector, and the old-object space by a slower but more robust collector such as mark and sweep.

As garbage collectors grow more complex, train collectors are beginning to appear, with many memory spaces for different generations of objects, tunable in size and using different collectors at each step. Trying to explain the object transitions without a complex diagram defeats my patience for this writeup!

For C/C++ programmers, a garbage collector for C and C++ can be found at: http://www.hpl.hp.com/personal/Hans_Boehm/gc/

Sources

http://www.memorymanagement.org/
http://www.artima.com/insidejvm/ed2/ch09GarbageCollection01.html

Disclaimers

You may notive I forgot the reference count collectors. I don't know enough about them yet. Someday I hope to come back and add it, unless someone wants to tackle it below.

The only language I've used that has a garbage collector is Java. I talked as generically as I can, but I may miss things that apply only to other languages, such as Smalltalk, Prolog or LISP. Additions and clarifications welcome. That's how E2 works after all....

I like it!

3 C!s

The horror that is gc	garbage collector	The most elusive type of problem in a C program	Why C sucks
Traditional Memory Management	fully buzzword compliant	mutator	reference counting
Structure and Interpretation of Computer Programs	C#	currying functions	Applesoft BASIC
Garbage collection vs. manual memory management	SML	Some AI Koans	Dumpster diving
q.v.	G4 Cube	memory leak	Objective-C
How knots weaken rope	Killing the angel in the house	It's Hard to be Humble