The ergodicity of a physical or computational process is a statement about initial and final states of that process.

What defines a state? It depends on your particular problem. If we think of a classical gas of atoms, then the positions and velocities of those atoms at a given instant in time would define the state of the gas. The state of a computer's memory could be represented by a sequence of 0's and 1's.

Let's use the letter S(t) to represent the state of a given system at a given time, labeled by t. And let's say t=0 is the initial time and t=1 is the final time. Let's also use an arrow to denote the process under study. So under our process, S(0) → S(1); that is, the process starts from the initial state S(0) at t=0 and ends up in the final state S(1) at t=1.

Now imagine any two possible states. Let one of them be the initial state of the system, and the other the final state. A process is defined to be ergodic if there is a nonzero probability for the process to evolve S(0) to S(1), that is, if Prob( S(0) → S(1) ) > 0 no matter which two states you picked. Another way of phrasing the definition: a process is ergodic if, applied for an infinite number of steps, it will visit every point in state space at least once.
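
For a process with a finite number of states, this reachability condition can be checked directly from the transition probabilities. Here is a minimal sketch in Python (the matrices and the helper name is_ergodic are hypothetical, purely for illustration): a state j is reachable from a state i exactly when the (i, j) entry of (I + P)^n is positive.

    import numpy as np

    def is_ergodic(P):
        """Return True if every state can reach every other state with
        nonzero probability.  P[i, j] = Prob(step from state i to state j)."""
        n = P.shape[0]
        # (I + P)^n has a positive (i, j) entry exactly when state j is
        # reachable from state i in at most n steps.
        reach = np.linalg.matrix_power(np.eye(n) + P, n)
        return bool(np.all(reach > 0))

    # A process that can never leave states {0, 1} to visit state 2,
    # versus one that mixes all three states.
    P_stuck = np.array([[0.5, 0.5, 0.0],
                        [0.5, 0.5, 0.0],
                        [0.0, 0.0, 1.0]])
    P_mixing = np.array([[0.50, 0.25, 0.25],
                         [0.25, 0.50, 0.25],
                         [0.25, 0.25, 0.50]])
    print(is_ergodic(P_stuck))   # False: state 2 is unreachable from 0 or 1
    print(is_ergodic(P_mixing))  # True: any state reaches any other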

If you have a nasty multi-dimensional integral, Monte Carlo methods are a very efficient way to evaluate the integral numerically. Let's say the integrand depends on a field. Think of the Ising model: a two-dimensional lattice of points, where at each point the field takes the value +1 or -1; quantities like the magnetization and internal energy are functions of the field configuration, or state, of the spins. To calculate thermal averages of these quantities, integrals must be done over all possible spin configurations, weighted by a Boltzmann factor, exp(-E/T) (E = energy, T = temperature). Due to the decaying exponential, most field configurations give exponentially small contributions to the thermal average, so it is tremendously efficient to use an algorithmic updating process to generate just the states which dominate the integral. In order not to skew the sample, that is, in order to find the important configurations with the right weight, the updating process must be able to, in principle, visit every single state. That is, the updating process must be ergodic in order to have an unbiased calculation. (The process must also tend to the canonical distribution under repeated application; see below.)
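
As a concrete sketch of such an updating process (not drawn from the reference below; the lattice size, temperature, and sweep counts are arbitrary choices), here is the standard Metropolis single-spin-flip update for the 2D Ising model. Flipping one spin at a time is ergodic, since any spin configuration can be turned into any other by a finite sequence of single flips, and the acceptance rule min(1, exp(-dE/T)) supplies the Boltzmann weighting.

    import numpy as np

    rng = np.random.default_rng(0)

    def metropolis_sweep(spins, T, J=1.0):
        """One Metropolis sweep over a 2D Ising lattice of +/-1 spins.
        Each proposed move flips a single spin; moves are accepted with
        probability min(1, exp(-dE/T)), which weights configurations by
        the Boltzmann factor exp(-E/T)."""
        L = spins.shape[0]
        for _ in range(L * L):
            i, j = rng.integers(0, L, size=2)
            # Energy change from flipping spin (i, j), periodic boundaries
            nn_sum = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                      + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * J * spins[i, j] * nn_sum
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] *= -1
        return spins

    # Example: estimate the magnetization per spin at temperature T
    L, T = 16, 2.0
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(500):           # equilibration sweeps
        metropolis_sweep(spins, T)
    samples = []
    for _ in range(1000):          # measurement sweeps
        metropolis_sweep(spins, T)
        samples.append(abs(spins.mean()))
    print("estimated <|m|>:", np.mean(samples))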

In practice, the nontrivial, nonpathological updating processes used in Monte Carlo are Markov chains: each new state is generated from the current state alone, with no memory of the earlier history. If such a chain is ergodic, and if repeated applications of the process take any distribution of states toward the canonical (Boltzmann) distribution, then the process will (eventually) yield a sample of states which can be used as a realistic sample. The sample average, which you can calculate using your generated chain of states, will tend toward the desired thermal average (more generally, the ensemble average), the answer you're looking for. Detailed balance is a sufficient, but not generally necessary, condition for tending to the canonical distribution.
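
To see the convergence statement in action, here is a small sketch (with made-up energy levels, purely for illustration) that samples a four-state toy system with the Metropolis rule, which satisfies detailed balance, and compares the visit frequencies to the Boltzmann weights exp(-E/T).

    import numpy as np

    rng = np.random.default_rng(1)

    # A toy system with four states and hypothetical energies (units
    # with k_B = 1).  Detailed balance for the Metropolis rule,
    #   P(a -> b) / P(b -> a) = exp(-(E_b - E_a) / T),
    # guarantees the chain converges to the Boltzmann distribution.
    energies = np.array([0.0, 0.5, 1.0, 2.0])
    T = 1.0

    state = 0
    counts = np.zeros(len(energies))
    for _ in range(200_000):
        proposal = rng.integers(0, len(energies))      # symmetric proposal
        dE = energies[proposal] - energies[state]
        if dE <= 0 or rng.random() < np.exp(-dE / T):  # Metropolis accept
            state = proposal
        counts[state] += 1

    sampled = counts / counts.sum()
    boltzmann = np.exp(-energies / T)
    boltzmann /= boltzmann.sum()
    print("sampled  :", np.round(sampled, 3))
    print("Boltzmann:", np.round(boltzmann, 3))

After enough steps the two printed distributions agree to within statistical noise, which is exactly the sense in which the chain's sample average tends toward the ensemble average.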

For references, see any text on Monte Carlo methods. The one I have handy right now is Quantum Fields on a Lattice by I. Montvay and G. Münster (Cambridge Univ. Press, 1994).