The Stieltjes integral was introduced by the Dutch mathematician Thomas Jan Stieltjes in his monumental 1894 paper "Recherches sur les fractions continues" ("Researches on continued fractions", Annales de la Faculté des Sciences de Toulouse, 8(1):1-122, 1894). Alongside its treatment of continued fractions, this lengthy paper develops the theory of what would later be known as the Riemann-Stieltjes integral. The integral arose out of Stieltjes' attempts to solve the moment problem: to find the distribution of mass of a body given its moments of all orders. His original formulation, however, suffered from the same convergence limitations as the Riemann integral upon which it was based, so later mathematicians extended his results by combining them with Lebesgue's measure-theoretic integral, producing what is now known as the Lebesgue-Stieltjes integral, with which this writeup is solely concerned. Since much of the work of extending the Stieltjes integral was done by the mathematician Johann Radon, this formulation is sometimes called, in older references, the Lebesgue-Radon integral.

The formulation of the Stieltjes integral rests on two simple observations. The first is that the ordinary Lebesgue integral is built on one particular measure, the Lebesgue measure, yet nothing in the development of the theory depends on that measure's intrinsic properties (such as its translation invariance). One could just as easily take any other measure from a suitable measure space and, by the exact same construction, develop an integral with similar properties.

The other observation is that there is a rather natural correspondence between such general measures and a certain class of functions. Let μ be a measure defined on a closed bounded interval of the real line X = [a, b]. The measure μ defines a function on X, F(x) = μ([a, x)) for all x in [a, b], called the generating function or distribution function of the measure μ. This function clearly satisfies F(a) = 0 and is finite and nondecreasing on X; it may also be shown to be left continuous.

Conversely, let F(x) be some function defined over X, such that F(a) = 0. Such a function may tentatively act as a generating function for some measure μ_{F}. To make the measure it generates positive, F(x) must also be nondecreasing. To produce a valid measure, F(x) must also be of bounded variation, i.e. it must satisfy:

sup Σ_{j=1}^{k} |F(x_{j}) - F(x_{j-1})| < ∞

where the supremum is taken over all finite dissections a = x_{0} < x_{1} < ... < x_{k} = b of [a, b]. If the function F(x) is not of bounded variation, it could produce a "measure" that is infinite on some finite union of subintervals of X, which cannot be a proper measure. We may now define a set function μ_{F} based on F(x), in this way:

μ_{F}([α, β]) = lim_{h -> 0+} F(β + h) - lim_{h -> 0+}F(α - h)

μ_{F}([α, β)) = lim_{h -> 0+} F(β - h) - lim_{h -> 0+}F(α - h)

μ_{F}((α, β]) = lim_{h -> 0+} F(β + h) - lim_{h -> 0+}F(α + h)

μ_{F}((α, β)) = lim_{h -> 0+} F(β - h) - lim_{h -> 0+}F(α + h)

It may be shown that the set function μ_{F} constructed in this way is a valid measure on X. It may also be shown that the valid measures that may be used to construct a measure space over X stand in one-to-one correspondence with these functions of bounded variation.
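The four interval formulas above can be illustrated numerically. The following is a minimal sketch (the helper names `mu_closed`, `mu_clopen`, etc. are mine, and a small positive h stands in for the one-sided limits lim_{h -> 0+}), using a generating function with a jump, so that the resulting measure has a point mass:

```python
def F(x):
    """Left-continuous generating function on [0, 2] with a unit jump at x = 1."""
    return x + (1.0 if x > 1 else 0.0)

H = 1e-9  # small positive h standing in for the limit h -> 0+

def mu_closed(a, b):   return F(b + H) - F(a - H)   # mu_F([a, b])
def mu_clopen(a, b):   return F(b - H) - F(a - H)   # mu_F([a, b))
def mu_opclosed(a, b): return F(b + H) - F(a + H)   # mu_F((a, b])
def mu_open(a, b):     return F(b - H) - F(a + H)   # mu_F((a, b))

# The jump of F at x = 1 shows up as a point mass of size 1:
print(round(mu_closed(1, 1), 6))   # mu_F({1}) = F(1+) - F(1-) ≈ 1.0

# Additivity: splitting (0, 2) at the atom recovers mu_F((0, 2)):
total = mu_open(0, 1) + mu_closed(1, 1) + mu_open(1, 2)
print(round(total, 6), round(mu_open(0, 2), 6))   # both ≈ 3.0
```

Singleton sets like {1} get positive measure exactly where F jumps, which is what distinguishes μ_{F} from Lebesgue measure.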

So now nothing stops us from using such functions of bounded variation to define a measure, as described above, and then constructing an integral using that measure. This is the whole idea behind the Stieltjes integral. We thus may speak of integrating one function with respect to another, and we write:

∫_{X} g(x) dF(x)

One may construct measures over the whole real line in a similar way, but we would then need to force the generating function to satisfy a normalization condition such as F(0) = 0, or maybe lim_{x -> -∞} F(x) = 0 (otherwise there might be several different functions corresponding to the same measure). The function would be said to be of bounded variation over the whole real line if:

sup Σ_{j=1}^{k} |F(x_{j}) - F(x_{j-1})| < ∞

where the supremum is taken over all finite sequences x_{0} < x_{1} < ... < x_{k} of real numbers; this condition holds if and only if the corresponding measure is finite. Further extensions to higher-dimensional spaces are similar.

More generally, it is possible to use *any* function of bounded variation in the Stieltjes integral, not just *nondecreasing* functions of bounded variation: by a technique analogous to the Jordan decomposition for signed measures, it may be shown that any function of bounded variation is the difference of two nondecreasing functions of bounded variation, and the corresponding Stieltjes integral is the difference of the integrals with respect to those two functions. One may even use *complex* functions of bounded variation, since every complex function of bounded variation ψ is of the form f + ig, where f and g are (real) functions of bounded variation, and the Stieltjes integral of a function φ is then just ∫ φ dψ(x) = ∫ φ df(x) + i ∫ φ dg(x).
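The Jordan-style decomposition just described can be sketched on a finite dissection. This is an illustrative computation only (the function name `jordan_decompose` is mine): the running positive and negative variations give two nondecreasing functions whose difference reconstructs F.

```python
def jordan_decompose(values):
    """Given F(x_0), ..., F(x_k) sampled on a dissection, return lists
    (p, n) of the running positive and negative variations, so that
    p and n are nondecreasing, p[0] = n[0] = 0, and
    values[j] = values[0] + p[j] - n[j] for every j."""
    p, n = [0.0], [0.0]
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        p.append(p[-1] + max(d, 0.0))   # accumulate increases only
        n.append(n[-1] + max(-d, 0.0))  # accumulate decreases only
    return p, n

# F sampled on a dissection: it rises, falls, rises again (so it is
# of bounded variation but not monotonic)
vals = [0.0, 2.0, 1.0, 1.5, 0.5, 3.0]
p, n = jordan_decompose(vals)

assert all(a <= b for a, b in zip(p, p[1:]))  # p is nondecreasing
assert all(a <= b for a, b in zip(n, n[1:]))  # n is nondecreasing
assert all(abs(v - (vals[0] + pi - ni)) < 1e-12
           for v, pi, ni in zip(vals, p, n))  # F = F(a) + p - n

print(p[-1] + n[-1])  # total variation of F over this dissection: 7.0
```

The sum p + n at the right endpoint is exactly the variation sum from the definition above, which is why this decomposition is the function-level analogue of the Jordan decomposition of a signed measure.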

If one is using the Daniell integral construction, an equivalent formulation of the Stieltjes integral may be arrived at by using the set of all continuous functions as the elementary functions, and the Riemann-Stieltjes integral as the elementary integral.

The Lebesgue-Stieltjes integral may also be thought of as defining an inner product, ⟨φ, ψ⟩ = ∫ φψ dF, and thus a natural Hilbert space, in much the same way that the Lebesgue integral is capable of defining a Banach space.

The Stieltjes integral has many applications in the theory of probability and random variables, as well as in potential theory. For instance, the cumulative probability distribution function F(x) = P(X ≤ x) of a random variable X is by definition a nondecreasing function of bounded variation, exactly the type required for Stieltjes integration. In elementary treatments of probability and statistics, the derivative F'(x) of this function is taken as the probability density function, and the expected value of a function g of the random variable is computed as:

E[g(X)] = ∫_{-∞}^{∞} g(x) F'(x) dx

However, this does not work if the distribution function F has discontinuities, e.g. for a discrete random variable (unless one is willing to endure the Dirac delta function and all that), or is otherwise not differentiable at certain points. Using the Stieltjes integral one avoids this difficulty, as it may be shown that:

E[g(X)] = ∫_{-∞}^{∞} g(x) dF(x)

always holds no matter how ill-behaved the distribution function F(x) is (provided it *stays* a proper probability distribution function).
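As a concrete illustration, the Stieltjes expectation can be approximated by a finite sum Σ g(x_{j})(F(x_{j}) - F(x_{j-1})) over a fine dissection. This sketch (helper names are mine) uses the distribution function of a fair coin on {0, 1}, for which no density F'(x) exists, yet the Stieltjes sum still recovers the expectation:

```python
def F(x):
    """Distribution function of a fair coin: P(X=0) = P(X=1) = 1/2.
    F jumps at x = 0 and x = 1, so it has no ordinary derivative there."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

def stieltjes_sum(g, F, a, b, n):
    """Approximate ∫ g dF by summing g(x_j) * (F(x_j) - F(x_{j-1}))
    over a uniform dissection of [a, b] with right-endpoint tags."""
    total, prev = 0.0, F(a)
    for j in range(1, n + 1):
        x = a + (b - a) * j / n
        cur = F(x)
        total += g(x) * (cur - prev)
        prev = cur
    return total

g = lambda x: x * x
approx = stieltjes_sum(g, F, -1.0, 2.0, 300000)
print(round(approx, 4))  # E[X^2] = 0^2 * 1/2 + 1^2 * 1/2 = 0.5
```

Only the dissection cells containing the jumps of F contribute, so the sum picks up g at the atoms weighted by the jump sizes, exactly as the Stieltjes integral prescribes.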

**References:**

J.H. Williamson, *Lebesgue Integration*, chapter 6.

G.E. Shilov and B.L. Gurevich, *Integral, Measure, and Derivative: A Unified Approach*, chapters 4 and 5.