An ultraproduct is a ~~construction~~ method of model theory to create a model for a theory in some language (see this or this if you're unsure what "model" means in the context of (modern) mathematical logic) out of an infinite collection of models for that theory in that language. Taking a cartesian product of models in "the obvious way" fails to work; an ultraproduct divides by some equivalence relation to get a new model. Unfortunately, it turns out that the equivalence relation requires the axiom of choice, so it's NOT a "construction".

What it is

Suppose we have a collection {M_i}_i∈I of models, for some index set I. Let U be an ultrafilter on I. Call a subset A⊆I "most of" I if A∈U. This is not standard terminology! Note that due to the maximality of the ultrafilter U, either A∈U or (I\A)∈U.

We want a new model M which is the ultraproduct of the M_i's over U. Take the cartesian product of the worlds of all the M_i. Define an equivalence relation "~": For elements a=(a_i)_i∈I and b=(b_i)_i∈I of the product,

a ~ b ⇔ {i∈I : a_i=b_i} ∈ U.

That is, a is equivalent to b iff at most of the indices i∈I they are equal. The world of our model M is the set of equivalence classes of ~. By abuse of notation we might say that

M = (∏_i∈I M_i) / ~

Once you get a hang of what we're doing, the rest is pretty obvious... For every k-place relation symbol R, we define that

R(a¹,...,a^k) ⇔ {i∈I : R(a¹_i, ..., a^k_i) } ∈ U.

("Relation R holds between elements of the ultraproduct if it holds on almost all their coordinates"). Note that this definition of R is well defined: If we pick different representatives of the equivalence classes of the a^j's, they are equal at almost all coordinates, so (as U is closed to finite intersections of elements!) the same truth value is given to R.

For every k-place function symbol "f", we define

f(a¹,...,a^k) = (f(a¹_i,...,a^k_i))_i∈I.

("the value of f in the ultraproduct at the i'th coordinate is its value in the i'th model"). Again note that this is well defined, again due to finite intersections of elements of U being elements of U. And constants are really just 0-place functions, so we've covered those, too.

Do I have to use the axiom of choice?

Unfortunately, yes. Beyond the minor problems of ensuring that the product is non-empty and determining a set of equivalence classes, you really need U to be an ultrafilter:

The "upwards closure" property (any subset of I which includes a member of U is a member of U) is needed for ~ to be an equivalence relation.
The "finite intersection" property (any intersection of finitely many members of U is a member of U) is needed for relations and functions to be well-defined.
The maximality property (U is an ultrafilter, i.e. U is maximal, i.e. for any A⊆I, either A∈U or (I\A)∈U) is needed to give a truth value to every relation.

The simplest ultrafilters, which also don't require the axiom of choice, are principal ultrafilters: U=U_x={A⊆I : x∈A}, for some x∈I. But if we take U_x in the ultraproduct, the properties of the elements are determined solely by their properties at coordinate x. So the model we get is (isomorphic to) M_x. We don't need an ultraproduct to do that, we've already got M_x! We might therefore do well to call an ultraproduct resulting from a principal ultrafilter a trivial ultraproduct.

If I is finite, we only have principal ultrafilters. So the ultraproduct construction is uninteresting for multiplying together finitely many models: it gives us one of the existing models.

When U is a nonprincipal ultrafilter (and therefore I must be infinite), the ultraproduct is interesting. Unfortunately we've no idea what a general element of U looks like, so the ultraproduct will be very mysterious. But we can take some filter F and extend it to an ultrafilter F⊆U. For instance, we can take F={A∈I : I\A is finite}. This ensures that a property of any finite set of models cannot determine the property on the ultraproduct. To get such a nontrivial ultrafilter, we need to use some Choice.

What is it good for

The use of ultraproducts comes from Los' theorem: A proposition P in the first-order language of the models is true in the ultraproduct model iff it is true in most of the models. Proving Los' theorem is actually an easy induction on the structure of P. It's the definition of the ultraproduct that's hard!

It is essentially the compactness theorem of first order logic, but applied to a pre-specified collection of models. And indeed, many uses of the one can be performed with the other as well.

There's also a parallel to non-standard analysis. If the models are all sets (and we can usually take them in some internal model of set theory, so they will be sets), then an infinite collection of models will have non-standard pseudo-elements. An ultraproduct has the same properties as a pseudo-element of the collection; in fact, if a suitable expansion of the standard world into the nonstandard world is performed, then it will be a pseudo-element of the collection. (However, different ultraproducts may require different expansions, while non-standard analysis gives a huge amount of pseudo-elements within the same expansion. There is still some difference between the concepts.)

See an example of an ultraproduct for how to actually make ultraproducts work for you, as well as a proof of a theorem or two...

Example of an ultraproduct	NSA: What's a model?	mathematical model	ultrafilter
Axiom of Choice	Los' theorem	compactness theorem	model theory
Proof that any filter can be extended to an ultrafilter	Princess Eugenie	Equivalence relation	The square root of -1
Gross National Product	isomorphic	hyperreal number	abuse of notation
transfer principle	abbr	proof of Caratheodory's theorem on convexity	principal ultrafilter