Introduction
The covariant derivative is a differential operator which plays an important role in differential geometry. It gives the rate of change (total derivative) of a scalar field, vector field or general tensor field along some path through curved space. Consider specifying a vector field in terms of some coordinate basis vectors. In curved space, and in curvilinear coordinates even on flat space, these basis vectors change from point to point, so to find the derivative of a vector field we must know not only how the components of the vector change but also how the basis vectors change. Anyone who has seen expressions for ∇× (curl) or ∇^{2} (Laplacian) in spherical, polar or other curvilinear coordinates has encountered the difficulties that arise when not dealing with a fixed coordinate basis.
Definition
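A linear operator ∇ : T^{m}_{n} → T^{m}_{n+1}, where T^{m}_{n} is a type (m,n) tensor field and T^{m}_{n+1} is a type (m,n+1) tensor field, that, for any two tensor fields A and B, satisfies the following properties:
1. Linearity: ∇_{a}(A + B) = ∇_{a}A + ∇_{a}B.
2. Leibniz rule: ∇_{a}(AB) = (∇_{a}A)B + A(∇_{a}B).
3. Commutation with contraction: contracting indices before or after applying ∇ gives the same result.
4. Action on scalars: for a scalar field f, ∇_{a}f = (∂_{α}f)e^{α}_{a}, the gradient of f.
(Throughout this document the notation ∂_{α} ≡ ∂/∂x^{α} is used.)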
Note: In the above definition I chose to suppress the indices of A and B for readability. However, A and B are general tensors, e.g. A^{bcde...}_{pqrst...}. The covariant derivative of this tensor is then ∇_{a}A^{bcde...}_{pqrst...}, wherein the indices begin to obscure the intended meaning. Throughout the rest of this document I will also take "tensor" to mean "tensor field" for brevity: a tensor field associates a tensor with every point of a manifold, and it is properly the tensor field upon which the covariant derivative operates. Thanks to krimson for suggesting that I point this out.
The covariant derivative is a derivative of tensors that takes into account the curvature of the manifold in which these tensors live, as well as the variation of the coordinate basis vectors from point to point. In Cartesian coordinates on flat space the basis vectors are the same at every point, so the covariant derivative is simply the partial derivative ∂_{α}. In spherical coordinates, by contrast, the coordinate basis vectors change between different points, so the derivative of a vector expressed in terms of these basis vectors must take this into account.
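To make the point about changing basis vectors concrete, here is a small Python sketch (an illustration, not part of the original text). It uses flat 2D polar coordinates (r, θ) as a simpler stand-in for spherical coordinates, writes the coordinate basis vectors e_r = ∂/∂r and e_θ = ∂/∂θ in Cartesian components, differentiates them by central differences, and expands the result back in the basis {e_r, e_θ}. The expansion coefficients are the Christoffel symbols Γ^{γ}_{αβ} that appear later in this writeup. The function names and the step size h are hypothetical choices, and comparing basis vectors as Cartesian arrows only works because the space is flat.

```python
import math

def basis(x):
    """Cartesian components of the polar basis vectors e_r and e_theta at x = (r, theta)."""
    r, th = x
    e_r = (math.cos(th), math.sin(th))
    e_th = (-r * math.sin(th), r * math.cos(th))
    return [e_r, e_th]

def christoffel_from_basis(x, h=1e-6):
    """Solve d(e_beta)/dx^alpha = Gamma^gamma_{alpha beta} e_gamma; returns G[gamma][alpha][beta]."""
    e = basis(x)
    # Inverse of the 2x2 matrix whose columns are e_r and e_theta.
    a, c = e[0]
    b, d = e[1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    for alpha in range(2):
        xp, xm = list(x), list(x)
        xp[alpha] += h
        xm[alpha] -= h
        ep, em = basis(xp), basis(xm)
        for beta in range(2):
            # Central-difference derivative of e_beta along x^alpha (Cartesian components).
            de = [(ep[beta][i] - em[beta][i]) / (2 * h) for i in range(2)]
            for gamma in range(2):
                G[gamma][alpha][beta] = inv[gamma][0] * de[0] + inv[gamma][1] * de[1]
    return G

G = christoffel_from_basis((2.0, 0.7))
# Expected: Gamma^r_{theta theta} = -r, Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r, others 0.
print(G)
```

Here d(e_θ)/dr = e_θ/r and d(e_θ)/dθ = −r e_r, so even on flat space the basis is not constant in polar coordinates.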
The covariant derivative is also known as the semicolon derivative and is written as A_{;a} = ∇_{a}A.
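If we further require that, for any vector fields u and v, v^{a}∇_{a}u^{b} = ∇_{v}(u), where ∇_{v} is the affine connection, then we can completely specify the action of ∇ on any tensor. To do this we will also place a torsion-free condition on the connection. First, expand the left-hand side in a coordinate basis, writing v^{a} = v^{α}e^{a}_{α}, ∇_{a} = e_{a}^{β}∇_{β} and u^{b} = u^{γ}e^{b}_{γ}: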
v^{a}∇_{a}u^{b} = (v^{α}e^{a}_{α})(e_{a}^{β}∇_{β})(u^{γ}e^{b}_{γ})
v^{a}∇_{a}u^{b} = v^{α}δ_{α}^{β}∇_{β}(u^{γ}e^{b}_{γ})
v^{a}∇_{a}u^{b} = v^{α}∇_{α}(u^{γ}e^{b}_{γ})
v^{a}∇_{a}u^{b} = v^{α}∇_{α}(u^{β}e^{b}_{β})
(using e^{a}_{α}e_{a}^{β} = δ_{α}^{β}, then relabelling the dummy index γ as β)
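Now ∇_{v}(u) = v^{α}∇_{α}(u^{β}e^{b}_{β}) = v^{α}(∂_{α}u^{β})e^{b}_{β} + v^{α}u^{β}∇_{α}(e^{b}_{β}), since the components u^{β} are scalars. The change of the basis vectors defines the connection coefficients, ∇_{α}(e^{b}_{β}) = Γ^{γ}_{αβ}e^{b}_{γ}, so ∇_{v}(u) = v^{α}(∂_{α}u^{β})e^{b}_{β} + v^{α}u^{β}Γ^{γ}_{αβ}e^{b}_{γ}, where Γ^{γ}_{αβ} is the Christoffel symbol of the second kind.
We can relabel the dummy indices β and γ in the second term to obtain ∇_{v}(u) = v^{α}(∂_{α}u^{β})e^{b}_{β} + v^{α}u^{γ}Γ^{β}_{αγ}e^{b}_{β}, which is equal to
v^{α}(∂_{α}u^{β} + Γ^{β}_{αγ}u^{γ})e^{b}_{β}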
Thus we have:
v^{α}∇_{α}(u^{β}e^{b}_{β}) = v^{α}(∂_{α}u^{β} + Γ^{β}_{αγ}u^{γ})e^{b}_{β}, which is true for all v^{α}, so that we can write:
∇_{a}(u^{b}) = e_{a}^{α}(∂_{α}u^{β} + Γ^{β}_{αγ}u^{γ})e^{b}_{β}, or
∇_{a}(u^{b}) = (∂_{α}u^{β} + Γ^{β}_{αγ}u^{γ})e_{a}^{α}e^{b}_{β}
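The component formula ∇_{α}u^{β} = ∂_{α}u^{β} + Γ^{β}_{αγ}u^{γ} can be checked numerically. The Python sketch below is a hedged illustration: it assumes flat 2D polar coordinates (r, θ), where Γ^{r}_{θθ} = −r and Γ^{θ}_{rθ} = Γ^{θ}_{θr} = 1/r, uses central finite differences, and the helper names (christoffel_polar, partial, cov_deriv_vector, x_hat) are hypothetical. It evaluates the covariant derivative of the constant Cartesian unit vector x̂, whose polar components (cos θ, −sin θ/r) vary from point to point.

```python
import math

def christoffel_polar(x):
    """G[gamma][alpha][beta] = Gamma^gamma_{alpha beta} for flat polar coordinates (r, theta)."""
    r = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r         # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r    # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r    # Gamma^theta_{theta r}
    return G

def partial(f, x, alpha, h=1e-6):
    """Central-difference estimate of d f / d x^alpha for a list-valued f."""
    xp, xm = list(x), list(x)
    xp[alpha] += h
    xm[alpha] -= h
    return [(p - m) / (2 * h) for p, m in zip(f(xp), f(xm))]

def cov_deriv_vector(u, x, christoffel):
    """Components D[alpha][beta] = d_alpha u^beta + Gamma^beta_{alpha gamma} u^gamma."""
    n = len(x)
    G = christoffel(x)
    ux = u(x)
    D = []
    for alpha in range(n):
        du = partial(u, x, alpha)
        D.append([du[beta] + sum(G[beta][alpha][gamma] * ux[gamma] for gamma in range(n))
                  for beta in range(n)])
    return D

def x_hat(x):
    """The constant Cartesian unit vector x-hat, in polar components (u^r, u^theta)."""
    r, th = x
    return [math.cos(th), -math.sin(th) / r]

x0 = [2.0, 0.7]
D = cov_deriv_vector(x_hat, x0, christoffel_polar)
print("partial derivatives:", [partial(x_hat, x0, a) for a in range(2)])
print("covariant derivative:", D)   # entries vanish up to finite-difference error
```

The partial derivatives are nonzero, but the Christoffel terms cancel them exactly, as they must for a field that is genuinely constant. Storing Γ^{γ}_{αβ} as G[γ][α][β] (upper index first) is just a convention chosen for this sketch.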
Armed with this information, we can find the covariant derivative of any tensor. The method in each case is the same: contract the tensor with objects whose covariant derivative is already known, in such a way that the result is a scalar. Then compute the covariant derivative of the product using the Leibniz rule for both the covariant derivative and partial derivatives, keeping in mind that the covariant derivative of a scalar is merely the gradient of that scalar.
As an example, consider the covariant derivative of a one-form ω_{b}, ∇_{a}ω_{b}. Contracting ω_{b} with a vector v^{b} yields a scalar, ω_{b}v^{b} = ω_{β}v^{β}. Thus we can compute ∇_{a}(ω_{b}v^{b}) in two ways:
Firstly, ∇_{a}(ω_{b}v^{b}) = v^{b}∇_{a}(ω_{b}) + ω_{b}∇_{a}(v^{b}) (Leibniz rule)
Since we already know the covariant derivative's action on vectors, we can expand the second term:
∇_{a}(ω_{b}v^{b}) = v^{b}∇_{a}(ω_{b}) + ω_{β}(∂_{α}v^{β} + Γ^{β}_{αγ}v^{γ})e^{α}_{a}.
Secondly, since ω_{b}v^{b} = ω_{β}v^{β} is a scalar, ∇_{a}(ω_{b}v^{b}) = ∂_{α}(ω_{β}v^{β})e^{α}_{a}, which is equal to (ω_{β}∂_{α}v^{β} + v^{β}∂_{α}ω_{β})e^{α}_{a} by the Leibniz rule for partial derivatives.
We can now equate these two results and solve for the term we want, ∇_{a}(ω_{b}):
v^{b}∇_{a}(ω_{b}) + ω_{β}(∂_{α}v^{β} + Γ^{β}_{αγ}v^{γ})e^{α}_{a} = (ω_{β}∂_{α}v^{β} + v^{β}∂_{α}ω_{β})e^{α}_{a}
The ω_{β}∂_{α}v^{β} terms cancel, leaving:
v^{b}∇_{α}(ω_{b})e^{α}_{a} = (v^{β}∂_{α}ω_{β} − ω_{β}Γ^{β}_{αγ}v^{γ})e^{α}_{a}
We can relabel the dummy indices γ and β in the second term, to give:
v^{b}∇_{α}(ω_{b})e^{α}_{a} = v^{β}(∂_{α}ω_{β} − ω_{γ}Γ^{γ}_{αβ})e^{α}_{a}
On the left-hand side, writing v^{b} = v^{β}e^{b}_{β} and using the fact that the covariant derivative commutes with contraction gives v^{β}∇_{α}(ω_{β})e^{α}_{a}. This result must be true independent of the choice of v^{b}, so:
∇_{a}(ω_{b}) = (∂_{α}ω_{β} − Γ^{γ}_{αβ}ω_{γ})e^{α}_{a}e^{β}_{b}
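The one-form result, and the Leibniz identity used to derive it, can be tested numerically. This Python sketch again assumes flat polar coordinates, central differences and hypothetical helper names; omega and vec are arbitrary smooth test fields chosen only for the check. It confirms that ∂_{α}(ω_{β}v^{β}) = v^{β}∇_{α}ω_{β} + ω_{β}∇_{α}v^{β}, and that dx, the gradient of the Cartesian coordinate x = r cos θ, has vanishing covariant derivative.

```python
import math

def christoffel_polar(x):
    """G[gamma][alpha][beta] = Gamma^gamma_{alpha beta} for flat polar coordinates (r, theta)."""
    r = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def partial(f, x, alpha, h=1e-6):
    """Central-difference estimate of d f / d x^alpha for a list-valued f."""
    xp, xm = list(x), list(x)
    xp[alpha] += h
    xm[alpha] -= h
    return [(p - m) / (2 * h) for p, m in zip(f(xp), f(xm))]

def cov_deriv_vector(u, x, christoffel):
    """D[alpha][beta] = d_alpha u^beta + Gamma^beta_{alpha gamma} u^gamma."""
    n, G, ux = len(x), christoffel(x), u(x)
    D = []
    for a in range(n):
        du = partial(u, x, a)
        D.append([du[b] + sum(G[b][a][g] * ux[g] for g in range(n)) for b in range(n)])
    return D

def cov_deriv_oneform(w, x, christoffel):
    """D[alpha][beta] = d_alpha w_beta - Gamma^gamma_{alpha beta} w_gamma."""
    n, G, wx = len(x), christoffel(x), w(x)
    D = []
    for a in range(n):
        dw = partial(w, x, a)
        D.append([dw[b] - sum(G[g][a][b] * wx[g] for g in range(n)) for b in range(n)])
    return D

def omega(x):
    """Hypothetical test one-form (w_r, w_theta)."""
    r, th = x
    return [r * math.sin(th), r * r]

def vec(x):
    """Hypothetical test vector (v^r, v^theta)."""
    r, th = x
    return [math.cos(th) + r, th * r]

def contraction(x):
    """The scalar w_beta v^beta, wrapped in a list so partial() can differentiate it."""
    return [sum(w * v for w, v in zip(omega(x), vec(x)))]

def dx_form(x):
    """dx = d(r cos theta) in polar components: (cos theta, -r sin theta)."""
    r, th = x
    return [math.cos(th), -r * math.sin(th)]

x0 = [2.0, 0.7]
Dw = cov_deriv_oneform(omega, x0, christoffel_polar)
Dv = cov_deriv_vector(vec, x0, christoffel_polar)
wx, vx = omega(x0), vec(x0)
# Pairs (d_alpha(w.v), v^b D_alpha w_b + w_b D_alpha v^b) for alpha = r, theta.
leibniz = [(partial(contraction, x0, a)[0],
            sum(vx[b] * Dw[a][b] + wx[b] * Dv[a][b] for b in range(2))) for a in range(2)]
D_dx = cov_deriv_oneform(dx_form, x0, christoffel_polar)
print(leibniz)   # each pair agrees up to finite-difference error
```

The Christoffel terms in the two products cancel identically, which is exactly why the one-form formula carries the opposite sign to the vector formula.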
To summarize:
∇_{a}(f) = (∂_{α}f)e^{α}_{a}
∇_{a}(u^{b}) = (∂_{α}u^{β} + Γ^{β}_{αγ}u^{γ})e^{α}_{a}e^{b}_{β}
∇_{a}(ω_{b}) = (∂_{α}ω_{β} − Γ^{γ}_{αβ}ω_{γ})e^{α}_{a}e^{β}_{b}
This method generalizes to any tensor, for example:
∇_{a}(T_{bc}^{de}) = (∂_{α}T_{βγ}^{δε} − Γ^{κ}_{αβ}T_{κγ}^{δε} − Γ^{κ}_{αγ}T_{βκ}^{δε} + Γ^{δ}_{ακ}T_{βγ}^{κε} + Γ^{ε}_{ακ}T_{βγ}^{δκ}) e^{α}_{a}e^{β}_{b}e^{γ}_{c}e^{d}_{δ}e^{e}_{ε}
For each raised index of the tensor we add a Christoffel term, contracting that slot with a lowered index of the Christoffel symbol; for each lowered index we subtract a Christoffel term, contracting that slot with the raised index of the Christoffel symbol.
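The sign rule above translates directly into a loop over index slots. The following Python sketch is a hedged illustration, not a library routine: the name cov_deriv and the dictionary representation of components (upper indices first, then lower) are assumptions made here, derivatives are central finite differences, and the test case is flat polar coordinates. Applied to the metric g_{αβ} = diag(1, r^{2}) it returns zero, anticipating the metric compatibility condition discussed under Applications; applied to the constant field x̂ it also returns zero.

```python
import itertools
import math

def christoffel_polar(x):
    """G[gamma][alpha][beta] = Gamma^gamma_{alpha beta} for flat polar coordinates (r, theta)."""
    r = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def cov_deriv(T, x, christoffel, n_up, n_down, h=1e-6):
    """Covariant derivative of a tensor with n_up raised and n_down lowered indices.

    T(x) returns a dict mapping index tuples (raised indices first, then lowered)
    to components. The result maps (alpha, *indices) to nabla_alpha T.
    """
    dim = len(x)
    G = christoffel(x)
    Tx = T(x)
    rank = n_up + n_down
    result = {}
    for alpha in range(dim):
        xp, xm = list(x), list(x)
        xp[alpha] += h
        xm[alpha] -= h
        Tp, Tm = T(xp), T(xm)
        for idx in itertools.product(range(dim), repeat=rank):
            val = (Tp[idx] - Tm[idx]) / (2 * h)   # partial derivative term
            for slot in range(rank):
                for kappa in range(dim):
                    swapped = idx[:slot] + (kappa,) + idx[slot + 1:]
                    if slot < n_up:
                        # raised index: + Gamma^{idx[slot]}_{alpha kappa} T^{..kappa..}
                        val += G[idx[slot]][alpha][kappa] * Tx[swapped]
                    else:
                        # lowered index: - Gamma^{kappa}_{alpha idx[slot]} T_{..kappa..}
                        val -= G[kappa][alpha][idx[slot]] * Tx[swapped]
            result[(alpha,) + idx] = val
    return result

def metric_polar(x):
    """g_{alpha beta} = diag(1, r^2) as a (0,2) tensor."""
    r = x[0]
    return {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): r * r}

def x_hat(x):
    """Constant Cartesian field x-hat as a (1,0) tensor in polar components."""
    r, th = x
    return {(0,): math.cos(th), (1,): -math.sin(th) / r}

x0 = [2.0, 0.7]
grad_g = cov_deriv(metric_polar, x0, christoffel_polar, n_up=0, n_down=2)
grad_x = cov_deriv(x_hat, x0, christoffel_polar, n_up=1, n_down=0)
print(max(abs(v) for v in grad_g.values()), max(abs(v) for v in grad_x.values()))
```

Representing components as a dictionary keeps the code rank-agnostic at the cost of speed, which is acceptable for a small illustrative check.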
Finally, if we require that the covariant derivative be torsion-free, which means that covariant derivatives of a scalar field commute, then
∇_{α}∇_{β}(f) = ∇_{β}∇_{α}(f)
We can expand either side, since ∇_{α}(f) = ∂_{α}f are the components of a one-form (the gradient). Then
∂_{α}∂_{β}f − Γ^{γ}_{αβ}∂_{γ}f = ∂_{β}∂_{α}f − Γ^{γ}_{βα}∂_{γ}f
Since partial derivatives commute, and this must hold for every scalar field f (so that ∂_{γ}f can take any value at a given point), the torsion-free condition requires that
Γ^{γ}_{αβ} = Γ^{γ}_{βα}.
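A quick numerical illustration of this argument, as a Python sketch under stated assumptions: flat polar coordinates, the test scalar f = xy = r^{2} cos θ sin θ (with its gradient coded analytically), and a hypothetical torsion parameter that adds an antisymmetric part to Γ^{r}_{rθ}. With the symmetric connection the covariant Hessian ∇_{α}∇_{β}f is symmetric; with torsion, H_{rθ} − H_{θr} = −(Γ^{γ}_{rθ} − Γ^{γ}_{θr})∂_{γ}f = −∂_{r}f is not zero.

```python
import math

def christoffel_polar(x, torsion=0.0):
    """Polar Gamma^gamma_{alpha beta}; `torsion` is a hypothetical antisymmetric addition to Gamma^r_{r theta}."""
    r = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = 1.0 / r
    G[1][1][0] = 1.0 / r
    G[0][0][1] += torsion    # Gamma^r_{r theta} gains +torsion
    G[0][1][0] -= torsion    # Gamma^r_{theta r} gains -torsion
    return G

def grad_f(x):
    """d_alpha f for f = x*y = r^2 cos(theta) sin(theta) in polar coordinates."""
    r, th = x
    return [2 * r * math.cos(th) * math.sin(th), r * r * (math.cos(th) ** 2 - math.sin(th) ** 2)]

def covariant_hessian(x, torsion=0.0, h=1e-6):
    """H[alpha][beta] = d_alpha d_beta f - Gamma^gamma_{alpha beta} d_gamma f."""
    G = christoffel_polar(x, torsion)
    df = grad_f(x)
    H = [[0.0, 0.0], [0.0, 0.0]]
    for alpha in range(2):
        xp, xm = list(x), list(x)
        xp[alpha] += h
        xm[alpha] -= h
        gp, gm = grad_f(xp), grad_f(xm)
        for beta in range(2):
            H[alpha][beta] = (gp[beta] - gm[beta]) / (2 * h) - sum(
                G[gamma][alpha][beta] * df[gamma] for gamma in range(2))
    return H

x0 = [2.0, 0.7]
H_sym = covariant_hessian(x0)                # symmetric (torsion-free) connection
H_tor = covariant_hessian(x0, torsion=0.5)   # connection with torsion
print(H_sym[0][1] - H_sym[1][0], H_tor[0][1] - H_tor[1][0])
```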
More formally, the torsion-free condition requires that
∇_{u}v − ∇_{v}u = [u,v] for all vector fields u and v, where [u,v] is the Lie bracket or commutator of u and v, defined by its action on scalar fields: [u,v]f = u(v(f)) − v(u(f)). The commutator is such that if u = d/dλ and v = d/dμ then [u,v]f = ∂^{2}f/∂λ∂μ − ∂^{2}f/∂μ∂λ. If λ and μ are coordinates then the partial derivatives commute and the commutator vanishes. In components, (∇_{u}v − ∇_{v}u)^{γ} = [u,v]^{γ} + (Γ^{γ}_{αβ} − Γ^{γ}_{βα})u^{α}v^{β}, so this condition is equivalent to the symmetry of Γ^{γ}_{αβ}. Above we used the equivalent statement for scalar fields, that second covariant derivatives commute, in order to discover the constraint imposed on Γ^{γ}_{αβ}. Kudos to krimson for helping me out with this one.
Note that the original definition of the covariant derivative was made without fixing its effect on general tensors. However, by requiring that it behave like the affine connection, and by requiring that the connection be torsion-free, we were able to specify the covariant derivative completely in terms of the connection coefficients Γ^{γ}_{αβ}. The result is what is usually meant by the term "covariant derivative", but it is not unique, because these requirements do not fix the Γ^{γ}_{αβ} themselves. Normally, though, the metric induces a unique torsion-free, metric compatible connection (derived below), which in turn specifies the covariant derivative uniquely. Again, thanks to krimson for suggesting this clarification.
Applications
Using the covariant derivative we can succinctly express various concepts in differential geometry. For example, consider parallel transporting the vector u^{b} along the curve Γ(λ) with tangent vector v^{a} = (d/dλ)^{a}. Using the affine connection we would require that ∇_{v}u = 0, i.e. the vector remains parallel to its original direction as it is moved across the manifold. Using the covariant derivative we can write the equivalent expression v^{a}∇_{a}u^{b} = 0. In components, since v^{α}∂_{α}u^{β} = du^{β}/dλ along the curve, this reads du^{β}/dλ + Γ^{β}_{αγ}v^{α}u^{γ} = 0. Thus for a given curve, once the Christoffel symbols are known, this expression results in a set of ordinary differential equations that describe how the components of u change as it is parallel transported along the curve.
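As a sketch of solving these equations, the Python code below parallel transports a vector once around the circle θ = θ_{0} on the unit 2-sphere, assuming coordinates (θ, φ) with metric diag(1, sin^{2}θ), so Γ^{θ}_{φφ} = −sin θ cos θ and Γ^{φ}_{θφ} = Γ^{φ}_{φθ} = cot θ. The fixed-step RK4 integrator and all function names are our own choices. On this circle the equations can also be solved by hand: the orthonormal components (u^{θ}, sin θ_{0} u^{φ}) rotate at the constant rate cos θ_{0}, so after one loop the vector has turned by 2π cos θ_{0} relative to the local frame, and the code compares against that.

```python
import math

def christoffel_sphere(x):
    """Gamma^gamma_{alpha beta} on the unit 2-sphere, coordinates (theta, phi), as G[gamma][alpha][beta]."""
    th = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)              # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)  # Gamma^phi_{theta phi}
    return G

def parallel_transport(curve, velocity, u0, lam_end, steps=2000):
    """RK4 integration of du^beta/dlam = -Gamma^beta_{alpha gamma} v^alpha u^gamma along curve(lam)."""
    n = len(u0)

    def rhs(lam, u):
        G = christoffel_sphere(curve(lam))
        v = velocity(lam)
        return [-sum(G[beta][alpha][gamma] * v[alpha] * u[gamma]
                     for alpha in range(n) for gamma in range(n))
                for beta in range(n)]

    h = lam_end / steps
    u, lam = list(u0), 0.0
    for _ in range(steps):
        k1 = rhs(lam, u)
        k2 = rhs(lam + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k1)])
        k3 = rhs(lam + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k2)])
        k4 = rhs(lam + h, [ui + h * ki for ui, ki in zip(u, k3)])
        u = [ui + h / 6 * (a + 2 * b + 2 * c + d)
             for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]
        lam += h
    return u

theta0 = 1.0   # the circle theta = theta0, parametrised by phi = lam
u_end = parallel_transport(lambda lam: [theta0, lam], lambda lam: [0.0, 1.0],
                           [1.0, 0.0], 2 * math.pi)
# Orthonormal components (u^theta, sin(theta0) u^phi) after one full loop.
print(u_end[0], math.sin(theta0) * u_end[1])
```

The transported vector keeps its length but does not return to its starting direction: a concrete sign of curvature.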
We can also obtain the geodesic equations, which describe geodesics: curves between two points whose arc length is stationary (for example a local minimum). These curves can be thought of as "straight lines" in curved space, for example great circles on the surface of a 2-sphere. It can be shown that a geodesic, parametrised by an affine parameter λ, is a curve that parallel transports its own tangent vector. Thus for a curve Γ(λ) with tangent vector v^{a} = (d/dλ)^{a}, we require that v^{a}∇_{a}v^{b} = 0. Writing v^{α} = dx^{α}/dλ, this becomes d^{2}x^{β}/dλ^{2} + Γ^{β}_{αγ}(dx^{α}/dλ)(dx^{γ}/dλ) = 0.
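A sketch of integrating the geodesic equation on the unit 2-sphere, assuming coordinates (θ, φ) with metric diag(1, sin^{2}θ) and a fixed-step RK4 integrator (our choice; the names are hypothetical). Two properties are checked: the path, embedded in three-dimensional space, stays in a plane through the origin (so it is a great circle), and g(ẋ, ẋ) = θ̇^{2} + sin^{2}θ φ̇^{2} stays constant, as expected for an affinely parametrised geodesic. The initial point and velocity are arbitrary, chosen so the curve stays well away from the coordinate singularities at the poles.

```python
import math

def christoffel_sphere(x):
    """Gamma^gamma_{alpha beta} on the unit 2-sphere, coordinates (theta, phi), as G[gamma][alpha][beta]."""
    th = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)
    return G

def geodesic(x0, v0, lam_end, steps=5000):
    """RK4 for d^2x^b/dlam^2 = -Gamma^b_{a c} dx^a/dlam dx^c/dlam; returns states [theta, phi, dtheta, dphi]."""
    def rhs(state):
        x, xdot = state[:2], state[2:]
        G = christoffel_sphere(x)
        acc = [-sum(G[b][a][c] * xdot[a] * xdot[c] for a in range(2) for c in range(2))
               for b in range(2)]
        return xdot + acc

    h = lam_end / steps
    state = list(x0) + list(v0)
    path = [state]
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs([s + h / 2 * k for s, k in zip(state, k1)])
        k3 = rhs([s + h / 2 * k for s, k in zip(state, k2)])
        k4 = rhs([s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        path.append(state)
    return path

def embed(th, ph):
    """Point on the unit sphere in R^3."""
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))

path = geodesic([1.2, 0.0], [0.3, 0.8], lam_end=5.0)
# A great circle lies in a plane through the origin; get its unit normal from two points.
p0, p1 = embed(*path[0][:2]), embed(*path[100][:2])
n = (p0[1] * p1[2] - p0[2] * p1[1], p0[2] * p1[0] - p0[0] * p1[2], p0[0] * p1[1] - p0[1] * p1[0])
norm = math.sqrt(sum(c * c for c in n))
n = tuple(c / norm for c in n)
off_plane = max(abs(sum(a * b for a, b in zip(embed(s[0], s[1]), n))) for s in path)
speeds = [s[2] ** 2 + math.sin(s[0]) ** 2 * s[3] ** 2 for s in path]
print(off_plane, max(speeds) - min(speeds))
```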
Although the Christoffel symbols that define the covariant derivative may be chosen with some degree of freedom, we can choose them in a way that is compatible with the metric g. Consider parallel transporting two vectors u^{b} and w^{c} along a curve with tangent vector v^{a}. We first require that v^{a}∇_{a}u^{b} = 0 and v^{a}∇_{a}w^{c} = 0. Since the dot product of u and w, g_{bc}u^{b}w^{c}, should remain unaltered by parallel transport, we then require v^{a}∇_{a}(g_{bc}u^{b}w^{c}) = 0. Then, by the Leibniz rule:
v^{a}∇_{a}(g_{bc}u^{b}w^{c}) = v^{a}∇_{a}(g_{bc})u^{b}w^{c} + v^{a}∇_{a}(u^{b})g_{bc}w^{c} + v^{a}∇_{a}(w^{c})g_{bc}u^{b} = 0
The second and third terms vanish since v^{a}∇_{a}u^{b} = 0 and v^{a}∇_{a}w^{c} = 0, leaving v^{a}∇_{a}(g_{bc})u^{b}w^{c} = 0. Since the curve (and hence v^{a}) and the vectors u^{b} and w^{c} were arbitrarily chosen, it must be true that ∇_{a}(g_{bc}) = 0. Using this equation, along with the torsion-free condition, Γ^{γ}_{αβ} can be determined in terms of g_{αβ}:
First, we expand the components of ∇_{a}(g_{bc}):
∇_{α}(g_{βγ}) = ∂_{α}g_{βγ} − Γ^{κ}_{αβ}g_{κγ} − Γ^{κ}_{αγ}g_{βκ} = 0
We can then permute the indices α, β and γ cyclically; adding the first two permutations and subtracting the third gives:
∂_{α}g_{βγ} − Γ^{κ}_{αβ}g_{κγ} − Γ^{κ}_{αγ}g_{βκ} + ∂_{β}g_{γα} − Γ^{κ}_{βγ}g_{κα} − Γ^{κ}_{βα}g_{γκ} − ∂_{γ}g_{αβ} + Γ^{κ}_{γα}g_{κβ} + Γ^{κ}_{γβ}g_{ακ} = 0
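Now, since the metric is symmetric, g_{αβ} = g_{βα}, and since we have imposed the torsion-free condition Γ^{γ}_{αβ} = Γ^{γ}_{βα}, the term −Γ^{κ}_{αγ}g_{βκ} cancels +Γ^{κ}_{γα}g_{κβ} and −Γ^{κ}_{βγ}g_{κα} cancels +Γ^{κ}_{γβ}g_{ακ}, so we have:
∂_{α}g_{βγ} − Γ^{κ}_{αβ}g_{κγ} + ∂_{β}g_{γα} − Γ^{κ}_{βα}g_{γκ} − ∂_{γ}g_{αβ} = 0
The two remaining Christoffel terms are equal, so, renaming the summed index λ,
2Γ^{λ}_{αβ}g_{γλ} = ∂_{α}g_{βγ} + ∂_{β}g_{γα} − ∂_{γ}g_{αβ}.
Multiplying both sides by the inverse metric g^{γκ}, summing over γ, and using g^{γκ}g_{γλ} = δ^{κ}_{λ} gives the metric compatible connection:
Γ^{κ}_{αβ} = ½g^{γκ}(∂_{α}g_{βγ} + ∂_{β}g_{γα} − ∂_{γ}g_{αβ})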
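The final formula can be evaluated for any metric given as a function of the coordinates. This Python sketch (hypothetical names; metric derivatives by central differences; a small Gauss-Jordan inverse in place of a linear-algebra library) computes Γ^{κ}_{αβ} = ½g^{γκ}(∂_{α}g_{βγ} + ∂_{β}g_{γα} − ∂_{γ}g_{αβ}) and compares it with the known values for the unit 2-sphere (Γ^{θ}_{φφ} = −sin θ cos θ, Γ^{φ}_{θφ} = cot θ) and for flat polar coordinates (Γ^{r}_{θθ} = −r, Γ^{θ}_{rθ} = 1/r).

```python
import math

def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [val / p for val in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def christoffel_from_metric(metric, x, h=1e-5):
    """G[k][a][b] = 1/2 g^{k c} (d_a g_{b c} + d_b g_{c a} - d_c g_{a b})."""
    n = len(x)
    ginv = invert(metric(x))
    dg = []   # dg[a][b][c] = d_a g_{b c}
    for a in range(n):
        xp, xm = list(x), list(x)
        xp[a] += h
        xm[a] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[b][c] - gm[b][c]) / (2 * h) for c in range(n)] for b in range(n)])
    return [[[0.5 * sum(ginv[k][c] * (dg[a][b][c] + dg[b][c][a] - dg[c][a][b])
                        for c in range(n))
              for b in range(n)] for a in range(n)] for k in range(n)]

def sphere_metric(x):
    """Unit 2-sphere, coordinates (theta, phi)."""
    th = x[0]
    return [[1.0, 0.0], [0.0, math.sin(th) ** 2]]

def polar_metric(x):
    """Flat plane, coordinates (r, theta)."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

G_s = christoffel_from_metric(sphere_metric, [1.0, 0.4])
G_p = christoffel_from_metric(polar_metric, [2.0, 0.7])
print(G_s[0][1][1], G_s[1][0][1], G_p[0][1][1], G_p[1][0][1])
```

Symbolic differentiation would give exact expressions; finite differences are used here only to keep the sketch dependency-free.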
As an example of using the covariant derivative, we can write the divergence of a vector field as:
∇·v = ∇_{a}v^{a} = ∂_{α}v^{α} + Γ^{α}_{αγ}v^{γ}
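In polar coordinates Γ^{α}_{αr} = 1/r and Γ^{α}_{αθ} = 0, so this expression should reproduce the familiar formula (1/r)∂_{r}(r v^{r}) + ∂_{θ}v^{θ}. The Python sketch below checks that numerically, under the same assumptions as before (flat polar coordinates, central differences, hypothetical helper names); sample is an arbitrary test field chosen only for the comparison, and the position field r e_r should have divergence 2 in two dimensions.

```python
import math

def christoffel_polar(x):
    """G[gamma][alpha][beta] = Gamma^gamma_{alpha beta} for flat polar coordinates (r, theta)."""
    r = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def divergence(v, x, christoffel, h=1e-6):
    """nabla_a v^a = d_alpha v^alpha + Gamma^alpha_{alpha gamma} v^gamma."""
    n = len(x)
    G = christoffel(x)
    vx = v(x)
    total = 0.0
    for alpha in range(n):
        xp, xm = list(x), list(x)
        xp[alpha] += h
        xm[alpha] -= h
        total += (v(xp)[alpha] - v(xm)[alpha]) / (2 * h)
        total += sum(G[alpha][alpha][gamma] * vx[gamma] for gamma in range(n))
    return total

def position(x):
    """Position vector r e_r: components (v^r, v^theta) = (r, 0)."""
    return [x[0], 0.0]

def sample(x):
    """Hypothetical test field (v^r, v^theta) = (r^2 sin(theta), cos(theta))."""
    r, th = x
    return [r * r * math.sin(th), math.cos(th)]

x0 = [2.0, 0.7]
div_pos = divergence(position, x0, christoffel_polar)
div_sample = divergence(sample, x0, christoffel_polar)
# Textbook polar formula (1/r) d_r(r v^r) + d_theta v^theta, worked out by hand for `sample`:
r, th = x0
expected_sample = 3 * r * math.sin(th) - math.sin(th)
print(div_pos, div_sample, expected_sample)
```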
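We can also write the curl of a vector field in three dimensions, ∇×v, as:
(∇×v)^{α} = ε^{αλμ}∇_{λ}v_{μ}
where ε^{αλμ} is the volume form with its indices raised (the Levi-Civita tensor, ε^{αλμ} = [αλμ]/√g, with [αλμ] the permutation symbol and g the determinant of g_{αβ}). Because Γ^{κ}_{λμ} is symmetric, the Christoffel terms cancel in the antisymmetric contraction, so ∇_{λ} may be replaced by ∂_{λ} here.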
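The curl formula can be checked in cylindrical coordinates (r, θ, z), where √g = r. The Python sketch below is a hedged illustration with hypothetical names and central differences. It takes covariant components v_{μ}, forms ∇_{λ}v_{μ} including the Christoffel terms (which cancel in the contraction anyway), and contracts with ε^{αλμ} = [αλμ]/r. For the rigid rotation (−y, x, 0), whose covariant components are (0, r^{2}, 0), the curl should be 2ẑ; for the field r^{2}ẑ, it should be −2r θ̂, i.e. coordinate component (∇×v)^{θ} = −2.

```python
import itertools

def christoffel_cylindrical(x):
    """Gamma^gamma_{alpha beta} for flat space in cylindrical coordinates (r, theta, z)."""
    r = x[0]
    G = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def perm_sign(i, j, k):
    """Permutation symbol [ijk] for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def curl(v_lower, x, h=1e-6):
    """(curl v)^alpha = eps^{alpha lam mu} nabla_lam v_mu with eps = [alpha lam mu]/sqrt(g), sqrt(g) = r."""
    G = christoffel_cylindrical(x)
    vx = v_lower(x)
    sqrt_g = x[0]
    nabla = [[0.0] * 3 for _ in range(3)]   # nabla[lam][mu] = nabla_lam v_mu
    for lam in range(3):
        xp, xm = list(x), list(x)
        xp[lam] += h
        xm[lam] -= h
        dv = [(p - m) / (2 * h) for p, m in zip(v_lower(xp), v_lower(xm))]
        for mu in range(3):
            nabla[lam][mu] = dv[mu] - sum(G[k][lam][mu] * vx[k] for k in range(3))
    out = [0.0] * 3
    for a, lam, mu in itertools.product(range(3), repeat=3):
        out[a] += perm_sign(a, lam, mu) / sqrt_g * nabla[lam][mu]
    return out

def rotation(x):
    """Rigid rotation (-y, x, 0): v^theta = 1, so covariant components (0, r^2, 0)."""
    return [0.0, x[0] ** 2, 0.0]

def axial(x):
    """The field r^2 z-hat: covariant components (0, 0, r^2)."""
    return [0.0, 0.0, x[0] ** 2]

x0 = [2.0, 0.7, 0.3]
curl_rot = curl(rotation, x0)   # approximately (0, 0, 2)
curl_ax = curl(axial, x0)       # approximately (0, -2, 0) in coordinate components
print(curl_rot, curl_ax)
```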