Let M be a manifold in R^n defined by a vector equation f(x)=0 for some differentiable function f:R^n→R^m. (The solution set of a smooth vector equation is generally a manifold.) Let g:M→R be a differentiable function, and let v be an extremum point of g on M.

The Lagrange multiplier rule says that at v, the system of m+1 vectors

(∇ f)(v)
(∇ g)(v)
has less than full rank (i.e. is linearly dependent).

If we write f=(f1,...,fm), with each fi:R^n→R, the system of vectors is the perhaps more familiar

(∇ f1)(v)
...
(∇ fm)(v)
(∇ g)(v),
and one possible linear dependence is given by (∇ g)(v) being a linear combination of the (∇ fi)(v)'s; writing
(∇ g)(v) = a1 (∇ f1)(v) + ... + am (∇ fm)(v)
in this case shows you why the ai's are called Lagrange multipliers.
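
As a small numerical check of the rule, here is a sketch for one concrete case that is not taken from the text above: the constraint f(x,y) = x^2 + y^2 - 1 (the unit circle, so n=2 and m=1) and the objective g(x,y) = x + y, whose maximum on the circle is at v = (1/√2, 1/√2). The function names and the example itself are assumptions chosen for illustration.

```python
import math

def grad_f(x, y):
    # Gradient of the assumed constraint f(x, y) = x^2 + y^2 - 1.
    return (2.0 * x, 2.0 * y)

def grad_g(x, y):
    # Gradient of the assumed objective g(x, y) = x + y.
    return (1.0, 1.0)

# Maximum of g on the unit circle (found by symmetry for this example).
v = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))

fx, fy = grad_f(*v)
gx, gy = grad_g(*v)

# Two vectors in R^2 are linearly dependent exactly when this
# 2x2 determinant vanishes, i.e. the system has less than full rank.
det = fx * gy - fy * gx

# Lagrange multiplier a1 with (grad g)(v) = a1 * (grad f)(v),
# computed as the projection coefficient of grad g onto grad f.
a1 = (gx * fx + gy * fy) / (fx * fx + fy * fy)

# Length of what is left of grad g after subtracting a1 * grad f.
residual = math.hypot(gx - a1 * fx, gy - a1 * fy)

print("determinant:", det)
print("multiplier a1:", a1)
print("residual:", residual)
```

At the extremum both the determinant and the residual should be zero up to rounding. Evaluating the same quantities at a point of the circle that is not an extremum, such as (1, 0), gives a nonzero determinant.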

If you think about it, what the Lagrange multiplier rule is telling you is merely that if you constrain an end of a rubber band to a curved surface and pull the other end in some direction, the constrained end will come to a stop when the rubber band is perpendicular to the surface.
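
The rubber band picture can be imitated numerically. The sketch below makes several simplifying assumptions that are not in the text: the surface is the unit circle, the far end is so distant that the pull on the constrained end always has the fixed direction d, and the constrained end slides along the circle in proportion to the tangential part of that pull. In the notation above this corresponds to g(x) = d·x, so (∇ g) = d.

```python
import math

# Assumed fixed pull direction of the rubber band.
d = (3.0, 4.0)

def tangential_pull(theta):
    # The unit tangent to the circle at angle theta is (-sin, cos);
    # this is the component of the pull d along that tangent.
    return -d[0] * math.sin(theta) + d[1] * math.cos(theta)

theta = 0.0   # starting position of the constrained end
step = 0.05   # assumed step size for the sliding motion

# Slide along the circle until the tangential pull dies out.
for _ in range(2000):
    theta += step * tangential_pull(theta)

point = (math.cos(theta), math.sin(theta))

# On the unit circle the normal at a point is the point itself.
# A vanishing 2D cross product means d is parallel to the normal,
# i.e. the band is perpendicular to the surface.
cross = d[0] * point[1] - d[1] * point[0]

print("resting angle:", theta)
print("cross product of pull and normal:", cross)
```

The constrained end stops where the pull has no tangential component, which is exactly the linear dependence of (∇ g)(v) and (∇ f)(v) in this two-dimensional case.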