Nebulon gives a good description of Bayesian networks, but I think that Bayes nets are best understood with a diagram.

Consider the following example:

Let R be the random variable indicating that it will rain tomorrow.
Let S indicate whether or not I go to sleep late tonight.
Let L indicate whether or not I am late to class tomorrow.

Tonight, I don't know whether it will rain tomorrow, so whether I go to sleep late is independent of whether it rains tomorrow. However, rain tomorrow and a late bedtime tonight both increase the chance that I will be late to class in the morning.

If I were to represent this example as a Bayes net, it would look like this:

```
 _________                  _________
|         |                |         |
|  Rain   |                |  Sleep  |
|_________|                |_________|
       \                       /
        \                     /
         \                   /
          \                 /
           \               /
            \/ __________\/
              |          |
              |   Late   |
              |__________|
```

To represent all of the relevant probabilities in this example, I need to provide the probabilities of R and S, P(R) and P(S), as well as all of the conditional probabilities of L: P(L | R & S), P(L | R & ~S), P(L | ~R & S), and P(L | ~R & ~S). What I gain is that I don't have to provide any conditional probabilities for R or S, because I know that they are independent of each other.
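As a sketch of how those six numbers pin down every probability in the example, the Python below multiplies them according to the network, P(R, S, L) = P(R) P(S) P(L | R, S), and then answers a query by summing joint entries. The numeric values and the names `P_R`, `P_S`, `P_L_GIVEN`, `bern` and `joint` are hypothetical, chosen only for illustration; the answer itself gives no numbers.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_R = 0.3   # P(R = 1): it rains tomorrow
P_S = 0.4   # P(S = 1): I go to sleep late tonight

# P(L = 1 | R = r, S = s), keyed by (r, s): the four conditional probabilities of L.
P_L_GIVEN = {
    (1, 1): 0.9,
    (1, 0): 0.5,
    (0, 1): 0.6,
    (0, 0): 0.1,
}

def bern(p_true, value):
    """Probability that a Boolean variable equals `value`, given P(var = 1)."""
    return p_true if value == 1 else 1.0 - p_true

def joint(r, s, l):
    """P(R=r, S=s, L=l) = P(r) * P(s) * P(l | r, s), following the network."""
    return bern(P_R, r) * bern(P_S, s) * bern(P_L_GIVEN[(r, s)], l)

# Any query is a sum of joint entries, e.g. the marginal P(L = 1).
p_late = sum(joint(r, s, 1) for r, s in product((0, 1), repeat=2))
print(f"P(L=1) = {p_late:.3f}")  # prints "P(L=1) = 0.408"
```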

Since there are three Boolean variables in this example, the full joint probability distribution has 8 entries, so giving it directly would take 8 pieces of information (7 independent numbers, since the entries must sum to 1). By using the independence represented in my Bayes net, I can give only 6 pieces of information and still provide the user with enough information to calculate any joint probability they desire.
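The counting can be sketched for any network of Boolean variables: the joint table has 2^n entries, while the net needs one number P(X = 1 | parent values) for each combination of parent values of each node X. The helper names `joint_table_entries` and `bayes_net_numbers`, and the dictionary-of-parents representation, are assumptions made for this illustration.

```python
def joint_table_entries(n_vars):
    """Number of entries in the full joint table over n Boolean variables."""
    return 2 ** n_vars

def bayes_net_numbers(parents):
    """Numbers a Boolean Bayes net needs: for each node X, one value
    P(X = 1 | parent values) per combination of its parents' values."""
    return sum(2 ** len(ps) for ps in parents.values())

# Rain and Sleep have no parents; Late has both of them as parents.
net = {"R": [], "S": [], "L": ["R", "S"]}

print(joint_table_entries(len(net)))  # 8 (only 7 are free, since they sum to 1)
print(bayes_net_numbers(net))         # 1 + 1 + 4 = 6
```

The gap grows with the network: adding a node with a single parent costs only 2 more numbers, while every added Boolean variable doubles the size of the joint table.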

Now, suppose I want to add another variable to my example.

Let U be the variable that indicates whether or not I understand tomorrow's lecture.
I am more likely to understand the lecture if I arrive on time, so U is dependent on L.
My new Bayesian network looks like this:
```
 _________                  _________
|         |                |         |
|  Rain   |                |  Sleep  |
|_________|                |_________|
       \                       /
        \                     /
         \                   /
          \                 /
           \               /
            \/ __________\/
              |          |
              |   Late   |
              |__________|
                    |
                    |
                   \ /
              ____________
             |            |
             | Understand |
             |____________|
```

Looking at this network, it is clear that U is not truly independent of R or S: if it rains, I am more likely to be late, and therefore more likely to misunderstand the lecture. However, if I am given the value of L, knowing R or S gives me no additional information about U.
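To make the first half of that claim concrete, here is a brute-force Python sketch over the four-variable network, using P(R, S, L, U) = P(R) P(S) P(L | R, S) P(U | L). All numbers, including the hypothetical `P_U_GIVEN` table (understanding is assumed less likely when late), and the helper names `bern`, `joint` and `prob` are illustrative assumptions. With L unobserved, the probability of understanding the lecture changes with R.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_R = 0.3
P_S = 0.4
P_L_GIVEN = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.6, (0, 0): 0.1}  # P(L=1 | r, s)
P_U_GIVEN = {1: 0.4, 0: 0.8}  # P(U=1 | l): assumed lower when I am late

def bern(p_true, value):
    """Probability that a Boolean variable equals `value`, given P(var = 1)."""
    return p_true if value == 1 else 1.0 - p_true

def joint(r, s, l, u):
    """P(r, s, l, u) = P(r) P(s) P(l | r, s) P(u | l)."""
    return (bern(P_R, r) * bern(P_S, s)
            * bern(P_L_GIVEN[(r, s)], l) * bern(P_U_GIVEN[l], u))

def prob(event, given=lambda r, s, l, u: True):
    """P(event | given), by summing over all 16 joint entries."""
    num = den = 0.0
    for r, s, l, u in product((0, 1), repeat=4):
        if given(r, s, l, u):
            p = joint(r, s, l, u)
            den += p
            if event(r, s, l, u):
                num += p
    return num / den

p_u_rain = prob(lambda r, s, l, u: u == 1, given=lambda r, s, l, u: r == 1)
p_u_dry = prob(lambda r, s, l, u: u == 1, given=lambda r, s, l, u: r == 0)
print(f"P(U=1 | R=1) = {p_u_rain:.3f}")  # prints "P(U=1 | R=1) = 0.536"
print(f"P(U=1 | R=0) = {p_u_dry:.3f}")   # prints "P(U=1 | R=0) = 0.680"
```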

"If I know that I was late to class (L = 1), then knowing whether it rained or whether I went to bed late does not help me predict whether I will understand the lecture."
What this means is that although U is not fully independent of R or S, it is conditionally independent of R and S given L:
P(U | L & R) = P(U | L)
P(U | L & S) = P(U | L)
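Those two equalities can be checked numerically with a small Python sketch that rebuilds the same four-variable joint and compares P(U = 1 | L, R) and P(U = 1 | L, S) with P(U = 1 | L) for every combination of values. The numbers and the helper names `bern`, `joint` and `p_u_given` are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_R, P_S = 0.3, 0.4
P_L_GIVEN = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.6, (0, 0): 0.1}  # P(L=1 | r, s)
P_U_GIVEN = {1: 0.4, 0: 0.8}                                      # P(U=1 | l)

def bern(p_true, value):
    """Probability that a Boolean variable equals `value`, given P(var = 1)."""
    return p_true if value == 1 else 1.0 - p_true

def joint(r, s, l, u):
    """P(r, s, l, u) = P(r) P(s) P(l | r, s) P(u | l)."""
    return (bern(P_R, r) * bern(P_S, s)
            * bern(P_L_GIVEN[(r, s)], l) * bern(P_U_GIVEN[l], u))

def p_u_given(l, r=None, s=None):
    """P(U=1 | L=l, and optionally R=r and/or S=s); None means not conditioned on."""
    num = den = 0.0
    for rr, ss, uu in product((0, 1), repeat=3):
        if (r is not None and rr != r) or (s is not None and ss != s):
            continue
        p = joint(rr, ss, l, uu)
        den += p
        num += p if uu == 1 else 0.0
    return num / den

for l in (0, 1):
    for x in (0, 1):
        assert abs(p_u_given(l, r=x) - p_u_given(l)) < 1e-12  # P(U | L, R) = P(U | L)
        assert abs(p_u_given(l, s=x) - p_u_given(l)) < 1e-12  # P(U | L, S) = P(U | L)
print("P(U | L, R) = P(U | L) and P(U | L, S) = P(U | L) hold for these numbers")
```

Changing the hypothetical numbers does not break the assertions: the equality follows from U's only factor in the joint being P(U | L), not from the particular values.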