Nebulon gives a good description of Bayesian networks, but I think that Bayes nets are best understood with a diagram.
Consider the following example:
Let R be the random variable indicating that it will rain tomorrow.
Let S indicate whether or not I go to sleep late tonight.
Let L indicate whether or not I am late to class tomorrow.
Tonight, I don't know if it will rain tomorrow, so whether I go to sleep late is independent of whether it rains tomorrow. However, rain tomorrow and a late bedtime tonight both increase the chance that I will be late to class in the morning.
If I were to represent this example as a Bayes net, it would look like this:
 _________           _________
|         |         |         |
|  Rain   |         |  Sleep  |
|_________|         |_________|
     \                   /
      \                 /
       \               /
        \             /
         \           /
         _\/_______\/_
        |             |
        |    Late     |
        |_____________|
To represent all of the relevant probabilities in this example, I need to provide the probabilities of R and S, P(R) and P(S), as well as all of the conditional probabilities of L: P(L | R & S), P(L | R & ~S), P(L | ~R & S), and P(L | ~R & ~S). However, I gain by not having to provide conditional probabilities for R and S; I know that they are independent of each other.
Since there are three Boolean variables in this example, the full joint probability distribution has 8 entries, one for each combination of values (7 of them free, since they must sum to 1). By using the independence information represented in my Bayes net, I can give only 6 pieces of information, and still provide the user with enough information to calculate any joint probability that he desires.
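To make the counting concrete, here is a minimal Python sketch that rebuilds all 8 joint entries from the 6 numbers. The probability values are made up for illustration (the example above does not give any), and the names P_R, P_S, P_L_GIVEN, bernoulli and joint are my own.

```python
from itertools import product

# Illustrative numbers only; none of these values come from the example.
P_R = 0.3  # P(R): it rains tomorrow
P_S = 0.4  # P(S): I go to sleep late tonight
# P(L | R, S): one entry per combination of (R, S), four numbers in all
P_L_GIVEN = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}


def bernoulli(p, value):
    """Probability that a Boolean variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(r, s, l):
    """P(R=r, S=s, L=l) = P(r) * P(s) * P(l | r, s), as the network implies."""
    return bernoulli(P_R, r) * bernoulli(P_S, s) * bernoulli(P_L_GIVEN[(r, s)], l)


# All 8 joint entries, built from only 6 supplied numbers.
joint_table = {
    (r, s, l): joint(r, s, l) for r, s, l in product([True, False], repeat=3)
}
total = sum(joint_table.values())

# Any marginal follows by summing entries, e.g. P(L) regardless of R and S.
p_late = sum(p for (r, s, l), p in joint_table.items() if l)

print(len(joint_table), total, p_late)
```

The product P(r) * P(s) is only valid because R and S have no arrow between them; with a dependency there, I would need P(S | R) and P(S | ~R) instead of a single P(S).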
Now, suppose I want to add another variable to my example.
Let U be the variable that indicates whether or not I understand tomorrow's lecture.
I am more likely to understand the lecture if I arrive on time, so U is dependent on L.
My new Bayesian network looks like this:
 _________           _________
|         |         |         |
|  Rain   |         |  Sleep  |
|_________|         |_________|
     \                   /
      \                 /
       \               /
        \             /
         \           /
         _\/_______\/_
        |             |
        |    Late     |
        |_____________|
               |
               |
              \ /
         ____________
        |            |
        | Understand |
        |____________|
Looking at this network, it is clear that U is not truly independent of R or S, since if it rains, I am more likely to be late, and therefore misunderstand the lecture. However, if I am given the value for L, I can gain no more information about U by knowing either R or S.
"If I know that I was late to class (L = 1), then knowing if it rained or if I went to bed late does not help me to predict if I will understand the lecture."
What this means is that although U is not fully independent of R or S, it is conditionally independent of R and S given L:
P(U | L & R) = P(U | L)
P(U | L & S) = P(U | L)
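These two equalities can be checked numerically. The sketch below is only an illustration under assumed numbers: every probability value is made up, and the helper names (bernoulli, joint, prob) are hypothetical. It builds the four-variable joint distribution from the network and compares the conditionals by brute-force summation.

```python
from itertools import product

# Illustrative numbers only; none of these values come from the example.
P_R, P_S = 0.3, 0.4
P_L_GIVEN = {  # P(L | R, S)
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}
P_U_GIVEN = {True: 0.5, False: 0.9}  # P(U | L): lateness hurts understanding


def bernoulli(p, value):
    """Probability that a Boolean variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(r, s, l, u):
    """P(r, s, l, u) = P(r) * P(s) * P(l | r, s) * P(u | l)."""
    return (bernoulli(P_R, r) * bernoulli(P_S, s)
            * bernoulli(P_L_GIVEN[(r, s)], l) * bernoulli(P_U_GIVEN[l], u))


def prob(**fixed):
    """Sum the joint over every assignment that agrees with the fixed values."""
    total = 0.0
    for r, s, l, u in product([True, False], repeat=4):
        assignment = {"r": r, "s": s, "l": l, "u": u}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(r, s, l, u)
    return total


# Given L, learning R or S changes nothing about U.
p_u_given_l = prob(u=True, l=True) / prob(l=True)
p_u_given_l_r = prob(u=True, l=True, r=True) / prob(l=True, r=True)
p_u_given_l_s = prob(u=True, l=True, s=True) / prob(l=True, s=True)

# Without L, rain does shift my belief about U.
p_u = prob(u=True)
p_u_given_r = prob(u=True, r=True) / prob(r=True)

print(p_u_given_l, p_u_given_l_r, p_u_given_l_s)
print(p_u, p_u_given_r)
```

The first three values agree, while P(U) and P(U | R) differ, which is exactly the difference between conditional independence and full independence.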