Set theory defines a union operation. For any set A, an axiom guarantees the existence of the union: a set
B = ∪A = { x : ∃y.y∈A&x∈y }.
B is the set of all
elements of elements of A.
When A={a,b} is a pair (derived e.g. by the axiom of pairing), we use the more common notation a∪b = ∪A. This notation is also used when a=b.
Note, however, that the existence of unions ∪A when A is infinite does not follow from the existence of unions of such pairs. In particular: The cardinality of a finite union is bounded by the cardinalities of A and of each of the elements of A. But the cardinality of an infinite (even countable) union has no such bound. Hence the need for an axiom.
When A is an indexed set A = {ai: i∈I} for some index set I, notations based on ∪i∈Iai = ∪A are common. Thus, we may see
N = ∪n=1∞ {k⋅pn : k=1, 2, ...}
where p
1=2, p
2=3, p
3=5, ... is the
sequence of
prime numbers.
These less formal notations are more common in most of mathematics.
Given a "universal" set in which to take complements, unions are related to intersections by means of DeMorgan's laws.
In C and in C++, a union is a type. The syntax of a union is exactly the same as the syntax of a struct:
union U {
char * a_string;
int an_integer;
double a_number;
};
defines a union U with 3
fields. However, unions are limited in such a way as to allow
access to
only one field at a
time.
Defined behaviour
The informal rule is that a union holds all fields overlapping in memory. The formal rules are more complex, but amount to the same thing:
- A pointer to the union can be converted into a pointer to the type of any of the fields (in particular, it has the correct alignment for each of these types);
- Reading any but the last-written field of the union yields undefined behaviour;
- The sizeof the union is the largest of the sizeofs each of the members.
As usual,
standardese prohibits the use of words such as "memory", or indeed any mention of
implementation. But we all know what they mean.
Since no portable conversions exist in such a configuration, standard C unions are limited to expressing just the "variant" part of Pascal's variant records. C has no notion of the type member; if YOU want to know the type currently stored in a union, YOU have to make sure you know what it is. One way to do this could be to say
struct variant {
enum {e_pchar, e_int, e_double} type;
union U value;
};
and use the type field to
keep track of the type of the last stored data.
But C doesn't force you to do it this way. Your program might store types differently. For instance:
struct leaf { /* ... */ };
struct node { /* ... */ };
struct tree {
struct tree *left, *right; /* left and right subtrees */
union {
struct leaf a_leaf; /* If (!left) && (!right) */
struct node a_node; /* Otherwise */
} data;
};
could be one way to define a tree.
Undefined Behaviour
Once compiler-specific features are taken into account, unions naturally become more flexible: you can taken advantage of specific knowledge of what the "undefined behaviour" of reading from a "wrong" member will do. For instance, the standard assigns no meaning to this code:
union address_convert {
void *ptr;
long addr;
};
void *convert(long addr)
{
union address_convert cvt;
cvt.addr = addr;
return cvt.ptr;
}
It invokes undefined behaviour. However, a particular implementation MAY specify that such code will work "correctly". This would involve ensuring that
longs and
void pointers have appropriate
sizes, and that the "correct" conversion does indeed occur.
For such purposes too, unions are a very practicable way for implementation-specific code to achieve certain behaviours.
There isn't really one, except that the English language provides the same word for both. The C union { A a; B b; }; can store all values of the disjoint union A∪B. But it relies on labeled values (you can store in either of the fields .a and .b, and you must specify which one you mean). And if A and B refer to the same type, the distinction is even more pronounced.
C unions better fit (one of) the semantics commonly ascribed to "A ``square cup'' B", which isn't really a union in the mathematical sense.