Set Theory

Set theory defines a union operation. For any set A, an axiom guarantees the existence of the union: a set

B = A = { x : y.y∈A&x∈y }.
B is the set of all elements of elements of A.

When A={a,b} is a pair (derived e.g. by the axiom of pairing), we use the more common notation a∪b = A. This notation is also used when a=b.

Note, however, that the existence of unions A when A is infinite does not follow from the existence of unions of such pairs. In particular: The cardinality of a finite union is bounded by the cardinalities of A and of each of the elements of A. But the cardinality of an infinite (even countable) union has no such bound. Hence the need for an axiom.

When A is an indexed set A = {ai: i∈I} for some index set I, notations based on ∪i∈Iai = A are common. Thus, we may see

N = n=1 {k⋅pn : k=1, 2, ...}
where p1=2, p2=3, p3=5, ... is the sequence of prime numbers.

These less formal notations are more common in most of mathematics.

Given a "universal" set in which to take complements, unions are related to intersections by means of DeMorgan's laws.


C and C++

In C and in C++, a union is a type. The syntax of a union is exactly the same as the syntax of a struct:

union U {
  char *   a_string;
  int      an_integer;
  double   a_number;
};
defines a union U with 3 fields. However, unions are limited in such a way as to allow access to only one field at a time.

Defined behaviour

The informal rule is that a union holds all fields overlapping in memory. The formal rules are more complex, but amount to the same thing:

  • A pointer to the union can be converted into a pointer to the type of any of the fields (in particular, it has the correct alignment for each of these types);
  • Reading any but the last-written field of the union yields undefined behaviour;
  • The sizeof the union is the largest of the sizeofs each of the members.
As usual, standardese prohibits the use of words such as "memory", or indeed any mention of implementation. But we all know what they mean.

Since no portable conversions exist in such a configuration, standard C unions are limited to expressing just the "variant" part of Pascal's variant records. C has no notion of the type member; if YOU want to know the type currently stored in a union, YOU have to make sure you know what it is. One way to do this could be to say

struct variant {
  enum {e_pchar, e_int, e_double} type;
  union U value;
};
and use the type field to keep track of the type of the last stored data.

But C doesn't force you to do it this way. Your program might store types differently. For instance:

struct leaf { /* ... */ };
struct node { /* ... */ };
struct tree {
  struct tree *left, *right;      /* left and right subtrees */
  union {
    struct leaf a_leaf;           /* If (!left) && (!right) */
    struct node a_node;           /* Otherwise */
  } data;
};
could be one way to define a tree.

Undefined Behaviour

Once compiler-specific features are taken into account, unions naturally become more flexible: you can taken advantage of specific knowledge of what the "undefined behaviour" of reading from a "wrong" member will do. For instance, the standard assigns no meaning to this code:

union address_convert {
  void *ptr;
  long addr;
};
void *convert(long addr)
{
  union address_convert cvt;
  cvt.addr = addr;
  return cvt.ptr;
}
It invokes undefined behaviour. However, a particular implementation MAY specify that such code will work "correctly". This would involve ensuring that longs and void pointers have appropriate sizes, and that the "correct" conversion does indeed occur.

For such purposes too, unions are a very practicable way for implementation-specific code to achieve certain behaviours.


The connection

There isn't really one, except that the English language provides the same word for both. The C union { A a; B b; }; can store all values of the disjoint union A∪B. But it relies on labeled values (you can store in either of the fields .a and .b, and you must specify which one you mean). And if A and B refer to the same type, the distinction is even more pronounced.

C unions better fit (one of) the semantics commonly ascribed to "A ``square cup'' B", which isn't really a union in the mathematical sense.