16  Zorn’s Lemma

An existence theorem asserts the existence of an object belonging to a certain set and possessing certain properties. Many existence theorems can be formulated (or, if need be, reformulated) so that the underlying set is a partially ordered set and the crucial property is maximality. Our next purpose is to state and prove the most important theorem of this kind.

Lemma 16.1 (Zorn’s lemma) If \(X\) is a partially ordered set such that every chain in \(X\) has an upper bound, then \(X\) contains a maximal element.

Discussion. Recall that a chain is a totally ordered set. By a chain “in \(X\)” we mean a subset of \(X\) such that the subset, considered as a partially ordered set on its own right, turns out to be totally ordered. If \(A\) is a chain in \(X\), the hypothesis of Zorn’s lemma guarantees the existence of an upper bound for \(A\) in \(X\); it does not guarantee the existence of an upper bound for \(A\) in \(A\). The conclusion of Zorn’s lemma is the existence of an element \(a\) in \(X\) with the property that if \(a \le x\), then necessarily \(a = x\).

The basic idea of the proof is similar to the one used in our preceding discussion of infinite sets. Since, by hypothesis, \(X\) is not empty, it has an element, say \(x_{0}\). If \(x_{0}\) is maximal, stop here. If it is not, then there exists an element, say \(x_{1}\), strictly greater than \(x_{0}\). If \(x_{1}\) is maximal, stop here; otherwise continue. Repeat this argument ad infinitum; ultimately it must lead to a maximal element.

The last sentence is probably the least convincing part of the argument; it hides a multitude of difficulties. Observe, for instance, the following possibility. It could happen that the argument, repeated ad infinitum, leads to a whole infinite sequence of non-maximal elements; what are we to do in that case? The answer is that the range of such an infinite sequence is a chain in \(X\), and, consequently, has an upper bound; the thing to do is to start the whole argument all over again, beginning with that upper bound. Just exactly when and how all this comes to an end is obscure, to say the least. There is no help for it; we must look at the precise proof. The structure of the proof is an adaptation of one originally given by Zermelo.

Proof. The first step is to replace the abstract partial ordering by the inclusion order in a suitable collection of sets. More precisely, we consider, for each element \(x\) in \(X\), the weak initial segment \(\bar{s}(x)\) consisting of \(x\) and all its predecessors. The range \(\mathcal{S}\) of the function \(\bar{s}\) (from \(X\) to \(\mathcal{P}(X)\)) is a certain collection of subsets of \(X\), which we may, of course, regard as (partially) ordered by inclusion. The function \(\bar{s}\) is one-to-one, and a necessary and sufficient condition that \(\bar{s}(x) \subset \bar{s}(y)\) is that \(x \le y\). In view of this, the task of finding a maximal element in \(X\) is the same as the task of finding a maximal set in \(\mathcal{S}\). The hypothesis about chains in \(X\) implies (and is, in fact, equivalent to) the corresponding statement about chains in \(\mathcal{S}\).

Let \(\mathcal{X}\) be the set of all chains in \(X\); every member of \(\mathcal{X}\) is included in \(\bar{s}(x)\) for some \(x\) in \(X\). The collection \(\mathcal{X}\) is a non-empty collection of sets, partially ordered by inclusion, and such that if \(\mathcal{C}\) is a chain in \(\mathcal{X}\), then the union of the sets in \(\mathcal{C}\) (i.e., \(\bigcup_{A \in \mathcal{C}} A\)) belongs to \(\mathcal{X}\). Since each set in \(\mathcal{X}\) is dominated by some set in \(\mathcal{S}\), the passage from \(\mathcal{S}\) to \(\mathcal{X}\) cannot introduce any new maximal elements. One advantage of the collection \(\mathcal{X}\) is the slightly more specific form that the chain hypothesis assumes; instead of saying that each chain \(\mathcal{C}\) has some upper bound in \(\mathcal{S}\), we can say explicitly that the union of the sets of \(\mathcal{C}\), which is clearly an upper bound of \(\mathcal{C}\), is an element of the collection \(\mathcal{X}\). Another technical advantage of \(\mathcal{X}\) is that it contains all the subsets of each of its sets; this makes it possible to enlarge non-maximal sets in \(\mathcal{X}\) slowly, one element at a time.

Now we can forget about the given partial order in \(X\). In what follows we consider a non-empty collection \(\mathcal{X}\) of subsets of a non-empty set \(X\), subject to two conditions: every subset of each set in \(\mathcal{X}\) is in \(\mathcal{X}\), and the union of each chain of sets in \(\mathcal{X}\) is in \(\mathcal{X}\). Note that the first condition implies that \(\emptyset \in \mathcal{X}\). Our task is to prove that there exists in \(\mathcal{X}\) a maximal set.

Let \(f\) be a choice function for \(X\), that is, \(f\) is a function from the collection of all non-empty subsets of \(X\) to \(X\) such that \(f(A) \in A\) for all \(A\) in the domain of \(f\). For each set \(A\) in \(\mathcal{X}\), let \(\hat{A}\) be the set of all those elements \(x\) of \(X\) whose adjunction to \(A\) produces a set in \(\mathcal{X}\); in other words, \(\hat{A} =\{ x \in X:A \cup \{ x \} \in \mathcal{X} \}\). Define a function \(g\) from \(\mathcal{X}\) to \(\mathcal{X}\) as follows: if \(\hat{A} - A \neq \emptyset\), then \(g(A) = A \cup \{ f( \hat{A} - A) \}\); if \(\hat{A} - A = \emptyset\), then \(g(A) = A\). It follows from the definition of \(\hat{A}\) that \(\hat{A} - A = \emptyset\) if and only if \(A\) is maximal. In these terms, therefore, what we must prove is that there exists in \(\mathcal{X}\) a set \(A\) such that \(g(A) = A\). It turns out that the crucial property of \(g\) is the fact that \(g(A)\) (which always includes \(A\)) contains at most one more element than \(A\).

Now, to facilitate the exposition, we introduce a temporary definition. We shall say that a subcollection \(\mathcal{J}\) of \(\mathcal{X}\) is a tower if

\[\begin{align*} &(i) \: \emptyset \in \mathcal{J}, \\ &(ii) \mathit{\ if\ } A \in \mathcal{J}, \mathit{\ then\ } g(A) \in \mathcal{J} ,\\ &(iii) \mathit{\ if\ } \mathcal{C} \mathit{\ is\ a\ chain\ in\ } \mathcal{J}, \mathit{\ then\ } \bigcup_{A \in \mathcal{C}} A \in \mathcal{J}. \end{align*}\]

Towers surely exist; the whole collection \(\mathcal{X}\) is one. Since the intersection of a collection of towers is again a tower, it follows, in particular, that if \(\mathcal{J}_{0}\) is the intersection of all towers, then \(\mathcal{J}_{0}\) is the smallest tower. Our immediate purpose is to prove that the tower \(\mathcal{J}_{0}\) is a chain.

Let us say that a set \(C\) in \(\mathcal{J}_{0}\) is comparable if it is comparable with every set in \(\mathcal{J}_{0}\); this means that if \(A \in \mathcal{J}_{0}\), then either \(A \subset C\) or \(C \subset A\). To say that \(\mathcal{J}_{0}\) is a chain means that all the sets in \(\mathcal{J}_{0}\) are comparable. Comparable sets surely exist: \(\emptyset\) is one of them. In the next couple of paragraphs we concentrate our attention on an arbitrary but temporarily fixed comparable set \(C\).

Suppose that \(A \in \mathcal{J}_{0}\) and \(A\) is a proper subset of \(C\). Assertion: \(g(A) \subset C\). The reason is that since \(C\) is comparable, either \(g(A) \subset C\) or \(C\) is a proper subset of \(g(A)\). In the latter case \(A\) is a proper subset of a proper subset of \(g(A)\), and this contradicts the fact that \(g(A) - A\) cannot be more than a singleton.

Consider next the collection \(\mathcal{U}\) of all those sets \(A\) in \(\mathcal{J}_{0}\) for which either \(A \subset C\) or \(g(C) \subset A\). The collection \(\mathcal{U}\) is somewhat smaller than the collection of sets in \(\mathcal{J}_{0}\) comparable with \(g(C)\); indeed if \(A \in \mathcal{U}\), then, since \(C \subset g(C)\), either \(A \subset g(C)\) or \(g(C) \subset A\). Assertion: \(\mathcal{U}\) is a tower. Since \(\emptyset \subset C\), the first condition on towers is satisfied. To prove the second condition, i.e., that if \(A \in \mathcal{U}\), then \(g(A) \in \mathcal {U}\), split the discussion into three cases. First: \(A\) is a proper subset of \(C\). Then \(g(A) \subset C\) by the preceding paragraph, and therefore \(g(A) \in \mathcal{U}\). Second: \(A = C\). Then \(g(A) = g(C)\), so that \(g(C) \subset g(A)\), and therefore \(g(A) \in \mathcal{U}\). Third: \(g(C) \subset A\). Then \(g(C) \subset g(A)\), and therefore \(g(A) \in \mathcal{U}\). The third condition on towers, i.e., that the union of a chain in \(\mathcal{U}\) belongs to \(\mathcal{U}\), is immediate from the definition of \(\mathcal{U}\). Conclusion: \(\mathcal{U}\) is a tower included in \(\mathcal{J}_{0}\), and therefore, since \(\mathcal{J}_{0}\) is the smallest tower, \(\mathcal{U} = \mathcal{J}_{0}\).

The preceding considerations imply that for each comparable set \(C\) the set \(g(C)\) is comparable also. Reason: given \(C\), form \(\mathcal{U}\) as above; the fact that \(\mathcal{U} = \mathcal{J}_{0}\) means that if \(A \in \mathcal{J}_{0}\), then either \(A \subset C\) (in which case \(A \subset g(C)\)) or \(g(C) \subset A\).

We now know that \(\emptyset\) is comparable and that \(g\) maps comparable sets onto comparable sets. Since the union of a chain of comparable sets is comparable, it follows that the comparable sets (in \(\mathcal{J}_{0}\)) constitute a tower, and hence that they exhaust \(\mathcal{J}_{0}\); this is what we set out to prove about \(\mathcal{J}_{0}\).

Since \(\mathcal{J}_{0}\) is a chain, the union, say \(A\), of all the sets in \(\mathcal{J}_{0}\) is itself a set in \(\mathcal{J}_{0}\). Since the union includes all the sets in \(\mathcal{J}_{0}\), it follows that \(g(A) \subset A\). Since always \(A \subset g(A)\), it follows that \(A = g(A)\), and the proof of Zorn’lemma is complete.

Exercise 16.1 Zorn’s lemma is equivalent to the axiom of choice. [Hint for the proof: given a set \(X\), consider functions \(f\) such that \(\text{dom } f \subset \mathcal{P}(X)\), \(\text{ran } f \subset X\), and \(f(A) \in A\) for all \(A\) in \(\text{dom } f\); order these functions by extension, use Zorn’s lemma to find a maximal one among them, and prove that if \(f\) is maximal, then \(\text{dom } f = \mathcal{P}(X) - \{ \emptyset \}\).] Consider each of the following statements and prove that they too are equivalent to the axiom of choice. (i) Every partially ordered set has a maximal chain (i.e., a chain that is not a proper subset of any other chain). (ii) Every chain in a partially ordered set is included in some maximal chain. (iii) Every partially ordered set in which each chain has a least upper bound has a maximal element.