22  The Schröder-Bernstein Theorem

The purpose of counting is to compare the size of one set with that of another; the most familiar method of counting the elements of a set is to arrange them in some appropriate order. The theory of ordinal numbers is an ingenious abstraction of the method, but it falls somewhat short of achieving the purpose. This is not to say that ordinal numbers are useless; it just turns out that their main use is elsewhere, in topology, for instance, as a source of illuminating examples and counterexamples. In what follows we shall continue to pay some attention to ordinal numbers, but they will cease to occupy the center of the stage. (It is of some importance to know that we could in fact dispense with them altogether. The theory of cardinal numbers can be constructed with the aid of ordinal numbers, or without it; both kinds of constructions have advantages.) With these prefatory remarks out of the way, we turn to the problem of comparing the sizes of sets.

The problem is to compare the sizes of sets when their elements do not appear to have anything to do with each other. It is easy enough to decide that there are more people in France than in Paris. It is not quite so easy, however, to compare the age of the universe in seconds with the population of Paris in electrons. For some mathematical examples, consider the following pairs of sets, defined in terms of an auxiliary set \(A\): (i) \(X = A\), \(Y = A^{+}\) (ii) \(X = \mathcal{P}(A)\), \(Y = 2^{A}\); (iii) \(X\) is the set of all one-to-one mappings of \(A\) into itself, \(Y\) is the set of all finite subsets of \(A\). In each case we may ask which of the two sets \(X\) and \(Y\) has more elements. The problem is first to find a rigorous interpretation of the question and then to answer it.

The well ordering theorem tells us that every set can be well ordered. For well ordered sets we have what seems to be a reasonable measure of size, namely, their ordinal number. Do these two remarks solve the problem? To compare the sizes of \(X\) and \(Y\), may we just well order each of them and then compare \(\text{ord } X\) and \(\text{ord } Y\)? The answer is most emphatically no. The trouble is that one and the same set can be well ordered in many ways. The ordinal number of a well ordered set measures the well ordering more than it measures the set. For a concrete example consider the set \(\omega\) of all natural numbers. Introduce a new order by placing \(0\) after everything else. (In other words, if \(n\) and \(m\) are non-zero natural numbers, then arrange them in their usual order; if, however, \(n = 0\) and \(m \neq 0\), let \(m\) precede \(n\).) The result is a well ordering of \(\omega\); the ordinal number of this well ordering is \(\omega + 1\).

If \(X\) and \(Y\) are well ordered sets, then a necessary and sufficient condition that \(\text{ord } X < \text{ord } Y\) is that \(X\) be similar to an initial segment of \(Y\). It follows that we could compare the ordinal sizes of two well ordered sets even without knowing anything about ordinal numbers; all we need to know is the concept of similarity. Similarity was defined for ordered sets; the central concept for arbitrary unordered sets is that of equivalence. (Recall that two sets \(X\) and \(Y\) are called equivalent, \(X \sim Y\), in case there exists a one-to-one correspondence between them.) If we replace similarity by equivalence, then something like the suggestion of the preceding paragraph becomes usable. The point is that we do not have to know what size is if all we want is to compare sizes.

If \(X\) and \(Y\) are sets such that \(X\) is equivalent to a subset of \(Y\), we shall write

\[ X \precsim Y. \]

The notation is temporary and does not deserve a permanent name. As long as it lasts, however, it is convenient to have a way of referring to it; a reasonable possibility is to say that \(Y\) dominates \(X\). The set of those ordered pairs \((X, Y)\) of subsets of some set \(E\) for which \(X \precsim Y\) constitutes a relation in the power set of \(E\). The symbolism correctly suggests some of the properties of the concept that it denotes. Since the symbolism is reminiscent of partial orders, and since a partial order is reflexive, antisymmetric, and transitive, we may expect that domination has similar properties.

Reflexivity and transitivity cause no trouble. Since each set \(X\) is equivalent to a subset (namely, \(X\)) of itself, it follows that \(X \precsim X\) for all \(X\). If \(f\) is a one-to-one correspondence between \(X\) and subset of \(Y\), and if \(g\) is a one-to-one correspondence between \(Y\) and a subset of \(Z\), then we may restrict \(g\) to the range of \(f\) and compound the result with \(f\); the conclusion is that \(X\) is equivalent to a subset of \(Z\). In other words, if \(X \precsim Y\) and \(Y \precsim Z\),then \(X \precsim Z\).

The interesting question is that of antisymmetry. If \(X \precsim Y\) and \(Y \precsim X\), can we conclude that \(X = Y\)? This is absurd; the assumptions are satisfied whenever \(X\) and \(Y\) are equivalent, and equivalent sets need not be identical. What then can we say about two sets if all we know is that each of them is equivalent to a subset of the other? The answer is contained in the following celebrated and important result.

Theorem 22.1 (Schröder-Bernstein theorem) If \(X \precsim Y\) and \(Y \precsim X\), then \(X \sim Y\).

Remark. Observe that the converse, which is incidentally a considerable strengthening of the assertion of reflexivity, follows trivially from the definition of domination.

Proof. Let \(f\) be a one-to-one mapping from \(X\) into \(Y\) and let \(g\) be a one-to-one mapping from \(Y\) into \(X\); the problem is to construct a one-to-one correspondence between \(X\) and \(Y\). It is convenient to assume that the sets \(X\) and \(Y\) have no elements in common; if that is not true, we can so easily make it true that the added assumption involves no loss of generality.

We shall say that an element \(x\) in \(X\) is the parent of the element \(f(x)\) in \(Y\), and, similarly, that an element \(y\) in \(Y\) is the parent of \(g(y)\) in \(X\). Each element \(x\) of \(X\) has an infinite sequence of descendants, namely, \(f(x)\), \(g(f(x))\), \(f(g(f(x)))\), etc., and similarly, the descendants of an element \(y\) of \(Y\) are \(g(y)\), \(f(g(y))\), \(g(f(g(y)))\), etc. This definition implies that each term in the sequence is a descendant of all preceding terms; we shall also say that each term in the sequence is an ancestor of all following terms.

For each element (in either \(X\) or \(Y\)) one of three things must happen. If we keep tracing the ancestry of the element back as far as possible, then either we ultimately come to an element of \(X\) that has no parent (these orphans are exactly the elements of \(X - g(Y)\)), or we ultimately come to an element of \(Y\) that has no parent \((Y - f(X))\), or the lineage regresses ad infinitum. Let \(X_{X}\) be the set of those elements of \(X\) that originate in \(X\) (i.e., \(X_{X}\) consists of the elements of \(X - g(Y)\) together with all their descendants in \(X\)), let \(X_{Y}\) be the set of those elements of \(X\) that originate in \(Y\) (i.e., \(X_{Y}\) consists of all the descendants in \(X\) of the elements of \(Y - f(X)\)), and let \(X_{\infty}\) be the set of those elements of \(X\) that have no parentless ancestors. Partition \(Y\) similarly into the three sets \(Y_{X}\), \(Y_{Y}\), and \(Y_{\infty}\).

If \(x \in X_{X}\), then \(f(x) \in Y_{X}\), and, in fact, the restriction of \(f\) to \(X_{X}\) is a one-to-one correspondence between \(X_{X}\) and \(Y_{X}\). If \(x \in X_{Y}\), then \(x\) belongs to the domain of the inverse function \(g^{-1}\) and \(g^{-1}(x) \in Y_{Y}\); in fact the restriction of \(g^{-1}\) to \(X_{Y}\) is a one-to-one correspondence between \(X_{Y}\) and \(Y_{Y}\). If, finally, \(x \in X_{\infty}\), then \(f(x) \in Y_{\infty}\), and the restriction of \(f\) to \(X_{\infty}\) is a one-to-one correspondence between \(X_{\infty}\) and \(Y_{\infty}\); alternatively, if \(x \in X_{\infty}\), then \(g^{-1}(x) \in Y_{\infty}\), and the restriction of \(g^{-1}\) to \(X_{\infty}\) is a one-to-one correspondence between \(X_{\infty}\) and \(Y_{\infty}\). By combining these three one-to-one correspondences, we obtain a one-to-one correspondence between \(X\) and \(Y\).

Exercise 22.1 Suppose that \(f\) is a mapping from \(X\) into \(Y\) and \(g\) is a mapping from \(Y\) into \(X\). Prove that there exist subsets \(A\) and \(B\) of \(X\) and \(Y\) respectively, such that \(f(A) = B\) and \(g(Y - B) = X - A\). This result can be used to give a proof of the Schröder-Bernstein theorem that looks quite different from the one above.

By now we know that domination has the essential properties of a partial order; we conclude this introductory discussion by observing that the order is in fact total. The assertion is known as the comparability theorem for sets: it says that if \(X\) and \(Y\) are sets, then either \(X \precsim Y\) or \(Y \precsim X\). The proof is an immediate consequence of the well ordering theorem and of comparability theorem for well ordered sets. Well order both \(X\) and \(Y\) and use the fact that either the well ordered sets so obtained are similar or one of them is similar to an initial segment of the other; in the former case \(X\) and \(Y\) are equivalent, in the latter one of them is equivalent to a subset of the other.