
Hilbert spaces (and dagger categories)


Hilbert spaces are a particularly nice class of Banach spaces. They axiomatize ideas from Euclidean geometry such as orthogonality, projection, and the Pythagorean theorem, but the ideas apply to many infinite-dimensional spaces of functions of interest to various branches of mathematics. Hilbert spaces are also fundamental to quantum mechanics, as vectors in Hilbert spaces (up to phase) describe (pure) states of quantum systems.

Today we’ll develop and discuss some of the basic theory of Hilbert spaces. As with the theory of Banach spaces, there are (at least) two types of morphisms we might want to talk about (unitary operators and bounded operators), and we will discuss an elegant formalism that allows us to talk about both. Things written by John Baez will be cited excessively.

Definition and introductory remarks

Let V be a vector space over k = \mathbb{R} or k = \mathbb{C}. An inner product on V is a map \langle -, - \rangle : V \times V \to k satisfying

  1. \langle x, ay + bz \rangle = a \langle x, y \rangle + b \langle x, z \rangle (linearity in the second argument),
  2. \langle x, y \rangle = \overline{  \langle y, x \rangle } (conjugate symmetry; this implies conjugate linearity in the first argument),
  3. \langle x, x \rangle \ge 0 and \langle x, x \rangle = 0 \Rightarrow x = 0 (positive-definiteness).

(Linearity in the second variable is conventional in physics but in mathematics the convention is generally to have linearity in the first variable. We use the physics convention above for reasons explained in the next section.)

A vector space equipped with an inner product is an inner product space. Inner products generalize the ordinary dot product of vectors in \mathbb{R}^n, but the formalism applies to infinite-dimensional spaces such as various function spaces, allowing us to use geometric intuition from the former to understand the latter. In quantum mechanics, inner products are fundamental as they give rise to transition amplitudes (see for example the Born rule).

Any inner product space gives rise to a function \| x \| = \sqrt{ \langle x, x \rangle }, which is readily seen to satisfy all of the axioms of a norm with the possible exception of the triangle inequality, which we now prove.

Cauchy-Schwarz inequality: let u, v be vectors in an inner product space. Then |\langle u, v \rangle| \le \| u\| \| v \|.

The Cauchy-Schwarz inequality can be proven in many ways (see for example Steele’s The Cauchy-Schwarz Master Class). Although it is stated here for an arbitrary inner product space, by restricting to the subspace generated by u and v we see that it is really a statement about 2-dimensional inner product spaces.

Proof. Consider the quadratic polynomial

\displaystyle \| u - vt \|^2 = \langle u - vt, u - vt \rangle = \| u \|^2 - 2t \text{Re}(\langle u, v \rangle) + t^2 \| v \|^2.

By positive-definiteness, it cannot be negative, so its discriminant cannot be positive. This gives

\Delta = 4 \text{Re}(\langle u, v \rangle)^2 - 4 \| u \|^2 \| v \|^2 \le 0

and it follows that \text{Re}(\langle u, v \rangle) \le \| u \| \| v \|. Multiplying u by a suitable complex number of absolute value 1 does not change the RHS and makes the LHS equal to |\langle u, v \rangle|, giving the desired inequality. \Box

Corollary: \| u + v \| \le \| u \| + \| v \|.

Proof. By Cauchy-Schwarz,

\| u + v \|^2 = \| u \|^2 + 2 \text{Re}(\langle u, v \rangle) + \| v \|^2 \le \| u \|^2 + 2 \| u \| \| v \| + \| v \|^2. \Box

Following the above, for an inner product \langle \cdot, \cdot \rangle we call \| \cdot \| the induced norm.

Corollary: For any inner product space V and any u \in V, the map v \mapsto \langle u, v \rangle is a continuous linear functional of operator norm \| u \| with respect to the induced norm.

The identity \| u + v \|^2 = \| u \|^2 + 2 \text{Re}(\langle u, v \rangle) + \| v \|^2 should be thought of as an abstract form of the law of cosines. In particular, if \langle u, v \rangle = 0 (that is, u, v are orthogonal), then the Pythagorean theorem

\displaystyle \| u + v \|^2 = \| u \|^2 + \| v \|^2

holds.
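
These inequalities are easy to check numerically. Here is a quick sanity check in Python (a sketch of my own, not part of the original argument; note that numpy's vdot conjugates its first argument, matching the convention above):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=6) + 1j * rng.normal(size=6)
v = rng.normal(size=6) + 1j * rng.normal(size=6)
n = np.linalg.norm  # the induced norm: n(u) == sqrt(<u, u>)

assert abs(np.vdot(u, v)) <= n(u) * n(v)           # Cauchy-Schwarz
assert n(u + v) <= n(u) + n(v)                     # triangle inequality

w = v - (np.vdot(u, v) / np.vdot(u, u)) * u        # component of v orthogonal to u
assert np.isclose(np.vdot(u, w), 0)
assert np.isclose(n(u + w)**2, n(u)**2 + n(w)**2)  # Pythagorean theorem
```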

An inner product space V is a Hilbert space if it is complete with respect to the induced norm.

Example. For X any measure space with measure \mu, the space L^2(X) is a Hilbert space with inner product

\displaystyle \langle f, g \rangle = \int_X \overline{f(x)} g(x) \, d \mu.

Special cases include the spaces \ell^2(S) for a set S as in the Banach space examples; when S is finite and we work over the reals we recover Euclidean space with the usual inner product. In quantum mechanics, a fundamental example is X = \mathbb{R}^3 with Lebesgue measure, as L^2(\mathbb{R}^3) is the space in which wave functions describing a particle in three spatial dimensions live. If \mu is a probability measure we can think of f, g as random variables, and if they happen to have expected value 0 then \langle f, g \rangle is their covariance.
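
The covariance interpretation is easy to see in the discrete case. Here is a minimal numerical sketch of my own (a finite probability space with N equally likely points, so \mu(\{x\}) = 1/N):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
f = rng.normal(size=N)               # a "random variable": a function on N points
g = 0.5 * f + rng.normal(size=N)
f, g = f - f.mean(), g - g.mean()    # center so both have expected value 0

# <f, g> = sum_x conj(f(x)) g(x) mu({x}); the conjugate is a no-op for real data.
inner = np.sum(np.conj(f) * g) / N

# For mean-zero variables this is exactly the (biased) sample covariance.
assert np.isclose(inner, np.cov(f, g, bias=True)[0, 1])
```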

If V is a real inner product space with induced norm \| v \|, then a straightforward computation shows that

\displaystyle \langle x, y \rangle = \frac{ \| x + y \|^2 - \| x - y \|^2 }{4}

and if V is a complex inner product space a somewhat more tedious computation shows that

\displaystyle \langle x, y \rangle = \frac{ \| x + y \|^2 - \| x - y \|^2 + i \| ix + y \|^2 - i \| ix - y \|^2}{4}.

In any case, we conclude that the inner product is uniquely determined by the norm it induces. Thus being a Hilbert space is a property of a Banach space, invariant under isometric isomorphism. We can even characterize the Banach spaces with this property in a fairly straightforward manner: they are precisely the ones with norms satisfying the parallelogram identity

\displaystyle \| x + y \|^2 + \| x - y \|^2 = 2 \| x\|^2 + 2 \| y \|^2.

This is fairly annoying to prove, but it has a nice interpretation: if a norm is like the Euclidean norm in this particular respect, then it must be like the Euclidean norm in various other respects (coming from what can be proven using the inner product space axioms).
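
Both identities are easy to test numerically. The following sketch (my own; the \ell^1 failure is only exhibited on a random example, not proven) recovers the inner product from the norm via the complex polarization identity and checks the parallelogram identity:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=5) + 1j * rng.normal(size=5)
y = rng.normal(size=5) + 1j * rng.normal(size=5)
n = np.linalg.norm

# Complex polarization: the inner product is determined by the induced norm.
polar = (n(x + y)**2 - n(x - y)**2 + 1j * n(1j * x + y)**2 - 1j * n(1j * x - y)**2) / 4
assert np.isclose(polar, np.vdot(x, y))

# The parallelogram identity holds for the l^2 norm ...
assert np.isclose(n(x + y)**2 + n(x - y)**2, 2 * n(x)**2 + 2 * n(y)**2)

# ... but generically fails for the l^1 norm, which is therefore not induced
# by any inner product.
n1 = lambda v: np.sum(np.abs(v))
print(n1(x + y)**2 + n1(x - y)**2 - (2 * n1(x)**2 + 2 * n1(y)**2))  # nonzero
```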

We might now be tempted to think of Hilbert spaces as a subcategory of \text{Ban}_1, but we shouldn’t. For example, the product or coproduct of Hilbert spaces in \text{Ban}_1 is almost never a Hilbert space; Hilbert spaces instead admit a direct sum coming from a generalized \ell^2-norm rather than a generalized \ell^1- or \ell^{\infty}-norm. This suggests that weak contractions aren’t a natural choice of morphisms between Hilbert spaces.

If we want to be permissive, we should take bounded linear operators as morphisms. If we want to be restrictive, we want all of the relevant structure to be preserved (namely the inner product), so we could take as morphisms maps U : H_1 \to H_2 such that

\langle v, w \rangle_{H_1} = \langle U(v), U(w) \rangle_{H_2}.

These include the unitary maps, which are the invertible maps with this property.

(Note that since the inner product is uniquely determined by the norm via the polarization identities above, a linear operator between Hilbert spaces preserves the inner product if and only if it preserves the norm. Thus we may call a map satisfying the above property an isometry.)

We also make the following observation whose name will be explained below.

The Yoneda lemma for inner product spaces: Let u, v be vectors in an inner product space such that \langle u, \cdot \rangle = \langle v, \cdot \rangle. Then u = v.

Proof. The above implies \langle u-v, \cdot \rangle = 0, so \|u - v \| = 0, so u = v by positive-definiteness. \Box

2-Hilbert spaces

The theory of real Hilbert spaces is a straightforward axiomatization of the properties of the dot product in Euclidean space, but the theory of complex Hilbert spaces includes an additional wrinkle, namely the issue of conjugate symmetry and the fact that the inner product is conjugate-linear rather than linear in one variable. Above I chose to have inner products be linear in the second variable rather than the first, and the reason is the following example.

Let G be a finite group and consider the category \text{Rep}(G) of finite-dimensional complex representations of G. For V, W \in \text{Rep}(G) with characters \chi_V, \chi_W, recall that we have

\displaystyle \dim \text{Hom}_G(V, W) = \frac{1}{|G|} \sum_{g \in G} \overline{\chi_V(g)} \chi_W(g).

In other words, the dimension of spaces of intertwining operators defines an inner product on the complex vector space spanned by characters (formally, the tensor product \mathbb{C} \otimes K(\text{Rep}(G)) where K denotes the Grothendieck group) which is naturally conjugate-linear in the first variable. Morally this is because Hom is contravariant in the first variable and covariant in the second.

This example is particularly interesting because in quantum mechanics the inner product of states describes the transition amplitude between them (in a sense that I don’t completely understand), and it would not be too far-fetched to think of transition amplitudes as being morphisms in some vague sense between states.

In this way we see that \text{Rep}(G) itself is a kind of categorified Hilbert space, with morphisms as a kind of categorified inner product. Decategorifying the Yoneda lemma for elements of \text{Rep}(G) gives back the Yoneda lemma for inner products above. Decategorifying the isomorphism (V \Rightarrow W)^{\ast} \cong (W \Rightarrow V) gives conjugate-symmetry. Decategorifying the adjunction between, say, restriction and induction functors gives adjoint operators (see below). And so forth. For a further elaboration on this theme, see Baez’s Higher-Dimensional Algebra II: 2-Hilbert spaces.

Projections and complements

In \mathbb{R}^n, the ordinary dot product allows us to define the projection

\displaystyle P_u(v) = \frac{ \langle u, v \rangle }{ \langle u, u \rangle } u

of a vector v onto another vector u. The above notation is somewhat confusing, as it takes two vectors as inputs when it should really take as input a vector v and a subspace W; the projection P_W(v) should then be the closest vector in W to v. The above is just the special case that W = \text{span}(u).
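
Here is the one-dimensional case in code (a sketch of my own; the helper name project_onto is ad hoc), checking that v - P_u(v) is orthogonal to u and that P_u(v) is the closest point to v on \text{span}(u):

```python
import numpy as np

def project_onto(u, v):
    """P_u(v) = (<u, v> / <u, u>) u, the projection of v onto span(u)."""
    return (np.vdot(u, v) / np.vdot(u, u)) * u

rng = np.random.default_rng(3)
u, v = rng.normal(size=4), rng.normal(size=4)
p = project_onto(u, v)

assert np.isclose(np.vdot(u, v - p), 0)            # v - p is orthogonal to u
ts = np.linspace(-5, 5, 1001)                      # sample points t*u of span(u)
assert all(np.linalg.norm(v - p) <= np.linalg.norm(v - t * u) + 1e-12 for t in ts)
```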

We formalize this as follows. For v \in V and S \subset V, define the distance

d(v, S) = \inf_{s \in S} \| v - s \|.

(Of course this definition makes sense in any metric space.) Then s \in S is a closest vector in S to v if \| v - s \| = d(v, S). We say that S admits closest vectors if such a vector always exists for all v \in V. (Note that such a subset is in particular closed.)

For general subsets S, closest vectors are not guaranteed to be unique. However:

Proposition: Let S be a subset of an inner product space V which is closed under taking midpoints. Then the closest vector s \in S to a vector v \in V is unique if it exists.

Proof. Suppose that s_1, s_2 are two closest vectors. By the parallelogram identity,

\displaystyle \left\| v - \frac{s_1 + s_2}{2} \right\|^2 + \left\| \frac{s_1 - s_2}{2} \right\|^2 = \frac{\| v - s_1 \|^2 + \| v - s_2 \|^2}{2}.

It follows that \frac{s_1 + s_2}{2} (which lies in S by assumption) is strictly closer to v than either s_1 or s_2 unless \| s_1 - s_2 \| = 0, hence unless s_1 = s_2. \Box

Note that this is badly false in a general normed space. For example, in \mathbb{R}^2 with the \ell^{\infty} norm, every vector (0, y), |y| \le 1 is closest among the vectors on the y-axis to the vector (1, 0).

In Euclidean space, projection is valuable among other things because it resolves a vector into two perpendicular components. The same is true in arbitrary inner product spaces.

Proposition: Let S be a subset of an inner product space V which is closed under scalar multiplication. If the closest vector s \in S to a vector v \in V exists, then \langle s, v-s \rangle = 0.

Proof. First note that \langle s, v - s \rangle is automatically real: if \lambda is a complex number of absolute value 1 chosen so that \overline{\lambda} \langle s, v \rangle = |\langle s, v \rangle|, then \lambda s \in S and \| v - \lambda s \| \ge \| v - s \|, which forces \text{Re}(\langle s, v \rangle) = |\langle s, v \rangle|. Since s is closest and ts \in S for all real t, the real function t \mapsto \| v - ts \|^2 has a local minimum at t = 1. Its derivative there is therefore

\displaystyle \left( \langle -s, v - ts \rangle + \langle v - ts, -s \rangle \right) \bigg|_{t=1} = -2 \text{Re}(\langle s, v - s \rangle) = -2 \langle s, v - s \rangle = 0. \Box

Let W be a subspace (necessarily closed) which admits closest vectors. Then it follows by the above that we may write any v \in V as a sum

v = w + (v-w)

of a vector in W and a vector in its orthogonal complement W^{\perp} = \{ v : \langle w, v \rangle = 0 \ \forall w \in W \}.

We now need to introduce some important terminology. If V, W are inner product spaces, their direct sum V \oplus W can be given the inner product

\displaystyle \langle v_1 \oplus w_1, v_2 \oplus w_2 \rangle = \langle v_1, v_2 \rangle_{V} + \langle w_1, w_2 \rangle_W.

This defines the direct sum of inner product spaces. If an inner product space U has subspaces V, W such that U is the internal direct sum of V, W as vector spaces and moreover such that V, W are orthogonal, then U is an internal direct sum of V, W as inner product spaces, which we write as U = V \oplus W.

Proposition: Let W be a subspace of an inner product space V which admits closest vectors. Then V = W \oplus W^{\perp}.

Proof. By assumption, every v \in V can be written as v = w + (v-w) where w \in W and v-w \in W^{\perp}. Since W \cap W^{\perp} = 0, this decomposition is unique, which already implies that the map v \mapsto w is linear. Since W^{\perp} is orthogonal to W, V has the direct sum inner product. \Box

We can reformulate the above geometric discussion algebraically in terms of axioms that the map v \mapsto w satisfies as follows. A projection on an inner product space V is a bounded linear operator P : V \to V such that

  1. P is idempotent (P^2 = P), and
  2. P is self-adjoint (\langle u, Pv \rangle = \langle Pu, v \rangle for all u, v \in V).

We recall the following general result on idempotents.

Proposition: Let M be a left R-module (R a ring, not necessarily commutative) and P : M \to M be an idempotent morphism of R-modules. Then M admits a direct sum decomposition

\displaystyle M = PM \oplus (1-P)M = \text{im}(P) \oplus \text{ker}(P).

Proof. We may write any m \in M as m = Pm + (1-P)m. Since P(1-P) = 0, we have (1-P)m \in \text{ker}(P). Conversely, if m \in \text{ker}(P) then (1-P)m = m, so (1-P)M = \text{ker}(P). Since P^2 m = Pm, P fixes any element of \text{im}(P), so \text{im}(P) \cap \text{ker}(P) = 0. Finally, since P is a morphism, its kernel and image are both submodules of M. \Box

The converse is straightforward; hence studying idempotents in \text{End}_R(M) is equivalent to studying direct sum decompositions of M.

Applied to projections, we have the following.

Proposition: Let P be a projection on an inner product space V. Then V admits a direct sum decomposition

\displaystyle V = PV \oplus (1-P)V = \text{im}(P) \oplus \text{ker}(P).

In particular, \text{ker}(P) = \text{im}(P)^{\perp}, so a projection is uniquely determined by its image.

Proof. Everything follows from the last proposition except the last claim, which follows from self-adjointness:

\displaystyle \left( \forall v : \langle u, Pv \rangle = 0 \right) \Leftrightarrow \left( \forall v : \langle Pu, v \rangle = 0 \right) \Leftrightarrow Pu = 0. \Box

The converse is again straightforward. Altogether we can summarize our discussion as follows.

Theorem: The following conditions on a subspace W of an inner product space V are equivalent.

  1. W admits closest vectors.
  2. V = W \oplus W^{\perp}.
  3. There exists a projection P such that \text{im}(P) = W.
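
In finite dimensions every subspace has all three properties, and the correspondence is easy to check numerically. The sketch below (my own illustration) builds the orthogonal projection onto the column space of a matrix A via the standard formula P = A(A^T A)^{-1}A^T and verifies idempotence, self-adjointness, and the closest-vector property:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 2))                   # columns span a 2-dimensional W
P = A @ np.linalg.inv(A.T @ A) @ A.T          # orthogonal projection with im(P) = W

assert np.allclose(P @ P, P)                  # idempotent
assert np.allclose(P, P.T)                    # self-adjoint (real case: symmetric)

v = rng.normal(size=5)
w = P @ v
assert np.allclose(A.T @ (v - w), 0)          # v - w lies in W^perp
for _ in range(100):                          # w is closest in W to v
    other = A @ rng.normal(size=2)
    assert np.linalg.norm(v - w) <= np.linalg.norm(v - other) + 1e-12
```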

We turn now to the question of which subspaces have this property.

Proposition: Let W be a finite-dimensional subspace of an inner product space V. Then W admits closest vectors.

Proof. Let v \in V. We want to show that there is a closest vector in W to v. Since 0 \in W is at a distance \| v \| from v, it follows by the triangle inequality that the closest vector, if it exists, is necessarily contained in the closed ball of radius 2 \| v \| centered at the origin in W. Since W is finite-dimensional, this ball is compact, so the function w \mapsto \| v - w \| attains its minimum on it. \Box

This proof does not generalize to the infinite-dimensional case, since closed unit balls are no longer compact in this setting. By assuming that V is a Hilbert space, we can substitute completeness for compactness.

Theorem: Let K be a closed convex subset of a Hilbert space H. Then K admits closest vectors.

Proof. Let v \in H be a vector and let w_n \in K be a sequence such that

\lim_{n \to \infty} \| v - w_n \| = d(v, K).

By the parallelogram identity,

\displaystyle \left\| v - \frac{w_n + w_m}{2} \right\|^2 + \left\| \frac{w_n - w_m}{2} \right\|^2 = \frac{\| v - w_n \|^2 + \| v - w_m \|^2}{2}

for any n, m (note that this is the same use of the parallelogram identity as when we proved that closest vectors are unique; the midpoint \frac{w_n + w_m}{2} lies in K by convexity). The RHS approaches d(v, K)^2 as n, m \to \infty while \left\| v - \frac{w_n + w_m}{2} \right\|^2 \ge d(v, K)^2 by definition, so it follows that \| w_n - w_m \| \to 0, hence that w_n is a Cauchy sequence. Since H is complete and K is closed, w_n has a limit w \in K satisfying \| v - w \| = d(v, K), hence this limit must be the closest vector. \Box

Corollary: A subspace of a Hilbert space admits closest vectors if and only if it is closed.

Corollary: If V is a subspace of a Hilbert space H, then V^{\perp \perp} = \overline{V}.

Proof. \overline{V} is a closed subspace of H such that \overline{V}^{\perp} = V^{\perp}, hence by the above we have a direct sum decomposition

\displaystyle H = \overline{V} \oplus V^{\perp}.

In any direct sum decomposition the two spaces are orthogonal complements of each other, so it follows that \overline{V} = V^{\perp \perp} as desired. \Box

Orthonormal bases

The theory of Banach spaces is unlike ordinary linear algebra in that Banach spaces do not admit a particularly good notion of basis. The linear-algebraic notion of basis, which only allows finite sums, is clearly unsuitable: it ignores the infinite sums which are now available, and spaces of functions don’t have reasonable Hamel bases anyway (see for example this math.SE question). The next obvious choice is to talk about Schauder bases, which are sequences e_i in a Banach space B such that every v \in B has a unique representation as an infinite sum

\displaystyle v = \sum c_i e_i.

Unlike ordinary bases, Schauder bases must be ordered, since the sum above is not required to converge absolutely. They also don’t always exist, even for separable Banach spaces; there is a counterexample due to Enflo. Finally, while uniqueness of the representation makes the coefficient functionals v \mapsto c_i linear, it is not at all obvious that they are continuous (this turns out to be a theorem, but it takes some work).

But everything works out for Hilbert spaces. In any inner product space, a collection of vectors e_i is orthonormal if they satisfy \langle e_i, e_j \rangle = \delta_{ij}. In particular, the e_i have norm 1 and are linearly independent, since if \sum c_i e_i = 0 then \sum c_i \langle e_i, e_j \rangle = c_j = 0 for all j. An orthonormal basis of a Hilbert space H is an orthonormal set e_i whose span is dense in H.

Bessel’s inequality: Let e_i be an orthonormal set and v a vector in an inner product space V. Then \langle e_i, v \rangle = 0 for all but countably many i, and \| v \|^2 \ge \sum_i |\langle e_i, v \rangle|^2.

Proof. Let e_i be indexed by a set I and let S be a finite subset of I. Let P_S be the projection onto \text{span}(e_i : i \in S). Then we may write P_S explicitly as

P_S v = \sum_{i \in S} \langle e_i, v \rangle e_i

by inspection. Since v = P_S v + (1-P_S)v with the two terms orthogonal, taking norms gives \| v \|^2 \ge \| P_S v \|^2 = \sum_{i \in S} |\langle e_i, v \rangle|^2 for all finite subsets S. By exhausting every countable subset of I by finite sets, it follows that the inequality holds for all countable subsets of I. Since a sum of uncountably many positive real numbers necessarily diverges, it follows that \langle e_i, v \rangle = 0 for all but countably many i, so the inequality holds over all of I. \Box

Bessel’s inequality becomes an equality in the following case, which is an infinite-dimensional generalization of the Pythagorean theorem.

Parseval’s identity: Let e_i be an at most countable orthonormal set in an inner product space. If v = \sum c_i e_i converges, then it converges unconditionally, c_i = \langle e_i, v \rangle, and \| v \|^2 = \sum |c_i|^2.

Proof. Let v_n = \sum_{i=1}^n c_i e_i, so that v = \lim_{n \to \infty} v_n. Since convergence implies convergence of norms, \| v \|^2 = \lim_{n \to \infty} \| v_n \|^2 = \lim_{n \to \infty} \sum_{i=1}^n |c_i|^2, so \sum |c_i|^2 converges and equals \| v \|^2. Since c_i = \langle e_i, v_n \rangle for all n \ge i, it follows by continuity of the inner product that c_i = \langle e_i, v \rangle. Finally, for any finite set F of indices we have \| v - \sum_{i \in F} c_i e_i \|^2 = \| v \|^2 - \sum_{i \in F} |c_i|^2, which tends to 0 as F exhausts the index set; this gives unconditional convergence. \Box

We would like to conclude that orthonormal bases really are bases in a suitable Hilbert space sense, but first we need to prove the following.

Proposition: Let e_i be an orthonormal set and suppose that v lies in the closure of the span of the e_i. Then v = \sum \langle e_i, v \rangle e_i.

Proof. By Bessel’s inequality, we may assume WLOG that the e_i are countable, indexed e_1, e_2, .... Let P_i be the projection onto \text{span}(e_1, ... e_i). Then P_i v = \sum_{j \le i} \langle e_j, v \rangle e_j is the closest vector in \text{span}(e_1, ... e_i) to v, and every finite linear combination of the e_i lies in one of these spans, so v lies in the closure of the span of the e_i if and only if P_i v \to v. The result follows. \Box

Corollary: Let H be a Hilbert space with an orthonormal basis e_i, i \in I. Then the map

\displaystyle T: \ell^2(I) \ni (c_i) \mapsto \sum c_i e_i \in H

is a unitary isomorphism.

Proof. Bessel’s inequality and the completeness of H show that T is well-defined (the partial sums are Cauchy), and the previous two propositions show that T preserves norms and is surjective, so it remains to prove that it is linear. T clearly respects scalar multiplication, and it also clearly respects addition on the subspace of \ell^2(I) consisting of sequences with finite support. Since T preserves norms, the rest follows by the continuity of T and of addition. \Box

This is a strong structure theorem for Hilbert spaces with an orthonormal basis. We now turn our attention to proving that an orthonormal basis always exists. The idea, known as the Gram-Schmidt process, is the following in finitely many dimensions.

Suppose v_1, ... v_n are finitely many nonzero vectors in an inner product space. We’d like to find an orthonormal set of vectors e_1, ... e_m with the same span. We’ll do this inductively. First, set e_1 = \frac{v_1}{\| v_1 \|}. Assuming that e_1, ... e_i have been defined, let P_i denote the projection onto V_i = \text{span}(e_1, ... e_i). Now, if j = j_i is the smallest index such that v_j \not \in V_i, we can set

\displaystyle e_{i+1} = \frac{v_j - P_i(v_j)}{\| v_j - P_i(v_j) \|}.

It follows by induction that e_1, ... e_{i+1} is an orthonormal basis of \text{span}(v_1, ... v_{j_i}) for all i.

The Gram-Schmidt process as defined here extends without fuss to countably many vectors v_1, v_2, ....
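
Here is a direct numerical transcription of the process (a sketch of my own; a small tolerance stands in for the condition v_j \not\in V_i):

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a sequence of vectors, skipping any already in the span."""
    basis = []
    for v in vectors:
        # Subtract the projection onto the span of the basis built so far.
        r = v - sum(np.vdot(e, v) * e for e in basis)
        if np.linalg.norm(r) > tol:     # v is not (numerically) in the span
            basis.append(r / np.linalg.norm(r))
    return basis

rng = np.random.default_rng(5)
vs = [rng.normal(size=4) for _ in range(3)]
vs.insert(2, vs[0] + vs[1])             # a dependent vector, which gets skipped
es = gram_schmidt(vs)
G = np.array([[np.vdot(a, b) for b in es] for a in es])
assert np.allclose(G, np.eye(len(es)))  # <e_i, e_j> = delta_{ij}
assert len(es) == 3
```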

Corollary: Every separable Hilbert space has an orthonormal basis.

In particular, the separable infinite-dimensional Hilbert space is unique up to unitary isomorphism. Thus physicists sometimes speak of “Hilbert space” (as in “vectors in Hilbert space”) by which they mean the unique separable infinite-dimensional Hilbert space.

To extend the Gram-Schmidt process to an arbitrary number of vectors in a Hilbert space, we use transfinite induction. If you don’t care about non-separable Hilbert spaces, you can stop reading here.

Let v_{\alpha} be a collection of vectors in a Hilbert space indexed by ordinals and define a corresponding orthonormal set e_{\alpha} as follows. As above, we set e_1 = \frac{v_1}{\| v_1 \|}. If e_{\alpha} has already been defined for all \alpha \le \beta, let \gamma be the least ordinal such that v_{\gamma} is not contained in the closure of \text{span}(e_{\alpha} : \alpha \le \beta), let P_{\beta} be the projection onto this subspace, and define

\displaystyle e_{\beta+1} = \frac{v_{\gamma} - P_{\beta} v_{\gamma}}{\| v_{\gamma} - P_{\beta} v_{\gamma} \|}.

Similarly, if e_{\alpha} has already been defined for all \alpha < \beta for \beta a limit ordinal, let \gamma be the least ordinal such that v_{\gamma} is not contained in the closure of \text{span}(e_{\alpha} : \alpha < \beta), let P_{<\beta} be the projection onto this subspace, and define

\displaystyle e_{\beta} = \frac{v_{\gamma} - P_{<\beta} v_{\gamma}}{\| v_{\gamma} - P_{<\beta} v_{\gamma} \|}.

By transfinite induction the e_{\alpha} are orthonormal, and the e_{\alpha} with \alpha \le \beta have the same closed span as the v_{\alpha} with \alpha \le \gamma (where \gamma is defined as above in relation to \beta).

Corollary: Every Hilbert space has an orthonormal basis.

Unfortunately, this seems to require some form of choice. What we can prove in ZF is that every Hilbert space for which one can exhibit explicitly a dense well-ordered subset has an orthonormal basis.

Using orthonormal bases

Consider the Hilbert space H = L^2(S^1) where S^1 carries normalized Haar measure. Equivalently, consider H = L^2([-\pi, \pi]) with the inner product

\displaystyle \langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \overline{f(x)} g(x) \, dx.

The function f(x) = x separates points, so by Stone-Weierstrass the smallest unital algebra containing it which is closed under complex conjugation, namely the algebra of complex polynomials, is dense in C([-\pi, \pi]) in the uniform topology. Since C([-\pi, \pi]) is dense in L^2([-\pi, \pi]), it follows that the complex polynomials are dense in H. Consequently, H is separable and has an orthonormal basis. The Gram-Schmidt process can be used to construct such a basis starting from the vectors 1, x, x^2, ...; these are, up to normalization and rescaling of the interval, the Legendre polynomials.
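
As a numerical illustration of my own (the quadrature grid and the tolerance are ad hoc), orthonormalizing the first few monomials against a discretized version of this inner product reproduces, up to scale, the Legendre polynomials rescaled to [-\pi, \pi]:

```python
import numpy as np
from numpy.polynomial import legendre

# Discretize <f, g> = (1/2pi) * integral_{-pi}^{pi} conj(f(x)) g(x) dx.
x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]
ip = lambda f, g: np.sum(np.conj(f) * g) * dx / (2 * np.pi)

# Gram-Schmidt on the monomials 1, x, x^2, x^3, sampled on the grid.
basis = []
for k in range(4):
    v = x**k
    r = v - sum(ip(e, v) * e for e in basis)
    basis.append(r / np.sqrt(ip(r, r).real))

# Each output is proportional to the Legendre polynomial P_k(x / pi).
for k, e in enumerate(basis):
    P = legendre.legval(x / np.pi, [0] * k + [1])
    ratio = e[len(x) // 4] / P[len(x) // 4]   # fix the overall scale at one point
    assert np.allclose(e, ratio * P, atol=1e-2)
```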

Another orthonormal basis comes from the observation that f(x) = e^{ix} also separates points, so the span of the functions e^{inx}, n \in \mathbb{Z}, is also dense by Stone-Weierstrass. Happily, these functions are already orthonormal: we have

\displaystyle \langle e^{inx}, e^{imx} \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(m-n)x} \, dx = \delta_{nm}.

It follows that we may expand any function in L^2 in a Fourier series

\displaystyle f(x) = \sum_{n \in \mathbb{Z}} \langle e^{inx}, f(x) \rangle e^{inx}.

We caution that what we have proven so far is only enough to conclude that Fourier series converge in L^2, which says nothing about uniform or pointwise convergence; these are much more subtle matters. However, even just L^2 convergence is enough to prove some nontrivial results. For example, we compute using integration by parts that

\displaystyle \langle e^{inx}, x \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} x e^{-inx} \, dx = \frac{(-1)^n i}{n}

if n \neq 0 and \langle 1, x \rangle = 0 since x is odd, hence

\displaystyle x = \sum_{n \neq 0} \frac{(-1)^n i e^{inx}}{n}

in H. Taking norms of both sides, we conclude

\displaystyle \frac{\pi^2}{3} = 2 \sum_{n \ge 1} \frac{1}{n^2} = 2 \zeta(2).

This gives \zeta(2) = \frac{\pi^2}{6}, the answer to the famous Basel problem. Replacing x with x^k above gives us a method for evaluating \zeta(2k) for all positive integers k.
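
Both the coefficient formula and the resulting identity are easy to check numerically (a sketch of my own):

```python
import numpy as np

# Check <e^{inx}, x> = (-1)^n i / n numerically for n = 3.
x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
n = 3
coef = np.sum(np.exp(-1j * n * x) * x) * dx / (2 * np.pi)
assert np.isclose(coef, (-1)**n * 1j / n, atol=1e-4)

# Parseval: ||x||^2 = pi^2/3 should equal 2 * sum_{n >= 1} 1/n^2 = 2 zeta(2).
N = 100_000
partial = 2 * np.sum(1.0 / np.arange(1, N + 1)**2)
assert np.isclose(partial, np.pi**2 / 3, atol=1e-4)   # off only by the tail ~2/N
```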

Adjoints

The assignment v \mapsto \langle v, - \rangle defines an injection from any inner product space V to its dual space V^{\ast} (recall that this consists of bounded linear operators V \to k). Moreover, \langle v, - \rangle has norm \| v \|, so this injection is norm-preserving. However, it is conjugate-linear rather than linear. To fix this, we introduce for any complex vector space V the conjugate \overline{V} (not to be confused with its closure in some ambient space!), which is the same abelian group as V but with scalar multiplication defined by the conjugate of scalar multiplication in V. (This only matters if we work over \mathbb{C} rather than over \mathbb{R}.) Then the inner product on any inner product space defines a linear norm-preserving injection

\overline{V} \ni v \mapsto \langle v, - \rangle \in V^{\ast}.

It is natural to ask when this map is an isomorphism (of normed vector spaces).

Riesz representation: Let H be a Hilbert space. Then the map \overline{H} \to H^{\ast} above is an isomorphism.

Proof. We know that it is linear, injective, and norm-preserving, so it suffices to prove that it is surjective. Let \varphi : H \to k be a continuous linear functional. The claim is trivial if \varphi is zero, so suppose \varphi is nonzero. \text{ker}(\varphi) is closed, so H admits a direct sum decomposition

\displaystyle H = \text{ker}(\varphi) \oplus \text{ker}(\varphi)^{\perp}.

Since \varphi is nonzero, \text{ker}(\varphi)^{\perp} is nontrivial, and if v, w \in \text{ker}(\varphi)^{\perp} then \varphi(w) v - \varphi(v) w \in \text{ker}(\varphi), so it follows that \text{ker}(\varphi)^{\perp} is one-dimensional. If u \in \text{ker}(\varphi)^{\perp} is any nonzero vector, then \langle u, - \rangle is a continuous linear functional which is trivial on \text{ker}(\varphi) and nontrivial on its orthogonal complement, so \varphi must be equal to it up to a scalar. \Box

The completeness of H is essential. For example, let V be the space of compactly supported sequences \mathbb{Z} \to \mathbb{C} with the inner product induced from \ell^2(\mathbb{Z}). Then there is a continuous linear functional V \to \mathbb{C} sending such a sequence c_i to, say, \sum \frac{c_i}{i^2 + 1} which is not of the form \langle v, - \rangle for any v \in V.

Corollary: Hilbert spaces are reflexive.

The Riesz representation theorem allows us to define the following crucial operation.

Theorem-Definition: let T : H_1 \to H_2 be a bounded linear operator. There exists a unique map T^{\dagger} : H_2 \to H_1, the adjoint (or Hermitian adjoint) of T, which satisfies

\displaystyle \langle v, Tw \rangle_{H_2} = \langle T^{\dagger} v, w \rangle_{H_1} \quad \forall v \in H_2, w \in H_1.

Proof. For fixed v the map w \mapsto \langle v, Tw \rangle_{H_2} is a continuous linear functional on H_1, so by Riesz representation there exists a unique vector T^{\dagger} v \in H_1 such that \langle v, Tw \rangle_{H_2} = \langle T^{\dagger} v, w \rangle_{H_1} for all w. Moreover, by uniqueness

\langle u + cv, Tw \rangle_{H_2} = \langle T^{\dagger} u, w \rangle_{H_1} + \bar{c} \langle T^{\dagger} v, w \rangle_{H_1} = \langle T^{\dagger} u + c T^{\dagger} v, w \rangle_{H_1}

so the assignment v \mapsto T^{\dagger} v is linear. Finally,

\displaystyle \| T \| = \sup_{\| u \| = \| v \| = 1} |\langle u, Tv \rangle| = \sup_{\| u \| = \| v \| = 1} |\langle T^{\dagger} u, v \rangle| = \| T^{\dagger} \|

so T^{\dagger} is bounded (in fact has the same norm as T). \Box

Remark. Let H_1 = H_2 = H and let e_i be an orthonormal basis. Then \langle e_i, T e_j \rangle = \langle T^{\dagger} e_i, e_j \rangle = \overline{ \langle e_j, T^{\dagger} e_i \rangle }, which says precisely that the “matrix” of T^{\dagger} with respect to the basis e_i is the conjugate transpose of the “matrix” of T.
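
In coordinates the defining property is easy to verify (a small numpy sketch of my own, with the adjoint implemented as the conjugate transpose):

```python
import numpy as np

rng = np.random.default_rng(6)
T = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))  # T : H_1 -> H_2
T_dag = T.conj().T                                          # conjugate transpose

w = rng.normal(size=3) + 1j * rng.normal(size=3)            # w in H_1
v = rng.normal(size=4) + 1j * rng.normal(size=4)            # v in H_2

# Defining property: <v, Tw> = <T^dagger v, w>.
assert np.isclose(np.vdot(v, T @ w), np.vdot(T_dag @ v, w))

# ||T|| = ||T^dagger||: both equal the largest singular value.
assert np.isclose(np.linalg.norm(T, 2), np.linalg.norm(T_dag, 2))
```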

Remark. The adjoint is closely related, but not identical, to the dual T^{\ast}. If B, C are any two Banach spaces, then for any bounded linear operator T : B \to C we may define its dual T^{\ast} : C^{\ast} \to B^{\ast} on dual spaces, which is defined by precomposition. It is a corollary of the Hahn-Banach theorem that \| T^{\ast} \| = \| T \|, but the above argument does not need the Hahn-Banach theorem. If B, C are Hilbert spaces, then T^{\ast} is a map C^{\ast} \to B^{\ast}, or equivalently by Riesz representation a map \overline{C} \to \overline{B}, whereas the adjoint is a map C \to B, so it is important not to confuse the two as mathematical objects; however, one is essentially the complex conjugate of the other.

The adjoint satisfies the following basic properties which follow straightforwardly from the definition. The second property shows that taking adjoints may be regarded as a generalization of complex conjugation for operators on Hilbert spaces.

  1. (S + T)^{\dagger} = S^{\dagger} + T^{\dagger},
  2. c^{\dagger} = \bar{c} (c a scalar),
  3. (TS)^{\dagger} = S^{\dagger} T^{\dagger},
  4. T^{\dagger \dagger} = T.

The adjoint allows us to define the following important classes of linear operators. A bounded linear operator f : H \to H on a Hilbert space is

  1. self-adjoint if f^{\dagger} = f,
  2. skew-adjoint if f^{\dagger} = -f,
  3. unitary if f^{\dagger} = f^{-1},
  4. normal if f^{\dagger} f = f f^{\dagger}.

In quantum mechanics, self-adjoint operators play the role of real-valued observables. They should be thought of as the “real operators,” for example because their eigenvalues are necessarily real. Any operator can be written uniquely as the sum of a self-adjoint and skew-adjoint operator

\displaystyle f = \frac{f + f^{\dagger}}{2} + \frac{f - f^{\dagger}}{2}.

Since T is self-adjoint if and only if iT is skew-adjoint, one can think of the above as a decomposition of an operator into its real and imaginary parts \text{Re}(f), \text{Im}(f), although this is not particularly useful unless the two commute (which is the case if and only if f is normal). When that happens, if v is an eigenvector of f with eigenvalue \lambda, then v is an eigenvector of f^{\dagger} with eigenvalue \overline{\lambda} (we will prove this below), hence v is an eigenvector of \text{Re}(f) with eigenvalue \text{Re}(\lambda) and an eigenvector of \text{Im}(f) with eigenvalue \text{Im}(\lambda).

The unitary maps are precisely the invertible maps preserving the inner product. They form a group, the unitary group U(H) of H. A homomorphism G \to U(H) where G is a group is a unitary representation of G, and these are a very natural object of study. (See for example the Peter-Weyl theorem.)

The skew-adjoint maps form a Lie algebra under commutator, the unitary Lie algebra \mathfrak{u}(H). These are precisely the maps f such that t \mapsto e^{ft} is a continuous group homomorphism \mathbb{R} \to U(H). The proof is straightforward but we will defer it to the next post when it can be done in slightly greater generality.
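
In finite dimensions this is easy to check numerically; the sketch below (my own, using scipy's matrix exponential) verifies that the exponential of a skew-adjoint matrix is unitary and that t \mapsto e^{ft} is a one-parameter group:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
K = (B - B.conj().T) / 2                 # skew-adjoint: K^dagger = -K

for t in (0.5, 1.0, 2.0):
    U = expm(t * K)
    assert np.allclose(U.conj().T @ U, np.eye(4))      # e^{Kt} is unitary

# t -> e^{Kt} is a homomorphism: e^{K(s+t)} = e^{Ks} e^{Kt}.
assert np.allclose(expm(0.7 * K) @ expm(0.3 * K), expm(K))
```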

The spectral theorem in finite dimensions

As a simple but important illustration of thinking in terms of adjoints, we prove the following.

Spectral theorem: Let A : H \to H be a self-adjoint operator on a finite-dimensional Hilbert space H. Then there exists an orthonormal basis of H consisting of eigenvectors of A, and all eigenvalues of A are real.

Proof. The first step is to prove that A has an eigenvector. This is true for any linear transformation on a finite-dimensional complex vector space using, for example, standard facts about characteristic polynomials, but we will give an independent proof that more strongly suggests the correct generalization to the infinite-dimensional case.

Let v \in H be a vector of norm 1 such that

\displaystyle \langle v, Av \rangle

is maximized. (Such a vector exists by compactness.) We claim that v is an eigenvector of A. To see this, let W = v^{\perp} and let w \in W be a unit vector. Then v_{\theta} = (\cos \theta) v + (\sin \theta) w is a one-parameter family of unit vectors, and by assumption the function \langle v_{\theta}, Av_{\theta} \rangle has a local maximum at \theta = 0. We compute that this is equal to

\displaystyle (\cos^2 \theta) \langle v, Av \rangle + (\cos \theta \sin \theta)(\langle v, Aw \rangle + \langle w, Av \rangle) + (\sin^2 \theta) \langle w, Aw \rangle.

Its derivative at \theta = 0 is equal to

\displaystyle \langle v, Aw \rangle + \langle w, Av \rangle = 2 \text{Re} \langle w, Av \rangle = 0.

Since we may scale v, w by unit complex numbers without loss of generality, it follows that \langle w, Av \rangle = 0 for all w \in \text{span}(v)^{\perp}, hence Av = \lambda v for some \lambda. Since

\lambda = \langle v, Av \rangle = \langle Av, v \rangle = \overline{\lambda}

it follows that \lambda is real. Finally, since

\displaystyle w \in W \Rightarrow \langle v, Aw \rangle = \langle Av, w \rangle = \overline{\lambda} \langle v, w \rangle = 0

it follows that W is an invariant subspace for A, so by induction we may complete v to an orthonormal basis of eigenvectors of A as desired. \Box

An equivalent statement is that a self-adjoint operator is diagonalizable by a unitary operator. Since commuting operators act on each other’s eigenspaces, this is also true for normal operators (although the eigenvalues need no longer be real in this case). More generally, we can say the following.

Corollary: Let A_i, i \in I be a commuting family of normal operators on a finite-dimensional Hilbert space H. Then there exists an orthonormal basis e_1, ... e_n consisting of eigenvectors for all of the A_i.

In other words, the A_i may be simultaneously diagonalized by a unitary operator.
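
Numerically, unitary diagonalization of a self-adjoint matrix is exactly what np.linalg.eigh computes; here is a small check (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
A = B + B.conj().T                       # a self-adjoint (Hermitian) matrix

lam, U = np.linalg.eigh(A)               # eigh assumes Hermitian input

assert lam.dtype.kind == 'f'                          # eigenvalues are real
assert np.allclose(U.conj().T @ U, np.eye(5))         # orthonormal eigenvectors
assert np.allclose(U @ np.diag(lam) @ U.conj().T, A)  # A = U diag(lam) U^dagger
```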

A geometric interpretation of the spectral theorem is the following. Working over \mathbb{R} for simplicity, A is a self-adjoint operator if and only if the bilinear form \langle w, Av \rangle is symmetric. Associated to such a bilinear form is the quadratic form q(v) = \langle v, Av \rangle from which it may be recovered. The spectral theorem shows that, letting e_1, ... e_n be an orthonormal basis and letting \lambda_1, ... \lambda_n be the corresponding eigenvalues, we may write

\displaystyle q(\sum x_i e_i) = \sum \lambda_i x_i^2.

The “unit spheres” q(v) = 1 then describe shapes in \mathbb{R}^n generalizing conic sections for n = 2 depending on how many of the \lambda_i are positive, negative, or zero. For example, when n = 3 we may get ellipsoids or hyperboloids. q is positive-definite if and only if all of the \lambda_i are positive, in which case q(v) = 1 describes an ellipsoid. In this case the vectors e_i can be interpreted as the “principal axes” of the ellipsoid, which generalize the semimajor and semiminor axes from the case n = 2, and the \lambda_i are the squares of the reciprocals of the lengths of these axes.

The dagger category of Hilbert spaces

The category \text{Hilb} of Hilbert spaces has as morphisms the bounded linear operators. Since two Hilbert spaces which are bi-Lipschitz equivalent have orthonormal bases of the same cardinality, they are actually isometrically (equivalently, unitarily) isomorphic, but not every bi-Lipschitz equivalence is an isometry. We still want to talk about unitary maps in this setting, so how should we do that?

The answer is to explicitly make the adjoint part of the structure of \text{Hilb}. We define a dagger category, or ^{\dagger}-category, to be a category C equipped with a contravariant functor

^{\dagger} : C \to C

which is the identity on objects and which satisfies ^{\dagger \dagger} = \text{id}_C. More explicitly, for every pair of objects a, b \in C there is a map

\displaystyle \text{Hom}(a, b) \ni f \mapsto f^{\dagger} \in \text{Hom}(b, a)

such that (fg)^{\dagger} = g^{\dagger} f^{\dagger} and f^{\dagger \dagger} = f. In any dagger category, a morphism f : a \to a is self-adjoint if f^{\dagger} = f and an isomorphism f : a \to b is unitary if f^{\dagger} = f^{-1}. A functor F : C \to D between dagger categories is a dagger functor if F(f^{\dagger}) = F(f)^{\dagger}.

Example. Let \text{Rel} denote the category of sets and relations. Recall that a relation R between two sets X, Y is a subset of their Cartesian product X \times Y. We write xRy to mean that (x, y) is in this subset. Composition of relations is defined as follows: if R : X \to Y and S : Y \to Z are two relations, then R \circ S : X \to Z is the relation defined by

\displaystyle x (R \circ S) z \Leftrightarrow \exists y \in Y : x R y, y S z.

(Note that this disagrees with the usual convention for function composition, where a function f : X \to Y is realized as the relation (x, f(x)); what I call R \circ S would be for functions called S \circ R.) For intuition, you should think of a relation between two sets as defining a partially defined and nondeterministic function between them (“nondeterministic” is another way to say “multivalued” but I think it gives a better intuition).

\text{Rel} is a dagger category with the dagger R^{\dagger} : Y \to X defined by

\displaystyle y R^{\dagger} x \Leftrightarrow x R y.

A relation is self-adjoint if and only if it is symmetric, and every isomorphism is unitary (and is also a bijective function).
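
Concretely, a relation between finite sets is a boolean matrix, composition is matrix multiplication over the truth semiring, and the dagger is transposition. Here is a small sketch of my own (composition written in the diagrammatic order used above):

```python
import numpy as np

rng = np.random.default_rng(9)
R = rng.random((3, 4)) < 0.5             # R : X -> Y, with R[x, y] true iff x R y
S = rng.random((4, 5)) < 0.5             # S : Y -> Z

def compose(R, S):
    """x (R o S) z iff there exists y with x R y and y S z."""
    return (R.astype(int) @ S.astype(int)) > 0

dagger = lambda R: R.T                   # the converse relation

# The dagger is contravariant: (R o S)^dagger = S^dagger o R^dagger.
assert np.array_equal(dagger(compose(R, S)), compose(dagger(S), dagger(R)))
assert np.array_equal(dagger(dagger(R)), R)          # involutive
```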

Example. Let n be a positive integer. The category \text{nCob} of n-cobordisms is the category whose objects are (n-1)-dimensional compact manifolds and whose morphisms f : M \to N are diffeomorphism classes of n-dimensional manifolds with boundary the disjoint union M \sqcup N. Composition in this category is defined by “sewing together” two manifolds at a common boundary component. (There are some subtleties here about maintaining a manifold structure when doing this that we will ignore completely.)

\text{nCob} is a dagger category with the dagger given by switching the role of M and N; in other words, “turning cobordisms around.”

Heuristically speaking, the morphisms in \text{nCob} describe time evolution between (n-1)-dimensional “spaces,” with the cobordisms describing n-dimensional “spacetimes.” (To make the connection to general relativity closer we should require, say, a Lorentzian structure on the cobordisms such that the boundary is a spacelike slice.) \text{nCob} is of fundamental importance to the subject of topological quantum field theory, which is roughly speaking the study of certain kinds of functors \text{nCob} \to \text{Vect}. A unitary TQFT is a certain kind of dagger functor \text{nCob} \to \text{Hilb}, which can be thought of as a “functor from general relativity to quantum mechanics.” For an elaboration on this point of view, see Baez’s Physics, Topology, Logic, and Computation: a Rosetta Stone.

Example. Let C be any category which admits finite pullbacks. The category \text{Span}(C) of spans in C is the category whose objects are those of C and whose morphisms f : a \to b are diagrams a \leftarrow c \to b with composition defined by pullback. Given any span its dagger is simply obtained by switching a and b.

Spans of sets generalize relations in that they allow “multiple arrows” between an element of a and an element of b. They also generalize cobordisms, since one can think of a cobordism as a cospan a \to c \leftarrow b where the two arrows are the two inclusions of the boundary components into the cobordism. For more about spans, see this page by Baez, which contains slides for a talk as well as references. The tale of groupoidification is also relevant.

But let’s return to Hilbert spaces for the time being. Given that we can define unitary maps using only the adjoint, and unitary maps are the isomorphisms preserving the inner products, it seems that the adjoint already captures the inner product on a Hilbert space. This is in fact true.

We first need some notation. In \text{Hilb}, there is a distinguished object 1, the one-dimensional Hilbert space \mathbb{C}. This object represents the obvious forgetful functor to \text{Set} in that \text{Hom}(1, H) can be canonically identified with the vectors in H. Thus we may think of vectors in H as morphisms 1 \to H.

Proposition: Let v, w : 1 \to H be vectors. Then \langle w, v \rangle = w^{\dagger} v.

Proof. By definition, w^{\dagger} is the unique operator H \to 1 satisfying

\displaystyle \langle w, v \rangle_H = \langle 1, w^{\dagger} v \rangle_1.

Since w^{\dagger} v is a morphism 1 \to 1, it is just a scalar, so \langle 1, w^{\dagger} v \rangle_1 = w^{\dagger} v and the conclusion follows. \Box

In any dagger category C with a distinguished object 1 \in C (usually the identity object of a monoidal operation on C making it a dagger monoidal category) we may therefore define inner products of morphisms 1 \to c taking values in \text{End}(1), and this inner product satisfies w^{\dagger} fv = (f^{\dagger} w)^{\dagger} v, so the dagger behaves the same way with respect to it as the adjoint does for Hilbert spaces. Moreover, among the isomorphisms in C we can distinguish the unitary isomorphisms because they preserve inner products.

Example. In \text{Rel}, a morphism 1 \to X is a subset of X, so the functor \text{Hom}(1, X) sends a set to its collection of subsets and sends a relation R : X \to Y to the function

\displaystyle R(S) = \{ y \in Y : \exists x \in S : xR y \}.

(These functions are precisely the functions 2^X \to 2^Y which preserve arbitrary unions.) If v, w : 1 \to X are two subsets, then w^{\dagger} v : 1 \to 1 is one of the two possible subsets of \{ 1 \}: it is empty if w, v are disjoint and is all of \{ 1 \} otherwise. The identity w^{\dagger} R v = (R^{\dagger} w)^{\dagger} v, when restricted to one-element subsets v, w, says precisely that xRy \Leftrightarrow yR^{\dagger} x.
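
In the same picture (a sketch of my own), a subset is a boolean vector and applying a relation to it is a boolean matrix-vector product, which visibly preserves unions:

```python
import numpy as np

rng = np.random.default_rng(10)
R = rng.random((4, 3)) < 0.5                          # R : X -> Y

def apply_rel(R, S):
    """R(S) = { y : exists x in S with x R y }."""
    return (R.T.astype(int) @ S.astype(int)) > 0

S1 = np.array([True, False, True, False])
S2 = np.array([False, True, False, True])

# R preserves arbitrary unions: R(S1 u S2) = R(S1) u R(S2).
assert np.array_equal(apply_rel(R, S1 | S2), apply_rel(R, S1) | apply_rel(R, S2))
```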

Quantum weirdness is not so weird

It turns out that some important quantum phenomena, such as quantum teleportation, can be described in an abstract framework based on dagger categories. More precisely, we need dagger compact categories, which are dagger categories equipped with extra structure generalizing the tensor product and dual of Hilbert spaces. The nLab page on this subject has a nice list of references.

This suggests that part of the difference between classical and quantum mechanics boils down to the difference between dagger compact categories and a category like \text{Set}. A basic such difference is that in a dagger category, the two representable functors \text{Hom}(c, -) and \text{Hom}(-, c) are canonically (contravariantly) isomorphic, the isomorphism provided by the dagger operation. (A unitary isomorphism is then precisely an isomorphism which preserves both representable functors and which also preserves this identification between them.) This is very far from the case in a more classical category like \text{Set}.

Replacing \text{Set} with \text{Rel} already helps a great deal. Since relations behave like nondeterministic functions, they are morally much more closely related to linear operators between vector spaces than to (deterministic) functions between sets. In some sense they already are linear operators: it is possible to think of relations as being matrices over the truth semiring \text{End}(1) = \{ \emptyset, \{ 1 \} \} with addition defined by union and multiplication defined by intersection. For the special case of relations between finite sets, this is abstractly because \text{Rel} admits finite biproducts (given by the disjoint union) and every finite set is a biproduct of copies of 1.

\text{Rel} admits a monoidal operation given on sets by the Cartesian product. The fact that this is not the categorical product is reflected in the fact that “entangled states” exist: namely there are subsets of a Cartesian product X \times Y which cannot be obtained by taking the product of a subset of X with a subset of Y. \text{Rel} further admits an internal hom X \Rightarrow Y which is also given on sets by the Cartesian product (but it is contravariant in the first variable; remember that the underlying set here is \text{Hom}(1, -), so we get the set of subsets of the Cartesian product as we should), and the tensor-hom adjunction

\displaystyle \text{Hom}(X \otimes Y, Z) \cong \text{Hom}(X, Y \Rightarrow Z)

holds, making \text{Rel} a closed monoidal category and in fact a dagger compact category.

There is a lot more to say here, but it will have to wait for later posts.

