The traditional mathematical axiomatization of probability, due to Kolmogorov, begins with a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and constructs random variables as certain functions $\Omega \to \mathbb{R}$. But start doing any probability and it becomes clear that the space $\Omega$ is de-emphasized as much as possible; the real focus of probability theory is on the algebra of random variables. It would be nice to have an approach to probability theory that reflects this.
Moreover, in the traditional approach, random variables necessarily commute. However, in quantum mechanics, the random variables are self-adjoint operators on a Hilbert space $H$, and these do not commute in general. For the purposes of doing quantum probability, it is therefore also natural to look for an approach to probability theory that begins with an algebra, not necessarily commutative, which encompasses both the classical and quantum cases.
Happily, noncommutative probability provides such an approach. Terence Tao’s notes on free probability develop a version of noncommutative probability geared towards applications to random matrices, but today I would like to take a more leisurely and somewhat scattered route, aimed at getting a general feel for what this formalism is capable of talking about.
Classical and quantum probability
(Below, if the reader chooses, she can restrict herself to finite sets and finite-dimensional Hilbert spaces so as to ignore measure-theoretic and analytic difficulties.)
A classical probability space consists of the following data:
- A set $\Omega$, the sample space. Elements of this set describe possible states of some system.
- A $\sigma$-algebra $\mathcal{F}$ of subsets of $\Omega$, the events. Events describe properties of states. The pair $(\Omega, \mathcal{F})$ is a measurable space.
- A probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$. This measures the probability that the system has some property.
Example. Let $\Omega$ be the set of possible outcomes of $n$ coin flips. Letting $\mathcal{F}$ be the set of all subsets of $\Omega$, we can describe various events like “no heads are flipped” or “at most three tails are flipped,” and we can compute their probabilities using the fact that every point in $\Omega$ has probability $\frac{1}{2^n}$.
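For concreteness, here is a minimal Python sketch of this finite probability space (the choice $n = 4$ and the particular events are just illustrative): outcomes are tuples of H and T, every outcome has probability $\frac{1}{2^n}$, and an event is simply a subset of the sample space.

```python
from itertools import product

n = 4  # number of coin flips (illustrative choice)

# Sample space: all 2^n outcomes, each with probability 2^(-n)
omega = list(product("HT", repeat=n))
prob = {outcome: 1 / 2**n for outcome in omega}

# Events are subsets of the sample space
no_heads = {o for o in omega if "H" not in o}
at_most_three_tails = {o for o in omega if o.count("T") <= 3}

def P(event):
    """Probability of an event: sum of the probabilities of its points."""
    return sum(prob[o] for o in event)

print(P(no_heads))             # 1/16 = 0.0625
print(P(at_most_three_tails))  # 15/16 = 0.9375
```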
Example. Let $M$ be a $2n$-dimensional symplectic manifold with symplectic form $\omega$ (e.g. the cotangent bundle of some other manifold). The $n^{\text{th}}$ exterior power of the symplectic form defines a volume form on $M$, which defines a Borel measure on $M$ called Liouville measure (locally just Lebesgue measure). Since Liouville measure is built from the symplectic form, it is preserved under all symplectomorphisms, and in particular under time evolution with respect to any Hamiltonian.
A random variable is a measurable function $X : \Omega \to \mathbb{R}$ (where $\mathbb{R}$ is given the Borel $\sigma$-algebra generated by the Euclidean topology). In our coin-flipping example, “number of heads flipped” is a random variable. A random variable which only takes the values $0$ or $1$ encodes the same data as an event (more precisely, it is the indicator function $1_E$ of a unique event $E$, which takes the value $1$ on $E$ and $0$ on its complement). More generally, we can construct events from random variables: for any Borel subset $B \subseteq \mathbb{R}$, the preimage $X^{-1}(B)$ is an event (the event that $X$ lies in $B$), often written $\{ X \in B \}$, and so we can consider its probability $\mathbb{P}(X \in B)$. (As a function of $B$, this is the pushforward measure.)
Random variables should be thought of as real-valued observables of our system (and events, as random variables which take the value $0$ or $1$, are the observables given by answers to yes-no questions). By repeatedly measuring an observable and averaging, we can obtain its expected value

$$\mathbb{E}[X] = \int_{\Omega} X \, d\mathbb{P}$$

(if this integral converges). If $X$ only takes the values $0, 1$, then $\mathbb{E}[X]$ reduces to the probability of the corresponding event. In general, if $X$ is a random variable and $B$ is a Borel subset of $\mathbb{R}$, then $1_B(X)$ is the indicator function of the event $\{ X \in B \}$, and $\mathbb{P}(X \in B) = \mathbb{E}[1_B(X)]$.
If we wanted to define a quantum probability space by analogous data, it would consist of the following (not standard):
- A Hilbert space $H$, the space of states.
- An abstract $\sigma$-algebra of closed subspaces of $H$, the events. The intersection of two subspaces is their set-theoretic intersection, the union is the closure of their span, and the complement is the orthogonal complement.
- A unit vector $\psi \in H$, the state vector.
Example. Every classical probability space $(\Omega, \mathcal{F}, \mathbb{P})$ defines a quantum probability space as follows: $H$ is the Hilbert space $L^2(\Omega, \mathbb{P})$ of (equivalence classes of) square-integrable functions $\Omega \to \mathbb{C}$ under the inner product $\langle f, g \rangle = \int_{\Omega} \overline{f} g \, d\mathbb{P}$, the events consist of the closed subspaces of functions which are (a.e.) equal to zero except on a given event $E \in \mathcal{F}$, and $\psi$ is the function on $\Omega$ which is identically equal to $1$.
Example. The quantum probability space describing a qubit comes from applying the above construction to a bit; thus $H$ is a $2$-dimensional Hilbert space with orthonormal basis $|0\rangle, |1\rangle$, the events consist of the four subspaces $0, \mathbb{C}|0\rangle, \mathbb{C}|1\rangle, H$, and $\psi = \alpha |0\rangle + \beta |1\rangle$ is the state of the qubit.
A quantum probability space does not have points in the classical sense, but we can still talk about the probability of an event $V$: if $P_V$ denotes the projection onto $V$, then it is given by

$$\mathbb{P}(V) = \langle \psi, P_V \psi \rangle = \| P_V \psi \|^2$$

and writing $\psi$ as the sum of its components parallel and orthogonal to $V$, we see that this is the square of the absolute value of the component of $\psi$ parallel to $V$. This is a simple form of the Born rule, and it describes the probability that $\psi$, when measured to determine whether or not it lies in $V$, will in fact lie in $V$. Applied to a qubit, we conclude that a qubit described by $\psi = \alpha |0\rangle + \beta |1\rangle$, when measured, takes the value $0$ with probability $|\alpha|^2$ and the value $1$ with probability $|\beta|^2$.
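As a small numerical sanity check of the Born rule just stated (the particular amplitudes below are arbitrary choices), one can compare $\langle \psi, P_V \psi \rangle$ with $\|P_V \psi\|^2$ for a qubit:

```python
import numpy as np

# A qubit state psi = alpha|0> + beta|1>, normalized
alpha, beta = 0.6, 0.8j
psi = np.array([alpha, beta])
assert np.isclose(np.linalg.norm(psi), 1.0)

# Event V = span(|0>), with orthogonal projection P_V
P_V = np.array([[1, 0], [0, 0]], dtype=complex)

# Born rule: P(V) = <psi, P_V psi> = ||P_V psi||^2
prob_V = np.vdot(psi, P_V @ psi).real
print(prob_V)                           # |alpha|^2 = 0.36
print(np.linalg.norm(P_V @ psi) ** 2)   # same value
print(abs(beta) ** 2)                   # probability of the complementary event, 0.64
```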
Note that if $V$ is the entire Hilbert space then the condition that the corresponding probability is $1$ is precisely the condition that $\psi$ is a unit vector. Note also that the probability assigned by $\psi$ to an event does not change if $\psi$ is multiplied by a unit complex number; for this reason, state vectors are really points in the projective space over $H$. Thus the possible states of a qubit are parameterized by the Riemann sphere $\mathbb{CP}^1$ (called in this context the Bloch sphere).
A (real-valued) quantum random variable (probably not standard) is a self-adjoint operator $X$ on $H$ (possibly unbounded and/or densely defined in general). The values taken by $X$ are precisely its spectral values (the points in its spectrum $\sigma(X)$). This specializes even to the classical case: the values $\lambda$ for which a random variable $X \in L^{\infty}(\Omega, \mathbb{P})$ has the property that $X - \lambda$ fails to be invertible are precisely its values (up to the subtlety that we can ignore the behavior of $X$ on a set of measure zero, but in practice we cannot meaningfully evaluate random variables at points anyway). In particular, for $X$ bounded, $X$ takes only the values $0, 1$ if and only if it is idempotent by Gelfand-Naimark, hence if and only if it is a projection; thus as in the classical case, random variables generalize events.
The expected value of a quantum random variable $X$ is

$$\mathbb{E}[X] = \langle \psi, X \psi \rangle$$

(when $\psi$ lies in the domain of $X$). If $X$ happens to have a countable orthonormal basis $e_1, e_2, \dots$ of eigenvectors with eigenvalues $\lambda_1, \lambda_2, \dots$, then writing $\psi = \sum_i c_i e_i$ we compute that

$$\mathbb{E}[X] = \sum_i \lambda_i |c_i|^2$$

so this really is the expected value of $X$ if we think of it classically as a random variable taking the value $\lambda_i$ with probability $|c_i|^2$.
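For a finite-dimensional self-adjoint operator, the identity $\mathbb{E}[X] = \sum_i \lambda_i |c_i|^2$ is easy to verify numerically; a sketch (the matrix and the state vector are random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random self-adjoint (Hermitian) operator on C^4
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
X = (M + M.conj().T) / 2

# A random unit state vector psi
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

# Expectation <psi, X psi>
expectation = np.vdot(psi, X @ psi).real

# Same thing via the spectral decomposition: sum_i lambda_i |c_i|^2
eigvals, eigvecs = np.linalg.eigh(X)
c = eigvecs.conj().T @ psi               # coefficients of psi in the eigenbasis
print(expectation)
print(np.sum(eigvals * np.abs(c) ** 2))  # agrees with the line above
```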
As in the classical case, we can make sense of probabilities such as $\mathbb{P}(X \in B)$ where $B$ is a Borel subset of $\mathbb{R}$, but this requires more work. If $X$ has a countable orthonormal basis of eigenvectors this is straightforward; in general, we need the Borel functional calculus in order to define $1_B(X)$ as an operator so that we can compute

$$\mathbb{P}(X \in B) = \mathbb{E}[1_B(X)] = \langle \psi, 1_B(X) \psi \rangle$$

(note that we do not need $1_B(X)$ to be a projection whose image lies among the distinguished events to compute this expectation, although this would be the appropriate analogue of the random variable $X$ being measurable). Roughly speaking we ought to be able to start from the continuous functional calculus and approximate the indicator function of $B$ by continuous functions, then show that the corresponding limit exists as a self-adjoint operator.
Unlike the classical case, the expected value can be computed independently of any measurability hypotheses on $X$; in particular, the probability of a particular event occurring (that is, the expected value of an arbitrary projection) is automatically well-defined.
Noncommutative probability
The classical and quantum cases above have several features in common. In both cases we saw that, although we started with a description of events and their probabilities and moved on to a description of random variables and their expected values, we could recover events through their indicator functions as the idempotent random variables and recover probabilities of events as expected values of indicator functions. This suggests that we might fruitfully approach probability in general using algebras of random variables and the expectation.
If the algebra is commutative, we might hope to recover an underlying probability space, but a random-variables-first approach will allow us to work independently of a particular representation of a family of random variables as an algebra of functions on a probability space. If the algebra is noncommutative, we might hope to recover a Hilbert space on which it acts, but again, a random-variables-first approach will allow us to work independently of a particular representation as operators on a Hilbert space. We can also think of the algebra as the algebra of functions on a noncommutative space in the spirit of noncommutative geometry. Although noncommutative spaces don’t have a good notion of point, quantum probability spaces suggest that they have a good notion of measure (which we can think of as a “smeared-out” point, the Dirac measures corresponding to ordinary points).
The following definition is morally due to von Neumann and Segal. A random algebra (not standard) is a complex $*$-algebra $A$ together with a $*$-linear functional $\mathbb{E} : A \to \mathbb{C}$ such that $\mathbb{E}(a^* a) \geq 0$ for all $a \in A$ and $\mathbb{E}(1) = 1$.
Such a functional is called a state on $A$ (as it describes the state of some probabilistic system by describing the expected value of observables). The (real-valued) random variables in $A$ are its self-adjoint elements (and an event is a projection; that is, a self-adjoint idempotent).
A morphism of random algebras is a morphism $\phi : A \to B$ of complex $*$-algebras such that $\mathbb{E}_B(\phi(a)) = \mathbb{E}_A(a)$ for all $a \in A$. This defines the category of random algebras, and the category of noncommutative probability spaces is its opposite. (This is probably the wrong choice of morphisms, but we’ll ignore that for now.)
Example. From a classical probability space $(\Omega, \mathcal{F}, \mathbb{P})$ we obtain a random algebra by letting $A$ be the von Neumann algebra $L^{\infty}(\Omega, \mathbb{P})$ of essentially bounded measurable functions $\Omega \to \mathbb{C}$ with involution given by conjugation and letting $\mathbb{E}$ be the integral $\mathbb{E}(f) = \int_{\Omega} f \, d\mathbb{P}$.
Example. From a quantum probability space we obtain a random algebra by letting $A$ be the span of the space of self-adjoint operators $X$ such that the projection $1_B(X)$ lies among the events for all Borel subsets $B \subseteq \mathbb{R}$, and letting $\mathbb{E}$ be the functional $\mathbb{E}(X) = \langle \psi, X \psi \rangle$.
(Because we have not developed the Borel functional calculus, it will be cleaner just to work with an arbitrary $*$-algebra of bounded operators on $H$ from which the events can but need not be derived. We can do the same thing in the classical case by starting with a collection of functions $\Omega \to \mathbb{R}$ and taking the preimages under all of them of the Borel subsets of $\mathbb{R}$ to define a $\sigma$-algebra on $\Omega$.)
The above examples require some analysis to define in full generality. However, the reason we do not require any analytic hypotheses on $A$ is to have a formalism flexible enough to discuss more algebraic examples such as the following.
Example. Let $G$ be a group. The group algebra $\mathbb{C}[G]$ is a $*$-algebra in the usual way (with involution extending $g \mapsto g^{-1}$, so that every element of $G$ is unitary). There is a distinguished state given by $\mathbb{E}(1) = 1$ and $\mathbb{E}(g) = 0$ for every non-identity $g \in G$.
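A quick sketch of this state for a finite cyclic group, with elements of $\mathbb{C}[\mathbb{Z}/n]$ stored as coefficient vectors and multiplication given by convolution (the encoding and the choice $n = 5$ are illustrative); note that positivity $\mathbb{E}(a^* a) = \sum_g |a_g|^2 \geq 0$ comes out explicitly:

```python
import numpy as np

n = 5  # work in the group algebra C[Z/n]

def mult(a, b):
    """Multiply two elements of C[Z/n], stored as coefficient vectors (convolution)."""
    c = np.zeros(n, dtype=complex)
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def star(a):
    """Involution: extends g -> g^{-1}, with complex conjugation on coefficients."""
    return np.array([np.conj(a[(-i) % n]) for i in range(n)])

def E(a):
    """The distinguished state: the coefficient of the identity element."""
    return a[0]

# Positivity check on a random element: E(a* a) >= 0
rng = np.random.default_rng(1)
a = rng.normal(size=n) + 1j * rng.normal(size=n)
print(E(mult(star(a), a)).real)   # equals sum_g |a_g|^2 >= 0
```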
The axioms we have chosen require some explanation. Working in a complex $*$-algebra is both convenient and has clear ties to quantum mechanics, but I do not have a good explanation of this axiom from first principles. The condition that $\mathbb{E}$ is $*$-linear reflects linearity of expectation, which holds both in the classical and quantum cases, and the fact that we want the expected value of a self-adjoint element to be real. The condition that $\mathbb{E}(a^* a) \geq 0$ (positivity) reflects the fact that we want probabilities to be non-negative in the following sense.

In any complex $*$-algebra $A$, we may define a positive (really non-negative) element to be an element of the form $a^* a$. A positive element is in particular self-adjoint. In the case of measurable functions on a probability space, the positive elements are precisely the elements which are (a.e.) non-negative, and in the case of operators on a Hilbert space, the positive elements are precisely the self-adjoint elements which have non-negative spectrum (by the Gelfand representation; this is subtle, see the comments below). Hence positivity is a natural analogue in the algebraic setting of the condition that probabilities are non-negative.
(Edit, 1/2/22: It’s been pointed out in the comments that a more natural definition of “positive” is an element which is a sum of elements of the form $a^* a$; this makes the positive elements form a convex cone. However, this doesn’t affect the definition of a positive linear functional, and the two are equivalent in any C*-algebra.)
Finally, the condition that $\mathbb{E}(1) = 1$ reflects the fact that we want the total probability to be $1$.
The semi-inner product
The state allows us to define a sesquilinear form $\langle a, b \rangle = \mathbb{E}(a^* b)$ on any random algebra $A$ which satisfies all of the axioms of an inner product except that it is not necessarily positive-definite, but only satisfies the weaker axiom that $\langle a, a \rangle \geq 0$. We call such a gadget a semi-inner product (since it is positive-semidefinite).
As for classical random variables we can define the covariance $\mathrm{Cov}(a, b) = \mathbb{E}(a^* b) - \overline{\mathbb{E}(a)} \, \mathbb{E}(b)$ of two elements, and positive-semidefiniteness implies that the variance $\mathrm{Var}(a) = \mathrm{Cov}(a, a)$ is non-negative, hence that $\mathbb{E}(a^* a) \geq |\mathbb{E}(a)|^2$. More generally, the proof of the Cauchy-Schwarz inequality goes through without modification, and we conclude that

$$|\mathbb{E}(a^* b)|^2 \leq \mathbb{E}(a^* a) \, \mathbb{E}(b^* b).$$
This is already enough for us to prove the following general version of Heisenberg’s uncertainty principle.
Theorem (Robertson uncertainty): Let $a, b$ be self-adjoint elements of a random algebra $A$. Then

$$\mathrm{Var}(a) \, \mathrm{Var}(b) \geq \frac{1}{4} \left| \mathbb{E}([a, b]) \right|^2.$$
Proof. Since both sides are invariant under translation of either $a$ or $b$ by a real constant, we may assume without loss of generality that $a, b$ have mean zero (that is, that $\mathbb{E}(a) = \mathbb{E}(b) = 0$). This gives

$$\mathrm{Var}(a) \, \mathrm{Var}(b) = \mathbb{E}(a^2) \, \mathbb{E}(b^2) \geq |\mathbb{E}(ab)|^2$$

by Cauchy-Schwarz. We can write $ab$ as the sum of its real and imaginary parts

$$ab = \frac{ab + ba}{2} + \frac{ab - ba}{2}$$

and computing $|\mathbb{E}(ab)|^2$ using the above decomposition gives

$$|\mathbb{E}(ab)|^2 = \left| \mathbb{E}\!\left( \frac{ab + ba}{2} \right) \right|^2 + \left| \mathbb{E}\!\left( \frac{ab - ba}{2} \right) \right|^2$$

where $\mathbb{E}\!\left( \frac{ab + ba}{2} \right)$ is real and $\mathbb{E}\!\left( \frac{ab - ba}{2} \right)$ is purely imaginary. The conclusion follows.
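Robertson uncertainty is easy to test numerically in the concrete setting of Hermitian matrices and a vector state (the dimension and random seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6

def random_hermitian(d):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

a, b = random_hermitian(d), random_hermitian(d)

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

def E(x):
    """State given by the unit vector psi."""
    return np.vdot(psi, x @ psi)

def var(x):
    """Variance of a self-adjoint element: E(x^2) - E(x)^2."""
    return (E(x @ x) - E(x) ** 2).real

commutator = a @ b - b @ a
lhs = var(a) * var(b)
rhs = 0.25 * abs(E(commutator)) ** 2
print(lhs >= rhs - 1e-12)   # Robertson: Var(a) Var(b) >= |E([a,b])|^2 / 4
```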
Interpreting Robertson uncertainty will be easier once we do a little more work. By Cauchy-Schwarz, if an element $a$ satisfies $\mathbb{E}(a^* a) = 0$ then in fact it satisfies $\mathbb{E}(ba) = 0$ for all $b \in A$ (and the converse is clear). In the classical picture, a function satisfying either of these conditions is equal to zero almost everywhere, which motivates the following definition. An element $a$ of a random algebra $A$ is null or zero almost surely (abbreviated a.s.) if $\mathbb{E}(ba) = 0$ for all $b \in A$, which as we have seen is equivalent to $\mathbb{E}(a^* a) = 0$. The null elements form a subspace $N$ of $A$. Two elements $a, b$ are equal almost surely if $a - b \in N$, hence equality a.s. is equivalent to equality in the quotient $A/N$.
An element has variance zero if and only if it is constant almost surely. Robertson uncertainty then says that if two self-adjoint elements $a, b$ have the property that their commutator $[a, b]$ has nonzero expectation, then neither of them can be constant almost surely in a strong sense: the product of their variances is bounded below by a positive constant, so as one variance decreases, the other must increase. In other words, not only are they uncertain, but a state in which $a$ is less uncertain is a state in which $b$ is more uncertain.
The standard application of Robertson uncertainty is to the case that $a, b$ are the position and momentum operators respectively acting on a quantum particle on $\mathbb{R}$. This application has the following purely mathematical interpretation: a function in $L^2(\mathbb{R})$ and its Fourier transform cannot simultaneously be too localized.
Independence
A fundamental notion in classical probability theory is the notion of independence. It can be generalized to random algebras as follows: two $*$-subalgebras $A_1, A_2$ of a random algebra $A$ are independent if

$$\mathbb{E}(a_1 a_2) = \mathbb{E}(a_1) \, \mathbb{E}(a_2)$$

for all $a_1 \in A_1, a_2 \in A_2$.
Example. Let $A$ be the random algebra $L^{\infty}(\Omega, \mathbb{P})$ associated to a classical probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $A_1, A_2$ be the subalgebras of functions which are measurable with respect to two $\sigma$-subalgebras $\mathcal{F}_1, \mathcal{F}_2$ of $\mathcal{F}$. Then $A_1, A_2$ are independent in the above sense if and only if $\mathcal{F}_1, \mathcal{F}_2$ are independent in the sense that

$$\mathbb{P}(E_1 \cap E_2) = \mathbb{P}(E_1) \, \mathbb{P}(E_2)$$

where $E_1 \in \mathcal{F}_1, E_2 \in \mathcal{F}_2$ (by the monotone class theorem). Note that this condition is equivalent to $\mathbb{E}(1_{E_1} 1_{E_2}) = \mathbb{E}(1_{E_1}) \, \mathbb{E}(1_{E_2})$.
Example. Let $A_1, A_2$ be two random algebras with expectations $\mathbb{E}_1, \mathbb{E}_2$. Their tensor product $A_1 \otimes A_2$ acquires a natural $*$-algebra structure given by $(a_1 \otimes a_2)^* = a_1^* \otimes a_2^*$ on pure tensors (it is the universal $*$-algebra admitting morphisms from $A_1, A_2$ whose images commute), and moreover we can define on it a state given by

$$\mathbb{E}(a_1 \otimes a_2) = \mathbb{E}_1(a_1) \, \mathbb{E}_2(a_2)$$

on pure tensors. Conversely, any state on $A_1 \otimes A_2$ such that $A_1$ and $A_2$ are independent is of this form. This is a noncommutative generalization of product measure; when $A_1, A_2$ come from classical probability spaces $(\Omega_1, \mathbb{P}_1), (\Omega_2, \mathbb{P}_2)$, a suitable completion of $A_1 \otimes A_2$ is the corresponding algebra of functions on the product $\Omega_1 \times \Omega_2$, and the state above comes from integration against the corresponding product measure.
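In finite dimensions the tensor product of concrete random algebras is implemented by Kronecker products, and the factorization $\mathbb{E}(a_1 \otimes a_2) = \mathbb{E}_1(a_1) \, \mathbb{E}_2(a_2)$ for the product state can be checked directly; a sketch with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_state(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

psi1, psi2 = random_state(2), random_state(3)
a1 = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
a2 = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

E1 = lambda x: np.vdot(psi1, x @ psi1)
E2 = lambda x: np.vdot(psi2, x @ psi2)

# The product state on the tensor product Hilbert space
psi = np.kron(psi1, psi2)
E = lambda x: np.vdot(psi, x @ psi)

# Independence of the two tensor factors: E(a1 (x) a2) = E1(a1) E2(a2)
print(np.isclose(E(np.kron(a1, a2)), E1(a1) * E2(a2)))   # True
```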
Example. $A$ is independent of itself (in $A$) if and only if the state $\mathbb{E}$ is actually a homomorphism $A \to \mathbb{C}$ of $*$-algebras. Thinking of the case that $A$ is a C*-algebra in particular, the corresponding states can be thought of as Dirac measures supported at points of the Gelfand spectrum of $A$. In the noncommutative case, $A$ may admit no homomorphisms to $\mathbb{C}$ (for example if $A$ contains the Weyl algebra), hence no Dirac measures, an expression of the general intuition that noncommutative spaces are “smeared out” and not easily expressible in terms of points.
Independence is a formalization of the intuitive idea that knowing the values of the random variables in $A_1$ doesn’t allow you to deduce anything about the values of the random variables in $A_2$ and vice versa. One indication of how this works in the setting of random algebras is as follows: if $p \in A$ is a projection with $\mathbb{E}(p) > 0$ (that is, an event that occurs with positive probability) we can define a conditional expectation

$$\mathbb{E}(a \mid p) = \frac{\mathbb{E}(p a p)}{\mathbb{E}(p)}.$$

(The first factor of $p$ is necessary in the noncommutative case to ensure that the result is still a state.) This represents the expected value of $a$ given that the event $p$ occurred. If $A_1, A_2$ are independent, it follows that $\mathbb{E}(a_1 \mid p) = \mathbb{E}(a_1)$ for every $a_1 \in A_1$ and every projection $p \in A_2$ with $\mathbb{E}(p) > 0$; in other words, knowing that $p$ occurred has no effect on the expected value of any of the elements of $A_1$.
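Numerically, conditioning on a projection $p$ amounts to replacing the state vector $\psi$ by the normalized projection $p\psi / \|p\psi\|$; a small sketch with an arbitrary observable and projection:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4

# State vector, an observable, and a projection p (onto the first two coordinates)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
a = (M + M.conj().T) / 2
p = np.diag([1, 1, 0, 0]).astype(complex)

E = lambda x: np.vdot(psi, x @ psi)

# Conditional expectation E(a | p) = E(p a p) / E(p)
cond = E(p @ a @ p) / E(p)

# Same thing computed in the "collapsed" state p psi / ||p psi||
phi = p @ psi
phi /= np.linalg.norm(phi)
print(np.isclose(cond, np.vdot(phi, a @ phi)))   # True
```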
Independence is a very strong condition to impose if the subalgebras do not commute. For example, it implies that $\mathbb{E}([a_1, a_2]) = 0$ for all $a_1 \in A_1, a_2 \in A_2$, which is the only condition under which Robertson uncertainty cannot relate the variances of $a_1, a_2$. In the particular case of the position and momentum operators, the commutator $[q, p] = i \hbar$ is a nonzero scalar, hence always has nonzero expectation; it follows that position and momentum cannot be made independent! (By contrast, in the classical setting any pair of random variables is independent with respect to a Dirac measure.)
In the noncommutative setting, a different notion of independence, free independence (replacing the tensor product with the free product), becomes more natural and useful. We will not discuss this issue further, but see Terence Tao’s notes linked above.
The Gelfand-Naimark-Segal construction
If $V$ is any inner product space, $A$ any $*$-algebra of linear operators on $V$, and $\psi \in V$ is any unit vector, then $A$ is a concrete random algebra with expectation $\mathbb{E}(a) = \langle \psi, a \psi \rangle$. This subsumes the examples coming from both classical and quantum probability spaces. The goal of this section is to determine to what extent we can prove a Cayley’s theorem for random algebras to the effect that random algebras are concrete.
The above suggests the following definition. If $A$ is a complex $*$-algebra, then a $*$-representation of $A$ is a homomorphism $\rho : A \to \mathrm{End}(V)$ from $A$ to the endomorphisms of an inner product space $V$ such that

$$\langle \rho(a^*) v, w \rangle = \langle v, \rho(a) w \rangle$$

for all $a \in A$ and $v, w \in V$. (Note that if $V$ is not a Hilbert space then $\mathrm{End}(V)$ is not necessarily a $*$-algebra because adjoints may not exist in general.) A Hilbert $*$-representation is a $*$-representation on a Hilbert space.
The semi-inner product $\langle a, b \rangle = \mathbb{E}(a^* b)$ on a random algebra $A$ descends to the quotient space $A/N$ by the null elements, where it becomes an inner product because we have quotiented by the elements of norm zero. Moreover, since $N$ consists precisely of the elements $a$ such that $\mathbb{E}(ba) = 0$ for all $b \in A$, it follows that $N$ is a left ideal, so the quotient map $A \to A/N$ is a quotient of left $A$-modules; consequently, $A$ acts on $A/N$ by linear operators. Since

$$\langle b^* a, c \rangle = \mathbb{E}(a^* b c) = \langle a, bc \rangle$$

it follows that this action defines a $*$-representation of $A$. The procedure we have outlined is essentially the Gelfand-Naimark-Segal (GNS) construction: we associate to any state on a $*$-algebra a corresponding $*$-representation such that the state can be recovered from the representation as $\mathbb{E}(a) = \langle \psi, a \psi \rangle$ where $\psi$ is the image of $1$ in $A/N$. This may be regarded as a weak Cayley’s theorem: unfortunately, this $*$-representation is not faithful in general. To get a stronger statement about random algebras, we will now assume another condition, namely that if $\mathbb{E}(a^* a) = 0$, then $a = 0$ (the state is faithful).
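In finite dimensions the GNS construction is concrete linear algebra: compute the Gram matrix $\mathbb{E}(a_i^* a_j)$ of the semi-inner product on a basis, quotient by its kernel, and let the algebra act by left multiplication. The sketch below does this for $\mathbb{C}[\mathbb{Z}/n]$ with its distinguished state, where the state is faithful and so the null space is zero (the encoding is an illustrative choice):

```python
import numpy as np

n = 4  # GNS construction for A = C[Z/n] with its distinguished state

def E(a):
    """The distinguished state: the coefficient of the identity element."""
    return a[0]

def mult(a, b):
    """Convolution product in C[Z/n]."""
    c = np.zeros(n, dtype=complex)
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def star(a):
    return np.array([np.conj(a[(-i) % n]) for i in range(n)])

basis = list(np.eye(n, dtype=complex))

# Gram matrix of the semi-inner product <a, b> = E(a* b) on the basis
gram = np.array([[E(mult(star(bi), bj)) for bj in basis] for bi in basis])
print(np.allclose(gram, np.eye(n)))   # the state is faithful: no null vectors

def rho(a):
    """The GNS representation: left multiplication by a, as a matrix."""
    return np.column_stack([mult(a, bj) for bj in basis])

# Recover the state from the representation: E(a) = <psi, rho(a) psi>, psi = image of 1
psi = basis[0]
a = np.array([1.0, 2.0, 0.0, 1j])
print(np.isclose(E(a), np.vdot(psi, rho(a) @ psi)))   # True
```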
The faithfulness axiom is equivalent to requiring $N = 0$ and also equivalent to requiring that $A$ is an inner product space (rather than a semi-inner product space). It implies, but is stronger than, the assumption that the action of $A$ on $A/N$ is faithful. The remarks about the state above then prove the following.
“Cayley’s theorem for random algebras”: A random algebra with a faithful state is concrete.
This is still not a true analog of Cayley’s theorem because the converse is false: the state of a concrete random algebra need not be faithful.
From here we will assume, in addition to faithfulness, another condition, namely that for every $a \in A$ there exists a constant $C_a \geq 0$ such that $\mathbb{E}(b^* a^* a b) \leq C_a \, \mathbb{E}(b^* b)$ for all $b \in A$ (boundedness). Boundedness is equivalent to requiring that $A$ acts on itself by bounded linear operators. This action therefore uniquely extends to the completion of $A$ with respect to its inner product, which we’ll denote by $H$, and consequently it follows that in this case $A$ admits a Hilbert $*$-representation $A \to B(H)$. The closure of the image of $A$ in $B(H)$ is a C*-algebra $\overline{A}$ of bounded linear operators on $H$, and moreover since $\mathbb{E}(a) = \langle \psi, a \psi \rangle$ where $\psi$ is the image of $1$ in $H$, the expectation uniquely extends to $\overline{A}$.
This motivates the following definition: a random C*-algebra is a random algebra which is also a C*-algebra. The above discussion proves the following.
Theorem: Let $A$ be a random algebra with a faithful state satisfying boundedness. Then $A$ canonically embeds as a dense $*$-subalgebra of a random C*-algebra $\overline{A}$ equipped with a Hilbert $*$-representation $\overline{A} \to B(H)$ via the GNS construction; moreover, there is a canonical vector $\psi \in H$ such that $\mathbb{E}(a) = \langle \psi, a \psi \rangle$ for all $a \in \overline{A}$.
This is a much stronger conclusion than the conclusion that $A$ is concrete, since it allows us to use facts from the theory of C*-algebras.
Corollary: Let $A$ be a commutative random algebra with a faithful state satisfying boundedness. Then $A$ canonically embeds as a dense $*$-subalgebra of the algebra $C(X)$ of continuous functions on a compact Hausdorff space $X$.
Proof. The closure of a commutative $*$-subalgebra of $B(H)$ is also commutative, since commutativity is a continuous condition. The conclusion then follows from Gelfand-Naimark.
Corollary (“Maschke’s theorem”): Let $A$ be a finite-dimensional random algebra with a faithful state. Then $A$ is semisimple.

Proof. A finite-dimensional random algebra automatically satisfies boundedness. The GNS construction equips $A$ with a faithful $*$-representation, namely $A$ itself. Let $W$ be a submodule of $A$. Then for every $a \in A$, $w \in W$, and $v \in W^{\perp}$,

$$\langle a v, w \rangle = \langle v, a^* w \rangle = 0$$

so $W^{\perp}$ is also a submodule of $A$. So every submodule of $A$ is a direct summand; consequently, $A$ is semisimple.

Note that we really do recover Maschke’s theorem for complex representations of finite groups as a corollary, since $\mathbb{C}[G]$, for $G$ a finite group, is a finite-dimensional random algebra with a faithful state.
Moments
The axioms for a random algebra may not seem strong enough to capture random variables. For example, it does not seem possible to directly access probabilities like $\mathbb{P}(a \in B)$. However, our axioms are enough to define the moments $\mathbb{E}(a^n)$ of a random variable, and under suitable hypotheses (discussed under the general heading of the moment problem) it is possible to recover a random variable in the classical sense from its moments. We prove a result of this type for random C*-algebras.
Proposition: Any state on a C*-algebra has norm $1$ (and in particular is continuous).

Proof. By examining real and imaginary parts, it suffices to show that a self-adjoint element $a$ of norm $1$ maps to an element of norm at most $1$. Since $1 - a$ has non-negative spectrum, by the continuous functional calculus it has a square root, hence is positive, so

$$\mathbb{E}(1 - a) = 1 - \mathbb{E}(a) \geq 0.$$

Similarly, $1 + a$ has non-negative spectrum, so by the continuous functional calculus it has a square root, hence is positive, so

$$\mathbb{E}(1 + a) = 1 + \mathbb{E}(a) \geq 0.$$

We conclude that $|\mathbb{E}(a)| \leq 1$, with equality if $a = 1$.
In fact a much stronger statement is true due to the following corollary of the Riesz-Markov theorem, which we will not prove; see Terence Tao’s notes.
Theorem: Let $X$ be a compact Hausdorff space and $\mathbb{E} : C(X) \to \mathbb{C}$ be a positive linear functional. Then there is a unique Radon measure $\mu$ on $X$ such that

$$\mathbb{E}(f) = \int_X f \, d\mu$$

for all $f \in C(X)$ (and conversely any Radon measure defines a positive linear functional on $C(X)$).
It follows by Gelfand-Naimark that specifying a commutative random C*-algebra is equivalent to specifying a compact Hausdorff space and a Radon measure on it of total measure $1$.
Corollary: Let $A$ be a random C*-algebra. If $x \in A$ is normal, then there is a unique Radon measure $\mu$ on the spectrum $\sigma(x)$ such that

$$\mathbb{E}(f(x)) = \int_{\sigma(x)} f \, d\mu$$

for all continuous functions $f : \sigma(x) \to \mathbb{C}$ (where $f(x)$ is defined using the continuous functional calculus).
Proof. Since the $*$-algebra of polynomials in $x$ and $x^*$ is dense in the C*-algebra $C^*(x)$ generated by $x$ by construction, a morphism $C^*(x) \to \mathbb{C}$ is uniquely determined by what it does to $x$, hence the Gelfand transform of $x$, regarded as a function $\mathrm{Spec}\, C^*(x) \to \mathbb{C}$, is injective. Since it is a continuous map between compact Hausdorff spaces, it is also an embedding, so we may regard $\mathrm{Spec}\, C^*(x)$ as canonically embedded into $\sigma(x)$. (This embedding is actually a homeomorphism but we do not need this.) By Tietze extension, any continuous function $\mathrm{Spec}\, C^*(x) \to \mathbb{C}$ extends to a continuous function $\sigma(x) \to \mathbb{C}$, so the continuous functions on $\mathrm{Spec}\, C^*(x)$ given by applying the continuous functional calculus to continuous functions on $\sigma(x)$ include all continuous functions $\mathrm{Spec}\, C^*(x) \to \mathbb{C}$, and we reduce to the previous result.
Corollary: With the same hypotheses as above, the Radon measure $\mu$ above is uniquely determined by the values $\mathbb{E}(p(x, x^*))$ where $p$ is a polynomial in $x$ and $x^*$. Consequently, $\mu$ is uniquely determined by the $*$-moments $\mathbb{E}(x^n (x^*)^m)$. If $x$ is self-adjoint, $\mu$ is uniquely determined by the values $\mathbb{E}(p(x))$ where $p$ is a polynomial in one variable. Equivalently, $\mu$ is uniquely determined by the moments $\mathbb{E}(x^n)$.
Proof. $\sigma(x)$ is a compact subset of $\mathbb{C}$, so by Stone-Weierstrass the polynomial functions in $z$ and $\bar{z}$ are uniformly dense in the space of continuous functions $\sigma(x) \to \mathbb{C}$. Now recall that the continuous functional calculus and $\mathbb{E}$ both preserve uniform limits. If $x$ is self-adjoint, $\sigma(x)$ is real, so we only need to take polynomial functions in $z$.
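In the self-adjoint finite-dimensional case the measure $\mu$ is just the discrete spectral measure $\sum_i |c_i|^2 \delta_{\lambda_i}$, and the agreement of the moments $\mathbb{E}(x^n)$ with the moments of $\mu$ can be checked directly (the matrix and state below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5

M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
x = (M + M.conj().T) / 2                       # a self-adjoint element
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

E = lambda a: np.vdot(psi, a @ psi).real

# The spectral measure mu = sum_i |c_i|^2 delta_{lambda_i} on the spectrum of x
eigvals, eigvecs = np.linalg.eigh(x)
weights = np.abs(eigvecs.conj().T @ psi) ** 2

# Moments computed algebraically vs. against mu
for n in range(1, 5):
    algebraic = E(np.linalg.matrix_power(x, n))
    against_mu = np.sum(weights * eigvals ** n)
    print(np.isclose(algebraic, against_mu))   # True for every n
```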
The proofs above generalize essentially unchanged to the following.
Corollary: Let $x_1, \dots, x_n$ be commuting normal elements of a random C*-algebra $A$. Then there exists a unique Radon measure $\mu$ on $\sigma(x_1) \times \cdots \times \sigma(x_n)$ such that

$$\mathbb{E}(f(x_1, \dots, x_n)) = \int f \, d\mu$$

for all continuous functions $f : \sigma(x_1) \times \cdots \times \sigma(x_n) \to \mathbb{C}$. Furthermore, $\mu$ is uniquely determined by the joint $*$-moments $\mathbb{E}(x_1^{a_1} (x_1^*)^{b_1} \cdots x_n^{a_n} (x_n^*)^{b_n})$ of the $x_i$. If the $x_i$ are self-adjoint, $\mu$ is uniquely determined by the joint moments $\mathbb{E}(x_1^{a_1} \cdots x_n^{a_n})$ of the $x_i$.
The hypothesis that the $x_i$ commute is crucial in the following sense. We restrict to the self-adjoint case for simplicity.

Proposition: Let $x, y$ be self-adjoint elements of a random C*-algebra $A$ with faithful state such that there exists a measure $\mu$ on a measure space $\Omega$ and two measurable functions $f, g$ satisfying

$$\mathbb{E}(p(x, y)) = \int_{\Omega} p(f, g) \, d\mu$$

for all polynomials $p$ in two noncommuting variables. Then $xy = yx$.
Proof. If $x, y$ are self-adjoint then so is $i[x, y] = i(xy - yx)$. The hypothesis above implies that $\mathbb{E}\left( (i[x, y])^2 \right) = \int_{\Omega} \left( i(fg - gf) \right)^2 d\mu = 0$, but since $i[x, y]$ is self-adjoint, $(i[x, y])^2$ is positive, hence by faithfulness $i[x, y] = 0$, that is, $xy = yx$.
This result may be interpreted as saying that two noncommuting random variables do not in general have a reasonable notion of joint distribution.
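The obstruction is easy to see numerically: with the normalized trace as a faithful state on a matrix algebra, $\mathbb{E}(-[x, y]^2)$ is strictly positive whenever the Hermitian matrices $x, y$ fail to commute, while any commuting classical model would force it to vanish (the matrices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
d = 3

def random_hermitian(d):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

x, y = random_hermitian(d), random_hermitian(d)

# The normalized trace is a faithful state on the matrix algebra
E = lambda a: np.trace(a).real / d

c = x @ y - y @ x                  # the commutator [x, y]
print(E(-c @ c))                   # strictly positive when x and y do not commute
```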
Some closing remarks about quantumness
Classical mechanics is in principle deterministic: if the initial state of a system is known deterministically, then classical mechanics can in principle determine all future states. The predictions of quantum mechanics are, however, probabilistic: all that can be determined is a probability distribution on possible outcomes of a given experiment.
The two can be made to seem more similar if classical mechanics is generalized by allowing the state of the system to be probabilistic in the classical sense. Then classical and quantum mechanics can both be subsumed under the heading of random algebras, where in the classical case we do not keep track of the position and momentum of a particle but a probability distribution over all possible positions and momenta. What distinguishes the classical from the quantum cases is the noncommutativity of the random algebras in the latter case, and in particular the fact that the random algebras occurring in quantum mechanics generally do not admit any homomorphisms to $\mathbb{C}$, hence admit no Dirac measures, so we are forced to always work probabilistically.
The formal similarity between classical and quantum mechanics described here only applies to states and observables; to get time evolution back into the picture we should endow our random algebras with Poisson brackets, giving us random Poisson algebras, and Hamiltonians…