Free Encyclopedia of Mathematics – vol 2




Free Encyclopedia of Mathematics (0.0.1) – volume 2

Chapter 242 16-00 – General reference works (handbooks, dictionaries, bibliographies, etc.)

242.1

direct product of modules

Let {X_i : i ∈ I} be a collection of modules in some category of modules. Then the direct product ∏_{i∈I} X_i of that collection is the module whose underlying set is the cartesian product of the X_i, with componentwise addition and scalar multiplication. For example, in a category of left modules:

(x_i) + (y_i) = (x_i + y_i),  r(x_i) = (r x_i).

For each j ∈ I we have a projection p_j : ∏_{i∈I} X_i → X_j defined by (x_i) ↦ x_j, and an injection λ_j : X_j → ∏_{i∈I} X_i, where an element x_j of X_j maps to the element of ∏_{i∈I} X_i whose jth term is x_j and every other term is zero.

The direct product ∏_{i∈I} X_i satisfies the following universal property. If Y is a module and there exist homomorphisms f_i : Y → X_i for all i ∈ I, then there exists a unique homomorphism φ : Y → ∏_{i∈I} X_i satisfying p_i φ = f_i for all i ∈ I.

The direct product is often referred to as the complete direct sum, or the strong direct sum, or simply the product.

Compare this to the direct sum of modules. Version: 3 Owner: antizeus Author(s): antizeus
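The componentwise operations and the maps p_j and λ_j can be made concrete in a small finite case. The following sketch (our own example, not part of the original entry) uses the Z-modules Z/2Z and Z/3Z:

```python
# Direct product of the Z-modules Z/2Z and Z/3Z, modelled as pairs.
# Illustrative only; the index set here is just {0, 1}.

def add(u, v):
    """Componentwise addition: (x_i) + (y_i) = (x_i + y_i)."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 3)

def scale(r, u):
    """Componentwise scalar multiplication: r(x_i) = (r x_i)."""
    return ((r * u[0]) % 2, (r * u[1]) % 3)

def p0(u):
    """Projection onto the first factor: (x_i) -> x_0."""
    return u[0]

def lambda0(x):
    """Injection of the first factor: every other coordinate is zero."""
    return (x % 2, 0)

# p0 composed with lambda0 is the identity on Z/2Z.
assert all(p0(lambda0(x)) == x % 2 for x in range(2))
```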

242.2

direct sum

Let {X_i : i ∈ I} be a collection of modules in some category of modules. Then the direct sum ⊕_{i∈I} X_i of that collection is the submodule of the direct product of the X_i consisting of all elements (x_i) such that all but a finite number of the x_i are zero.

For each j ∈ I we have a projection p_j : ⊕_{i∈I} X_i → X_j defined by (x_i) ↦ x_j, and an injection λ_j : X_j → ⊕_{i∈I} X_i, where an element x_j of X_j maps to the element of ⊕_{i∈I} X_i whose jth term is x_j and every other term is zero.

The direct sum ⊕_{i∈I} X_i satisfies the following universal property. If Y is a module and there exist homomorphisms f_i : X_i → Y for all i ∈ I, then there exists a unique homomorphism φ : ⊕_{i∈I} X_i → Y satisfying φλ_i = f_i for all i ∈ I.

The direct sum is often referred to as the weak direct sum or simply the sum. Compare this to the direct product of modules. Version: 3 Owner: antizeus Author(s): antizeus

242.3

exact sequence

If we have two homomorphisms f : A → B and g : B → C in some category of modules, then we say that f and g are exact at B if the image of f is equal to the kernel of g.

A sequence of homomorphisms

· · · → A_{n+1} → A_n → A_{n−1} → · · ·  (with maps f_{n+1} : A_{n+1} → A_n and f_n : A_n → A_{n−1})

is said to be exact if each pair of adjacent homomorphisms (f_{n+1}, f_n) is exact – in other words, if im f_{n+1} = ker f_n for all n.

Compare this to the notion of a chain complex. Version: 2 Owner: antizeus Author(s): antizeus
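For a finite sanity check of the definition (maps and moduli chosen by us), the sequence 0 → Z/2Z → Z/4Z → Z/2Z → 0 with f(x) = 2x and g(y) = y mod 2 is exact at Z/4Z:

```python
# Check im f = ker g at the middle term of 0 -> Z/2 -> Z/4 -> Z/2 -> 0.
# Illustrative example; the maps f and g are our choices.

def f(x):
    """f : Z/2 -> Z/4, multiplication by 2."""
    return (2 * x) % 4

def g(y):
    """g : Z/4 -> Z/2, reduction mod 2."""
    return y % 2

image_f = {f(x) for x in range(2)}
kernel_g = {y for y in range(4) if g(y) == 0}

assert image_f == kernel_g == {0, 2}   # exact at Z/4
```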

242.4

quotient ring

Definition. Let R be a ring and let I be a two-sided ideal of R. To define the quotient ring R/I, let us first define an equivalence relation in R. We say that the elements a, b ∈ R are equivalent, written a ∼ b, if and only if a − b ∈ I. If a is an element of R, we denote the corresponding equivalence class by [a]. Thus [a] = [b] if and only if a − b ∈ I. The quotient ring of R modulo I is the set R/I = {[a] | a ∈ R}, with a ring structure defined as follows. If [a], [b] are equivalence classes in R/I, then

• [a] + [b] := [a + b],
• [a] · [b] := [a · b].

Here a and b are some elements in R that represent [a] and [b]. By construction, every element in R/I has such a representative in R. Moreover, since I is closed under addition and multiplication, one can verify that the ring structure in R/I is well defined.

Properties. 1. If R is commutative, then R/I is commutative.

Examples. 1. For any ring R, we have R/R = {0} and R/{0} = R. 2. Let R = Z, and let I be the set of even numbers. Then R/I contains only two classes: one for the even numbers, and one for the odd numbers.

Version: 3 Owner: matte Author(s): matte, djao
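The construction can be traced computationally in the second example, R = Z with I the set of even numbers. The sketch below (representative-based, names ours) checks that the operations do not depend on the chosen representatives:

```python
# Quotient ring Z/2Z, with classes represented by residues mod 2.
# Illustrative only; cls(a) plays the role of [a].

def cls(a, n=2):
    """Equivalence class [a] in Z/nZ: a ~ b iff a - b is in nZ."""
    return a % n

def add_cls(a, b, n=2):
    return cls(a + b, n)    # [a] + [b] := [a + b]

def mul_cls(a, b, n=2):
    return cls(a * b, n)    # [a] · [b] := [a · b]

# Well-definedness: different representatives of the odd class agree.
assert add_cls(1, 1) == add_cls(3, 5) == 0
assert mul_cls(1, 1) == mul_cls(3, 3) == 1
```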


Chapter 243 16D10 – General module theory

243.1

annihilator

Let R be a ring.

Suppose that M is a left R-module. If X is a subset of M, then we define the left annihilator of X in R:

l.ann(X) = {r ∈ R | rx = 0 for all x ∈ X}.

If Z is a subset of R, then we define the right annihilator of Z in M:

r.ann_M(Z) = {m ∈ M | zm = 0 for all z ∈ Z}.

Suppose that N is a right R-module. If Y is a subset of N, then we define the right annihilator of Y in R:

r.ann(Y) = {r ∈ R | yr = 0 for all y ∈ Y}.

If Z is a subset of R, then we define the left annihilator of Z in N:

l.ann_N(Z) = {n ∈ N | nz = 0 for all z ∈ Z}.

Version: 3 Owner: antizeus Author(s): antizeus
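As a small illustration (our own finite example), take R = Z acting on the module Z/6Z; since Z is commutative, left and right annihilators coincide, and they can be computed by brute force over a range of scalar representatives:

```python
# Annihilators in the Z-module Z/6Z, computed from residues.
# Illustrative helper; the function name is ours.

def left_ann(X, n=6, scalars=range(12)):
    """l.ann(X) = { r : r*x = 0 in Z/nZ for all x in X }, as residues mod n."""
    return {r % n for r in scalars if all((r * x) % n == 0 for x in X)}

# The annihilator of the whole module Z/6Z is 6Z, i.e. the residue 0:
assert left_ann(range(6)) == {0}
# The annihilator of the submodule {0, 3} = 3Z/6Z is 2Z, i.e. the even residues:
assert left_ann({0, 3}) == {0, 2, 4}
```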

243.2

annihilator is an ideal

The right annihilator of a right R-module M_R in R is an ideal.

By the distributive law for modules, it is easy to see that r.ann(M_R) is closed under addition and right multiplication. Now take x ∈ r.ann(M_R) and r ∈ R. Take any m ∈ M_R. Then mr ∈ M_R, but then (mr)x = 0 since x ∈ r.ann(M_R). So m(rx) = 0 and rx ∈ r.ann(M_R).

An equivalent result holds for left annihilators. Version: 2 Owner: saforres Author(s): saforres

243.3

artinian

A module M is artinian if it satisfies the following equivalent conditions:

• the descending chain condition holds for submodules of M;
• every nonempty family of submodules of M has a minimal element.

A ring R is left artinian if it is artinian as a left module over itself, right artinian if it is artinian as a right module over itself, and simply artinian if both conditions hold. Version: 3 Owner: antizeus Author(s): antizeus

243.4

composition series

Let R be a ring and let M be a (right or left) R-module. A series of submodules

M = M0 ⊃ M1 ⊃ M2 ⊃ · · · ⊃ Mn = 0

in which each quotient Mi/Mi+1 is simple is called a composition series for M.

A module need not have a composition series. For example, the ring of integers, Z, considered as a module over itself, does not have a composition series. A necessary and sufficient condition for a module to have a composition series is that it is both noetherian and artinian.

If a module does have a composition series, then all of its composition series have the same length. This length (the number n above) is called the composition length of the module.

If R is a semisimple artinian ring, then R, viewed as a left or as a right module over itself, always has a composition series. Version: 1 Owner: mclase Author(s): mclase
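For instance, the Z-module Z/12Z has composition length 3. The sketch below (helper names ours) builds the chain of subgroups Z/12Z ⊃ 2Z/12Z ⊃ 4Z/12Z ⊃ 0, whose simple quotients are Z/2Z, Z/2Z, Z/3Z:

```python
# Composition series of Z/nZ: the length equals the number of prime
# factors of n counted with multiplicity.  Illustrative encoding.

def prime_factors(n):
    """Prime factors of n with multiplicity, in increasing order."""
    out, p = [], 2
    while n > 1:
        while n % p == 0:
            out.append(p)
            n //= p
        p += 1
    return out

def composition_series(n):
    """Divisors d_0 = 1, d_1, ..., d_k = n encoding the chain
    Z/nZ = d_0·Z/nZ > d_1·Z/nZ > ... > d_k·Z/nZ = 0."""
    divisors = [1]
    for p in prime_factors(n):
        divisors.append(divisors[-1] * p)
    return divisors

series = composition_series(12)       # [1, 2, 4, 12]
assert len(series) - 1 == len(prime_factors(12)) == 3
```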

243.5

conjugate module

If M is a right module over a ring R, and α is an endomorphism of R, we define the conjugate module M^α to be the right R-module whose underlying set is {m^α | m ∈ M}, with abelian group structure identical to that of M (i.e. (m − n)^α = m^α − n^α), and scalar multiplication given by m^α · r = (m · α(r))^α for all m in M and r in R.

In other words, if φ : R → End_Z(M) is the ring homomorphism that describes the right module action of R upon M, then φ ∘ α describes the right module action of R upon M^α.

If N is a left R-module, we define the conjugate module ^αN similarly, with r · ^αn = ^α(α(r) · n). Version: 4 Owner: antizeus Author(s): antizeus

243.6

modular law

Let M be a left R-module with submodules A, B, C, and suppose C ⊆ B. Then

C + (B ∩ A) = B ∩ (C + A).

Version: 1 Owner: saforres Author(s): saforres

243.7

module

Let R be a ring, and let M be an abelian group. We say that M is a left R-module if there exists a ring homomorphism φ : R → End_Z(M) from R to the ring of abelian group endomorphisms on M (in which multiplication of endomorphisms is composition, using left function notation). We typically denote this function using a multiplication notation: [φ(r)](m) = r · m = rm. This ring homomorphism defines what is called a left module action of R upon M.

If R is a unital ring (i.e. a ring with identity), then we typically demand that the ring homomorphism map the unit 1 ∈ R to the identity endomorphism on M, so that 1 · m = m for all m ∈ M. In this case we may say that the module is unital.

Typically the abelian group structure on M is expressed in additive terms, i.e. with operator +, identity element 0_M (or just 0), and inverses written in the form −m for m ∈ M.

Right module actions are defined similarly, only with the elements of R being written on the right sides of elements of M. In this case we either need to use an anti-homomorphism R → EndZ (M), or switch to right notation for writing functions. Version: 7 Owner: antizeus Author(s): antizeus
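The definition via a ring homomorphism into End_Z(M) can be checked directly in a tiny case (example and names ours): the action of Z on the abelian group Z/5Z.

```python
# The left Z-module action on Z/5Z, given by phi : Z -> End(Z/5Z),
# phi(r) = (m -> r*m mod 5).  Illustrative only.

def phi(r):
    """The endomorphism of Z/5Z given by multiplication by r."""
    return lambda m: (r * m) % 5

# phi is a ring homomorphism (addition pointwise, multiplication by
# composition), and the module is unital: phi(1) is the identity.
for r in range(5):
    for s in range(5):
        for m in range(5):
            assert phi(r + s)(m) == (phi(r)(m) + phi(s)(m)) % 5
            assert phi(r * s)(m) == phi(r)(phi(s)(m))
assert all(phi(1)(m) == m for m in range(5))
```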

243.8

proof of modular law

First we show C + (B ∩ A) ⊆ B ∩ (C + A):

Note that C ⊆ B and B ∩ A ⊆ B, and therefore C + (B ∩ A) ⊆ B. Further, C ⊆ C + A and B ∩ A ⊆ C + A, thus C + (B ∩ A) ⊆ C + A.

Next we show B ∩ (C + A) ⊆ C + (B ∩ A):

Let b ∈ B ∩ (C + A). Then b = c + a for some c ∈ C and a ∈ A. Hence a = b − c, and so a ∈ B since b ∈ B and c ∈ C ⊆ B. Hence a ∈ B ∩ A, so b = c + a ∈ C + (B ∩ A). Version: 5 Owner: saforres Author(s): saforres
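The modular law can also be verified exhaustively in a small case (example ours): the submodules of the Z-module Z/12Z are the subgroups dZ/12Z for d dividing 12, and the identity holds for every triple with C ⊆ B:

```python
# Exhaustive check of C + (B ∩ A) = B ∩ (C + A) for subgroups of Z/12Z.
# Illustrative; subgroups are encoded as frozensets of residues.

def subgroup(d, n=12):
    """The subgroup dZ/nZ of Z/nZ."""
    return frozenset(range(0, n, d))

def plus(X, Y, n=12):
    """Sum of two subgroups inside Z/nZ."""
    return frozenset((x + y) % n for x in X for y in Y)

subs = [subgroup(d) for d in (1, 2, 3, 4, 6, 12)]
for A in subs:
    for B in subs:
        for C in subs:
            if C <= B:                 # the hypothesis C ⊆ B
                assert plus(C, B & A) == B & plus(C, A)
```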

243.9

zero module

Let R be a ring. The abelian group which contains only an identity element (zero) gains a trivial R-module structure, which we call the zero module. Every R-module M has a zero element and thus a submodule consisting of that element. This is called the zero submodule of M. Version: 2 Owner: antizeus Author(s): antizeus


Chapter 244 16D20 – Bimodules

244.1

bimodule

Suppose that R and S are rings. An (R, S)-bimodule is an abelian group M which has a left R-module action as well as a right S-module action, which satisfy the relation r(ms) = (rm)s for every choice of elements r of R, s of S, and m of M. An (R, S)-sub-bimodule of M is a subgroup which is also a left R-submodule and a right S-submodule. Version: 3 Owner: antizeus Author(s): antizeus


Chapter 245 16D25 – Ideals

245.1

associated prime

Let R be a ring, and let M be an R-module. A prime ideal P of R is an annihilator prime for M if P is equal to the annihilator of some nonzero submodule X of M. Note that if this is the case, then the submodule ann_M(P) contains X, has P as its annihilator, and is a faithful (R/P)-module. If, in addition, P is equal to the annihilator of a submodule of M that is a fully faithful (R/P)-module, then we call P an associated prime of M. Version: 2 Owner: antizeus Author(s): antizeus

245.2

nilpotent ideal

A left (right) ideal I of a ring R is a nilpotent ideal if I^n = 0 for some positive integer n. Here I^n denotes the product of ideals I · I · · · I (with n factors). Version: 2 Owner: antizeus Author(s): antizeus
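A concrete case (our example): in R = Z/8Z, the ideal I = (2) satisfies I^3 = 0.

```python
# Powers of the ideal (2) in Z/8Z.  Illustrative; since Z/8Z is
# commutative, the set of pairwise products already determines I^k here.

I = {x for x in range(8) if x % 2 == 0}          # the ideal (2) in Z/8Z

def ideal_product(A, B, n=8):
    """Set of products a*b mod n for a in A, b in B."""
    return {(a * b) % n for a in A for b in B}

I2 = ideal_product(I, I)      # I^2 = (4) = {0, 4}
I3 = ideal_product(I2, I)     # I^3 = (8) = {0}
assert I3 == {0}
```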

245.3

primitive ideal

Let R be a ring, and let I be an ideal of R. We say that I is a left (right) primitive ideal if there exists a simple left (right) R-module X such that I is the annihilator of X in R. We say that R is a left (right) primitive ring if the zero ideal is a left (right) primitive ideal of R.

Note that I is a left (right) primitive ideal if and only if R/I is a left (right) primitive ring. Version: 2 Owner: antizeus Author(s): antizeus

245.4

product of ideals

Let R be a ring, and let A and B be left (right) ideals of R. Then the product of the ideals A and B, which we denote AB, is the left (right) ideal generated by the products {ab | a ∈ A, b ∈ B}. Version: 2 Owner: antizeus Author(s): antizeus

245.5

proper ideal

Suppose R is a ring and I is an ideal of R. We say that I is a proper ideal if I is not equal to R. Version: 2 Owner: antizeus Author(s): antizeus

245.6

semiprime ideal

Let R be a ring. An ideal I of R is a semiprime ideal if it satisfies the following equivalent conditions:

(a) I can be expressed as an intersection of prime ideals of R;
(b) if x ∈ R, and xRx ⊂ I, then x ∈ I;
(c) if J is a two-sided ideal of R and J^2 ⊂ I, then J ⊂ I as well;
(d) if J is a left ideal of R and J^2 ⊂ I, then J ⊂ I as well;
(e) if J is a right ideal of R and J^2 ⊂ I, then J ⊂ I as well.

Here J^2 is the product of ideals J · J.

The ring R itself satisfies all of these conditions (including being expressed as an intersection of an empty family of prime ideals) and is thus semiprime. A ring R is said to be a semiprime ring if its zero ideal is a semiprime ideal.

Note that an ideal I of R is semiprime if and only if the quotient ring R/I is a semiprime ring. Version: 7 Owner: antizeus Author(s): antizeus

245.7

zero ideal

In any ring, the set consisting only of the zero element (i.e. the additive identity) is an ideal of the left, right, and two-sided varieties. It is the smallest ideal in any ring. Version: 2 Owner: antizeus Author(s): antizeus


Chapter 246 16D40 – Free, projective, and flat modules and ideals

246.1

finitely generated projective module

Let R be a unital ring. A finitely generated projective right R-module is one of the form eR^n, n ∈ N, where e is an idempotent in End_R(R^n).

Let A be a unital C*-algebra and p be a projection in End_A(A^n), n ∈ N. Then E = pA^n is a finitely generated projective right A-module. Further, E is a pre-Hilbert A-module with the (A-valued) inner product

⟨u, v⟩ = Σ_{i=1}^{n} u_i* v_i,   u, v ∈ E.

Version: 3 Owner: mhale Author(s): mhale

246.2

flat module

A right module M over a ring R is flat if the tensor product functor M ⊗R (−) is an exact functor. Similarly, a left module N over R is flat if the tensor product functor (−) ⊗R N is an exact functor. Version: 2 Owner: antizeus Author(s): antizeus


246.3

free module

Let R be a commutative ring. A free module over R is a direct sum of copies of R. In particular, as every abelian group is a Z-module, a free abelian group is a direct sum of copies of Z. This is equivalent to saying that the module has a free basis, i.e. a set of elements with the property that every element of the module can be uniquely expressed as a linear combination over R of elements of the free basis.

In the case that a free module over R is a sum of finitely many copies of R, the number of copies is called the rank of the free module.

An alternative definition of a free module is via its universal property: Given a set X, the free R-module F(X) on the set X is equipped with a function i : X → F(X) satisfying the property that for any other R-module A and any function f : X → A, there exists a unique R-module map h : F(X) → A such that h ∘ i = f. Version: 4 Owner: mathcam Author(s): mathcam, antizeus
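The universal property can be sketched in code for the free Z-module on a two-element set (all names here are ours): any function f : X → Z extends uniquely to a linear map h with h ∘ i = f.

```python
# Free Z-module on X = {'a', 'b'}, realized as Z^2.
# Illustrative encoding of the universal property.

def i(x):
    """Insertion of basis elements: X -> F(X) = Z^2."""
    return {'a': (1, 0), 'b': (0, 1)}[x]

def make_h(f):
    """The unique Z-linear extension of f : X -> Z (here A = Z)."""
    def h(v):
        return v[0] * f('a') + v[1] * f('b')
    return h

f = {'a': 5, 'b': -2}.get
h = make_h(f)
assert h(i('a')) == 5 and h(i('b')) == -2      # h ∘ i = f
assert h((2, 3)) == 2 * 5 + 3 * (-2)           # linearity forces h((2,3)) = 4
```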

246.4

free module

Let R be a ring. A free module over R is a direct sum of copies of R. Similarly, as an abelian group is simply a module over Z, a free abelian group is a direct sum of copies of Z. This is equivalent to saying that the module has a free basis, i.e. a set of elements with the property that every element of the module can be uniquely expressed as a linear combination over R of elements of the free basis. Version: 1 Owner: antizeus Author(s): antizeus

246.5

projective cover

Let X and P be modules. We say that P is a projective cover of X if P is a projective module and there exists an epimorphism p : P → X such that ker p is a superfluous submodule of P.

Equivalently, P is a projective cover of X if P is projective, there is an epimorphism p : P → X, and whenever g : P′ → X is an epimorphism from a projective module P′ to X, there exists an epimorphism h : P′ → P such that ph = g.

Version: 2 Owner: antizeus Author(s): antizeus

246.6

projective module

A module P is projective if it satisfies the following equivalent conditions:

(a) Every short exact sequence of the form 0 → A → B → P → 0 is split;
(b) The functor Hom(P, −) is exact;
(c) If f : X → Y is an epimorphism and there exists a homomorphism g : P → Y, then there exists a homomorphism h : P → X such that fh = g;
(d) The module P is a direct summand of a free module.

Version: 3 Owner: antizeus Author(s): antizeus
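Condition (d) can be illustrated concretely (our own example): an idempotent 2×2 integer matrix e splits the free module Z^2 into the direct summands e·Z^2 and (1 − e)·Z^2, each of which is therefore projective.

```python
# An idempotent endomorphism of Z^2 and the resulting direct-sum
# decomposition.  Illustrative; e is our chosen idempotent.

def mat_vec(M, v):
    """Apply a 2x2 integer matrix to a vector in Z^2."""
    return tuple(sum(M[r][c] * v[c] for c in range(2)) for r in range(2))

e = [[1, 0], [0, 0]]           # e^2 = e
one_minus_e = [[0, 0], [0, 1]]

v = (7, -3)
ev, wv = mat_vec(e, v), mat_vec(one_minus_e, v)
# Every vector decomposes as ev + wv with ev in e·Z^2, wv in (1-e)·Z^2:
assert tuple(a + b for a, b in zip(ev, wv)) == v
# e is idempotent, so it fixes its own image:
assert mat_vec(e, ev) == ev
```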


Chapter 247 16D50 – Injective modules, self-injective rings

247.1

injective hull

Let X and Q be modules. We say that Q is an injective hull or injective envelope of X if Q is both an injective module and an essential extension of X.

Equivalently, Q is an injective hull of X if Q is injective, X is a submodule of Q, and whenever g : X → Q′ is a monomorphism from X to an injective module Q′, there exists a monomorphism h : Q → Q′ such that h(x) = g(x) for all x ∈ X.

Version: 2 Owner: antizeus Author(s): antizeus

247.2

injective module

A module Q is injective if it satisfies the following equivalent conditions:

(a) Every short exact sequence of the form 0 → Q → B → C → 0 is split;
(b) The functor Hom(−, Q) is exact;
(c) If f : X → Y is a monomorphism and there exists a homomorphism g : X → Q, then there exists a homomorphism h : Y → Q such that hf = g.

Version: 3 Owner: antizeus Author(s): antizeus

Chapter 248 16D60 – Simple and semisimple modules, primitive rings and ideals

248.1

central simple algebra

Let K be a field. A central simple algebra A (over K) is an algebra A over K, which is finite dimensional as a vector space over K, such that

• A has an identity element, as a ring;
• A is central: the center of A equals K (for all z ∈ A, we have z · a = a · z for all a ∈ A if and only if z ∈ K);
• A is simple: for any two-sided ideal I of A, either I = {0} or I = A.

By a theorem of Wedderburn, for every central simple algebra A over K, there exists a unique (up to isomorphism) division ring D containing K and a unique natural number n such that A is isomorphic to the ring of n × n matrices with coefficients in D. Version: 2 Owner: djao Author(s): djao

248.2

completely reducible

A module M is called completely reducible (or semisimple) if it is a direct sum of irreducible (or simple) modules. Version: 1 Owner: bwebste Author(s): bwebste


248.3

simple ring

A nonzero ring R is said to be a simple ring if it has no (two-sided) ideals other than the zero ideal and R itself. This is equivalent to saying that the zero ideal is a maximal ideal. If R is a commutative ring with unit, then this is equivalent to R being a field. Version: 4 Owner: antizeus Author(s): antizeus


Chapter 249 16D80 – Other classes of modules and ideals

249.1

essential submodule

Let X be a submodule of a module Y. We say that X is an essential submodule of Y, and that Y is an essential extension of X, if whenever A is a nonzero submodule of Y, then A ∩ X is also nonzero. A monomorphism f : X → Y is an essential monomorphism if the image im f is an essential submodule of Y. Version: 2 Owner: antizeus Author(s): antizeus

249.2

faithful module

Let R be a ring, and let M be an R-module. We say that M is a faithful R-module if its annihilator annR (M) is the zero ideal. We say that M is a fully faithful R-module if every nonzero R-submodule of M is faithful. Version: 3 Owner: antizeus Author(s): antizeus

1106

249.3

minimal prime ideal

A prime ideal P of a ring R is called a minimal prime ideal if it does not properly contain any other prime ideal of R. If R is a prime ring, then the zero ideal is a prime ideal, and is thus the unique minimal prime ideal of R. Version: 2 Owner: antizeus Author(s): antizeus

249.4

module of finite rank

Let M be a module, and let E(M) be the injective hull of M. Then we say that M has finite rank if E(M) is a finite direct sum of indecomposable submodules. This turns out to be equivalent to the property that M has no infinite direct sums of nonzero submodules. Version: 3 Owner: antizeus Author(s): antizeus

249.5

simple module

Let R be a ring, and let M be an R-module. We say that M is a simple or irreducible module if it contains no submodules other than itself and the zero module. Version: 2 Owner: antizeus Author(s): antizeus

249.6

superfluous submodule

Let X be a submodule of a module Y . We say that X is a superfluous submodule of Y if whenever A is a submodule of Y such that A + X = Y , then A = Y . Version: 2 Owner: antizeus Author(s): antizeus


249.7

uniform module

A module M is said to be uniform if any two nonzero submodules of M must have a nonzero intersection. This is equivalent to saying that any nonzero submodule is an essential submodule. Version: 3 Owner: antizeus Author(s): antizeus


Chapter 250 16E05 – Syzygies, resolutions, complexes

250.1

n-chain

An n-chain on a topological space X is a finite formal sum of n-simplices on X. The group of such chains is denoted C_n(X). For a CW-complex Y, C_n(Y) = H_n(Y^n, Y^{n−1}), where H_n denotes the nth (relative) homology group and Y^n is the n-skeleton. The boundary of an n-chain is the (n − 1)-chain given by the formal sum of the boundaries of its constituent simplices. An n-chain is closed if its boundary is 0 and exact if it is the boundary of some (n + 1)-chain. Version: 3 Owner: mathcam Author(s): mathcam

250.2

chain complex

A sequence of modules and homomorphisms

· · · → A_{n+1} → A_n → A_{n−1} → · · ·  (with maps d_{n+1} : A_{n+1} → A_n and d_n : A_n → A_{n−1})

is said to be a chain complex, or complex, if each pair of adjacent homomorphisms (d_{n+1}, d_n) satisfies the relation d_n d_{n+1} = 0. This is equivalent to saying that im d_{n+1} ⊂ ker d_n. We often denote such a complex as (A, d) or simply A.

Compare this to the notion of an exact sequence. Version: 4 Owner: antizeus Author(s): antizeus
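A minimal concrete complex (our encoding): the simplicial boundary maps of a triangle, written as integer matrices, satisfy d_1 d_2 = 0.

```python
# Boundary maps of a 2-simplex with vertices 0, 1, 2 and edges
# [01], [02], [12].  Illustrative; the sign conventions are standard.

# d2 : C_2 -> C_1 sends the triangle to [01] - [02] + [12].
d2 = [[1], [-1], [1]]

# d1 : C_1 -> C_0 sends each edge to head - tail on the vertices.
d1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]

def mat_mul(A, B):
    """Multiply integer matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

assert mat_mul(d1, d2) == [[0], [0], [0]]   # d ∘ d = 0
```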


250.3

flat resolution

Let M be a module. A flat resolution of M is an exact sequence of the form · · · → Fn → Fn−1 → · · · → F1 → F0 → M → 0 where each Fn is a flat module. Version: 2 Owner: antizeus Author(s): antizeus

250.4

free resolution

Let M be a module. A free resolution of M is an exact sequence of the form · · · → Fn → Fn−1 → · · · → F1 → F0 → M → 0 where each Fn is a free module. Version: 2 Owner: antizeus Author(s): antizeus

250.5

injective resolution

Let M be a module. An injective resolution of M is an exact sequence of the form 0 → M → Q0 → Q1 → · · · → Qn−1 → Qn → · · · where each Qn is an injective module. Version: 2 Owner: antizeus Author(s): antizeus

250.6

projective resolution

Let M be a module. A projective resolution of M is an exact sequence of the form · · · → Pn → Pn−1 → · · · → P1 → P0 → M → 0 where each Pn is a projective module. Version: 2 Owner: antizeus Author(s): antizeus


250.7

short exact sequence

A short exact sequence is an exact sequence of the form 0 → A → B → C → 0. Note that in this case, the homomorphism A → B must be a monomorphism, and the homomorphism B → C must be an epimorphism. Version: 2 Owner: antizeus Author(s): antizeus

250.8

split short exact sequence

In an abelian category, a short exact sequence 0 → A —f→ B —g→ C → 0 is split if it satisfies the following equivalent conditions:

(a) there exists a homomorphism h : C → B such that gh = 1_C;
(b) there exists a homomorphism j : B → A such that jf = 1_A;
(c) B is isomorphic to the direct sum A ⊕ C.

In this case, we say that h and j are backmaps or splitting backmaps. Version: 4 Owner: antizeus Author(s): antizeus

250.9

von Neumann regular

An element a of a ring R is said to be von Neumann regular if there exists b ∈ R such that aba = a. A ring R is said to be a von Neumann regular ring (or simply a regular ring, if the meaning is clear from context) if every element of R is von Neumann regular. Note that regular ring in the sense of von Neumann should not be confused with regular ring in the sense of commutative algebra. Version: 1 Owner: igor Author(s): igor


Chapter 251 16K20 – Finite-dimensional

251.1

quaternion algebra

A quaternion algebra over a field K is a central simple algebra over K which is four dimensional as a vector space over K.

Examples:

• For any field K, the ring M_{2×2}(K) of 2 × 2 matrices with entries in K is a quaternion algebra over K. If K is algebraically closed, then all quaternion algebras over K are isomorphic to M_{2×2}(K).
• For K = R, the well known algebra H of Hamiltonian quaternions is a quaternion algebra over R. The two algebras H and M_{2×2}(R) are the only quaternion algebras over R, up to isomorphism.
• When K is a number field, there are infinitely many non-isomorphic quaternion algebras over K. In fact, there is one such quaternion algebra for every even sized finite collection of finite primes or real primes of K. The proof of this deep fact leads to many of the major results of class field theory.

Version: 1 Owner: djao Author(s): djao
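The multiplication in H can be written out explicitly (standard formulas; the encoding as 4-tuples is ours):

```python
# Multiplication in the Hamilton quaternions H, with basis 1, i, j, k.
# A quaternion a + bi + cj + dk is encoded as the tuple (a, b, c, d).

def qmul(p, q):
    """Product of two quaternions, using i^2 = j^2 = k^2 = ijk = -1."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(i, i) == qmul(j, j) == qmul(k, k) == (-1, 0, 0, 0)
assert qmul(i, j) == k and qmul(j, i) == (0, 0, 0, -1)   # non-commutative
```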


Chapter 252 16K50 – Brauer groups

252.1

Brauer group

Let K be a field. The Brauer group Br(K) of K is the set of all equivalence classes of central simple algebras over K, where two central simple algebras A and B are equivalent if there exists a division ring D over K and natural numbers n, m such that A (resp. B) is isomorphic to the ring of n × n (resp. m × m) matrices with coefficients in D. The group operation in Br(K) is given by tensor product: for any two central simple algebras A, B over K, their product in Br(K) is the central simple algebra A ⊗K B. The identity element in Br(K) is the class of K itself, and the inverse of a central simple algebra A is the opposite algebra Aopp defined by reversing the order of the multiplication operation of A. Version: 5 Owner: djao Author(s): djao


Chapter 253 16K99 – Miscellaneous

253.1

division ring

A division ring is a ring D with identity such that

• 1 ≠ 0;
• for all nonzero a ∈ D, there exists b ∈ D with a · b = b · a = 1.

A field is precisely a commutative division ring. Version: 3 Owner: djao Author(s): djao


Chapter 254 16N20 – Jacobson radical, quasimultiplication

254.1

Jacobson radical

The Jacobson radical J(R) of a ring R is the intersection of the annihilators of irreducible left R-modules.

The following are alternate characterizations of the Jacobson radical J(R):

1. The intersection of all left primitive ideals.
2. The intersection of all maximal left ideals.
3. The set of all t ∈ R such that for all r ∈ R, 1 − rt is left invertible (i.e. there exists u such that u(1 − rt) = 1).
4. The largest ideal I such that for all v ∈ I, 1 − v is a unit in R.
5. (1) – (3) with “left” replaced by “right” and rt replaced by tr.

Note that if R is commutative and finitely generated, then

J(R) = {x ∈ R | x^n = 0 for some n ∈ N} = Nil(R).

Version: 13 Owner: saforres Author(s): saforres
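Characterizations 2) and 3) can be checked against each other in a small finite ring (example ours), R = Z/12Z, where the maximal ideals are 2R and 3R and J(R) = 6R:

```python
# J(Z/12Z) via characterization 2), cross-checked against 3).
# Illustrative; ideals are encoded as sets of residues.

from math import gcd

n = 12
maximal_ideals = [
    {x for x in range(n) if x % p == 0}    # pZ/12Z for the primes p | 12
    for p in (2, 3)
]

J = set(range(n))
for M in maximal_ideals:
    J &= M                                 # intersection of maximal ideals

assert J == {0, 6}                         # J(Z/12Z) = 6Z/12Z

# Characterization 3): 1 - r*t is a unit mod 12 for all t in J, r in R.
assert all(gcd((1 - r * t) % n, n) == 1 for t in J for r in range(n))
```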


254.2

a ring modulo its Jacobson radical is semiprimitive

Let R be a ring. Then J(R/J(R)) = (0).

Let [u] ∈ J(R/J(R)). By one of the alternate characterizations of the Jacobson radical, 1 − [r][u] is left invertible for all r ∈ R, so there exists v ∈ R such that [v](1 − [r][u]) = 1. Then v(1 − ru) = 1 − a for some a ∈ J(R). There is w ∈ R with w(1 − a) = 1, so wv(1 − ru) = 1 and 1 − ru is left invertible. Since this holds for all r ∈ R, we get u ∈ J(R), and hence [u] = 0. Version: 3 Owner: saforres Author(s): saforres

254.3

examples of semiprimitive rings

Examples of semiprimitive rings:

The integers Z: Since Z is commutative, any left ideal is two-sided. So the maximal left ideals of Z are the maximal ideals of Z, which are the ideals pZ for p prime. Since every nonzero integer has only finitely many prime divisors, the intersection of the ideals pZ over all primes p is (0). Hence J(Z) = ∩_p pZ = (0).

A matrix ring M_n(D) over a division ring D: The ring M_n(D) is simple, so the only proper ideal is (0). Thus J(M_n(D)) = (0).

A polynomial ring R[x] over a domain R: Take a ∈ J(R[x]) with a ≠ 0. Then ax ∈ J(R[x]), since J(R[x]) is an ideal, and deg(ax) ≥ 1. By one of the alternate characterizations of the Jacobson radical, 1 − ax is a unit. But deg(1 − ax) = max{deg(1), deg(ax)} ≥ 1, so 1 − ax is not a unit. By this contradiction we see that J(R[x]) = (0). Version: 5 Owner: saforres Author(s): saforres


254.4

proof of Characterizations of the Jacobson radical

First, note that by definition a left primitive ideal is the annihilator of an irreducible left R-module, so clearly characterization 1) is equivalent to the definition of the Jacobson radical.

Next, we will prove cyclical containment. Observe that 5) follows after the equivalence of 1) – 4) is established, since 4) is independent of the choice of left or right ideals.

1) ⊂ 2): We know that every left primitive ideal is the largest ideal contained in a maximal left ideal. So the intersection of all left primitive ideals will be contained in the intersection of all maximal left ideals.

2) ⊂ 3): Let S = {M : M a maximal left ideal of R} and take r ∈ R. Let t ∈ ∩_{M∈S} M. Then rt ∈ ∩_{M∈S} M. Assume 1 − rt is not left invertible; then there exists a maximal left ideal M0 of R such that R(1 − rt) ⊆ M0. Note then that 1 − rt ∈ M0. Also, by definition of t, we have rt ∈ M0. Therefore 1 ∈ M0; this contradiction implies 1 − rt is left invertible.

3) ⊂ 4): We claim that 3) satisfies the condition of 4). Let K = {t ∈ R : 1 − rt is left invertible for all r ∈ R}. We shall first show that K is an ideal.

Clearly if t ∈ K, then rt ∈ K. If t1, t2 ∈ K, then 1 − r(t1 + t2) = (1 − rt1) − rt2. Now there exists u1 such that u1(1 − rt1) = 1, hence u1((1 − rt1) − rt2) = 1 − u1rt2. Similarly, there exists u2 such that u2(1 − u1rt2) = 1, therefore u2u1(1 − r(t1 + t2)) = 1. Hence t1 + t2 ∈ K.

Now if t ∈ K and r ∈ R, to show that tr ∈ K it suffices to show that 1 − tr is left invertible. Suppose u(1 − rt) = 1, hence u − urt = 1, then tur − turtr = tr. So (1 + tur)(1 − tr) = 1 + tur − tr − turtr = 1. Therefore K is an ideal.

Now let v ∈ K. Then there exists u such that u(1 − v) = 1, hence 1 − u = −uv ∈ K, so u = 1 − (1 − u) is left invertible. So there exists w such that wu = 1, hence wu(1 − v) = w, then 1 − v = w. Thus (1 − v)u = 1, and therefore 1 − v is a unit.

Let J be the largest ideal such that, for all v ∈ J, 1 − v is a unit. We claim that K ⊆ J. Suppose this were not true; in this case K + J strictly contains J. Consider rx + sy ∈ K + J with x ∈ K, y ∈ J and r, s ∈ R. Now 1 − (rx + sy) = (1 − rx) − sy, and since rx ∈ K, 1 − rx = u for some unit u ∈ R. So 1 − (rx + sy) = u − sy = u(1 − u^{-1}sy), and clearly u^{-1}sy ∈ J since y ∈ J. Hence 1 − u^{-1}sy is also a unit, and thus 1 − (rx + sy) is a unit. Thus 1 − v is a unit for all v ∈ K + J. But this contradicts the assumption that J is the largest such ideal. So we must have K ⊆ J.

4) ⊂ 1): We must show that if I is an ideal such that for all u ∈ I, 1 − u is a unit, then I ⊂ ann(M) for every irreducible left R-module M. Suppose this is not the case, so there exists an irreducible left R-module M such that I ⊄ ann(M). Now we know that ann(M) is the largest ideal inside some maximal left ideal J of R. Thus we must also have I ⊄ J, or else this would contradict the maximality of ann(M) inside J. But since I ⊄ J, by maximality I + J = R, hence there exist u ∈ I and v ∈ J such that u + v = 1. Then v = 1 − u, so v is a unit and J = R. But since J is a proper left ideal, this is a contradiction.

Version: 25 Owner: saforres Author(s): saforres

254.5

properties of the Jacobson radical

Theorem: Let R, T be rings and ϕ : R → T be a surjective homomorphism. Then ϕ(J(R)) ⊆ J(T).

We shall use the characterization of the Jacobson radical as the set of all a ∈ R such that for all r ∈ R, 1 − ra is left invertible.

Let a ∈ J(R), t ∈ T. We claim that 1 − tϕ(a) is left invertible: Since ϕ is surjective, t = ϕ(r) for some r ∈ R. Since a ∈ J(R), we know 1 − ra is left invertible, so there exists u ∈ R such that u(1 − ra) = 1. Then we have

ϕ(u)(ϕ(1) − ϕ(r)ϕ(a)) = ϕ(u)ϕ(1 − ra) = ϕ(1) = 1.

So ϕ(a) ∈ J(T) as required.

Theorem: Let R, T be rings. Then J(R × T) ⊆ J(R) × J(T).

Let π1 : R × T → R be the (surjective) projection. By the previous theorem, π1(J(R × T)) ⊆ J(R). Similarly let π2 : R × T → T be the (surjective) projection. We see that π2(J(R × T)) ⊆ J(T).

Now take (a, b) ∈ J(R × T). Note that a = π1(a, b) ∈ J(R) and b = π2(a, b) ∈ J(T). Hence (a, b) ∈ J(R) × J(T) as required. Version: 8 Owner: saforres Author(s): saforres

254.6

quasi-regularity

An element x of a ring is called right quasi-regular [resp. left quasi-regular] if there is an element y in the ring such that x + y + xy = 0 [resp. x + y + yx = 0].

For calculations with quasi-regularity, it is useful to introduce the operation ∗ defined by x ∗ y = x + y + xy. Thus x is right quasi-regular if there is an element y such that x ∗ y = 0. The operation ∗ is easily demonstrated to be associative, and x ∗ 0 = 0 ∗ x = x for all x.

An element x is called quasi-regular if it is both left and right quasi-regular. In this case, there are elements y and z such that x + y + xy = 0 = x + z + zx (equivalently, x ∗ y = z ∗ x = 0). A calculation shows that y = 0 ∗ y = (z ∗ x) ∗ y = z ∗ (x ∗ y) = z ∗ 0 = z. So y = z is a unique element, depending on x, called the quasi-inverse of x.

An ideal (one- or two-sided) of a ring is called quasi-regular if each of its elements is quasi-regular. Similarly, a ring is called quasi-regular if each of its elements is quasi-regular (such rings cannot have an identity element).

Lemma 1. Let A be an ideal (one- or two-sided) in a ring R. If each element of A is right quasi-regular, then A is a quasi-regular ideal.

This lemma means that there is no extra generality gained in defining terms such as right quasi-regular left ideal, etc.

Quasi-regularity is important because it provides elementary characterizations of the Jacobson radical for rings without an identity element:

• The Jacobson radical of a ring is the sum of all quasi-regular left (or right) ideals.

• The Jacobson radical of a ring is the largest quasi-regular ideal of the ring.

For rings with an identity element, note that x is [right, left] quasi-regular if and only if 1 + x is [right, left] invertible in the ring.

Version: 1 Owner: mclase Author(s): mclase
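The last remark can be checked by brute force in a small ring. The following Python sketch uses Z/8 (an ad hoc choice) and the identity x ∗ y + 1 = (1 + x)(1 + y):

```python
# Brute-force check in the finite ring Z/8 (an ad hoc choice): an element
# is quasi-regular iff 1 + x is invertible, via x * y + 1 = (1 + x)(1 + y).
n = 8

def star(x, y):
    return (x + y + x * y) % n

quasi_regular = {x for x in range(n)
                 if any(star(x, y) == 0 for y in range(n))}
invertible_1_plus = {x for x in range(n)
                     if any((1 + x) * u % n == 1 for u in range(n))}
assert quasi_regular == invertible_1_plus == {0, 2, 4, 6}
assert star(2, 2) == 0          # the quasi-inverse of 2 in Z/8 is 2 itself
```

Here the quasi-regular elements are exactly the even residues, which also form J(Z/8).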

254.7

semiprimitive ring

A ring R is said to be semiprimitive (sometimes semisimple) if its Jacobson radical is the zero ideal. Any simple ring is automatically semiprimitive. A finite direct product of matrix rings over division rings can be shown to be semiprimitive and both left and right artinian. The Artin-Wedderburn Theorem states that any semiprimitive ring which is left or right Artinian is isomorphic to a finite direct product of matrix rings over division rings. Version: 11 Owner: saforres Author(s): saforres


Chapter 255 16N40 – Nil and nilpotent radicals, sets, ideals, rings 255.1

Koethe conjecture

The Koethe Conjecture is the statement that for any pair of nil right ideals A and B in any ring R, the sum A + B is also nil.

If either of A or B is a two-sided ideal, it is easy to see that A + B is nil. Suppose A is a two-sided ideal, and let x ∈ A + B. The quotient (A + B)/A is nil since it is a homomorphic image of B. So there is an n > 0 with x^n ∈ A. Then there is an m > 0 such that (x^n)^m = x^{nm} = 0, because A is nil. In particular, this means that the Koethe conjecture is true for commutative rings.

It has been shown to be true for many classes of rings, but the general statement is still unproven, and no counterexample has been found.

Version: 1 Owner: mclase Author(s): mclase

255.2

nil and nilpotent ideals

An element x of a ring is nilpotent if x^n = 0 for some positive integer n. A ring R is nil if every element in R is nilpotent. Similarly, a one- or two-sided ideal is called nil if each of its elements is nilpotent.

A ring R [resp. a one- or two-sided ideal A] is nilpotent if R^n = 0 [resp. A^n = 0] for some positive integer n.

A ring or an ideal is locally nilpotent if every finitely generated subring is nilpotent. The following implications hold for rings (or ideals):

nilpotent ⇒ locally nilpotent ⇒ nil

Version: 3 Owner: mclase Author(s): mclase
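A standard concrete instance of a nilpotent ring is the ring of strictly upper triangular matrices. The following Python sketch (plain nested lists; all names ad hoc) checks that for 3×3 strictly upper triangular integer matrices any product of three elements vanishes, so this ring R satisfies R³ = 0:

```python
# Strictly upper triangular 3x3 integer matrices: any product of three of
# them is zero, so this ring R (which has no identity) satisfies R^3 = 0.
import random

def mmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def strict_upper():
    r = lambda: random.randint(-5, 5)
    return [[0, r(), r()], [0, 0, r()], [0, 0, 0]]

random.seed(0)
Z = [[0] * 3 for _ in range(3)]
for _ in range(20):
    a, b, c = strict_upper(), strict_upper(), strict_upper()
    assert mmul(mmul(a, b), c) == Z       # R^3 = 0
```

In particular every element x of this ring satisfies x³ = 0, illustrating nilpotent ⇒ nil.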

Chapter 256 16N60 – Prime and semiprime rings 256.1

prime ring

A ring R is said to be a prime ring if the zero ideal is a prime ideal. If R is commutative, this is equivalent to being an integral domain. Version: 2 Owner: antizeus Author(s): antizeus


Chapter 257 16N80 – General radicals and rings 257.1

prime radical

The prime radical of a ring R is the intersection of all the prime ideals of R. Note that the prime radical is the smallest semiprime ideal of R, and that R is a semiprime ring if and only if its prime radical is the zero ideal. Version: 2 Owner: antizeus Author(s): antizeus
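For a commutative ring the prime radical coincides with the set of nilpotent elements; a brute-force illustration in Z/12 (an ad hoc choice), where the prime ideals are (2) and (3):

```python
# In Z/12 the prime radical (the intersection of the prime ideals (2) and
# (3)) coincides with the set of nilpotent elements -- a small brute-force check.
n = 12
nilpotents = {x for x in range(n)
              if any(pow(x, k, n) == 0 for k in range(1, n + 1))}
ideal2 = {x for x in range(n) if x % 2 == 0}
ideal3 = {x for x in range(n) if x % 3 == 0}
assert nilpotents == ideal2 & ideal3 == {0, 6}
```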

257.2

radical theory

Let x◦ be a property which defines a class of rings, which we will call the x◦-rings. Then x◦ is a radical property if it satisfies:

1. The class of x◦-rings is closed under homomorphic images.
2. Every ring R has a largest ideal in the class of x◦-rings; this ideal is written x◦(R).
3. x◦(R/x◦(R)) = 0.

Note: it is extremely important when interpreting the above definition that your definition of a ring does not require an identity element.

The ideal x◦(R) is called the x◦-radical of R. A ring is called x◦-radical if x◦(R) = R, and is called x◦-semisimple if x◦(R) = 0. If x◦ is a radical property, then the class of x◦-rings is also called the class of x◦-radical rings.

The class of x◦-radical rings is closed under ideal extensions. That is, if A is an ideal of R, and A and R/A are x◦-radical, then so is R.

Radical theory is the study of radical properties and their interrelations. There are several well-known radicals which are of independent interest in ring theory (see examples to follow). The class of all radicals is however very large. Indeed, it is possible to show that any partition of the class of simple rings into two classes, R and S, gives rise to a radical x◦ with the property that all rings in R are x◦-radical and all rings in S are x◦-semisimple.

A radical x◦ is hereditary if every ideal of an x◦-radical ring is also x◦-radical.

A radical x◦ is supernilpotent if the class of x◦-rings contains all nilpotent rings.

Version: 2 Owner: mclase Author(s): mclase


Chapter 258 16P40 – Noetherian rings and modules 258.1

Noetherian ring

A ring R is right noetherian (or left noetherian) if R is noetherian as a right module (or left module), i.e., if the three equivalent conditions hold:

1. right ideals (or left ideals) are finitely generated;
2. the ascending chain condition holds on right ideals (or left ideals);
3. every nonempty family of right ideals (or left ideals) has a maximal element.

We say that R is noetherian if it is both left noetherian and right noetherian.

Examples of noetherian rings include any field (as the only ideals are 0 and the whole ring) and the ring Z of integers (each ideal is generated by a single integer, the greatest common divisor of the elements of the ideal). The Hilbert basis theorem says that a ring R is noetherian iff the polynomial ring R[x] is.

Version: 10 Owner: KimJ Author(s): KimJ
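The claim about Z can be spot-checked numerically; the following Python sketch confirms on a finite window that the ideal (12, 18, 30) equals (6), the gcd of the generators:

```python
# Every ideal of Z is generated by the gcd of its elements; a finite
# spot-check that (12, 18, 30) = (6).
from math import gcd
from functools import reduce

gens = [12, 18, 30]
d = reduce(gcd, gens)
assert d == 6
combos = {12*a + 18*b + 30*c
          for a in range(-3, 4) for b in range(-3, 4) for c in range(-3, 4)}
assert all(x % d == 0 for x in combos)    # every combination lies in (6)
assert d in combos                        # 6 = (-1)*12 + 1*18 + 0*30
```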

258.2

noetherian

A module M is noetherian if it satisfies the following equivalent conditions:


• the ascending chain condition holds for submodules of M;
• every nonempty family of submodules of M has a maximal element;
• every submodule of M is finitely generated.

A ring R is left noetherian if it is noetherian as a left module over itself (i.e. if _R R is a noetherian module), and right noetherian if it is noetherian as a right module over itself (i.e. if R_R is a noetherian module), and simply noetherian if both conditions hold.

Version: 2 Owner: antizeus Author(s): antizeus


Chapter 259 16P60 – Chain conditions on annihilators and summands: Goldie-type conditions , Krull dimension 259.1

Goldie ring

Let R be a ring. If the set of annihilators {r.ann(x) | x ∈ R} satisfies the ascending chain condition, then R is said to satisfy the ascending chain condition on right annihilators.

A ring R is called a right Goldie ring if it satisfies the ascending chain condition on right annihilators and R_R is a module of finite rank. A left Goldie ring is defined similarly. If the context makes it clear on which side the ring operates, then such a ring is simply called a Goldie ring.

A right noetherian ring is right Goldie.

Version: 3 Owner: mclase Author(s): mclase

259.2

uniform dimension

Let M be a module over a ring R, and suppose that M contains no infinite direct sums of non-zero submodules. (This is the same as saying that M is a module of finite rank.)


Then there exists an integer n such that M contains an essential submodule N where N = U_1 ⊕ U_2 ⊕ · · · ⊕ U_n is a direct sum of n uniform submodules. This number n does not depend on the choice of N or the decomposition into uniform submodules.

We call n the uniform dimension of M. Sometimes this is written u-dim M = n.

If R is a field K, and M is a finite-dimensional vector space over K, then u-dim M = dim_K M.

u-dim M = 0 if and only if M = 0.

Version: 3 Owner: mclase Author(s): mclase


Chapter 260 16S10 – Rings determined by universal properties (free algebras, coproducts, adjunction of inverses, etc.) 260.1

Ore domain

Let R be a domain. We say that R is a right Ore domain if any two nonzero elements of R have a nonzero common right multiple, i.e. for every pair of nonzero x and y, there exists a pair of elements r and s of R such that xr = ys ≠ 0.

This condition turns out to be equivalent to the following conditions on R when viewed as a right R-module:

(a) R_R is a uniform module.
(b) R_R is a module of finite rank.

The definition of a left Ore domain is similar.

If R is a commutative domain, then it is a right (and left) Ore domain.

Version: 6 Owner: antizeus Author(s): antizeus


Chapter 261 16S34 – Group rings , Laurent polynomial rings 261.1

support

Let R[G] be the group ring of a group G over a ring R. Let x = Σ_g x_g g be an element of R[G].

The support of x, often written supp(x), is the set of elements of G which occur with non-zero coefficient in the expansion of x. Thus:

supp(x) = {g ∈ G | x_g ≠ 0}.

Version: 2 Owner: mclase Author(s): mclase
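A minimal computational sketch (group ring elements as coefficient dicts over G = Z/5; the data values are ad hoc), which also shows that supports can shrink under addition when coefficients cancel:

```python
# Group ring elements as dicts {g: coefficient} over G = Z/5 (ad hoc data).
def add(x, y):
    out = dict(x)
    for g, c in y.items():
        out[g] = out.get(g, 0) + c
    return out

def supp(x):
    return {g for g, c in x.items() if c != 0}

x = {0: 2, 3: -1}
y = {3: 1, 4: 5}
assert supp(x) == {0, 3}
assert supp(add(x, y)) == {0, 4}          # the g = 3 terms cancel
```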


Chapter 262 16S36 – Ordinary and skew polynomial rings and semigroup rings 262.1

Gaussian polynomials

For an indeterminate u and integers n ≥ m ≥ 0 we define the following:

(a) (m)_u = u^{m−1} + u^{m−2} + · · · + 1 for m > 0,

(b) (m!)_u = (m)_u (m − 1)_u · · · (1)_u for m > 0, and (0!)_u = 1,

(c) \binom{n}{m}_u = (n!)_u / ((m!)_u ((n−m)!)_u). If m > n then we define \binom{n}{m}_u = 0.

The expressions \binom{n}{m}_u are called u-binomial coefficients or Gaussian polynomials.

Note: if we replace u with 1, then we obtain the familiar integers, factorials, and binomial coefficients. Specifically,

(a) (m)_1 = m,

(b) (m!)_1 = m!,

(c) \binom{n}{m}_1 = \binom{n}{m}.

Version: 3 Owner: antizeus Author(s): antizeus
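Gaussian polynomials can be computed without polynomial division via the q-analogue of Pascal's rule, \binom{n}{m}_u = \binom{n-1}{m-1}_u + u^m \binom{n-1}{m}_u (a standard identity, used here without proof). A Python sketch with polynomials as coefficient lists, lowest degree first:

```python
# Gaussian polynomials via the q-Pascal identity
#   qbinom(n, m) = qbinom(n-1, m-1) + u^m * qbinom(n-1, m),
# with polynomials in u as coefficient lists (lowest degree first).
def padd(p, r):
    out = [0] * max(len(p), len(r))
    for i, c in enumerate(p): out[i] += c
    for i, c in enumerate(r): out[i] += c
    return out

def qbinom(n, m):
    if m < 0 or m > n:
        return [0]
    if m == 0 or m == n:
        return [1]
    return padd(qbinom(n - 1, m - 1), [0] * m + qbinom(n - 1, m))

assert qbinom(4, 2) == [1, 1, 2, 1, 1]   # 1 + u + 2u^2 + u^3 + u^4
assert sum(qbinom(5, 2)) == 10           # setting u = 1 recovers binom(5, 2)
```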


262.2

q skew derivation

Let (σ, δ) be a skew derivation on a ring R. Let q be a central (σ, δ)-constant. Suppose further that δσ = q · σδ. Then we say that (σ, δ) is a q-skew derivation. Version: 5 Owner: antizeus Author(s): antizeus

262.3

q skew polynomial ring

If (σ, δ) is a q-skew derivation on R, then we say that the skew polynomial ring R[θ; σ, δ] is a q-skew polynomial ring. Version: 3 Owner: antizeus Author(s): antizeus

262.4

sigma derivation

If σ is a ring endomorphism on a ring R, then a (left) σ-derivation is an additive map δ on R such that δ(x · y) = σ(x) · δ(y) + δ(x) · y for all x, y in R. Version: 7 Owner: antizeus Author(s): antizeus
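A classical example of a σ-derivation is the q-difference operator on a polynomial ring: take σ(p)(x) = p(qx) and δ(p)(x) = (p(qx) − p(x)) / ((q−1)x). The following Python sketch (q = 3 is an arbitrary integer choice so that all arithmetic stays exact) checks the defining identity δ(p·r) = σ(p)·δ(r) + δ(p)·r:

```python
# sigma(p)(x) = p(qx) and the q-difference operator delta on Z[x];
# polynomials are coefficient lists, lowest degree first.  q = 3 is ad hoc.
q = 3

def sigma(p):
    return [c * q**i for i, c in enumerate(p)]

def delta(p):                     # coefficient of x^{i-1} is a_i * [i]_q
    return [p[i] * sum(q**j for j in range(i)) for i in range(1, len(p))]

def pmul(p, r):
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return out

def padd(p, r):
    out = [0] * max(len(p), len(r))
    for i, c in enumerate(p): out[i] += c
    for i, c in enumerate(r): out[i] += c
    return out

p = [1, 2, 0, 5]                  # 1 + 2x + 5x^3
r = [3, 0, 1]                     # 3 + x^2
# Leibniz rule for a sigma-derivation: delta(p*r) = sigma(p)*delta(r) + delta(p)*r
assert delta(pmul(p, r)) == padd(pmul(sigma(p), delta(r)), pmul(delta(p), r))
```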

262.5

sigma, delta constant

If (σ, δ) is a skew derivation on a ring R, then a (σ, δ)-constant is an element q of R such that σ(q) = q and δ(q) = 0. Note: If q is a (σ, δ)-constant, then it follows that σ(q · x) = q · σ(x) and δ(q · x) = q · δ(x) for all x in R. Version: 3 Owner: antizeus Author(s): antizeus

262.6

skew derivation

A (left) skew derivation on a ring R is a pair (σ, δ), where σ is a ring endomorphism of R, and δ is a left σ-derivation on R. Version: 4 Owner: antizeus Author(s): antizeus

262.7

skew polynomial ring

If (σ, δ) is a left skew derivation on R, then we can construct the (left) skew polynomial ring R[θ; σ, δ], which is made up of polynomials in an indeterminate θ and left-hand coefficients from R, with multiplication satisfying the relation θ · r = σ(r) · θ + δ(r) for all r in R. Version: 2 Owner: antizeus Author(s): antizeus


Chapter 263 16S99 – Miscellaneous 263.1

algebra

Let A be a ring with identity. An algebra over A is a ring B with identity together with a ring homomorphism f : A −→ Z(B), where Z(B) denotes the center of B. Equivalently, an algebra over A is an A–module B which is a ring and satisfies the property a · (x ∗ y) = (a · x) ∗ y = x ∗ (a · y) for all a ∈ A and all x, y ∈ B. Here · denotes A–module multiplication and ∗ denotes ring multiplication in B. One passes between the two definitions as follows: given any ring homomorphism f : A −→ Z(B), the scalar multiplication rule a · b := f (a) ∗ b makes B into an A–module in the sense of the second definition. Version: 5 Owner: djao Author(s): djao

263.2

algebra (module)

Given a commutative ring R, an algebra over R is a module M over R, endowed with a law of composition f : M × M → M which is R-bilinear. Most of the important algebras in mathematics belong to one or the other of two classes: the unital associative algebras, and the Lie algebras.

263.2.1

Unital associative algebras

In these cases, the “product” (as it is called) of two elements v and w of the module is denoted simply by vw or v · w or the like. Any unital associative algebra is an algebra in the sense of djao (a sense which is also used by Lang in his book Algebra (Springer-Verlag)).

Examples of unital associative algebras:

– tensor algebras and quotients of them
– Cayley algebras, such as the ring of quaternions
– polynomial rings
– the ring of endomorphisms of a vector space, in which the bilinear product of two mappings is simply the composite mapping.

263.2.2

Lie algebras

In these cases the bilinear product is denoted by [v, w], and satisfies

[v, v] = 0 for all v ∈ M

[v, [w, x]] + [w, [x, v]] + [x, [v, w]] = 0 for all v, w, x ∈ M

The second of these formulas is called the Jacobi identity. One proves easily

[v, w] + [w, v] = 0 for all v, w ∈ M

for any Lie algebra M.

Lie algebras arise naturally from Lie groups, q.v.

Version: 1 Owner: karthik Author(s): Larry Hammick


Chapter 264 16U10 – Integral domains 264.1

Prüfer domain

An integral domain R is a Prüfer domain if every finitely generated ideal I of R is invertible. For a prime ideal P of R, let R_P denote the localization of R at P. Then the following statements are equivalent:

• i) R is a Prüfer domain.
• ii) For every prime ideal P in R, R_P is a valuation domain.
• iii) For every maximal ideal M in R, R_M is a valuation domain.

A Prüfer domain is a Dedekind domain if and only if it is noetherian.

If R is a Prüfer domain with quotient field K, then any domain S such that R ⊂ S ⊂ K is Prüfer.

REFERENCES 1. Thomas W. Hungerford. Algebra. Springer-Verlag, 1974. New York, NY.

Version: 2 Owner: mathcam Author(s): mathcam

264.2

valuation domain

An integral domain R is a valuation domain if for all a, b ∈ R, either a|b or b|a.

Version: 3 Owner: mathcam Author(s): mathcam


Chapter 265 16U20 – Ore rings, multiplicative sets, Ore localization 265.1

Goldie’s Theorem

Let R be a ring with an identity. Then R has a right classical ring of quotients Q which is semisimple Artinian if and only if R is a semiprime right Goldie ring. If this is the case, then the composition length of Q is equal to the uniform dimension of R. An immediate corollary of this is that a semiprime right noetherian ring always has a right classical ring of quotients. This result was discovered by Alfred Goldie in the late 1950’s. Version: 3 Owner: mclase Author(s): mclase

265.2

Ore condition

A ring R satisfies the left Ore condition (resp. right Ore condition) if and only if for all elements x and y with x regular, there exist elements u and v with v regular such that

ux = vy (resp. xu = yv).

A ring which satisfies the (left, right) Ore condition is called a (left, right) Ore ring. Version: 3 Owner: mclase Author(s): mclase


265.3

Ore’s theorem

A ring has a (left, right) classical ring of quotients if and only if it satisfies the (left, right) Ore condition. Version: 3 Owner: mclase Author(s): mclase

265.4

classical ring of quotients

Let R be a ring. An element of R is called regular if it is not a right zero divisor or a left zero divisor in R.

A ring Q ⊃ R is a left classical ring of quotients for R (resp. right classical ring of quotients for R) if it satisfies:

• every regular element of R is invertible in Q
• every element of Q can be written in the form x^{−1}y (resp. yx^{−1}) with x, y ∈ R and x regular.

If a ring R has a left or right classical ring of quotients, then it is unique up to isomorphism.

If R is a commutative integral domain, then the left and right classical rings of quotients always exist – they are the field of fractions of R. For non-commutative rings, necessary and sufficient conditions are given by Ore’s theorem.

Note that the goal here is to construct a ring which is not too different from R, but in which more elements are invertible. The first condition says which elements we want to be invertible. The second condition says that Q should contain just enough extra elements to make the regular elements invertible.

Such rings are called classical rings of quotients, because there are other rings of quotients. These all attempt to enlarge R somehow to make more elements invertible (or sometimes to make ideals invertible).

Finally, note that a ring of quotients is not the same as a quotient ring.

Version: 2 Owner: mclase Author(s): mclase


265.5

saturated

Let S be a multiplicative subset of A. We say that S is saturated if ab ∈ S ⇒ a, b ∈ S.

When A is an integral domain, then S is saturated if and only if its complement A∖S is a union of prime ideals.

Version: 1 Owner: drini Author(s): drini


Chapter 266 16U70 – Center, normalizer (invariant elements) 266.1

center (rings)

If A is a ring, the center of A, sometimes denoted Z(A), is the set of all elements in A that commute with all other elements of A. That is,

Z(A) = {a ∈ A | ax = xa for all x ∈ A}

Note that 0 ∈ Z(A) so the center is non-empty. If we assume that A is a ring with a multiplicative unity 1, then 1 is in the center as well. The center of A is also a subring of A.

Version: 3 Owner: dublisk Author(s): dublisk
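For a finite ring the center can be found by exhaustive search. A Python sketch computing Z(M₂(F₂)) — which, as for any matrix ring over a commutative ring, consists of the scalar matrices:

```python
# Brute-force computation of the center of the ring of 2x2 matrices over F_2:
# only the scalar matrices 0 and I commute with everything.
from itertools import product

def mmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

mats = [((a, b), (c, d)) for a, b, c, d in product(range(2), repeat=4)]
center = [M for M in mats if all(mmul(M, X) == mmul(X, M) for X in mats)]
O, I = ((0, 0), (0, 0)), ((1, 0), (0, 1))
assert sorted(center) == sorted([O, I])
```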


Chapter 267 16U99 – Miscellaneous 267.1

anti-idempotent

An element x of a ring is called an anti-idempotent element, or simply an anti-idempotent, if x² = −x. The term is most often used in linear algebra. Every anti-idempotent matrix over a field is diagonalizable. Two anti-idempotent matrices are similar if and only if they have the same rank. Version: 1 Owner: mathcam Author(s): mathcam
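A small Python check on an ad hoc rank-1 example; note also that x is anti-idempotent exactly when −x is idempotent:

```python
# A rank-1 anti-idempotent 2x2 integer matrix (ad hoc example): x^2 = -x,
# equivalently -x is idempotent.
def mmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x = [[-1, 0], [1, 0]]
neg_x = [[-v for v in row] for row in x]
assert mmul(x, x) == neg_x                 # x^2 = -x
assert mmul(neg_x, neg_x) == neg_x         # so -x is idempotent
```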


Chapter 268 16W20 – Automorphisms and endomorphisms 268.1

ring of endomorphisms

Let R be a ring and let M be a right R-module. An endomorphism of M is an R-module homomorphism from M to itself. We shall write endomorphisms on the left, so that f : M → M maps x ↦ f(x). If f, g : M → M are two endomorphisms, we can add them:

f + g : x ↦ f(x) + g(x)

and multiply them:

f g : x ↦ f(g(x))

With these operations, the set of endomorphisms of M becomes a ring, which we call the ring of endomorphisms of M, written End_R(M).

Instead of writing endomorphisms as functions, it is often convenient to write them multiplicatively: we simply write the application of the endomorphism f as x ↦ fx. Then the fact that each f is an R-module homomorphism can be expressed as:

f(xr) = (fx)r for all x ∈ M and r ∈ R and f ∈ End_R(M).

With this notation, it is clear that M becomes an End_R(M)-R-bimodule.

Now, let N be a left R-module. We can construct the ring End_R(N) in the same way. There is a complication, however, if we still think of endomorphisms as functions written on the left. In order to make N into a bimodule, we need to define an action of End_R(N) on the right of N: say

x · f = f(x)

But then we have a problem with the multiplication: x · fg = fg(x) = f(g(x)), but (x · f) · g = f(x) · g = g(f(x))!

In order to make this work, we need to reverse the order of composition when we define multiplication in the ring End_R(N) when it acts on the right. There are essentially two different ways to go from here. One is to define the multiplication in End_R(N) the other way, which is most natural if we write the endomorphisms as functions on the right. This is the approach taken in many older books. The other is to leave the multiplication in End_R(N) the way it is, but to use the opposite ring to define the bimodule. This is the approach that is generally taken in more recent works. Using this approach, we conclude that N is a R-End_R(N)^op-bimodule. We will adopt this convention for the lemma below.

Considering R as a right and a left module over itself, we can construct the two endomorphism rings End_R(R_R) and End_R(_R R).

Lemma 2. Let R be a ring with an identity element. Then R ≅ End_R(R_R) and R ≅ End_R(_R R)^op.

Define ρ_r ∈ End_R(_R R) by x ↦ xr.

A calculation shows that ρ_{rs} = ρ_s ρ_r (functions written on the left) from which it is easily seen that the map θ : r ↦ ρ_r is a ring homomorphism from R to End_R(_R R)^op. We must show that this is an isomorphism.

If ρ_r = 0, then r = 1r = ρ_r(1) = 0. So θ is injective.

Let f be an arbitrary element of End_R(_R R), and let r = f(1). Then for any x ∈ R, f(x) = f(x1) = xf(1) = xr = ρ_r(x), so f = ρ_r = θ(r).

The proof of the other isomorphism is similar.

Version: 4 Owner: mclase Author(s): mclase
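The order reversal ρ_{rs} = ρ_s ρ_r can be seen concretely with matrices, since (xr)s = x(rs). A Python sketch with 2×2 integer matrices (the specific matrices are ad hoc):

```python
# Right multiplications on a noncommutative ring compose in reverse order:
# rho_{rs} = rho_s o rho_r, checked with 2x2 integer matrices.
def mmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rho(r):                    # rho_r(x) = xr
    return lambda x: mmul(x, r)

r, s = [[1, 1], [0, 1]], [[1, 0], [1, 1]]
assert mmul(r, s) != mmul(s, r)            # r and s do not commute
x = [[2, 3], [5, 7]]
assert rho(mmul(r, s))(x) == rho(s)(rho(r)(x))
```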


Chapter 269 16W30 – Coalgebras, bialgebras, Hopf algebras ; rings, modules, etc. on which these act 269.1

Hopf algebra

A Hopf algebra is a bialgebra A over a field K with a K-linear map S : A → A, called the antipode, such that

m ◦ (S ⊗ id) ◦ ∆ = η ◦ ε = m ◦ (id ⊗ S) ◦ ∆,   (269.1.1)

where m : A ⊗ A → A is the multiplication map m(a ⊗ b) = ab and η : K → A is the unit map η(k) = k1I.

In terms of a commutative diagram: the two composites m ◦ (S ⊗ id) ◦ ∆ and m ◦ (id ⊗ S) ◦ ∆ from A to A both equal the composite η ◦ ε through K. [commutative diagram for the antipode axiom omitted]

Example 1 (Algebra of functions on a finite group). Let A = C(G) be the algebra of complex-valued functions on a finite group G and identify C(G × G) with A ⊗ A. Then, A is a Hopf algebra with comultiplication (∆(f))(x, y) = f(xy), counit ε(f) = f(e), and antipode (S(f))(x) = f(x^{−1}).

Example 2 (Group algebra of a finite group). Let A = CG be the complex group algebra of a finite group G. Then, A is a Hopf algebra with comultiplication ∆(g) = g ⊗ g, counit ε(g) = 1, and antipode S(g) = g^{−1}.

The above two examples are dual to one another. Define a bilinear form C(G) ⊗ CG → C by ⟨f, x⟩ = f(x). Then,

⟨fg, x⟩ = ⟨f ⊗ g, ∆(x)⟩,
⟨1, x⟩ = ε(x),
⟨∆(f), x ⊗ y⟩ = ⟨f, xy⟩,
ε(f) = ⟨f, e⟩,
⟨S(f), x⟩ = ⟨f, S(x)⟩.

Example 3 (Polynomial functions on a Lie group). Let A = Poly(G) be the algebra of complex-valued polynomial functions on a complex Lie group G and identify Poly(G × G) with A ⊗ A. Then, A is a Hopf algebra with comultiplication (∆(f))(x, y) = f(xy), counit ε(f) = f(e), and antipode (S(f))(x) = f(x^{−1}).

Example 4 (Universal enveloping algebra of a Lie algebra). Let A = U(g) be the universal enveloping algebra of a complex Lie algebra g. Then, A is a Hopf algebra with comultiplication ∆(X) = X ⊗ 1 + 1 ⊗ X, counit ε(X) = 0, and antipode S(X) = −X.

The above two examples are dual to one another (if g is the Lie algebra of G). Define a bilinear form Poly(G) ⊗ U(g) → C by ⟨f, X⟩ = (d/dt) f(exp(tX))|_{t=0}.

Version: 6 Owner: mhale Author(s): mhale
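For a group algebra as in Example 2, the antipode axiom (269.1.1) reduces on a general element a = Σ a_g g to Σ a_g g^{−1}g = ε(a)·e. A Python sketch with G = Z/5 written additively (an ad hoc choice) and group algebra elements as coefficient dicts:

```python
# Checking the antipode axiom m o (S (x) id) o Delta = eta o epsilon in the
# group algebra of Example 2, with G = Z/5 written additively.
n = 5

def delta(a):                   # Delta(g) = g (x) g, extended linearly
    return {(g, g): c for g, c in a.items()}

def m_S_id(t):                  # m o (S (x) id): (g, h) -> S(g)h = h - g
    out = {}
    for (g, h), c in t.items():
        k = (h - g) % n
        out[k] = out.get(k, 0) + c
    return out

def counit(a):                  # epsilon(g) = 1, extended linearly
    return sum(a.values())

a = {0: 2, 1: 3, 2: -1}
assert m_S_id(delta(a)) == {0: counit(a)}   # both sides equal epsilon(a)*e
```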

269.2

almost cocommutative bialgebra

A bialgebra A is called almost cocommutative if there is a unit R ∈ A ⊗ A such that

R∆(a) = ∆op(a)R

where ∆op is the opposite comultiplication (the usual comultiplication, composed with the flip map of the tensor product A ⊗ A). The element R is often called the R-matrix of A. The significance of the almost cocommutative condition is that σ_{V,W} = σ ◦ R : V ⊗ W → W ⊗ V gives a natural isomorphism of bialgebra representations, where V and W are A-modules, making the category of A-modules into a quasi-tensor or braided monoidal category. Note that σ_{W,V} ◦ σ_{V,W} is not necessarily the identity (this is the braiding of the category). Version: 2 Owner: bwebste Author(s): bwebste

269.3

bialgebra

A bialgebra is a vector space that is both a unital algebra and a coalgebra, such that the comultiplication and counit are unital algebra homomorphisms. Version: 2 Owner: mhale Author(s): mhale

269.4

coalgebra

A coalgebra is a vector space A over a field K with a K-linear map ∆ : A → A ⊗ A, called the comultiplication, and a (non-zero) K-linear map ε : A → K, called the counit, such that

(∆ ⊗ id) ◦ ∆ = (id ⊗ ∆) ◦ ∆ (coassociativity),   (269.4.1)

(ε ⊗ id) ◦ ∆ = id = (id ⊗ ε) ◦ ∆.   (269.4.2)

In terms of commutative diagrams: coassociativity says that ∆ followed by ∆ ⊗ id equals ∆ followed by id ⊗ ∆ (both landing in A ⊗ A ⊗ A), and the counit axiom says that ∆ followed by ε ⊗ id or by id ⊗ ε is the identity on A. [commutative diagrams omitted]

Let σ : A ⊗ A → A ⊗ A be the flip map σ(a ⊗ b) = b ⊗ a. A coalgebra is said to be cocommutative if σ ◦ ∆ = ∆.

Version: 4 Owner: mhale Author(s): mhale

269.5

coinvariant

Let V be a comodule with a right coaction t : V → V ⊗ A of a coalgebra A. An element v ∈ V is right coinvariant if

t(v) = v ⊗ 1I_A.   (269.5.1)

Version: 1 Owner: mhale Author(s): mhale

269.6

comodule

Let (A, ∆, ε) be a coalgebra. A right A-comodule is a vector space V with a linear map t : V → V ⊗ A, called the right coaction, satisfying

(t ⊗ id) ◦ t = (id ⊗ ∆) ◦ t,   (id ⊗ ε) ◦ t = id.   (269.6.1)

An A-comodule is also referred to as a corepresentation of A. Let V and W be two right A-comodules. Then V ⊕ W is also a right A-comodule. If A is a bialgebra then V ⊗ W is a right A-comodule as well (make use of the multiplication map A ⊗ A → A). Version: 2 Owner: mhale Author(s): mhale

269.7

comodule algebra

Let H be a bialgebra. A right H-comodule algebra is a unital algebra A which is a right H-comodule satisfying

t(ab) = t(a)t(b) = Σ a_(1) b_(1) ⊗ a_(2) b_(2),   t(1I_A) = 1I_A ⊗ 1I_H,   (269.7.1)

for all a, b ∈ A.

There is a dual notion of a H-module coalgebra. Example 5. Let H be a bialgebra. Then H is itself a H-comodule algebra for the right regular coaction t(h) = ∆(h). Version: 5 Owner: mhale Author(s): mhale

269.8

comodule coalgebra

Let H be a bialgebra. A right H-comodule coalgebra is a coalgebra A which is a right H-comodule satisfying

(∆ ⊗ id)t(a) = Σ a_(1)(1) ⊗ a_(2)(1) ⊗ a_(1)(2) a_(2)(2),   (ε ⊗ id)t(a) = ε(a)1I_H,   (269.8.1)

for all a ∈ A.

There is a dual notion of a H-module algebra. Example 6. Let H be a Hopf algebra. Then H is itself a H-comodule coalgebra for the adjoint coaction t(h) = Σ h_(2) ⊗ S(h_(1))h_(3). Version: 4 Owner: mhale Author(s): mhale

269.9

module algebra

Let H be a bialgebra. A left H-module algebra is a unital algebra A which is a left H-module satisfying

h . (ab) = Σ (h_(1) . a)(h_(2) . b),   h . 1I_A = ε(h)1I_A,   (269.9.1)

for all h ∈ H and a, b ∈ A.

There is a dual notion of a H-comodule coalgebra. Example 7. Let H be a Hopf algebra. Then H is itself a H-module algebra for the adjoint action g . h = Σ g_(1) h S(g_(2)).

Version: 4 Owner: mhale Author(s): mhale

269.10

module coalgebra

Let H be a bialgebra. A left H-module coalgebra is a coalgebra A which is a left H-module satisfying

∆(h . a) = Σ (h_(1) . a_(1)) ⊗ (h_(2) . a_(2)),   ε(h . a) = ε(h)ε(a),   (269.10.1)

for all h ∈ H and a ∈ A.

There is a dual notion of a H-comodule algebra. Example 8. Let H be a bialgebra. Then H is itself a H-module coalgebra for the left regular action g . h = gh. Version: 5 Owner: mhale Author(s): mhale

Chapter 270 16W50 – Graded rings and modules 270.1

graded algebra

An algebra A is graded if it is a graded module and satisfies

A_p · A_q ⊆ A_{p+q}.

Examples of graded algebras include the polynomial ring k[X], which is an N-graded k-algebra, and the exterior algebra.

Version: 1 Owner: dublisk Author(s): dublisk

270.2

graded module

If R = R_0 ⊕ R_1 ⊕ · · · is a graded ring, then a graded module over R is a module M of the form M = ⊕_{i=−∞}^{∞} M_i which satisfies R_i M_j ⊆ M_{i+j} for all i, j. Version: 4 Owner: KimJ Author(s): KimJ

270.3

supercommutative

Let R be a Z_2-graded ring. Then R is supercommutative if for any homogeneous elements a and b ∈ R:

ab = (−1)^{deg a · deg b} ba.

That is, even homogeneous elements are in the center of the ring, and odd homogeneous elements anti-commute.

Common examples of supercommutative rings are the exterior algebra of a module over a commutative ring (in particular, a vector space) and the cohomology ring of a topological space (both with the standard grading by degree reduced mod 2).

Version: 1 Owner: bwebste Author(s): bwebste


Chapter 271 16W55 – “Super” (or “skew”) structure 271.1

super tensor product

If A and B are Z-graded algebras, we define the super tensor product A ⊗_su B to be the ordinary tensor product as graded modules, but with multiplication – called the super product – defined by

(a ⊗ b)(a′ ⊗ b′) = (−1)^{(deg b)(deg a′)} aa′ ⊗ bb′

where a, a′, b, b′ are homogeneous. The super tensor product of A and B is itself a graded algebra, as we grade the super tensor product of A and B as follows:

(A ⊗_su B)_n = ⊕_{p,q : p+q=n} A_p ⊗ B_q

Version: 4 Owner: dublisk Author(s): dublisk

271.2

superalgebra

A graded algebra A is said to be a super algebra if it has a Z/2Z grading. Version: 2 Owner: dublisk Author(s): dublisk


271.3

supernumber

Let Λ_N be the Grassmann algebra generated by θ^i, i = 1 . . . N, such that θ^i θ^j = −θ^j θ^i and (θ^i)² = 0. Denote by Λ_∞ the case of an infinite number of generators θ^i. A supernumber is an element of Λ_N or Λ_∞.

Any supernumber z can be expressed uniquely in the form

z = z_0 + z_i θ^i + (1/2) z_{ij} θ^i θ^j + . . . + (1/n!) z_{i_1...i_n} θ^{i_1} · · · θ^{i_n} + . . . ,

where the coefficients z_{i_1...i_n} ∈ C are antisymmetric in their indices. The body of z is defined as z_B = z_0, and its soul is defined as z_S = z − z_B. If z_B ≠ 0 then z has an inverse given by

z^{−1} = (1/z_B) Σ_{k=0}^{∞} (−z_S/z_B)^k.

A supernumber can be decomposed into the even and odd parts

z_even = z_0 + (1/2) z_{ij} θ^i θ^j + . . . + (1/(2n)!) z_{i_1...i_{2n}} θ^{i_1} · · · θ^{i_{2n}} + . . . ,
z_odd = z_i θ^i + (1/6) z_{ijk} θ^i θ^j θ^k + . . . + (1/(2n+1)!) z_{i_1...i_{2n+1}} θ^{i_1} · · · θ^{i_{2n+1}} + . . . .

Purely even supernumbers are called c-numbers, and odd supernumbers are called a-numbers. The superalgebra Λ_N thus has a decomposition Λ_N = C_c ⊕ C_a, where C_c is the space of c-numbers, and C_a is the space of a-numbers.

Supernumbers are the generalisation of complex numbers to a commutative superalgebra of commuting and anticommuting “numbers”. They are primarily used in the description of fermionic fields in quantum field theory.

Version: 5 Owner: mhale Author(s): mhale
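The inverse formula can be checked by machine in Λ_2, where the soul is nilpotent and the geometric series terminates. A Python sketch with exact rational coefficients (the test element z = 2 + 3θ¹ + θ¹θ² is ad hoc):

```python
# A sketch of the Grassmann algebra Lambda_2 with exact coefficients,
# checking z^{-1} = (1/z_B) sum_k (-z_S/z_B)^k on z = 2 + 3*t1 + t1*t2.
from fractions import Fraction

def mono_mul(m1, m2):
    # multiply basis monomials (sorted tuples of generator indices);
    # sort the concatenation, counting transpositions for the sign
    seq, sign = list(m1 + m2), 1
    for i in range(len(seq)):
        for j in range(len(seq) - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                sign = -sign
    if len(set(seq)) != len(seq):
        return (), 0                     # repeated generator: (t^i)^2 = 0
    return tuple(seq), sign

def gmul(u, v):
    out = {}
    for m1, c1 in u.items():
        for m2, c2 in v.items():
            m, s = mono_mul(m1, m2)
            if s:
                out[m] = out.get(m, Fraction(0)) + s * c1 * c2
    return {m: c for m, c in out.items() if c}

z = {(): Fraction(2), (1,): Fraction(3), (1, 2): Fraction(1)}
zB = z[()]
zS = {m: c for m, c in z.items() if m != ()}

inv, term = {(): 1 / zB}, {(): Fraction(1)}
for _ in range(3):                       # the soul is nilpotent: series stops
    term = gmul(term, {m: -c / zB for m, c in zS.items()})
    for m, c in term.items():
        inv[m] = inv.get(m, Fraction(0)) + c / zB

assert gmul(z, inv) == {(): Fraction(1)}
```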


Chapter 272 16W99 – Miscellaneous 272.1

Hamiltonian quaternions

Definition of Q We define a unital associative algebra Q over R, of dimension 4, by the basis {1, i, j, k} and the multiplication table 1 i i −1 j −k k j

j k −1 −i

k −j i −1

(where the element in row x and column y is xy, not yx). Thus an arbitrary element of Q is of the form a1 + bi + cj + dk, a, b, c, d ∈ R (sometimes denoted by ⟨a, b, c, d⟩ or by a + ⟨b, c, d⟩) and the product of two elements ⟨a, b, c, d⟩ and ⟨α, β, γ, δ⟩ is ⟨w, x, y, z⟩ where

w = aα − bβ − cγ − dδ
x = aβ + bα + cδ − dγ
y = aγ − bδ + cα + dβ
z = aδ + bγ − cβ + dα

The elements of Q are known as Hamiltonian quaternions. Clearly the subspaces of Q generated by {1} and by {1, i} are subalgebras isomorphic to R and C respectively. R is customarily identified with the corresponding subalgebra of Q. (We


shall see in a moment that there are other and less obvious embeddings of C in Q.) The real numbers commute with all the elements of Q, and we have λ · ⟨a, b, c, d⟩ = ⟨λa, λb, λc, λd⟩ for λ ∈ R and ⟨a, b, c, d⟩ ∈ Q.

norm, conjugate, and inverse of a quaternion

Like the complex numbers (C), the quaternions have a natural involution called the quaternion conjugate. If q = a1 + bi + cj + dk, then the quaternion conjugate of q, denoted q̄, is simply q̄ = a1 − bi − cj − dk.

One can readily verify that if q = a1 + bi + cj + dk, then qq̄ = (a² + b² + c² + d²)1. (See Euler four-square identity.)

This product is used to form a norm ‖ · ‖ on the algebra (or the ring) Q: we define ‖q‖ = √s where qq̄ = s1. If v, w ∈ Q and λ ∈ R, then

1. ‖v‖ ≥ 0, with equality only if v = ⟨0, 0, 0, 0⟩ = 0
2. ‖λv‖ = |λ|‖v‖
3. ‖v + w‖ ≤ ‖v‖ + ‖w‖
4. ‖v · w‖ = ‖v‖ · ‖w‖

which means that Q qualifies as a normed algebra when we give it the norm ‖ · ‖.

Because the norm of any nonzero quaternion q is real and nonzero, we have

q (q̄/‖q‖²) = (q̄/‖q‖²) q = ⟨1, 0, 0, 0⟩

which shows that any nonzero quaternion has an inverse:

q^{−1} = q̄/‖q‖².

Other embeddings of C into Q

One can use any non-zero q to define an embedding of C into Q. If n(z) is a natural embedding of z ∈ C into Q, then the embedding z → qn(z)q^{−1} is also an embedding into Q. Because Q is an associative algebra, it is obvious that

(qn(a)q^{−1})(qn(b)q^{−1}) = q(n(a)n(b))q^{−1}

and with the distributive laws, it is easy to check that

(qn(a)q^{−1}) + (qn(b)q^{−1}) = q(n(a) + n(b))q^{−1}

Rotations in 3-space

Let us write U = {q ∈ Q : ‖q‖ = 1}. With multiplication, U is a group. Let us briefly sketch the relation between U and the group SO(3) of rotations (about the origin) in 3-space.

An arbitrary element q of U can be expressed as cos(θ/2) + sin(θ/2)(ai + bj + ck), for some real numbers θ, a, b, c such that a² + b² + c² = 1. The permutation v ↦ qvq^{−1} of Q thus gives rise to a permutation of the real sphere. It turns out that that permutation is a rotation. Its axis is the line through (0, 0, 0) and (a, b, c), and the angle through which it rotates the sphere is θ. If rotations F and G correspond to quaternions q and r respectively, then clearly the permutation v ↦ qrv(qr)^{−1} corresponds to the composite rotation F ◦ G. Thus this mapping of U onto SO(3) is a group homomorphism. Its kernel is the subset {1, −1} of U, and thus it comprises a double cover of SO(3). The kernel has a geometric interpretation as well: two unit vectors in opposite directions determine the same axis of rotation.

Version: 3 Owner: mathcam Author(s): Larry Hammick, patrickwonders
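The quaternion arithmetic above is easy to machine-check. The following Python sketch (quaternions as 4-tuples ⟨a, b, c, d⟩; the sample values are ad hoc) verifies the multiplication table, the multiplicativity of the norm (via ‖q‖², to stay in exact arithmetic), and the inverse formula:

```python
# Quaternions as 4-tuples (a, b, c, d) = a1 + bi + cj + dk; a direct
# transcription of the product formula in the entry.
from fractions import Fraction

def qmul(p, q):
    a, b, c, d = p
    e, f, g, h = q
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def conj(q):                       # quaternion conjugate
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm2(q):                      # squared norm: q * conj(q) = norm2(q) * 1
    return sum(t * t for t in q)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(i, j) == k and qmul(j, i) == (0, 0, 0, -1)   # ij = k = -ji

v, w = (1, 2, 3, 4), (2, -1, 0, 5)
assert norm2(qmul(v, w)) == norm2(v) * norm2(w)          # norm is multiplicative

vinv = tuple(Fraction(t, norm2(v)) for t in conj(v))     # v^{-1} = conj(v)/||v||^2
assert qmul(v, vinv) == (1, 0, 0, 0)
```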


Chapter 273 16Y30 – Near-rings 273.1

near-ring

A near-ring is a set N together with two binary operations, denoted + : N × N → N and · : N × N → N, such that

1. (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c) for all a, b, c ∈ N (associativity of both operations)
2. There exists an element 0 ∈ N such that a + 0 = 0 + a = a for all a ∈ N (additive identity)
3. For all a ∈ N, there exists b ∈ N such that a + b = b + a = 0 (additive inverse)
4. (a + b) · c = (a · c) + (b · c) for all a, b, c ∈ N (right distributive law)

Note that the axioms of a near-ring differ from those of a ring in that they do not require addition to be commutative, and only require distributivity on one side. Every element a in a near-ring has a unique additive inverse, denoted −a. We say N has an identity element if there exists an element 1 ∈ N such that a · 1 = 1 · a = a for all a ∈ N. We say N is distributive if a · (b + c) = (a · b) + (a · c) holds for all a, b, c ∈ N. We say N is commutative if a · b = b · a for all a, b ∈ N. A natural example of a near-ring is the following. Let (G, +) be a group (not necessarily abelian), and let M be the set of all functions from G to G. For two functions f and g in M define f + g ∈ M by (f + g)(x) = f (x) + g(x) for all x ∈ G. Then (M, +, ◦) is a near-ring with identity, where ◦ denotes composition of functions. Version: 13 Owner: yark Author(s): yark, juergen
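The function near-ring example can be checked in a few lines. The following Python sketch (added here, not part of the original entry) takes G = Z/5 and verifies that composition distributes over pointwise addition on the right, but, for a map that is not an endomorphism, not on the left:

```python
# The maps from G to G under pointwise + and composition form a near-ring;
# here G = Z/5 and the product f . g is composition f o g.

G = range(5)

def add(f, g):          # pointwise addition in M
    return lambda x: (f(x) + g(x)) % 5

def comp(f, g):         # composition, playing the role of multiplication
    return lambda x: f(g(x))

def equal(f, g):
    return all(f(x) == g(x) for x in G)

f = lambda x: (x * x) % 5   # an arbitrary map that is NOT an endomorphism
g = lambda x: (x + 1) % 5
h = lambda x: (3 * x) % 5

# right distributivity holds: (f + g) o h = (f o h) + (g o h)
assert equal(comp(add(f, g), h), add(comp(f, h), comp(g, h)))
# left distributivity fails in general: f o (g + h) != (f o g) + (f o h)
assert not equal(comp(f, add(g, h)), add(comp(f, g), comp(f, h)))
```

The choice of f, g, h is arbitrary; only f needs to be a non-endomorphism for the left distributive law to fail.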

Chapter 274 17A01 – General theory 274.1

commutator bracket

Let A be an associative algebra over a field K. For a, b ∈ A, the element of A defined by [a, b] = ab − ba

is called the commutator of a and b. The corresponding bilinear operation is called the commutator bracket.

[−, −] : A × A → A

The commutator bracket is bilinear, skew-symmetric, and also satisfies the Jacobi identity. To wit, for a, b, c ∈ A we have [a, [b, c]] + [b, [c, a]] + [c, [a, b]] = 0.

The proof of this assertion is straightforward. Each of the brackets in the left-hand side expands to 4 terms, and then everything cancels. In categorical terms, what we have here is a functor from the category of associative algebras to the category of Lie algebras over a fixed field. The action of this functor is to turn an associative algebra A into a Lie algebra that has the same underlying vector space as A, but whose multiplication operation is given by the commutator bracket. It must be noted that this functor is right-adjoint to the universal enveloping algebra functor. Examples • Let V be a vector space. Composition endows the vector space of endomorphisms End V with the structure of an associative algebra. However, we could also regard End V as a Lie algebra relative to the commutator bracket: [X, Y ] = XY − Y X, X, Y ∈ End V.

• The algebra of differential operators has some interesting properties when viewed as a Lie algebra. The fact is that even though the composition of differential operators is a non-commutative operation, it is commutative when restricted to the highest-order terms of the involved operators. Thus, if X, Y are differential operators of order p and q, respectively, the compositions XY and Y X have order p + q. Their highest-order terms coincide, and hence the commutator [X, Y ] has order p + q − 1. • In light of the preceding comments, it is evident that the vector space of first-order differential operators is closed with respect to the commutator bracket. Specializing even further, we remark that a vector field is just a homogeneous first-order differential operator, and that the commutator bracket for vector fields, when viewed as first-order operators, coincides with the usual, geometrically motivated vector field bracket. Version: 4 Owner: rmilson Author(s): rmilson
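Skew-symmetry and the Jacobi identity for the commutator bracket can be verified on concrete matrices. Here is a short Python check (added for illustration; the random integer matrices are arbitrary and not from the original text):

```python
# Verify skew-symmetry and the Jacobi identity for [x, y] = xy - yx
# on the associative algebra of 2x2 matrices.

import numpy as np

def br(x, y):
    return x @ y - y @ x

rng = np.random.default_rng(0)
a, b, c = (rng.integers(-3, 4, (2, 2)) for _ in range(3))

# skew-symmetry: [a, b] = -[b, a]
assert np.array_equal(br(a, b), -br(b, a))
# Jacobi identity: [a,[b,c]] + [b,[c,a]] + [c,[a,b]] = 0
assert not np.any(br(a, br(b, c)) + br(b, br(c, a)) + br(c, br(a, b)))
```

Since the entries are integers, the cancellation is exact rather than up to floating-point error.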


Chapter 275 17B05 – Structure theory 275.1

Killing form

Let g be a finite dimensional Lie algebra over a field k, and adX : g → g be the adjoint action, adX Y = [X, Y ]. Then the Killing form on g is a bilinear map Bg : g × g → k given by Bg(X, Y ) = tr(adX ◦ adY ). The Killing form is invariant and symmetric (since trace is symmetric). Version: 4 Owner: bwebste Author(s): bwebste
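As an illustration (not part of the original entry), the Killing form of sl2 can be computed directly from the definition. With the standard basis e, h, f one obtains the classical values B(h, h) = 8 and B(e, f) = 4:

```python
# Compute the Killing form B(x, y) = tr(ad x . ad y) for g = sl2
# with basis e, h, f, by building the matrices of ad in that basis.

import numpy as np

e = np.array([[0, 1], [0, 0]])
f = np.array([[0, 0], [1, 0]])
h = np.array([[1, 0], [0, -1]])
basis = [e, h, f]

def bracket(x, y):
    return x @ y - y @ x

def coords(m):
    # coordinates of a traceless 2x2 matrix in the basis (e, h, f)
    return np.array([m[0, 1], m[0, 0], m[1, 0]])

def ad(x):
    # matrix of ad(x) = [x, -] in the basis (e, h, f)
    return np.column_stack([coords(bracket(x, b)) for b in basis])

def killing(x, y):
    return np.trace(ad(x) @ ad(y))

assert killing(h, h) == 8
assert killing(e, f) == 4
assert killing(e, e) == 0
```

Nondegeneracy of this form is exactly Cartan's criterion for semi-simplicity, stated later in this chapter.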

275.2

Levi’s theorem

Let g be a complex Lie algebra, r its radical. Then the extension 0 → r → g → g/r → 0 is split, i.e., there exists a subalgebra h of g mapping isomorphically to g/r under the natural projection. Version: 2 Owner: bwebste Author(s): bwebste

275.3

nilradical

Let g be a Lie algebra. Then the nilradical n of g is defined to be the intersection of the kernels of all the irreducible representations of g. Equivalently, n = [g, g] ∩ rad g, the intersection of the derived ideal and the radical of g. Version: 1 Owner: bwebste Author(s): bwebste

275.4

radical

Let g be a Lie algebra. Since the sum of any two solvable ideals of g is in turn solvable, there is a unique maximal solvable ideal of any Lie algebra. This ideal is called the radical of g. Note that g/rad g has no solvable ideals, and is thus semi-simple. Thus, every Lie algebra is an extension of a semi-simple algebra by a solvable one. Version: 2 Owner: bwebste Author(s): bwebste


Chapter 276 17B10 – Representations, algebraic theory (weights) 276.1

Ado’s theorem

Every finite dimensional Lie algebra has a faithful finite dimensional representation. In other words, every finite dimensional Lie algebra is a matrix algebra. This result is not true for Lie groups. Version: 2 Owner: bwebste Author(s): bwebste

276.2

Lie algebra representation

A representation of a Lie algebra g is a Lie algebra homomorphism ρ : g → End V, where End V is the commutator Lie algebra of some vector space V . In other words, ρ is a linear mapping that satisfies ρ([a, b]) = ρ(a)ρ(b) − ρ(b)ρ(a),

a, b ∈ g

Alternatively, one calls V a g-module, and calls ρ(a), a ∈ g, the action of a on V . We call the representation faithful if ρ is injective. An invariant subspace or sub-module W ⊂ V is a subspace of V satisfying ρ(a)(W ) ⊂ W for all a ∈ g. A representation is called irreducible or simple if its only invariant subspaces are {0} and the whole representation.

The dimension of V is called the dimension of the representation. If V is infinite-dimensional, then one speaks of an infinite-dimensional representation. Given a representation or a pair of representations, there are a couple of operations which will produce other representations: First there is the direct sum. If ρ : g → End(V ) and σ : g → End(W ) are representations, then V ⊕ W has the obvious Lie algebra action, via the embedding End(V ) × End(W ) ↪ End(V ⊕ W ). Version: 9 Owner: bwebste Author(s): bwebste, rmilson

276.3

adjoint representation

Let g be a Lie algebra. For every a ∈ g we define the adjoint endomorphism, a.k.a. the adjoint action, ad(a) : g → g to be the linear transformation with action ad(a) : b ↦ [a, b],

b ∈ g.

The linear mapping ad : g → End(g) with action a ↦ ad(a),

a∈g

is called the adjoint representation of g. The fact that ad defines a representation is a straightforward consequence of the Jacobi identity axiom. Indeed, let a, b ∈ g be given. We wish to show that ad([a, b]) = [ad(a), ad(b)], where the bracket on the left is the g multiplication structure, and the bracket on the right is the commutator bracket. For all c ∈ g the left hand side maps c to [[a, b], c], while the right hand side maps c to [a, [b, c]] + [b, [a, c]]. Taking skew-symmetry of the bracket as a given, the equality of these two expressions is logically equivalent to the Jacobi identity: [a, [b, c]] + [b, [c, a]] + [c, [a, b]] = 0. Version: 2 Owner: rmilson Author(s): rmilson

276.4

examples of non-matrix Lie groups

While most well-known Lie groups are matrix groups, there do in fact exist Lie groups which are not matrix groups. That is, they have no faithful finite dimensional representations. For example, let H be the real Heisenberg group

H = { [1 a b; 0 1 c; 0 0 1] : a, b, c ∈ R },

and Γ the discrete subgroup

Γ = { [1 0 n; 0 1 0; 0 0 1] : n ∈ Z }.

The subgroup Γ is central, and thus normal. The Lie group H/Γ has no faithful finite dimensional representations over R or C. Another example is the universal cover of SL2 R. SL2 R is homotopy equivalent to a circle, and thus π1 (SL2 R) ≅ Z, so SL2 R has an infinite-sheeted universal cover. Any finite dimensional real or complex representation of this cover factors through the projection map to SL2 R. Version: 3 Owner: bwebste Author(s): bwebste

276.5

isotropy representation

Let g be a Lie algebra, and h ⊂ g a subalgebra. The isotropy representation of h relative to g is the naturally defined action of h on the quotient vector space g/h. Here is a synopsis of the technical details. As is customary, we will use b + h, b ∈ g, to denote the coset elements of g/h. Let a ∈ h be given. Since h is invariant with respect to adg(a), the adjoint action factors through the quotient to give a well defined endomorphism of g/h. The action is given by b + h ↦ [a, b] + h, b ∈ g. This is the action alluded to in the first paragraph. Version: 3 Owner: rmilson Author(s): rmilson

Chapter 277 17B15 – Representations, analytic theory 277.1

invariant form (Lie algebras)

Let V be a representation of a Lie algebra g over a field k. Then a bilinear form B : V × V → k is invariant if B(Xv, w) + B(v, Xw) = 0 for all X ∈ g, v, w ∈ V . This criterion seems a little odd, but in the context of Lie algebras it makes sense. For example, the map B̃ : V → V ∗ given by v ↦ B(·, v) is equivariant if and only if B is an invariant form. Version: 2 Owner: bwebste Author(s): bwebste
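For a concrete instance (added here, not in the original entry), the trace form B(v, w) = tr(vw) on 2×2 matrices is invariant for the adjoint action Xv = [X, v], which can be checked exactly on integer matrices:

```python
# Check the invariance condition B([X, v], w) + B(v, [X, w]) = 0
# for the trace form under the adjoint action of 2x2 matrices.

import numpy as np

rng = np.random.default_rng(2)
X, v, w = (rng.integers(-3, 4, (2, 2)) for _ in range(3))

def br(x, y):
    return x @ y - y @ x

def B(p, q):
    # the trace form, a bilinear form on matrices
    return np.trace(p @ q)

assert B(br(X, v), w) + B(v, br(X, w)) == 0
```

The identity holds for every choice of X, v, w, by cyclic invariance of the trace.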


Chapter 278 17B20 – Simple, semisimple, reductive (super)algebras (roots) 278.1

Borel subalgebra

Let g be a semi-simple Lie algebra, h a Cartan subalgebra, R the associated root system and R+ ⊂ R a set of positive roots. We have a root decomposition into the Cartan subalgebra and the root spaces gα:

g = h ⊕ ( ⊕_{α∈R} gα ).

Now let b be the direct sum of the Cartan subalgebra and the positive root spaces:

b = h ⊕ ( ⊕_{β∈R+} gβ ).

This is called a Borel subalgebra.

Version: 2 Owner: bwebste Author(s): bwebste

278.2

Borel subgroup

Let G be a complex semi-simple Lie group. Then any maximal solvable subgroup B ≤ G is called a Borel subgroup. All Borel subgroups of a given group are conjugate. Any Borel group is connected and equal to its own normalizer, and contains a unique Cartan subgroup. The intersection of B with a maximal compact subgroup K of G is a maximal torus of K. If G = SLn C, then the standard Borel subgroup is the set of upper triangular matrices.

Version: 2 Owner: bwebste Author(s): bwebste

278.3

Cartan matrix

Let R ⊂ E be a reduced root system, with E a euclidean vector space with inner product (·, ·), and let Π = {α1 , · · · , αn } be a base of this root system. Then the Cartan matrix of the root system is the matrix

Ci,j = 2(αi , αj ) / (αi , αi ).

The Cartan matrix uniquely determines the root system, and is unique up to simultaneous permutation of the rows and columns. It is also the basis change matrix from the basis of fundamental weights to the basis of simple roots in E. Version: 1 Owner: bwebste Author(s): bwebste
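As a worked example (not in the original entry), placing the two simple roots of A2 at an angle of 120° in the plane and applying the formula above recovers the familiar A2 Cartan matrix:

```python
# Compute C_{i,j} = 2(a_i, a_j)/(a_i, a_i) for the A2 simple roots.

import math

# simple roots of A2, both of length sqrt(2), at an angle of 120 degrees
alpha1 = (math.sqrt(2), 0.0)
alpha2 = (-math.sqrt(2) / 2, math.sqrt(6) / 2)

def ip(u, v):
    return u[0] * v[0] + u[1] * v[1]

def cartan(simple):
    # round to kill floating-point noise; the entries are integers
    return [[round(2 * ip(a, b) / ip(a, a)) for b in simple] for a in simple]

assert cartan([alpha1, alpha2]) == [[2, -1], [-1, 2]]
```

The particular coordinates of the roots are a chosen realization; any pair of equal-length vectors at 120° gives the same matrix.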

278.4

Cartan subalgebra

Let g be a Lie algebra. Then a Cartan subalgebra is a maximal subalgebra h of g which is self-normalizing, that is, if x ∈ g satisfies [x, h] ∈ h for all h ∈ h, then x ∈ h as well. Any Cartan subalgebra h is nilpotent, and if g is semi-simple, it is abelian. All Cartan subalgebras of a Lie algebra are conjugate by the adjoint action of any Lie group with algebra g. Version: 3 Owner: bwebste Author(s): bwebste

278.5

Cartan’s criterion

A Lie algebra g is semi-simple if and only if its Killing form Bg is nondegenerate. Version: 2 Owner: bwebste Author(s): bwebste

278.6

Casimir operator

Let g be a semisimple Lie algebra, and let (·, ·) denote the Killing form. If {g_i } is a basis of g, then there is a dual basis {g^i } with respect to the Killing form, i.e., (g_i , g^j ) = δ_ij . Consider the element Ω = Σ_i g_i g^i of the universal enveloping algebra of g. This element, called the Casimir operator, is central in the enveloping algebra, and thus commutes with the g action on any representation.

Version: 2 Owner: bwebste Author(s): bwebste

278.7

Dynkin diagram

Dynkin diagrams are a combinatorial way of representing the information in a root system. Their primary advantage is that they are easier to write down, remember, and analyze than explicit representations of a root system. They are an important tool in the classification of simple Lie algebras. Given a reduced root system R ⊂ E, with E an inner-product space, choose a base of simple roots Π (or equivalently, a set of positive roots R+ ). The Dynkin diagram associated to R is a graph whose vertices are Π. If πi and πj are distinct elements of the root system, we add mij = 4(πi , πj )^2 / ((πi , πi )(πj , πj )) lines between them. This number is obviously non-negative, and an integer since it is the product of 2 quantities that the axioms of a root system require to be integers. By the Cauchy-Schwarz inequality, and the fact that simple roots are never anti-parallel (they are all strictly contained in some half space), mij ∈ {0, 1, 2, 3}. Thus Dynkin diagrams are finite graphs, with single, double or triple edges. In fact, the constraints are much stronger than this: if the multiple edges are counted as single edges, all Dynkin diagrams are trees, and have at most one multiple edge. In fact, all Dynkin diagrams fall into 4 infinite families, and 5 exceptional cases, in exact parallel to the classification of simple Lie algebras. Version: 1 Owner: bwebste Author(s): bwebste

278.8

Verma module

Let g be a semi-simple Lie algebra, h a Cartan subalgebra, and b a Borel subalgebra. Let Fλ for a weight λ ∈ h∗ be the 1-dimensional b-module on which h acts by multiplication by λ, and the positive root spaces act trivially. Now, the Verma module Mλ of the weight λ is the g-module Mλ = Fλ ⊗U(b) U(g). This is an infinite dimensional representation, and it has a very important property: If V is any representation with highest weight λ, there is a surjective homomorphism Mλ → V . That is, all representations with highest weight λ are quotients of Mλ . Also, Mλ has a unique maximal submodule, so there is a unique irreducible representation with highest weight λ. Version: 1 Owner: bwebste Author(s): bwebste

278.9

Weyl chamber

If R ⊂ E is a root system, with E a euclidean vector space, and R+ is a set of positive roots, then the positive Weyl chamber is the set C = {e ∈ E | (e, α) ≥ 0 ∀α ∈ R+ }. The interior of C is a fundamental domain for the action of the Weyl group on E. The image w(C) of C under any element of the Weyl group is called a Weyl chamber. The Weyl group W acts simply transitively on the set of Weyl chambers. A weight which lies inside the positive Weyl chamber is called dominant. Version: 2 Owner: bwebste Author(s): bwebste

278.10

Weyl group

The Weyl group WR of a root system R ⊂ E, where E is a euclidean vector space, is the subgroup of GL(E) generated by reflection in the hyperplanes perpendicular to the roots. Reflection in a root α is given by

rα (v) = v − 2 ((v, α)/(α, α)) α.

The Weyl group is generated by reflections in the simple roots for any choice of a set of positive roots. There is a well-defined length function ℓ : WR → Z, where ℓ(w) is the minimal number of reflections in simple roots that w can be written as. This is also the number of positive roots that w takes to negative roots. Version: 1 Owner: bwebste Author(s): bwebste
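The finiteness of the Weyl group can be seen computationally. This Python sketch (added for illustration, with the A2 simple roots chosen by hand as an assumption) closes the two simple reflections under composition and finds the expected 6 elements, i.e. the symmetric group S3:

```python
# Generate the Weyl group of A2 from the two simple reflections.

import math

alpha1 = (math.sqrt(2), 0.0)
alpha2 = (-math.sqrt(2) / 2, math.sqrt(6) / 2)

def ip(u, v):
    return u[0] * v[0] + u[1] * v[1]

def reflect(alpha, v):
    # r_alpha(v) = v - 2 (v, alpha)/(alpha, alpha) alpha
    c = 2 * ip(v, alpha) / ip(alpha, alpha)
    return (v[0] - c * alpha[0], v[1] - c * alpha[1])

def key(f):
    # identify a linear map by its (rounded) action on the two simple roots
    return tuple(round(x, 6) for w in (alpha1, alpha2) for x in f(w))

r1 = lambda v: reflect(alpha1, v)
r2 = lambda v: reflect(alpha2, v)

# close {r1, r2} under composition, starting from the identity
group = {key(lambda v: v): (lambda v: v)}
frontier = [r1, r2]
while frontier:
    g = frontier.pop()
    if key(g) not in group:
        group[key(g)] = g
        for s in (r1, r2):
            frontier.append(lambda v, g=g, s=s: s(g(v)))

assert len(group) == 6
```

The default-argument binding in the inner lambda is needed so that each composed map captures the current g and s rather than the loop variables.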

278.11

Weyl’s theorem

Let g be a finite dimensional semi-simple Lie algebra. Then any finite dimensional representation of g is completely reducible. Version: 1 Owner: bwebste Author(s): bwebste


278.12

classification of finite-dimensional representations of semi-simple Lie algebras

If g is a semi-simple Lie algebra, then we say that an irreducible representation V has highest weight λ, if there is a vector v ∈ Vλ , the weight space of λ, such that Xv = 0 for X in any positive root space, and v is called a highest vector, or vector of highest weight. There is a unique (up to isomorphism) irreducible finite dimensional representation of g with highest weight λ for any dominant weight λ ∈ ΛW , where ΛW is the weight lattice of g, and every irreducible representation of g is of this type. Version: 1 Owner: bwebste Author(s): bwebste

278.13

cohomology of semi-simple Lie algebras

There are some important facts that make the cohomology of semi-simple Lie algebras easier to deal with than general Lie algebra cohomology. In particular, there are a number of vanishing theorems. First of all, let g be a finite-dimensional, semi-simple Lie algebra over C. Theorem. Let M be a nontrivial irreducible representation of g. Then H^n (g, M) = 0 for all n. Whitehead’s lemmata. Let M be any finite dimensional representation of g; then H^1 (g, M) = H^2 (g, M) = 0. Whitehead’s lemmata lead to two very important results. From the vanishing of H^1 , we can derive Weyl’s theorem, the fact that finite dimensional representations of semi-simple Lie algebras are completely reducible, since extensions of M by N are classified by H^1 (g, Hom(M, N)). And from the vanishing of H^2 , we obtain Levi’s theorem, which states that every finite dimensional Lie algebra is a split extension of a semi-simple algebra by a solvable algebra, since H^2 (g, M) classifies extensions of g by M with a specified action of g on M. Version: 2 Owner: bwebste Author(s): bwebste

278.14

nilpotent cone

Let g be a finite dimensional semisimple Lie algebra. Then the nilpotent cone N of g is the set of elements which act nilpotently on all representations of g. This is an irreducible subvariety of g (considered as a k-vector space), which is invariant under the adjoint action of G on g (here G is the adjoint group associated to g).


Version: 3 Owner: bwebste Author(s): bwebste

278.15

parabolic subgroup

Let G be a complex semi-simple Lie group. Then any subgroup P of G containing a Borel subgroup B is called parabolic. Parabolics are classified in the following manner. Let g be the Lie algebra of G, h the unique Cartan subalgebra contained in b, the algebra of B, R the set of roots corresponding to this choice of Cartan, and R+ the set of positive roots whose root spaces are contained in b, and let p be the Lie algebra of P . Then there exists a unique subset ΠP of Π, the base of simple roots associated to this choice of positive roots, such that {b, g−α }α∈ΠP generates p. In other words, parabolics containing a single Borel subgroup are classified by subsets of the Dynkin diagram, with the empty set corresponding to the Borel, and the whole graph corresponding to the group G.



Version: 1 Owner: bwebste Author(s): bwebste

278.16

pictures of Dynkin diagrams

Here is a complete list of connected Dynkin diagrams. In general if the name of a diagram has n as a subscript then there are n dots in the diagram. There are four infinite series that correspond to classical complex (that is, over C) simple Lie algebras. No pun intended. • An , for n ≥ 1, represents the simple complex Lie algebra sln+1 : A1 A2 A3 An

• Bn , for n ≥ 1, represents the simple complex Lie algebra so2n+1 : • Cn , for n ≥ 1, represents the simple complex Lie algebra sp2n :


B1 B2 B3 Bn C1 C2 C3 Cn

     

• Dn , for n ≥ 3, represents the simple complex Lie algebra so2n : D3

D4

D5

Dn

And then there are the exceptional cases that come in finite families. The corresponding Lie algebras are usually called by the name of the diagram.

• There is the E series that has three members: E6 which represents a 78–dimensional Lie algebra, E7 which represents a 133–dimensional Lie algebra, and E8 which represents a 248–dimensional Lie algebra.

E6

E7

E8

   

• There is the F4 diagram which represents a 52–dimensional complex simple Lie algebra:

F4

• And finally there is G2 that represents a 14–dimensional Lie algebra. G2

Notice the low dimensional coincidences: A1 = B1 = C1 , which reflects the exceptional isomorphisms sl2 ≅ so3 ≅ sp2 . Also B2 = C2 , reflecting the isomorphism so5 ≅ sp4 . And A3 = D3 , reflecting sl4 ≅ so6 .



Remark 1. Often in the literature the listing of Dynkin diagrams is arranged so that there are no “intersections” between different families. However by allowing intersections one gets a graphical representation of the low degree isomorphisms. In the same vein there is a graphical representation of the isomorphism so4 ≅ sl2 × sl2 .

Namely, if not for the requirement that the families consist of connected diagrams, one could start the D family with

D2

which consists of two disjoint copies of A1 .

Version: 9 Owner: Dr Absentius Author(s): Dr Absentius

278.17

positive root

If R ⊂ E is a root system, with E a euclidean vector space, then a subset R+ ⊂ R is called a set of positive roots if there is a vector v ∈ E such that (α, v) > 0 if α ∈ R+ , and (α, v) < 0 if α ∈ R\R+ . Roots which are not positive are called negative. Since −α is negative exactly when α is positive, exactly half the roots must be positive. Version: 2 Owner: bwebste Author(s): bwebste

278.18

rank

Let g be a finite dimensional Lie algebra. One can show that all Cartan subalgebras h ⊂ g have the same dimension. The rank of g is defined to be this dimension. Version: 5 Owner: rmilson Author(s): rmilson

278.19

root lattice

If R ⊂ E is a root system, and E a euclidean vector space, then the root lattice ΛR of R is the subset of E generated by R as an abelian group. In fact, this group is free on the simple roots, and is thus a full sublattice of E.

Version: 1 Owner: bwebste Author(s): bwebste

278.20

root system

Root systems are sets of vectors in a Euclidean space which are used to classify simple Lie algebras, to understand their representation theory, and also in the theory of reflection groups. Axiomatically, an (abstract) root system R is a set of vectors in a euclidean vector space E with inner product (·, ·), such that: 1. R spans the vector space E. 2. if α ∈ R, then reflection in the hyperplane orthogonal to α preserves R. 3. if α, β ∈ R, then 2(α, β)/(α, α) is an integer.

Axiom 3 is sometimes dropped when dealing with reflection groups, but it is necessary for the root systems which arise in connection with Lie algebras. Additionally, a root system is called reduced if for all α ∈ R, if kα ∈ R, then k = ±1. We call a root system indecomposable if there is no proper nonempty subset R0 ⊂ R such that every vector in R0 is orthogonal to every vector in R\R0 . Root systems arise in the classification of semi-simple Lie algebras in the following manner: If g is a semi-simple complex Lie algebra, then one can choose a maximal self-normalizing subalgebra of g (alternatively, this is the commutant of an element whose commutant has minimal dimension), called a Cartan subalgebra, traditionally denoted h. These act on g by the adjoint action by diagonalizable linear maps. Since these maps all commute, they are all simultaneously diagonalizable. The simultaneous eigenspaces of this action are called root spaces, and the decomposition of g into h and the root spaces is called a root decomposition of g. It turns out that the root spaces are all one dimensional. Now, for each eigenspace, we have a map λ : h → C, given by Hv = λ(H)v for v an element of that eigenspace. The set R ⊂ h∗ of these λ is called the root system of the algebra g. The Cartan subalgebra h has a natural inner product (the Killing form), which in turn induces an inner product on h∗ . With respect to this inner product, the root system R is an abstract root system, in the sense defined above. Conversely, given any abstract root system R, there is a unique semi-simple complex Lie algebra g such that R is its root system. Thus to classify complex semi-simple Lie algebras, we need only classify root systems, a somewhat easier task. Really, we only need to classify indecomposable root systems, since all other root systems are built out of these. The Lie algebra corresponding to a root system is simple if and only if the associated root system is indecomposable.

By convention e1 , . . . , en are orthonormal vectors, the subscript on the name of the root system is the dimension of the space it is contained in, also called the rank of the system, and the indices i and j will run from 1 to n. There are four infinite series of indecomposable root systems:

• An = {ei − ej , δ + ei }i≠j , where δ = e1 + · · · + en . This system corresponds to sln+1 C.
• Bn = {±ei ± ej }i<j ∪ {±ei }. This system corresponds to so2n+1 C.
• Cn = {±ei ± ej }i<j ∪ {±2ei }. This system corresponds to sp2n C.
• Dn = {±ei ± ej }i<j . This system corresponds to so2n C.

and there are five exceptional root systems G2 , F4 , E6 , E7 , E8 , with five corresponding exceptional algebras, generally denoted by the same letter in lower-case Fraktur (g2 , etc.). Version: 3 Owner: bwebste Author(s): bwebste
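The root system axioms can be checked mechanically for a small example. The following sketch (added here; not part of the original) verifies axioms 2 and 3 for B2 = {±ei ± ej} ∪ {±ei} in the plane:

```python
# Verify the root system axioms for B2.

roots = {(1, 1), (1, -1), (-1, 1), (-1, -1),
         (1, 0), (-1, 0), (0, 1), (0, -1)}

def ip(u, v):
    return u[0] * v[0] + u[1] * v[1]

def reflect(alpha, v):
    # reflection in the hyperplane orthogonal to alpha
    c = 2 * ip(v, alpha) / ip(alpha, alpha)
    return (v[0] - c * alpha[0], v[1] - c * alpha[1])

# axiom 2: reflection in each root permutes the root set
for a in roots:
    assert {tuple(round(x) for x in reflect(a, v)) for v in roots} == roots

# axiom 3: 2(a, b)/(a, a) is always an integer
for a in roots:
    for b in roots:
        q = 2 * ip(a, b) / ip(a, a)
        assert q == round(q)
```

B2 contains roots of two different lengths, so it also illustrates why the asymmetric quantity 2(α, β)/(α, α) appears in axiom 3.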

278.21

simple and semi-simple Lie algebras

A Lie algebra is called simple if it has no proper nonzero ideals and is not abelian. A Lie algebra is called semisimple if it has no nonzero solvable ideals and is not abelian. Let k = R or C. Examples of simple algebras are sln k, the Lie algebra of the special linear group (traceless matrices), son k, the Lie algebra of the special orthogonal group (skew-symmetric matrices), and sp2n k, the Lie algebra of the symplectic group. Over R, there are other simple Lie algebras, such as sun , the Lie algebra of the special unitary group (skew-Hermitian matrices). Any semisimple Lie algebra is a direct product of simple Lie algebras. Simple and semi-simple Lie algebras are one of the most widely studied classes of algebras for a number of reasons. First of all, many of the most interesting Lie groups have semi-simple Lie algebras. Secondly, their representation theory is very well understood. Finally, there is a beautiful classification of simple Lie algebras. Over C, there are 3 infinite series of simple Lie algebras: sln , son and sp2n , and 5 exceptional simple Lie algebras g2 , f4 , e6 , e7 , and e8 . Over R the picture is more complicated, as several different Lie algebras can have the same complexification (for example, sun and sln R both have complexification sln C). Version: 3 Owner: bwebste Author(s): bwebste


278.22

simple root

Let R ⊂ E be a root system, with E a euclidean vector space. If R+ is a set of positive roots, then a root is called simple if it is positive, and not the sum of any two positive roots. The simple roots form a basis of the vector space E, and any positive root is a positive integer linear combination of simple roots. A set of roots which is simple with respect to some choice of a set of positive roots is called a base. The Weyl group of the root system acts simply transitively on the set of bases. Version: 1 Owner: bwebste Author(s): bwebste

278.23

weight (Lie algebras)

Let g be a semi-simple Lie algebra. Choose a Cartan subalgebra h. Then a weight is simply an element of the dual h∗ . Weights arise in the representation theory of semi-simple Lie algebras in the following manner: The elements of h must act on a finite dimensional representation V by diagonalizable (also called semi-simple) linear transformations. Since h is abelian, these must be simultaneously diagonalizable. Thus, V decomposes as the direct sum of simultaneous eigenspaces for h. On such an eigenspace, the map λ defined by λ(H)v = Hv is a linear functional on h, and thus a weight, as defined above. The maximal eigenspace Vλ with weight λ is called the weight space of λ. The dimension of Vλ is called the multiplicity of λ. A representation of a semi-simple algebra is determined by the multiplicities of its weights. Version: 3 Owner: bwebste Author(s): bwebste

278.24

weight lattice

The weight lattice ΛW of a root system R ⊂ E is the dual lattice to ΛR , the root lattice of R. That is, ΛW = {e ∈ E | (e, r) ∈ Z for all r ∈ ΛR }. Weights which lie in the weight lattice are called integral. Since the simple roots are free generators of the root lattice, one need only check that (e, π) ∈ Z for all simple roots π. If R ⊂ h∗ is the root system of a semi-simple Lie algebra g with Cartan subalgebra h, then ΛW is exactly the set of weights appearing in finite dimensional representations of g. Version: 4 Owner: bwebste Author(s): bwebste


Chapter 279 17B30 – Solvable, nilpotent (super)algebras 279.1

Engel’s theorem

Before proceeding, it will be useful to recall the definition of a nilpotent Lie algebra. Let g be a Lie algebra. The lower central series of g is defined to be the filtration of ideals D0 g ⊃ D1 g ⊃ D2 g ⊃ . . . , where D0 g = g,

Dk+1g = [g, Dk g],

k ∈ N.

To say that g is nilpotent is to say that the lower central series has a trivial termination, i.e. that there exists a k such that Dk g = 0, or equivalently, that k nested bracket operations always vanish.

Theorem 1 (Engel). Let g ⊂ End V be a Lie algebra of endomorphisms of a finite-dimensional vector space V . Suppose that all elements of g are nilpotent transformations. Then, g is a nilpotent Lie algebra.

Lemma 1. Let X : V → V be a nilpotent endomorphism of a vector space V . Then, the adjoint action ad(X) : End V → End V is also a nilpotent endomorphism.

Proof. Suppose that X^k = 0 for some k ∈ N. We will show that ad(X)^{2k−1} = 0. Note that ad(X) = l(X) − r(X), where l(X), r(X) : End V → End V are the endomorphisms corresponding, respectively, to left and right multiplication by X. These two endomorphisms commute, and hence we can use the binomial formula to write

ad(X)^{2k−1} = Σ_{i=0}^{2k−1} (−1)^i (2k−1 choose i) l(X)^{2k−1−i} r(X)^i .

Each of the terms in the above sum vanishes because l(X)^k = r(X)^k = 0. QED

Lemma 2. Let g be as in the theorem, and suppose, in addition, that g is a nilpotent Lie algebra. Then the joint kernel,

ker g = ∩_{a∈g} ker a,

is non-trivial.

Proof. We proceed by induction on the dimension of g. The claim is true for dimension 1, because then g is generated by a single nilpotent transformation, and all nilpotent transformations are singular. Suppose then that the claim is true for all Lie algebras of dimension less than n = dim g. We note that D1 g fits the hypotheses of the lemma, and has dimension less than n, because g is nilpotent. Hence, by the induction hypothesis V0 = ker D1 g is non-trivial. Now, if we restrict all actions to V0 , we obtain a representation of g by abelian transformations. This is because for all a, b ∈ g and v ∈ V0 we have abv − bav = [a, b]v = 0. Now a finite number of mutually commuting linear endomorphisms admits a mutual eigenspace decomposition. In particular, if all of the commuting endomorphisms are singular, their joint kernel will be non-trivial. We apply this result to a basis of g/D1 g acting on V0 , and the desired conclusion follows. QED

Proof of the theorem. We proceed by induction on the dimension of g. The theorem is true in dimension 1, because in that circumstance D1 g is trivial. Next, suppose that the theorem holds for all Lie algebras of dimension less than n = dim g. Let h ⊂ g be a properly contained subalgebra of minimum codimension. We claim that there exists an a ∈ g but not in h such that [a, h] ⊂ h. By the induction hypothesis, h is nilpotent. To prove the claim consider the isotropy representation of h on g/h. By Lemma 1, the action of each a ∈ h on g/h is a nilpotent endomorphism. Hence, we can apply Lemma 2 to deduce that the joint kernel of all these actions is non-trivial, i.e. there exists an a ∈ g but not in h such that

[b, a] ≡ 0 (mod h),

for all b ∈ h. Equivalently, [h, a] ⊂ h and the claim is proved. Evidently then, the span of a and h is a subalgebra of g. Since h has minimum codimension, we infer that h and a span all of g, and that

D1 g ⊂ h.

(279.1.1)

Next, we claim that all the Dk h are ideals of g. It is enough to show that [a, Dk h] ⊂ Dk h. We argue by induction on k. Suppose the claim is true for some k. Let b ∈ h, c ∈ Dk h be given. By the Jacobi identity

[a, [b, c]] = [[a, b], c] + [b, [a, c]].

The first term on the right hand-side is in Dk+1 h because [a, b] ∈ h. The second term is in Dk+1 h by the induction hypothesis. In this way the claim is established.

Now a is nilpotent, and hence by Lemma 1,

ad(a)^n = 0 (279.1.2)

for some n ∈ N. We now claim that Dn+1 g ⊂ D1 h. By (279.1.1) it suffices to show that

[g, [. . . [g, h] . . .]] ⊂ D1 h (n nested brackets).

Putting g1 = g/D1 h, h1 = h/D1 h, this is equivalent to

[g1 , [. . . [g1 , h1 ] . . .]] = 0 (n nested brackets).

However, h1 is abelian, and hence, the above follows directly from (279.1.2). Adapting this argument in the obvious fashion we can show that Dkn+1 g ⊂ Dk h. Since h is nilpotent, g must be nilpotent as well. QED

Historical remark. In the traditional formulation of Engel’s theorem, the hypotheses are the same, but the conclusion is that there exists a basis B of V such that all elements of g are represented by nilpotent matrices relative to B. Let us put this another way. The vector space Nil of strictly upper triangular matrices is a nilpotent Lie algebra, and indeed all subalgebras of Nil are nilpotent Lie algebras. Engel’s theorem asserts that the converse holds, i.e. if all elements of a Lie algebra g are nilpotent transformations, then g is isomorphic to a subalgebra of Nil. The classical result follows straightforwardly from our version of the theorem and from Lemma 2. Indeed, let V1 be the joint kernel of g. We then let U2 be the joint kernel of g acting on V /V1 , and let V2 ⊂ V be the subspace obtained by pulling U2 back to V . We do this a finite number of times and obtain a flag of subspaces 0 = V0 ⊂ V1 ⊂ V2 ⊂ . . . ⊂ Vn = V, such that gVk+1 ⊂ Vk for all k. Then choose a basis adapted to this flag, and we’re done. Version: 2 Owner: rmilson Author(s): rmilson
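The matrix formulation can be checked concretely. The following sketch (assuming NumPy is available) takes Nil to be the strictly upper triangular 3×3 matrices and verifies that every element is a nilpotent transformation and that the flag Vk = span(e1, ..., ek) from the historical remark satisfies g Vk+1 ⊂ Vk:

```python
import numpy as np

# Nil for n = 3: strictly upper triangular matrices, spanned by E12, E13, E23.
def E(i, j, n=3):
    m = np.zeros((n, n)); m[i, j] = 1.0; return m

basis = [E(0, 1), E(0, 2), E(1, 2)]

# Every element of Nil is a nilpotent transformation: A^3 = 0.
A = 2 * basis[0] - basis[1] + 5 * basis[2]
assert np.allclose(np.linalg.matrix_power(A, 3), 0)

# The flag 0 = V0 ⊂ V1 ⊂ V2 ⊂ V3 = V, with Vk spanned by e1, ..., ek,
# satisfies g V_{k+1} ⊂ V_k: each basis matrix kills e1 and lowers e2, e3.
e1, e2, e3 = np.eye(3).T
for B in basis:
    assert np.allclose(B @ e1, 0)         # g V1 = 0
    assert np.allclose((B @ e2)[1:], 0)   # g V2 ⊂ V1
    assert np.allclose((B @ e3)[2:], 0)   # g V3 ⊂ V2
```

The helper E and the choice n = 3 are illustrative only; any strictly upper triangular algebra behaves the same way.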

279.2

Lie’s theorem

Let g be a finite dimensional complex solvable Lie algebra, and V a finite dimensional representation of g. Then there exists a nonzero element of V which is a simultaneous eigenvector for all elements of g. Applying this result inductively, we find that there is a basis of V with respect to which all elements of g are upper triangular. Version: 3 Owner: bwebste Author(s): bwebste
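A minimal numerical illustration (assuming NumPy): the Lie algebra of upper triangular 2×2 matrices is solvable, and e1 = (1, 0) is a simultaneous eigenvector for all of it, as the theorem predicts:

```python
import numpy as np

# b = upper triangular 2x2 matrices, a solvable Lie algebra.
# e1 is a common eigenvector; the eigenvalue of X is its top-left entry.
basis = [np.array([[1., 0.], [0., 0.]]),
         np.array([[0., 1.], [0., 0.]]),
         np.array([[0., 0.], [0., 1.]])]
e1 = np.array([1., 0.])
for X in basis:
    assert np.allclose(X @ e1, X[0, 0] * e1)
```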


279.3

solvable Lie algebra

Let g be a Lie algebra. The lower central series of g is the filtration of subalgebras D1 g ⊃ D2 g ⊃ D3 g ⊃ · · · ⊃ Dk g ⊃ · · · of g, inductively defined for every natural number k as follows:

D1 g := [g, g],    Dk g := [g, Dk−1 g].

The derived series of g is the filtration D1 g ⊃ D2 g ⊃ D3 g ⊃ · · · ⊃ Dk g ⊃ · · · defined inductively by

D1 g := [g, g],    Dk g := [Dk−1 g, Dk−1 g].

In fact the terms of both series are ideals of g, and each term of the derived series is contained in the corresponding term of the lower central series. The Lie algebra g is defined to be nilpotent if the lower central series reaches 0, i.e. Dk g = 0 for some k ∈ N, and solvable if the derived series reaches 0 for some k ∈ N. A subalgebra h of g is said to be nilpotent or solvable if h is nilpotent or solvable when considered as a Lie algebra in its own right. The terms may also be applied to ideals of g, since every ideal of g is also a subalgebra. Version: 1 Owner: djao Author(s): djao
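Both series can be computed mechanically for a matrix Lie algebra. The sketch below (assuming NumPy; `bracket_span` and `basis_of` are ad hoc helpers) shows that the upper triangular 2×2 matrices are solvable but not nilpotent:

```python
import numpy as np

def basis_of(mats):
    """Reduce a list of matrices to a linearly independent subset."""
    basis, vecs = [], []
    for m in mats:
        cand = vecs + [m.flatten()]
        if np.linalg.matrix_rank(np.array(cand)) > len(vecs):
            basis.append(m); vecs = cand
    return basis

def bracket_span(A, B):
    """A basis of the span of all [a, b] = ab - ba, a in A, b in B."""
    return basis_of([a @ b - b @ a for a in A for b in B])

def E(i, j, n=2):
    m = np.zeros((n, n)); m[i, j] = 1.0; return m

# b = upper triangular 2x2 matrices.
b = [E(0, 0), E(0, 1), E(1, 1)]

# Derived series: D_1 = [b, b] is spanned by E12, and D_2 = [D_1, D_1] = 0,
# so b is solvable.
D1 = bracket_span(b, b)
D2 = bracket_span(D1, D1)
assert len(D1) == 1 and len(D2) == 0

# Lower central series: [b, D_1] is still spanned by E12, so the series
# never reaches 0 and b is NOT nilpotent.
C2 = bracket_span(b, D1)
assert len(C2) == 1
```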


Chapter 280 17B35 – Universal enveloping (super)algebras 280.1

Poincaré-Birkhoff-Witt theorem

Let g be a Lie algebra over a field k, and let B be a k-basis of g equipped with a linear order ≤. The Poincaré-Birkhoff-Witt theorem (often abbreviated to PBW theorem) states that the monomials x1 x2 · · · xn with x1 ≤ x2 ≤ . . . ≤ xn elements of B constitute a k-basis of the universal enveloping algebra U(g) of g. Such monomials are often called ordered monomials or PBW-monomials.

It is easy to see that they span U(g): for all n ∈ N, let Mn denote the set

Mn = {(x1 , . . . , xn ) | x1 ≤ . . . ≤ xn } ⊂ B^n ,

and denote by π : B^0 ∪ B^1 ∪ B^2 ∪ · · · → U(g) the multiplication map (x1 , . . . , xn ) ↦ x1 · · · xn . Clearly it suffices to prove that

π(B^n) ⊆ π(M0 ) + π(M1 ) + · · · + π(Mn )

for all n ∈ N; to this end, we proceed by induction. For n = 0 the statement is clear. Assume that it holds for n − 1 ≥ 0, and consider a list (x1 , . . . , xn ) ∈ B^n. If it is an element of Mn , then we are done. Otherwise, there exists an index i such that xi > xi+1 . Now we have

π(x1 , . . . , xn ) = π(x1 , . . . , xi−1 , xi+1 , xi , xi+2 , . . . , xn ) + x1 · · · xi−1 [xi , xi+1 ] xi+2 · · · xn .

As B is a basis of g, [xi , xi+1 ] is a linear combination of B. Using this to expand the second term above, we find that it is in π(M0 ) + · · · + π(Mn−1 ) by the induction hypothesis. The argument of π in the first term, on the other hand, is lexicographically smaller than (x1 , . . . , xn ), but contains the same entries. Clearly this rewriting process must end, and this concludes the induction step. The proof of linear independence of the PBW-monomials is slightly more difficult. Version: 1 Owner: draisma Author(s): draisma
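The rewriting step in the proof can be turned into a small program. Here is a sketch for the Heisenberg Lie algebra with ordered basis p < q < e, bracket [p, q] = e and all other brackets of basis elements zero (the names ORDER, bracket and to_pbw are ad hoc choices):

```python
from collections import defaultdict

ORDER = {'p': 0, 'q': 1, 'e': 2}

def bracket(x, y):
    """[x, y] as a dict {basis element: coefficient}."""
    if (x, y) == ('p', 'q'): return {'e': 1}
    if (x, y) == ('q', 'p'): return {'e': -1}
    return {}

def to_pbw(word, coeff=1):
    """Rewrite a word in U(g) as a combination of ordered (PBW) monomials."""
    result = defaultdict(int)
    stack = [(tuple(word), coeff)]
    while stack:
        w, c = stack.pop()
        for i in range(len(w) - 1):
            if ORDER[w[i]] > ORDER[w[i + 1]]:
                # x_i x_{i+1} = x_{i+1} x_i + [x_i, x_{i+1}]
                stack.append((w[:i] + (w[i + 1], w[i]) + w[i + 2:], c))
                for b, k in bracket(w[i], w[i + 1]).items():
                    stack.append((w[:i] + (b,) + w[i + 2:], c * k))
                break
        else:  # already an ordered monomial
            result[w] += c
    return dict(result)

# q p = p q - e and q q p = p q q - 2 q e in U(g):
assert to_pbw(('q', 'p')) == {('p', 'q'): 1, ('e',): -1}
assert to_pbw(('q', 'q', 'p')) == {('p', 'q', 'q'): 1, ('q', 'e'): -2}
```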

280.2

universal enveloping algebra

A universal enveloping algebra of a Lie algebra g over a field k is an associative algebra U (with unity) over k, together with a Lie algebra homomorphism ι : g → U (where the Lie algebra structure on U is given by the commutator), such that if A is another associative algebra over k and φ : g → A is another Lie algebra homomorphism, then there exists a unique homomorphism ψ : U → A of associative algebras such that ψ ◦ ι = φ, i.e. the triangle formed by ι, φ and ψ commutes.

Any g has a universal enveloping algebra: let T be the associative tensor algebra generated by the vector space g, and let I be the two-sided ideal of T generated by elements of the form

xy − yx − [x, y] for x, y ∈ g;

then U = T /I is a universal enveloping algebra of g. Moreover, the universal property above ensures that all universal enveloping algebras of g are canonically isomorphic; this justifies the standard notation U(g).

Some remarks:

1. By the Poincaré-Birkhoff-Witt theorem, the map ι is injective; usually g is identified with ι(g). From the construction above it is clear that this space generates U(g) as an associative algebra with unity.

2. By definition, the (left) representation theory of U(g) is identical to that of g. In particular, any irreducible g-module corresponds to a maximal left ideal of U(g).

Example: let g be the Lie algebra generated by the elements p, q, and e with Lie bracket determined by [p, q] = e and [p, e] = [q, e] = 0. Then U(g)/(e − 1) (where (e − 1) denotes the two-sided ideal generated by e − 1) is isomorphic to the skew polynomial algebra k[x, ∂/∂x], the isomorphism being determined by

p + (e − 1) ↦ ∂/∂x and q + (e − 1) ↦ x.

Version: 1 Owner: draisma Author(s): draisma
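The example above can be verified directly on polynomials. The sketch below represents a polynomial by its coefficient list and checks that p ↦ d/dx and q ↦ x satisfy [p, q] = 1, so that e − 1 indeed acts as zero (the helpers d_dx, mul_x, sub are ad hoc):

```python
# Polynomials as coefficient lists [a0, a1, a2, ...] for a0 + a1 x + a2 x^2 + ...
def d_dx(c):           # p acts as d/dx
    return [i * c[i] for i in range(1, len(c))]

def mul_x(c):          # q acts as multiplication by x
    return [0] + list(c)

def sub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [s - t for s, t in zip(a, b)]

# [p, q] f = f for f = 5 + 2x + x^3, i.e. e acts as the identity.
f = [5, 2, 0, 1]
comm = sub(d_dx(mul_x(f)), mul_x(d_dx(f)))
assert comm == f
```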


Chapter 281 17B56 – Cohomology of Lie (super)algebras 281.1

Lie algebra cohomology

Let g be a Lie algebra, and M a representation of g. Let M^g = {m ∈ M : Xm = 0 for all X ∈ g}. Then M ↦ M^g is clearly a covariant left exact functor. Call its i-th right derived functor R^i((−)^g) = H^i(g, −) the Lie algebra cohomology of g with coefficients in M.

These cohomology groups have certain interpretations. For any Lie algebra, H^1(g, k) ≅ g/[g, g], the abelianization of g, and H^2(g, M) is in natural bijection with Lie algebra extensions (thinking of M as an abelian Lie algebra)

0 → M → f → g → 0

such that the action of g on M induced by that of f coincides with that already specified. Version: 2 Owner: bwebste Author(s): bwebste


Chapter 282 17B67 – Kac-Moody (super)algebras (structure and representation theory) 282.1

Kac-Moody algebra

Let A = (aij) be an n × n generalized Cartan matrix. If n − r is the rank of A, then let h be an n + r dimensional complex vector space. Choose n linearly independent elements α1 , . . . , αn ∈ h∗ (called roots), and α̌1 , . . . , α̌n ∈ h (called coroots) such that ⟨αi , α̌j ⟩ = aij , where ⟨·, ·⟩ is the natural pairing of h∗ and h. This choice is unique up to automorphisms of h. Then the Kac-Moody algebra g(A) associated to A is the Lie algebra generated by elements X1 , . . . , Xn , Y1 , . . . , Yn and h, with the relations

[Xi , Yi ] = α̌i ,    [Xi , Yj ] = 0 (i ≠ j),
[h, Xi ] = αi (h)Xi ,    [h, Yi ] = −αi (h)Yi    for all h ∈ h,
[Xi , [Xi , · · · , [Xi , Xj ] · · · ]] = 0,    [Yi , [Yi , · · · , [Yi , Yj ] · · · ]] = 0    (i ≠ j),

where the last two (Serre) relations each involve 1 − aij nested brackets, i.e. ad(Xi)^{1−aij}(Xj) = 0 and ad(Yi)^{1−aij}(Yj) = 0.

If the matrix A is positive-definite, we obtain a finite dimensional semi-simple Lie algebra, and A is the Cartan matrix associated to a Dynkin diagram. Otherwise, the algebra we obtain is infinite dimensional and has an r-dimensional center. Version: 2 Owner: bwebste Author(s): bwebste

282.2

generalized Cartan matrix

A generalized Cartan matrix is a matrix A whose diagonal entries are all 2, and whose off-diagonal entries are nonpositive integers, such that aij = 0 if and only if aji = 0. Such a

matrix is called symmetrizable if there is a diagonal matrix B such that AB is symmetric. Version: 2 Owner: bwebste Author(s): bwebste
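The defining conditions, and the finite/infinite dichotomy mentioned in the previous entry, can be checked mechanically. A sketch (assuming NumPy; `is_generalized_cartan` is an ad hoc helper):

```python
import numpy as np

def is_generalized_cartan(A):
    """Check the defining conditions of a generalized Cartan matrix."""
    A = np.asarray(A)
    n, m = A.shape
    if n != m:
        return False
    for i in range(n):
        if A[i, i] != 2:
            return False
        for j in range(n):
            if i != j:
                if A[i, j] > 0 or A[i, j] != int(A[i, j]):
                    return False
                if (A[i, j] == 0) != (A[j, i] == 0):
                    return False
    return True

# Cartan matrix of type A2: symmetric (hence symmetrizable) and
# positive-definite, so the associated algebra is finite dimensional (sl3).
A2 = np.array([[2, -1], [-1, 2]])
assert is_generalized_cartan(A2)
assert np.all(np.linalg.eigvalsh(A2) > 0)

# The affine matrix of type A1^(1): generalized Cartan but not
# positive-definite; the associated algebra is infinite dimensional.
A1aff = np.array([[2, -2], [-2, 2]])
assert is_generalized_cartan(A1aff)
assert not np.all(np.linalg.eigvalsh(A1aff) > 0)
```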


Chapter 283 17B99 – Miscellaneous 283.1

Jacobi identity interpretations

The Jacobi identity in a Lie algebra g has various interpretations that are more transparent, whence easier to remember, than the usual form [x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0. One is the fact that the adjoint representation ad : g → End(g) really is a representation. Yet another way to formulate the identity is ad(x)[y, z] = [ad(x)y, z] + [y, ad(x)z], i.e., ad(x) is a derivation on g for all x ∈ g. Version: 2 Owner: draisma Author(s): draisma
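The derivation formulation can be spot-checked numerically. A sketch (assuming NumPy), using R^3 with the cross product as bracket:

```python
import numpy as np

# ad(x) = [x, -] is a derivation: ad(x)[y, z] = [ad(x)y, z] + [y, ad(x)z].
rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 3))

lhs = np.cross(x, np.cross(y, z))                              # ad(x)[y, z]
rhs = np.cross(np.cross(x, y), z) + np.cross(y, np.cross(x, z))
assert np.allclose(lhs, rhs)
```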

283.2

Lie algebra

A Lie algebra over a field k is a vector space g with a bilinear map [ , ] : g × g → g, called the Lie bracket and denoted (x, y) 7→ [x, y]. It is required to satisfy: 1. [x, x] = 0 for all x ∈ g. 2. The Jacobi identity: [x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0 for all x, y, z ∈ g.


283.2.1

Subalgebras & Ideals

A vector subspace h of the Lie algebra g is a subalgebra if h is closed under the Lie bracket operation, or, equivalently, if h itself is a Lie algebra under the same bracket operation as g. An ideal of g is a subspace h for which [x, y] ∈ h whenever either x ∈ h or y ∈ h. Note that every ideal is also a subalgebra. Some general examples of subalgebras:
• The center of g, defined by Z(g) := {x ∈ g | [x, y] = 0 for all y ∈ g}. It is an ideal of g.
• The normalizer of a subalgebra h is the set N(h) := {x ∈ g | [x, h] ⊂ h}. The Jacobi identity guarantees that N(h) is always a subalgebra of g.
• The centralizer of a subset X ⊂ g is the set C(X) := {x ∈ g | [x, X] = 0}. Again, the Jacobi identity implies that C(X) is a subalgebra of g.

283.2.2

Homomorphisms

Given two Lie algebras g and g0 over the field k, a homomorphism from g to g0 is a linear transformation φ : g → g0 such that φ([x, y]) = [φ(x), φ(y)] for all x, y ∈ g. An injective homomorphism is called a monomorphism, and a surjective homomorphism is called an epimorphism. The kernel of a homomorphism φ : g → g0 (considered as a linear transformation) is denoted ker (φ). It is always an ideal in g.

283.2.3

Examples

• Any vector space can be made into a Lie algebra simply by setting [x, y] = 0 for all x, y. The resulting Lie algebra is called an abelian Lie algebra.
• If G is a Lie group, then the tangent space at the identity forms a Lie algebra over the real numbers.
• R3 with the cross product operation is a nonabelian three dimensional Lie algebra over R.
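The last example can be verified directly. A sketch (assuming NumPy) checking the two axioms for R^3 with the cross product, on random vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 3))

assert np.allclose(np.cross(x, x), 0)                    # axiom 1: [x, x] = 0
jacobi = (np.cross(x, np.cross(y, z))
          + np.cross(y, np.cross(z, x))
          + np.cross(z, np.cross(x, y)))
assert np.allclose(jacobi, 0)                            # axiom 2: Jacobi
assert not np.allclose(np.cross(x, y), np.cross(y, x))   # nonabelian
```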

283.2.4

Historical Note

Lie algebras are so-named in honour of Sophus Lie, a Norwegian mathematician who pioneered the study of these mathematical objects. Lie’s discovery was tied to his investigation

of continuous transformation groups and symmetries. One joint project with Felix Klein called for the classification of all finite-dimensional groups acting on the plane. The task seemed hopeless owing to the generally non-linear nature of such group actions. However, Lie was able to solve the problem by remarking that a transformation group can be locally reconstructed from its corresponding “infinitesimal generators”, that is to say vector fields corresponding to various 1–parameter subgroups. In terms of this geometric correspondence, the group composition operation manifests itself as the bracket of vector fields, and this is very much a linear operation. Thus the task of classifying group actions in the plane became the task of classifying all finite-dimensional Lie algebras of planar vector fields, a project that Lie brought to a successful conclusion. This “linearization trick” proved to be incredibly fruitful and led to great advances in geometry and differential equations. Such advances are based, however, on various results from the theory of Lie algebras. Lie was the first to make significant contributions to this purely algebraic theory, but he was surely not the last. Version: 10 Owner: djao Author(s): djao, rmilson, nerdy2

283.3

real form

Let G be a complex Lie group. A real Lie group K is called a real form of G if g ≅ C ⊗R k, where g and k are the Lie algebras of G and K, respectively. Version: 2 Owner: bwebste Author(s): bwebste


Chapter 284 18-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 284.1

Grothendieck spectral sequence

If F : C → D and G : D → E are two covariant left exact functors between abelian categories, and if F takes injective objects of C to G-acyclic objects of D, then there is a spectral sequence for each object A of C:

E2^{pq} = (R^p G ◦ R^q F )(A) ⇒ R^{p+q} (G ◦ F )(A).

If X and Y are topological spaces and C = Ab(X) is the category of sheaves of abelian groups on X, and D = Ab(Y ) and E = Ab is the category of abelian groups, then for a continuous map f : X → Y we have a functor f∗ : Ab(X) → Ab(Y ), the direct image functor. We also have the global section functors ΓX : Ab(X) → Ab and ΓY : Ab(Y ) → Ab. Then since ΓY ◦ f∗ = ΓX , and since we can verify the hypothesis (injectives are flasque, direct images of flasque sheaves are flasque, and flasque sheaves are acyclic for the global section functor), the sequence in this case becomes:

H^p (Y, R^q f∗ F) ⇒ H^{p+q} (X, F)

for a sheaf F of abelian groups on X; this is exactly the Leray spectral sequence. I can recommend no better book than Weibel’s book on homological algebra. Sheaf theory can be found in Hartshorne or in Godement’s book. Version: 5 Owner: bwebste Author(s): Manoj, ceps, nerdy2


284.2

category of sets

The category of sets has as its objects all sets and as its morphisms functions between sets. (This works if a category’s objects are only required to be part of a class, as the class of all sets exists.) Alternately one can specify a universe, containing all sets of interest in the situation, and take the category to contain only sets in that universe and functions between those sets. Version: 1 Owner: nerdy2 Author(s): nerdy2

284.3

functor

Given two categories C and D, a covariant functor T : C → D consists of an assignment for each object X of C an object T (X) of D (i.e. a “function” T : Ob(C) → Ob(D)) together with an assignment for every morphism f ∈ HomC(A, B), to a morphism T (f ) ∈ HomD(T (A), T (B)), such that: • T (1A ) = 1T (A) where 1X denotes the identity morphism on the object X (in the respective category). • T (g ◦ f ) = T (g) ◦ T (f ), whenever the composition g ◦ f is defined. A contravariant functor T : C → D is just a covariant functor T : Cop → D from the opposite category. In other words, the assignment reverses the direction of maps. If f ∈ HomC(A, B), then T (f ) ∈ HomD(T (B), T (A)) and T (g ◦ f ) = T (f ) ◦ T (g) whenever the composition is defined (the domain of g is the same as the codomain of f ). Given a category C and an object X we always have the functor T : C → Sets to the category of sets defined on objects by T (A) = Hom(X, A). If f : A → B is a morphism of C, then we define T (f ) : Hom(X, A) → Hom(X, B) by g 7→ f ◦ g. This is a covariant functor, denoted by Hom(X, −). Similarly, one can define a contravariant functor Hom(−, X) : C → Sets. Version: 3 Owner: nerdy2 Author(s): nerdy2
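The covariant Hom-functor at the end of the entry can be modelled on finite sets. A sketch (the helpers hom and T_of are ad hoc), checking that post-composition preserves identities and sends Hom(X, A) into Hom(X, B):

```python
from itertools import product

def hom(S, T):
    """All functions S → T, each encoded as a dict."""
    return [dict(zip(S, vals)) for vals in product(T, repeat=len(S))]

def T_of(f):
    """Hom(X, -) on a morphism f (a dict): g ↦ f ∘ g."""
    return lambda g: {x: f[g[x]] for x in g}

X = [0, 1]
A = ['a', 'b']
B = ['u']

f = {'a': 'u', 'b': 'u'}          # the unique map A → B
Tf = T_of(f)
# T(f) sends every g in Hom(X, A) to f ∘ g in Hom(X, B):
assert all(Tf(g) in hom(X, B) for g in hom(X, A))

# Identities are preserved: T(id_A) is the identity on Hom(X, A).
idA = {'a': 'a', 'b': 'b'}
assert all(T_of(idA)(g) == g for g in hom(X, A))
```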

284.4

monic

A morphism f : A → B in a category is called monic if for any object C and any morphisms g1 , g2 : C → A, if f ◦ g1 = f ◦ g2 then g1 = g2 .

A monic in the category of sets is simply a one-to-one function. Version: 1 Owner: nerdy2 Author(s): nerdy2

284.5

natural equivalence

A natural transformation between functors τ : F → G is called a natural equivalence (or a natural isomorphism) if there is a natural transformation σ : G → F such that τ ◦ σ = idG and σ ◦ τ = idF where idF is the identity natural transformation on F (which for each object A gives the identity map F (A) → F (A)), and composition is defined in the obvious way (for each object compose the morphisms and it’s easy to see that this results in a natural transformation). Version: 2 Owner: mathcam Author(s): mathcam, nerdy2

284.6

representable functor

A contravariant functor T : C → Sets between a category and the category of sets is representable if there is an object X of C such that T is isomorphic to the functor X • = Hom(−, X). Similarly, a covariant functor T is called representable if it is isomorphic to X• = Hom(X, −). We say that the object X represents T . X is unique up to canonical isomorphism. A vast number of important objects in mathematics are defined as representing functors. For example, if F : C → D is any functor, then the adjoint G : D → C (if it exists) can be defined as follows. For Y in D, G(Y ) is the object of C representing the functor X 7→ Hom(F (X), Y ) if G is right adjoint to F or X 7→ Hom(Y, F (X)) if G is left adjoint. Thus, for example, if R is a ring, then N⊗M represents the functor L 7→ HomR (N, HomR (M, L)). Version: 3 Owner: bwebste Author(s): bwebste, nerdy2

284.7

supplemental axioms for an Abelian category

These are axioms introduced by Alexandre Grothendieck for an abelian category. The first two are satisfied by definition in an Abelian category, and others may or may not be. (Ab1) Every morphism has a kernel and a cokernel.

(Ab2) Every monic is the kernel of its cokernel.
(Ab3) Coproducts exist. (Coproducts are also called direct sums.) If this axiom is satisfied the category is often just called cocomplete.
(Ab3*) Products exist. If this axiom is satisfied the category is often just called complete.
(Ab4) Coproducts exist and the coproduct of monics is a monic.
(Ab4*) Products exist and the product of epics is an epic.
(Ab5) Coproducts exist and filtered colimits of exact sequences are exact.
(Ab5*) Products exist and filtered inverse limits of exact sequences are exact.
Grothendieck introduced these in his homological algebra paper in the Tôhoku Mathematical Journal. They can also be found in Weibel’s excellent homological algebra book. Version: 5 Owner: nerdy2 Author(s): nerdy2


Chapter 285 18A05 – Definitions, generalizations 285.1

autofunctor

Let F : C → C be an endofunctor on a category C. If F is a bijection on objects, Ob(C), and morphisms, Mor(C), then it is an autofunctor. In short, an autofunctor is a full and faithful endofunctor F : C → C such that the mapping Ob(C) → Ob(C) which is induced by F is a bijection. Note that an autofunctor need not be naturally isomorphic to the identity functor idC. Version: 10 Owner: mathcam Author(s): mathcam, mhale, yark, gorun manolescu

285.2

automorphism

Roughly, an automorphism is a map from a mathematical object onto itself such that: 1. There exists an ”inverse” map such that the composition of the two is the identity map of the object, and 2. any relevant structure related to the object in question is preserved. In category theory, an automorphism of an object A in a category C is a morphism ψ ∈ Mor(A, A) such that there exists another morphism φ ∈ Mor(A, A) and ψ ◦ φ = φ ◦ ψ = idA . For example in the category of groups an automorphism is just a bijective (inverse exists and composition gives the identity) group homomorphism (group structure is preserved). Concretely, the map x ↦ −x is an automorphism of the additive group of real numbers. In the category of topological spaces an automorphism would be a bijective, continuous map such that its inverse map is also continuous (not guaranteed as in the group case). Concretely, the map ψ : S 1 → S 1 where ψ(α) = α + θ for some fixed angle θ is an automorphism of the topological space that is the circle.

Version: 4 Owner: benjaminfjones Author(s): benjaminfjones

285.3

category

A category C consists of the following data:
1. a collection ob(C) of objects (of C)
2. for each ordered pair (A, B) of objects of C, a collection (we will assume it is a set) Hom(A, B) of morphisms from the domain A to the codomain B
3. a function ◦ : Hom(A, B) × Hom(B, C) → Hom(A, C) called composition. We normally denote ◦(f, g) by g ◦ f for morphisms f, g.
The above data must satisfy the following axioms: for objects A, B, C, D,
A1: Hom(A, B) ∩ Hom(C, D) = ∅ whenever A ≠ C or B ≠ D
A2: (associativity) if f ∈ Hom(A, B), g ∈ Hom(B, C) and h ∈ Hom(C, D), then h ◦ (g ◦ f ) = (h ◦ g) ◦ f
A3: (existence of an identity morphism) for each object A there exists an identity morphism idA ∈ Hom(A, A) such that for every f ∈ Hom(A, B), f ◦ idA = f and idA ◦ g = g for every g ∈ Hom(B, A).
Some examples of categories:
• 0 is the empty category with no objects or morphisms, 1 is the category with one object and one (identity) morphism.
• If we assume we have a universe U which contains all sets encountered in “everyday” mathematics, Set is the category of all such small sets with morphisms being set functions
• Top is the category of all small topological spaces with morphisms continuous functions
• Grp is the category of all small groups whose morphisms are group homomorphisms
Version: 9 Owner: mathcam Author(s): mathcam, RevBobo


285.4

category example (arrow category)

Let C be a category, and let D be the category whose objects are the arrows of C. A morphism between two morphisms f : A → B and g : A′ → B′ is defined to be a couple of morphisms (h, k), where h ∈ Hom(A, A′) and k ∈ Hom(B, B′), such that the square with top h : A → A′, left f : A → B, right g : A′ → B′ and bottom k : B → B′ commutes, i.e. g ◦ h = k ◦ f . The resulting category D is called the arrow category of C. Version: 6 Owner: n3o Author(s): n3o

285.5

commutative diagram

Definition 15. Let C be a category. A diagram in C is a directed graph Γ with vertex set V and edge set E (“loops” and “parallel edges” are allowed), together with two maps o : V → Obj(C), m : E → Morph(C) such that if e ∈ E has source s(e) ∈ V and target t(e) ∈ V then m(e) ∈ HomC (o(s(e)), o(t(e))). Usually diagrams are denoted by drawing the corresponding graph and labeling its vertices (respectively edges) with their images under o (respectively m); for example, if f : A → B is a morphism, then

A --f--> B

is a diagram. Often (as in the previous example) the vertices themselves are not drawn, since their position can be deduced from the position of their labels.

Definition 16. Let D = (Γ, o, m) be a diagram in the category C and γ = (e1 , . . . , en ) be a path in Γ. Then the composition along γ is the following morphism of C:

◦(γ) := m(en ) ◦ · · · ◦ m(e1 ).

We say that D is commutative, or that it commutes, if for any two objects in the image of o, say A = o(v1 ) and B = o(v2 ), and any two paths γ1 and γ2 that connect v1 to v2 , we have ◦(γ1 ) = ◦(γ2 ). For example, the commutativity of the triangle with edges f : A → B, g : B → C and h : A → C translates to h = g ◦ f , while the commutativity of the square with edges f : A → B, g : B → D, k : A → C and h : C → D translates to g ◦ f = h ◦ k. Version: 3 Owner: Dr Absentius Author(s): Dr Absentius
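In the category of sets, composition along a path and the commutativity check can be programmed directly. A sketch (compose_along is an ad hoc helper following Definition 16, with a path listed as (e1, ..., en)):

```python
from functools import reduce

def compose_along(path):
    """Composition along a path (f1, ..., fn): returns fn ∘ ··· ∘ f1."""
    return reduce(lambda acc, f: lambda x: f(acc(x)), path, lambda x: x)

# A commutative square of functions on integers: both paths give 2x + 2.
f = lambda x: 2 * x        # A -> B
g = lambda x: x + 2        # B -> D
k = lambda x: x + 1        # A -> C
h = lambda x: 2 * x        # C -> D

top = compose_along([f, g])      # g ∘ f
bottom = compose_along([k, h])   # h ∘ k
assert all(top(x) == bottom(x) for x in range(10))
```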

285.6

double dual embedding

Let V be a vector space over a field K. Recall that V ∗ , the dual space, is defined to be the vector space of all linear forms on V . There is a natural embedding of V into V ∗∗ , the dual of its dual space. In the language of categories, this embedding is a natural transformation between the identity functor and the double dual functor, both endofunctors operating on VK , the category of vector spaces over K.

Turning to the details, let I, D : VK → VK denote the identity and the dual functors, respectively. Recall that for a linear mapping L : U → V (a morphism in VK ), the dual homomorphism D[L] : V ∗ → U ∗ is defined by

D[L](α) : u ↦ α(Lu),    u ∈ U, α ∈ V ∗ .

The double dual embedding is a natural transformation δ : I → D2 , that associates to every V ∈ VK a linear homomorphism δV ∈ Hom(V, V ∗∗ ) described by

δV (v) : α ↦ α(v),    v ∈ V, α ∈ V ∗ .

To show that this transformation is natural, let L : U → V be a linear mapping. We must show that the square with top δU : U → U ∗∗ , bottom δV : V → V ∗∗ , left L : U → V and right D2 [L] : U ∗∗ → V ∗∗ commutes. Let u ∈ U and α ∈ V ∗ be given. Following the arrows down and right we have that

(δV ◦ L)(u) : α ↦ α(Lu).

Following the arrows right, then down we have that

(D[D[L]] ◦ δU )(u) : α ↦ (δU u)(D[L]α) = (D[L]α)(u) = α(Lu),

as desired. Let us also note that for every non-zero v ∈ V , there exists an α ∈ V ∗ such that α(v) ≠ 0. Hence δV (v) ≠ 0, and hence δV is an embedding, i.e. it is one-to-one. If V is finite dimensional, then V ∗ has the same dimension as V . Consequently, for finite-dimensional V , the natural embedding δV is, in fact, an isomorphism. Version: 1 Owner: rmilson Author(s): rmilson
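In finite dimensions the naturality computation can be checked numerically. A sketch (assuming NumPy), with V = R^n, a linear form encoded as a row vector, D[L] given by α ↦ αL, and delta an ad hoc name for δ:

```python
import numpy as np

def delta(v):
    """δ_V(v): the form-on-forms α ↦ α(v) = α · v."""
    return lambda alpha: float(alpha @ v)

# Naturality square for L : R^2 -> R^3 (a 3x2 matrix):
# δ_V(L u)(α) must equal δ_U(u)(D[L] α), where D[L] α = α L.
L = np.array([[1., 2.], [0., 1.], [3., 0.]])
u = np.array([1., -2.])
alpha = np.array([2., 0., -1.])    # a linear form on R^3

lhs = delta(L @ u)(alpha)          # down, then right
rhs = delta(u)(alpha @ L)          # right, then down
assert lhs == rhs
```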

285.7

dual category

Let C be a category. The dual category C∗ of C is the category which has the same objects as C, but in which all morphisms are ”reversed”. That is to say if A, B are objects of C and we have a morphism f : A → B, then f ∗ : B → A is a morphism in C∗ . The dual category is sometimes called the opposite category and is denoted Cop . Version: 3 Owner: RevBobo Author(s): RevBobo

285.8

duality principle

Let Σ be any statement of the elementary theory of an abstract category. We form the dual of Σ as follows:
1. Replace each occurrence of ”domain” in Σ with ”codomain” and conversely.
2. Replace each occurrence of g ◦ f = h with f ◦ g = h.
Informally, these conditions state that the dual of a statement is formed by reversing arrows and compositions. For example, consider the following statements about a category C:
• f : A → B
• f is monic, i.e. for all morphisms g, h for which composition makes sense, f ◦ g = f ◦ h implies g = h.


The respective dual statements are
• f : B → A
• f is epi, i.e. for all morphisms g, h for which composition makes sense, g ◦ f = h ◦ f implies g = h.
The duality principle asserts that if a statement is a theorem, then the dual statement is also a theorem. We take ”theorem” here to mean provable from the axioms of the elementary theory of an abstract category. In practice, for a valid statement about a particular category C, the dual statement is valid in the dual category C∗ (Cop ). Version: 3 Owner: RevBobo Author(s): RevBobo

285.9

endofunctor

Given a category C, an endofunctor is a functor T : C → C. Version: 2 Owner: rmilson Author(s): NeuRet, Logan

285.10

examples of initial objects, terminal objects and zero objects

Examples of initial objects, terminal objects and zero objects of categories include: • The empty set is the unique initial object in the category of sets; every one-element set is a terminal object in this category; there are no zero objects. Similarly, the empty space is the unique initial object in the category of topological spaces; every one-point space is a terminal object in this category. • In the category of non-empty sets, there are no initial objects. The singletons are not initial: while every non-empty set admits a function from a singleton, this function is in general not unique. • In the category of pointed sets (whose objects are non-empty sets together with a distinguished point; a morphism from (A, a) to (B, b) is a function f : A → B with f (a) = b) every singleton serves as a zero object. Similarly, in the category of pointed topological spaces, every singleton is a zero object.


• In the category of groups, any trivial group (consisting only of its identity element) is a zero object. The same is true for the category of abelian groups as well as for the category of modules over a fixed ring. This is the origin of the term ”zero object”.
• In the category of rings with identity, the ring of integers (and any ring isomorphic to it) serves as an initial object. The trivial ring consisting only of a single element 0 = 1 is a terminal object.
• In the category of schemes, the prime spectrum of the integers spec(Z) is a terminal object. The empty scheme (which is the prime spectrum of the trivial ring) is an initial object.
• In the category of fields, there are no initial or terminal objects.
• Any partially ordered set (P, ≤) can be interpreted as a category: the objects are the elements of P , and there is a single morphism from x to y if and only if x ≤ y. This category has an initial object if and only if P has a smallest element; it has a terminal object if and only if P has a largest element. This explains the terminology.
• In the category of graphs, the null graph is an initial object. There are no terminal objects, unless we allow our graphs to have loops (edges starting and ending at the same vertex), in which case the one-point-one-loop graph is terminal.
• Similarly, the category of all small categories with functors as morphisms has the empty category as initial object and the one-object-one-morphism category as terminal object.
• Any topological space X can be viewed as a category X̂ by taking the open sets as objects, and a single morphism between two open sets U and V if and only if U ⊂ V . The empty set is the initial object of this category, and X is the terminal object.
• If X is a topological space and C is some small category, we can form the category of all contravariant functors from X̂ to C, using natural transformations as morphisms. This category is called the category of presheaves on X with values in C.
If C has an initial object c, then the constant functor which sends every open set to c is an initial object in the category of presheaves. Similarly, if C has a terminal object, then the corresponding constant functor serves as a terminal presheaf.
• If we fix a homomorphism f : A → B of abelian groups, we can consider the category C consisting of all pairs (X, φ) where X is an abelian group and φ : X → A is a group homomorphism with f φ = 0. A morphism from the pair (X, φ) to the pair (Y, ψ) is defined to be a group homomorphism r : X → Y with the property ψr = φ. The kernel of f is a terminal object in this category; this expresses the universal property of kernels. With an analogous construction, cokernels can be retrieved as initial objects of a suitable category.
• The previous example can be generalized to arbitrary limits of functors: if F : I → C is a functor, we define a new category F̂ as follows: its objects are pairs (X, (φi )) where X is an object of C and for every object i of I, φi : X → F (i) is a morphism in C such that for every morphism ρ : i → j in I, we have F (ρ)φi = φj . A morphism between pairs (X, (φi )) and (Y, (ψi )) is defined to be a morphism r : X → Y such that ψi r = φi for all objects i of I. The universal property of the limit can then be expressed as saying: any terminal object of F̂ is a limit of F and vice versa (note that F̂ need not contain a terminal object, just like F need not have a limit). Version: 11 Owner: AxelBoldt Author(s): AxelBoldt

285.11

forgetful functor

Let C and D be categories such that each object c of C can be regarded as an object of D by suitably ignoring structures c may have as a C-object but not a D-object. A functor U : C → D which operates on objects of C by “forgetting” any imposed mathematical structure is called a forgetful functor. The following are examples of forgetful functors:
1. U : Grp → Set takes groups into their underlying sets and group homomorphisms to set maps.
2. U : Top → Set takes topological spaces into their underlying sets and continuous maps to set maps.
3. U : Ab → Grp takes abelian groups to groups and acts as identity on arrows.
Forgetful functors are often instrumental in studying adjoint functors. Version: 1 Owner: RevBobo Author(s): RevBobo


285.12

isomorphism

A morphism f : A −→ B in a category is an isomorphism if there exists a morphism f −1 : B −→ A which is its inverse. The objects A and B are isomorphic if there is an isomorphism between them. Examples: • In the category of sets and functions, a function f : A −→ B is an isomorphism if and only if it is bijective. • In the category of groups and group homomorphisms (or rings and ring homomorphisms), a homomorphism φ : G −→ H is an isomorphism if it has an inverse map φ−1 : H −→ G which is also a homomorphism. • In the category of vector spaces and linear transformations, a linear transformation is an isomorphism if and only if it is an invertible linear transformation. • In the category of topological spaces and continuous maps, a continuous map is an isomorphism if and only if it is a homeomorphism. Version: 2 Owner: djao Author(s): djao

285.13

natural transformation

Let A, B be categories and S, T : A → B functors. A natural transformation τ : S → T is a family of morphisms τ = {τA : S(A) → T (A)}, one for each object A of A, where each τA is a morphism of B, such that for each morphism f : A → A′ in A the naturality square commutes:

τA′ ◦ S(f ) = T (f ) ◦ τA : S(A) −→ T (A′ ).

Version: 6 Owner: RevBobo Author(s): RevBobo

285.14

types of homomorphisms

Often in a category of algebraic structures, those structures are generated by certain elements, and subject to certain relations. One often refers to functions between structures

which are said to preserve those relations. These functions are typically called homomorphisms. An example is the category of groups. Suppose that f : A → B is a function between two groups. We say that f is a group homomorphism if: (a) the binary operation is preserved: f (a1 · a2 ) = f (a1 ) · f (a2 ) for all a1 , a2 ∈ A; (b) the identity element is preserved: f (eA ) = eB ; (c) inverses of elements are preserved: f (a−1 ) = [f (a)]−1 for all a ∈ A. One can define similar natural concepts of homomorphisms for other algebraic structures, giving us ring homomorphisms, module homomorphisms, and a host of others. We give special names to homomorphisms when their functions have interesting properties. If a homomorphism is an injective function (i.e. one-to-one), then we say that it is a monomorphism. These are typically monic in their category. If a homomorphism is a surjective function (i.e. onto), then we say that it is an epimorphism. These are typically epic in their category. If a homomorphism is a bijective function (i.e. both one-to-one and onto), then we say that it is an isomorphism. If the domain of a homomorphism is the same as its codomain (e.g. a homomorphism f : A → A), then we say that it is an endomorphism. We often denote the collection of endomorphisms on A as End(A). If a homomorphism is both an endomorphism and an isomorphism, then we say that it is an automorphism. We often denote the collection of automorphisms on A as Aut(A). Version: 4 Owner: antizeus Author(s): antizeus
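As an illustrative sketch (not part of the original entry), the three defining conditions of a group homomorphism can be checked mechanically for the reduction map from (Z, +) to (Z6, +); the function names here are ad hoc.

```python
# Check the group homomorphism axioms for f : (Z, +) -> (Z_6, +), f(a) = a mod 6.
def f(a):
    return a % 6

# (a) the binary operation is preserved: f(a1 + a2) = f(a1) + f(a2) in Z_6
for a1 in range(-20, 20):
    for a2 in range(-20, 20):
        assert f(a1 + a2) == (f(a1) + f(a2)) % 6

# (b) the identity element is preserved: f(0) = 0
assert f(0) == 0

# (c) inverses are preserved: f(-a) is the inverse of f(a) in Z_6
for a in range(-20, 20):
    assert (f(-a) + f(a)) % 6 == 0

# f is an epimorphism (every residue is hit) but not a monomorphism (0 and 6 collide)
assert {f(a) for a in range(12)} == set(range(6))
assert f(6) == f(0)
```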

285.15

zero object

An initial object in a category C is an object A in C such that, for every object X in C, there is exactly one morphism A −→ X. A terminal object in a category C is an object B in C such that, for every object X in C, there is exactly one morphism X −→ B. A zero object in a category C is an object 0 that is both an initial object and a terminal object.


All initial objects (respectively, terminal objects, and zero objects), if they exist, are isomorphic in C. Version: 2 Owner: djao Author(s): djao


Chapter 286 18A22 – Special properties of functors (faithful, full, etc.) 286.1

exact functor

A covariant functor F is said to be left exact if whenever

0 → A −α→ B −β→ C

is an exact sequence, then

0 → F A −F α→ F B −F β→ F C

is also an exact sequence.

A covariant functor F is said to be right exact if whenever

A −α→ B −β→ C → 0

is an exact sequence, then

F A −F α→ F B −F β→ F C → 0

is also an exact sequence.

A contravariant functor F is said to be left exact if whenever

A −α→ B −β→ C → 0

is an exact sequence, then

0 → F C −F β→ F B −F α→ F A

is also an exact sequence.

A contravariant functor F is said to be right exact if whenever

0 → A −α→ B −β→ C

is an exact sequence, then

F C −F β→ F B −F α→ F A → 0

is also an exact sequence.

A (covariant or contravariant) functor is said to be exact if it is both left exact and right exact. Version: 3 Owner: antizeus Author(s): antizeus


Chapter 287 18A25 – Functor categories, comma categories 287.1

Yoneda embedding

If C is a category, write Ĉ for the category of contravariant functors from C to Sets, the category of sets. The morphisms in Ĉ are natural transformations of functors. (To avoid set theoretical concerns, one can take a universe U and take all categories to be U-small.) For any object X of C, there is the functor hX = Hom(−, X). Then X ↦ hX is a covariant functor C → Ĉ, which embeds C faithfully as a full subcategory of Ĉ. Version: 4 Owner: nerdy2 Author(s): nerdy2


Chapter 288 18A30 – Limits and colimits (products, sums, directed limits, pushouts, fiber products, equalizers, kernels, ends and coends, etc.) 288.1

categorical direct product

Let {Ci }i∈I be a set of objects in a category C. A direct product of the collection {Ci }i∈I is an object ∏i∈I Ci of C, together with morphisms πi : ∏j∈I Cj −→ Ci for each i ∈ I, such that:

For every object A in C, and any collection of morphisms fi : A −→ Ci for every i ∈ I, there exists a unique morphism f : A −→ ∏i∈I Ci satisfying πi ◦ f = fi for all i ∈ I.

Version: 4 Owner: djao Author(s): djao

288.2

categorical direct sum

Let {Ci }i∈I be a set of objects in a category C. A direct sum of the collection {Ci }i∈I is an object ∐i∈I Ci of C, together with morphisms ιi : Ci −→ ∐j∈I Cj for each i ∈ I, such that:

For every object A in C, and any collection of morphisms fi : Ci −→ A for every i ∈ I, there exists a unique morphism f : ∐i∈I Ci −→ A satisfying f ◦ ιi = fi for all i ∈ I.

Version: 4 Owner: djao Author(s): djao

288.3

kernel

Let f : X → Y be a function and let Y have some sort of zero, neutral or null element that we’ll denote as e. (Examples are groups, vector spaces, modules, etc.) The kernel of f is the set ker f = {x ∈ X : f (x) = e}, that is, the set of elements in X whose image is e. This set can also be denoted as f −1 (e) (that doesn’t mean f has an inverse function, it’s just notation) and read as “the kernel is the preimage of the neutral element”. Let’s see an example. If X = Z and Y = Z6 , let f be the function that sends each integer n to its residue class modulo 6. So f (4) = 4, f (20) = 2, f (−5) = 1. The kernel of f consists precisely of the multiples of 6 (since they have residue 0, we have f (6k) = 0). This is also an example of the kernel of a group homomorphism, and since the sets are also rings, the function f is also a homomorphism between rings and the kernel is also the kernel of a ring homomorphism. Usually we are interested in sets with certain algebraic structure. In particular, the following theorem holds for maps between pairs of vector spaces, groups, rings and fields (and some other algebraic structures): A map f : X → Y is injective if and only if ker f = {0} (the zero element of X). Version: 4 Owner: drini Author(s): drini
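The mod-6 example above can be verified directly; the following sketch (illustrative only, over a finite sample of Z) computes the kernel of the reduction map as the preimage of the neutral element.

```python
# Kernel of f : Z -> Z_6, f(n) = n mod 6, computed over a sample range of Z.
def f(n):
    return n % 6

sample = range(-30, 31)
# ker f = preimage of the neutral element 0 of Z_6
kernel = [n for n in sample if f(n) == 0]

# The kernel consists exactly of the multiples of 6 in the sample.
assert kernel == [n for n in sample if n % 6 == 0]
assert all(n % 6 == 0 for n in kernel)
```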


Chapter 289 18A40 – Adjoint functors (universal constructions, reflective subcategories, Kan extensions, etc.) 289.1

adjoint functor

Let C, D be categories and T : C → D, S : D → C be covariant functors. T is said to be a left adjoint functor to S (equivalently, S is a right adjoint functor to T ) if there exists a natural bijection of hom-sets

ν : HomD (T (C), D) ≅ HomC (C, S(D))

for all objects C of C and D of D. An adjoint to a functor, when it exists, is unique up to natural isomorphism. Examples:

1. Let U : Top → Set be the forgetful functor (i.e. U takes topological spaces to their underlying sets, and continuous maps to set functions). Then U is right adjoint to the functor F : Set → Top which gives each set the discrete topology.

2. If U : Grp → Set is again the forgetful functor, this time on the category of groups, the functor F : Set → Grp which takes a set A to the free group generated by A is left adjoint to U.

3. If UN : R−mod → R−mod is the functor M ↦ N ⊗ M for an R-module N, then UN is the left adjoint to the functor FN : R−mod → R−mod given by L ↦ HomR (N, L).

Version: 8 Owner: bwebste Author(s): bwebste, RevBobo

289.2

equivalence of categories

Let C and D be two categories with functors F : C → D and G : D → C. The functors F and G are an equivalence of categories if there are natural isomorphisms F G ≅ idD and GF ≅ idC . Note, F is left adjoint to G, and G is right adjoint to F , as

HomD (F (c), d) −G→ HomC (GF (c), G(d)) ←→ HomC (c, G(d)).

And, F is right adjoint to G, and G is left adjoint to F , as

HomC (G(d), c) −F→ HomD (F G(d), F (c)) ←→ HomD (d, F (c)).

In practical terms, two categories are equivalent if there is a fully faithful functor F : C → D, such that every object d ∈ D is isomorphic to an object F (c), for some c ∈ C. Version: 2 Owner: mhale Author(s): mhale


Chapter 290 18B40 – Groupoids, semigroupoids, semigroups, groups (viewed as categories) 290.1

groupoid (category theoretic)

A groupoid, also known as a virtual group, is a small category where every morphism is invertible. There is also a group-theoretic concept with the same name. Version: 6 Owner: akrowne Author(s): akrowne


Chapter 291 18E10 – Exact categories, abelian categories 291.1

abelian category

An abelian category is a category A satisfying the following axioms. Because the later axioms rely on terms whose definitions involve the earlier axioms, we will intersperse the statements of the axioms with such auxiliary definitions as needed.

Axiom 1. For any two objects A, B in A, the set of morphisms Hom(A, B) is an abelian group. The identity element in the group Hom(·, ·) will be denoted by 0, and the group operation by +.

Axiom 2. Composition of morphisms distributes over addition in Hom(·, ·). That is, given any morphisms f : A −→ B, g1 , g2 : B −→ C, and h : C −→ D, we have (g1 + g2 )f = g1 f + g2 f and h(g1 + g2 ) = hg1 + hg2 .

Axiom 3. A has a zero object.

Axiom 4. For any two objects A, B in A, the categorical direct product A × B exists in A.

Given a morphism f : A −→ B in A, a kernel of f is a morphism i : X −→ A such that:

• f i = 0.
• For any other morphism j : X′ −→ A such that f j = 0, there exists a unique morphism j′ : X′ −→ X such that i j′ = j.

Likewise, a cokernel of f is a morphism p : B −→ Y such that:

• pf = 0.
• For any other morphism j : B −→ Y′ such that jf = 0, there exists a unique morphism j′ : Y −→ Y′ such that j′ p = j.

Axiom 5. Every morphism in A has a kernel and a cokernel. The kernel and cokernel of a morphism f in A will be denoted ker (f ) and cok(f ), respectively.

A morphism f : A −→ B in A is called a monomorphism if, for every morphism g : X −→ A such that f g = 0, we have g = 0. Similarly, the morphism f is called an epimorphism if, for every morphism h : B −→ Y such that hf = 0, we have h = 0.

Axiom 6. ker (cok(f )) = f for every monomorphism f in A.

Axiom 7. cok(ker (f )) = f for every epimorphism f in A.

Version: 6 Owner: djao Author(s): djao
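In the abelian category of finite abelian groups, kernels and cokernels are concrete. The following sketch (an illustrative example, not part of the original entry) computes both for the doubling map on Z6: the kernel is the subgroup killed by f, and the cokernel is the quotient by the image.

```python
# Kernel and cokernel of f : Z_6 -> Z_6, f(x) = 2x mod 6,
# computed concretely in the category of finite abelian groups.
n = 6
f = lambda x: (2 * x) % n

kernel = {x for x in range(n) if f(x) == 0}   # elements mapped to 0
image = {f(x) for x in range(n)}              # the image subgroup

# cokernel = Z_6 / image: elements collapse to cosets of the image
cosets = {frozenset((x + i) % n for i in image) for x in range(n)}

assert kernel == {0, 3}      # the subgroup killed by doubling
assert image == {0, 2, 4}    # the even residues
assert len(cosets) == 2      # cokernel has order 6 / 3 = 2
```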

291.2

exact sequence

Let A be an abelian category. We begin with a preliminary definition. Definition 1. For any morphism f : A −→ B in A, let m : X −→ B be the morphism equal to ker (cok(f )). Then the object X is called the image of f , and denoted Im(f ). The morphism m is called the image morphism of f , and denoted i(f ).

Note that Im(f ) is not the same as i(f ): the former is an object of A, while the latter is a morphism of A. We note that f factors through i(f ): that is, f = i(f ) ◦ e for a morphism e : A −→ Im(f ). The proof is as follows: by definition of cokernel, cok(f )f = 0; therefore by definition of kernel, the morphism f factors through ker (cok(f )) = i(f ), and this factor is the morphism e above. Furthermore i(f ) is a monomorphism and e is an epimorphism, although we do not prove these facts.

Definition 2. A sequence

· · · −→ A −f→ B −g→ C −→ · · ·

of morphisms in A is exact at B if ker (g) = i(f ).

Version: 3 Owner: djao Author(s): djao

291.3

derived category

Let A be an abelian category, and let K(A) be the category of chain complexes in A, with morphisms chain homotopy classes of maps. Call a morphism of chain complexes a quasi-isomorphism if it induces an isomorphism on homology groups of the complexes. For example, any chain homotopy equivalence is a quasi-isomorphism, but not conversely. Now let the derived category D(A) be the category obtained from K(A) by adding a formal inverse to every quasi-isomorphism (technically this is called a localization of the category). Derived categories seem somewhat obscure, but in fact, many mathematicians believe they are the appropriate place to do homological algebra. One of their great advantages is that the important functors of homological algebra which are left or right exact (Hom, N ⊗k −, where N is a fixed k-module, the global section functor Γ, etc.) become exact on the level of derived functors (with an appropriately modified definition of exact). See Methods of Homological Algebra, by Gelfand and Manin for more details. Version: 2 Owner: bwebste Author(s): bwebste

291.4

enough injectives

An abelian category is said to have enough injectives if for every object X there is a monomorphism X → I where I is an injective object. Version: 2 Owner: bwebste Author(s): bwebste

Chapter 292 18F20 – Presheaves and sheaves 292.1

locally ringed space

292.1.1

Definitions

A locally ringed space is a topological space X together with a sheaf of rings OX with the property that, for every point p ∈ X, the stalk (OX )p is a local ring 1 . A morphism of locally ringed spaces from (X, OX ) to (Y, OY ) is a continuous map f : X −→ Y together with a morphism of sheaves φ : OY −→ OX with respect to f such that, for every point p ∈ X, the induced ring homomorphism on stalks φp : (OY )f (p) −→ (OX )p is a local homomorphism. That is, φp (y) ∈ mp for every y ∈ mf (p) , where mp (respectively, mf (p) ) is the maximal ideal of the ring (OX )p (respectively, (OY )f (p) ).

292.1.2

Applications

Locally ringed spaces are encountered in many natural contexts. Basically, every sheaf on the topological space X consisting of continuous functions with values in some field is a locally ringed space. Indeed, any such function which is not zero at a point p ∈ X is nonzero and thus invertible in some neighborhood of p, which implies that the only maximal ideal of the stalk at p is the set of germs of functions which vanish at p. The utility of this definition lies in the fact that one can then form constructions in familiar instances of locally ringed spaces which readily generalize in ways that would not necessarily be obvious without this framework. For example, given a manifold X and its locally ringed space DX of real–valued differentiable functions, one can show that the space of all tangent vectors to X at p is naturally isomorphic to the real vector space (mp /mp²)∗ , where the ∗ indicates the dual vector space. We then see that, in general, for any locally ringed space X, the space of tangent vectors at p should be defined as the k–vector space (mp /mp²)∗ , where k is the residue field (OX )p /mp and ∗ denotes dual with respect to k as before. It turns out that this definition is the correct definition even in esoteric contexts like algebraic geometry over finite fields which at first sight lack the differential structure needed for constructions such as tangent vector. Another useful application of locally ringed spaces is in the construction of schemes. The forgetful functor assigning to each locally ringed space (X, OX ) the ring OX (X) is adjoint to the “prime spectrum” functor taking each ring R to its prime spectrum Spec(R), and this correspondence is essentially why the category of locally ringed spaces is the proper building block to use in the formulation of the notion of scheme.

1 All rings mentioned in this article are required to be commutative.

Version: 9 Owner: djao Author(s): djao

292.2

presheaf

For a topological space X a presheaf F with values in a category C associates to each open set U ⊂ X, an object F (U) of C and to each inclusion U ⊂ V a morphism of C, ρU V : F (V ) → F (U), the restriction morphism. It is required that ρU U = 1F (U ) and ρU W = ρU V ◦ ρV W for any U ⊂ V ⊂ W . A presheaf with values in the category of sets (or abelian groups) is called a presheaf of sets (or abelian groups). If no target category is specified, either the category of sets or abelian groups is most likely understood. A more categorical way to state it is as follows. For X form the category Top(X) whose objects are open sets of X and whose morphisms are the inclusions. Then a presheaf is merely a contravariant functor Top(X) → C. Version: 2 Owner: nerdy2 Author(s): nerdy2
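A minimal sketch of the presheaf laws (an assumed toy topology, not part of the original entry): sections over an open set are modelled as dictionaries, restriction ρUV as dictionary restriction, and the two functor laws ρUU = id and ρUW = ρUV ◦ ρVW are checked on a chain of opens U ⊂ V ⊂ W.

```python
# A toy presheaf of functions on the chain of opens U ⊂ V ⊂ W,
# where opens are Python sets and F(W) is the set of dicts defined on W.
W = {1, 2, 3}
V = {1, 2}
U = {1}

def rho(target, s):
    """Restriction morphism: restrict a section s to the open set `target`."""
    return {p: s[p] for p in target}

s = {1: 'a', 2: 'b', 3: 'c'}   # a section in F(W)

# rho_WW = identity on F(W)
assert rho(W, s) == s
# rho_UW = rho_UV ∘ rho_VW  (the composition law for U ⊂ V ⊂ W)
assert rho(U, rho(V, s)) == rho(U, s)
```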

292.3

sheaf

292.3.1

Presheaves

Let X be a topological space and let A be a category. A presheaf on X with values in A is a contravariant functor F from the category of open sets in X and inclusion morphisms to the category A. As this definition may be less than helpful to many readers, we offer the following equivalent

(but longer) definition. A presheaf F on X consists of the following data: 1. An object F (U) in A, for each open set U ⊂ X 2. A morphism resV,U : F (V ) −→ F (U) for each pair of open sets U ⊂ V in X (called the restriction morphism), such that: (a) For every open set U ⊂ X, the morphism resU,U is the identity morphism.

(b) For any open sets U ⊂ V ⊂ W in X, we have resV,U ◦ resW,V = resW,U .

If the object F (U) of A is a set, its elements are called sections of U.

292.3.2

Morphisms of Presheaves

Let f : X −→ Y be a continuous map of topological spaces. Suppose FX is a presheaf on X, and GY is a presheaf on Y (with FX and GY both having values in A). We define a morphism of presheaves φ from GY to FX , relative to f , to be a collection of morphisms φU : GY (U) −→ FX (f −1 (U)) in A, one for every open set U ⊂ Y , such that

resf −1 (V ),f −1 (U ) ◦ φV = φU ◦ resV,U

for each pair of open sets U ⊂ V in Y . In the special case that f is the identity map id : X −→ X, we omit mention of the map f , and speak of φ as simply a morphism of presheaves on X. Form the category whose objects are presheaves on X and whose morphisms are morphisms of presheaves on X. Then an isomorphism of presheaves φ on X is a morphism of presheaves on X which is an isomorphism in this category; that is, there exists a morphism φ−1 whose composition with φ both ways is the identity morphism. More generally, if f : X −→ Y is any homeomorphism of topological spaces, a morphism of presheaves φ relative to f is an isomorphism if it admits a two–sided inverse morphism of presheaves φ−1 relative to f −1 .


292.3.3

Sheaves

We now assume that the category A is a concrete category. A sheaf is a presheaf F on X, with values in A, such that for every open set U ⊂ X, and every open cover {Ui } of U, the following two conditions hold:

1. Any two elements f1 , f2 ∈ F (U) which have identical restrictions to each Ui are equal. That is, if resU,Ui f1 = resU,Ui f2 for every i, then f1 = f2 .

2. Any collection of elements fi ∈ F (Ui ) that have common restrictions can be realized as the collective restrictions of a single element of F (U). That is, if resUi ,Ui ∩Uj fi = resUj ,Ui ∩Uj fj for every i and j, then there exists an element f ∈ F (U) such that resU,Ui f = fi for all i.
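Both conditions can be illustrated for the sheaf of functions on a finite cover; the setup below is a hypothetical sketch (toy opens, sections as dictionaries), not part of the original entry.

```python
# Sheaf conditions for functions on U = U1 ∪ U2, sections modelled as dicts.
U1 = {1, 2, 3}
U2 = {3, 4, 5}
f1 = {1: 10, 2: 20, 3: 30}   # a section on U1
f2 = {3: 30, 4: 40, 5: 50}   # a section on U2

overlap = U1 & U2
# Condition 2 (gluing): the restrictions agree on U1 ∩ U2 ...
assert all(f1[p] == f2[p] for p in overlap)
# ... so the sections patch to a single section f on U with f|Ui = fi.
f = {**f1, **f2}
assert {p: f[p] for p in U1} == f1
assert {p: f[p] for p in U2} == f2

# Condition 1 (uniqueness): a section with the same restrictions equals f.
g = dict(f)
assert {p: g[p] for p in U1} == f1 and {p: g[p] for p in U2} == f2
assert g == f
```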

292.3.4

Sheaves in abelian categories

If A is a concrete abelian category, then a presheaf F is a sheaf if and only if for every open subset U of X, the sequence

0 −→ F (U) −incl→ ∏i F (Ui ) −diff→ ∏i,j F (Ui ∩ Uj )        (292.3.1)

is an exact sequence of morphisms in A for every open cover {Ui } of U in X. This sequence requires some explanation, because we owe the reader a definition of the morphisms incl and diff.

We start with incl (short for “inclusion”). The restriction morphisms F (U) −→ F (Ui ) induce a morphism F (U) −→ ∏i F (Ui ) to the categorical direct product ∏i F (Ui ), which we define to be incl.

The map diff (called “difference”) is defined as follows. For each Ui , form the morphism αi : F (Ui ) −→ ∏j F (Ui ∩ Uj ). By the universal properties of categorical direct product, there exists a unique morphism α : ∏i F (Ui ) −→ ∏i ∏j F (Ui ∩ Uj ) such that πi α = αi πi for all i, where πi is projection onto the ith factor. In a similar manner, form the morphism β : ∏j F (Uj ) −→ ∏j ∏i F (Ui ∩ Uj ).

Then α and β are both elements of the set

Hom( ∏i F (Ui ), ∏i,j F (Ui ∩ Uj ) ),

which is an abelian group since A is an abelian category. Take the difference α − β in this group, and define this morphism to be diff. Note that exactness of the sequence (292.3.1) is an element free condition, and therefore makes sense for any abelian category A, even if A is not concrete. Accordingly, for any abelian category A, we define a sheaf to be a presheaf F for which the sequence (292.3.1) is always exact.

292.3.5

Examples

It’s high time that we give some examples of sheaves and presheaves. We begin with some of the standard ones. Example 9. If F is a presheaf on X, and U ⊂ X is an open subset, then one can define a presheaf F |U on U by restricting the functor F to the subcategory of open sets of X in U and inclusion morphisms. In other words, for open subsets of U, define F |U to be exactly what F was, and ignore open subsets of X that are not open subsets of U. The resulting presheaf is called, for obvious reasons, the restriction presheaf of F to U, or the restriction sheaf if F was a sheaf to begin with. Example 10. For any topological space X, let cX be the presheaf on X, with values in the category of rings, given by • cX (U) := the ring of continuous real–valued functions U −→ R, • resV,U f := the restriction of f to U, for every element f : V −→ R of cX (V ) and every subset U of V . Then cX is actually a sheaf of rings, because continuous functions are uniquely specified by their values on an open cover. The sheaf cX is called the sheaf of continuous real–valued functions on X. Example 11. Let X be a smooth differentiable manifold. Let DX be the presheaf on X, with values in the category of real vector spaces, defined by setting DX (U) to be the space of smooth real–valued functions on U, for each open set U, and with the restriction morphism given by restriction of functions as before. Then DX is a sheaf as well, called the sheaf of smooth real–valued functions on X. Much more surprising is that the construct DX can actually be used to define the concept of smooth manifold! That is, one can define a smooth manifold to be a locally Euclidean n–dimensional second countable topological space X, together with a sheaf F , such that there exists an open cover {Ui } of X where: 1223

For every i, there exists a homeomorphism fi : Ui −→ Rn and an isomorphism of sheaves φi : DRn −→ F |Ui relative to fi . The idea here is that not only does every smooth manifold X have a sheaf DX of smooth functions, but specifying this sheaf of smooth functions is sufficient to fully describe the smooth manifold structure on X. While this phenomenon may seem little more than a toy curiosity for differential geometry, it arises in full force in the field of algebraic geometry where the coordinate functions are often unwieldy and algebraic structures in many cases can only be satisfactorily described by way of sheaves and schemes. Example 12. Similarly, for a complex analytic manifold X, one can form the sheaf HX of holomorphic functions by setting HX (U) equal to the complex vector space of C–valued holomorphic functions on U, with the restriction morphism being restriction of functions as before. Example 13. The algebraic geometry analogue of the sheaf DX of differential geometry is the prime spectrum Spec(R) of a commutative ring R. However, the construction of the sheaf Spec(R) is beyond the scope of this discussion and merits a separate article. Example 14. For an example of a presheaf that is not a sheaf, consider the presheaf F on X, with values in the category of real vector spaces, whose sections on U are locally constant real–valued functions on U modulo constant functions on U. Then every section f ∈ F (U) is locally zero in some fine enough open cover {Ui } (it is enough to take a cover where each Ui is connected), whereas f may be nonzero if U is not connected. We conclude with some interesting examples of morphisms of sheaves, chosen to illustrate the unifying power of the language of schemes across various diverse branches of mathematics.

1. For any continuous function f : X −→ Y , the map φU : cY (U) −→ cX (f −1 (U)) given by φU (g) := gf defines a morphism of sheaves from cY to cX with respect to f .

2. For any continuous function f : X −→ Y of smooth differentiable manifolds, the map given by φU (g) := gf has the property g ∈ DY (U) → φU (g) ∈ DX (f −1 (U))

if and only if f is a smooth function.

3. For any continuous function f : X −→ Y of complex analytic manifolds, the map given by φU (g) := gf has the property g ∈ HY (U) → φU (g) ∈ HX (f −1 (U))

if and only if f is a holomorphic function.

4. For any Zariski continuous function f : X −→ Y of algebraic varieties over a field k, the map given by φU (g) := gf has the property g ∈ OY (U) → φU (g) ∈ OX (f −1 (U))

if and only if f is a regular function. Here OX denotes the sheaf of k–valued regular functions on the algebraic variety X. 1224

REFERENCES 1. David Mumford, The Red Book of Varieties and Schemes, Second Expanded Edition, Springer– Verlag, 1999 (LNM 1358). 2. Charles Weibel, An Introduction to Homological Algebra, Cambridge University Press, 1994.

Version: 9 Owner: djao Author(s): djao

292.4

sheafification

Let F be a presheaf over a topological space X with values in a category A for which sheaves are defined. The sheafification of F , if it exists, is a sheaf F 0 over X together with a morphism θ : F −→ F 0 satisfying the following universal property: For any sheaf G over X and any morphism of presheaves φ : F −→ G over X, there exists a unique morphism of sheaves ψ : F 0 −→ G such that the diagram F

θ

F0

ψ

G

φ

commutes. In light of the universal property, the sheafification of F is uniquely defined up to canonical isomorphism whenever it exists. In the case where A is a concrete category (one consisting of sets and set functions), the sheafification ofSany presheaf F can be constructed by taking F 0 (U) to be the set of all functions s : U −→ p∈U Fp such that 1. s(p) ∈ Fp for all p ∈ U

2. For all p ∈ U, there is a neighborhood V ⊂ U of p and a section t ∈ F (V ) such that, for all q ∈ V , the induced element tq ∈ Fq equals s(q) for all open sets U ⊂ X. Here Fp denotes the stalk of the presheaf F at the point p. The following quote, taken from [1], is perhaps the best explanation of sheafification to be found anywhere: F 0 is ”the best possible sheaf you can get from F ”. It is easy to imagine how to get it: first identify things which have the same restrictions, and then add in all the things which can be patched together. 1225

REFERENCES 1. David Mumford, The Red Book of Varieties and Schemes, Second Expanded Edition, Springer– Verlag, 1999 (LNM 1358)

Version: 4 Owner: djao Author(s): djao

292.5

stalk

Let F be a presheaf over a topological space X with values in an abelian category A, and suppose direct limits exist in A. For any point p ∈ X, the stalk Fp of F at p is defined to be the object in A which is the direct limit of the objects F (U) over the directed set of all open sets U ⊂ X containing p, with respect to the restriction morphisms of F . In other words, Fp := lim F (U) −→ U 3p

If A is a category consisting of sets, the stalk Fp can be viewed as the set of all germs of sections of F at the point p. That is, the set Fp consists of all the equivalence classes of ordered pairs (U, s) where p ∈ U and s T ∈ F (U), under the equivalence relation (U, s) ∼ (V, t) if there exists a neighborhood W ⊂ U V of p such that resU,W s = resV,W t.

By universal properties of direct limit, a morphism φ : F −→ G of presheaves over X induces a morphism φp : Fp −→ Gp on each stalk Fp of F . Stalks are most useful in the context of sheaves, since they encapsulate all of the local data of the sheaf at the point p (recall that sheaves are basically defined as presheaves which have the property of being completely characterized by their local behavior). Indeed, in many of the standard examples of sheaves that take values in rings (such as the sheaf DX of smooth functions, or the sheaf OX of regular functions), the ring Fp is a local ring, and much of geometry is devoted to the study of sheaves whose stalks are local rings (so-called ”locally ringed spaces”). We mention here a few illustrations of how stalks accurately reflect the local behavior of a sheaf; all of these are drawn from [1]. • A morphism of sheaves φ : F −→ G over X is an isomorphism if and only if the induced morphism φp is an isomorphism on each stalk. • A sequence F −→ G −→ H of morphisms of sheaves over X is an exact sequence at G if and only if the induced morphism Fp −→ Gp −→ Hp is exact at each stalk Gp . • The sheafification F 0 of a presheaf F has stalk equal to Fp at every point p.

1226

REFERENCES 1. Robin Hartshorne, Algebraic Geometry, Springer–Verlag New York Inc., 1977 (GTM 52).

Version: 4 Owner: djao Author(s): djao

1227

Chapter 293 18F30 – Grothendieck groups 293.1

Grothendieck group

Let S be an Abelian semigroup. The Grothendieck group of S is K(S) = S × S/∼, where ∼ is the equivalence relation: (s, t) ∼ (u, v) if there exists r ∈ S such that s + v + r = t + u + r. This is indeed an abelian group with zero element (s, s) (any s ∈ S) and inverse −(s, t) = (t, s). The Grothendieck group construction is a functor from the category of abelian semigroups to the category of abelian groups. A morphism f : S → T induces a morphism K(f ) : K(S) → K(T ). Example 15. K(N) = Z.

Example 16. Let G be an abelian group, then K(G) ∼ = G via (g, h) ↔ g − h. Let C be a symmetric monoidal category. Its Grothendieck group is K([C]), i.e. the Grothendieck group of the isomorphism classes of objects of C. Version: 2 Owner: mhale Author(s): mhale

1228

Chapter 294 18G10 – Resolutions; derived functors 294.1

derived functor

There are two objects called derived functors. First, there are classical derived functors. Let A, B be abelian categories, and F : A → B be a covariant left-exact functor. Note that a completely analogous construction can be done for right-exact and contravariant functors, but it is traditional to only describe one case, as doing the other mostly consists of reversing arrows. Given an object A ∈ A, we can construct an injective resolution: A→A:0

I1

A

I2

···

which is unique up to chain homotopy equivalence. Then we apply the functor F to the injectives in the resolution to to get a complex F (A) : 0

F (I 1 )

F (I 2 )

···

(notice that the term involving A has been left out. This is not an accident, in fact, it is crucial). This complex also is independent of the choice of I’s (up to chain homotopy equivalence). Now, we define the classical right derived functors Ri F (A) to be the cohomology groups H i (F (A)). These only depend on A. Important properties of the classical derived functors are these: If the sequence 0 → A → A0 → A00 → 0 is exact, then there is a long exact sequence 0

F (A)

F (A0 )

F (A00 )

R1 F (A)

···

which is natural (a morphism of short exact sequences induces a morphism of long exact sequences). This, along with a couple of other properties determine the derived functors completely, giving an axiomatic definition, though the construction used above is usually necessary to show existence. From the definition, one can see immediately that the following are equivalent: 1229

1. F is exact 2. Rn F (A) = 0 for n > 1 and all A ∈ A. 3. R1 F (A) = 0 for all A ∈ A. However, R1 F (A) = 0 for a particular A does not imply that Rn F (A) = 0 for all n > 1. Important examples are Extn , the derived functor of Hom, Torn , the derived functor of tensor product, and sheaf cohomology, the derived functor of the global section functor on sheaves. (Coming soon: the derived categoies definition) Version: 4 Owner: bwebste Author(s): bwebste

1230

Chapter 295 18G15 – Ext and Tor, generalizations, K¨ unneth formula 295.1

Ext

For a ring $R$ and an $R$-module $A$, we have a covariant functor $\operatorname{Hom}_R(A, -)$. The functors $\operatorname{Ext}^n_R(A, -)$ are defined to be the right derived functors of $\operatorname{Hom}_R(A, -)$ (that is, $\operatorname{Ext}^n_R(A, -) = R^n \operatorname{Hom}_R(A, -)$). Ext gets its name from the following fact: there is a natural bijection between elements of $\operatorname{Ext}^1_R(A, B)$ and extensions of $B$ by $A$, up to isomorphism of short exact sequences, where an extension of $B$ by $A$ is an exact sequence

$$0 \to B \to C \to A \to 0.$$

For example,

$$\operatorname{Ext}^1_{\mathbb{Z}}(\mathbb{Z}/n\mathbb{Z}, \mathbb{Z}) \cong \mathbb{Z}/n\mathbb{Z},$$

with $0$ corresponding to the trivial extension $0 \to \mathbb{Z} \to \mathbb{Z} \oplus \mathbb{Z}/n\mathbb{Z} \to \mathbb{Z}/n\mathbb{Z} \to 0$, and $m \ne 0$ corresponding to

$$0 \to \mathbb{Z} \xrightarrow{\ n\ } \mathbb{Z} \xrightarrow{\ m\ } \mathbb{Z}/n\mathbb{Z} \to 0.$$

Version: 3 Owner: bwebste Author(s): bwebste


Chapter 296 18G30 – Simplicial sets, simplicial objects (in a category) 296.1

nerve

The nerve of a category $C$ is the simplicial set $\operatorname{Hom}(i(-), C)$, where $i : \Delta \to \mathbf{Cat}$ is the fully faithful functor that takes each ordered set $[n]$ in the simplicial category $\Delta$ to the pre-order $n + 1$. The nerve is a functor $\mathbf{Cat} \to \mathbf{Set}^{\Delta^{op}}$. Version: 1 Owner: mhale Author(s): mhale

296.2

simplicial category

The simplicial category $\Delta$ is defined as the small category whose objects are the totally ordered finite sets

$$[n] = \{0 < 1 < 2 < \cdots < n\}, \qquad n \ge 0, \tag{296.2.1}$$

and whose morphisms are monotonic non-decreasing (order-preserving) maps. It is generated by two families of morphisms: $\delta_i^n : [n-1] \to [n]$ is the injection missing $i \in [n]$, and $\sigma_i^n : [n+1] \to [n]$ is the surjection such that $\sigma_i^n(i) = \sigma_i^n(i+1) = i \in [n]$. The $\delta_i^n$ morphisms are called face maps, and the $\sigma_i^n$ morphisms are called degeneracy maps. They satisfy the following relations:

$$\delta_j^{n+1}\delta_i^n = \delta_i^{n+1}\delta_{j-1}^n \quad \text{for } i < j, \tag{296.2.2}$$

$$\sigma_j^{n-1}\sigma_i^n = \sigma_i^{n-1}\sigma_{j+1}^n \quad \text{for } i \le j, \tag{296.2.3}$$

$$\sigma_j^n\delta_i^{n+1} = \begin{cases} \delta_i^n\sigma_{j-1}^{n-1} & \text{if } i < j, \\ \mathrm{id}_n & \text{if } i = j \text{ or } i = j+1, \\ \delta_{i-1}^n\sigma_j^{n-1} & \text{if } i > j+1. \end{cases} \tag{296.2.4}$$

All morphisms $[n] \to [0]$ factor through $\sigma_0^0$, so $[0]$ is terminal. There is a bifunctor $+ : \Delta \times \Delta \to \Delta$ defined by

$$[m] + [n] = [m+n+1], \tag{296.2.5}$$

$$(f+g)(i) = \begin{cases} f(i) & \text{if } 0 \le i \le m, \\ g(i-m-1)+m'+1 & \text{if } m < i \le m+n+1, \end{cases} \tag{296.2.6}$$

where $f : [m] \to [m']$ and $g : [n] \to [n']$.

Sometimes, the simplicial category is defined to include the empty set $[-1] = \emptyset$, which provides an initial object for the category. This makes $\Delta$ a strict monoidal category, as $\emptyset$ is a unit for the bifunctor: $\emptyset + [n] = [n] = [n] + \emptyset$ and $\mathrm{id}_\emptyset + f = f = f + \mathrm{id}_\emptyset$. Further, $\Delta$ is then the free monoidal category on a monoid object (the monoid object being $[0]$, with product $\sigma_0^0 : [0] + [0] \to [0]$).

There is a fully faithful functor from $\Delta$ to $\mathbf{Top}$, which sends each object $[n]$ to an oriented $n$-simplex. The face maps then embed an $(n-1)$-simplex in an $n$-simplex, and the degeneracy maps collapse an $(n+1)$-simplex to an $n$-simplex. The bifunctor forms a simplex from the disjoint union of two simplices by joining their vertices together in a way compatible with their orientations. There is also a fully faithful functor from $\Delta$ to $\mathbf{Cat}$, which sends each object $[n]$ to a pre-order $n + 1$. The pre-order $n$ is the category consisting of $n$ partially-ordered objects, with one morphism $a \to b$ iff $a \le b$.

Version: 4 Owner: mhale Author(s): mhale

296.3

simplicial object

A simplicial object in a category $C$ is a contravariant functor from the simplicial category $\Delta$ to $C$. Such a functor $X$ is uniquely specified by the morphisms $X(\delta_i^n) : X([n]) \to X([n-1])$ and $X(\sigma_i^n) : X([n]) \to X([n+1])$, which satisfy

$$X(\delta_i^{n-1})\,X(\delta_j^n) = X(\delta_{j-1}^{n-1})\,X(\delta_i^n) \quad \text{for } i < j, \tag{296.3.1}$$

$$X(\sigma_i^{n+1})\,X(\sigma_j^n) = X(\sigma_{j+1}^{n+1})\,X(\sigma_i^n) \quad \text{for } i \le j, \tag{296.3.2}$$

$$X(\delta_i^{n+1})\,X(\sigma_j^n) = \begin{cases} X(\sigma_{j-1}^{n-1})\,X(\delta_i^n) & \text{if } i < j, \\ \mathrm{id}_n & \text{if } i = j \text{ or } i = j+1, \\ X(\sigma_j^{n-1})\,X(\delta_{i-1}^n) & \text{if } i > j+1. \end{cases} \tag{296.3.3}$$

In particular, a simplicial set is a simplicial object in $\mathbf{Set}$. Equivalently, one could say that a simplicial set is a presheaf on $\Delta$. The object $X([n])$ of a simplicial set is a set of $n$-simplices, and is called the $n$-skeleton.

Version: 2 Owner: mhale Author(s): mhale


Chapter 297 18G35 – Chain complexes 297.1

5-lemma

If $A_i$, $B_i$ for $i = 1, \dots, 5$ are objects in an abelian category (for example, modules over a ring $R$) such that there is a commutative diagram

$$\begin{CD}
A_1 @>>> A_2 @>>> A_3 @>>> A_4 @>>> A_5 \\
@V{\gamma_1}VV @V{\gamma_2}VV @V{\gamma_3}VV @V{\gamma_4}VV @V{\gamma_5}VV \\
B_1 @>>> B_2 @>>> B_3 @>>> B_4 @>>> B_5
\end{CD}$$

with the rows exact, and $\gamma_1$ is surjective, $\gamma_5$ is injective, and $\gamma_2$ and $\gamma_4$ are isomorphisms, then $\gamma_3$ is an isomorphism as well.

Version: 2 Owner: bwebste Author(s): bwebste


297.2

9-lemma

If $A_i$, $B_i$, $C_i$, for $i = 1, 2, 3$ are objects of an abelian category such that there is a commutative diagram

$$\begin{CD}
@. 0 @. 0 @. 0 @. \\
@. @VVV @VVV @VVV @. \\
0 @>>> A_1 @>>> B_1 @>>> C_1 @>>> 0 \\
@. @VVV @VVV @VVV @. \\
0 @>>> A_2 @>>> B_2 @>>> C_2 @>>> 0 \\
@. @VVV @VVV @VVV @. \\
0 @>>> A_3 @>>> B_3 @>>> C_3 @>>> 0 \\
@. @VVV @VVV @VVV @. \\
@. 0 @. 0 @. 0 @.
\end{CD}$$

with the columns and the bottom two rows exact, then the top row is exact as well.

Version: 2 Owner: bwebste Author(s): bwebste

297.3

Snake lemma

There are two versions of the snake lemma:

(1) Given a commutative diagram as below, with exact rows,

$$\begin{CD}
0 @>>> A_1 @>>> B_1 @>>> C_1 @>>> 0 \\
@. @V{\alpha}VV @V{\beta}VV @V{\gamma}VV @. \\
0 @>>> A_2 @>>> B_2 @>>> C_2 @>>> 0
\end{CD}$$

there is an exact sequence

$$0 \to \ker\alpha \to \ker\beta \to \ker\gamma \to \operatorname{coker}\alpha \to \operatorname{coker}\beta \to \operatorname{coker}\gamma \to 0,$$

where $\ker$ denotes the kernel of a map and $\operatorname{coker}$ its cokernel.

(2) Applying this result inductively to a short exact sequence of chain complexes, we obtain the following: Let $A$, $B$, $C$ be chain complexes, and let $0 \to A \to B \to C \to 0$ be a short exact sequence. Then there is a long exact sequence of homology groups

$$\cdots \to H_n(A) \to H_n(B) \to H_n(C) \to H_{n-1}(A) \to \cdots$$

Version: 5 Owner: bwebste Author(s): bwebste

297.4

chain homotopy

Let $(A, d)$ and $(A', d')$ be chain complexes and $f : A \to A'$, $g : A \to A'$ be chain maps. A chain homotopy $D$ between $f$ and $g$ is a sequence of homomorphisms $\{D_n : A_n \to A'_{n+1}\}$ so that

$$d'_{n+1} \circ D_n + D_{n-1} \circ d_n = f_n - g_n$$

for each $n$. Thus, we have a commutative diagram

$$\begin{CD}
A_{n+1} @>{d_{n+1}}>> A_n @>{d_n}>> A_{n-1} \\
@V{f_{n+1}-g_{n+1}}VV @V{f_n-g_n}VV @V{f_{n-1}-g_{n-1}}VV \\
A'_{n+1} @>{d'_{n+1}}>> A'_n @>{d'_n}>> A'_{n-1}
\end{CD}$$

together with the diagonal maps $D_n : A_n \to A'_{n+1}$ and $D_{n-1} : A_{n-1} \to A'_n$.

Version: 4 Owner: RevBobo Author(s): RevBobo

297.5

chain map

Let $(A, d)$ and $(A', d')$ be chain complexes. A chain map $f : A \to A'$ is a sequence of homomorphisms $\{f_n : A_n \to A'_n\}$ such that $d'_n \circ f_n = f_{n-1} \circ d_n$ for each $n$. Diagrammatically, this says that the following diagram commutes:

$$\begin{CD}
A_n @>{d_n}>> A_{n-1} \\
@V{f_n}VV @V{f_{n-1}}VV \\
A'_n @>{d'_n}>> A'_{n-1}
\end{CD}$$

Version: 3 Owner: RevBobo Author(s): RevBobo

297.6

homology (chain complex)

If $(A, d)$ is a chain complex

$$\cdots \xleftarrow{\ d_{n-1}\ } A_{n-1} \xleftarrow{\ d_n\ } A_n \xleftarrow{\ d_{n+1}\ } A_{n+1} \xleftarrow{\ d_{n+2}\ } \cdots$$

then the $n$-th homology group (or module) $H_n(A, d)$ of the chain complex $A$ is the quotient

$$H_n(A, d) = \frac{\ker d_n}{\operatorname{im} d_{n+1}}.$$

Version: 2 Owner: bwebste Author(s): bwebste
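As a small worked example (ours, not from the original entry), take the complex $0 \to \mathbb{Z} \xrightarrow{\times 2} \mathbb{Z} \to 0$, with $A_1 = A_0 = \mathbb{Z}$, $d_1$ given by multiplication by 2, and all other terms zero:

```latex
% H_0: everything in A_0 is a cycle (d_0 = 0), the boundaries are the even integers
H_0(A,d) = \frac{\ker d_0}{\operatorname{im} d_1}
         = \frac{\mathbb{Z}}{2\mathbb{Z}} \cong \mathbb{Z}/2\mathbb{Z},
\qquad
% H_1: multiplication by 2 is injective, and nothing maps into A_1 (d_2 = 0)
H_1(A,d) = \frac{\ker d_1}{\operatorname{im} d_2} = \frac{0}{0} = 0.
```

So the homology records both the cokernel and the kernel of the single nonzero map.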

Chapter 298 18G40 – Spectral sequences, hypercohomology 298.1

spectral sequence

A spectral sequence is a collection of $R$-modules (or, more generally, objects of an abelian category) $\{E^r_{p,q}\}$ for all $r \in \mathbb{N}$, $p, q \in \mathbb{Z}$, equipped with maps $d^r_{p,q} : E^r_{p,q} \to E^r_{p-r,q+r-1}$ such that $d^r \circ d^r = 0$ (so each $E^r$ is a chain complex), and the $E^{r+1}$'s are its homology, that is,

$$E^{r+1}_{p,q} \cong \ker(d^r_{p,q}) / \operatorname{im}(d^r_{p+r,q-r+1}).$$

(Note: what I have defined above is a homology spectral sequence. Cohomology spectral sequences are identical, except that all the arrows go in the other direction.)

Most interesting spectral sequences are upper right quadrant, meaning that $E^r_{p,q} = 0$ if $p < 0$ or $q < 0$. If this is the case then for any $p, q$, both $d^r_{p,q}$ and $d^r_{p+r,q-r+1}$ are $0$ for sufficiently large $r$, since the target or source is out of the upper right quadrant, so that $E^r_{p,q} = E^{r+1}_{p,q} = \cdots$ for all $r \ge r_0$. This group is called $E^\infty_{p,q}$.

An upper right quadrant spectral sequence $\{E^r_{p,q}\}$ is said to converge to a sequence $F_n$ of $R$-modules if there is an exhaustive filtration $F_{n,0} = 0 \subset F_{n,1} \subset \cdots$ of each $F_n$ such that

$$F_{p+q,q+1}/F_{p+q,q} \cong E^\infty_{p,q}.$$

This is typically written $E^r_{p,q} \Rightarrow F_{p+q}$.

Typically spectral sequences are used in the following manner: we find an interpretation of $E^r$ for a small value of $r$, typically 1, and of $E^\infty$, and then in cases where enough groups and differentials are 0, we can obtain information about one from the other.

Version: 2 Owner: bwebste Author(s): bwebste

Chapter 299 19-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 299.1

Algebraic K-theory

Algebraic K-theory is a series of functors on the category of rings. It classifies ring invariants, i.e. ring properties that are Morita invariant.

The functor K0

Let $R$ be a ring and denote by $M_\infty(R)$ the algebraic direct limit of matrix algebras $M_n(R)$ under the embeddings $M_n(R) \to M_{n+1}(R) : a \mapsto \left(\begin{smallmatrix} a & 0 \\ 0 & 0 \end{smallmatrix}\right)$. The zeroth K-group of $R$, $K_0(R)$, is the Grothendieck group (abelian group of formal differences) of the unitary equivalence classes of projections in $M_\infty(R)$. The addition of two equivalence classes $[p]$ and $[q]$ is given by the direct summation of the projections $p$ and $q$: $[p] + [q] = [p \oplus q]$.

The functor K1 [To Do: coauthor?]

The functor K2 [To Do: coauthor?]

Higher K-functors

Higher K-groups are defined using the Quillen plus construction,

$$K_n^{alg}(R) = \pi_n(BGL_\infty(R)^+), \tag{299.1.1}$$

where $GL_\infty(R)$ is the infinite general linear group over $R$ (defined in a similar way to $M_\infty(R)$), and $BGL_\infty(R)$ is its classifying space. Algebraic K-theory has a product structure,

$$K_i(R) \otimes K_j(S) \to K_{i+j}(R \otimes S). \tag{299.1.2}$$

Version: 2 Owner: mhale Author(s): mhale

299.2

K-theory

Topological K-theory is a generalised cohomology theory on the category of compact Hausdorff spaces. It classifies the vector bundles over a space $X$ up to stable equivalence. Equivalently, via the Serre-Swan theorem, it classifies the finitely generated projective modules over the $C^*$-algebra $C(X)$.

Let $A$ be a unital $C^*$-algebra over $\mathbb{C}$ and denote by $M_\infty(A)$ the algebraic direct limit of matrix algebras $M_n(A)$ under the embeddings $M_n(A) \to M_{n+1}(A) : a \mapsto \left(\begin{smallmatrix} a & 0 \\ 0 & 0 \end{smallmatrix}\right)$. The $K_0(A)$ group is the Grothendieck group (abelian group of formal differences) of the homotopy classes of the projections in $M_\infty(A)$. Two projections $p$ and $q$ are homotopic if $p = uqu^{-1}$ for some unitary $u \in M_\infty(A)$. Addition of homotopy classes is given by the direct summation of projections: $[p] + [q] = [p \oplus q]$.

Denote by $U_\infty(A)$ the direct limit of unitary groups $U_n(A)$ under the embeddings $U_n(A) \to U_{n+1}(A) : u \mapsto \left(\begin{smallmatrix} u & 0 \\ 0 & 1 \end{smallmatrix}\right)$. Give $U_\infty(A)$ the direct limit topology, i.e. a subset $U$ of $U_\infty(A)$ is open if and only if $U \cap U_n(A)$ is an open subset of $U_n(A)$, for all $n$. The $K_1(A)$ group is the Grothendieck group (abelian group of formal differences) of the homotopy classes of the unitaries in $U_\infty(A)$. Two unitaries $u$ and $v$ are homotopic if $uv^{-1}$ lies in the identity component of $U_\infty(A)$. Addition of homotopy classes is given by the direct summation of unitaries: $[u] + [v] = [u \oplus v]$. Equivalently, one can work with invertibles in $GL_\infty(A)$ (an invertible $g$ is connected to the unitary $u = g|g|^{-1}$ via the homotopy $t \mapsto g|g|^{-t}$).

Higher K-groups can be defined through repeated suspensions,

$$K_n(A) = K_0(S^n A). \tag{299.2.1}$$

But the Bott periodicity theorem means that

$$K_1(SA) \cong K_0(A). \tag{299.2.2}$$

The main properties of $K_i$ are:

$$K_i(A \oplus B) = K_i(A) \oplus K_i(B), \tag{299.2.3}$$

$$K_i(M_n(A)) = K_i(A) \quad \text{(Morita invariance)}, \tag{299.2.4}$$

$$K_i(A \otimes \mathcal{K}) = K_i(A) \quad \text{(stability)}, \tag{299.2.5}$$

$$K_{i+2}(A) = K_i(A) \quad \text{(Bott periodicity)}. \tag{299.2.6}$$

There are three flavours of topological K-theory to handle the cases of $A$ being complex (over $\mathbb{C}$), real (over $\mathbb{R}$) or Real (with a given real structure):

$$K_i(C(X, \mathbb{C})) = KU^{-i}(X) \quad \text{(complex/unitary)}, \tag{299.2.7}$$

$$K_i(C(X, \mathbb{R})) = KO^{-i}(X) \quad \text{(real/orthogonal)}, \tag{299.2.8}$$

$$KR_i(C(X), J) = KR^{-i}(X, J) \quad \text{(Real)}. \tag{299.2.9}$$

Real K-theory has a Bott period of 8, rather than 2.

REFERENCES
1. N. E. Wegge-Olsen, K-theory and C*-algebras. Oxford Science Publications. Oxford University Press, 1993.
2. B. Blackadar, K-Theory for Operator Algebras. Cambridge University Press, 2nd ed., 1998.

Version: 12 Owner: mhale Author(s): mhale

299.3

examples of algebraic K-theory groups

    R    K0(R)   K1(R)   K2(R)   K3(R)   K4(R)
    Z    Z       Z/2     Z/2     Z/48    0
    R    Z       R×
    C    Z       C×

Algebraic K-theory of some common rings. Version: 2 Owner: mhale Author(s): mhale


Chapter 300 19K33 – EXT and K-homology 300.1

Fredholm module

Fredholm modules represent abstract elliptic pseudo-differential operators.

Definition (odd Fredholm module). An odd Fredholm module $(H, F)$ over a $C^*$-algebra $A$ is given by an involutive representation $\pi$ of $A$ on a Hilbert space $H$, together with an operator $F$ on $H$ such that $F = F^*$, $F^2 = \mathbb{1}$ and $[F, \pi(a)] \in \mathcal{K}(H)$ for all $a \in A$.

Definition (even Fredholm module). An even Fredholm module $(H, F, \Gamma)$ is given by an odd Fredholm module $(H, F)$ together with a $\mathbb{Z}_2$-grading $\Gamma$ on $H$, $\Gamma = \Gamma^*$, $\Gamma^2 = \mathbb{1}$, such that $\Gamma\pi(a) = \pi(a)\Gamma$ and $\Gamma F = -F\Gamma$.

Definition (degenerate Fredholm module). A Fredholm module is called degenerate if $[F, \pi(a)] = 0$ for all $a \in A$. Degenerate Fredholm modules are homotopic to the 0-module.

Example (Fredholm modules over $\mathbb{C}$). An even Fredholm module $(H, F, \Gamma)$ over $\mathbb{C}$ is given by

$$H = \mathbb{C}^k \oplus \mathbb{C}^k \quad \text{with} \quad \pi(a) = \begin{pmatrix} a\mathbb{1}_k & 0 \\ 0 & 0 \end{pmatrix}, \quad F = \begin{pmatrix} 0 & \mathbb{1}_k \\ \mathbb{1}_k & 0 \end{pmatrix}, \quad \Gamma = \begin{pmatrix} \mathbb{1}_k & 0 \\ 0 & -\mathbb{1}_k \end{pmatrix}.$$

Version: 3 Owner: mhale Author(s): mhale

300.2

K-homology

K-homology is a homology theory on the category of compact Hausdorff spaces. It classifies the elliptic pseudo-differential operators acting on the vector bundles over a space. In terms of $C^*$-algebras, it classifies the Fredholm modules over an algebra. The $K^0(A)$ group is the abelian group of homotopy classes of even Fredholm modules over $A$. The $K^1(A)$ group is the abelian group of homotopy classes of odd Fredholm modules over $A$. Addition is given by direct summation of Fredholm modules, and the inverse of $(H, F, \Gamma)$ is $(H, -F, -\Gamma)$. Version: 1 Owner: mhale Author(s): mhale


Chapter 301 19K99 – Miscellaneous 301.1

examples of K-theory groups

    A                K0(A)           K1(A)
    C                Z               0
    Mn(C)            Z               0
    H                Z               0
    K                Z               0
    B                0               0
    B/K              0               Z
    C0((0,1))        0               Z
    C0(R^{2n})       Z               0
    C0(R^{2n+1})     0               Z
    C([0,1])         Z               0
    C(T^n)           Z^{2^{n-1}}     Z^{2^{n-1}}
    C(S^{2n})        Z^2             0
    C(S^{2n+1})      Z               Z
    C(CP^n)          Z^{n+1}         0
    On               Z/(n-1)         0
    Aθ               Z^2             Z^2
    C*(H3)           Z^3             Z^3

Topological K-theory of some common C*-algebras.

Version: 5 Owner: mhale Author(s): mhale


Chapter 302 20-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 302.1

alternating group is a normal subgroup of the symmetric group

Theorem 2. The alternating group $A_n$ is a normal subgroup of the symmetric group $S_n$.

Proof. Define the epimorphism $f : S_n \to \mathbb{Z}_2$ by $\sigma \mapsto 0$ if $\sigma$ is an even permutation and $\sigma \mapsto 1$ if $\sigma$ is an odd permutation. Then $A_n$ is the kernel of $f$, and so it is a normal subgroup of the domain $S_n$. Furthermore, $S_n/A_n \cong \mathbb{Z}_2$ by the first isomorphism theorem, so by Lagrange's theorem $|S_n| = |A_n|\,|S_n/A_n|$.

Therefore, $|A_n| = n!/2$. That is, there are $n!/2$ elements in $A_n$.

Version: 1 Owner: tensorking Author(s): tensorking
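To make the count concrete, one can list the even permutations directly. The following is a small Python check (ours, not part of the original entry), computing the sign of each permutation by counting inversions:

```python
from itertools import permutations

def parity(p):
    # Sign of a permutation given as a tuple: +1 if even, -1 if odd,
    # computed by counting inversions.
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

n = 4
evens = [p for p in permutations(range(n)) if parity(p) == 1]
print(len(evens))  # 12 = 4!/2, the order of A_4
```

The even permutations are exactly half of all $n!$ permutations, as the theorem predicts.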

302.2

associative

Let (S, φ) be a set with binary operation φ. φ is said to be associative over S if φ(a, φ(b, c)) = φ(φ(a, b), c)


for all $a, b, c \in S$.

Examples of associative operations are addition and multiplication over the integers (or reals), and addition or multiplication over $n \times n$ matrices.

We can construct an operation which is not associative. Let $S$ be the integers, and define $\nu(a, b) = a^2 + b$. Then $\nu(\nu(a, b), c) = \nu(a^2 + b, c) = a^4 + 2a^2 b + b^2 + c$. But $\nu(a, \nu(b, c)) = \nu(a, b^2 + c) = a^2 + b^2 + c$, hence in general $\nu(\nu(a, b), c) \ne \nu(a, \nu(b, c))$. Note, however, that if we were to take $S = \{0\}$, $\nu$ would be associative over $S$! This illustrates the fact that the set the operation is taken with respect to is very important.

Example. We show that the division operation over nonzero reals is non-associative. All we need is a counter-example: so let us compare $1/(1/2)$ and $(1/1)/2$. The first expression is equal to 2, the second to $1/2$, hence division over the nonzero reals is not associative.

Version: 6 Owner: akrowne Author(s): akrowne
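A single numerical triple already witnesses the failure of associativity; here is a quick Python check (ours, not part of the original entry):

```python
# The operation nu(a, b) = a^2 + b from the text, checked on one triple.
def nu(a, b):
    return a * a + b

a, b, c = 2, 3, 5
left = nu(nu(a, b), c)   # (a^2 + b)^2 + c = 49 + 5
right = nu(a, nu(b, c))  # a^2 + (b^2 + c) = 4 + 14
print(left, right)       # 54 18 -- not equal, so nu is not associative
```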

302.3

canonical projection

Given a group $G$ and a normal subgroup $N \trianglelefteq G$ there is an epimorphism $\pi : G \to G/N$ defined by sending an element $g \in G$ to its coset $gN$. The epimorphism $\pi$ is referred to as the canonical projection. Version: 4 Owner: Dr Absentius Author(s): Dr Absentius

302.4

centralizer

For a given group $G$, the centralizer of an element $a \in G$ is defined to be the set

$$C(a) = \{x \in G \mid xa = ax\}.$$

We note that if $x, y \in C(a)$ then $xy^{-1}a = xay^{-1} = axy^{-1}$, so that $xy^{-1} \in C(a)$. Thus $C(a)$ is a non-trivial subgroup of $G$ containing at least $\{e, a\}$.

To illustrate an application of this concept we prove the following lemma.

Lemma. There exists a bijection between the right cosets of $C(a)$ and the conjugates of $a$.

Proof. If $x, y \in G$ are in the same right coset, then $y = cx$ for some $c \in C(a)$. Thus $y^{-1}ay = x^{-1}c^{-1}acx = x^{-1}c^{-1}cax = x^{-1}ax$. Conversely, if $y^{-1}ay = x^{-1}ax$ then $xy^{-1}a = axy^{-1}$ and $xy^{-1} \in C(a)$, giving that $x$ and $y$ are in the same right coset.

Let $[a]$ denote the conjugacy class of $a$. It follows that $|[a]| = [G : C(a)]$ and $|[a]| \mid |G|$. We remark that $a \in Z(G) \Leftrightarrow |C(a)| = |G| \Leftrightarrow |[a]| = 1$, where $Z(G)$ denotes the center of $G$.

Now let $G$ be a $p$-group, i.e. a finite group of order $p^n$, where $p$ is a prime and $n > 0$. Let $z = |Z(G)|$. Summing over elements in distinct conjugacy classes, we have

$$p^n = \sum |[a]| = z + \sum_{a \notin Z(G)} |[a]|,$$

since the center consists precisely of the conjugacy classes of cardinality 1. But for $a \notin Z(G)$, $|[a]| > 1$ divides $p^n$, so $p \mid z$. However, $Z(G)$ is certainly non-empty, so we conclude that every $p$-group has a non-trivial center.

The groups $C(gag^{-1})$ and $C(a)$, for any $g$, are isomorphic.

Version: 5 Owner: mathcam Author(s): Larry Hammick, vitriol
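As a concrete instance of $|[a]| = [G : C(a)]$, here is a small Python computation (ours, not part of the original entry) of the centralizer of a transposition in $S_3$, with permutations written as tuples:

```python
from itertools import permutations

def compose(p, q):
    # (p . q)(i) = p(q(i)) for permutations written as tuples
    return tuple(p[q[i]] for i in range(len(p)))

S3 = list(permutations(range(3)))  # the symmetric group S_3, order 6
a = (1, 0, 2)                      # the transposition swapping 0 and 1

# C(a): the elements commuting with a
centralizer = [x for x in S3 if compose(x, a) == compose(a, x)]
print(len(centralizer))  # 2, so the conjugacy class of a has 6/2 = 3 elements
```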

302.5

commutative

Let (S, φ) be a set with binary operation φ. φ is said to be commutative if φ(a, b) = φ(b, a) for all a, b ∈ S. Some operations which are commutative are addition over the integers, multiplication over the integers, addition over n × n matrices, and multiplication over the reals. An example of a non-commutative operation is multiplication over n × n matrices. Version: 3 Owner: akrowne Author(s): akrowne

302.6

examples of groups

Groups are ubiquitous throughout mathematics. Many “naturally occurring” groups are either groups of numbers (typically abelian) or groups of symmetries (typically non-abelian).

Groups of numbers

• The most important group is the group of integers Z with addition as operation.

• The integers modulo n, often denoted by Zn, form a group under addition. Like Z itself, this is a cyclic group; any cyclic group is isomorphic to one of these.

• The rational (or real, or complex) numbers form a group under addition.

• The positive rationals form a group under multiplication, and so do the non-zero rationals. The same is true for the reals.

• The non-zero complex numbers form a group under multiplication. So do the non-zero quaternions. The latter is our first example of a non-abelian group.

• More generally, any (skew) field gives rise to two groups: the additive group of all field elements, and the multiplicative group of all non-zero field elements.

• The complex numbers of absolute value 1 form a group under multiplication, best thought of as the unit circle. The quaternions of absolute value 1 form a group under multiplication, best thought of as the three-dimensional unit sphere $S^3$. The two-dimensional sphere $S^2$ however is not a group in any natural way.

Most groups of numbers carry natural topologies turning them into topological groups.

Symmetry groups

• The symmetric group of degree n, denoted by $S_n$, consists of all permutations of n items and has $n!$ elements. Every finite group is isomorphic to a subgroup of some $S_n$.

• An important subgroup of the symmetric group of degree n is the alternating group, denoted $A_n$. This consists of all even permutations on n items. A permutation is said to be even if it can be written as the product of an even number of transpositions. The alternating group is normal in $S_n$, of index 2, and it is an interesting fact that $A_n$ is simple for $n \ge 5$. See the proof on the simplicity of the alternating groups. By the Jordan-Hölder theorem, this means that this is the only proper nontrivial normal subgroup of $S_n$.

• If any geometrical object is given, one can consider its symmetry group consisting of all rotations and reflections which leave the object unchanged. For example, the symmetry group of a cone is isomorphic to $S^1$.

• The set of all automorphisms of a given group (or field, or graph, or topological space, or object in any category) forms a group with operation given by the composition of homomorphisms. These are called automorphism groups; they capture the internal symmetries of the given objects.

• In Galois theory, the symmetry groups of field extensions (or equivalently: the symmetry groups of solutions to polynomial equations) are the central object of study; they are called Galois groups.

• Several matrix groups describe various aspects of the symmetry of n-space:


– The general linear group GL(n, R) of all real invertible n × n matrices (with matrix multiplication as operation) contains rotations, reflections, dilations, shear transformations, and their combinations.

– The orthogonal group O(n, R) of all real orthogonal n × n matrices contains the rotations and reflections of n-space.

– The special orthogonal group SO(n, R) of all real orthogonal n × n matrices with determinant 1 contains the rotations of n-space.

All these matrix groups are Lie groups: groups which are differentiable manifolds such that the group operations are smooth maps.

Other groups

• The trivial group consists only of its identity element.

• If X is a topological space and x is a point of X, we can define the fundamental group of X at x. It consists of (equivalence classes of) continuous paths starting and ending at x and describes the structure of the “holes” in X accessible from x.

• The free groups are important in algebraic topology. In a sense, they are the most general groups, having only those relations among their elements that are absolutely required by the group axioms.

• If A and B are two abelian groups (or modules over the same ring), then the set Hom(A, B) of all homomorphisms from A to B is an abelian group (since the sum and difference of two homomorphisms is again a homomorphism). Note that the commutativity of B is crucial here: without it, one couldn't prove that the sum of two homomorphisms is again a homomorphism.

• The set of all invertible n × n matrices over some ring R forms a group denoted by GL(n, R).

• The positive integers less than n which are coprime to n form a group if the operation is defined as multiplication modulo n. This is an abelian group whose order is given by the Euler phi-function φ(n); it is cyclic precisely when n is 1, 2, 4, $p^k$ or $2p^k$ for an odd prime p.

• Generalizing the last two examples, every ring (and every monoid) contains a group, its group of units (invertible elements), where the group operation is ring (monoid) multiplication.

• If K is a number field, then multiplication of (equivalence classes of) non-zero ideals in the ring of algebraic integers $O_K$ gives rise to the ideal class group of K.

• The set of arithmetic functions that take a value other than 0 at 1 form an abelian group under Dirichlet convolution. They include as a subgroup the set of multiplicative functions.

• Consider the curve $C = \{(x, y) \in \mathbb{R}^2 \mid y^2 = x^3 - x\}$. Every straight line intersects this set in three points (counting a point twice if the line is tangent, and allowing for a point at infinity). If we require that those three points add up to zero for any straight line, then we have defined an abelian group structure on C. Groups like these are called abelian varieties; the most prominent examples are elliptic curves, of which C is the simplest one.

• In the classification of all finite simple groups, several “sporadic” groups occur which don't follow any discernible pattern. The largest of these is the monster group with some $8 \cdot 10^{53}$ elements.

Version: 14 Owner: AxelBoldt Author(s): AxelBoldt, NeuRet

302.7

group

Group. A group is a pair (G, ∗) where G is a non-empty set and ∗ is a binary operation on G satisfying the following conditions:

• For any a, b, c ∈ G, (a ∗ b) ∗ c = a ∗ (b ∗ c). (Associativity of the operation.)

• For any a, b in G, a ∗ b belongs to G. (The operation ∗ is closed.)

• There is an element e ∈ G such that g ∗ e = e ∗ g = g for any g ∈ G. (Existence of an identity element.)

• For any g ∈ G there exists an element h such that g ∗ h = h ∗ g = e. (Existence of inverses.)

Usually the symbol ∗ is omitted and we write ab for a ∗ b. Sometimes, the symbol + is used to represent the operation, especially when the group is abelian. It can be proved that there is only one identity element, and that for every element there is only one inverse. Because of this we usually denote the inverse of a as $a^{-1}$, or −a when we are using additive notation. The identity element is also called the neutral element due to its behavior with respect to the operation.

Version: 10 Owner: drini Author(s): drini
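For a finite set, all four conditions can be checked by brute force. Here is a small Python sketch (ours, not part of the original entry) verifying them for $(\mathbb{Z}_5, + \bmod 5)$, with `op` standing in for the abstract operation ∗:

```python
# Brute-force verification of the four group axioms for (Z_5, + mod 5).
n = 5
G = range(n)
op = lambda a, b: (a + b) % n
e = 0  # candidate identity element

closure  = all(op(a, b) in G for a in G for b in G)
assoc    = all(op(op(a, b), c) == op(a, op(b, c))
               for a in G for b in G for c in G)
identity = all(op(a, e) == a == op(e, a) for a in G)
inverses = all(any(op(a, b) == e for b in G) for a in G)
print(closure, assoc, identity, inverses)  # True True True True
```

Here the inverse of a is n − a (mod n), found by the exhaustive search in `inverses`.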

302.8

quotient group

Let (G, ∗) be a group and H a normal subgroup. The relation ∼ given by a ∼ b when $ab^{-1} \in H$ is an equivalence relation. The equivalence classes are called cosets. The equivalence class of a is denoted as aH (or a + H if additive notation is being used).

We can induce a group structure on the cosets with the following operation:

$$(aH) \star (bH) = (a ∗ b)H.$$

The collection of cosets is denoted as G/H, and together with the ⋆ operation it forms the quotient group or factor group of G with H.

Example. Consider the group Z and the subgroup

$$3\mathbb{Z} = \{n \in \mathbb{Z} : n = 3k,\ k \in \mathbb{Z}\}.$$

Since Z is abelian, 3Z is then also a normal subgroup. Using additive notation, the equivalence relation becomes n ∼ m when (n − m) ∈ 3Z, that is, 3 divides n − m. So the relation is actually congruence modulo 3. Therefore the equivalence classes (the cosets) are:

3Z = {. . . , −9, −6, −3, 0, 3, 6, 9, . . .}
1 + 3Z = {. . . , −8, −5, −2, 1, 4, 7, 10, . . .}
2 + 3Z = {. . . , −7, −4, −1, 2, 5, 8, 11, . . .}

which we'll represent as 0̄, 1̄ and 2̄. Then we can check that Z/3Z is actually the integers modulo 3 (that is, Z/3Z ≅ Z3).

Version: 6 Owner: drini Author(s): drini
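The key point is that the coset operation is well defined: the class of a + b depends only on the classes of a and b, not on the chosen representatives. A small Python check (ours, not part of the original entry) over a window of representatives:

```python
# The cosets of 3Z in Z, represented by the residues 0, 1, 2.
def coset(n):
    return n % 3  # picks the representative 0, 1 or 2 of the class n + 3Z

reps = range(-9, 10)  # a window of representatives of the three cosets
well_defined = all(coset(a + b) == (coset(a) + coset(b)) % 3
                   for a in reps for b in reps)
print(well_defined)  # True: (a + 3Z) + (b + 3Z) = (a + b) + 3Z
```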


Chapter 303 20-02 – Research exposition (monographs, survey articles) 303.1

length function

Let $G$ be a group. A length function on $G$ is a function $L : G \to \mathbb{R}^+$ satisfying:

$$L(e) = 0,$$

$$L(g) = L(g^{-1}), \quad \forall g \in G,$$

$$L(g_1 g_2) \le L(g_1) + L(g_2), \quad \forall g_1, g_2 \in G.$$

Version: 2 Owner: mhale Author(s): mhale
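A standard example is the absolute value on the additive group of integers. The following Python sketch (ours, not part of the original entry) checks the three axioms on a sample of elements:

```python
# The absolute value is a length function on the additive group (Z, +):
# the identity is 0, the inverse of g is -g, and the product is g + h.
L = abs
sample = range(-20, 21)

axiom1 = L(0) == 0
axiom2 = all(L(-g) == L(g) for g in sample)
axiom3 = all(L(g + h) <= L(g) + L(h) for g in sample for h in sample)
print(axiom1, axiom2, axiom3)  # True True True
```

The third axiom is just the triangle inequality; on a free group the analogous example is the word length with respect to a generating set.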


Chapter 304 20-XX – Group theory and generalizations 304.1

free product with amalgamated subgroup

Definition. Let $G_k$, $k = 0, 1, 2$, be groups and $i_k : G_0 \to G_k$, $k = 1, 2$, be monomorphisms. The free product of $G_1$ and $G_2$ with amalgamated subgroup $G_0$ is defined to be a group $G$ that has the following two properties:

1. There are homomorphisms $j_k : G_k \to G$, $k = 1, 2$, that make the following diagram commute:

$$\begin{CD}
G_0 @>{i_1}>> G_1 \\
@V{i_2}VV @VV{j_1}V \\
G_2 @>>{j_2}> G
\end{CD}$$

2. $G$ is universal with respect to the previous property; that is, for any other group $G'$ and homomorphisms $j'_k : G_k \to G'$, $k = 1, 2$, that fit in such a commutative diagram, there is a unique homomorphism $\phi : G \to G'$ with $\phi j_k = j'_k$ for $k = 1, 2$.

It follows by “general nonsense” that the free product of G1 and G2 with amalgamated subgroup G0 , if it exists, is “unique up to unique isomorphism.” The free product of G1 and G2 with amalgamated subgroup G0 , is denoted by G1 FG0 G2 . The following theorem asserts its existence. Theorem 1. G1 FG0 G2 exists for any groups Gk , k = 0, 1, 2 and monomorphisms ik : G0 → Gi , k = 1, 2. [ Sketch of proof] Without loss of generality assume that G0 is a subgroup of Gk and that ik is the inclusion for k = 1, 2. Let

Gk = h(xk;s )s∈S | (rk;t)t∈T i be a presentation of Gk for k = 1, 2. Each g ∈ G0 can be expressed as a word in the generators of Gk ; denote that word by wk (g) and let N be the normal closure of {w1 (g)w2(g)−1 | g ∈ G0 } in the free product G1 FG2 . Define G1 FG0 G2 := G1 FG2 /N and for k = 0, 1 define jk to be the inclusion into the free product followed by the canonical projection. Clearly (1) is satisfied, while (2) follows from the universal properties of the free product and the quotient group. Notice that in the above proof it would be sufficient to divide by the relations w1 (g)w2 (g)−1 for g in a generating set of G0 . This is useful in practice when one is interested in obtaining a presentation of G1 FG0 G2 . In case that the ik ’s are not injective the above still goes through verbatim. The group thusly obtained is called a “pushout”. Examples of free products with amalgamated subgroups are provided by Van Kampen’s theorem. Version: 1 Owner: Dr Absentius Author(s): Dr Absentius

304.2

nonabelian group

Let (G, ∗) be a group. If a ∗ b ≠ b ∗ a for some a, b ∈ G, we say that the group is nonabelian or noncommutative.

Proposition. There is a nonabelian group for which $x \mapsto x^3$ is a homomorphism.

Version: 2 Owner: drini Author(s): drini, apmxi


Chapter 305 20A05 – Axiomatics and elementary properties 305.1

Feit-Thompson theorem

An important result in the classification of all finite simple groups, the Feit-Thompson theorem states that every non-Abelian simple group must have even order. The proof requires 255 pages. Version: 1 Owner: mathcam Author(s): mathcam

305.2

Proof: The orbit of any element of a group is a subgroup

Following is a proof that, if $G$ is a group and $g \in G$, then $\langle g \rangle \le G$. Here $\langle g \rangle$ is the orbit of $g$ and is defined as

$$\langle g \rangle = \{g^n : n \in \mathbb{Z}\}.$$

Since $g \in \langle g \rangle$, $\langle g \rangle$ is nonempty. Let $a, b \in \langle g \rangle$. Then there exist $x, y \in \mathbb{Z}$ such that $a = g^x$ and $b = g^y$. Since $ab^{-1} = g^x(g^y)^{-1} = g^x g^{-y} = g^{x-y} \in \langle g \rangle$, it follows that $\langle g \rangle \le G$.

Version: 3 Owner: drini Author(s): drini, Wkbj79


305.3

center

The center of a group $G$ is the subgroup of elements which commute with every other element. Formally,

$$Z(G) = \{x \in G \mid xg = gx,\ \forall g \in G\}.$$

It can be shown that the center has the following properties:

• It is non-empty, since it contains at least the identity element.
• It consists of those conjugacy classes containing just one element.
• The center of an abelian group is the entire group.
• It is normal in $G$.
• Every $p$-group has a non-trivial center.

Version: 5 Owner: vitriol Author(s): vitriol
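To illustrate the last property, here is a small Python computation (ours, not part of the original entry) of the center of the dihedral group $D_4$, a 2-group of order 8, realized as permutations of the vertices of a square:

```python
from itertools import permutations

def compose(p, q):
    # (p . q)(i) = p(q(i)) for permutations written as tuples
    return tuple(p[q[i]] for i in range(len(p)))

# D_4 on the square's vertices 0,1,2,3: generated by a rotation r
# and a reflection s, closed under composition.
r = (1, 2, 3, 0)
s = (3, 2, 1, 0)
G = {(0, 1, 2, 3)}
frontier = {r, s}
while frontier:
    G |= frontier
    frontier = {compose(a, b) for a in G for b in G} - G

center = {x for x in G if all(compose(x, g) == compose(g, x) for g in G)}
print(len(G), len(center))  # 8 2: the center is {e, r^2}, non-trivial
```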

305.4

characteristic subgroup

If (G, ∗) is a group, then H is a characteristic subgroup of G (written H char G) if every automorphism of G maps H to itself. That is,

$$\forall f \in \operatorname{Aut}(G)\ \forall h \in H,\ f(h) \in H$$

or, equivalently,

$$\forall f \in \operatorname{Aut}(G),\ f[H] = H.$$

A few properties of characteristic subgroups:

(a) If H char G then H is a normal subgroup of G.

(b) If G has only one subgroup of a given size, then that subgroup is characteristic.

(c) If K char H and H ⊴ G then K ⊴ G (contrast with the fact that normality of subgroups is not transitive).

(d) If K char H and H char G then K char G.

Proofs of these properties:

(a) Consider H char G under the inner automorphisms of G. Since every automorphism preserves H, in particular every inner automorphism preserves H, and therefore $g * h * g^{-1} \in H$ for any g ∈ G and h ∈ H. This is precisely the definition of a normal subgroup.

(b) Suppose H is the only subgroup of G of order n. In general, homomorphisms take subgroups to subgroups, and of course isomorphisms take subgroups to subgroups of the same order. But since there is only one subgroup of G of order n, any automorphism must take H to H, and so H char G.

(c) Take K char H and H ⊴ G, and consider the inner automorphisms of G (automorphisms of the form $h \mapsto g * h * g^{-1}$ for some g ∈ G). These all preserve H, and so restrict to automorphisms of H. But any automorphism of H preserves K, so for any g ∈ G and k ∈ K, $g * k * g^{-1} \in K$.

(d) Let K char H and H char G, and let φ be an automorphism of G. Since H char G, φ[H] = H, so $\varphi_H$, the restriction of φ to H, is an automorphism of H. Since K char H, $\varphi_H[K] = K$. But $\varphi_H$ is just a restriction of φ, so φ[K] = K. Hence K char G.

Version: 1 Owner: Henry Author(s): Henry

305.5

class function

Given a field K, a K–valued class function on a group G is a function f : G −→ K such that f (g) = f (h) whenever g and h are elements of the same conjugacy class of G. An important example of a class function is the character of a group representation. Over the complex numbers, the set of characters of the irreducible representations of G form a basis for the vector space of all C–valued class functions, when G is a compact Lie group. Relation to the convolution algebra Class functions are also known as central functions, because they correspond to functions f in the convolution algebra C ∗ (G) that have the property f ∗ g = g ∗ f for all g ∈ C ∗ (G) (i.e., they commute with everything under the convolution operation). More precisely, the set of measurable complex valued class functions f is equal to the set of central elements of the convolution algebra C ∗ (G), for G a locally compact group admitting a Haar measure. Version: 5 Owner: djao Author(s): djao


305.6

conjugacy class

Two elements g and g 0 of a group G are said to be conjugate if there exists h ∈ G such that g 0 = hgh−1 . Conjugacy of elements is an equivalence relation, and the equivalence classes of G are called conjugacy classes. Two subsets S and T of G are said to be conjugate if there exists g ∈ G such that T = {gsg −1 | s ∈ S} ⊂ G. In this situation, it is common to write gSg −1 for T to denote the fact that everything in T has the form gsg −1 for some s ∈ S. We say that two subgroups of G are conjugate if they are conjugate as subsets. Version: 2 Owner: djao Author(s): djao
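For a small concrete example, the conjugacy classes of the symmetric group S3 can be computed directly. The following Python sketch (with permutations represented as tuples, an illustrative encoding) partitions S3 into its three classes:

```python
from itertools import permutations

# Permutations of {0,1,2} as tuples: p[i] is the image of i, so S3 has 6 elements.
def compose(p, q):
    # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

G = list(permutations(range(3)))

def conjugacy_class(g):
    # the class of g is {h g h^{-1} : h in G}
    return frozenset(compose(compose(h, g), inverse(h)) for h in G)

classes = {conjugacy_class(g) for g in G}
# S3 splits into the identity, the three transpositions, and the two 3-cycles.
assert sorted(len(c) for c in classes) == [1, 2, 3]
```

The three classes correspond to the three cycle types of S3, in line with the general fact that conjugacy in symmetric groups is determined by cycle structure.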

305.7

conjugacy class formula

The conjugacy classes of a group form a partition of its elements. In a finite group, this means that the order of the group is the sum of the numbers of elements of the distinct conjugacy classes. For an element g of a group G, we denote the conjugacy class of g by Cg and the normalizer of g in G by NG (g). The number of elements in Cg equals [G : NG (g)], the index of the normalizer of g in G. For an element g of the center Z(G) of G, the conjugacy class of g consists of the singleton {g}. Putting this together gives us the conjugacy class formula

|G| = |Z(G)| + ∑_{i=1}^{m} [G : NG (xi )]

where the xi are elements of the distinct conjugacy classes contained in G − Z(G). Version: 3 Owner: lieven Author(s): lieven
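The formula can be checked computationally on a small group. The sketch below uses one concrete realization of the dihedral group D4 as permutations of the corners 0..3 of a square (an illustrative choice):

```python
# A computational check of the class equation for the dihedral group D4,
# realized as permutations of the square's corners 0..3.
def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

def inverse(p):
    inv = [0] * 4
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

# Generate the group from a rotation and a reflection by closing under products.
gens = [(1, 2, 3, 0), (3, 2, 1, 0)]
G = {(0, 1, 2, 3)}
while True:
    new = {compose(a, b) for a in G | set(gens) for b in G | set(gens)} - G
    if not new:
        break
    G |= new
assert len(G) == 8

Z = {g for g in G if all(compose(g, h) == compose(h, g) for h in G)}
classes = {frozenset(compose(compose(h, g), inverse(h)) for h in G) for g in G}

# Class equation: |G| = |Z(G)| + sum of the sizes of the noncentral classes.
noncentral = [len(c) for c in classes if len(c) > 1]
assert len(G) == len(Z) + sum(noncentral)
```

Here the center has two elements (the identity and the half-turn), and the equation reads 8 = 2 + (2 + 2 + 2).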

305.8

conjugate stabilizer subgroups

Let · be a right group action of G on a set M , and for α ∈ M let Gα denote the stabilizer subgroup of α. Then Gα·g = g −1 Gα g for any α ∈ M and g ∈ G.

Proof:

x ∈ Gα·g ↔ α · (gx) = α · g ↔ α · (gxg −1 ) = α ↔ gxg −1 ∈ Gα ↔ x ∈ g −1 Gα g, and therefore Gα·g = g −1 Gα g. Thus all stabilizer subgroups for elements of the orbit G(α) of α are conjugate to Gα . Version: 4 Owner: Thomas Heye Author(s): Thomas Heye

305.9

coset

Let H be a subgroup of a group G, and let a ∈ G. The left coset of a with respect to H in G is defined to be the set aH := {ah | h ∈ H}. The right coset of a with respect to H in G is defined to be the set Ha := {ha | h ∈ H}.

Two left cosets aH and bH of H in G are either identical or disjoint. Indeed, if c ∈ aH ∩ bH, then c = ah1 and c = bh2 for some h1 , h2 ∈ H, whence b−1 a = h2 h1 −1 ∈ H. But then, given any ah ∈ aH, we have ah = (bb−1 )ah = b(b−1 a)h ∈ bH, so aH ⊂ bH, and similarly bH ⊂ aH. Therefore aH = bH. Similarly, any two right cosets Ha and Hb of H in G are either identical or disjoint. Accordingly, the collection of left cosets (or right cosets) partitions the group G; the corresponding equivalence relation for left cosets can be described succinctly by the relation a ∼ b if a−1 b ∈ H, and for right cosets by a ∼ b if ab−1 ∈ H. The index of H in G, denoted [G : H], is the cardinality of the set G/H of left cosets of H in G. Version: 5 Owner: djao Author(s): rmilson, djao
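The partition property is easy to verify computationally. A small Python sketch with G = Z12 and the subgroup H = {0, 4, 8} (an arbitrary illustrative choice):

```python
from itertools import combinations

# G = Z12 under addition, H = {0, 4, 8} (the subgroup generated by 4).
n = 12
G = set(range(n))
H = {0, 4, 8}

cosets = {frozenset((a + h) % n for h in H) for a in G}

assert len(cosets) == n // len(H)        # the index [G : H] = 4
assert set().union(*cosets) == G         # the cosets cover G
for c1, c2 in combinations(cosets, 2):   # distinct cosets are disjoint
    assert not (c1 & c2)
```

Note that |G| = [G : H] · |H| here, which is Lagrange's theorem in miniature.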

305.10

cyclic group

A group G is said to be cyclic if it is generated by a single element x ∈ G. That is, if G has infinite order then every g ∈ G can be expressed as xk with k ∈ Z. If G has finite order then every g ∈ G can be expressed as xk with k ∈ N0 , and G has exactly φ(|G|) generators, where φ is the Euler totient function. It is a corollary of Lagrange’s theorem that every group of prime order is cyclic. All cyclic groups of the same order are isomorphic to each other. Consequently cyclic groups of order n are often denoted by Cn . Every cyclic group is abelian.

Examples of cyclic groups are (Zm , +m ), (Z∗p , ×p ) and (Rm , ×m ), where p is prime and Rm = {n ∈ N : (n, m) = 1, n ≤ m}. Version: 10 Owner: yark Author(s): yark, Larry Hammick, vitriol
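As a computational illustration of the second example, one can check for small primes p that (Z∗p , ×p ) is cyclic, i.e. that some residue's powers exhaust the whole group. The following naive Python sketch does exactly that:

```python
# Naive check that the multiplicative group mod a prime p is cyclic:
# search for a residue g whose powers exhaust {1, ..., p-1}.
def is_cyclic_units(p):
    target = set(range(1, p))
    for g in target:
        x, powers = 1, {1}
        for _ in range(p - 2):
            x = (x * g) % p
            powers.add(x)
        if powers == target:
            return True
    return False

assert all(is_cyclic_units(p) for p in (2, 3, 5, 7, 11, 13))
```

A residue found this way is called a primitive root mod p; the brute-force search is only practical for small p, but it suffices as a sanity check of the claim.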

305.11

derived subgroup

Let G be a group and a, b ∈ G. The group element aba−1 b−1 is called the commutator of a and b. An element of G is called a commutator if it is the commutator of some a, b ∈ G. The subgroup of G generated by all the commutators in G is called the derived subgroup of G, and also the commutator subgroup. It is commonly denoted by G0 and also by G(1) . Alternatively, one may define G0 as the smallest subgroup that contains all the commutators. Note that the commutator of a, b ∈ G is trivial, i.e. aba−1 b−1 = 1 if and only if a and b commute. Thus, in a fashion, the derived subgroup measures the degree to which a group fails to be abelian. Proposition 1. The derived subgroup G0 is normal in G, and the factor group G/G0 is abelian. Indeed, G is abelian if and only if G0 is the trivial subgroup. One can of course form the derived subgroup of the derived subgroup; this is called the second derived subgroup, and denoted by G00 or by G(2) . Proceeding inductively one defines the nth derived subgroup as the derived subgroup of G(n−1) . In this fashion one obtains a sequence of subgroups, called the derived series of G: G = G(0) ⊇ G(1) ⊇ G(2) ⊇ . . . Proposition 2. The group G is solvable if and only if the derived series terminates in the trivial group {1} after a finite number of steps. In this case, one can refine the derived series to obtain a composition series (a.k.a. a Jordan-Holder decomposition) of G. Version: 4 Owner: rmilson Author(s): rmilson
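As a small illustration, the derived subgroup of S3 can be computed by brute force; in this case the set of commutators is already a subgroup, namely the alternating group A3 (a Python sketch with permutations as tuples):

```python
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

S3 = list(permutations(range(3)))

# All commutators a b a^{-1} b^{-1} for a, b in S3.
commutators = {compose(compose(a, b), compose(inverse(a), inverse(b)))
               for a in S3 for b in S3}

# Here the commutators already form a subgroup: the alternating group A3.
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}
assert commutators == A3
```

Consistent with the propositions above, S3 /A3 has order 2 and is abelian, while S3 itself is not.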

305.12

equivariant

Let G be a group, and X and Y left (resp. right) homogeneous spaces of G. Then a map f : X → Y is called equivariant if g(f (x)) = f (gx) (resp. (f (x))g = f (xg)) for all g ∈ G. Version: 1 Owner: bwebste Author(s): bwebste 1260

305.13

examples of finite simple groups

This entry is under construction. If I take too long to finish it, nag me about it, or fill in the rest yourself. All groups considered here are finite. It is now widely believed that the classification of all finite simple groups up to isomorphism is finished. The proof runs for at least 10,000 printed pages, and as of the writing of this entry, has not yet been published in its entirety.

Abelian groups • The first trivial example of simple groups are the cyclic groups of prime order. It is not difficult to see (say, by Cauchy’s theorem) that these are the only abelian simple groups.

Alternating groups • The alternating group on n symbols is the subgroup of all even permutations of Sn , the symmetric group on n symbols. It is usually denoted by An , or sometimes by Alt(n). This is a normal subgroup of Sn , namely the kernel of the homomorphism that sends every even permutation to 1 and the odd permutations to −1. Because every permutation is either even or odd, and there is a bijection between the two (multiply every even permutation by a transposition), the index of An in Sn is 2. A3 is simple because it only has three elements, and the simplicity of An for n ≥ 5 can be proved by an elementary argument. The simplicity of the alternating groups is an important fact that Évariste Galois required in order to prove the insolubility by radicals of the general polynomial of degree higher than four.

Groups of Lie type
• Projective special linear groups.
• Other groups of Lie type.

Sporadic groups

There are twenty-six sporadic groups (no more, no less!) that do not fit into any of the infinite sequences of simple groups considered above. These often arise as the group of automorphisms of strongly regular graphs.


• Mathieu groups. • Janko groups. • The baby monster. • The monster. Version: 8 Owner: drini Author(s): bbukh, yark, NeuRet

305.14

finitely generated group

A group G is finitely generated if there is a finite subset X ⊆ G such that X generates G. That is, every element of G is a product of elements of X and inverses of elements of X. Or, equivalently, no proper subgroup of G contains X. Every finite group is finitely generated, as we can take X = G. Every finitely generated group is countable. Version: 6 Owner: yark Author(s): yark, nerdy2

305.15

first isomorphism theorem

If f : G → H is a homomorphism of groups (or rings, or modules), then it induces an isomorphism G/ ker f ≈ im f . Version: 2 Owner: nerdy2 Author(s): nerdy2

305.16

fourth isomorphism theorem

Let X be a group and let N E X. Let A be the set of subgroups of X that contain N , and let B be the set of subgroups of X/N . Then there is a bijection ϕ : A → B such that for all subgroups Y, Z ≤ X with N ≤ Y and N ≤ Z:

• Y ≤ Z ⇔ Y /N ≤ Z/N
• if Z ≤ Y , then [Y : Z] = [Y /N : Z/N ]
• ⟨Y, Z⟩/N = ⟨Y /N, Z/N ⟩
• (Y ∩ Z)/N = Y /N ∩ Z/N
• Y E X ⇔ Y /N E X/N

Note: This is a “seed” entry written using a short-hand format described in this FAQ. Version: 2 Owner: bwebste Author(s): yark, apmxi

305.17

generator

If G is a cyclic group and g ∈ G, then g is a generator of G if ⟨g⟩ = G. All infinite cyclic groups have exactly 2 generators. Let G be an infinite cyclic group and g be a generator of G. Let z ∈ Z such that g z is a generator of G. Then ⟨g z ⟩ = G. Since g ∈ G, then g ∈ ⟨g z ⟩. Thus, there exists n ∈ Z with g = (g z )n = g nz . Thus, g nz−1 = eG . Since G is infinite and |g| = |⟨g⟩| = |G| must be infinite, then nz − 1 = 0. Since nz = 1 and n and z are integers, then n = z = 1 or n = z = −1. It follows that the only generators of G are g and g −1 . A finite cyclic group of order n has exactly ϕ(n) generators, where ϕ is the Euler totient function. Let G be a finite cyclic group of order n and g be a generator of G. Then |g| = |⟨g⟩| = |G| = n. Let z ∈ Z such that g z is a generator of G. By the division algorithm, there exist q, r ∈ Z with 0 ≤ r < n such that z = qn + r. Thus, g z = g qn+r = g qn g r = (g n )q g r = (eG )q g r = eG g r = g r . Since g r is a generator of G, then ⟨g r ⟩ = G. Thus, n = |G| = |⟨g r ⟩| = |g r | = |g|/ gcd(r, |g|) = n/ gcd(r, n). Therefore, gcd(r, n) = 1, and the result follows. Version: 3 Owner: Wkbj79 Author(s): Wkbj79
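The count ϕ(n) can be confirmed computationally in the additive model Zn of a cyclic group of order n (a naive Python sketch; `generators` and `phi` are hypothetical helper names):

```python
from math import gcd

# Additive model: k generates (Zn, +) iff its multiples cover all residues.
def generators(n):
    return [k for k in range(n)
            if {(k * j) % n for j in range(n)} == set(range(n))]

def phi(n):
    # Euler totient: the count of 1 <= k <= n coprime to n.
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

assert generators(12) == [1, 5, 7, 11]
assert all(len(generators(n)) == phi(n) for n in range(2, 40))
```

This mirrors the proof above: k generates Zn exactly when gcd(k, n) = 1.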

305.18

group actions and homomorphisms

Notes on group actions and homomorphisms

Let G be a group, X a non-empty set and SX the symmetric group of X, i.e. the group of all bijective maps on X. Let · denote a left group action of G on X.

1. For each g ∈ G we define fg : X −→ X, fg (x) = g · x for all x ∈ X. Since fg−1 (fg (x)) = g −1 · (g · x) = x for all x ∈ X, fg−1 is the inverse of fg , so fg is bijective and thus an element of SX . We define F : G −→ SX , F (g) = fg for all g ∈ G. This mapping is a group homomorphism: let g, h ∈ G, x ∈ X. Then

F (gh)(x) = fgh (x) = (gh) · x = g · (h · x) = (fg ◦ fh )(x)

implies F (gh) = F (g) ◦ F (h). The same is obviously true for a right group action.

2. Now let F : G −→ SX be a group homomorphism. Then the map f : G × X −→ X, (g, x) 7→ F (g)(x) satisfies

(a) f (1G , x) = F (1G )(x) = x for all x ∈ X, and

(b) f (gh, x) = F (gh)(x) = (F (g) ◦ F (h))(x) = F (g)(F (h)(x)) = f (g, f (h, x)),

so f is a group action induced by F .

Characterization of group actions

Let G be a group acting on a set X. Using the same notation as above, we have for each g ∈ ker(F )

F (g) = fg = idX ↔ g · x = x ∀x ∈ X ↔ g ∈ ⋂_{x∈X} Gx    (305.18.1)

and it follows that

ker(F ) = ⋂_{x∈X} Gx .

Let G act transitively on X. Then for any x ∈ X, X is the orbit G(x) of x. As shown in “conjugate stabilizer subgroups”, all stabilizer subgroups of elements y ∈ G(x) are conjugate to Gx in G. From the above it follows that

ker(F ) = ⋂_{g∈G} gGx g −1 .

For a faithful operation of G the condition (g · x = x ∀x ∈ X) ⇒ g = 1G is equivalent to ker(F ) = {1G }, and therefore F : G −→ SX is a monomorphism. For the trivial operation of G on X, given by g · x = x for all g ∈ G and x ∈ X, the stabilizer subgroup Gx is G for all x ∈ X, and thus ker(F ) = G; the corresponding homomorphism sends every g ∈ G to idX . If the operation of G on X is free, then Gx = {1G } for all x ∈ X, so the kernel of F is {1G }, as for a faithful operation. But the converse fails: let X = {1, . . . , n} and G = Sn . Then the operation of G on X given by π · i := π(i) for all i ∈ X, π ∈ Sn is faithful but not free. Version: 5 Owner: Thomas Heye Author(s): Thomas Heye

For a faithful operation of G the condition g · x = x∀x ∈ X → g = 1G is equivalent to ker(F ) = {1G } and therefore F : G −→ SX is a monomorphism. For the trivial operation of G on X given by g · x = x∀g ∈ G the stabilizer subgroup Gx is G for all x ∈ X, and thus ker(F ) = G. The corresponding homomorphism is g −→ id x∀g ∈ G. If the operation of G on X is free, then Gx = {1G } ∀ x ∈ X, thus the kernel of F is {1G }–like for a faithful operation. But: Let X = {1, . . . , n} and G = Sn . Then the operation of G on X given by π · i := π(i)∀i ∈ X, π ∈ Sn is faithful but not free. Version: 5 Owner: Thomas Heye Author(s): Thomas Heye 1264

305.19

group homomorphism

Let (G, ∗g ) and (K, ∗k ) be two groups. A group homomorphism is a function φ : G → K such that φ(s ∗g t) = φ(s) ∗k φ(t) for all s, t ∈ G. The composition of group homomorphisms is again a homomorphism. Let φ : G → K be a group homomorphism. Then
• φ(eg ) = ek , where eg and ek are the respective identity elements of G and K,
• φ(g)−1 = φ(g −1 ) for all g ∈ G,
• φ(g)z = φ(g z ) for all g ∈ G and for all z ∈ Z.
The kernel of φ is a subgroup of G and its image is a subgroup of K. Some special homomorphisms have special names. If φ : G → K is injective, we say that φ is a monomorphism, and if φ is onto we call it an epimorphism. When φ is both injective and surjective (that is, bijective) we call it an isomorphism. In the latter case we also say that G and K are isomorphic, meaning they are basically the same group (have the same structure). A homomorphism from G to itself is called an endomorphism, and if it is bijective, then it is called an automorphism. Version: 15 Owner: drini Author(s): saforres, drini
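As a small worked example, reduction modulo 4 is a homomorphism from (Z12 , +) to (Z4 , +), and its kernel and image can be computed directly (a Python sketch; the choice of groups is illustrative):

```python
# Reduction mod 4 on (Z12, +): a homomorphism because 4 divides 12.
def f(x):
    return x % 4

G = range(12)
# the homomorphism property f(a + b) = f(a) + f(b), computed in each group
assert all(f((a + b) % 12) == (f(a) + f(b)) % 4 for a in G for b in G)

kernel = {x for x in G if f(x) == 0}   # preimage of the identity 0
image = {f(x) for x in G}

assert kernel == {0, 4, 8}             # a subgroup of Z12
assert image == {0, 1, 2, 3}           # all of Z4, so f is an epimorphism
```

Since f is onto but not injective, it is an epimorphism but not an isomorphism; its kernel {0, 4, 8} is exactly the subgroup that the first isomorphism theorem quotients out.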

305.20

homogeneous space

Overview and definition. Let G be a group acting transitively on a set X. In other words, we consider a homomorphism φ : G → Perm(X), where the latter denotes the group of all bijections of X. If we consider G as being, in some sense, the automorphisms of X, the transitivity assumption means that it is impossible to distinguish a particular element of X from any other element. Since the elements of X are indistinguishable, we call X a homogeneous space. Indeed, the concept of a homogeneous space is logically equivalent to the concept of a transitive group action. Action on cosets. Let G be a group, H < G a subgroup, and let G/H denote the set of left cosets, as above. For every g ∈ G we consider the mapping ψH (g) : G/H → G/H with action aH 7→ gaH, a ∈ G. Proposition 3. The mapping ψH (g) is a bijection. The corresponding mapping ψH : G → Perm(G/H) is a group homomorphism, specifying a transitive group action of G on G/H.

Thus, G/H has the natural structure of a homogeneous space. Indeed, we shall see that every homogeneous space X is isomorphic to G/H, for some subgroup H.

N.B. In geometric applications, we want the homogeneous space X to have some extra structure, like a topology or a differential structure. Correspondingly, the group of automorphisms is either a continuous group or a Lie group. In order for the quotient space X to have a Hausdorff topology, we need to assume that the subgroup H is closed in G.

The isotropy subgroup and the basepoint identification. Let X be a homogeneous space. For x ∈ X, the subgroup Hx = {h ∈ G : hx = x}, consisting of all G-actions that fix x, is called the isotropy subgroup at the basepoint x. We identify the space of cosets G/Hx with the homogeneous space by means of the mapping τx : G/Hx → X, defined by τx (aHx ) = ax, a ∈ G. Proposition 4. The above mapping is a well-defined bijection. To show that τx is well defined, let a, b ∈ G be members of the same left coset, i.e. there exists an h ∈ Hx such that b = ah. Consequently bx = a(hx) = ax, as desired. The mapping τx is onto because the action of G on X is assumed to be transitive. To show that τx is one-to-one, consider two cosets aHx , bHx , a, b ∈ G such that ax = bx. It follows that b−1 a fixes x, and hence is an element of Hx . Therefore aHx and bHx are the same coset.

The homogeneous space as a quotient. Next, let us show that τx is equivariant relative to the action of G on X and the action of G on the quotient G/Hx . Proposition 5. We have that φ(g) ◦ τx = τx ◦ ψHx (g) for all g ∈ G. To prove this, let g, a ∈ G be given, and note that ψHx (g)(aHx ) = gaHx . The latter coset corresponds under τx to the point gax, as desired. Finally, let us note that τx identifies the point x ∈ X with the coset of the identity element eHx , that is to say, with the subgroup Hx itself. For this reason, the point x is often called the basepoint of the identification τx : G/Hx → X.

The choice of basepoint. Next, we consider the effect of the choice of basepoint on the quotient structure of a homogeneous space. Let X be a homogeneous space. Proposition 6. The set of all isotropy subgroups {Hx : x ∈ X} forms a single conjugacy class of subgroups in G. To show this, let x0 , x1 ∈ X be given. By the transitivity of the action we may choose a ĝ ∈ G such that x1 = ĝx0 . Hence, for all h ∈ G satisfying hx0 = x0 , we have (ĝhĝ −1 )x1 = ĝ(h(ĝ −1 x1 )) = ĝx0 = x1 .

Similarly, for all h ∈ Hx1 we have that ĝ −1 hĝ fixes x0 . Therefore, ĝ(Hx0 )ĝ −1 = Hx1 ;

or what is equivalent, for all x ∈ X and g ∈ G we have gHx g −1 = Hgx .

Equivariance. Since we can identify a homogeneous space X with G/Hx for every possible x ∈ X, it stands to reason that there exist equivariant bijections between the different G/Hx . To describe these, let H0 , H1 < G be conjugate subgroups with H1 = ĝH0 ĝ −1 for some fixed ĝ ∈ G. Let us set

X = G/H0 ,

and let x0 denote the identity coset H0 , and x1 the coset ĝH0 . What is the subgroup of G that fixes x1 ? In other words, what are all the h ∈ G such that hĝH0 = ĝH0 , or what is equivalent, all h ∈ G such that

ĝ −1 hĝ ∈ H0 .

The collection of all such h is precisely the subgroup H1 . Hence, τx1 : G/H1 → G/H0 is the desired equivariant bijection. This is a well defined mapping from the set of H1 -cosets to the set of H0 -cosets, with action given by τx1 (aH1 ) = aĝH0 ,

a ∈ G.

Let ψ0 : G → Perm(G/H0 ) and ψ1 : G → Perm(G/H1 ) denote the corresponding coset G-actions. Proposition 7. For all g ∈ G we have that τx1 ◦ ψ1 (g) = ψ0 (g) ◦ τx1 . Version: 3 Owner: rmilson Author(s): rmilson 1267

305.21

identity element

Let G be a groupoid, that is, a set with a binary operation G×G → G, written multiplicatively so that (x, y) 7→ xy. An identity element for G is an element e such that ge = eg = g for all g ∈ G. The symbol e is most commonly used for identity elements. Another common symbol for an identity element is 1, particularly in semigroup theory (and ring theory, considering the multiplicative structure as a semigroup). Groups, monoids, and loops are classes of groupoids that, by definition, always have an identity element. Version: 6 Owner: mclase Author(s): mclase, vypertd, imran

305.22

inner automorphism

Let G be a group. For every x ∈ G, we define a mapping φx : G → G,

y 7→ xyx−1 ,

y ∈ G,

called conjugation by x. It is easy to show the conjugation map is in fact, a group automorphism. An automorphism of G that corresponds to the conjugation by some x ∈ G is called inner. An automorphism that isn’t inner is called an outer automorphism. The composition operation gives the set of all automorphisms of G the structure of a group, Aut(G). The inner automorphisms also form a group, Inn(G), which is a normal subgroup of Aut(G). Indeed, if φx , x ∈ G is an inner automorphism and π : G → G an arbitrary automorphism, then π ◦ φx ◦ π −1 = φπ(x) . Let us also note that the mapping x 7→ φx ,

x∈G

is a surjective group homomorphism with kernel Z(G), the centre of G. Consequently, Inn(G) is naturally isomorphic to the quotient group G/ Z(G). Version: 7 Owner: rmilson Author(s): rmilson, tensorking


305.23

kernel

Let ρ : G → K be a group homomorphism. The preimage of the codomain identity element eK ∈ K forms a subgroup of the domain G, called the kernel of the homomorphism; ker(ρ) = {s ∈ G | ρ(s) = eK } The kernel is a normal subgroup. It is the trivial subgroup if and only if ρ is a monomorphism. Version: 9 Owner: rmilson Author(s): rmilson, Daume

305.24

maximal

Let G be a group. A subgroup H of G is said to be maximal if H 6= G and whenever K is a subgroup of G with H ⊆ K ⊆ G then K = H or K = G. Version: 1 Owner: Evandar Author(s): Evandar

305.25

normal subgroup

A subgroup H of a group G is normal if aH = Ha for all a ∈ G. Equivalently, H ⊂ G is normal if and only if aHa−1 = H for all a ∈ G, i.e., if and only if each conjugacy class of G is either entirely inside H or entirely outside H. The notation H E G or H / G is often used to denote that H is a normal subgroup of G. The kernel ker (f ) of any group homomorphism f : G −→ G0 is a normal subgroup of G. More surprisingly, the converse is also true: any normal subgroup H ⊂ G is the kernel of some homomorphism (one of these being the projection map ρ : G −→ G/H, where G/H is the quotient group). Version: 6 Owner: djao Author(s): djao

305.26

normality of subgroups is not transitive

Let G be a group. Obviously, a subgroup K ≤ H of a subgroup H ≤ G of G is a subgroup K ≤ G of G. It seems plausible that a similar situation would also hold for normal subgroups. This is not true: even when K E H and H E G, it is possible that K is not normal in G. Here are two examples:

1. Let G be the subgroup of orientation-preserving isometries of the plane R2 (G is just all rotations and translations), let H be the subgroup of G of translations, and let K be the subgroup of H of integer translations τi,j (x, y) = (x + i, y + j), where i, j ∈ Z.

Any element g ∈ G may be represented as g = r1 ◦ t1 = t2 ◦ r2 , where r1 , r2 are rotations and t1 , t2 are translations. So for any translation t ∈ H we may write g −1 ◦ t ◦ g = r −1 ◦ t0 ◦ r, where t0 ∈ H is some other translation and r is some rotation. But this is an orientation-preserving isometry of the plane that does not rotate, so it too must be a translation. Thus g −1 Hg = H, and H E G. H is an abelian group, so all its subgroups, K included, are normal in H. We claim that K is not normal in G. Indeed, if ρ ∈ G is rotation by 45◦ about the origin, then ρ−1 ◦ τ1,0 ◦ ρ is not an integer translation.

2. A related example uses finite groups. Let G = D4 be the dihedral group of the square (the group of automorphisms of the cycle graph C4 ), a group with eight elements. Then

D4 = ⟨r, f | f 2 = 1, r 4 = 1, f r = r −1 f ⟩ is generated by r, a rotation, and f , a flip.

The subgroup

H = ⟨rf, f r⟩ = {1, rf, r 2 , f r} ≅ C2 × C2

is isomorphic to the Klein 4-group: an identity and 3 elements of order 2. H E G since [G : H] = 2. Finally, take K = ⟨rf ⟩ = {1, rf } E H. We claim that K is not normal in G. And indeed, f ◦ rf ◦ f −1 = f r ∉ K. Version: 4 Owner: ariels Author(s): ariels
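The second example can be verified mechanically. The sketch below realizes D4 as permutations of the square's corners 0..3 (one concrete choice of realization) and checks that K E H and H E G while K is not normal in G:

```python
# D4 as permutations of the square's corners 0..3.
def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

def inverse(p):
    inv = [0] * 4
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

e = (0, 1, 2, 3)
r = (1, 2, 3, 0)                     # rotation by a quarter turn
f = (0, 3, 2, 1)                     # flip, chosen so that f r = r^{-1} f
rf, r2, fr = compose(r, f), compose(r, r), compose(f, r)

G = {e, r, r2, compose(r2, r), f, rf, fr, compose(r2, f)}
H = {e, rf, r2, fr}                  # the Klein 4-group inside D4
K = {e, rf}

def is_normal(N, big):
    return all(compose(compose(g, n), inverse(g)) in N for g in big for n in N)

assert is_normal(K, H) and is_normal(H, G)
assert not is_normal(K, G)           # f (rf) f^{-1} = fr lies outside K
```

The failing conjugation is exactly the one named in the entry: conjugating rf by f produces fr, which is not in K.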

305.27

normalizer

Let G be a group, and let H ⊆ G. The normalizer of H in G, written NG (H), is the set {g ∈ G | gHg −1 = H}

If H is a subgroup of G, then NG (H) is a subgroup of G containing H. Note that H is a normal subgroup of NG (H); in fact, NG (H) is the largest subgroup of G of which H is a normal subgroup. In particular, if H is a normal subgroup of G, then NG (H) = G. Version: 6 Owner: saforres Author(s): saforres

305.28

order (of a group)

The order of a group G is the number of elements of G, denoted |G|; if |G| is finite, then G is said to be a finite group. The order of an element g ∈ G is the smallest positive integer n such that g n = e, where e is the identity element; if there is no such n, then g is said to be of infinite order. Version: 5 Owner: saforres Author(s): saforres

305.29

presentation of a group

A presentation of a group G is a description of G in terms of generators and relations. We say that the group is finitely presented, if it can be described in terms of a finite number of generators and a finite number of defining relations. A collection of group elements gi ∈ G, i ∈ I is said to generate G if every element of G can be specified as a product of the gi , and of their inverses. A relation is a word over the alphabet consisting of the generators gi and their inverses, with the property that it multiplies out to the identity in G. A set of relations rj , j ∈ J is said to be defining, if all relations in G can be given as a product of the rj , their inverses, and the G-conjugates of these. The standard notation for the presentation of a group is G = hgi | rj i, meaning that G is generated by generators gi , subject to relations rj . Equivalently, one has a short exact sequence of groups 1 → N → F [I] → G → 1, where F [I] denotes the free group generated by the gi , and where N is the smallest normal subgroup containing all the rj . By the Nielsen-Schreier theorem, the kernel N is itself a free group, and hence we assume without loss of generality that there are no relations among the relations. Example. The symmetric group on n elements 1, . . . , n admits the following finite presentation (Note: this presentation is not canonical. Other presentations are known.) As

generators take

gi = (i, i + 1),    i = 1, . . . , n − 1,

the transpositions of adjacent elements. As defining relations take

(gi gj )ni,j = id,    i, j = 1, . . . , n,

where ni,i = 1, ni,i+1 = 3, and ni,j = 2 for i + 1 < j.

This means that a finite symmetric group is a Coxeter group. Version: 11 Owner: rmilson Author(s): rmilson

305.30

proof of first isomorphism theorem

Let K denote ker f . K is a normal subgroup of G because, by the following calculation, gkg −1 ∈ K for all g ∈ G and k ∈ K (the rules for homomorphisms imply the first equality, the definition of K the second): f (gkg −1 ) = f (g)f (k)f (g)−1 = f (g)1H f (g)−1 = 1H . Therefore, G/K is well defined. Define a group homomorphism θ : G/K → im f given by: θ(gK) = f (g). We argue that θ is an isomorphism. First, θ is well defined. Take two representatives, g1 and g2 , of the same modulo class. By definition, g1 g2−1 is in K. Hence, f sends g1 g2−1 to 1 (all elements of K are sent by f to 1). Consequently, the next calculation is valid: f (g1 )f (g2 )−1 = f (g1 g2−1 ) = 1, but this is the same as saying that f (g1 ) = f (g2 ). And we are done because the last equality indicates that θ(g1 K) is equal to θ(g2 K). Running the last argument backward, we get that θ is also an injection: if θ(g1 K) is equal to θ(g2 K), then f (g1 ) = f (g2 ) and hence g1 g2−1 ∈ K (exactly as in the previous part), which implies an equality between g1 K and g2 K. Now, θ is a homomorphism. We need to show that θ(g1 K · g2 K) = θ(g1 K)θ(g2 K) and that θ((gK)−1 ) = (θ(gK))−1 . And indeed: θ(g1 K · g2 K) = θ(g1 g2 K) = f (g1 g2 ) = f (g1 )f (g2 ) = θ(g1 K)θ(g2 K)

θ((gK)−1 ) = θ(g −1 K) = f (g −1 ) = (f (g))−1 = (θ(gK))−1 . To conclude, θ is surjective. Take h to be an element of im f and g its pre-image. Since h = f (g), we have that h is also the image of gK under θ. Version: 3 Owner: uriw Author(s): uriw

305.31

proof of second isomorphism theorem

First, we shall prove that HK is a subgroup of G: Since e ∈ H and e ∈ K, clearly e = e2 ∈ HK. Take h1 , h2 ∈ H, k1 , k2 ∈ K. Clearly h1 k1 , h2 k2 ∈ HK. Further, h1 k1 h2 k2 = h1 (h2 h2−1 )k1 h2 k2 = h1 h2 (h2−1 k1 h2 )k2 . Since K is a normal subgroup of G and h2 ∈ G, then h2−1 k1 h2 ∈ K. Therefore h1 h2 (h2−1 k1 h2 )k2 ∈ HK, so HK is closed under multiplication.

Also, (hk)−1 ∈ HK for h ∈ H, k ∈ K, since (hk)−1 = k −1 h−1 = h−1 (hk −1 h−1 ) and hk −1 h−1 ∈ K since K is a normal subgroup of G. So HK is closed under inverses, and is thus a subgroup of G. Since HK is a subgroup of G, the normality of K in HK follows immediately from the normality of K in G. Clearly H ∩ K is a subgroup of G, since it is the intersection of two subgroups of G.

Finally, define φ : H → HK/K by φ(h) = hK. We claim that φ is a surjective homomorphism from H to HK/K. Let h0 k0 K be some element of HK/K; since k0 ∈ K, then h0 k0 K = h0 K, and φ(h0 ) = h0 K. Now ker (φ) = {h ∈ H | φ(h) = K} = {h ∈ H | hK = K}, and if hK = K, then we must have h ∈ K. So ker (φ) = {h ∈ H | h ∈ K} = H ∩ K.

Thus, since φ(H) = HK/K and ker φ = H ∩ K, by the first isomorphism theorem we see that H ∩ K is normal in H and that there is a natural isomorphism between H/(H ∩ K) and HK/K. Version: 8 Owner: saforres Author(s): saforres

305.32

proof that all cyclic groups are abelian

Following is a proof that all cyclic groups are abelian. Let G be a cyclic group and g be a generator of G. Let a, b ∈ G. Then there exist x, y ∈ Z such that a = g x and b = g y . Since ab = g x g y = g x+y = g y+x = g y g x = ba, it follows that G is abelian. Version: 2 Owner: Wkbj79 Author(s): Wkbj79

305.33

proof that all cyclic groups of the same order are isomorphic to each other

The following is a proof that all cyclic groups of the same order are isomorphic to each other. Let G be a cyclic group and g be a generator of G. Define ϕ : Z → G by ϕ(c) = g c . Since ϕ(a + b) = g a+b = g a g b = ϕ(a)ϕ(b), then ϕ is a group homomorphism. If h ∈ G, then there exists x ∈ Z such that h = g x . Since ϕ(x) = g x = h, then ϕ is surjective. ker ϕ = {c ∈ Z | ϕ(c) = eG } = {c ∈ Z | g c = eG }. If G is infinite, then ker ϕ = {0}, and ϕ is injective. Hence, ϕ is a group isomorphism, and G ≅ Z. If G is finite, then let |G| = n. Thus, |g| = |⟨g⟩| = |G| = n. If g c = eG , then n divides c. Therefore, ker ϕ = nZ. By the first isomorphism theorem, G ≅ Z/nZ = Zn . Let H and K be cyclic groups of the same order. If H and K are infinite, then, by the above argument, H ≅ Z and K ≅ Z. If H and K are finite of order n, then, by the above argument, H ≅ Zn and K ≅ Zn . In any case, it follows that H ≅ K. Version: 1 Owner: Wkbj79 Author(s): Wkbj79

305.34

proof that all subgroups of a cyclic group are cyclic

Following is a proof that all subgroups of a cyclic group are cyclic. Let G be a cyclic group and H ≤ G. If G is trivial, then H = G, and H is cyclic. If H is the trivial subgroup, then H = {eG } = ⟨eG ⟩, and H is cyclic. Thus, for the remainder of the proof, it will be assumed that both G and H are nontrivial.

Let g be a generator of G. Let n be the smallest positive integer such that g n ∈ H. Claim: H = ⟨g n ⟩. Let a ∈ ⟨g n ⟩. Then there exists z ∈ Z with a = (g n )z . Since g n ∈ H, then (g n )z ∈ H. Thus, a ∈ H. Hence, ⟨g n ⟩ ⊆ H. Let h ∈ H. Then h ∈ G. Let x ∈ Z with h = g x . By the division algorithm, there exist q, r ∈ Z with 0 ≤ r < n such that x = qn + r. Since h = g x = g qn+r = g qn g r = (g n )q g r , then g r = h(g n )−q . Since h, g n ∈ H, then g r ∈ H. By choice of n, r cannot be positive. Thus, r = 0. Therefore, h = (g n )q g 0 = (g n )q eG = (g n )q ∈ ⟨g n ⟩. Hence, H ⊆ ⟨g n ⟩. Since ⟨g n ⟩ ⊆ H and H ⊆ ⟨g n ⟩, then H = ⟨g n ⟩. It follows that every subgroup of G is cyclic. Version: 3 Owner: Wkbj79 Author(s): Wkbj79

305.35

regular group action

Let G be a group action on a set X. The action is called regular if for any pair α, β ∈ X there exists exactly one g ∈ G such that g · α = β. (For a right group action it is defined correspondingly.) Version: 3 Owner: Thomas Heye Author(s): Thomas Heye

305.36

second isomorphism theorem

Let (G, ∗) be a group. Let H be a subgroup of G and let K be a normal subgroup of G. Then • HK := {h ∗ k | h ∈ H, k ∈ K} is a subgroup of G, • K is a normal subgroup of HK, T • H K is a normal subgroup of H,

• There is a natural group isomorphism H/(H

T

K) = HK/K.

The same statement also holds in the category of modules over a fixed ring (where normality is neither needed nor relevant), and indeed can be formulated so as to hold in any abelian category. Version: 4 Owner: djao Author(s): djao


305.37

simple group

Let G be a group. G is said to be simple if the only normal subgroups of G are {1} and G itself. Version: 3 Owner: Evandar Author(s): Evandar

305.38

solvable group

A group G is solvable if it has a subnormal series G = G0 ⊃ G1 ⊃ · · · ⊃ Gn = {1} in which all the quotient groups Gi /Gi+1 are abelian. Version: 4 Owner: djao Author(s): djao

305.39

subgroup

Definition: Let (G, ∗) be a group and let K be a subset of G. Then K is a subgroup of G under the same operation if K is a group by itself (with respect to ∗), that is:
• K is closed under the ∗ operation.
• There exists an identity element e ∈ K such that for all k ∈ K, k ∗ e = k = e ∗ k.
• For each k ∈ K there exists an inverse k −1 ∈ K such that k −1 ∗ k = e = k ∗ k −1 .
The subgroup is denoted likewise (K, ∗). We denote K being a subgroup of G by writing K ≤ G.
Properties:
• The set {e} whose only element is the identity is a subgroup of any group. It is called the trivial subgroup.
• Every group is a subgroup of itself.
• The empty set {} is never a subgroup (since the definition of group states that the set must be non-empty).

There is a very useful theorem that allows proving a given subset is a subgroup. Theorem: Let K be a nonempty subset of the group G. Then K is a subgroup of G if and only if s, t ∈ K implies that st−1 ∈ K. Proof: First we need to show that if K is a subgroup of G then st−1 ∈ K. Since s, t ∈ K, then st−1 ∈ K, because K is a group by itself. Now, suppose that for any s, t ∈ K ⊆ G we have st−1 ∈ K. We want to show that K is a subgroup, which we will accomplish by proving it satisfies the group axioms. Since tt−1 ∈ K by hypothesis, we conclude that the identity element is in K: e ∈ K. (Existence of identity.) Now that we know e ∈ K, for all t in K we have that et−1 = t−1 ∈ K, so the inverses of elements in K are also in K. (Existence of inverses.) Let s, t ∈ K. Then we know that t−1 ∈ K by the last step. Applying the hypothesis shows that s(t−1 )−1 = st ∈ K,

so K is closed under the operation. QED Example: • Consider the group (Z, +). Show that(2Z, +) is a subgroup.

The subgroup is closed under addition since the sum of even integers is even.

The identity 0 of Z is also in 2Z since 2 divides 0. For every k ∈ 2Z there is a −k ∈ 2Z which is the inverse under addition and satisfies −k + k = 0 = k + (−k). Therefore (2Z, +) is a subgroup of (Z, +). Another way to show that (2Z, +) is a subgroup is to use the theorem stated above: if s, t ∈ 2Z then s and t are even numbers and s − t ∈ 2Z, since the difference of even numbers is always even. See also: • Wikipedia, subgroup Version: 7 Owner: Daume Author(s): Daume
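The one-step test st−1 ∈ K is easy to run by brute force on a small finite example. Below is a minimal sketch (the helper name is_subgroup is our own, and we work in Z12 rather than Z so the check is finite):

```python
# Check the one-step subgroup criterion: a nonempty subset K of a finite
# group G is a subgroup iff s * t^{-1} lies in K for all s, t in K.
# Here G = Z_12 under addition mod 12, so t^{-1} = -t mod 12.

def is_subgroup(K, n):
    """Test whether the nonempty subset K of Z_n is a subgroup."""
    if not K:
        return False
    return all((s - t) % n in K for s in K for t in K)

evens = {0, 2, 4, 6, 8, 10}       # the image of 2Z in Z_12
print(is_subgroup(evens, 12))      # a subgroup
print(is_subgroup({0, 1, 2}, 12))  # not closed under differences
```

The same test with K = {0, 3, 6, 9} also succeeds, matching the fact that nZ gives a subgroup for every n.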

305.40

third isomorphism theorem

If G is a group (or ring, or module) and H ⊂ K are normal subgroups (or ideals, or submodules) of G, with H also normal (or an ideal, or a submodule) in K, then there is a natural isomorphism (G/H)/(K/H) ≈ G/K. I think it is not uncommon to see the third and second isomorphism theorems permuted. Version: 2 Owner: nerdy2 Author(s): nerdy2


Chapter 306 20A99 – Miscellaneous 306.1

Cayley table

A Cayley table for a group is essentially the “multiplication table” of the group.1 The columns and rows of the table (or matrix) are labeled with the elements of the group, and each cell holds the result of applying the group operation to its row element and column element. Formally, let G be our group with group operation ◦. Let C be the Cayley table for the group, with C(i, j) denoting the element at row i and column j. Then C(i, j) = ei ◦ ej where ei is the ith element of the group and ej is the jth. Note that for an abelian group we have ei ◦ ej = ej ◦ ei , hence the Cayley table is a symmetric matrix. The Cayley tables of isomorphic groups are isomorphic (that is, the same, regardless of the labeling and ordering of group elements).

306.1.1

Examples.

• The Cayley table for Z4 , the group of integers modulo 4 (under addition), would be 1

A caveat to novices in group theory: multiplication is usually used notationally to represent the group operation, but the operation needn’t resemble multiplication in the reals. Hence, you should take “multiplication table” with a grain or two of salt.




 +    [0]  [1]  [2]  [3]
[0]   [0]  [1]  [2]  [3]
[1]   [1]  [2]  [3]  [0]
[2]   [2]  [3]  [0]  [1]
[3]   [3]  [0]  [1]  [2]

• The Cayley table for S3 , the group of permutations of 3 elements, is

  ·      (1)   (123) (132) (12)  (13)  (23)
  (1)    (1)   (123) (132) (12)  (13)  (23)
  (123)  (123) (132) (1)   (13)  (23)  (12)
  (132)  (132) (1)   (123) (23)  (12)  (13)
  (12)   (12)  (23)  (13)  (1)   (132) (123)
  (13)   (13)  (12)  (23)  (123) (1)   (132)
  (23)   (23)  (13)  (12)  (132) (123) (1)
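Tables like these can be generated mechanically. A minimal sketch, with cayley_table as our own helper name, for Z4 under addition mod 4:

```python
# Build the Cayley table C(i, j) = e_i ∘ e_j for a finite group given as a
# list of elements and a binary operation.

def cayley_table(elements, op):
    return [[op(a, b) for b in elements] for a in elements]

z4 = [0, 1, 2, 3]
table = cayley_table(z4, lambda a, b: (a + b) % 4)
for row in table:
    print(row)

# Z_4 is abelian, so the table is a symmetric matrix.
assert all(table[i][j] == table[j][i] for i in z4 for j in z4)
```

Passing a different element list and operation (e.g. tuples of S3 under composition) produces the corresponding table in the same way.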

Version: 6 Owner: akrowne Author(s): akrowne

306.2

proper subgroup

A group H is a proper subgroup of a group G if and only if H is a subgroup of G and H 6= G.

(306.2.1)

Similarly a group H is an improper subgroup of a group G if and only if H is a subgroup of G and H = G. (306.2.2) Version: 2 Owner: imran Author(s): imran

306.3

quaternion group

The quaternion group, or quaternionic group, is a noncommutative group with eight elements. It is traditionally denoted by Q (not to be confused with Q) or by Q8 . This group is defined by the presentation {i, j; i4 , i2 j 2 , iji−1 j} or, equivalently, defined by the multiplication table


  ·    1    i    j    k   −i   −j   −k   −1
  1    1    i    j    k   −i   −j   −k   −1
  i    i   −1    k   −j    1   −k    j   −i
  j    j   −k   −1    i    k    1   −i   −j
  k    k    j   −i   −1   −j    i    1   −k
 −i   −i    1   −k    j   −1    k   −j    i
 −j   −j    k    1   −i   −k   −1    i    j
 −k   −k   −j    i    1    j   −i   −1    k
 −1   −1   −i   −j   −k    i    j    k    1

where we have put each product xy into row x and column y. The minus signs are justified by the fact that {1, −1} is a subgroup contained in the center of Q. Every subgroup of Q is normal and, except for the trivial subgroup {1}, contains {1, −1}. The dihedral group D4 (the group of symmetries of a square) is the only other noncommutative group of order 8. Since i2 = j 2 = k 2 = −1, the elements i, j, and k are known as the imaginary units, by analogy with i ∈ C. Any pair of the imaginary units generates the group. Better, given x, y ∈ {i, j, k}, any element of Q is expressible in the form xm y n . Q is identified with the group of units (invertible elements) of the ring of quaternions over Z. That ring is not identical to the group ring Z[Q], which has dimension 8 (not 4) over Z. Likewise the usual quaternion algebra is not quite the same thing as the group algebra R[Q]. Quaternions were known to Gauss in 1819 or 1820, but he did not publicize this discovery, and quaternions weren’t rediscovered until 1843, with Hamilton. For an excellent account of this famous story, see http://math.ucr.edu/home/baez/Octonions/node1.html. Version: 6 Owner: vernondalhart Author(s): vernondalhart, Larry Hammick, patrickwonders
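The table can be double-checked with ordinary quaternion arithmetic; here is a sketch representing a + bi + cj + dk as the 4-tuple (a, b, c, d) and multiplying with the Hamilton product rules:

```python
# Multiply two quaternions a + bi + cj + dk represented as 4-tuples
# (a, b, c, d), using the rules i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j.

def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

i = (0, 1, 0, 0); j = (0, 0, 1, 0); k = (0, 0, 0, 1)
print(qmul(i, j))   # ij = k:  (0, 0, 0, 1)
print(qmul(j, i))   # ji = -k: (0, 0, 0, -1)
print(qmul(i, i))   # i^2 = -1: (-1, 0, 0, 0)
```

Iterating qmul over the eight elements ±1, ±i, ±j, ±k reproduces the table entry by entry.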


Chapter 307 20B05 – General theory for finite groups 307.1

cycle notation

The cycle notation is a useful convention for writing down a permutation in terms of its constituent cycles. Let S be a finite set, and a1 , . . . , ak , k ≥ 2,

distinct elements of S. The expression (a1 , . . . , ak ) denotes the cycle whose action is a1 7→ a2 7→ a3 7→ · · · 7→ ak 7→ a1 . Note there are k different expressions for the same cycle; the following all represent the same cycle: (a1 , a2 , a3 , . . . , ak ) = (a2 , a3 , . . . , ak , a1 ) = · · · = (ak , a1 , a2 , . . . , ak−1 ). Also note that a 1-element cycle is the same thing as the identity permutation, and thus there is not much point in writing down such things. Rather, it is customary to express the identity permutation simply as (). Let π be a permutation of S, and let S1 , . . . , Sk ⊂ S,

k∈N

be the orbits of π with more than 1 element. For each j = 1, . . . , k let nj denote the cardinality of Sj . Also, choose an a1,j ∈ Sj , and define ai+1,j = π(ai,j ),

i ∈ N.

We can now express π as a product of disjoint cycles, namely π = (a1,1 , . . . , an1 ,1 )(a1,2 , . . . , an2 ,2 ) · · · (a1,k , . . . , ank ,k ).

By way of illustration, here are the 24 elements of the symmetric group on {1, 2, 3, 4} expressed using the cycle notation, and grouped according to their conjugacy classes:
()
(12), (13), (14), (23), (24), (34)
(123), (132), (124), (142), (134), (143), (234), (243)
(12)(34), (13)(24), (14)(23)
(1234), (1243), (1324), (1342), (1423), (1432)
Version: 1 Owner: rmilson Author(s): rmilson
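The decomposition into disjoint cycles described above is straightforward to carry out mechanically; a minimal sketch (the helper name cycles is our own, and fixed points are omitted, as is customary):

```python
# Decompose a permutation (given as a dict mapping each element to its
# image) into disjoint cycles, omitting 1-element cycles (fixed points).

def cycles(perm):
    seen, result = set(), []
    for start in perm:
        if start in seen or perm[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        result.append(tuple(cycle))
    return result

# The permutation 1 -> 2 -> 3 -> 1, 4 <-> 5 of {1, ..., 5}:
p = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}
print(cycles(p))   # [(1, 2, 3), (4, 5)]
```

The identity permutation yields an empty list, matching the convention of writing it as ().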

307.2

permutation group

A permutation group is a pair (G, X) where G is an abstract group, and X is a set on which G acts faithfully. Alternatively, this can be thought of as a group G equipped with a homomorphism into Sym(X), the symmetric group on X. Version: 2 Owner: bwebste Author(s): bwebste


Chapter 308 20B15 – Primitive groups 308.1

primitive transitive permutation group

1: A finite set 2: G transitive permutation group on A 3: every block B ⊆ A is trivial, i.e. |B| = 1 or B = A

example 1: S4 is a primitive transitive permutation group on {1, 2, 3, 4}

counterexample 1: D8 is not a primitive transitive permutation group on the vertices of a square

stabilizer maximal necessary and sufficient for primitivity 1: A finite set 2: G transitive permutation group on A 3: G primitive ⇔ ∀a ∈ A : H ≤ G & H ⊇ StabG (a) ⇒ H = G or H = StabG (a)

Note: This was a “seed” entry written using a short-hand format described in this FAQ. Version: 4 Owner: Thomas Heye Author(s): yark, apmxi


Chapter 309 20B20 – Multiply transitive finite groups 309.1

Jordan’s theorem (multiply transitive groups)

Let G be a sharply n-transitive permutation group, with n ≥ 4. Then 1. G is similar to Sn with the standard action, or 2. n = 4 and G is similar to M11 , the Mathieu group of degree 11, or 3. n = 5 and G is similar to M12 , the Mathieu group of degree 12. Version: 1 Owner: bwebste Author(s): bwebste

309.2

multiply transitive

Let G be a group, X a set on which it acts. Let X (n) be the set of ordered n-tuples of distinct elements of X. This is a G-set by the diagonal action: g · (x1 , . . . , xn ) = (g · x1 , . . . , g · xn ). The action of G on X is said to be n-transitive if it acts transitively on X (n) . For example, the standard action of Sn , the symmetric group, is n-transitive, and the standard action of An , the alternating group, is (n − 2)-transitive. Version: 2 Owner: bwebste Author(s): bwebste

309.3

sharply multiply transitive

Let G be a group, and X a set that G acts on, and let X (n) be the set of ordered n-tuples of distinct elements of X. Then the action of G on X is sharply n-transitive if G acts regularly on X (n) . Version: 1 Owner: bwebste Author(s): bwebste


Chapter 310 20B25 – Finite automorphism groups of algebraic, geometric, or combinatorial structures 310.1

diamond theory

Diamond theory is the theory of affine groups over GF (2) acting on small square and cubic arrays. In the simplest case, the symmetric group of order 4 acts on a two-colored Diamond figure like that in Plato’s Meno dialogue, yielding 24 distinct patterns, each of which has some ordinary or color-interchange symmetry. This can be generalized to (at least) a group of order approximately 1.3 trillion acting on a 4x4x4 array of cubes, with each of the resulting patterns still having nontrivial symmetry. The theory has applications to finite geometry and to the construction of the large Witt design underlying the Mathieu group of degree 24.

Further Reading • ”Diamond Theory,” http://m759.freeservers.com/ Version: 4 Owner: m759 Author(s): akrowne, m759


Chapter 311 20B30 – Symmetric groups 311.1

symmetric group

Let X be a set. Let S(X) be the set of permutations of X (i.e. the set of bijective functions on X). Then the act of taking the composition of two permutations induces a group structure on S(X). We call this group the symmetric group and it is often denoted Sym(X). Version: 5 Owner: bwebste Author(s): bwebste, antizeus

311.2

symmetric group

Let X be a set. Let S(X) be the set of permutations of X (i.e. the set of bijective functions on X). Then the act of taking the composition of two permutations induces a group structure on S(X). We call this group the symmetric group and it is often denoted Sym(X). When X has a finite number n of elements, we often refer to the symmetric group as Sn , and describe the elements by using cycle notation. Version: 2 Owner: antizeus Author(s): antizeus


Chapter 312 20B35 – Subgroups of symmetric groups 312.1

Cayley’s theorem

Let G be a group; then G is isomorphic to a subgroup of the permutation group SG . If G is finite and of order n, then G is isomorphic to a subgroup of the permutation group Sn . Furthermore, suppose H is a proper subgroup of G. Let X = {Hg | g ∈ G} be the set of right cosets in G. The map θ : G → SX given by θ(x)(Hg) = Hgx is a homomorphism. Its kernel is the largest normal subgroup of G contained in H. We note that |SX | = [G : H]!. Consequently, if |G| doesn’t divide [G : H]!, then θ is not injective, so H contains a non-trivial subgroup that is normal in G, namely the kernel of θ. Version: 4 Owner: vitriol Author(s): vitriol


Chapter 313 20B99 – Miscellaneous 313.1

(p, q) shuffle

Definition. Let p and q be positive natural numbers. Further, let S(k) be the set of permutations of the numbers {1, . . . , k}. A permutation τ ∈ S(p + q) is a (p, q) shuffle if τ (1) < · · · < τ (p), τ (p + 1) < · · · < τ (p + q).

The set of all (p, q) shuffles is denoted by S(p, q).

It is clear that S(p, q) ⊂ S(p + q). Since a (p, q) shuffle is completely determined by how the first p elements are mapped, the cardinality of S(p, q) is the binomial coefficient (p + q)!/(p! q!). The wedge product of a p-form and a q-form can be defined as a sum over (p, q) shuffles. Version: 3 Owner: matte Author(s): matte
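The count can be confirmed by brute-force enumeration; a sketch with shuffles as our own helper name:

```python
# Enumerate all (p, q) shuffles: permutations tau of {1, ..., p+q} that are
# increasing on the first p arguments and on the last q arguments.
from itertools import permutations
from math import comb

def shuffles(p, q):
    result = []
    for tau in permutations(range(1, p + q + 1)):
        if all(tau[i] < tau[i + 1] for i in range(p - 1)) and \
           all(tau[i] < tau[i + 1] for i in range(p, p + q - 1)):
            result.append(tau)
    return result

# The count should be the binomial coefficient (p+q choose p).
print(len(shuffles(2, 2)), comb(4, 2))   # 6 6
```

Enumerating over all (p+q)! permutations is wasteful but fine for illustration; a (p, q) shuffle can also be built directly by choosing which p positions receive the values τ(1), . . . , τ(p).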

313.2

Frobenius group

A permutation group G on a set X is Frobenius if no non-trivial element of G fixes more than one element of X. Generally, one also makes the restriction that at least one non-trivial element fix a point. In this case the Frobenius group is called non-regular. The stabilizer of any point in X is called a Frobenius complement, and has the remarkable property that it intersects trivially each of its conjugates by elements outside the subgroup. Conversely, if any finite group G has such a subgroup, then the action on the cosets of that subgroup makes G into a Frobenius group. Version: 2 Owner: bwebste Author(s): bwebste

313.3

permutation

A permutation of a set {a1 , a2 , . . . , an } is an arrangement of its elements. For example, if S = {A, B, C} then ABC, CAB, CBA are three different permutations of S. The number of permutations of a set with n elements is n!. A permutation can also be seen as a bijective function from a set to itself. For example, the permutation CAB could be seen as the function that assigns f (A) = C, f (B) = A, f (C) = B. In fact, every bijection of a set to itself gives a permutation, and any permutation gives rise to a bijective function. Therefore, we can say that there are n! bijective functions from a set with n elements to itself. Using the function approach, it can be proved that any permutation can be expressed as a composition of disjoint cycles and also as a composition of (not necessarily disjoint) transpositions. Moreover, if σ = τ1 τ2 · · · τm = ρ1 ρ2 · · · ρn are two factorizations of a permutation σ into transpositions, then m and n must be both even or both odd. So we can label permutations as even or odd depending on the number of transpositions in any decomposition. Permutations (as functions) form a non-abelian group, with function composition as the binary operation, called the symmetric group of degree n. The subset of even permutations becomes a subgroup, called the alternating group of degree n. Version: 3 Owner: drini Author(s): drini
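The even/odd labeling can be computed from the cycle decomposition: a cycle of length L factors into L − 1 transpositions, so the sign is (−1) to the power n minus the number of cycles (counting fixed points as 1-cycles). A sketch, with permutations of {0, . . . , n−1} given as lists:

```python
# The sign of a permutation of {0, ..., n-1}: each cycle of length L
# contributes L - 1 transpositions, so sgn = (-1)^(n - number of cycles),
# counting fixed points as cycles of length 1.

def sign(perm):
    n, seen, num_cycles = len(perm), set(), 0
    for start in range(n):
        if start in seen:
            continue
        num_cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = perm[x]
    return (-1) ** (n - num_cycles)

print(sign([1, 2, 0]))   # the 3-cycle (0 1 2) is even: 1
print(sign([1, 0, 2]))   # a single transposition is odd: -1
```

Because the parity of the number of transpositions is well-defined (as stated above), this value does not depend on which factorization is chosen.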

313.4

proof of Cayley’s theorem

Let G be a group, and let SG be the permutation group of the underlying set G. For each g ∈ G, define ρg : G → G by ρg (h) = gh. Then ρg is invertible with inverse ρg−1 , and so is a permutation of the set G. Define Φ : G → SG by Φ(g) = ρg . Then Φ is a homomorphism, since (Φ(gh))(x) = ρgh (x) = ghx = ρg (hx) = (ρg ◦ ρh )(x) = ((Φ(g))(Φ(h)))(x). And Φ is injective, since if Φ(g) = Φ(h) then ρg = ρh , so gx = hx for all x ∈ G, and so g = h as required.

So Φ is an embedding of G into its own permutation group. If G is finite of order n, then simply numbering the elements of G gives an embedding from G to Sn . Version: 2 Owner: Evandar Author(s): Evandar
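The embedding in the proof can be verified directly on a small example; a sketch for G = Z4 (the helper rho is our own name for ρg , represented as a tuple of images):

```python
# Cayley embedding: represent each g in Z_4 by the permutation
# rho_g(h) = g + h mod 4, and check that Phi is an injective homomorphism.

def rho(g, n):
    return tuple((g + h) % n for h in range(n))  # rho_g as a tuple of images

n = 4
images = [rho(g, n) for g in range(n)]
# Phi is injective: distinct g give distinct permutations.
assert len(set(images)) == n
# Phi is a homomorphism: rho_{g+h} = rho_g composed with rho_h.
for g in range(n):
    for h in range(n):
        comp = tuple(rho(g, n)[rho(h, n)[x]] for x in range(n))
        assert comp == rho((g + h) % n, n)
print("Z_4 embeds in S_4 via left multiplication")
```

Numbering the elements of any finite group and running the same loops checks the embedding into Sn described above.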


Chapter 314 20C05 – Group rings of finite groups and their modules 314.1

group ring

For any group G, the group ring Z[G] is defined to be the ring whose additive group is the abelian group of formal integer linear combinations of elements of G, and whose multiplication operation is defined by multiplication in G, extended Z–linearly to Z[G]. More generally, for any ring R, the group ring of G over R is the ring R[G] whose additive group is the abelian group of formal R–linear combinations of elements of G, i.e.:

R[G] := { r1 g1 + · · · + rn gn | ri ∈ R, gi ∈ G },

and whose multiplication operation is defined by R–linearly extending the group multiplication operation of G. In the case where K is a field, the group ring K[G] is usually called a group algebra. Version: 4 Owner: djao Author(s): djao


Chapter 315 20C15 – Ordinary representations and characters 315.1

Maschke’s theorem

Let G be a finite group, and k a field of characteristic not dividing |G|. Then any representation V of G over k is completely reducible. We need only show that any subrepresentation has a complement; the result then follows by induction.

Let V be a representation of G and W a subrepresentation. Let π : V → W be an arbitrary projection, and let

π ′ (v) = (1/|G|) Σ_{g∈G} g −1 π(gv).

This map is obviously G-equivariant, it is the identity on W , and its image is contained in W , since W is invariant under G. Thus it is an equivariant projection onto W , and its kernel is a complement to W . Version: 5 Owner: bwebste Author(s): bwebste
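The averaging trick can be seen concretely for G = C2 acting on k 2 by swapping coordinates; a sketch over the rationals (all helper names are ours), starting from a projection onto W = span{(1, 1)} that is not equivariant:

```python
# Maschke averaging for G = C_2 acting on k^2 by swapping coordinates
# (char k = 0, so |G| = 2 is invertible). Start from the arbitrary,
# non-equivariant projection pi(x, y) = (x, x) onto W = span{(1, 1)}
# and average it over the group.
from fractions import Fraction

def swap(v):
    return (v[1], v[0])

def pi(v):
    return (v[0], v[0])          # some projection onto W, not equivariant

def pi_avg(v):
    # pi'(v) = (1/|G|) * sum over g of g^{-1} pi(g v), with G = {id, swap}
    a, b = pi(v), swap(pi(swap(v)))
    return tuple((x + y) / 2 for x, y in zip(a, b))

v = (Fraction(1), Fraction(0))
print(pi_avg(v))                  # (1/2, 1/2), which lies in W
# pi' is G-equivariant: pi'(swap(v)) == swap(pi'(v))
assert pi_avg(swap(v)) == swap(pi_avg(v))
```

The kernel of pi_avg is the line spanned by (1, −1), which is exactly the invariant complement of W promised by the theorem.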

315.2

a representation which is not completely reducible

If G is a finite group, and k is a field whose characteristic does divide the order of the group, then Maschke’s theorem fails. For example, let V be the regular representation of G, which can be thought of as the space of functions from G to k, with the G-action (g · ϕ)(g ′ ) = ϕ(g −1 g ′ ). Then this representation is not completely reducible.

There is an obvious trivial subrepresentation W of V , consisting of the constant functions. I claim that there is no complementary invariant subspace to this one. If W ′ is such a subspace, then there is a homomorphism ϕ : V → V /W ′ ≅ k. Now consider the characteristic function of the identity e ∈ G,

δe (g) = 1 if g = e, and δe (g) = 0 otherwise,

and let ℓ = ϕ(δe ) in V /W ′ . This is not zero, since δe generates the representation V . By G-equivariance, ϕ(δg ) = ℓ for all g ∈ G. Since

η = Σ_{g∈G} η(g) δg

for all η ∈ V , we get

ϕ(η) = ℓ · ( Σ_{g∈G} η(g) ).

Thus

ker ϕ = { η ∈ V | Σ_{g∈G} η(g) = 0 }.

But since the characteristic of the field k divides the order of G, a constant function η = c satisfies Σ_{g∈G} η(g) = |G| · c = 0, so W ≤ W ′ = ker ϕ, and thus W ′ could not possibly be complementary to W . For example, if G = C2 = {e, f } then the invariant subspace of V is spanned by δe + δf . For characteristics other than 2, δe − δf spans a complementary subspace, but over characteristic 2 these two elements are the same. Version: 1 Owner: bwebste Author(s): bwebste

315.3

orthogonality relations

First orthogonality relations: Let χ1 , χ2 be characters of representations V1 , V2 of a finite group G over a field k of characteristic 0. Then

(χ1 , χ2 ) = (1/|G|) Σ_{g∈G} χ1 (g) χ2 (g −1 ) = dim_k Hom_G (V1 , V2 ).

(When k = C this is the usual Hermitian inner product of characters, since χ2 (g −1 ) is the complex conjugate of χ2 (g).)

First of all, consider the special case where V1 = k with the trivial action of the group. Then Hom_G (k, V2 ) ≅ V2^G , the fixed points. On the other hand, consider the map

ϕ = (1/|G|) Σ_{g∈G} g : V2 → V2

(with the sum taken in End(V2 )). Clearly, the image of this map is contained in V2^G , and it is the identity restricted to V2^G . Thus, it is a projection with image V2^G . Now, the rank of a projection (over a field of characteristic 0) is its trace. Thus,

dim_k Hom_G (k, V2 ) = dim V2^G = tr(ϕ) = (1/|G|) Σ_{g∈G} χ2 (g),

which is exactly the orthogonality formula for V1 = k.

Now, in general, Hom(V1 , V2 ) ≅ V1∗ ⊗ V2 is a representation, and Hom_G (V1 , V2 ) = (Hom(V1 , V2 ))^G . Since χ_{V1∗ ⊗V2 } (g) = χ1 (g −1 )χ2 (g),

dim_k Hom_G (V1 , V2 ) = dim_k (Hom(V1 , V2 ))^G = (1/|G|) Σ_{g∈G} χ1 (g −1 )χ2 (g),

which is exactly the relation we desired. In particular, if V1 and V2 are irreducible, then by Schur’s lemma

Hom_G (V1 , V2 ) ≅ D if V1 ≅ V2 , and Hom_G (V1 , V2 ) = 0 if V1 ≇ V2 ,

where D is a division algebra. In particular, non-isomorphic irreducible representations have orthogonal characters. Thus, for any representation V , the multiplicities ni in the unique decomposition of V into the direct sum of irreducibles V ≅ V1⊕n1 ⊕ · · · ⊕ Vm⊕nm , where Vi ranges over irreducible representations of G over k, can be determined in terms of the character inner product:

where D is a division algebra. In particular, non-isomorphic irreducible representations have orthogonal characters. Thus, for any representation V , the multiplicities ni in the unique decomposition of V into the direct sum of irreducibles V ∼ = V1⊕n1 ⊕ · · · ⊕ Vm⊕nm where Vi ranges over irreducible representations of G over k, can be determined in terms of the character inner product:

ni =

(ψ, χi ) (χi , χi )

where ψ is the character of V and χi the character of Vi . In particular, representations over a field of characteristic zero are determined by their character. Note: This is not true over fields of positive characteristic. If the field k is algebraically closed, the only finite-dimensional division algebra over k is k itself, so the characters of irreducible representations form an orthonormal basis for the vector space of class functions with respect to this inner product. Since (χi , χi ) = 1 for all irreducibles, the multiplicity formula above reduces to ni = (ψ, χi ).


Second orthogonality relations: We assume now that k is algebraically closed. Let g, g ′ be elements of a finite group G. Then

Σ_χ χ(g)χ(g ′−1 ) = |CG (g)| if g ∼ g ′ , and Σ_χ χ(g)χ(g ′−1 ) = 0 if g ≁ g ′ ,

where the sum is over the characters of irreducible representations, and CG (g) is the centralizer of g.

Let χ1 , . . . , χn be the characters of the irreducible representations, and let g1 , . . . , gn be representatives of the conjugacy classes. Let A be the matrix whose ij-th entry is √(|G : CG (gj )|) χi (gj ). By first orthogonality, AA∗ = |G|I (here ∗ denotes conjugate transpose), where I is the identity matrix. Since left inverses are right inverses, A∗ A = |G|I. Thus,

√(|G : CG (gi )| |G : CG (gk )|) Σ_{j=1}^n χj (gi −1 )χj (gk ) = |G| δik .

Replacing gi or gk with any conjugate will not change the expression above. Thus, if our two elements are not conjugate, we obtain that Σ_χ χ(g)χ(g ′−1 ) = 0. On the other hand, if g ∼ g ′ , then i = k in the sum above, which reduces to the expression we desired.

A special case of this result, applied to the identity element, is that |G| = Σ_χ χ(1)^2 ; that is, the sum of the squares of the dimensions of the irreducible representations of any finite group is the order of the group. Version: 8 Owner: bwebste Author(s): bwebste
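Both relations can be verified numerically on the character table of S3 (classes of the identity, the transpositions, and the 3-cycles, with class sizes 1, 3, 2 and centralizer orders 6, 2, 3); a sketch:

```python
# Verify both orthogonality relations on the character table of S_3.
# Columns: classes of e, transpositions, 3-cycles.
# Rows: trivial, sign, and 2-dimensional standard representation.
from fractions import Fraction

chars = [(1, 1, 1), (1, -1, 1), (2, 0, -1)]
sizes = [1, 3, 2]      # conjugacy class sizes
cent = [6, 2, 3]       # centralizer orders |C_G(g)|
order = 6              # |S_3|

# First orthogonality: (chi_a, chi_b) = delta_ab (characters are real here).
for a, ca in enumerate(chars):
    for b, cb in enumerate(chars):
        ip = sum(Fraction(s * x * y, order)
                 for s, x, y in zip(sizes, ca, cb))
        assert ip == (1 if a == b else 0)

# Second orthogonality: the sum over chi of chi(g)chi(g') is |C_G(g)|
# when the classes coincide, and 0 otherwise.
for i in range(3):
    for j in range(3):
        col = sum(c[i] * c[j] for c in chars)
        assert col == (cent[i] if i == j else 0)

print("both orthogonality relations hold for S_3")
```

The i = j cases of the second loop also confirm |G| = Σ χ(1)² at the identity: 1 + 1 + 4 = 6.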


Chapter 316 20C30 – Representations of finite symmetric groups 316.1

example of immanent

If χ = 1 we obtain the permanent. If χ = sgn we obtain the determinant. Version: 1 Owner: gholmes74 Author(s): gholmes74

316.2

immanent

Let χ : Sn → C be a complex character. For any n × n matrix A define

Immχ (A) = Σ_{σ∈Sn} χ(σ) Π_{j=1}^n A(j, σ(j)).

Functions obtained in this way are called immanents. Version: 4 Owner: gholmes74 Author(s): gholmes74

316.3

permanent

The permanent of an n × n matrix A over C is the number

per(A) = Σ_{σ∈Sn} Π_{j=1}^n A(j, σ(j)).

Version: 2 Owner: gholmes74 Author(s): gholmes74
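Specializing the immanent formula above to χ = 1 and χ = sgn recovers the permanent and the determinant; a brute-force sketch (helper names are ours):

```python
# The immanent specializes to the permanent (chi = 1) and, with chi = sgn,
# to the determinant.
from itertools import permutations

def sgn(sigma):
    # sign of a permutation tuple via inversion counting
    s = 1
    for a in range(len(sigma)):
        for b in range(a + 1, len(sigma)):
            if sigma[a] > sigma[b]:
                s = -s
    return s

def immanant(A, chi):
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for j in range(n):
            prod *= A[j][sigma[j]]
        total += chi(sigma) * prod
    return total

A = [[1, 2], [3, 4]]
print(immanant(A, lambda s: 1))   # permanent: 1*4 + 2*3 = 10
print(immanant(A, sgn))           # determinant: 1*4 - 2*3 = -2
```

The sum runs over all n! permutations, so this is only practical for small n; unlike the determinant, the permanent has no known polynomial-time algorithm.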


Chapter 317 20C99 – Miscellaneous 317.1

Frobenius reciprocity

Let V be a finite-dimensional representation of a finite group G, and let W be a representation of a subgroup H ⊂ G. Then the characters of V and W satisfy the inner product relation (χInd(W ) , χV ) = (χW , χRes(V ) ) where Ind and Res denote the induced representation Ind_H^G and the restriction representation Res_H^G . The Frobenius reciprocity theorem is often given in the stronger form which states that Res and Ind are adjoint functors between the category of G–modules and the category of H–modules: HomH (W, Res(V )) = HomG (Ind(W ), V ), or, equivalently, V ⊗ Ind(W ) = Ind(Res(V ) ⊗ W ). Version: 4 Owner: djao Author(s): rmilson, djao

317.2

Schur’s lemma

Schur’s lemma in representation theory is an almost trivial observation for irreducible modules, but deserves respect because of its profound applications and implications. Lemma 5 (Schur’s lemma). Let G be a finite group represented on irreducible G-modules V and W . Any G-module homomorphism f : V → W is either invertible or the zero map. 1301

The only insight here is that both ker f and im f are G-submodules of V and W , respectively. This is routine. However, because V is irreducible, ker f is either trivial or all of V . In the former case, im f is a nonzero submodule of W , hence all of W because W is irreducible, so f is invertible. In the latter case, f is the zero map.

The following corollary is a very useful form of Schur’s lemma, in case that our representations are over an algebraically closed field. Corollary 1. If G is represented over an algebraically closed field F on an irreducible G-module V , then any G-module homomorphism f : V → V is a scalar. The insight in this case is to consider the module V as a vector space over F . Notice then that the homomorphism f is a linear transformation of this vector space and therefore has an eigenvalue λ in the algebraically closed field F . Hence, f − λ1 is not invertible. By Schur’s lemma, f − λ1 = 0. In other words, f = λ, a scalar.

Version: 14 Owner: rmilson Author(s): rmilson, NeuRet

317.3

character

Let ρ : G −→ GL(V ) be a finite dimensional representation of a group G (i.e., V is a finite dimensional vector space over its scalar field K). The character of ρ is the function χV : G −→ K defined by χV (g) := Tr(ρ(g)) where Tr is the trace function. Properties: • χV (g) = χV (h) if g is conjugate to h in G. (Equivalently, a character is a class function on G.) • If G is finite, the characters of the irreducible representations of G over the complex numbers form a basis of the vector space of all class functions on G (with pointwise addition and scalar multiplication). • Over the complex numbers, the characters of the irreducible representations of G are orthonormal under the inner product (χ1 , χ2 ) :=

(1/|G|) Σ_{g∈G} χ1 (g) χ2 (g)∗ , where ∗ denotes complex conjugation.

Version: 4 Owner: djao Author(s): djao 1302

317.4

group representation

Let G be a group, and let V be a vector space. A representation of G in V is a group homomorphism ρ : G −→ GL(V ) from G to the general linear group GL(V ) of invertible linear transformations of V . Equivalently, a representation of G is a vector space V which is a (left) module over the group ring Z[G]. The equivalence is achieved by assigning to each homomorphism ρ : G −→ GL(V ) the module structure whose scalar multiplication is defined by g · v := (ρ(g))(v), and extending linearly.

Special kinds of representations (preserving all notation from above) A representation is faithful if either of the following equivalent conditions is satisfied: • ρ : G −→ GL(V ) is injective • V is a faithful left Z[G]–module A subrepresentation of V is a subspace W of V which is a left Z[G]–submodule of V ; or, equivalently, a subspace W of V with the property that (ρ(g))(w) ∈ W for all w ∈ W. A representation V is called irreducible if it has no subrepresentations other than itself and the zero module. Version: 2 Owner: djao Author(s): djao

317.5

induced representation

Let G be a group, H ⊂ G a subgroup, and V a representation of H, considered as a Z[H]–module. The induced representation of V on G, denoted Ind_H^G (V ), is the Z[G]–module whose underlying vector space is the direct sum ⊕_{σ∈G/H} σV

of formal translates of V by left cosets σ in G/H, and whose multiplication operation is defined by choosing a set {gσ }σ∈G/H of coset representatives and setting g(σv) := τ (hv)

where τ is the unique left coset of G/H containing g · gσ (i.e., such that g · gσ = gτ · h for some h ∈ H). One easily verifies that the representation IndG H (V ) is independent of the choice of coset representatives {gσ }. Version: 1 Owner: djao Author(s): djao

317.6

regular representation

Given a group G, the regular representation of G over a field K is the representation ρ : G −→ GL(K G ) whose underlying vector space K G is the K–vector space of formal linear combinations of elements of G, defined by

ρ(g) (k1 g1 + · · · + kn gn ) := k1 (gg1 ) + · · · + kn (ggn )

for ki ∈ K, g, gi ∈ G. Equivalently, the regular representation is the induced representation on G of the trivial representation on the subgroup {1} of G. Version: 2 Owner: djao Author(s): djao
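The defining formula can be checked by writing out the matrices of ρ(g) in the basis of group elements; a sketch for G = C3 (helper names are ours):

```python
# Regular representation of C_3 = {0, 1, 2}: rho(g) sends the basis vector
# e_h to e_{g+h}; check that rho is a group homomorphism into GL_3.

def rho(g, n):
    # matrix of rho(g) acting on the basis {e_0, ..., e_{n-1}} of K^G
    return [[1 if row == (g + col) % n else 0 for col in range(n)]
            for row in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 3
for g in range(n):
    for h in range(n):
        assert matmul(rho(g, n), rho(h, n)) == rho((g + h) % n, n)
print("rho is a homomorphism C_3 -> GL_3")
```

Each ρ(g) is a permutation matrix, reflecting the fact that the regular representation permutes the basis of formal linear combinations.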

317.7

restriction representation

Let ρ : G −→ GL(V ) be a representation of a group G. The restriction representation of ρ to a subgroup H of G, denoted Res_H^G (V ), is the representation ρ|H : H −→ GL(V ) obtained by restricting the function ρ to the subset H ⊂ G. Version: 1 Owner: djao Author(s): djao


Chapter 318 20D05 – Classification of simple and nonsolvable groups 318.1

Burnside p − q theorem

If a finite group G is not solvable, then the order of G is divisible by at least 3 distinct primes. Equivalently, any group whose order is divisible by at most two distinct primes is solvable (these two distinct primes are the p and q of the title). Version: 2 Owner: bwebste Author(s): bwebste

318.2

classification of semisimple groups

For every semisimple group G there is a normal subgroup H of G (called the centerless completely reducible radical), isomorphic to a direct product of nonabelian simple groups, such that the conjugation action of G on H gives an injection G → Aut(H). Thus G is isomorphic to a subgroup of Aut(H) containing the inner automorphisms, and for every group H isomorphic to a direct product of non-abelian simple groups, every such subgroup is semisimple. Version: 1 Owner: bwebste Author(s): bwebste

318.3

semisimple group

A group G is called semisimple if it has no non-trivial normal solvable subgroups. Every group is an extension of a semisimple group by a solvable one.


Version: 1 Owner: bwebste Author(s): bwebste


Chapter 319 20D08 – Simple groups: sporadic groups 319.1

Janko groups

The Janko groups, denoted by J1 , J2 , J3 , and J4 , are four of the 26 sporadic groups. They were discovered by Z. Janko in 1966 and published in the article ”A new finite simple group with abelian Sylow 2-subgroups and its characterization” (Journal of Algebra, 1966, 3: 147-186). Each of these groups has very intricate matrix representations as maps into large general linear groups. For example, the matrix K corresponding to J4 gives a representation of J4 in GL112 (2). Version: 7 Owner: mathcam Author(s): mathcam, Thomas Heye


Chapter 320 20D10 – Solvable groups, theory of formations, Schunck classes, Fitting classes, π-length, ranks 320.1

Čuhinin’s Theorem

Let G be a finite, π-separable group, for some set π of primes. Then if H is a maximal π-subgroup of G, the index of H in G, |G : H|, is coprime to all elements of π and all such subgroups are conjugate. Such a subgroup is called a Hall π-subgroup. For π = {p}, this essentially reduces to the Sylow theorems (with unnecessary hypotheses). If G is solvable, it is π-separable for all π, so such subgroups exist for all π. This result is often called Hall’s theorem. Version: 4 Owner: bwebste Author(s): bwebste

320.2

separable

Let π be a set of primes. A finite group G is called π-separable if there exists a series {1} = G0 ⊴ G1 ⊴ · · · ⊴ Gn = G such that each quotient Gi+1 /Gi is a π-group or a π ′ -group. π-separability can be thought of as a generalization of solvability; a group is π-separable for all sets of primes if and only if it is solvable. Version: 3 Owner: bwebste Author(s): bwebste


320.3

supersolvable group

A group G is supersolvable if it has a finite normal series G = G0 ⊵ G1 ⊵ · · · ⊵ Gn = 1 with the property that each factor group Gi−1 /Gi is cyclic. A supersolvable group is solvable. Finitely generated nilpotent groups are supersolvable. Version: 1 Owner: mclase Author(s): mclase


Chapter 321 20D15 – Nilpotent groups, p-groups 321.1

Burnside basis theorem

If G is a p-group, then Frat G = G′ Gp , where Frat G is the Frattini subgroup, G′ the commutator subgroup, and Gp the subgroup generated by p-th powers. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 322 20D20 – Sylow subgroups, Sylow properties, π-groups, π-structure 322.1

π-groups and π 0 -groups

Let π be a set of primes. A finite group G is called a π-group if all the primes dividing |G| are elements of π, and a π 0 -group if none of them are. Typically, if π is a singleton π = {p}, we write p-group and p0 -group for these. Version: 2 Owner: bwebste Author(s): bwebste

322.2

p-subgroup

Let G be a finite group of order n, and let p be a prime integer. We can write n = pk m for some integers k, m such that p does not divide m (that is, pk is the highest power of p dividing n). Any subgroup of G whose order is pk is called a Sylow p-subgroup. While there is no a priori reason for Sylow p-subgroups to exist, the fact is that every finite group has a Sylow p-subgroup for every prime p that divides |G|; this statement is the first Sylow theorem. When |G| = pk we simply say that G is a p-group. Version: 2 Owner: drini Author(s): drini, apmxi


322.3

Burnside normal complement theorem

Let G be a finite group, and S a Sylow subgroup such that CG (S) = NG (S). Then S has a normal complement; that is, there exists a normal subgroup N ⊴ G such that S ∩ N = {1} and SN = G. Version: 1 Owner: bwebste Author(s): bwebste

322.4

Frattini argument

If H is a normal subgroup of a finite group G, and S is a Sylow subgroup of H, then G = HNG (S), where NG (S) is the normalizer of S in G. Version: 1 Owner: bwebste Author(s): bwebste

322.5

Sylow p-subgroup

If (G, ∗) is a group, then any subgroup of G of order pa for some integer a is called a p-subgroup. If |G| = pa m, where p ∤ m, then any subgroup S of G with |S| = pa is a Sylow p-subgroup. We write Sylp (G) for the set of Sylow p-subgroups of G. Version: 3 Owner: Henry Author(s): Henry

322.6

Sylow theorems

Let G be a finite group whose order is divisible by the prime p. Suppose pm is the highest power of p which is a factor of |G|, and set k = |G|/pm . Then:
• The group G contains at least one subgroup of order pm .
• Any two subgroups of G of order pm are conjugate.
• The number of subgroups of G of order pm is congruent to 1 modulo p and is a factor of k.
Version: 1 Owner: vitriol Author(s): vitriol
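The counting statement can be checked directly on S3 , whose Sylow 2-subgroups have order 2; a sketch using permutation tuples (helper names are ours):

```python
# Check the third Sylow theorem on G = S_3: |G| = 6 = 2 * 3, so the
# Sylow 2-subgroups have order 2 and their number must be congruent to
# 1 mod 2 and divide 3.
from itertools import permutations

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

G = list(permutations(range(3)))
identity = (0, 1, 2)
# each subgroup of order 2 is {id, x} for an element x of order 2
involutions = [x for x in G if x != identity and compose(x, x) == identity]
n2 = len(involutions)
print(n2)                 # 3 Sylow 2-subgroups
assert n2 % 2 == 1 and 3 % n2 == 0
```

The three involutions are exactly the three transpositions, so S3 has three conjugate Sylow 2-subgroups, consistent with all parts of the theorem.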


322.7

Sylow’s first theorem

existence of subgroups of prime-power order: if G is a finite group, p is a prime, and p^k divides |G|, then there exists a subgroup H ≤ G with |H| = p^k. Note: This is a “seed” entry written using a short-hand format described in this FAQ. Version: 2 Owner: bwebste Author(s): yark, apmxi

322.8

Sylow’s third theorem

Let G be a finite group, and let n be the number of Sylow p-subgroups of G. Then n ≡ 1 (mod p), and any two Sylow p-subgroups of G are conjugate to one another. Version: 8 Owner: bwebste Author(s): yark, apmxi

322.9

application of Sylow’s theorems to groups of order pq

We can use Sylow’s theorems to examine a group G of order pq, where p and q are primes and p < q. Let nq denote the number of Sylow q-subgroups of G. Then Sylow’s theorems tell us that nq is of the form 1 + kq for some integer k and that nq divides pq. But p and q are prime and p < q, so this implies that nq = 1. So there is exactly one Sylow q-subgroup, which is therefore normal (indeed, characteristic) in G. Denoting the Sylow q-subgroup by Q, and letting P be a Sylow p-subgroup, we have Q ∩ P = {1} and QP = G, so G is a semidirect product of Q and P. In particular, if there is also only one Sylow p-subgroup, then G is a direct product of Q and P, and is therefore cyclic. Version: 9 Owner: yark Author(s): yark, Manoj, Henry
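For a concrete instance with p = 2, q = 3, take G = S3 (order 6). A small plain-Python check (permutations as tuples; the subgroups Q and P below are picked by hand for illustration, not computed) confirms the structure claimed above: the Sylow 3-subgroup Q is normal, Q ∩ P = {1}, and QP = G.

```python
from itertools import permutations

S3 = list(permutations(range(3)))
e = (0, 1, 2)

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

# Hand-picked subgroups: Q = the 3-cycles plus identity (the Sylow
# 3-subgroup), P = one of the three Sylow 2-subgroups.
Q = {e, (1, 2, 0), (2, 0, 1)}
P = {e, (1, 0, 2)}

# Q is normal: g Q g^{-1} = Q for every g in G.
normal = all({compose(compose(g, x), inverse(g)) for x in Q} == Q
             for g in S3)
QP = {compose(x, y) for x in Q for y in P}
```

Since S3 is nonabelian, the semidirect product here is not direct — consistent with the fact that S3 has three Sylow 2-subgroups, not one.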


322.10

p-primary component

Definition 27. Let G be a finite abelian group and let p ∈ N be a prime. The p-primary component of G, Πp , is the subgroup of all elements whose order is a power of p. Note: The p-primary component of an abelian group G coincides with the unique Sylow p-subgroup of G. Version: 2 Owner: alozano Author(s): alozano

322.11

proof of Frattini argument

Let g ∈ G be any element. Since H is normal, gSg −1 ⊂ H. Since S is a Sylow subgroup of H, gSg −1 = hSh−1 for some h ∈ H, by Sylow’s theorems. Thus n = h−1 g normalizes S, and so g = hn for h ∈ H and n ∈ NG (S). Version: 1 Owner: bwebste Author(s): bwebste

322.12

proof of Sylow theorems

We let G be a group of order p^m k where p ∤ k and prove Sylow’s theorems. First, a fact which will be used several times in the proof:

Proposition 8. If p divides the size of every conjugacy class outside the center, then p divides the order of the center.

Proof: This follows from the class equation:

|G| = |Z(G)| + Σ_{a ∉ Z(G)} |[a]|

If p divides the left hand side and divides every term of the sum on the right hand side, it must also divide the remaining term, so p | |Z(G)|.

Proposition 9. G has a Sylow p-subgroup.

Proof: By induction on |G|. If |G| = 1 then no prime divides its order, so the condition is vacuous. Suppose |G| = p^m k, p ∤ k, and the proposition holds for all groups of smaller order. Then we can consider whether p divides the order of the center, Z(G).

If it does then, by Cauchy’s theorem, there is an element x of Z(G) of order p, and therefore a cyclic subgroup ⟨x⟩ of order p. Since this is a subgroup of the center, it is normal, so G/⟨x⟩ is well-defined and of order p^{m−1} k. By the inductive hypothesis, this group has a subgroup P/⟨x⟩ of order p^{m−1}. Then the corresponding subgroup P of G has |P| = |P/⟨x⟩| · |⟨x⟩| = p^m.

On the other hand, if p ∤ |Z(G)| then consider the conjugacy classes not in the center. By the proposition above, since |Z(G)| is not divisible by p, at least one conjugacy class can’t be. If a is a representative of this class then we have p ∤ |[a]| = [G : C(a)], and since |C(a)| · [G : C(a)] = |G|, p^m | |C(a)|. But C(a) ≠ G, since a ∉ Z(G), so by induction C(a) has a subgroup of order p^m, and this is also a subgroup of G.

Proposition 10. The intersection of a Sylow p-subgroup Q with the normalizer of a Sylow p-subgroup P is the intersection of the two subgroups. That is, Q ∩ NG(P) = Q ∩ P.

Proof: If P and Q are Sylow p-subgroups, consider R = Q ∩ NG(P). Obviously Q ∩ P ⊆ R. In addition, since R ⊆ NG(P), the second isomorphism theorem tells us that RP is a group, and |RP| = |R| · |P| / |R ∩ P|. P is a subgroup of RP, so p^m | |RP|. But R is a subgroup of the p-group Q and P is a Sylow p-subgroup, so |R| · |P| / |R ∩ P| is a power of p. Then it must be that |RP| = p^m, and therefore P = RP, and so R ⊆ P. Obviously R ⊆ Q, so R ⊆ Q ∩ P.

The following construction will be used in the remainder of the proof:

Given any Sylow p-subgroup P, consider the set C of its conjugates. Then X ∈ C if and only if X = xPx^{-1} = {xpx^{-1} | p ∈ P} for some x ∈ G. Observe that every X ∈ C is a Sylow p-subgroup (and we will show that the converse holds as well). We define a group action of G on C by:

g · X = g · xPx^{-1} = gxPx^{-1}g^{-1} = (gx)P(gx)^{-1}

This is clearly a group action, so we can consider the orbits of P under it. Of course, if all of G is used then there is only one orbit, so we restrict the action to a Sylow p-subgroup Q. Name the orbits O_1, . . . , O_s, and let P_1, . . . , P_s be representatives of the corresponding orbits. By the orbit-stabilizer theorem, the size of an orbit is the index of the stabilizer, and under this action the stabilizer of any P_i is just N_Q(P_i) = Q ∩ NG(P_i) = Q ∩ P_i, so |O_i| = [Q : Q ∩ P_i].

There are two easy results on this construction. If Q = P_i then |O_i| = [P_i : P_i ∩ P_i] = 1. If Q ≠ P_i then [Q : Q ∩ P_i] > 1, and since the index of any subgroup of Q divides |Q|, p | |O_i|.

Proposition 11. The number of conjugates of any Sylow p-subgroup of G is congruent to 1 modulo p.

Proof: In the construction above, let Q = P_1. Then |O_1| = 1 and p | |O_i| for i ≠ 1. Since the number of conjugates of P is the sum of the orbit sizes, the number of conjugates is of the form 1 + k_2 p + k_3 p + · · · + k_s p, which is obviously congruent to 1 modulo p.

Proposition 12. Any two Sylow p-subgroups are conjugate.

Proof: Given a Sylow p-subgroup P and any other Sylow p-subgroup Q, consider again the construction given above. If Q is not conjugate to P then Q ≠ P_i for every i, and therefore p | |O_i| for every orbit. But then the number of conjugates of P is divisible by p, contradicting the previous result. Therefore Q must be conjugate to P.

Proposition 13. The number of subgroups of G of order p^m is congruent to 1 modulo p and is a factor of k.

Proof: Since the conjugates of a Sylow p-subgroup are precisely the Sylow p-subgroups, and since a Sylow p-subgroup has 1 modulo p conjugates, there are 1 modulo p Sylow p-subgroups. Since the number of conjugates is the index of the normalizer, it is [G : NG(P)]. Since P is a subgroup of its normalizer, p^m | |NG(P)|, and therefore [G : NG(P)] | k. Version: 3 Owner: Henry Author(s): Henry

322.13

subgroups containing the normalizers of Sylow subgroups normalize themselves

Let G be a finite group, and S a Sylow subgroup. Let M be a subgroup such that NG(S) ⊆ M. Then M = NG(M). By order considerations, S is a Sylow subgroup of M. Since M is normal in NG(M), the Frattini argument applied in NG(M) gives NG(M) = M · N_{NG(M)}(S) ⊆ M · NG(S) = M.

Version: 3 Owner: bwebste Author(s): bwebste


Chapter 323 20D25 – Special subgroups (Frattini, Fitting, etc.) 323.1

Fitting’s theorem

If G is a finite group and M and N are normal nilpotent subgroups, then MN is also a normal nilpotent subgroup. Thus, any finite group has a maximal normal nilpotent subgroup, called its Fitting subgroup. Version: 1 Owner: bwebste Author(s): bwebste

323.2

characteristically simple group

A group G is called characteristically simple if its only characteristic subgroups are {1} and G. Any finite characteristically simple group is the direct product of several copies of isomorphic simple groups. Version: 3 Owner: bwebste Author(s): bwebste

323.3

the Frattini subgroup is nilpotent

The Frattini subgroup Frat G of any finite group G is nilpotent.

Let S be a Sylow p-subgroup of Frat G. Then by the Frattini argument, (Frat G)NG(S) = G. Since the Frattini subgroup is formed of non-generators, NG(S) = G. Thus S is normal in G, and thus in Frat G. Any group whose Sylow subgroups are all normal is nilpotent. Version: 4 Owner: bwebste Author(s): bwebste


Chapter 324 20D30 – Series and lattices of subgroups 324.1

maximal condition

A group is said to satisfy the maximal condition if every strictly ascending chain of subgroups G1 ⊂ G2 ⊂ G3 ⊂ · · · is finite. This is also called the ascending chain condition. A group satisfies the maximal condition if and only if the group and all its subgroups are finitely generated. Similar properties are useful in other classes of algebraic structures: see for example the noetherian condition for rings and modules. Version: 2 Owner: mclase Author(s): mclase

324.2

minimal condition

A group is said to satisfy the minimal condition if every strictly descending chain of subgroups G1 ⊃ G2 ⊃ G3 ⊃ · · · is finite. This is also called the descending chain condition.

A group which satisfies the minimal condition is necessarily periodic. For if it contained an element x of infinite order, then

⟨x⟩ ⊃ ⟨x^2⟩ ⊃ ⟨x^4⟩ ⊃ · · · ⊃ ⟨x^{2^n}⟩ ⊃ · · ·

is an infinite descending chain of subgroups. Similar properties are useful in other classes of algebraic structures: see for example the artinian condition for rings and modules. Version: 1 Owner: mclase Author(s): mclase

324.3

subnormal series

Let G be a group with a subgroup H, and let

G = G_0 ▷ G_1 ▷ · · · ▷ G_n = H        (324.3.1)

be a series of subgroups with each G_i a normal subgroup of G_{i−1}. Such a series is called a subnormal series or a subinvariant series. If in addition, each G_i is a normal subgroup of G, then the series is called a normal series. A subnormal series in which each G_i is a maximal normal subgroup of G_{i−1} is called a composition series. A normal series in which G_i is a maximal normal subgroup of G contained in G_{i−1} is called a principal series or a chief series. Note that a composition series need not end in the trivial group 1. One speaks of a series (324.3.1) as a composition series from G to H. But the term composition series for G generally means a composition series from G to 1. Similar remarks apply to principal series. Version: 1 Owner: mclase Author(s): mclase


Chapter 325 20D35 – Subnormal subgroups 325.1

subnormal subgroup

Let G be a group, and H a subgroup of G. Then H is subnormal if there exists a finite series H = H_0 ◁ H_1 ◁ · · · ◁ H_n = G with each H_i a normal subgroup of H_{i+1}. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 326 20D99 – Miscellaneous 326.1

Cauchy’s theorem

Let G be a finite group and let p be a prime dividing |G|. Then there is an element of G of order p. Version: 1 Owner: Evandar Author(s): Evandar

326.2

Lagrange’s theorem

Let G be a finite group and let H be a subgroup of G. Then the order of H divides the order of G. Version: 2 Owner: Evandar Author(s): Evandar

326.3

exponent

If G is a finite group, then the exponent of G, denoted exp G, is the smallest positive integer n such that g^n = e_G for every g ∈ G. Thus, for every finite group G, exp G divides |G|, and, for every g ∈ G, |g| divides exp G. The concept of exponent for finite groups is similar to that of characteristic for rings.

If G is a finite abelian group, then there exists g ∈ G with |g| = exp G. For, by the fundamental theorem of finite abelian groups, there exist a_1, . . . , a_n with a_i dividing a_{i+1} for every integer i between 1 and n − 1 such that G ≅ Z_{a_1} ⊕ · · · ⊕ Z_{a_n}. Since c^{a_n} = e_G for every c ∈ G, exp G ≤ a_n. Since |(0, . . . , 0, 1)| = a_n, we get exp G = a_n, and the result follows.

Following are some examples of exponents of nonabelian groups. Since |(12)| = 2, |(123)| = 3, and |S_3| = 6, we have exp S_3 = 6. In Q_8 = {1, −1, i, −i, j, −j, k, −k}, the quaternion group of order eight, since |i| = |−i| = |j| = |−j| = |k| = |−k| = 4 and 1^4 = (−1)^4 = 1, we have exp Q_8 = 4. Since the order of a product of two disjoint transpositions is 2, the order of a 3-cycle is 3, and the only nonidentity elements of A_4 are products of two disjoint transpositions and 3-cycles, exp A_4 = 6. Since |(123)| = 3 and |(1234)| = 4, exp S_4 ≥ 12. Since S_4 is not abelian, it is not cyclic, and thus contains no element of order 24. It follows that exp S_4 = 12. Version: 5 Owner: Wkbj79 Author(s): Wkbj79
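The symmetric-group examples above are easy to confirm numerically: exp S_n is the least common multiple of the orders of all elements of S_n. A plain-Python sketch (permutations as tuples, a convention chosen here for illustration):

```python
from itertools import permutations
from math import gcd

def order(p):
    # Order of a permutation p of {0, ..., n-1} given as a tuple:
    # iterate p until we return to the identity.
    n = len(p)
    e = tuple(range(n))
    q, k = p, 1
    while q != e:
        q = tuple(p[q[i]] for i in range(n))
        k += 1
    return k

def exponent(n):
    # exp(S_n) = lcm of the orders of all elements of S_n.
    m = 1
    for p in permutations(range(n)):
        o = order(p)
        m = m * o // gcd(m, o)
    return m

exp_s3 = exponent(3)
exp_s4 = exponent(4)
```

This reproduces exp S_3 = 6 and exp S_4 = 12.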

326.4

fully invariant subgroup

A subgroup H of a group G is fully invariant if f(H) ⊆ H for all endomorphisms f : G → G. This is a stronger condition than being a characteristic subgroup. The derived subgroup is fully invariant. Version: 1 Owner: mclase Author(s): mclase

326.5

proof of Cauchy’s theorem

Let G be a finite group and p be a prime divisor of |G|. Consider the set X of all ordered strings (x1 , x2 , . . . , xp ) for which x1 x2 . . . xp = e. Note |X| = |G|p−1, i.e. a multiple of p. There is a natural group action of Zp on X. m ∈ Zp sends the string (x1 , x2 , . . . , xp ) to (xm+1 , . . . , xp , x1 , . . . , xm ). By orbit-stabilizer theorem each orbit contains exactly 1 or p strings. Since (e, e, . . . , e) has an orbit of cardinality 1, and the orbits partition X, the cardinality of which is divisible by p, there must exist at least one other string (x1 , x2 , . . . , xp ) which is left fixed by every element of Zp . i.e. x1 = x2 = . . . = xp and so there exists an element of order p as required. Version: 1 Owner: vitriol Author(s): vitriol
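The counting in this proof can be replayed on a small case, say G = S3 and p = 3: the set X of strings with product e has |G|^{p−1} = 36 elements, and the fixed points of the cyclic shift are exactly the constant strings (x, x, x) with x^3 = e. A plain-Python sketch (permutations as tuples, a convention chosen here for illustration):

```python
from itertools import permutations, product

S3 = list(permutations(range(3)))
e = (0, 1, 2)
p = 3  # a prime dividing |S3| = 6

def compose(a, b):
    return tuple(a[b[i]] for i in range(3))

# X = all strings (x1, x2, x3) with x1 x2 x3 = e; |X| = |G|^(p-1).
X = [s for s in product(S3, repeat=p)
     if compose(compose(s[0], s[1]), s[2]) == e]

# The cyclic shift sends (x1, x2, x3) to (x2, x3, x1); its fixed points
# are the constant strings, i.e. elements with x^3 = e.
fixed = [s for s in X if s[1:] + s[:1] == s]
```

There are 36 strings and 3 fixed strings (a multiple of p, as the orbit count forces), and the two nonidentity fixed strings exhibit the elements of order 3 in S3.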


326.6

proof of Lagrange’s theorem

We know that the cosets Hg form a partition of G (see the coset entry for proof of this). Since G is finite, we know it can be completely decomposed into a finite number of cosets. Call this number n. We denote the ith coset by Ha_i and write G as

G = Ha_1 ∪ Ha_2 ∪ · · · ∪ Ha_n

Since each coset has |H| elements, we have |G| = |H| · n, and so |H| divides |G|, which proves Lagrange’s theorem. Version: 2 Owner: akrowne Author(s): akrowne
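The coset decomposition can be seen directly for G = S3 and an order-2 subgroup H: the right cosets Hg partition the six elements into three cells of size two, so |G| = |H| · 3. A plain-Python sketch (permutations as tuples, a convention chosen here for illustration):

```python
from itertools import permutations

S3 = list(permutations(range(3)))

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

# H = the subgroup generated by the transposition swapping 0 and 1.
H = {(0, 1, 2), (1, 0, 2)}

# The right cosets Hg, collected as frozensets so duplicates collapse.
cosets = {frozenset(compose(h, g) for h in H) for g in S3}
```

Three cosets of size two whose union is all of S3 — exactly the partition the proof uses.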

326.7

proof of the converse of Lagrange’s theorem for finite cyclic groups

Following is a proof that, if G is a finite cyclic group and n ∈ Z^+ is a divisor of |G|, then G has a subgroup of order n.

Let g be a generator of G. Then |g| = |⟨g⟩| = |G|. Let z ∈ Z such that nz = |G| = |g|. Consider ⟨g^z⟩. Since g ∈ G, g^z ∈ G, and thus ⟨g^z⟩ ≤ G. Since

|⟨g^z⟩| = |g^z| = |g| / gcd(z, |g|) = nz / gcd(z, nz) = nz / z = n,

it follows that ⟨g^z⟩ is a subgroup of G of order n. Version: 3 Owner: Wkbj79 Author(s): Wkbj79
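In additive notation the same construction reads: in Z_n with generator 1, the element n/m generates a subgroup of order m for each divisor m of n. A small check for n = 12 (plain Python, written here for illustration):

```python
# G = Z_12 under addition, with generator g = 1.
n = 12

def cyclic_subgroup(x):
    # The subgroup <x> of Z_n generated by x.
    H, y = {0}, x % n
    while y not in H:
        H.add(y)
        y = (y + x) % n
    return H

# For each divisor m of 12, the element 12/m generates a subgroup of order m.
subgroup_orders = {m: len(cyclic_subgroup(n // m)) for m in (1, 2, 3, 4, 6, 12)}
```

Each divisor of 12 is realized as the order of a subgroup, as the proof guarantees.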

326.8

proof that exp G divides |G|

Following is a proof that exp G divides |G| for every finite group G. By the division algorithm, there exist q, r ∈ Z with 0 ≤ r < exp G such that |G| = q(exp G) + r. Let g ∈ G. Then e_G = g^{|G|} = g^{q(exp G)+r} = (g^{exp G})^q g^r = (e_G)^q g^r = g^r. Thus, for every g ∈ G, g^r = e_G. By the definition of exponent, r cannot be positive. Thus, r = 0. It follows that exp G divides |G|. Version: 4 Owner: Wkbj79 Author(s): Wkbj79

326.9

proof that |g| divides exp G

Following is a proof that, for every finite group G and for every g ∈ G, |g| divides exp G. By the division algorithm, there exist q, r ∈ Z with 0 ≤ r < |g| such that exp G = q|g| + r. Since e_G = g^{exp G} = g^{q|g|+r} = (g^{|g|})^q g^r = (e_G)^q g^r = g^r, by the definition of the order of an element r cannot be positive. Thus, r = 0. It follows that |g| divides exp G. Version: 2 Owner: Wkbj79 Author(s): Wkbj79

326.10

proof that every group of prime order is cyclic

Following is a proof that every group of prime order is cyclic. Let p be a prime and G be a group such that |G| = p. Then G contains more than one element. Let g ∈ G such that g 6= eG . Then hgi contains more than one element. Since hgi ≤ G, then by Lagrange’s theorem, |hgi| divides p. Since |hgi| > 1 and |hgi| divides a prime, then |hgi| = p = |G|. Hence, hgi = G. It follows that G is cyclic. Version: 3 Owner: Wkbj79 Author(s): Wkbj79


Chapter 327 20E05 – Free nonabelian groups 327.1

Nielsen-Schreier theorem

Let G be a free group and H a subgroup of G. Then H is free. Version: 1 Owner: Evandar Author(s): Evandar

327.2

Schreier index formula

Let G be a free group and H a subgroup of finite index |G : H| = n. By the Nielsen-Schreier theorem, H is free. The Schreier index formula states that rank(H) = n(rank(G) − 1) + 1. This implies, more generally, that if G_0 is any group generated by m elements, then any subgroup of index n can be generated by nm − n + 1 elements. Version: 1 Owner: bwebste Author(s): bwebste

327.3

free group

Let A be a set with elements a_i for some index set I. We refer to A as an alphabet and the elements of A as letters. A syllable is a symbol of the form a_i^n for n ∈ Z. It is customary to write a_i for a_i^1. Define a word to be a finite ordered string, or sequence, of syllables made up of elements of A. For example,

a_2^3 a_1 a_4^{-1} a_3^2 a_2^{-3}

is a five-syllable word. Notice that there exists a unique empty word, i.e. the word with no syllables, usually written simply as 1. Denote the set of all words formed from elements of A by W[A].

Define a binary operation, called the product, on W[A] by concatenation of words. To illustrate, if a_2^3 a_1 and a_1^{-1} a_3^4 are elements of W[A] then their product is simply a_2^3 a_1 a_1^{-1} a_3^4. This gives W[A] the structure of a semigroup with identity. The empty word 1 acts as a right and left identity in W[A], and is the only element which has an inverse. In order to give W[A] the structure of a group, two more ideas are needed.

If v = u_1 a_i^0 u_2 is a word, where u_1, u_2 are also words and a_i is some element of A, an elementary contraction of type I replaces the occurrence of a_i^0 by 1. Thus, after this type of contraction we get another word w = u_1 u_2. If v = u_1 a_i^p a_i^q u_2 is a word, an elementary contraction of type II replaces the occurrence of a_i^p a_i^q by a_i^{p+q}, which results in w = u_1 a_i^{p+q} u_2. In either of these cases, we also say that w is obtained from v by an elementary contraction, or that v is obtained from w by an elementary expansion.

Call two words u, v equivalent (denoted u ∼ v) if one can be obtained from the other by a finite sequence of elementary contractions or expansions. This is an equivalence relation on W[A]. Let F[A] be the set of equivalence classes of words in W[A]. Then F[A] is a group under the operation [u][v] = [uv] for [u], [v] ∈ F[A]. The inverse [u]^{-1} of an element [u] is obtained by reversing the order of the syllables of [u] and changing the sign of each syllable. For example, if [u] = [a_1 a_3^2], then [u]^{-1} = [a_3^{-2} a_1^{-1}].

We call F[A] the free group on the alphabet A or the free group generated by A. A given group G is free if G is isomorphic to F[A] for some A. This seemingly ad hoc construction gives an important result: Every group is the homomorphic image of some free group. Version: 4 Owner: jihemme Author(s): jihemme, rmilson, djao
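The elementary contractions define a reduction procedure that is easy to implement. The sketch below (plain Python; a word is stored as a list of (letter, exponent) syllables, with the letter names arbitrary labels chosen for illustration) applies type I and type II contractions with a stack, and also computes the inverse of a word exactly as described above.

```python
def reduce_word(word):
    """Freely reduce a word given as a list of (letter, exponent) syllables.

    Uses a stack: merge adjacent syllables with the same letter (type II
    contraction) and drop syllables with exponent zero (type I).
    """
    out = []
    for letter, exp in word:
        if out and out[-1][0] == letter:
            exp = out.pop()[1] + exp  # type II: merge with previous syllable
        if exp != 0:
            out.append((letter, exp))
        # exp == 0: type I contraction, the syllable disappears
    return out

def inverse_word(word):
    # Reverse the syllables and change the sign of each exponent.
    return [(letter, -exp) for letter, exp in reversed(word)]

w = [('a2', 3), ('a1', 1)]
v = [('a1', -1), ('a3', 4)]
wv = reduce_word(w + v)                    # a2^3 a1 a1^{-1} a3^4 reduces
ww_inv = reduce_word(w + inverse_word(w))  # a word times its inverse
```

Here `wv` reduces to the two-syllable word a_2^3 a_3^4, and `ww_inv` reduces to the empty word, illustrating that [u][u]^{-1} = 1 in F[A].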

327.4

proof of Nielsen-Schreier theorem and Schreier index formula

While there are purely algebraic proofs of this fact, a much easier proof is available through geometric group theory. Let G be a group which is free on a set X. Any group acts freely on its Cayley graph, and the Cayley graph of G is a 2|X|-regular tree, which we will call T.

If H is any subgroup of G, then H also acts freely on T by restriction. Since groups that act freely on trees are free, H is free.

Moreover, we can obtain the rank of H (the size of the set on which it is free). If Γ is a finite graph, then π_1(Γ) is free of rank 1 − χ(Γ), where χ(Γ) denotes the Euler characteristic of Γ. Since H ≅ π_1(H\T), the rank of H is 1 − χ(H\T). If H is of finite index n in G, then H\T is finite, and χ(H\T) = nχ(G\T). Of course 1 − χ(G\T) is the rank of G. Substituting, we find that rank(H) = 1 − nχ(G\T) = 1 − n(1 − rank(G)) = n(rank(G) − 1) + 1. Version: 2 Owner: bwebste Author(s): bwebste

327.5

Jordan-Holder decomposition

A Jordan–Hölder decomposition of a group G is a filtration G = G_1 ⊃ G_2 ⊃ · · · ⊃ G_n = {1} such that G_{i+1} is a normal subgroup of G_i and the quotient G_i/G_{i+1} is a simple group for each i. Version: 4 Owner: djao Author(s): djao

327.6

profinite group

A topological group G is profinite if it is isomorphic to the inverse limit of some projective system of finite groups. In other words, G is profinite if there exists a directed set I, a collection of finite groups {H_i}_{i∈I}, and homomorphisms α_{ij} : H_j → H_i for each pair i, j ∈ I with i ≤ j, satisfying

1. α_{ii} = 1 for all i ∈ I,

2. α_{ij} ∘ α_{jk} = α_{ik} for all i, j, k ∈ I with i ≤ j ≤ k,

with the property that:

• G is isomorphic as a group to the projective limit

lim← H_i := { (h_i) ∈ ∏_{i∈I} H_i : α_{ij}(h_j) = h_i for all i ≤ j }

under componentwise multiplication.

• The isomorphism from G to lim← H_i (considered as a subspace of ∏ H_i) is a homeomorphism of topological spaces, where each H_i is given the discrete topology and ∏ H_i is given the product topology.

The topology on a profinite group is called the profinite topology. Version: 3 Owner: djao Author(s): djao

327.7

extension

A short exact sequence 0 → A → B → C → 0 is sometimes called an extension of C by A. This term is also applied to an object B which fits into such an exact sequence. Version: 1 Owner: bwebste Author(s): bwebste

327.8

holomorph

Let K be a group, and let θ : Aut(K) → Aut(K) be the identity map. The holomorph of K, denoted Hol(K), is the semidirect product K ⋊_θ Aut(K). Then K is a normal subgroup of Hol(K), and any automorphism of K is the restriction of an inner automorphism of Hol(K). For if φ ∈ Aut(K), then

(1, φ) · (k, 1) · (1, φ^{-1}) = (1 · θ(φ)(k), φ) · (1, φ^{-1}) = (θ(φ)(k) · θ(φ)(1), φφ^{-1}) = (φ(k), 1).

Version: 2 Owner: dublisk Author(s): dublisk

327.9

proof of the Jordan Holder decomposition theorem

Let |G| = N. We first prove existence, using induction on N. If N = 1 (or, more generally, if G is simple) the result is clear. Now suppose G is not simple. Choose a maximal proper normal subgroup G_1 of G. Then G_1 has a Jordan–Hölder decomposition by induction, which produces a Jordan–Hölder decomposition for G.


To prove uniqueness, we use induction on the length n of the decomposition series. If n = 1 then G is simple and we are done. For n > 1, suppose that

G ⊃ G_1 ⊃ G_2 ⊃ · · · ⊃ G_n = {1}  and  G ⊃ G'_1 ⊃ G'_2 ⊃ · · · ⊃ G'_m = {1}

are two decompositions of G. If G_1 = G'_1 then we’re done (apply the induction hypothesis to G_1), so assume G_1 ≠ G'_1. Set H := G_1 ∩ G'_1 and choose a decomposition series

H ⊃ H_1 ⊃ · · · ⊃ H_k = {1}

for H. By the second isomorphism theorem, G_1/H ≅ G_1 G'_1/G'_1 = G/G'_1 (the last equality is because G_1 G'_1 is a normal subgroup of G properly containing G_1). In particular, H is a normal subgroup of G_1 with simple quotient. But then

G_1 ⊃ G_2 ⊃ · · · ⊃ G_n  and  G_1 ⊃ H ⊃ · · · ⊃ H_k

are two decomposition series for G_1, and hence have the same simple quotients by the induction hypothesis; likewise for the G'_1 series. Therefore n = m. Moreover, since G/G_1 ≅ G'_1/H and G/G'_1 ≅ G_1/H (by the second isomorphism theorem), we have now accounted for all of the simple quotients, and shown that they are the same. Version: 4 Owner: djao Author(s): djao

327.10

semidirect product of groups

The goal of this exposition is to carefully explain the correspondence between the notions of external and internal semi–direct products of groups, as well as the connection between semi–direct products and short exact sequences. Naturally, we start with the construction of semi–direct products.

Definition 6. Let H and Q be groups and let θ : Q −→ Aut(H) be a group homomorphism. The semi–direct product H ⋊_θ Q is defined to be the group with underlying set {(h, q) : h ∈ H, q ∈ Q} and group operation (h, q)(h', q') := (h · θ(q)(h'), qq').

We leave it to the reader to check that H ⋊_θ Q is really a group. It helps to know that the inverse of (h, q) is (θ(q^{-1})(h^{-1}), q^{-1}). For the remainder of this article, we omit θ from the notation whenever this map is clear from the context.

Set G := H ⋊ Q. There exist canonical monomorphisms H −→ G and Q −→ G, given by

h ↦ (h, 1_Q),  h ∈ H
q ↦ (1_H, q),  q ∈ Q

where 1_H (resp. 1_Q) is the identity element of H (resp. Q). These monomorphisms are so natural that we will treat H and Q as subgroups of G under these inclusions.

Theorem 3. Let G := H ⋊ Q as above. Then:

• H is a normal subgroup of G.

• HQ = G.

• H ∩ Q = {1_G}.

Proof: Let p : G −→ Q be the projection map defined by p(h, q) = q. Then p is a homomorphism with kernel H. Therefore H is a normal subgroup of G.

Every (h, q) ∈ G can be written as (h, 1_Q)(1_H, q). Therefore HQ = G. Finally, it is evident that (1_H, 1_Q) is the only element of G that is of the form (h, 1_Q) for h ∈ H and (1_H, q) for q ∈ Q.

This result motivates the definition of internal semi–direct products.

Definition 7. Let G be a group with subgroups H and Q. We say G is the internal semi–direct product of H and Q if:

• H is a normal subgroup of G.

• HQ = G.

• H ∩ Q = {1_G}.

We know an external semi–direct product is an internal semi–direct product (Theorem 3). Now we prove a converse (Theorem 4), namely, that an internal semi–direct product is an external semi–direct product.

Lemma 6. Let G be a group with subgroups H and Q. Suppose G = HQ and H ∩ Q = {1_G}. Then every element g of G can be written uniquely in the form hq, for h ∈ H and q ∈ Q.

Proof: Since G = HQ, we know that g can be written as hq. Suppose it can also be written as h'q'. Then hq = h'q', so h^{-1}h' = q(q')^{-1} ∈ H ∩ Q = {1_G}. Therefore h = h' and q = q'.

Theorem 4. Suppose G is a group with subgroups H and Q, and G is the internal semi–direct product of H and Q. Then G ≅ H ⋊_θ Q where θ : Q −→ Aut(H) is given by θ(q)(h) := qhq^{-1}, for q ∈ Q, h ∈ H.

Proof: By Lemma 6, every element g of G can be written uniquely in the form hq, with h ∈ H and q ∈ Q. Therefore, the map φ : H ⋊ Q −→ G given by φ(h, q) = hq is a bijection. It only remains to show that this bijection is a homomorphism.

Given elements (h, q) and (h', q') in H ⋊ Q, we have

φ((h, q)(h', q')) = φ((h · θ(q)(h'), qq')) = φ((hqh'q^{-1}, qq')) = hqh'q' = φ(h, q)φ(h', q').

Therefore φ is an isomorphism.

Consider the external semi–direct product G := H ⋊_θ Q with subgroups H and Q. We know from Theorem 4 that G is isomorphic to the external semi–direct product H ⋊_{θ'} Q, where we are temporarily writing θ' for the conjugation map θ'(q)(h) := qhq^{-1} of Theorem 4. But in fact the two maps θ and θ' are the same:

θ'(q)(h) = (1_H, q)(h, 1_Q)(1_H, q^{-1}) = (θ(q)(h), 1_Q) = θ(q)(h).

In summary, one may use Theorems 3 and 4 to pass freely between the notions of internal semi–direct product and external semi–direct product.

Finally, we discuss the correspondence between semi–direct products and split exact sequences of groups.

Given elements (h, q) and (h0 , q 0 ) in H o Q, we have φ((h, q)(h0 , q 0 )) = φ((hθ(q)(h0 ), qq 0)) = φ(hqh0 q −1 , qq 0) = hqh0 q 0 = φ(h, q)φ(h0 , q 0 ). Therefore φ is an isomorphism. Consider the external semi–direct product G := H oθ Q with subgroups H and Q. We know from Theorem 4 that G is isomorphic to the external semi–direct product H oθ0 Q, where we are temporarily writing θ0 for the conjugation map θ0 (q)(h) := qhq −1 of Theorem 4. But in fact the two maps θ and θ0 are the same: θ0 (q)(h) = (1H , q)(h, 1Q )(1H , q −1 ) = (θ(q)(h), 1Q ) = θ(q)(h). In summary, one may use Theorems 3 and 4 to pass freely between the notions of internal semi–direct product and external semi–direct product. Finally, we discuss the correspondence between semi–direct products and split exact sequences of groups. Definition 8. An exact sequence of groups j

i

1 −→ H −→ G −→ Q −→ 1. is split if there exists a homomorphism k : Q −→ G such that j ◦ k is the identity map on Q. Theorem 5. Let G, H, and Q be groups. Then G is isomorphic to a semi–direct product H o Q if and only if there exists a split exact sequence j

i

1 −→ H −→ G −→ Q −→ 1. irst suppose G ∼ = H o Q. Let i : H −→ G be the inclusion map i(h) = (h, 1Q ) and let j : G −→ Q be the projection map j(h, q) = q. Let the splitting map k : Q −→ G be the inclusion map k(q) = (1H , q). Then the sequence above is clearly split exact. F

Now suppose we have the split exact sequence above. Let k : Q −→ G be the splitting map. Then: • i(H) = ker j, so i(H) is normal in G. 1332

• For any g ∈ G, set q := k(j(g)). Then j(gq −1 ) = j(g)j(k(j(g)))−1 = 1Q , so gq −1 ∈ Im i. Set h := gq −1 . Then g = hq. Therefore G = i(H)k(Q). • Suppose g ∈ G is in both i(H) and k(Q). Write g = k(q). Then T k(q) ∈ Im i = ker j, so q = j(k(q)) = 1Q . Therefore g = k(q) = k(1Q ) = 1G , so i(H) k(Q) = {1G }.

This proves that G is the internal semi–direct product of i(H) and k(Q). These are isomorphic to H and Q, respectively. Therefore G is isomorphic to a semi–direct product H o Q. Thus, not all normal subgroups H ⊂ G give rise to an (internal) semi–direct product G = H o G/H. More specifically, if H is a normal subgroup of G, we have the canonical exact sequence 1 −→ H −→ G −→ G/H −→ 1.

We see that G can be decomposed into H ⋊ G/H as an internal semi–direct product if and only if the canonical exact sequence splits. Version: 5 Owner: djao Author(s): djao
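Definition 6 can be exercised directly on a small example: take H = Z3, Q = Z2, and let θ(1) act by inversion on H. The sketch below (plain Python, written additively; the construction chosen here for illustration) builds the multiplication table of the external semi–direct product and checks that it is a nonabelian group of order 6 — hence isomorphic to S3.

```python
# H = Z3, Q = Z2, and θ : Q -> Aut(H) sends 1 to the inversion h -> -h.
def theta(q):
    return (lambda h: h % 3) if q == 0 else (lambda h: (-h) % 3)

def mul(x, y):
    # (h, q)(h', q') = (h + θ(q)(h'), q + q'), written additively.
    (h1, q1), (h2, q2) = x, y
    return ((h1 + theta(q1)(h2)) % 3, (q1 + q2) % 2)

G = [(h, q) for h in range(3) for q in range(2)]

assoc = all(mul(mul(a, b), c) == mul(a, mul(b, c))
            for a in G for b in G for c in G)
abelian = all(mul(a, b) == mul(b, a) for a in G for b in G)
```

Had we used the trivial map θ instead, the same code would produce the abelian direct product Z3 × Z2 ≅ Z6, illustrating how θ controls the group structure.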

327.11

wreath product

Let A and B be groups, and let B act on the set Γ. Let A^Γ be the set of all functions from Γ to A. Endow A^Γ with a group operation by pointwise multiplication. In other words, for any f_1, f_2 ∈ A^Γ,

(f_1 f_2)(γ) = f_1(γ) f_2(γ)  for all γ ∈ Γ,

where the operation on the right hand side above takes place in A, of course. Define the action of B on A^Γ by

(b f)(γ) := f(bγ)

for any f : Γ → A and all γ ∈ Γ. The wreath product of A and B according to the action of B on Γ, sometimes denoted A ≀_Γ B, is the following semidirect product of groups: A^Γ ⋊ B.

Before going into further constructions, let us pause for a moment to unwind this definition. Let W := A ≀_Γ B. The elements of W are ordered pairs (f, b), for some function f : Γ → A and some b ∈ B. The group operation in the semidirect product, for any (f_1, b_1), (f_2, b_2) ∈ W, is

(f_1, b_1)(f_2, b_2) = (f, b_1 b_2),  where f(γ) = f_1(γ) f_2(b_1 γ) for all γ ∈ Γ.

The set A^Γ can be interpreted as the cartesian product of A with itself, with one factor for each element of Γ. That is to say, Γ here plays the role of an index set for the cartesian product. If Γ is finite, for instance, say Γ = {1, 2, . . . , n}, then any f ∈ A^Γ is an n-tuple, and we can think of any (f, b) ∈ W as the following ordered pair:

((a_1, a_2, . . . , a_n), b)  where a_1, a_2, . . . , a_n ∈ A.

The action of B on Γ in the semidirect product has the effect of permuting the entries of the n-tuple f, and the group operation defined on A^Γ gives pointwise multiplication. To be explicit, suppose (f, a), (g, b) ∈ W, and for j ∈ Γ, f(j) = r_j ∈ A and g(j) = s_j ∈ A. Then

(f, a)(g, b) = ((r_1, r_2, . . . , r_n), a)((s_1, s_2, . . . , s_n), b) = ((r_1 s_{a(1)}, r_2 s_{a(2)}, . . . , r_n s_{a(n)}), ab)

(notice the permutation of the indices!). A moment’s thought to understand this slightly messy notation will be illuminating (and might also shed some light on the choice of terminology, “wreath” product). Version: 11 Owner: bwebste Author(s): NeuRet
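To make the finite case concrete: take A = B = Z2, with B acting on Γ = {0, 1} by swapping the two points. The sketch below (plain Python, written additively in Z2; a small example chosen for illustration) implements the multiplication formula above and checks that Z2 ≀ Z2 is a nonabelian group of order 2^2 · 2 = 8 (it is in fact the dihedral group of order 8).

```python
from itertools import product

# A = Z2, B = Z2 acting on Γ = {0, 1}: b = 1 swaps the two points.
def b_act(b, gamma):
    return gamma if b == 0 else 1 - gamma

def mul(x, y):
    (f1, b1), (f2, b2) = x, y
    # (f1, b1)(f2, b2) = (f, b1 + b2) with f(γ) = f1(γ) + f2(b1 γ) in Z2.
    f = tuple((f1[g] + f2[b_act(b1, g)]) % 2 for g in (0, 1))
    return (f, (b1 + b2) % 2)

# Elements (f, b) with f stored as the pair (f(0), f(1)).
W = [(f, b) for f in product((0, 1), repeat=2) for b in (0, 1)]

assoc = all(mul(mul(a, b), c) == mul(a, mul(b, c))
            for a in W for b in W for c in W)
abelian = all(mul(a, b) == mul(b, a) for a in W for b in W)
```

The twisting of the indices by b_1 in `mul` is exactly the index permutation highlighted in the displayed formula, and it is what makes the product nonabelian.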

327.12

Jordan-Hölder decomposition theorem

Every finite group G has a filtration G ⊃ G_0 ⊃ · · · ⊃ G_n = {1}, where each G_{i+1} is normal in G_i and each quotient group G_i/G_{i+1} is a simple group. Any two such decompositions of G have the same multiset of simple groups G_i/G_{i+1} up to ordering. A filtration of G satisfying the properties above is called a Jordan–Hölder decomposition of G. Version: 4 Owner: djao Author(s): djao
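For a small illustration, Z_12 has (at least) two composition series, ⟨1⟩ ⊃ ⟨2⟩ ⊃ ⟨4⟩ ⊃ {0} and ⟨1⟩ ⊃ ⟨3⟩ ⊃ ⟨6⟩ ⊃ {0}. Recording only the subgroup orders along each chain, the sketch below (plain Python, written for illustration) checks that the multisets of simple-quotient orders agree, as the theorem predicts:

```python
def composition_quotients(chain):
    # chain = subgroup orders along a composition series of a cyclic group;
    # each simple quotient is cyclic of the successive ratio.
    return sorted(chain[i] // chain[i + 1] for i in range(len(chain) - 1))

# Z12 ⊃ <2> ⊃ <4> ⊃ {0} has orders 12, 6, 3, 1;
# Z12 ⊃ <3> ⊃ <6> ⊃ {0} has orders 12, 4, 2, 1.
q1 = composition_quotients([12, 6, 3, 1])
q2 = composition_quotients([12, 4, 2, 1])
```

Both series yield the multiset {2, 2, 3} of quotient orders, even though the quotients appear in different orders along the two chains.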

327.13

simplicity of the alternating groups

This is an elementary proof that for n ≥ 5 the alternating group on n symbols, A_n, is simple. Throughout this discussion, fix n ≥ 5. We will extensively employ cycle notation, with composition on the left, as is usual. The following observation will also be useful. Let π be a permutation written as disjoint cycles

π = (a_1, a_2, . . . , a_k)(b_1, b_2, . . . , b_l)(. . .) . . .

It is easy to check that for any other permutation σ ∈ S_n,

σπσ^{-1} = (σ(a_1), σ(a_2), . . . , σ(a_k))(σ(b_1), σ(b_2), . . .)(. . .) . . .

In particular, two permutations of S_n are conjugate exactly when they have the same cycle type. Two preliminary results are necessary.

Lemma 7. A_n is generated by all cycles of length 3.

Proof: A product of 3-cycles is an even permutation, so the subgroup generated by all 3-cycles is therefore contained in A_n. For the reverse inclusion, by definition every even permutation is the product of an even number of transpositions. Thus, it suffices to show that the product of two transpositions can be written as a product of 3-cycles. There are two possibilities. Either the two transpositions move an element in common, say (a, b) and (a, c), or the two transpositions are disjoint, say (a, b) and (c, d). In the former case,

(a, b)(a, c) = (a, c, b), and in the latter, (a, b)(c, d) = (a, b, d)(c, b, d). This establishes the first lemma.

Lemma 8. If a normal subgroup N ◁ A_n contains a 3-cycle, then N = A_n.

Proof: We will show that if (a, b, c) ∈ N, then the assumption of normality implies that any other 3-cycle (a', b', c') ∈ N. This is easy to show, because there is some permutation σ ∈ S_n that under conjugation takes (a, b, c) to (a', b', c'), that is,

σ(a, b, c)σ −1 = (σ(a), σ(b), σ(c)) = (a0 , b0 , c0 ). In case σ is odd, then (because n ≥ 5) we can choose some transposition (d, e) ∈ Sn disjoint from (a0 , b0 , c0 ) so that σ(a, b, c)σ −1 = (d, e)(a0 , b0 , c0 )(d, e), that is, σ 0 (a, b, c)σ 0−1 = (d, e)σ(a, b, c)σ −1 (d, e) = (a0 , b0 , c0 ) where σ 0 = (d, e)σ is even. This means that N contains all 3-cycles, as N / An . Hence, by the previous lemma N = An as required. The rest of the proof proceeds by an exhaustive verification of all the possible cases. Suppose there is some nontrivial N / An . We will show that N = An . In each case we will suppose N contains a particular kind of element, and the normality will imply that N also contains a certain conjugate of the element in An , thereby reducing the situation to a previously solved case.

Case 1 Suppose N contains a permutation π that, when written as disjoint cycles, has a cycle of length at least 4, say π = (a1 , a2 , a3 , a4 , . . .) . . . Upon conjugation by (a1 , a2 , a3 ) ∈ An , we obtain π 0 = (a1 , a2 , a3 )π(a3 , a2 , a1 ) = (a2 , a3 , a1 , a4 , . . .) . . . so that π 0 ∈ N, and also π 0 π −1 = (a1 , a2 , a4 ) ∈ N. Notice that the rest of the cycles cancel. By Lemma 8, N = An . Case 2 The cyclic decompositions of elements of N involve only cycles of length 2 and 3, and some element involves at least two cycles of length 3. Consider then π = (a, b, c)(d, e, f ) . . . Conjugation by (c, d, e) implies that N also contains π 0 = (c, d, e)π(e, d, c) = (a, b, d)(e, c, f ) . . . , and hence N also contains π 0 π = (a, d, c, b, f ) . . ., which reduces to Case 1. Case 3 There is an element of N whose cyclic decomposition involves only transpositions and exactly one 3-cycle. Upon squaring, this element becomes a 3-cycle and Lemma 8 applies.

Case 4 There is an element of N of the form π = (a, b)(c, d). Conjugating by (a, e, b) with e distinct from a, b, c, d (again, at least one such e exists, as n ≥ 5) yields π 0 = (a, e, b)π(b, e, a) = (a, e)(c, d) ∈ N. Hence π 0 π = (a, b, e) ∈ N. Lemma 8 applies and N = An . Case 5 Every element of N is the product of at least four transpositions. Suppose N contains π = (a1 , b1 )(a2 , b2 )(a3 , b3 )(a4 , b4 ) . . ., the number of transpositions being even, of course. This time we conjugate by (a2 , b1 )(a3 , b2 ): π 0 = (a2 , b1 )(a3 , b2 )π(a3 , b2 )(a2 , b1 ) = (a1 , a2 )(a3 , b1 )(b2 , b3 )(a4 , b4 ) . . ., and π 0 π = (a1 , a3 , b2 )(a2 , b3 , b1 ) ∈ N, which is Case 2. Since this covers all possible cases, N = An and the alternating group contains no proper nontrivial normal subgroups. QED. Version: 8 Owner: rmilson Author(s): NeuRet
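For the smallest case, A5 , the theorem can also be checked by brute force: a normal subgroup is a union of conjugacy classes containing the identity, and its order must divide |A5 | = 60. The following pure-Python sketch (our own illustration, not part of the original proof; all function names are ours) verifies that no proper nontrivial union of classes has order dividing 60:

```python
from itertools import permutations, combinations

# Build A5 as the even permutations of {0,...,4}; a permutation is a tuple p
# with p[i] = image of i.
def sign(p):
    s, seen = 1, set()
    for i in range(len(p)):
        if i in seen:
            continue
        # a cycle of length L contributes (-1)^(L-1) to the sign
        L, j = 0, i
        while j not in seen:
            seen.add(j)
            j = p[j]
            L += 1
        s *= -1 if L % 2 == 0 else 1
    return s

def compose(p, q):  # apply q first, then p
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

A5 = [p for p in permutations(range(5)) if sign(p) == 1]
assert len(A5) == 60

# Conjugacy classes: orbits of g -> s g s^{-1} for s in A5.
classes, assigned = [], set()
for g in A5:
    if g in assigned:
        continue
    orbit = {compose(compose(s, g), inverse(s)) for s in A5}
    assigned |= orbit
    classes.append(orbit)

sizes = sorted(len(c) for c in classes)
print(sizes)  # [1, 12, 12, 15, 20]

# A normal subgroup is the identity class plus some nontrivial classes,
# with total size dividing 60.  Collect any proper nontrivial candidate.
others = [len(c) for c in classes if len(c) > 1]
candidates = []
for r in range(1, len(others)):
    for combo in combinations(others, r):
        if 60 % (1 + sum(combo)) == 0:
            candidates.append(combo)
print(candidates)  # [] -- only orders 1 and 60 occur, so A5 is simple
```

The class sizes 1, 15, 20, 12, 12 make the divisibility test fail for every candidate union, which is the classical counting proof that A5 is simple.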

327.14

abelian groups of order 120

Here we present an application of the fundamental theorem of finitely generated abelian groups. Example (abelian groups of order 120): Let G be an abelian group of order n = 120. Since the group is finite it is obviously finitely generated, so we can apply the theorem. There exist n1 , n2 , . . . , ns with G ∼ = Z/n1 Z ⊕ Z/n2 Z ⊕ . . . ⊕ Z/ns Z, ∀i, ni ≥ 2;

ni+1 | ni for 1 ≤ i ≤ s − 1

Notice that in the case of a finite group, r, as in the statement of the theorem, must be equal to 0. We have n = 120 = 2^3 · 3 · 5 = n1 · n2 · . . . · ns

and by the divisibility properties of ni we must have that every prime divisor of n must divide n1 . Thus the possibilities for n1 are the following 2 · 3 · 5,

2^2 · 3 · 5,

2^3 · 3 · 5

If n1 = 2^3 · 3 · 5 = 120 then s = 1. In the case that n1 = 2^2 · 3 · 5 = 60 then n2 = 2 and s = 2. It remains to analyze the case n1 = 2 · 3 · 5 = 30. Since n2 must divide n1 = 30 and the product of the remaining ni is 4, the only possibility is n2 = 2, n3 = 2 (with s = 3); n2 = 4 is excluded because 4 does not divide 30. Hence if G is an abelian group of order 120 it must be (up to isomorphism) one of the following: Z/120Z,

Z/60Z ⊕ Z/2Z,

Z/30Z ⊕ Z/2Z ⊕ Z/2Z

Also notice that these are pairwise non-isomorphic. This is because Z/(n · m)Z ∼ = Z/nZ ⊕ Z/mZ ⇔ gcd(n, m) = 1, which is due to the Chinese remainder theorem. Version: 1 Owner: alozano Author(s): alozano
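The admissible chains n1 , n2 , . . . , ns can be enumerated mechanically. A small search (our own illustration; the function name is ours) over factorizations satisfying ni ≥ 2 and ni+1 | ni confirms the count for n = 120:

```python
def invariant_factor_chains(n):
    """All tuples (n1, ..., ns) with n1 * ... * ns = n, each ni >= 2,
    and n_{i+1} dividing n_i -- the possible invariant factor chains."""
    chains = []

    def extend(prefix, remaining):
        if remaining == 1:
            chains.append(tuple(prefix))
            return
        for d in range(2, remaining + 1):
            # d must divide what is left, and divide the previous factor
            if remaining % d == 0 and (not prefix or prefix[-1] % d == 0):
                extend(prefix + [d], remaining // d)

    extend([], n)
    return chains

print(sorted(invariant_factor_chains(120)))
# [(30, 2, 2), (60, 2), (120,)] -- exactly the three groups listed above
```

The search also shows why a chain such as (30, 4) is excluded: 4 does not divide 30, so the recursion never extends 30 by 4.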

327.15

fundamental theorem of finitely generated abelian groups

Theorem 2 (Fundamental Theorem of finitely generated abelian groups). Let G be a finitely generated abelian group. Then there is a unique expression of the form: G∼ = Zr ⊕ Z/n1 Z ⊕ Z/n2 Z ⊕ . . . ⊕ Z/ns Z 1337

for some integers r, ni satisfying: r ≥ 0;

∀i, ni ≥ 2;

ni+1 | ni for 1 ≤ i ≤ s − 1

Version: 1 Owner: bwebste Author(s): alozano

327.16

conjugacy class

Let G be a group, and consider its action on itself given by conjugation, that is, the mapping (g, x) 7→ gxg −1 . Since conjugacy is an equivalence relation, we obtain a partition of G into equivalence classes, called conjugacy classes. So, the conjugacy class of x (denoted Cx or C(x)) is given by Cx = {y ∈ G : y = gxg −1 for some g ∈ G}. Version: 2 Owner: drini Author(s): drini, apmxi
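As a concrete illustration (our own, not part of the original entry), the conjugacy classes of S3 can be computed directly from the definition, representing a permutation as the tuple of its values:

```python
from itertools import permutations

def compose(p, q):  # apply q first, then p
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

S3 = list(permutations(range(3)))

# Conjugacy class of x: the orbit {g x g^{-1} : g in S3}
classes, seen = [], set()
for x in S3:
    if x in seen:
        continue
    cls = {compose(compose(g, x), inverse(g)) for g in S3}
    seen |= cls
    classes.append(cls)

# Three classes: the identity, the three transpositions, the two 3-cycles
print(sorted(len(c) for c in classes))  # [1, 2, 3]
```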

327.17

Frattini subgroup

Let G be a group. The Frattini subgroup Φ(G) of G is the intersection of all maximal subgroups of G. Equivalently, Φ(G) is the subgroup of non-generators of G. Version: 1 Owner: Evandar Author(s): Evandar

327.18

non-generator

Let G be a group. An element g ∈ G is said to be a non-generator if whenever X is a generating set for G, then X \ {g} is also a generating set for G. Version: 1 Owner: Evandar Author(s): Evandar


Chapter 328 20Exx – Structure and classification of infinite or finite groups 328.1

faithful group action

Let A be a G-set, that is, a set on which a group G acts (or operates). The map mg : A → A defined as

mg (x) = ψ(g, x)

where g ∈ G and ψ is the action, is a permutation of A (in other words, a bijective function from A to itself) and so an element of SA . We can even get a homomorphism from G to SA by the rule g 7→ mg . If for every pair g, h ∈ G with g ≠ h we have mg ≠ mh , in other words, if the homomorphism g 7→ mg is injective, we say that the action is faithful. Version: 3 Owner: drini Author(s): drini, apmxi


Chapter 329 20F18 – Nilpotent groups 329.1

classification of finite nilpotent groups

Let G be a finite group. The following are equivalent: 1. G is nilpotent. 2. Every subgroup of G is subnormal. 3. Every proper subgroup of G is properly contained in its normalizer. 4. Every maximal subgroup is normal. 5. Every Sylow subgroup is normal. 6. G is a direct product of p-groups. Version: 1 Owner: bwebste Author(s): bwebste

329.2

nilpotent group

We define the lower central series of a group G to be the filtration of subgroups G = G1 ⊃ G2 ⊃ · · · defined inductively by: G1 := G, Gi := [Gi−1 , G], i > 1,

where [Gi−1 , G] denotes the subgroup of G generated by all commutators of the form hkh−1 k −1 where h ∈ Gi−1 and k ∈ G. The group G is said to be nilpotent if Gi = 1 for some i. Nilpotent groups can also be equivalently defined by means of upper central series. For a group G, the upper central series of G is the filtration of subgroups C1 ⊂ C2 ⊂ · · · defined by setting C1 to be the center of G, and inductively taking Ci to be the unique subgroup of G such that Ci /Ci−1 is the center of G/Ci−1 , for each i > 1. The group G is nilpotent if and only if G = Ci for some i. Nilpotent groups are related to nilpotent Lie algebras in that a Lie group is nilpotent as a group if and only if its corresponding Lie algebra is nilpotent. The analogy extends to solvable groups as well: every nilpotent group is solvable, because the upper central series is a filtration with abelian quotients. Version: 3 Owner: djao Author(s): djao
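The lower central series can be computed directly for a small example. The sketch below (our own illustration, in pure Python) realizes the dihedral group D4 of order 8 as permutations of the square's vertices and iterates Gi = [Gi−1 , G] until it stabilizes:

```python
def compose(p, q):  # apply q first, then p
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def generated(gens, identity):
    """Subgroup generated by gens: close under composition."""
    elems = {identity} | set(gens)
    while True:
        new = {compose(a, b) for a in elems for b in elems} - elems
        if not new:
            return elems
        elems |= new

# D4 acting on the square's vertices 0,1,2,3
r = (1, 2, 3, 0)   # rotation by 90 degrees
s = (0, 3, 2, 1)   # reflection across the 0-2 diagonal
e = (0, 1, 2, 3)
G = generated([r, s], e)
assert len(G) == 8

def commutator_with_G(H, G):
    """Subgroup generated by [h, g] = h g h^-1 g^-1 for h in H, g in G."""
    comms = [compose(compose(h, g), compose(inverse(h), inverse(g)))
             for h in H for g in G]
    return generated(comms, e)

# Lower central series: G1 = G, G_{i+1} = [G_i, G]
series, H = [G], G
while len(H) > 1:
    H = commutator_with_G(H, G)
    series.append(H)

print([len(H) for H in series])  # [8, 2, 1] -- reaches 1, so D4 is nilpotent
```

Here [D4 , D4 ] is the two-element center {e, r²}, and one more step kills it, so D4 is nilpotent of class 2.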


Chapter 330 20F22 – Other classes of groups defined by subgroup chains 330.1

inverse limit

Let {Gi }∞ i=0 be a sequence of groups which are related by a chain of surjective homomorphisms fi : Gi → Gi−1 :

G0 ← G1 ← G2 ← G3 ← · · ·

where the arrows are f1 , f2 , f3 , f4 , . . .

Definition 28. The inverse limit of (Gi , fi ), denoted by lim←− (Gi , fi ), or lim←− Gi , is the subset of ∏∞ i=0 Gi formed by the elements (g0 , g1 , g2 , g3 , . . .), with gi ∈ Gi , satisfying

fi (gi ) = gi−1

Note: The inverse limit of Gi can be checked to be a subgroup of the product ∏∞ i=0 Gi . See below for a more general definition.

Examples: 1. Let p ∈ N be a prime. Let G0 = {0} and Gi = Z/pi Z. Define the connecting homomorphisms fi , for i ≥ 2, to be “reduction modulo pi−1 ”, i.e. fi : Z/pi Z → Z/pi−1 Z

fi (x mod pi ) = x mod pi−1 , which are obviously surjective homomorphisms. The inverse limit of (Z/pi Z, fi ) is called the p-adic integers and denoted by Zp = lim←− Z/pi Z

2. Let E be an elliptic curve defined over C. Let p be a prime and for any natural number n write E[n] for the n-torsion group, i.e. E[n] = {Q ∈ E | n · Q = O}. In this case we define Gi = E[pi ], and fi : E[pi ] → E[pi−1 ], fi (Q) = p · Q. The inverse limit of (E[pi ], fi ) is called the Tate module of E and denoted Tp (E) = lim←− E[pi ].

The concept of inverse limit can be defined in far more generality. Let (S, ≤) be a directed set and let C be a category. Let {Gα }α∈S be a collection of objects in the category C and let {fα,β : Gβ → Gα | α, β ∈ S, α ≤ β} be a collection of morphisms satisfying: 1. For all α ∈ S, fα,α = IdGα , the identity morphism. 2. For all α, β, γ ∈ S such that α ≤ β ≤ γ, we have fα,γ = fα,β ◦ fβ,γ (composition of morphisms). Definition 29. The inverse limit of ({Gα }α∈S , {fα,β }), denoted by lim←− (Gα , fα,β ), or lim←− Gα , is defined to be the set of all (gα ) ∈ ∏ α∈S Gα such that for all α, β ∈ S, α ≤ β ⇒ fα,β (gβ ) = gα

For a good example of this more general construction, see infinite Galois theory. Version: 6 Owner: alozano Author(s): alozano
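Example 1 above can be made concrete: an element of Zp is a compatible sequence (gi ) with gi ∈ Z/pi Z and fi (gi ) = gi−1 . The sketch below (our own illustration) builds the first few components of a square root of 2 in Z7 by Hensel lifting, then checks the compatibility condition:

```python
# A 7-adic square root of 2 as a compatible sequence:
# comps[i-1] is the component in Z/7^i Z, and reducing comps[i] modulo 7^i
# must give comps[i-1] (this is exactly f_i(g_i) = g_{i-1}).
p = 7
comps = [3]  # 3^2 = 9 = 2 (mod 7), so start with a root modulo 7
for i in range(2, 8):
    prev = comps[-1]
    # lift: find x with x = prev (mod p^(i-1)) and x^2 = 2 (mod p^i)
    for k in range(p):
        x = prev + k * p ** (i - 1)
        if (x * x - 2) % p ** i == 0:
            comps.append(x)
            break

# compatibility: each component reduces to the previous one
for i in range(1, len(comps)):
    assert comps[i] % p ** i == comps[i - 1]

print(comps[:3])  # [3, 10, 108] -- successive approximations of sqrt(2) in Z_7
```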


Chapter 331 20F28 – Automorphism groups of groups 331.1

outer automorphism group

The outer automorphism group of a group is the quotient of its automorphism group by its inner automorphism group: Out(G) = Aut(G)/Inn(G). Version: 7 Owner: Thomas Heye Author(s): yark, apmxi


Chapter 332 20F36 – Braid groups; Artin groups 332.1

braid group

Consider two sets of n points in C2 , of the form (1, 0), . . . , (n, 0), and of the form (1, 1), . . . , (n, 1). We connect these two sets of points via a series of paths fi : I → C2 , such that fi (t) ≠ fj (t) for i ≠ j and any t ∈ [0, 1]. Also, each fi may only intersect the planes (z, 0) and (z, 1) for t = 0 and 1 respectively. Thus, the picture looks like a bunch of strings connecting the two sets of points, but possibly tangled. The path f = (f1 , . . . , fn ) determines a homotopy class f, where we require homotopies to satisfy the same conditions on the fi . Such a homotopy class f is called a braid on n strands. We can obtain a group structure on the set of braids on n strands as follows. Multiplication of two braids f, g is done by simply following f first, then g, but doing each twice as fast. That is, f · g is the homotopy class of the path fg(t) = f (2t) if 0 ≤ t ≤ 1/2 and fg(t) = g(2t − 1) if 1/2 ≤ t ≤ 1, where f and g are representatives for f and g respectively. Inverses are done by following the same strand backwards, and the identity element is the braid represented by straight lines down. The result is known as the braid group on n strands; it is denoted by Bn . The braid group determines a homomorphism φ : Bn → Sn , where Sn is the symmetric group on n letters. For f ∈ Bn , we get an element of Sn from the map sending i 7→ p1 (fi (1)), where f is a representative of the homotopy class f, and p1 is the projection onto the first factor. This works because of our requirement on the points where the braids start and end, and since our homotopies fix basepoints. The kernel of φ consists of the braids that bring each strand to its original position. This kernel gives us the pure braid group on n strands, and is denoted by Pn . Hence, we have a short exact sequence 1 → Pn → Bn → Sn → 1.
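The homomorphism φ sends the elementary twist of strands i and i + 1 to the transposition (i, i + 1). As a sanity check, one can verify numerically that these images satisfy the braid relations; the sketch below is our own illustration:

```python
# phi: B_n -> S_n sends sigma_i to the adjacent transposition (i, i+1).
# Verify that these transpositions satisfy the braid relations in S_n.
n = 5

def transposition(i, n):
    p = list(range(n))
    p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)

def compose(p, q):  # apply q first, then p
    return tuple(p[q[k]] for k in range(len(p)))

sigma = [transposition(i, n) for i in range(n - 1)]

# far commutation: sigma_i sigma_j = sigma_j sigma_i when |i - j| >= 2
for i in range(n - 1):
    for j in range(n - 1):
        if abs(i - j) >= 2:
            assert compose(sigma[i], sigma[j]) == compose(sigma[j], sigma[i])

# braid relation: sigma_i sigma_{i+1} sigma_i = sigma_{i+1} sigma_i sigma_{i+1}
for i in range(n - 2):
    lhs = compose(sigma[i], compose(sigma[i + 1], sigma[i]))
    rhs = compose(sigma[i + 1], compose(sigma[i], sigma[i + 1]))
    assert lhs == rhs

print("braid relations hold in S_%d" % n)
```

Of course the transpositions additionally satisfy σi² = 1, which braids do not; this extra relation is exactly why φ has the nontrivial kernel Pn .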

We can also describe braid groups as certain fundamental groups, and in more generality. Let M be a manifold. The configuration space of n ordered points on M is defined to be Fn (M) = {(a1 , . . . , an ) ∈ M n | ai ≠ aj for i ≠ j}. The group Sn acts on Fn (M) by permuting coordinates, and the corresponding quotient space Cn (M) = Fn (M)/Sn is called the configuration space of n unordered points on M. In the case that M = C, we obtain the regular and pure braid groups as π1 (Cn (M)) and π1 (Fn (M)) respectively. The group Bn can be given the following presentation. The presentation was given in Artin’s first paper [1] on the braid group. Label the braids 1 through n as before. Let σi be the braid that twists strands i and i + 1, with i passing beneath i + 1. Then the σi generate Bn , and the only relations needed are σi σj = σj σi for |i − j| ≥ 2, 1 ≤ i, j ≤ n − 1, and σi σi+1 σi = σi+1 σi σi+1 for 1 ≤ i ≤ n − 2. The pure braid group has a presentation with generators aij = σj−1 σj−2 · · · σi+1 σi ² σi+1 ⁻¹ · · · σj−2 ⁻¹ σj−1 ⁻¹ for 1 ≤ i < j ≤ n

and defining relations (for r < s):

ars ⁻¹ aij ars = aij if i < r < s < j or r < s < i < j
ars ⁻¹ aij ars = arj aij arj ⁻¹ if r < s = i < j
ars ⁻¹ aij ars = arj asj aij asj ⁻¹ arj ⁻¹ if r = i < s < j
ars ⁻¹ aij ars = arj asj arj ⁻¹ asj ⁻¹ aij asj arj asj ⁻¹ arj ⁻¹ if r < i < s < j
REFERENCES 1. E. Artin, Theorie der Zöpfe. Abh. Math. Sem. Univ. Hamburg 4 (1925), 42–72. 2. V.L. Hansen, Braids and Coverings. London Mathematical Society Student Texts 18, Cambridge University Press, 1989.

Version: 7 Owner: dublisk Author(s): dublisk


Chapter 333 20F55 – Reflection and Coxeter groups 333.1

cycle

Let S be a set. A cycle is a permutation (bijective function of a set onto itself) such that there exist distinct elements a1 , a2 , . . . , ak of S such that f (ai ) = ai+1

and

f (ak ) = a1

that is f (a1 ) = a2 , f (a2 ) = a3 , . . . , f (ak ) = a1 , and f (x) = x for any other element of S. This can also be pictured as a1 7→ a2 7→ a3 7→ · · · 7→ ak 7→ a1 and x 7→ x

for any other element x ∈ S, where 7→ represents the action of f . One of the basic results on symmetric groups says that any permutation can be expressed as a product of disjoint cycles. Version: 6 Owner: drini Author(s): drini
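That basic result is easy to make algorithmic. The following sketch (our own illustration) extracts the disjoint cycles of a permutation given as a tuple p with p[i] = f (i):

```python
def cycle_decomposition(p):
    """Disjoint cycles of the permutation p (a tuple with p[i] = f(i)),
    omitting fixed points."""
    cycles, seen = [], set()
    for i in range(len(p)):
        if i in seen or p[i] == i:
            seen.add(i)
            continue
        cyc, j = [], i
        while j not in seen:  # follow i -> f(i) -> f(f(i)) -> ... until it closes
            seen.add(j)
            cyc.append(j)
            j = p[j]
        cycles.append(tuple(cyc))
    return cycles

# f = (0 1)(2 3 4) on {0,...,4}
f = (1, 0, 3, 4, 2)
print(cycle_decomposition(f))  # [(0, 1), (2, 3, 4)]
```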

333.2

dihedral group

The nth dihedral group, Dn is the symmetry group of the regular n-sided polygon. The group consists of n reflections, n − 1 rotations, and the identity transformation. Letting ω = exp(2πi/n) denote a primitive nth root of unity, and assuming the polygon is centered at the origin, the rotations Rk , k = 0, . . . , n − 1 (Note: R0 denotes the identity) are given by Rk : z 7→ ω k z, z ∈ C, and the reflections Mk , k = 0, . . . , n − 1 by

Mk : z 7→ ω k z¯,

z∈C

The abstract group structure is given by Rk Rl = Rk+l , Mk Ml = Rk−l ,

Rk Ml = Mk+l , Mk Rl = Mk−l ,

where the addition and subtraction is carried out modulo n. The group can also be described in terms of generators and relations as (M0 )² = (M1 )² = (M1 M0 )ⁿ = id. This means that Dn is a rank-2 Coxeter group. Since the group acts by linear transformations (x, y) → (x̂, ŷ),

(x, y) ∈ R2

there is a corresponding action on polynomials p → p̂, defined by p̂(x̂, ŷ) = p(x, y),

p ∈ R[x, y].

The polynomials left invariant by all the group transformations form an algebra. This algebra is freely generated by the following two basic invariants: x² + y² and xⁿ − (n choose 2) xⁿ⁻² y² + . . . , the latter polynomial being the real part of (x + iy)ⁿ . It is easy to check that these two polynomials are invariant. The first polynomial describes the distance of a point from the origin, and this is unaltered by Euclidean reflections through the origin. The second polynomial is unaltered by a rotation through 2π/n radians, and is also invariant with respect to complex conjugation. These two transformations generate the nth dihedral group. Showing that these two invariants polynomially generate the full algebra of invariants is somewhat trickier, and is best done as an application of Chevalley’s theorem regarding the invariants of a finite reflection group. Version: 8 Owner: rmilson Author(s): rmilson
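The invariance of the two basic polynomials can be spot-checked numerically. The sketch below (our own illustration; the sample points are arbitrary) verifies both invariants for every element of D5 , identifying a point (x, y) with the complex number z = x + iy:

```python
import cmath
import math

n = 5
w = cmath.exp(2j * math.pi / n)  # primitive nth root of unity

def R(k):  # rotation z -> w^k z
    return lambda z: w ** k * z

def M(k):  # reflection z -> w^k conj(z)
    return lambda z: w ** k * z.conjugate()

def inv1(z):  # x^2 + y^2
    return abs(z) ** 2

def inv2(z):  # real part of (x + iy)^n
    return (z ** n).real

# check invariance at a few sample points, for every group element
samples = [0.3 + 0.7j, -1.2 + 0.4j, 2.0 - 1.5j]
for k in range(n):
    for g in (R(k), M(k)):
        for z in samples:
            assert abs(inv1(g(z)) - inv1(z)) < 1e-9
            assert abs(inv2(g(z)) - inv2(z)) < 1e-9

print("both polynomials are D_%d-invariant" % n)
```

The check mirrors the argument in the text: |z|² is rotation- and reflection-invariant, while (w^k z)ⁿ = zⁿ and Re(z̄ⁿ) = Re(zⁿ) make the second invariant immune to both kinds of transformation.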

Chapter 334 20F65 – Geometric group theory 334.1

groups that act freely on trees are free

Let X be a tree, and Γ a group acting freely and faithfully by graph automorphisms on X. Then Γ is a free group. Since Γ acts freely on X, the quotient graph X/Γ is well-defined, and X is the universal cover of X/Γ since X is contractible. Thus Γ ∼ = π1 (X/Γ). Since any graph is homotopy equivalent to a wedge of circles, and the fundamental group of such a space is free by Van Kampen’s theorem, Γ is free.

Version: 3 Owner: bwebste Author(s): bwebste


Chapter 335 20F99 – Miscellaneous 335.1

perfect group

A group G is called perfect if G = [G, G], where [G, G] is the derived subgroup of G, or equivalently, if the abelianization of G is trivial. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 336 20G15 – Linear algebraic groups over arbitrary fields 336.1

Nagao’s theorem

For any integral domain k, the group of n×n invertible matrices with coefficients in k[t] is the amalgamated free product of invertible matrices over k and invertible upper triangular matrices over k[t], amalgamated over the upper triangular matrices of k. More compactly GLn (k[t]) ∼ = GLn (k) ∗B(k) B(k[t]). Version: 3 Owner: bwebste Author(s): bwebste

336.2

computation of the order of GL(n, Fq )

GL(n, Fq ) is the group of n × n matrices over a finite field Fq with non-zero determinant. Here is a proof that |GL(n, Fq )| = (q n − 1)(q n − q) · · · (q n − q n−1 ). Each element A ∈ GL(n, Fq ) is given by a collection of n linearly independent column vectors over Fq . If one chooses the first column vector of A from (Fq )n there are q n choices, but one can’t choose the zero vector since this would make the determinant of A zero. So there are really only q n − 1 choices. To choose an i-th vector from (Fq )n which is linearly independent from the i − 1 already chosen linearly independent vectors {V1 , · · · , Vi−1 }, one must choose a vector not in the span of {V1 , · · · , Vi−1 }. There are q i−1 vectors in this span, so the number of choices is clearly q n − q i−1 . Thus the number of linearly independent collections of n vectors in (Fq )n is (q n − 1)(q n − q) · · · (q n − q n−1 ). Version: 5 Owner: benjaminfjones Author(s): benjaminfjones
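The counting argument can be cross-checked by brute force for small parameters (our own illustration; the exhaustive count is feasible only for tiny n and q):

```python
from itertools import product

def order_gl_formula(n, q):
    """(q^n - 1)(q^n - q) ... (q^n - q^(n-1))"""
    out = 1
    for i in range(n):
        out *= q ** n - q ** i
    return out

def order_gl_bruteforce(n, q):
    """Count n x n matrices over Z/qZ (q prime) with nonzero determinant.
    The explicit determinant below is for the 2 x 2 case only."""
    assert n == 2
    count = 0
    for a, b, c, d in product(range(q), repeat=4):
        if (a * d - b * c) % q != 0:
            count += 1
    return count

q = 3
print(order_gl_formula(2, q), order_gl_bruteforce(2, q))  # 48 48
```

For example |GL(2, F3 )| = (9 − 1)(9 − 3) = 48, and the formula also gives the familiar |GL(3, F2 )| = 7 · 6 · 4 = 168.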

336.3

general linear group

Given a vector space V , the general linear group GL( V ) is defined to be the group of invertible linear transformations from V to V . The group operation is defined by composition: given T : V −→ V and T 0 : V −→ V in GL( V ), the product T T 0 is just the composition of the maps T and T 0 . If V = Fn for some field F, then the group GL(V ) is often denoted GL(n, F) or GLn (F). In this case, if one identifies each linear transformation T : V −→ V with its matrix with respect to the standard basis, the group GL(n, F) becomes the group of invertible n × n matrices with entries in F, under the group operation of matrix multiplication. Version: 3 Owner: djao Author(s): djao

336.4

order of the general linear group over a finite field

GL(n, Fq ) is a finite group when Fq is a finite field with q elements. Furthermore, |GL(n, Fq )| = (q n − 1)(q n − q) · · · (q n − q n−1 ). Version: 16 Owner: benjaminfjones Author(s): benjaminfjones

336.5

special linear group

Given a vector space V , the special linear group SL(V ) is defined to be the subgroup of the general linear group GL(V ) consisting of all invertible linear transformations T : V −→ V in GL(V ) that have determinant 1. If V = Fn for some field F, then the group SL(V ) is often denoted SL(n, F) or SLn (F), and if one identifies each linear transformation with its matrix with respect to the standard basis, then SL(n, F) consists of all n × n matrices with entries in F that have determinant 1. Version: 2 Owner: djao Author(s): djao


Chapter 337 20G20 – Linear algebraic groups over the reals, the complexes, the quaternions 337.1

orthogonal group

Let Q be a non-degenerate symmetric bilinear form on the real vector space V = Rn . A linear transformation T : V −→ V is said to preserve Q if Q(T x, T y) = Q(x, y) for all vectors x, y ∈ V . The subgroup of the general linear group GL(V ) consisting of all linear transformations that preserve Q is called the orthogonal group with respect to Q, and denoted O(n, Q). If Q is also positive definite (i.e., Q is an inner product), then O(n, Q) is equivalent to the group of invertible linear transformations that preserve the standard inner product on Rn , and in this case it is usually denoted O(n). One can show that a transformation T is in O(n) if and only if T −1 = T T (the inverse of T equals the transpose of T ). Version: 2 Owner: djao Author(s): djao


Chapter 338 20G25 – Linear algebraic groups over local fields and their integers 338.1

Ihara’s theorem

Let Γ be a discrete, torsion-free subgroup of SL2 (Qp ) (where Qp is the field of p-adic numbers). Then Γ is free. [Proof, or a sketch thereof] There exists a (p + 1)-regular tree X on which SL2 (Qp ) acts, with a vertex stabilizer SL2 (Zp ) (here, Zp denotes the ring of p-adic integers). Since Zp is compact in its profinite topology, so is SL2 (Zp ). Thus, SL2 (Zp ) ∩ Γ must be compact, discrete and torsion-free. Since compact and discrete implies finite, the only such group is trivial. Thus, Γ acts freely on X. Since groups acting freely on trees are free, Γ is free.

Version: 6 Owner: bwebste Author(s): bwebste


Chapter 339 20G40 – Linear algebraic groups over finite fields 339.1

SL2(F3)

The special linear group over the finite field F3 is represented by SL2 (F3 ) and consists of the 2 × 2 invertible matrices with determinant equal to 1 and whose entries belong to F3 . Version: 6 Owner: drini Author(s): drini, apmxi


Chapter 340 20J06 – Cohomology of groups 340.1

group cohomology

Let G be a group and let M be a (left) G-module. The 0th cohomology group of the G-module M is H 0 (G, M) = {m ∈ M : ∀σ ∈ G, σm = m} which is the set of elements of M which are G-invariant, also denoted by M G .

A map φ : G → M is said to be a crossed homomorphism (or 1-cocycle) if φ(αβ) = φ(α) + αφ(β) for all α, β ∈ G. If we fix m ∈ M, the map ρ : G → M defined by ρ(α) = αm − m is clearly a crossed homomorphism, said to be principal (or 1-coboundary). We define the following groups: Z 1 (G, M) = {φ : G → M : φ is a 1-cocycle} B 1 (G, M) = {ρ : G → M : ρ is a 1-coboundary}

Finally, the 1st cohomology group of the G-module M is defined to be the quotient group: H 1 (G, M) = Z 1 (G, M)/B 1 (G, M). The following proposition is very useful when trying to compute cohomology groups: Proposition 1. Let G be a group and let A, B, C be G-modules related by an exact sequence: 0 → A → B → C → 0. Then there is a long exact sequence in cohomology: 0 → H 0 (G, A) → H 0 (G, B) → H 0 (G, C) → H 1 (G, A) → H 1 (G, B) → H 1 (G, C)

In general, the cohomology groups H n (G, M) can be defined as follows: Definition 30. Define C 0 (G, M) = M and for n ≥ 1 define the additive group: C n (G, M) = {φ : Gn → M}. The elements of C n (G, M) are called n-cochains. Also, for n ≥ 0 define the nth coboundary homomorphism dn : C n (G, M) → C n+1 (G, M):

dn (f )(g1 , . . . , gn+1 ) = g1 · f (g2 , . . . , gn+1 ) + Σni=1 (−1)i f (g1 , . . . , gi−1 , gi gi+1 , gi+2 , . . . , gn+1 ) + (−1)n+1 f (g1 , . . . , gn )

Let Z n (G, M) = ker dn for n ≥ 0, the set of n-cocycles. Also, let B 0 (G, M) = 0 and for n ≥ 1 let B n (G, M) = image dn−1 , the set of n-coboundaries. Finally we define the nth cohomology group of G with coefficients in M to be H n (G, M) = Z n (G, M)/B n (G, M)
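The low-degree definitions can be exercised on a tiny example (our own illustration): take G = Z/2Z acting on M = Z/4Z by negation, enumerate all 1-cocycles and 1-coboundaries by brute force, and read off |H 1 (G, M)| = 2:

```python
from itertools import product

# G = Z/2 acting on M = Z/4 by g.x = x if g == 0, and g.x = -x if g == 1
Gr = [0, 1]
n = 4

def act(g, x):
    return x % n if g == 0 else (-x) % n

# 1-cocycles: functions phi with phi(g + h) = phi(g) + g.phi(h)
cocycles = []
for vals in product(range(n), repeat=2):
    phi = dict(zip(Gr, vals))
    if all(phi[(g + h) % 2] == (phi[g] + act(g, phi[h])) % n
           for g in Gr for h in Gr):
        cocycles.append(tuple(sorted(phi.items())))

# 1-coboundaries: rho_m(g) = g.m - m for some m in M
coboundaries = {tuple(sorted((g, (act(g, m) - m) % n) for g in Gr))
                for m in range(n)}

print(len(cocycles), len(coboundaries))       # 4 2
print(len(cocycles) // len(coboundaries))     # |H^1(G, M)| = 2
```

Every cocycle is pinned down by φ(1) ∈ Z/4Z (four choices), while the coboundaries realize only the even values, so the quotient has order 2.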

REFERENCES 1. J.P. Serre, Galois Cohomology, Springer-Verlag, New York. 2. James Milne, Elliptic Curves, online course notes. 3. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.

Version: 4 Owner: alozano Author(s): alozano

340.2

stronger Hilbert theorem 90

Let K be a field and let K̄ be an algebraic closure of K. By K̄ + we denote the abelian group (K̄, +) and similarly K̄ ∗ = (K̄, ∗) (here the operation is multiplication). Also we let GK̄/K = Gal(K̄/K) be the absolute Galois group of K. Theorem 3 (Hilbert 90). Let K be a field.

1. H 1 (GK̄/K , K̄ + ) = 0

2. H 1 (GK̄/K , K̄ ∗ ) = 0

3. If char(K), the characteristic of K, does not divide m (or char(K) = 0) then H 1 (GK̄/K , µm ) ∼ = K ∗ /K ∗m , where µm denotes the set of all mth roots of unity.

REFERENCES 1. J.P. Serre, Galois Cohomology, Springer-Verlag, New York. 2. J.P. Serre, Local Fields, Springer-Verlag, New York.

Version: 2 Owner: alozano Author(s): alozano


Chapter 341 20J15 – Category of groups 341.1

variety of groups

A variety of groups is the class of groups G such that all elements x1 , . . . , xn ∈ G satisfy a set of equationally defined relations ri (x1 , . . . , xn ) = 1 ∀i ∈ I, where I is an index set. For example, abelian groups are a variety defined by the equations {[x1 , x2 ] = 1}, where [x, y] = xyx−1 y −1 . Nilpotent groups of class < c are a variety defined by {[[· · · [[x1 , x2 ], x3 ] · · · ], xc ] = 1}. Analogously, solvable groups of length < c are a variety. Abelian groups are a special case of both of these. Groups of exponent n are a variety, defined by {x1 ^n = 1}. A variety of groups is a full subcategory of the category of groups, and there is a free group on any set of elements in the variety, which is the usual free group modulo the relations of the variety applied to all elements. This satisfies the usual universal property of the free group with respect to groups in the variety, and the construction is thus adjoint to the forgetful functor to the category of sets. In the variety of abelian groups, we get back the usual free abelian groups. In the variety of groups of exponent n, we get the Burnside groups. Version: 1 Owner: bwebste Author(s): bwebste

Chapter 342 20K01 – Finite abelian groups 342.1

Schinzel’s theorem

Let a ∈ Q, not zero or 1 or −1. For any prime p which does not divide the numerator or denominator of a in reduced form, a can be viewed as an element of the multiplicative group Z/pZ. Let np be the order of this element in the multiplicative group. Then the set of np over all such primes has finite complement in the set of positive integers. One can generalize this as follows: Similarly, if K is a number field, choose a not zero or a root of unity in K. Then for any finite place (discrete valuation) p with vp (a) = 0, we can view a as an element of the residue field at p, and take the order np of this element in the multiplicative group. Then the set of np over all such primes has finite complement in the set of positive integers. Silverman also generalized this to elliptic curves over number fields. References to come soon. Version: 4 Owner: mathcam Author(s): Manoj, nerdy2


Chapter 343 20K10 – Torsion groups, primary groups and generalized primary groups 343.1

torsion

Definition 31. The torsion of a group G is the set Tor(G) = {g ∈ G : g n = e for some n ∈ N}. Definition 32. A group is said to be torsion-free if Tor(G) = {e}, i.e. the torsion consists only of the identity element. Definition 33. If G is abelian then Tor(G) is a subgroup (the torsion group) of G. Example 18 (Torsion of a cyclic group). For a cyclic group Zp , Tor(Zp ) = Zp . In general, if G is a finite group then Tor(G) = G. Version: 2 Owner: mhale Author(s): mhale


Chapter 344 20K25 – Direct sums, direct products, etc. 344.1

direct product of groups

The external direct product G × H of two groups G and H is defined to be the set of ordered pairs (g, h), with g ∈ G and h ∈ H. The group operation is defined by (g, h)(g 0, h0 ) = (gg 0, hh0 ) It can be shown that G × H obeys the group axioms. More generally, we can define the external direct product of n groups, in the obvious way. Let G = G1 × . . . × Gn be the set of all ordered n-tuples {(g1 , g2 . . . , gn ) | gi ∈ Gi } and define the group operation by componentwise multiplication as before. Version: 4 Owner: vitriol Author(s): vitriol


Chapter 345 20K99 – Miscellaneous 345.1

Klein 4-group

The Klein 4-group is the subgroup V (Vierergruppe) of S4 (see symmetric group) consisting of the following 4 permutations: (), (12)(34), (13)(24), (14)(23) (see cycle notation). This is an abelian group, isomorphic to the product Z/2Z × Z/2Z. The group is named after Felix Klein, a pioneering figure in the field of geometric group theory. The Klein 4-group enjoys a number of interesting properties, some of which are listed below. 1. It is the automorphism group of the graph consisting of two disjoint edges. 2. It is the unique 4 element group with the property that every element squares to the identity. 3. It is the symmetry group of a planar ellipse. 4. Consider the action of S4 , the permutation group of 4 elements, on the set of partitions of the four elements into two pairs. There are 3 such partitions, which we denote by (12, 34), (13, 24), (14, 23). The action of S4 on these partitions induces a homomorphism from S4 to S3 ; the kernel is the Klein 4-group. This homomorphism is quite exceptional, and corresponds to the fact that A4 (the alternating group) is not a simple group (notice that V is actually a subgroup of A4 ). Every alternating group An with n ≥ 5 is simple. 5. A more geometric way to see the above is the following: S4 is the group of symmetries of a tetrahedron. There is an induced action of S4 on the six edges of the tetrahedron. Observing that this action preserves incidence relations, one gets an action of S4 on the three pairs of opposite edges (see figure).

6. It is the symmetry group of the Riemannian curvature tensor.

[Figure: a tetrahedron with vertices labelled 1, 2, 3, 4.]
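The homomorphism in item 4 can be checked by direct computation: the sketch below (our own illustration) finds the kernel of the S4 -action on the three pair-partitions, recovering the Klein 4-group:

```python
from itertools import permutations

# The three partitions of {0,1,2,3} into two pairs, each a frozenset of pairs.
def pairing(a, b, c, d):
    return frozenset({frozenset({a, b}), frozenset({c, d})})

partitions = [pairing(0, 1, 2, 3), pairing(0, 2, 1, 3), pairing(0, 3, 1, 2)]

def act(p, part):  # apply the permutation p (a tuple) to a partition
    return frozenset(frozenset(p[i] for i in pair) for pair in part)

S4 = list(permutations(range(4)))
kernel = [p for p in S4 if all(act(p, part) == part for part in partitions)]

print(sorted(kernel))
# [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]
# i.e. (), (01)(23), (02)(13), (03)(12) -- the Klein 4-group
```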

Version: 7 Owner: rmilson Author(s): Dr Absentius, rmilson, imran

345.2

divisible group

An abelian group D is said to be divisible if for any x ∈ D, n ∈ Z+ , there exists an element x0 ∈ D such that nx0 = x. Some noteworthy facts: • An abelian group is injective (as a Z-module) if and only if it is divisible. • Every group is isomorphic to a subgroup of a divisible group. • Any divisible abelian group is isomorphic to the direct sum of its torsion subgroup and n copies of the group of rationals (for some cardinal number n). Version: 4 Owner: mathcam Author(s): mathcam

345.3

example of divisible group

Let G denote the group of rational numbers, taking the operation to be addition. Then for any p/q ∈ G and n ∈ Z+ , we have p/(nq) ∈ G satisfying n · (p/(nq)) = p/q, so the group is divisible. Version: 1 Owner: mathcam Author(s): mathcam

345.4

locally cyclic group

A locally cyclic (or generalized cyclic) group is a group in which any pair of elements generates a cyclic subgroup.

Every locally cyclic group is abelian. If G is a locally cyclic group, then every finite subset of G generates a cyclic subgroup. Therefore, the only finitely-generated locally cyclic groups are the cyclic groups themselves. The group (Q, +) is an example of a locally cyclic group that is not cyclic. Subgroups and quotients of locally cyclic groups are also locally cyclic. A group is locally cyclic if and only if its lattice of subgroups is distributive. Version: 10 Owner: yark Author(s): yark


Chapter 346 20Kxx – Abelian groups 346.1

abelian group

Let (G, ∗) be a group. If for any a, b ∈ G we have a ∗ b = b ∗ a, we say that the group is abelian. Sometimes the expression commutative group is used, but this is less frequent. Abelian groups hold several interesting properties. Theorem 4. If ϕ : G → G defined by ϕ(x) = x² is a homomorphism, then G is abelian. Proof. If such a function were a homomorphism, we would have (xy)² = ϕ(xy) = ϕ(x)ϕ(y) = x²y², that is, xyxy = xxyy. Left-multiplying by x−1 and right-multiplying by y −1 we are led to yx = xy and thus the group is abelian. QED Theorem 5. Any subgroup of an abelian group is normal. Proof. Let H be a subgroup of the abelian group G. Since ah = ha for any a ∈ G and any h ∈ H we get aH = Ha. That is, H is normal in G. QED Theorem 6. Quotient groups of abelian groups are also abelian. Proof. Let H be a subgroup of G. Since G is abelian, H is normal and we can form the quotient group G/H whose elements are the equivalence classes for a ∼ b if ab−1 ∈ H. The operation on the quotient group is given by aH · bH = (ab)H. But bH · aH = (ba)H = (ab)H, therefore the quotient group is also commutative. QED Version: 12 Owner: drini Author(s): drini, yark, akrowne, apmxi

Chapter 347 20M10 – General structure theory 347.1

existence of maximal semilattice decomposition

Let S be a semigroup. A maximal semilattice decomposition for S is a surjective homomorphism φ : S → Γ onto a semilattice Γ with the property that any other semilattice decomposition factors through φ. So if φ′ : S → Γ′ is any other semilattice decomposition of S, then there is a homomorphism Γ → Γ′ such that φ′ is the composite S → Γ → Γ′.

Proposition 14. Every semigroup has a maximal semilattice decomposition.

Recall that each semilattice decomposition determines a semilattice congruence. If {ρᵢ | i ∈ I} is the family of all semilattice congruences on S, then define ρ = ⋂_{i∈I} ρᵢ. (Here, we consider the congruences as subsets of S × S, and take their intersection as sets.)

It is easy to see that ρ is also a semilattice congruence, which is contained in all other semilattice congruences. Therefore each of the homomorphisms S → S/ρᵢ factors through S → S/ρ. Version: 2 Owner: mclase Author(s): mclase


347.2

semilattice decomposition of a semigroup

A semigroup S has a semilattice decomposition if we can write S = ⋃_{γ∈Γ} Sγ as a disjoint union of subsemigroups, indexed by elements of a semilattice Γ, with the additional condition that x ∈ Sα and y ∈ Sβ implies xy ∈ Sαβ. Semilattice decompositions arise from homomorphisms of semigroups onto semilattices. If φ : S → Γ is a surjective homomorphism, then it is easy to see that we get a semilattice decomposition by putting Sγ = φ⁻¹(γ) for each γ ∈ Γ. Conversely, every semilattice decomposition defines a map from S to the indexing set Γ which is easily seen to be a homomorphism. A third way to look at semilattice decompositions is to consider the congruence ρ defined by the homomorphism φ : S → Γ. Because Γ is a semilattice, φ(x²) = φ(x) for all x, and so ρ satisfies the constraint that x ρ x² for all x ∈ S. Also, φ(xy) = φ(yx), so that xy ρ yx for all x, y ∈ S. A congruence ρ which satisfies these two conditions is called a semilattice congruence. Conversely, a semilattice congruence ρ on S gives rise to a homomorphism from S to a semilattice S/ρ. The ρ-classes are the components of the decomposition. Version: 3 Owner: mclase Author(s): mclase

347.3

simple semigroup

Let S be a semigroup. If S has no ideals other than itself, then S is said to be simple. If S has no left ideals [resp. right ideals] other than itself, then S is said to be left simple [resp. right simple]. Right simple and left simple are stronger conditions than simple. A semigroup S is left simple if and only if Sa = S for all a ∈ S. A semigroup is both left and right simple if and only if it is a group. If S has a zero element θ, then 0 = {θ} is always an ideal of S, so S is not simple (unless it has only one element). So in studying semigroups with a zero, a slightly weaker definition is required. Let S be a semigroup with a zero. Then S is zero simple, or 0-simple, if the following conditions hold: • S² ≠ 0 • S has no ideals except 0 and S itself

The condition S² ≠ 0 really only eliminates one semigroup: the 2-element null semigroup. Excluding this semigroup makes parts of the structure theory of semigroups cleaner. Version: 1 Owner: mclase Author(s): mclase


Chapter 348 20M12 – Ideal theory 348.1

Rees factor

Let I be an ideal of a semigroup S. Define a congruence ∼ by x ∼ y iff x = y or x, y ∈ I. Then the Rees factor of S by I is the quotient S/∼. As a matter of notation, the congruence ∼ is normally suppressed, and the quotient is simply written S/I. Note that a Rees factor always has a zero element. Intuitively, the quotient identifies all the elements in I, and the resulting element is a zero element. Version: 1 Owner: mclase Author(s): mclase

348.2

ideal

Let S be a semigroup. An ideal of S is a non-empty subset of S which is closed under multiplication on either side by elements of S. Formally, I is an ideal of S if I is non-empty, and for all x ∈ I and s ∈ S, we have sx ∈ I and xs ∈ I. One-sided ideals are defined similarly. A non-empty subset A of S is a left ideal (resp. right ideal) of S if for all a ∈ A and s ∈ S, we have sa ∈ A (resp. as ∈ A). A principal left ideal of S is a left ideal generated by a single element. If a ∈ S, then the principal left ideal of S generated by a is S¹a = Sa ∪ {a}. (The notation S¹ is explained in the entry on adjoining an identity to a semigroup.) Similarly, the principal right ideal generated by a is aS¹ = aS ∪ {a}. The notations L(a) and R(a) are also common for the principal left and right ideals generated by a respectively. A principal ideal of S is an ideal generated by a single element. The ideal generated by a is

S¹aS¹ = SaS ∪ Sa ∪ aS ∪ {a}.

The notation J(a) = S¹aS¹ is also common.
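For a finite semigroup given by its multiplication, the principal ideal J(a) = S¹aS¹ can be computed directly from the definition. A minimal sketch (the choice of example semigroup, multiplication modulo 4, is ours):

```python
def principal_ideal(S, mul, a):
    # J(a) = S^1 a S^1 = SaS ∪ Sa ∪ aS ∪ {a}
    SaS = {mul(s, mul(a, t)) for s in S for t in S}
    Sa = {mul(s, a) for s in S}
    aS = {mul(a, s) for s in S}
    return SaS | Sa | aS | {a}

# example semigroup: {0, 1, 2, 3} under multiplication modulo 4
S = range(4)
mul = lambda x, y: (x * y) % 4
print(principal_ideal(S, mul, 2))  # {0, 2}
```

Here J(2) = {0, 2} is a proper ideal, so this semigroup is not simple.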

Version: 5 Owner: mclase Author(s): mclase


Chapter 349 20M14 – Commutative semigroups 349.1

Archimedean semigroup

Let S be a commutative semigroup. We say an element x divides an element y, written x | y, if there is an element z such that xz = y. An Archimedean semigroup S is a commutative semigroup with the property that for all x, y ∈ S there is a natural number n such that x | yⁿ. This is related to the Archimedean property of the positive real numbers ℝ⁺: if x, y > 0 then there is a natural number n such that x < ny. Except that the notation is additive rather than multiplicative, this is the same as saying that (ℝ⁺, +) is an Archimedean semigroup. Version: 1 Owner: mclase Author(s): mclase

349.2

commutative semigroup

A semigroup S is commutative if the defining binary operation is commutative. That is, for all x, y ∈ S, the identity xy = yx holds. Although the term Abelian semigroup is sometimes used, it is more common simply to refer to such semigroups as commutative semigroups. A monoid which is also a commutative semigroup is called a commutative monoid. Version: 1 Owner: mclase Author(s): mclase


Chapter 350 20M20 – Semigroups of transformations, etc. 350.1

semigroup of transformations

Let X be a set. A transformation of X is a function from X to X. If α and β are transformations on X, then their product αβ is defined (writing functions on the right) by (x)(αβ) = ((x)α)β. With this definition, the set of all transformations on X becomes a semigroup, the full semigroup of transformations on X, denoted TX. More generally, a semigroup of transformations is any subsemigroup of a full semigroup of transformations. When X is finite, say X = {x₁, x₂, . . . , xₙ}, then the transformation α which maps xᵢ to yᵢ (with yᵢ ∈ X, of course) is often written:

$$\alpha = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{pmatrix}$$

With this notation it is quite easy to calculate products. For example, if X = {1, 2, 3, 4}, then

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 3 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 3 & 2 & 3 \end{pmatrix}$$

When X is infinite, say X = {1, 2, 3, . . . }, then this notation is still useful for illustration in cases where the transformation pattern is apparent. For example, if α ∈ TX is given by α : n ↦ n + 1, we can write

$$\alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & \cdots \\ 2 & 3 & 4 & 5 & \cdots \end{pmatrix}$$
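Writing functions on the right, this product is easy to compute mechanically. The sketch below (0-indexed tuples; helper names are ours) reproduces the finite worked example above:

```python
def rmul(alpha, beta):
    # product with functions written on the right: (x)(alpha*beta) = ((x)alpha)beta
    return tuple(beta[alpha[x]] for x in range(len(alpha)))

# the two transformations of the worked example on X = {1,2,3,4}, 0-indexed here
alpha = (2, 1, 0, 1)   # 1->3, 2->2, 3->1, 4->2
beta = (1, 2, 2, 3)    # 1->2, 2->3, 3->3, 4->4
print(rmul(alpha, beta))  # (2, 2, 1, 2), i.e. 1->3, 2->3, 3->2, 4->3
```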

Version: 3 Owner: mclase Author(s): mclase


Chapter 351 20M30 – Representation of semigroups; actions of semigroups on sets 351.1

counting theorem

Given a group action of a finite group G on a set X, the following expression gives the number of distinct orbits:

$$\frac{1}{|G|} \sum_{g \in G} \operatorname{stab}_g(X),$$

where stab_g(X) is the number of elements fixed by the action of g. Version: 8 Owner: mathcam Author(s): Larry Hammick, vitriol
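As an illustration (the necklace setting is an added example, not from the text), the formula counts 2-colourings of a 4-bead necklace up to rotation:

```python
from itertools import product

# distinct 2-colourings of a 4-bead necklace under rotation, counted as
# (1/|G|) * sum over g in G of the number of colourings fixed by g
n = 4
colourings = list(product([0, 1], repeat=n))
rotations = [lambda c, k=k: tuple(c[(i + k) % n] for i in range(n))
             for k in range(n)]

fixed_counts = [sum(1 for c in colourings if g(c) == c) for g in rotations]
num_orbits = sum(fixed_counts) // len(rotations)
print(num_orbits)  # 6 distinct necklaces
```

The fixed-point counts are 16, 2, 4, 2 for the four rotations, and (16 + 2 + 4 + 2)/4 = 6.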

351.2

example of group action

Let a, b, c be integers and let [a, b, c] denote the mapping [a, b, c] : Z × Z → Z, (x, y) ↦ ax² + bxy + cy². Let G be the group of 2 × 2 integer matrices A = (aᵢⱼ) with det A = ±1. The substitution (x, y)ᵗ ↦ A · (x, y)ᵗ leads to

[a, b, c](a₁₁x + a₁₂y, a₂₁x + a₂₂y) = a′x² + b′xy + c′y²,

where

a′ = a·a₁₁² + b·a₁₁a₂₁ + c·a₂₁²
b′ = 2a·a₁₁a₁₂ + 2c·a₂₁a₂₂ + b(a₁₁a₂₂ + a₁₂a₂₁)    (351.2.1)
c′ = a·a₁₂² + b·a₁₂a₂₂ + c·a₂₂²

So we define

[a, b, c] ∗ A := [a′, b′, c′]

to be the binary quadratic form with coefficients a′, b′, c′ of x², xy, y², respectively, as in (351.2.1). Putting in the identity matrix for A, we have [a, b, c] ∗ A = [a, b, c] for any binary quadratic form [a, b, c]. Now let B be another matrix in G. We must show that [a, b, c] ∗ (AB) = ([a, b, c] ∗ A) ∗ B.

Set [a, b, c] ∗ (AB) := [a″, b″, c″]. So we have

a″ = a·(a₁₁b₁₁ + a₁₂b₂₁)² + c·(a₂₁b₁₁ + a₂₂b₂₁)² + b·(a₁₁b₁₁ + a₁₂b₂₁)(a₂₁b₁₁ + a₂₂b₂₁)
   = a′·b₁₁² + c′·b₂₁² + (2a·a₁₁a₁₂ + 2c·a₂₁a₂₂ + b(a₁₁a₂₂ + a₁₂a₂₁))·b₁₁b₂₁    (351.2.2)

c″ = a·(a₁₁b₁₂ + a₁₂b₂₂)² + c·(a₂₁b₁₂ + a₂₂b₂₂)² + b·(a₁₁b₁₂ + a₁₂b₂₂)(a₂₁b₁₂ + a₂₂b₂₂)
   = a′·b₁₂² + c′·b₂₂² + (2a·a₁₁a₁₂ + 2c·a₂₁a₂₂ + b(a₁₁a₂₂ + a₁₂a₂₁))·b₁₂b₂₂    (351.2.3)

as desired. For the coefficient b″ we get

b″ = 2a·(a₁₁b₁₁ + a₁₂b₂₁)(a₁₁b₁₂ + a₁₂b₂₂) + 2c·(a₂₁b₁₁ + a₂₂b₂₁)(a₂₁b₁₂ + a₂₂b₂₂) + b·((a₁₁b₁₁ + a₁₂b₂₁)(a₂₁b₁₂ + a₂₂b₂₂) + (a₁₁b₁₂ + a₁₂b₂₂)(a₂₁b₁₁ + a₂₂b₂₁)),

and by evaluating the factors of b₁₁b₁₂, b₂₁b₂₂, and b₁₁b₂₂ + b₂₁b₁₂, it can be checked that

b″ = 2a′·b₁₁b₁₂ + 2c′·b₂₁b₂₂ + (b₁₁b₂₂ + b₂₁b₁₂)(2a·a₁₁a₁₂ + 2c·a₂₁a₂₂ + b(a₁₁a₂₂ + a₁₂a₂₁)).

This shows that

[a″, b″, c″] = [a′, b′, c′] ∗ B    (351.2.4)

and therefore [a, b, c] ∗ (AB) = ([a, b, c] ∗ A) ∗ B. Thus, (351.2.1) defines an action of G on the set of (integer) binary quadratic forms. Furthermore, the discriminant of each quadratic form in the orbit of [a, b, c] under G is b² − 4ac. Version: 5 Owner: Thomas Heye Author(s): Thomas Heye
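The action and its compatibility with matrix products can be spot-checked numerically. This sketch implements the coefficient formulas from the entry; the particular form and matrices are arbitrary choices of ours:

```python
def act(f, A):
    # [a,b,c] * A: substitute (x, y) -> (a11 x + a12 y, a21 x + a22 y)
    a, b, c = f
    (a11, a12), (a21, a22) = A
    return (a*a11**2 + b*a11*a21 + c*a21**2,
            2*a*a11*a12 + 2*c*a21*a22 + b*(a11*a22 + a12*a21),
            a*a12**2 + b*a12*a22 + c*a22**2)

def matmul(A, B):
    # 2x2 integer matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

disc = lambda f: f[1]**2 - 4*f[0]*f[2]

f = (2, 1, 3)
A = [[2, 1], [1, 1]]    # det = 1
B = [[1, 3], [0, -1]]   # det = -1
assert act(f, matmul(A, B)) == act(act(f, A), B)  # compatibility with AB
assert disc(act(f, A)) == disc(f)                 # discriminant is invariant
```

For these values, [2, 1, 3] ∗ A = [13, 17, 6], and both sides of the compatibility check equal [13, 61, 72].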

351.3

group action

Let G be a group and let X be a set. A left group action is a function · : G × X −→ X such that:

1. 1G · x = x for all x ∈ X

2. (g₁g₂) · x = g₁ · (g₂ · x) for all g₁, g₂ ∈ G and x ∈ X

A right group action is a function · : X × G −→ X such that:

1. x · 1G = x for all x ∈ X

2. x · (g₁g₂) = (x · g₁) · g₂ for all g₁, g₂ ∈ G and x ∈ X

There is a correspondence between left actions and right actions, given by associating the right action x · g with the left action g · x := x · g⁻¹. In many (but not all) contexts, it is useful to identify right actions with their corresponding left actions, and speak only of left actions.

Special types of group actions

A left action is said to be effective, or faithful, if the function x ↦ g · x is the identity function on X only when g = 1G. A left action is said to be transitive if, for every x₁, x₂ ∈ X, there exists a group element g ∈ G such that g · x₁ = x₂. A left action is free if, for every x ∈ X, the only element of G that stabilizes x is the identity; that is, g · x = x implies g = 1G. Faithful, transitive, and free right actions are defined similarly. Version: 3 Owner: djao Author(s): djao
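A concrete left action satisfying both axioms is S₃ acting on {0, 1, 2} by evaluation. The sketch below (an added example; the helper names are ours) checks the two axioms and transitivity exhaustively:

```python
from itertools import permutations

# left action of S_3 on X = {0,1,2} by evaluation: g . x = g(x)
G = list(permutations(range(3)))
X = range(3)
act = lambda g, x: g[x]
compose = lambda g, h: tuple(g[h[i]] for i in range(3))  # (gh)(i) = g(h(i))
e = (0, 1, 2)

assert all(act(e, x) == x for x in X)                  # identity axiom
assert all(act(compose(g, h), x) == act(g, act(h, x))  # compatibility axiom
           for g in G for h in G for x in X)
assert all(any(act(g, x1) == x2 for g in G)            # this action is transitive
           for x1 in X for x2 in X)
```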

351.4

orbit

Let G be a group, X a set, and · : G × X −→ X a group action. For any x ∈ X, the orbit of x under the group action is the set {g · x | g ∈ G} ⊂ X. Version: 2 Owner: djao Author(s): djao

351.5

proof of counting theorem

Let N be the cardinality of the set of all the couples (g, x) such that g · x = x. For each g ∈ G, there exist stab_g(X) couples with g as the first element, while for each x ∈ X, there are |G_x| couples with x as the second element. Hence the following equality holds:

N = ∑_{g∈G} stab_g(X) = ∑_{x∈X} |G_x|.

From the orbit-stabilizer theorem it follows that:

N = |G| ∑_{x∈X} 1/|G(x)|.

Since all the x belonging to the same orbit G(x) contribute |G(x)| · (1/|G(x)|) = 1 to the sum, ∑_{x∈X} 1/|G(x)| precisely equals the number s of distinct orbits. We therefore have

∑_{g∈G} stab_g(X) = |G|·s,

which proves the theorem.

Version: 2 Owner: n3o Author(s): n3o

351.6

stabilizer

Let G be a group, X a set, and · : G × X −→ X a group action. For any subset S of X, the stabilizer of S, denoted Stab(S), is the subgroup Stab(S) := {g ∈ G | g · s ∈ S for all s ∈ S}. The stabilizer of a single point x in X is often denoted Gx. Version: 3 Owner: djao Author(s): djao
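Orbits and stabilizers can be computed by brute force on a small example. The following sketch (S₃ acting on {0, 1, 2} by evaluation; names are ours) also exhibits the orbit-stabilizer relation |G| = |orbit| · |stabilizer|:

```python
from itertools import permutations

G = list(permutations(range(3)))   # S_3 acting on {0,1,2} by evaluation
act = lambda g, x: g[x]

orbit = {act(g, 0) for g in G}           # orbit of the point 0
stab = [g for g in G if act(g, 0) == 0]  # stabilizer G_0
print(orbit)                              # {0, 1, 2}
print(len(G) == len(orbit) * len(stab))   # True: orbit-stabilizer theorem
```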


Chapter 352 20M99 – Miscellaneous 352.1

a semilattice is a commutative band

This note explains how a semilattice is the same as a commutative band. Let S be a semilattice, with partial order ≤ and each pair of elements x and y having a greatest lower bound x ∧ y. Then it is easy to see that the operation ∧ defines a binary operation on S which makes it a commutative semigroup, and that every element is idempotent since x ∧ x = x. Conversely, if S is such a semigroup, define x ≤ y iff x = xy. Again, it is easy to see that this defines a partial order on S, that greatest lower bounds exist with respect to this partial order, and that in fact x ∧ y = xy. Version: 3 Owner: mclase Author(s): mclase
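The correspondence can be seen in a concrete commutative band: gcd on the divisors of 12 is commutative, associative and idempotent, and the derived order x ≤ y iff x = xy is divisibility. (The choice of example is ours, not from the text.)

```python
from math import gcd

S = [1, 2, 3, 4, 6, 12]   # divisors of 12: a commutative band under gcd
assert all(gcd(x, x) == x for x in S)                     # idempotent
assert all(gcd(x, y) == gcd(y, x) for x in S for y in S)  # commutative

leq = lambda x, y: gcd(x, y) == x   # derived order: x <= y iff x = x*y
assert leq(2, 4) and leq(3, 12) and not leq(4, 6)
# the greatest lower bound of x and y in this order is the product gcd(x, y) itself
```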

352.2

adjoining an identity to a semigroup

It is possible to formally adjoin an identity element to any semigroup to make it into a monoid. Suppose S is a semigroup without an identity, and consider the set S ∪ {1}, where 1 is a symbol not in S. Extend the semigroup operation from S to S ∪ {1} by additionally defining:

s · 1 = s = 1 · s, for all s ∈ S ∪ {1}.

It is easy to verify that this defines a semigroup (associativity is the only thing that needs to be checked).


As a matter of notation, it is customary to write S 1 for the semigroup S with an identity adjoined in this manner, if S does not already have one, and to agree that S 1 = S, if S does already have an identity. Despite the simplicity of this construction, however, it rarely allows one to simplify a problem by considering monoids instead of semigroups. As soon as one starts to look at the structure of the semigroup, it is almost invariably the case that one needs to consider subsemigroups and ideals of the semigroup which do not contain the identity. Version: 2 Owner: mclase Author(s): mclase

352.3

band

A band is a semigroup in which every element is idempotent. A commutative band is called a semilattice. Version: 1 Owner: mclase Author(s): mclase

352.4

bicyclic semigroup

The bicyclic semigroup C(p, q) is the monoid generated by {p, q} with the single relation pq = 1. The elements of C(p, q) are all words of the form qⁿpᵐ for m, n ≥ 0 (with the understanding p⁰ = q⁰ = 1). These words are multiplied as follows:

qⁿpᵐ · qᵏpˡ = q^{n+k−m} pˡ if m ≤ k, and qⁿpᵐ · qᵏpˡ = qⁿ p^{l+m−k} if m > k.

It is apparent that C(p, q) is simple, for if qⁿpᵐ is an element of C(p, q), then 1 = pⁿ(qⁿpᵐ)qᵐ, and so S¹qⁿpᵐS¹ = S. It is useful to picture some further properties of C(p, q) by arranging the elements in a table:

1    p    p²   p³   p⁴   …
q    qp   qp²  qp³  qp⁴  …
q²   q²p  q²p² q²p³ q²p⁴ …
q³   q³p  q³p² q³p³ q³p⁴ …
q⁴   q⁴p  q⁴p² q⁴p³ q⁴p⁴ …
⋮    ⋮    ⋮    ⋮    ⋮    ⋱

Then the elements below any horizontal line drawn through this table form a right ideal and the elements to the right of any vertical line form a left ideal. Further, the elements on the diagonal are all idempotents and their standard ordering is 1 > qp > q²p² > q³p³ > ⋯. Version: 3 Owner: mclase Author(s): mclase
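Representing qⁿpᵐ as the pair (n, m), the multiplication rule above becomes a two-line function (the pair encoding and names are ours):

```python
def bmul(u, v):
    # (q^n p^m)(q^k p^l), following the case split on m versus k
    n, m = u
    k, l = v
    if m <= k:
        return (n + k - m, l)
    return (n, l + m - k)

p, q, one = (0, 1), (1, 0), (0, 0)
assert bmul(p, q) == one               # the defining relation pq = 1
assert bmul(q, p) == (1, 1)            # but qp != 1: a nontrivial idempotent
assert bmul((1, 1), (1, 1)) == (1, 1)  # qp is idempotent, as the table shows
assert bmul((2, 3), (1, 4)) == (2, 6)  # q^2 p^3 . q p^4 = q^2 p^6
```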

352.5

congruence

Let S be a semigroup. An equivalence relation ∼ defined on S is called a congruence if it is preserved under the semigroup operation. That is, for all x, y, z ∈ S, if x ∼ y then xz ∼ yz and zx ∼ zy. If ∼ satisfies only x ∼ y implies xz ∼ yz (resp. zx ∼ zy) then ∼ is called a right congruence (resp. left congruence). Example 19. Suppose f : S → T is a semigroup homomorphism. Define ∼ by x ∼ y iff f (x) = f (y). Then it is easy to see that ∼ is a congruence. If ∼ is a congruence, defined on a semigroup S, write [x] for the equivalence class of x under ∼. Then it is easy to see that [x] · [y] = [xy] is a well-defined operation on the set of equivalence classes, and that in fact this set becomes a semigroup with this operation. This semigroup is called the quotient of S by ∼ and is written S/ ∼. Thus semigroup congruences are related to homomorphic images of semigroups in the same way that normal subgroups are related to homomorphic images of groups. More precisely, in the group case, the congruence is the coset relation, rather than the normal subgroup itself. Version: 3 Owner: mclase Author(s): mclase

352.6

cyclic semigroup

A semigroup which is generated by a single element is called a cyclic semigroup. Let S = ⟨x⟩ be a cyclic semigroup. Then as a set, S = {xⁿ | n > 0}. If all powers of x are distinct, then S = {x, x², x³, . . . } is (countably) infinite. Otherwise, there is a least integer n > 0 such that xⁿ = xᵐ for some m < n. It is clear then that the elements x, x², . . . , x^{n−1} are distinct, but that for any j ≥ n, we must have xʲ = xⁱ for some i, m ≤ i ≤ n − 1. So S has n − 1 elements.

Unlike in the group case, however, there are in general multiple non-isomorphic cyclic semigroups with the same number of elements. In fact, there are t non-isomorphic cyclic semigroups with t elements: these correspond to the different choices of m in the above (with n = t + 1). The integer m is called the index of S, and n − m is called the period of S. The elements K = {xᵐ, x^{m+1}, . . . , x^{n−1}} form a subsemigroup of S. In fact, K is a cyclic group. A concrete representation of the semigroup with index m and period r as a semigroup of transformations can be obtained as follows. Let X = {1, 2, 3, . . . , m + r}. Let

$$\varphi = \begin{pmatrix} 1 & 2 & 3 & \cdots & m+r-1 & m+r \\ 2 & 3 & 4 & \cdots & m+r & m+1 \end{pmatrix}.$$

Then φ generates a subsemigroup S of the full semigroup of transformations TX, and S is cyclic with index m and period r. Version: 3 Owner: mclase Author(s): mclase
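The index and period of a transformation can be found by iterating its powers until one repeats. This sketch (helper names are ours) builds a transformation with a tail of length m = 2 feeding a cycle of length r = 3 and recovers index 2, period 3:

```python
def index_and_period(x, mul):
    # iterate x, x^2, x^3, ... until a repeat; if x^n = x^m with m < n minimal,
    # the index is m and the period is n - m
    seen = {}
    power, k = x, 1
    while power not in seen:
        seen[power] = k
        power, k = mul(power, x), k + 1
    m = seen[power]
    return m, k - m

# tail 0 -> 1 -> cycle {2, 3, 4} (0-indexed), i.e. m = 2 and r = 3
phi = (1, 2, 3, 4, 2)
mul = lambda s, t: tuple(t[s[i]] for i in range(len(s)))  # functions on the right
print(index_and_period(phi, mul))  # (2, 3)
```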

352.7

idempotent

An element x of a ring is called an idempotent element, or simply an idempotent if x2 = x. The set of idempotents of a ring can be partially ordered by putting e ≤ f iff e = ef = f e. The element 0 is a minimum element in this partial order. If the ring has an identity element, 1, then 1 is a maximum element in this partial order. Since these definitions refer only to the multiplicative structure of the ring, they also hold for semigroups (with the proviso, of course, that a semigroup may not have a zero element). In the special case of a semilattice, this partial order is the same as the one described in the entry for semilattice. If a ring has an identity, then 1 − e is always an idempotent whenever e is an idempotent, and e(1 − e) = (1 − e)e = 0. In a ring with an identity, two idempotents e and f are called a pair of orthogonal idempotents if e + f = 1, and ef = f e = 0. Obviously, this is just a fancy way of saying that f = 1 − e. More generally, a set {e1 , e2 , . . . , en } of idempotents is called a complete set of orthogonal idempotents if ei ej = ej ei = 0 whenever i 6= j and if 1 = e1 + e2 + · · · + en . 1382

Version: 3 Owner: mclase Author(s): mclase

352.8

null semigroup

A left zero semigroup is a semigroup in which every element is a left zero element. In other words, it is a set S with a product defined as xy = x for all x, y ∈ S. A right zero semigroup is defined similarly. Let S be a semigroup. Then S is a null semigroup if it has a zero element and if the product of any two elements is zero. In other words, there is an element θ ∈ S such that xy = θ for all x, y ∈ S. Version: 1 Owner: mclase Author(s): mclase

352.9

semigroup

A semigroup G is a set together with a binary operation · : G × G −→ G which satisfies the associative property: (a · b) · c = a · (b · c) for all a, b, c ∈ G. Version: 2 Owner: djao Author(s): djao

352.10

semilattice

A lower semilattice is a partially ordered set S in which each pair of elements has a greatest lower bound. An upper semilattice is a partially ordered set S in which each pair of elements has a least upper bound. Note that it is not normally necessary to distinguish lower from upper semilattices, because one may be converted to the other by reversing the partial order. It is normal practice to refer to either structure as a semilattice, and it should be clear from the context whether greatest lower bounds or least upper bounds exist. Alternatively, a semilattice can be considered to be a commutative band, that is, a semigroup which is commutative and in which every element is idempotent. In this context, semilattices are important elements of semigroup theory and play a key role in the structure theory of commutative semigroups. A partially ordered set which is both a lower semilattice and an upper semilattice is a lattice.

Version: 3 Owner: mclase Author(s): mclase

352.11

subsemigroup, submonoid, and subgroup

Let S be a semigroup, and let T be a subset of S. T is a subsemigroup of S if T is closed under the operation of S; that is, if xy ∈ T for all x, y ∈ T. T is a submonoid of S if T is a subsemigroup and T has an identity element. T is a subgroup of S if T is a submonoid which is a group. Note that submonoids and subgroups do not have to have the same identity element as S itself (indeed, S may not have an identity element). The identity element may be any idempotent element of S. Let e ∈ S be an idempotent element. Then there is a maximal subsemigroup of S for which e is the identity: eSe = {exe | x ∈ S}.

In addition, there is a maximal subgroup for which e is the identity:

U(eSe) = {x ∈ eSe | ∃y ∈ eSe such that xy = yx = e}. Subgroups with different identity elements are disjoint. To see this, suppose that G and H are subgroups of a semigroup S with identity elements e and f respectively, and suppose x ∈ G ∩ H. Then x has an inverse y ∈ G, and an inverse z ∈ H. We have: e = xy = fxy = fe = zxe = zx = f.

Thus intersecting subgroups have the same identity element. Version: 2 Owner: mclase Author(s): mclase

352.12

zero elements

Let S be a semigroup. An element z is called a right zero [resp. left zero] if xz = z [resp. zx = z] for all x ∈ S. An element which is both a left and a right zero is called a zero element. A semigroup may have many left zeros or right zeros, but if it has at least one of each, then they are necessarily equal, giving a unique (two-sided) zero element. 1384

It is customary to use the symbol θ for the zero element of a semigroup. Version: 1 Owner: mclase Author(s): mclase


Chapter 353 20N02 – Sets with a single binary operation (groupoids) 353.1

groupoid

A groupoid G is a set together with a binary operation · : G × G −→ G. The groupoid (or “magma”) is closed under the operation. There is also a separate, category-theoretic definition of “groupoid.” Version: 7 Owner: akrowne Author(s): akrowne

353.2

idempotency

If (S, ∗) is a magma, then an element x ∈ S is said to be idempotent if x ∗ x = x. If every element of S is idempotent, then the binary operation ∗ (or the magma itself) is said to be idempotent. For example, the ∧ and ∨ operations in a lattice are idempotent, because x ∧ x = x and x ∨ x = x for all x in the lattice. A function f : D → D is said to be idempotent if f ◦ f = f. (This is just a special case of the above definition, the magma in question being (D^D, ◦), the monoid of all functions from D to D, with the operation of function composition.) In other words, f is idempotent iff repeated application of f has the same effect as a single application: f(f(x)) = f(x) for all x ∈ D. An idempotent linear transformation from a vector space to itself is called a projection. Version: 12 Owner: yark Author(s): yark, Logan


353.3

left identity and right identity

Let G be a groupoid. An element e ∈ G is called a left identity element if ex = x for all x ∈ G. Similarly, e is a right identity element if xe = x for all x ∈ G. An element which is both a left and a right identity is an identity element. A groupoid may have more than one left identity element: in fact the operation defined by xy = y for all x, y ∈ G defines a groupoid (in fact, a semigroup) on any set G, and every element is a left identity. But as soon as a groupoid has both a left and a right identity, they are necessarily unique and equal. For if e is a left identity and f is a right identity, then f = ef = e. Version: 2 Owner: mclase Author(s): mclase


Chapter 354 20N05 – Loops, quasigroups 354.1

Moufang loop

Proposition: Let Q be a nonempty quasigroup.

I) The following conditions are equivalent:

(x(yz))x = (xy)(zx)  ∀x, y, z ∈ Q    (354.1.1)
((xy)z)x = x(y(zx))  ∀x, y, z ∈ Q    (354.1.2)
(yx)(zy) = (y(xz))y  ∀x, y, z ∈ Q    (354.1.3)
y(x(yz)) = ((yx)y)z  ∀x, y, z ∈ Q    (354.1.4)

II) If Q satisfies those conditions, then Q has an identity element (i.e. Q is a loop).

For a proof, we refer the reader to the two references. Kunen in [1] shows that any of the four conditions implies the existence of an identity element. And Bol and Bruck [2] show that the four conditions are equivalent for loops.

Definition: A nonempty quasigroup satisfying the conditions (354.1.1)-(354.1.4) is called a Moufang quasigroup or, equivalently, a Moufang loop (after Ruth Moufang, 1905-1977). The 16-element set of unit octonions over Z is an example of a nonassociative Moufang loop. Other examples appear in projective geometry, coding theory, and elsewhere.

References
[1] K. Kunen, Moufang Quasigroups, J. Algebra 183 (1996) 231-234.
[2] R. H. Bruck, A Survey of Binary Systems, Springer-Verlag, 1958.

Version: 3 Owner: yark Author(s): Larry Hammick

354.2

loop and quasigroup

A quasigroup is a groupoid G with the property that for every x, y ∈ G, there are unique elements w, z ∈ G such that xw = y and zx = y. A loop is a quasigroup which has an identity element. What distinguishes a loop from a group is that the former need not satisfy the associative law. Version: 1 Owner: mclase Author(s): mclase


Chapter 355 22-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 355.1

fixed-point subspace

Let Σ ⊂ Γ be a subgroup, where Γ is a compact Lie group acting on a vector space V. The fixed-point subspace of Σ is

Fix(Σ) = {x ∈ V | σx = x, ∀σ ∈ Σ}.

Fix(Σ) is a linear subspace of V since

Fix(Σ) = ⋂_{σ∈Σ} ker(σ − I),

where I is the identity. If it is important to specify the space V we use the notation FixV(Σ).

REFERENCES [GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 1 Owner: Daume Author(s): Daume


Chapter 356 22-XX – Topological groups, Lie groups 356.1

Cantor space

Cantor space, denoted C, is the set of all infinite binary sequences with the product topology. It is a perfect Polish space. It is a compact subspace of Baire space, which is the set of all infinite sequences of integers with the natural product topology.

REFERENCES 1. Moschovakis, Yiannis N.: Descriptive Set Theory, 1980, Amsterdam; New York: North-Holland Pub. Co.

Version: 8 Owner: xiaoyanggu Author(s): xiaoyanggu


Chapter 357 22A05 – Structure of general topological groups 357.1

topological group

A topological group is a triple (G, ·, T) where (G, ·) is a group and T is a topology on G such that under T, the group operation (x, y) ↦ x · y is continuous with respect to the product topology on G × G and the inverse map x ↦ x⁻¹ is continuous on G. Version: 3 Owner: Evandar Author(s): Evandar


Chapter 358 22C05 – Compact groups 358.1

n-torus

The n-torus, denoted Tⁿ, is a smooth orientable n-dimensional manifold which is the product of n 1-spheres, i.e. Tⁿ = S¹ × ⋯ × S¹ (n factors).

Equivalently, the n-torus can be considered to be ℝⁿ modulo the action (vector addition) of the integer lattice ℤⁿ. The n-torus is in addition a topological group. If we think of S¹ as the unit circle in ℂ and Tⁿ = S¹ × ⋯ × S¹ (n factors), then S¹ is a topological group and so is Tⁿ by coordinate-wise multiplication. That is,

(z₁, z₂, . . . , zₙ) · (w₁, w₂, . . . , wₙ) = (z₁w₁, z₂w₂, . . . , zₙwₙ). Version: 2 Owner: ack Author(s): ack, apmxi
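Coordinate-wise multiplication on Tⁿ can be checked numerically with unit complex numbers. A small sketch for T² (the specific angles and tolerances are our choices):

```python
import cmath

# points of T^2 represented as pairs of unit complex numbers
def tmul(z, w):
    # multiply coordinate-wise, as in the group law above
    return tuple(zi * wi for zi, wi in zip(z, w))

z = (cmath.exp(0.3j), cmath.exp(1.1j))
w = (cmath.exp(0.5j), cmath.exp(2.0j))
zw = tmul(z, w)
# the angles add, and the product stays on the torus (unit modulus in each slot)
assert abs(zw[0] - cmath.exp(0.8j)) < 1e-12
assert abs(zw[1] - cmath.exp(3.1j)) < 1e-12
assert all(abs(abs(c) - 1) < 1e-12 for c in zw)
```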

358.2

reductive

Let G be a Lie group or algebraic group. G is called reductive over a field k if every representation of G over k is completely reducible For example, a finite group is reductive over a field k if and only if its order is not divisible by the characteristic of k (by Maschke’s theorem). A complex Lie group is reductive if and only if it is a direct product of a semisimple group and an algebraic torus. Version: 3 Owner: bwebste Author(s): bwebste 1393

Chapter 359 22D05 – General properties and structure of locally compact groups 359.1

Γ-simple

A representation V of Γ is Γ-simple if either • V ≅ W₁ ⊕ W₂ where W₁, W₂ are absolutely irreducible for Γ and are Γ-isomorphic, or • V is non-absolutely irreducible for Γ. [GSS]

REFERENCES [GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 1 Owner: Daume Author(s): Daume


Chapter 360 22D15 – Group algebras of locally compact groups 360.1

group C ∗-algebra

Let C[G] be the group ring of a discrete group G. It has two completions to a C∗-algebra:

Reduced group C∗-algebra. The reduced group C∗-algebra, C∗_r(G), is obtained by completing C[G] in the operator norm for its regular representation on l²(G).

Maximal group C∗-algebra. The maximal group C∗-algebra, C∗_max(G) or just C∗(G), is defined by the following universal property: any *-homomorphism from C[G] to some B(H) (the C∗-algebra of bounded operators on some Hilbert space H) factors through the inclusion C[G] ↪ C∗_max(G).

If G is amenable then C∗_r(G) ≅ C∗_max(G).

Version: 3 Owner: mhale Author(s): mhale


Chapter 361 22E10 – General properties and structure of complex Lie groups 361.1

existence and uniqueness of compact real form

Let G be a semisimple complex Lie group. Then there exists a unique (up to isomorphism) real Lie group K such that K is compact and a real form of G. Conversely, if K is compact, semisimple and real, it is the real form of a unique semisimple complex Lie group G. The group K can be realized as the set of fixed points of a special involution of G, called the Cartan involution. For example, the compact real form of SLn C, the complex special linear group, is SU(n), the special unitary group. Note that SLn R is also a real form of SLn C, but is not compact. The compact real form of SOn C, the complex special orthogonal group, is SOn R, the real orthogonal group. SOn C also has other, non-compact real forms, called the pseudo-orthogonal groups. The compact real form of Sp2n C, the complex symplectic group, is less well-known. It is (unfortunately) also usually denoted Sp(2n), and consists of n × n “unitary” quaternion matrices, that is, Sp(2n) = {M ∈ GLn H|MM ∗ = I}

where M∗ denotes the conjugate transpose of M. This is different from the real symplectic group Sp2n R. Version: 2 Owner: bwebste Author(s): bwebste


361.2

maximal torus

Let K be a compact group, and let t ∈ K be an element whose centralizer has minimal dimension (such elements are dense in K). Let T be the centralizer of t. This subgroup is closed since T = ϕ⁻¹(t), where ϕ : K → K is the map k ↦ ktk⁻¹, and abelian since it is the intersection of K with the Cartan subgroup of its complexification; hence it is a torus, since K (and thus T) is compact. We call T a maximal torus of K. This term is also applied to the corresponding maximal abelian subgroup of a complex semisimple group, which is an algebraic torus. Version: 2 Owner: bwebste Author(s): bwebste

361.3

Lie group

A Lie group is a group endowed with a compatible analytic structure. To be more precise, Lie group structure consists of two kinds of data • a finite-dimensional, real-analytic manifold G • and two analytic maps, one for multiplication G×G → G and one for inversion G → G, which obey the appropriate group axioms. Thus, a homomorphism in the category of Lie groups is a group homomorphism that is simultaneously an analytic mapping between two real-analytic manifolds. Next, we describe a natural construction that associates a certain Lie algebra g to every Lie group G. Let e ∈ G denote the identity element of G. For g ∈ G let λg : G → G denote the diffeomorphism corresponding to left multiplication by g. Definition 9. A vector-field V on G is called left-invariant if V is invariant with respect to all left multiplications. To be more precise, V is left-invariant if and only if (λg)∗(V) = V (see push-forward of a vector-field) for all g ∈ G.

Proposition 15. The vector-field bracket of two left-invariant vector fields is again a left-invariant vector field.

Proof. Let V1 , V2 be left-invariant vector fields, and let g ∈ G. The bracket operation is covariant with respect to diffeomorphisms, and in particular

(λg )∗ [V1 , V2 ] = [(λg )∗ V1 , (λg )∗ V2 ] = [V1 , V2 ].

Q.E.D. Definition 10. The Lie algebra of G, denoted hereafter by g, is the vector space of all left-invariant vector fields equipped with the vector-field bracket. Now a right multiplication is invariant with respect to all left multiplications, and it turns out that we can characterize a left-invariant vector field as being an infinitesimal right multiplication. Proposition 16. Let a ∈ Te G and let V be a left-invariant vector-field such that Ve = a. Then for all g ∈ G we have Vg = (λg )∗ (a). The intuition here is that a gives an infinitesimal displacement from the identity element and that Vg gives a corresponding infinitesimal right displacement away from g. Indeed consider a curve γ : (−ε, ε) → G passing through the identity element with velocity a; i.e. γ(0) = e,

γ 0 (0) = a.

The above proposition is then saying that the curve t 7→ gγ(t),

t ∈ (−ε, ε)

passes through g at t = 0 with velocity Vg . Thus we see that a left-invariant vector-field is completely determined by the value it takes at e, and that therefore g is isomorphic, as a vector space, to Te G. Of course, we can also consider the Lie algebra of right-invariant vector fields. The resulting Lie-algebra is anti-isomorphic (the order in the bracket is reversed) to the Lie algebra of left-invariant vector fields. Now it is a general principle that the group inverse operation gives an anti-isomorphism between left and right group actions. So, as one may well expect, the anti-isomorphism between the Lie algebras of left and right-invariant vector fields can be realized by considering the linear action of the inverse operation on Te G. Finally, let us remark that one can induce the Lie algebra structure directly on Te G by considering the adjoint action of G on Te G. Examples. [Coming soon.]
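Proposition 16 can be illustrated numerically for the matrix group GL2 R, where a curve through the identity with velocity a is γ(t) = exp(ta) and left multiplication is matrix multiplication. The sketch below (an assumed example: plain Python, with the matrix exponential computed by a truncated power series) checks by a finite difference that t ↦ g γ(t) passes through g with velocity g · a, i.e. (λg )∗ (a):

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(a, c):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def expm(a, terms=20):
    """Matrix exponential of a 2x2 matrix via a truncated power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]    # identity
    term = [[1.0, 0.0], [0.0, 1.0]]      # running term a^n / n!
    for n in range(1, terms):
        term = matmul(scale(term, 1.0 / n), a)
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

g = [[1.0, 2.0], [0.0, 1.0]]             # a group element of GL_2(R)
a = [[0.0, 1.0], [-1.0, 0.0]]            # a tangent vector at the identity

h = 1e-5
# central difference of t |-> g * exp(t a) at t = 0
plus  = matmul(g, expm(scale(a, h)))
minus = matmul(g, expm(scale(a, -h)))
velocity = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

expected = matmul(g, a)                  # (lambda_g)_* (a)
err = max(abs(velocity[i][j] - expected[i][j])
          for i in range(2) for j in range(2))
```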


Notes.

1. No generality is lost in assuming that a Lie group has analytic, rather than C ∞ or even C k , k = 1, 2, . . . structure. Indeed, given a C 1 differential manifold with a C 1 multiplication rule, one can show that the exponential mapping endows this manifold with a compatible real-analytic structure. Indeed, one can go even further and show that even C 0 suffices. In other words, a topological group that is also a finite-dimensional topological manifold possesses a compatible analytic structure. This result was formulated by Hilbert as his fifth problem, and proved in the 50’s by Montgomery and Zippin.

2. One can also speak of a complex Lie group, in which case G and the multiplication mapping are both complex-analytic. The theory of complex Lie groups requires the notion of a holomorphic vector-field. Notwithstanding this complication, most of the essential features of the real theory carry over to the complex case.

3. The name “Lie group” honours the Norwegian mathematician Sophus Lie who pioneered and developed the theory of continuous transformation groups and the corresponding theory of Lie algebras of vector fields (the group’s infinitesimal generators, as Lie termed them). Lie’s original impetus was the study of continuous symmetry of geometric objects and differential equations. The scope of the theory has grown enormously in the 100+ years of its existence. The contributions of Elie Cartan and Claude Chevalley figure prominently in this evolution. Cartan is responsible for the celebrated ADE classification of simple Lie algebras, as well as for charting the essential role played by Lie groups in differential geometry and mathematical physics. Chevalley made key foundational contributions to the analytic theory, and did much to pioneer the related theory of algebraic groups. Armand Borel’s book “Essays in the History of Lie groups and algebraic groups” is the definitive source on the evolution of the Lie group concept.
Sophus Lie’s contributions are the subject of a number of excellent articles by T. Hawkins. Version: 6 Owner: rmilson Author(s): rmilson

361.4

complexification

Let G be a real Lie group. Then the complexification GC of G is the unique complex Lie group equipped with a map ϕ : G → GC such that any map G → H, where H is a complex Lie group, extends to a holomorphic map GC → H. If g and gC are the respective Lie algebras, gC ∼= g ⊗R C. For simply connected groups, the construction is obvious: we simply take the simply connected complex group with Lie algebra gC , and ϕ to be the map induced by the inclusion g → gC .

If γ ∈ G is central, then its image is central in GC since g 7→ γgγ −1 is a map extending ϕ, and thus must be the identity by the uniqueness half of the universal property. Thus, if Γ ⊂ G is a discrete central subgroup, then we get a map G/Γ → GC /ϕ(Γ), which gives a complexification for G/Γ. Since every Lie group is of this form, this shows existence. Some easy examples: the complexification of both SLn R and SU(n) is SLn C. The complexification of R is C and of S 1 is C∗ . The map ϕ : G → GC is not always injective. For example, if G is the universal cover of SLn R (which has fundamental group Z), then GC ∼= SLn C, and ϕ factors through the covering G → SLn R. Version: 3 Owner: bwebste Author(s): bwebste

361.5

Hilbert-Weyl theorem

theorem: Let Γ be a compact Lie group acting on V . Then there exists a finite Hilbert basis for the ring P(Γ) (the set of invariant polynomials). [GSS] proof: In [GSS] on page 54. theorem: (as stated by Hermann Weyl) The (absolute) invariants J(x, y, . . .) corresponding to a given set of representations of a finite or a compact Lie group have a finite integrity basis. [HW] proof: In [HW] on page 274.

REFERENCES
[GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.
[HW] Weyl, Hermann: The Classical Groups: Their Invariants and Representations. Princeton University Press, New Jersey, 1946.

Version: 3 Owner: Daume Author(s): Daume

361.6

the connection between Lie groups and Lie algebras

Given a finite dimensional Lie group G, it has an associated Lie algebra g = Lie(G). The Lie algebra encodes a great deal of information about the Lie group. I’ve collected a few results on this topic: Theorem 7. (Existence) Let g be a finite dimensional Lie algebra over R or C. Then there exists a finite dimensional real or complex Lie group G with Lie(G) = g. Theorem 8. (Uniqueness) There is a unique connected simply-connected Lie group G with any given finite-dimensional Lie algebra. Every connected Lie group with this Lie algebra is a quotient G/Γ by a discrete central subgroup Γ. Even more important is the fact that the correspondence G 7→ g is functorial: given a homomorphism ϕ : G → H of Lie groups, there is a natural homomorphism defined on Lie algebras ϕ∗ : g → h, which is just the derivative of the map ϕ at the identity (since the Lie algebra is canonically identified with the tangent space at the identity). There are analogous existence and uniqueness theorems for maps: Theorem 9. (Existence) Let ψ : g → h be a homomorphism of Lie algebras. Then if G is the unique connected, simply-connected group with Lie algebra g, and H is any Lie group with Lie algebra h, there exists a homomorphism of Lie groups ϕ : G → H with ϕ∗ = ψ. Theorem 10. (Uniqueness) Let G be a connected Lie group and H an arbitrary Lie group. Then if two maps ϕ, ϕ′ : G → H induce the same maps on Lie algebras, then they are equal. Essentially, what these theorems tell us is the correspondence g 7→ G from Lie algebras to simply-connected Lie groups is functorial, and right adjoint to the functor H 7→ Lie(H) from Lie groups to Lie algebras. Version: 6 Owner: bwebste Author(s): bwebste


Chapter 362 26-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 362.1

derivative notation

This is the list of known standard representations and their nuances.

du/dv, df/dx, dy/dx − The most common notation; this is read as the derivative of u with respect to v. Exponents relate which derivative; for example, d²y/dx² is the second derivative of y with respect to x.

f′(x), f⃗′(x), y′′ − This is read as f prime of x. The number of primes tells the derivative, i.e. f′′′(x) is the third derivative of f(x) with respect to x. Note that in higher dimensions, this may be a tensor of a rank equal to the derivative.

Dx f(x), Fy(x), fxy(x) − These notations are rather arcane, and should not be used generally, as they have other meanings. For example, Fy can easily be the y component of a vector-valued function. The subscript in this case means ”with respect to”, so Fyy would be the second derivative of F with respect to y.

D1 f(x), F2(x), f12(x) − The subscripts in these cases refer to the derivative with respect to the nth variable. For example, F2(x, y, z) would be the derivative of F with respect to y. They can easily represent higher derivatives, i.e. D21 f(x) is the derivative with respect to the first variable of the derivative with respect to the second variable.


∂u/∂v, ∂f/∂x − The partial derivative of u with respect to v. This symbol can be manipulated as in du/dv for higher partials.

d/dv, ∂/∂v − This is the operator version of the derivative. Usually you will see it acting on something, such as d/dv (v² + 3u) = 2v.

[Jf(x)], [Df(x)] − The first of these represents the Jacobian of f, which is a matrix of partial derivatives such that

[Jf(x)] =
    D1 f1(x)  · · ·  Dn f1(x)
       ⋮        ⋱       ⋮
    D1 fm(x)  · · ·  Dn fm(x)

where fn represents the nth function of a vector-valued function. The second of these notations represents the derivative matrix, which in most cases is the Jacobian, but in some cases does not exist even though the Jacobian exists. Note that the directional derivative in the direction ~v is simply [Jf(x)]~v . Version: 7 Owner: slider142 Author(s): slider142
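The Jacobian and the directional-derivative identity [Jf(x)]~v can be illustrated numerically. A sketch in plain Python (the function f and the evaluation point are arbitrary examples), using central differences for the partials:

```python
import math

def f(x, y):
    """Example vector-valued function f : R^2 -> R^2."""
    return [x * x * y, 5 * x + math.sin(y)]

def jacobian(x, y, h=1e-6):
    """Numeric Jacobian via central differences: J[i][j] = D_j f_i."""
    fx_p, fx_m = f(x + h, y), f(x - h, y)
    fy_p, fy_m = f(x, y + h), f(x, y - h)
    return [[(fx_p[i] - fx_m[i]) / (2 * h),
             (fy_p[i] - fy_m[i]) / (2 * h)] for i in range(2)]

x0, y0 = 1.5, 0.7
J = jacobian(x0, y0)
# Analytically: J = [[2xy, x^2], [5, cos y]]
exact = [[2 * x0 * y0, x0 * x0], [5.0, math.cos(y0)]]

# Directional derivative in direction v is simply J v.
v = [0.6, 0.8]
Jv = [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]
```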

362.2

fundamental theorems of calculus

The Fundamental Theorems of Calculus serve to demonstrate that integration and differentiation are inverse processes.

First Fundamental Theorem: Suppose that F is a differentiable function on the interval [a, b]. Then

∫_a^b F′(x) dx = F(b) − F(a).

Second Fundamental Theorem: Let f be a continuous function on the interval [a, b], let c be an arbitrary point in this interval, and assume f is integrable on the intervals of the form [a, x] for all x ∈ [a, b]. Let F be defined as

F(x) = ∫_c^x f(t) dt

for every x in (a, b). Then F is differentiable and F′(x) = f(x).

This result is about Riemann integrals. When dealing with Lebesgue integrals we get a generalization with Lebesgue’s differentiation theorem. Version: 9 Owner: mathcam Author(s): drini, greg
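The first theorem can be sanity-checked numerically: a Riemann sum of F′ over [a, b] approaches F(b) − F(a). A sketch with F(x) = x³ (an arbitrary example), using the midpoint rule:

```python
def Fprime(x):
    return 3 * x * x          # derivative of F(x) = x^3

a, b, n = 0.0, 2.0, 100000
h = (b - a) / n
# midpoint Riemann sum of F' over [a, b]
riemann = sum(Fprime(a + (i + 0.5) * h) for i in range(n)) * h

exact = b ** 3 - a ** 3       # F(b) - F(a) = 8
```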


362.3

logarithm

Definition. Three real numbers x, y, p, with x, y > 0, are said to obey the logarithmic relation logx (y) = p if they obey the corresponding exponential relation: xp = y. Note that by the monotonicity and continuity property of the exponential operation, for given x and y there exists a unique p satisfying the above relation. We are therefore able to say that p is the logarithm of y relative to the base x.

Properties. There are a number of basic algebraic identities involving logarithms.

logx (yz) = logx (y) + logx (z)
logx (y/z) = logx (y) − logx (z)
logx (y^z) = z logx (y)
logx (1) = 0
logx (x) = 1
logx (y) logy (x) = 1
logy (z) = logx (z) / logx (y)

Notes. In essence, logarithms convert multiplication to addition, and exponentiation to multiplication. Historically, these properties of the logarithm made it a useful tool for doing numerical calculations. Before the advent of electronic calculators and computers, tables of logarithms and the logarithmic slide rule were essential computational aids.

information theory texts often assume 2 as the default logarithm base. This is motivated by the fact that log2 (N) is the approximate number of bits required to encode N different messages. The invention of logarithms is commonly credited to John Napier. Version: 13 Owner: rmilson Author(s): rmilson
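The identities above are easy to verify numerically with Python's math.log, whose optional second argument is the base (the sample base and arguments below are arbitrary):

```python
import math

x, y, z = 3.0, 7.0, 2.5       # arbitrary base and arguments

def log(base, value):
    """log_base(value), using math.log's optional base argument."""
    return math.log(value, base)

ok = [
    math.isclose(log(x, y * z), log(x, y) + log(x, z)),
    math.isclose(log(x, y / z), log(x, y) - log(x, z)),
    math.isclose(log(x, y ** z), z * log(x, y)),
    math.isclose(log(x, 1), 0.0, abs_tol=1e-12),
    math.isclose(log(x, x), 1.0),
    math.isclose(log(x, y) * log(y, x), 1.0),
    math.isclose(log(y, z), log(x, z) / log(x, y)),   # change of base
]
```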

362.4

proof of the first fundamental theorem of calculus

Let us make a subdivision of the interval [a, b], Δ : {a = x0 < x1 < x2 < · · · < xn−1 < xn = b}. From this, we can say F(b) − F(a) = Σ_{i=1}^{n} [F(xi) − F(xi−1)].

From the mean-value theorem, we have that for any two points u and v with u < v, there exists ξ ∈ (u, v) such that F(v) − F(u) = F′(ξ)(v − u). If we use xi as v and xi−1 as u, calling our intermediate point ξi, we get F(xi) − F(xi−1) = F′(ξi)(xi − xi−1). Combining these, and using the abbreviation Δi x = xi − xi−1, we have F(b) − F(a) = Σ_{i=1}^{n} F′(ξi) Δi x.

From the definition of the integral, for every ε > 0 there exists δ > 0 such that |Σ_{i=1}^{n} F′(ξi) Δi x − ∫_a^b F′(x) dx| < ε when ‖Δ‖ < δ. Thus, for every ε > 0, |F(b) − F(a) − ∫_a^b F′(x) dx| < ε. Hence lim_{ε→0} |F(b) − F(a) − ∫_a^b F′(x) dx| = 0; but F(b) − F(a) − ∫_a^b F′(x) dx is constant with respect to ε, which can only mean that |F(b) − F(a) − ∫_a^b F′(x) dx| = 0, and so we have the first fundamental theorem of calculus F(b) − F(a) = ∫_a^b F′(x) dx. Version: 4 Owner: greg Author(s): greg

362.5

proof of the second fundamental theorem of calculus

Recall that a continuous function is Riemann integrable, so the integral F(x) = ∫_c^x f(t) dt is well defined. Consider the increment of F:

F(x + h) − F(x) = ∫_c^{x+h} f(t) dt − ∫_c^x f(t) dt = ∫_x^{x+h} f(t) dt

(we have used the linearity of the integral with respect to the function and the additivity with respect to the domain). Now let M be the maximum of f on [x, x + h] and m be the minimum. Clearly we have

m h ≤ ∫_x^{x+h} f(t) dt ≤ M h

(this is due to the monotonicity of the integral with respect to the integrand), which can be written as

(F(x + h) − F(x))/h = (1/h) ∫_x^{x+h} f(t) dt ∈ [m, M].

Since f is continuous, by the mean-value theorem there exists ξh ∈ [x, x + h] such that f(ξh) = (F(x + h) − F(x))/h, so that

F′(x) = lim_{h→0} (F(x + h) − F(x))/h = lim_{h→0} f(ξh) = f(x)

since ξh → x as h → 0.

Version: 1 Owner: paolini Author(s): paolini

362.6

root-mean-square

If x1 , x2 , . . . , xn are real numbers, we define their root-mean-square or quadratic mean as

R(x1 , x2 , . . . , xn ) = √((x1² + x2² + · · · + xn²)/n).

The root-mean-square of a random variable X is defined as the square root of the expectation of X²:

R(X) = √(E(X²)).

If X1 , X2 , . . . , Xn are independent random variables with standard deviations σ1 , σ2 , . . . , σn , then the standard deviation of their arithmetic mean, (X1 + X2 + · · · + Xn)/n, is the root-mean-square of σ1 , σ2 , . . . , σn divided by √n. Version: 1 Owner: pbruin Author(s): pbruin
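A direct implementation of the quadratic mean (a minimal sketch):

```python
import math

def root_mean_square(values):
    """R(x_1, ..., x_n) = sqrt((x_1^2 + ... + x_n^2) / n)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

r = root_mean_square([1.0, 2.0, 3.0])   # sqrt(14/3)
```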

362.7

square

The square of a number x is the number obtained multiplying x by itself. It’s denoted as x2 .


Some examples:

5² = 25
(1/3)² = 1/9
0² = 0
.5² = .25

Version: 2 Owner: drini Author(s): drini


Chapter 363 26-XX – Real functions 363.1

abelian function

An abelian or hyperelliptic function is a generalisation of an elliptic function. It is a function of two variables with four periods. In a similar way to an elliptic function, it can also be regarded as the inverse function to certain integrals (called abelian or hyperelliptic integrals) of the form

∫ dz/√R(z)

where R is a polynomial of degree greater than 4. Version: 2 Owner: vladm Author(s): vladm

363.2

full-width at half maximum

The full-width at half maximum (FWHM) is a parameter used to describe the width of a bump on a function (or curve). The FWHM is given by the distance between the points where the function reaches half of its maximum value. For example, consider the function

f(x) = 10/(x² + 1).

f reaches its maximum for x = 0 (f(0) = 10), so f reaches half of its maximum value for x = 1 and x = −1 (f(1) = f(−1) = 5). So the FWHM for f , in this case, is 2, because the distance between A(1, 5) and B(−1, 5) is 2.


The function

f(x) = 10/(x² + 1)

is called ’the Agnesi curve’, after Maria Gaetana Agnesi (1718 - 1799).

Version: 2 Owner: vladm Author(s): vladm
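For functions without a closed-form half-maximum point, the FWHM can be found numerically. A sketch using bisection on the example f(x) = 10/(x² + 1), which should recover the value 2 computed above:

```python
def f(x):
    return 10.0 / (x * x + 1.0)

peak = f(0.0)                  # the maximum is attained at x = 0
half = peak / 2.0

# bisection for the point x > 0 where f(x) = half (f is decreasing there)
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if f(mid) > half:
        lo = mid
    else:
        hi = mid

fwhm = 2.0 * lo                # by symmetry the left crossing is at -lo
```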


Chapter 364 26A03 – Foundations: limits and generalizations, elementary topology of the line 364.1

Cauchy sequence

A sequence x0 , x1 , x2 , . . . in a metric space (X, d) is a Cauchy sequence if, for every real number  > 0, there exists a natural number N such that d(xn , xm ) <  whenever n, m > N. Version: 4 Owner: djao Author(s): djao, rmilson
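For example, the sequence xₙ = 1/(n + 1) in R with d(x, y) = |x − y| is Cauchy: given ε > 0 one may take N = ⌈1/ε⌉. A sketch checking the definition on sampled pairs (the finite range of pairs stands in for "all n, m > N"):

```python
import math

def x(n):
    return 1.0 / (n + 1)

def witness_N(eps):
    """An N such that |x_n - x_m| < eps for all n, m > N."""
    return math.ceil(1.0 / eps)

eps = 1e-3
N = witness_N(eps)
# spot-check the Cauchy condition on a sample of index pairs beyond N
ok = all(abs(x(n) - x(m)) < eps
         for n in range(N + 1, N + 200)
         for m in range(N + 1, N + 200))
```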

364.2

Dedekind cuts

The purpose of Dedekind cuts is to provide a sound logical foundation for the real number system. Dedekind’s motivation behind this project is the observation that a real number α, intuitively, is completely determined by the rationals strictly smaller than α and those strictly larger than α. Concerning the completeness or continuity of the real line, Dedekind notes in [2] that If all points of the straight line fall into two classes such that every point of the first class lies to the left of every point of the second class, then there exists one and only one point which produces this division of all points into two classes, this severing of the straight line into two portions. Dedekind defines a point to produce the division of the real line if this point is either the least or greatest element of either one of the classes mentioned above. He further notes that

the completeness property, as he just phrased it, is deficient in the rationals, which motivates the definition of reals as cuts of rationals. Because all rationals greater than α are really just excess baggage, we prefer to sway somewhat from Dedekind’s original definition. Instead, we adopt the following definition. Definition 34. A Dedekind cut is a subset α of the rational numbers Q that satisfies these properties:

1. α is not empty.
2. Q \ α is not empty.
3. α contains no greatest element.
4. For x, y ∈ Q, if x ∈ α and y < x, then y ∈ α as well.

Dedekind cuts are particularly appealing for two reasons. First, they make it very easy to prove the completeness, or continuity of the real line. Also, they make it quite plain to distinguish the rationals from the irrationals on the real line, and put the latter on a firm logical foundation. In the construction of the real numbers from Dedekind cuts, we make the following definition: Definition 35. A real number is a Dedekind cut. We denote the set of all real numbers by R and we order them by set-theoretic inclusion, that is to say, for any α, β ∈ R, α < β if and only if α ⊂ β where the inclusion is strict. We further define α = β as real numbers if α and β are equal as sets. As usual, we write α ≤ β if α < β or α = β. Moreover, a real number α is said to be irrational if Q \ α contains no least element. The Dedekind completeness property of real numbers, expressed as the supremum property, now becomes straightforward to prove. In what follows, we will reserve Greek variables for real numbers, and Roman variables for rationals. Theorem 11. Every nonempty subset of real numbers that is bounded above has a least upper bound. Let A be a nonempty set of real numbers, such that for every α ∈ A we have α ≤ γ for some real number γ. Now define the set

sup A = ⋃_{α∈A} α.

We must show that this set is a real number. This amounts to checking the four conditions of a Dedekind cut.

1. sup A is clearly not empty, for it is the nonempty union of nonempty sets.

2. Because γ is a real number, there is some rational x that is not in γ. Since every α ∈ A is a subset of γ, x is not in any α, so x ∉ sup A either. Thus, Q \ sup A is nonempty.

3. If sup A had a greatest element g, then g ∈ α for some α ∈ A. Then g would be a greatest element of α, but α is a real number, so by contrapositive, sup A has no greatest element.

4. Lastly, if x ∈ sup A, then x ∈ α for some α, so given any y < x, because α is a real number, y ∈ α, whence y ∈ sup A.

Thus, sup A is a real number. Trivially, sup A is an upper bound of A, for every α ⊆ sup A. It now suffices to prove that sup A ≤ γ, because γ was an arbitrary upper bound. But this is easy, because every x ∈ sup A is an element of α for some α ∈ A, so because α ⊆ γ, x ∈ γ. Thus, sup A is the least upper bound of A. We call this real number the supremum of A. To finish the construction of the real numbers, we must endow them with algebraic operations, define the additive and multiplicative identity elements, prove that these definitions give a field, and prove further results about the order of the reals (such as the totality of this order) – in short, build a complete ordered field. This task is somewhat laborious, but we include here the appropriate definitions. Verifying their correctness can be an instructive, albeit tiresome, exercise. We use the same symbols for the operations on the reals as for the rational numbers; this should cause no confusion in context. Definition 36. Given two real numbers α and β, we define

• The additive identity, denoted 0, is 0 := {x ∈ Q : x < 0}

• The multiplicative identity, denoted 1, is 1 := {x ∈ Q : x < 1}

• Addition of α and β, denoted α + β, is α + β := {x + y : x ∈ α, y ∈ β}

• The opposite of α, denoted −α, is −α := {x ∈ Q : −x ∉ α, but −x is not the least element of Q \ α}

• The absolute value of α, denoted |α|, is |α| := α if α ≥ 0, and |α| := −α if α ≤ 0

• If α, β > 0, then multiplication of α and β, denoted α · β, is

α · β := {z ∈ Q : z ≤ 0 or z = xy for some x ∈ α, y ∈ β with x, y > 0}

In general,

α · β := 0, if α = 0 or β = 0;
α · β := |α| · |β|, if α > 0, β > 0 or α < 0, β < 0;
α · β := −(|α| · |β|), if α > 0, β < 0 or α < 0, β > 0.

• The inverse of α > 0, denoted α−1 , is

α−1 := {x ∈ Q : x ≤ 0, or x > 0 and 1/x ∉ α but 1/x is not the least element of Q \ α}. If α < 0, α−1 := −(|α|)−1.

All that remains (!) is to check that the above definitions do indeed define a complete ordered field, and that all the sets implied to be real numbers are indeed so. The properties of R as an ordered field follow from these definitions and the properties of Q as an ordered field. It is important to point out that in two steps, in showing that inverses and opposites are properly defined, we require an extra property of Q, not merely in its capacity as an ordered field. This requirement is the Archimedean property. Moreover, because R is a field of characteristic 0, it contains an isomorphic copy of Q. The rationals correspond to the Dedekind cuts α for which Q \ α contains a least member.
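A Dedekind cut can be modeled computationally as a membership predicate on Q. The sketch below represents the irrational cut α = {x ∈ Q : x < 0 or x² < 2} (i.e. √2) with exact rational arithmetic and locates its least upper bound by bisection:

```python
from fractions import Fraction

def in_sqrt2_cut(q):
    """Membership in the cut alpha = {x in Q : x < 0 or x^2 < 2}."""
    return q < 0 or q * q < 2

# bisect between a rational in the cut and a rational outside it
lo, hi = Fraction(1), Fraction(2)
for _ in range(50):
    mid = (lo + hi) / 2
    if in_sqrt2_cut(mid):
        lo = mid
    else:
        hi = mid

approx = float(lo)     # approximates sup(alpha) = sqrt(2) = 1.41421356...
```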

REFERENCES 1. Courant, Richard and Robbins, Herbert. What is Mathematics? pp. 68-72 Oxford University Press, Oxford, 1969 2. Dedekind, Richard. Essays on the Theory of Numbers Dover Publications Inc, New York 1963 3. Rudin, Walter Principles of Mathematical Analysis pp. 17-21 McGraw-Hill Inc, New York, 1976 4. Spivak, Michael. Calculus pp. 569-596 Publish or Perish, Inc. Houston, 1994

Version: 20 Owner: rmilson Author(s): rmilson, NeuRet

364.3

binomial proof of positive integer power rule

We will use the difference quotient in this proof of the power rule for positive integers. Let f(x) = x^n for some integer n > 0. Then we have

f′(x) = lim_{h→0} ((x + h)^n − x^n)/h.

We can use the binomial theorem to expand the numerator:

f′(x) = lim_{h→0} (C_0^n x^0 h^n + C_1^n x^1 h^{n−1} + · · · + C_{n−1}^n x^{n−1} h^1 + C_n^n x^n h^0 − x^n)/h

where C_k^n = n!/(k!(n − k)!). We can now simplify the above:

f′(x) = lim_{h→0} (h^n + nxh^{n−1} + · · · + nx^{n−1}h + x^n − x^n)/h
      = lim_{h→0} (h^{n−1} + nxh^{n−2} + · · · + nx^{n−1})
      = nx^{n−1}.

Version: 4 Owner: mathcam Author(s): mathcam, slider142
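The limit in this proof can be observed numerically: the difference quotient of x^n approaches nx^{n−1} as h shrinks, with the leftover terms h^{n−1} + nxh^{n−2} + · · · vanishing roughly linearly in h. A quick sketch at x = 2, n = 5 (arbitrary choices):

```python
def diff_quotient(x, n, h):
    return ((x + h) ** n - x ** n) / h

x, n = 2.0, 5
exact = n * x ** (n - 1)          # 5 * 2^4 = 80

# errors for h = 1e-2, 1e-4, 1e-6 shrink toward 0
errors = [abs(diff_quotient(x, n, 10.0 ** -k) - exact) for k in (2, 4, 6)]
```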

364.4

exponential

Preamble. We use R+ ⊂ R to denote the set of non-negative real numbers. Our aim is to define the exponential, or the generalized power operation,

x^p,  x ∈ R+, p ∈ R.

The power p in the above expression is called the exponent. We take it as proven that R is a complete, ordered field. No other properties of the real numbers are invoked.

Definition. For x ∈ R+ and n ∈ Z we define x^n in terms of repeated multiplication. To be more precise, we inductively characterize natural number powers as follows:

x^0 = 1,  x^{n+1} = x · x^n,  n ∈ N.

The existence of the reciprocal is guaranteed by the assumption that R is a field. Thus, for negative exponents, we can define

x^{−n} = (x^{−1})^n,  n ∈ N,

where x^{−1} is the reciprocal of x.

The case of arbitrary exponents is somewhat more complicated. A possible strategy is to define roots, then rational powers, and then extend by continuity. Our approach is different. For x ∈ R+ and p ∈ R, we define the set of all reals that one would want to be smaller than x^p, and then define the latter as the least upper bound of this set. To be more precise, let x > 1 and define

L(x, p) = {z ∈ R+ : z^n < x^m for all m ∈ Z, n ∈ N such that m < pn}.

We then define x^p to be the least upper bound of L(x, p). For x < 1 we define x^p = (x^{−1})^{−p}. The exponential operation possesses a number of important properties, some of which characterize it up to uniqueness. Note. It is also possible to define the exponential operation in terms of the exponential function and the natural logarithm. Since these concepts require the context of differential theory, it seems preferable to give a basic definition that relies only on the foundational property of the reals. Version: 11 Owner: rmilson Author(s): rmilson

364.5

interleave sequence

Let S be a set, and let {xi }, i = 0, 1, 2, . . . and {yi }, i = 0, 1, 2, . . . be two sequences in S. The interleave sequence is defined to be the sequence x0 , y0 , x1 , y1 , . . . . Formally, it is the sequence {zi }, i = 0, 1, 2, . . . given by ( xk if i = 2k is even, zi := yk if i = 2k + 1 is odd. Version: 2 Owner: djao Author(s): djao
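In code, the interleave of two (finite, equal-length) sequences is straightforward; z[2k] = x_k and z[2k+1] = y_k, exactly as in the definition. A minimal sketch:

```python
def interleave(xs, ys):
    """Return [x0, y0, x1, y1, ...] for equal-length sequences."""
    out = []
    for x, y in zip(xs, ys):
        out.append(x)
        out.append(y)
    return out

z = interleave([0, 1, 2], ['a', 'b', 'c'])
```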

364.6

limit inferior

Let S ⊂ R be a set of real numbers. Recall that a limit point of S is a real number x ∈ R such that for all ε > 0 there exist infinitely many y ∈ S such that |x − y| < ε. We define lim inf S, pronounced the limit inferior of S, to be the infimum of all the limit points of S. If there are no limit points, we define the limit inferior to be +∞. The two most common notations for the limit inferior are lim inf S and lim S.

An alternative, but equivalent, definition is available in the case of an infinite sequence of real numbers x0 , x1 , x2 , . . .. For each k ∈ N, let yk be the infimum of the k-th tail,

yk = inf_{j≥k} xj.

This construction produces a non-decreasing sequence

y0 ≤ y1 ≤ y2 ≤ . . . ,

which either converges to its supremum, or diverges to +∞. We define the limit inferior of the original sequence to be this limit:

lim inf_k xk = lim_k yk.

Version: 7 Owner: rmilson Author(s): rmilson
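The tail construction can be approximated numerically for a concrete sequence, e.g. xₙ = (−1)ⁿ(1 + 1/(n + 1)), whose limit inferior is −1. A sketch (a finite horizon stands in for the infinite tail):

```python
def x(n):
    return (-1) ** n * (1.0 + 1.0 / (n + 1))

def tail_inf(k, horizon=5000):
    """y_k = inf over the (truncated) k-th tail."""
    return min(x(j) for j in range(k, k + horizon))

ys = [tail_inf(k) for k in range(0, 50)]
# ys is non-decreasing and approaches liminf x_n = -1
approx_liminf = tail_inf(2000)
```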

364.7

limit superior

Let S ⊂ R be a set of real numbers. Recall that a limit point of S is a real number x ∈ R such that for all ε > 0 there exist infinitely many y ∈ S such that |x − y| < ε. We define lim sup S, pronounced the limit superior of S, to be the supremum of all the limit points of S. If there are no limit points, we define the limit superior to be −∞. The two most common notations for the limit superior are lim sup S and lim S. An alternative, but equivalent, definition is available in the case of an infinite sequence of real numbers x0 , x1 , x2 , . . .. For each k ∈ N, let yk be the supremum of the k-th tail,

yk = sup_{j≥k} xj.

This construction produces a non-increasing sequence

y0 ≥ y1 ≥ y2 ≥ . . . ,

which either converges to its infimum, or diverges to −∞. We define the limit superior of the original sequence to be this limit:

lim sup_k xk = lim_k yk.

Version: 7 Owner: rmilson Author(s): rmilson 1416

364.8

power rule

The power rule states that

d/dx x^p = p x^{p−1},  p ∈ R.

This rule, when combined with the chain rule, product rule, and sum rule, makes calculating many derivatives far more tractable. This rule can be derived by repeated application of the product rule. See the proof of the power rule.

Repeated use of the above formula gives

d^i/dx^i x^k = 0 if i > k,  and  d^i/dx^i x^k = (k!/(k − i)!) x^{k−i} if i ≤ k,

for i, k ∈ Z.

Examples

d/dx x^0 = 0
d/dx x^1 = 1 · x^0 = 1
d/dx x^2 = 2x
d/dx x^3 = 3x^2
d/dx √x = d/dx x^{1/2} = (1/2) x^{−1/2} = 1/(2√x)
d/dx 2x^e = 2e x^{e−1}

Version: 4 Owner: mathcam Author(s): mathcam, Logan

364.9

properties of the exponential

The exponential operation possesses the following properties.

• Homogeneity. For x, y ∈ R+, p ∈ R we have (xy)^p = x^p y^p.

• Exponent additivity. For x ∈ R+ we have x^0 = 1, x^1 = x. Furthermore

x^{p+q} = x^p x^q,  p, q ∈ R.

• Monotonicity. For x, y ∈ R+ with x < y and p ∈ R+ we have

x^p < y^p,  x^{−p} > y^{−p}.

• Continuity. The exponential operation is continuous with respect to its arguments. To be more precise, the following function is continuous:

P : R+ × R → R,  P(x, y) = x^y.

Let us also note that the exponential operation is characterized (in the sense of existence and uniqueness) by the additivity and continuity properties. [Author’s note: One can probably get away with substantially less, but I haven’t given this enough thought.] Version: 10 Owner: rmilson Author(s): rmilson

364.10

squeeze rule

Squeeze rule for sequences

Let f, g, h : N → R be three sequences of real numbers such that f(n) ≤ g(n) ≤ h(n) for all n. If lim_{n→∞} f(n) and lim_{n→∞} h(n) exist and are equal, say to a, then lim_{n→∞} g(n) also exists and equals a.

The proof is fairly straightforward. Let e be any real number > 0. By hypothesis there exist M, N ∈ N such that

|a − f(n)| < e for all n ≥ M
|a − h(n)| < e for all n ≥ N

Write L = max(M, N). For n ≥ L we have


• if g(n) ≥ a:

|g(n) − a| = g(n) − a ≤ h(n) − a < e

• else g(n) < a and:

|g(n) − a| = a − g(n) ≤ a − f (n) < e

So, for all n ≥ L, we have |g(n) − a| < e, which is the desired conclusion.

Squeeze rule for functions

Let f, g, h : S → R be three real-valued functions on a neighbourhood S of a real number b, such that f(x) ≤ g(x) ≤ h(x) for all x ∈ S − {b}. If lim_{x→b} f(x) and lim_{x→b} h(x) exist and are equal, say to a, then lim_{x→b} g(x) also exists and equals a.

Again let e be an arbitrary positive real number. Find positive reals α and β such that

|a − f(x)| < e whenever 0 < |b − x| < α
|a − h(x)| < e whenever 0 < |b − x| < β

Write δ = min(α, β). Now, for any x such that |b − x| < δ, we have

• if g(x) ≥ a: |g(x) − a| = g(x) − a ≤ h(x) − a < e
• else g(x) < a and: |g(x) − a| = a − g(x) ≤ a − f(x) < e

and we are done. Version: 1 Owner: Daume Author(s): Larry Hammick
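As a concrete instance of the functions version: for g(x) = x² sin(1/x) we have −x² ≤ g(x) ≤ x² for x ≠ 0, both bounds tend to 0 as x → 0, so g(x) → 0 as well. A numeric sketch (sample points chosen arbitrarily):

```python
import math

def f(x): return -x * x                     # lower bound
def g(x): return x * x * math.sin(1.0 / x)  # squeezed function
def h(x): return x * x                      # upper bound

samples = [10.0 ** -k for k in range(1, 9)]
bounds_hold = all(f(x) <= g(x) <= h(x) for x in samples)
g_small = all(abs(g(x)) <= x * x for x in samples)   # |g| -> 0 like x^2
```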


Chapter 365 26A06 – One-variable calculus 365.1

Darboux’s theorem (analysis)

Let f : [a, b] → R be a real-valued continuous function on [a, b], which is differentiable on (a, b), differentiable from the right at a, and differentiable from the left at b. Then f′ satisfies the intermediate value property: for every t between f′₊(a) and f′₋(b), there is some x ∈ [a, b] such that f′(x) = t. Note that when f is continuously differentiable (f ∈ C¹([a, b])), this is trivially true by the intermediate value theorem. But even when f′ is not continuous, Darboux’s theorem places a severe restriction on what it can be. Version: 3 Owner: mathwizard Author(s): mathwizard, ariels

365.2

Fermat’s Theorem (stationary points)

Let f : (a, b) → R be a continuous function and suppose that x0 ∈ (a, b) is a local extremum of f . If f is differentiable at x0 then f′(x0) = 0. Version: 2 Owner: paolini Author(s): paolini


365.3

Heaviside step function

The Heaviside step function is the function H : R → R defined as

H(x) = 0 when x < 0,
       1/2 when x = 0,
       1 when x > 0.

Here, there are many conventions for the value at x = 0. The motivation for setting H(0) = 1/2 is that we can then write H as a function of the signum function (see this page). In applications, such as the Laplace transform, where the Heaviside function is used extensively, the value of H(0) is irrelevant. The function is named after Oliver Heaviside (1850–1925) [1]. However, the function was already used by Cauchy [2], who defined the function as

u(t) = (1/2)(1 + t/√t²)

and called it a coefficient limitateur [1].

REFERENCES 1. The MacTutor History of Mathematics archive, Oliver Heaviside. 2. The MacTutor History of Mathematics archive, Augustin Louis Cauchy. 3. R.F. Hoskins, Generalised functions, Ellis Horwood Series: Mathematics and its applications, John Wiley & Sons, 1979.

Version: 1 Owner: Koro Author(s): matte
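The definition above translates directly into code. The following sketch (our own naming, not part of the original entry) also checks the relation H(x) = (sign(x) + 1)/2 with the signum function discussed later in this chapter.

```python
def heaviside(x):
    """Heaviside step function with the H(0) = 1/2 convention."""
    if x < 0:
        return 0.0
    elif x == 0:
        return 0.5
    else:
        return 1.0

def sign(x):
    """Signum function: -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

# With the H(0) = 1/2 convention, H is a function of the signum:
for x in (-2.0, 0.0, 3.5):
    assert heaviside(x) == (sign(x) + 1) / 2
```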

365.4

Leibniz’ rule

Theorem [Leibniz’ rule] ([1] page 592) Let f and g be real (or complex) valued functions that are defined on an open interval of R. If f and g are k times differentiable, then

(fg)^(k) = ∑_{r=0}^{k} C(k, r) f^(k−r) g^(r),

where C(k, r) denotes the binomial coefficient. For multi-indices, Leibniz’ rule has the following generalization:

Theorem [2] If f, g : Rⁿ → C are smooth functions, and j is a multi-index, then

∂^j(fg) = ∑_{i ≤ j} C(j, i) ∂^i(f) ∂^(j−i)(g),

where i is a multi-index.


REFERENCES 1. R. Adams, Calculus, a complete course, Addison-Wesley Publishers Ltd, 3rd ed. 2. http://www.math.umn.edu/ jodeit/course/TmprDist1.pdf

Version: 3 Owner: matte Author(s): matte
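As a sanity check of Leibniz’ rule (an editorial illustration, not from the entry), one can verify the formula for polynomials, represented as coefficient lists, with f = 1 + x², g = x³ and k = 3:

```python
from math import comb

# Polynomials as coefficient lists [a0, a1, a2, ...] for a0 + a1*x + a2*x^2 + ...
def deriv(p, r=1):
    """r-th derivative of a polynomial given by its coefficient list."""
    for _ in range(r):
        p = [k * c for k, c in enumerate(p)][1:] or [0]
    return p

def mul(p, q):
    """Product of two polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def add(p, q):
    """Sum of two polynomials."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def trim(p):
    """Drop trailing zero coefficients."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

# Check (fg)^(k) = sum_r C(k,r) f^(k-r) g^(r) for f = 1 + x^2, g = x^3, k = 3.
f, g, k = [1, 0, 1], [0, 0, 0, 1], 3
lhs = deriv(mul(f, g), k)
rhs = [0]
for r in range(k + 1):
    rhs = add(rhs, [comb(k, r) * c for c in mul(deriv(f, k - r), deriv(g, r))])
assert trim(lhs) == trim(rhs)
```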

365.5

Rolle’s theorem

Rolle’s theorem. If f is continuous on [a, b], differentiable on (a, b), and such that f(a) = f(b) = 0, then there exists a point c ∈ (a, b) such that f′(c) = 0. Version: 8 Owner: drini Author(s): drini

365.6

binomial formula

The binomial formula gives the power series expansion of the pth power function for every real power p. To wit,

(1 + x)^p = ∑_{n=0}^{∞} (p^{\underline{n}}/n!) xⁿ,  x ∈ R, |x| < 1,

where

p^{\underline{n}} = p(p − 1)⋯(p − n + 1)

denotes the nth falling factorial of p.

Note that for p ∈ N the power series reduces to a polynomial. The above formula is therefore a generalization of the binomial theorem. Version: 4 Owner: rmilson Author(s): rmilson
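The series can be evaluated numerically; the following sketch (the helper name is our own) sums the first terms of the expansion and compares against (1 + x)^p directly:

```python
import math

def binomial_series(p, x, terms=60):
    """Partial sum of (1+x)^p = sum_n p^(n)/n! * x^n, where p^(n) is the
    falling factorial p(p-1)...(p-n+1)."""
    total, falling = 0.0, 1.0   # falling factorial starts at 1 for n = 0
    for n in range(terms):
        total += falling * x**n / math.factorial(n)
        falling *= p - n        # update p^(n) -> p^(n+1)
    return total

# For p = 1/2 and |x| < 1 the series converges to sqrt(1 + x):
assert abs(binomial_series(0.5, 0.2) - math.sqrt(1.2)) < 1e-12
# For p a natural number the falling factorial vanishes eventually,
# so the series reduces to the binomial theorem: (1 + x)^3.
assert abs(binomial_series(3, 0.7) - 1.7**3) < 1e-12
```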

365.7

chain rule

Let f(x), g(x) be differentiable, real-valued functions. The derivative of the composition (f ∘ g)(x) can be found using the chain rule, which asserts that:

(f ∘ g)′(x) = f′(g(x)) g′(x)

The chain rule has a particularly suggestive appearance in terms of the Leibniz formalism. Suppose that z depends differentiably on y, and that y in turn depends differentiably on x. Then

dz/dx = (dz/dy)(dy/dx)

The apparent cancellation of the dy term is at best a formal mnemonic, and does not constitute a rigorous proof of this result. Rather, the Leibniz format is well suited to the interpretation of the chain rule in terms of related rates. To wit: The instantaneous rate of change of z relative to x is equal to the rate of change of z relative to y times the rate of change of y relative to x. Version: 5 Owner: rmilson Author(s): rmilson

365.8

complex Rolle’s theorem

Theorem [1] Suppose Ω is an open convex set in C, suppose f is a holomorphic function f : Ω → C, and suppose f(a) = f(b) = 0 for distinct points a, b in Ω. Then there exist points u, v on L_ab (the open straight line segment connecting a and b, not containing the endpoints), such that Re{f′(u)} = 0 and Im{f′(v)} = 0.

REFERENCES 1. J.-Cl. Evard, F. Jafari, A Complex Rolle’s Theorem, American Mathematical Monthly, Vol. 99, Issue 9, (Nov. 1992), pp. 858-861.

Version: 4 Owner: matte Author(s): matte

365.9

complex mean-value theorem

Theorem [1] Suppose Ω is an open convex set in C, suppose f is a holomorphic function f : Ω → C, and suppose a, b are distinct points in Ω. Then there exist points u, v on L_ab (the open straight line segment connecting a and b, not containing the endpoints), such that

Re{(f(b) − f(a))/(b − a)} = Re{f′(u)},
Im{(f(b) − f(a))/(b − a)} = Im{f′(v)}.


REFERENCES 1. J.-Cl. Evard, F. Jafari, A Complex Rolle’s Theorem, American Mathematical Monthly, Vol. 99, Issue 9, (Nov. 1992), pp. 858-861.

Version: 2 Owner: matte Author(s): matte

365.10

definite integral

The definite integral with respect to x of some function f(x) over the closed interval [a, b] is defined to be the “area under the graph of f(x) with respect to x” (if f(x) is negative, then you have a negative area). It is written as:

∫_a^b f(x) dx

One way to find the value of the integral is to take a limit of an approximation technique as the precision increases to infinity. For example, use a Riemann sum, which approximates the area by dividing it into n intervals of equal widths, and then calculating the area of rectangles with the width of the interval and height dependent on the function’s value in the interval. Let R_n be this approximation, which can be written as

R_n = ∑_{i=1}^{n} f(x_i*) Δx,

where x_i* is some x inside the ith interval. Then, the integral would be

∫_a^b f(x) dx = lim_{n→∞} R_n = lim_{n→∞} ∑_{i=1}^{n} f(x_i*) Δx.

We can use this definition to arrive at some important properties of definite integrals (a, b, c are constant with respect to x):

∫_a^b (f(x) + g(x)) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx
∫_a^b (f(x) − g(x)) dx = ∫_a^b f(x) dx − ∫_a^b g(x) dx
∫_a^b f(x) dx = −∫_b^a f(x) dx
∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx
∫_a^b c f(x) dx = c ∫_a^b f(x) dx

There are other generalisations about integrals, but many require the fundamental theorem of calculus.

Version: 4 Owner: xriso Author(s): xriso
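The Riemann-sum definition above turns directly into an approximation scheme; the following sketch (midpoint sampling is our own choice of x_i*) approximates ∫₀¹ x² dx = 1/3:

```python
def riemann_sum(f, a, b, n):
    """Riemann sum with n equal subintervals, sampling each at its midpoint."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# int_0^1 x^2 dx = 1/3; the approximation improves as n grows.
approx = riemann_sum(lambda x: x * x, 0.0, 1.0, 1000)
assert abs(approx - 1/3) < 1e-6
```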

365.11

derivative of even/odd function (proof )

Suppose f(x) = ±f(−x). We need to show that f′(x) = ∓f′(−x). To do this, let us define the auxiliary function m : R → R, m(x) = −x. The condition on f is then f(x) = ±(f ∘ m)(x). Using the chain rule, we have that

f′(x) = ±(f ∘ m)′(x) = ±f′(m(x)) m′(x) = ∓f′(−x),

and the claim follows. □ Version: 2 Owner: mathcam Author(s): matte

365.12

direct sum of even/odd functions (example)

Example. direct sum of even and odd functions

Let us define the sets

F = {f | f is a function from R to R},
F₊ = {f ∈ F | f(x) = f(−x) for all x ∈ R},
F₋ = {f ∈ F | f(x) = −f(−x) for all x ∈ R}.

In other words, F contains all functions from R to R, F₊ ⊂ F contains all even functions, and F₋ ⊂ F contains all odd functions. All of these spaces have a natural vector space structure: for functions f and g we define f + g as the function x ↦ f(x) + g(x). Similarly, if c is a real constant, then cf is the function x ↦ cf(x). With these operations, the zero vector is the mapping x ↦ 0. We claim that F is the direct sum of F₊ and F₋, i.e., that

F = F₊ ⊕ F₋.    (365.12.1)

To prove this claim, let us first note that F₊ and F₋ are vector subspaces of F. Second, given an arbitrary function f in F, we can define

f₊(x) = (1/2)(f(x) + f(−x)),
f₋(x) = (1/2)(f(x) − f(−x)).

Now f₊ and f₋ are even and odd functions and f = f₊ + f₋. Thus any function in F can be split into two components f₊ and f₋, such that f₊ ∈ F₊ and f₋ ∈ F₋. To show that the sum is direct, suppose f is an element in F₊ ∩ F₋. Then we have that f(x) = −f(−x) = −f(x), so f(x) = 0 for all x, i.e., f is the zero vector in F. We have established equation 365.12.1. Version: 2 Owner: mathcam Author(s): matte
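The decomposition f = f₊ + f₋ is easy to compute. As an editorial illustration (the helper names are our own), for f = exp it recovers cosh (the even part) and sinh (the odd part):

```python
import math

def even_part(f):
    """Even component f_+(x) = (f(x) + f(-x)) / 2."""
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    """Odd component f_-(x) = (f(x) - f(-x)) / 2."""
    return lambda x: (f(x) - f(-x)) / 2

fp, fm = even_part(math.exp), odd_part(math.exp)
for x in (-1.5, 0.0, 2.0):
    assert abs(fp(x) - math.cosh(x)) < 1e-12   # even part of exp is cosh
    assert abs(fm(x) - math.sinh(x)) < 1e-12   # odd part of exp is sinh
    assert abs(fp(x) + fm(x) - math.exp(x)) < 1e-12   # f = f_+ + f_-
```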

365.13

even/odd function

Definition. Let f be a function from R to R. If f(x) = f(−x) for all x ∈ R, then f is an even function. Similarly, if f(x) = −f(−x) for all x ∈ R, then f is an odd function.

Example.

1. The trigonometric functions sin and cos are odd and even, respectively.

Properties.

1. The vector space of real functions can be written as the direct sum of even and odd functions. (See this page.)
2. Let f : R → R be a differentiable function. (a) If f is an even function, then the derivative f′ is an odd function. (b) If f is an odd function, then the derivative f′ is an even function. (proof)
3. Let f : R → R be a smooth function. Then there exist smooth functions g, h : R → R such that f(x) = g(x²) + x h(x²) for all x ∈ R. Thus, if f is even, we have f(x) = g(x²), and if f is odd, we have f(x) = x h(x²) ([4], Exercise 1.2)

REFERENCES 1. L. H¨ormander, The Analysis of Linear Partial Differential Operators I, (Distribution theory and Fourier Analysis), 2nd ed, Springer-Verlag, 1990.

Version: 4 Owner: mathcam Author(s): matte


365.14

example of chain rule

Suppose we wanted to differentiate

h(x) = √(sin x).

Here, h(x) is given by the composition h(x) = f(g(x)), where f(x) = √x and g(x) = sin x. Then the chain rule says that

h′(x) = f′(g(x)) g′(x).

Since

f′(x) = 1/(2√x)  and  g′(x) = cos x,

we have by the chain rule

h′(x) = (1/(2√(sin x))) cos x = cos x / (2√(sin x)).

Using the Leibniz formalism, the above calculation would have the following appearance. First we describe the functional relation as

z = √(sin x).

Next, we introduce an auxiliary variable y, and write

z = √y,  y = sin x.

We then have

dz/dy = 1/(2√y),  dy/dx = cos x,

and hence the chain rule gives

dz/dx = (1/(2√y)) cos x = cos x / (2√(sin x)).

Version: 1 Owner: rmilson Author(s): rmilson
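The derivative computed in this example can be checked against a symmetric difference quotient (an editorial sketch; the step size is our own choice):

```python
import math

h = lambda x: math.sqrt(math.sin(x))
# Derivative from the chain rule: cos x / (2 sqrt(sin x)).
h_prime = lambda x: math.cos(x) / (2 * math.sqrt(math.sin(x)))

# Compare with a symmetric difference quotient at x = 1 (where sin x > 0):
x, eps = 1.0, 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
assert abs(numeric - h_prime(x)) < 1e-8
```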


365.15

example of increasing/decreasing/monotone function

The function f(x) = eˣ is strictly increasing and hence strictly monotone. Similarly g(x) = e⁻ˣ is strictly decreasing and hence strictly monotone. Consider the function h : [1, 10] → [1, 5] where

h(x) = √(x − 4√(x − 1) + 3) + √(x − 6√(x − 1) + 8).

It is not strictly monotone since it is constant on an interval; however, it is decreasing and hence monotone. Version: 1 Owner: Johan Author(s): Johan

365.16

extended mean-value theorem

Let f : [a, b] → R and g : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Then there exists some number ξ ∈ (a, b) satisfying:

(f(b) − f(a)) g′(ξ) = (g(b) − g(a)) f′(ξ).

If g is linear this becomes the usual mean-value theorem. Version: 6 Owner: mathwizard Author(s): mathwizard

365.17

increasing/decreasing/monotone function

Definition Let A be a subset of R, and let f be a function f : A → R. Then

1. f is increasing, if x ≤ y implies that f(x) ≤ f(y) (for all x and y in A).
2. f is strictly increasing, if x < y implies that f(x) < f(y).
3. f is decreasing, if x ≥ y implies that f(x) ≥ f(y).
4. f is strictly decreasing, if x > y implies that f(x) > f(y).
5. f is monotone, if f is either increasing or decreasing.
6. f is strictly monotone, if f is either strictly increasing or strictly decreasing.

Theorem Let X be a bounded or unbounded open interval of R. In other words, let X be an interval of the form X = (a, b), where a, b ∈ R ∪ {−∞, ∞}. Further, let f : X → R be a monotone function.

1. The set of points where f is discontinuous is at most countable [1, 2].
2. (Lebesgue) f is differentiable almost everywhere ([1], pp. 514).

REFERENCES 1. C.D. Aliprantis, O. Burkinshaw, Principles of Real Analysis, 2nd ed., Academic Press, 1990. 2. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., 1976. 3. F. Jones, Lebesgue Integration on Euclidean Spaces, Jones and Barlett Publishers, 1993.

Version: 3 Owner: matte Author(s): matte

365.18

intermediate value theorem

Let f be a continuous function on the interval [a, b]. Let x₁ and x₂ be points with a ≤ x₁ < x₂ ≤ b such that f(x₁) ≠ f(x₂). Then for each value y between f(x₁) and f(x₂), there is a c ∈ (x₁, x₂) such that f(c) = y. Bolzano’s theorem is a special case of this one. Version: 2 Owner: drini Author(s): drini

365.19

limit

Let f : X \ {a} → Y be a function between two metric spaces X and Y, defined everywhere except at some a ∈ X. For L ∈ Y, we say the limit of f(x) as x approaches a is equal to L, or

lim_{x→a} f(x) = L,

if, for every real number ε > 0, there exists a real number δ > 0 such that, whenever x ∈ X with 0 < d_X(x, a) < δ, then d_Y(f(x), L) < ε. The formal definition of limit as given above has a well–deserved reputation for being notoriously hard for inexperienced students to master. There is no easy fix for this problem, since the concept of a limit is inherently difficult to state precisely (and indeed wasn’t even accomplished historically until the 1800’s by Cauchy, well after the invention of calculus in the 1600’s by Newton and Leibniz). However, there are a number of related definitions, which, taken together, may shed some light on the nature of the concept.


• The notion of a limit can be generalized to mappings between arbitrary topological spaces. In this context we say that limx→a f (x) = L if and only if, for every neighborhood V of L (in Y ), there is a deleted neighborhood U of a (in X) which is mapped into V by f. • Let an , n ∈ N be a sequence of elements in a metric space X. We say that L ∈ X is the limit of the sequence, if for every ε > 0 there exists a natural number N such that d(an , L) < ε for all natural numbers n > N. • The definition of the limit of a mapping can be based on the limit of a sequence. To wit, limx→a f (x) = L if and only if, for every sequence of points xn in X converging to a (that is, xn → a, xn 6= a), the sequence of points f (xn ) in Y converges to L. In calculus, X and Y are frequently taken to be Euclidean spaces Rn and Rm , in which case the distance functions dX and dY cited above are just Euclidean distance. Version: 5 Owner: djao Author(s): rmilson, djao

365.20

mean value theorem

Mean value theorem Let f : [a, b] → R be a continuous function differentiable on (a, b). Then there is some real number x₀ ∈ (a, b) such that

f′(x₀) = (f(b) − f(a))/(b − a).

Version: 3 Owner: drini Author(s): drini, apmxi

365.21

mean-value theorem

Let f : R → R be a function which is continuous on the interval [a, b] and differentiable on (a, b). Then there exists a number c, a < c < b, such that

f′(c) = (f(b) − f(a))/(b − a).    (365.21.1)

The geometrical meaning of this theorem is illustrated in the picture.

This is often used in the integral context: there exists c ∈ [a, b] such that

(b − a) f(c) = ∫_a^b f(x) dx.    (365.21.2)

Version: 4 Owner: mathwizard Author(s): mathwizard, drummond
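Numerically, a point c as in the theorem can be located by bisection once f′ − slope changes sign (an editorial sketch, with names of our own choosing; for f(x) = x³ on [0, 1] the exact answer is c = 1/√3):

```python
def mvt_point(f_prime, slope, a, b, tol=1e-12):
    """Bisection for a point c in (a, b) with f'(c) = slope,
    assuming f' - slope changes sign on [a, b]."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f_prime(lo) - slope) * (f_prime(mid) - slope) <= 0:
            hi = mid    # sign change in [lo, mid]
        else:
            lo = mid    # sign change in [mid, hi]
    return (lo + hi) / 2

# f(x) = x^3 on [0, 1]: the secant slope is (f(1) - f(0))/(1 - 0) = 1,
# and f'(c) = 3c^2 = 1 gives c = 1/sqrt(3).
c = mvt_point(lambda x: 3 * x * x, 1.0, 0.0, 1.0)
assert abs(c - 3 ** -0.5) < 1e-9
```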

365.22

monotonicity criterion

Suppose that f : [a, b] → R is a function which is continuous on [a, b] and differentiable on (a, b). Then the following relations hold.

1. f′(x) ≥ 0 for all x ∈ (a, b) ⇔ f is an increasing function on [a, b];
2. f′(x) ≤ 0 for all x ∈ (a, b) ⇔ f is a decreasing function on [a, b];
3. f′(x) > 0 for all x ∈ (a, b) ⇒ f is a strictly increasing function on [a, b];
4. f′(x) < 0 for all x ∈ (a, b) ⇒ f is a strictly decreasing function on [a, b].

Notice that the third and fourth statements cannot be inverted. As an example consider the function f : [−1, 1] → R, f(x) = x³. This is a strictly increasing function, but f′(0) = 0. Version: 4 Owner: paolini Author(s): paolini

365.23

nabla

Let f : Rⁿ → R be a C¹(Rⁿ) function, that is, a partially differentiable function in all its coordinates. The symbol ∇, named nabla, represents the gradient operator, whose action on f(x₁, x₂, . . . , xₙ) is given by

∇f = (f_{x₁}, f_{x₂}, . . . , f_{xₙ}) = (∂f/∂x₁, ∂f/∂x₂, . . . , ∂f/∂xₙ).

Version: 2 Owner: drini Author(s): drini, apmxi


365.24

one-sided limit

Let f be a real-valued function defined on S ⊆ R. The left-hand one-sided limit at a is defined to be the real number L⁻ such that for every ε > 0 there exists a δ > 0 such that |f(x) − L⁻| < ε whenever 0 < a − x < δ.

Analogously, the right-hand one-sided limit at a is the real number L⁺ such that for every ε > 0 there exists a δ > 0 such that |f(x) − L⁺| < ε whenever 0 < x − a < δ.

Common notations for the one-sided limits are

L⁺ = f(x⁺) = lim_{x→a⁺} f(x) = lim_{x↘a} f(x),
L⁻ = f(x⁻) = lim_{x→a⁻} f(x) = lim_{x↗a} f(x).

Sometimes, left-handed limits are referred to as limits from below while right-handed limits are from above.

Theorem The ordinary limit of a function exists at a point if and only if both one-sided limits exist at this point and are equal (to the ordinary limit).

For example, the Heaviside unit step function, sometimes colloquially referred to as the diving board function, defined by

H(x) = 0 if x < 0,
       1 if x > 0,

has the simplest kind of discontinuity at x = 0, a jump discontinuity. Its ordinary limit does not exist at this point, but the one-sided limits do exist, and are

lim_{x→0⁻} H(x) = 0  and  lim_{x→0⁺} H(x) = 1.

Version: 5 Owner: matte Author(s): matte, NeuRet

365.25

product rule

The product rule states that if f : R → R and g : R → R are functions in one variable both differentiable at a point x₀, then the derivative of the product of the two functions, denoted f · g, at x₀ is given by

(D/Dx)(f · g)(x₀) = f(x₀)g′(x₀) + f′(x₀)g(x₀).

Proof

See the proof of the product rule.

365.25.1 Generalized Product Rule

More generally, for differentiable functions f₁, f₂, . . . , fₙ in one variable, all differentiable at x₀, we have

D(f₁ ⋯ fₙ)(x₀) = ∑_{i=1}^{n} f₁(x₀) ⋯ f_{i−1}(x₀) · Dfᵢ(x₀) · f_{i+1}(x₀) ⋯ fₙ(x₀).

Also see Leibniz’ rule.

Example The derivative of x ln|x| can be found by application of this rule. Let f(x) = x, g(x) = ln|x|, so that f(x)g(x) = x ln|x|. Then f′(x) = 1 and g′(x) = 1/x. Therefore, by the product rule,

(D/Dx)(x ln|x|) = f(x)g′(x) + f′(x)g(x) = x · (1/x) + 1 · ln|x| = ln|x| + 1.

Version: 8 Owner: mathcam Author(s): mathcam, Logan
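The worked example can be verified numerically (an editorial sketch; the difference quotient and step size are our own choices):

```python
import math

f = lambda x: x * math.log(abs(x))
df = lambda x: math.log(abs(x)) + 1   # derivative from the product rule

# Compare against a symmetric difference quotient away from x = 0:
for x in (0.5, 2.0, -3.0):
    eps = 1e-6
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    assert abs(numeric - df(x)) < 1e-7
```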

365.26

proof of Darboux’s theorem

WLOG, assume f′₊(a) > t > f′₋(b). Let g(x) = f(x) − tx. Then g′(x) = f′(x) − t, g′₊(a) > 0 > g′₋(b), and we wish to find a zero of g′. g is a continuous function on [a, b], so it attains a maximum on [a, b]. This maximum cannot be at a, since g′₊(a) > 0, so g is locally increasing at a. Similarly, g′₋(b) < 0, so g is locally decreasing at b and cannot have a maximum at b. So the maximum is attained at some c ∈ (a, b). But then g′(c) = 0 by Fermat’s theorem. Version: 2 Owner: paolini Author(s): paolini, ariels

365.27

proof of Fermat’s Theorem (stationary points)

Suppose that x₀ is a local maximum (a similar proof applies if x₀ is a local minimum). Then there exists δ > 0 such that (x₀ − δ, x₀ + δ) ⊂ (a, b) and such that we have f(x₀) ≥ f(x) for all x with |x − x₀| < δ. Hence for h ∈ (0, δ) we notice that

(f(x₀ + h) − f(x₀))/h ≤ 0.

Since the limit of this ratio as h → 0⁺ exists and is equal to f′(x₀), we conclude that f′(x₀) ≤ 0. On the other hand, for h ∈ (−δ, 0) we notice that

(f(x₀ + h) − f(x₀))/h ≥ 0,

but again the limit as h → 0⁻ exists and is equal to f′(x₀), so we also have f′(x₀) ≥ 0. Hence we conclude that f′(x₀) = 0. Version: 1 Owner: paolini Author(s): paolini

365.28

proof of Rolle’s theorem

Because f is continuous on a compact (closed and bounded) interval I = [a, b], it attains its maximum and minimum values. In case f(a) = f(b) is both the maximum and the minimum, then there is nothing more to say, for then f is a constant function and f′ ≡ 0 on the whole interval I. So suppose otherwise, and f attains an extremum in the open interval (a, b); without loss of generality, let this extremum be a maximum, considering −f in lieu of f as necessary. We claim that at this extremum f(c) we have f′(c) = 0, with a < c < b.

To show this, note that f(x) − f(c) ≤ 0 for all x ∈ I, because f(c) is the maximum. By definition of the derivative, we have that

f′(c) = lim_{x→c} (f(x) − f(c))/(x − c).

Looking at the one-sided limits, we note that

R = lim_{x→c⁺} (f(x) − f(c))/(x − c) ≤ 0,

because the numerator in the limit is nonpositive in the interval I, yet x − c > 0 as x approaches c from the right. Similarly,

L = lim_{x→c⁻} (f(x) − f(c))/(x − c) ≥ 0.

Since f is differentiable at c, the left and right limits must coincide, so 0 ≤ L = R ≤ 0, that is to say, f′(c) = 0. Version: 1 Owner: rmilson Author(s): NeuRet

365.29

proof of Taylor’s Theorem

Let n be a natural number and I be the closed interval [a, b]. We have that f : I → R has n continuous derivatives and its (n + 1)-st derivative exists. Suppose that c ∈ I, and x ∈ I is arbitrary. Let J be the closed interval with endpoints c and x. Define F : J → R by

F(t) := f(x) − ∑_{k=0}^{n} ((x − t)^k / k!) f^(k)(t),    (365.29.1)

so that

F′(t) = −f′(t) − ∑_{k=1}^{n} ( ((x − t)^k / k!) f^(k+1)(t) − ((x − t)^{k−1} / (k − 1)!) f^(k)(t) )
      = −((x − t)^n / n!) f^(n+1)(t),

since the sum telescopes. Now, define G on J by

G(t) := F(t) − ((x − t)/(x − c))^{n+1} F(c),

and notice that G(c) = G(x) = 0. Hence, Rolle’s theorem gives us a ζ strictly between x and c such that

0 = G′(ζ) = F′(ζ) + (n + 1) ((x − ζ)^n / (x − c)^{n+1}) F(c),

which yields

F(c) = −(1/(n + 1)) ((x − c)^{n+1} / (x − ζ)^n) F′(ζ)
     = (1/(n + 1)) ((x − c)^{n+1} / (x − ζ)^n) ((x − ζ)^n / n!) f^(n+1)(ζ)
     = (f^(n+1)(ζ) / (n + 1)!) (x − c)^{n+1},

from which we conclude, recalling (365.29.1),

f(x) = ∑_{k=0}^{n} (f^(k)(c) / k!) (x − c)^k + (f^(n+1)(ζ) / (n + 1)!) (x − c)^{n+1}.

Version: 3 Owner: rmilson Author(s): NeuRet

365.30

proof of binomial formula

Let p ∈ R and x ∈ R, |x| < 1, be given. We wish to show that

(1 + x)^p = ∑_{n=0}^{∞} (p^{\underline{n}}/n!) xⁿ,

where p^{\underline{n}} denotes the nth falling factorial of p.

The convergence of the series on the right-hand side of the above equation is a straightforward consequence of the ratio test. Set

f(x) = (1 + x)^p,

and note that

f^(n)(x) = p^{\underline{n}} (1 + x)^{p−n}.

The desired equality now follows from Taylor’s Theorem. Q.E.D. Version: 2 Owner: rmilson Author(s): rmilson

365.31

proof of chain rule

Let’s say that g is differentiable at x₀ and f is differentiable at y₀ = g(x₀). We define:

ϕ(y) = (f(y) − f(y₀))/(y − y₀) if y ≠ y₀,
ϕ(y) = f′(y₀) if y = y₀.

Since f is differentiable at y₀, ϕ is continuous. We observe that, for x ≠ x₀,

(f(g(x)) − f(g(x₀)))/(x − x₀) = ϕ(g(x)) · (g(x) − g(x₀))/(x − x₀);

in fact, if g(x) ≠ g(x₀), it follows at once from the definition of ϕ, while if g(x) = g(x₀), both members of the equation are 0. Since g is continuous at x₀, and ϕ is continuous at y₀,

lim_{x→x₀} ϕ(g(x)) = ϕ(g(x₀)) = f′(g(x₀)),

hence

(f ∘ g)′(x₀) = lim_{x→x₀} (f(g(x)) − f(g(x₀)))/(x − x₀)
             = lim_{x→x₀} ϕ(g(x)) · (g(x) − g(x₀))/(x − x₀)
             = f′(g(x₀)) g′(x₀).

Version: 3 Owner: n3o Author(s): n3o

365.32

proof of extended mean-value theorem

Let f : [a, b] → R and g : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Define the function

h(x) = f(x)(g(b) − g(a)) − g(x)(f(b) − f(a)) − f(a)g(b) + f(b)g(a).

Because f and g are continuous on [a, b] and differentiable on (a, b), so is h. Furthermore, h(a) = h(b) = 0, so by Rolle’s theorem there exists a ξ ∈ (a, b) such that h′(ξ) = 0. This implies that

f′(ξ)(g(b) − g(a)) − g′(ξ)(f(b) − f(a)) = 0

and, if g(b) ≠ g(a),

f′(ξ)/g′(ξ) = (f(b) − f(a))/(g(b) − g(a)).

Version: 3 Owner: pbruin Author(s): pbruin

365.33

proof of intermediate value theorem

We first prove the following lemma. If f : [a, b] → R is a continuous function with f(a) ≤ 0 ≤ f(b), then there exists c ∈ [a, b] such that f(c) = 0.

Define the sequences (aₙ) and (bₙ) inductively, as follows:

a₀ = a,  b₀ = b,
cₙ = (aₙ + bₙ)/2,
(aₙ, bₙ) = (aₙ₋₁, cₙ₋₁) if f(cₙ₋₁) ≥ 0,
(aₙ, bₙ) = (cₙ₋₁, bₙ₋₁) if f(cₙ₋₁) < 0.

We note that

a₀ ≤ a₁ ≤ . . . ≤ aₙ ≤ bₙ ≤ . . . ≤ b₁ ≤ b₀,    (365.33.1)
bₙ − aₙ = 2⁻ⁿ(b₀ − a₀),
f(aₙ) ≤ 0 ≤ f(bₙ).    (365.33.2)

By the fundamental axiom of analysis (aₙ) → α and (bₙ) → β. But (bₙ − aₙ) → 0, so α = β. By continuity of f,

(f(aₙ)) → f(α),  (f(bₙ)) → f(α).

But we have f(α) ≤ 0 and f(α) ≥ 0, so that f(α) = 0. Furthermore we have a ≤ α ≤ b, proving the assertion.

Set g(x) = f(x) − k where f(a) ≤ k ≤ f(b). Then g satisfies the same conditions as before, so there exists c such that f(c) = k, thus proving the more general result. Version: 2 Owner: vitriol Author(s): vitriol

365.34

proof of mean value theorem

Define h(x) on [a, b] by

h(x) = f(x) − f(a) − ((f(b) − f(a))/(b − a))(x − a).

Clearly, h is continuous on [a, b], differentiable on (a, b), and

h(a) = f(a) − f(a) = 0,
h(b) = f(b) − f(a) − ((f(b) − f(a))/(b − a))(b − a) = 0.

Notice that h satisfies the conditions of Rolle’s theorem. Therefore, by Rolle’s theorem there exists c ∈ (a, b) such that h′(c) = 0. However, from the definition of h we obtain by differentiation that

h′(x) = f′(x) − (f(b) − f(a))/(b − a).

Since h′(c) = 0, we therefore have

f′(c) = (f(b) − f(a))/(b − a),

as required.

REFERENCES 1. Michael Spivak, Calculus, 3rd ed., Publish or Perish Inc., 1994.

Version: 2 Owner: saforres Author(s): saforres

365.35

proof of monotonicity criterion

Let us start from the implications “⇒”. Suppose that f′(x) ≥ 0 for all x ∈ (a, b). We want to prove that therefore f is increasing. So take x₁, x₂ ∈ [a, b] with x₁ < x₂. Applying the mean-value theorem on the interval [x₁, x₂] we know that there exists a point x ∈ (x₁, x₂) such that

f(x₂) − f(x₁) = f′(x)(x₂ − x₁),

and f′(x) being ≥ 0 we conclude that f(x₂) ≥ f(x₁). This proves the first claim. The other three cases can be achieved with minor modifications: replace all “≥” respectively with ≤, > and <.

Let us now prove the implication “⇐” for the first and second statements. Given x ∈ (a, b) consider the ratio

(f(x + h) − f(x))/h.

If f is increasing, the numerator of this ratio is ≥ 0 when h > 0 and is ≤ 0 when h < 0. Either way the ratio is ≥ 0, since the denominator has the same sign as the numerator. Since we know by hypothesis that the function f is differentiable at x, we can pass to the limit to conclude that

f′(x) = lim_{h→0} (f(x + h) − f(x))/h ≥ 0.

If f is decreasing, the ratio considered turns out to be ≤ 0, hence the conclusion f′(x) ≤ 0. Notice that if we suppose that f is strictly increasing, we obtain that this ratio is > 0, but passing to the limit as h → 0 we cannot conclude that f′(x) > 0 but only (again) f′(x) ≥ 0. Version: 2 Owner: paolini Author(s): paolini

365.36

proof of quotient rule

Let F (x) = f (x)/g(x). Then

f (x+h) − F (x + h) − F (x) g(x+h) = lim F (x) = lim h→0 h→0 h h f (x + h)g(x) − f (x)g(x + h) = lim h→0 hg(x + h)g(x) 0

1439

f (x) g(x)

h

Like the product rule, the key to this proof is subtracting and adding the same quantity. We separate f and g in the above expression by subtracting and adding the term f (x)g(x) in the numerator.

f (x + h)g(x) − f (x)g(x) + f (x)g(x) − f (x)g(x + h) h→0 hg(x + h)g(x)

F 0 (x) = lim

(x) g(x) f (x+h)−f − f (x) g(x+h)−g(x) h h h→0 g(x + h)g(x)

= lim

(x) limh→0 g(x) · limh→0 f (x+h)−f − limh→0 f (x) · limh→0 g(x+h)−g(x) h h limh→0 g(x + h) · limh→0 g(x) g(x)f 0(x) − f (x)g 0 (x) = [g(x)]2

=

Version: 1 Owner: Luci Author(s): Luci

365.37

quotient rule

The quotient rule says that the derivative of the quotient f/g of two differentiable functions f and g exists at all values of x as long as g(x) ≠ 0, and is given by the formula

(d/dx)(f(x)/g(x)) = (g(x)f′(x) − f(x)g′(x)) / [g(x)]².

The quotient rule and the other differentiation formulas allow us to compute the derivative of any rational function. Version: 10 Owner: Luci Author(s): Luci

365.38

signum function

The signum function is the function sign : R → R defined as

sign(x) = −1 when x < 0,
          0 when x = 0,
          1 when x > 0.

The following properties hold:


1. For all x ∈ R, sign(−x) = −sign(x).
2. For all x ∈ R, |x| = sign(x)·x.
3. For all x ≠ 0, (d/dx)|x| = sign(x).

Here, we should point out that the signum function is often defined simply as 1 for x > 0 and −1 for x < 0. Thus, at x = 0, it is left undefined. See e.g. [2]. In applications, such as the Laplace transform, this definition is adequate, since the value of a function at a single point does not change the analysis. One could then, in fact, set sign(0) to any value. However, setting sign(0) = 0 is motivated by the above relations. A related function is the Heaviside step function, defined as

H(x) = 0 when x < 0,
       1/2 when x = 0,
       1 when x > 0.

Again, this function is sometimes left undefined at x = 0. The motivation for setting H(0) = 1/2 is that for all x ∈ R, we then have the relations

H(x) = (1/2)(sign(x) + 1),
H(−x) = 1 − H(x).

The first relation is clear. For the second, we have

1 − H(x) = 1 − (1/2)(sign(x) + 1) = (1/2)(1 − sign(x)) = (1/2)(1 + sign(−x)) = H(−x).

Example Let a < b be real numbers, and let f : R → R be the piecewise defined function

f(x) = 4 when x ∈ (a, b),
       0 otherwise.

Using the Heaviside step function, we can write

f(x) = 4(H(x − a) − H(x − b))    (365.38.1)

almost everywhere. Indeed, if we calculate f using equation 365.38.1 we obtain f(x) = 4 for x ∈ (a, b), f(x) = 0 for x ∉ [a, b], and f(a) = f(b) = 2. Therefore, equation 365.38.1 holds at all points except a and b. □

365.38.1

Signum function for complex arguments

For a complex number z, the signum function is defined as [1]  0 when z = 0, sign(z) = z/|z| when z = 6 0. In other words, if z is non-zero, then sign z is the projection of z onto the unit circle {z ∈ C | |z| = 1}. clearly, the complex signum function reduces to the real signum function for real arguments. For all z ∈ C, we have z sign z = |z|, where z is the complex conjugate of z.

REFERENCES 1. E. Kreyszig, Advanced Engineering Mathematics, John Wiley & Sons, 1993, 7th ed. 2. G. Bachman, L. Narici, Functional analysis, Academic Press, 1966.

Version: 4 Owner: mathcam Author(s): matte
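A sketch of the complex signum (the function name is our own), checking the unit-circle and conjugate relations stated above:

```python
def sign(z):
    """Signum for real or complex z: 0 at 0, else z/|z|
    (the projection of z onto the unit circle)."""
    return 0 if z == 0 else z / abs(z)

assert sign(0) == 0
assert sign(-7) == -1 and sign(3.5) == 1            # reduces to the real signum
w = sign(3 + 4j)
assert abs(abs(w) - 1) < 1e-12                      # sign z lies on the unit circle
assert abs((3 + 4j).conjugate() * w - 5) < 1e-12    # conj(z) * sign(z) = |z| = 5
```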


Chapter 366

26A09 – Elementary functions

366.1 definitions in trigonometry

Informal definitions

Given a triangle ABC with a signed angle x at A and a right angle at B, the ratios

BC/AC,  AB/AC,  BC/AB

are dependent only on the angle x, and therefore define functions, denoted by

sin x,  cos x,  tan x

respectively, where the names are short for sine, cosine and tangent. Their reciprocals are rather less important, but also have names:

cot x = AB/BC = 1/tan x  (cotangent)
csc x = AC/BC = 1/sin x  (cosecant)
sec x = AC/AB = 1/cos x  (secant)

From Pythagoras’s theorem we have cos²x + sin²x = 1 for all (real) x. Also it is “clear” from the diagram that the functions cos and sin are periodic with period 2π. However:

Formal definitions

The above definitions are not fully rigorous, because we have not defined the word angle. We will sketch a more rigorous approach. The power series

∑_{n=0}^{∞} xⁿ/n!


converges uniformly on compact subsets of C and its sum, denoted by exp(x) or by eˣ, is therefore an entire function of x, called the exponential function. f(x) = exp(x) is the unique solution of the boundary value problem

f′(x) = f(x),  f(0) = 1

on R. The sine and cosine functions, for real arguments, are defined in terms of exp simply by

exp(ix) = cos x + i sin x.

Thus

cos x = 1 − x²/2! + x⁴/4! − x⁶/6! + . . .
sin x = x/1! − x³/3! + x⁵/5! − . . .

Although it is not self-evident, cos and sin are periodic functions on the real line, and have the same period. That period is denoted by 2π.

Version: 3 Owner: Daume Author(s): Larry Hammick
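The defining relation exp(ix) = cos x + i sin x, and the identity cos²x + sin²x = 1, can be checked numerically (an editorial sketch using the standard library’s implementations):

```python
import cmath
import math

# exp(ix) = cos x + i sin x, and cos^2 x + sin^2 x = 1:
for x in (0.0, 1.0, -2.5, math.pi):
    z = cmath.exp(1j * x)
    assert abs(z.real - math.cos(x)) < 1e-12
    assert abs(z.imag - math.sin(x)) < 1e-12
    assert abs(math.cos(x) ** 2 + math.sin(x) ** 2 - 1) < 1e-12
```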

366.2

hyperbolic functions

The hyperbolic functions sinh x and cosh x are defined as follows:

sinh x := (eˣ − e⁻ˣ)/2,
cosh x := (eˣ + e⁻ˣ)/2.

One can then also define the functions tanh x and coth x in analogy to the definitions of tan x and cot x:

tanh x := sinh x / cosh x = (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ),
coth x := cosh x / sinh x = (eˣ + e⁻ˣ)/(eˣ − e⁻ˣ).

The hyperbolic functions are named in that way because the hyperbola

x²/a² − y²/b² = 1

can be written in parametric form with the equations:

x = a cosh t,  y = b sinh t.


This is because of the equation cosh²x − sinh²x = 1. There are also addition formulas which are like the ones for trigonometric functions:

sinh(x ± y) = sinh x cosh y ± cosh x sinh y,
cosh(x ± y) = cosh x cosh y ± sinh x sinh y.

The Taylor series for the hyperbolic functions are:

sinh x = ∑_{n=0}^{∞} x^{2n+1}/(2n + 1)!,
cosh x = ∑_{n=0}^{∞} x^{2n}/(2n)!.

Using complex numbers we can use the hyperbolic functions to express the trigonometric functions:

sin x = sinh(ix)/i,
cos x = cosh(ix).

Version: 2 Owner: mathwizard Author(s): mathwizard
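The identities above can be verified directly from the exponential definitions (an editorial sketch):

```python
import math

# Definitions from the entry:
sinh = lambda x: (math.exp(x) - math.exp(-x)) / 2
cosh = lambda x: (math.exp(x) + math.exp(-x)) / 2

for x, y in ((0.3, 1.2), (-2.0, 0.5)):
    # fundamental identity cosh^2 - sinh^2 = 1
    assert abs(cosh(x) ** 2 - sinh(x) ** 2 - 1) < 1e-12
    # addition formula for sinh
    assert abs(sinh(x + y) - (sinh(x) * cosh(y) + cosh(x) * sinh(y))) < 1e-12
    # addition formula for cosh
    assert abs(cosh(x + y) - (cosh(x) * cosh(y) + sinh(x) * sinh(y))) < 1e-12
```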


Chapter 367

26A12 – Rate of growth of functions, orders of infinity, slowly varying functions

367.1 Landau notation

Given two functions f and g from R⁺ to R⁺, the notation f = O(g) means that the ratio

\frac{f(x)}{g(x)}

stays bounded as x → ∞. If moreover that ratio approaches zero, we write f = o(g).

It is legitimate to write, say, 2x = O(x) = O(x²), with the understanding that we are using the equality sign in an unsymmetric (and informal) way, in that we do not have, for example, O(x²) = O(x). The notation f = Ω(g) means that the ratio

\frac{f(x)}{g(x)}

is bounded away from zero as x → ∞, or equivalently g = O(f).

If both f = O(g) and f = Ω(g), we write f = Θ(g). One more notational convention in this group is f(x) ∼ g(x), meaning \lim_{x\to\infty} \frac{f(x)}{g(x)} = 1.


In analysis, such notation is useful in describing error estimates. For example, the Riemann hypothesis is equivalent to the conjecture

\pi(x) = \operatorname{Li}(x) + O(\sqrt{x} \log x).

Landau notation is also handy in applied mathematics, e.g. in describing the efficiency of an algorithm. It is common to say that an algorithm requires O(x3 ) steps, for example, without needing to specify exactly what is a step; for if f = O(x3 ), then f = O(Ax3 ) for any positive constant A. Version: 8 Owner: mathcam Author(s): Larry Hammick, Logan
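The boundedness-of-ratios definition can be illustrated numerically; `ratio` below is just an ad hoc helper of ours, not standard notation:

```python
# f = O(g) means f(x)/g(x) stays bounded as x grows.
# Here 2x + 3 = O(x), while x^2 is not O(x): its ratio against x grows.
def ratio(f, g, xs):
    return [f(x) / g(x) for x in xs]

xs = [10.0 ** k for k in range(1, 7)]

bounded = ratio(lambda x: 2 * x + 3, lambda x: x, xs)    # tends to 2
unbounded = ratio(lambda x: x * x, lambda x: x, xs)      # equals x itself

assert max(bounded) <= 2.3          # stays under a fixed constant
assert unbounded[-1] == 10.0 ** 6   # keeps growing with x
```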


Chapter 368 26A15 – Continuity and related questions (modulus of continuity, semicontinuity, discontinuities, etc.) 368.1

Dirichlet’s function

Dirichlet's function f : R → R is defined as

f(x) = \begin{cases} \frac{1}{q} & \text{if } x = \frac{p}{q} \text{ is a rational number in lowest terms,} \\ 0 & \text{if } x \text{ is an irrational number.} \end{cases}

This function has the property that it is continuous at every irrational number and discontinuous at every rational one. Version: 3 Owner: urz Author(s): urz
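For arguments given exactly as fractions, the definition can be implemented directly; the float handling below is our own heuristic, since floats cannot represent irrational numbers:

```python
from fractions import Fraction

def dirichlet(x, max_denominator=10**6):
    # Returns 1/q if x equals p/q in lowest terms (with q up to
    # max_denominator when x arrives as a float), else 0.
    r = Fraction(x).limit_denominator(max_denominator)
    if r == Fraction(x):
        return Fraction(1, r.denominator)
    return 0

assert dirichlet(Fraction(3, 4)) == Fraction(1, 4)
assert dirichlet(0.5) == Fraction(1, 2)
assert dirichlet(2 ** 0.5) == 0   # sqrt(2)'s float value is not a small fraction
```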

368.2

semi-continuous

A real function f : A → R, where A ⊆ R, is said to be lower semi-continuous in x₀ if

∀ε > 0 ∃δ > 0 ∀x ∈ A: |x − x₀| < δ ⇒ f(x) > f(x₀) − ε,

and f is said to be upper semi-continuous in x₀ if

∀ε > 0 ∃δ > 0 ∀x ∈ A: |x − x₀| < δ ⇒ f(x) < f(x₀) + ε.


Remark A real function is continuous in x0 if and only if it is both upper and lower semicontinuous in x0 . We can generalize the definition to arbitrary topological spaces as follows. Let A be a topological space. f : A → R is lower semicontinuous at x0 if, for each ε > 0 there is a neighborhood U of x0 such that x ∈ U implies f (x) > f (x0 ) − ε. Theorem Let f : [a, b] → R be a lower (upper) semi-continuous function. Then f has a minimum (maximum) in [a, b]. Version: 3 Owner: drini Author(s): drini, n3o

368.3

semicontinuous

Definition [1] Suppose X is a topological space, and f is a function from X into the extended real numbers, f : X → [−∞, ∞]. Then: 1. If {x ∈ X | f(x) > α} is an open set in X for all α ∈ R, then f is said to be lower semicontinuous. 2. If {x ∈ X | f(x) < α} is an open set in X for all α ∈ R, then f is said to be upper semicontinuous.

Properties 1. If X is a topological space and f is a function f : X → R, then f is continuous if and only if f is upper and lower semicontinuous [1, 3]. 2. The characteristic function of an open set is lower semicontinuous [1, 3]. 3. The characteristic function of a closed set is upper semicontinuous [1, 3]. 4. If f and g are lower semicontinuous, then f + g is also lower semicontinuous [3].

REFERENCES 1. W. Rudin, Real and complex analysis, 3rd ed., McGraw-Hill Inc., 1987. 2. D.L. Cohn, Measure Theory, Birkh¨auser, 1980.

Version: 2 Owner: bwebste Author(s): matte, apmxi

368.4

uniformly continuous

Let f : A → R be a real function defined on a subset A of the real line. We say that f is uniformly continuous if, given an arbitrarily small positive ε, there exists a positive δ such that whenever two points in A differ by less than δ, they are mapped by f into points which differ by less than ε. In symbols:

∀ε > 0 ∃δ > 0 ∀x, y ∈ A: |x − y| < δ ⇒ |f(x) − f(y)| < ε.

Every uniformly continuous function is also continuous, while the converse does not always hold. For instance, the function f : ]0, +∞[ → R defined by f(x) = 1/x is continuous in its domain, but not uniformly. A more general definition of uniform continuity applies to functions between metric spaces (there are even more general environments for uniformly continuous functions, i.e. uniform spaces). Given a function f : X → Y, where X and Y are metric spaces with distances d_X and d_Y, we say that f is uniformly continuous if

∀ε > 0 ∃δ > 0 ∀x, y ∈ X: d_X(x, y) < δ ⇒ d_Y(f(x), f(y)) < ε.

Uniformly continuous functions have the property that they map Cauchy sequences to Cauchy sequences and that they preserve uniform convergence of sequences of functions. Any continuous function defined on a compact space is uniformly continuous (see Heine-Cantor theorem). Version: 10 Owner: n3o Author(s): n3o
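The failure of uniform continuity for f(x) = 1/x near 0 can be seen numerically: points can be arbitrarily close while their images stay far apart. A small sketch (the choice of test points is ours):

```python
def f(x):
    return 1.0 / x

# For every delta, the points delta/2 and delta differ by less than delta,
# yet f(delta/2) - f(delta) = 1/delta, which blows up as delta shrinks.
for delta in (1e-1, 1e-3, 1e-5):
    x, y = delta / 2, delta
    assert abs(x - y) < delta
    assert abs(f(x) - f(y)) > 1.0 / (2.0 * delta)
```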


Chapter 369 26A16 – Lipschitz (Hölder) classes 369.1

Lipschitz condition

A mapping f : X → Y between metric spaces is said to satisfy the Lipschitz condition if there exists a real constant α > 0 such that

d_Y(f(p), f(q)) \le \alpha \, d_X(p, q)

for all p, q ∈ X.

Proposition 17. A Lipschitz mapping f : X → Y is uniformly continuous.

Proof. Let f be a Lipschitz mapping and α > 0 a corresponding Lipschitz constant. For every given ε > 0, choose δ > 0 such that δα < ε. Let p, q ∈ X such that

d_X(p, q) < δ

be given. By assumption, d_Y(f(p), f(q)) ≤ αδ < ε, as desired. QED

Notes. More generally, one says that a mapping satisfies a Lipschitz condition of order β > 0 if there exists a real constant α > 0 such that

d_Y(f(p), f(q)) \le \alpha \, d_X(p, q)^\beta

for all p, q ∈ X.

Version: 17 Owner: rmilson Author(s): rmilson, slider142


369.2

Lipschitz condition and differentiability

If X and Y are Banach spaces, e.g. Rⁿ, one can inquire about the relation between differentiability and the Lipschitz condition. The latter is the weaker condition. If f is Lipschitz, the ratio

\frac{\|f(q) - f(p)\|}{\|q - p\|}, \qquad p, q \in X, \; p \neq q,

is bounded but is not assumed to converge to a limit. Indeed, differentiability is the stronger condition.

Proposition 18. Let f : X → Y be a continuously differentiable mapping between Banach spaces. If K ⊂ X is a compact subset, then the restriction f : K → Y satisfies the Lipschitz condition.

Proof. Let Lin(X, Y) denote the Banach space of bounded linear maps from X to Y. Recall that the norm ‖T‖ of a linear mapping T ∈ Lin(X, Y) is defined by

\|T\| = \sup\left\{ \frac{\|Tu\|}{\|u\|} : u \neq 0 \right\}.

Let Df : X → Lin(X, Y) denote the derivative of f. By definition Df is continuous, which really means that ‖Df‖ : X → R is a continuous function. Since K ⊂ X is compact, there exists a finite upper bound B₁ > 0 for ‖Df‖ restricted to K. In particular, this means that

\|Df(p)u\| \le \|Df(p)\| \|u\| \le B_1 \|u\|,

for all p ∈ K, u ∈ X. Next, consider the secant mapping s : X × X → R defined by

s(p, q) = \begin{cases} \dfrac{\|f(q) - f(p) - Df(p)(q - p)\|}{\|q - p\|} & q \neq p \\ 0 & p = q \end{cases}

This mapping is continuous, because f is assumed to be continuously differentiable. Hence, there is a finite upper bound B₂ > 0 for s restricted to the compact K × K. It follows that for all p, q ∈ K we have

\|f(q) - f(p)\| \le \|f(q) - f(p) - Df(p)(q - p)\| + \|Df(p)(q - p)\| \le B_2 \|q - p\| + B_1 \|q - p\| = (B_1 + B_2) \|q - p\|.

Therefore B₁ + B₂ is the desired Lipschitz constant. QED Version: 22 Owner: rmilson Author(s): rmilson, slider142

369.3

Lipschitz condition and differentiability result

About Lipschitz continuity of differentiable functions the following holds.

Theorem 6. Let X, Y be Banach spaces and let A be a convex (see convex set), open subset of X. Let f : A → Y be a function which is continuous and differentiable in A. Then f is Lipschitz continuous on A if and only if the derivative Df is bounded on A, i.e.

\sup_{x \in A} \|Df(x)\| < +\infty.

Suppose that f is Lipschitz continuous:

\|f(x) - f(y)\| \le L \|x - y\|.

Then given any x ∈ A and any v ∈ X with ‖v‖ = 1, for all small h ∈ R we have

\left\| \frac{f(x + hv) - f(x)}{h} \right\| \le L.

Hence, passing to the limit h → 0, it must hold that ‖Df(x)‖ ≤ L. On the other hand suppose that Df is bounded on A:

\|Df(x)\| \le L, \qquad \forall x \in A.

Given any two points x, y ∈ A and given any α ∈ Y*, consider the function G : [0, 1] → R,

G(t) = \langle \alpha, f((1-t)x + ty) \rangle.

For t ∈ (0, 1) it holds that

G'(t) = \langle \alpha, Df((1-t)x + ty)[y - x] \rangle

and hence |G'(t)| ≤ L‖α‖ ‖y − x‖. Applying the Lagrange mean-value theorem to G we know that there exists ξ ∈ (0, 1) such that

|\langle \alpha, f(y) - f(x) \rangle| = |G(1) - G(0)| = |G'(\xi)| \le \|\alpha\| L \|y - x\|

and since this is true for all α ∈ Y* we get

\|f(y) - f(x)\| \le L \|y - x\|

which is the desired claim. Version: 1 Owner: paolini Author(s): paolini


Chapter 370 26A18 – Iteration 370.1

iteration

Let f : X → X be a function, X being any set. The n-th iteration of a function is the function which is obtained if f is applied n times, and is denoted by f^n. More formally we define f^0(x) = x and f^{n+1}(x) = f(f^n(x)) for nonnegative integers n. If f is invertible, then by going backwards we can define the iterate also for negative n. Version: 6 Owner: mathwizard Author(s): mathwizard
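The recursive definition above translates directly into code; `iterate` is our own name for it:

```python
def iterate(f, n, x):
    # Apply f to x exactly n times (n = 0 returns x unchanged),
    # i.e. compute f^n(x) per the definition f^0(x) = x, f^(n+1)(x) = f(f^n(x)).
    for _ in range(n):
        x = f(x)
    return x

double = lambda t: 2 * t
assert iterate(double, 0, 3) == 3    # f^0 is the identity
assert iterate(double, 5, 3) == 96   # 3 * 2^5
```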

370.2

periodic point

Let f : X → X be a function and f^n its n-th iteration. A point x is called a periodic point of period n of f if it is a fixed point of f^n. The least n for which x is a fixed point of f^n is called the prime period or least period. If f is a function mapping R to R or C to C then a periodic point x of prime period n is called hyperbolic if |(f^n)'(x)| ≠ 1, attractive if |(f^n)'(x)| < 1 and repelling if |(f^n)'(x)| > 1. Version: 11 Owner: mathwizard Author(s): mathwizard


Chapter 371 26A24 – Differentiation (functions of one variable): general theory, generalized derivatives, mean-value theorems 371.1

Leibniz notation

Leibniz notation centers around the concept of a differential element. The differential element of x is represented by dx. You might think of dx as being an infinitesimal change in x. It is important to note that d is an operator, not a variable. So, when you see \frac{dy}{dx}, you can't automatically write \frac{y}{x} as a replacement. We use

\frac{df(x)}{dx} \quad \text{or} \quad \frac{d}{dx} f(x)

to represent the derivative of a function f(x) with respect to x.

\frac{df(x)}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}

We are dividing two numbers infinitely close to 0, and arriving at a finite answer. Δ is another operator that can be thought of as just "a change in" x. When we take the limit of Δx as Δx approaches 0, we get an infinitesimal change dx. Leibniz notation shows a wonderful use in the following example:

\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}

The two du's can be cancelled out to arrive at the original derivative. This is the Leibniz notation for the chain rule. Leibniz notation shows up in the most common way of representing an integral,

F(x) = \int f(x)\, dx

The dx is in fact a differential element. Let's start with a derivative that we know (since F(x) is an antiderivative of f(x)):

\frac{dF(x)}{dx} = f(x)
dF(x) = f(x)\, dx
\int dF(x) = \int f(x)\, dx
F(x) = \int f(x)\, dx

We can think of dF(x) as the differential element of area. Since dF(x) = f(x) dx, the element of area is a rectangle, with f(x) × dx as its dimensions. Integration is the sum of all these infinitely thin elements of area along a certain interval. The result: a finite number. (a diagram is deserved here) One clear advantage of this notation is seen when finding the length s of a curve. The formula is often seen as the following:

s = \int ds

The length is the sum of all the elements, ds, of length. If we have a function f(x), the length element is usually written as

ds = \sqrt{1 + \left[\frac{df(x)}{dx}\right]^2}\, dx.

If we modify this a bit, we get

ds = \sqrt{[dx]^2 + [df(x)]^2}.

Graphically, we could say that the length element is the hypotenuse of a right triangle with one leg being the x element, and the other leg being the f(x) element. (another diagram would be nice!) There are a few caveats, such as if you want to take the value of a derivative. Compare to the prime notation:

f'(a) = \left. \frac{df(x)}{dx} \right|_{x=a}

A second derivative is represented as follows:

\frac{d}{dx} \frac{dy}{dx} = \frac{d^2 y}{dx^2}

The other derivatives follow as can be expected: \frac{d^3 y}{dx^3}, etc. You might think this is a little sneaky, but it is the notation. Properly using these terms can be interesting. For example, what is \int \frac{d^2 y}{dx^2}? We could turn it into \int \frac{d^2 y}{dx^2}\, dx or \int d\left(\frac{dy}{dx}\right). Either way, we get \frac{dy}{dx}.

Version: 2 Owner: xriso Author(s): xriso

371.2

derivative

Qualitatively the derivative is a measure of the change of a function in a small region around a specified point.

Motivation The idea behind the derivative comes from the straight line. What characterizes a straight line is the fact that it has constant “slope”. Figure 371.1: The straight line y = mx + b

In other words for a line given by the equation y = mx + b, as in Fig. 371.1, the ratio of ∆y over ∆x is always constant and has the value \frac{\Delta y}{\Delta x} = m. Figure 371.2: The parabola y = x² and its tangent at (x₀, y₀)

For other curves we cannot define a "slope", like for the straight line, since such a quantity would not be constant. However, for sufficiently smooth curves, each point on a curve has a tangent line. For example consider the curve y = x², as in Fig. 371.2. At the point (x₀, y₀) on the curve, we can draw a tangent of slope m given by the equation y − y₀ = m(x − x₀). Suppose we have a curve of the form y = f(x), and at the point (x₀, f(x₀)) we have a tangent given by y − y₀ = m(x − x₀). Note that for values of x sufficiently close to x₀ we can make the approximation f(x) ≈ m(x − x₀) + y₀. So the slope m of the tangent describes how much f(x) changes in the vicinity of x₀. It is the slope of the tangent that will be associated with the derivative of the function f(x).

Formal definition More formally, for any real function f : R → R, we define the derivative of f at the point x as the following limit (if it exists):

f'(x) := \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.

This definition turns out to be consistent with the motivation introduced above. The derivatives for some elementary functions are (cf. derivative notation):

1. \frac{d}{dx} c = 0, where c is constant;
2. \frac{d}{dx} x^n = n x^{n-1};
3. \frac{d}{dx} \sin x = \cos x;
4. \frac{d}{dx} \cos x = -\sin x;
5. \frac{d}{dx} e^x = e^x;
6. \frac{d}{dx} \ln x = \frac{1}{x}.

While derivatives of more complicated expressions can be calculated algorithmically using the following rules:

Linearity: \frac{d}{dx}(af(x) + bg(x)) = af'(x) + bg'(x);
Product rule: \frac{d}{dx}(f(x)g(x)) = f'(x)g(x) + f(x)g'(x);
Chain rule: \frac{d}{dx} g(f(x)) = g'(f(x)) f'(x);
Quotient rule: \frac{d}{dx} \frac{f(x)}{g(x)} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.

Note that the quotient rule, although given as much importance as the other rules in elementary calculus, can be derived by successively applying the product rule and the chain rule to \frac{f(x)}{g(x)} = f(x) \frac{1}{g(x)}. Also the quotient rule does not generalize as well as the other ones. Since the derivative f'(x) of f(x) is also a function of x, higher derivatives can be obtained by applying the same procedure to f'(x) and so on.
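The product and quotient rules can be sanity-checked with a central-difference approximation of the derivative; the tolerances and sample points below are arbitrary choices of ours:

```python
import math

def deriv(f, x, h=1e-6):
    # Central-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

f, g, x = math.sin, math.exp, 0.7

# Product rule: (fg)' = f'g + fg'
assert abs(deriv(lambda t: f(t) * g(t), x)
           - (deriv(f, x) * g(x) + f(x) * deriv(g, x))) < 1e-6

# Quotient rule: (f/g)' = (f'g - fg') / g^2
assert abs(deriv(lambda t: f(t) / g(t), x)
           - (deriv(f, x) * g(x) - f(x) * deriv(g, x)) / g(x) ** 2) < 1e-6
```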

Generalization Banach Spaces Unfortunately the notion of the "slope of the tangent" does not directly generalize to more abstract situations. What we can do is keep in mind the facts that the tangent is a linear function and that it approximates the function near the point of tangency, as well as the formal definition above. Very general conditions under which we can define a derivative in a manner much similar to the above are as follows. Let f : V → W, where V and W are Banach spaces. Suppose that h ∈ V and h ≠ 0; then we define the directional derivative (D_h f)(x) at x as the following limit

(D_h f)(x) := \lim_{\epsilon \to 0} \frac{f(x + \epsilon h) - f(x)}{\epsilon},

where ε is a scalar. Note that f(x + εh) ≈ f(x) + ε(D_h f)(x), which is consistent with our original motivation. This directional derivative is also called the Gâteaux derivative.


Finally we define the derivative at x as the bounded linear map (Df)(x) : V → W such that for any non-zero h ∈ V

\lim_{\|h\| \to 0} \frac{\|(f(x + h) - f(x)) - (Df)(x) \cdot h\|}{\|h\|} = 0.

Once again we have f(x + h) ≈ f(x) + (Df)(x) · h. In fact, if the derivative (Df)(x) exists, the directional derivatives can be obtained as (D_h f)(x) = (Df)(x) · h.¹ The existence of (D_h f)(x) for each nonzero h ∈ V does not, however, guarantee the existence of (Df)(x). This derivative is also called the Fréchet derivative. In the more familiar case f : Rⁿ → Rᵐ, the derivative Df is simply the Jacobian of f. Under these general conditions the following properties of the derivative remain:

1. Dh = 0, where h is a constant;
2. D(A · x) = A, where A is linear.

Linearity: D(af(x) + bg(x)) · h = a(Df)(x) · h + b(Dg)(x) · h;
"Product" rule: D(B(f(x), g(x))) · h = B((Df)(x) · h, g(x)) + B(f(x), (Dg)(x) · h), where B is bilinear;
Chain rule: D(g(f(x))) · h = (Dg)(f(x)) · ((Df)(x) · h).

Note that the derivative of f can be seen as a function Df : V → L(V, W) given by Df : x ↦ (Df)(x), where L(V, W) is the space of bounded linear maps from V to W. Since L(V, W) can be considered a Banach space itself with the norm taken as the operator norm, higher derivatives can be obtained by applying the same procedure to Df and so on.

Manifolds A manifold is a topological space that is locally homeomorphic to a Banach space V (for finite dimensional manifolds V = Rⁿ) and is endowed with enough structure to define derivatives. Since the notion of a manifold was constructed specifically to generalize the notion of a derivative, this seems like the end of the road for this entry. The following discussion is rather technical; a more intuitive explanation of the same concept can be found in the entry on related rates. Consider manifolds V and W modeled on Banach spaces V and W, respectively. Say we have y = f(x) for some x ∈ V and y ∈ W; then, by definition of a manifold, we can find

¹ The notation A · h is used when h is a vector and A a linear operator. This notation can be considered advantageous to the usual notation A(h), since the latter is rather bulky and the former incorporates the intuitive distributive properties of linear operators also associated with usual multiplication.


charts (X, x) and (Y, y), where X and Y are neighborhoods of x and y, respectively. These charts provide us with canonical isomorphisms between the Banach spaces V and W, and the respective tangent spaces T_x V and T_y W:

dx_x : T_x V \to V, \qquad dy_y : T_y W \to W.

Now consider a map f : V → W between the manifolds. By composing it with the chart maps we construct the map

g_{(X,x)}^{(Y,y)} = y \circ f \circ x^{-1} : V \to W,

defined on an appropriately restricted domain. Since we now have a map between Banach spaces, we can define its derivative at x(x) in the sense defined above, namely Dg_{(X,x)}^{(Y,y)}(x(x)). If this derivative exists for every choice of admissible charts (X, x) and (Y, y), we can say that the derivative Df(x) of f at x is defined and given by

Df(x) = dy_y^{-1} \circ Dg_{(X,x)}^{(Y,y)}(x(x)) \circ dx_x

(it can be shown that this is well defined and independent of the choice of charts). Note that the derivative is now a map between the tangent spaces of the two manifolds, Df(x) : T_x V → T_y W. Because of this a common notation for the derivative of f at x is T_x f. Another alternative notation for the derivative is f_{*,x} because of its connection to the category-theoretical pushforward. Version: 15 Owner: igor Author(s): igor

371.3

l'Hôpital's rule

L'Hôpital's rule states that given an unresolvable limit of the form \frac{0}{0} or \frac{\infty}{\infty}, the ratio of functions \frac{f(x)}{g(x)} will have the same limit at c as the ratio \frac{f'(x)}{g'(x)}. In short, if the limit of a ratio of functions approaches an indeterminate form, then

\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)},

provided this last limit exists. L'Hôpital's rule may be applied indefinitely as long as the conditions still hold. However it is important to note that the nonexistence of \lim \frac{f'(x)}{g'(x)} does not prove the nonexistence of \lim \frac{f(x)}{g(x)}.

Example: We try to determine the value of

\lim_{x \to \infty} \frac{x^2}{e^x}.

As x approaches ∞ the expression becomes an indeterminate form \frac{\infty}{\infty}. By applying L'Hôpital's rule twice we get

\lim_{x \to \infty} \frac{x^2}{e^x} = \lim_{x \to \infty} \frac{2x}{e^x} = \lim_{x \to \infty} \frac{2}{e^x} = 0.
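The limit in the example can also be checked numerically: both x²/eˣ and the twice-differentiated ratio 2/eˣ shrink toward 0 (the sample points are ours):

```python
import math

for x in (10.0, 20.0, 40.0):
    original = x ** 2 / math.exp(x)
    after_two_applications = 2.0 / math.exp(x)
    assert original < 1e-2                    # heading to 0
    assert after_two_applications < original  # same limit, reached faster

assert 40.0 ** 2 / math.exp(40.0) < 1e-13
```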

Version: 8 Owner: mathwizard Author(s): mathwizard, slider142

371.4

proof of De l'Hôpital's rule

Let x₀ ∈ R, let I be an interval containing x₀, and let f and g be two differentiable functions defined on I \ {x₀} with g'(x) ≠ 0 for all x ∈ I \ {x₀}. Suppose that

\lim_{x \to x_0} f(x) = 0, \qquad \lim_{x \to x_0} g(x) = 0,

and that

\lim_{x \to x_0} \frac{f'(x)}{g'(x)} = m.

We want to prove that then g(x) ≠ 0 for all x ∈ I \ {x₀} and

\lim_{x \to x_0} \frac{f(x)}{g(x)} = m.

First of all (with a little abuse of notation) we suppose that f and g are defined also at the point x₀ by f(x₀) = 0 and g(x₀) = 0. The resulting functions are continuous at x₀ and hence on the whole interval I. Let us first prove that g(x) ≠ 0 for all x ∈ I \ {x₀}. If by contradiction g(x̄) = 0, then since we also have g(x₀) = 0, by Rolle's theorem we get that g'(ξ) = 0 for some ξ ∈ (x₀, x̄), which is against our hypotheses. Consider now any sequence xₙ → x₀ with xₙ ∈ I \ {x₀}. By Cauchy's mean value theorem there exists a sequence x'ₙ such that

\frac{f(x_n)}{g(x_n)} = \frac{f(x_n) - f(x_0)}{g(x_n) - g(x_0)} = \frac{f'(x'_n)}{g'(x'_n)}.

But as xₙ → x₀ and since x'ₙ ∈ (x₀, xₙ) we get that x'ₙ → x₀ and hence

\lim_{n \to \infty} \frac{f(x_n)}{g(x_n)} = \lim_{n \to \infty} \frac{f'(x'_n)}{g'(x'_n)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)} = m.

Since this is true for any given sequence xₙ → x₀ we conclude that

\lim_{x \to x_0} \frac{f(x)}{g(x)} = m.

Version: 5 Owner: paolini Author(s): paolini

371.5

related rates

The notion of a derivative has numerous interpretations and applications. A well-known geometric interpretation is that of a slope, or more generally that of a linear approximation to a mapping between linear spaces (see here). Another useful interpretation comes from physics and is based on the idea of related rates. This second point of view is quite general, and sheds light on the definition of the derivative of a manifold mapping (the latter is described in the pushforward entry). Consider two physical quantities x and y that are somehow coupled. For example:

• the quantities x and y could be the coordinates of a point as it moves along the unit circle;
• the quantity x could be the radius of a sphere and y the sphere's surface area;
• the quantity x could be the horizontal position of a point on a given curve and y the distance traversed by that point as it moves from some fixed starting position;
• the quantity x could be the depth of water in a conical tank and y the rate at which the water flows out the bottom.

Regardless of the application, the situation is such that a change in the value of one quantity is accompanied by a change in the value of the other quantity. So let's imagine that we take control of one of the quantities, say x, and change it in any way we like. As we do so, quantity y follows suit and changes along with x. Now the analytical relation between the values of x and y could be quite complicated and non-linear, but the relation between the instantaneous rates of change of x and y is linear. It does not matter how we vary the two quantities; the ratio of the rates of change depends only on the values of x and y. This ratio is, of course, the derivative of the function that maps the values of x to the values of y. Letting ẋ, ẏ denote the rates of change of the two quantities, we describe this conception of the derivative as

\frac{dy}{dx} = \frac{\dot{y}}{\dot{x}},

or equivalently as

\dot{y} = \frac{dy}{dx}\,\dot{x}. \qquad (371.5.1)

Next, let us generalize the discussion and suppose that the two quantities x and y represent physical states with multiple degrees of freedom. For example, x could be a point on the earth's surface, and y the position of a point 1 kilometer to the north of x. Again, the dependence of y and x is, in general, non-linear, but the rate of change of y does have a linear dependence on the rate of change of x. We would like to say that the derivative is

precisely this linear relation, but we must first contend with the following complication. The rates of change are no longer scalars, but rather velocity vectors, and therefore the derivative must be regarded as a linear transformation that changes one vector into another. In order to formalize this generalized notion of the derivative we must consider x and y to be points on manifolds X and Y , and the relation between them a manifold mapping φ : X → Y . A varying x is formally described by a trajectory γ : I → X,

I ⊂ R.

The corresponding velocities take their value in the tangent spaces of X: γ'(t) ∈ T_{γ(t)} X. The "coupling" of the two quantities is described by the composition

φ ∘ γ : I → Y.

The derivative of φ at any given x ∈ X is a linear mapping

φ_*(x) : T_x X → T_{φ(x)} Y,

called the pushforward of φ at x, with the property that for every trajectory γ passing through x at time t, we have

(φ ∘ γ)'(t) = φ_*(x) γ'(t).

The above is the multi-dimensional and coordinate-free generalization of the related rates relation (371.5.1). All of the above has a perfectly rigorous presentation in terms of manifold theory. The approach of the present entry is more informal; our ambition was merely to motivate the notion of a derivative by describing it as a linear transformation between velocity vectors. Version: 2 Owner: rmilson Author(s): rmilson


Chapter 372 26A27 – Nondifferentiability (nondifferentiable functions, points of nondifferentiability), discontinuous derivatives 372.1

Weierstrass function

The Weierstrass function is a continuous function that is nowhere differentiable, and hence is not an analytic function. The formula for the Weierstrass function is

f(x) = \sum_{n=1}^{\infty} b^n \cos(a^n \pi x)

with a odd, 0 < b < 1, and ab > 1 + \frac{3}{2}\pi. Another example of an everywhere continuous but nowhere differentiable curve is the fractal Koch curve. [insert plot of Weierstrass function] Version: 5 Owner: akrowne Author(s): akrowne
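Partial sums of the series are easy to evaluate; below a = 13, b = 1/2 (so that ab = 6.5 > 1 + 3π/2 ≈ 5.71) are our own choice of admissible parameters:

```python
import math

def weierstrass(x, a=13, b=0.5, terms=50):
    # Partial sum of f(x) = sum over n >= 1 of b^n * cos(a^n * pi * x).
    return sum(b ** n * math.cos(a ** n * math.pi * x)
               for n in range(1, terms + 1))

# The series is dominated by sum of b^n = b/(1 - b) = 1, so |f| <= 1 everywhere.
for x in (0.0, 0.1, 0.37, 1.0):
    assert abs(weierstrass(x)) <= 1.0
```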


Chapter 373 26A36 – Antidifferentiation 373.1

antiderivative

The function F(x) is called an antiderivative of a function f(x) if (and only if) the derivative of F is equal to f:

F'(x) = f(x).

Note that there are an infinite number of antiderivatives for any function f(x), since any constant can be added or subtracted from any valid antiderivative to yield another equally valid antiderivative. To account for this, we express the general antiderivative, or indefinite integral, as follows:

\int f(x)\, dx = F(x) + C

where C is an arbitrary constant called the constant of integration. The dx portion means "with respect to x", because after all, our functions F and f are functions of x. Version: 4 Owner: xriso Author(s): xriso

373.2

integration by parts

When one has an integral of a product of two functions, it is sometimes preferable to simplify the integrand by integrating one of the functions and differentiating the other. This process is called integrating by parts, and is defined in the following way, where u and v are functions of x:

\int u \cdot v'\, dx = u \cdot v - \int v \cdot u'\, dx.

This process may be repeated indefinitely, and in some cases it may be used to solve for the original integral algebraically. For definite integrals, the rule appears as

\int_a^b u(x) \cdot v'(x)\, dx = \big(u(b) \cdot v(b) - u(a) \cdot v(a)\big) - \int_a^b v(x) \cdot u'(x)\, dx.

Proof: Integration by parts is simply the antiderivative form of the product rule. Let G(x) = u(x) · v(x). Then

G'(x) = u'(x)v(x) + u(x)v'(x).

Therefore

G'(x) − v(x)u'(x) = u(x)v'(x).

We can now integrate both sides with respect to x to get

G(x) - \int v(x)u'(x)\, dx = \int u(x)v'(x)\, dx

which is just integration by parts rearranged. Example: We integrate the function f(x) = x sin x. We define u(x) := x and v'(x) = sin x. So integration by parts yields

\int x \sin x\, dx = -x \cos x + \int \cos x\, dx = -x \cos x + \sin x.

Version: 5 Owner: mathwizard Author(s): mathwizard, slider142
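The worked example can be verified numerically: the antiderivative −x cos x + sin x reproduces a midpoint-rule estimate of ∫₀^π x sin x dx. The helper names below are ours:

```python
import math

def antiderivative(x):
    # From the example: the integral of x sin x is -x cos x + sin x (+ C).
    return -x * math.cos(x) + math.sin(x)

def midpoint_integral(f, a, b, n=10000):
    # Midpoint-rule approximation of the definite integral of f on [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.0, math.pi
numeric = midpoint_integral(lambda x: x * math.sin(x), a, b)
exact = antiderivative(b) - antiderivative(a)   # equals pi

assert abs(exact - math.pi) < 1e-12
assert abs(numeric - exact) < 1e-6
```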

373.3

integration by parts for the Lebesgue integral

Theorem [1, 2] Suppose f, g are complex valued functions on a bounded interval [a, b]. If f and g are absolutely continuous, then

\int_{[a,b]} f' g = -\int_{[a,b]} f g' + f(b)g(b) - f(a)g(a),

where both integrals are Lebesgue integrals. Remark Any absolutely continuous function can be differentiated almost everywhere. Thus, in the above, the functions f' and g' make sense. Proof. Since f, g and fg are almost everywhere differentiable with Lebesgue integrable derivatives (see this page), we have (fg)' = f'g + fg' almost everywhere, and

\int_{[a,b]} (fg)' = \int_{[a,b]} (f' g + f g') = \int_{[a,b]} f' g + \int_{[a,b]} f g'.

The last equality is justified since f'g and fg' are integrable. For instance, we have

\int_{[a,b]} |f' g| \le \max_{x \in [a,b]} |g(x)| \int_{[a,b]} |f'|,

which is finite since g is continuous and f' is Lebesgue integrable. Now the claim follows from the Fundamental theorem of calculus for the Lebesgue integral. □

REFERENCES 1. Jones, F., Lebesgue Integration on Euclidean Spaces, Jones and Barlett Publishers, 1993. 2. Ng, Tze Beng, Integration by Parts, online.

Version: 4 Owner: matte Author(s): matte


Chapter 374 26A42 – Integrals of Riemann, Stieltjes and Lebesgue type 374.1

Riemann sum

Suppose there is a function f : I → R where I = [a, b] is a closed interval, and f is bounded on I. If we have a finite set of points {x₀, x₁, x₂, …, xₙ} such that a = x₀ < x₁ < x₂ < ⋯ < xₙ = b, then this set creates a partition P = {[x₀, x₁), [x₁, x₂), …, [x_{n−1}, xₙ]} of I. If P is a partition with n ∈ N elements of I, then the Riemann sum of f over I with the partition P is defined as

S = \sum_{i=1}^{n} f(y_i)(x_i - x_{i-1})

where x_{i−1} ≤ y_i ≤ x_i. The choice of y_i is arbitrary. If y_i = x_{i−1} for all i, then S is called a left Riemann sum. If y_i = x_i, then S is called a right Riemann sum. Suppose we have

S = \sum_{i=1}^{n} b_i (x_i - x_{i-1})

where b_i is the supremum of f over [x_{i−1}, x_i]; then S is defined to be an upper Riemann sum. Similarly, if b_i is the infimum of f over [x_{i−1}, x_i], then S is a lower Riemann sum. Version: 3 Owner: mathcam Author(s): mathcam, vampyr
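On a uniform partition the left, right and midpoint sums are straightforward to compute; for an increasing function the left sum is a lower sum and the right sum an upper sum, as the sketch below illustrates (names are ours):

```python
def riemann_sum(f, a, b, n, rule="left"):
    # Uniform partition x_i = a + i*(b-a)/n; sample points y_i chosen by rule.
    h = (b - a) / n
    if rule == "left":
        ys = [a + i * h for i in range(n)]
    elif rule == "right":
        ys = [a + (i + 1) * h for i in range(n)]
    else:  # "midpoint"
        ys = [a + (i + 0.5) * h for i in range(n)]
    return sum(f(y) * h for y in ys)

square = lambda x: x * x   # increasing on [0, 1]
left = riemann_sum(square, 0.0, 1.0, 1000, "left")
right = riemann_sum(square, 0.0, 1.0, 1000, "right")

# Both converge to the integral of x^2 over [0, 1], which is 1/3.
assert left < 1/3 < right
assert abs(left - 1/3) < 1e-3 and abs(right - 1/3) < 1e-3
```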


374.2

Riemann-Stieltjes integral

Let f and α be bounded, real-valued functions defined upon a closed finite interval I = [a, b] of R (a ≠ b), P = {x₀, …, xₙ} a partition of I, and t_i a point of the subinterval [x_{i−1}, x_i]. A sum of the form

S(P, f, α) = \sum_{i=1}^{n} f(t_i)(\alpha(x_i) - \alpha(x_{i-1}))

is called a Riemann-Stieltjes sum of f with respect to α. f is said to be Riemann integrable with respect to α on I if there exists A ∈ R such that given any ε > 0 there exists a partition P_ε of I for which, for all P finer than P_ε and for every choice of points t_i, we have

|S(P, f, α) − A| < ε.

If such an A exists, then it is unique and is known as the Riemann-Stieltjes integral of f with respect to α. f is known as the integrand and α the integrator. The integral is denoted by

\int_a^b f\, d\alpha \quad \text{or} \quad \int_a^b f(x)\, d\alpha(x).

Version: 3 Owner: vypertd Author(s): vypertd

374.3

continuous functions are Riemann integrable

Let f : [a, b] → R be a continuous function. Then f is Riemann integrable. Version: 2 Owner: paolini Author(s): paolini

374.4

generalized Riemann integral

A function f : [a, b] → R is said to be generalized Riemann integrable on [a, b] if there exists a number L ∈ R such that for every ε > 0 there exists a gauge δ on [a, b] such that if Ṗ is any δ-fine partition of [a, b], then

|S(f; Ṗ) − L| < ε

where S(f; Ṗ) is the Riemann sum for f using the partition Ṗ. The collection of all generalized Riemann integrable functions is usually denoted by R*[a, b]. If f ∈ R*[a, b] then the number L is uniquely determined, and is called the generalized Riemann integral of f over [a, b]. Version: 3 Owner: vypertd Author(s): vypertd

374.5

proof of Continuous functions are Riemann integrable

Recall the definition of Riemann integral. To prove that f is integrable we have to prove that \lim_{\delta \to 0^+} S^*(\delta) - S_*(\delta) = 0. Since S*(δ) is decreasing and S_*(δ) is increasing, it is enough to show that given ε > 0 there exists δ > 0 such that S*(δ) − S_*(δ) < ε. So let ε > 0 be fixed. By the Heine-Cantor theorem f is uniformly continuous, i.e.

∃δ > 0: |x − y| < δ ⇒ |f(x) − f(y)| < \frac{\varepsilon}{b-a}.

Let now P be any partition of [a, b] in C(δ), i.e. a partition {x₀ = a, x₁, …, x_N = b} such that x_{i+1} − x_i < δ. In any small interval [x_i, x_{i+1}] the function f (being continuous) has a maximum M_i and minimum m_i. Since f is uniformly continuous and x_{i+1} − x_i < δ, we have M_i − m_i < ε/(b − a). So the difference between upper and lower Riemann sums is

\sum_i M_i (x_{i+1} - x_i) - \sum_i m_i (x_{i+1} - x_i) \le \frac{\varepsilon}{b-a} \sum_i (x_{i+1} - x_i) = \varepsilon.

Since this is true for every partition P in C(δ) we conclude that S*(δ) − S_*(δ) ≤ ε. Version: 1 Owner: paolini Author(s): paolini


Chapter 375 26A51 – Convexity, generalizations 375.1

concave function

Let f(x) be a continuous function defined on an interval [a, b]. Then we say that f is a concave function on [a, b] if, for any x₁, x₂ in [a, b] and any λ ∈ [0, 1], we have

f(λx₁ + (1 − λ)x₂) ≥ λf(x₁) + (1 − λ)f(x₂).

The definition is equivalent to the statements:

• For all x₁, x₂ in [a, b], f((x₁ + x₂)/2) ≥ (f(x₁) + f(x₂))/2.
• The second derivative of f is negative on [a, b].
• f has a derivative which is monotone decreasing.

Obviously, the last two items apply provided f has the required derivatives. An example of a concave function is f(x) = −x² on the interval [−5, 5]. Version: 5 Owner: drini Author(s): drini
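The defining inequality can be spot-checked numerically for the entry's example f(x) = −x² on [−5, 5]; the code below is a sketch that samples points and weights λ (the sampling grid is an assumption, not a proof).

```python
# Sketch: verify the concavity inequality
# f(l*x1 + (1-l)*x2) >= l*f(x1) + (1-l)*f(x2)
# on sampled points of [-5, 5] for f(x) = -x^2.

f = lambda x: -x * x

def is_concave_on_samples(f, points, weights):
    for x1 in points:
        for x2 in points:
            for lam in weights:
                mid = f(lam * x1 + (1 - lam) * x2)
                chord = lam * f(x1) + (1 - lam) * f(x2)
                if mid < chord - 1e-12:  # small slack for rounding
                    return False
    return True

pts = [-5 + i * 0.5 for i in range(21)]
print(is_concave_on_samples(f, pts, [0.1, 0.25, 0.5, 0.9]))  # True
```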


Chapter 376 26Axx – Functions of one variable 376.1

function centroid

Let f : D ⊂ R → R be an arbitrary function. By analogy with the geometric centroid, the centroid of a function f is defined as:

⟨x⟩ = ∫ x f(x) dx / ∫ f(x) dx,

where the integrals are taken over the domain D. Version: 1 Owner: vladm Author(s): vladm
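A numerical sketch of this definition (the example f(x) = x on D = [0, 1] is an assumption chosen for illustration): here ⟨x⟩ = (∫₀¹ x·x dx)/(∫₀¹ x dx) = (1/3)/(1/2) = 2/3.

```python
# Sketch: function centroid <x> = (int x*f(x) dx) / (int f(x) dx),
# approximated with the midpoint rule on [a, b].

def centroid(f, a, b, n=100000):
    h = (b - a) / n
    num = sum((a + (i + 0.5) * h) * f(a + (i + 0.5) * h) for i in range(n)) * h
    den = sum(f(a + (i + 0.5) * h) for i in range(n)) * h
    return num / den

print(centroid(lambda x: x, 0.0, 1.0))  # close to 2/3
```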


Chapter 377 26B05 – Continuity and differentiation questions 377.1

C0∞ (U ) is not empty

Theorem If U is a non-empty open set in Rn, then the set of smooth functions with compact support C0∞(U) is not empty. The proof is divided into three sub-claims:

Claim 1 Let a < b be real numbers. Then there exists a smooth non-negative function f : R → R whose support is the compact set [a, b].

To prove Claim 1, we need the following lemma:

Lemma ([4], pp. 14) If

φ(x) = 0 for x ≤ 0,  φ(x) = e^{−1/x} for x > 0,

then φ : R → R is a non-negative smooth function. (A proof of the Lemma can be found in [4].)

Proof of Claim 1. Using the lemma, let us define f(x) = φ(x − a)φ(b − x). Since φ is smooth, it follows that f is smooth. Also, from the definition of φ, we see that φ(x − a) = 0 precisely when x ≤ a, and φ(b − x) = 0 precisely when x ≥ b. Thus the support of f is indeed [a, b]. □

Claim 2 Let a_i, b_i be real numbers with a_i < b_i for all i = 1, ..., n. Then there exists a smooth non-negative function f : Rn → R whose support is the compact set [a₁, b₁] × ··· × [a_n, b_n].

Proof of Claim 2. Using Claim 1, we can for each i = 1, ..., n construct a function f_i with support in [a_i, b_i]. Then f(x₁, ..., x_n) = f₁(x₁)f₂(x₂)···f_n(x_n) gives a smooth function with the sought properties. □

Claim 3 If U is a non-empty open set in Rn, then there are real numbers a_i < b_i for i = 1, ..., n such that [a₁, b₁] × ··· × [a_n, b_n] is a subset of U.

Proof of Claim 3. Here, of course, we assume that Rn is equipped with the usual topology induced by the open balls of the Euclidean metric. Since U is non-empty, there exists some point x in U. Further, since U is a topological space, x is contained in some open set. Since the topology has a basis consisting of open balls, there exist a y ∈ U and ε > 0 such that x is contained in the open ball B(y, ε). Let us now set a_i = y_i − ε/(2√n) and b_i = y_i + ε/(2√n) for all i = 1, ..., n. Then D = [a₁, b₁] × ··· × [a_n, b_n] can be parametrized as

D = { y + (λ₁, ..., λ_n) ε/(2√n) | λ_i ∈ [−1, 1] for all i = 1, ..., n }.

For an arbitrary point in D, we have

|y + (λ₁, ..., λ_n) ε/(2√n) − y| = |(λ₁, ..., λ_n)| ε/(2√n) = (ε/(2√n)) √(λ₁² + ··· + λ_n²) ≤ ε/2 < ε,

so D ⊂ B(y, ε) ⊂ U, and Claim 3 follows. □
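The construction in Claim 1 can be sketched numerically (an illustration only; the interval [−1, 1] is an assumed example): f(x) = φ(x − a)φ(b − x) vanishes outside (a, b) and is positive inside.

```python
import math

# Sketch of the bump construction above: phi(x) = 0 for x <= 0 and
# exp(-1/x) for x > 0, and f(x) = phi(x - a) * phi(b - x) with
# support [a, b] = [-1, 1] (assumed example interval).

def phi(x):
    return 0.0 if x <= 0 else math.exp(-1.0 / x)

def bump(x, a=-1.0, b=1.0):
    return phi(x - a) * phi(b - x)

print(bump(-1.5), bump(0.0), bump(1.5))  # zero outside (-1, 1), positive inside
```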

REFERENCES 1. L. H¨ormander, The Analysis of Linear Partial Differential Operators I, (Distribution theory and Fourier Analysis), 2nd ed, Springer-Verlag, 1990.

Version: 3 Owner: matte Author(s): matte

377.2

Rademacher’s Theorem

Let f : Rn → R be any Lipschitz continuous function. Then f is differentiable at almost every x ∈ Rn.

Version: 1 Owner: paolini Author(s): paolini

377.3

smooth functions with compact support

Definition [3] Let U be an open set in Rn. Then the set of smooth functions with compact support (in U) is the set of functions f : Rn → C which are smooth (i.e., ∂^α f : Rn → C is a continuous function for all multi-indices α) and such that supp f is compact and contained in U. This function space is denoted by C0∞(U).

Remarks

1. A proof that C0∞(U) is not empty can be found here.
2. With the usual point-wise addition and point-wise multiplication by a scalar, C0∞(U) is a vector space over the field C.
3. Suppose U and V are open subsets in Rn and U ⊂ V. Then C0∞(U) is a vector subspace of C0∞(V). In particular, C0∞(U) ⊂ C0∞(V).

It is possible to equip C0∞(U) with a topology which makes C0∞(U) into a locally convex topological vector space. The definition of this topology is, however, rather involved (see e.g. [3]). However, the next theorem shows when a sequence converges in this topology.

Theorem 1 Suppose that U is an open set in Rn, and that {φ_i}, i = 1, 2, ..., is a sequence of functions in C0∞(U). Then {φ_i} converges (in the aforementioned topology) to a function φ ∈ C0∞(U) if and only if the following conditions hold:

1. There is a compact set K ⊂ U such that supp φ_i ⊂ K for all i = 1, 2, ....
2. For every multi-index α, ∂^α φ_i → ∂^α φ in the sup-norm.

Theorem 2 Suppose that U is an open set in Rn, that Γ is a locally convex topological vector space, and that L : C0∞(U) → Γ is a linear map. Then L is a continuous map if and only if the following condition holds: if K is a compact subset of U, and {φ_i} is a sequence of functions in C0∞(U) such that supp φ_i ⊂ K for all i, and φ_i → φ (in C0∞(U)) for some φ ∈ C0∞(U), then Lφ_i → Lφ (in Γ).

The above theorems are stated without proof in [1].

REFERENCES 1. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973. 2. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed, John Wiley & Sons, Inc., 1999.

Version: 3 Owner: matte Author(s): matte


Chapter 378 26B10 – Implicit function theorems, Jacobians, transformations with several variables 378.1

Jacobian matrix

The Jacobian [Jf(x)] of a function f : Rn → Rm is the matrix of partial derivatives such that

            ( D₁f₁(x)   ...  D_nf₁(x) )
[Jf(x)] =   (    ⋮       ⋱      ⋮     )
            ( D₁f_m(x)  ...  D_nf_m(x) )

A more concise way of writing it is

                                 ( ∇f₁  )
[Jf(x)] = [D₁f, ..., D_nf] =     (  ⋮   )
                                 ( ∇f_m )

where D_n f is the partial derivative with respect to the nth variable and ∇f_m is the gradient of the mth component of f. The Jacobian matrix represents the full derivative matrix [Df(x)] of f at x iff f is differentiable at x. Also, if f is differentiable at x, then [Jf(x)] = [Df(x)] and the directional derivative in the direction v is [Df(x)]v. Version: 9 Owner: slider142 Author(s): slider142
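The Jacobian can be approximated entry-by-entry with finite differences; the following sketch (the example map f(x, y) = (x²y, 5x + sin y) is an assumption) compares against the analytic Jacobian [[2xy, x²], [5, cos y]].

```python
import math

# Sketch: finite-difference Jacobian matrix J[i][j] ~ D_j f_i.

def jacobian(f, x, h=1e-6):
    n = len(x)
    fx = f(x)
    m = len(fx)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        fp = f(xp)
        for i in range(m):
            J[i][j] = (fp[i] - fx[i]) / h  # forward difference
    return J

f = lambda v: [v[0] ** 2 * v[1], 5 * v[0] + math.sin(v[1])]
J = jacobian(f, [1.0, 2.0])
print(J)  # approximately [[4, 1], [5, cos(2)]]
```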

378.2

directional derivative

Partial derivatives measure the rate at which a multivariable function f varies as the variable moves in the direction of the standard basis vectors. Directional derivatives measure the rate at which f varies when the variable moves in the direction v. Thus the directional derivative of f at a in the direction v is represented as

D_v f(a) = ∂f(a)/∂v = lim_{h→0} (f(a + hv) − f(a))/h.

For example, if f(x, y, z) = x² + 3y²z, and we wanted to find the derivative at the point a = (1, 2, 3) in the direction v = (1, 1, 1), our equation would be

lim_{h→0} (1/h)((1 + h)² + 3(2 + h)²(3 + h) − 37)
  = lim_{h→0} (1/h)(3h³ + 22h² + 50h)
  = lim_{h→0} (3h² + 22h + 50) = 50.

One may also use the Jacobian matrix, if the function is differentiable, to find the derivative in the direction v as [Jf(x)]v. Version: 6 Owner: slider142 Author(s): slider142
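The worked example above can be checked numerically with a small h; this sketch approximates the limit defining the directional derivative directly.

```python
# Sketch: numerical directional derivative via the limit quotient
# (f(a + h*v) - f(a)) / h for the example f(x, y, z) = x^2 + 3*y^2*z
# at a = (1, 2, 3), v = (1, 1, 1); the exact value worked out above is 50.

def directional_derivative(f, a, v, h=1e-6):
    shifted = [ai + h * vi for ai, vi in zip(a, v)]
    return (f(shifted) - f(a)) / h

f = lambda p: p[0] ** 2 + 3 * p[1] ** 2 * p[2]
print(directional_derivative(f, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # about 50
```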

378.3

gradient

Summary. The gradient is a first-order differential operator that maps functions to vector fields. It is a generalization of the ordinary derivative, and as such conveys information about the rate of change of a function relative to small variations in the independent variables. The gradient of a function f is customarily denoted by ∇f or by grad f.

Definition: Euclidean space. Consider n-dimensional Euclidean space with orthogonal coordinates x₁, ..., x_n, and corresponding unit vectors e₁, ..., e_n. In this setting, the gradient of a function f(x₁, ..., x_n) is defined to be the vector field given by

∇f = Σ_{i=1}^{n} (∂f/∂x_i) e_i.

It is also useful to represent the gradient operator as the vector-valued differential operator

∇ = Σ_{i=1}^{n} e_i ∂/∂x_i,

or, in the context of Euclidean 3-space, as

∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z,

where i, j, k are the unit vectors lying along the positive direction of the x, y, z axes, respectively. Using this formalism, the ∇ symbol can be used to express the divergence operator as ∇·, the curl operator as ∇×, and the Laplacian operator as ∇². To wit, for a given vector field A = A_x i + A_y j + A_z k, and a given function f, we have

∇·A = ∂A_x/∂x + ∂A_y/∂y + ∂A_z/∂z,
∇×A = (∂A_z/∂y − ∂A_y/∂z) i + (∂A_x/∂z − ∂A_z/∂x) j + (∂A_y/∂x − ∂A_x/∂y) k,
∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z².

Definition: Riemannian geometry. More generally still, consider a Riemannian manifold with metric tensor g_ij and inverse g^{ij}. In this setting the gradient X = grad f of a function f relative to a general coordinate system is given by

X^j = g^{ij} f_{,i}.    (378.3.1)

Note that the Einstein summation convention is in force above. Also note that f_{,i} denotes the partial derivative of f with respect to the ith coordinate. Definition (378.3.1) is useful even in the Euclidean setting, because it can be used to derive the formula for the gradient in various generalized coordinate systems. For example, in the cylindrical system of coordinates (r, θ, z) we have

        ( 1  0   0 )
g_ij =  ( 0  r²  0 )
        ( 0  0   1 )

while for the system of spherical coordinates (ρ, φ, θ) we have

        ( 1  0    0         )
g_ij =  ( 0  ρ²   0         )
        ( 0  0    ρ² sin²φ  )

Hence, for a given function f we have

∇f = (∂f/∂r) e_r + (1/r)(∂f/∂θ) e_θ + (∂f/∂z) k    (cylindrical),
∇f = (∂f/∂ρ) e_ρ + (1/ρ)(∂f/∂φ) e_φ + (1/(ρ sin φ))(∂f/∂θ) e_θ    (spherical),

where for the cylindrical system

e_r = ∂/∂r = (x/r) i + (y/r) j,
e_θ = (1/r) ∂/∂θ = −(y/r) i + (x/r) j

are the unit vectors in the direction of increase of r and θ, respectively, and for the spherical system

e_ρ = ∂/∂ρ = (x/ρ) i + (y/ρ) j + (z/ρ) k,
e_φ = (1/ρ) ∂/∂φ = (zx/(rρ)) i + (zy/(rρ)) j − (r/ρ) k,
e_θ = (1/(ρ sin φ)) ∂/∂θ = −(y/r) i + (x/r) j

are the unit vectors in the direction of increase of ρ, φ, θ, respectively.

Physical Interpretation. In the simplest case, we consider the Euclidean plane with Cartesian coordinates x, y. The gradient of a function f(x, y) is given by

∇f = (∂f/∂x) i + (∂f/∂y) j,

where i, j denote, respectively, the standard unit horizontal and vertical vectors. The gradient vectors have the following geometric interpretation. Consider the graph z = f(x, y) as a surface in 3-space. The direction of the gradient vector ∇f is the direction of steepest ascent, while the magnitude is the slope in that direction. Thus,

‖∇f‖ = √((∂f/∂x)² + (∂f/∂y)²)

describes the steepness of the hill z = f(x, y) at a point on the hill located at (x, y, f(x, y)). A more general conception of the gradient is based on the interpretation of a function f as a potential corresponding to some conservative physical force. The negation of the gradient, −∇f, is then interpreted as the corresponding force field.

Differential identities. Several properties of the one-dimensional derivative generalize to a multi-dimensional setting:

∇(af + bg) = a∇f + b∇g    (linearity)
∇(fg) = f∇g + g∇f    (product rule)
∇(φ ∘ f) = (φ′ ∘ f)∇f    (chain rule)

Version: 9 Owner: rmilson Author(s): rmilson, slider142

378.4

implicit differentiation

Implicit differentiation is a tool used to analyze functions that cannot be conveniently put into a form y = f(x), where x = (x₁, x₂, ..., x_n). To use implicit differentiation meaningfully, you must be certain that your function is of the form f(x) = 0 (it can be written as a level set) and that it satisfies the implicit function theorem (f must be continuous, its first partial derivatives must be continuous, and the derivative with respect to the implicit function must be non-zero). To actually differentiate implicitly, we use the chain rule to differentiate the entire equation.

Example: The first step is to identify the implicit function. For simplicity in the example, we will assume f(x, y) = 0 and y is an implicit function of x. Let

f(x, y) = x² + y² + xy = 0.

(Since this is a two-dimensional equation, all one has to check is that the graph of y may be an implicit function of x in local neighborhoods.) Then, to differentiate implicitly, we differentiate both sides of the equation with respect to x. We will get

2x + 2y (dy/dx) + x (dy/dx) + y = 0.

Note how the chain rule was used in the above equation. Next, we simply solve for our implicit derivative: dy/dx = −(2x + y)/(2y + x). Note that the derivative depends on both the variable and the implicit function y. Most of your derivatives will be functions of one or all the variables, including the implicit function itself. [better example and ?multidimensional? coming]

Version: 2 Owner: slider142 Author(s): slider142
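The derived formula dy/dx = −(2x + y)/(2y + x) can be checked numerically. One caveat (an observation about the example, not in the entry): over the reals the level set x² + y² + xy = 0 contains only the origin, so this sketch uses the nearby nondegenerate level set x² + y² + xy = 7, which passes through (1, 2) and to which the same differentiation applies.

```python
import math

# Sketch: check dy/dx = -(2x + y)/(2y + x) on the assumed level set
# x^2 + y^2 + x*y = 7 at the point (1, 2), using a central difference
# on the explicit branch y(x) = (-x + sqrt(28 - 3x^2)) / 2.

def y_of_x(x):
    return (-x + math.sqrt(28 - 3 * x * x)) / 2  # branch with y(1) = 2

h = 1e-6
numeric = (y_of_x(1 + h) - y_of_x(1 - h)) / (2 * h)
formula = -(2 * 1 + 2) / (2 * 2 + 1)
print(numeric, formula)  # both about -0.8
```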

378.5

implicit function theorem

Let f = (f₁, ..., f_n) be a continuously differentiable, vector-valued function mapping an open set E ⊂ R^{n+m} into R^n. Let (a, b) = (a₁, ..., a_n, b₁, ..., b_m) be a point in E for which f(a, b) = 0 and such that the n × n determinant |D_j f_i(a, b)| ≠ 0 for i, j = 1, ..., n. Then there exists an m-dimensional neighbourhood W of b and a unique continuously differentiable function g : W → R^n such that g(b) = a and f(g(t), t) = 0 for all t ∈ W.

Simplest case. When n = m = 1, the theorem reduces to: Let F be a continuously differentiable, real-valued function defined on an open set E ⊂ R² and let (x₀, y₀) be a point in E for which F(x₀, y₀) = 0 and such that

∂F/∂x (x₀, y₀) ≠ 0.

Then there exists an open interval I containing y₀, and a unique function f : I → R which is continuously differentiable and such that f(y₀) = x₀ and F(f(y), y) = 0 for all y ∈ I.

Note. The inverse function theorem is a special case of the implicit function theorem where the dimension of each variable is the same. Version: 7 Owner: vypertd Author(s): vypertd

378.6

proof of implicit function theorem

Consider the function F : E → Rn × Rm defined by F(x, y) = (f(x, y), y). Setting

M_{jk} = ∂f_j/∂x_k (a, b),    A_{ji} = ∂f_j/∂y_i (a, b),

M is an n × n matrix and A is n × m. It holds

Df(a, b) = (M | A),

and hence

DF(a, b) = ( M  A )
           ( 0  I_m ).

Since det M ≠ 0, M is invertible and hence DF(a, b) is invertible too. Applying the inverse function theorem to F we find that there exist a neighbourhood V of 0 = f(a, b) and W of b and a function G ∈ C¹(V × W, R^{n+m}) such that F(G(x, y)) = (x, y) for all (x, y) ∈ V × W. Letting G(x, y) = (G₁(x, y), G₂(x, y)) (so that G₁ : V × W → Rn, G₂ : V × W → Rm) we hence have

(x, y) = F(G₁(x, y), G₂(x, y)) = (f(G₁(x, y), G₂(x, y)), G₂(x, y)),

and hence G₂(x, y) = y and x = f(G₁(x, y), G₂(x, y)) = f(G₁(x, y), y). So we only have to set g(y) = G₁(0, y) to obtain

f(g(y), y) = 0,    ∀y ∈ W.

Version: 1 Owner: paolini Author(s): paolini

Chapter 379 26B12 – Calculus of vector functions 379.1

Clairaut’s theorem

Theorem. (Clairaut's Theorem) If F : Rn → Rm is a function whose second partial derivatives exist and are continuous on a set S ⊆ Rn, then

∂²F/∂x_i∂x_j = ∂²F/∂x_j∂x_i

on S (where 1 ≤ i, j ≤ n). This theorem is commonly referred to as simply 'the equality of mixed partials'. It is usually first presented in a vector calculus course, and is useful in this context for proving basic properties of the interrelations of gradient, divergence, and curl. E.g., if F : R³ → R³ is a function satisfying the hypothesis, then ∇ · (∇ × F) = 0. Or, if f : R³ → R is a function satisfying the hypothesis, ∇ × ∇f = 0. Version: 10 Owner: flynnheiss Author(s): flynnheiss
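The equality of mixed partials can be illustrated numerically (a sketch; the function f(x, y) = x³y² + sin(xy) is an assumed example). Its mixed partial in either order is 6x²y + cos(xy) − xy sin(xy), and a symmetric finite-difference stencil reproduces that value.

```python
import math

# Sketch: central 4-point estimate of the mixed partial d^2 f / dx dy;
# by Clairaut's theorem the answer is the same in either order for
# the smooth example f(x, y) = x^3*y^2 + sin(x*y).

f = lambda x, y: x ** 3 * y ** 2 + math.sin(x * y)

def mixed_partial(f, x, y, h=1e-4):
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

x, y = 1.0, 2.0
analytic = 6 * x ** 2 * y + math.cos(x * y) - x * y * math.sin(x * y)
print(mixed_partial(f, x, y), analytic)  # agree to several digits
```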

379.2

Fubini’s Theorem

Fubini's Theorem. Let I ⊂ R^N and J ⊂ R^M be compact intervals, and let f : I × J → R^K be a Riemann integrable function such that, for each x ∈ I, the integral

F(x) := ∫_J f(x, y) dµ_J(y)

exists. Then F : I → R^K is Riemann integrable, and

∫_I F = ∫_{I×J} f.

This theorem effectively states that, given a function of N variables, you may integrate it one variable at a time, and that the order of integration does not affect the result.

Example. Let I := [0, π/2] × [0, π/2], and let f : I → R, (x, y) ↦ sin(x)cos(y) be a function. Then

∫∫_I f = ∫∫_{[0,π/2]×[0,π/2]} sin(x)cos(y)
       = ∫_0^{π/2} ( ∫_0^{π/2} sin(x)cos(y) dy ) dx
       = ∫_0^{π/2} sin(x)(1 − 0) dx = (0 − (−1)) = 1.

Note that it is often simpler (and no less correct) to write ∫···∫_I f as ∫_I f.

Version: 3 Owner: vernondalhart Author(s): vernondalhart

379.3

Generalised N-dimensional Riemann Sum

Let I = [a₁, b₁] × ··· × [a_N, b_N] be an N-cell in R^N. For each j = 1, ..., N, let a_j = t_{j,0} < ... < t_{j,n_j} = b_j be a partition P_j of [a_j, b_j]. We define a partition P of I as

P := P₁ × ··· × P_N.

Each partition P of I generates a subdivision of I (denoted by (I_ν)_ν) of the form

I_ν = [t_{1,j}, t_{1,j+1}] × ··· × [t_{N,k}, t_{N,k+1}].

Let f : U → R^M be such that I ⊂ U, and let (I_ν)_ν be the corresponding subdivision of a partition P of I. For each ν, choose x_ν ∈ I_ν. Define

S(f, P) := Σ_ν f(x_ν) µ(I_ν)

as the Riemann sum of f corresponding to the partition P.

A partition Q of I is called a refinement of P if P ⊂ Q.

Version: 1 Owner: vernondalhart Author(s): vernondalhart
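A two-dimensional Riemann sum over a uniform partition can be sketched directly (the integrand sin(x)cos(y) on the 2-cell [0, π/2] × [0, π/2] is taken from the Fubini example above, whose exact integral is 1; midpoint tags are an assumption).

```python
import math

# Sketch: S(f, P) = sum over subrectangles I_nu of f(x_nu) * mu(I_nu),
# with midpoint tags, for f(x, y) = sin(x)cos(y) on [0, pi/2]^2.

def riemann_sum_2d(f, a1, b1, a2, b2, n):
    hx = (b1 - a1) / n
    hy = (b2 - a2) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = a1 + (i + 0.5) * hx  # midpoint tag x_nu
            y = a2 + (j + 0.5) * hy
            total += f(x, y) * hx * hy  # f(x_nu) * mu(I_nu)
    return total

f = lambda x, y: math.sin(x) * math.cos(y)
print(riemann_sum_2d(f, 0, math.pi / 2, 0, math.pi / 2, 200))  # close to 1
```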

379.4

Generalized N-dimensional Riemann Integral

Let I = [a₁, b₁] × ··· × [a_N, b_N] ⊂ R^N be a compact interval, and let f : I → R^M be a function. If there exists a y ∈ R^M such that for each ε > 0 there is a partition P_ε of I such that for each refinement P of P_ε (and corresponding Riemann sum S(f, P)),

‖S(f, P) − y‖ < ε,

then we say that f is Riemann integrable over I, that y is the Riemann integral of f over I, and we write

∫_I f := ∫_I f dµ := y.

Note also that it is possible to extend this definition to more arbitrary sets; for any bounded set D, one can find a compact interval I such that D ⊂ I, and define a function f̃ : I → R^M by

f̃(x) = f(x) if x ∈ D,  f̃(x) = 0 if x ∉ D,

in which case we define

∫_D f := ∫_I f̃.

Version: 3 Owner: vernondalhart Author(s): vernondalhart

379.5

Helmholtz equation

It is a partial differential equation which, in scalar form, is

∇²f + k²f = 0,

or, in vector form, is

∇²A + k²A = 0,

where ∇² is the Laplacian. The solutions of this equation represent solutions of the wave equation, which is of great interest in physics.

Consider a wave equation

∂²ψ/∂t² = c²∇²ψ

with wave speed c. If we look for time-harmonic standing waves of frequency ω,

ψ(x, t) = e^{−jωt} φ(x),

we find that φ(x) satisfies the Helmholtz equation:

(∇² + k²)φ = 0,

where k = ω/c is the wave number. Usually the Helmholtz equation is solved by the separation of variables method, in Cartesian, spherical, or cylindrical coordinates.

Version: 3 Owner: giri Author(s): giri

379.6

Hessian matrix

The Hessian of a scalar function of a vector is the matrix of second partial derivatives. So the Hessian matrix of a function f : Rn → R is:

( ∂²f/∂x₁²      ∂²f/∂x₁∂x₂   ...  ∂²f/∂x₁∂x_n )
( ∂²f/∂x₂∂x₁    ∂²f/∂x₂²     ...  ∂²f/∂x₂∂x_n )     (379.6.1)
(     ⋮              ⋮        ⋱        ⋮       )
( ∂²f/∂x_n∂x₁   ∂²f/∂x_n∂x₂  ...  ∂²f/∂x_n²   )

Note that the Hessian is symmetric because of the equality of mixed partials (provided the second partials are continuous). Version: 2 Owner: bshanks Author(s): akrowne, bshanks

379.7

Jordan Content of an N-cell

Let I = [a₁, b₁] × ··· × [a_N, b_N] be an N-cell in R^N. Then the Jordan content (denoted µ(I)) of I is defined as

µ(I) := ∏_{j=1}^{N} (b_j − a_j).

Version: 1 Owner: vernondalhart Author(s): vernondalhart

379.8

Laplace equation

The scalar form of Laplace's equation is the partial differential equation

∇²f = 0,

and the vector form is

∇²A = 0,

where ∇² is the Laplacian. It is a special case of the Helmholtz differential equation with k = 0.

A function f which satisfies Laplace's equation is said to be harmonic. Since Laplace's equation is linear, the superposition of any two solutions is also a solution.

Version: 3 Owner: giri Author(s): giri
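A harmonic function can be spot-checked numerically (a sketch; the example f(x, y) = x² − y², for which f_xx + f_yy = 2 − 2 = 0, is an assumption) using the standard five-point stencil for the Laplacian.

```python
# Sketch: five-point-stencil estimate of the Laplacian; for the harmonic
# example f(x, y) = x^2 - y^2 the estimate is ~0 at any point.

def laplacian(f, x, y, h=1e-3):
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4 * f(x, y)) / (h * h)

f = lambda x, y: x * x - y * y
print(laplacian(f, 0.3, -1.7))  # about 0
```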

379.9

chain rule (several variables)

The chain rule is a theorem of analysis that governs derivatives of composed functions. The basic theorem is the chain rule for functions of one variable (see here). This entry is devoted to the more general version involving functions of several variables and partial derivatives. Note: the symbol D_k will be used to denote the partial derivative with respect to the kth variable.

Let F(x₁, ..., x_n) and G₁(x₁, ..., x_m), ..., G_n(x₁, ..., x_m) be differentiable functions of several variables, and let

H(x₁, ..., x_m) = F(G₁(x₁, ..., x_m), ..., G_n(x₁, ..., x_m))

be the function determined by the composition of F with G₁, ..., G_n. The partial derivatives of H are given by

(D_k H)(x₁, ..., x_m) = Σ_{i=1}^{n} (D_i F)(G₁(x₁, ..., x_m), ..., G_n(x₁, ..., x_m)) (D_k G_i)(x₁, ..., x_m).

The chain rule can be more compactly (albeit less precisely) expressed in terms of the Jacobi-Legendre partial derivative symbols (historical note). Just as in the Leibniz system, the basic idea is that of one quantity (i.e. variable) depending on one or more other quantities. Thus we would speak about a variable z that depends differentiably on y₁, ..., y_n, which in turn depend differentiably on variables x₁, ..., x_m. We would then write the chain rule as

∂z/∂x_j = Σ_{i=1}^{n} (∂z/∂y_i)(∂y_i/∂x_j),    j = 1, ..., m.

The most general, and conceptually clearest, approach to the multi-variable chain rule is based on the notion of a differentiable mapping, with the Jacobian matrix of partial derivatives playing the role of the generalized derivative. Let X ⊂ R^m and Y ⊂ R^n be open domains and let

F : Y → R^l,    G : X → Y

be differentiable mappings. In essence, the symbol F represents l functions of n variables each:

F = (F₁, ..., F_l),    F_i = F_i(x₁, ..., x_n),

whereas G = (G₁, ..., G_n) represents n functions of m variables each. The derivative of such mappings is no longer a function, but rather a matrix of partial derivatives, customarily called the Jacobian matrix. Thus

      ( D₁F₁ ... D_nF₁ )            ( D₁G₁ ... D_mG₁ )
DF =  (  ⋮    ⋱    ⋮   )      DG =  (  ⋮    ⋱    ⋮   )
      ( D₁F_l ... D_nF_l )          ( D₁G_n ... D_mG_n )

The chain rule now takes the same form as it did for functions of one variable:

D(F ∘ G) = ((DF) ∘ G) (DG),

albeit with matrix multiplication taking the place of ordinary multiplication. This form of the chain rule also generalizes quite nicely to the even more general setting where one is interested in describing the derivative of a composition of mappings between manifolds.

Version: 7 Owner: rmilson Author(s): rmilson
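The matrix form D(F ∘ G) = ((DF) ∘ G)(DG) can be verified numerically on a small assumed example: F(u, v) = uv and G(t) = (t², sin t), so H(t) = F(G(t)) = t² sin t has H′(t) = 2t sin t + t² cos t.

```python
import math

# Sketch: the 1x2 Jacobian of F at G(t) times the 2x1 Jacobian of G
# equals the ordinary derivative of H(t) = t^2 * sin(t).

t = 0.7
DF_at_G = [math.sin(t), t ** 2]   # (dF/du, dF/dv) evaluated at G(t)
DG = [2 * t, math.cos(t)]         # (dG1/dt, dG2/dt)
chain = DF_at_G[0] * DG[0] + DF_at_G[1] * DG[1]  # matrix product (1x2)(2x1)

h = 1e-6
H = lambda s: s ** 2 * math.sin(s)
numeric = (H(t + h) - H(t - h)) / (2 * h)  # central difference
print(chain, numeric)  # agree
```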

379.10

divergence

Basic Definition. Let x, y, z be a system of Cartesian coordinates on 3-dimensional Euclidean space, and let i, j, k be the corresponding basis of unit vectors. The divergence of a continuously differentiable vector field

F = F¹ i + F² j + F³ k

is defined to be the function

div F = ∂F¹/∂x + ∂F²/∂y + ∂F³/∂z.

Another common notation for the divergence is ∇ · F (see gradient), a convenient mnemonic.

Physical interpretation. In physical terms, the divergence of a vector field is the extent to which the vector field flow behaves like a source or a sink at a given point. Indeed, an alternative, but logically equivalent, definition gives the divergence as the derivative of the net flow of the vector field across the surface of a small sphere relative to the surface area of the sphere. To wit,

(div F)(p) = lim_{r→0} ( ∫_S (F · N) dS ) / (4πr²),

where S denotes the sphere of radius r about a point p ∈ R³, and the integral is a surface integral taken with respect to N, the normal to that sphere.

The non-infinitesimal interpretation of divergence is given by Gauss's Theorem. This theorem is a conservation law, stating that the volume total of all sinks and sources, i.e. the volume integral of the divergence, is equal to the net flow across the volume's boundary. In symbols,

∫_V div F dV = ∫_S (F · N) dS,

where V ⊂ R³ is a compact region with a smooth boundary, and S = ∂V is that boundary oriented by outward-pointing normals. We note that Gauss's theorem follows from the more general Stokes' theorem, which itself generalizes the fundamental theorem of calculus.

In light of the physical interpretation, a vector field with constant zero divergence is called incompressible; in this case, no net flow can occur across any closed surface.

General definition. The notion of divergence has meaning in the more general setting of Riemannian geometry. To that end, let V be a vector field on a Riemannian manifold. The covariant derivative of V is a type (1, 1) tensor field. We define the divergence of V to be the trace of that field. In terms of coordinates (see tensor and Einstein summation convention), we have

div V = V^i_{;i}.

Version: 6 Owner: rmilson Author(s): rmilson, jaswenso

379.11

extremum

Extrema are minima and maxima. The singular forms of these words are extremum, minimum, and maximum. Extrema may be "global" or "local".

A global minimum of a function f is the lowest value that f ever achieves. If you imagine the function as a surface, then a global minimum is the lowest point on that surface. Formally, it is said that f : U → V has a global minimum at x if ∀u ∈ U, f(x) ≤ f(u).

A local minimum of a function f is a point x which has less value than all points "next to" it. If you imagine the function as a surface, then a local minimum is the bottom of a "valley" or "bowl" in the surface somewhere. Formally, it is said that f : U → V has a local minimum at x if there exists a neighborhood N of x such that ∀y ∈ N, f(x) ≤ f(y).

If you flip the ≤ signs above to ≥, you get the definitions of global and local maxima. A "strict local minimum" or "strict local maximum" means that nearby points are strictly less than or strictly greater than the critical point, rather than ≤ or ≥. For instance, a strict local minimum at x has a neighborhood N such that ∀y ∈ N, (f(x) < f(y) or y = x).

Related concepts are plateau and saddle point. Finding minima or maxima is an important task which is part of the field of optimization.

Version: 9 Owner: bshanks Author(s): bshanks, bbukh

379.12

irrotational field

Suppose Ω is an open set in R³, and V is a vector field with differentiable real (or possibly complex) valued component functions. If ∇ × V = 0, then V is called an irrotational vector field, or curl-free field. If U and V are irrotational, then U × V is solenoidal.

Version: 6 Owner: matte Author(s): matte, giri

379.13

partial derivative

The partial derivative of a multivariable function f is simply its derivative with respect to only one variable, keeping all other variables constant (variables which are not functions of the variable in question). The formal definition is

D_i f(a) = ∂f/∂a_i = lim_{h→0} ( f(a₁, ..., a_i + h, ..., a_n) − f(a) ) / h = lim_{h→0} ( f(a + h e_i) − f(a) ) / h,

where e_i is the standard basis vector of the ith variable. Since this only affects the ith variable, one can derive the function using common rules and tables, treating all other variables (which are not functions of a_i) as constants. For example, if f(x, y, z) = x² + 2xy + y² + y³z, then

(1) ∂f/∂x = 2x + 2y
(2) ∂f/∂y = 2x + 2y + 3y²z
(3) ∂f/∂z = y³

Note that in equation (1), we treated y as a constant, since we were differentiating with respect to x (recall d(c·x)/dx = c). The partial derivative of a vector-valued function f(x) with respect to variable a_i is a vector D_i f = ∂f/∂a_i.

Multiple partials: Multiple partial derivatives can be treated just like multiple derivatives. There is an additional degree of freedom though, as you can compound derivatives with respect to different variables. For example, using the above function,

(4) ∂²f/∂x² = ∂/∂x (2x + 2y) = 2
(5) ∂²f/∂z∂y = ∂/∂z (2x + 2y + 3y²z) = 3y²
(6) ∂²f/∂y∂z = ∂/∂y (y³) = 3y²

D₁₂ is another way of writing ∂²/∂x₁∂x₂. If the second partial derivatives of f are continuous in a neighborhood of x, it can be shown that D_ij f(x) = D_ji f(x), where i, j are the ith and jth variables. In fact, as long as an equal number of partials are taken with respect to each variable, changing the order of differentiation will produce the same results under the above condition.

Another form of notation is f^{(a,b,c,...)}(x), where a is the number of partial derivatives taken with respect to the first variable, b the number taken with respect to the second variable, etc.

Version: 17 Owner: slider142 Author(s): slider142
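Equations (1)-(3) above can be checked numerically; this sketch evaluates central-difference partials of the entry's example f(x, y, z) = x² + 2xy + y² + y³z at the assumed sample point (1, 2, 3), where the analytic values are 6, 42, and 8.

```python
# Sketch: central-difference partial derivative D_i f at a point.

def partial(f, point, i, h=1e-6):
    p = list(point)
    p[i] += h
    q = list(point)
    q[i] -= h
    return (f(p) - f(q)) / (2 * h)

f = lambda p: p[0] ** 2 + 2 * p[0] * p[1] + p[1] ** 2 + p[1] ** 3 * p[2]
a = [1.0, 2.0, 3.0]
print(partial(f, a, 0))  # (1): 2x + 2y          = 6
print(partial(f, a, 1))  # (2): 2x + 2y + 3y^2*z = 42
print(partial(f, a, 2))  # (3): y^3              = 8
```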

379.14

plateau

A plateau of a function is a region where a function has constant value. More formally, let U and V be topological spaces. A plateau for a scalar function f : U → V is a path-connected set of points P ⊆ U such that for some y we have

∀p ∈ P, f(p) = y.    (379.14.1)

Please take note that this entry is not authoritative. If you know of a more standard definition of ”plateau”, please contribute it, thank you. Version: 4 Owner: bshanks Author(s): bshanks

379.15

proof of Green’s theorem

Consider the region R bounded by the closed curve P in a well-connected space. P can be given by a vector-valued function F(x, y) = (f(x, y), g(x, y)). The region R can then be described by

∫∫_R (∂g/∂x − ∂f/∂y) dA = ∫∫_R (∂g/∂x) dA − ∫∫_R (∂f/∂y) dA.

The double integrals above can be evaluated separately. Let's look at

∫∫_R (∂g/∂x) dA = ∫_a^b ∫_{A(y)}^{B(y)} (∂g/∂x) dx dy.

Evaluating the above double integral, we get

∫_a^b (g(A(y), y) − g(B(y), y)) dy = ∫_a^b g(A(y), y) dy − ∫_a^b g(B(y), y) dy.

According to the fundamental theorem of line integrals, the above equation is actually equivalent to the evaluation of the line integral of the function F₁(x, y) = (0, g(x, y)) over a path P = P₁ + P₂, where P₁ = (A(y), y) and P₂ = (B(y), y):

∫_a^b g(A(y), y) dy − ∫_a^b g(B(y), y) dy = ∫_{P₁} F₁ · dt + ∫_{P₂} F₁ · dt = ∮_P F₁ · dt.

Thus we have

∫∫_R (∂g/∂x) dA = ∮_P F₁ · dt.

By a similar argument, we can show that

∫∫_R (∂f/∂y) dA = −∮_P F₂ · dt,

where F₂ = (f(x, y), 0). Putting all of the above together, we can see that

∫∫_R (∂g/∂x − ∂f/∂y) dA = ∮_P F₁ · dt + ∮_P F₂ · dt = ∮_P (F₁ + F₂) · dt = ∮_P (f(x, y), g(x, y)) · dt,

which is Green's theorem. Version: 7 Owner: slider142 Author(s): slider142

379.16

relations between Hessian matrix and local extrema

Let x be a vector, and let H(x) be the Hessian of f at a point x. Let the neighborhood of x be in the domain of f, and let f have continuous partial derivatives of first and second order. Let ∇f(x) = 0.

If H(x) is positive definite, then x is a strict local minimum for f. If x is a local minimum for f, then H(x) is positive semidefinite.

If H(x) is negative definite, then x is a strict local maximum for f. If x is a local maximum for f, then H(x) is negative semidefinite.

If H(x) is indefinite, x is a nondegenerate saddle point.

In the case when the dimension of x is 1 (i.e. f : R → R), this reduces to the second derivative test, which is as follows: Let the neighborhood of x be in the domain of f, and let f have continuous derivatives of first and second order. Let f′(x) = 0. If f″(x) > 0, then x is a strict local minimum. If f″(x) < 0, then x is a strict local maximum.

Version: 6 Owner: bshanks Author(s): bshanks
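These criteria can be sketched in two dimensions (the examples x² + y² and x² − y² are assumptions for illustration): for a symmetric 2×2 Hessian with entries (f_xx, f_xy, f_yy), det > 0 with f_xx > 0 means positive definite (strict local minimum), and det < 0 means indefinite (saddle point).

```python
# Sketch: classify a 2-D critical point via a finite-difference Hessian
# and the 2x2 determinant test.

def hessian_2x2(f, x, y, h=1e-3):
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)
    return fxx, fxy, fyy

def classify(fxx, fxy, fyy):
    det = fxx * fyy - fxy * fxy
    if det > 0:
        return "strict local minimum" if fxx > 0 else "strict local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"

bowl = lambda x, y: x * x + y * y
saddle = lambda x, y: x * x - y * y
print(classify(*hessian_2x2(bowl, 0.0, 0.0)))    # strict local minimum
print(classify(*hessian_2x2(saddle, 0.0, 0.0)))  # saddle point
```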


379.17

solenoidal field

A solenoidal vector field is one that satisfies

∇ · B = 0

at every point where the vector field B is defined. Here ∇ · B is the divergence. This condition actually implies that there exists a vector A, known as the vector potential, such that

B = ∇ × A.

For a function f satisfying Laplace's equation ∇²f = 0, it follows that ∇f is solenoidal. Version: 4 Owner: giri Author(s): giri


Chapter 380 26B15 – Integration: length, area, volume 380.1

arc length

Arc length is the length of a section of a differentiable curve. Finding arc length is useful in many applications, since the length of a curve can represent distance traveled, work, etc. It is commonly denoted S, or by the differential ds when one is differentiating or integrating with respect to change in arc length. If one knows the vector function or parametric equations of a curve, finding the arc length is simple: it is given by the integral of the lengths of the tangent vectors to the curve,

S = \int_a^b |\vec{F}'(t)|\, dt.

Note that t is an independent parameter. In Cartesian coordinates, arc length can be calculated by the formula

S = \int_a^b \sqrt{1 + (f'(x))^2}\, dx.

This formula is derived by viewing arc length as the Riemann sum

\lim_{\Delta x \to 0} \sum_{i=1}^{n} \sqrt{1 + f'(x_i)^2}\, \Delta x.

The term being summed is the length of an approximating secant to the curve over the interval of width ∆x. As ∆x vanishes, the sum approaches the arc length, which justifies the formula. Arc length can also be derived for polar coordinates from the general formula for vector functions given


above. The result is

L = \int_a^b \sqrt{r(\theta)^2 + (r'(\theta))^2}\, d\theta.

Version: 5 Owner: slider142 Author(s): slider142
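The parametric arc length formula above can be checked numerically. The example below (not part of the original entry) integrates along the upper unit semicircle x = cos t, y = sin t, whose length is π; the midpoint rule and the sample count are arbitrary choices.

```python
import math

def arc_length(dx, dy, a, b, n=100000):
    """Arc length of a parametric curve (x(t), y(t)) from the formula
    S = integral_a^b sqrt(x'(t)^2 + y'(t)^2) dt, by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += math.hypot(dx(t), dy(t)) * h
    return total

# Upper unit semicircle: x = cos t, y = sin t, t in [0, pi]; length is pi.
S = arc_length(lambda t: -math.sin(t), lambda t: math.cos(t), 0.0, math.pi)
print(S)  # close to pi
```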


Chapter 381 26B20 – Integral formulas (Stokes, Gauss, Green, etc.) 381.1

Green’s theorem

Green’s theorem provides a connection between path integrals over a well-connected region in the plane and the area of the region bounded in the plane. Given a closed path P bounding a region R with area A, and a vector-valued function F⃗ = (f(x, y), g(x, y)) over the plane,

\oint_P \vec{F} \cdot d\vec{x} = \iint_R [g_1(x, y) - f_2(x, y)]\, dA,

where a_n denotes the derivative of a with respect to the n-th variable.

Corollary: The closed path integral over a gradient of a function h with continuous partial derivatives is always zero. Thus, gradients are conservative vector fields. The smooth function h is called the potential of the vector field.

Proof: The corollary states that

\oint_P \nabla h \cdot d\vec{x} = 0.

We can easily prove this using Green’s theorem. Writing ∇h = (f, g) = (h_1, h_2),

\oint_P \nabla h \cdot d\vec{x} = \iint_R [g_1(x, y) - f_2(x, y)]\, dA = \iint_R [h_{21}(x, y) - h_{12}(x, y)]\, dA.

Since h_{12} = h_{21} for any function with continuous second partials, the corollary is proven. Version: 4 Owner: slider142 Author(s): slider142
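Green’s theorem can be verified numerically in a simple case. Below (a hypothetical example, not from the original entry) F = (−y, x) on the unit circle: here g₁ − f₂ = 1 − (−1) = 2, so the double integral over the unit disk is 2π, and the line integral should agree.

```python
import math

# Sketch: verify Green's theorem for F = (-y, x) on the unit circle.
# The line integral oint_P F . dx should equal iint_R 2 dA = 2 * pi.

def line_integral(n=200000):
    """oint_P F . dx along x = cos t, y = sin t, 0 <= t <= 2*pi (midpoint rule)."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t) * h, math.cos(t) * h
        total += (-y) * dx + x * dy       # f dx + g dy
    return total

I = line_integral()
print(I, 2 * math.pi)  # the two values agree
```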


Chapter 382 26B25 – Convexity, generalizations 382.1

convex function

Definition Suppose Ω is a convex set in a vector space over R (or C), and suppose f is a function f : Ω → R. If for any x, y ∈ Ω and any λ ∈ (0, 1), we have

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y),

we say that f is a convex function. If for any x, y ∈ Ω and any λ ∈ (0, 1), we have

f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y),

we say that f is a concave function. If either of the inequalities is strict, then we say that f is a strictly convex function, or a strictly concave function, respectively.

Properties

• A function f is a (strictly) convex function if and only if −f is a (strictly) concave function.

• On R, a continuous function is convex if and only if for all x, y ∈ R, we have

f\left(\frac{x+y}{2}\right) ≤ \frac{f(x) + f(y)}{2}.

• A twice continuously differentiable function on R is convex if and only if f''(x) ≥ 0 for all x ∈ R.

• A local minimum of a convex function is a global minimum. See this page.

Examples • ex ,e−x , and x2 are convex functions on R. • A norm is a convex function. • On R2 , the 1-norm and the ∞-norm (i.e., ||(x, y)||1 = |x| + |y| and ||(x, y)||∞ = max{|x|, |y|}) are not strictly convex ([2], pp. 334-335).

REFERENCES 1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons, 1978.

Version: 11 Owner: matte Author(s): matte, drini

382.2

extremal value of convex/concave functions

Theorem. Let U be a convex set in a normed (real or complex) vector space. If f : U → R is a convex function on U, then a local minimum of f is a global minimum.

Proof. Suppose x is a local minimum for f, i.e., there is an open ball B ⊂ U with radius ε and center x such that f(x) ≤ f(ξ) for all ξ ∈ B. Let us fix some y ∉ B. Our aim is to prove that f(x) ≤ f(y). We define λ = ε/(2‖x − y‖), where ‖·‖ is the norm on U. Then

‖λy + (1 − λ)x − x‖ = ‖λy − λx‖ = |λ| ‖x − y‖ = ε/2,

so λy + (1 − λ)x ∈ B. It follows that f(x) ≤ f(λy + (1 − λ)x). Since f is convex, we then get

f(x) ≤ f(λy + (1 − λ)x) ≤ λf(y) + (1 − λ)f(x),

and f(x) ≤ f(y) as claimed. □

The analogous theorem for concave functions is as follows.

Theorem. Let U be a convex set in a normed (real or complex) vector space. If f : U → R is a concave function on U, then a local maximum of f is a global maximum.

Proof. Consider the convex function −f. If x is a local maximum of f, then it is a local minimum of −f. By the previous theorem, x is then a global minimum of −f. Hence x is a global maximum of f. □ Version: 1 Owner: matte Author(s): matte


Chapter 383 26B30 – Absolutely continuous functions, functions of bounded variation 383.1

absolutely continuous function

Definition. Let [a, b] be a closed bounded interval of R. Then a function f : [a, b] → C is absolutely continuous on [a, b] if for any ε > 0, there is a δ > 0 such that the following condition holds:

(∗) If (a₁, b₁), . . . , (aₙ, bₙ) is a finite collection of disjoint open intervals in [a, b] such that

\sum_{i=1}^{n} (b_i - a_i) < δ,

then

\sum_{i=1}^{n} |f(b_i) - f(a_i)| < ε.

Basic results for absolutely continuous functions are as follows.

Theorem
1. A function f : [a, b] → C is absolutely continuous if and only if Re f and Im f are absolutely continuous real functions.
2. If f : [a, b] → C is a function which is everywhere differentiable and f' is bounded, then f is absolutely continuous [1].
3. Any absolutely continuous function f : [a, b] → C is continuous on [a, b] and of bounded variation [1].
4. If f and g are absolutely continuous functions, then so are fg, f + g, |f|^γ (if γ ≥ 1), and f/g (if g is never zero) [1].
5. If f and g are real-valued absolutely continuous functions, then so are max{f, g} and min{f, g}. If f(x) > 0 for all x and γ ∈ R, then f^γ is absolutely continuous [1].

Property (2), which is readily proven using the mean value theorem, implies that any smooth function with compact support on R is absolutely continuous. By property (3), any absolutely continuous function is of bounded variation. Hence, from properties of functions of bounded variation, the following theorem follows:

Theorem ([1], pp. 536) Let f : [a, b] → C be an absolutely continuous function. Then f is differentiable almost everywhere, and |f'| is Lebesgue integrable.

We have the following characterization of absolutely continuous functions.

Theorem [Fundamental theorem of calculus for the Lebesgue integral] ([1], pp. 550) Let f : [a, b] → C be a function. Then f is absolutely continuous if and only if there is a function g ∈ L¹(a, b) (i.e. a g : (a, b) → C with \int_{(a,b)} |g| < ∞), such that

f(x) = f(a) + \int_a^x g(t)\, dt

for all x ∈ [a, b]. What is more, if f and g are as above, then f' = g almost everywhere. (Above, both integrals are Lebesgue integrals.)

REFERENCES 1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed, John Wiley & Sons, Inc., 1999. 2. W. Rudin, Real and complex analysis, 3rd ed., McGraw-Hill Inc., 1987. 3. F. Jones, Lebesgue Integration on Euclidean Spaces, Jones and Barlett Publishers, 1993. 4. C.D. Aliprantis, O. Burkinshaw, Principles of Real Analysis, 2nd ed., Academic Press, 1990.

Version: 5 Owner: matte Author(s): matte

383.2

total variation

Let γ : [a, b] → X be a function mapping an interval [a, b] to a metric space (X, d). We say that γ is of bounded variation if there is a constant M such that, for each partition P = {a = t₀ < t₁ < · · · < tₙ = b} of [a, b],

v(γ, P) = \sum_{k=1}^{n} d(γ(t_k), γ(t_{k-1})) ≤ M.

The total variation V_γ of γ is defined by

V_γ = sup{v(γ, P) : P is a partition of [a, b]}.

It can be shown that, if X is either R or C, every smooth (or piecewise smooth) function γ : [a, b] → X is of bounded variation, and

V_γ = \int_a^b |γ'(t)|\, dt.

Also, if γ is of bounded variation and f : [a, b] → X is continuous, then the Riemann–Stieltjes integral \int_a^b f\, dγ is finite. If γ is also continuous, it is said to be a rectifiable path, and V(γ) is the length of its trace. If X = R, it can be shown that γ is of bounded variation if and only if it is the difference of two monotonic functions.

Version: 3 Owner: Koro Author(s): Koro


Chapter 384 26B99 – Miscellaneous 384.1

derivation of zeroth weighted power mean

Let x₁, x₂, . . . , xₙ be positive real numbers, and let w₁, w₂, . . . , wₙ be positive real numbers such that w₁ + w₂ + · · · + wₙ = 1. For r ≠ 0, the r-th weighted power mean of x₁, x₂, . . . , xₙ is

M_w^r(x_1, x_2, \dots, x_n) = (w_1 x_1^r + w_2 x_2^r + \cdots + w_n x_n^r)^{1/r}.

Using the Taylor series expansion e^t = 1 + t + O(t²), where O(t²) is Landau notation for terms of order t² and higher, we can write x_i^r as

x_i^r = e^{r \log x_i} = 1 + r \log x_i + O(r^2).

By substituting this into the definition of M_w^r, we get

M_w^r(x_1, \dots, x_n) = \left( w_1(1 + r \log x_1) + \cdots + w_n(1 + r \log x_n) + O(r^2) \right)^{1/r}
= \left( 1 + r(w_1 \log x_1 + \cdots + w_n \log x_n) + O(r^2) \right)^{1/r}
= \left( 1 + r \log(x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}) + O(r^2) \right)^{1/r}
= \exp\left( \frac{1}{r} \log\left( 1 + r \log(x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}) + O(r^2) \right) \right).

Again using a Taylor series, this time log(1 + t) = t + O(t²), we get

M_w^r(x_1, \dots, x_n) = \exp\left( \frac{1}{r} \left( r \log(x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}) + O(r^2) \right) \right)
= \exp\left( \log(x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}) + O(r) \right).

Taking the limit r → 0, we find

M_w^0(x_1, \dots, x_n) = \exp\left( \log(x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}) \right) = x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}.

In particular, if we choose all the weights to be 1/n,

M^0(x_1, x_2, \dots, x_n) = \sqrt[n]{x_1 x_2 \cdots x_n},

the geometric mean of x₁, x₂, . . . , xₙ. Version: 3 Owner: pbruin Author(s): pbruin
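The limit derived above can be observed numerically: M_w^r tends to the weighted geometric mean as r → 0. The data and weights below are hypothetical examples.

```python
import math

def weighted_power_mean(x, w, r):
    """r-th weighted power mean; r = 0 is taken to be the weighted geometric mean."""
    if r == 0:
        return math.prod(xi ** wi for xi, wi in zip(x, w))
    return sum(wi * xi ** r for xi, wi in zip(x, w)) ** (1 / r)

x = [1.0, 4.0, 9.0]   # hypothetical data
w = [0.2, 0.3, 0.5]   # weights summing to 1

geo = weighted_power_mean(x, w, 0)
for r in (1e-1, 1e-2, 1e-3):
    print(r, weighted_power_mean(x, w, r))  # approaches geo as r -> 0
print(geo)
```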

384.2

weighted power mean

If w₁, w₂, . . . , wₙ are positive real numbers such that w₁ + w₂ + · · · + wₙ = 1, we define the r-th weighted power mean of the x_i as:

M_w^r(x_1, x_2, \dots, x_n) = (w_1 x_1^r + w_2 x_2^r + \cdots + w_n x_n^r)^{1/r}.

When all the w_i = 1/n we get the standard power mean. The weighted power mean is a continuous function of r, and taking the limit as r → 0 gives us

M_w^0 = x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}.

We can use weighted power means to generalize the power means inequality: if w is a set of weights and r < s, then M_w^r ≤ M_w^s.

Version: 6 Owner: drini Author(s): drini


Chapter 385 26C15 – Rational functions 385.1

rational function

A real function R(x) of a single variable x is called rational if it can be written as a quotient

R(x) = \frac{P(x)}{Q(x)},

where P(x) and Q(x) are polynomials in x with real coefficients. In general, a rational function R(x₁, . . . , xₙ) has the form

R(x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n)}{Q(x_1, \dots, x_n)},

where P(x₁, . . . , xₙ) and Q(x₁, . . . , xₙ) are polynomials in the variables (x₁, . . . , xₙ) with coefficients in some field or ring S. In this sense, R(x₁, . . . , xₙ) can be regarded as an element of the fraction field S(x₁, . . . , xₙ) of the polynomial ring S[x₁, . . . , xₙ].

Version: 1 Owner: igor Author(s): igor


Chapter 386 26C99 – Miscellaneous 386.1

Laguerre Polynomial

A Laguerre polynomial is a polynomial of the form:

L_n(x) = \frac{e^x}{n!} \frac{d^n}{dx^n}\left( e^{-x} x^n \right).

Associated to this is the Laguerre differential equation, the solutions of which are called associated Laguerre polynomials:

L_n^k(x) = \frac{e^x x^{-k}}{n!} \frac{d^n}{dx^n}\left( e^{-x} x^{n+k} \right).

Of course L_n^0(x) = L_n(x). The associated Laguerre polynomials are orthogonal over [0, ∞) with respect to the weighting function x^k e^{-x}:

\int_0^{\infty} e^{-x} x^k L_n^k(x) L_m^k(x)\, dx = \frac{(n+k)!}{n!} \delta_{nm}.

Version: 2 Owner: mathwizard Author(s): mathwizard
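The first few Laguerre polynomials that the Rodrigues-type formula above produces are L₀ = 1, L₁ = 1 − x, L₂ = (x² − 4x + 2)/2. For evaluation, the standard three-term recurrence (k+1)L_{k+1} = (2k+1−x)L_k − kL_{k−1} (a known identity, not stated in the entry itself) is convenient; the sketch below compares it with the closed forms.

```python
def laguerre(n, x):
    """Evaluate L_n(x) using the standard three-term recurrence
    (k+1) L_{k+1} = (2k+1-x) L_k - k L_{k-1},  with L_0 = 1, L_1 = 1 - x."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - x
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1 - x) * cur - k * prev) / (k + 1)
    return cur

# Compare with closed forms obtained from the Rodrigues formula.
for x in (0.0, 0.5, 2.0):
    print(laguerre(2, x), (x * x - 4 * x + 2) / 2)
    print(laguerre(3, x), (-x**3 + 9 * x**2 - 18 * x + 6) / 6)
```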


Chapter 387 26D05 – Inequalities for trigonometric functions and polynomials 387.1

Weierstrass product inequality

For any finite family (a_i)_{i∈I} of real numbers in the interval [0, 1], we have

\prod_{i} (1 - a_i) \ge 1 - \sum_{i} a_i.

Proof: Write

f = \prod_{i} (1 - a_i) + \sum_{i} a_i.

For any k ∈ I, and any fixed values of the a_i for i ≠ k, f is a polynomial of the first degree in a_k. Consequently f is minimal either at a_k = 0 or a_k = 1. That brings us down to two cases: all the a_i are zero, or at least one of them is 1. But in both cases it is clear that f ≥ 1, QED.

Version: 2 Owner: Daume Author(s): Larry Hammick
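A randomized sanity check of the inequality (not part of the original entry; sample sizes are arbitrary):

```python
import random

# Sketch: check prod(1 - a_i) >= 1 - sum(a_i) for random a_i in [0, 1].

def weierstrass_holds(a):
    prod = 1.0
    for ai in a:
        prod *= 1.0 - ai
    return prod >= 1.0 - sum(a) - 1e-12   # small tolerance for rounding

random.seed(0)
trials = [[random.random() for _ in range(random.randint(1, 8))]
          for _ in range(1000)]
print(all(weierstrass_holds(a) for a in trials))  # True
```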

387.2

proof of Jordan’s Inequality

To prove that

\frac{2}{\pi} x \le \sin(x) \le x \qquad \forall\, x \in [0, \tfrac{\pi}{2}] \tag{387.2.1}

consider a unit circle (circle with radius = 1 unit). Take any point P on the circumference of the circle.

Drop the perpendicular from P to the horizontal line, M being the foot of the perpendicular and Q the reflection of P at M. (refer to figure) Let x = ∠POM. For x to be in [0, π/2], the point P lies in the first quadrant, as shown.

The length of line segment PM is sin(x). Construct a circle of radius MP, with M as the center.

The length of line segment PQ is 2 sin(x). The length of arc PAQ is 2x. The length of arc PBQ is π sin(x). Since PQ ≤ length of arc PAQ (equality holds when x = 0), we have 2 sin(x) ≤ 2x. This implies

\sin(x) \le x.

Since the length of arc PAQ is ≤ the length of arc PBQ (equality holds when x = 0 or x = π/2), we have 2x ≤ π sin(x). This implies

\frac{2}{\pi} x \le \sin(x).

Thus we have

\frac{2}{\pi} x \le \sin(x) \le x \qquad \forall\, x \in [0, \tfrac{\pi}{2}]. \tag{387.2.2}

Version: 12 Owner: giri Author(s): giri
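Jordan’s inequality is easy to confirm by sampling the interval [0, π/2] (a numerical sanity check, not part of the original proof):

```python
import math

# Sketch: sample (2/pi) x <= sin x <= x on [0, pi/2].
ok = True
for i in range(1001):
    x = (math.pi / 2) * i / 1000
    ok = ok and (2 / math.pi) * x <= math.sin(x) + 1e-12
    ok = ok and math.sin(x) <= x + 1e-12
print(ok)  # True
```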

Chapter 388 26D10 – Inequalities involving derivatives and differential and integral operators 388.1

Gronwall’s lemma

If, for t₀ ≤ t ≤ t₁, φ(t) ≥ 0 and ψ(t) ≥ 0 are continuous functions such that the inequality

\varphi(t) \le K + L \int_{t_0}^{t} \psi(s)\varphi(s)\, ds

holds on t₀ ≤ t ≤ t₁, with K and L positive constants, then

\varphi(t) \le K \exp\left( L \int_{t_0}^{t} \psi(s)\, ds \right)

on t₀ ≤ t ≤ t₁.

Version: 1 Owner: jarino Author(s): jarino

388.2

proof of Gronwall’s lemma

The inequality

\varphi(t) \le K + L \int_{t_0}^{t} \psi(s)\varphi(s)\, ds \tag{388.2.1}

is equivalent to

\frac{\varphi(t)}{K + L \int_{t_0}^{t} \psi(s)\varphi(s)\, ds} \le 1.

Multiply by Lψ(t) and integrate, giving

\int_{t_0}^{t} \frac{L\psi(s)\varphi(s)}{K + L \int_{t_0}^{s} \psi(\tau)\varphi(\tau)\, d\tau}\, ds \le L \int_{t_0}^{t} \psi(s)\, ds.

Thus

\ln\left( K + L \int_{t_0}^{t} \psi(s)\varphi(s)\, ds \right) - \ln K \le L \int_{t_0}^{t} \psi(s)\, ds

and finally

K + L \int_{t_0}^{t} \psi(s)\varphi(s)\, ds \le K \exp\left( L \int_{t_0}^{t} \psi(s)\, ds \right).

Using (388.2.1) on the left-hand side of this inequality gives the result. Version: 2 Owner: jarino Author(s): jarino


Chapter 389 26D15 – Inequalities for sums, series and integrals 389.1

Carleman’s inequality

Theorem ([4], pp. 24) For positive real numbers {a_n}_{n=1}^∞, Carleman’s inequality states that

\sum_{n=1}^{\infty} (a_1 a_2 \cdots a_n)^{1/n} \le e \sum_{n=1}^{\infty} a_n.

Although the constant e (the natural log base) is optimal, it is possible to refine Carleman’s inequality by decreasing the weight coefficients on the right hand side [2].

REFERENCES 1. L. H¨ormander, The Analysis of Linear Partial Differential Operators I, (Distribution theory and Fourier Analysis), 2nd ed, Springer-Verlag, 1990. 2. B.Q. Yuan, Refinements of Carleman’s inequality, Journal of Inequalities in Pure and Applied Mathematics, Vol. 2, Issue 2, 2001, Article 21. online

Version: 2 Owner: matte Author(s): matte

389.2

Chebyshev’s inequality

If x₁, x₂, . . . , xₙ and y₁, y₂, . . . , yₙ are two sequences (at least one of them consisting of positive numbers):

• if x₁ < x₂ < · · · < xₙ and y₁ < y₂ < · · · < yₙ then

\left( \frac{x_1 + x_2 + \cdots + x_n}{n} \right) \left( \frac{y_1 + y_2 + \cdots + y_n}{n} \right) \le \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{n}.

• if x₁ < x₂ < · · · < xₙ and y₁ > y₂ > · · · > yₙ then

\left( \frac{x_1 + x_2 + \cdots + x_n}{n} \right) \left( \frac{y_1 + y_2 + \cdots + y_n}{n} \right) \ge \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{n}.

Version: 1 Owner: drini Author(s): drini
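Both directions of Chebyshev’s sum inequality can be checked on random sorted sequences (a numerical sketch, not part of the original entry; sample sizes are arbitrary):

```python
import random

# Sketch: for same-ordered sequences, mean(x)*mean(y) <= mean(x*y);
# for oppositely ordered sequences the inequality reverses.

def mean(v):
    return sum(v) / len(v)

random.seed(1)
ok = True
for _ in range(500):
    n = random.randint(2, 10)
    x = sorted(random.uniform(0, 10) for _ in range(n))
    y = sorted(random.uniform(0, 10) for _ in range(n))
    lhs = mean(x) * mean(y)
    same = mean([a * b for a, b in zip(x, y)])
    ok = ok and lhs <= same + 1e-9
    rev = mean([a * b for a, b in zip(x, y[::-1])])  # opposite ordering
    ok = ok and lhs >= rev - 1e-9
print(ok)  # True
```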

389.3

MacLaurin’s Inequality

Let a₁, a₂, . . . , aₙ be positive real numbers, and define the sums S_k as follows:

S_k = \frac{\displaystyle\sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} a_{i_1} a_{i_2} \cdots a_{i_k}}{\binom{n}{k}}.

Then the following chain of inequalities is true:

S_1 \ge \sqrt{S_2} \ge \sqrt[3]{S_3} \ge \cdots \ge \sqrt[n]{S_n}.

Note: the S_k are called the averages of the elementary symmetric sums. This inequality is important because it shows that the arithmetic-geometric mean inequality is nothing but a consequence of a chain of stronger inequalities. Version: 2 Owner: drini Author(s): drini, slash

389.4

Minkowski inequality

If p ≥ 1 and a_k, b_k are real numbers for k = 1, . . . , n, then

\left( \sum_{k=1}^{n} |a_k + b_k|^p \right)^{1/p} \le \left( \sum_{k=1}^{n} |a_k|^p \right)^{1/p} + \left( \sum_{k=1}^{n} |b_k|^p \right)^{1/p}.

The Minkowski inequality is in fact valid for all L^p norms with p ≥ 1 on arbitrary measure spaces. This covers the case of Rⁿ listed here as well as spaces of sequences and spaces of functions, and also complex L^p spaces. Version: 8 Owner: drini Author(s): drini, saforres
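A randomized check of the finite-dimensional statement for several exponents (a numerical sketch, not part of the original entry):

```python
import random

# Sketch: check ||a + b||_p <= ||a||_p + ||b||_p for a few exponents p >= 1.

def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1 / p)

random.seed(2)
ok = True
for _ in range(200):
    n = random.randint(1, 8)
    a = [random.uniform(-5, 5) for _ in range(n)]
    b = [random.uniform(-5, 5) for _ in range(n)]
    s = [x + y for x, y in zip(a, b)]
    for p in (1, 1.5, 2, 3, 7):
        ok = ok and p_norm(s, p) <= p_norm(a, p) + p_norm(b, p) + 1e-9
print(ok)  # True
```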

389.5

Muirhead’s theorem

Let 0 ≤ s₁ ≤ · · · ≤ sₙ and 0 ≤ t₁ ≤ . . . ≤ tₙ be real numbers such that

\sum_{i=1}^{n} s_i = \sum_{i=1}^{n} t_i \quad \text{and} \quad \sum_{i=1}^{k} s_i \le \sum_{i=1}^{k} t_i \quad (k = 1, \dots, n-1).

Then for any nonnegative numbers x₁, . . . , xₙ,

\sum_{\sigma} x_1^{s_{\sigma(1)}} \cdots x_n^{s_{\sigma(n)}} \ge \sum_{\sigma} x_1^{t_{\sigma(1)}} \cdots x_n^{t_{\sigma(n)}},

where the sums run over all permutations σ of {1, 2, . . . , n}. (With the exponents indexed in increasing order as above, the partial-sum condition says that (s_i) majorizes (t_i), so the s-sum dominates; for example, with s = (0, 2) and t = (1, 1) this is x₁² + x₂² ≥ 2x₁x₂.)

Version: 3 Owner: Koro Author(s): Koro

389.6

Schur’s inequality

If a, b, and c are positive real numbers and k ≥ 1 a fixed real constant, then the following inequality holds:

a^k(a - b)(a - c) + b^k(b - a)(b - c) + c^k(c - a)(c - b) \ge 0.

Taking k = 1, we get the well-known

a^3 + b^3 + c^3 + 3abc \ge ab(a + b) + ac(a + c) + bc(b + c).

We can assume without loss of generality that c ≤ b ≤ a via a permutation of the variables (as both sides are symmetric in those variables). Then collecting terms, the claim is that

(a - b)\left( a^k(a - c) - b^k(b - c) \right) + c^k(a - c)(b - c) \ge 0,

which is clearly true as every term on the left is nonnegative. Version: 3 Owner: mathcam Author(s): mathcam, slash

389.7

Young’s inequality

Let φ : R → R be a continuous, strictly increasing function such that φ(0) = 0. Then the following inequality holds:

ab \le \int_0^a \varphi(x)\, dx + \int_0^b \varphi^{-1}(y)\, dy.

The inequality is trivial to prove by drawing the graph of φ(x) and by observing that the sum of the two areas represented by the integrals above is greater than the area of a rectangle of sides a and b. Version: 2 Owner: slash Author(s): slash
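A classical special case: taking φ(x) = x^{p−1} (so φ⁻¹(y) = y^{1/(p−1)}) the two integrals evaluate to a^p/p + b^q/q with q = p/(p−1), giving ab ≤ a^p/p + b^q/q. A randomized numeric check of this special case (a sketch, not part of the original entry):

```python
import random

# Sketch: Young's inequality with phi(x) = x**(p-1), i.e. the classical
# form a*b <= a**p / p + b**q / q with 1/p + 1/q = 1.

random.seed(3)
ok = True
for _ in range(1000):
    a = random.uniform(0, 10)
    b = random.uniform(0, 10)
    p = random.uniform(1.1, 5.0)
    q = p / (p - 1)
    rhs = a**p / p + b**q / q
    ok = ok and a * b <= rhs * (1 + 1e-9) + 1e-9  # relative + absolute slack
print(ok)  # True
```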

389.8

arithmetic-geometric-harmonic means inequality

Let x₁, x₂, . . . , xₙ be positive numbers. Then

\max\{x_1, \dots, x_n\} \ge \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n} \ge \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} \ge \min\{x_1, \dots, x_n\}.

There are several generalizations to this inequality using power means and weighted power means. Version: 4 Owner: drini Author(s): drini
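The full chain can be computed directly for a concrete data set (a hypothetical example, not part of the original entry):

```python
import math

# Sketch: compute the four quantities in the chain and check
# max >= AM >= GM >= HM >= min.

x = [2.0, 3.0, 5.0, 7.0, 11.0]          # hypothetical data
am = sum(x) / len(x)                     # arithmetic mean
gm = math.prod(x) ** (1 / len(x))        # geometric mean
hm = len(x) / sum(1 / t for t in x)      # harmonic mean

print(max(x), am, gm, hm, min(x))
assert max(x) >= am >= gm >= hm >= min(x)
```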

389.9

general means inequality

The power means inequality is a generalization of the arithmetic-geometric means inequality. If 0 ≠ r ∈ R, the r-mean (or r-th power mean) of the nonnegative numbers a₁, . . . , aₙ is defined as

M^r(a_1, a_2, \dots, a_n) = \left( \frac{1}{n} \sum_{k=1}^{n} a_k^r \right)^{1/r}.

Given real numbers x, y such that xy ≠ 0 and x < y, we have

M^x \le M^y

and the equality holds if and only if a₁ = . . . = aₙ. Additionally, if we define M⁰ to be the geometric mean (a₁a₂ . . . aₙ)^{1/n}, we have that the inequality above holds for arbitrary real numbers x < y.

The mentioned inequality is a special case of this one, since M¹ is the arithmetic mean, M⁰ is the geometric mean, and M⁻¹ is the harmonic mean.

This inequality can be further generalized using weighted power means. Version: 3 Owner: drini Author(s): drini

389.10

power mean

The r-th power mean of the numbers x₁, x₂, . . . , xₙ is defined as:

M^r(x_1, x_2, \dots, x_n) = \left( \frac{x_1^r + x_2^r + \cdots + x_n^r}{n} \right)^{1/r}.

The arithmetic mean is a special case when r = 1. The power mean is a continuous function of r, and taking the limit when r → 0 gives us the geometric mean:

M^0(x_1, x_2, \dots, x_n) = \sqrt[n]{x_1 x_2 \cdots x_n}.

Also, when r = −1 we get

M^{-1}(x_1, x_2, \dots, x_n) = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}},

the harmonic mean. A generalization of power means are weighted power means. Version: 8 Owner: drini Author(s): drini

389.11

proof of Chebyshev’s inequality

Let x₁, x₂, . . . , xₙ and y₁, y₂, . . . , yₙ be real numbers such that x₁ ≤ x₂ ≤ · · · ≤ xₙ. Write the product (x₁ + x₂ + · · · + xₙ)(y₁ + y₂ + · · · + yₙ) as

(x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)
+ (x_1 y_2 + x_2 y_3 + \cdots + x_{n-1} y_n + x_n y_1)
+ (x_1 y_3 + x_2 y_4 + \cdots + x_{n-2} y_n + x_{n-1} y_1 + x_n y_2)
+ \cdots
+ (x_1 y_n + x_2 y_1 + x_3 y_2 + \cdots + x_n y_{n-1}). \tag{389.11.1}

• If y₁ ≤ y₂ ≤ · · · ≤ yₙ, each of the n terms in parentheses is less than or equal to x₁y₁ + x₂y₂ + · · · + xₙyₙ, according to the rearrangement inequality. From this, it follows that

(x_1 + x_2 + \cdots + x_n)(y_1 + y_2 + \cdots + y_n) \le n(x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)

or (dividing by n²)

\left( \frac{x_1 + x_2 + \cdots + x_n}{n} \right) \left( \frac{y_1 + y_2 + \cdots + y_n}{n} \right) \le \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{n}.

• If y₁ ≥ y₂ ≥ · · · ≥ yₙ, the same reasoning gives

\left( \frac{x_1 + x_2 + \cdots + x_n}{n} \right) \left( \frac{y_1 + y_2 + \cdots + y_n}{n} \right) \ge \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{n}.

It is clear that equality holds if x₁ = x₂ = · · · = xₙ or y₁ = y₂ = · · · = yₙ. To see that this condition is also necessary, suppose that not all y_i's are equal, so that y₁ ≠ yₙ. Then the second term in parentheses of (389.11.1) can only be equal to x₁y₁ + x₂y₂ + · · · + xₙyₙ if xₙ₋₁ = xₙ, the third term only if xₙ₋₂ = xₙ₋₁, and so on, until the last term which can only be equal to x₁y₁ + x₂y₂ + · · · + xₙyₙ if x₁ = x₂. This implies that x₁ = x₂ = · · · = xₙ. Therefore, Chebyshev’s inequality is an equality if and only if x₁ = x₂ = · · · = xₙ or y₁ = y₂ = · · · = yₙ. Version: 1 Owner: pbruin Author(s): pbruin

389.12

proof of Minkowski inequality

For p = 1 the result follows immediately from the triangle inequality, so we may assume p > 1. We have

|a_k + b_k|^p = |a_k + b_k| \, |a_k + b_k|^{p-1} \le (|a_k| + |b_k|) \, |a_k + b_k|^{p-1}

by the triangle inequality. Therefore we have

|a_k + b_k|^p \le |a_k| \, |a_k + b_k|^{p-1} + |b_k| \, |a_k + b_k|^{p-1}.

Set q = p/(p − 1). Then 1/p + 1/q = 1, so by the Hölder inequality we have

\sum_{k=0}^{n} |a_k| \, |a_k + b_k|^{p-1} \le \left( \sum_{k=0}^{n} |a_k|^p \right)^{1/p} \left( \sum_{k=0}^{n} |a_k + b_k|^{(p-1)q} \right)^{1/q}

\sum_{k=0}^{n} |b_k| \, |a_k + b_k|^{p-1} \le \left( \sum_{k=0}^{n} |b_k|^p \right)^{1/p} \left( \sum_{k=0}^{n} |a_k + b_k|^{(p-1)q} \right)^{1/q}.

Adding these two inequalities, dividing by the factor common to the right sides of both, and observing that (p − 1)q = p by definition, we have

\left( \sum_{k=0}^{n} |a_k + b_k|^p \right)^{1 - 1/q} \le \left( \sum_{k=0}^{n} |a_k|^p \right)^{1/p} + \left( \sum_{k=0}^{n} |b_k|^p \right)^{1/p}.

Finally, observe that 1 − 1/q = 1/p, and the result follows as required. The proof for the integral version is analogous. Version: 4 Owner: saforres Author(s): saforres

389.13

proof of arithmetic-geometric-harmonic means inequality

Let M be max{x₁, x₂, x₃, . . . , xₙ} and let m be min{x₁, x₂, x₃, . . . , xₙ}. Then

M = \frac{M + M + \cdots + M}{n} \ge \frac{x_1 + x_2 + \cdots + x_n}{n}

and

m = \frac{n}{\frac{1}{m} + \frac{1}{m} + \cdots + \frac{1}{m}} \le \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}},

where all the sums have n terms. So we have proved in this way the two inequalities at the extremes.

Now we shall prove the inequality between the arithmetic mean and the geometric mean. We do first the case n = 2.

(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0
x_1 - 2\sqrt{x_1 x_2} + x_2 \ge 0
x_1 + x_2 \ge 2\sqrt{x_1 x_2}
\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2}.

Now we prove the inequality for any power of 2 (that is, n = 2^k for some integer k) by using mathematical induction.

\frac{x_1 + x_2 + \cdots + x_{2^k} + x_{2^k+1} + \cdots + x_{2^{k+1}}}{2^{k+1}} = \frac{\frac{x_1 + x_2 + \cdots + x_{2^k}}{2^k} + \frac{x_{2^k+1} + x_{2^k+2} + \cdots + x_{2^{k+1}}}{2^k}}{2},

and using the case n = 2 on the last expression we can state the following inequality:

\frac{x_1 + x_2 + \cdots + x_{2^k} + x_{2^k+1} + \cdots + x_{2^{k+1}}}{2^{k+1}}
\ge \sqrt{\left( \frac{x_1 + x_2 + \cdots + x_{2^k}}{2^k} \right) \left( \frac{x_{2^k+1} + x_{2^k+2} + \cdots + x_{2^{k+1}}}{2^k} \right)}
\ge \sqrt{\sqrt[2^k]{x_1 x_2 \cdots x_{2^k}} \, \sqrt[2^k]{x_{2^k+1} x_{2^k+2} \cdots x_{2^{k+1}}}},

where the last inequality was obtained by applying the induction hypothesis with n = 2^k. Finally, we see that the last expression is equal to \sqrt[2^{k+1}]{x_1 x_2 x_3 \cdots x_{2^{k+1}}}, and so we have proved the truth of the inequality when the number of terms is a power of two.

Finally, we prove that if the inequality holds for any n, it must also hold for n − 1, and this proposition, combined with the preceding proof for powers of 2, is enough to prove the inequality for any positive integer. Suppose that

\frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}

is known for a given value of n (we just proved that it is true for powers of two, as example). Then we can replace xₙ with the average of the first n − 1 numbers. So

\frac{x_1 + x_2 + \cdots + x_{n-1} + \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1}}{n}
= \frac{(n-1)x_1 + (n-1)x_2 + \cdots + (n-1)x_{n-1} + x_1 + x_2 + \cdots + x_{n-1}}{n(n-1)}
= \frac{n x_1 + n x_2 + \cdots + n x_{n-1}}{n(n-1)}
= \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1}.

On the other hand,

\sqrt[n]{x_1 x_2 \cdots x_{n-1} \left( \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1} \right)}
= \sqrt[n]{x_1 x_2 \cdots x_{n-1}} \, \sqrt[n]{\frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1}},

which, by the inequality stated for n and the observations made above, leads to:

\left( \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1} \right)^n \ge (x_1 x_2 \cdots x_{n-1}) \left( \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1} \right)

and so

\left( \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1} \right)^{n-1} \ge x_1 x_2 \cdots x_{n-1},

from where we get that

\frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1} \ge \sqrt[n-1]{x_1 x_2 \cdots x_{n-1}}.

So far we have proved the inequality between the arithmetic mean and the geometric mean. The geometric-harmonic inequality is easier. Let t_i be 1/x_i. From

\frac{t_1 + t_2 + \cdots + t_n}{n} \ge \sqrt[n]{t_1 t_2 t_3 \cdots t_n}

we obtain

\frac{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}{n} \ge \sqrt[n]{\frac{1}{x_1} \frac{1}{x_2} \frac{1}{x_3} \cdots \frac{1}{x_n}}

and therefore

\sqrt[n]{x_1 x_2 x_3 \cdots x_n} \ge \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}},

and so our proof is completed.

Version: 2 Owner: drini Author(s): drini

389.14

proof of general means inequality

Let r < s be real numbers, and let w₁, w₂, . . . , wₙ be positive real numbers such that w₁ + w₂ + · · · + wₙ = 1. We will prove the weighted power means inequality, which states that for positive real numbers x₁, x₂, . . . , xₙ,

M_w^r(x_1, x_2, \dots, x_n) \le M_w^s(x_1, x_2, \dots, x_n).

First, suppose that r and s are nonzero. Then the r-th weighted power mean of x₁, x₂, . . . , xₙ is

M_w^r(x_1, x_2, \dots, x_n) = (w_1 x_1^r + w_2 x_2^r + \cdots + w_n x_n^r)^{1/r},

and M_w^s is defined similarly.

Let t = s/r, and let y_i = x_i^r for 1 ≤ i ≤ n; this implies y_i^t = x_i^s. Define the function f on (0, ∞) by f(x) = x^t. The second derivative of f is f''(x) = t(t − 1)x^{t−2}. There are three cases for the signs of r and s: r < s < 0, r < 0 < s, and 0 < r < s. We will prove the inequality for the case 0 < r < s; the other cases are almost identical.

In the case that r and s are both positive, t > 1. Since f''(x) = t(t − 1)x^{t−2} > 0 for all x > 0, f is a strictly convex function. Therefore, according to Jensen’s inequality,

(w_1 y_1 + w_2 y_2 + \cdots + w_n y_n)^t = f(w_1 y_1 + w_2 y_2 + \cdots + w_n y_n)
\le w_1 f(y_1) + w_2 f(y_2) + \cdots + w_n f(y_n)
= w_1 y_1^t + w_2 y_2^t + \cdots + w_n y_n^t,

with equality if and only if y₁ = y₂ = · · · = yₙ. By substituting t = s/r and y_i = x_i^r back into this inequality, we get

(w_1 x_1^r + w_2 x_2^r + \cdots + w_n x_n^r)^{s/r} \le w_1 x_1^s + w_2 x_2^s + \cdots + w_n x_n^s,

with equality if and only if x₁ = x₂ = · · · = xₙ. Since s is positive, the function x ↦ x^{1/s} is strictly increasing, so raising both sides to the power 1/s preserves the inequality:

(w_1 x_1^r + w_2 x_2^r + \cdots + w_n x_n^r)^{1/r} \le (w_1 x_1^s + w_2 x_2^s + \cdots + w_n x_n^s)^{1/s},

which is the inequality we had to prove. Equality holds if and only if all the x_i are equal.

If r = 0, the inequality is still correct: M_w^0 is defined as lim_{r→0} M_w^r, and since M_w^r ≤ M_w^s for all r < s with r ≠ 0, the same holds for the limit r → 0. We can show by an identical argument that M_w^r ≤ M_w^0 for all r < 0. Therefore, for all real numbers r and s such that r < s,

M_w^r(x_1, x_2, \dots, x_n) \le M_w^s(x_1, x_2, \dots, x_n).

Version: 1 Owner: pbruin Author(s): pbruin

389.15

proof of rearrangement inequality

We first prove the rearrangement inequality for the case n = 2. Let x1 , x2 , y1 , y2 be real numbers such that x1 ≤ x2 and y1 ≤ y2 . Then (x2 − x1 )(y2 − y1 ) ≥ 0, and therefore x1 y1 + x2 y2 ≥ x1 y2 + x2 y1 .

Equality holds iff x1 = x2 or y1 = y2 .

For the general case, let x₁, x₂, . . . , xₙ and y₁, y₂, . . . , yₙ be real numbers such that x₁ ≤ x₂ ≤ · · · ≤ xₙ. Suppose that (z₁, z₂, . . . , zₙ) is a permutation (rearrangement) of {y₁, y₂, . . . , yₙ} such that the sum x₁z₁ + x₂z₂ + · · · + xₙzₙ is maximized. If there exists a pair i < j with z_i > z_j, then x_i z_j + x_j z_i ≥ x_i z_i + x_j z_j (the n = 2 case); equality holds iff x_i = x_j. Therefore, x₁z₁ + x₂z₂ + · · · + xₙzₙ is not maximal unless z₁ ≤ z₂ ≤ · · · ≤ zₙ or x_i = x_j for all pairs i < j such that z_i > z_j. In the latter case, we can consecutively interchange these pairs until z₁ ≤ z₂ ≤ · · · ≤ zₙ (this is possible because the number of pairs i < j with z_i > z_j decreases with each step). So x₁z₁ + x₂z₂ + · · · + xₙzₙ is maximized if z₁ ≤ z₂ ≤ · · · ≤ zₙ.

To show that x₁z₁ + x₂z₂ + · · · + xₙzₙ is minimal for a permutation (z₁, z₂, . . . , zₙ) of {y₁, y₂, . . . , yₙ} if z₁ ≥ z₂ ≥ · · · ≥ zₙ, observe that −(x₁z₁ + x₂z₂ + · · · + xₙzₙ) = x₁(−z₁) + x₂(−z₂) + · · · + xₙ(−zₙ) is maximized if −z₁ ≤ −z₂ ≤ · · · ≤ −zₙ. This implies that x₁z₁ + x₂z₂ + · · · + xₙzₙ is minimized if z₁ ≥ z₂ ≥ · · · ≥ zₙ. Version: 1 Owner: pbruin Author(s): pbruin

389.16

rearrangement inequality

Let x₁, x₂, . . . , xₙ and y₁, y₂, . . . , yₙ be two sequences of positive real numbers. Then the sum x₁y₁ + x₂y₂ + · · · + xₙyₙ is maximized when the two sequences are ordered in the same way (i.e. x₁ ≤ x₂ ≤ · · · ≤ xₙ and y₁ ≤ y₂ ≤ · · · ≤ yₙ) and is minimized when the two sequences are ordered in the opposite way (i.e. x₁ ≤ x₂ ≤ · · · ≤ xₙ and y₁ ≥ y₂ ≥ · · · ≥ yₙ). This can be seen intuitively as follows: if x₁, x₂, . . . , xₙ are the prices of n kinds of items, and y₁, y₂, . . . , yₙ the number of units sold of each, then the highest revenue is obtained when you sell more units of the high-priced items and fewer units of the low-priced ones (same ordering), and the lowest revenue when you sell more units of the low-priced items and fewer of the high-priced ones (opposite ordering). Version: 4 Owner: drini Author(s): drini
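For small sequences the statement can be verified by brute force over all pairings (a numerical sketch, not part of the original entry; the sequences are hypothetical examples):

```python
from itertools import permutations

# Sketch: among all pairings of x with a permutation z of y, the same-order
# pairing maximizes and the opposite-order pairing minimizes sum(x_i * z_i).

x = [1.0, 2.0, 5.0, 7.0]    # increasing
y = [0.5, 1.5, 2.5, 4.0]    # increasing

sums = [sum(a * b for a, b in zip(x, z)) for z in permutations(y)]
same = sum(a * b for a, b in zip(x, y))            # same ordering
opposite = sum(a * b for a, b in zip(x, y[::-1]))  # opposite ordering

print(same == max(sums), opposite == min(sums))
```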


Chapter 390 26D99 – Miscellaneous 390.1

Bernoulli’s inequality

Let x and r be real numbers. If r > 1 and x > −1, then

(1 + x)^r \ge 1 + rx.

The inequality also holds when r is an even integer. Version: 3 Owner: drini Author(s): drini
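A randomized sampling of the inequality over its stated range (a sketch, not part of the original entry):

```python
import random

# Sketch: sample (1 + x)**r >= 1 + r*x for r >= 1 and x > -1.

random.seed(4)
ok = True
for _ in range(1000):
    r = random.uniform(1.0, 6.0)
    x = random.uniform(-0.999, 10.0)
    ok = ok and (1 + x) ** r >= 1 + r * x - 1e-9
print(ok)  # True
```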

390.2

proof of Bernoulli’s inequality

Let I be the interval (−1, ∞) and f : I → R the function defined as

f(x) = (1 + x)^α − 1 − αx

with α ∈ R \ {0, 1} fixed. Then f is differentiable and its derivative is

f'(x) = α(1 + x)^{α−1} − α for all x ∈ I,

from which it follows that f'(x) = 0 ⇔ x = 0.

1. If 0 < α < 1 then f'(x) < 0 for all x ∈ (0, ∞) and f'(x) > 0 for all x ∈ (−1, 0), which means that 0 is a global maximum point for f. Therefore f(x) < f(0) for all x ∈ I \ {0}, which means that (1 + x)^α < 1 + αx for all x ∈ I \ {0}.

2. If α ∉ [0, 1] then f'(x) > 0 for all x ∈ (0, ∞) and f'(x) < 0 for all x ∈ (−1, 0), meaning that 0 is a global minimum point for f. This implies that f(x) > f(0) for all x ∈ I \ {0}, which means that (1 + x)^α > 1 + αx for all x ∈ I \ {0}.

Checking that the equality is satisfied for x = 0 or for α ∈ {0, 1} ends the proof. Version: 3 Owner: danielm Author(s): danielm


Chapter 391 26E35 – Nonstandard analysis 391.1

hyperreal

An ultrafilter F on a set I is called nonprincipal if no finite subsets of I are in F. Fix once and for all a nonprincipal ultrafilter F on the set N of natural numbers. Let ∼ be the equivalence relation on the set R^N of sequences of real numbers given by

{a_n} ∼ {b_n} ⇔ {n ∈ N | a_n = b_n} ∈ F.

Let *R be the set of equivalence classes of R^N under the equivalence relation ∼. The set *R is called the set of hyperreals. It is a field under coordinatewise addition and multiplication:

{a_n} + {b_n} = {a_n + b_n}
{a_n} · {b_n} = {a_n · b_n}.

The field *R is an ordered field under the ordering relation

{a_n} ≤ {b_n} ⇔ {n ∈ N | a_n ≤ b_n} ∈ F.

The real numbers embed into *R by the map sending the real number x ∈ R to the equivalence class of the constant sequence given by x_n := x for all n. In what follows, we adopt the convention of treating R as a subset of *R under this embedding. A hyperreal x ∈ *R is:

• limited if a < x < b for some real numbers a, b ∈ R
• positive unlimited if x > a for all real numbers a ∈ R
• negative unlimited if x < a for all real numbers a ∈ R
• unlimited if it is either positive unlimited or negative unlimited
• positive infinitesimal if 0 < x < a for all positive real numbers a ∈ R⁺
• negative infinitesimal if a < x < 0 for all negative real numbers a ∈ R⁻
• infinitesimal if it is either positive infinitesimal or negative infinitesimal

For any subset A of R, the set *A is defined to be the subset of *R consisting of equivalence classes of sequences {a_n} such that {n ∈ N | a_n ∈ A} ∈ F. The sets *N, *Z, and *Q are called hypernaturals, hyperintegers, and hyperrationals, respectively. An element of *N is also sometimes called hyperfinite. Version: 1 Owner: djao Author(s): djao

391.2

e is not a quadratic irrational

Looking at the Taylor series for e^x, we see that

    e^x = Σ_{k=0}^∞ x^k / k!.

This converges for every x ∈ R, so e = Σ_{k=0}^∞ 1/k! and e^{−1} = Σ_{k=0}^∞ (−1)^k/k!. Arguing by contradiction, assume ae² + be + c = 0 for integers a, b and c. That is the same as ae + b + ce^{−1} = 0. Fix n > |a| + |c|; then a, c | n! and, for all k ≤ n, k! | n!. Consider

    0 = n!(ae + b + ce^{−1}) = an! Σ_{k=0}^∞ 1/k! + bn! + cn! Σ_{k=0}^∞ (−1)^k/k!
      = bn! + Σ_{k=0}^n (a + c(−1)^k) n!/k! + Σ_{k=n+1}^∞ (a + c(−1)^k) n!/k!.

Since k! | n! for k ≤ n, the first two terms are integers. So the third term should be an integer. However,

    |Σ_{k=n+1}^∞ (a + c(−1)^k) n!/k!| ≤ (|a| + |c|) Σ_{k=n+1}^∞ n!/k!
      = (|a| + |c|) Σ_{k=n+1}^∞ 1/((n + 1)(n + 2) · · · k)
      ≤ (|a| + |c|) Σ_{k=n+1}^∞ (n + 1)^{n−k}
      = (|a| + |c|) Σ_{t=1}^∞ (n + 1)^{−t}
      = (|a| + |c|) · 1/n,

which is less than 1 by our assumption that n > |a| + |c|. Since there is only one integer which is less than 1 in absolute value, this means that Σ_{k=n+1}^∞ (a + c(−1)^k)/k! = 0 for every sufficiently large n, which is not the case because

    Σ_{k=n+1}^∞ (a + c(−1)^k)/k! − Σ_{k=n+2}^∞ (a + c(−1)^k)/k! = (a + c(−1)^{n+1})/(n + 1)!

is not identically zero. The contradiction completes the proof.

Version: 6 Owner: thedagit Author(s): bbukh, thedagit
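The tail estimate at the heart of the proof can be checked numerically; a small sketch (the values of a and c are an arbitrary hypothetical choice, and the infinite tail is truncated, which is harmless since its terms decay factorially):

```python
from math import factorial

# Check the tail bound from the proof: for n > |a| + |c|,
# |sum_{k=n+1}^inf (a + c(-1)^k) n!/k!| <= (|a| + |c|)/n < 1.
a, c = 3, -2                  # hypothetical integers, chosen for illustration
n = abs(a) + abs(c) + 1       # ensures n > |a| + |c|

tail = sum((a + c * (-1) ** k) * factorial(n) / factorial(k)
           for k in range(n + 1, n + 60))   # truncation of the infinite tail

assert abs(tail) <= (abs(a) + abs(c)) / n < 1
```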

391.3

zero of a function

Definition Suppose X is a set, and suppose f is a complex-valued function f : X → C. Then a zero of f is an element x ∈ X such that f(x) = 0. The zero set of f is the set

    Z(f) = {x ∈ X | f(x) = 0}.

Remark When X is a "simple" space, such as R or C, a zero is also called a root. However, in pure mathematics, and especially if Z(f) is infinite, it seems to be customary to talk of zeroes and the zero set instead of roots.

Examples

• Suppose p is a polynomial p : C → C of degree n ≥ 1. Then p has at most n zeroes. That is, |Z(p)| ≤ n.

• If f and g are functions f : X → C and g : X → C, then

    Z(fg) = Z(f) ∪ Z(g),    Z(fg) ⊃ Z(f),

where fg is the function x ↦ f(x)g(x).

• If X is a topological space and f : X → C is a function, then supp f = Z(f)^∁. Further, if f is continuous, then Z(f) is closed in X (assuming that C is given the usual topology of the complex plane, where {0} is a closed set).

Version: 21 Owner: mathcam Author(s): matte, yark, say 10, apmxi
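The identity Z(fg) = Z(f) ∪ Z(g) is easy to verify on a finite domain; a minimal sketch (the domain and the two functions are hypothetical choices):

```python
# Z(f·g) = Z(f) ∪ Z(g) on a small finite domain.
X = range(-5, 6)
f = lambda x: x - 2
g = lambda x: x + 3

Z = lambda h: {x for x in X if h(x) == 0}   # zero set restricted to X
assert Z(lambda x: f(x) * g(x)) == Z(f) | Z(g) == {-3, 2}
```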


Chapter 392
28-00 – General reference works (handbooks, dictionaries, bibliographies, etc.)

392.1 extended real numbers

The extended real numbers are the real numbers together with +∞ (or simply ∞) and −∞. This set is usually denoted by R̄ or [−∞, ∞] [1], and the elements +∞ and −∞ are called plus infinity and minus infinity, respectively.

Following [1], let us next extend the order relation <, the addition and multiplication operations, and the absolute value from R to R̄. In other words, let us define how these operations should behave when some of their arguments are ∞ or −∞.

Order on R̄

The order relation on R extends to R̄ by defining that for any x ∈ R, we have −∞ < x and x < ∞, and that −∞ < ∞.

Addition

For any real number x, we define

    x + (±∞) = (±∞) + x = ±∞,

and for +∞ and −∞, we define

    (±∞) + (±∞) = ±∞.

It should be pointed out that sums like (+∞) + (−∞) are left undefined.

Multiplication

If x is a positive real number, then

    x · (±∞) = (±∞) · x = ±∞.

Similarly, if x is a negative real number, then

    x · (±∞) = (±∞) · x = ∓∞.

Furthermore, for ∞ and −∞, we define

    (+∞) · (+∞) = (−∞) · (−∞) = +∞,
    (+∞) · (−∞) = (−∞) · (+∞) = −∞.

In many areas of mathematics, products like 0 · ∞ are left undefined. However, a special case is measure theory, where it is convenient to define [1]

    0 · (±∞) = (±∞) · 0 = 0.

Absolute value

For ∞ and −∞, the absolute value is defined as

    | ± ∞| = +∞.

Examples

1. By taking x = −1 in the product rule, we obtain the relations (−1) · (±∞) = ∓∞.
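These conventions largely mirror IEEE-754 floating-point arithmetic (an analogy, not part of the source text); note that floats leave 0 · ∞ undefined, unlike the measure-theoretic convention above:

```python
import math

inf = math.inf
assert 5.0 + inf == inf and -inf < 3.0 < inf   # order and addition rules
assert (-2.0) * inf == -inf                    # sign rule for multiplication
assert math.isnan(inf - inf)                   # (+∞) + (−∞) is left undefined
assert math.isnan(0.0 * inf)                   # floats do NOT adopt 0·∞ = 0
```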

REFERENCES 1. D.L. Cohn, Measure Theory, Birkh¨auser, 1980.

Version: 1 Owner: matte Author(s): matte

Chapter 393
28-XX – Measure and integration

393.1 Riemann integral

Suppose there is a function f : D → R where D, R ⊆ R, and that there is a closed interval I = [a, b] such that I ⊆ D. For any finite set of points {x₀, x₁, x₂, . . . , x_n} such that a = x₀ < x₁ < x₂ < · · · < x_n = b, there is a corresponding partition

    P = {[x₀, x₁), [x₁, x₂), . . . , [x_{n−1}, x_n]}

of I. Let C(ε) be the set of all partitions of I with max(x_{i+1} − x_i) < ε. Then let S*(ε) be the infimum of the set of upper Riemann sums with each partition in C(ε), and let S_*(ε) be the supremum of the set of lower Riemann sums with each partition in C(ε). If ε₁ < ε₂, then C(ε₁) ⊂ C(ε₂), so S* = lim_{ε→0} S*(ε) and S_* = lim_{ε→0} S_*(ε) exist. If S* = S_*, then f is Riemann-integrable over I, and the Riemann integral of f over I is defined by

    ∫_a^b f(x) dx = S* = S_*.

Version: 4 Owner: bbukh Author(s): bbukh, vampyr
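The construction can be sketched numerically with uniform partitions; a minimal example (the function, interval, and mesh are hypothetical choices, and endpoint values stand in for sup/inf since the integrand is monotone on each cell):

```python
def riemann_sums(f, a, b, n):
    """Upper and lower Riemann sums on the uniform n-cell partition of [a, b]."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    upper = sum(max(f(xs[i]), f(xs[i + 1])) * h for i in range(n))
    lower = sum(min(f(xs[i]), f(xs[i + 1])) * h for i in range(n))
    return lower, upper

lo, up = riemann_sums(lambda x: x * x, 0.0, 1.0, 1000)
assert lo < 1 / 3 < up and up - lo < 1e-2   # both sums approach ∫₀¹ x² dx = 1/3
```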

393.2

martingale

Let ν be a probability measure on Cantor space C, and let s ∈ [0, ∞).

1. A ν-s-supergale is a function d : {0, 1}* → [0, ∞) that satisfies the condition

    d(w)ν(w)^s ≥ d(w0)ν(w0)^s + d(w1)ν(w1)^s    (393.2.1)

for all w ∈ {0, 1}*.

2. A ν-s-gale is a ν-s-supergale that satisfies condition (393.2.1) with equality for all w ∈ {0, 1}*.

3. A ν-supermartingale is a ν-1-supergale.
4. A ν-martingale is a ν-1-gale.
5. An s-supergale is a µ-s-supergale, where µ is the uniform probability measure.
6. An s-gale is a µ-s-gale.
7. A supermartingale is a 1-supergale.
8. A martingale is a 1-gale.

Put another way, a martingale is a function d : {0, 1}* → [0, ∞) such that, for all w ∈ {0, 1}*, d(w) = (d(w0) + d(w1))/2.

Let d be a ν-s-supergale, where ν is a probability measure on C and s ∈ [0, ∞). We say that d succeeds on a sequence S ∈ C if

    lim sup_{n→∞} d(S[0..n − 1]) = ∞.

The success set of d is S^∞[d] = {S ∈ C | d succeeds on S}. d succeeds on a language A ⊆ {0, 1}* if d succeeds on the characteristic sequence χ_A of A. We say that d succeeds strongly on a sequence S ∈ C if

    lim inf_{n→∞} d(S[0..n − 1]) = ∞.

The strong success set of d is S^∞_str[d] = {S ∈ C | d succeeds strongly on S}.

Intuitively, a supergale d is a betting strategy that bets on the next bit of a sequence when the previous bits are known. The parameter s tunes the fairness of the betting: the smaller s is, the less fair the betting is. If d succeeds on a sequence, then the bonus we can get from applying d as the betting strategy on the sequence is unbounded. If d succeeds strongly on a sequence, then the bonus goes to infinity.

Version: 10 Owner: xiaoyanggu Author(s): xiaoyanggu
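A toy example of the averaging condition (a hypothetical strategy, not from the source): the martingale that stakes all of its capital on the next bit being 1.

```python
from itertools import product

d = {"": 1.0}                    # initial capital on the empty string
for n in range(3):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        d[w + "0"] = 0.0         # capital lost if the next bit is 0
        d[w + "1"] = 2 * d[w]    # capital doubled if the next bit is 1

# The martingale condition d(w) = (d(w0) + d(w1))/2 holds at every node.
assert all(d[w] == (d[w + "0"] + d[w + "1"]) / 2 for w in d if len(w) < 3)
```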


Chapter 394
28A05 – Classes of sets (Borel fields, σ-rings, etc.), measurable sets, Suslin sets, analytic sets

394.1 Borel σ-algebra

For any topological space X, the Borel sigma algebra of X is the σ–algebra B generated by the open sets of X. An element of B is called a Borel subset of X, or a Borel set. Version: 5 Owner: djao Author(s): djao, rmilson


Chapter 395
28A10 – Real- or complex-valued set functions

395.1 σ-finite

A measure space (Ω, B, µ) is σ-finite if the total space is the union of a finite or countable family of sets of finite measure; i.e., if there exists a finite or countable set F ⊂ B such that µ(A) < ∞ for each A ∈ F, and Ω = ⋃_{A∈F} A. In this case we also say that µ is a σ-finite measure. If µ is not σ-finite, we say that it is σ-infinite.

Examples. Any finite measure space is σ-finite. A more interesting example is the Lebesgue measure µ in Rⁿ: it is σ-finite but not finite. In fact,

    Rⁿ = ⋃_{k∈N} [−k, k]ⁿ

([−k, k]ⁿ is a cube with center at 0 and side length 2k, and its measure is (2k)ⁿ), but µ(Rⁿ) = ∞.

Version: 6 Owner: Koro Author(s): Koro, drummond

395.2

Argand diagram

An Argand diagram is the graphical representation of complex numbers written in polar coordinates. The diagram is named after Jean-Robert Argand, the Frenchman who is credited with the geometric interpretation of the complex numbers [Biography].

Version: 3 Owner: drini Author(s): drini

395.3

Hahn-Kolmogorov theorem

Let A₀ be an algebra of subsets of a set X. If a finitely additive measure µ₀ : A₀ → R ∪ {∞} satisfies

    µ₀(⋃_{n=1}^∞ A_n) = Σ_{n=1}^∞ µ₀(A_n)

for any disjoint family {A_n : n ∈ N} of elements of A₀ such that ⋃_{n=1}^∞ A_n ∈ A₀, then µ₀ extends uniquely to a measure defined on the σ-algebra A generated by A₀; i.e., there exists a unique measure µ : A → R ∪ {∞} such that its restriction to A₀ coincides with µ₀.

Version: 3 Owner: Koro Author(s): Koro

395.4

measure

Let (E, B(E)) be a measurable space. A measure on (E, B(E)) is a function µ : B(E) → R ∪ {∞} with values in the extended real numbers such that:

1. µ(A) ≥ 0 for A ∈ B(E), with equality if A = ∅;

2. µ(⋃_{i=0}^∞ A_i) = Σ_{i=0}^∞ µ(A_i) for any sequence of disjoint sets A_i ∈ B(E).

The second property is called countable additivity. A finitely additive measure µ has the same definition except that B(E) is only required to be an algebra and the second property above is only required to hold for finite unions. Note the slight abuse of terminology: a finitely additive measure is not necessarily a measure.

The triple (E, B, µ) is called a measure space. If µ(E) = 1, then it is called a probability space, and the measure µ is called a probability measure.

Lebesgue measure on Rⁿ is one important example of a measure.

Version: 8 Owner: djao Author(s): djao
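Counting measure on a finite set gives a quick concrete check of additivity on disjoint sets (an illustration, with finite unions standing in for countable ones):

```python
mu = len   # counting measure on finite sets

A1, A2, A3 = {1, 2}, {3}, {4, 5, 6}           # pairwise disjoint
assert mu(A1 | A2 | A3) == mu(A1) + mu(A2) + mu(A3) == 6
assert mu(set()) == 0                          # µ(∅) = 0
```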

395.5

outer measure

Definition [1, 2, 3] Let X be a set, and let P(X) be the power set of X. An outer measure on X is a function µ* : P(X) → [0, ∞] satisfying the properties

1. µ*(∅) = 0.

2. If A ⊂ B are subsets in X, then µ*(A) ≤ µ*(B).

3. If {A_i} is a countable collection of subsets of X, then

    µ*(⋃_i A_i) ≤ Σ_i µ*(A_i).

Here, we can make two remarks. First, from (1) and (2), it follows that µ* is a positive function on P(X). Second, property (3) also holds for any finite collection of subsets since we can always append an infinite sequence of empty sets to such a collection.

Examples

• [1, 2] On a set X, let us define µ* : P(X) → [0, ∞] as

    µ*(E) = 1 when E ≠ ∅,  µ*(E) = 0 when E = ∅.

Then µ* is an outer measure.

• [1] On an uncountable set X, let us define µ* : P(X) → [0, ∞] as

    µ*(E) = 1 when E is uncountable,  µ*(E) = 0 when E is countable.

Then µ* is an outer measure.

Theorem [1, 2, 3] Let X be a set, and let F be a collection of subsets of X such that ∅ ∈ F and X ∈ F. Further, let ρ : F → [0, ∞] be a mapping such that ρ(∅) = 0. If A ⊂ X, let

    µ*(A) = inf Σ_{i=1}^∞ ρ(F_i),

where the infimum is taken over all collections {F_i}_{i=1}^∞ ⊂ F such that A ⊂ ⋃_{i=1}^∞ F_i. Then µ* : P(X) → [0, ∞] is an outer measure.

REFERENCES
1. A. Mukherjea, K. Pothoven, Real and Functional Analysis, Plenum Press, 1978.
2. A. Friedman, Foundations of Modern Analysis, Dover Publications, 1982.
3. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., John Wiley & Sons, Inc., 1999.

Version: 1 Owner: mathcam Author(s): matte

395.6

properties for measure

Theorem [1, 2, 3, 4] Let (E, B, µ) be a measure space, i.e., let E be a set, let B be a σ-algebra of sets in E, and let µ be a measure on B. Then the following properties hold:

1. Monotonicity: If A, B ∈ B and A ⊂ B, then µ(A) ≤ µ(B).

2. If A, B ∈ B, A ⊂ B, and µ(A) < ∞, then µ(B \ A) = µ(B) − µ(A).

3. For any A, B in B, we have

    µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B).

4. Subadditivity: If {A_i}_{i=1}^∞ is a collection of sets from B, then

    µ(⋃_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ µ(A_i).

5. Continuity from below: If {A_i}_{i=1}^∞ is a collection of sets from B such that A_i ⊂ A_{i+1} for all i, then

    µ(⋃_{i=1}^∞ A_i) = lim_{i→∞} µ(A_i).

6. Continuity from above: If {A_i}_{i=1}^∞ is a collection of sets from B such that µ(A₁) < ∞ and A_i ⊃ A_{i+1} for all i, then

    µ(⋂_{i=1}^∞ A_i) = lim_{i→∞} µ(A_i).

Remarks In (2), the assumption µ(A) < ∞ assures that the right hand side is always well defined, i.e., not of the form ∞ − ∞. Without the assumption we can only prove that µ(B) = µ(A) + µ(B \ A) (see below). In (3), it is tempting to move the term µ(A ∩ B) to the other side for aesthetic reasons. However, this is only possible if the term is finite.

Proof. For (1), suppose A ⊂ B. We can then write B as the disjoint union B = A ∪ (B \ A), whence

    µ(B) = µ(A ∪ (B \ A)) = µ(A) + µ(B \ A).

Since µ(B \ A) ≥ 0, the claim follows. Property (2) follows from the above equation; since µ(A) < ∞, we can subtract this quantity from both sides. For property (3), we can write A ∪ B = A ∪ (B \ A), whence

    µ(A ∪ B) = µ(A) + µ(B \ A) ≤ µ(A) + µ(B).

If µ(A ∪ B) is infinite, the last inequality must be an equality, and either of µ(A) or µ(B) must be infinite. Together with (1), we obtain that if any of the quantities µ(A), µ(B), µ(A ∩ B) or µ(A ∪ B) is infinite, then all quantities are infinite, whence the claim clearly holds. We can therefore without loss of generality assume that all quantities are finite. From A ∪ B = B ∪ (A \ B), we have

    µ(A ∪ B) = µ(B) + µ(A \ B)

and thus

    2µ(A ∪ B) = µ(A) + µ(B) + µ(A \ B) + µ(B \ A).

For the last two terms we have

    µ(A \ B) + µ(B \ A) = µ((A \ B) ∪ (B \ A))
      = µ((A ∪ B) \ (A ∩ B))
      = µ(A ∪ B) − µ(A ∩ B),

where in the second equality we have used properties of the symmetric set difference, and the last equality follows from property (2). This completes the proof of property (3).

For property (4), let us define the sequence {D_i}_{i=1}^∞ as

    D₁ = A₁,    D_i = A_i \ ⋃_{k=1}^{i−1} A_k.

Now D_i ∩ D_j = ∅ for i < j, so {D_i} is a sequence of disjoint sets. Since ⋃_{i=1}^∞ D_i = ⋃_{i=1}^∞ A_i, and since D_i ⊂ A_i, we have

    µ(⋃_{i=1}^∞ A_i) = µ(⋃_{i=1}^∞ D_i) = Σ_{i=1}^∞ µ(D_i) ≤ Σ_{i=1}^∞ µ(A_i),

and property (4) follows.

TODO: proofs for (5)-(6).

REFERENCES
1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., John Wiley & Sons, Inc., 1999.
2. A. Mukherjea, K. Pothoven, Real and Functional Analysis, Plenum Press, 1978.
3. D.L. Cohn, Measure Theory, Birkhäuser, 1980.
4. A. Friedman, Foundations of Modern Analysis, Dover Publications, 1982.

Version: 2 Owner: matte Author(s): matte
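Property (3) can be checked concretely with counting measure on finite sets (an illustration, not part of the source proof):

```python
mu = len   # counting measure on finite sets
A, B = {1, 2, 3}, {3, 4}

# µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B)
assert mu(A | B) + mu(A & B) == mu(A) + mu(B) == 5
```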

Chapter 396
28A12 – Contents, measures, outer measures, capacities

396.1 Hahn decomposition theorem

Let µ be a signed measure on the measurable space (Ω, S). There are two measurable sets A and B such that:

1. A ∪ B = Ω and A ∩ B = ∅;

2. µ(E) ≥ 0 for each E ∈ S such that E ⊂ A;

3. µ(E) ≤ 0 for each E ∈ S such that E ⊂ B.

The pair (A, B) is called a Hahn decomposition for µ. This decomposition is not unique, but any other such decomposition (A′, B′) satisfies µ(A′ △ A) = µ(B △ B′) = 0 (where △ denotes the symmetric difference), so the two decompositions differ on a set of measure 0.

Version: 6 Owner: Koro Author(s): Koro

396.2

Jordan decomposition

Let (Ω, S, µ) be a signed measure space, and let (A, B) be a Hahn decomposition for µ. We define µ⁺ and µ⁻ by

    µ⁺(E) = µ(A ∩ E)  and  µ⁻(E) = −µ(B ∩ E).

This definition is easily shown to be independent of the chosen Hahn decomposition.

It is clear that µ⁺ is a positive measure, and it is called the positive variation of µ. On the other hand, µ⁻ is a positive finite measure, called the negative variation of µ. The measure |µ| = µ⁺ + µ⁻ is called the total variation of µ.

Notice that µ = µ⁺ − µ⁻. This decomposition of µ into its positive and negative parts is called the Jordan decomposition of µ.

Version: 6 Owner: Koro Author(s): Koro

396.3

Lebesgue decomposition theorem

Let µ and ν be two σ-finite signed measures on the measurable space (Ω, S). There exist two σ-finite signed measures ν₀ and ν₁ such that:

1. ν = ν₀ + ν₁;

2. ν₀ ≪ µ (i.e., ν₀ is absolutely continuous with respect to µ);

3. ν₁ ⊥ µ (i.e., ν₁ and µ are singular).

These two measures are uniquely determined.

Version: 5 Owner: Koro Author(s): Koro

396.4

Lebesgue outer measure

Let S be some arbitrary subset of R. Let L(I) be the traditional definition of the length of an interval I ⊆ R: if I = (a, b), then L(I) = b − a. Let M be the set containing

    Σ_{A∈C} L(A)

for any countable collection of open intervals C that covers S (that is, S ⊆ ⋃ C). Then the Lebesgue outer measure of S is defined by:

    m*(S) = inf(M).

Note that (R, P(R), m*) is "almost" a measure space. In particular:

• Lebesgue outer measure is defined for any subset of R (and P(R) is a σ-algebra).

• m*(A) ≥ 0 for any A ⊆ R, and m*(∅) = 0.

• If A and B are disjoint sets, then m*(A ∪ B) ≤ m*(A) + m*(B). More generally, if ⟨A_i⟩ is a countable sequence of disjoint sets, then m*(⋃ A_i) ≤ Σ m*(A_i). This property is known as countable subadditivity and is weaker than countable additivity. In fact, m* is not countably additive.

Lebesgue outer measure has other nice properties:

• The outer measure of an interval is its length: m*((a, b)) = b − a.

• m* is translation invariant. That is, if we define A + y to be the set {x + y : x ∈ A}, we have m*(A) = m*(A + y) for any y ∈ R.

Version: 4 Owner: vampyr Author(s): vampyr
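The infimum over covers can be illustrated with a hand-picked cover (the set S and the cover below are hypothetical; the true infimum for this S is 1.5):

```python
# Cover S = [0, 1] ∪ [2, 2.5] by two open intervals of total length 1.5 + 4ε,
# approaching m*(S) = 1.5 from above as ε → 0.
eps = 1e-6
cover = [(0 - eps, 1 + eps), (2 - eps, 2.5 + eps)]
total = sum(b - a for a, b in cover)
assert abs(total - 1.5) < 1e-5
```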

396.5

absolutely continuous

Given two signed measures µ and ν on the same measurable space (Ω, S), we say that ν is absolutely continuous with respect to µ if, for each A ∈ S such that |µ|(A) = 0, it holds that ν(A) = 0. This is usually denoted by ν ≪ µ.

Remarks. If (ν⁺, ν⁻) is the Jordan decomposition of ν, the following propositions are equivalent:

1. ν ≪ µ;

2. ν⁺ ≪ µ and ν⁻ ≪ µ;

3. |ν| ≪ |µ|.

If ν is a finite signed measure and ν ≪ µ, the following useful property holds: for each ε > 0, there is a δ > 0 such that |ν|(E) < ε whenever |µ|(E) < δ.

Version: 5 Owner: Koro Author(s): Koro


396.6

counting measure

Let (X, B) be a measurable space. We call a measure µ the counting measure on X if, for each A ∈ B,

    µ(A) = n if A has exactly n elements, and µ(A) = ∞ otherwise.

Generally, counting measure is applied on N or Z.

Version: 2 Owner: mathwizard Author(s): mathwizard, drummond

396.7

measurable set

Let (X, F, µ) be a measure space with a sigma algebra F. A measurable set with respect to µ in X is an element of F. These are also sometimes called µ-measurable sets. Any subset Y ⊂ X with Y ∈ / F is said to be nonmeasurable with respect to µ, or non-µ-measurable. Version: 2 Owner: mathcam Author(s): mathcam, drummond

396.8

outer regular

Let X be a locally compact Hausdorff topological space with Borel σ–algebra B, and suppose µ is a measure on (X, B). For any Borel set B ∈ B, the measure µ is said to be outer regular on B if µ(B) = inf {µ(U) | U ⊃ B, U open}. We say µ is inner regular on B if µ(B) = sup {µ(K) | K ⊂ B, K compact}. Version: 1 Owner: djao Author(s): djao

396.9

signed measure

A signed measure on a measurable space (Ω, S) is a function µ : S → R ∪ {+∞} which is σ-additive and such that µ(∅) = 0.

Remarks.

1. The usual (positive) measure is a particular case of signed measure, in which |µ| = µ (see Jordan decomposition).

2. Notice that the value −∞ is not allowed.

3. An important example of signed measures arises from the usual measures in the following way: Let (Ω, S, µ) be a measure space, and let f be a (real valued) measurable function such that

    ∫_{{x∈Ω : f(x)<0}} |f| dµ < ∞.

Then a signed measure is defined by A ↦ ∫_A f dµ.

Version: 4 Owner: Koro Author(s): Koro

396.10

singular measure

Two measures µ and ν on a measurable space (Ω, A) are called singular if there exist two disjoint sets A and B in A such that A ∪ B = Ω and µ(B) = ν(A) = 0. This is denoted by µ ⊥ ν.

Version: 4 Owner: Koro Author(s): Koro


Chapter 397
28A15 – Abstract differentiation theory, differentiation of set functions

397.1 Hardy-Littlewood maximal theorem

There is a constant K > 0 such that for each Lebesgue integrable function f ∈ L¹(Rⁿ) and each t > 0,

    m({x : Mf(x) > t}) ≤ (K/t) ‖f‖₁ = (K/t) ∫_{Rⁿ} |f(x)| dx,

where Mf is the Hardy-Littlewood maximal function of f.

Remark. The theorem holds for the constant K = 3ⁿ.

Version: 1 Owner: Koro Author(s): Koro

397.2

Lebesgue differentiation theorem

Let f be a locally integrable function on Rⁿ with Lebesgue measure m, i.e., f ∈ L¹_loc(Rⁿ). Lebesgue's differentiation theorem basically says that for almost every x, the averages

    (1/m(Q)) ∫_Q |f(y) − f(x)| dy

converge to 0 when Q is a cube containing x and m(Q) → 0.

Formally, this means that there is a set N ⊂ Rⁿ with m(N) = 0, such that for every x ∉ N and ε > 0, there exists δ > 0 such that, for each cube Q with x ∈ Q and m(Q) < δ, we have

    (1/m(Q)) ∫_Q |f(y) − f(x)| dy < ε.

For n = 1, this can be restated as an analogue of the fundamental theorem of calculus for Lebesgue integrals: given x₀ ∈ R,

    (d/dx) ∫_{x₀}^x f(t) dt = f(x)

for almost every x.

Version: 6 Owner: Koro Author(s): Koro

397.3

Radon-Nikodym theorem

Let µ and ν be two σ-finite measures on the same measurable space (Ω, S), such that ν ≪ µ (i.e., ν is absolutely continuous with respect to µ). Then there exists a measurable function f, which is nonnegative and finite, such that for each A ∈ S,

    ν(A) = ∫_A f dµ.

This function is unique (any other function satisfying these conditions is equal to f µ-almost everywhere), and it is called the Radon-Nikodym derivative of ν with respect to µ, denoted by f = dν/dµ.

Remark. The theorem also holds if ν is a signed measure. Even if ν is not σ-finite the theorem holds, with the exception that f is not necessarily finite.

Some properties of the Radon-Nikodym derivative

Let ν, µ, and λ be σ-finite measures on (Ω, S).

1. If ν ≪ λ and µ ≪ λ, then

    d(ν + µ)/dλ = dν/dλ + dµ/dλ    µ-almost everywhere;

2. If ν ≪ µ ≪ λ, then

    dν/dλ = (dν/dµ)(dµ/dλ)    µ-almost everywhere;

3. If µ ≪ λ and g is a µ-integrable function, then

    ∫_Ω g dµ = ∫_Ω g (dµ/dλ) dλ;

4. If µ ≪ ν and ν ≪ µ, then

    dµ/dν = (dν/dµ)⁻¹.

Version: 5 Owner: Koro Author(s): Koro
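In the purely discrete case the Radon-Nikodym derivative is just a ratio of point masses, and property 2 (the chain rule) can be checked directly; the three measures below are hypothetical choices:

```python
mu  = {"a": 0.5,  "b": 0.25, "c": 0.25}
lam = {"a": 0.25, "b": 0.25, "c": 0.5}
nu  = {"a": 1.0,  "b": 0.5,  "c": 0.5}

d = lambda p, q: {x: p[x] / q[x] for x in p}   # pointwise derivative dp/dq

dnu_dlam = d(nu, lam)
chain = {x: d(nu, mu)[x] * d(mu, lam)[x] for x in mu}   # (dν/dµ)(dµ/dλ)
assert all(abs(dnu_dlam[x] - chain[x]) < 1e-12 for x in mu)
```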

397.4

integral depending on a parameter

Suppose (E, B, µ) is a measure space, suppose I is an open interval in R, and suppose we are given a function

    f : E × I → R̄,  (x, t) ↦ f(x, t),

where R̄ is the extended real numbers. Further, suppose that for each t ∈ I, the mapping x ↦ f(x, t) is in L¹(E). (Here, L¹(E) is the set of measurable functions f : E → R̄ with finite Lebesgue integral: ∫_E |f(x)| dµ < ∞.) Then we can define a function F : I → R by

    F(t) = ∫_E f(x, t) dµ.

Continuity of F

Let t₀ ∈ I. In addition to the above, suppose:

1. For almost all x ∈ E, the mapping t ↦ f(x, t) is continuous at t = t₀.

2. There is a function g ∈ L¹(E) such that for almost all x ∈ E, |f(x, t)| ≤ g(x) for all t ∈ I.

Then F is continuous at t₀.

Differentiation under the integral sign

Suppose that the assumptions given in the introduction hold, and suppose:

1. For almost all x ∈ E, the mapping t ↦ f(x, t) is differentiable for all t ∈ I.

2. There is a function g ∈ L¹(E) such that for almost all x ∈ E,

    |(d/dt) f(x, t)| ≤ g(x)

for all t ∈ I.

Then F is differentiable on I, the mapping x ↦ (d/dt) f(x, t) is in L¹(E), and for all t ∈ I,

    (d/dt) F(t) = ∫_E (d/dt) f(x, t) dµ.    (397.4.1)

The above results can be found in [1, 2].

REFERENCES
1. F. Jones, Lebesgue Integration on Euclidean Space, Jones and Bartlett Publishers, 1993.
2. C.D. Aliprantis, O. Burkinshaw, Principles of Real Analysis, 2nd ed., Academic Press, 1990.

Version: 1 Owner: matte Author(s): matte


Chapter 398
28A20 – Measurable and nonmeasurable functions, sequences of measurable functions, modes of convergence

398.1 Egorov's theorem

Let (X, S, µ) be a measure space, and let E be a subset of X of finite measure. If fn is a sequence of measurable functions converging to f almost everywhere, then for each δ > 0 there exists a set Eδ such that µ(Eδ ) < δ and fn → f uniformly on E − Eδ . Version: 2 Owner: Koro Author(s): Koro

398.2

Fatou’s lemma

If f₁, f₂, . . . is a sequence of nonnegative measurable functions on a measure space X, then

    ∫_X lim inf_{n→∞} f_n ≤ lim inf_{n→∞} ∫_X f_n.

Version: 3 Owner: Koro Author(s): Koro


398.3

Fatou-Lebesgue theorem

Let X be a measure space. If Φ is a measurable function with ∫_X Φ < ∞, and if f₁, f₂, . . . is a sequence of measurable functions such that |f_n| ≤ Φ for each n, then

    g = lim inf_{n→∞} f_n  and  h = lim sup_{n→∞} f_n

are both integrable, and

    −∞ < ∫_X g ≤ lim inf_{n→∞} ∫_X f_n ≤ lim sup_{n→∞} ∫_X f_n ≤ ∫_X h < ∞.

Version: 3 Owner: Koro Author(s): Koro

398.4

dominated convergence theorem

Let X be a measure space, and let Φ, f₁, f₂, . . . be measurable functions such that ∫_X Φ < ∞ and |f_n| ≤ Φ for each n. If f_n → f almost everywhere, then f is integrable and

    lim_{n→∞} ∫_X f_n = ∫_X f.

This theorem is a corollary of the Fatou-Lebesgue theorem.

A possible generalization is that if {f_r : r ∈ R} is a family of measurable functions such that |f_r| ≤ |Φ| for each r ∈ R and f_r → f as r → 0, then f is integrable and

    lim_{r→0} ∫_X f_r = ∫_X f.

Version: 8 Owner: Koro Author(s): Koro

398.5

measurable function

Let f : X → R̄ be a function defined on a measure space X. We say that f is measurable if {x ∈ X | f(x) > a} is a measurable set for all a ∈ R.

Version: 5 Owner: vypertd Author(s): vypertd


398.6

monotone convergence theorem

Let X be a measure space, and let 0 ≤ f₁ ≤ f₂ ≤ · · · be a monotone increasing sequence of nonnegative measurable functions. Let f be the function defined almost everywhere by f(x) = lim_{n→∞} f_n(x). Then f is measurable, and

    lim_{n→∞} ∫_X f_n = ∫_X f.

Remark. This theorem is the first of several theorems which allow us to "exchange integration and limits". It requires the use of the Lebesgue integral: with the Riemann integral, we cannot even formulate the theorem, lacking, as we do, the concept of "almost everywhere". For instance, the characteristic function of the rational numbers in [0, 1] is not Riemann integrable, despite being the limit of an increasing sequence of Riemann integrable functions.

Version: 5 Owner: Koro Author(s): Koro, ariels
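A discrete illustration (not from the source): on N with counting measure, the truncations f_n = f · 1_{{0,…,n}} increase to f, and their integrals (partial sums) increase to ∫ f dµ.

```python
f = lambda k: 2.0 ** (-k)
integral = lambda n: sum(f(k) for k in range(n + 1))   # ∫ f_n dµ, µ = counting

vals = [integral(n) for n in range(60)]
assert all(a <= b for a, b in zip(vals, vals[1:]))   # integrals are monotone
assert abs(vals[-1] - 2.0) < 1e-12                   # → ∫ f dµ = Σ 2^{-k} = 2
```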

398.7

proof of Egorov’s theorem

Let E_{i,j} = {x ∈ E : |f_j(x) − f(x)| < 1/i}. Since f_n → f almost everywhere, there is a set S with µ(S) = 0 such that, given i ∈ N and x ∈ E − S, there is m ∈ N such that j > m implies |f_j(x) − f(x)| < 1/i. This can be expressed by

    E − S ⊂ ⋃_{m∈N} ⋂_{j>m} E_{i,j},

or, in other words,

    ⋂_{m∈N} ⋃_{j>m} (E − E_{i,j}) ⊂ S.

Since {⋃_{j>m} (E − E_{i,j})}_{m∈N} is a decreasing nested sequence of sets, each of which has finite measure, and such that its intersection has measure 0, by continuity from above we know that

    µ(⋃_{j>m} (E − E_{i,j})) → 0  as m → ∞.

Therefore, for each i ∈ N, we can choose m_i such that

    µ(⋃_{j>m_i} (E − E_{i,j})) < δ/2^i.

Let

    E_δ = ⋃_{i∈N} ⋃_{j>m_i} (E − E_{i,j}).

Then

    µ(E_δ) ≤ Σ_{i=1}^∞ µ(⋃_{j>m_i} (E − E_{i,j})) < Σ_{i=1}^∞ δ/2^i = δ.

We claim that f_n → f uniformly on E − E_δ. In fact, given ε > 0, choose n such that 1/n < ε. If x ∈ E − E_δ, we have

    x ∈ ⋂_{i∈N} ⋂_{j>m_i} E_{i,j},

which in particular implies that, if j > m_n, x ∈ E_{n,j}; that is, |f_j(x) − f(x)| < 1/n < ε. Hence, for each ε > 0 there is N (which is given by m_n above) such that j > N implies |f_j(x) − f(x)| < ε for each x ∈ E − E_δ, as required. This completes the proof.

Version: 3 Owner: Koro Author(s): Koro

398.8

proof of Fatou’s lemma

Let f(x) = lim inf_{n→∞} f_n(x) and let g_n(x) = inf_{k≥n} f_k(x), so that we have f(x) = sup_n g_n(x). As g_n is an increasing sequence of measurable nonnegative functions, we can apply the monotone convergence theorem to obtain

    ∫_X f dµ = lim_{n→∞} ∫_X g_n dµ.

On the other hand, since g_n ≤ f_n, we conclude by observing

    lim_{n→∞} ∫_X g_n dµ = lim inf_{n→∞} ∫_X g_n dµ ≤ lim inf_{n→∞} ∫_X f_n dµ.

Version: 1 Owner: paolini Author(s): paolini

398.9

proof of Fatou-Lebesgue theorem

By Fatou's lemma we have

    ∫_X g ≤ lim inf_{n→∞} ∫_X f_n

and (recall that lim sup f = − lim inf(−f))

    lim sup_{n→∞} ∫_X f_n ≤ ∫_X h.

On the other hand, by the properties of lim inf and lim sup we have

    g ≥ −Φ,  h ≤ Φ,

and hence

    ∫_X g ≥ ∫_X −Φ > −∞,  ∫_X h ≤ ∫_X Φ < +∞.

Version: 1 Owner: paolini Author(s): paolini

398.10

proof of dominated convergence theorem

It is not difficult to prove that f is measurable. In fact, we can write

    f(x) = sup_n inf_{k≥n} f_k(x),

and we know that measurable functions are closed under the sup and inf operations. Consider the sequence g_n(x) = 2Φ(x) − |f(x) − f_n(x)|. Clearly the g_n are nonnegative functions, since |f − f_n| ≤ 2Φ. So, applying Fatou's lemma, we obtain

    lim_{n→∞} ∫_X |f − f_n| dµ ≤ lim sup_{n→∞} ∫_X |f − f_n| dµ
      = − lim inf_{n→∞} ∫_X −|f − f_n| dµ
      = ∫_X 2Φ dµ − lim inf_{n→∞} ∫_X (2Φ − |f − f_n|) dµ
      ≤ ∫_X 2Φ dµ − ∫_X (2Φ − lim sup_{n→∞} |f − f_n|) dµ
      = ∫_X 2Φ dµ − ∫_X 2Φ dµ = 0.

Version: 1 Owner: paolini Author(s): paolini

398.11

proof of monotone convergence theorem

It is enough to prove the following

Theorem 7. Let (X, µ) be a measure space and let f_k : X → R ∪ {+∞} be a monotone increasing sequence of nonnegative measurable functions (i.e., 0 ≤ f₁ ≤ f₂ ≤ . . .). Then f(x) = lim_{k→∞} f_k(x) is measurable and

    lim_{k→∞} ∫_X f_k dµ = ∫_X f dµ.

First of all, by the monotonicity of the sequence we have

    f(x) = sup_k f_k(x),

hence we know that f is measurable. Moreover, since f_k ≤ f for all k, by the monotonicity of the integral we immediately get

    sup_k ∫_X f_k dµ ≤ ∫_X f dµ.

So take any simple measurable function s such that 0 ≤ s ≤ f. Given also α < 1, define

    E_k = {x ∈ X : f_k(x) ≥ αs(x)}.

The sequence E_k is an increasing sequence of measurable sets. Moreover, the union of all E_k is the whole space X, since lim_{k→∞} f_k(x) = f(x) ≥ s(x) > αs(x). Moreover, it holds that

    ∫_X f_k dµ ≥ ∫_{E_k} f_k dµ ≥ α ∫_{E_k} s dµ.

Since s is a simple measurable function, it is easy to check that E ↦ ∫_E s dµ is a measure, and hence

    sup_k ∫_X f_k dµ ≥ α ∫_X s dµ.

But this last inequality holds for every α < 1 and for all simple measurable functions s with s ≤ f. Hence, by the definition of the Lebesgue integral,

    sup_k ∫_X f_k dµ ≥ ∫_X f dµ,

which completes the proof.

Version: 1 Owner: paolini Author(s): paolini


Chapter 399
28A25 – Integration with respect to measures and other set functions

399.1 L∞(X, dµ)

The L∞ space, L∞(X, dµ), is a vector space consisting of equivalence classes of functions f : X → C with norm given by

    ‖f‖_∞ = ess sup |f(t)|,

the essential supremum of |f|. Additionally, we require that ‖f‖_∞ < ∞. The equivalence classes of L∞(X, dµ) are given by saying that f, g : X → C are equivalent iff f and g differ on a set of µ-measure zero.

Version: 3 Owner: ack Author(s): bbukh, ack, apmxi

399.2

Hardy-Littlewood maximal operator

The Hardy-Littlewood maximal operator in Rⁿ is an operator defined on L¹_loc(Rⁿ) (the space of locally integrable functions in Rⁿ with the Lebesgue measure) which maps each locally integrable function f to another function Mf, defined for each x ∈ Rⁿ by

    Mf(x) = sup_Q (1/m(Q)) ∫_Q |f(y)| dy,

where the supremum is taken over all cubes Q containing x. This function is lower semicontinuous (and hence measurable), and it is called the Hardy-Littlewood maximal function of f.

The operator M is sublinear, which means that

    M(af + bg) ≤ |a| Mf + |b| Mg

for each pair of locally integrable functions f, g and scalars a, b.

Version: 3 Owner: Koro Author(s): Koro

399.3

Lebesgue integral

The integral of a measurable function f : X → R ∪ {±∞} on a measure space (X, B, µ) is written

    ∫_X f dµ  or just  ∫ f.    (399.3.1)

It is defined via the following steps:

• If f = χ_A is the characteristic function of a set A ∈ B, then set

    ∫_X χ_A dµ := µ(A).    (399.3.2)

• If f is a simple function (i.e., if f can be written as

    f = Σ_{k=1}^n c_k χ_{A_k},  c_k ∈ R    (399.3.3)

for some finite collection A_k ∈ B), then define

    ∫_X f dµ := Σ_{k=1}^n c_k ∫_X χ_{A_k} dµ = Σ_{k=1}^n c_k µ(A_k).    (399.3.4)

• If f is a nonnegative measurable function (possibly attaining the value ∞ at some points), then we define

    ∫_X f dµ := sup {∫_X h dµ : h is simple and h(x) ≤ f(x) for all x ∈ X}.    (399.3.5)

• For any measurable function f (possibly attaining the values ∞ or −∞ at some points), write f = f⁺ − f⁻, where

    f⁺ := max(f, 0)  and  f⁻ := max(−f, 0),    (399.3.6)

and define the integral of f as

    ∫_X f dµ := ∫_X f⁺ dµ − ∫_X f⁻ dµ,    (399.3.7)

provided that ∫_X f⁺ dµ and ∫_X f⁻ dµ are not both ∞.

If µ is Lebesgue measure and X is any interval in Rⁿ, then the integral is called the Lebesgue integral. If the Lebesgue integral of a function f on a set A exists, f is said to be Lebesgue integrable. The Lebesgue integral equals the Riemann integral everywhere the latter is defined; the advantage of the Lebesgue integral is that many Lebesgue-integrable functions are not Riemann-integrable. For example, the Riemann integral of the characteristic function of the rationals in [0, 1] is undefined, while the Lebesgue integral of this function is simply the measure of the rationals in [0, 1], which is 0.

Version: 12 Owner: djao Author(s): djao, drummond
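The supremum over simple functions in (399.3.5) can be sketched numerically with a dyadic staircase below f (the integrand, interval, and resolution are hypothetical choices; endpoint minima give inf f on each cell because the chosen integrand is monotone):

```python
def simple_lower_integral(f, a, b, n, levels=2 ** 12):
    """Integral of a dyadic simple function h <= f built on n uniform cells."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0 = a + i * h
        inf_f = min(f(x0), f(x0 + h))        # inf of f on the cell (f monotone)
        c = int(inf_f * levels) / levels     # largest dyadic value <= inf_f
        total += c * h                       # c · µ(cell), µ = Lebesgue measure
    return total

approx = simple_lower_integral(lambda x: x * x, 0.0, 1.0, 2000)
assert 0 <= 1 / 3 - approx < 1e-3   # simple functions approach ∫₀¹ x² dx = 1/3
```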


Chapter 400 28A60 – Measures on Boolean rings, measure algebras 400.1

σ-algebra

Let X be a set. A σ-algebra is a collection M of subsets of X such that
• X ∈ M,
• if A ∈ M then X − A ∈ M,
• if A1, A2, A3, . . . is a countable subcollection of M, that is, Aj ∈ M for j = 1, 2, 3, . . . (the subcollection can be finite), then the union of all of them is also in M:

∪_{j=1}^∞ Aj ∈ M.

Version: 3 Owner: drini Author(s): drini, apmxi

400.2

σ-algebra

Given a set E, a sigma algebra (or σ-algebra) in E is a collection B(E) of subsets of E such that:
• ∅ ∈ B(E),
• any countable union of elements of B(E) is in B(E),
• the complement of any element of B(E) in E is in B(E).
Given any collection C of subsets of E, the σ-algebra generated by C is defined to be the smallest σ-algebra in E containing C. Version: 5 Owner: djao Author(s): djao
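For a finite base set, the "smallest σ-algebra containing C" can be computed by brute force: close the collection under complements and pairwise unions until a fixed point is reached (on a finite set this suffices). A sketch under that finiteness assumption; the function name is ours:

```python
from itertools import combinations

def generated_sigma_algebra(X, C):
    """Smallest sigma-algebra on a *finite* set X containing the collection C.
    On a finite set, closing under complements and pairwise unions suffices."""
    X = frozenset(X)
    S = {frozenset(), X} | {frozenset(A) for A in C}
    changed = True
    while changed:
        changed = False
        for A in list(S):
            if X - A not in S:           # close under complements
                S.add(X - A)
                changed = True
        for A, B in combinations(list(S), 2):
            if A | B not in S:           # close under (finite) unions
                S.add(A | B)
                changed = True
    return S

# sigma-algebra generated by {{1}} inside X = {1,2,3,4}:
S = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
print(sorted(tuple(sorted(A)) for A in S))
# [(), (1,), (1, 2, 3, 4), (2, 3, 4)]
```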

400.3

algebra

Given a set E, an algebra in E is a collection B(E) of subsets of E such that:
• ∅ ∈ B(E),
• any finite union of elements of B(E) is in B(E),
• the complement of any element of B(E) in E is in B(E).
Given any collection C of subsets of E, the algebra generated by C is defined to be the smallest algebra in E containing C. Version: 2 Owner: djao Author(s): djao

400.4

measurable set (for outer measure)

Definition [1, 2, 3] Let µ∗ be an outer measure on a set X. A set E ⊂ X is said to be measurable, or µ∗-measurable, if for all A ⊂ X we have

µ∗(A) = µ∗(A ∩ E) + µ∗(A ∩ E^c).    (400.4.1)

Remark If A, E ⊂ X, we have, from the properties of the outer measure,

µ∗(A) = µ∗(A ∩ (E ∪ E^c)) = µ∗((A ∩ E) ∪ (A ∩ E^c)) ≤ µ∗(A ∩ E) + µ∗(A ∩ E^c).

Hence equation (400.4.1) is equivalent to the inequality

µ∗(A) ≥ µ∗(A ∩ E) + µ∗(A ∩ E^c).

Of course, this inequality is trivially satisfied if µ∗(A) = ∞. Thus a set E ⊂ X is µ∗-measurable in X if and only if the above inequality holds for all A ⊂ X for which µ∗(A) < ∞ [1].

Theorem [Carathéodory's theorem] [1, 2, 3] Suppose µ∗ is an outer measure on a set X, and suppose M is the set of all µ∗-measurable sets in X. Then M is a σ-algebra, and µ∗ restricted to M is a measure (on M).

Example Let µ∗ be an outer measure on a set X.

1. Any null set (a set E with µ∗(E) = 0) is measurable. Indeed, suppose µ∗(E) = 0, and A ⊂ X. Then, since A ∩ E ⊂ E, we have µ∗(A ∩ E) = 0, and since A ∩ E^c ⊂ A, we have µ∗(A) ≥ µ∗(A ∩ E^c), so

µ∗(A) ≥ µ∗(A ∩ E^c) = µ∗(A ∩ E) + µ∗(A ∩ E^c).

Thus E is measurable.

2. If {B_i}_{i=1}^∞ is a countable collection of null sets, then ∪_{i=1}^∞ B_i is a null set. This follows directly from the last property (countable subadditivity) of the outer measure.

REFERENCES
1. A. Mukherjea, K. Pothoven, Real and Functional Analysis, Plenum Press, 1978.
2. A. Friedman, Foundations of Modern Analysis, Dover Publications, 1982.
3. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., John Wiley & Sons, Inc., 1999.

Version: 1 Owner: matte Author(s): matte


Chapter 401 28A75 – Length, area, volume, other geometric measure theory 401.1

Lebesgue density theorem

Let µ be the Lebesgue measure on R. If µ(Y) > 0, then there exists X ⊂ Y such that µ(Y − X) = 0 and, for all x ∈ X,

lim_{ε→0+} µ(X ∩ [x − ε, x + ε]) / (2ε) = 1.

Version: 2 Owner: bbukh Author(s): bbukh


Chapter 402 28A80 – Fractals 402.1

Cantor set

The Cantor set C is the canonical example of an uncountable set of measure zero. We construct C as follows. Begin with the unit interval C0 = [0, 1], and remove the open segment R1 := (1/3, 2/3) from the middle. We define C1 as the two remaining pieces

C1 := C0 \ R1 = [0, 1/3] ∪ [2/3, 1].    (402.1.1)

Now repeat the process on each remaining segment, removing the open set

R2 := (1/9, 2/9) ∪ (7/9, 8/9)    (402.1.2)

to form the four-piece set

C2 := C1 \ R2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1].    (402.1.3)

Continue the process, forming C3, C4, . . . Note that Ck has 2^k pieces.

Figure 402.1: The sets C0 through C5 in the construction of the Cantor set

Also note that at each step, the endpoints of each closed segment will stay in the set forever—e.g., the point 2/3 isn't touched as we remove sets.

The Cantor set is defined as

C := ∩_{k=1}^∞ Ck = C0 \ ∪_{n=1}^∞ Rn.    (402.1.4)

Cardinality of the Cantor set

To establish cardinality, we want a bijection between some set whose cardinality we know (e.g. Z, R) and the points in the Cantor set. We'll be aggressive and try the reals. Start at C1, which has two pieces. Mark the left-hand segment “0” and the right-hand segment “1”. Then continue to C2, and consider only the leftmost pair. Again, mark the segments “0” and “1”, and do the same for the rightmost pair. Keep doing this all the way down the Ck, starting at the left side and marking the segments 0, 1, 0, 1, 0, 1 as you encounter them, until you've labeled the entire Cantor set. Now, pick a path through the tree starting at C0 and going left-left-right-left. . . and so on. Mark a decimal point for C0, and record the zeros and ones as you proceed. Each path has a unique number based on your decision at each step. For example, the figure represents your choice of left-left-right-left-right at the first five steps, representing the number beginning 0.00101...

Figure 402.2: One possible path through C5: 0.00101

Every point in the Cantor set will have a unique address dependent solely on the pattern of lefts and rights, 0's and 1's, required to reach it. Each point thus has a unique number, the real number whose binary expansion is that sequence of zeros and ones. Every infinite stream of binary digits can be found among these paths, and in fact the binary expansion of every real number is a path to a unique point in the Cantor set. Some caution is justified, as two binary expansions may refer to the same real number; for example, 0.011111... = 0.100000... = 1/2. However, each one of these duplicates must correspond to a rational number. To see this, suppose we have a number x in [0, 1] whose binary expansion becomes all zeros or all ones at digit k (both are the same number, remember). Then we can multiply that number by 2^k and get an integer, so x must be a (binary) rational number. There are only countably many rationals, and not even all of those are the double-covered numbers we're worried about (see, e.g., 1/3 = 0.0101010...), so we have at most countably many duplicated reals. Thus, the cardinality of the Cantor set is equal to that of the reals. (If we want to be really picky, map (0, 1) to the reals with, say, f(x) = 1/x + 1/(x − 1), and the end points really don't matter much.)

Return, for a moment, to the earlier observation that numbers such as 1/3 and 2/9, the endpoints of deleted intervals, are themselves never deleted. In particular, consider the first deleted interval: the ternary expansions of its constituent numbers are precisely those that begin 0.1 and proceed thence with at least one non-zero ternary digit further along. Note also that the point 1/3, with ternary expansion 0.1, may also be written 0.0222... (i.e. 0.0¯2), which has no digits 1. Similar descriptions apply to further deleted intervals. The result is that the Cantor set is precisely those numbers in the set [0, 1] whose ternary expansion contains no digits 1.

Measure of the Cantor set

Let µ be Lebesgue measure. The measures of the sets Rk that we remove during the construction of the Cantor set are

µ(R1) = 2/3 − 1/3 = 1/3,    (402.1.5)
µ(R2) = (2/9 − 1/9) + (8/9 − 7/9) = 2/9,    (402.1.6)
⋮    (402.1.7)
µ(Rk) = 2^{k−1} / 3^k.    (402.1.8)

Note that the R's are disjoint, which will allow us to sum their measures without worry. In the limit k → ∞, this gives us

µ(∪_{n=1}^∞ Rn) = Σ_{n=1}^∞ 2^{n−1}/3^n = 1.    (402.1.9)

But we have µ(C0) = 1 as well, so this means

µ(C) = µ(C0 \ ∪_{n=1}^∞ Rn) = µ(C0) − Σ_{n=1}^∞ 2^{n−1}/3^n = 1 − 1 = 0.    (402.1.10)

Thus we have seen that the measure of C is zero (though see below for more on this topic). How many points are there in C? Lots, as we shall see. So we have a set of measure zero (very tiny) with uncountably many points (very big). This non-intuitive result is what makes Cantor sets so interesting.

Cantor sets with positive measure

Clearly, Cantor sets can be constructed for all sorts of “removals”—we can remove middle halves, or thirds, or any fraction 1/r, r > 1, we like. All of these Cantor sets have measure zero, since at each step n we end up with

Ln = (1 − 1/r)^n    (402.1.11)

of what we started with, and lim_{n→∞} Ln = 0 for any r > 1. With apologies, the figure above is drawn for the case r = 2, rather than the r = 3 which seems to be the publicly favored example. However, it is possible to construct Cantor sets with positive measure as well; the key is to remove less and less as we proceed. These Cantor sets have the same “shape” (topology) as the Cantor set we first constructed, and the same cardinality, but a different “size.” Again, start with the unit interval for C0, and choose a number 0 < p < 1. Let

R1 := ((2 − p)/4, (2 + p)/4),    (402.1.12)

which has length (measure) p/2. Again, define C1 := C0 \ R1. Now define

R2 := ((2 − p)/16, (2 + p)/16) ∪ ((14 − p)/16, (14 + p)/16),    (402.1.13)

which has measure p/4. Continue as before, such that each Rk has measure p/2^k; note again that all the Rk are disjoint. The resulting Cantor set has measure

µ(C0 \ ∪_{n=1}^∞ Rn) = 1 − Σ_{n=1}^∞ µ(Rn) = 1 − p Σ_{n=1}^∞ 2^{−n} = 1 − p > 0.

Thus we have a whole family of Cantor sets of positive measure to accompany their vanishing brethren. Version: 19 Owner: drini Author(s): drini, quincynoodles, drummond
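Both measure computations are easy to sanity-check numerically: the middle-thirds removals 2^{n−1}/3^n sum to 1 (so µ(C) = 0), while removing only p/2^k at step k leaves measure 1 − p. A sketch using exact rational arithmetic; the function names are ours:

```python
from fractions import Fraction

def middle_thirds_removed(N):
    """Total length removed after N steps: sum of 2^(n-1)/3^n, which tends to 1."""
    return sum(Fraction(2) ** (n - 1) / Fraction(3) ** n for n in range(1, N + 1))

def fat_cantor_remaining(p, N):
    """Remaining measure after N steps when step k removes total length p/2^k."""
    return 1 - sum(Fraction(p) / Fraction(2) ** k for k in range(1, N + 1))

print(float(middle_thirds_removed(60)))                  # approaches 1, so mu(C) = 0
print(float(fat_cantor_remaining(Fraction(1, 2), 60)))   # approaches 1 - p = 0.5
```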

402.2

Hausdorff dimension

Let Θ be a bounded subset of R^n and let NΘ(ε) be the minimum number of balls of radius ε required to cover Θ. Then define the Hausdorff dimension dH of Θ to be

dH(Θ) := − lim_{ε→0} log NΘ(ε) / log ε.

Hausdorff dimension is easy to calculate for simple objects like the Sierpinski gasket or a Koch curve. Each of these may be covered with a collection of scaled-down copies of itself. In fact, in the case of the Sierpinski gasket, one can take the individual triangles in each approximation as balls in the covering. At stage n, there are 3^n triangles of radius 1/2^n, and so the Hausdorff dimension of the Sierpinski triangle is −(n log 3)/(n log(1/2)) = log 3 / log 2.


From some notes from Koro

This definition can be extended to a general metric space X with distance function d. Define the diameter |C| of a bounded subset C of X to be sup_{x,y∈C} d(x, y), and define a countable r-cover of X to be a collection of subsets Ci of X indexed by some countable set I, each of diameter at most r, such that X = ∪_{i∈I} Ci. We also define the handy function

H^D_r(X) = inf Σ_{i∈I} |Ci|^D,

where the infimum is over all countable r-covers of X. The Hausdorff dimension of X may then be defined as

dH(X) = inf{D | lim_{r→0} H^D_r(X) = 0}.

When X is a subset of R^n with any restricted norm-induced metric, then this definition reduces to that given above. Version: 8 Owner: drini Author(s): drini, quincynoodles
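The Sierpinski-gasket computation above can be replayed numerically: covering stage n by its own 3^n triangles of radius 2^{−n} gives −log N(ε)/log ε = log 3/log 2 at every stage. A sketch; the function name is ours:

```python
import math

def sierpinski_box_dim(n):
    """Box-counting estimate at stage n of the Sierpinski gasket:
    N(eps) = 3^n triangles of radius eps = 2^-n, so -log N / log eps."""
    N, eps = 3 ** n, 2.0 ** (-n)
    return -math.log(N) / math.log(eps)

print(sierpinski_box_dim(8), math.log(3) / math.log(2))
# both equal log 3 / log 2 ~ 1.585
```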

402.3

Koch curve

A Koch curve is a fractal generated by a replacement rule. This rule is, at each step, to replace the middle 1/3 of each line segment with two sides of an equilateral triangle having sides of length equal to the replaced segment. Two applications of this rule on a single line segment give us:

To generate the Koch curve, the rule is applied indefinitely, starting with a line segment. Note that, if the length of the initial line segment is l, the length LK of the Koch curve at the nth step will be

LK = (4/3)^n l.

This quantity increases without bound; hence the Koch curve has infinite length. However, the curve still bounds a finite area. We can prove this by noting that in each step, we add an amount of area equal to the area of all the equilateral triangles we have just created. We can bound the area of each triangle of side length s by s^2 (the area of the square containing the triangle). Hence, at step n, the area AK ”under” the Koch curve (assuming l = 1) is


Figure 402.3: Sierpinski gasket stage 0, a single triangle Figure 402.4: Stage 1, three triangles

AK < (1/3)^2 + 4(1/9)^2 + 16(1/27)^2 + ··· = Σ_{i=1}^∞ 4^{i−1}/9^i,

since step i adds 4^{i−1} new triangles of side 3^{−i}; but this is a geometric series of ratio 4/9 < 1, so it converges. Hence a Koch curve has infinite length and bounds a finite area. A Koch snowflake is the figure generated by applying the Koch replacement rule to an equilateral triangle indefinitely. Version: 3 Owner: akrowne Author(s): akrowne
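The length and area estimates can be checked numerically; here we assume step i adds 4^{i−1} new triangles of side 3^{−i}, each of area at most (side)², so the area bound is a geometric series with ratio 4/9. A sketch; the function names are ours:

```python
from fractions import Fraction

def koch_length(n, l=1):
    """Length after n replacement steps: each step multiplies length by 4/3."""
    return l * Fraction(4, 3) ** n

def koch_area_bound(n):
    """Crude bound on the area added through step n: step i contributes at most
    4^(i-1) triangles times (3^-i)^2 apiece, a geometric series of ratio 4/9."""
    return sum(Fraction(4) ** (i - 1) / Fraction(9) ** i for i in range(1, n + 1))

print(float(koch_length(20)))       # grows without bound
print(float(koch_area_bound(60)))   # converges: (1/9)/(1 - 4/9) = 1/5
```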

402.4

Sierpinski gasket

Let S0 be a triangular area, and define Sn+1 to be obtained from Sn by replacing each triangular area in Sn with three similar and similarly oriented triangular areas, each intersecting with each of the other two at exactly one vertex, and each one half the linear scale of the original in size. The limiting set as n → ∞ (alternately the intersection of all these sets) is a Sierpinski gasket, also known as a Sierpinski triangle. Version: 3 Owner: quincynoodles Author(s): quincynoodles

402.5

fractal

Option 1: Some equivalence class of subsets of R^n. A usual equivalence is postulated when some generalised ”distance” is zero. For example, let F, G ⊂ R^n, and let d(x, y) be the usual distance (x, y ∈ R^n). Define the distance D between F and G as

D(F, G) := sup_{f∈F} inf_{g∈G} d(f, g) + sup_{g∈G} inf_{f∈F} d(f, g).

Figure 402.5: Stage 2, nine triangles

Figure 402.6: Stage n, 3^n triangles

Then in this case we have, as fractals, that Q and R are equivalent.

Option 2: A subset of R^n with non-integral Hausdorff dimension. Examples: (we think) the coast of Britain, a Koch snowflake.

Option 3: A “self-similar object”. That is, one which can be covered by copies of itself using a set of (usually two or more) transformation mappings. Another way to say this would be “an object with a discrete approximate scaling symmetry.” Example: A square region, a Koch curve, a fern frond. This isn't much different from Option 1 because of the collage theorem. A cursory description of some relationships between options 2 and 3 is given towards the end of the entry on Hausdorff dimension.

The use of option 1 is that it permits one to talk about how ”close” two fractals are to one another. This becomes quite handy when one wants to talk about approximating fractals, especially approximating option 3 type fractals with pictures that can be drawn in finite time. A simple example: one can talk about how close one of the line drawings in the Koch curve entry is to an actual Koch curve. Version: 7 Owner: quincynoodles Author(s): quincynoodles


Chapter 403 28Axx – Classical measure theory 403.1

Vitali’s Theorem

There exists a set V ⊂ [0, 1] which is not Lebesgue measurable. Version: 1 Owner: paolini Author(s): paolini

403.2

proof of Vitali’s Theorem

Consider the equivalence relation in [0, 1) given by

x ∼ y  ⟺  x − y ∈ Q,

and let F be the family of all equivalence classes of ∼. Let V be a section of F, i.e. put in V an element for each equivalence class of ∼ (notice that we are using the axiom of choice). Given q ∈ Q ∩ [0, 1), define

Vq = ((V + q) ∩ [0, 1)) ∪ ((V + q − 1) ∩ [0, 1)),

that is, Vq is obtained by translating V by a quantity q to the right and then cutting the piece which goes beyond the point 1 and putting it on the left, starting from 0. Now notice that given x ∈ [0, 1) there exists y ∈ V such that x ∼ y (because V is a section of ∼) and hence there exists q ∈ Q ∩ [0, 1) such that x ∈ Vq. So

∪_{q ∈ Q ∩ [0,1)} Vq = [0, 1).

Moreover all the Vq are disjoint. In fact if x ∈ Vq ∩ Vp then x − q (modulo [0, 1)) and x − p are both in V, which is not possible since they differ by a rational quantity q − p (or q − p + 1).

Now if V is Lebesgue measurable, clearly the Vq are also measurable and µ(Vq) = µ(V). Moreover by the countable additivity of µ we have

µ([0, 1)) = Σ_{q ∈ Q ∩ [0,1)} µ(Vq) = Σ_q µ(V).

So if µ(V) = 0 we would have µ([0, 1)) = 0, and if µ(V) > 0 we would have µ([0, 1)) = +∞. So the only possibility is that V is not Lebesgue measurable.

Version: 1 Owner: paolini Author(s): paolini


Chapter 404 28B15 – Set functions, measures and integrals with values in ordered spaces 404.1

Lp-space

Definition Let (X, B, µ) be a measure space. The Lp-norm of a function f : X → R is defined as

‖f‖p := (∫_X |f|^p dµ)^{1/p}    (404.1.1)

when the integral exists. The set of functions with finite Lp -norm form a vector space V with the usual pointwise addition and scalar multiplication of functions. In particular, the set of functions with zero Lp -norm form a linear subspace of V , which for this article will be called K. We are then interested in the quotient space V /K, which consists of real functions on X with finite Lp -norm, identified up to equivalence almost everywhere. This quotient space is the real Lp -space on X. Theorem The vector space V /K is complete with respect to the Lp norm.

The space L∞. The space L∞ is somewhat special, and may be defined without explicit reference to an integral. First, the L∞-norm of f is defined to be the essential supremum of |f|:

‖f‖∞ := ess sup |f| = inf {a ∈ R : µ({x : |f(x)| > a}) = 0}.    (404.1.2)

The definitions of V, K, and L∞ then proceed as above. Functions in L∞ are also called essentially bounded.

Example Let X = [0, 1] and f(x) = 1/√x. Then f ∈ L1(X) but f ∉ L2(X).

Version: 18 Owner: mathcam Author(s): Manoj, quincynoodles, drummond
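The example can be seen numerically: approximating ∫₀¹ x^{−p} dx by a midpoint rule, the p = 1/2 case (the L¹ integral of f) stabilizes near 2, while the p = 1 case (the integral of |f|², relevant to the L² norm) keeps growing as the grid is refined. A sketch; the discretization is an illustrative assumption:

```python
def integral_power(p, n):
    """Midpoint-rule approximation of the integral of x^(-p) over (0, 1]."""
    h = 1.0 / n
    return sum(((k + 0.5) * h) ** (-p) * h for k in range(n))

# f(x) = x^(-1/2) on [0,1]: the L1-norm integral converges (to 2) ...
print(integral_power(0.5, 10 ** 6))
# ... but |f|^2 = x^(-1) has a divergent integral: refining the grid
# keeps increasing the sum (it grows like log n).
print(integral_power(1.0, 10 ** 3), integral_power(1.0, 10 ** 6))
```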

404.2

locally integrable function

Definition [4, 1, 2] Suppose that U is an open set in Rn , and f : U → C is a Lebesgue integrable function. If the Lebesgue integral intK |f |dx is finite for all compact subsets K in U, then f is locally integrable. The set of all such functions is denoted by L1loc (U). Example 1. L1 (U) ⊂ L1loc (U), where L1 (U) is the set of integrable functions. Theorem Suppose f and g are locally integrable functions on an open subset U ⊂ Rn , and suppose that intU f φdx = intU gφdx for all smooth functions with compact support φ ∈ C0∞ (U). Then f = g almost everywhere. A proof based on the Lebesgue differentiation theorem is given in [4] pp. 15. Another proof is given in [2] pp. 276.

REFERENCES 1. L. H¨ormander, The Analysis of Linear Partial Differential Operators I, (Distribution theory and Fourier Analysis), 2nd ed, Springer-Verlag, 1990. 2. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed, John Wiley & Sons, Inc., 1999. 3. S. Lang, Analysis II, Addison-Wesley Publishing Company Inc., 1969.

Version: 3 Owner: matte Author(s): matte


Chapter 405 28C05 – Integration theory via linear functionals (Radon measures, Daniell integrals, etc.), representing set functions and measures 405.1

Haar integral

Let Γ be a locally compact topological group and C be the algebra of all continuous real-valued functions on Γ with compact support. In addition we define C+ to be the set of non-negative functions that belong to C. The Haar integral is a real linear map I of C into the field of real numbers for Γ if it satisfies:
• I is not the zero map,
• I only takes non-negative values on C+,
• I has the following property: I(γ · f) = I(f) for all elements f of C and all elements γ of Γ.
The Haar integral may be denoted in the following ways (there are also other ways): ∫_{γ∈Γ} f(γ), ∫_Γ f, ∫_Γ f dγ, or I(f).

In order for the Haar integral to exist and to be unique, it is necessary and sufficient that there exists a real-valued function I+ on C+ satisfying the following conditions:
1. (Linearity.) I+(λf + µg) = λI+(f) + µI+(g), where f, g ∈ C+ and λ, µ ∈ R+.
2. (Positivity.) If f(γ) > 0 for all γ ∈ Γ then I+(f) > 0.
3. (Translation-invariance.) I(f(δγ)) = I(f(γ)) for any fixed δ ∈ Γ and every f in C+.
An additional property: if Γ is a compact group, then the Haar integral has right translation-invariance:

∫_{γ∈Γ} f(γδ) = ∫_{γ∈Γ} f(γ) for any fixed δ ∈ Γ.

In addition we can define a normalized Haar integral to be ∫_Γ 1 = 1; since Γ is compact, ∫_Γ 1 is finite. (The proof of existence and uniqueness of the Haar integral is presented in [PV] on page 9.)

( the information of this entry is in part quoted and paraphrased from [GSS])

REFERENCES
[GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.
[HG] Hochschild, G.: The Structure of Lie Groups. Holden-Day, San Francisco, 1965.

Version: 4 Owner: Daume Author(s): Daume


Chapter 406 28C10 – Set functions and measures on topological groups, Haar measures, invariant measures 406.1

Haar measure

406.1.1

Definition of Haar measures

Let G be a locally compact topological group. A left Haar measure on G is a measure µ on the Borel sigma algebra B of G which is: 1. outer regular on all Borel sets B ∈ B 2. inner regular on all open sets U ⊂ G 3. finite on all compact sets K ⊂ G 4. invariant under left translation: µ(gB) = µ(B) for all Borel sets B ∈ B A right Haar measure on G is defined similarly, except with left translation invariance replaced by right translation invariance (µ(Bg) = µ(B) for all Borel sets B ∈ B). A bi– invariant Haar measure is a Haar measure that is both left invariant and right invariant.


406.1.2

Existence of Haar measures

For any finite group G, the counting measure on G is a bi–invariant Haar measure. More generally, every locally compact topological group G has a left 1 Haar measure µ, which is unique up to scalar multiples. The Haar measure plays an important role in the development of Fourier analysis and representation theory on locally compact groups such as Lie groups and profinite groups. Version: 1 Owner: djao Author(s): djao

1 G also has a right Haar measure, although the right and left Haar measures on G are not necessarily equal unless G is abelian.


Chapter 407 28C20 – Set functions and measures and integrals in infinite-dimensional spaces (Wiener measure, Gaussian measure, etc.) 407.1

essential supremum

Let (X, B, µ) be a measure space and let f : X → R be a function. The essential supremum of f is the smallest number a ∈ R for which f only exceeds a on a set of measure zero. This allows us to generalize the maximum of a function in a useful way. More formally, we define ess sup f as follows. Let a ∈ R, and define Ma = {x : f (x) > a} ,

(407.1.1)

the subset of X where f (x) is greater than a. Then let A0 = {a ∈ R : µ(Ma ) = 0} ,

(407.1.2)

the set of real numbers for which Ma has measure zero. If A0 = ∅, then the essential supremum is defined to be ∞. Otherwise, the essential supremum of f is

ess sup f := inf A0.    (407.1.3)

Version: 1 Owner: drummond Author(s): drummond
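For a purely atomic (discrete) measure the definition collapses to something directly computable: values of f carried only by mass-zero atoms are ignored, and the essential supremum is the largest value with positive mass. A sketch under that assumption; the (value, mass) representation and function name are ours:

```python
def ess_sup(samples):
    """Essential supremum of f for a purely atomic measure, given as
    (value, mass) pairs: the smallest a with mu({f > a}) = 0 is the
    largest value carrying positive mass."""
    vals = [v for v, m in samples if m > 0]
    return max(vals) if vals else float("-inf")

# f takes the value 100 only on a set of measure zero (mass 0),
# so that value is ignored; the essential supremum is 3.0, not 100.0.
samples = [(1.0, 0.2), (3.0, 0.5), (100.0, 0.0), (2.0, 0.3)]
print(ess_sup(samples))  # 3.0
```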



Chapter 408 28D05 – Measure-preserving transformations 408.1

measure-preserving

Let (X, B, µ) be a measure space, and T : X → X be a (possibly non-invertible) measurable transformation. We call T measure-preserving if for all A ∈ B, µ(T −1(A)) = µ(A), where T −1 (A) is defined to be the set of points x ∈ X such that T (x) ∈ A. A measure-preserving transformation is also called an endomorphism of the measure space. Version: 5 Owner: mathcam Author(s): mathcam, drummond
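A concrete (non-invertible) example is the doubling map T(x) = 2x mod 1 on [0, 1) with Lebesgue measure: the preimage of an interval consists of two intervals of half its length, so measure is preserved. A numerical sketch; the grid-sampling check is an illustrative assumption, not part of the definition:

```python
def T(x):
    """The doubling map T(x) = 2x mod 1 on [0, 1)."""
    return (2.0 * x) % 1.0

def preimage_measure_estimate(a, b, n=10 ** 5):
    """Estimate the Lebesgue measure of T^{-1}([a, b)) by checking
    a uniform grid of n midpoints in [0, 1)."""
    hits = sum(1 for k in range(n) if a <= T((k + 0.5) / n) < b)
    return hits / n

# T is measure-preserving: mu(T^{-1}([0.2, 0.7))) = 0.7 - 0.2 = 0.5.
print(preimage_measure_estimate(0.2, 0.7))  # ~0.5
```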


Chapter 409 30-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 409.1

domain

A non-empty open set in C is called a domain. The topology considered is the Euclidean one (viewing C as R²). So we have that, for a domain D, being connected is equivalent to being path-connected. Since every component of a domain D will be a region, we have that every domain has at most countably many components. Version: 4 Owner: drini Author(s): drini

409.2

region

A region is a connected domain. Since every domain of C can be seen as the union of countably many components and each component is a region, we have that regions play a major role in complex analysis. Version: 2 Owner: drini Author(s): drini


409.3

regular region

Let E be an n-dimensional Euclidean space with the topology induced by the Euclidean metric. Then a set in E is a regular region if it can be written as the closure of a non-empty region with a piecewise smooth boundary. Version: 10 Owner: ottocolori Author(s): ottocolori

409.4

topology of the complex plane

The usual topology for the complex plane C is the topology induced by the metric d(x, y) = |x − y| for x, y ∈ C. Here, | · | is the complex modulus. If we identify R² and C, it is clear that the above topology coincides with the topology induced by the Euclidean metric on R². Version: 1 Owner: matte Author(s): matte


Chapter 410 30-XX – Functions of a complex variable 410.1

z0 is a pole of f

Let f be an analytic function on a punctured neighborhood of z0 ∈ C, that is, f analytic on {z ∈ C : 0 < |z − z0| < ε} for some ε > 0, and such that

lim_{z→z0} f(z) = ∞.

We say then that z0 is a pole of f. Version: 2 Owner: drini Author(s): drini, apmxi


Chapter 411 30A99 – Miscellaneous 411.1

Riemann mapping theorem

Let U be a simply connected open proper subset of C, and let a ∈ U. There is a unique analytic function f : U → C such that 1. f (a) = 0, and f 0 (a) is real and positive; 2. f is injective; 3. f (U) = {z ∈ C : |z| < 1}. Remark. As a consequence of this theorem, any two simply connected regions, none of which is the whole plane, are conformally equivalent. Version: 2 Owner: Koro Author(s): Koro

411.2

Runge’s theorem

Let K be a compact subset of C, and let E be a subset of C∞ = C ∪ {∞} (the extended complex plane) which intersects every connected component of C∞ − K. If f is an analytic function in an open set containing K, then, given ε > 0, there is a rational function R(z) whose only poles are in E such that |f(z) − R(z)| < ε for all z ∈ K. Version: 2 Owner: Koro Author(s): Koro


411.3

Weierstrass M-test

Let X be a topological space, {fn}_{n∈N} a sequence of real- or complex-valued functions on X, and {Mn}_{n∈N} a sequence of non-negative real numbers. Suppose that, for each n ∈ N and x ∈ X, we have |fn(x)| ≤ Mn. Then f = Σ_{n=1}^∞ fn converges uniformly if Σ_{n=1}^∞ Mn converges. Version: 8 Owner: vypertd Author(s): vypertd, igor
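As an illustration of the test, take f_n(x) = sin(nx)/n² with M_n = 1/n²: Σ M_n converges, so the series converges uniformly, and the gap between two partial sums is bounded by a tail of Σ M_n independently of x. A sketch; the sample points and function names are illustrative assumptions:

```python
import math

# f_n(x) = sin(n x) / n^2 with |f_n(x)| <= M_n = 1/n^2 for all x,
# and sum(M_n) converges, so the series converges uniformly.
def partial_sum(x, N):
    return sum(math.sin(n * x) / n ** 2 for n in range(1, N + 1))

def tail_M(N, cutoff=10 ** 5):
    """Tail of sum of M_n = 1/n^2 past N (truncated numerically)."""
    return sum(1.0 / n ** 2 for n in range(N + 1, cutoff))

# The sup over sample points of |s_2000 - s_200| sits below the M-test tail bound:
xs = [k * 0.1 for k in range(70)]
sup_diff = max(abs(partial_sum(x, 2000) - partial_sum(x, 200)) for x in xs)
print(sup_diff, tail_M(200))
```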

411.4

annulus

Briefly, an annulus is the region bounded between two (usually concentric) circles. An open annulus, or just annulus for short, is a domain in the complex plane of the form

A = Aw(r, R) = {z ∈ C | r < |z − w| < R},

where w is an arbitrary complex number, and r and R are real numbers with 0 < r < R. Such a set is often called an annular region. More generally, one can allow r = 0 or R = ∞. (This makes sense for the purposes of the bound on |z − w| above.) This would make an annulus include the cases of a punctured disc, and some unbounded domains. Analogously, one can define a closed annulus to be a set of the form

A = Aw(r, R) = {z ∈ C | r ≤ |z − w| ≤ R},

where w ∈ C, and r and R are real numbers with 0 < r < R. One can show that two annuli Aw(r, R) and Aw′(r′, R′) are conformally equivalent if and only if R/r = R′/r′. More generally, the complement of any closed disk in an open disk is conformally equivalent to precisely one annulus of the form A0(r, 1). Version: 1 Owner: jay Author(s): jay

411.5

conformally equivalent

A region G is conformally equivalent to a set S if there is an analytic bijective function mapping G to S. Conformal equivalence is an equivalence relation. Version: 1 Owner: Koro Author(s): Koro

411.6

contour integral

Let f be a complex-valued function defined on the image of a curve α: [a, b] → C, and let P = {a0, ..., an} be a partition of [a, b]. If the sum

Σ_{i=1}^n f(zi)(α(ai) − α(ai−1)),

where zi is some point α(ti) such that ai−1 ≤ ti ≤ ai, tends to a unique limit l as n tends to infinity and the greatest of the numbers ai − ai−1 tends to zero, then we say that the contour integral of f along α exists and has value l. The contour integral is denoted by

∫_α f(z) dz.

Note
(i) If Im(α) is a segment of the real axis, then this definition reduces to that of the Riemann integral of f(x) between α(a) and α(b).
(ii) An alternative definition, making use of the Riemann–Stieltjes integral, is based on the fact that the definition of this can be extended without any other changes in the wording to cover the cases where f and α are complex-valued functions. Now let α be any curve [a, b] → R². Then α can be expressed in terms of the components (α1, α2) and can be associated with the complex-valued function

z(t) = α1(t) + iα2(t).

Given any complex-valued function of a complex variable, f say, defined on Im(α), we define the contour integral of f along α, denoted by ∫_α f(z) dz, by

∫_α f(z) dz = ∫_a^b f(z(t)) dz(t)

whenever the complex Riemann–Stieltjes integral on the right exists.
(iii) Reversing the direction of the curve changes the sign of the integral.
(iv) The contour integral always exists if α is rectifiable and f is continuous.
(v) If α is piecewise smooth and the contour integral of f along α exists, then

∫_α f dz = ∫_a^b f(z(t)) z′(t) dt.

Version: 4 Owner: vypertd Author(s): vypertd
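Note (v) gives a practical way to evaluate contour integrals along piecewise smooth curves: integrate f(z(t)) z′(t) numerically over the parameter interval. A sketch; the midpoint-rule discretization and the function names are ours:

```python
import cmath
import math

def contour_integral(f, z, dz, a, b, n=20000):
    """Approximate the contour integral of f along the smooth curve z(t),
    t in [a, b], as the integral of f(z(t)) * z'(t) dt (midpoint rule)."""
    h = (b - a) / n
    return sum(f(z(a + (k + 0.5) * h)) * dz(a + (k + 0.5) * h) * h
               for k in range(n))

# alpha(t) = e^{it} on [0, 2*pi] (the unit circle), f(z) = 1/z:
# the contour integral is 2*pi*i.
val = contour_integral(lambda w: 1 / w,
                       lambda t: cmath.exp(1j * t),
                       lambda t: 1j * cmath.exp(1j * t),
                       0.0, 2 * math.pi)
print(val)  # close to 2*pi*i ~ 6.283j
```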

411.7

orientation

Let α be a rectifiable Jordan curve in R², let z0 be a point in R² − Im(α), and let z0 have winding number W[α : z0]. Then W[α : z0] = ±1; all points inside α will have the same index, and we define the orientation of a Jordan curve α by saying that α is positively oriented if the index of every point inside α is +1 and negatively oriented if it is −1. Version: 3 Owner: vypertd Author(s): vypertd

411.8

proof of Weierstrass M-test

Consider the sequence of partial sums sn = Σ_{m=1}^n fm. Since the sums are finite, each sn is continuous (assuming each fn is). Take any p, q ∈ N such that p ≤ q; then, for every x ∈ X, we have

|sq(x) − sp(x)| = |Σ_{m=p+1}^q fm(x)| ≤ Σ_{m=p+1}^q |fm(x)| ≤ Σ_{m=p+1}^q Mm.

But since Σ_{n=1}^∞ Mn converges, for any ε > 0 we can find an N ∈ N such that, for any q ≥ p > N and x ∈ X, we have |sq(x) − sp(x)| ≤ Σ_{m=p+1}^q Mm < ε. Hence the sequence sn converges uniformly to Σ_{n=1}^∞ fn, and the function f = Σ_{n=1}^∞ fn is continuous.

Version: 1 Owner: igor Author(s): igor


411.9

unit disk

The unit disk in the complex plane, denoted ∆, is defined as {z ∈ C : |z| < 1}. The unit circle, denoted ∂∆ or S 1 is the boundary {z ∈ C : |z| = 1} of the unit disk ∆. Every element z ∈ ∂∆ can be written as z = eiθ for some real value of θ. Version: 5 Owner: brianbirgen Author(s): brianbirgen

411.10

upper half plane

The upper half plane in the complex plane, abbreviated UHP, is defined as {z ∈ C : Im(z) > 0}. Version: 4 Owner: brianbirgen Author(s): brianbirgen

411.11

winding number and fundamental group

The winding number is an analytic way to define an explicit isomorphism W [• : z0 ] : π1 (C \ z0 ) → Z from the fundamental group of the punctured (at z0 ) complex plane to the group of integers. Version: 1 Owner: Dr Absentius Author(s): Dr Absentius
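The isomorphism can be made computational: W[α : z0] = (1/2πi) ∮_α dz/(z − z0), and homotopic loops yield the same integer. A numerical sketch; the discretization and function names are ours:

```python
import cmath
import math

def winding_number(z, dz, z0, a, b, n=40000):
    """W[alpha : z0] = (1/(2*pi*i)) * contour integral of dz/(z - z0),
    approximated by a midpoint rule along the parametrization z(t)."""
    h = (b - a) / n
    total = sum(dz(a + (k + 0.5) * h) / (z(a + (k + 0.5) * h) - z0) * h
                for k in range(n))
    return total / (2j * math.pi)

# The unit circle traversed twice winds 2 times about 0,
# and 0 times about the exterior point 3.
circle = lambda t: cmath.exp(1j * t)
d_circle = lambda t: 1j * cmath.exp(1j * t)
w_in = winding_number(circle, d_circle, 0j, 0.0, 4 * math.pi)
w_out = winding_number(circle, d_circle, 3 + 0j, 0.0, 4 * math.pi)
print(round(w_in.real), round(w_out.real))  # 2 0
```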


Chapter 412 30B10 – Power series (including lacunary series) 412.1

Euler relation

Euler's relation (also known as Euler's formula) is considered the first bridge between the fields of algebra and geometry, as it relates the exponential function to the trigonometric sine and cosine functions. The goal is to prove

e^{ix} = cos(x) + i sin(x).

It's easy to show that

i^{4n} = 1,  i^{4n+1} = i,  i^{4n+2} = −1,  i^{4n+3} = −i.

Now, using the Taylor series expansions of sin x, cos x and e^x, we can show that

e^{ix} = Σ_{n=0}^∞ (ix)^n / n!
       = Σ_{n=0}^∞ [ x^{4n}/(4n)! + i x^{4n+1}/(4n+1)! − x^{4n+2}/(4n+2)! − i x^{4n+3}/(4n+3)! ].

Because the series expansion above is absolutely convergent for all x, we can rearrange the terms of the series as follows:

e^{ix} = Σ_{n=0}^∞ (−1)^n x^{2n}/(2n)! + i Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1)!

e^{ix} = cos(x) + i sin(x)

Version: 8 Owner: drini Author(s): drini, fiziko, igor
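The rearrangement argument can be checked numerically: summing the Taylor series of e^z at z = ix reproduces cos x + i sin x to machine precision. A sketch; the function name is ours:

```python
import cmath
import math

def exp_taylor(z, N=60):
    """Partial Taylor sum of e^z: sum of z^n / n! for n < N."""
    term, total = 1 + 0j, 0j
    for n in range(N):
        total += term
        term *= z / (n + 1)
    return total

x = 1.234
lhs = exp_taylor(1j * x)                  # series for e^{ix}
rhs = complex(math.cos(x), math.sin(x))   # cos x + i sin x
print(abs(lhs - rhs))  # essentially zero
```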

412.2

analytic

Let U be a domain in the complex numbers (resp., real numbers). A function f : U −→ C (resp., f : U −→ R) is analytic (resp., real analytic) if f has a Taylor series about each point x ∈ U that converges to the function f in an open neighborhood of x.

412.2.1

On Analyticity and Holomorphicity

A complex function is analytic if and only if it is holomorphic. Because of this equivalence, an analytic function in the complex case is often defined to be one that is holomorphic, instead of one having a Taylor series as above. Although the two definitions are equivalent, it is not an easy matter to prove their equivalence, and a reader who does not yet have this result available will have to pay attention as to which definition of analytic is being used. Version: 4 Owner: djao Author(s): djao

412.3

existence of power series

In this entry we shall demonstrate the logical equivalence of the holomorphic and analytic concepts. As is the case with so many basic results in complex analysis, the proof of these facts hinges on the Cauchy integral theorem, and the Cauchy integral formula.

Holomorphic implies analytic.

Theorem 8. Let $U \subset \mathbb{C}$ be an open domain that contains the origin, and let $f : U \to \mathbb{C}$ be a function such that the complex derivative
$$f'(z) = \lim_{\zeta \to 0} \frac{f(z+\zeta) - f(z)}{\zeta}$$
exists for all $z \in U$. Then there exists a power series representation
$$f(z) = \sum_{k=0}^{\infty} a_k z^k, \qquad |z| < R, \quad a_k \in \mathbb{C},$$
for a sufficiently small radius of convergence $R > 0$.

Note: it is just as easy to show the existence of a power series representation around every basepoint $z_0 \in U$; one need only consider the holomorphic function $f(z - z_0)$.

Proof. Choose $R > 0$ sufficiently small so that the disk $|z| \leq R$ is contained in $U$. By the Cauchy integral formula we have
$$f(z) = \frac{1}{2\pi i} \oint_{|\zeta| = R} \frac{f(\zeta)}{\zeta - z}\, d\zeta, \qquad |z| < R,$$
where, as usual, the integration contour is oriented counterclockwise. For every $\zeta$ of modulus $R$, we can expand the integrand as a geometric power series in $z$, namely
$$\frac{f(\zeta)}{\zeta - z} = \frac{f(\zeta)/\zeta}{1 - z/\zeta} = \sum_{k=0}^{\infty} \frac{f(\zeta)}{\zeta^{k+1}}\, z^k, \qquad |z| < R.$$
The circle of radius $R$ is a compact set, hence $f(\zeta)$ is bounded on it, and hence the power series above converges uniformly with respect to $\zeta$. Consequently, the order of the infinite summation and the integration can be interchanged. Hence
$$f(z) = \sum_{k=0}^{\infty} a_k z^k, \qquad |z| < R,$$
where
$$a_k = \frac{1}{2\pi i} \oint_{|\zeta| = R} \frac{f(\zeta)}{\zeta^{k+1}}\, d\zeta,$$
as desired. QED

Analytic implies holomorphic.

Theorem 9. Let
$$f(z) = \sum_{n=0}^{\infty} a_n z^n, \qquad a_n \in \mathbb{C}, \quad |z| < \epsilon,$$
be a power series converging in $D = D_\epsilon(0)$, the open disk of radius $\epsilon > 0$ about the origin. Then the complex derivative
$$f'(z) = \lim_{\zeta \to 0} \frac{f(z+\zeta) - f(z)}{\zeta}$$
exists for all $z \in D$, i.e. the function $f : D \to \mathbb{C}$ is holomorphic.

Note: this theorem generalizes immediately to shifted power series in $z - z_0$, $z_0 \in \mathbb{C}$.

Proof. For every $z_0 \in D$, the function $f(z)$ can be recast as a power series centered at $z_0$. Hence, without loss of generality it suffices to prove the theorem for $z = 0$. The power series
$$\sum_{n=0}^{\infty} a_{n+1} \zeta^n, \qquad \zeta \in D,$$
converges, and equals $(f(\zeta) - f(0))/\zeta$ for $\zeta \neq 0$. Consequently, the complex derivative $f'(0)$ exists; indeed it is equal to $a_1$. QED

Version: 2 Owner: rmilson Author(s): rmilson

412.4

infinitely-differentiable function that is not analytic

If $f \in C^\infty$, then we can certainly write a Taylor series for $f$. However, analyticity requires that this Taylor series actually converge (at least across some radius of convergence) to $f$. It is not necessary that the power series for $f$ converge to $f$, as the following example shows.

Let
$$f(x) = \begin{cases} e^{-1/x^2} & x \neq 0 \\ 0 & x = 0. \end{cases}$$
Then $f \in C^\infty$, and for any $n \geq 0$, $f^{(n)}(0) = 0$ (see below). So the Taylor series for $f$ around $0$ is identically $0$; since $f(x) > 0$ for all $x \neq 0$, it clearly does not converge to $f$.

Proof that $f^{(n)}(0) = 0$. Let $p(x), q(x) \in \mathbb{R}[x]$ be polynomials, and define
$$g(x) = \frac{p(x)}{q(x)} \cdot f(x).$$
Then, for $x \neq 0$,
$$g'(x) = \frac{\left(p'(x) + p(x)\frac{2}{x^3}\right) q(x) - q'(x)\, p(x)}{q^2(x)} \cdot e^{-1/x^2}.$$
Computing (e.g. by applying L'Hôpital's rule), we see that $g'(0) = \lim_{x\to 0} g'(x) = 0$. Define $p_0(x) = q_0(x) = 1$. Applying the above inductively, we see that we may write $f^{(n)}(x) = \frac{p_n(x)}{q_n(x)} f(x)$. So $f^{(n)}(0) = 0$, as required. Version: 2 Owner: ariels Author(s): ariels
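A numeric illustration (a sketch, assuming the exponent $-1/x^2$ as above) that $f$ vanishes at $0$ faster than any power, which is why every Taylor coefficient there is zero:

```python
import math

def f(x):
    """The classic C-infinity but non-analytic function."""
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# f(x)/x^k -> 0 for every k, so all derivatives at 0 vanish,
# yet f is strictly positive away from 0.
ratio = f(0.1) / 0.1**20   # tiny even after dividing by x^20
```

So the zero Taylor series converges everywhere, but to the wrong function.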

412.5

power series

A power series is a series of the form
$$\sum_{k=0}^{\infty} a_k (x - x_0)^k,$$
with $a_k, x_0 \in \mathbb{R}$ or $\mathbb{C}$. The $a_k$ are called the coefficients and $x_0$ the center of the power series. Where it converges, it defines a function, which can thus be represented by a power series. This is what power series are usually used for.

Every power series is convergent at least at $x = x_0$, where it converges to $a_0$. In addition it is absolutely convergent in the region $\{x : |x - x_0| < r\}$, with
$$r = \liminf_{k \to \infty} \frac{1}{\sqrt[k]{|a_k|}}.$$
It is divergent for every $x$ with $|x - x_0| > r$. For $|x - x_0| = r$ no general predictions can be made. If $r = \infty$, the power series converges absolutely for every real or complex $x$. The number $r$ is called the radius of convergence of the power series.

Examples of power series are:

• Taylor series, for example:
$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}.$$

• The geometric series:
$$\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k, \qquad |x| < 1.$$

Power series have some important properties:

• If a power series converges for a $z_0 \in \mathbb{C}$, then it also converges for all $z \in \mathbb{C}$ with $|z - x_0| < |z_0 - x_0|$.

• Also, if a power series diverges for some $z_0 \in \mathbb{C}$, then it diverges for all $z \in \mathbb{C}$ with $|z - x_0| > |z_0 - x_0|$.

• For $|x - x_0| < r$, power series can be added by adding coefficients, and multiplied in the obvious way:
$$\sum_{k=0}^{\infty} a_k (x-x_0)^k \cdot \sum_{j=0}^{\infty} b_j (x-x_0)^j = a_0 b_0 + (a_0 b_1 + a_1 b_0)(x-x_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0)(x-x_0)^2 + \cdots.$$

• (Uniqueness) If two power series are equal and their centers are the same, then their coefficients must be equal.

• Power series can be differentiated and integrated termwise. These operations preserve the radius of convergence.

Version: 13 Owner: mathwizard Author(s): mathwizard, AxelBoldt
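The root-test formula for the radius of convergence can be probed numerically by evaluating $1/|a_k|^{1/k}$ at a large $k$, working with logarithms to avoid overflow (a sketch, not a proof; the example $a_k = 2^k$ has radius $1/2$):

```python
import math

def root_test_radius(log_abs_coeff, k_max=2000):
    """Estimate liminf 1/|a_k|^(1/k) by evaluating at one large k.

    log_abs_coeff(k) must return log|a_k|.
    """
    k = k_max
    return math.exp(-log_abs_coeff(k) / k)

# a_k = 2^k: log|a_k| = k*log 2, so the estimate equals 1/2 exactly
r = root_test_radius(lambda k: k * math.log(2.0))
```

For well-behaved coefficient sequences a single large $k$ already gives the limit; a genuine liminf would sample many $k$.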

412.6

proof of radius of convergence

According to Cauchy's root test, a power series is absolutely convergent if
$$\limsup_{k\to\infty} \sqrt[k]{|a_k (x-x_0)^k|} = |x - x_0| \limsup_{k\to\infty} \sqrt[k]{|a_k|} < 1.$$
This is obviously true if
$$|x - x_0| < \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|a_k|}} = \liminf_{k\to\infty} \frac{1}{\sqrt[k]{|a_k|}}.$$
In the same way we see that the series is divergent if
$$|x - x_0| > \liminf_{k\to\infty} \frac{1}{\sqrt[k]{|a_k|}},$$
which means that the right-hand side is the radius of convergence of the power series.

Now from the ratio test we see that the power series is absolutely convergent if
$$\lim_{k\to\infty} \left| \frac{a_{k+1}(x-x_0)^{k+1}}{a_k (x-x_0)^k} \right| = |x-x_0| \lim_{k\to\infty} \left| \frac{a_{k+1}}{a_k} \right| < 1.$$
Again this is true if
$$|x - x_0| < \lim_{k\to\infty} \left| \frac{a_k}{a_{k+1}} \right|.$$
The series is divergent if
$$|x - x_0| > \lim_{k\to\infty} \left| \frac{a_k}{a_{k+1}} \right|,$$
as follows from the ratio test in the same way. So we see that in this way too we can calculate the radius of convergence. Version: 1 Owner: mathwizard Author(s): mathwizard


412.7

radius of convergence

To the power series
$$\sum_{k=0}^{\infty} a_k (x - x_0)^k \tag{412.7.1}$$
there exists a number $r \in [0, \infty]$, its radius of convergence, such that the series converges absolutely for all (real or complex) numbers $x$ with $|x - x_0| < r$ and diverges whenever $|x - x_0| > r$. (For $|x - x_0| = r$ no general statements can be made, except that for finite $r > 0$ the sum function always has at least one singularity on the circle $|x - x_0| = r$.)

The radius of convergence is given by
$$r = \liminf_{k\to\infty} \frac{1}{\sqrt[k]{|a_k|}} \tag{412.7.2}$$
and can also be computed as
$$r = \lim_{k\to\infty} \left| \frac{a_k}{a_{k+1}} \right| \tag{412.7.3}$$
if this limit exists.

Version: 6 Owner: mathwizard Author(s): mathwizard, AxelBoldt

Chapter 413: 30B50 – Dirichlet series and other series expansions, exponential series

413.1

Dirichlet series

Let $(\lambda_n)_{n \geq 1}$ be an increasing sequence of positive real numbers tending to $\infty$. A Dirichlet series with exponents $(\lambda_n)$ is a series of the form
$$\sum_n a_n e^{-\lambda_n z}$$
where $z$ and all the $a_n$ are complex numbers. An ordinary Dirichlet series is one having $\lambda_n = \log n$ for all $n$. It is written
$$\sum_n \frac{a_n}{n^z}.$$
The best-known examples are the Riemann zeta function (in which $a_n$ is the constant $1$) and the more general Dirichlet L-series (in which the mapping $n \mapsto a_n$ is multiplicative and periodic). When $\lambda_n = n$, the Dirichlet series is just a power series in the variable $e^{-z}$.

The following are the basic convergence properties of Dirichlet series. There is nothing profound about their proofs, which can be found in [1] and in various other works on complex analysis and analytic number theory. Let $f(z) = \sum_n a_n e^{-\lambda_n z}$ be a Dirichlet series.

1. If $f$ converges at $z = z_0$, then $f$ converges uniformly in the region
$$\operatorname{Re}(z - z_0) \geq 0, \qquad -\alpha \leq \arg(z - z_0) \leq \alpha,$$
where $\alpha$ is any real number such that $0 < \alpha < \pi/2$. (Such a region is known as a "Stolz angle".)

2. Therefore, if $f$ converges at $z_0$, its sum defines a holomorphic function on the region $\operatorname{Re}(z) > \operatorname{Re}(z_0)$, and moreover $f(z) \to f(z_0)$ as $z \to z_0$ within any Stolz angle.

3. $f = 0$ identically iff all the $a_n$ are zero.

So, if $f$ converges somewhere but not everywhere in $\mathbb{C}$, then the domain of its convergence is the region $\operatorname{Re}(z) > \rho$ for some real number $\rho$, which is called the abscissa of convergence of the Dirichlet series. The abscissa of convergence of the series $\sum_n |a_n| e^{-\lambda_n z}$, if it exists, is called the abscissa of absolute convergence of $f$.

Now suppose that the coefficients $a_n$ are all real and $\geq 0$. If the series $f$ converges for $\operatorname{Re}(z) > \rho$, and the resulting function admits an analytic extension to a neighbourhood of $\rho$, then the series $f$ converges in a neighbourhood of $\rho$. Consequently, the domain of convergence of $f$ (unless it is the whole of $\mathbb{C}$) is bounded by a singularity at a point on the real axis.

Finally, return to the general case of any complex numbers $(a_n)$, but suppose $\lambda_n = \log n$, so $f$ is an ordinary Dirichlet series $\sum_n a_n/n^z$.

1. If the sequence $(a_n)$ is bounded, then $f$ converges absolutely in the region $\operatorname{Re}(z) > 1$.

2. If the partial sums $\sum_{n=k}^{l} a_n$ are bounded, then $f$ converges (not necessarily absolutely) in the region $\operatorname{Re}(z) > 0$.

Reference: [1] Serre, J.-P., A Course in Arithmetic, Chapter VI, Springer-Verlag, 1973. Version: 2 Owner: bbukh Author(s): Larry Hammick
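Property 1 of the ordinary case (bounded coefficients give absolute convergence for $\operatorname{Re}(z) > 1$) can be illustrated with $a_n = 1$, i.e. the zeta series at $z = 2$ (a partial-sum sketch, not part of the entry):

```python
import math

def zeta_partial(s, terms):
    """Partial sum of the ordinary Dirichlet series sum_n 1/n^s."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))

approx = zeta_partial(2, 100000)   # zeta(2) = pi^2 / 6
```

The truncation error behaves like $1/N$, so $10^5$ terms give about five correct digits.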


Chapter 414: 30C15 – Zeros of polynomials, rational functions, and other analytic functions (e.g. zeros of functions with bounded Dirichlet integral)

414.1

Mason-Stothers theorem

Mason's theorem is often described as the polynomial case of the (currently unproven) ABC conjecture.

Theorem 1 (Mason-Stothers). Let $f(z), g(z), h(z) \in \mathbb{C}[z]$, not all constant, be such that $f(z) + g(z) = h(z)$ for all $z$, and such that $f$, $g$, and $h$ are pairwise relatively prime. Denote the number of distinct roots of the product $fgh$ by $N$. Then
$$\max\{\deg f, \deg g, \deg h\} + 1 \leq N.$$
Version: 1 Owner: mathcam Author(s): mathcam
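A small hand-checked instance (the example polynomials and their factorizations in the comments are my own choice, not from the entry): $f = z^2$, $g = 2z + 1$, $h = f + g = (z+1)^2$.

```python
# Mason-Stothers sanity check for f = z^2, g = 2z + 1, h = (z + 1)^2.
# Polynomials as coefficient lists, lowest degree first.
f = [0, 0, 1]        # z^2
g = [1, 2]           # 2z + 1
h = [1, 2, 1]        # z^2 + 2z + 1 = (z + 1)^2

def add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

assert add(f, g) == h            # the relation f + g = h holds

# distinct roots of f*g*h = z^2 (2z+1) (z+1)^2, read off the factorizations:
# {0, -1/2, -1}
N = 3
max_deg = max(len(f), len(g), len(h)) - 1   # = 2
```

The bound $\max\deg + 1 \leq N$ holds with equality here ($2 + 1 \leq 3$).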

414.2

zeroes of analytic functions are isolated

The zeroes of a non-constant analytic function on $\mathbb{C}$ are isolated. Let $f$ be an analytic function defined in some domain $D \subset \mathbb{C}$ and let $f(z_0) = 0$ for some $z_0 \in D$. Because $f$ is analytic, there is a Taylor series expansion for $f$ around $z_0$ which converges on an open disk $|z - z_0| < R$. Write it as $f(z) = \sum_{n=k}^{\infty} a_n (z - z_0)^n$, with $a_k \neq 0$ and $k > 0$ ($a_k$ is the first non-zero coefficient). One can factor the series so that $f(z) = (z - z_0)^k \sum_{n=0}^{\infty} a_{n+k} (z - z_0)^n$, and define $g(z) = \sum_{n=0}^{\infty} a_{n+k} (z - z_0)^n$, so that $f(z) = (z - z_0)^k g(z)$. Observe that $g(z)$ is analytic on $|z - z_0| < R$.

To show that $z_0$ is an isolated zero of $f$, we must find $\epsilon > 0$ so that $f$ is non-zero on $0 < |z - z_0| < \epsilon$. By the relation $f(z) = (z - z_0)^k g(z)$, it is enough to find $\epsilon > 0$ so that $g$ is non-zero on $|z - z_0| < \epsilon$. Because $g(z)$ is analytic, it is continuous at $z_0$. Notice that $g(z_0) = a_k \neq 0$, so there exists an $\epsilon > 0$ so that for all $z$ with $|z - z_0| < \epsilon$ it follows that $|g(z) - a_k| < \frac{|a_k|}{2}$. This implies that $g(z)$ is non-zero in this set. Version: 5 Owner: brianbirgen Author(s): brianbirgen


Chapter 415: 30C20 – Conformal mappings of special domains

415.1

automorphisms of unit disk

All automorphisms of the complex unit disk $\Delta = \{z \in \mathbb{C} : |z| < 1\}$ to itself can be written in the form
$$f_a(z) = e^{i\theta} \frac{z - a}{1 - \bar{a} z},$$
where $a \in \Delta$ and $0 \leq \theta < 2\pi$. This map sends $a$ to $0$, $1/\bar{a}$ to $\infty$, and the unit circle to the unit circle. Version: 3 Owner: brianbirgen Author(s): brianbirgen
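Assuming the conjugate form $f_a(z) = e^{i\theta}(z-a)/(1-\bar a z)$ above, a quick numeric check (a sketch) that $f_a(a) = 0$ and that points on the unit circle stay on it:

```python
import cmath

def disk_automorphism(a, theta):
    """f_a(z) = e^{i theta} (z - a) / (1 - conj(a) z), for |a| < 1."""
    def f(z):
        return cmath.exp(1j * theta) * (z - a) / (1 - a.conjugate() * z)
    return f

f = disk_automorphism(0.3 + 0.4j, 0.7)
zero = f(0.3 + 0.4j)                                     # a maps to 0
on_circle = [abs(f(cmath.exp(1j * t))) for t in (0.1, 1.0, 2.5)]
```

The circle check works because $|z - a| = |1 - \bar a z|$ whenever $|z| = 1$.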

415.2

unit disk upper half plane conformal equivalence theorem

Theorem: There is a conformal map from $\Delta$, the unit disk, to $UHP$, the upper half plane.

Proof: Define $f : \mathbb{C} \to \mathbb{C}$, $f(z) = \frac{z-i}{z+i}$. Notice that $f^{-1}(w) = i\,\frac{1+w}{1-w}$ and that $f$ (and therefore $f^{-1}$) is a Möbius transformation.

Notice that $f(0) = -1$, $f(1) = \frac{1-i}{1+i} = -i$ and $f(-1) = \frac{-1-i}{-1+i} = i$. By the Möbius circle transformation theorem, $f$ takes the real axis to the unit circle. Since $f(i) = 0$, $f$ maps $UHP$ to $\Delta$ and $f^{-1} : \Delta \to UHP$.

Version: 3 Owner: brianbirgen Author(s): brianbirgen
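The Cayley transform $f(z) = (z-i)/(z+i)$ and its inverse can be verified numerically (a sketch): $i$ maps to $0$, real points map onto the unit circle, and $f^{-1}\circ f$ is the identity.

```python
def f(z):
    """Cayley transform: upper half plane -> unit disk."""
    return (z - 1j) / (z + 1j)

def f_inv(w):
    """Inverse map: unit disk -> upper half plane."""
    return 1j * (1 + w) / (1 - w)

center = f(1j)                                     # i -> 0
boundary = [abs(f(x)) for x in (-3.0, -1.0, 0.0, 2.0, 10.0)]  # real axis -> |w| = 1
roundtrip = f_inv(f(0.5 + 2j))                     # back to the same UHP point
```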


Chapter 416: 30C35 – General theory of conformal mappings

416.1

proof of conformal mapping theorem

Let $D \subset \mathbb{C}$ be a domain, and let $f : D \to \mathbb{C}$ be an analytic function. By identifying the complex plane $\mathbb{C}$ with $\mathbb{R}^2$, we can view $f$ as a function from $\mathbb{R}^2$ to itself:
$$\tilde{f}(x, y) := (\operatorname{Re} f(x+iy), \operatorname{Im} f(x+iy)) = (u(x,y), v(x,y))$$
with $u$ and $v$ real functions. The Jacobian matrix of $\tilde{f}$ is
$$J(x,y) = \frac{\partial(u,v)}{\partial(x,y)} = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}.$$
As an analytic function, $f$ satisfies the Cauchy-Riemann equations, so that $u_x = v_y$ and $u_y = -v_x$. At a fixed point $z = x+iy \in D$, we can therefore define $a = u_x(x,y) = v_y(x,y)$ and $b = u_y(x,y) = -v_x(x,y)$. We write $(a, b)$ in polar coordinates as $(r\cos\theta, r\sin\theta)$ and get
$$J(x,y) = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} = r \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$
Now we consider two smooth curves through $(x,y)$, which we parametrize by $\gamma_1(t) = (u_1(t), v_1(t))$ and $\gamma_2(t) = (u_2(t), v_2(t))$. We can choose the parametrization such that $\gamma_1(0) = \gamma_2(0) = z$. The images of these curves under $\tilde{f}$ are $\tilde{f} \circ \gamma_1$ and $\tilde{f} \circ \gamma_2$, respectively, and their derivatives at $t = 0$ are
$$(\tilde{f} \circ \gamma_1)'(0) = \frac{\partial(u,v)}{\partial(x,y)}(\gamma_1(0)) \cdot \frac{d\gamma_1}{dt}(0) = J(x,y) \begin{pmatrix} \frac{du_1}{dt} \\[2pt] \frac{dv_1}{dt} \end{pmatrix}$$
and, similarly,
$$(\tilde{f} \circ \gamma_2)'(0) = J(x,y) \begin{pmatrix} \frac{du_2}{dt} \\[2pt] \frac{dv_2}{dt} \end{pmatrix}$$
by the chain rule. We see that if $f'(z) \neq 0$, $f$ transforms the tangent vectors to $\gamma_1$ and $\gamma_2$ at $t = 0$ (and therefore at $z$) by the orthogonal matrix
$$J/r = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$
and scales them by a factor of $r$. In particular, the transformation by an orthogonal matrix implies that the angle between the tangent vectors is preserved. Since the determinant of $J/r$ is $1$, the transformation also preserves orientation (the direction of the angle between the tangent vectors). We conclude that $f$ is a conformal mapping. Version: 3 Owner: pbruin Author(s): pbruin


Chapter 417: 30C80 – Maximum principle; Schwarz's lemma, Lindelöf principle, analogues and generalizations; subordination

417.1

Schwarz lemma

Let $\Delta = \{z : |z| < 1\}$ be the open unit disk in the complex plane $\mathbb{C}$. Let $f : \Delta \to \Delta$ be a holomorphic function with $f(0) = 0$. Then $|f(z)| \leq |z|$ for all $z \in \Delta$, and $|f'(0)| \leq 1$. If equality $|f(z)| = |z|$ holds for any $z \neq 0$, or if $|f'(0)| = 1$, then $f$ is a rotation: $f(z) = az$ with $|a| = 1$.

This lemma is less celebrated than the bigger guns (such as the Riemann mapping theorem, which it helps prove); however, it is one of the simplest results capturing the "rigidity" of holomorphic functions. No similar result exists for real functions, of course. Version: 2 Owner: ariels Author(s): ariels
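The inequality can be sampled numerically for a concrete self-map of the disk with $f(0) = 0$; the choice $f(z) = z^2$ below is my own illustration, not from the entry.

```python
import cmath
import random

def f(z):
    """A holomorphic self-map of the unit disk with f(0) = 0 (not a rotation)."""
    return z * z

random.seed(1)
# random points strictly inside the unit disk
samples = [cmath.rect(random.random() * 0.999, random.random() * 6.28)
           for _ in range(100)]
bound_holds = all(abs(f(z)) <= abs(z) + 1e-15 for z in samples)
fprime0 = (f(1e-6) - f(0)) / 1e-6   # numeric derivative at 0; |f'(0)| <= 1
```

Since $f$ is not a rotation, the inequality is in fact strict away from $0$, consistent with the equality case of the lemma.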

417.2

maximum principle

Maximum principle. Let $f : U \to \mathbb{R}$ (where $U \subseteq \mathbb{R}^d$) be a harmonic function. Then $f$ attains its extremal values on any compact $K \subseteq U$ on the boundary $\partial K$ of $K$. If $f$ attains an extremal value anywhere in the interior of $K$, then it is constant.

Maximum modulus principle. Let $f : U \to \mathbb{C}$ (where $U \subseteq \mathbb{C}$) be a holomorphic function. Then $|f|$ attains its maximal value on any compact $K \subseteq U$ on the boundary $\partial K$ of $K$. If $|f|$ attains its maximal value anywhere in the interior of $K$, then it is constant.

Version: 1 Owner: ariels Author(s): ariels

417.3

proof of Schwarz lemma

Define $g(z) = f(z)/z$. Then $g : \Delta \to \mathbb{C}$ is a holomorphic function. The Schwarz lemma is just an application of the maximum modulus principle to $g$.

For any $1 > \epsilon > 0$, by the maximum modulus principle $|g|$ must attain its maximum on the closed disk $\{z : |z| \leq 1-\epsilon\}$ at its boundary $\{z : |z| = 1-\epsilon\}$, say at some point $z_\epsilon$. But then $|g(z)| \leq |g(z_\epsilon)| \leq \frac{1}{1-\epsilon}$ for any $|z| \leq 1-\epsilon$. Taking the infimum as $\epsilon \to 0$, we see that the values of $g$ are bounded: $|g(z)| \leq 1$. Thus $|f(z)| \leq |z|$. Additionally, $f'(0) = g(0)$, so we see that $|f'(0)| = |g(0)| \leq 1$. This is the first part of the lemma.

Now suppose, as per the premise of the second part of the lemma, that $|g(w)| = 1$ for some $w \in \Delta$. For any $r > |w|$, it must be that $|g|$ attains its maximal modulus ($1$) inside the disk $\{z : |z| \leq r\}$, and it follows that $g$ must be constant inside the entire open disk $\Delta$. So $g(z) \equiv a$ for $a = g(w)$ of modulus $1$, and $f(z) = az$, as required. Version: 2 Owner: ariels Author(s): ariels


Chapter 418: 30D20 – Entire functions, general theory

418.1

Liouville’s theorem

A bounded entire function is constant. That is, a bounded complex function $f : \mathbb{C} \to \mathbb{C}$ which is holomorphic on the entire complex plane is always a constant function. More generally, any holomorphic function $f : \mathbb{C} \to \mathbb{C}$ which satisfies a polynomial bound condition of the form
$$|f(z)| < c \cdot |z|^n$$
for some $c \in \mathbb{R}$, $n \in \mathbb{Z}$, and all $z \in \mathbb{C}$ with $|z|$ sufficiently large, is necessarily equal to a polynomial function.

Liouville’s theorem is a vivid example of how stringent the holomorphicity condition on a complex function really is. One has only to compare the theorem to the corresponding statement for real functions (namely, that a bounded differentiable real function is constant, a patently false statement) to see how much stronger the complex differentiability condition is compared to real differentiability. Applications of Liouville’s theorem include proofs of the fundamental theorem of algebra and of the partial fraction decomposition theorem for rational functions. Version: 4 Owner: djao Author(s): djao

418.2

Morera’s theorem

Morera's theorem provides the converse of Cauchy's integral theorem.

Theorem [1]. Suppose $G$ is a region in $\mathbb{C}$, and $f : G \to \mathbb{C}$ is a continuous function. If for every closed triangle $\Delta$ in $G$ we have $\int_{\partial\Delta} f\, dz = 0$, then $f$ is analytic on $G$. (Here, $\partial\Delta$ is the piecewise linear boundary of $\Delta$.) In particular, if for every rectifiable closed curve $\Gamma$ in $G$ we have $\int_\Gamma f\, dz = 0$, then $f$ is analytic on $G$. Proofs can be found in the references below.

REFERENCES
1. W. Rudin, Real and Complex Analysis, 3rd ed., McGraw-Hill, 1987.
2. E. Kreyszig, Advanced Engineering Mathematics, 7th ed., John Wiley & Sons, 1993.
3. R.A. Silverman, Introductory Complex Analysis, Dover Publications, 1972.

Version: 7 Owner: matte Author(s): matte, drini, nerdy2

418.3

entire

A function $f : \mathbb{C} \to \mathbb{C}$ is entire if it is holomorphic on the whole complex plane. Version: 2 Owner: djao Author(s): djao

418.4

holomorphic

Let $U \subset \mathbb{C}$ be a domain in the complex numbers. A function $f : U \to \mathbb{C}$ is holomorphic if $f$ has a complex derivative at every point of $U$, i.e. if
$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$
exists for all $z_0 \in U$. Version: 5 Owner: djao Author(s): djao, rmilson

418.5

proof of Liouville’s theorem

Let $f : \mathbb{C} \to \mathbb{C}$ be a bounded, entire function. Then by Taylor's theorem,
$$f(z) = \sum_{n=0}^{\infty} c_n z^n \qquad \text{where} \qquad c_n = \frac{1}{2\pi i} \int_{\Gamma_r} \frac{f(w)}{w^{n+1}}\, dw,$$
where $\Gamma_r$ is the circle of radius $r$ about $0$, for $r > 0$. Then $c_n$ can be estimated as
$$|c_n| \leq \frac{1}{2\pi}\, \operatorname{length}(\Gamma_r) \sup\left\{ \left| \frac{f(w)}{w^{n+1}} \right| : w \in \Gamma_r \right\} = \frac{1}{2\pi} \cdot 2\pi r \cdot \frac{M_r}{r^{n+1}} = \frac{M_r}{r^n},$$
where $M_r = \sup\{|f(w)| : w \in \Gamma_r\}$.

But $f$ is bounded, so there is $M$ such that $M_r \leq M$ for all $r$. Then $|c_n| \leq \frac{M}{r^n}$ for all $n$ and all $r > 0$. But since $r$ is arbitrary, this gives $c_n = 0$ whenever $n > 0$. So $f(z) = c_0$ for all $z$, so $f$ is constant. Version: 2 Owner: Evandar Author(s): Evandar


Chapter 419: 30D30 – Meromorphic functions, general theory

419.1

Casorati-Weierstrass theorem

Let U ⊂ C be a domain, a ∈ U, and let f : U \ {a} → C be holomorphic. Then a is an essential singularity of f if and only if the image of any punctured neighborhood of a under f is dense in C. Version: 2 Owner: pbruin Author(s): pbruin

419.2

Mittag-Leffler’s theorem

Let $G$ be an open subset of $\mathbb{C}$, and let $\{a_k\}$ be a sequence of distinct points in $G$ which has no limit point in $G$. For each $k$, let $A_{1k}, \ldots, A_{m_k k}$ be arbitrary complex coefficients, and define
$$S_k(z) = \sum_{j=1}^{m_k} \frac{A_{jk}}{(z - a_k)^j}.$$
Then there exists a meromorphic function $f$ on $G$ whose poles are exactly the points $\{a_k\}$ and such that the singular part of $f$ at $a_k$ is $S_k(z)$, for each $k$. Version: 1 Owner: Koro Author(s): Koro


419.3

Riemann’s removable singularity theorem

Let $U \subset \mathbb{C}$ be a domain, $a \in U$, and let $f : U \setminus \{a\} \to \mathbb{C}$ be holomorphic. Then $a$ is a removable singularity of $f$ if and only if
$$\lim_{z \to a} (z - a) f(z) = 0.$$
In particular, $a$ is a removable singularity of $f$ if $f$ is bounded near $a$, i.e. if there is a punctured neighborhood $V$ of $a$ and a real number $M > 0$ such that $|f(z)| < M$ for all $z \in V$. Version: 1 Owner: pbruin Author(s): pbruin

419.4

essential singularity

Let U ⊂ C be a domain, a ∈ U, and let f : U \{a} → C be holomorphic. If the Laurent series expansion of f (z) around a contains infinitely many terms with negative powers of z −a, then a is said to be an essential singularity of f . Any singularity of f is a removable singularity, a pole or an essential singularity. If a is an essential singularity of f , then the image of any punctured neighborhood of a under f is dense in C (the Casorati-Weierstrass theorem). In fact, an even stronger statement is true: according to Picard’s theorem, the image of any punctured neighborhood of a is C, with the possible exception of a single point. Version: 4 Owner: pbruin Author(s): pbruin

419.5

meromorphic

Let U ⊂ C be a domain. A function f : U −→ C is meromorphic if f is holomorphic except at an isolated set of poles. It can be proven that if f is meromorphic then its set of poles does not have an accumulation point. Version: 2 Owner: djao Author(s): djao

419.6

pole

Let $U \subset \mathbb{C}$ be a domain and let $a \in \mathbb{C}$. A function $f : U \to \mathbb{C}$ has a pole at $a$ if it can be represented by a Laurent series centered about $a$ with only finitely many negative terms; that is,
$$f(z) = \sum_{k=-n}^{\infty} c_k (z - a)^k$$
in some nonempty deleted neighborhood of $a$, for some $n \in \mathbb{N}$, with $c_{-n} \neq 0$. Version: 2 Owner: djao Author(s): djao

419.7

proof of Casorati-Weierstrass theorem

Assume that $a$ is an essential singularity of $f$. Let $V \subset U$ be a punctured neighborhood of $a$, and let $\lambda \in \mathbb{C}$. We have to show that $\lambda$ is a limit point of $f(V)$. Suppose it is not; then there is an $\epsilon > 0$ such that $|f(z) - \lambda| > \epsilon$ for all $z \in V$, and the function
$$g : V \to \mathbb{C}, \qquad z \mapsto \frac{1}{f(z) - \lambda},$$
is bounded, since $|g(z)| = \frac{1}{|f(z) - \lambda|} < \epsilon^{-1}$ for all $z \in V$. According to Riemann's removable singularity theorem, this implies that $a$ is a removable singularity of $g$, so that $g$ can be extended to a holomorphic function $\bar{g} : V \cup \{a\} \to \mathbb{C}$. Now
$$f(z) = \frac{1}{\bar{g}(z)} + \lambda$$
for $z \neq a$, and $a$ is either a removable singularity of $f$ (if $\bar{g}(a) \neq 0$) or a pole of order $n$ (if $\bar{g}$ has a zero of order $n$ at $a$). This contradicts our assumption that $a$ is an essential singularity, which means that $\lambda$ must be a limit point of $f(V)$. The argument holds for all $\lambda \in \mathbb{C}$, so $f(V)$ is dense in $\mathbb{C}$ for any punctured neighborhood $V$ of $a$.

To prove the converse, assume that $f(V)$ is dense in $\mathbb{C}$ for any punctured neighborhood $V$ of $a$. If $a$ is a removable singularity, then $f$ is bounded near $a$, and if $a$ is a pole, $f(z) \to \infty$ as $z \to a$. Either of these possibilities contradicts the assumption that the image of any punctured neighborhood of $a$ under $f$ is dense in $\mathbb{C}$, so $a$ must be an essential singularity of $f$. Version: 1 Owner: pbruin Author(s): pbruin

419.8

proof of Riemann’s removable singularity theorem

Suppose that $f$ is holomorphic on $U \setminus \{a\}$ and $\lim_{z\to a}(z-a)f(z) = 0$. Let
$$f(z) = \sum_{k=-\infty}^{\infty} c_k (z-a)^k$$
be the Laurent series of $f$ centered at $a$. We will show that $c_k = 0$ for $k < 0$, so that $f$ can be holomorphically extended to all of $U$ by defining $f(a) = c_0$.

For $n \in \mathbb{N}_0$, the residue of $(z-a)^n f(z)$ at $a$ is
$$\operatorname{Res}((z-a)^n f(z), a) = \frac{1}{2\pi i} \lim_{\delta \to 0^+} \oint_{|z-a|=\delta} (z-a)^n f(z)\, dz.$$
This is equal to zero, because
$$\left| \oint_{|z-a|=\delta} (z-a)^n f(z)\, dz \right| \leq 2\pi\delta \max_{|z-a|=\delta} |(z-a)^n f(z)| = 2\pi\delta^n \max_{|z-a|=\delta} |(z-a)f(z)|,$$
which, by our assumption, goes to zero as $\delta \to 0$. Since the residue of $(z-a)^n f(z)$ at $a$ is also equal to $c_{-n-1}$, the coefficients of all negative powers of $z - a$ in the Laurent series vanish.

Conversely, if $a$ is a removable singularity of $f$, then $f$ can be expanded in a power series centered at $a$, so that
$$\lim_{z\to a} (z-a) f(z) = 0$$
because the constant term in the power series of $(z-a)f(z)$ is zero.

A corollary of this theorem is the following: if $f$ is bounded near $a$, then $|(z-a)f(z)| \leq |z-a|\, M$ for some $M > 0$. This implies that $(z-a)f(z) \to 0$ as $z \to a$, so $a$ is a removable singularity of $f$. Version: 1 Owner: pbruin Author(s): pbruin

419.9

residue

Let $U \subset \mathbb{C}$ be a domain and let $f : U \to \mathbb{C}$ be a function represented by a Laurent series
$$f(z) := \sum_{k=-\infty}^{\infty} c_k (z - a)^k$$
centered about $a$. The coefficient $c_{-1}$ of the above Laurent series is called the residue of $f$ at $a$, and denoted $\operatorname{Res}(f; a)$. Version: 2 Owner: djao Author(s): djao


419.10

simple pole

A simple pole is a pole of order 1. That is, a meromorphic function $f$ has a simple pole at $x_0 \in \mathbb{C}$ if
$$f(z) = \frac{a}{z - x_0} + g(z),$$
where $a \in \mathbb{C}$, $a \neq 0$, and $g$ is holomorphic at $x_0$. Version: 3 Owner: bwebste Author(s): bwebste


Chapter 420: 30E20 – Integration, integrals of Cauchy type, integral representations of analytic functions

420.1

Cauchy integral formula

The formulas. Let $D = \{z \in \mathbb{C} : |z - z_0| < R\}$ be an open disk in the complex plane, and let $f(z)$ be a holomorphic¹ function defined on some open domain that contains $D$ and its boundary. Then, for every $z \in D$ we have
$$f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{\zeta - z}\, d\zeta,$$
$$f'(z) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{(\zeta - z)^2}\, d\zeta,$$
$$\vdots$$
$$f^{(n)}(z) = \frac{n!}{2\pi i} \oint_C \frac{f(\zeta)}{(\zeta - z)^{n+1}}\, d\zeta.$$
Here $C = \partial D$ is the corresponding circular boundary contour, oriented counterclockwise, with the most obvious parameterization given by
$$\zeta = z_0 + R e^{it}, \qquad 0 \leq t \leq 2\pi.$$

Discussion. The first of the above formulas underscores the "rigidity" of holomorphic functions. Indeed, the values of the holomorphic function inside a disk $D$ are completely specified by its values on the boundary of the disk. The second formula is useful because it gives the derivative in terms of an integral, rather than as the outcome of a limit process.

Generalization. The following technical generalization of the formula is needed for the treatment of removable singularities. Let $S$ be a finite subset of $D$, and suppose that $f(z)$ is holomorphic for all $z \notin S$, but also that $f(z)$ is bounded near all $z \in S$. Then the above formulas are valid for all $z \in D \setminus S$.

Using the Cauchy residue theorem, one can further generalize the integral formula to the situation where $D$ is any domain and $C$ is any closed rectifiable curve in $D$; in this case, the formula becomes
$$\eta(C, z)\, f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{\zeta - z}\, d\zeta,$$
where $\eta(C, z)$ denotes the winding number of $C$ about $z$. It is valid for all points $z \in D \setminus S$ which are not on the curve $C$.

¹ It is necessary to draw a distinction between holomorphic functions (those having a complex derivative) and analytic functions (those representable by power series). The two concepts are, in fact, equivalent, but the standard proof of this fact uses the Cauchy integral formula with the (apparently) weaker holomorphicity hypothesis.

Version: 19 Owner: djao Author(s): djao, rmilson
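The first formula can be checked numerically: a Riemann sum over the circle $|\zeta| = 1$ recovers $f(z)$ from boundary values alone (a sketch, here with $f = \exp$ and $z = 0.3$):

```python
import cmath

def cauchy_integral(f, z, z0=0.0, R=1.0, N=4000):
    """(1/2*pi*i) * contour integral of f(zeta)/(zeta - z) over |zeta - z0| = R."""
    total = 0.0 + 0.0j
    for k in range(N):
        t = 2 * cmath.pi * k / N
        zeta = z0 + R * cmath.exp(1j * t)
        dzeta = R * 1j * cmath.exp(1j * t) * (2 * cmath.pi / N)
        total += f(zeta) / (zeta - z) * dzeta
    return total / (2j * cmath.pi)

val = cauchy_integral(cmath.exp, 0.3)   # should reproduce exp(0.3)
```

Because the integrand is smooth and periodic in $t$, the equally spaced Riemann sum converges extremely fast.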

420.2

Cauchy integral theorem

Theorem 10. Let $U \subset \mathbb{C}$ be an open, simply connected domain, and let $f : U \to \mathbb{C}$ be a function whose complex derivative, that is,
$$\lim_{w \to z} \frac{f(w) - f(z)}{w - z},$$
exists for all $z \in U$. Then the integral around every closed contour $\gamma \subset U$ vanishes; in symbols,
$$\oint_\gamma f(z)\, dz = 0.$$

We also have the following, technically important generalization involving removable singularities.

Theorem 11. Let $U \subset \mathbb{C}$ be an open, simply connected domain, and $S \subset U$ a finite subset. Let $f : U \setminus S \to \mathbb{C}$ be a function whose complex derivative exists for all $z \in U \setminus S$, and that is bounded near all $z \in S$. Then the integral around every closed contour $\gamma \subset U \setminus S$ that avoids the exceptional points vanishes.

Cauchy's theorem is an essential stepping stone in the theory of complex analysis. It is required for the proof of the Cauchy integral formula, which in turn is required for the proof that the existence of a complex derivative implies a power series representation. The original version of the theorem, as stated by Cauchy in the early 1800s, requires that the derivative $f'(z)$ exist and be continuous. The existence of $f'(z)$ implies the Cauchy-Riemann equations, which in turn can be restated as the fact that the complex-valued differential $f(z)\, dz$ is closed. The original proof makes use of this fact and calls on Green's theorem to conclude that the contour integral vanishes. The proof of Green's theorem, however, involves an interchange of order in a double integral, and this can only be justified if the integrand, which involves the real and imaginary parts of $f'(z)$, is assumed to be continuous. To this date, many authors prove the theorem this way, but erroneously fail to mention the continuity assumption.

In the latter part of the 19th century E. Goursat found a proof of the integral theorem that merely requires that $f'(z)$ exist. Continuity of the derivative, as well as the existence of all higher derivatives, then follows as a consequence of the Cauchy integral formula. Not only is Goursat's version a sharper result, but it is also more elementary and self-contained, in the sense that it does not require Green's theorem. Goursat's argument makes use of rectangular contours (many authors use triangles though), but the extension to an arbitrary simply connected domain is relatively straightforward.

Theorem 12 (Goursat). Let $U$ be an open domain containing a rectangle
$$R = \{x + iy \in \mathbb{C} : a \leq x \leq b,\ c \leq y \leq d\}.$$
If the complex derivative of a function $f : U \to \mathbb{C}$ exists at all points of $U$, then the contour integral of $f$ around the boundary of $R$ vanishes; in symbols,
$$\oint_{\partial R} f(z)\, dz = 0.$$

Bibliography.
• L. Ahlfors, "Complex Analysis". Version: 7 Owner: rmilson Author(s): rmilson

420.3

Cauchy residue theorem

Let $U \subset \mathbb{C}$ be a simply connected domain, and suppose $f$ is a complex-valued function which is defined and analytic on all but finitely many points $a_1, \ldots, a_m$ of $U$. Let $C$ be a closed curve in $U$ which does not intersect any of the $a_i$. Then
$$\oint_C f(z)\, dz = 2\pi i \sum_{i=1}^{m} \eta(C, a_i) \operatorname{Res}(f; a_i),$$
where
$$\eta(C, a_i) := \frac{1}{2\pi i} \oint_C \frac{dz}{z - a_i}$$
is the winding number of $C$ about $a_i$, and $\operatorname{Res}(f; a_i)$ denotes the residue of $f$ at $a_i$.

Version: 4 Owner: djao Author(s): djao, rmilson
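A numeric sketch (my own example, not from the entry): $f(z) = \frac{1}{z(z-3)}$ has poles at $0$ (residue $-1/3$) and $3$; only the first lies inside $|z| = 1$, so the integral should equal $2\pi i \cdot (-1/3)$.

```python
import cmath

def contour_integral(f, z0=0.0, R=1.0, N=4000):
    """Contour integral of f over the circle |z - z0| = R via a Riemann sum."""
    total = 0.0 + 0.0j
    for k in range(N):
        t = 2 * cmath.pi * k / N
        z = z0 + R * cmath.exp(1j * t)
        dz = R * 1j * cmath.exp(1j * t) * (2 * cmath.pi / N)
        total += f(z) * dz
    return total

# poles at 0 (residue -1/3) and 3; only 0 is inside the unit circle
integral = contour_integral(lambda z: 1.0 / (z * (z - 3.0)))
expected = 2j * cmath.pi * (-1.0 / 3.0)
```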

420.4

Gauss’ mean value theorem

Let $\Omega$ be a domain in $\mathbb{C}$ and suppose $f$ is an analytic function on $\Omega$. Furthermore, let $C$ be a circle inside $\Omega$ with center $z_0$ and radius $r$. Then $f(z_0)$ is the mean value of $f$ along $C$, that is,
$$f(z_0) = \frac{1}{2\pi} \int_0^{2\pi} f(z_0 + re^{i\theta})\, d\theta.$$
Version: 7 Owner: Johan Author(s): Johan
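The mean-value property is easy to test numerically; the function $z^2 + e^z$ below is my own choice for the sketch.

```python
import cmath

def circle_mean(f, z0, r, N=2000):
    """Approximate (1/2*pi) * integral of f(z0 + r e^{i theta}) d theta."""
    return sum(f(z0 + r * cmath.exp(2j * cmath.pi * k / N))
               for k in range(N)) / N

z0 = 0.5 + 0.25j
mean = circle_mean(lambda z: z * z + cmath.exp(z), z0, 0.7)
expected = z0 * z0 + cmath.exp(z0)   # the value at the center
```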

420.5

Möbius circle transformation theorem

Möbius transformations always transform circles into circles. Version: 1 Owner: Johan Author(s): Johan

420.6

Möbius transformation cross-ratio preservation theorem

A Möbius transformation $f : z \mapsto w$ preserves the cross-ratios, i.e.
$$\frac{(w_1 - w_2)(w_3 - w_4)}{(w_1 - w_4)(w_3 - w_2)} = \frac{(z_1 - z_2)(z_3 - z_4)}{(z_1 - z_4)(z_3 - z_2)}.$$
Version: 3 Owner: Johan Author(s): Johan
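The invariance is exact, so it can be checked for any sample points and any Möbius map (the specific coefficients below are my own illustration):

```python
def mobius(z, a, b, c, d):
    """w = (a z + b) / (c z + d), with a d - b c != 0."""
    return (a * z + b) / (c * z + d)

def cross_ratio(z1, z2, z3, z4):
    return ((z1 - z2) * (z3 - z4)) / ((z1 - z4) * (z3 - z2))

zs = [1 + 2j, -0.5j, 3.0, 2 - 1j]
ws = [mobius(z, 2, 1j, 1, 3) for z in zs]   # ad - bc = 6 - 1j, nonzero
before = cross_ratio(*zs)
after = cross_ratio(*ws)
```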

420.7

Rouché's theorem

Let f, g be analytic on and inside a simple closed curve C. Suppose |f (z)| > |g(z)| on C. Then f and f + g have the same number of zeros inside C. Version: 2 Owner: Johan Author(s): Johan


420.8

absolute convergence implies convergence for an infinite product

If an infinite product is absolutely convergent then it is convergent. Version: 2 Owner: Johan Author(s): Johan

420.9

absolute convergence of infinite product

An infinite product $\prod_{n=1}^{\infty} (1 + a_n)$ is said to be absolutely convergent if $\prod_{n=1}^{\infty} (1 + |a_n|)$ converges.

Version: 4 Owner: mathcam Author(s): mathcam, Johan

420.10

closed curve theorem

Let $U \subset \mathbb{C}$ be a simply connected domain, and suppose $f : U \to \mathbb{C}$ is holomorphic. Then
$$\int_C f(z)\, dz = 0$$
for any smooth closed curve $C$ in $U$. More generally, if $U$ is any domain, and $C_1$ and $C_2$ are two homotopic smooth closed curves in $U$, then
$$\int_{C_1} f(z)\, dz = \int_{C_2} f(z)\, dz$$
for any holomorphic function $f : U \to \mathbb{C}$. Version: 3 Owner: djao Author(s): djao

420.11

conformal Möbius circle map theorem

Any conformal map that maps the interior of the unit disc onto itself is a Möbius transformation. Version: 4 Owner: Johan Author(s): Johan


420.12

conformal mapping

A mapping $f : \mathbb{C} \to \mathbb{C}$ which preserves the size and orientation of the angles (at $z_0$) between any two curves which intersect at a given point $z_0$ is said to be conformal at $z_0$. A mapping that is conformal at every point in a domain $D$ is said to be conformal in $D$. Version: 4 Owner: Johan Author(s): Johan

420.13

conformal mapping theorem

Let f(z) be analytic in a domain D. Then f is conformal at any point z ∈ D where f′(z) ≠ 0. Version: 2 Owner: Johan Author(s): Johan

420.14

convergence/divergence for an infinite product

Consider ∏_{n=1}^∞ p_n. We say that this infinite product converges iff the partial products P_m = ∏_{n=1}^m p_n converge to P ≠ 0, or if at most a finite number of terms vanish, p_{n_k} = 0, k = 1, . . . , K, and the product of the remaining factors converges to a nonzero limit. Otherwise the infinite product is called divergent. Note: the infinite product vanishes only if a factor is zero. Version: 6 Owner: Johan Author(s): Johan

420.15

example of conformal mapping

Consider the four curves A = {t}, B = {t + it}, C = {it} and D = {−t + it}, t ∈ [−10, 10]. Suppose there is a mapping f : C → C which maps A to D and B to C. Is f conformal at z_0 = 0? The size of the angles between A and B at the point of intersection z_0 = 0 is preserved, however the orientation is not. Therefore f is not conformal at z_0 = 0. Now suppose there is a function g : C → C which maps A to C and B to D. In this case we see not only that the size of the angles is preserved, but also the orientation. Therefore g is conformal at z_0 = 0. Version: 3 Owner: Johan Author(s): Johan


420.16

examples of infinite products

A classic example is the Riemann zeta function. For Re(z) > 1 we have

ζ(z) = ∑_{n=1}^∞ 1/n^z = ∏_{p prime} 1/(1 − p^{−z}).

With the help of a Fourier series, or in other ways, one can prove this infinite product expansion of the sine function:

sin z = z ∏_{n=1}^∞ (1 − z²/(n²π²))   (420.16.1)

where z is an arbitrary complex number. Taking the logarithmic derivative (a frequent move in connection with infinite products) we get a decomposition of the cotangent into partial fractions:

π cot πz = 1/z + ∑_{n=1}^∞ (1/(z + n) + 1/(z − n)).   (420.16.2)

The equation (420.16.2), in turn, has some interesting uses, e.g. to get the Taylor expansion of an Eisenstein series, or to evaluate ζ(2n) for positive integers n.

Version: 1 Owner: mathcam Author(s): Larry Hammick
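Both product expansions can be checked numerically. The sketch below is our own illustration; the truncation levels and helper names (`zeta_sum`, `zeta_product`) are arbitrary choices:

```python
import math

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def zeta_sum(s, terms=200000):
    # truncated Dirichlet series for ζ(s)
    return sum(1 / n ** s for n in range(1, terms + 1))

def zeta_product(s):
    # truncated Euler product over the primes listed above
    prod = 1.0
    for p in primes:
        prod *= 1 / (1 - p ** (-s))
    return prod

gap = abs(zeta_sum(4.0) - zeta_product(4.0))

# truncated sine product: sin z ≈ z ∏ (1 - z²/(n²π²))
z = 0.7
partial = z
for n in range(1, 100000):
    partial *= 1 - z * z / (n * n * math.pi * math.pi)
sine_gap = abs(partial - math.sin(z))
```

Both gaps shrink as more primes/factors are included; with these truncations they are already below 10⁻⁵.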

420.17

link between infinite products and sums

Let ∏_{k=1}^∞ p_k be an infinite product such that p_k > 0 for all k. Then the infinite product converges if and only if the infinite sum ∑_{k=1}^∞ log p_k converges. Moreover

∏_{k=1}^∞ p_k = exp(∑_{k=1}^∞ log p_k).

Proof. Simply notice that

∏_{k=1}^N p_k = exp(∑_{k=1}^N log p_k).

If the infinite sum converges then

lim_{N→∞} ∏_{k=1}^N p_k = lim_{N→∞} exp(∑_{k=1}^N log p_k) = exp(∑_{k=1}^∞ log p_k)

and also the infinite product converges. Version: 1 Owner: paolini Author(s): paolini
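The identity ∏ p_k = exp(∑ log p_k) is straightforward to confirm for a concrete convergent product (our own sketch; the choice p_k = 1 + 1/k² is arbitrary):

```python
import math

# positive factors of a convergent infinite product, truncated
a = [1 + 1 / (n * n) for n in range(1, 2000)]

prod = 1.0
for p in a:
    prod *= p

log_sum = sum(math.log(p) for p in a)
gap = abs(prod - math.exp(log_sum))  # should be ~machine precision
```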

420.18

proof of Cauchy integral formula

Let D = {z ∈ C : |z − z_0| < R} be a disk in the complex plane, S ⊂ D a finite subset, and U ⊂ C an open domain that contains the closed disk D̄. Suppose that

• f : U\S → C is holomorphic, and that
• f(z) is bounded on D\S.

Let z ∈ D\S be given, and set

g(ζ) = (f(ζ) − f(z)) / (ζ − z), ζ ∈ D\S′,

where S′ = S ∪ {z}. Note that g(ζ) is holomorphic and bounded on D\S′. The second assertion is true, because g(ζ) → f′(z) as ζ → z. Therefore, by the Cauchy integral theorem

∮_C g(ζ) dζ = 0,

where C is the counterclockwise circular contour parameterized by ζ = z_0 + Re^{it}, 0 ≤ t ≤ 2π. Hence,

∮_C f(ζ)/(ζ − z) dζ = ∮_C f(z)/(ζ − z) dζ.   (420.18.1)

Lemma. If z ∈ C is such that |z| ≠ 1, then

∮_{|ζ|=1} dζ/(ζ − z) = 0 if |z| > 1, and 2πi if |z| < 1.

The proof is a fun exercise in elementary integral calculus, an application of the half-angle trigonometric substitutions. Thanks to the Lemma, the right hand side of (420.18.1) evaluates to 2πi f(z). Dividing through by 2πi, we obtain

f(z) = (1/2πi) ∮_C f(ζ)/(ζ − z) dζ,

as desired.

Since a circle is a compact set, the defining limit for the derivative

(d/dz) f(ζ)/(ζ − z) = f(ζ)/(ζ − z)²

converges uniformly for ζ ∈ ∂D. Thanks to the uniform convergence, the order of the derivative and the integral operations can be interchanged. In this way we obtain the second formula:

f′(z) = (1/2πi) (d/dz) ∮_C f(ζ)/(ζ − z) dζ = (1/2πi) ∮_C f(ζ)/(ζ − z)² dζ.

Version: 9 Owner: rmilson Author(s): rmilson

420.19

proof of Cauchy residue theorem

Since f is holomorphic, by the Cauchy-Riemann equations the differential form f(z) dz is closed. So by the lemma about closed differential forms on a simply connected domain we know that the integral ∫_C f(z) dz is equal to ∫_{C′} f(z) dz if C′ is any curve which is homotopic to C. In particular we can consider a curve C′ which turns around the points a_j along small circles and joins these small circles with segments. Since the curve C′ follows each segment two times with opposite orientation it is enough to sum the integrals of f around the small circles. So letting z = a_j + ρe^{iθ} be a parameterization of the curve around the point a_j, we have dz = ρie^{iθ} dθ and hence

∫_C f(z) dz = ∫_{C′} f(z) dz = ∑_j η(C, a_j) ∫_{∂B_ρ(a_j)} f(z) dz = ∑_j η(C, a_j) ∫_0^{2π} f(a_j + ρe^{iθ}) ρie^{iθ} dθ,

where ρ > 0 is chosen so small that the balls B_ρ(a_j) are all disjoint and all contained in the domain U. So by linearity, it is enough to prove that for all j

i ∫_0^{2π} f(a_j + ρe^{iθ}) ρe^{iθ} dθ = 2πi Res(f, a_j).

Let now j be fixed and consider the Laurent series for f in a_j:

f(z) = ∑_{k∈Z} c_k (z − a_j)^k,

so that Res(f, a_j) = c_{−1}. We have

i ∫_0^{2π} f(a_j + ρe^{iθ}) ρe^{iθ} dθ = i ∫_0^{2π} ∑_k c_k (ρe^{iθ})^k ρe^{iθ} dθ = ∑_k i ρ^{k+1} c_k ∫_0^{2π} e^{i(k+1)θ} dθ.

Notice now that if k = −1 we have

i ρ^{k+1} c_k ∫_0^{2π} e^{i(k+1)θ} dθ = i c_{−1} ∫_0^{2π} dθ = 2πi c_{−1} = 2πi Res(f, a_j),

while for k ≠ −1 we have

∫_0^{2π} e^{i(k+1)θ} dθ = [e^{i(k+1)θ} / (i(k+1))]_0^{2π} = 0.

Hence the result follows. Version: 2 Owner: paolini Author(s): paolini

420.20

proof of Gauss’ mean value theorem

We can parametrize the circle by letting z = z_0 + re^{iφ}. Then dz = ire^{iφ} dφ. Using the Cauchy integral formula we can express f(z_0) in the following way:

f(z_0) = (1/2πi) ∮_C f(z)/(z − z_0) dz = (1/2πi) ∫_0^{2π} (f(z_0 + re^{iφ}) / re^{iφ}) ire^{iφ} dφ = (1/2π) ∫_0^{2π} f(z_0 + re^{iφ}) dφ.

Version: 12 Owner: Johan Author(s): Johan

420.21

proof of Goursat’s theorem

We argue by contradiction. Set

η = ∮_{∂R} f(z) dz,

and suppose that η ≠ 0. Divide R into four congruent rectangles R_1, R_2, R_3, R_4 (see Figure 1), and set

η_i = ∮_{∂R_i} f(z) dz.

Figure 1: subdivision of the rectangle contour.

Now subdivide each of the four sub-rectangles, to get 16 congruent sub-sub-rectangles R_{i_1 i_2}, i_1, i_2 = 1 . . . 4, and then continue ad infinitum to obtain a sequence of nested families of rectangles R_{i_1 ... i_k}, with η_{i_1 ... i_k} the values of f(z) integrated along the corresponding contour. Orienting the boundary of R and all the sub-rectangles in the usual counter-clockwise fashion we have

η = η_1 + η_2 + η_3 + η_4,

and more generally

η_{i_1 ... i_k} = η_{i_1 ... i_k 1} + η_{i_1 ... i_k 2} + η_{i_1 ... i_k 3} + η_{i_1 ... i_k 4}.

In as much as the integrals along oppositely oriented line segments cancel, the contributions from the interior segments cancel, and that is why the right-hand side reduces to the integrals along the segments at the boundary of the composite rectangle.

Let j_1 ∈ {1, 2, 3, 4} be such that |η_{j_1}| is the maximum of |η_i|, i = 1, . . . , 4. By the triangle inequality we have

|η_1| + |η_2| + |η_3| + |η_4| ≥ |η|,

and hence

|η_{j_1}| ≥ (1/4)|η|.

Continuing inductively, let j_{k+1} be such that |η_{j_1 ... j_k j_{k+1}}| is the maximum of |η_{j_1 ... j_k i}|, i = 1, . . . , 4. We then have

|η_{j_1 ... j_k j_{k+1}}| ≥ 4^{−(k+1)} |η|.   (420.21.1)

Now the sequence of nested rectangles R_{j_1 ... j_k} converges to some point z_0 ∈ R; more formally

{z_0} = ⋂_{k=1}^∞ R_{j_1 ... j_k}.

The derivative f′(z_0) is assumed to exist, and hence for every ε > 0 there exists a k sufficiently large, so that for all z ∈ R_{j_1 ... j_k} we have

|f(z) − f(z_0) − f′(z_0)(z − z_0)| ≤ ε|z − z_0|.

Now we make use of the following.

Lemma 9. Let Q ⊂ C be a rectangle, let a, b ∈ C, and let f(z) be a continuous, complex valued function defined and bounded in a domain containing Q. Then,

∮_{∂Q} (az + b) dz = 0,

|∮_{∂Q} f(z) dz| ≤ MP,

where M is an upper bound for |f(z)| and where P is the length of ∂Q.

The first of these assertions follows by the fundamental theorem of calculus; after all, the function az + b has an anti-derivative. The second assertion follows from the fact that the absolute value of an integral is smaller than the integral of the absolute value of the integrand, a standard result in integration theory.

Applying the lemma to f(z) − f(z_0) − f′(z_0)(z − z_0), whose integral over ∂R_{j_1 ... j_k} equals η_{j_1 ... j_k} because the affine part integrates to zero, and using the fact that the perimeter of a rectangle is greater than its diameter, we infer that for every ε > 0 there exists a k sufficiently large that

|η_{j_1 ... j_k}| = |∮_{∂R_{j_1 ... j_k}} f(z) dz| ≤ ε|∂R_{j_1 ... j_k}|² = ε 4^{−k} |∂R|²,

where |∂R| denotes the length of the perimeter of the rectangle R. This contradicts the earlier estimate (420.21.1). Therefore η = 0. Version: 10 Owner: rmilson Author(s): rmilson

420.22

proof of Möbius circle transformation theorem

Case 1: f(z) = az + b.

Case 1a: The points on |z − C| = R can be written as z = C + Re^{iθ}. They are mapped to the points w = aC + b + aRe^{iθ}, which all lie on the circle |w − (aC + b)| = |a|R.

Case 1b: The line Re(e^{iθ}z) = k is mapped to the line Re(e^{iθ}w/a) = k + Re(e^{iθ}b/a).

Case 2: f(z) = 1/z.

Case 2a: Consider a circle passing through the origin. This can be written as |z − C| = |C|. This circle is mapped to the line Re(Cw) = 1/2, which does not pass through the origin. To show this, write z = C + |C|e^{iθ} and w = 1/z = 1/(C + |C|e^{iθ}). Then

Re(Cw) = (1/2)(Cw + conj(Cw)) = (1/2)( C/(C + |C|e^{iθ}) + C̄/(C̄ + |C|e^{−iθ}) ) = (1/2)( C/(C + |C|e^{iθ}) + |C|e^{iθ}/(|C|e^{iθ} + C) ) = 1/2,

where the middle step multiplies the second fraction through by e^{iθ}C/|C|.

Case 2b: Consider a line which does not pass through the origin. This can be written as Re(az) = 1 for a ≠ 0. Then az + āz̄ = 2, which is mapped to a/w + ā/w̄ = 2. This is simplified as aw̄ + āw = 2ww̄, which becomes (w − ā/2)(w̄ − a/2) = aā/4, or |w − ā/2| = |a|/2, which is a circle passing through the origin.

Case 2c: Consider a circle which does not pass through the origin. This can be written as |z − C| = R with |C| ≠ R. This circle is mapped to the circle

|w − C̄/(|C|² − R²)| = R/||C|² − R²|,

which is another circle not passing through the origin. To show this, note that for |z − C| = R,

1/z − C̄/(|C|² − R²) = (|C|² − R² − C̄z) / (z(|C|² − R²)) = (CC̄ − (z − C)(z̄ − C̄) − C̄z) / (z(|C|² − R²)) = z̄(C − z) / (z(|C|² − R²)),

and taking absolute values gives |1/z − C̄/(|C|² − R²)| = |z|R / (|z| ||C|² − R²|) = R/||C|² − R²|.

Case 2d: Consider a line passing through the origin. This can be written as Re(e^{iθ}z) = 0. This is mapped to the set Re(e^{iθ}/w) = 0, which can be rewritten as Re(e^{iθ}w̄) = 0, or Re(we^{−iθ}) = 0, which is another line passing through the origin.

Case 3: An arbitrary Möbius transformation can be written as f(z) = (az + b)/(cz + d). If c = 0, this falls into Case 1, so we will assume that c ≠ 0. Let

f_1(z) = cz + d,  f_2(z) = 1/z,  f_3(z) = ((bc − ad)/c) z + a/c.

Then f = f_3 ∘ f_2 ∘ f_1. By Case 1, f_1 and f_3 map circles to circles, and by Case 2, f_2 maps circles to circles. Version: 2 Owner: brianbirgen Author(s): brianbirgen

420.23

proof of Simultaneous converging or diverging of product and sum theorem

From the fact that 1 + x ≤ e^x for x ≥ 0 we get

∑_{n=1}^m a_n ≤ ∏_{n=1}^m (1 + a_n) ≤ e^{∑_{n=1}^m a_n}.

Since a_n ≥ 0, both the partial sums and the partial products are monotone increasing with the number of terms. This concludes the proof. Version: 2 Owner: Johan Author(s): Johan

420.24

proof of absolute convergence implies convergence for an infinite product

This comes at once from the link between infinite products and sums and the absolute convergence theorem for infinite sums. Version: 1 Owner: paolini Author(s): paolini

420.25

proof of closed curve theorem

Let f(x + iy) = u(x, y) + iv(x, y). Hence we have

∫_C f(z) dz = ∫_C ω + i ∫_C η,

where ω and η are the differential forms

ω = u(x, y) dx − v(x, y) dy,  η = v(x, y) dx + u(x, y) dy.

Notice that by the Cauchy-Riemann equations ω and η are closed differential forms. Hence by the lemma on closed differential forms on a simply connected domain we get

∫_{C_1} ω = ∫_{C_2} ω,  ∫_{C_1} η = ∫_{C_2} η,

and hence

∫_{C_1} f(z) dz = ∫_{C_2} f(z) dz.

Version: 2 Owner: paolini Author(s): paolini

420.26

proof of conformal Möbius circle map theorem

Let f be a conformal map from the unit disk ∆ onto itself. Let a = f(0), and let

g_a(z) = (z − a)/(1 − āz).

Then g_a ∘ f is a conformal map from ∆ onto itself, with g_a ∘ f(0) = 0. Therefore, by Schwarz's lemma, for all z ∈ ∆

|g_a ∘ f(z)| ≤ |z|.

Because f is a conformal map onto ∆, f^{−1} is also a conformal map of ∆ onto itself. (g_a ∘ f)^{−1}(0) = 0, so that by Schwarz's lemma |(g_a ∘ f)^{−1}(w)| ≤ |w| for all w ∈ ∆. Writing w = g_a ∘ f(z) this becomes |z| ≤ |g_a ∘ f(z)|. Therefore, for all z ∈ ∆

|g_a ∘ f(z)| = |z|.

By Schwarz's lemma, g_a ∘ f is a rotation. Write g_a ∘ f(z) = e^{iθ}z, or f(z) = g_a^{−1}(e^{iθ}z). Therefore, f is a Möbius transformation. Version: 2 Owner: brianbirgen Author(s): brianbirgen

420.27

simultaneous converging or diverging of product and sum theorem

Let a_n ≥ 0. Then

∏_{n=1}^∞ (1 + a_n) and ∑_{n=1}^∞ a_n

converge or diverge simultaneously.

Version: 3 Owner: Johan Author(s): Johan

420.28

Cauchy-Riemann equations

The following system of partial differential equations

∂u/∂x = ∂v/∂y,  ∂u/∂y = −∂v/∂x,

where u(x, y), v(x, y) are real-valued functions defined on some open subset of R², was introduced by Riemann[1] as a definition of a holomorphic function. Indeed, if f(z) satisfies the standard definition of a holomorphic function, i.e. if the complex derivative

f′(z) = lim_{ζ→0} (f(z + ζ) − f(z))/ζ

exists in the domain of definition, then the real and imaginary parts of f(z) satisfy the Cauchy-Riemann equations. Conversely, if u and v satisfy the Cauchy-Riemann equations, and if their partial derivatives are continuous, then the complex valued function

f(z) = u(x, y) + iv(x, y),  z = x + iy,

possesses a continuous complex derivative.

References

1. D. Laugwitz, Bernhard Riemann, 1826-1866: Turning points in the Conception of Mathematics, translated by Abe Shenitzer. Birkhäuser, 1999.

Version: 2 Owner: rmilson Author(s): rmilson
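The equations are easy to spot-check numerically for a concrete holomorphic function. The sketch below (our own illustration; `partials` is a hypothetical helper) approximates the partial derivatives of u and v for f(z) = exp(z) by central differences:

```python
import cmath

def partials(F, x, y, h=1e-6):
    # central finite differences of F(x, y) with respect to x and y
    fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return fx, fy

# real and imaginary parts of the holomorphic function f(z) = exp(z)
u = lambda x, y: cmath.exp(complex(x, y)).real
v = lambda x, y: cmath.exp(complex(x, y)).imag

x0, y0 = 0.4, -0.3
ux, uy = partials(u, x0, y0)
vx, vy = partials(v, x0, y0)
r1 = abs(ux - vy)   # ∂u/∂x = ∂v/∂y
r2 = abs(uy + vx)   # ∂u/∂y = -∂v/∂x
```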

420.29

Cauchy-Riemann equations (polar coordinates)

Suppose A is an open set in C and f(z) = f(re^{iθ}) = u(r, θ) + iv(r, θ) : A ⊂ C → C is a function. If the derivative of f(z) exists at z_0 = (r_0, θ_0), then the functions u, v at z_0 satisfy

∂u/∂r = (1/r) ∂v/∂θ,  ∂v/∂r = −(1/r) ∂u/∂θ,

which are called the Cauchy-Riemann equations in polar form.

Version: 4 Owner: Daume Author(s): Daume

420.30

proof of the Cauchy-Riemann equations

Existence of complex derivative implies the Cauchy-Riemann equations. Suppose that the complex derivative

f′(z) = lim_{ζ→0} (f(z + ζ) − f(z))/ζ   (420.30.1)

exists for some z ∈ C. This means that for all ε > 0, there exists a ρ > 0, such that for all complex ζ with |ζ| < ρ, we have

|f′(z) − (f(z + ζ) − f(z))/ζ| < ε.

Henceforth, set

f = u + iv,  z = x + iy.

If ζ is real, then the above limit reduces to a partial derivative in x, i.e.

f′(z) = ∂f/∂x = ∂u/∂x + i ∂v/∂x.

Taking the limit with an imaginary ζ we deduce that

f′(z) = −i ∂f/∂y = −i ∂u/∂y + ∂v/∂y.

Therefore

∂f/∂x = −i ∂f/∂y,

and breaking this relation up into its real and imaginary parts gives the Cauchy-Riemann equations.

The Cauchy-Riemann equations imply the existence of a complex derivative. Suppose that the Cauchy-Riemann equations

∂u/∂x = ∂v/∂y,  ∂u/∂y = −∂v/∂x,

hold for a fixed (x, y) ∈ R², and that all the partial derivatives are continuous at (x, y) as well. The continuity implies that all directional derivatives exist as well. In other words, for ξ, η ∈ R and ρ = √(ξ² + η²) we have

|u(x + ξ, y + η) − u(x, y) − (ξ ∂u/∂x + η ∂u/∂y)| / ρ → 0, as ρ → 0,

with a similar relation holding for v(x, y). Combining the two scalar relations into a vector relation we obtain

ρ^{−1} ‖ (u(x + ξ, y + η), v(x + ξ, y + η)) − (u(x, y), v(x, y)) − (ξ ∂u/∂x + η ∂u/∂y, ξ ∂v/∂x + η ∂v/∂y) ‖ → 0, as ρ → 0.

Note that the Cauchy-Riemann equations imply that the matrix-vector product above is equivalent to the product of two complex numbers, namely

(∂u/∂x + i ∂v/∂x)(ξ + iη).

Setting

f(z) = u(x, y) + iv(x, y),  f′(z) = ∂u/∂x + i ∂v/∂x,  ζ = ξ + iη,

we can therefore rewrite the above limit relation as

|f(z + ζ) − f(z) − f′(z)ζ| / |ζ| → 0, as ρ → 0,

which is the complex limit definition of f′(z) shown in (420.30.1). Version: 2 Owner: rmilson Author(s): rmilson

420.31

removable singularity

Let U ⊂ C be an open neighbourhood of a point a ∈ C. We say that a function f : U\{a} → C has a removable singularity at a, if the complex derivative f′(z) exists for all z ≠ a, and if f(z) is bounded near a. Removable singularities can, as the name suggests, be removed.

Theorem 13. Suppose that f : U\{a} → C has a removable singularity at a. Then f(z) can be holomorphically extended to all of U, i.e. there exists a holomorphic g : U → C such that g(z) = f(z) for all z ≠ a.

Proof. Let C be a circle centered at a, oriented counterclockwise, and sufficiently small so that C and its interior are contained in U. For z in the interior of C, set

g(z) = (1/2πi) ∮_C f(ζ)/(ζ − z) dζ.

Since C is a compact set, the defining limit for the derivative

(d/dz) f(ζ)/(ζ − z) = f(ζ)/(ζ − z)²

converges uniformly for ζ ∈ C. Thanks to the uniform convergence, the order of the derivative and the integral operations can be interchanged. Hence, we may deduce that g′(z) exists for all z in the interior of C. Furthermore, by the Cauchy integral formula we have that f(z) = g(z) for all z ≠ a, and therefore g(z) furnishes us with the desired extension. Version: 2 Owner: rmilson Author(s): rmilson


Chapter 421 30F40 – Kleinian groups 421.1

Klein 4-group

Any group G of order 4 must be abelian. If G isn't isomorphic to the cyclic group of order 4, C_4, then it must be isomorphic to Z_2 ⊕ Z_2. This group is known as the Klein 4-group. The operation is the operation induced by Z_2, taken coordinate-wise. Version: 3 Owner: drini Author(s): drini, apmxi
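A direct check of the Z_2 ⊕ Z_2 structure (our own sketch, not part of the entry): the group is abelian, and every element squares to the identity, so no element has order 4.

```python
from itertools import product

# Z2 ⊕ Z2 with coordinate-wise addition mod 2
V = list(product([0, 1], repeat=2))

def op(a, b):
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

abelian = all(op(a, b) == op(b, a) for a in V for b in V)
# every element is its own inverse, hence the group is not cyclic of order 4
involutive = all(op(a, a) == (0, 0) for a in V)
```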


Chapter 422 31A05 – Harmonic, subharmonic, superharmonic functions 422.1

a harmonic function on a graph which is bounded below and nonconstant

There exists no harmonic function on all of the d-dimensional grid Z^d which is bounded below and nonconstant. This characterises a particular property of the grid; below we see that other graphs can admit such harmonic functions. Let T_3 = (V_3, E_3) be a 3-regular tree. Assign "levels" to the vertices of T_3 as follows: fix a vertex o ∈ V_3, and let π be a branch of T_3 (an infinite simple path) from o. For every vertex v ∈ V_3 of T_3 there exists a unique shortest path from v to a vertex of π; let ℓ(v) be the length of this path. Now define f(v) = 2^{−ℓ(v)} > 0. Without loss of generality, note that the three neighbours u_1, u_2, u_3 of v satisfy ℓ(u_1) = ℓ(v) − 1 ("u_1 is the parent of v"), ℓ(u_2) = ℓ(u_3) = ℓ(v) + 1 ("u_2, u_3 are the children of v"). And indeed,

(1/3)(2^{−(ℓ(v)−1)} + 2^{−(ℓ(v)+1)} + 2^{−(ℓ(v)+1)}) = 2^{−ℓ(v)}.

So f is a positive nonconstant harmonic function on T_3. Version: 2 Owner: drini Author(s): ariels
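The averaging identity at the heart of this example can be confirmed mechanically (our own sketch): at a vertex of level l, one neighbour has level l − 1 and two have level l + 1, and the three values of 2^{−level} average back to 2^{−l}.

```python
f = lambda level: 2.0 ** (-level)

# average of the neighbour values versus the value at the vertex itself
devs = [abs(f(l) - (f(l - 1) + 2 * f(l + 1)) / 3) for l in range(0, 20)]
max_dev = max(devs)  # exactly zero: (2·2^{-l} + 2^{-l}) / 3 = 2^{-l}
```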

422.2

example of harmonic functions on graphs

1. Let G = (V, E) be a connected finite graph, and let a, z ∈ V be two of its vertices. The function

f(v) = P{simple random walk from v hits a before z}

is a harmonic function except on {a, z}.

Finiteness of G is required only to ensure f is well-defined. So we may replace "G finite" with "simple random walk on G is recurrent".

2. Let G = (V, E) be a graph, and let V′ ⊆ V. Let α : V′ → R be some boundary condition. For u ∈ V, define a random variable X_u to be the first vertex of V′ that simple random walk from u hits. The function

f(v) = E[α(X_v)]

is a harmonic function except on V′. The first example is a special case of this one, taking V′ = {a, z} and α(a) = 1, α(z) = 0. Version: 1 Owner: ariels Author(s): ariels

422.3

examples of harmonic functions on Rn

Some real functions on R^n (e.g. any linear function, or any affine function) are obviously harmonic functions. What are some more interesting harmonic functions?

• For n ≥ 3, define (on the punctured space U = R^n \ {0}) the function f(x) = ‖x‖^{2−n}. Then

∂f/∂x_i = (2 − n) x_i / ‖x‖^n,

and

∂²f/∂x_i² = n(n − 2) x_i² / ‖x‖^{n+2} − (n − 2) / ‖x‖^n.

Summing over i = 1, ..., n shows ∆f ≡ 0.

• For n = 2, define (on the punctured plane U = R² \ {0}) the function f(x, y) = log(x² + y²). Differentiating and summing yield ∆f ≡ 0.

• For n = 1, the condition (∆f)(x) = f′′(x) ≡ 0 forces f to be an affine function on every segment; there are no "interesting" harmonic functions in one dimension.

Version: 2 Owner: ariels Author(s): ariels
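Both examples can be checked with a finite-difference Laplacian (our own sketch; `laplacian` is a hypothetical helper, and the evaluation points are arbitrary points away from the origin):

```python
import math

def laplacian(F, p, h=1e-4):
    # second-order central-difference Laplacian of F at the point p
    total = 0.0
    for i in range(len(p)):
        q_plus = list(p); q_plus[i] += h
        q_minus = list(p); q_minus[i] -= h
        total += (F(q_plus) - 2 * F(p) + F(q_minus)) / (h * h)
    return total

# n = 3: f(x) = |x|^{2-n} = 1/|x|
f3 = lambda p: 1.0 / math.sqrt(sum(t * t for t in p))
# n = 2: f(x, y) = log(x² + y²)
f2 = lambda p: math.log(p[0] ** 2 + p[1] ** 2)

lap3 = laplacian(f3, [0.7, -0.4, 1.1])  # should be ~0
lap2 = laplacian(f2, [0.5, 0.8])        # should be ~0
```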


422.4

harmonic function

• A real or complex-valued function f : U → R or f : U → C in C² (i.e. f is twice continuously differentiable), where U ⊆ R^n is some domain, is called harmonic if its Laplacian vanishes on U: ∆f ≡ 0.

• A real or complex-valued function f : V → R or f : V → C defined on the vertices V of a graph G = (V, E) is called harmonic at v ∈ V if its value at v is its average value at the neighbours of v:

f(v) = (1/deg(v)) ∑_{{u,v}∈E} f(u).

It is called harmonic except on A, for some A ⊆ V, if it is harmonic at each v ∈ V\A, and harmonic if it is harmonic at each v ∈ V. In the continuous (first) case, any harmonic f : R^n → R or f : R^n → C satisfies Liouville's theorem. Indeed, a holomorphic function is harmonic, and a real harmonic function f : U → R, where U ⊆ R², is locally the real part of a holomorphic function. In fact, it is enough that a harmonic function f be bounded below (or above) to conclude that it is constant. In the discrete (second) case, any harmonic f : Z^n → R, where Z^n is the n-dimensional grid, is constant if bounded below (or above). However, this is not necessarily true on other graphs. Version: 4 Owner: ariels Author(s): ariels


Chapter 423 31B05 – Harmonic, subharmonic, superharmonic functions 423.1

Laplacian

Let (x_1, . . . , x_n) be Cartesian coordinates for some open set Ω in R^n. Then the Laplacian differential operator ∆ is defined as

∆ = ∂²/∂x_1² + . . . + ∂²/∂x_n².

In other words, if f is a twice differentiable function f : Ω → C, then

∆f = ∂²f/∂x_1² + . . . + ∂²f/∂x_n².

A coordinate independent definition of the Laplacian is ∆ = ∇ · ∇, i.e., ∆ is the composition of gradient and divergence. A harmonic function is one for which the Laplacian vanishes. An older symbol for the Laplacian is ∇2 – conceptually the scalar product of ∇ with itself. This form may be more favoured by physicists. Version: 4 Owner: matte Author(s): matte, ariels


Chapter 424 32A05 – Power series, series of functions 424.1

exponential function

We begin by defining the exponential function exp : R → R^+ for all real values of x by the power series

exp(x) = ∑_{k=0}^∞ x^k / k!

It has a few elementary properties, which can be easily shown.

• The radius of convergence is infinite
• exp(0) = 1
• It is infinitely differentiable, and the derivative is the exponential function itself
• exp(x) ≥ 1 + x, so it is positive and unbounded on the non-negative reals

Now consider the function f : R → R with f(x) = exp(x) exp(y − x), so, by the product rule and property 3,

f′(x) = 0.

By the constant value theorem

exp(x) exp(y − x) = exp(y)  ∀ y, x ∈ R.

With a suitable change of variables, we have

exp(x + y) = exp(x) exp(y)
exp(x) exp(−x) = 1

Consider just the non-negative reals. Since exp is unbounded, by the intermediate value theorem it can take any value on the interval [1, ∞). We have that the derivative is strictly positive, so by the mean-value theorem exp(x) is strictly increasing. This gives surjectivity and injectivity, i.e. it is a bijection from [0, ∞) → [1, ∞).

Now exp(−x) = 1/exp(x), so it is also a bijection from (−∞, 0) → (0, 1). Therefore we can say that:

exp(x) is a bijection onto R^+.

We can now naturally define the logarithm function as the inverse of the exponential function. It is usually denoted by ln(x), and it maps R^+ to R. Similarly, the natural log base e may be defined by

e = exp(1).

Since the exponential function obeys the rules normally associated with powers, it is often denoted by e^x. In fact it is now possible to define powers in terms of the exponential function by

a^x = e^{x ln(a)},  a > 0.

Note the domain may be extended to the complex plane, with all the same properties as before except the bijectivity and ordering properties. Comparison with the power series expansions for sine and cosine yields the following identity, with the famous corollary attributed to Euler:

e^{ix} = cos(x) + i sin(x)
e^{iπ} = −1

Version: 10 Owner: mathcam Author(s): mathcam, vitriol
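The power series, the addition rule, and Euler's identity can all be checked numerically (our own sketch; `exp_series` is a hypothetical helper computing a partial sum of the defining series):

```python
import math
import cmath

def exp_series(x, terms=60):
    # partial sum of ∑_{k≥0} x^k / k!
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= x / (k + 1)
    return total

# addition rule exp(x) exp(y) = exp(x + y)
add_rule = abs(exp_series(1.3) * exp_series(0.9) - exp_series(2.2))
# Euler's identity e^{iπ} = -1
euler = abs(cmath.exp(1j * math.pi) + 1)
```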


Chapter 425 32C15 – Complex spaces 425.1

Riemann sphere

The Riemann sphere, denoted Ĉ, is the one-point compactification of the complex plane C, obtained by identifying the limits of all infinitely extending rays from the origin as one single "point at infinity." Heuristically, Ĉ can be viewed as a 2-sphere with the top point corresponding to the point at infinity, and the bottom point corresponding to the origin. An atlas for the Riemann sphere is given by two charts:

Ĉ \ {∞} → C : z → z

and

Ĉ \ {0} → C : z → 1/z.

Any polynomial on C has a unique smooth extension to a map p̂ : Ĉ → Ĉ. Version: 2 Owner: mathcam Author(s): mathcam


Chapter 426 32F99 – Miscellaneous 426.1

star-shaped region

Definition A subset U of a real (or possibly complex) vector space is called star-shaped if there is a point p ∈ U such that the line segment pq is contained in U for all q ∈ U. We then say that U is star-shaped with respect to p. (Here, pq = {tp + (1 − t)q | t ∈ [0, 1]}.) A region U is, in other words, star-shaped if there is a point p ∈ U such that U can be "collapsed" or "contracted" onto p.

Examples

1. In R^n, any vector subspace is star-shaped. Also, the unit cube and unit ball are star-shaped, but the unit sphere is not.

2. A subset U in a vector space is star-shaped with respect to all its points if and only if U is convex.

Version: 2 Owner: matte Author(s): matte


Chapter 427 32H02 – Holomorphic mappings, (holomorphic) embeddings and related questions 427.1

Bloch’s theorem

Let f be a holomorphic function on a region containing the closure of the disk D = {z ∈ C : |z| < 1}, such that f(0) = 0 and f′(0) = 1. Then there is a disk S ⊂ D such that f is injective on S and f(S) contains a disk of radius 1/72. Version: 2 Owner: Koro Author(s): Koro

427.2

Hartogs' theorem

Let U ⊂ Cn (n > 1) be an open set containing the origin 0. Then any holomorphic function on U − {0} extends uniquely to a holomorphic function on U. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 428 32H25 – Picard-type theorems and generalizations 428.1

Picard’s theorem

Let f be a holomorphic function with an essential singularity at w ∈ C. Then there is a number z_0 ∈ C such that the image of any neighborhood of w by f contains C − {z_0}. In other words, f assumes every complex value, with the possible exception of z_0, in any neighborhood of w. Remark. The little Picard theorem follows as a corollary: Given a nonconstant entire function f, if it is a polynomial, it assumes every value in C as a consequence of the fundamental theorem of algebra. If f is not a polynomial, then g(z) = f(1/z) has an essential singularity at 0; Picard's theorem implies that g (and thus f) assumes every complex value, with one possible exception. Version: 4 Owner: Koro Author(s): Koro

428.2

little Picard theorem

The range of a nonconstant entire function is either the whole complex plane C, or the complex plane with a single point removed. In other words, if an entire function omits two or more values, then it is a constant function. Version: 2 Owner: Koro Author(s): Koro


Chapter 429 33-XX – Special functions 429.1

beta function

The beta function is defined as: B(p, q) = int10 xp−1 (1 − x)q−1 dx for any p, q > 0 The beta fuction has the property:

B(p, q) =

Γ(p)Γ(q) Γ(p + q)

where Γ is the gamma function Also, B(p, q) = B(q, p) and 1 1 B( , ) = π 2 2 The function was discovered by L.Euler (1730) and the name was given by J.Binet. Version: 8 Owner: vladm Author(s): vladm
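The gamma-function identity and the symmetry are easy to verify numerically (our own sketch; `beta_numeric` is a hypothetical helper using a simple midpoint rule, and the parameter values are arbitrary with p, q > 1 so the integrand stays bounded):

```python
import math

def beta_numeric(p, q, n=200000):
    # midpoint rule for ∫₀¹ x^{p-1} (1-x)^{q-1} dx
    h = 1.0 / n
    return sum(((k + 0.5) * h) ** (p - 1) * (1 - (k + 0.5) * h) ** (q - 1)
               for k in range(n)) * h

p, q = 2.5, 3.5
gap = abs(beta_numeric(p, q)
          - math.gamma(p) * math.gamma(q) / math.gamma(p + q))
sym = abs(beta_numeric(p, q) - beta_numeric(q, p))
```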


Chapter 430 33B10 – Exponential and trigonometric functions 430.1

natural logarithm

The natural logarithm of a number is the logarithm in base e. It is defined formally as

ln(x) = ∫_1^x dt/t.

The origin of the natural logarithm, the exponential function and Euler's number e are very much intertwined. The integral above was found to have the properties of a logarithm. You can view these properties in the entry on logarithms. If indeed the integral represented a logarithmic function, its base would have to be e, where the value of the integral is 1. Thus was the natural logarithm defined. The natural logarithm can be represented by a power series for −1 < x ≤ 1:

ln(1 + x) = ∑_{k=1}^∞ ((−1)^{k+1} / k) x^k.

Note that the above is only the definition of a logarithm for real numbers greater than zero. For complex and negative numbers, one has to look at the Euler relation. Version: 3 Owner: mathwizard Author(s): mathwizard, slider142


Chapter 431 33B15 – Gamma, beta and polygamma functions 431.1

Bohr-Mollerup theorem

Let f : R+ → R+ be a function with the following properties: 1. log f (x) is a convex function; 2. f (x + 1) = xf (x) for all x > 0; 3. f (1) = 1. Then f (x) = Γ(x) for all x > 0. That is, the only function satisfying those properties is the gamma function (restricted to the positive reals.) Version: 1 Owner: Koro Author(s): Koro

431.2

gamma function

The gamma function is

Γ(x) = ∫_0^∞ e^{−t} t^{x−1} dt.

For integer values of x = n,

Γ(n) = (n − 1)!

Hence the gamma function satisfies Γ(x + 1) = xΓ(x) if x > 0.

The gamma function looks like: (figure omitted; generated by GNU Octave and gnuplot)

Some values of the gamma function for small arguments are:

Γ(1/5) = 4.5909  Γ(1/4) = 3.6256
Γ(1/3) = 2.6789  Γ(2/5) = 2.2182
Γ(3/5) = 1.4892  Γ(2/3) = 1.3541
Γ(3/4) = 1.2254  Γ(4/5) = 1.1642

and the ever-useful Γ(1/2) = √π. These values allow a quick calculation of Γ(n + f), where n is a natural number and f is any fractional value for which the gamma function's value is known. Since Γ(x + 1) = xΓ(x), we have

Γ(n + f) = (n + f − 1)Γ(n + f − 1) = (n + f − 1)(n + f − 2)Γ(n + f − 2) = . . . = (n + f − 1)(n + f − 2) · · · (f)Γ(f),

which is easy to calculate if we know Γ(f).

The gamma function has a meromorphic continuation to the entire complex plane with poles at the non-positive integers. It satisfies the product formula

Γ(z) = (e^{−γz}/z) ∏_{n=1}^∞ (1 + z/n)^{−1} e^{z/n},

where γ is Euler's constant, and the functional equation

Γ(z)Γ(1 − z) = π / sin πz.

Version: 8 Owner: akrowne Author(s): akrowne
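Python's standard library exposes this function as `math.gamma`, which makes the stated properties easy to spot-check (our own sketch; the sample points are arbitrary):

```python
import math

# Γ(n) = (n-1)! for positive integers
fact_ok = all(abs(math.gamma(n) - math.factorial(n - 1))
              <= 1e-9 * math.factorial(n - 1) for n in range(1, 10))

# Γ(1/2) = √π
half = abs(math.gamma(0.5) - math.sqrt(math.pi))

# recurrence Γ(x+1) = x Γ(x), the rule used to reduce Γ(n + f) to Γ(f)
x = 0.3
rec = abs(math.gamma(x + 1) - x * math.gamma(x))

# reflection formula Γ(z)Γ(1-z) = π / sin(πz)
z = 0.3
refl = abs(math.gamma(z) * math.gamma(1 - z) - math.pi / math.sin(math.pi * z))
```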

431.3

proof of Bohr-Mollerup theorem

We prove this theorem in two stages: first, we establish that the gamma function satisfies the given conditions and then we prove that these conditions uniquely determine a function on (0, ∞). By its definition, Γ(x) is positive for positive x. Let x, y > 0 and 0 6 λ 6 1.

log Γ(λx + (1 − λ)y) = = 6 =

−t λx+(1−λ)y−1 log int∞ dt 0 e t ∞ −t x−1 λ −t y−1 1−λ log int0 (e t ) (e t ) dt −t x−1 −t y−1 log((int∞ dt)λ (int∞ dt)1−λ ) 0 e t 0 e t λ log Γ(x) + (1 − λ) log Γ(y)

The inequality follows from H¨older’s inequality, where p =

1 λ

and q =

1 . 1−λ

This proves that Γ is log-convex. Condition 2 follows from the definition by applying integration by parts. Condition 3 is a trivial verification from the definition. Now we show that the 3 conditions uniquely determine a function. By condition 2, it suffices to show that the conditions uniquely determine a function on (0, 1). Let G be a function satisfying the 3 conditions, 0 6 x 6 1 and n ∈ N. n + x = (1 − x)n + x(n + 1) and by log-convexity of G, G(n + x) 6 G(n)1−x G(n + 1)x = G(n)1−x G(n)x nx = (n − 1)!nx . Similarly n + 1 = x(n + x) + (1 − x)(n + 1 + x) gives n! 6 G(n + x)(n + x)1−x . Combining these two we get n!(n + x)x−1 6 G(n + x) 6 (n − 1)!xn and by using condition 2 to express G(n + x) in terms of G(x) we find

1645

an :=

n!(n + x)x−1 (n − 1)!xn 6 G(x) 6 =: bn . x(x + 1) . . . (x + n − 1) x(x + 1) . . . (x + n − 1)

Now these inequalities hold for every integer n, and the terms on the left and right side have a common limit (lim_{n→∞} a_n/b_n = 1), so we find that this determines G. As a corollary we find another expression for Γ. For 0 ≤ x ≤ 1,
\[
\Gamma(x) = \lim_{n\to\infty}\frac{n!\,n^x}{x(x+1)\cdots(x+n)}.
\]

In fact this equation, called Gauß's product, holds on the whole complex plane minus the non-positive integers. Version: 1 Owner: lieven Author(s): lieven
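Gauss's limit formula lends itself to a quick numerical check (a sketch; the function name `gauss_gamma` and the truncation level n are our own choices, and logarithms are used to keep n! from overflowing):

```python
import math

def gauss_gamma(x, n=100000):
    """Approximate Gamma(x) by the truncated Gauss product n! n^x / (x(x+1)...(x+n))."""
    log_val = sum(math.log(k) for k in range(1, n + 1))   # log n!
    log_val += x * math.log(n)
    log_val -= sum(math.log(x + k) for k in range(n + 1))  # log of the denominator
    return math.exp(log_val)

# converges (slowly) to Gamma(x); e.g. Gamma(1/2) = sqrt(pi)
print(gauss_gamma(0.5), math.sqrt(math.pi))
```

The convergence is only O(1/n), so a fairly large n is needed even for a few digits.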


Chapter 432 33B30 – Higher logarithm functions 432.1

Lambert W function

Lambert's W function is the inverse of the function f : C → C given by f(x) := xe^x. That is, W(x) is the complex-valued function that satisfies W(x)e^{W(x)} = x, for all x ∈ C. In practice the definition of W(x) requires a branch cut, which is usually taken along the negative real axis. Lambert's W function is sometimes also called the product log function. This function allows us to solve the functional equation g(x)^{g(x)} = x, since g(x) = e^{W(ln(x))}.
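On the real axis W can be computed by a short Newton iteration on we^w = x (a sketch for x ≥ 0; the function name `lambert_w` and the starting guess are our own):

```python
import math

def lambert_w(x, tol=1e-12):
    """Principal branch of W: solve w * exp(w) = x by Newton's method (x >= 0 here)."""
    w = math.log(1.0 + x)                 # rough starting guess
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

# solving g(x)^g(x) = x via g(x) = exp(W(ln x)), as in the entry
g = math.exp(lambert_w(math.log(3.0)))
```

Here g satisfies g^g = 3.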

432.1.1

References

A site with good information on Lambert's W function is Corless's page "On the Lambert W Function". Version: 4 Owner: drini Author(s): drini


Chapter 433 33B99 – Miscellaneous 433.1

natural log base

The natural log base, or e, has value 2.718281828459045... e was extensively studied by Euler in the 1720s, but it was originally discovered by John Napier. e is defined by
\[
e = \lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n
\]
It is more efficiently calculated, however, by using a Taylor series to get the representation
\[
e = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots
\]
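The difference in convergence speed between the two formulas is easy to see numerically (a sketch; the function name `e_series` is our own):

```python
import math

def e_series(terms):
    """Partial sums of the factorial series e = 1/0! + 1/1! + 1/2! + ..."""
    total, factorial = 0.0, 1.0
    for n in range(terms):
        if n > 0:
            factorial *= n                # factorial now holds n!
        total += 1.0 / factorial
    return total

# the factorial series converges far faster than the defining limit (1 + 1/n)^n
print(e_series(18), (1.0 + 1.0 / 10**6) ** 10**6)
```

Eighteen terms of the series already reproduce e to machine precision, whereas the limit with n = 10^6 is only accurate to about six digits.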

Version: 3 Owner: akrowne Author(s): akrowne


Chapter 434 33D45 – Basic orthogonal polynomials and functions (Askey-Wilson polynomials, etc.) 434.1

orthogonal polynomials

Polynomials of order n are analytic functions that can be written in the form
\[
p_n(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n
\]
They can be differentiated and integrated for any value of x, and are fully determined by the n + 1 coefficients a_0, ..., a_n. For this simplicity they are frequently used to approximate more complicated or unknown functions. In approximations, the necessary order n of the polynomial is not normally defined by criteria other than the quality of the approximation. Using polynomials as defined above tends to lead to numerical difficulties when determining the a_i, even for small values of n. It is therefore customary to stabilize results numerically by using orthogonal polynomials over an interval [a, b], defined with respect to a positive weight function W(x) > 0 by
\[
\int_a^b p_n(x)p_m(x)W(x)\,dx = 0 \quad\text{for } n \ne m
\]
Orthogonal polynomials are obtained in the following way: define the scalar product
\[
(f, g) = \int_a^b f(x)g(x)W(x)\,dx
\]


between the functions f and g, where W(x) is a weight factor. Starting with the polynomials p_0(x) = 1, p_1(x) = x, p_2(x) = x^2, etc., from the Gram-Schmidt decomposition one obtains a sequence of orthogonal polynomials q_0(x), q_1(x), ..., such that (q_m, q_n) = N_n δ_{mn}. The normalization factors N_n are arbitrary. When all N_i are equal to one, the polynomials are called orthonormal. Some important orthogonal polynomials are:

a     b     W(x)                name
−1    1     1                   Legendre polynomials
−1    1     (1 − x^2)^{−1/2}    Chebyshev polynomials
−∞    ∞     e^{−x^2}            Hermite polynomials
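The Gram-Schmidt construction can be carried out exactly for the Legendre case (a = −1, b = 1, W(x) = 1) by representing polynomials as coefficient lists and integrating monomials analytically (a sketch; the function names are our own):

```python
from fractions import Fraction

def inner(p, q):
    """Scalar product (p, q) = integral over [-1, 1] of p(x) q(x) dx  (weight W(x) = 1).
    Polynomials are coefficient lists [a0, a1, ...]."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    # integral of x^k over [-1, 1]: 0 for odd k, 2/(k+1) for even k
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(r) if k % 2 == 0)

def gram_schmidt(n):
    """Orthogonalize 1, x, x^2, ..., x^n; the result is proportional to the Legendre polynomials."""
    ortho = []
    for k in range(n + 1):
        p = [Fraction(0)] * k + [Fraction(1)]          # the monomial x^k
        for q in ortho:
            c = inner(p, q) / inner(q, q)
            padded = q + [Fraction(0)] * (len(p) - len(q))
            p = [a - c * b for a, b in zip(p, padded)]
        ortho.append(p)
    return ortho
```

For example, `gram_schmidt(2)` produces 1, x, and x^2 − 1/3, the latter being a multiple of the Legendre polynomial P_2.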

Orthogonal polynomials of successive orders can be expressed by a recurrence relation
\[
p_n = (A_n + B_n x)p_{n-1} + C_n p_{n-2}
\]
This relation can be used to compute a finite series a_0p_0 + a_1p_1 + · · · + a_np_n with arbitrary coefficients a_i, without computing explicitly every polynomial p_j (Horner's rule). Chebyshev polynomials T_n(x) are also orthogonal with respect to discrete values x_i:
\[
\sum_i T_n(x_i)T_m(x_i) = 0 \quad\text{for } n < m \le M
\]

where the x_i depend on M. For more information, see [Abramowitz74], [Press95].

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

[Abramowitz74] M. Abramowitz and I.A. Stegun (Eds.), Handbook of Mathematical Functions, National Bureau of Standards, Dover, New York, 1974.

[Press95] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, Second edition, Cambridge University Press, 1995. (The same book exists for the Fortran language.) There is also an Internet version which you can work from.

Version: 3 Owner: akrowne Author(s): akrowne

Chapter 435 33E05 – Elliptic functions and integrals 435.1

Weierstrass sigma function

Definition 37. Let Λ ⊂ C be a lattice. Let Λ* denote Λ − {0}.

1. The Weierstrass sigma function is defined as the product
\[
\sigma(z;\Lambda) = z\prod_{w\in\Lambda^*}\left(1-\frac{z}{w}\right)e^{z/w + \frac{1}{2}(z/w)^2}
\]

2. The Weierstrass zeta function is defined by the sum
\[
\zeta(z;\Lambda) = \frac{\sigma'(z;\Lambda)}{\sigma(z;\Lambda)} = \frac{1}{z} + \sum_{w\in\Lambda^*}\left(\frac{1}{z-w} + \frac{1}{w} + \frac{z}{w^2}\right)
\]

Note that the Weierstrass zeta function is basically the derivative of the logarithm of the sigma function. The zeta function can be rewritten as:
\[
\zeta(z;\Lambda) = \frac{1}{z} - \sum_{k=1}^{\infty} G_{2k+2}(\Lambda)\,z^{2k+1}
\]

where G_{2k+2} is the Eisenstein series of weight 2k + 2.

3. The Weierstrass eta function is defined to be
\[
\eta(w;\Lambda) = \zeta(z+w;\Lambda) - \zeta(z;\Lambda), \quad\text{for any } z\in\mathbb{C}
\]
(It can be proved that this is well defined, i.e. ζ(z + w; Λ) − ζ(z; Λ) only depends on w). The Weierstrass eta function must not be confused with the Dedekind eta function. Version: 1 Owner: alozano Author(s): alozano

435.2

elliptic function

Let Λ ⊂ C be a lattice in the sense of number theory, i.e. a 2-dimensional free group over Z which generates C over R. An elliptic function φ, with respect to the lattice Λ, is a meromorphic function φ : C → C which is Λ-periodic:
\[
\phi(z+\lambda) = \phi(z), \quad \forall z\in\mathbb{C},\ \forall\lambda\in\Lambda
\]
Remark: An elliptic function which is holomorphic is constant. Indeed, such a function would induce a holomorphic function on C/Λ, which is compact (and it is a standard result from complex analysis that any holomorphic function on a compact domain is constant; this follows from Liouville's theorem). Example: The Weierstrass ℘-function (see elliptic curve) is an elliptic function, probably the most important. In fact:

Theorem 12. The field of elliptic functions with respect to a lattice Λ is generated by ℘ and ℘′ (the derivative of ℘).

See [2], chapter 1, theorem 4.

REFERENCES
1. James Milne, Modular Functions and Modular Forms, online course notes. http://www.jmilne.org/math/CourseNotes/math678.html
2. Serge Lang, Elliptic Functions. Springer-Verlag, New York.
3. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.

Version: 4 Owner: alozano Author(s): alozano

435.3

elliptic integrals and Jacobi elliptic functions

Elliptic integrals

For 0 < k < 1, write
\[
F(k,\phi) = \int_0^{\phi}\frac{d\theta}{\sqrt{1-k^2\sin^2\theta}} \tag{435.3.1}
\]
\[
E(k,\phi) = \int_0^{\phi}\sqrt{1-k^2\sin^2\theta}\,d\theta \tag{435.3.2}
\]
\[
\Pi(k,n,\phi) = \int_0^{\phi}\frac{d\theta}{(1+n\sin^2\theta)\sqrt{1-k^2\sin^2\theta}} \tag{435.3.3}
\]
The change of variable x = sin φ turns these into
\[
F_1(k,x) = \int_0^{x}\frac{dv}{\sqrt{(1-v^2)(1-k^2v^2)}} \tag{435.3.4}
\]
\[
E_1(k,x) = \int_0^{x}\sqrt{\frac{1-k^2v^2}{1-v^2}}\,dv \tag{435.3.5}
\]
\[
\Pi_1(k,n,x) = \int_0^{x}\frac{dv}{(1+nv^2)\sqrt{(1-v^2)(1-k^2v^2)}} \tag{435.3.6}
\]

The first three functions are known as Legendre's form of the incomplete elliptic integrals of the first, second, and third kinds respectively. Notice that (2) is the special case n = 0 of (3). The latter three are known as Jacobi's form of those integrals. If φ = π/2, or x = 1, they are called complete rather than incomplete integrals, and their names are abbreviated to F(k), E(k), etc.

One use for elliptic integrals is to systematize the evaluation of certain other integrals. In particular, let p be a third- or fourth-degree polynomial in one variable, and let y = √(p(x)). If q and r are any two polynomials in two variables, then the indefinite integral
\[
\int \frac{q(x,y)}{r(x,y)}\,dx
\]
has a "closed form" in terms of the above incomplete elliptic integrals, together with elementary functions and their inverses.

Jacobi's elliptic functions

In (1) we may regard φ as a function of F, or vice versa. The notation used is
\[
\phi = \operatorname{am} u \qquad u = \arg\phi
\]
and φ and u are known as the amplitude and argument respectively. But x = sin φ = sin am u. The function u ↦ sin am u = x is denoted by sn and is one of four Jacobi (or Jacobian) elliptic functions. The four are:
\[
\operatorname{sn} u = x \qquad
\operatorname{cn} u = \sqrt{1-x^2} \qquad
\operatorname{tn} u = \frac{\operatorname{sn} u}{\operatorname{cn} u} \qquad
\operatorname{dn} u = \sqrt{1-k^2x^2}
\]
When the Jacobian elliptic functions are extended to complex arguments, they are doubly periodic and have two poles in any parallelogram of periods; both poles are simple. Version: 1 Owner: mathcam Author(s): Larry Hammick
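The complete integral F(k) is a standard target for the arithmetic-geometric mean (AGM) iteration, F(k) = π / (2·agm(1, √(1−k²))); a sketch cross-checking it against brute-force quadrature (function names ours):

```python
import math

def K_agm(k):
    """Complete elliptic integral F(k) = F(k, pi/2) via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > 1e-14:
        a, b = (a + b) / 2.0, math.sqrt(a * b)   # AGM step: converges quadratically
    return math.pi / (2.0 * a)

def K_midpoint(k, steps=200000):
    """The same integral evaluated directly by midpoint quadrature, as a cross-check."""
    h = (math.pi / 2.0) / steps
    return sum(h / math.sqrt(1.0 - (k * math.sin((i + 0.5) * h)) ** 2)
               for i in range(steps))
```

The AGM needs only a handful of iterations to reach full double precision, which is why it is the usual way these integrals are computed in practice.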


435.4

examples of elliptic functions

Examples of Elliptic Functions

Let Λ ⊂ C be a lattice generated by w_1, w_2. Let Λ* denote Λ − {0}.

1. The Weierstrass ℘-function is defined by the series
\[
\wp(z;\Lambda) = \frac{1}{z^2} + \sum_{w\in\Lambda^*}\left(\frac{1}{(z-w)^2} - \frac{1}{w^2}\right)
\]

2. The derivative of the Weierstrass ℘-function is also an elliptic function
\[
\wp'(z;\Lambda) = -2\sum_{w\in\Lambda^*}\frac{1}{(z-w)^3}
\]

3. The Eisenstein series of weight 2k for Λ is the series
\[
G_{2k}(\Lambda) = \sum_{w\in\Lambda^*} w^{-2k}
\]

The Eisenstein series of weight 4 and 6 are of special relevance in the theory of elliptic curves. In particular, the quantities g_2 and g_3 are usually defined as follows:
\[
g_2 = 60\cdot G_4(\Lambda), \qquad g_3 = 140\cdot G_6(\Lambda)
\]

Version: 3 Owner: alozano Author(s): alozano

435.5

modular discriminant

Definition 38. Let Λ ⊂ C be a lattice.

1. Let q_τ = e^{2πiτ}. The Dedekind eta function is defined to be
\[
\eta(\tau) = q_\tau^{1/24}\prod_{n=1}^{\infty}(1-q_\tau^n)
\]
The Dedekind eta function should not be confused with the Weierstrass eta function, η(w; Λ).

2. The j-invariant, as a function of lattices, is defined to be:
\[
j(\Lambda) = \frac{g_2^3}{g_2^3 - 27g_3^2}
\]
where g_2 and g_3 are certain multiples of the Eisenstein series of weight 4 and 6 (see this entry).

3. The ∆ function (delta function or modular discriminant) is defined to be
\[
\Delta(\Lambda) = g_2^3 - 27g_3^2
\]
Let Λ_τ be the lattice generated by 1, τ. The ∆ function for Λ_τ has a product expansion
\[
\Delta(\tau) = \Delta(\Lambda_\tau) = (2\pi i)^{12}\,q_\tau\prod_{n=1}^{\infty}(1-q_\tau^n)^{24} = (2\pi i)^{12}\,\eta(\tau)^{24}
\]
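The q-product for η converges very fast for τ well inside the upper half plane, so it can be checked numerically against the classical special value η(i) = Γ(1/4)/(2π^{3/4}) (a known identity, used here only as a sanity check; the function name is ours):

```python
import cmath
import math

def dedekind_eta(tau, terms=200):
    """eta(tau) = q^(1/24) * prod_{n>=1} (1 - q^n), q = exp(2 pi i tau),
    truncated after `terms` factors (tau must lie in the upper half plane)."""
    q = cmath.exp(2j * math.pi * tau)
    prod = 1.0 + 0.0j
    for n in range(1, terms + 1):
        prod *= 1.0 - q ** n
    return cmath.exp(2j * math.pi * tau / 24.0) * prod

# classical special value at tau = i
approx = dedekind_eta(1j)
exact = math.gamma(0.25) / (2.0 * math.pi ** 0.75)
```

At τ = i we have |q| = e^{−2π} ≈ 0.0019, so even a short product already matches the exact value to machine precision.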

Version: 2 Owner: alozano Author(s): alozano


Chapter 436 34-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 436.1

Liapunov function

Suppose we are given an autonomous system of first order differential equations
\[
\frac{dx}{dt} = F(x,y) \qquad \frac{dy}{dt} = G(x,y)
\]
Let the origin be an isolated critical point of the above system. A function V(x, y) that is of class C^1 and satisfies V(0, 0) = 0 is called a Liapunov function if every open ball B_δ(0, 0) contains at least one point where V > 0. If there happens to exist δ* such that the function V̇, given by
\[
\dot V(x,y) = V_x(x,y)F(x,y) + V_y(x,y)G(x,y),
\]
is positive definite in B_{δ*}(0, 0), then the origin is an unstable critical point of the system. Version: 2 Owner: tensorking Author(s): tensorking


436.2

Lorenz equation

436.2.1

The history

The Lorenz equation was published in 1963 by Edward N. Lorenz, a meteorologist and mathematician from MIT. The paper containing the equation was titled "Deterministic Nonperiodic Flow" and was published in the Journal of the Atmospheric Sciences. What drove Lorenz to find this set of three-dimensional ordinary differential equations was the search for an equation that would "model some of the unpredictable behavior which we normally associate with the weather" [PV]. The Lorenz equations represent the convective motion of a fluid cell which is warmed from below and cooled from above [PV]. The same system can also apply to dynamos and lasers. In addition, some of its popularity can be attributed to the beauty of its solutions. It is also important to state that the Lorenz equation has enough properties and interesting behavior that whole books are written analyzing its results.

436.2.2

The equation

The Lorenz equation is commonly defined as three coupled ordinary differential equations:
\[
\frac{dx}{dt} = \sigma(y-x) \qquad
\frac{dy}{dt} = x(\tau-z) - y \qquad
\frac{dz}{dt} = xy - \beta z
\]
where the three parameters σ, τ, β are positive and are called the Prandtl number, the Rayleigh number, and a physical proportion, respectively. It is important to note that x, y, z are not spatial coordinates. The "x is proportional to the intensity of the convective motion, while y is proportional to the temperature difference between the ascending and descending currents, similar signs of x and y denoting that warm fluid is rising and cold fluid is descending. The variable z is proportional to the distortion of vertical temperature profile from linearity, a positive value indicating that the strongest gradients occur near the boundaries." [GSS]

436.2.3

Properties of the Lorenz equations

• Symmetry The Lorenz equation has the following symmetry of ordinary differential equations:
\[
(x, y, z) \to (-x, -y, z)
\]
This symmetry is present for all parameters of the Lorenz equation (see natural symmetry of the Lorenz equation).

• Invariance The z-axis is invariant, meaning that a solution that starts on the z-axis (i.e. x = y = 0) will remain on the z-axis. In addition, the solution will tend toward the origin if the initial condition is on the z-axis.

• Critical points To solve for the critical points we let
\[
\dot{\mathbf{x}} = f(\mathbf{x}) = \begin{pmatrix} \sigma(y-x) \\ x(\tau-z)-y \\ xy-\beta z \end{pmatrix}
\]
and we solve f(x) = 0. It is clear that one of those critical points is x_0 = (0, 0, 0), and with some algebraic manipulation we determine that
\[
x_{C_1} = \left(\sqrt{\beta(\tau-1)},\ \sqrt{\beta(\tau-1)},\ \tau-1\right) \quad\text{and}\quad
x_{C_2} = \left(-\sqrt{\beta(\tau-1)},\ -\sqrt{\beta(\tau-1)},\ \tau-1\right)
\]
are critical points and real when τ > 1.
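The critical points can be confirmed by plugging them back into the right-hand side (a sketch with the classical parameter values; the function names are our own):

```python
def lorenz_f(state, sigma=10.0, tau=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (tau - z) - y, x * y - beta * z)

def critical_points(sigma=10.0, tau=28.0, beta=8.0 / 3.0):
    """The origin plus the two symmetric critical points (real for tau > 1)."""
    r = (beta * (tau - 1.0)) ** 0.5
    return [(0.0, 0.0, 0.0), (r, r, tau - 1.0), (-r, -r, tau - 1.0)]
```

Evaluating `lorenz_f` at each of the three points returns (numerically) the zero vector.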

436.2.4



An example

(The x solution with respect to time.) (The y solution with respect to time.) (The z solution with respect to time.)

The above is the solution of the Lorenz equation with parameters σ = 10, τ = 28 and β = 8/3 (the classical example). The initial condition of the system is (x_0, y_0, z_0) = (3, 15, 1).

436.2.5

Experimenting with octave

By changing the parameters and initial conditions one can observe that some solutions will be drastically different. (This is in no way rigorous but can give an idea of the qualitative properties of the Lorenz equation.)

function y = lorenz (x, t)
  y = [10*(x(2) - x(1)); x(1)*(28 - x(3)) - x(2); x(1)*x(2) - 8/3*x(3)];
endfunction

solution = lsode ("lorenz", [3; 15; 1], (0:0.01:50)');

gset parametric
gset xlabel "x"
gset ylabel "y"
gset zlabel "z"
gset nokey
gsplot solution

REFERENCES

[LNE] Lorenz, Edward N.: Deterministic Nonperiodic Flow. Journal of the Atmospheric Sciences, 1963.
[MM] Marsden, J. E., McCracken, M.: The Hopf Bifurcation and Its Applications. Springer-Verlag, New York, 1976.
[SC] Sparrow, Colin: The Lorenz Equations: Bifurcations, Chaos and Strange Attractors. Springer-Verlag, New York, 1982.

436.2.6

See also

• Paul Bourke, The Lorenz Attractor in 3D • Tim Whitcomb, http://students.washington.edu/timw/ (If you click on the Lorenz equation phase portrait, you get to download a copy of the article[GSS].) Version: 12 Owner: Daume Author(s): Daume

436.3

Wronskian determinant

If we have some functions f_1, f_2, ..., f_n then the Wronskian determinant (or simply the Wronskian) W(f_1, f_2, ..., f_n) is the determinant of the square matrix
\[
W(f_1, f_2, \ldots, f_n) = \begin{vmatrix}
f_1 & f_2 & \cdots & f_n \\
f_1' & f_2' & \cdots & f_n' \\
f_1'' & f_2'' & \cdots & f_n'' \\
\vdots & \vdots & \ddots & \vdots \\
f_1^{(n-1)} & f_2^{(n-1)} & \cdots & f_n^{(n-1)}
\end{vmatrix}
\]
where f^{(k)} indicates the kth derivative of f (not exponentiation).

The Wronskian of a set of functions F is another function, which is zero over any interval where F is linearly dependent. Just as a set of vectors is said to be linearly dependent whenever one vector may be expressed as a linear combination of a finite subset of the others, a set of functions {f_1, f_2, ..., f_n} is said to be dependent over an interval I if one of the functions can be expressed as a linear combination of a finite subset of the others, i.e.,
\[
a_1f_1(t) + a_2f_2(t) + \cdots + a_nf_n(t) = 0
\]
for some a_1, a_2, ..., a_n, not all zero, for all t ∈ I. Therefore the Wronskian can be used to determine if functions are independent. This is useful in many situations. For example, if we wish to determine if two solutions of a second-order differential equation are independent, we may use the Wronskian.

Examples

Consider the functions x^2, x, and 1. Take the Wronskian:
\[
W = \begin{vmatrix} x^2 & x & 1 \\ 2x & 1 & 0 \\ 2 & 0 & 0 \end{vmatrix} = -2
\]
Note that W is always non-zero, so these functions are independent everywhere. Consider, however, x^2 and x:
\[
W = \begin{vmatrix} x^2 & x \\ 2x & 1 \end{vmatrix} = x^2 - 2x^2 = -x^2
\]
Here W = 0 only when x = 0. Therefore x^2 and x are independent except at x = 0. Consider 2x^2 + 3, x^2, and 1:
\[
W = \begin{vmatrix} 2x^2+3 & x^2 & 1 \\ 4x & 2x & 0 \\ 4 & 2 & 0 \end{vmatrix} = 8x - 8x = 0
\]
Here W is always zero, so these functions are always dependent. This is intuitively obvious, of course, since 2x^2 + 3 = 2(x^2) + 3(1). Version: 5 Owner: mathcam Author(s): mathcam, vampyr
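For polynomials the Wronskian can be computed mechanically from coefficient lists (a sketch confirming the examples above; all function names are our own):

```python
def poly_deriv(p):
    """Derivative of a polynomial given as coefficients [a0, a1, ...]."""
    return [i * c for i, c in enumerate(p)][1:] or [0]

def poly_eval(p, x):
    return sum(c * x ** i for i, c in enumerate(p))

def det(m):
    """Determinant by cofactor expansion (fine for the small matrices here)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def wronskian(polys, x):
    """W(f1, ..., fn) evaluated at the point x, for polynomial f's."""
    rows, current = [], list(polys)
    for _ in range(len(polys)):
        rows.append([poly_eval(p, x) for p in current])
        current = [poly_deriv(p) for p in current]   # next row uses the derivatives
    return det(rows)
```

With x^2 = [0, 0, 1], x = [0, 1], 1 = [1] this reproduces W = −2 everywhere for the first example and W = 0 for the dependent triple 2x^2 + 3, x^2, 1.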

436.4

dependence on initial conditions of solutions of ordinary differential equations

Let E ⊂ W where W is a normed vector space, and let f ∈ C^1(E) be a continuously differentiable map f : E → W. Furthermore, consider the ordinary differential equation
\[
\dot x = f(x)
\]
with the initial condition x(0) = x_0. Let x(t) be the solution of the above initial value problem defined as
\[
x : I \to E
\]
where I = [−a, a]. Then there exists δ > 0 such that every y_0 ∈ N_δ(x_0) (y_0 in the δ-neighborhood of x_0) has a unique solution y(t) to the initial value problem above, except with the initial value changed to y(0) = y_0. In addition, y(t) is a twice continuously differentiable function of t over the interval I. Version: 1 Owner: Daume Author(s): Daume

436.5

differential equation

A differential equation is an equation involving an unknown function of one or more variables, its derivatives, and the independent variables. This type of equation comes up often in many different branches of mathematics. They are also especially important in many problems in physics and engineering. There are many types of differential equations. An ordinary differential equation (ODE) is a differential equation where the unknown function depends on a single variable. A general ODE has the form
\[
F(x, f(x), f'(x), \ldots, f^{(n)}(x)) = 0, \tag{436.5.1}
\]
where the unknown f is usually understood to be a real or complex valued function of x, and x is usually understood to be either a real or complex variable. The order of a differential equation is the order of the highest derivative appearing in Eq. (436.5.1). In this case, assuming that F depends nontrivially on f^{(n)}(x), the equation is of nth order. If a differential equation is satisfied by a function which identically vanishes (i.e. f(x) = 0 for each x in the domain of interest), then the equation is said to be homogeneous. Otherwise it is said to be nonhomogeneous (or inhomogeneous). Many differential equations can be expressed in the form L[f] = g(x), where L is a differential operator (with g(x) = 0 for the homogeneous case). If the operator L is linear in f, then the equation is said to be a linear ODE and otherwise nonlinear. Other types of differential equations involve more complicated relations involving the unknown function. A partial differential equation (PDE) is a differential equation where the unknown function depends on more than one variable. In a delay differential equation (DDE), the unknown function depends on the state of the system at some instant in the past. Solving differential equations is a difficult task. Three major types of approaches are possible:

• Exact methods are generally restricted to equations of low order and/or to linear systems.
• Qualitative methods do not give explicit formulas for the solutions, but provide information pertaining to the asymptotic behavior of the system.

• Finally, numerical methods allow one to construct approximate solutions.

Examples

A common example of an ODE is the equation for simple harmonic motion
\[
\frac{d^2u}{dx^2} + ku = 0.
\]
This equation is of second order. It can be transformed into a system of two first order differential equations by introducing a variable v = du/dx. Indeed, we then have
\[
\frac{dv}{dx} = -ku \qquad \frac{du}{dx} = v.
\]
A common example of a PDE is the wave equation in three dimensions
\[
\frac{\partial^2u}{\partial x^2} + \frac{\partial^2u}{\partial y^2} + \frac{\partial^2u}{\partial z^2} = c^2\frac{\partial^2u}{\partial t^2}
\]
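The reduction to a first order system is exactly what numerical integrators consume; a sketch integrating the harmonic oscillator with classical Runge-Kutta (function names ours):

```python
import math

def shm_rhs(state, k):
    """v = du/dx turns u'' + k u = 0 into the first order system (u', v') = (v, -k u)."""
    u, v = state
    return (v, -k * u)

def rk4_integrate(state, k, h, steps):
    """Classical fourth order Runge-Kutta for the system above."""
    for _ in range(steps):
        u, v = state
        k1 = shm_rhs((u, v), k)
        k2 = shm_rhs((u + h / 2 * k1[0], v + h / 2 * k1[1]), k)
        k3 = shm_rhs((u + h / 2 * k2[0], v + h / 2 * k2[1]), k)
        k4 = shm_rhs((u + h * k3[0], v + h * k3[1]), k)
        state = (u + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
                 v + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return state

# with k = 4 and u(0) = 1, u'(0) = 0 the exact solution is u(x) = cos(2x)
u_end, _ = rk4_integrate((1.0, 0.0), 4.0, math.pi / 1000.0, 1000)
```

After integrating over one full period (x = π for k = 4) the numerical u returns close to its starting value 1.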

Version: 7 Owner: igor Author(s): jarino, igor

436.6

existence and uniqueness of solution of ordinary differential equations

Let E ⊂ W where W is a normed vector space, let f ∈ C^1(E) be a continuously differentiable map f : E → W, and let x_0 ∈ E. Then there exists an a > 0 such that the ordinary differential equation
\[
\dot x = f(x)
\]
with the initial condition x(0) = x_0 has a unique solution
\[
x : [-a, a] \to E
\]
which also satisfies the initial condition of the initial value problem. Version: 3 Owner: Daume Author(s): Daume


436.7

maximal interval of existence of ordinary differential equations

Let E ⊂ W where W is a normed vector space, and let f ∈ C^1(E) be a continuously differentiable map f : E → W. Furthermore, consider the ordinary differential equation
\[
\dot x = f(x)
\]
with the initial condition x(0) = x_0. For all x_0 ∈ E there exists a unique solution
\[
x : I \to E
\]
where I = [−a, a], which also satisfies the initial condition of the initial value problem. Then there exists a maximal interval of existence J = (α, β) such that I ⊂ J and there exists a unique solution
\[
x : J \to E.
\]
Version: 3 Owner: Daume Author(s): Daume

436.8

method of undetermined coefficients

Given a (usually non-homogeneous) ordinary differential equation
\[
F(x, f(x), f'(x), \ldots, f^{(n)}(x)) = 0,
\]
the method of undetermined coefficients is a way of finding an exact solution when a guess can be made as to the general form of the solution. In this method, the form of the solution is guessed with unknown coefficients left as variables. A typical guess might be of the form Ae^{2x} or Ax^2 + Bx + C. This can then be substituted into the differential equation and solved for the coefficients. Obviously the method requires knowing the approximate form of the solution, but for many problems this is a feasible requirement. This method is most commonly used when the formula is some combination of exponentials, polynomials, sin and cos.


Examples

Suppose we have f''(x) − 2f'(x) + f(x) − 2e^{2x} = 0. If we guess that the solution is of the form f(x) = Ae^{2x}, then we have
\[
4Ae^{2x} - 4Ae^{2x} + Ae^{2x} - 2e^{2x} = 0
\]
and therefore Ae^{2x} = 2e^{2x}, so A = 2, giving f(x) = 2e^{2x} as a solution. Version: 4 Owner: Henry Author(s): Henry
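The resulting candidate can be checked without doing the algebra by estimating its derivatives numerically (a sketch; the residual should be close to zero at every sample point):

```python
import math

def f(x):
    return 2.0 * math.exp(2.0 * x)      # the solution found above

def residual(x, h=1e-4):
    """f''(x) - 2 f'(x) + f(x) - 2 exp(2x), with derivatives by central differences."""
    f2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    f1 = (f(x + h) - f(x - h)) / (2.0 * h)
    return f2 - 2.0 * f1 + f(x) - 2.0 * math.exp(2.0 * x)
```

The residual is only approximately zero because of the finite-difference truncation error, but it is small at any point we try.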

436.9

natural symmetry of the Lorenz equation

The Lorenz equation has a natural symmetry defined by
\[
(x, y, z) \mapsto (-x, -y, z). \tag{436.9.1}
\]

To verify that (436.9.1) is a symmetry of an ordinary differential equation (the Lorenz equation), there must exist a 3 × 3 matrix which commutes with the differential equation. This can be easily verified by observing that the symmetry is associated with the matrix R defined as
\[
R = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{436.9.2}
\]

Let
\[
\dot{\mathbf{x}} = f(\mathbf{x}) = \begin{pmatrix} \sigma(y-x) \\ x(\tau-z)-y \\ xy-\beta z \end{pmatrix} \tag{436.9.3}
\]
where f(x) is the Lorenz equation and x^T = (x, y, z). We proceed by showing that Rf(x) = f(Rx). Looking at the left hand side,
\[
Rf(\mathbf{x}) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \sigma(y-x) \\ x(\tau-z)-y \\ xy-\beta z \end{pmatrix} = \begin{pmatrix} \sigma(x-y) \\ x(z-\tau)+y \\ xy-\beta z \end{pmatrix}
\]
and now looking at the right hand side,
\[
f(R\mathbf{x}) = f\!\left(\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}\right) = f\!\left(\begin{pmatrix} -x \\ -y \\ z \end{pmatrix}\right) = \begin{pmatrix} \sigma(x-y) \\ x(z-\tau)+y \\ xy-\beta z \end{pmatrix}.
\]

Since the left hand side is equal to the right hand side, (436.9.1) is a symmetry of the Lorenz equation. Version: 2 Owner: Daume Author(s): Daume
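The identity Rf(x) = f(Rx) can also be spot-checked at arbitrary points (a sketch; `apply_R` just applies the matrix (436.9.2) to a vector):

```python
def lorenz_f(v, sigma=10.0, tau=28.0, beta=8.0 / 3.0):
    x, y, z = v
    return (sigma * (y - x), x * (tau - z) - y, x * y - beta * z)

def apply_R(v):
    """The matrix (436.9.2) acting on a vector: (x, y, z) -> (-x, -y, z)."""
    x, y, z = v
    return (-x, -y, z)

# R f(x) = f(R x) at arbitrary sample points
samples = [(1.0, 2.0, 3.0), (-0.5, 4.0, -7.0), (3.0, 15.0, 1.0)]
checks = [apply_R(lorenz_f(p)) == lorenz_f(apply_R(p)) for p in samples]
```

Every entry of `checks` is True, mirroring the symbolic verification above.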

436.10

symmetry of a solution of an ordinary differential equation

Let γ be a symmetry of the ordinary differential equation and x_0 a steady state solution of ẋ = f(x). If γx_0 = x_0, then γ is called a symmetry of the solution x_0. Let γ be a symmetry of the ordinary differential equation and x_0(t) a periodic solution of ẋ = f(x). If γx_0(t − t_0) = x_0(t) for a certain t_0, then (γ, t_0) is called a symmetry of the periodic solution x_0(t).

Lemma: If γ is a symmetry of the ordinary differential equation and x_0(t) is a solution (either steady state or periodic) of ẋ = f(x), then γx_0(t) is a solution of ẋ = f(x).

Proof: If x_0(t) is a solution of dx/dt = f(x), then dx_0(t)/dt = f(x_0(t)). Let's now verify that γx_0(t) is a solution, with a substitution into dx/dt = f(x). The left hand side of the equation becomes d(γx_0(t))/dt = γ dx_0(t)/dt, and the right hand side of the equation becomes f(γx_0(t)) = γf(x_0(t)), since γ is a symmetry of the differential equation. Therefore the left hand side equals the right hand side, since dx_0(t)/dt = f(x_0(t)). QED

REFERENCES

[GSS] Golubitsky, Martin. Stewart, Ian. Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 3 Owner: Daume Author(s): Daume

436.11

symmetry of an ordinary differential equation

Let f : R^n → R^n be a smooth function and let
\[
\dot x = f(x)
\]
be a system of ordinary differential equations; in addition, let γ be an invertible matrix. Then γ is a symmetry of the ordinary differential equation if f(γx) = γf(x). Example:

• natural symmetry of the Lorenz equation is a simple example of a symmetry of a differential equation.

REFERENCES

[GSS] Golubitsky, Martin. Stewart, Ian. Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 4 Owner: Daume Author(s): Daume


Chapter 437 34-01 – Instructional exposition (textbooks, tutorial papers, etc.) 437.1

second order linear differential equation with constant coefficients

Consider the second order homogeneous linear differential equation
\[
x'' + bx' + cx = 0, \tag{437.1.1}
\]
where b and c are real constants. The explicit solution is easily found using the characteristic equation method. This method, introduced by Euler, consists in seeking solutions of the form x(t) = e^{rt} for (437.1.1). Assuming a solution of this form and substituting it into (437.1.1) gives
\[
r^2e^{rt} + bre^{rt} + ce^{rt} = 0.
\]
Thus
\[
r^2 + br + c = 0, \tag{437.1.2}
\]
which is called the characteristic equation of (437.1.1). Depending on the nature of the roots r_1 and r_2 of (437.1.2), there are three cases.

• If the roots are real and distinct, then two linearly independent solutions of (437.1.1) are
\[
x_1(t) = e^{r_1t}, \qquad x_2(t) = e^{r_2t}.
\]
• If the roots are real and equal, then two linearly independent solutions of (437.1.1) are
\[
x_1(t) = e^{r_1t}, \qquad x_2(t) = te^{r_1t}.
\]
• If the roots are complex conjugates of the form r_{1,2} = α ± iβ, then two linearly independent solutions of (437.1.1) are
\[
x_1(t) = e^{\alpha t}\cos\beta t, \qquad x_2(t) = e^{\alpha t}\sin\beta t.
\]

The general solution to (437.1.1) is then constructed from these linearly independent solutions, as
\[
\phi(t) = C_1x_1(t) + C_2x_2(t). \tag{437.1.3}
\]
Characterizing the behavior of (437.1.3) can be accomplished by studying the two dimensional linear system obtained from (437.1.1) by defining y = x':
\[
x' = y \qquad y' = -by - cx. \tag{437.1.4}
\]
Remark that the roots of (437.1.2) are the eigenvalues of the Jacobian matrix of (437.1.4). This generalizes to the characteristic equation of a differential equation of order n and the n-dimensional system associated to it. Also note that the only equilibrium of (437.1.4) is the origin (0, 0). Suppose that c ≠ 0. Then (0, 0) is called a

1. source iff b < 0 and c > 0,
2. spiral source iff it is a source and b^2 − 4c < 0,
3. sink iff b > 0 and c > 0,
4. spiral sink iff it is a sink and b^2 − 4c < 0,
5. saddle iff c < 0,
6. center iff b = 0 and c > 0.

Version: 3 Owner: jarino Author(s): jarino
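The classification above translates directly into a small function (a sketch; the names `classify_origin` and `characteristic_roots` are our own):

```python
import cmath

def classify_origin(b, c):
    """Classify the equilibrium (0, 0) of x'' + b x' + c x = 0, assuming c != 0."""
    if c < 0:
        return "saddle"
    if b == 0:
        return "center"
    kind = "source" if b < 0 else "sink"
    if b * b - 4 * c < 0:                 # complex conjugate roots alpha +/- i beta
        kind = "spiral " + kind
    return kind

def characteristic_roots(b, c):
    """Roots of the characteristic equation r^2 + b r + c = 0."""
    disc = cmath.sqrt(b * b - 4 * c)
    return (-b + disc) / 2, (-b - disc) / 2
```

For instance, b = −1, c = 1 gives complex roots with positive real part, hence a spiral source.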


Chapter 438 34A05 – Explicit solutions and reductions 438.1

separation of variables

Separation of variables is a valuable tool for solving differential equations of the form
\[
\frac{dy}{dx} = f(x)g(y)
\]
The above equation can be rearranged algebraically through Leibniz notation to separate the variables and be conveniently integrable:
\[
\frac{dy}{g(y)} = f(x)\,dx
\]
It follows then that
\[
\int\frac{dy}{g(y)} = F(x) + C
\]
where F(x) is the antiderivative of f and C is a constant of integration. This gives a general form of the solution. An explicit form may be derived from an initial value.

Example: A population that is initially at 200 organisms increases at a rate of 15% each year. We then have a differential equation
\[
\frac{dP}{dt} = 0.15P
\]
The solution of this equation is relatively straightforward: we simply separate the variables algebraically and integrate,
\[
\int\frac{dP}{P} = \int 0.15\,dt.
\]
This is just ln P = 0.15t + C, or P = Ce^{0.15t}. When we substitute P(0) = 200, we see that C = 200. This is where we get the general relation of exponential growth
\[
P(t) = P_0e^{kt}
\]
[more later] Version: 2 Owner: slider142 Author(s): slider142
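The closed form obtained by separation of variables can be cross-checked by integrating the same equation numerically (a sketch using forward Euler; the function name is ours):

```python
import math

def euler_growth(P0, rate, t_end, steps):
    """Forward Euler for dP/dt = rate * P; compare with the exact P0 * exp(rate * t)."""
    h = t_end / steps
    P = P0
    for _ in range(steps):
        P += h * rate * P
    return P

approx = euler_growth(200.0, 0.15, 10.0, 1000000)
exact = 200.0 * math.exp(0.15 * 10.0)
```

With a fine step the numerical population after 10 years agrees with 200 e^{1.5} to several digits.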

438.2

variation of parameters

The method of variation of parameters is a way of finding a particular solution to a nonhomogeneous linear differential equation. Suppose that we have an nth order linear differential operator
\[
L[y] := y^{(n)} + p_1(t)y^{(n-1)} + \cdots + p_n(t)y, \tag{438.2.1}
\]
and a corresponding nonhomogeneous differential equation
\[
L[y] = g(t). \tag{438.2.2}
\]

Suppose that we know a fundamental set of solutions y_1, y_2, ..., y_n of the corresponding homogeneous differential equation L[y_c] = 0. The general solution of the homogeneous equation is
\[
y_c(t) = c_1y_1(t) + c_2y_2(t) + \cdots + c_ny_n(t), \tag{438.2.3}
\]
where c_1, c_2, ..., c_n are constants. The general solution to the nonhomogeneous equation L[y] = g(t) is then
\[
y(t) = y_c(t) + Y(t), \tag{438.2.4}
\]
where Y(t) is a particular solution which satisfies L[Y] = g(t), and the constants c_1, c_2, ..., c_n are chosen to satisfy the appropriate boundary conditions or initial conditions. The key step in using variation of parameters is to suppose that the particular solution is given by
\[
Y(t) = u_1(t)y_1(t) + u_2(t)y_2(t) + \cdots + u_n(t)y_n(t), \tag{438.2.5}
\]
where u_1(t), u_2(t), ..., u_n(t) are as yet to be determined functions (hence the name variation of parameters). To find these n functions we need a set of n independent equations. One obvious condition is that the proposed ansatz satisfies Eq. (438.2.2). Many possible additional conditions exist; we choose the ones that make further calculations easier. Consider


the following set of n − 1 conditions:
\[
\begin{aligned}
y_1u_1' + y_2u_2' + \cdots + y_nu_n' &= 0 \\
y_1'u_1' + y_2'u_2' + \cdots + y_n'u_n' &= 0 \\
&\;\;\vdots \\
y_1^{(n-2)}u_1' + y_2^{(n-2)}u_2' + \cdots + y_n^{(n-2)}u_n' &= 0.
\end{aligned} \tag{438.2.6}
\]
Now, substituting Eq. (438.2.5) into L[Y] = g(t) and using the above conditions, we can get another equation:
\[
y_1^{(n-1)}u_1' + y_2^{(n-1)}u_2' + \cdots + y_n^{(n-1)}u_n' = g. \tag{438.2.7}
\]

So we have a system of n equations for u_1', u_2', ..., u_n' which we can solve using Cramer's rule:
\[
u_m'(t) = \frac{g(t)W_m(t)}{W(t)}, \qquad m = 1, 2, \ldots, n. \tag{438.2.8}
\]
Such a solution always exists since the Wronskian W = W(y_1, y_2, ..., y_n) of the system is nowhere zero, because y_1, y_2, ..., y_n form a fundamental set of solutions. Lastly, the term W_m is the Wronskian determinant with the mth column replaced by the column (0, 0, ..., 0, 1). Finally, the particular solution can be written explicitly as
\[
Y(t) = \sum_{m=1}^{n} y_m(t)\int\frac{g(t)W_m(t)}{W(t)}\,dt. \tag{438.2.9}
\]
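For the concrete equation y'' + y = g(t) with y_1 = cos t, y_2 = sin t, one finds W = 1, W_1 = −sin t, W_2 = cos t, so (438.2.9) becomes Y(t) = −cos(t)∫sin(s)g(s)ds + sin(t)∫cos(s)g(s)ds. A sketch evaluating this with numerical quadrature (the function name is ours; the integrals run from 0 to t):

```python
import math

def particular_solution(g, t, steps=20000):
    """Y(t) for y'' + y = g(t) via (438.2.9) specialized to y1 = cos, y2 = sin."""
    h = t / steps
    I_cos = I_sin = 0.0
    for i in range(steps):                       # trapezoid rule on both integrals
        s0, s1 = i * h, (i + 1) * h
        I_cos += h * (math.cos(s0) * g(s0) + math.cos(s1) * g(s1)) / 2.0
        I_sin += h * (math.sin(s0) * g(s0) + math.sin(s1) * g(s1)) / 2.0
    return -math.cos(t) * I_sin + math.sin(t) * I_cos

# for g(t) = e^t this reproduces the closed form Y(t) = (e^t - cos t - sin t) / 2
```

The quadrature result matches the closed-form particular solution to the accuracy of the trapezoid rule.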

REFERENCES

1. W. E. Boyce, R. C. DiPrima. Elementary Differential Equations and Boundary Value Problems. John Wiley & Sons, 6th edition, 1997.

Version: 3 Owner: igor Author(s): igor


Chapter 439 34A12 – Initial value problems, existence, uniqueness, continuous dependence and continuation of solutions 439.1

initial value problem

Consider the simple differential equation
\[
\frac{dy}{dx} = x.
\]
The solution goes by writing dy = x dx and then integrating both sides as ∫dy = ∫x dx. The solution becomes then y = x^2/2 + C, where C is any constant.

Differentiating x^2/2 + 5, x^2/2 + 7 and some other examples shows that all these functions satisfy the condition given by the differential equation. So we have an infinite number of solutions. An initial value problem is then a differential equation (ordinary or partial, or even a system) which, besides stating the relation among the derivatives, also specifies the value of the unknown solutions at certain points. This allows one to get a unique solution from the infinite number of potential ones. In our example we could add the condition y(4) = 3, turning it into an initial value problem. The general solution x^2/2 + C is now subject to the restriction
\[
\frac{4^2}{2} + C = 3.
\]
By solving for C we obtain C = −5, and so the unique solution for the system
\[
\frac{dy}{dx} = x, \qquad y(4) = 3
\]
is y(x) = x^2/2 − 5.
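The same initial value problem can be solved numerically and compared against the closed form x^2/2 − 5 (a sketch using forward Euler; the function name is ours):

```python
def euler_ivp(x0, y0, x_end, steps=100000):
    """Forward Euler for dy/dx = x with the initial value y(x0) = y0."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        y += h * x
        x += h
    return y
```

Starting from y(4) = 3, the numerical value at any other x agrees with x^2/2 − 5, which shows how the initial condition singles out one solution from the family x^2/2 + C.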

Version: 1 Owner: drini Author(s): drini


Chapter 440 34A30 – Linear equations and systems, general 440.1

Chebyshev equation

Chebyshev's equation is the second order linear differential equation
\[
(1-x^2)\frac{d^2y}{dx^2} - x\frac{dy}{dx} + p^2y = 0
\]
where p is a real constant. There are two independent solutions which are given as series by:

and

y1 (x) = 1 −

p2 2 x 2!

y2 (x) = x −

+

(p−2)p2 (p+2) 4 x 4!

(p−1)(p+1) 3 x 3!

+



(p−4)(p−2)p2 (p+2)(p+4) 6 x 6!

(p−3)(p−1)(p+1)(p+3) 5 x 5!

+···

−···

In each case, the coefficients are given by the recursion

a_{n+2} = \frac{(n-p)(n+p)}{(n+1)(n+2)} a_n

with y_1 arising from the choice a_0 = 1, a_1 = 0, and y_2 arising from the choice a_0 = 0, a_1 = 1. The series converge for |x| < 1; this is easy to see from the ratio test and the recursion formula above. When p is a non-negative integer, one of these series will terminate, giving a polynomial solution. If p ≥ 0 is even, then the series for y_1 terminates at x^p. If p is odd, then the series for y_2 terminates at x^p.
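The recursion above can be run directly. The sketch below (plain Python; function names are my own) generates the series coefficients for p = 4, observes that the y_1 series terminates at x^4, and checks the resulting polynomial against the classical Chebyshev polynomial T_4(x) = cos(4 arccos x).

```python
from math import acos, cos

def chebyshev_series_coeffs(p, a0, a1, n_max):
    """Coefficients a_0..a_{n_max} from a_{n+2} = (n-p)(n+p)/((n+1)(n+2)) * a_n."""
    a = [0.0] * (n_max + 1)
    a[0], a[1] = a0, a1
    for n in range(n_max - 1):
        a[n + 2] = (n - p) * (n + p) / ((n + 1) * (n + 2)) * a[n]
    return a

# p = 4 (even): the y1 series terminates, giving 1 - 8x^2 + 8x^4,
# which here coincides with the Chebyshev polynomial T_4.
coeffs = chebyshev_series_coeffs(4, 1.0, 0.0, 6)
x = 0.3
y1 = sum(c * x**k for k, c in enumerate(coeffs))
print(coeffs[:5])                   # [1.0, 0.0, -8.0, 0.0, 8.0]
print(abs(y1 - cos(4 * acos(x))))   # ~0 up to rounding
```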

These polynomials are, up to multiplication by a constant, the Chebyshev polynomials. These are the only polynomial solutions of the Chebyshev equation. (In fact, polynomial solutions are also obtained when p is a negative integer, but these are not new solutions, since the Chebyshev equation is invariant under the substitution of p by −p.) Version: 3 Owner: mclase Author(s): mclase


Chapter 441 34A99 – Miscellaneous 441.1

autonomous system

A system of ordinary differential equations is autonomous when it does not depend on time (does not depend on the independent variable), i.e. ẋ = f(x). In contrast, the system is nonautonomous when it does depend on time (does depend on the independent variable), i.e. ẋ = f(x, t). It can be noted that every nonautonomous system can be converted to an autonomous system by adding a dimension: if ẋ = f(x, t) with x ∈ R^n, then it can be written as an autonomous system with x ∈ R^{n+1} by doing the substitution x_{n+1} = t and ẋ_{n+1} = 1. Version: 1 Owner: Daume Author(s): Daume
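The conversion described above can be sketched in code. This is an illustrative example of mine (the forward-Euler integrator and all names are my own choices): the nonautonomous equation ẋ = t is rewritten as the autonomous system (ẋ, ṫ) = (t, 1) by appending time as an extra state coordinate.

```python
def euler(f, z0, dt, steps):
    """Forward Euler for an autonomous system z' = f(z), z given as a sequence."""
    z = list(z0)
    for _ in range(steps):
        dz = f(z)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
    return z

# Nonautonomous: x' = f(x, t) = t.  Autonomous version: state (x, t), (x', t') = (t, 1).
autonomous_rhs = lambda z: (z[1], 1.0)

x_T, t_T = euler(autonomous_rhs, (0.0, 0.0), 1e-3, 2000)  # integrate up to t = 2
print(t_T)                     # the appended coordinate just tracks time: ~2.0
print(abs(x_T - t_T**2 / 2))   # matches the exact solution x(t) = t^2/2 up to O(dt)
```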


Chapter 442 34B24 – Sturm-Liouville theory 442.1

eigenfunction

Consider the Sturm-Liouville system given by

\frac{d}{dx}\left[ p(x) \frac{dy}{dx} \right] + q(x) y + \lambda r(x) y = 0, \quad a \le x \le b    (442.1.1)

a_1 y(a) + a_2 y'(a) = 0, \quad b_1 y(b) + b_2 y'(b) = 0,    (442.1.2)

where a_i, b_i ∈ R with i ∈ {1, 2}, p(x), q(x), r(x) are differentiable functions, and λ ∈ R. A nonzero solution of the system defined by (442.1.1) and (442.1.2) exists in general for a specified λ. The functions corresponding to that specified λ are called eigenfunctions. More generally, if D is some linear differential operator, λ ∈ R, and f is a function such that Df = λf, then we say f is an eigenfunction of D with eigenvalue λ. Version: 5 Owner: tensorking Author(s): tensorking


Chapter 443 34C05 – Location of integral curves, singular points, limit cycles 443.1

Hopf bifurcation theorem

Consider a planar system of ordinary differential equations, written in such a form as to make explicit the dependence on a parameter µ:

x' = f_1(x, y, µ)
y' = f_2(x, y, µ)

Assume that this system has the origin as an equilibrium for all µ. Suppose that the linearization Df at zero has the two eigenvalues λ_1(µ) and λ_2(µ), which are purely imaginary when µ = µ_c. If the real part of the eigenvalues satisfies

\frac{d}{dµ} \left( \mathrm{Re}(λ_{1,2}(µ)) \right) \Big|_{µ=µ_c} > 0

and the origin is asymptotically stable at µ = µ_c, then

1. µ_c is a bifurcation point;
2. for some µ_1 ∈ R such that µ_1 < µ < µ_c, the origin is a stable focus;
3. for some µ_2 ∈ R such that µ_c < µ < µ_2, the origin is unstable, surrounded by a stable limit cycle whose size increases with µ.

This is a simplified version of the theorem, corresponding to a supercritical Hopf bifurcation. Version: 1 Owner: jarino Author(s): jarino

443.2

Poincare-Bendixson theorem

Let M be an open subset of R^2, and f ∈ C^1(M, R^2). Consider the planar differential equation x' = f(x). Consider a fixed x ∈ M. Suppose that the omega limit set ω(x) ≠ ∅ is compact, connected, and contains only finitely many equilibria. Then one of the following holds:

1. ω(x) is a fixed orbit (a periodic orbit with period zero, i.e., an equilibrium).
2. ω(x) is a regular periodic orbit.
3. ω(x) consists of (finitely many) equilibria {x_j} and non-closed orbits γ(y) such that ω(y) ⊆ {x_j} and α(y) ⊆ {x_j} (where α(y) is the alpha limit set of y).

The same result holds when replacing omega limit sets by alpha limit sets. Since f was chosen such that existence and uniqueness of solutions hold, and since the system is planar, the Jordan curve theorem implies that it is not possible for orbits of a system satisfying the hypotheses to have complicated behaviors. A typical use of this theorem is to prove that an equilibrium is globally asymptotically stable (after using a Dulac type result to rule out periodic orbits). Version: 1 Owner: jarino Author(s): jarino
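The typical Poincaré-Bendixson situation can be illustrated numerically. The sketch below (plain Python with a hand-rolled RK4 step; the particular system and all names are my own illustrative choices, not from the original entry) integrates the planar system ẋ = x − y − x(x²+y²), ẏ = x + y − y(x²+y²), whose non-equilibrium orbits spiral onto the circle x²+y² = 1, a stable periodic orbit.

```python
def rhs(x, y):
    r2 = x * x + y * y
    return (x - y - x * r2, x + y - y * r2)

def rk4_step(x, y, dt):
    """One classical Runge-Kutta 4 step for the planar system above."""
    k1 = rhs(x, y)
    k2 = rhs(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
    k3 = rhs(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
    k4 = rhs(x + dt * k3[0], y + dt * k3[1])
    return (x + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            y + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

x, y = 0.1, 0.0          # start well inside the cycle
dt = 0.01
for _ in range(5000):    # integrate to t = 50
    x, y = rk4_step(x, y, dt)
print((x * x + y * y) ** 0.5)   # ~1.0: the orbit has settled on the limit cycle
```

In polar coordinates this system reads dr/dt = r(1 − r²), which makes the attracting circle r = 1 explicit.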

443.3

omega limit set

Let Φ(t, x) be the flow of the differential equation x' = f(x), where f ∈ C^k(M, R^n), with k ≥ 1 and M an open subset of R^n. Consider x ∈ M. The omega limit set of x, denoted ω(x), is the set of points y ∈ M such that there exists a sequence t_n → ∞ with Φ(t_n, x) → y. Similarly, the alpha limit set of x, denoted α(x), is the set of points y ∈ M such that there exists a sequence t_n → −∞ with Φ(t_n, x) → y. Note that the definition is the same for more general dynamical systems. Version: 1 Owner: jarino Author(s): jarino


Chapter 444 34C07 – Theory of limit cycles of polynomial and analytic vector fields (existence, uniqueness, bounds, Hilbert’s 16th problem and ramif 444.1

Hilbert’s 16th problem for quadratic vector fields

Find the maximum natural number H(2) of limit cycles, and their relative position, for a vector field

\dot{x} = p(x, y) = \sum_{i+j=0}^{2} a_{ij} x^i y^j

\dot{y} = q(x, y) = \sum_{i+j=0}^{2} b_{ij} x^i y^j

[DRR] As of now neither part of the problem (i.e. the bound and the positions of the limit cycles) is solved, although R. Bamón showed in 1986 [BR] that a quadratic vector field has a finite number of limit cycles. In 1980 Shi Songling gave [SS] an example of a quadratic vector field which has four limit cycles (i.e. H(2) ≥ 4).

REFERENCES
[DRR] Dumortier, F., Roussarie, R., Rousseau, C.: Hilbert’s 16th Problem for Quadratic Vector Fields. Journal of Differential Equations 110, 86-133, 1994.
[BR] R. Bamón: Quadratic vector fields in the plane have a finite number of limit cycles, Publ. I.H.E.S. 64 (1986), 111-142.


[SS] Shi Songling, A concrete example of the existence of four limit cycles for plane quadratic systems, Scientia Sinica 23 (1980), 154-158.

Version: 6 Owner: Daume Author(s): Daume


Chapter 445 34C23 – Bifurcation 445.1

equivariant branching lemma

Let Γ be a Lie group acting absolutely irreducibly on V and let g ∈ \vec{E}_{x,λ}(Γ) (where \vec{E}(Γ) is the space of Γ-equivariant germs, at the origin, of C^∞ mappings of V into V) be a bifurcation problem with symmetry group Γ. Since V is absolutely irreducible, the Jacobian matrix is (dg)_{0,λ} = c(λ)I; we then suppose that c'(0) ≠ 0. Let Σ be an isotropy subgroup satisfying dim Fix(Σ) = 1. Then there exists a unique smooth solution branch to g = 0 such that the isotropy subgroup of each solution is Σ. [GSS]

REFERENCES
[GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 2 Owner: Daume Author(s): Daume


Chapter 446 34C25 – Periodic solutions 446.1

Bendixson’s negative criterion

Let ẋ = f(x) be a planar dynamical system where f = (X, Y)^t and x = (x, y)^t. Furthermore f ∈ C^1(E) where E is a simply connected region of the plane. If \frac{\partial X}{\partial x} + \frac{\partial Y}{\partial y} (the divergence ∇·f of the vector field f) is always of the same sign but not identically zero, then there are no periodic solutions in the region E of the planar system. Version: 1 Owner: Daume Author(s): Daume

446.2

Dulac’s criteria

Let ẋ = f(x) be a planar dynamical system where f = (X, Y)^t and x = (x, y)^t. Furthermore f ∈ C^1(E) where E is a simply connected region of the plane. If there exists a function p(x, y) ∈ C^1(E) such that \frac{\partial (p(x,y)X)}{\partial x} + \frac{\partial (p(x,y)Y)}{\partial y} (the divergence ∇·(p(x, y)f) of the vector field p(x, y)f) is always of the same sign but not identically zero, then there are no periodic solutions in the region E of the planar system. In addition, if A is an annular region contained in E on which the above condition is satisfied, then there exists at most one periodic solution in A. Version: 1 Owner: Daume Author(s): Daume

446.3

proof of Bendixson’s negative criterion

Suppose that there exists a periodic solution Γ which has period T and lies in E. Let the interior of Γ be denoted by D. Then by Green’s theorem we can observe that

\iint_D \nabla \cdot f \, dx \, dy = \iint_D \left( \frac{\partial X}{\partial x} + \frac{\partial Y}{\partial y} \right) dx \, dy = \oint_\Gamma (X \, dy - Y \, dx) = \int_0^T (X \dot{y} - Y \dot{x}) \, dt = \int_0^T (XY - YX) \, dt = 0.

Since ∇·f is not identically zero by hypothesis and is of one sign, the double integral on the left must be nonzero and of that sign. This leads to a contradiction, since the right hand side is equal to zero. Therefore there does not exist a periodic solution in the simply connected region E. Version: 1 Owner: Daume Author(s): Daume


Chapter 447 34C99 – Miscellaneous 447.1

Hartman-Grobman theorem

Consider the differential equation

x' = f(x)    (447.1.1)

where f is a C^1 vector field. Assume that x_0 is a hyperbolic equilibrium of f. Denote by Φ_t(x) the flow of (447.1.1) through x at time t. Then there exists a homeomorphism φ(x) = x + h(x), with h bounded, such that

φ ∘ e^{tDf(x_0)} = Φ_t ∘ φ

in a sufficiently small neighborhood of x_0. This fundamental theorem in the qualitative analysis of nonlinear differential equations states that, in a small neighborhood of x_0, the flow of the nonlinear equation (447.1.1) is qualitatively similar to that of the linear system x' = Df(x_0)x. Version: 1 Owner: jarino Author(s): jarino

447.2

equilibrium point

Consider an autonomous differential equation

ẋ = f(x)    (447.2.1)

An equilibrium (point) x_0 of (447.2.1) is such that f(x_0) = 0. If the linearization Df(x_0) has no eigenvalue with zero real part, x_0 is said to be a hyperbolic equilibrium, whereas if there exists an eigenvalue with zero real part, the equilibrium point is nonhyperbolic. Version: 5 Owner: Daume Author(s): Daume, jarino

447.3

stable manifold theorem

Let E be an open subset of R^n containing the origin, let f ∈ C^1(E), and let φ_t be the flow of the nonlinear system x' = f(x). Suppose that f(x_0) = 0 and that Df(x_0) has k eigenvalues with negative real part and n − k eigenvalues with positive real part. Then there exists a k-dimensional differentiable manifold S tangent to the stable subspace E^S of the linear system x' = Df(x_0)x at x_0 such that for all t ≥ 0, φ_t(S) ⊂ S and for all y ∈ S,

\lim_{t \to \infty} φ_t(y) = x_0,

and there exists an (n − k)-dimensional differentiable manifold U tangent to the unstable subspace E^U of x' = Df(x_0)x at x_0 such that for all t ≤ 0, φ_t(U) ⊂ U and for all y ∈ U,

\lim_{t \to -\infty} φ_t(y) = x_0.

Version: 1 Owner: jarino Author(s): jarino


Chapter 448 34D20 – Lyapunov stability 448.1

Lyapunov stable

A fixed point is Lyapunov stable if trajectories of nearby points remain close for future time. More formally, the fixed point x* is Lyapunov stable if for any ε > 0 there is a δ > 0 such that for all t ≥ 0 and for all x such that d(x, x*) < δ, we have d(x(t), x*) < ε. Version: 2 Owner: armbrusterb Author(s): yark, armbrusterb

448.2

neutrally stable fixed point

A fixed point is considered neutrally stable if it is Lyapunov stable but not attracting. A center is an example of such a fixed point. Version: 3 Owner: armbrusterb Author(s): Johan, armbrusterb

448.3

stable fixed point

Let X be a vector field on a manifold M. A fixed point of X is said to be stable if it is both attracting and Lyapunov stable. Version: 5 Owner: alinabi Author(s): alinabi, yark, armbrusterb


Chapter 449 34L05 – General spectral theory 449.1

Gelfand spectral radius theorem

For every self-consistent (submultiplicative) matrix norm ‖·‖ and every square matrix A we can write

ρ(A) = \lim_{n \to \infty} \|A^n\|^{1/n}.

Note: ρ(A) denotes the spectral radius of A. Version: 4 Owner: Johan Author(s): Johan
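The limit in Gelfand's formula can be observed numerically. The sketch below (plain Python, no external libraries; the 2×2 matrix and names are my own illustrative choices) uses the Frobenius norm, which is submultiplicative, on powers of a matrix with known spectral radius 2.

```python
# Estimate the spectral radius of A = [[2, 1], [0, 1]] (eigenvalues 2 and 1,
# so rho(A) = 2) from the norms of powers of A.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def fro_norm(A):
    """Frobenius norm, a submultiplicative matrix norm."""
    return sum(a * a for row in A for a in row) ** 0.5

A = [[2.0, 1.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 1.0]]   # identity; will hold A^n
n = 200
for _ in range(n):
    P = matmul(P, A)
estimate = fro_norm(P) ** (1.0 / n)
print(estimate)   # ~2.00, approaching rho(A) = 2 as n grows
```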


Chapter 450 34L15 – Estimation of eigenvalues, upper and lower bounds 450.1

Rayleigh quotient

The Rayleigh quotient, R_A, of the Hermitian matrix A is defined as

R_A(x) = \frac{x^H A x}{x^H x}, \quad x \ne 0.

Version: 1 Owner: Johan Author(s): Johan
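For a Hermitian matrix the Rayleigh quotient is real and lies between the smallest and largest eigenvalues, with equality at the eigenvectors. The sketch below (plain Python with built-in complex numbers; the 2×2 example and names are my own illustrative choices) demonstrates this on a matrix with eigenvalues 1 and 3.

```python
# Rayleigh quotient R_A(x) = x^H A x / x^H x for a Hermitian matrix A.
def rayleigh(A, x):
    Ax = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]
    num = sum(x[i].conjugate() * Ax[i] for i in range(len(x)))
    den = sum(abs(xi) ** 2 for xi in x)
    return (num / den).real   # the quotient is real when A is Hermitian

A = [[2 + 0j, 1j], [-1j, 2 + 0j]]   # Hermitian: A equals its conjugate transpose
q1 = rayleigh(A, [1 + 0j, 0j])      # generic vector
q2 = rayleigh(A, [1j, 1 + 0j])      # eigenvector for the eigenvalue 3
print(q1)   # 2.0, inside the eigenvalue interval [1, 3]
print(q2)   # 3.0, the largest eigenvalue
```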

Chapter 451 34L40 – Particular operators (Dirac, one-dimensional Schr¨ odinger, etc.) 451.1

Dirac delta function

The Dirac delta “function” δ(x) is not a true function since it cannot be defined completely by giving the function value for all values of the argument x. Similar to the Kronecker delta, the notation δ(x) stands for

δ(x) = 0 for x ≠ 0, and \int_{-\infty}^{\infty} δ(x) \, dx = 1.

For any continuous function F:

\int_{-\infty}^{\infty} δ(x) F(x) \, dx = F(0)

or in n dimensions:

\int_{R^n} δ(x - s) f(s) \, d^n s = f(x)

δ(x) can also be defined as a normalized Gaussian function (normal distribution) in the limit of zero width. References • Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html) Version: 2 Owner: akrowne Author(s): akrowne
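The zero-width Gaussian limit mentioned above can be checked numerically. The sketch below (plain Python; the midpoint-rule integrator and all names are my own illustrative choices) integrates a narrow normalized Gaussian against a continuous test function F and recovers approximately F(0).

```python
from math import exp, pi, sqrt, cos

def gaussian_delta(x, sigma):
    """Normalized Gaussian of width sigma: integrates to 1, peaks at 0."""
    return exp(-x * x / (2 * sigma * sigma)) / (sigma * sqrt(2 * pi))

def integrate(f, a, b, n=200000):
    """Simple midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

F = cos
sigma = 0.01
approx = integrate(lambda x: gaussian_delta(x, sigma) * F(x), -1.0, 1.0)
print(approx)   # ~1.0 = F(0); the approximation sharpens as sigma -> 0
```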

451.2

construction of Dirac delta function

The Dirac delta function is notorious in mathematical circles for having no actual realization as a function. However, a little known secret is that in the domain of nonstandard analysis, the Dirac delta function admits a completely legitimate construction as an actual function. We give this construction here. Choose any positive infinitesimal ε and define the hyperreal valued function δ : *R → *R by

δ(x) := 1/ε if −ε/2 < x < ε/2, and δ(x) := 0 otherwise.

We verify that the above function satisfies the required properties of the Dirac delta function. By definition, δ(x) = 0 for all nonzero real numbers x. Moreover,

\int_{-\infty}^{\infty} δ(x) \, dx = \int_{-ε/2}^{ε/2} \frac{1}{ε} \, dx = 1,

so the integral property is satisfied. Finally, for any continuous real function f : R → R, choose an infinitesimal z > 0 such that |f(x) − f(0)| < z for all |x| < ε/2; then

ε \cdot \frac{f(0) - z}{ε} < \int_{-\infty}^{\infty} δ(x) f(x) \, dx < ε \cdot \frac{f(0) + z}{ε},

which implies that \int_{-\infty}^{\infty} δ(x) f(x) \, dx is within an infinitesimal of f(0), and thus has standard part equal to f(0). Version: 2 Owner: djao Author(s): djao


Chapter 452 35-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 452.1

differential operator

Roughly speaking, a differential operator is a mapping, typically understood to be linear, that transforms a function into another function by means of partial derivatives and multiplication by other functions. On R^n, a differential operator is commonly understood to be a linear transformation of C^∞(R^n) having the form

f \mapsto \sum_I a_I f_I, \quad f ∈ C^∞(R^n),

where the sum is taken over a finite number of multi-indices I = (i_1, ..., i_n) ∈ N^n, where a_I ∈ C^∞(R^n), and where f_I denotes a partial derivative of f taken i_1 times with respect to the first variable, i_2 times with respect to the second variable, etc. The order of the operator is the maximum number of derivatives taken in the above formula, i.e. the maximum of i_1 + ... + i_n taken over all the I involved in the above summation. On a C^∞ manifold M, a differential operator is commonly understood to be a linear transformation of C^∞(M) having the above form relative to some system of coordinates. Alternatively, one can equip C^∞(M) with the limit-order topology, and define a differential operator as a continuous transformation of C^∞(M). The order of a differential operator is a more subtle notion on a manifold than on R^n. There are two complications. First, one would like a definition that is independent of any particular system of coordinates. Furthermore, the order of an operator is at best a local concept: it can change from point to point, and indeed be unbounded if the manifold is non-compact.

To address these issues, for a differential operator T and x ∈ M, we define ord_x(T), the order of T at x, to be the smallest k ∈ N such that T[f^{k+1}](x) = 0 for all f ∈ C^∞(M) such that f(x) = 0. For a fixed differential operator T, the function ord(T) : M → N defined by x ↦ ord_x(T) is lower semi-continuous, meaning that ord_y(T) ≥ ord_x(T) for all y ∈ M sufficiently close to x. The global order of T is defined to be the maximum of ord_x(T) taken over all x ∈ M. This maximum may not exist if M is non-compact, in which case one says that the order of T is infinite. Let us conclude by making two remarks. The notion of a differential operator can be generalized even further by allowing the operator to act on sections of a bundle. A differential operator T is a local operator, meaning that

T[f](x) = T[g](x), \quad f, g ∈ C^∞(M), x ∈ M,

if f = g in some neighborhood of x. A theorem proved by Peetre states that the converse is also true, namely that every local operator is necessarily a differential operator. References 1. Dieudonné, J.A., Foundations of modern analysis 2. Peetre, J., “Une caractérisation abstraite des opérateurs différentiels”, Math. Scand., v. 7, 1959, p. 211 Version: 5 Owner: rmilson Author(s): rmilson


Chapter 453 35J05 – Laplace equation, reduced wave equation (Helmholtz), Poisson equation 453.1

Poisson’s equation

Poisson’s equation is a second-order partial differential equation which arises in physical problems such as finding the electrical potential of a given charge distribution. Its general form in n dimensions is

∇^2 φ(r) = ρ(r)

where ∇^2 is the Laplacian and ρ : D → R, often called a source function, is a given function on some subset D of R^n. If ρ is identically zero, the Poisson equation reduces to the Laplace equation. The Poisson equation is linear, and therefore obeys the superposition principle: if ∇^2 φ_1 = ρ_1 and ∇^2 φ_2 = ρ_2, then ∇^2(φ_1 + φ_2) = ρ_1 + ρ_2. This fact can be used to construct solutions to Poisson’s equation from fundamental solutions, where the source distribution is a delta function. A very important case is the one in which n = 3, D is all of R^3, and φ(r) → 0 as |r| → ∞. The general solution is then given by

φ(r) = -\frac{1}{4\pi} \int_{R^3} \frac{ρ(r')}{|r - r'|} \, d^3 r'.

Version: 2 Owner: pbruin Author(s): pbruin


Chapter 454 35L05 – Wave equation 454.1

wave equation

The wave equation is a partial differential equation which describes all kinds of waves. It arises in various physical situations, such as vibrating strings, sound waves, and electromagnetic waves. The wave equation in one dimension is

\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}.

The general solution of the one-dimensional wave equation can be obtained by a change of variables (x, t) → (ξ, η), where ξ = x − ct and η = x + ct. This gives \frac{\partial^2 u}{\partial ξ \, \partial η} = 0, which we can integrate to get d’Alembert’s solution:

u(x, t) = F(x − ct) + G(x + ct)

where F and G are twice differentiable functions. F and G represent waves travelling in the positive and negative x directions, respectively, with velocity c. These functions can be obtained if appropriate starting or boundary conditions are given. For example, if u(x, 0) = f(x) and \frac{\partial u}{\partial t}(x, 0) = g(x) are given, the solution is

u(x, t) = \frac{1}{2} [f(x − ct) + f(x + ct)] + \frac{1}{2c} \int_{x-ct}^{x+ct} g(s) \, ds.

In general, the wave equation in n dimensions is

\frac{\partial^2 u}{\partial t^2} = c^2 ∇^2 u

where u is a function of the location variables x_1, x_2, ..., x_n, and time t. Here, ∇^2 is the Laplacian with respect to the location variables, which in Cartesian coordinates is given by ∇^2 = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \cdots + \frac{\partial^2}{\partial x_n^2}.
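d’Alembert’s form can be checked numerically: for any smooth F and G, u(x, t) = F(x − ct) + G(x + ct) should satisfy u_tt = c² u_xx. The sketch below (plain Python; the particular F, G, and the finite-difference check are my own illustrative choices) verifies this with second-order central differences at a sample point.

```python
from math import sin, cos

c = 2.0
F, G = sin, cos                       # any twice differentiable functions work

def u(x, t):
    """A d'Alembert solution: right-moving F plus left-moving G."""
    return F(x - c * t) + G(x + c * t)

# Central second differences approximate the second partial derivatives.
h = 1e-3
x0, t0 = 0.7, 0.4
u_tt = (u(x0, t0 + h) - 2 * u(x0, t0) + u(x0, t0 - h)) / h**2
u_xx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / h**2
print(abs(u_tt - c**2 * u_xx))        # ~0: the wave equation is satisfied
```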

Version: 4 Owner: pbruin Author(s): pbruin


Chapter 455 35Q53 – KdV-like equations (Korteweg-de Vries, Burgers, sine-Gordon, sinh-Gordon, etc.) 455.1

Korteweg - de Vries equation

The Korteweg - de Vries equation is ut = uux + uxxx where u = u(x, t) and the subscripts indicate derivatives. Version: 4 Owner: superhiggs Author(s): superhiggs


(455.1.1)

Chapter 456 35Q99 – Miscellaneous 456.1

heat equation

The heat equation in 1 dimension (for example, along a metal wire) is a partial differential equation of the following form:

\frac{\partial u}{\partial t} = c^2 \cdot \frac{\partial^2 u}{\partial x^2}

also written as

u_t = c^2 \cdot u_{xx}

where u : R^2 → R is the function giving the temperature at time t and position x, and c is a real valued constant. This can be easily extended to 2 or 3 dimensions as

u_t = c^2 \cdot (u_{xx} + u_{yy}) \quad and \quad u_t = c^2 \cdot (u_{xx} + u_{yy} + u_{zz}).

Note that in the steady state, that is when u_t = 0, we are left with Laplace’s equation for u: Δu = 0. Version: 2 Owner: dublisk Author(s): dublisk
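A standard way to solve the 1-dimensional heat equation numerically is an explicit finite-difference scheme. The sketch below (plain Python; the grid, the forward-time centered-space discretization, and all names are my own illustrative choices) evolves u_t = c²·u_xx on [0, π] with u = 0 at the ends, starting from u(x, 0) = sin x, and compares against the exact solution u(x, t) = e^{−c²t} sin x.

```python
from math import sin, exp, pi

c = 1.0
N = 50                       # number of grid intervals
dx = pi / N
dt = 0.4 * dx * dx / c**2    # explicit scheme is stable for dt <= dx^2 / (2 c^2)

u = [sin(i * dx) for i in range(N + 1)]   # u(x, 0) = sin x, zero at both ends
t = 0.0
while t < 0.5:
    # u_t = c^2 u_xx with a central difference for u_xx; endpoints stay 0
    new = [0.0] * (N + 1)
    for i in range(1, N):
        new[i] = u[i] + dt * c**2 * (u[i + 1] - 2 * u[i] + u[i - 1]) / dx**2
    u, t = new, t + dt

# Exact solution of this problem: u(x, t) = exp(-c^2 t) sin(x)
err = max(abs(u[i] - exp(-c**2 * t) * sin(i * dx)) for i in range(N + 1))
print(err)   # small discretization error
```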


Chapter 457 37-00 – General reference works (handbooks, dictionaries, bibliographies, etc.)


Chapter 458 37A30 – Ergodic theorems, spectral theory, Markov operators 458.1

ergodic

Let (X, B, µ) be a probability space, and T : X → X be a measure-preserving transformation. We call T ergodic if for A ∈ B, T A = A ⇒ µ(A) = 0 or µ(A) = 1.

(458.1.1)

That is, T takes almost all sets all over the space. The only sets it doesn’t move are some sets of measure zero and the entire space. Version: 2 Owner: drummond Author(s): drummond

458.2

fundamental theorem of demography

Let A_t be a sequence of n × n nonnegative primitive matrices. Suppose that A_t → A_∞, with A_∞ also a nonnegative primitive matrix. Define the sequence x_{t+1} = A_t x_t, with x_t ∈ R^n. If x_0 > 0, then

\lim_{t \to \infty} \frac{x_t}{\|x_t\|} = p

where p is the normalized (‖p‖ = 1) eigenvector associated to the dominant eigenvalue of A_∞ (also called the Perron-Frobenius eigenvector of A_∞). Version: 3 Owner: jarino Author(s): jarino
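The theorem can be illustrated in the simplest case where A_t = A_∞ = A for all t. The sketch below (plain Python; the 2×2 primitive matrix and names are my own illustrative choices) iterates x_{t+1} = A x_t and watches x_t/‖x_t‖ converge to the Perron-Frobenius eigenvector.

```python
# A is nonnegative and primitive (A^2 > 0); its dominant eigenvalue is 2,
# with eigenvector proportional to (2, 1).
A = [[1.0, 2.0],
     [1.0, 0.0]]

x = [1.0, 1.0]
for _ in range(60):
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]
    norm = (x[0] ** 2 + x[1] ** 2) ** 0.5
    x = [x[0] / norm, x[1] / norm]     # normalize to avoid overflow

p = [2.0 / 5 ** 0.5, 1.0 / 5 ** 0.5]  # normalized Perron-Frobenius eigenvector
diff = abs(x[0] - p[0]) + abs(x[1] - p[1])
print(diff)   # ~0: the normalized iterates have converged to p
```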


458.3

proof of fundamental theorem of demography

• First we will prove that there exist m, M > 0 such that

m \le \frac{\|x_{k+1}\|}{\|x_k\|} \le M \quad for all k,    (458.3.1)

with m and M independent of the sequence. In order to show this we use the primitivity of the matrices A_k and A_∞. Primitivity of A_∞ implies that there exists l ∈ N such that A_∞^l > 0. By continuity, this implies that there exists k_0 such that, for all k ≥ k_0, we have

A_{k+l} A_{k+l-1} \cdots A_k > 0.

Let us then write x_{k+l+1} as a function of x_k: x_{k+l+1} = A_{k+l} \cdots A_k x_k. We thus have

\|x_{k+l+1}\| \le C^{l+1} \|x_k\|    (458.3.2)

But since the matrices A_{k+l}, ..., A_k are strictly positive for k ≥ k_0, there exists an ε > 0 such that each component of these matrices is greater than or equal to ε. From this we deduce that

\|x_{k+l+1}\| \ge ε \|x_k\| \quad for all k ≥ k_0.

Applying relation (458.3.2), we then have that C^l \|x_{k+1}\| \ge ε \|x_k\|, which yields

\|x_{k+1}\| \ge \frac{ε}{C^l} \|x_k\| \quad for all k,

and so we indeed have relation (458.3.1).

• Let us denote by e_k the (normalised) Perron eigenvector of A_k. Thus

A_k e_k = λ_k e_k, \quad \|e_k\| = 1.

Let us denote by π_k the projection on the supplementary space of {e_k} invariant under A_k. Choosing a proper norm, we can find ε > 0 such that |A_k π_k| \le (λ_k − ε) for all k.

• We shall now prove that

\frac{\langle e^*_{k+1}, x_{k+1} \rangle}{\langle e^*_k, x_k \rangle} \to λ_∞ \quad when k → ∞.

In order to do this, we compute the inner product of the sequence x_{k+1} = A_k x_k with the e_k’s:

\langle e^*_{k+1}, x_{k+1} \rangle = \langle e^*_{k+1} - e^*_k, A_k x_k \rangle + λ_k \langle e^*_k, x_k \rangle = o(\langle e^*_k, x_k \rangle) + λ_k \langle e^*_k, x_k \rangle.

Therefore we have

\langle e^*_{k+1}, x_{k+1} \rangle = (o(1) + λ_k) \langle e^*_k, x_k \rangle.

• Now let

u_k = \frac{π_k x_k}{\langle e^*_k, x_k \rangle}.

We will verify that u_k → 0 when k → ∞. We have

u_{k+1} = (π_{k+1} - π_k) A_k \frac{x_k}{\langle e^*_{k+1}, x_{k+1} \rangle} + A_k π_k \frac{x_k}{\langle e^*_k, x_k \rangle} \frac{\langle e^*_k, x_k \rangle}{\langle e^*_{k+1}, x_{k+1} \rangle}

and so

|u_{k+1}| \le |π_{k+1} - π_k| C' + \frac{(λ_k - ε) |u_k| \langle e^*_k, x_k \rangle}{\langle e^*_{k+1}, x_{k+1} \rangle}.

We deduce that there exists k_1 ≥ k_0 such that, for all k ≥ k_1,

|u_{k+1}| \le δ_k + (λ_∞ - \frac{ε}{2}) |u_k|,

where we have noted δ_k = |(π_{k+1} - π_k)| C'. We have δ_k → 0 when k → ∞; we thus finally deduce that |u_k| → 0 when k → ∞. Remark that this also implies that

z_k = \frac{π_k x_k}{\|x_k\|} \to 0 \quad when k → ∞.

• We have z_k → 0 when k → ∞, and x_k/\|x_k\| can be written

\frac{x_k}{\|x_k\|} = α_k e_k + z_k.

Therefore, we have \|α_k e_k\| → 1 when k → ∞, which implies that α_k tends to 1, since we have chosen e_k to be normalised (i.e., \|e_k\| = 1). We then can conclude that

\frac{x_k}{\|x_k\|} \to e_∞ \quad when k → ∞,

and the proof is done.

Version: 2 Owner: jarino Author(s): jarino

Chapter 459 37B05 – Transformations and group actions with special properties (minimality, distality, proximality, etc.) 459.1

discontinuous action

Let X be a topological space and G a group that acts on X by homeomorphisms. The action of G is said to be discontinuous at x ∈ X if there is a neighborhood U of x such that the set

{g ∈ G | gU ∩ U ≠ ∅}

is finite. The action is called discontinuous if it is discontinuous at every point.

Remark 1. If G acts discontinuously then the orbits of the action have no accumulation points, i.e. if {g_n} is a sequence of distinct elements of G and x ∈ X then the sequence {g_n x} has no limit points. If X is locally compact then an action that satisfies this condition is discontinuous. Remark 2. Assume that X is a locally compact Hausdorff space and let Aut(X) denote the group of self homeomorphisms of X endowed with the compact-open topology. If ρ : G → Aut(X) defines a discontinuous action then the image ρ(G) is a discrete subset of Aut(X). Version: 2 Owner: Dr Absentius Author(s): Dr Absentius


Chapter 460 37B20 – Notions of recurrence 460.1

nonwandering set

Let X be a metric space, and f : X → X a continuous surjection. An element x of X is a wandering point if there is a neighborhood U of x and an integer N such that, for all n > N, f^n(U) ∩ U = ∅. If x is not wandering, we call it a nonwandering point. Equivalently, x is a nonwandering point if for every neighborhood U of x there is n ≥ 1 such that f^n(U) ∩ U is nonempty. The set of all nonwandering points is called the nonwandering set of f, and is denoted by Ω(f). If X is compact, then Ω(f) is compact, nonempty, and forward invariant; if, additionally, f is a homeomorphism, then Ω(f) is invariant. Version: 1 Owner: Koro Author(s): Koro


Chapter 461 37B99 – Miscellaneous 461.1

ω-limit set

Let X be a metric space, and let f : X → X be a homeomorphism. The ω-limit set of x ∈ X, denoted by ω(x, f), is the set of cluster points of the forward orbit {f^n(x)}_{n∈N}. Hence, y ∈ ω(x, f) if and only if there is a strictly increasing sequence of natural numbers {n_k}_{k∈N} such that f^{n_k}(x) → y as k → ∞. Another way to express this is

ω(x, f) = \bigcap_{n∈N} \overline{\{f^k(x) : k > n\}}.

The α-limit set is defined in a similar fashion, but for the backward orbit; i.e. α(x, f) = ω(x, f^{-1}). Both sets are f-invariant, and if X is compact, they are compact and nonempty. If φ : R × X → X is a continuous flow, the definition is similar: ω(x, φ) consists of those elements y of X for which there exists a strictly increasing sequence {t_n} of real numbers such that t_n → ∞ and φ(x, t_n) → y as n → ∞. Similarly, α(x, φ) is the ω-limit set of the reversed flow (i.e. ψ(x, t) = φ(x, −t)). Again, these sets are invariant and if X is compact they are compact and nonempty. Furthermore,

ω(x, φ) = \bigcap_{n∈N} \overline{\{φ(x, t) : t > n\}}.

Version: 2 Owner: Koro Author(s): Koro


461.2

asymptotically stable

Let (X, d) be a metric space and f : X → X a continuous function. A point x ∈ X is said to be Lyapunov stable if for each ε > 0 there is δ > 0 such that for all n ∈ N and all y ∈ X such that d(x, y) < δ, we have d(f^n(x), f^n(y)) < ε. We say that x is asymptotically stable if it belongs to the interior of its stable set, i.e. if there is δ > 0 such that lim_{n→∞} d(f^n(x), f^n(y)) = 0 whenever d(x, y) < δ. In a similar way, if φ : X × R → X is a flow, a point x ∈ X is said to be Lyapunov stable if for each ε > 0 there is δ > 0 such that, whenever d(x, y) < δ, we have d(φ(x, t), φ(y, t)) < ε for each t ≥ 0; and x is called asymptotically stable if there is a neighborhood U of x such that lim_{t→∞} d(φ(x, t), φ(y, t)) = 0 for each y ∈ U. Version: 6 Owner: Koro Author(s): Koro

461.3

expansive

If (X, d) is a metric space, a homeomorphism f : X → X is said to be expansive if there is a constant ε_0 > 0, called the expansivity constant, such that for any two points of X, their n-th iterates are at least ε_0 apart for some integer n; i.e. if for any pair of points x ≠ y in X there is n ∈ Z such that d(f^n(x), f^n(y)) > ε_0. The space X is often assumed to be compact, since under that assumption expansivity is a topological property; i.e. if d' is any other metric generating the same topology as d, and if f is expansive in (X, d), then f is expansive in (X, d') (possibly with a different expansivity constant). If f : X → X is a continuous map, we say that X is positively expansive (or forward expansive) if there is ε_0 such that, for any x ≠ y in X, there is n ∈ N such that d(f^n(x), f^n(y)) > ε_0. Remarks. The latter condition is much stronger than expansivity. In fact, one can prove that if X is compact and f is a positively expansive homeomorphism, then X is finite (proof). Version: 9 Owner: Koro Author(s): Koro


461.4

the only compact metric spaces that admit a positively expansive homeomorphism are discrete spaces

Theorem. Let (X, d) be a compact metric space. If there exists a positively expansive homeomorphism f : X → X, then X consists only of isolated points, i.e. X is finite.

Lemma 1. If (X, d) is a compact metric space and there is an expansive homeomorphism f : X → X such that every point is Lyapunov stable, then every point is asymptotically stable.

Proof. Let 2c be the expansivity constant of f. Suppose some point x is not asymptotically stable, and let δ be such that d(x, y) < δ implies d(f^n(x), f^n(y)) < c for all n ∈ N. Then there exist ε > 0, a point y with d(x, y) < δ, and an increasing sequence {n_k} such that d(f^{n_k}(y), f^{n_k}(x)) > ε for each k. By uniform expansivity, there is N > 0 such that for every u and v with d(u, v) > ε there is n ∈ Z with |n| < N such that d(f^n(u), f^n(v)) > c. Choose k so large that n_k > N. Then there is n with |n| < N such that d(f^{n+n_k}(x), f^{n+n_k}(y)) = d(f^n(f^{n_k}(x)), f^n(f^{n_k}(y))) > c. But since n + n_k > 0, this contradicts the choice of δ. Hence every point is asymptotically stable.

Lemma 2. If (X, d) is a compact metric space and f : X → X is a continuous surjection such that every point is asymptotically stable, then X is finite.

Proof. For each x ∈ X let K_x be a closed neighborhood of x such that for all y ∈ K_x we have lim_{n→∞} d(f^n(x), f^n(y)) = 0. We assert that lim_{n→∞} diam(f^n(K_x)) = 0. In fact, if that is not the case, then there is an increasing sequence of positive integers {n_k}, some ε > 0 and a sequence {x_k} of points of K_x such that d(f^{n_k}(x), f^{n_k}(x_k)) > ε, and there is a subsequence {x_{k_i}} converging to some point y ∈ K_x for which lim sup d(f^n(x), f^n(y)) ≥ ε, contradicting the choice of K_x. Now since X is compact, there are finitely many points x_1, ..., x_m such that X = \bigcup_{i=1}^m K_{x_i}, so that X = f^n(X) = \bigcup_{i=1}^m f^n(K_{x_i}). To show that X = {x_1, ..., x_m}, suppose there is y ∈ X such that r = min{d(y, x_i) : 1 ≤ i ≤ m} > 0.
Then there is n such that diam(f^n(K_{x_i})) < r for 1 ≤ i ≤ m; but since y ∈ f^n(K_{x_i}) for some i, we have a contradiction.

Proof of the theorem. Consider the sets K_ε = {(x, y) ∈ X × X : d(x, y) ≥ ε} for ε > 0 and U = {(x, y) ∈ X × X : d(x, y) > c}, where 2c is the expansivity constant of f, and let F : X × X → X × X be the mapping given by F(x, y) = (f(x), f(y)). It is clear that F is a homeomorphism. By uniform expansivity, we know that for each ε > 0 there is N_ε such that for all (x, y) ∈ K_ε, there is n ∈ {1, ..., N_ε} such that F^n(x, y) ∈ U. We will prove that for each ε > 0, there is δ > 0 such that F^n(K_ε) ⊂ K_δ for all n ∈ N. This is equivalent to saying that every point of X is Lyapunov stable for f^{-1}, and by the previous lemmas the proof will be completed. Let K = \bigcup_{n=0}^{N_ε} F^n(K_ε), and let δ_0 = min{d(x, y) : (x, y) ∈ K}. Since K is compact,

the minimum distance δ_0 is reached at some point of K; i.e. there exist (x, y) ∈ K and 0 ≤ n ≤ N_ε such that d(f^n(x), f^n(y)) = δ_0. Since f is injective, it follows that δ_0 > 0, and letting δ = δ_0/2 we have K ⊂ K_δ. Given α ∈ K − K_ε, there is β ∈ K_ε and some 0 < m ≤ N_ε such that α = F^m(β), and F^k(β) ∉ K_ε for 0 < k ≤ m. Also, there is n with 0 < m < n ≤ N_ε such that F^n(β) ∈ U ⊂ K_ε. Hence m < N_ε, and F(α) = F^{m+1}(β) ∈ F^{m+1}(K_ε) ⊂ K. On the other hand, F(K_ε) ⊂ K. Therefore F(K) ⊂ K, and inductively F^n(K) ⊂ K for any n ∈ N. It follows that F^n(K_ε) ⊂ F^n(K) ⊂ K ⊂ K_δ for each n ∈ N, as required. Version: 5 Owner: Koro Author(s): Koro

461.5

topological conjugation

Let X and Y be topological spaces, and let f : X → X and g : Y → Y be continuous functions. We say that f is topologically semiconjugate to g if there exists a continuous surjection h : Y → X such that f ∘ h = h ∘ g. If h is a homeomorphism, then we say that f and g are topologically conjugate, and we call h a topological conjugation between f and g.

Similarly, a flow ϕ on X is topologically semiconjugate to a flow ψ on Y if there is a continuous surjection h : Y → X such that ϕ(h(y), t) = h(ψ(y, t)) for each y ∈ Y and t ∈ R. If h is a homeomorphism, then ψ and ϕ are topologically conjugate.

461.5.1

Remarks

Topological conjugation defines an equivalence relation on the space of all continuous surjections of a topological space to itself, by declaring f and g to be related if they are topologically conjugate. This equivalence relation is very useful in the theory of dynamical systems, since each class contains all functions which share the same dynamics from the topological viewpoint. In fact, orbits of g are mapped to homeomorphic orbits of f through the conjugation. Writing g = h^{−1} ∘ f ∘ h makes this fact evident: g^n = h^{−1} ∘ f^n ∘ h. Speaking informally, topological conjugation is a “change of coordinates” in the topological sense.

However, the analogous definition for flows is somewhat restrictive. In fact, we are requiring the maps ϕ(·, t) and ψ(·, t) to be topologically conjugate for each t, which is requiring more than simply that orbits of ϕ be mapped to orbits of ψ homeomorphically. This motivates the definition of topological equivalence, which also partitions the set of all flows on X into classes of flows sharing the same dynamics, again from the topological viewpoint. We say that ψ and ϕ are topologically equivalent if there is a homeomorphism h : Y → X mapping orbits of ψ to orbits of ϕ homeomorphically and preserving the orientation of the orbits. This means that:

1. O(y, ψ) = {ψ(y, t) : t ∈ R} = {ϕ(h(y), t) : t ∈ R} = O(h(y), ϕ) for each y ∈ Y;

2. for each y ∈ Y, there is δ > 0 such that, if 0 < |s| < t < δ and s is such that ϕ(h(y), s) = ψ(y, t), then s > 0.

Version: 5 Owner: Koro Author(s): Koro

461.6

topologically transitive

A continuous surjection f from a topological space X to itself is topologically transitive if for every pair of open sets U and V in X there is an integer n > 0 such that f^n(U) ∩ V ≠ ∅, where f^n denotes the n-th iterate of f.

If for every pair of open sets U and V there is an integer N such that f^n(U) ∩ V ≠ ∅ for each n > N, we say that f is topologically mixing.

If X is a compact metric space, then f is topologically transitive if and only if there exists a point x ∈ X with a dense orbit, i.e. such that O(x, f) = {f^n(x) : n ∈ N} is dense in X.

Version: 2 Owner: Koro Author(s): Koro

461.7

uniform expansivity

Let (X, d) be a compact metric space and let f : X → X be an expansive homeomorphism.

Theorem (uniform expansivity). For every ε > 0 and δ > 0 there is N > 0 such that for each pair x, y of points of X with d(x, y) > ε there is n ∈ Z with |n| ≤ N such that d(f^n(x), f^n(y)) > c − δ, where c is the expansivity constant of f.

Proof. Let K = {(x, y) ∈ X × X : d(x, y) ≥ ε/2}. Then K is closed, and hence compact. For each pair (x, y) ∈ K, there is n_{(x,y)} ∈ Z such that d(f^{n_{(x,y)}}(x), f^{n_{(x,y)}}(y)) > c. Since the mapping F : X × X → X × X defined by F(x, y) = (f(x), f(y)) is continuous, F^{n_{(x,y)}} is also continuous, and there is a neighborhood U_{(x,y)} of each (x, y) ∈ K such that d(f^{n_{(x,y)}}(u), f^{n_{(x,y)}}(v)) > c − δ for each (u, v) ∈ U_{(x,y)}. Since K is compact and {U_{(x,y)} : (x, y) ∈ K} is an open cover of K, there is a finite subcover {U_{(x_i,y_i)} : 1 ≤ i ≤ m}. Let N = max{|n_{(x_i,y_i)}| : 1 ≤ i ≤ m}. If d(x, y) > ε, then (x, y) ∈ K, so that (x, y) ∈ U_{(x_i,y_i)} for some i ∈ {1, . . . , m}. Thus for n = n_{(x_i,y_i)} we have d(f^n(x), f^n(y)) > c − δ and |n| ≤ N, as required.

Version: 2 Owner: Koro Author(s): Koro


Chapter 462 37C10 – Vector fields, flows, ordinary differential equations 462.1

flow

A flow on a set X is a group action of (R, +) on X. More explicitly, a flow is a function ϕ : X × R → X satisfying the following properties:

1. ϕ(x, 0) = x;
2. ϕ(ϕ(x, t), s) = ϕ(x, s + t) for all s, t in R and x ∈ X.

The set O(x, ϕ) = {ϕ(x, t) : t ∈ R} is called the orbit of x by ϕ. Flows are usually required to be continuous or even differentiable when the space X has some additional structure (e.g. when X is a topological space, or when X = R^n).

The most common examples of flows arise from describing the solutions of the autonomous ordinary differential equation

y′ = f(y), y(0) = x  (462.1.1)

as a function of the initial condition x, when the equation has existence and uniqueness of solutions. That is, if (462.1.1) has a unique solution ψ_x : R → X for each x ∈ X, then ϕ(x, t) = ψ_x(t) defines a flow.

Version: 3 Owner: Koro Author(s): Koro
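As a concrete illustration, the ODE y′ = y has the explicit flow ϕ(x, t) = x·e^t, and the two defining group-action axioms can be checked numerically. A minimal sketch (the function name `phi` is illustrative, not from the source):

```python
import math

def phi(x, t):
    # Flow of the autonomous ODE y' = y with y(0) = x: the unique
    # solution is psi_x(t) = x * e^t, so phi(x, t) = x * exp(t).
    return x * math.exp(t)

# The two defining properties of a flow:
x, s, t = 2.0, 0.7, -1.3
assert phi(x, 0) == x                                    # identity axiom
assert abs(phi(phi(x, t), s) - phi(x, s + t)) < 1e-12    # composition axiom
```

The composition axiom is exactly the statement that solving for time t and then time s is the same as solving for time s + t.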


462.2

globally attracting fixed point

An attracting fixed point is considered globally attracting if its stable manifold is the entire space. Equivalently, the fixed point x∗ is globally attracting if for all x, x(t) → x∗ as t → ∞. Version: 4 Owner: mathcam Author(s): mathcam, yark, armbrusterb


Chapter 463 37C20 – Generic properties, structural stability 463.1

Kupka-Smale theorem

Let M be a compact smooth manifold. For every k ∈ N, the set of Kupka-Smale diffeomorphisms is residual in Diff k (M) (the space of all Ck diffeomorphisms from M to itself endowed with the uniform or strong Ck topology, also known as the Whitney Ck topology). Version: 2 Owner: Koro Author(s): Koro

463.2

Pugh’s general density theorem

Let M be a compact smooth manifold. There is a residual subset of Diff^1(M) in which every element f satisfies cl(Per(f)) = Ω(f). In other words: generically, the set of periodic points of a C^1 diffeomorphism is dense in its nonwandering set. Here, Diff^1(M) denotes the set of all C^1 diffeomorphisms from M to itself, endowed with the (strong) C^1 topology.

REFERENCES 1. Pugh, C., An improved closing lemma and a general density theorem, Amer. J. Math. 89 (1967).

Version: 5 Owner: Koro Author(s): Koro


463.3

structural stability

Given a metric space (X, d) and a homeomorphism f : X → X, we say that f is structurally stable if there is a neighborhood V of f in Homeo(X) (the space of all homeomorphisms mapping X to itself, endowed with the compact-open topology) such that every element of V is topologically conjugate to f.

If M is a compact smooth manifold, a C^k diffeomorphism f is said to be C^k structurally stable if there is a neighborhood of f in Diff^k(M) (the space of all C^k diffeomorphisms from M to itself, endowed with the strong C^k topology) in which every element is topologically conjugate to f.

If X is a vector field on the smooth manifold M, we say that X is C^k structurally stable if there is a neighborhood of X in X^k(M) (the space of all C^k vector fields on M, endowed with the strong C^k topology) in which every element is topologically equivalent to X, i.e. such that every other field Y in that neighborhood generates a flow on M that is topologically equivalent to the flow generated by X.

Remark. The concept of structural stability may be generalized to other spaces of functions with other topologies; the general idea is that a function or flow is structurally stable if any other function or flow close enough to it has similar dynamics (from the topological viewpoint), which essentially means that the dynamics will not change under small perturbations.

Version: 5 Owner: Koro Author(s): Koro


Chapter 464 37C25 – Fixed points, periodic points, fixed-point index theory 464.1

hyperbolic fixed point

Let M be a smooth manifold. A fixed point x of a diffeomorphism f : M → M is said to be a hyperbolic fixed point if Df(x) is a linear hyperbolic isomorphism. If x is a periodic point of least period n, it is called a hyperbolic periodic point if it is a hyperbolic fixed point of f^n (the n-th iterate of f). If the dimension of the stable manifold of a fixed point is zero, the point is called a source; if the dimension of its unstable manifold is zero, it is called a sink; and if both the stable and unstable manifolds have nonzero dimension, it is called a saddle. Version: 3 Owner: Koro Author(s): Koro
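For planar maps, the source/sink/saddle trichotomy above can be read off from the eigenvalues of Df at the fixed point: the dimension of the stable space is the number of eigenvalues of modulus less than 1. A sketch for a 2×2 Jacobian (names are illustrative, not from the source):

```python
import cmath

def classify_fixed_point(J):
    """Classify a hyperbolic fixed point of a 2D diffeomorphism from its
    Jacobian J = [[a, b], [c, d]], assuming no eigenvalue of modulus 1."""
    (a, b), (c, d) = J
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)          # works for complex eigenvalues too
    eigs = [(tr + disc) / 2, (tr - disc) / 2]
    inside = sum(abs(lam) < 1 for lam in eigs)    # dimension of the stable space
    if inside == 2:
        return "sink"    # unstable manifold has dimension zero
    if inside == 0:
        return "source"  # stable manifold has dimension zero
    return "saddle"      # both manifolds have nonzero dimension

# f(x, y) = (x/2, 2y) has Df = diag(1/2, 2) at the fixed point (0, 0):
assert classify_fixed_point([[0.5, 0], [0, 2]]) == "saddle"
```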


Chapter 465 37C29 – Homoclinic and heteroclinic orbits 465.1

heteroclinic

Let f be a homeomorphism mapping a topological space X to itself, or a flow on X. A heteroclinic point, or heteroclinic intersection, is a point that belongs to the intersection of the stable set of x with the unstable set of y, where x and y are two different fixed or periodic points of f; i.e. a point that belongs to W^s(f, x) ∩ W^u(f, y).

Version: 1 Owner: Koro Author(s): Koro

465.2

homoclinic

If X is a topological space and f is a flow on X or a homeomorphism mapping X to itself, we say that x ∈ X is a homoclinic point (or homoclinic intersection) if it belongs to both the stable and unstable sets of some fixed or periodic point p; i.e.

x ∈ W^s(f, p) ∩ W^u(f, p).

The orbit of a homoclinic point is called a homoclinic orbit. Version: 2 Owner: Koro Author(s): Koro


Chapter 466 37C75 – Stability theory 466.1

attracting fixed point

A fixed point is considered attracting if its stable manifold contains a neighborhood of the point. Equivalently, the fixed point x∗ is attracting if there exists a δ > 0 such that for all x, d(x, x∗) < δ implies x(t) → x∗ as t → ∞. The stability of a fixed point can also be classified as stable, unstable, neutrally stable, or Lyapunov stable. Version: 2 Owner: alinabi Author(s): alinabi, armbrusterb

466.2

stable manifold

Let X be a topological space, and f : X → X a homeomorphism. If p is a fixed point for f , the stable and unstable sets of p are defined by W s (f, p) = {q ∈ X : f n (q) −−−→ p}, n→∞

u

W (f, p) = {q ∈ X : f

−n

(q) −−−→ p}, n→∞

respectively. If p is a periodic point of least period k, then it is a fixed point of f k , and the stable and unstable sets of p are W s (f, p) = W s (f k , p) W u (f, p) = W u (f k , p).


Given a neighborhood U of p, the local stable and unstable sets of p are defined by

W^s_loc(f, p, U) = {q ∈ U : f^n(q) ∈ U for each n ≥ 0},
W^u_loc(f, p, U) = W^s_loc(f^{−1}, p, U).

If X is metrizable, we can define the stable and unstable sets for any point by

W^s(f, p) = {q ∈ X : d(f^n(q), f^n(p)) → 0 as n → ∞},
W^u(f, p) = W^s(f^{−1}, p),

where d is a metric for X. This definition clearly coincides with the previous one when p is a periodic point.

Suppose now that X is a compact smooth manifold, and f is a C^k diffeomorphism, k ≥ 1. If p is a hyperbolic periodic point, the stable manifold theorem assures that for some neighborhood U of p, the local stable and unstable sets are C^k embedded disks, whose tangent spaces at p are E^s and E^u (the stable and unstable spaces of Df(p)), respectively; moreover, they vary continuously (in a certain sense) in a neighborhood of f in the C^k topology of Diff^k(X) (the space of all C^k diffeomorphisms from X to itself). Finally, the stable and unstable sets are C^k injectively immersed disks. This is why they are commonly called stable and unstable manifolds. This result is also valid for nonperiodic points, as long as they lie in some hyperbolic set (stable manifold theorem for hyperbolic sets).

Version: 7 Owner: Koro Author(s): Koro


Chapter 467 37C80 – Symmetries, equivariant dynamical systems 467.1

Γ-equivariant

Let Γ be a compact Lie group acting linearly on V, and let g : V → V be a mapping. Then g is Γ-equivariant if g(γv) = γg(v) for all γ ∈ Γ and all v ∈ V. In other words, g is Γ-equivariant precisely when it commutes with the action of Γ. [GSS]

REFERENCES [GSS] Golubitsky, Martin. Stewart, Ian. Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 2 Owner: Daume Author(s): Daume


Chapter 468 37D05 – Hyperbolic orbits and sets 468.1

hyperbolic isomorphism

Let X be a Banach space and T : X → X a continuous linear isomorphism. We say that T is a hyperbolic isomorphism if its spectrum is disjoint from the unit circle, i.e. σ(T) ∩ {z ∈ C : |z| = 1} = ∅.

If this is the case, then there is a splitting of X into two invariant subspaces, X = E^s ⊕ E^u (and therefore a corresponding splitting of T into two operators T^s : E^s → E^s and T^u : E^u → E^u, i.e. T = T^s ⊕ T^u), and an equivalent norm ‖·‖₁ on X such that T^s is a contraction and T^u is an expansion with respect to this norm. That is, there are constants λ^s and λ^u, with 0 < λ^s, λ^u < 1, such that for all x^s ∈ E^s and x^u ∈ E^u,

‖T^s x^s‖₁ < λ^s ‖x^s‖₁ and ‖(T^u)^{−1} x^u‖₁ < λ^u ‖x^u‖₁.

Version: 4 Owner: Koro Author(s): Koro


Chapter 469 37D20 – Uniformly hyperbolic systems (expanding, Anosov, Axiom A, etc.) 469.1

Anosov diffeomorphism

If M is a compact smooth manifold, a diffeomorphism f : M → M (or a flow φ : R × M → M) such that the whole space M is a hyperbolic set for f (or φ) is called an Anosov diffeomorphism (or flow). Anosov diffeomorphisms were introduced by D.V. Anosov, who proved that they are C^1 structurally stable and form an open subset of C^1(M, M) with the C^1 topology.

Not every manifold admits an Anosov diffeomorphism; for example, there are no such diffeomorphisms on the sphere S^n. The simplest examples of compact manifolds admitting them are the tori T^n: they admit the so-called linear Anosov diffeomorphisms, which are isomorphisms of T^n having no eigenvalue of modulus 1. It has been proved that any other Anosov diffeomorphism on T^n is topologically conjugate to one of this kind.

The problem of classifying manifolds that admit Anosov diffeomorphisms has proved to be very difficult, and still has no answer. The only known examples are the tori and infranilmanifolds, and it is conjectured that they are the only ones. Another famous problem that still remains open is to determine whether or not the nonwandering set of an Anosov diffeomorphism must be the whole manifold M. This is known to be true for linear Anosov diffeomorphisms (and hence for any Anosov diffeomorphism on a torus). Version: 1 Owner: Koro Author(s): Koro
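A standard concrete example of a linear Anosov diffeomorphism on T^2 (viewed here as [0,1)², an assumption for the sketch) is the toral automorphism induced by the integer matrix [[2, 1], [1, 1]], often called Arnold's cat map. The sketch below checks the defining condition: determinant 1 (so the map is invertible on the torus) and no eigenvalue of modulus 1:

```python
import math

# Matrix inducing Arnold's cat map on the 2-torus.
A = [[2, 1], [1, 1]]

tr = A[0][0] + A[1][1]                           # trace = 3
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # determinant = 1
lam1 = (tr + math.sqrt(tr * tr - 4 * det)) / 2   # (3 + sqrt(5))/2, modulus > 1
lam2 = (tr - math.sqrt(tr * tr - 4 * det)) / 2   # (3 - sqrt(5))/2, modulus < 1
assert det == 1 and lam1 > 1 > lam2 > 0          # hyperbolic: no eigenvalue on the unit circle

def cat_map(p):
    # The induced map on the torus: apply A, then reduce coordinates mod 1.
    x, y = p
    return ((2 * x + y) % 1.0, (x + y) % 1.0)
```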


469.2

Axiom A

Let M be a smooth manifold. We say that a diffeomorphism f : M → M satisfies (Smale's) Axiom A (or that f is an Axiom A diffeomorphism) if

1. the nonwandering set Ω(f) has a hyperbolic structure;
2. the set of periodic points of f is dense in Ω(f): cl(Per(f)) = Ω(f).

Version: 3 Owner: Koro Author(s): Koro

469.3

hyperbolic set

Let M be a compact smooth manifold, and let f : M → M be a diffeomorphism. An f-invariant subset Λ of M is said to be hyperbolic (or to have a hyperbolic structure) if there is a splitting of the tangent bundle of M restricted to Λ into a (Whitney) sum of two Df-invariant subbundles, E^s and E^u, such that the restriction of Df to E^s is a contraction and the restriction of Df to E^u is an expansion. This means that there are constants 0 < λ < 1 and c > 0 such that

1. T_Λ M = E^s ⊕ E^u;
2. Df(x)E^s_x = E^s_{f(x)} and Df(x)E^u_x = E^u_{f(x)} for each x ∈ Λ;
3. ‖Df^n v‖ < cλ^n ‖v‖ for each v ∈ E^s and n > 0;
4. ‖Df^{−n} v‖ < cλ^n ‖v‖ for each v ∈ E^u and n > 0,

using some Riemannian metric on M. If Λ is hyperbolic, then there exists an adapted Riemannian metric, i.e. one such that c = 1. Version: 1 Owner: Koro Author(s): Koro


Chapter 470 37D99 – Miscellaneous 470.1

Kupka-Smale

A diffeomorphism f mapping a smooth manifold M to itself is called a Kupka-Smale diffeomorphism if 1. every periodic point of f is hyperbolic; 2. for each pair of periodic points p,q of f , the intersection between the stable manifold of p and the unstable manifold of q is transversal. Version: 1 Owner: Koro Author(s): Koro


Chapter 471 37E05 – Maps of the interval (piecewise continuous, continuous, smooth) 471.1

Sharkovskii’s theorem

Every natural number can be written as 2^r p, where p is odd and r is the maximum exponent such that 2^r divides the given number. We define the Sharkovskii ordering of the natural numbers in this way: given two odd numbers p and q, and two nonnegative integers r and s, then 2^r p ≻ 2^s q if

1. r < s and p > 1;
2. r = s and p < q;
3. r > s and p = q = 1.

This defines a linear ordering of N, in which we first have 3, 5, 7, . . . , followed by 2·3, 2·5, . . . , followed by 2^2·3, 2^2·5, . . . , and so on, and finally 2^{n+1}, 2^n, . . . , 2, 1. So it looks like this:

3 ≻ 5 ≻ · · · ≻ 3·2 ≻ 5·2 ≻ · · · ≻ 3·2^n ≻ 5·2^n ≻ · · · ≻ 2^2 ≻ 2 ≻ 1.

Sharkovskii's theorem. Let I ⊂ R be an interval, and let f : I → R be a continuous function. If f has a periodic point of least period n, then f has a periodic point of least period k, for each k such that n ≻ k. Version: 3 Owner: Koro Author(s): Koro
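The ordering above is easy to implement: decompose each number as 2^r·p with p odd and apply the three rules. A minimal sketch (function names are illustrative):

```python
def decompose(n):
    # Write n = 2^r * p with p odd; return (r, p).
    r = 0
    while n % 2 == 0:
        n //= 2
        r += 1
    return r, n

def sharkovskii_precedes(m, n):
    """True if m comes before n in the Sharkovskii ordering (m "forces" n)."""
    r, p = decompose(m)
    s, q = decompose(n)
    if r < s and p > 1:        # rule 1
        return True
    if r == s and p < q:       # rule 2
        return True
    if r > s and p == q == 1:  # rule 3: powers of two, decreasing
        return True
    return False

# 3 comes first; powers of two come last, in decreasing order.
assert sharkovskii_precedes(3, 5)
assert sharkovskii_precedes(6, 4)    # 2*3 precedes 2^2
assert sharkovskii_precedes(4, 2) and sharkovskii_precedes(2, 1)
assert not sharkovskii_precedes(5, 3)
```

By the theorem, a continuous interval map with a point of least period 3 therefore has points of every least period.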


Chapter 472 37G15 – Bifurcations of limit cycles and periodic orbits 472.1

Feigenbaum constant

The Feigenbaum delta constant has the value

δ = 4.669211660910299067185320382047 . . .

It governs the structure and behavior of many types of dynamical systems. It was discovered in the 1970s by Mitchell Feigenbaum, while studying the logistic map y′ = µ·y(1 − y), which produces the Feigenbaum tree:

Generated by GNU Octave and GNUPlot.

If the bifurcations in this tree (first few shown as dotted blue lines) are at points b_1, b_2, b_3, . . . , then

lim_{n→∞} (b_n − b_{n−1}) / (b_{n+1} − b_n) = δ.

That is, the ratio of the intervals between the bifurcation points approaches Feigenbaum's constant.

However, this is only the beginning. Feigenbaum discovered that this constant arises in any dynamical system that approaches chaotic behavior via period-doubling bifurcation and has a single quadratic maximum. So in some sense, Feigenbaum's constant is a universal constant of chaos theory. Feigenbaum's constant appears in problems of fluid-flow turbulence, electronic oscillators, chemical reactions, and even the Mandelbrot set (the “budding” of the Mandelbrot set along the negative real axis occurs at intervals determined by Feigenbaum's constant).
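The convergence of the interval ratios can be checked numerically. The sketch below uses approximate published period-doubling parameters for the logistic map (b₂ = 1 + √6 exactly; the others are rounded values, so treat this as a rough illustration rather than a precise computation):

```python
# Approximate parameters b_n at which the logistic map's attracting
# 2^(n-1)-cycle bifurcates into a 2^n-cycle (standard published values).
b = [3.0, 3.449490, 3.544090, 3.564407, 3.568759]

# Ratios of successive bifurcation intervals, as in the limit formula above.
ratios = [(b[i] - b[i - 1]) / (b[i + 1] - b[i]) for i in range(1, len(b) - 1)]

# The ratios approach delta = 4.6692... (slowly, so only roughly here).
assert abs(ratios[-1] - 4.6692) < 0.1
```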

References. • “What is Feigenbaum’s constant?”: http://fractals.iuta.u-bordeaux.fr/sci-faq/feigenbaum.html • “Bifurcations”: http://mcasco.com/bifurcat.html

• “Feigenbaum’s Universal Constant”: http://www.stud.ntnu.no/ berland/math/feigenbaum/feigconst • “Feigenbaum’s Constant”: http://home.imf.au.dk/abuch/feigenbaum.html • “Bifurcation”: http://spanky.triumf.ca/www/fractint/bif type.html Version: 2 Owner: akrowne Author(s): akrowne

472.2

Feigenbaum fractal

A Feigenbaum fractal is any bifurcation fractal produced by a period-doubling cascade. The “canonical” Feigenbaum fractal is produced by the logistic map (a simple population model), y′ = µ·y(1 − y), where µ is varied smoothly along one dimension. The logistic iteration either terminates in a cycle (set of repeating values) or behaves chaotically. If one plots the points of this cycle versus the µ-value, a graph like the following is produced:

Note the distinct bifurcation (branching) points and the chaotic behavior as µ increases. Many other iterations will generate this same type of plot, for example the iteration

p′ = r·sin(π·p).

One of the most amazing things about this class of fractals is that the bifurcation intervals are always described by Feigenbaum's constant.
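The period-doubling cascade can be observed directly by iterating the logistic map past a transient and collecting the distinct values on the attracting cycle. A minimal sketch (the transient length and tolerance are ad hoc choices):

```python
def attractor_cycle(mu, transient=1000, sample=64, tol=1e-4):
    """Iterate the logistic map y -> mu*y*(1-y), discard a transient, and
    return the distinct values visited on the attracting cycle."""
    y = 0.3
    for _ in range(transient):
        y = mu * y * (1 - y)
    vals = []
    for _ in range(sample):
        y = mu * y * (1 - y)
        # Record y only if it is not (numerically) a cycle point seen already.
        if all(abs(y - v) > tol for v in vals):
            vals.append(y)
    return sorted(vals)

# Period doubling: a 2-cycle at mu = 3.2 and a 4-cycle at mu = 3.5.
assert len(attractor_cycle(3.2)) == 2
assert len(attractor_cycle(3.5)) == 4
```

Plotting `attractor_cycle(mu)` against mu over a fine grid reproduces the bifurcation diagram described above.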

References.

• “Quadratic Iteration, bifurcation, and chaos”: http://mathforum.org/advanced/robertd/bifurcation.h • “Bifurcation”: http://spanky.triumf.ca/www/fractint/bif type.html • “Feigenbaum’s Constant”: http://fractals.iuta.u-bordeaux.fr/sci-faq/feigenbaum.html Version: 3 Owner: akrowne Author(s): akrowne

472.3

equivariant Hopf theorem

Consider the system of ordinary differential equations ẋ + f(x, λ) = 0, where f : R^n × R → R^n is smooth and commutes with a compact Lie group Γ (i.e. f is Γ-equivariant). In addition we assume that R^n is Γ-simple, and we choose a basis of coordinates such that

(df)_{0,0} = [ 0 −I_m ; I_m 0 ],

where m = n/2. Furthermore, let the eigenvalues of (df)_{0,λ} be σ(λ) ± iρ(λ), with σ̇(0) ≠ 0. Suppose that dim Fix(Σ) = 2, where Σ ⊂ Γ × S^1 is an isotropy subgroup acting on R^n. Then there exists a unique branch of small-amplitude periodic solutions to ẋ + f(x, λ) = 0 with period near 2π, having Σ as their group of symmetries. [GSS]


REFERENCES [GSS] Golubitsky, Martin. Stewart, Ian. Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 1 Owner: Daume Author(s): Daume


Chapter 473 37G40 – Symmetries, equivariant bifurcation theory 473.1

Poénaru (1976) theorem

Let Γ be a compact Lie group and let g_1, . . . , g_r generate the module P⃗(Γ) (the space of Γ-equivariant polynomial mappings) over the ring P(Γ) (the ring of Γ-invariant polynomials). Then g_1, . . . , g_r generate the module E⃗(Γ) (the space of Γ-equivariant germs at the origin of C^∞ mappings) over the ring E(Γ) (the ring of Γ-invariant germs). [GSS]

REFERENCES [GSS] Golubitsky, Martin. Stewart, Ian. Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988. [PV] Poénaru, V.: Singularités C^∞ en Présence de Symétrie. Lecture Notes in Mathematics 510, Springer-Verlag, Berlin, 1976.

Version: 1 Owner: Daume Author(s): Daume

473.2

bifurcation problem with symmetry group

Let Γ be a Lie group acting on a vector space V, and consider the system of ordinary differential equations

ẋ + g(x, λ) = 0,


where g : R^n × R → R^n is smooth. Then g is called a bifurcation problem with symmetry group Γ if g ∈ E⃗_{x,λ}(Γ) (where E⃗(Γ) is the space of Γ-equivariant germs, at the origin, of C^∞ mappings of V into V) satisfying g(0, 0) = 0 and (dg)_{0,0} = 0, where (dg)_{0,0} denotes the Jacobian matrix evaluated at (0, 0). [GSS]

REFERENCES [GSS] Golubitsky, Martin. Stewart, Ian. Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 1 Owner: Daume Author(s): Daume

473.3

trace formula

Let Γ be a compact Lie group acting on V and let Σ ⊂ Γ be a Lie subgroup. Then

dim Fix(Σ) = ∫_Σ trace(σ),

where ∫_Σ denotes the normalized Haar integral on Σ and Fix(Σ) is the fixed-point subspace of Σ.

REFERENCES [GSS] Golubitsky, Martin. Stewart, Ian. Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 2 Owner: Daume Author(s): Daume


Chapter 474 37G99 – Miscellaneous 474.1

chaotic dynamical system

As Strogatz says in reference [1], “No definition of the term chaos is universally accepted yet, but almost everyone would agree on the three ingredients used in the following working definition”:

Chaos is aperiodic long-term behavior in a deterministic system that exhibits sensitive dependence on initial conditions.

Aperiodic long-term behavior means that there are trajectories which do not settle down to fixed points, periodic orbits, or quasiperiodic orbits as t → ∞. For the purposes of this definition, a trajectory which approaches a limit of ∞ as t → ∞ should be considered to have a fixed point at ∞.

Sensitive dependence on initial conditions means that nearby trajectories separate exponentially fast, i.e., the system has a positive Lyapunov exponent.

Strogatz notes that he favors additional constraints on the aperiodic long-term behavior, but leaves open what form they may take. He suggests two alternatives to fulfill this:

1. requiring that there exist an open set of initial conditions having aperiodic trajectories, or
2. if one picks a random initial condition x(0), then there must be a nonzero chance of the associated trajectory x(t) being aperiodic.
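The “positive Lyapunov exponent” ingredient can be estimated numerically for a one-dimensional map by averaging log|f′(x)| along an orbit. A sketch for the logistic map x ↦ r·x·(1−x) (the iteration counts and starting point are arbitrary choices, not from the source):

```python
import math

def lyapunov_logistic(r, n=200_000, x0=0.123456):
    """Estimate the Lyapunov exponent of x -> r*x*(1-x) as the time average
    of log|f'(x)| = log|r*(1 - 2*x)| along an orbit."""
    x, total = x0, 0.0
    for _ in range(n):
        x = r * x * (1 - x)
        total += math.log(abs(r * (1 - 2 * x)))
    return total / n

# At r = 4 the logistic map is chaotic: the exponent is log 2 > 0.
assert abs(lyapunov_logistic(4.0) - math.log(2)) < 0.05
# At r = 2.5 orbits settle onto a stable fixed point: the exponent is negative.
assert lyapunov_logistic(2.5) < 0
```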

474.1.1

References

1. Steven H. Strogatz, “Nonlinear Dynamics and Chaos”. Westview Press, 1994.

Version: 2 Owner: bshanks Author(s): bshanks


Chapter 475 37H20 – Bifurcation theory 475.1

bifurcation

Bifurcation refers to the splitting of attractors, as in dynamical systems. For example, the branching of the Feigenbaum tree is an instance of bifurcation. A cascade of bifurcations is a precursor to chaos.

REFERENCES 1. “Bifurcations”, http://mcasco.com/bifurcat.html 2. “Bifurcation”, http://spanky.triumf.ca/www/fractint/bif type.html 3. “Quadratic Iteration, bifurcation, and chaos”, http://mathforum.org/advanced/robertd/bifurcation.html

Version: 2 Owner: akrowne Author(s): akrowne


Chapter 476 39B05 – General 476.1

functional equation

A functional equation is an equation whose unknowns are functions; f(x + y) = f(x) + f(y) and f(x·y) = f(x)·f(y) are examples of such equations. The systematic study of these did not begin until the 1960s, although various mathematicians had studied them before, including Euler and Cauchy, to mention just a few. Functional equations appear in many places; for example, the gamma function and Riemann's zeta function both satisfy functional equations. Version: 4 Owner: jgade Author(s): jgade


Chapter 477 39B62 – Functional inequalities, including subadditivity, convexity, etc. 477.1

Jensen’s inequality

If f is a convex function on the interval [a, b], then

f( ∑_{k=1}^{n} λ_k x_k ) ≤ ∑_{k=1}^{n} λ_k f(x_k),

where 0 ≤ λ_k ≤ 1, λ_1 + λ_2 + · · · + λ_n = 1, and each x_k ∈ [a, b]. If f is a concave function, the inequality is reversed.

Example: f(x) = x^2 is a convex function on [0, 10]. Then

(0.2·4 + 0.5·3 + 0.3·7)^2 ≤ 0.2(4^2) + 0.5(3^2) + 0.3(7^2).
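The numeric example above can be verified directly (a trivial check; the variable names are illustrative):

```python
lam = [0.2, 0.5, 0.3]          # nonnegative weights summing to 1
x = [4, 3, 7]                  # points in [0, 10]
f = lambda t: t ** 2           # convex on [0, 10]

lhs = f(sum(l * xi for l, xi in zip(lam, x)))   # f(sum of lam_k * x_k) = 4.4^2
rhs = sum(l * f(xi) for l, xi in zip(lam, x))   # sum of lam_k * f(x_k)
assert abs(lhs - 19.36) < 1e-9
assert abs(rhs - 22.4) < 1e-9
assert lhs <= rhs              # Jensen's inequality
```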

A very special case of this inequality is when λ_k = 1/n, because then

f( (1/n) ∑_{k=1}^{n} x_k ) ≤ (1/n) ∑_{k=1}^{n} f(x_k);

that is, the value of the function at the mean of the x_k is less than or equal to the mean of the values of the function at each x_k.

There is another formulation of Jensen's inequality used in probability: Let X be some random variable, and let f(x) be a convex function (defined at least on a segment containing the range of X). Then the expected value of f(X) is at least the value of f at the mean of X:

E f(X) ≥ f(E X).

With this approach, the weights of the first form can be seen as probabilities. Version: 2 Owner: drini Author(s): drini

477.2

proof of Jensen’s inequality

We prove an equivalent, more convenient formulation: Let X be some random variable, and let f (x) be a convex function (defined at least on a segment containing the range of X). Then the expected value of f (X) is at least the value of f at the mean of X: E f (X) ≥ f (E X). Indeed, let c = E X. Since f (x) is convex, there exists a supporting line for f (x) at c: ϕ(x) = α(x − c) + f (c)

for some α, with ϕ(x) ≤ f(x) for all x. Then

E f(X) ≥ E ϕ(X) = E[α(X − c)] + f(c) = f(c),

as claimed, since E[α(X − c)] = α(E X − c) = 0.

Version: 2 Owner: ariels Author(s): ariels

477.3

proof of arithmetic-geometric-harmonic means inequality

We can use the Jensen inequality for an easy proof of the arithmetic-geometric-harmonic means inequality. Let x_1, . . . , x_n > 0; we shall first prove that

(x_1 · . . . · x_n)^{1/n} ≤ (x_1 + . . . + x_n)/n.

Note that log is a concave function. Applying it to the arithmetic mean of x_1, . . . , x_n and using Jensen's inequality, we see that

log( (x_1 + . . . + x_n)/n ) ≥ (log(x_1) + . . . + log(x_n))/n = log(x_1 · . . . · x_n)/n = log( (x_1 · . . . · x_n)^{1/n} ).

Since log is also a monotone function, it follows that the arithmetic mean is at least as large as the geometric mean. The proof that the geometric mean is at least as large as the harmonic mean is the usual one (see “proof of arithmetic-geometric-harmonic means inequality”). Version: 4 Owner: mathcam Author(s): mathcam, ariels

477.4

subadditivity

A sequence {a_n}_{n=1}^∞ is called subadditive if it satisfies the inequality

a_{n+m} ≤ a_n + a_m for all n and m.  (477.4.1)

The major reason for use of subadditive sequences is the following lemma due to Fekete.

Lemma 10 ([1]). For every subadditive sequence {a_n}_{n=1}^∞, the limit lim a_n/n exists and is equal to inf a_n/n.

Similarly, a function f(x) is subadditive if

f(x + y) ≤ f(x) + f(y) for all x and y.

The analogue of Fekete's lemma holds for subadditive functions as well. There are extensions of Fekete's lemma that do not require (477.4.1) to hold for all m and n. There are also results that allow one to deduce the rate of convergence to the limit whose existence is stated in Fekete's lemma if some kind of both super- and subadditivity is present. A good exposition of this topic may be found in [2].
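Fekete's lemma is easy to observe numerically. The sketch below uses a_n = n + √n, which is subadditive because √ is subadditive; the ratio a_n/n = 1 + 1/√n decreases to its infimum, the limit 1:

```python
import math

# a_n = n + sqrt(n) is subadditive, since sqrt(n + m) <= sqrt(n) + sqrt(m).
a = lambda n: n + math.sqrt(n)

# Spot-check subadditivity on a grid of pairs:
assert all(a(n + m) <= a(n) + a(m) for n in range(1, 50) for m in range(1, 50))

# Fekete's lemma: a_n / n converges to inf a_n / n (here the limit is 1,
# approached from above, since a_n / n = 1 + 1/sqrt(n) is decreasing).
ratios = [a(n) / n for n in range(1, 10001)]
assert ratios == sorted(ratios, reverse=True)   # decreasing toward the infimum
assert abs(ratios[-1] - 1) < 0.02
```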

REFERENCES 1. Gy¨orgy Polya and G´abor Szeg¨o. Problems and theorems in analysis, volume 1. 1976. Zbl 0338.00001. 2. Michael J. Steele. Probability theory and combinatorial optimization, volume 69 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1997. Zbl 0916.90233.

Version: 6 Owner: bbukh Author(s): bbukh

477.5

superadditivity

A sequence {a_n}_{n=1}^∞ is called superadditive if it satisfies the inequality

a_{n+m} ≥ a_n + a_m for all n and m.  (477.5.1)

The major reason for use of superadditive sequences is the following lemma due to Fekete.

Lemma 11 ([1]). For every superadditive sequence {a_n}_{n=1}^∞, the limit lim a_n/n exists and is equal to sup a_n/n.

Similarly, a function f(x) is superadditive if

f(x + y) ≥ f(x) + f(y) for all x and y.

The analogue of Fekete's lemma holds for superadditive functions as well. There are extensions of Fekete's lemma that do not require (477.5.1) to hold for all m and n. There are also results that allow one to deduce the rate of convergence to the limit whose existence is stated in Fekete's lemma if some kind of both super- and subadditivity is present. A good exposition of this topic may be found in [2].

REFERENCES 1. Gy¨orgy Polya and G´abor Szeg¨o. Problems and theorems in analysis, volume 1. 1976. Zbl 0338.00001. 2. Michael J. Steele. Probability theory and combinatorial optimization, volume 69 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1997. Zbl 0916.90233.

Version: 5 Owner: bbukh Author(s): bbukh


Chapter 478 40-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 478.1

Cauchy product

Let a_k and b_k be two sequences of real or complex numbers for k ∈ N_0 (N_0 is the set of natural numbers including zero). The Cauchy product is defined by:

(a ∘ b)(k) = ∑_{l=0}^{k} a_l b_{k−l}.  (478.1.1)

This is basically the convolution of two sequences. The product of two series ∑_{k=0}^∞ a_k and ∑_{k=0}^∞ b_k is therefore given by:

∑_{k=0}^∞ c_k = ( ∑_{k=0}^∞ a_k ) · ( ∑_{k=0}^∞ b_k ) = ∑_{k=0}^∞ ∑_{l=0}^{k} a_l b_{k−l}.  (478.1.2)

A sufficient condition for the resulting series ∑_{k=0}^∞ c_k to be absolutely convergent is that ∑_{k=0}^∞ a_k and ∑_{k=0}^∞ b_k both converge absolutely.
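The definition translates directly into code. The sketch below (names are illustrative) applies it to finite prefixes of a_k = b_k = 1/k!, for which each series sums to e, so the Cauchy product series must sum to e²; indeed c_k = 2^k/k!:

```python
import math

def cauchy_product(a, b):
    """Cauchy product of two (finite prefixes of) sequences:
    c_k = sum over l from 0 to k of a_l * b_{k-l}."""
    n = min(len(a), len(b))
    return [sum(a[l] * b[k - l] for l in range(k + 1)) for k in range(n)]

N = 25
a = [1 / math.factorial(k) for k in range(N)]   # prefix of the series for e
c = cauchy_product(a, a)

# Binomial identity: sum of 1/(l! (k-l)!) over l equals 2^k / k!.
assert all(abs(c[k] - 2 ** k / math.factorial(k)) < 1e-12 for k in range(N))
# The product series sums to e^2 (the tail beyond N is negligible).
assert abs(sum(c) - math.e ** 2) < 1e-9
```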

Version: 4 Owner: msihl Author(s): msihl


478.2

Cesàro mean

Definition. Let {a_n}_{n=0}^∞ be a sequence of real (or possibly complex) numbers. The Cesàro mean of the sequence {a_n} is the sequence {b_n}_{n=0}^∞ with

b_n = (1/(n+1)) ∑_{i=0}^{n} a_i.  (478.2.1)
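The formula translates directly into code. The sketch below (names are illustrative) also shows that the Cesàro mean can converge even when the original sequence does not, as with a_n = (−1)^n:

```python
def cesaro_mean(a):
    """Cesaro mean b_n = (a_0 + ... + a_n) / (n + 1) of a finite sequence."""
    b, total = [], 0.0
    for n, an in enumerate(a):
        total += an
        b.append(total / (n + 1))
    return b

# If a_n converges to a, the Cesaro mean converges to the same limit:
c = cesaro_mean([1 - 1 / (n + 1) for n in range(10000)])   # a_n -> 1
assert abs(c[-1] - 1) < 0.01

# But the mean can converge even when a_n diverges: a_n = (-1)^n gives b_n -> 0.
b = cesaro_mean([(-1) ** n for n in range(10000)])
assert abs(b[-1]) < 1e-3
```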

Properties 1. A key property of the Cesàro mean is that it has the same limit as the original sequence. In other words, if {a_n} and {b_n} are as above, and a_n → a, then b_n → a. In particular, if {a_n} converges, then {b_n} converges too. Version: 5 Owner: mathcam Author(s): matte, drummond

478.3

alternating series

An alternating series is of the form

∑_{i=0}^∞ (−1)^i a_i  or  ∑_{i=0}^∞ (−1)^{i+1} a_i,

where (a_n) is a non-negative sequence. Version: 2 Owner: vitriol Author(s): vitriol

478.4

alternating series test

The alternating series test, or Leibniz's theorem, states the following:

Theorem [1, 2]. Let (a_n)_{n=1}^∞ be a non-negative, non-increasing sequence of real numbers such that lim_{n→∞} a_n = 0. Then the infinite sum ∑_{n=1}^∞ (−1)^{n+1} a_n converges.

This test provides a sufficient (but not necessary) condition for the convergence of an alternating series, and is therefore often used as a simple first test for convergence of such series. The condition lim_{n→∞} a_n = 0 is necessary for convergence of an alternating series.

Example: The series ∑_{k=1}^∞ 1/k does not converge, but the alternating series ∑_{k=1}^∞ (−1)^{k+1} (1/k) converges to ln(2).
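The example is easy to check numerically; for an alternating series with decreasing terms, the error of a partial sum is bounded by the first omitted term. A minimal sketch:

```python
import math

def partial_sum(n):
    # n-th partial sum of the alternating harmonic series sum (-1)^(k+1)/k.
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

s = partial_sum(100000)
# Alternating series remainder bound: |ln 2 - s_n| <= a_{n+1} = 1/(n+1).
assert abs(s - math.log(2)) < 1 / 100001
```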

REFERENCES 1. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., 1976. 2. E. Kreyszig, Advanced Engineering Mathematics, John Wiley & Sons, 1993, 7th ed.

Version: 10 Owner: Koro Author(s): Larry Hammick, matte, saforres, vitriol
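The example above is easy to check numerically; for an alternating series satisfying the hypotheses, the truncation error is at most the first omitted term:

```python
import math

# Sketch: partial sums of the alternating harmonic series approach ln 2.
N = 100000
s = sum((-1) ** (k + 1) / k for k in range(1, N + 1))

# For an alternating series the error is below the first omitted term.
error_bound = 1 / (N + 1)
```

Here `s` agrees with $\ln 2 \approx 0.6931$ to within `error_bound`.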

478.5 monotonic

A sequence $(s_n)$ is said to be monotonic if it is

• monotonically increasing
• monotonically decreasing
• monotonically nondecreasing
• monotonically nonincreasing

Intuitively, this means that the sequence can be thought of as a "staircase" going either only up or only down, with the stairs of any height and any depth.

Version: 1 Owner: akrowne Author(s): akrowne

478.6 monotonically decreasing

A sequence $(s_n)$ is monotonically decreasing if
$$s_m < s_n \quad \forall m > n.$$

Compare this to monotonically nonincreasing.

Version: 4 Owner: akrowne Author(s): akrowne

478.7 monotonically increasing

A sequence $(s_n)$ is called monotonically increasing if
$$s_m > s_n \quad \forall m > n.$$

Compare this to monotonically nondecreasing.

Version: 3 Owner: akrowne Author(s): akrowne

478.8 monotonically nondecreasing

A sequence $(s_n)$ is called monotonically nondecreasing if
$$s_m \ge s_n \quad \forall m > n.$$

Compare this to monotonically increasing.

Version: 2 Owner: akrowne Author(s): akrowne

478.9 monotonically nonincreasing

A sequence $(s_n)$ is monotonically nonincreasing if
$$s_m \le s_n \quad \text{for all } m > n.$$

Compare this to monotonically decreasing.

Examples.

• $(s_n) = 1, 0, -1, -2, \dots$ is monotonically nonincreasing. It is also monotonically decreasing.
• $(s_n) = 1, 1, 1, 1, \dots$ is nonincreasing but not monotonically decreasing.
• $(s_n) = \left(\frac{1}{n+1}\right)$ is nonincreasing (note that $n$ is nonnegative).
• $(s_n) = 1, 1, 2, 1, 1, \dots$ is not nonincreasing. It also happens to fail to be monotonically nondecreasing.
• $(s_n) = 1, 2, 3, 4, 5, \dots$ is not nonincreasing, rather it is nondecreasing (and monotonically increasing).

Version: 5 Owner: akrowne Author(s): akrowne

478.10 sequence

Sequences

Given any set $X$, a sequence in $X$ is a function $f: \mathbb{N} \to X$ from the set of natural numbers to $X$. Sequences are usually written with subscript notation: $x_0, x_1, x_2, \dots$, instead of $f(0), f(1), f(2), \dots$.

Generalized sequences

One can generalize the above definition to any arbitrary ordinal. For any set $X$, a generalized sequence in $X$ is a function $f: \omega \to X$ where $\omega$ is any ordinal number. If $\omega$ is a finite ordinal, then we say the sequence is a finite sequence.

Version: 5 Owner: djao Author(s): djao

478.11 series

Given a sequence of real numbers $\{a_n\}$ we can define a sequence of partial sums $\{S_N\}$, where $S_N = \sum_{n=1}^{N} a_n$. We define the series $\sum_{n=1}^{\infty} a_n$ to be the limit of these partial sums. More precisely
$$\sum_{n=1}^{\infty} a_n = \lim_{N\to\infty} S_N = \lim_{N\to\infty} \sum_{n=1}^{N} a_n.$$

The elements of the sequence $\{a_n\}$ are called the terms of the series.

Traditionally, as above, series are infinite sums of real numbers. However, the formal constraints on the terms $\{a_n\}$ are much less strict. We need only be able to add the terms and take the limit of partial sums. So in full generality the terms could be complex numbers or even elements of certain rings, fields, and vector spaces.

Version: 2 Owner: igor Author(s): igor


Chapter 479

40A05 – Convergence and divergence of series and sequences

479.1 Abel's lemma

Theorem 1. Let $\{a_i\}_{i=0}^{N}$ and $\{b_i\}_{i=0}^{N}$ be sequences of real (or complex) numbers with $N \ge 0$. For $n = 0, \dots, N$, let $A_n$ be the partial sum $A_n = \sum_{i=0}^{n} a_i$. Then
$$\sum_{i=0}^{N} a_i b_i = \sum_{i=0}^{N-1} A_i (b_i - b_{i+1}) + A_N b_N.$$

In the trivial case, when $N = 0$, the sum on the right hand side should be interpreted as identically zero. In other words, if the upper limit is below the lower limit, there is no summation.

An inductive proof can be found here. The result can be found in [1] (Exercise 3.3.5).

If the sequences are indexed from $M$ to $N$, we have the following variant:

Corollary. Let $\{a_i\}_{i=M}^{N}$ and $\{b_i\}_{i=M}^{N}$ be sequences of real (or complex) numbers with $0 \le M \le N$. For $n = M, \dots, N$, let $A_n$ be the partial sum $A_n = \sum_{i=M}^{n} a_i$. Then
$$\sum_{i=M}^{N} a_i b_i = \sum_{i=M}^{N-1} A_i (b_i - b_{i+1}) + A_N b_N.$$

Proof. By defining $a_0 = \dots = a_{M-1} = b_0 = \dots = b_{M-1} = 0$, we can apply Theorem 1 to the sequences $\{a_i\}_{i=0}^{N}$ and $\{b_i\}_{i=0}^{N}$. $\Box$


REFERENCES 1. R.B. Guenther, L.W. Lee, Partial Differential Equations of Mathematical Physics and Integral Equations, Dover Publications, 1988.

Version: 10 Owner: mathcam Author(s): matte, lieven
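The summation-by-parts identity of Theorem 1 is an algebraic identity, so it can be spot-checked on arbitrary data; a minimal sketch with random sequences:

```python
import random

# Sketch: numerically verify the identity of Theorem 1 on random data.
random.seed(0)
N = 10
a = [random.uniform(-1, 1) for _ in range(N + 1)]
b = [random.uniform(-1, 1) for _ in range(N + 1)]

# Partial sums A_n = a_0 + ... + a_n.
A = [sum(a[: i + 1]) for i in range(N + 1)]

lhs = sum(a[i] * b[i] for i in range(N + 1))
rhs = sum(A[i] * (b[i] - b[i + 1]) for i in range(N)) + A[N] * b[N]
```

Up to floating-point rounding, `lhs` and `rhs` coincide.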

479.2 Abel's test for convergence

Suppose $\sum a_n$ converges and that $(b_n)$ is a monotonic convergent sequence. Then the series $\sum a_n b_n$ converges.

Version: 4 Owner: vypertd Author(s): vypertd

479.3 Baroni's theorem

Let $(x_n)_{n \ge 0}$ be a sequence of real numbers such that $\lim_{n\to\infty}(x_{n+1} - x_n) = 0$. Let $A = \{x_n \mid n \in \mathbb{N}\}$ and $A'$ the set of limit points of $A$. Then $A'$ is a (possibly degenerate) interval in $\overline{\mathbb{R}}$, where $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$.

Version: 2 Owner: slash Author(s): slash

479.4 Bolzano-Weierstrass theorem

Given any bounded, real sequence $(a_n)$ there exists a convergent subsequence $(a_{n_j})$. More generally, any sequence $(a_n)$ in a compact set has a convergent subsequence.

Version: 6 Owner: vitriol Author(s): vitriol

479.5 Cauchy criterion for convergence

A series $\sum_{i=0}^{\infty} a_i$ is convergent iff for every $\varepsilon > 0$ there is a number $N \in \mathbb{N}$ such that
$$|a_{n+1} + a_{n+2} + \dots + a_{n+p}| < \varepsilon$$
holds for all $n > N$ and $p \ge 1$.

Proof: First define
$$s_n := \sum_{i=0}^{n} a_i.$$
Now by definition the series converges iff for every $\varepsilon > 0$ there is a number $N$ such that for all $n, m > N$ holds:
$$|s_m - s_n| < \varepsilon.$$
We can assume $m > n$ and thus set $m = n + p$. Then the series is convergent iff
$$|s_{n+p} - s_n| = |a_{n+1} + a_{n+2} + \dots + a_{n+p}| < \varepsilon.$$

Version: 2 Owner: mathwizard Author(s): mathwizard

479.6 Cauchy's root test

If $\sum a_n$ is a series of positive real terms and
$$\sqrt[n]{a_n} < k < 1$$
for all $n > N$, then $\sum a_n$ is convergent. If $\sqrt[n]{a_n} > 1$ for an infinite number of values of $n$, then $\sum a_n$ is divergent.

Limit form. Given a series $\sum a_n$ of complex terms, set
$$\rho = \limsup_{n\to\infty} \sqrt[n]{|a_n|}.$$
The series $\sum a_n$ is absolutely convergent if $\rho < 1$ and is divergent if $\rho > 1$. If $\rho = 1$, then the test is inconclusive.

Version: 4 Owner: vypertd Author(s): vypertd
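The limit form can be illustrated numerically; for the example terms $a_n = n^2/2^n$ (our choice, not from the entry), the $n$th roots approach $\rho = 1/2 < 1$, so the series converges:

```python
# Sketch: limit form of the root test on a_n = n^2 / 2^n.
# The n-th root a_n^(1/n) = n^(2/n) / 2 tends to rho = 1/2 < 1.
n = 1000
a_n = n ** 2 / 2 ** n
rho_approx = a_n ** (1 / n)
```

At $n = 1000$ the approximation is already close to $1/2$.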

479.7 Dirichlet's convergence test

Theorem. Let $\{a_n\}$ and $\{b_n\}$ be sequences of real numbers such that $\left\{ \sum_{i=0}^{n} a_i \right\}$ is bounded and $\{b_n\}$ decreases with 0 as limit. Then $\sum_{n=0}^{\infty} a_n b_n$ converges.

Proof. Let $A_n := \sum_{i=0}^{n} a_i$ and let $M$ be an upper bound for $\{|A_n|\}$. By Abel's lemma,
$$\sum_{i=m}^{n} a_i b_i = \sum_{i=0}^{n} a_i b_i - \sum_{i=0}^{m-1} a_i b_i = \sum_{i=0}^{n} A_i (b_i - b_{i+1}) - \sum_{i=0}^{m-1} A_i (b_i - b_{i+1}) + A_n b_n - A_{m-1} b_{m-1}$$
$$= \sum_{i=m}^{n} A_i (b_i - b_{i+1}) + A_n b_n - A_{m-1} b_{m-1},$$
so that
$$\left| \sum_{i=m}^{n} a_i b_i \right| \le \sum_{i=m}^{n} |A_i (b_i - b_{i+1})| + |A_n b_n| + |A_{m-1} b_{m-1}| \le M \sum_{i=m}^{n} (b_i - b_{i+1}) + |A_n b_n| + |A_{m-1} b_{m-1}|.$$
Since $\{b_n\}$ converges to 0, there is an $N(\varepsilon)$ such that both $\sum_{i=m}^{n} (b_i - b_{i+1}) < \frac{\varepsilon}{3M}$ and $b_i < \frac{\varepsilon}{3M}$ for $m, n > N(\varepsilon)$. Then, for $m, n > N(\varepsilon)$, $\left| \sum_{i=m}^{n} a_i b_i \right| < \varepsilon$ and $\sum a_n b_n$ converges.

Version: 1 Owner: lieven Author(s): lieven

479.8 Proof of Baroni's theorem

Let $m = \inf A'$ and $M = \sup A'$. If $m = M$ we are done, since the sequence is convergent and $A'$ is the degenerate interval composed of the point $l \in \mathbb{R}$, where $l = \lim_{n\to\infty} x_n$.

Now, assume that $m < M$. For every $\lambda \in (m, M)$, we will construct inductively two subsequences $x_{k_n}$ and $x_{l_n}$ such that $\lim_{n\to\infty} x_{k_n} = \lim_{n\to\infty} x_{l_n} = \lambda$ and $x_{k_n} < \lambda < x_{l_n}$.

From the definition of $M$ there is an $N_1 \in \mathbb{N}$ such that
$$\lambda < x_{N_1} < M.$$
Consider the set of all such values $N_1$. It is bounded from below (because it consists only of natural numbers and has at least one element) and thus it has a smallest element. Let $n_1$ be the smallest such element; from its definition we have $x_{n_1 - 1} \le \lambda < x_{n_1}$. So, choose $k_1 = n_1 - 1$, $l_1 = n_1$.

Now, there is an $N_2 > k_1$ such that
$$\lambda < x_{N_2} < M.$$
Consider the set of all such values $N_2$. It is bounded from below and has a smallest element $n_2$. Choose $k_2 = n_2 - 1$ and $l_2 = n_2$.

Now, proceed by induction to construct the sequences $k_n$ and $l_n$ in the same fashion. Since $l_n - k_n = 1$ we have
$$\lim_{n\to\infty} x_{k_n} = \lim_{n\to\infty} x_{l_n}$$
and thus they are both equal to $\lambda$.

Version: 1 Owner: slash Author(s): slash

479.9 Proof of Stolz-Cesaro theorem

From the definition of convergence, for every $\varepsilon > 0$ there is $N(\varepsilon) \in \mathbb{N}$ such that for all $n > N(\varepsilon)$ we have:
$$l - \varepsilon < \frac{a_{n+1} - a_n}{b_{n+1} - b_n} < l + \varepsilon.$$
Because $b_n$ is strictly increasing, we can multiply by $b_{n+1} - b_n > 0$ to get:
$$(l - \varepsilon)(b_{n+1} - b_n) < a_{n+1} - a_n < (l + \varepsilon)(b_{n+1} - b_n).$$
Let $k > N(\varepsilon)$ be a natural number. Summing the last relation from $i = N(\varepsilon)$ to $k$ we get:
$$(l - \varepsilon) \sum_{i=N(\varepsilon)}^{k} (b_{i+1} - b_i) < \sum_{i=N(\varepsilon)}^{k} (a_{i+1} - a_i) < (l + \varepsilon) \sum_{i=N(\varepsilon)}^{k} (b_{i+1} - b_i) \Rightarrow$$
$$(l - \varepsilon)(b_{k+1} - b_{N(\varepsilon)}) < a_{k+1} - a_{N(\varepsilon)} < (l + \varepsilon)(b_{k+1} - b_{N(\varepsilon)}).$$
Divide the last relation by $b_{k+1} > 0$ to get:
$$(l - \varepsilon)\left(1 - \frac{b_{N(\varepsilon)}}{b_{k+1}}\right) < \frac{a_{k+1}}{b_{k+1}} - \frac{a_{N(\varepsilon)}}{b_{k+1}} < (l + \varepsilon)\left(1 - \frac{b_{N(\varepsilon)}}{b_{k+1}}\right) \Leftrightarrow$$
$$(l - \varepsilon)\left(1 - \frac{b_{N(\varepsilon)}}{b_{k+1}}\right) + \frac{a_{N(\varepsilon)}}{b_{k+1}} < \frac{a_{k+1}}{b_{k+1}} < (l + \varepsilon)\left(1 - \frac{b_{N(\varepsilon)}}{b_{k+1}}\right) + \frac{a_{N(\varepsilon)}}{b_{k+1}}.$$
This means that there is some $K$ such that for $k > K$ we have:
$$l - \varepsilon < \frac{a_{k+1}}{b_{k+1}} < l + \varepsilon$$
(since the other terms which were left out converge to 0). This obviously means that:
$$\lim_{n\to\infty} \frac{a_n}{b_n} = l$$
and we are done.

Version: 1 Owner: slash Author(s): slash

479.10 Stolz-Cesaro theorem

Let $(a_n)_{n \ge 1}$ and $(b_n)_{n \ge 1}$ be two sequences of real numbers. If $b_n$ is positive, strictly increasing and unbounded and the following limit exists:
$$\lim_{n\to\infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n} = l,$$
then the limit
$$\lim_{n\to\infty} \frac{a_n}{b_n}$$
also exists and it is equal to $l$.

Version: 4 Owner: Daume Author(s): Daume, slash
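A classical illustration (our example, not from the entry) takes $a_n = 1 + \frac12 + \dots + \frac1n$ and $b_n = \log n$: the difference quotient tends to 1, so the theorem predicts $a_n/b_n \to 1$ as well.

```python
import math

# Sketch: Stolz-Cesaro on a_n = H_n (harmonic numbers), b_n = log n.
# (a_{n+1} - a_n)/(b_{n+1} - b_n) = (1/(n+1)) / log((n+1)/n) -> 1.
n = 10 ** 6
a_n = sum(1 / k for k in range(1, n + 1))
b_n = math.log(n)

step_ratio = (1 / (n + 1)) / math.log((n + 1) / n)
ratio = a_n / b_n
```

The step ratio is essentially 1 already, while `ratio` approaches 1 only slowly (like $1 + \gamma/\log n$).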

479.11 absolute convergence theorem

Every absolutely convergent series is convergent.

Version: 1 Owner: paolini Author(s): paolini

479.12 comparison test

The series $\sum_{i=0}^{\infty} a_i$ with real $a_i$ is absolutely convergent if there is a sequence $(b_n)_{n\in\mathbb{N}}$ with positive real $b_n$ such that $\sum_{i=0}^{\infty} b_i$ is convergent and $|a_k| \le b_k$ holds for all sufficiently large $k$.

Also, the series $\sum a_i$ is divergent if there is a sequence $(b_n)$ with positive real $b_n$, so that $\sum b_i$ is divergent and $a_k \ge b_k$ for all sufficiently large $k$.

Version: 1 Owner: mathwizard Author(s): mathwizard


479.13 convergent sequence

A sequence $x_0, x_1, x_2, \dots$ in a metric space $(X, d)$ is a convergent sequence if there exists a point $x \in X$ such that, for every real number $\varepsilon > 0$, there exists a natural number $N$ such that $d(x, x_n) < \varepsilon$ for all $n > N$.

The point $x$, if it exists, is unique, and is called the limit point of the sequence. One can also say that the sequence $x_0, x_1, x_2, \dots$ converges to $x$.

A sequence is said to be divergent if it does not converge.

Version: 4 Owner: djao Author(s): djao

479.14 convergent series

A series $\sum a_n$ is convergent iff the sequence of partial sums $\sum_{i=1}^{n} a_i$ is convergent.

A series $\sum a_n$ is said to be absolutely convergent if $\sum |a_n|$ is convergent. Equivalently, a series $\sum a_n$ is absolutely convergent if and only if all possible rearrangements are also convergent. A series $\sum a_n$ which converges, but which is not absolutely convergent, is called conditionally convergent.

It can be shown that absolute convergence implies convergence. Let $\sum a_n$ be an absolutely convergent series, and $\sum b_n$ be a conditionally convergent series. Then any rearrangement of $\sum a_n$ is convergent to the same sum. It is a result due to Riemann that $\sum b_n$ can be rearranged to converge to any sum, or not converge at all.

Version: 5 Owner: vitriol Author(s): vitriol

479.15 determining series convergence

Consider a series $\sum a_n$. To determine whether $\sum a_n$ converges or diverges, several tests are available. There is no precise rule indicating which type of test to use with a given series. The more obvious approaches are collected below.

• When the terms in $\sum a_n$ are positive, there are several possibilities:
  – the comparison test,
  – the root test (Cauchy's root test),
  – the ratio test,
  – the integral test.
• If the series is an alternating series, then the alternating series test may be used.
• Abel's test for convergence can be used when the terms in $\sum a_n$ can be obtained as the product of terms of a convergent series with terms of a monotonic convergent sequence.

The root test and the ratio test are direct applications of the comparison test to the geometric series with terms $|a_n|^{1/n}$ and $\frac{a_{n+1}}{a_n}$, respectively.

Version: 2 Owner: jarino Author(s): jarino

479.16 example of integral test

Consider the series
$$\sum_{k=2}^{\infty} \frac{1}{k \log k}.$$
Since the integral
$$\int_{2}^{\infty} \frac{1}{x \log x}\, dx = \lim_{M\to\infty} \left[ \log(\log(x)) \right]_{2}^{M}$$
is divergent, the series considered is also divergent.

Version: 2 Owner: paolini Author(s): paolini

479.17 geometric series

A geometric series is a series of the form
$$\sum_{i=1}^{n} a r^{i-1}$$
(with $a$ and $r$ real or complex numbers). The sum of a geometric series is
$$s_n = \frac{a(1 - r^n)}{1 - r}. \qquad (479.17.1)$$

An infinite geometric series is a geometric series, as above, with $n \to \infty$. It is denoted
$$\sum_{i=1}^{\infty} a r^{i-1}.$$
If $|r| \ge 1$, the infinite geometric series diverges. Otherwise it converges to
$$\sum_{i=1}^{\infty} a r^{i-1} = \frac{a}{1 - r}. \qquad (479.17.2)$$
Taking the limit of $s_n$ as $n \to \infty$, we see that $s_n$ diverges if $|r| \ge 1$. However, if $|r| < 1$, $s_n$ approaches (479.17.2).

One way to prove (479.17.1) is to take
$$s_n = a + ar + ar^2 + \dots + ar^{n-1}$$
and multiply by $r$, to get
$$r s_n = ar + ar^2 + ar^3 + \dots + ar^{n-1} + ar^n.$$
Subtracting the two removes most of the terms:
$$s_n - r s_n = a - a r^n.$$
Factoring and dividing gives us
$$s_n = \frac{a(1 - r^n)}{1 - r}. \qquad \Box$$

Version: 6 Owner: akrowne Author(s): akrowne
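The two formulas above are easy to sanity-check numerically; a small sketch with the illustrative values $a = 3$, $r = 1/2$:

```python
# Sketch: closed form (479.17.1) against a direct sum, and the
# infinite-series limit (479.17.2) for |r| < 1.
a, r, n = 3.0, 0.5, 20

direct = sum(a * r ** (i - 1) for i in range(1, n + 1))
closed = a * (1 - r ** n) / (1 - r)
infinite = a / (1 - r)  # = 6.0 here
```

The finite sum agrees with the closed form, and both approach the infinite-series value as $n$ grows.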

479.18 harmonic number

The harmonic number of order $n$ of $\theta$ is defined as
$$H_\theta(n) = \sum_{i=1}^{n} \frac{1}{i^\theta}.$$
Note that $n$ may be equal to $\infty$, provided $\theta > 1$. If $\theta \le 1$ while $n = \infty$, the harmonic series does not converge and hence the harmonic number does not exist. If $\theta = 1$, we may just write $H_\theta(n)$ as $H_n$ (this is a common notation).

479.18.1 Properties

• If $\operatorname{Re}(\theta) > 1$ and $n = \infty$ then the sum is the Riemann zeta function.
• If $\theta = 1$, then we get what is known simply as "the harmonic number", and it has many important properties. For example, it has the asymptotic expansion $H_n = \ln n + \gamma + \frac{1}{2n} + \dots$ where $\gamma$ is Euler's constant.
• It is possible to define harmonic numbers for non-integral $n$. This is done by means of the series $H_x(z) = \sum_{n \ge 1} \left( n^{-z} - (n+x)^{-z} \right)$ (see "The Art of Computer Programming" vol. 2 by D. Knuth).

Version: 5 Owner: akrowne Author(s): akrowne
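The asymptotic expansion in the second bullet is already quite accurate at moderate $n$; a quick numerical check:

```python
import math

# Sketch: H_n versus the asymptotic expansion ln n + gamma + 1/(2n).
GAMMA = 0.57721566490153286  # Euler's constant
n = 1000
H_n = sum(1 / i for i in range(1, n + 1))
approx = math.log(n) + GAMMA + 1 / (2 * n)
```

At $n = 1000$ the discrepancy is of the order of the next term, $1/(12n^2) \approx 10^{-7}$.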

479.19 harmonic series

The harmonic series is
$$h = \sum_{n=1}^{\infty} \frac{1}{n}.$$
The harmonic series is known to diverge. This can be proven via the integral test; compare $h$ with
$$\int_{1}^{\infty} \frac{1}{x}\, dx.$$

A harmonic series is any series of the form
$$h_p = \sum_{n=1}^{\infty} \frac{1}{n^p}.$$
These are the so-called "p-series." When $p > 1$, these are known to converge (leading to the p-series test for series convergence). For complex-valued $p$, $h_p = \zeta(p)$, the Riemann zeta function.

A famous harmonic series is $h_2$ (or $\zeta(2)$), which converges to $\frac{\pi^2}{6}$. In general no p-harmonic series of odd $p$ has been solved analytically.

A harmonic series which is not summed to $\infty$, but instead is of the form
$$h_p(k) = \sum_{n=1}^{k} \frac{1}{n^p}$$
is called a harmonic series of order $k$ of $p$.

Version: 2 Owner: akrowne Author(s): akrowne

479.20 integral test

Consider a sequence $(a_n) = \{a_0, a_1, a_2, a_3, \dots\}$ and, given $M \in \mathbb{R}$, consider any monotonically nonincreasing function $f: [M, +\infty) \to \mathbb{R}$ which extends the sequence, i.e.
$$f(n) = a_n \quad \forall n \ge M.$$
An example is
$$a_n = 2n, \qquad f(x) = 2x$$
(the former being the sequence $\{0, 2, 4, 6, 8, \dots\}$ and the latter the doubling function for any real number).

We are interested in finding out when the summation
$$\sum_{n=0}^{\infty} a_n$$
converges. The integral test states the following. The series
$$\sum_{n=0}^{\infty} a_n$$
converges if and only if the integral
$$\int_{M}^{\infty} f(x)\, dx$$
is finite.

Version: 16 Owner: drini Author(s): paolini, drini, vitriol
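The integral not only decides convergence but also brackets the tail of the series; a sketch with the illustrative choice $a_n = 1/n^2$, $f(x) = 1/x^2$:

```python
# Sketch: integral test for a_n = 1/n^2 with f(x) = 1/x^2 on [1, +oo).
# Since f is nonincreasing, the tail satisfies
#   int_{N+1}^oo f  <=  sum_{n>N} a_n  <=  int_N^oo f.
N = 100
tail = sum(1 / n ** 2 for n in range(N + 1, 10 ** 6))  # truncated tail

lower = 1 / (N + 1)   # int_{N+1}^oo dx/x^2
upper = 1 / N         # int_N^oo dx/x^2
```

The finite integral (here equal to 1 over $[1,\infty)$) certifies convergence, and the tail lands strictly between the two integral bounds.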

479.21 proof of Abel's lemma (by induction)

Proof. The proof is by induction. However, let us first recall that the sum on the right hand side is a piecewise defined function of the upper limit $N - 1$. In other words, if the upper limit is below the lower limit $0$, the sum is identically set to zero. Otherwise, it is an ordinary sum. We therefore need to manually check the first two cases. For the trivial case $N = 0$, both sides equal $a_0 b_0$. Also, for $N = 1$ (when the sum is a normal sum), it is easy to verify that both sides simplify to $a_0 b_0 + a_1 b_1$. Then, for the induction step, suppose that the claim holds for some $N \ge 1$. For $N + 1$, we then have
$$\sum_{i=0}^{N+1} a_i b_i = \sum_{i=0}^{N} a_i b_i + a_{N+1} b_{N+1} = \sum_{i=0}^{N-1} A_i (b_i - b_{i+1}) + A_N b_N + a_{N+1} b_{N+1}$$
$$= \sum_{i=0}^{N} A_i (b_i - b_{i+1}) - A_N (b_N - b_{N+1}) + A_N b_N + a_{N+1} b_{N+1}.$$
Since $-A_N (b_N - b_{N+1}) + A_N b_N + a_{N+1} b_{N+1} = A_{N+1} b_{N+1}$, the claim follows. $\Box$

Version: 4 Owner: mathcam Author(s): matte

479.22 proof of Abel's test for convergence

Let $b$ be the limit of $\{b_n\}$ and let $d_n = b_n - b$ when $\{b_n\}$ is decreasing and $d_n = b - b_n$ when $\{b_n\}$ is increasing. By Dirichlet's convergence test, $\sum a_n d_n$ is convergent and so is
$$\sum a_n b_n = \sum a_n (b \pm d_n) = b \sum a_n \pm \sum a_n d_n.$$

Version: 1 Owner: lieven Author(s): lieven

479.23 proof of Bolzano-Weierstrass theorem

To prove the Bolzano-Weierstrass theorem, we will first need two lemmas.

Lemma 1. All bounded monotone sequences converge.

Proof. Let $(s_n)$ be a bounded, nondecreasing sequence. Let $S$ denote the set $\{s_n : n \in \mathbb{N}\}$. Then let $b = \sup S$ (the supremum of $S$). Choose some $\varepsilon > 0$. Then there is a corresponding $N$ such that $s_N > b - \varepsilon$. Since $(s_n)$ is nondecreasing, for all $n > N$, $s_n > b - \varepsilon$. But $(s_n)$ is bounded, so we have $b - \varepsilon < s_n \le b$. But this implies $|s_n - b| < \varepsilon$, so $\lim s_n = b$. $\Box$ (The proof for nonincreasing sequences is analogous.)

Lemma 2. Every sequence has a monotonic subsequence.

Proof. First a definition: call the $n$th term of a sequence dominant if it is greater than every term following it. For the proof, note that a sequence $(s_n)$ may have finitely many or infinitely many dominant terms.

First we suppose that $(s_n)$ has infinitely many dominant terms. Form a subsequence $(s_{n_k})$ solely of dominant terms of $(s_n)$. Then $s_{n_{k+1}} < s_{n_k}$ by definition of "dominant", hence $(s_{n_k})$ is a decreasing (monotone) subsequence of $(s_n)$.

For the second case, assume that our sequence $(s_n)$ has only finitely many dominant terms. Select $n_1$ such that $n_1$ is beyond the last dominant term. But since $n_1$ is not dominant, there must be some $m > n_1$ such that $s_m > s_{n_1}$. Select this $m$ and call it $n_2$. However, $n_2$ is still not dominant, so there must be an $n_3 > n_2$ with $s_{n_3} > s_{n_2}$, and so on, inductively. The resulting subsequence $s_{n_1}, s_{n_2}, s_{n_3}, \dots$ is monotonic (nondecreasing). $\Box$

Proof of Bolzano-Weierstrass. The proof of the Bolzano-Weierstrass theorem is now simple: let $(s_n)$ be a bounded sequence. By Lemma 2 it has a monotonic subsequence. By Lemma 1, the subsequence converges. $\Box$

Version: 2 Owner: akrowne Author(s): akrowne

479.24 proof of Cauchy's root test

If for all $n > N$
$$\sqrt[n]{a_n} < k < 1,$$
then $a_n < k^n < 1$. Since $\sum_{i=N}^{\infty} k^i$ converges, so does $\sum_{i=N}^{\infty} a_i$ by the comparison test. If $\sqrt[n]{a_n} > 1$ then $a_n > 1$, and the series is divergent by comparison with $\sum_{i=N}^{\infty} 1$. Absolute convergence in the case of nonpositive $a_n$ can be proven in exactly the same way using $\sqrt[n]{|a_n|}$.

Version: 1 Owner: mathwizard Author(s): mathwizard

479.25 proof of Leibniz's theorem (using Dirichlet's convergence test)

Proof. Let us define the sequence $\alpha_n = (-1)^n$ for $n \in \mathbb{N} = \{0, 1, 2, \dots\}$. Then
$$\sum_{i=0}^{n} \alpha_i = \begin{cases} 1 & \text{for even } n, \\ 0 & \text{for odd } n, \end{cases}$$
so the sequence $\sum_{i=0}^{n} \alpha_i$ is bounded. By assumption $\{a_n\}_{n=1}^{\infty}$ is a bounded decreasing sequence with limit 0. For $n \in \mathbb{N}$ we set $b_n := a_{n+1}$. Using Dirichlet's convergence test, it follows that the series $\sum_{i=0}^{\infty} \alpha_i b_i$ converges. Since
$$\sum_{i=0}^{\infty} \alpha_i b_i = \sum_{n=1}^{\infty} (-1)^{n+1} a_n,$$
the claim follows. $\Box$

Version: 4 Owner: mathcam Author(s): matte, Thomas Heye

479.26 proof of absolute convergence theorem

Suppose that $\sum a_n$ is absolutely convergent, i.e., that $\sum |a_n|$ is convergent. First of all, notice that
$$0 \le a_n + |a_n| \le 2|a_n|,$$
and since the series $\sum (a_n + |a_n|)$ has non-negative terms it can be compared with $\sum 2|a_n| = 2 \sum |a_n|$ and hence converges.

On the other hand
$$\sum_{n=1}^{N} a_n = \sum_{n=1}^{N} (a_n + |a_n|) - \sum_{n=1}^{N} |a_n|.$$
Since both the partial sums on the right hand side are convergent, the partial sum on the left hand side is also convergent. So, the series $\sum a_n$ is convergent.

Version: 3 Owner: paolini Author(s): paolini

479.27 proof of alternating series test

If the first term $a_1$ is positive then the series has partial sums
$$S_{2n+2} = a_1 - a_2 + a_3 - \dots - a_{2n} + a_{2n+1} - a_{2n+2}$$
where the $a_i$ are all non-negative and non-increasing. If the first term is negative, consider the series in the absence of the first term. From above, we have
$$S_{2n+1} = S_{2n} + a_{2n+1}$$
$$S_{2n+2} = S_{2n} + (a_{2n+1} - a_{2n+2})$$
$$S_{2n+3} = S_{2n+1} - (a_{2n+2} - a_{2n+3}) = S_{2n+2} + a_{2n+3}.$$
Since $a_{2n+1} \ge a_{2n+2} \ge a_{2n+3}$ we have $S_{2n+1} \ge S_{2n+3} \ge S_{2n+2} \ge S_{2n}$. Moreover,
$$S_{2n+2} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \dots - (a_{2n} - a_{2n+1}) - a_{2n+2}.$$
Because the $a_i$ are non-increasing, we have $S_n \ge 0$ for any $n$. Also, $S_{2n+2} \le S_{2n+1} \le a_1$. Thus
$$a_1 \ge S_{2n+1} \ge S_{2n+3} \ge S_{2n+2} \ge S_{2n} \ge 0.$$
Hence the even partial sums $S_{2n}$ and the odd partial sums $S_{2n+1}$ are bounded. The $S_{2n}$ are monotonically nondecreasing, while the odd sums $S_{2n+1}$ are monotonically nonincreasing. Thus the even and odd partial sums both converge. We note that $S_{2n+1} - S_{2n} = a_{2n+1}$, therefore the sums converge to the same limit if and only if $(a_n) \to 0$. The theorem is then established.

Version: 7 Owner: volator Author(s): volator

479.28 proof of comparison test

Assume $|a_k| \le b_k$ for all $k > n$. Then we define
$$s_k := \sum_{i=k}^{\infty} |a_i| \qquad \text{and} \qquad t_k := \sum_{i=k}^{\infty} b_i.$$
Obviously $s_k \le t_k$ for all $k > n$. Since by assumption $(t_k)$ is convergent, $(t_k)$ is bounded and so is $(s_k)$. Also $(s_k)$ is monotonic and therefore convergent. Therefore $\sum_{i=0}^{\infty} a_i$ is absolutely convergent.

Now assume $b_k \le a_k$ for all $k > n$. If $\sum_{i=k}^{\infty} b_i$ is divergent then so is $\sum_{i=k}^{\infty} a_i$, because otherwise we could apply the test we just proved and show that $\sum_{i=0}^{\infty} b_i$ is convergent, which it is not by assumption.

Version: 1 Owner: mathwizard Author(s): mathwizard

479.29 proof of integral test

Consider the function (see the definition of floor)
$$g(x) = a_{\lfloor x \rfloor}.$$
Clearly for $x \in [n, n+1)$, $f$ being nonincreasing we have
$$g(x+1) = a_{n+1} = f(n+1) \le f(x) \le f(n) = a_n = g(x),$$
hence
$$\int_{M}^{+\infty} g(x+1)\, dx = \int_{M+1}^{+\infty} g(x)\, dx \le \int_{M}^{+\infty} f(x)\, dx \le \int_{M}^{+\infty} g(x)\, dx.$$
Since the integrals of $f$ and $g$ on $[M, M+1]$ are finite, we notice that $f$ is integrable on $[M, +\infty)$ if and only if $g$ is integrable on $[M, +\infty)$. On the other hand $g$ is locally constant, so
$$\int_{n}^{n+1} g(x)\, dx = \int_{n}^{n+1} a_n\, dx = a_n,$$
and hence for all $N \in \mathbb{Z}$
$$\int_{N}^{+\infty} g(x)\, dx = \sum_{n=N}^{\infty} a_n,$$
that is, $g$ is integrable on $[N, +\infty)$ if and only if $\sum_{n=N}^{\infty} a_n$ is convergent.

But, again, $\int_{M}^{N} g(x)\, dx$ is finite, hence $g$ is integrable on $[M, +\infty)$ if and only if $g$ is integrable on $[N, +\infty)$; and also $\sum_{n=0}^{N} a_n$ is finite, so $\sum_{n=0}^{\infty} a_n$ is convergent if and only if $\sum_{n=N}^{\infty} a_n$ is convergent.

Version: 1 Owner: paolini Author(s): paolini


479.30 proof of ratio test

Assume $k < 1$. By definition $\exists N$ such that
$$n > N \;\Rightarrow\; \left| \left| \frac{a_{n+1}}{a_n} \right| - k \right| < \frac{1-k}{2} \;\Rightarrow\; \left| \frac{a_{n+1}}{a_n} \right| < \frac{1+k}{2} < 1,$$
i.e. eventually the series $\sum |a_n|$ becomes dominated by a convergent geometric series, therefore a shifted tail of $\sum |a_n|$ converges by the comparison test. Note that a general series $\sum b_n$ converges iff a shifted tail of $\sum b_n$ converges. Therefore, by the absolute convergence theorem, the series $\sum a_n$ converges.

Similarly for $k > 1$ a shifted tail of $\sum |a_n|$ becomes greater than a geometric series tending to $\infty$, and so also tends to $\infty$. Therefore $\sum a_n$ diverges.

Version: 3 Owner: vitriol Author(s): vitriol

479.31 ratio test

Let $(a_n)$ be a real sequence. If $\left| \frac{a_{n+1}}{a_n} \right| \to k$ then:

• $k < 1 \;\Rightarrow\; \sum a_n$ converges absolutely
• $k > 1 \;\Rightarrow\; \sum a_n$ diverges

Version: 4 Owner: vitriol Author(s): vitriol
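A standard illustration (our example, not from the entry) is $a_n = 2^n/n!$, whose ratio $a_{n+1}/a_n = 2/(n+1)$ tends to $k = 0 < 1$; the series therefore converges, and in fact sums to $e^2$:

```python
import math

# Sketch: ratio test on a_n = 2^n / n!.
# The ratio a_{n+1}/a_n = 2/(n+1) -> 0 < 1, so the series converges.
ratios = [2 / (n + 1) for n in range(1, 50)]

# The full sum is the Taylor series of e^x at x = 2.
total = sum(2 ** n / math.factorial(n) for n in range(200))
```

The ratios drop well below 1 and the partial sum agrees with $e^2 \approx 7.389$.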


Chapter 480

40A10 – Convergence and divergence of integrals

480.1 improper integral

Improper integrals are integrals in which either the interval of integration is infinite or the integrand becomes infinite at or between the limits of integration. To evaluate these integrals, we use a limit process on the antiderivative. Thus we say that an improper integral converges or diverges according as the defining limit converges or diverges. [examples and more exposition later]

Version: 1 Owner: slider142 Author(s): slider142


Chapter 481

40A25 – Approximation to limiting values (summation of series, etc.)

481.1 Euler's constant

Euler's constant $\gamma$ is defined by
$$\gamma = \lim_{n\to\infty} \left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{n} - \ln n \right)$$
or equivalently
$$\gamma = \lim_{n\to\infty} \sum_{i=1}^{n} \left( \frac{1}{i} - \ln\left(1 + \frac{1}{i}\right) \right).$$

Euler's constant has the value
$$0.57721566490153286060651209008240243104\dots$$
It is related to the gamma function by
$$\gamma = -\Gamma'(1).$$
It is not known whether $\gamma$ is rational or irrational.

References.

• Chris Caldwell - "Euler's Constant", http://primes.utm.edu/glossary/page.php/Gamma.html

Version: 6 Owner: akrowne Author(s): akrowne
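Both defining limits converge to $\gamma$, the first like $1/(2n)$ and the second with a comparable tail; a quick numerical sketch:

```python
import math

# Sketch: both limit formulas for Euler's constant, evaluated at n = 10^6.
n = 10 ** 6
H_n = sum(1 / k for k in range(1, n + 1))

gamma_1 = H_n - math.log(n)
gamma_2 = sum(1 / i - math.log(1 + 1 / i) for i in range(1, n + 1))
```

At $n = 10^6$ both approximations agree with $\gamma \approx 0.5772156649$ to several digits.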


Chapter 482

40A30 – Convergence and divergence of series and sequences of functions

482.1 Abel's limit theorem

Suppose that $\sum a_n x^n$ has a radius of convergence $r$ and that $\sum a_n r^n$ is convergent. Then
$$\lim_{x \to r^-} \sum a_n x^n = \sum a_n r^n = \sum \left( \lim_{x \to r^-} a_n x^n \right).$$

Version: 2 Owner: vypertd Author(s): vypertd

482.2 Löwner partial ordering

Let $A$ and $B$ be two Hermitian matrices of the same size. If $A - B$ is positive semidefinite we write
$$A \ge B \quad \text{or} \quad B \le A.$$

Note: $\ge$ is a partial ordering, referred to as the Löwner partial ordering, on the set of Hermitian matrices.

Version: 3 Owner: Johan Author(s): Johan


482.3 Löwner's theorem

A real function $f$ on an interval $I$ is matrix monotone if and only if it is real analytic and has (complex) analytic continuations to the upper and lower half planes such that $\operatorname{Im}(f) > 0$ in the upper half plane. (Löwner 1934)

Version: 4 Owner: mathcam Author(s): Larry Hammick, yark, Johan

482.4 matrix monotone

A real function $f$ on a real interval $I$ is said to be matrix monotone of order $n$ if
$$A \le B \;\Rightarrow\; f(A) \le f(B) \qquad (482.4.1)$$
for all Hermitian $n \times n$ matrices $A, B$ with spectra contained in $I$.

Version: 5 Owner: Johan Author(s): Johan

482.5 operator monotone

A function is said to be operator monotone if it is matrix monotone of arbitrary order.

Version: 2 Owner: Johan Author(s): Johan

482.6 pointwise convergence

Let $X$ be any set, and let $Y$ be a topological space. A sequence $f_1, f_2, \dots$ of functions mapping $X$ to $Y$ is said to be pointwise convergent (or simply convergent) to another function $f$ if the sequence $f_n(x)$ converges to $f(x)$ for each $x$ in $X$. This is usually denoted by $f_n \to f$.

Version: 1 Owner: Koro Author(s): Koro


482.7 uniform convergence

Let $X$ be any set, and let $(Y, d)$ be a metric space. A sequence $f_1, f_2, \dots$ of functions mapping $X$ to $Y$ is said to be uniformly convergent to another function $f$ if, for each $\varepsilon > 0$, there exists $N$ such that, for all $x$ and all $n > N$, we have $d(f_n(x), f(x)) < \varepsilon$. This is denoted by $f_n \xrightarrow{u} f$, or "$f_n \to f$ uniformly" or, less frequently, by $f_n \Rightarrow f$.

Version: 8 Owner: Koro Author(s): Koro
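The distinction between pointwise and uniform convergence is visible numerically; the classical example $f_n(x) = x^n$ on $[0,1]$ (our illustration, not from the entry) converges pointwise but not uniformly, while on $[0, 1/2]$ the convergence is uniform:

```python
# Sketch: f_n(x) = x^n on a grid over [0, 1].
# Pointwise limit: 0 for x < 1, and 1 at x = 1.
xs = [i / 1000 for i in range(1001)]

def f(n, x):
    return x ** n

def limit(x):
    return 1.0 if x == 1.0 else 0.0

# Sup-distance to the limit on [0, 1] stays large (no uniform convergence),
# but on [0, 1/2] it shrinks geometrically (uniform convergence).
sup_on_01 = max(abs(f(50, x) - limit(x)) for x in xs)
sup_on_half = max(abs(f(50, x)) for x in xs if x <= 0.5)
```

The grid point just below 1 keeps `sup_on_01` near 1, while `sup_on_half` is about $2^{-50}$.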


Chapter 483

40G05 – Cesàro, Euler, Nörlund and Hausdorff methods

483.1 Cesàro summability

Cesàro summability is a generalized convergence criterion for infinite series. We say that a series $\sum_{n=0}^{\infty} a_n$ is Cesàro summable if the Cesàro means of the partial sums converge to some limit $L$. To be more precise, letting
$$s_N = \sum_{n=0}^{N} a_n$$
denote the $N$th partial sum, we say that $\sum_{n=0}^{\infty} a_n$ Cesàro converges to a limit $L$, if
$$\frac{1}{N+1}(s_0 + \dots + s_N) \to L \quad \text{as } N \to \infty.$$

Cesàro summability is a generalization of the usual definition of the limit of an infinite series.

Proposition 19. Suppose that
$$\sum_{n=0}^{\infty} a_n = L,$$
in the usual sense that $s_N \to L$ as $N \to \infty$. Then, the series in question Cesàro converges to the same limit.

The converse, however, is false. The standard example of a divergent series that is nonetheless Cesàro summable is
$$\sum_{n=0}^{\infty} (-1)^n.$$
The sequence of partial sums $1, 0, 1, 0, \dots$ does not converge. The Cesàro means, namely
$$\frac{1}{1}, \frac{1}{2}, \frac{2}{3}, \frac{2}{4}, \frac{3}{5}, \frac{3}{6}, \dots$$
do converge, with $1/2$ as the limit. Hence the series in question is Cesàro summable.

There is also a relation between Cesàro summability and Abel summability (this and similar results are often called Abelian theorems).

Theorem 14 (Frobenius). A series that is Cesàro summable is also Abel summable. To be more precise, suppose that
$$\frac{1}{N+1}(s_0 + \dots + s_N) \to L \quad \text{as } N \to \infty.$$
Then,
$$f(r) = \sum_{n=0}^{\infty} a_n r^n \to L \quad \text{as } r \to 1^-$$
as well.

Version: 3 Owner: rmilson Author(s): rmilson
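The standard example above can be checked directly: the partial sums of $\sum (-1)^n$ oscillate, but their Cesàro mean settles at $1/2$:

```python
# Sketch: Cesàro means of the partial sums of sum_{n>=0} (-1)^n.
N = 100000
s = 0
partials = []
for n in range(N):
    s += (-1) ** n
    partials.append(s)   # 1, 0, 1, 0, ...

cesaro = sum(partials) / N   # mean of the first N partial sums
```

With an even number of terms the mean is exactly $1/2$.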


Chapter 484

40G10 – Abel, Borel and power series methods

484.1 Abel summability

Abel summability is a generalized convergence criterion for power series. It extends the usual definition of the sum of a series, and gives a way of summing up certain divergent series. Let us start with a series $\sum_{n=0}^{\infty} a_n$, convergent or not, and use that series to define a power series
$$f(r) = \sum_{n=0}^{\infty} a_n r^n.$$
Note that for $|r| < 1$ the summability of $f(r)$ is easier to achieve than the summability of the original series. Starting with this observation we say that the series $\sum a_n$ is Abel summable if the defining series for $f(r)$ is convergent for all $|r| < 1$, and if $f(r)$ converges to some limit $L$ as $r \to 1^-$. If this is so, we shall say that $\sum a_n$ Abel converges to $L$.

Of course it is important to ask whether an ordinary convergent series is also Abel summable, and whether it converges to the same limit. This is true, and the result is known as Abel's convergence theorem, or simply as Abel's theorem.

Theorem 15 (Abel). Let $\sum_{n=0}^{\infty} a_n$ be a series; let
$$s_N = a_0 + \dots + a_N, \quad N \in \mathbb{N},$$
denote the corresponding partial sums; and let $f(r)$ be the corresponding power series defined as above. If $\sum a_n$ is convergent, in the usual sense that the $s_N$ converge to some limit $L$ as $N \to \infty$, then the series is also Abel summable and $f(r) \to L$ as $r \to 1^-$.

The standard example of a divergent series that is nonetheless Abel summable is the alternating series
$$\sum_{n=0}^{\infty} (-1)^n.$$
The corresponding power series is
$$\frac{1}{1+r} = \sum_{n=0}^{\infty} (-1)^n r^n.$$
Since
$$\frac{1}{1+r} \to \frac{1}{2} \quad \text{as } r \to 1^-,$$
this otherwise divergent series Abel converges to $\frac{1}{2}$.

Abel's theorem is the prototype for a number of other theorems about convergence, which are collectively known in analysis as Abelian theorems. An important class of associated results are the so-called Tauberian theorems. These describe various convergence criteria, and sometimes provide partial converses for the various Abelian theorems. The general converse to Abel's theorem is false, as the example above illustrates (indeed, we want the converse to be false; the whole idea is to describe a method of summing certain divergent series!). However, in the 1890's Tauber proved the following partial converse.

Theorem 16 (Tauber). Suppose that $\sum a_n$ is an Abel summable series and that $n a_n \to 0$ as $n \to \infty$. Then, $\sum_n a_n$ is convergent in the ordinary sense as well.

The proof of the above theorem is not hard, but the same cannot be said of the more general Tauberian theorems. The more famous of these are due to Hardy, Hardy-Littlewood, Wiener, and Ikehara. In all cases, the conclusion is that a certain series or a certain integral is convergent. However, the proofs are lengthy and require sophisticated techniques. Ikehara's theorem is especially noteworthy because it is used to prove the prime number theorem.

Version: 1 Owner: rmilson Author(s): rmilson

484.2 proof of Abel's convergence theorem

Suppose that
$$\sum_{n=0}^{\infty} a_n = L$$
is a convergent series, and set
$$f(r) = \sum_{n=0}^{\infty} a_n r^n.$$
Convergence of the first series implies that $a_n \to 0$, and hence $f(r)$ converges for $|r| < 1$. We will show that $f(r) \to L$ as $r \to 1^-$.

Let
$$s_N = a_0 + \dots + a_N, \quad N \in \mathbb{N},$$
denote the corresponding partial sums. Our proof relies on the following identity
$$f(r) = \sum_n a_n r^n = (1-r) \sum_n s_n r^n. \qquad (484.2.1)$$
The above identity obviously works at the level of formal power series. Indeed,
$$a_0 + (a_1 + a_0) r + (a_2 + a_1 + a_0) r^2 + \dots - \left( a_0 r + (a_1 + a_0) r^2 + \dots \right) = a_0 + a_1 r + a_2 r^2 + \dots$$
Since the partial sums $s_n$ converge to $L$, they are bounded, and hence $\sum_n s_n r^n$ converges for $|r| < 1$. Hence for $|r| < 1$, identity (484.2.1) is also a genuine functional equality.

Let $\varepsilon > 0$ be given. Choose an $N$ sufficiently large so that all partial sums $s_n$ with $n > N$ are sandwiched between $L - \varepsilon$ and $L + \varepsilon$. It follows that for all $r$ such that $0 < r < 1$ the series
$$(1-r) \sum_{n=N+1}^{\infty} s_n r^n$$
is sandwiched between $r^{N+1}(L - \varepsilon)$ and $r^{N+1}(L + \varepsilon)$. Note that
$$f(r) = (1-r) \sum_{n=0}^{N} s_n r^n + (1-r) \sum_{n=N+1}^{\infty} s_n r^n.$$
As $r \to 1^-$, the first term goes to 0. Hence, $\limsup f(r)$ and $\liminf f(r)$ as $r \to 1^-$ are sandwiched between $L - \varepsilon$ and $L + \varepsilon$. Since $\varepsilon > 0$ was arbitrary, it follows that $f(r) \to L$ as $r \to 1^-$. QED

Version: 1 Owner: rmilson Author(s): rmilson

484.3

proof of Tauber’s convergence theorem

Let
$$f(z) = \sum_{n=0}^{\infty} a_n z^n$$
be a complex power series, convergent in the open disk $|z| < 1$. We suppose that

1. $n a_n \to 0$ as $n \to \infty$, and that
2. $f(r)$ converges to some finite $L$ as $r \to 1^-$;

and wish to show that $\sum_n a_n$ converges to the same $L$ as well.

Let $s_n = a_0 + \ldots + a_n$, where $n = 0, 1, \ldots$, denote the partial sums of the series in question. The enabling idea in Tauber's convergence result (as well as other Tauberian theorems) is the existence of a correspondence between the evolution of the $s_n$ as $n \to \infty$ and the evolution of $f(r)$ as $r \to 1^-$. Indeed, we shall show that
$$s_n - f\!\left(\frac{n-1}{n}\right) \to 0 \quad \text{as } n \to \infty. \tag{484.3.1}$$
The desired result then follows in an obvious fashion.

For every real $0 < r < 1$ we have
$$s_n = f(r) + \sum_{k=0}^{n} a_k (1 - r^k) - \sum_{k=n+1}^{\infty} a_k r^k.$$
Setting
$$\epsilon_n = \sup_{k > n} |k a_k|,$$
and noting that $1 - r^k = (1-r)(1 + r + \ldots + r^{k-1}) < k(1-r)$, we have that
$$|s_n - f(r)| \le (1-r)\sum_{k=0}^{n} k|a_k| + \frac{\epsilon_n}{n}\sum_{k=n+1}^{\infty} r^k.$$
Setting $r = 1 - 1/n$ in the above inequality we get
$$|s_n - f(1 - 1/n)| \le \mu_n + \epsilon_n (1 - 1/n)^{n+1},$$
where
$$\mu_n = \frac{1}{n}\sum_{k=0}^{n} k|a_k|$$
are the Cesàro means of the sequence $|k a_k|$, $k = 0, 1, \ldots$ Since the latter sequence converges to zero, so do the means $\mu_n$, and the suprema $\epsilon_n$. Finally, Euler's formula for $e$ gives
$$\lim_{n\to\infty}(1 - 1/n)^n = e^{-1}.$$
The validity of (484.3.1) follows immediately. QED

Version: 1 Owner: rmilson Author(s): rmilson


Chapter 485 41A05 – Interpolation 485.1

Lagrange Interpolation formula

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be $n$ points in the plane ($x_i \ne x_j$ for $i \ne j$). Then there exists a unique polynomial $p(x)$ of degree at most $n-1$ such that $y_i = p(x_i)$ for $i = 1, \ldots, n$. Such a polynomial can be found using Lagrange's interpolation formula:
$$p(x) = \frac{f(x)}{(x - x_1)f'(x_1)}\,y_1 + \frac{f(x)}{(x - x_2)f'(x_2)}\,y_2 + \cdots + \frac{f(x)}{(x - x_n)f'(x_n)}\,y_n$$
where $f(x) = (x - x_1)(x - x_2)\cdots(x - x_n)$.

To see this, notice that the above formula is the same as
$$p(x) = y_1\frac{(x-x_2)(x-x_3)\cdots(x-x_n)}{(x_1-x_2)(x_1-x_3)\cdots(x_1-x_n)} + y_2\frac{(x-x_1)(x-x_3)\cdots(x-x_n)}{(x_2-x_1)(x_2-x_3)\cdots(x_2-x_n)} + \cdots + y_n\frac{(x-x_1)(x-x_2)\cdots(x-x_{n-1})}{(x_n-x_1)(x_n-x_2)\cdots(x_n-x_{n-1})}$$
and that every polynomial in the numerators vanishes at all $x_i$ except one, and for that one $x_i$ the denominator makes the fraction equal to 1, so each $p(x_i)$ equals $y_i$.

Version: 4 Owner: drini Author(s): drini
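The second form of the formula translates directly into code; this is a small sketch (function name ours, not part of the entry):

```python
def lagrange(points):
    """Return the interpolating polynomial p, evaluated via
    Lagrange's interpolation formula.

    points: list of (x_i, y_i) pairs with distinct x_i.
    """
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    # one factor of the product (x - x_j)/(x_i - x_j)
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# The unique quadratic through (0,1), (1,2), (2,5) is x^2 + 1.
p = lagrange([(0, 1), (1, 2), (2, 5)])
print(p(3))  # 10.0, matching 3^2 + 1
```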

485.2

Simpson’s 3/8 rule

Simpson's 3/8 rule is a method for approximating a definite integral by evaluating the integrand at finitely many points. The formal rule is given by
$$\int_{x_0}^{x_3} f(x)\,dx \approx \frac{3h}{8}\left[f(x_0) + 3f(x_1) + 3f(x_2) + f(x_3)\right]$$
where the nodes are equally spaced and $h = \frac{x_3 - x_0}{3}$.

Simpson's 3/8 rule is the third Newton-Cotes quadrature formula. It has degree of precision 3. This means it is exact for polynomials of degree less than or equal to three. Simpson's 3/8 rule is an improvement on the traditional Simpson's rule. The extra function evaluation gives a slightly more accurate approximation. We can see this with an example. Using the fundamental theorem of calculus shows
$$\int_0^{\pi} \sin(x)\,dx = 2.$$
In this case Simpson's rule gives
$$\int_0^{\pi} \sin(x)\,dx \approx \frac{\pi}{6}\left[\sin(0) + 4\sin\left(\frac{\pi}{2}\right) + \sin(\pi)\right] = 2.094.$$
However, Simpson's 3/8 rule does slightly better:
$$\int_0^{\pi} \sin(x)\,dx \approx \frac{3}{8}\cdot\frac{\pi}{3}\left[\sin(0) + 3\sin\left(\frac{\pi}{3}\right) + 3\sin\left(\frac{2\pi}{3}\right) + \sin(\pi)\right] = 2.040.$$

Version: 4 Owner: tensorking Author(s): tensorking
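The worked example above can be reproduced with a few lines (function name ours, not part of the entry):

```python
import math

def simpson38(f, a, b):
    """Simpson's 3/8 rule on the single interval [a, b]."""
    h = (b - a) / 3
    x0, x1, x2, x3 = a, a + h, a + 2 * h, b
    return 3 * h / 8 * (f(x0) + 3 * f(x1) + 3 * f(x2) + f(x3))

print(simpson38(math.sin, 0, math.pi))       # ~2.0405, vs the exact value 2
print(simpson38(lambda x: x**3, 0, 1))       # 0.25: exact for cubics
```

The second call illustrates the degree of precision 3 mentioned above: the rule integrates $x^3$ on $[0,1]$ exactly.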

485.3

trapezoidal rule

Definition 11. The trapezoidal rule is a method for approximating a definite integral by evaluating the integrand at finitely many points. The formal rule is given by
$$\int_{x_0}^{x_1} f(x)\,dx \approx \frac{h}{2}\left[f(x_0) + f(x_1)\right]$$
where $h = x_1 - x_0$.

The trapezoidal rule is the first Newton-Cotes quadrature formula. It has degree of precision 1. This means it is exact for polynomials of degree less than or equal to one. We can see this with a simple example.

Example 20. Using the fundamental theorem of calculus shows $\int_0^1 x\,dx = 1/2$. In this case the trapezoidal rule gives the exact value,
$$\int_0^1 x\,dx \approx \frac{1}{2}\left[f(0) + f(1)\right] = 1/2.$$

It is important to note that most calculus books give the wrong definition of the trapezoidal rule. Typically they define a composite trapezoidal rule, which applies the trapezoidal rule on a specified number of subintervals. Also note that the trapezoidal rule can be derived by integrating a linear interpolation or by using the method of undetermined coefficients. The latter is probably a bit easier.

Version: 6 Owner: tensorking Author(s): tensorking
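Since the entry distinguishes the single-interval rule from the composite rule found in calculus books, here is a sketch of both (function names ours):

```python
def trapezoid(f, x0, x1):
    """Single-interval trapezoidal rule: h/2 * (f(x0) + f(x1))."""
    h = x1 - x0
    return h / 2 * (f(x0) + f(x1))

def composite_trapezoid(f, a, b, n):
    """Composite rule: apply the trapezoidal rule on n equal subintervals."""
    h = (b - a) / n
    return sum(trapezoid(f, a + i * h, a + (i + 1) * h) for i in range(n))

print(trapezoid(lambda x: x, 0, 1))                      # 0.5, exact for degree <= 1
print(composite_trapezoid(lambda x: x * x, 0, 1, 1000))  # ~0.33333, close to 1/3
```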


Chapter 486 41A25 – Rate of convergence, degree of approximation 486.1

superconvergence

Let $x_i = |a_{i+1} - a_i|$, the difference between two successive entries of a sequence. The sequence $a_0, a_1, \ldots$ superconverges if, when the $x_i$ are written in base 2, each number $x_i$ starts with $2^i - 1 \approx 2^i$ zeroes. The following sequence, defined by $x_{n+1} = x_n^2$, is superconverging to 0:

  n   (x_n)_{10}   (x_n)_2
  0   1/2          .1
  1   1/4          .01
  2   1/16         .0001
  3   1/256        .00000001
  4   1/65536      .0000000000000001

In this case it is easy to see that the number of binary places increases by twice the previous amount per $x_n$. Version: 8 Owner: slider142 Author(s): slider142
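The table can be regenerated exactly with rational arithmetic (a sketch of ours, not part of the entry):

```python
from fractions import Fraction

x = Fraction(1, 2)
for n in range(5):
    # x = 2**-k, so there are k - 1 zeroes after the binary point
    k = x.denominator.bit_length() - 1
    print(f"x_{n} = 1/{x.denominator}, binary: ." + "0" * (k - 1) + "1")
    x = x * x  # squaring doubles the number of binary places
```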


Chapter 487 41A58 – Series expansions (e.g. Taylor, Lidstone series, but not Fourier series) 487.1

Taylor series

487.1.1

Taylor Series

Let $f$ be a function defined on any open interval containing 0. If $f$ possesses derivatives of all orders at 0, then
$$T(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k$$
is called the Taylor series of $f$ about 0. We use 0 for simplicity, but any function with an infinitely-differentiable point can be shifted so that this point becomes 0. $T_n(x)$, the "nth degree Taylor approximation" or a "Taylor series approximation to $n$ terms"¹, is defined as
$$T_n(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(0)}{k!} x^k$$

¹ $T_n$ is often defined as the sum from $k = 0$ to $n$ rather than the sum from $k = 0$ to $n-1$. This has the beneficial result of making the "nth degree Taylor approximation" a degree-$n$ polynomial. However, the drawback is that $T_n$ is no longer an approximation "to $n$ terms". The different definitions also give rise to slightly different statements of Taylor's theorem. In sum, mind the context when dealing with Taylor series and Taylor's theorem.

The remainder, $R_n(x)$, is defined as
$$R_n(x) = f(x) - T_n(x).$$
Also note that $f(x) = T(x)$ if and only if
$$\lim_{n\to\infty} R_n(x) = 0.$$
For most functions one encounters in college calculus, $f(x) = T(x)$ (for example, polynomials and ratios of polynomials), and thus $\lim_{n\to\infty} R_n(x) = 0$. Taylor's theorem is typically invoked in order to show this (the theorem gives the specific form of the remainder). Taylor series approximations are extremely useful to linearize or otherwise reduce the analytical complexity of a function. Taylor series approximations are most useful when the magnitude of the terms falls off rapidly.

487.1.2

Examples

Using the above definition of a Taylor series about 0, we have the following important series representations:
$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
$$\sin x = \frac{x}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$$

487.1.3

Generalizations

Taylor series can also be extended to functions of more than one variable. The two-variable Taylor series of $f(x, y)$ about $(0,0)$ is
$$T(x, y) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{f^{(i,j)}(0, 0)}{i!\,j!} x^i y^j$$
where $f^{(i,j)}$ is the partial derivative of $f$ taken with respect to $x$ $i$ times and with respect to $y$ $j$ times. We can generalize this to $n$ variables, or functions $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^{n\times 1}$. The Taylor series of this function of a vector is then
$$T(\mathbf{x}) = \sum_{i_1=0}^{\infty}\cdots\sum_{i_n=0}^{\infty} \frac{f^{(i_1, i_2, \ldots, i_n)}(0)}{i_1!\, i_2! \cdots i_n!}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$$

Version: 7 Owner: akrowne Author(s): akrowne
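As a small sketch (ours, not part of the entry), truncating the $e^x$ series after $n$ terms gives the approximation $T_n$ described above:

```python
import math

def taylor_exp(x, n):
    """T_n(x) for f = exp: the sum of the first n terms x^k / k!."""
    return sum(x ** k / math.factorial(k) for k in range(n))

for n in (2, 5, 10):
    print(n, taylor_exp(1.0, n))  # approaches e = 2.71828...
```

The terms $1/k!$ fall off rapidly, so the approximation is already quite accurate for modest $n$, in line with the remark above.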

487.2

Taylor’s Theorem

487.2.1

Taylor’s Theorem

Let $f$ be a function which is defined on the interval $(a, b)$, with $a < 0 < b$, and suppose the $n$th derivative $f^{(n)}$ exists on $(a, b)$. Then for all nonzero $x$ in $(a, b)$,
$$R_n(x) = \frac{f^{(n)}(y)}{n!} x^n$$
with $y$ strictly between 0 and $x$ ($y$ depends on the choice of $x$). $R_n(x)$ is the $n$th remainder of the Taylor series for $f(x)$. Version: 2 Owner: akrowne Author(s): akrowne


Chapter 488 41A60 – Asymptotic approximations, asymptotic expansions (steepest descent, etc.) 488.1

Stirling’s approximation

Stirling's formula gives an approximation for $n!$, the factorial function. It is
$$n! \approx \sqrt{2n\pi}\, n^n e^{-n}$$
We can derive this from the gamma function. Note that for large $x$,
$$\Gamma(x) = \sqrt{2\pi}\, x^{x - \frac{1}{2}} e^{-x + \mu(x)} \tag{488.1.1}$$
where
$$\mu(x) = \sum_{n=0}^{\infty}\left[\left(x + n + \frac{1}{2}\right)\ln\left(1 + \frac{1}{x+n}\right) - 1\right] = \frac{\theta}{12x}$$
with $0 < \theta < 1$. Taking $x = n$ and multiplying by $n$, we have
$$n! = \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n + \frac{\theta}{12n}} \tag{488.1.2}$$
Taking the approximation for large $n$ gives us Stirling's formula.

There is also a big-O notation version of Stirling's approximation:
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n\left(1 + O\left(\frac{1}{n}\right)\right) \tag{488.1.3}$$
We can prove this equality starting from (488.1.2). It is clear that the big-O portion of (488.1.3) must come from $e^{\frac{\theta}{12n}}$, so we must consider the asymptotic behavior of this exponential. First we observe that the Taylor series for $e^x$ is
$$e^x = 1 + \frac{x}{1} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
But in our case we have $e$ to a vanishing exponent. Note that if we vary $x$ as $\frac{1}{n}$, we have, as $n \to \infty$,
$$e^x = 1 + O\left(\frac{1}{n}\right)$$
We can then (almost) directly plug this in to (488.1.2) to get (488.1.3) (note that the factor of $\frac{1}{12}$ gets absorbed by the big-O notation.) Version: 16 Owner: drini Author(s): drini, akrowne
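A quick numerical check (our own sketch): the ratio of the approximation to the exact factorial tends to 1, and since the exact value carries the extra factor $e^{\theta/12n} > 1$, the formula slightly underestimates $n!$.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (5, 10, 20):
    exact = math.factorial(n)
    print(n, exact, stirling(n), stirling(n) / exact)  # ratio -> 1
```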


Chapter 489 42-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 489.1

countable basis

A countable basis $\beta$ of a vector space $V$ over a field $F$ is a countable subset $\beta \subset V$ with the property that every element $v \in V$ can be written as an infinite series
$$v = \sum_{x \in \beta} a_x x$$
in exactly one way (where $a_x \in F$). We are implicitly assuming, without further comment, that the vector space $V$ has been given a topological structure or normed structure in which the above infinite sum is absolutely convergent (so that it converges to $v$ regardless of the order in which the terms are summed).

The archetypical example of a countable basis is the Fourier series of a function: every continuous real-valued periodic function $f$ on the unit circle $S^1 = \mathbb{R}/2\pi$ can be written as a Fourier series
$$f(x) = \sum_{n=0}^{\infty} a_n\cos(nx) + \sum_{n=1}^{\infty} b_n\sin(nx)$$
in exactly one way.

Note: A countable basis is a countable set, but it is not usually a basis. Version: 4 Owner: djao Author(s): djao


489.2

discrete cosine transform

The discrete cosine transform is closely related to the fast Fourier transform; it plays a role in coding signals and images [Jain89], e.g. in the widely used standard JPEG compression. The one-dimensional transform is defined by
$$t(k) = c(k)\sum_{n=0}^{N-1} s(n)\cos\frac{\pi(2n+1)k}{2N}$$
where $s$ is the array of $N$ original values, $t$ is the array of $N$ transformed values, and the coefficients $c$ are given by
$$c(0) = \sqrt{1/N}, \qquad c(k) = \sqrt{2/N} \quad \text{for } 1 \le k \le N-1.$$

The discrete cosine transform in two dimensions, for a square matrix, can be written as
$$t(i, j) = c(i, j)\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} s(m, n)\cos\frac{\pi(2m+1)i}{2N}\cos\frac{\pi(2n+1)j}{2N}$$
with an analogous notation for $N$, $s$, $t$, and the $c(i, j)$ given by $c(0, j) = 1/N$, $c(i, 0) = 1/N$, and $c(i, j) = 2/N$ for both $i$ and $j \ne 0$.

The DCT has an inverse, defined by
$$s(n) = \sum_{k=0}^{N-1} c(k)\, t(k)\cos\frac{\pi(2n+1)k}{2N}$$
for the one-dimensional case, and
$$s(m, n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} c(i, j)\, t(i, j)\cos\frac{\pi(2m+1)i}{2N}\cos\frac{\pi(2n+1)j}{2N}$$
for two dimensions. The DCT is included in commercial image processing packages, e.g. in Matlab.

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
• Jain89 A.K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989.

Version: 4 Owner: akrowne Author(s): akrowne
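The 1D formulas above translate directly into code. This naive $O(N^2)$ sketch (function names ours) checks that the stated inverse really undoes the transform:

```python
import math

def dct(s):
    """1D DCT with the normalization c(0)=sqrt(1/N), c(k)=sqrt(2/N)."""
    N = len(s)
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [c(k) * sum(s[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                       for n in range(N))
            for k in range(N)]

def idct(t):
    """Inverse: s(n) = sum_k c(k) t(k) cos(pi (2n+1) k / (2N))."""
    N = len(t)
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [sum(c(k) * t[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

s = [1.0, 2.0, 3.0, 4.0]
print(idct(dct(s)))  # recovers s up to rounding
```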


Chapter 490 42-01 – Instructional exposition (textbooks, tutorial papers, etc.) 490.1

Laplace transform

Let $f(t)$ be a function defined on the interval $[0, \infty)$. The Laplace transform of $f(t)$ is the function $F(s)$ defined by
$$F(s) = \int_0^{\infty} e^{-st} f(t)\,dt,$$
provided that the improper integral converges. We will usually denote the Laplace transform of $f$ by $\mathcal{L}\{f(t)\}$. Some of the most common Laplace transforms are:

1. $\mathcal{L}\{e^{at}\} = \frac{1}{s-a}$, $s > a$
2. $\mathcal{L}\{\cos(bt)\} = \frac{s}{s^2+b^2}$, $s > 0$
3. $\mathcal{L}\{\sin(bt)\} = \frac{b}{s^2+b^2}$, $s > 0$
4. $\mathcal{L}\{t^n\} = \frac{n!}{s^{n+1}}$, $s > 0$.

Notice the Laplace transform is a linear transformation. Much like the Fourier transform, the Laplace transform has a convolution. The most popular usage of the Laplace transform is to solve initial value problems by taking the Laplace transform of both sides of an ordinary differential equation. Version: 4 Owner: tensorking Author(s): tensorking
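A crude numerical check of entry 1 in the table (our own sketch; the truncation length and step count are arbitrary choices): for $a = 1$, $s = 3$, the integral should equal $1/(s-a) = 1/2$.

```python
import math

def laplace(f, s, T=60.0, steps=200000):
    """Approximate the Laplace transform by integrating e^{-st} f(t)
    over [0, T] with the midpoint rule; T must be large enough that
    the neglected tail is negligible."""
    h = T / steps
    return sum(math.exp(-s * (i + 0.5) * h) * f((i + 0.5) * h)
               for i in range(steps)) * h

a, s = 1.0, 3.0
print(laplace(lambda t: math.exp(a * t), s))  # ~ 1/(s - a) = 0.5
```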


Chapter 491 42A05 – Trigonometric polynomials, inequalities, extremal problems 491.1

Chebyshev polynomial

We can always express $\cos(kt)$ as a polynomial in $\cos(t)$. Examples:
$$\cos(1t) = \cos(t)$$
$$\cos(2t) = 2(\cos(t))^2 - 1$$
$$\cos(3t) = 4(\cos(t))^3 - 3\cos(t)$$
$$\vdots$$
This fact can be proved using the formula for the cosine of an angle-sum. If we write $x = \cos t$ we obtain the Chebyshev polynomials of the first kind, that is, $T_n(x) = \cos(nt)$ where $x = \cos t$. So we have
$$T_0(x) = 1$$
$$T_1(x) = x$$
$$T_2(x) = 2x^2 - 1$$
$$T_3(x) = 4x^3 - 3x$$
$$\vdots$$
These polynomials satisfy the recurrence relation
$$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$$
for $n = 1, 2, \ldots$ Version: 4 Owner: drini Author(s): drini
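The recurrence and the defining identity $T_n(\cos t) = \cos(nt)$ can be checked against each other numerically (a sketch of ours, not part of the entry):

```python
import math

def chebyshev_T(n, x):
    """Evaluate T_n(x) via the recurrence T_{n+1} = 2x T_n - T_{n-1}."""
    t_prev, t_curr = 1.0, x  # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

t = 0.7
for n in range(6):
    # both columns should agree: T_n(cos t) = cos(n t)
    print(n, chebyshev_T(n, math.cos(t)), math.cos(n * t))
```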


Chapter 492 42A16 – Fourier coefficients, Fourier series of functions with special properties, special Fourier series 492.1

Riemann-Lebesgue lemma

Proposition. Let $f : [a, b] \to \mathbb{C}$ be a measurable function. If $f$ is $L^1$ integrable, that is to say if the Lebesgue integral of $|f|$ is finite, then
$$\int_a^b f(x) e^{inx}\,dx \to 0, \quad \text{as } n \to \pm\infty.$$

The above result, commonly known as the Riemann-Lebesgue lemma, is of basic importance in harmonic analysis. It is equivalent to the assertion that the Fourier coefficients $\hat{f}_n$ of a periodic, integrable function $f(x)$ tend to 0 as $n \to \pm\infty$. The proof can be organized into 3 steps.

Step 1. An elementary calculation shows that
$$\int_I e^{inx}\,dx \to 0, \quad \text{as } n \to \pm\infty$$
for every interval $I \subset [a, b]$. The proposition is therefore true for all step functions with support in $[a, b]$.

Step 2. By the monotone convergence theorem, the proposition is true for all positive functions, integrable on $[a, b]$.

Step 3. Let $f$ be an arbitrary measurable function, integrable on $[a, b]$. The proposition is true for such a general $f$, because one can always write $f = g - h$, where $g$ and $h$ are positive functions, integrable on $[a, b]$. Version: 2 Owner: rmilson Author(s): rmilson

492.2

example of Fourier series

Here we present an example of Fourier series.

Example: Let $f : \mathbb{R} \to \mathbb{R}$ be the "identity" function, defined by $f(x) = x$ for all $x \in \mathbb{R}$. We will compute the Fourier coefficients for this function. Notice that $\cos(nx)$ is an even function, while $f$ and $\sin(nx)$ are odd functions.
$$a_0^f = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} x\,dx = 0$$
$$a_n^f = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} x\cos(nx)\,dx = 0$$
$$b_n^f = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin(nx)\,dx = \frac{2}{\pi}\int_0^{\pi} x\sin(nx)\,dx = \frac{2}{\pi}\left[-\frac{x\cos(nx)}{n} + \frac{\sin(nx)}{n^2}\right]_0^{\pi} = (-1)^{n+1}\frac{2}{n}$$
Notice that $a_0^f$, $a_n^f$ are 0 because $x$ and $x\cos(nx)$ are odd functions. Hence the Fourier series for $f(x) = x$ is:
$$f(x) = x = a_0^f + \sum_{n=1}^{\infty}\left(a_n^f\cos(nx) + b_n^f\sin(nx)\right) = \sum_{n=1}^{\infty}(-1)^{n+1}\frac{2}{n}\sin(nx), \qquad \forall x \in (-\pi, \pi)$$
For an application of this Fourier series, see value of the Riemann zeta function at $s = 2$. Version: 4 Owner: alozano Author(s): alozano
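To see the convergence concretely, here is a small sketch (names ours, not part of the entry) evaluating partial sums of the series at $x = 1$; convergence is slow, as expected for a series whose coefficients decay like $1/n$:

```python
import math

def fourier_partial_sum(x, N):
    """Partial sum of the Fourier series of f(x) = x on (-pi, pi)."""
    return sum((-1) ** (n + 1) * 2 / n * math.sin(n * x)
               for n in range(1, N + 1))

x = 1.0
for N in (10, 100, 1000):
    print(N, fourier_partial_sum(x, N))  # slowly approaches 1.0
```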


Chapter 493 42A20 – Convergence and absolute convergence of Fourier and trigonometric series 493.1

Dirichlet conditions

Let $f$ be a piecewise regular real-valued function defined on some interval $[a, b]$, such that $f$ has only a finite number of discontinuities and extrema in $[a, b]$. Then the Fourier series of this function converges to $f$ at every point where $f$ is continuous, and to the arithmetic mean of the left-hand and right-hand limits of $f$ at each point where it is discontinuous. Version: 3 Owner: mathwizard Author(s): mathwizard


Chapter 494 42A38 – Fourier and Fourier-Stieltjes transforms and other transforms of Fourier type 494.1

Fourier transform

The Fourier transform $F(s)$ of a function $f(t)$ is defined as follows:
$$F(s) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-ist} f(t)\,dt.$$
The Fourier transform exists if $f$ is Lebesgue integrable on the whole real axis. If $f$ is Lebesgue integrable and can be divided into a finite number of continuous, monotone functions and at every point both one-sided limits exist, the Fourier transform can be inverted:
$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ist} F(s)\,ds.$$
Sometimes the Fourier transform is also defined without the factor $\frac{1}{\sqrt{2\pi}}$ in one direction, which then gives the transform in the other direction a factor $\frac{1}{2\pi}$. So when looking a transform up in a table you should find out how it is defined in that table.

The Fourier transform has some important properties when solving differential equations. We denote the Fourier transform of $f$ with respect to $t$ in terms of $s$ by $\mathcal{F}_t(f)$.

• $\mathcal{F}_t(af + bg) = a\mathcal{F}_t(f) + b\mathcal{F}_t(g)$, where $a$ and $b$ are real constants and $f$ and $g$ are real functions.

• $\mathcal{F}_t\left(\frac{\partial}{\partial t} f\right) = is\,\mathcal{F}_t(f)$.

• $\mathcal{F}_t\left(\frac{\partial}{\partial x} f\right) = \frac{\partial}{\partial x}\mathcal{F}_t(f)$.

• We define the bilateral convolution of two functions $f_1$ and $f_2$ as:
$$(f_1 * f_2)(t) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f_1(\tau) f_2(t-\tau)\,d\tau.$$
Then the following equation holds:
$$\mathcal{F}_t((f_1 * f_2)(t)) = \mathcal{F}_t(f_1)\cdot\mathcal{F}_t(f_2).$$

If $f(t)$ is some signal (maybe a sound wave) then the frequency domain of $f$ is given as $\mathcal{F}_t(f)$. Rayleigh's theorem states that the energy $E$ carried by the signal $f$, given by
$$E = \int_{-\infty}^{\infty}|f(t)|^2\,dt,$$
can also be expressed as
$$E = \int_{-\infty}^{\infty}|\mathcal{F}_t(f)(s)|^2\,ds.$$
In general we have:
$$\int_{-\infty}^{\infty}|f(t)|^2\,dt = \int_{-\infty}^{\infty}|\mathcal{F}_t(f)(s)|^2\,ds.$$

Version: 9 Owner: mathwizard Author(s): mathwizard


Chapter 495 42A99 – Miscellaneous 495.1

Poisson summation formula

Let $f : \mathbb{R} \to \mathbb{R}$ be a once-differentiable, square-integrable function. Let
$$f^{\vee}(y) = \int_{\mathbb{R}} f(x) e^{2\pi i x y}\,dx$$
be its Fourier transform. Then
$$\sum_n f(n) = \sum_n f^{\vee}(n).$$
By convention, sums are over all integers.

Let $g(x) = \sum_n f(x + n)$. This sum converges absolutely, since $f$ is square integrable, so $g$ is differentiable, and periodic. Thus, the Fourier series $\sum_n f^{\vee}(n) e^{2\pi i n x}$ converges pointwise to $g$. Evaluating our two sums for $g$ at $x = 0$, we find
$$\sum_n f(n) = g(0) = \sum_n f^{\vee}(n).$$

Version: 5 Owner: bwebste Author(s): bwebste
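A numerical sanity check (our own sketch, not part of the entry) using a Gaussian, whose transform under the $e^{2\pi i x y}$ convention above is again a Gaussian: for $f(x) = e^{-ax^2}$ one has $f^{\vee}(y) = \sqrt{\pi/a}\, e^{-\pi^2 y^2 / a}$.

```python
import math

a = 0.5
f = lambda x: math.exp(-a * x * x)
# Fourier transform of the Gaussian under the e^{2 pi i x y} convention
f_hat = lambda y: math.sqrt(math.pi / a) * math.exp(-math.pi ** 2 * y * y / a)

lhs = sum(f(n) for n in range(-50, 51))
rhs = sum(f_hat(n) for n in range(-50, 51))
print(lhs, rhs)  # the two sums agree, as Poisson summation predicts
```

Both tails decay so fast that truncating the sums at $|n| = 50$ is far more than enough.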


Chapter 496 42B05 – Fourier series and coefficients 496.1

Parseval equality

Let $f$ be a Riemann integrable function from $[-\pi, \pi]$ to $\mathbb{R}$. The equation
$$\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx = 2(a_0^f)^2 + \sum_{k=1}^{\infty}\left[(a_k^f)^2 + (b_k^f)^2\right],$$
where $a_0^f$, $a_k^f$, $b_k^f$ are the Fourier coefficients of the function $f$, is usually known as Parseval's equality or Parseval's theorem. Version: 3 Owner: vladm Author(s): vladm

496.2

Wirtinger’s inequality

Theorem: Let $f : \mathbb{R} \to \mathbb{R}$ be a periodic function of period $2\pi$, which is continuous and has a continuous derivative throughout $\mathbb{R}$, and such that
$$\int_0^{2\pi} f(x)\,dx = 0. \tag{496.2.1}$$
Then
$$\int_0^{2\pi} f'(x)^2\,dx \ge \int_0^{2\pi} f(x)^2\,dx \tag{496.2.2}$$
with equality iff $f(x) = a\sin x + b\cos x$ for some $a$ and $b$ (or equivalently $f(x) = c\sin(x+d)$ for some $c$ and $d$).

Proof: Since Dirichlet's conditions are met, we can write
$$f(x) = \frac{1}{2}a_0 + \sum_{n\ge 1}(a_n\sin nx + b_n\cos nx)$$
and moreover $a_0 = 0$ by (496.2.1). By Parseval's identity,
$$\int_0^{2\pi} f(x)^2\,dx = \sum_{n=1}^{\infty}(a_n^2 + b_n^2)$$
and
$$\int_0^{2\pi} f'(x)^2\,dx = \sum_{n=1}^{\infty} n^2(a_n^2 + b_n^2)$$
and since the summands are all $\ge 0$, we get (496.2.2), with equality iff $a_n = b_n = 0$ for all $n \ge 2$.

Hurwitz used Wirtinger's inequality in his tidy 1904 proof of the isoperimetric inequality. Version: 2 Owner: matte Author(s): Larry Hammick


Chapter 497 43A07 – Means on groups, semigroups, etc.; amenable groups 497.1

amenable group

Let $G$ be a locally compact group and $L^{\infty}(G)$ be the Banach space of all essentially bounded functions $G \to \mathbb{R}$ with respect to the Haar measure.

Definition 12. A linear functional on $L^{\infty}(G)$ is called a mean if it maps the constant function $f(g) = 1$ to 1 and non-negative functions to non-negative numbers.

Definition 13. Let $L_g$ be the left action of $g \in G$ on $f \in L^{\infty}(G)$, i.e. $(L_g f)(h) = f(gh)$. Then, a mean $\mu$ is said to be left invariant if $\mu(L_g f) = \mu(f)$ for all $g \in G$ and $f \in L^{\infty}(G)$. Similarly, it is right invariant if $\mu(R_g f) = \mu(f)$, where $R_g$ is the right action $(R_g f)(h) = f(hg)$.

Definition 14. A locally compact group $G$ is amenable if there is a left (or right) invariant mean on $L^{\infty}(G)$.

Example 21 (Amenable groups). All finite groups and all abelian groups are amenable. Compact groups are amenable, as the Haar measure is a (unique) invariant mean.

Example 22 (Non-amenable groups). If a group contains a free (non-abelian) subgroup on two generators then it is not amenable.

Version: 5 Owner: mhale Author(s): mhale


Chapter 498 44A35 – Convolution 498.1

convolution

Introduction

The convolution of two functions $f, g : \mathbb{R} \to \mathbb{R}$ is the function
$$(f * g)(u) = \int_{-\infty}^{\infty} f(x) g(u - x)\,dx.$$
In a sense, $(f * g)(u)$ is the sum of all the terms $f(x)g(y)$ where $x + y = u$. Such sums occur when investigating sums of independent random variables, and discrete versions appear in the coefficients of products of polynomials and power series. Convolution is an important tool in data processing, in particular in digital signal and image processing. We will first define the concept in various general settings, discuss its properties and then list several convolutions of probability distributions.

Definitions

If $G$ is a locally compact abelian topological group with Haar measure $\mu$ and $f$ and $g$ are measurable functions on $G$, we define the convolution
$$(f * g)(u) := \int_G f(x) g(u - x)\,d\mu(x)$$
whenever the right hand side integral exists (this is for instance the case if $f \in L^p(G, \mu)$, $g \in L^q(G, \mu)$ and $1/p + 1/q = 1$). The case $G = \mathbb{R}^n$ is the most important one, but $G = \mathbb{Z}$ is also useful, since it recovers the convolution of sequences which occurs when computing the coefficients of a product of polynomials or power series. The case $G = \mathbb{Z}_n$ yields the so-called cyclic convolution which is often discussed in connection with the discrete Fourier transform.

The (Dirichlet) convolution of multiplicative functions considered in number theory does not quite fit the above definition, since there the functions are defined on a commutative monoid (the natural numbers under multiplication) rather than on an abelian group.

If $X$ and $Y$ are independent random variables with probability densities $f_X$ and $f_Y$ respectively, and if $X + Y$ has a probability density, then this density is given by the convolution $f_X * f_Y$. This motivates the following definition: for probability distributions $P$ and $Q$ on $\mathbb{R}^n$, the convolution $P * Q$ is the probability distribution on $\mathbb{R}^n$ given by
$$(P * Q)(A) := (P \times Q)(\{(x, y) \mid x + y \in A\})$$
for every Borel set $A$. The convolution of two distributions $u$ and $v$ on $\mathbb{R}^n$ is defined by
$$(u * v)(\phi) = u(\psi)$$
for any test function $\phi$ for $v$, assuming that $\psi(t) := v(\phi(\cdot + t))$ is a suitable test function for $u$.

Properties

The convolution operation, when defined, is commutative, associative and distributive with respect to addition. For any $f$ we have
$$f * \delta = f$$
where $\delta$ is the Dirac delta distribution. The Fourier transform $F$ translates between convolution and pointwise multiplication:
$$F(f * g) = F(f)\cdot F(g).$$
Because of the availability of the Fast Fourier Transform and its inverse, this latter relation is often used to quickly compute discrete convolutions, and in fact the fastest known algorithms for the multiplication of numbers and polynomials are based on this idea.

Some convolutions of probability distributions

• The convolution of two normal distributions with zero mean and variances $\sigma_1^2$ and $\sigma_2^2$ is a normal distribution with zero mean and variance $\sigma^2 = \sigma_1^2 + \sigma_2^2$.

• The convolution of two $\chi^2$ distributions with $f_1$ and $f_2$ degrees of freedom is a $\chi^2$ distribution with $f_1 + f_2$ degrees of freedom.

• The convolution of two Poisson distributions with parameters $\lambda_1$ and $\lambda_2$ is a Poisson distribution with parameter $\lambda = \lambda_1 + \lambda_2$.

• The convolution of an exponential and a normal distribution is approximated by another exponential distribution. If the original exponential distribution has density
$$f(x) = \frac{e^{-x/\tau}}{\tau} \quad (x \ge 0), \qquad f(x) = 0 \quad (x < 0),$$
and the normal distribution has zero mean and variance $\sigma^2$, then for $u \gg \sigma$ the probability density of the sum is
$$f(u) \approx \frac{e^{-u/\tau + \sigma^2/(2\tau^2)}}{\tau}.$$
In a semi-logarithmic diagram where $\log(f_X(x))$ is plotted versus $x$ and $\log(f(u))$ versus $u$, the latter lies by the amount $\sigma^2/(2\tau^2)$ higher than the former, but both are represented by parallel straight lines, the slope of which is determined by the parameter $\tau$.

• The convolution of a uniform and a normal distribution results in a quasi-uniform distribution smeared out at its edges. If the original distribution is uniform in the region $a \le x < b$ and vanishes elsewhere and the normal distribution has zero mean and variance $\sigma^2$, the probability density of the sum is
$$f(u) = \frac{\psi_0((u-a)/\sigma) - \psi_0((u-b)/\sigma)}{b - a}$$
where
$$\psi_0(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$$
is the distribution function of the standard normal distribution. For $\sigma \to 0$, the function $f(u)$ vanishes for $u < a$ and $u > b$ and is equal to $1/(b-a)$ in between. For finite $\sigma$ the sharp steps at $a$ and $b$ are rounded off over a width of the order $2\sigma$.

References

• Adapted with permission from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Version: 12 Owner: akrowne Author(s): akrowne, AxelBoldt
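The Poisson bullet above can be verified with a small discrete-convolution sketch (function names ours, not part of the entry): convolving two Poisson probability mass functions term by term reproduces a Poisson pmf with the summed parameter.

```python
import math

def poisson_pmf(lam, n):
    """P(X = n) for a Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def convolve(p, q):
    """Discrete convolution: (p * q)(n) = sum_k p(k) q(n - k)."""
    N = len(p) + len(q) - 1
    return [sum(p[k] * q[n - k] for k in range(len(p)) if 0 <= n - k < len(q))
            for n in range(N)]

lam1, lam2 = 2.0, 3.0
p = [poisson_pmf(lam1, n) for n in range(40)]
q = [poisson_pmf(lam2, n) for n in range(40)]
r = convolve(p, q)
# r(n) should match the Poisson pmf with parameter lam1 + lam2
print(r[5], poisson_pmf(lam1 + lam2, 5))
```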


Chapter 499 46-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 499.1

balanced set

Definition [3, 1, 2, 1] Let $V$ be a vector space over $\mathbb{R}$ (or $\mathbb{C}$), and let $S$ be a subset of $V$. If $\lambda S \subset S$ for all scalars $\lambda$ such that $|\lambda| \le 1$, then $S$ is a balanced set in $V$. Here, $\lambda S = \{\lambda s \mid s \in S\}$, and $|\cdot|$ is the absolute value (in $\mathbb{R}$), or the modulus of a complex number (in $\mathbb{C}$).

Examples and properties

1. Let $V$ be a normed space with norm $||\cdot||$. Then the unit ball $\{v \in V \mid ||v|| \le 1\}$ is a balanced set.
2. Any vector subspace is a balanced set. Thus, in $\mathbb{R}^3$, lines and planes passing through the origin are balanced sets.
3. Any nonempty balanced set contains the zero vector [1].
4. The union and intersection of an arbitrary collection of balanced sets is again a balanced set [2].
5. Suppose $f$ is a linear map between two vector spaces. Then both $f$ and $f^{-1}$ (the inverse image of $f$) map balanced sets into balanced sets [3, 2].

Definition Suppose $S$ is a set in a vector space $V$. Then the balanced hull of $S$, denoted by $\operatorname{eq}(S)$, is the smallest balanced set containing $S$. The balanced core of $S$ is defined as the largest balanced set contained in $S$.

Proposition Let $S$ be a set in a vector space.

1. For $\operatorname{eq}(S)$ we have [1, 1]: $\operatorname{eq}(S) = \{\lambda s \mid s \in S, |\lambda| \le 1\}$.
2. The balanced hull of $S$ is the intersection of all balanced sets containing $S$ [1, 2].
3. The balanced core of $S$ is the union of all balanced sets contained in $S$ [2].
4. The balanced core of $S$ is nonempty if and only if the zero vector belongs to $S$ [2].
5. If $S$ is a closed set in a topological vector space, then the balanced core is also a closed set [2].

Notes A balanced set is also sometimes called circled [2]. The term balanced envelope is also used for the balanced hull [1]. Bourbaki uses the term équilibré [1], cf. $\operatorname{eq}(S)$ above. In [4], a balanced set is defined as above, but with the condition $|\lambda| = 1$ instead of $|\lambda| \le 1$.

REFERENCES
1. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.
2. R.E. Edwards, Functional Analysis: Theory and Applications, Dover Publications, 1995.
3. J. Horváth, Topological Vector Spaces and Distributions, Addison-Wesley Publishing Company, 1966.
4. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977.
5. M. Reed, B. Simon, Methods of Modern Mathematical Physics: Functional Analysis I, Revised and enlarged edition, Academic Press, 1980.

Version: 7 Owner: matte Author(s): matte

499.2

bounded function

Definition Suppose $X$ is a nonempty set. Then a function $f : X \to \mathbb{C}$ is a bounded function if there exists a $C < \infty$ such that $|f(x)| < C$ for all $x \in X$. The set of all bounded functions on $X$ is usually denoted by $B(X)$ ([1], pp. 61).

Under standard point-wise addition and point-wise multiplication by a scalar, $B(X)$ is a complex vector space. If $f \in B(X)$, then the sup-norm, or uniform norm, of $f$ is defined as
$$||f||_{\infty} = \sup_{x \in X}|f(x)|.$$
It is straightforward to check that $||\cdot||_{\infty}$ makes $B(X)$ into a normed vector space, i.e., to check that $||\cdot||_{\infty}$ satisfies the assumptions for a norm.

Example Suppose $X$ is a compact topological space. Further, let $C(X)$ be the set of continuous complex-valued functions on $X$ (with the same vector space structure as $B(X)$). Then $C(X)$ is a vector subspace of $B(X)$.

REFERENCES 1. C.D. Aliprantis, O. Burkinshaw, Principles of Real Analysis, 2nd ed., Academic Press, 1990.

Version: 3 Owner: matte Author(s): matte

499.3

bounded set (in a topological vector space)

Definition [3, 1, 1] Suppose $B$ is a subset of a topological vector space $V$. Then $B$ is a bounded set if for every neighborhood $U$ of the zero vector in $V$, there exists a scalar $\lambda$ such that $B \subset \lambda U$. Theorem If $K$ is a compact set in a topological vector space, then $K$ is bounded. ([3], pp. 12)

REFERENCES 1. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973. 2. F.A. Valentine, Convex sets, McGraw-Hill Book company, 1964. 3. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977.

Version: 2 Owner: matte Author(s): matte

499.4

cone

Definition [4, 2, 1] Suppose $V$ is a real (or complex) vector space with a subset $C$.

1. If $\lambda C \subset C$ for any real $\lambda > 0$, then $C$ is a cone.
2. If the origin belongs to a cone, then the cone is pointed. Otherwise, the cone is blunt.
3. A pointed cone is salient, if it contains no 1-dimensional vector subspace of $V$.
4. If $C - x_0$ is a cone for some $x_0$ in $V$, then $C$ is a cone with vertex at $x_0$.

Examples

1. In $\mathbb{R}$, the set $x > 0$ is a salient blunt cone.
2. Suppose $x \in \mathbb{R}^n$. Then for any $\varepsilon > 0$, the set
$$C = \bigcup\{\lambda B_x(\varepsilon) \mid \lambda > 0\}$$
is an open cone. If $|x| < \varepsilon$, then $C = \mathbb{R}^n$. Here, $B_x(\varepsilon)$ is the open ball at $x$ with radius $\varepsilon$.

Properties

1. The union and intersection of a collection of cones is a cone.
2. A set $C$ in a real (or complex) vector space is a convex cone if and only if [2, 1]
$$\lambda C \subset C \ \text{for all}\ \lambda > 0, \qquad C + C \subset C.$$
3. For a convex pointed cone $C$, the set $C \cap (-C)$ is the largest vector subspace contained in $C$ [2, 1].
4. A pointed convex cone $C$ is salient if and only if $C \cap (-C) = \{0\}$ [1].

REFERENCES 1. M. Reed, B. Simon, Methods of Modern Mathematical Physics: Functional Analysis I, Revised and enlarged edition, Academic Press, 1980. 2. J. Horv´ath, Topological Vector Spaces and Distributions, Addison-Wesley Publishing Company, 1966. 3. R.E. Edwards, Functional Analysis: Theory and Applications, Dover Publications, 1995.

Version: 4 Owner: bwebste Author(s): matte

499.5

locally convex topological vector space

Definition Let V be a topological vector space. If the topology of V has a basis where each member is a convex set, then V is a locally convex topological vector space [1].

REFERENCES 1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed, John Wiley & Sons, Inc., 1999.

Version: 2 Owner: matte Author(s): matte

499.6

sequential characterization of boundedness

Theorem [3, 1] A set B in a real (or possibly complex) topological vector space V is bounded if and only if the following condition holds: ∞ If {zi }∞ i=1 is a sequence in B, and {λi }i=1 is a sequence of scalars (in R or C), such that λi → 0, then λi zi → 0 in V .

REFERENCES 1. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973. 2. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977.

Version: 4 Owner: bwebste Author(s): matte

499.7

symmetric set

Definition [1, 3] Suppose $A$ is a set in a vector space. Then $A$ is a symmetric set, if $A = -A$. Here, $-A = \{-a \mid a \in A\}$. In other words, $A$ is symmetric if for any $a \in A$ also $-a \in A$.

Examples

1. In $\mathbb{R}$, examples of symmetric sets are intervals of the type $(-k, k)$ with $k > 0$, and the sets $\mathbb{Z}$ and $\{-1, 1\}$.
2. Any vector subspace in a vector space is a symmetric set.
3. If $A$ is any set in a vector space, then $A \cap (-A)$ [1] and $A \cup (-A)$ are symmetric sets.

REFERENCES 1. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977. 2. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.

Version: 1 Owner: matte Author(s): matte


Chapter 500 46A30 – Open mapping and closed graph theorems; completeness (including B-, Br -completeness) 500.1

closed graph theorem

A linear mapping between two Banach spaces X and Y is continuous if and only if its graph is a closed subset of X × Y (with the product topology). Version: 4 Owner: Koro Author(s): Koro

500.2

open mapping theorem

There are two important theorems having this name.

• In the context of functions of a complex variable: Theorem. Every non-constant analytic function on a region is an open mapping.

• In the context of functional analysis: Theorem. Every surjective continuous linear mapping between two Banach spaces is an open mapping.

Version: 8 Owner: Koro Author(s): Koro


Chapter 501 46A99 – Miscellaneous 501.1

Heine-Cantor theorem

Let X, Y be uniform spaces, and f : X → Y a continuous function. If X is compact, then f is uniformly continuous. For instance, if f : [a, b] → R is a continuous function, then it is uniformly continuous. Version: 6 Owner: n3o Author(s): n3o

501.2

proof of Heine-Cantor theorem

We prove this theorem in the case when X and Y are metric spaces. Suppose f is not uniformly continuous. Then there exists ε > 0 such that for every δ > 0 there exist x, y ∈ X with

d(x, y) < δ but d(f(x), f(y)) ≥ ε.

In particular, by letting δ = 1/k we can construct two sequences x_k and y_k such that d(x_k, y_k) < 1/k and d(f(x_k), f(y_k)) ≥ ε. Since X is compact, the two sequences have convergent subsequences, i.e.,

x_{k_j} → x̄ ∈ X, y_{k_j} → ȳ ∈ X.

Since d(x_k, y_k) → 0, we have x̄ = ȳ. Since f is continuous, we hence conclude that d(f(x_{k_j}), f(y_{k_j})) → 0, which contradicts d(f(x_k), f(y_k)) ≥ ε.

Version: 2 Owner: paolini Author(s): paolini

501.3

topological vector space

A topological vector space is a pair (V, T), where V is a vector space over a topological field K, and T is a Hausdorff topology on V such that under T the vector space operations are continuous: (λ, v) ↦ λv is continuous from K × V to V, and (v, w) ↦ v + w is continuous from V × V to V, where K × V and V × V are given the respective product topologies.

A finite dimensional vector space inherits a natural topology. For if V is a finite dimensional vector space, then V is isomorphic to K^n for some n; let f : V → K^n be such an isomorphism, and suppose K^n has the product topology. Give V the topology in which a subset A of V is open if and only if f(A) is open in K^n. This topology is independent of the choice of isomorphism f.

Version: 6 Owner: Evandar Author(s): Evandar


Chapter 502 46B20 – Geometry and structure of normed linear spaces 502.1

lim_{p→∞} ‖x‖_p = ‖x‖_∞

Suppose x = (x₁, …, x_n) is a point in R^n, and let ‖x‖_p and ‖x‖_∞ be the usual p-norm and ∞-norm:

‖x‖_p = (|x₁|^p + ⋯ + |x_n|^p)^{1/p},  ‖x‖_∞ = max{|x₁|, …, |x_n|}.

Our claim is that

lim_{p→∞} ‖x‖_p = ‖x‖_∞.  (502.1.1)

In other words, for any fixed x ∈ R^n, the above limit holds. This, of course, justifies the notation for the ∞-norm.

Proof. Since both norms stay invariant if we exchange two components of x, we can arrange things so that ‖x‖_∞ = |x₁|. Then for any real p > 0 we have

‖x‖_∞ = |x₁| = (|x₁|^p)^{1/p} ≤ ‖x‖_p

and

‖x‖_p ≤ n^{1/p} |x₁| = n^{1/p} ‖x‖_∞.

Taking the limit of the above inequalities (see this page), we obtain

‖x‖_∞ ≤ lim_{p→∞} ‖x‖_p and lim_{p→∞} ‖x‖_p ≤ ‖x‖_∞,

which combined yield the result. □

Version: 7 Owner: matte Author(s): matte
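The limit can also be checked numerically. The following Python sketch (with an arbitrarily chosen vector x, purely illustrative) verifies the two bounds used in the proof and the convergence of ‖x‖_p to ‖x‖_∞:

```python
import math

def p_norm(x, p):
    """||x||_p = (sum |x_i|^p)^(1/p)"""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def max_norm(x):
    """||x||_inf = max |x_i|"""
    return max(abs(t) for t in x)

x = [3.0, -4.0, 1.5]
n = len(x)
for p in (1, 2, 10, 100):
    # the two bounds from the proof: ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf
    assert max_norm(x) <= p_norm(x, p) <= n ** (1.0 / p) * max_norm(x) + 1e-12
# for large p the p-norm is already close to the inf-norm
assert abs(p_norm(x, 200) - max_norm(x)) < 0.03
```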

502.2

Hahn-Banach theorem

The Hahn-Banach theorem is a foundational result in functional analysis. Roughly speaking, it asserts the existence of a great variety of bounded (and hence continuous) linear functionals on a normed vector space, even if that space happens to be infinite-dimensional. We first consider an abstract version of this theorem, and then give the more classical result as a corollary.

Let V be a real or a complex vector space, with K denoting the corresponding field of scalars, and let p : V → R⁺ be a seminorm on V.

Theorem 17. Let f : U → K be a linear functional defined on a subspace U ⊂ V. If the restricted functional satisfies |f(u)| ≤ p(u) for u ∈ U, then it can be extended to all of V without violating the above property. To be more precise, there exists a linear functional F : V → K such that

F(u) = f(u) for u ∈ U, and |F(u)| ≤ p(u) for u ∈ V.

Definition 15. We say that a linear functional f : V → K is bounded if there exists a bound B ∈ R⁺ such that

|f(u)| ≤ B p(u), u ∈ V.  (502.2.1)

If f is a bounded linear functional, we define ‖f‖, the norm of f, according to

‖f‖ = sup{|f(u)| : p(u) = 1}.

One can show that ‖f‖ is the infimum of all the possible B that satisfy (502.2.1).

Theorem 18 (Hahn-Banach). Let f : U → K be a bounded linear functional defined on a subspace U ⊂ V. Let ‖f‖_U denote the norm of f relative to the restricted seminorm on U. Then there exists a bounded extension F : V → K with the same norm, i.e. ‖F‖_V = ‖f‖_U.

Version: 7 Owner: rmilson Author(s): rmilson, Evandar

502.3

proof of Hahn-Banach theorem

Consider the family of all possible extensions of f, i.e. the set F of all pairs (F, H) where H is a vector subspace of V containing U and F is a linear map F : H → K such that F(u) = f(u) for all u ∈ U and |F(u)| ≤ p(u) for all u ∈ H. The set F is naturally endowed with a partial order: given (F₁, H₁), (F₂, H₂) ∈ F we say that (F₁, H₁) ≤ (F₂, H₂) iff F₂ is an extension of F₁, that is H₁ ⊂ H₂ and F₂(u) = F₁(u) for all u ∈ H₁. We want to apply Zorn's lemma to F, so we are going to prove that every chain in F has an upper bound. Let (F_i, H_i) be the elements of a chain in F. Define H = ∪_i H_i. Clearly H is a vector subspace of V and contains U. Define F : H → K by "merging" all the F_i as follows. Given u ∈ H there exists i such that u ∈ H_i; define F(u) = F_i(u). This is a good definition, since if both H_i and H_j contain u then F_i(u) = F_j(u), because either (F_i, H_i) ≤ (F_j, H_j) or (F_j, H_j) ≤ (F_i, H_i). The pair (F, H) so constructed is an upper bound for the chain (F_i, H_i), since F is an extension of every F_i. Zorn's lemma then assures that there exists a maximal element (F, H) ∈ F. To complete the proof we only need to prove that H = V. Suppose by contradiction that there exists v ∈ V \ H. Then consider the vector space H′ = H + Kv = {u + tv : u ∈ H, t ∈ K} (H′ is the vector space generated by H and v). Choose

λ = sup_{x∈H} {F(x) − p(x − v)}.

We notice that given any x, y ∈ H it holds that F(x) − F(y) = F(x − y) ≤ p(x − y) = p(x − v + v − y) ≤ p(x − v) + p(y − v), i.e. F(x) − p(x − v) ≤ F(y) + p(y − v); in particular we find that λ < +∞, and for all y ∈ H it holds that F(y) − p(y − v) ≤ λ ≤ F(y) + p(y − v). Define F′ : H′ → K as follows: F′(u + tv) = F(u) + tλ. Clearly F′ is a linear functional. We have |F′(u + tv)| = |F(u) + tλ| = |t| |F(u/t) + λ|, and by letting y = −u/t in the previous estimates on λ we obtain

F(u/t) + λ ≤ F(u/t) + F(−u/t) + p(−u/t − v) = p(u/t + v)

and

F(u/t) + λ ≥ F(u/t) + F(−u/t) − p(−u/t − v) = −p(u/t + v),

which together give |F(u/t) + λ| ≤ p(u/t + v), and hence |F′(u + tv)| ≤ |t| p(u/t + v) = p(u + tv). So we have proved that (F′, H′) ∈ F and (F′, H′) > (F, H), which is a contradiction.

Version: 4 Owner: paolini Author(s): paolini

502.4

seminorm

Let V be a real or a complex vector space, with K denoting the corresponding field of scalars. A seminorm is a function p : V → R⁺, from V to the set of non-negative real numbers, that satisfies the following two properties:

Homogeneity: p(ku) = |k| p(u), for k ∈ K, u ∈ V;
Sublinearity: p(u + v) ≤ p(u) + p(v), for u, v ∈ V.

A seminorm differs from a norm in that it is permitted that p(u) = 0 for some non-zero u ∈ V.

It is possible to characterize the seminorm properties geometrically. For k > 0, let B_k = {u ∈ V : p(u) ≤ k} denote the ball of radius k. The homogeneity property is equivalent to the assertion that B_k = kB₁, in the sense that u ∈ B₁ if and only if ku ∈ B_k. Thus, we see that a seminorm is fully determined by its unit ball. Indeed, given B ⊂ V we may define a function p_B : V → R⁺ by

p_B(u) = inf{λ ∈ R⁺ : λ⁻¹u ∈ B}.

The geometric nature of the unit ball is described by the following.

Proposition 20. The function p_B satisfies the homogeneity property if and only if for every u ∈ V there exists a k ∈ R⁺ ∪ {∞} such that

λu ∈ B if and only if |λ| ≤ k.

Proposition 21. Suppose that p is homogeneous. Then it is sublinear if and only if its unit ball, B₁, is a convex subset of V.

Proof. First, let us suppose that the seminorm is both sublinear and homogeneous, and prove that B₁ is necessarily convex. Let u, v ∈ B₁, and let k be a real number between 0 and 1. We must show that the weighted average ku + (1 − k)v is in B₁ as well. By assumption,

p(ku + (1 − k)v) ≤ k p(u) + (1 − k) p(v).

The right side is a weighted average of two numbers between 0 and 1, and is therefore between 0 and 1 itself. Therefore ku + (1 − k)v ∈ B₁, as desired.

Conversely, suppose that the seminorm function is homogeneous, and that the unit ball is convex. Let u, v ∈ V be given, and let us show that p(u + v) ≤ p(u) + p(v). The essential complication here is that we do not exclude the possibility that p(u) = 0 but u ≠ 0.

First, let us consider the case where p(u) = p(v) = 0. By homogeneity, for every k > 0 we have ku, kv ∈ B₁, and hence

(k/2)u + (k/2)v ∈ B₁

as well. By homogeneity again,

p(u + v) ≤ 2/k.

Since the above is true for all positive k, we infer that p(u + v) = 0, as desired.

Next suppose that p(u) = 0, but p(v) ≠ 0. We will show that in this case, necessarily, p(u + v) = p(v). Owing to the homogeneity assumption, we may without loss of generality assume that p(v) = 1.

For every k such that 0 ≤ k < 1 we have

ku + kv = (1 − k) (ku/(1 − k)) + kv.

The right-side expression is an element of B₁ because

ku/(1 − k), v ∈ B₁.

Hence k p(u + v) ≤ 1, and since this holds for k arbitrarily close to 1 we conclude that p(u + v) ≤ p(v). The same argument also shows that p(v) = p(−u + (u + v)) ≤ p(u + v), and hence p(u + v) = p(v), as desired.

Finally, suppose that neither p(u) nor p(v) is zero. Hence

u/p(u), v/p(v)

are both in B₁, and hence

(u + v)/(p(u) + p(v)) = (p(u)/(p(u) + p(v))) · u/p(u) + (p(v)/(p(u) + p(v))) · v/p(v)

is in B₁ also. Using homogeneity, we conclude that p(u + v) ≤ p(u) + p(v), as desired.

Version: 14 Owner: rmilson Author(s): rmilson, drummond

502.5

vector norm

A vector norm on the real vector space V is a function f : V → R that satisfies the following properties:

f(x) = 0 ⇔ x = 0;
f(x) ≥ 0 for x ∈ V;
f(x + y) ≤ f(x) + f(y) for x, y ∈ V;
f(αx) = |α| f(x) for α ∈ R, x ∈ V.

Such a function is denoted ‖x‖. Particular norms are distinguished by subscripts, such as ‖x‖_V, when referring to a norm in the space V. A unit vector with respect to the norm ‖·‖ is a vector x satisfying ‖x‖ = 1. A vector norm on a complex vector space is defined similarly.

A common (and useful) example of a real norm is the Euclidean norm, given by ‖x‖ = (x₁² + x₂² + ⋯ + x_n²)^{1/2}, defined on V = R^n.

Note, however, that not every metric on a vector space is induced by a norm; when it is, the space is called a normed vector space. A necessary and sufficient condition for a metric d on a vector space V to be induced by a norm is that

d(x + a, y + a) = d(x, y) for all x, y, a ∈ V, and
d(αx, αy) = |α| d(x, y) for all x, y ∈ V, α ∈ R.

Given a norm, a metric can always be defined by the equation d(x, y) = ‖x − y‖.

Version: 14 Owner: mike Author(s): mike, Manoj, Logan


Chapter 503 46B50 – Compactness in Banach (or normed) spaces 503.1

Schauder fixed point theorem

Let X be a Banach space, K ⊂ X compact, convex and non-empty, and let f : K → K be a continuous mapping. Then there exists x ∈ K such that f(x) = x. Notice that the unit disc of a finite dimensional vector space is always convex and compact, hence this theorem extends the Brouwer fixed point theorem. Version: 3 Owner: paolini Author(s): paolini

503.2

proof of Schauder fixed point theorem

The idea of the proof is to reduce to the finite dimensional case. Given ε > 0, notice that the family of open sets {B_ε(x) : x ∈ K} is an open cover of K. Since K is compact, there exists a finite subcover, i.e. there exist N points p₁, …, p_N of K such that the balls B_ε(p_i) cover the whole set K. Let K_ε be the convex hull of p₁, …, p_N, and let V be the affine (N − 1)-dimensional space containing these points, so that K_ε ⊂ V. Now consider a projection π_ε : X → V such that ‖π_ε(x) − π_ε(y)‖ ≤ ‖x − y‖, and define

f_ε : K_ε → K_ε,  f_ε(x) = π_ε(f(x)).

This is a continuous function defined on a convex and compact set K_ε of a finite dimensional vector space V. Hence by the Brouwer fixed point theorem it admits a fixed point x_ε, with f_ε(x_ε) = x_ε.

Since K is sequentially compact we can find a sequence ε_k → 0 such that x_k = x_{ε_k} converges to some point x̄ ∈ K. We claim that f(x̄) = x̄. Clearly f_{ε_k}(x_k) = x_k → x̄. To conclude the proof we only need to show that also f_{ε_k}(x_k) → f(x̄), or, which is the same, that ‖f_{ε_k}(x_k) − f(x̄)‖ → 0. In fact we have

‖f_{ε_k}(x_k) − f(x̄)‖ = ‖π_{ε_k}(f(x_k)) − f(x̄)‖ ≤ ‖π_{ε_k}(f(x_k)) − f(x_k)‖ + ‖f(x_k) − f(x̄)‖ ≤ ε_k + ‖f(x_k) − f(x̄)‖ → 0,

where we used the fact that ‖π_ε(x) − x‖ ≤ ε, since x ∈ K is contained in some ball B_ε centered on K_ε.

Version: 1 Owner: paolini Author(s): paolini


Chapter 504 46B99 – Miscellaneous 504.1

ℓ^p

Let F be either R or C, and let p ∈ R with p ≥ 1. We define ℓ^p to be the vector space of all sequences (a_i)_{i≥0} in F such that

∑_{i=0}^∞ |a_i|^p

exists. ℓ^p is a normed vector space, under the norm

‖(a_i)‖_p = (∑_{i=0}^∞ |a_i|^p)^{1/p}.

ℓ^∞ is defined to be the vector space of all bounded sequences (a_i)_{i≥0}, with norm given by

‖(a_i)‖_∞ = sup{|a_i| : i ≥ 0}.

ℓ^∞ and ℓ^p for p ≥ 1 are complete under these norms, making them into Banach spaces. Moreover, ℓ² is a Hilbert space under the inner product

⟨(a_i), (b_i)⟩ = ∑_{i=0}^∞ a_i b̄_i

(with the conjugation omitted when F = R). For p > 1 the (continuous) dual space of ℓ^p is ℓ^q, where 1/p + 1/q = 1. The dual space of ℓ¹ is ℓ^∞, while the dual of ℓ^∞ is strictly larger than ℓ¹.

Version: 10 Owner: Evandar Author(s): Evandar
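As a numerical illustration (the sequence a_i = 1/(i+1) is an arbitrary example), the following Python sketch truncates a sequence that lies in ℓ² but not in ℓ¹: its 2-norm partial sums converge to π/√6, while its 1-norm partial sums grow without bound:

```python
import math

def lp_partial_norm(a, p):
    """||a||_p computed from a finite truncation of the sequence"""
    return sum(abs(t) ** p for t in a) ** (1.0 / p)

# a_i = 1/(i+1) lies in l^2 (sum of squares = pi^2/6) but not in l^1
a = [1.0 / (i + 1) for i in range(200000)]
assert abs(lp_partial_norm(a, 2) - math.pi / math.sqrt(6)) < 1e-4
assert lp_partial_norm(a, 1) > 10   # harmonic partial sums diverge (~ log N)
```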

504.2

Banach space

A Banach space (X, ‖·‖) is a normed vector space such that X is complete under the metric induced by the norm ‖·‖. Some authors use the term Banach space only in the case where X is infinite dimensional, although on Planetmath finite dimensional spaces are also considered to be Banach spaces. If Y is a Banach space and X is any normed vector space, then the set of continuous linear maps f : X → Y forms a Banach space, with norm given by the operator norm. In particular, since R and C are complete, the space of all continuous linear functionals on a normed vector space is a Banach space. Version: 4 Owner: Evandar Author(s): Evandar

504.3

an inner product defines a norm

Let F be either R or C, and let X be an inner product space over F with an inner product ⟨·,·⟩ : X × X → F. Then the function ‖·‖ : X → R defined by ‖x‖ = √⟨x, x⟩ is a norm on X. Version: 20 Owner: say 10 Author(s): say 10, apmxi
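A numerical sanity check of this fact in Python (vectors chosen at random, purely illustrative): the Cauchy-Schwarz inequality, the key step in proving the triangle inequality for the induced norm, together with the triangle inequality itself:

```python
import math
import random

def inner(x, y):
    """standard real inner product on R^n"""
    return sum(a * b for a, b in zip(x, y))

def induced_norm(x):
    """||x|| = sqrt(<x, x>)"""
    return math.sqrt(inner(x, x))

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(6)]
y = [random.uniform(-1, 1) for _ in range(6)]

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
assert abs(inner(x, y)) <= induced_norm(x) * induced_norm(y) + 1e-12
# triangle inequality for the induced norm
s = [a + b for a, b in zip(x, y)]
assert induced_norm(s) <= induced_norm(x) + induced_norm(y) + 1e-12
```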

504.4

continuous linear mapping

If (V₁, ‖·‖₁) and (V₂, ‖·‖₂) are normed vector spaces, a linear mapping T : V₁ → V₂ is continuous if it is continuous in the metric induced by the norms. If there is a nonnegative constant c such that ‖T(x)‖₂ ≤ c‖x‖₁ for each x ∈ V₁, we say that T is bounded. This should not be confused with the usual terminology referring to a bounded function as one that has bounded range; in fact, bounded linear mappings usually have unbounded ranges. The expression bounded linear mapping is often used in functional analysis to refer to continuous linear mappings as well. This is because the two definitions are equivalent: if T is bounded, then ‖T(x) − T(y)‖₂ = ‖T(x − y)‖₂ ≤ c‖x − y‖₁, so T is a Lipschitz function. Now suppose T is continuous. Then there exists r > 0 such that ‖T(x)‖₂ ≤ 1 when ‖x‖₁ ≤ r.

For any x ∈ V₁ with x ≠ 0, we then have

‖T(x)‖₂ = (‖x‖₁/r) ‖T((r/‖x‖₁) x)‖₂ ≤ (1/r)‖x‖₁,

so T is bounded. It can be shown that a linear mapping between two topological vector spaces is continuous if and only if it is continuous at 0 [3].

REFERENCES 1. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.

Version: 4 Owner: Koro Author(s): Koro

504.5

equivalent norms

Definition Let ‖·‖ and ‖·‖′ be two norms on a vector space V. These norms are equivalent norms if there exist positive real numbers c, d such that c‖x‖ ≤ ‖x‖′ ≤ d‖x‖ for all x ∈ V. An equivalent condition is that there exists a number C > 0 such that (1/C)‖x‖ ≤ ‖x‖′ ≤ C‖x‖ for all x ∈ V. To see the equivalence, set C = max{1/c, d}.

Some key results are as follows:

1. On a finite dimensional vector space all norms are equivalent. The same is not true for vector spaces of infinite dimension [2]. It follows that on a finite dimensional vector space one can check the convergence of a sequence with respect to any norm: if a sequence converges in one norm, it converges in all norms.
2. If two norms are equivalent on a vector space V, they induce the same topology on V [2].
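For a concrete pair of equivalent norms, on R^n one can take c = 1 and d = n for the ∞-norm and the 1-norm. The following Python sketch (random vectors, illustrative only) checks these constants:

```python
import random

def norm1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

n = 7
random.seed(2)
for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(n)]
    # c = 1 and d = n witness the equivalence of the inf-norm and the 1-norm
    assert norm_inf(x) <= norm1(x) <= n * norm_inf(x) + 1e-9
```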


REFERENCES 1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons, 1978.

Version: 3 Owner: Koro Author(s): matte

504.6

normed vector space

Let F be a field which is either R or C. A normed vector space over F is a pair (V, ‖·‖) where V is a vector space over F and ‖·‖ : V → R is a function such that

1. ‖v‖ ≥ 0 for all v ∈ V, and ‖v‖ = 0 if and only if v = 0 in V (positive definiteness);
2. ‖λv‖ = |λ| ‖v‖ for all v ∈ V and all λ ∈ F;
3. ‖v + w‖ ≤ ‖v‖ + ‖w‖ for all v, w ∈ V (the triangle inequality).

The function ‖·‖ is called a norm on V.

Some properties of norms:

1. If W is a subspace of V then W can be made into a normed space by simply restricting the norm on V to W. This is called the induced norm on W.
2. Any normed vector space (V, ‖·‖) is a metric space under the metric d : V × V → R given by d(u, v) = ‖u − v‖. This is called the metric induced by the norm ‖·‖.
3. In this metric, the norm defines a continuous map from V to R; this is an easy consequence of the triangle inequality.
4. If (V, ⟨·,·⟩) is an inner product space, then there is a natural induced norm given by ‖v‖ = √⟨v, v⟩ for all v ∈ V.

Version: 5 Owner: Evandar Author(s): Evandar


Chapter 505 46Bxx – Normed linear spaces and Banach spaces; Banach lattices 505.1

vector p-norm

A class of vector norms, called p-norms and denoted ‖·‖_p, is defined as

‖x‖_p = (|x₁|^p + ⋯ + |x_n|^p)^{1/p},  p ≥ 1, x ∈ R^n.

The most widely used are the 1-norm, 2-norm, and ∞-norm:

‖x‖₁ = |x₁| + ⋯ + |x_n|,
‖x‖₂ = √(|x₁|² + ⋯ + |x_n|²) = √(xᵀx),
‖x‖_∞ = max_{1≤i≤n} |x_i|.

The 2-norm is sometimes called the Euclidean vector norm, because ‖x − y‖₂ yields the Euclidean distance between any two vectors x, y ∈ R^n. The 1-norm is also called the taxicab metric (sometimes the Manhattan metric), since the distance between two points can be viewed as the distance a taxi would travel in a city, using only horizontal and vertical movements.

A useful fact is that on finite dimensional spaces (like R^n) the three norms mentioned above are equivalent.

Version: 5 Owner: drini Author(s): drini, Logan
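The following Python sketch (with two arbitrarily chosen points) computes the three norms of a difference vector, illustrating the taxicab and Euclidean interpretations:

```python
def norm(x, p):
    """p-norm of a real vector"""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x, y = [1.0, 2.0], [4.0, 6.0]
diff = [a - b for a, b in zip(x, y)]
assert norm(diff, 1) == 7.0                 # taxicab: 3 blocks + 4 blocks
assert norm(diff, 2) == 5.0                 # Euclidean straight-line distance
assert max(abs(t) for t in diff) == 4.0     # inf-norm (Chebyshev) distance
```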


Chapter 506 46C05 – Hilbert and pre-Hilbert spaces: geometry and topology (including spaces with semidefinite inner product) 506.1

Bessel inequality

Let H be a Hilbert space, and suppose e₁, e₂, … ∈ H is an orthonormal sequence. Then for any x ∈ H,

∑_{k=1}^∞ |⟨x, e_k⟩|² ≤ ‖x‖².

Bessel's inequality immediately lets us define the sum

x′ = ∑_{k=1}^∞ ⟨x, e_k⟩ e_k.

The inequality means that the series converges.

For a complete orthonormal series, we have Parseval's theorem, which replaces the inequality with an equality (and consequently x′ with x).

Version: 2 Owner: ariels Author(s): ariels
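A finite-dimensional illustration in Python (the orthonormal vectors and the vector x below are arbitrary examples): for an incomplete orthonormal system in R³ the Bessel sum is strictly smaller than ‖x‖²:

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

# two orthonormal vectors in R^3 (an incomplete orthonormal system)
e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 1.0 / math.sqrt(2), 1.0 / math.sqrt(2)]
x = [2.0, -1.0, 3.0]

bessel_sum = inner(x, e1) ** 2 + inner(x, e2) ** 2
assert bessel_sum <= inner(x, x)   # Bessel: sum |<x, e_k>|^2 <= ||x||^2
```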

506.2

Hilbert module

Definition A (right) pre-Hilbert module over a C*-algebra A is a right A-module E equipped with an A-valued inner product ⟨−, −⟩ : E × E → A, i.e. a sesquilinear pairing satisfying

⟨u, va⟩ = ⟨u, v⟩a,  (506.2.1)
⟨u, v⟩ = ⟨v, u⟩*,  (506.2.2)
⟨v, v⟩ ≥ 0, with ⟨v, v⟩ = 0 iff v = 0,  (506.2.3)

for all u, v ∈ E and a ∈ A. Note, positive definiteness is well-defined due to the notion of positivity for C*-algebras. The norm of an element v ∈ E is defined by ‖v‖ = √‖⟨v, v⟩‖.

Definition A (right) Hilbert module over a C*-algebra A is a right pre-Hilbert module over A which is complete with respect to this norm.

Example 23 (Hilbert spaces). A complex Hilbert space is a Hilbert C-module.

Example 24 (C*-algebras). A C*-algebra A is a Hilbert A-module with inner product ⟨a, b⟩ = a*b.

Definition A Hilbert A-B-bimodule is a (right) Hilbert module E over a C*-algebra B together with a *-homomorphism π from a C*-algebra A to End(E).

Version: 4 Owner: mhale Author(s): mhale

506.3

Hilbert space

A Hilbert space is an inner product space (X, ⟨·,·⟩) which is complete under the induced metric. In particular, a Hilbert space is a Banach space in the norm induced by the inner product, since the norm and the inner product both induce the same metric. Some authors require X to be infinite dimensional for it to be called a Hilbert space. Version: 7 Owner: Evandar Author(s): Evandar

506.4

proof of Bessel inequality

Let

r_n = x − ∑_{k=1}^n ⟨x, e_k⟩ e_k.

Then for j = 1, …, n,

⟨r_n, e_j⟩ = ⟨x, e_j⟩ − ∑_{k=1}^n ⟨⟨x, e_k⟩ e_k, e_j⟩  (506.4.1)
= ⟨x, e_j⟩ − ⟨x, e_j⟩⟨e_j, e_j⟩ = 0,  (506.4.2)

so e₁, …, e_n, r_n is an orthogonal series. Computing norms, we see that

‖x‖² = ‖r_n + ∑_{k=1}^n ⟨x, e_k⟩ e_k‖² = ‖r_n‖² + ∑_{k=1}^n |⟨x, e_k⟩|² ≥ ∑_{k=1}^n |⟨x, e_k⟩|².

So the series

∑_{k=1}^∞ |⟨x, e_k⟩|²

converges and is bounded by ‖x‖², as required.

Version: 1 Owner: ariels Author(s): ariels

Chapter 507 46C15 – Characterizations of Hilbert spaces 507.1

classification of separable Hilbert spaces

Let H1 and H2 be infinite dimensional, separable Hilbert spaces. Then there is an isomorphism f : H1 → H2 which is also an isometry. In other words, H1 and H2 are identical as Hilbert spaces. Version: 2 Owner: Evandar Author(s): Evandar


Chapter 508 46E15 – Banach spaces of continuous, differentiable or analytic functions 508.1

Ascoli-Arzela theorem

Theorem 19. Let Ω be a bounded subset of R^n and (f_k) a sequence of functions f_k : Ω → R^m. If {f_k} is equibounded and uniformly equicontinuous, then there exists a uniformly convergent subsequence (f_{k_j}).

A more abstract (and more general) version is the following.

Theorem 20. Let X and Y be totally bounded metric spaces and let F ⊂ C(X, Y) be a uniformly equicontinuous family of continuous mappings from X to Y. Then F is totally bounded (with respect to the uniform convergence metric induced by C(X, Y)).

Notice that the first version is a consequence of the second. Recall, in fact, that a subset of a complete metric space is totally bounded if and only if its closure is compact (or sequentially compact). Hence Ω is totally bounded and all the functions f_k have image in a totally bounded set. Since F = {f_k} is totally bounded, it is sequentially precompact, and hence (f_k) has a convergent subsequence.

Version: 6 Owner: paolini Author(s): paolini, n3o

508.2

Stone-Weierstrass theorem

Let X be a compact metric space and let C⁰(X, R) be the algebra of continuous real functions defined over X. Let A be a subalgebra of C⁰(X, R) for which the following conditions hold:

1. ∀x, y ∈ X with x ≠ y, ∃f ∈ A : f(x) ≠ f(y);
2. 1 ∈ A.

Then A is dense in C⁰(X, R).

Version: 1 Owner: n3o Author(s): n3o

508.3

proof of Ascoli-Arzel theorem

Given ε > 0 we aim at finding a 4ε-lattice in F (see the definition of total boundedness). Let δ > 0 be given with respect to ε by the definition of uniform equicontinuity of F. Let X_δ be a δ-lattice in X and Y_ε an ε-lattice in Y. Let now Y_ε^{X_δ} be the set of functions from X_δ to Y_ε, and define G ⊂ Y_ε^{X_δ} by

G = {g ∈ Y_ε^{X_δ} : ∃f ∈ F ∀x ∈ X_δ, d(f(x), g(x)) < ε}.

Since Y_ε^{X_δ} is a finite set, G is finite too: say G = {g₁, …, g_N}. Then define F_ε ⊂ F, F_ε = {f₁, …, f_N}, where f_k : X → Y is a function in F such that d(f_k(x), g_k(x)) < ε for all x ∈ X_δ (the existence of such a function is guaranteed by the definition of G).

We now prove that F_ε is a 4ε-lattice in F. Given f ∈ F, choose g ∈ Y_ε^{X_δ} such that for all x ∈ X_δ it holds that d(f(x), g(x)) < ε (this is possible, as for all x ∈ X_δ there exists y ∈ Y_ε with d(f(x), y) < ε). We conclude that g ∈ G, and hence g = g_k for some k ∈ {1, …, N}. Notice also that for all x ∈ X_δ we have

d(f(x), f_k(x)) ≤ d(f(x), g_k(x)) + d(g_k(x), f_k(x)) < 2ε.

Given any x ∈ X we know that there exists x_δ ∈ X_δ such that d(x, x_δ) < δ. So, by the equicontinuity of F,

d(f(x), f_k(x)) ≤ d(f(x), f(x_δ)) + d(f_k(x), f_k(x_δ)) + d(f(x_δ), f_k(x_δ)) ≤ 4ε.

Version: 3 Owner: paolini Author(s): paolini

508.4

Holder inequality

The Hölder inequality concerns vector p-norms: if 1/p + 1/q = 1, then

|xᵀy| ≤ ‖x‖_p ‖y‖_q.

An important instance of a Hölder inequality is the Cauchy-Schwarz inequality.

There is a version of this result for the L^p spaces. If a function f is in L^p(X), then the L^p-norm of f is denoted ‖f‖_p. Let (X, B, μ) be a measure space. If f is in L^p(X) and g is in L^q(X) (with 1/p + 1/q = 1), then the Hölder inequality becomes

‖fg‖₁ = ∫_X |fg| dμ ≤ (∫_X |f|^p dμ)^{1/p} (∫_X |g|^q dμ)^{1/q} = ‖f‖_p ‖g‖_q.

Version: 10 Owner: drini Author(s): paolini, drini, Logan
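A numerical check of the finite-dimensional inequality in Python (the conjugate pair p = 3, q = 3/2 and the random vectors are arbitrary choices):

```python
import random

def p_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p, q = 3.0, 1.5            # conjugate indices: 1/3 + 1/1.5 = 1
random.seed(3)
for _ in range(100):
    x = [random.uniform(-2, 2) for _ in range(5)]
    y = [random.uniform(-2, 2) for _ in range(5)]
    # Hoelder: |x^T y| <= ||x||_p ||y||_q
    lhs = abs(sum(a * b for a, b in zip(x, y)))
    assert lhs <= p_norm(x, p) * p_norm(y, q) + 1e-9
```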

508.5

Young Inequality

Let a, b ≥ 0 and p, q ∈ ]0, ∞[ with 1/p + 1/q = 1. Then

ab ≤ a^p/p + b^q/q.

Version: 1 Owner: paolini Author(s): paolini
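A quick numerical check of the inequality in Python (the conjugate pair and the sampled values are arbitrary):

```python
import random

p, q = 2.5, 5.0 / 3.0      # 1/p + 1/q = 0.4 + 0.6 = 1
random.seed(4)
for _ in range(1000):
    a, b = random.uniform(0, 10), random.uniform(0, 10)
    # Young: ab <= a^p/p + b^q/q
    assert a * b <= a ** p / p + b ** q / q + 1e-9
```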

508.6

conjugate index

For p, q ∈ R with 1 < p, q < ∞, we say p and q are conjugate indices if

1/p + 1/q = 1.

Formally, we will also define q = ∞ as conjugate to p = 1, and vice versa.

Conjugate indices are used in the H¨older inequality and more generally to define conjugate spaces. Version: 4 Owner: bwebste Author(s): bwebste, drummond

508.7

proof of Holder inequality

First we prove the more general form (in measure spaces).

Let (X, μ) be a measure space and let f ∈ L^p(X), g ∈ L^q(X), where p, q ∈ [1, +∞] and 1/p + 1/q = 1.

The case p = 1 and q = ∞ is obvious, since |f(x)g(x)| ≤ ‖g‖_{L^∞} |f(x)|. Also if f = 0 or g = 0 the result is obvious. Otherwise, notice that (applying the Young inequality) we have

‖fg‖_{L¹}/(‖f‖_{L^p} ‖g‖_{L^q}) = ∫_X (|f|/‖f‖_{L^p}) · (|g|/‖g‖_{L^q}) dμ ≤ (1/p) ∫_X (|f|/‖f‖_{L^p})^p dμ + (1/q) ∫_X (|g|/‖g‖_{L^q})^q dμ = 1/p + 1/q = 1,

hence the desired inequality holds:

∫_X |fg| = ‖fg‖_{L¹} ≤ ‖f‖_{L^p} ‖g‖_{L^q} = (∫_X |f|^p)^{1/p} (∫_X |g|^q)^{1/q}.

If x and y are vectors in R^n, or vectors in ℓ^p and ℓ^q spaces, the proof is the same, only with integrals replaced by sums. If we define

‖x‖_p = (∑_k |x_k|^p)^{1/p},

we have

|∑_k x_k y_k|/(‖x‖_p ‖y‖_q) ≤ ∑_k |x_k||y_k|/(‖x‖_p ‖y‖_q) = ∑_k (|x_k|/‖x‖_p)(|y_k|/‖y‖_q) ≤ (1/p) ∑_k (|x_k|/‖x‖_p)^p + (1/q) ∑_k (|y_k|/‖y‖_q)^q = 1/p + 1/q = 1.

Version: 1 Owner: paolini Author(s): paolini

508.8

proof of Young Inequality

By the concavity of the log function we have

log(ab) = (1/p) log a^p + (1/q) log b^q ≤ log((1/p) a^p + (1/q) b^q).

By exponentiation we obtain the desired result.

By exponentiation we obtain the desired result. Version: 1 Owner: paolini Author(s): paolini

508.9

vector field

A (smooth, differentiable) vector field on a (smooth, differentiable) manifold M is a (smooth, differentiable) function v : M → TM, where TM is the tangent bundle of M, which takes m to the tangent space T_m M, i.e., a section of the tangent bundle. Less formally, it can be thought of as a continuous choice of a tangent vector at each point of a manifold. Alternatively, vector fields on a manifold can be identified with derivations of the algebra of (smooth, differentiable) functions. Though less intuitive, this definition can be more formally useful. Version: 8 Owner: bwebste Author(s): bwebste, slider142

Chapter 509 46F05 – Topological linear spaces of test functions, distributions and ultradistributions 509.1

Tf is a distribution of zeroth order

To check that T_f is a distribution of zeroth order, we shall use condition (3) on this page. First, it is clear that T_f is a linear mapping. To see that T_f is continuous, suppose K is a compact set in U and u ∈ D_K, i.e., u is a smooth function with support in K. We then have

|T_f(u)| = |∫_K f(x)u(x) dx| ≤ ∫_K |f(x)| |u(x)| dx ≤ (∫_K |f(x)| dx) ‖u‖_∞.

Since f is locally integrable, it follows that C = ∫_K |f(x)| dx is finite, so

|T_f(u)| ≤ C‖u‖_∞.

Thus T_f is a distribution of zeroth order ([2], pp. 381). □

REFERENCES 1. S. Lang, Analysis II, Addison-Wesley Publishing Company Inc., 1969.

Version: 3 Owner: matte Author(s): matte


509.2

p.v.(1/x) is a distribution of first order

(Following [4, 2].) Let u ∈ D(U). Then supp u ⊂ [−k, k] for some k > 0. For any ε > 0, u(x)/x is Lebesgue integrable in |x| ∈ [ε, k]. Thus, by a change of variable, we have 1 p.v.( )(u) = x

lim int[ε,k]

ε→0+

u(x) − u(−x) dx. x

Now it is clear that the integrand is continuous for all x ∈ R \ {0}. What is more, the integrand approaches 2u0 (0) for x → 0, so the integrand has a removable discontinuity at x = 0. That is, by assigning the value 2u0 (0) to the integrand at x = 0, the integrand becomes continuous in [0, k]. This means that the integrand is Lebesgue measurable on [0, k]. Then,  by defining fn (x) = χ[1/n,k] u(x) − u(−x) /x (where χ is the characteristic function), and applying the Lebesgue dominated convergence theorem, we have 1 u(x) − u(−x) p.v.( )(u) = int[0,k] dx. x x It follows that p.v.( x1 )(u) is finite, i.e., p.v.( x1 ) takes values in C. Since D(U) is a vector space, if follows easily from the above expression that p.v.( x1 ) is linear. To prove that p.v.( x1 ) is continuous, we shall use condition (3) on this page. For this, suppose K is a compact subset of R and u ∈ DK . Again, we can assume that K ⊂ [−k, k] for some k > 0. For x > 0, we have |

1 u(x) − u(−x) | = | int(−x,x) u0 (t)dt| x x ≤ 2||u0||∞ ,

where ||·||∞ is the supremum norm. In the first equality we have used the Fundamental theorem of calculus f (valid since u is absolutely continuous on [−k, k]). Thus 1 | p.v.( )(u)| ≤ 2k||u0||∞ x and p.v.( x1 ) is a distribution of first order as claimed. 2

REFERENCES 1. M. Reed, B. Simon, Methods of Modern Mathematical Physics: Functional Analysis I, Revised and enlarged edition, Academic Press, 1980. 2. S. Igari, Real analysis - With an introduction to Wavelet Theory, American Mathematical Society, 1998.

Version: 4 Owner: matte Author(s): matte

509.3

Cauchy principal part integral

Definition [4, 2, 2] Let C₀^∞(R) be the set of smooth functions with compact support on R. Then the Cauchy principal part integral p.v.(1/x) is the mapping p.v.(1/x) : C₀^∞(R) → C defined as

p.v.(1/x)(u) = lim_{ε→0⁺} ∫_{|x|>ε} u(x)/x dx

for u ∈ C₀^∞(R).

Theorem The mapping p.v.(1/x) is a distribution of first order. That is, p.v.(1/x) ∈ D′¹(R). (proof.)

Properties

1. The distribution p.v.(1/x) is obtained as the limit ([2], pp. 250)

χ_{n|x|>1}(x)/x → p.v.(1/x)

as n → ∞. Here, χ is the characteristic function, the locally integrable functions on the left hand side should be interpreted as distributions (see this page), and the limit should be taken in D′(R).

2. Let ln|x| be the distribution induced by the locally integrable function ln|x| : R → R. Then, for the distributional derivative D, we have ([2], pp. 149)

D(ln|x|) = p.v.(1/x).

REFERENCES 1. M. Reed, B. Simon, Methods of Modern Mathematical Physics: Functional Analysis I, Revised and enlarged edition, Academic Press, 1980. 2. S. Igari, Real analysis - With an introduction to Wavelet Theory, American Mathematical Society, 1998. 3. J. Rauch, Partial Differential Equations, Springer-Verlag, 1991.

Version: 5 Owner: matte Author(s): matte
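The symmetric form used in the proof above suggests a simple numerical approximation. The following Python sketch (the test function u(x) = x e^{−x²} is an arbitrary choice, picked because the principal value integral then reduces to ∫ e^{−x²} dx = √π erf(k)) approximates p.v.(1/x)(u) by the midpoint rule:

```python
import math

def pv_integral(u, k, n=20000):
    """Approximate the principal value integral of u(x)/x over [-k, k] via
    the symmetric form int_0^k (u(x) - u(-x))/x dx; the integrand extends
    continuously to x = 0, so an ordinary quadrature rule applies."""
    h = k / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h          # midpoint rule avoids x = 0 exactly
        total += (u(x) - u(-x)) / x
    return total * h

# For u(x) = x*exp(-x^2) we get u(x)/x = exp(-x^2), so the principal value
# integral over [-k, k] equals sqrt(pi) * erf(k).
k = 3.0
approx = pv_integral(lambda x: x * math.exp(-x * x), k)
assert abs(approx - math.sqrt(math.pi) * math.erf(k)) < 1e-6
```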


509.4

delta distribution

Let U be an open subset of R^n such that 0 ∈ U. Then the delta distribution is the mapping [2, 3, 4]

δ : D(U) → C, u ↦ u(0).

Claim The delta distribution is a distribution of zeroth order, i.e., δ ∈ D′⁰(U).

Proof. With obvious notation, we have

δ(u + v) = (u + v)(0) = u(0) + v(0) = δ(u) + δ(v),
δ(αu) = (αu)(0) = αu(0) = αδ(u),

so δ is linear. To see that δ is continuous, we use condition (3) on this page. Indeed, if K is a compact set in U and u ∈ D_K, then

|δ(u)| = |u(0)| ≤ ‖u‖_∞,

where ‖·‖_∞ is the supremum norm. □

REFERENCES
1. J. Rauch, Partial Differential Equations, Springer-Verlag, 1991.
2. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.
3. M. Reed, B. Simon, Methods of Modern Mathematical Physics: Functional Analysis I, revised and enlarged edition, Academic Press, 1980.

Version: 1 Owner: matte Author(s): matte

509.5

distribution

Definition [1] Suppose $U$ is an open set in $\mathbb{R}^n$, and suppose $\mathcal{D}(U)$ is the topological vector space of smooth functions with compact support. A distribution is a continuous linear functional on $\mathcal{D}(U)$, i.e., a continuous linear mapping $\mathcal{D}(U) \to \mathbb{C}$. The set of all distributions on $U$ is denoted by $\mathcal{D}'(U)$.

Suppose $T$ is a linear functional on $\mathcal{D}(U)$. Then $T$ is continuous if and only if $T$ is continuous at the origin (see this page). This condition can be rewritten in various ways, and the theorem below gives two convenient conditions that can be used to prove that a linear mapping is a distribution.

Theorem Let $U$ be an open set in $\mathbb{R}^n$, and let $T$ be a linear functional on $\mathcal{D}(U)$. Then the following are equivalent:
1. $T$ is a distribution.
2. If $K$ is a compact set in $U$, and $\{u_i\}_{i=1}^\infty$ is a sequence in $\mathcal{D}_K$ such that for any multi-index $\alpha$ we have $D^\alpha u_i \to 0$ in the supremum norm as $i \to \infty$, then $T(u_i) \to 0$ in $\mathbb{C}$.
3. For any compact set $K$ in $U$, there are constants $C > 0$ and $k \in \{0, 1, 2, \ldots\}$ such that for all $u \in \mathcal{D}_K$ we have
$$|T(u)| \le C \sum_{|\alpha| \le k} \|D^\alpha u\|_\infty, \qquad (509.5.1)$$
where $\alpha$ is a multi-index and $\|\cdot\|_\infty$ is the supremum norm.

Proof The equivalence of (2) and (3) can be found on this page, and the equivalence of (1) and (3) is shown in [3], pp. 141. $\Box$

If $T$ is a distribution on an open set $U$, and the same $k$ can be used for any $K$ in inequality (509.5.1), then $T$ is a distribution of order $k$. The set of all such distributions is denoted by $\mathcal{D}'^k(U)$. Further, the set of all distributions of finite order on $U$ is defined as [4]
$$\mathcal{D}'_F(U) = \{T \in \mathcal{D}'(U) \mid T \in \mathcal{D}'^k(U) \text{ for some } k < \infty\}.$$
A common notation for the action of a distribution $T$ on a test function $u \in \mathcal{D}(U)$ (i.e., $T(u)$ with the above notation) is $\langle T, u \rangle$. The motivation for this comes from this example.

Topology for $\mathcal{D}'(U)$ The standard topology for $\mathcal{D}'(U)$ is the weak* topology. In this topology, a sequence $\{T_i\}_{i=1}^\infty$ of distributions in $\mathcal{D}'(U)$ converges to a distribution $T \in \mathcal{D}'(U)$ if and only if $T_i(u) \to T(u)$ (in $\mathbb{C}$) as $i \to \infty$ for every $u \in \mathcal{D}(U)$ [3].

REFERENCES
1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., John Wiley & Sons, Inc., 1999.
2. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.
3. L. Hörmander, The Analysis of Linear Partial Differential Operators I (Distribution Theory and Fourier Analysis), 2nd ed., Springer-Verlag, 1990.

Version: 6 Owner: matte Author(s): matte

509.6

equivalence of conditions

Let us first show the equivalence of (2) and (3), following [4], pp. 35. First, the proof that (3) implies (2) is a direct calculation. Next, let us show that (2) implies (3). Thus, suppose that whenever $K$ is a compact set in $U$ and $\{u_i\}_{i=1}^\infty$ is a sequence in $\mathcal{D}_K$ such that $D^\alpha u_i \to 0$ in the supremum norm $\|\cdot\|_\infty$ as $i \to \infty$ for every multi-index $\alpha$, it follows that $T(u_i) \to 0$ in $\mathbb{C}$. For a contradiction, suppose there is a compact set $K$ in $U$ such that for all constants $C > 0$ and $k \in \{0, 1, 2, \ldots\}$ there exists a function $u \in \mathcal{D}_K$ such that
$$|T(u)| > C \sum_{|\alpha| \le k} \|D^\alpha u\|_\infty.$$
Then, for $C = k = 1, 2, \ldots$ we obtain functions $u_1, u_2, \ldots$ in $\mathcal{D}_K$ such that $|T(u_i)| > i \sum_{|\alpha| \le i} \|D^\alpha u_i\|_\infty$. Thus $|T(u_i)| > 0$ for all $i$, so for $v_i = u_i / |T(u_i)|$ we have
$$1 > i \sum_{|\alpha| \le i} \|D^\alpha v_i\|_\infty.$$
It follows that $\|D^\alpha v_i\|_\infty < 1/i$ for any multi-index $\alpha$ with $|\alpha| \le i$. Thus $\{v_i\}_{i=1}^\infty$ satisfies our assumption, whence $T(v_i)$ should tend to $0$. However, for all $i$ we have $|T(v_i)| = 1$. This contradiction completes the proof.

TODO: The equivalence of (1) and (3) is given in [3]. $\Box$

REFERENCES
1. L. Hörmander, The Analysis of Linear Partial Differential Operators I (Distribution Theory and Fourier Analysis), 2nd ed., Springer-Verlag, 1990.
2. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.

Version: 3 Owner: matte Author(s): matte


509.7

every locally integrable function is a distribution

Suppose $U$ is an open set in $\mathbb{R}^n$ and $f$ is a locally integrable function on $U$, i.e., $f \in L^1_{loc}(U)$. Then the mapping
$$T_f \colon \mathcal{D}(U) \to \mathbb{C}, \qquad u \mapsto \int_U f(x)u(x)\, dx$$
is a zeroth order distribution [4, 2]. (Here, $\mathcal{D}(U)$ is the set of smooth functions with compact support on $U$.) (proof)

If $f$ and $g$ are both locally integrable functions on an open set $U$, and $T_f = T_g$, then it follows (see this page) that $f = g$ almost everywhere. Thus, the mapping $f \mapsto T_f$ is a linear injection when $L^1_{loc}$ is equipped with the usual equivalence relation for an $L^p$-space. For this reason, one also writes $f$ for the distribution $T_f$ [2].

REFERENCES
1. L. Hörmander, The Analysis of Linear Partial Differential Operators I (Distribution Theory and Fourier Analysis), 2nd ed., Springer-Verlag, 1990.
2. S. Lang, Analysis II, Addison-Wesley Publishing Company Inc., 1969.

Version: 2 Owner: matte Author(s): matte

509.8

localization for distributions

Definition [1, 3] Suppose $U$ is an open set in $\mathbb{R}^n$ and $T$ is a distribution $T \in \mathcal{D}'(U)$. Then we say that $T$ vanishes on an open set $V \subset U$ if the restriction of $T$ to $V$ is the zero distribution on $V$. In other words, $T$ vanishes on $V$ if $T(v) = 0$ for all $v \in C_0^\infty(V)$. (Here $C_0^\infty(V)$ is the set of smooth functions with compact support in $V$.) Similarly, we say that two distributions $S, T \in \mathcal{D}'(U)$ are equal, or coincide, on $V$ if $S - T$ vanishes on $V$. We then write: $S = T$ on $V$.

Theorem [1, 4] Suppose $U$ is an open set in $\mathbb{R}^n$ and $\{U_i\}_{i \in I}$ is an open cover of $U$, i.e.,
$$U = \bigcup_{i \in I} U_i.$$
Here, $I$ is an arbitrary index set. If $S, T$ are distributions on $U$ such that $S = T$ on each $U_i$, then $S = T$ (on $U$).

Proof. Suppose $u \in \mathcal{D}(U)$. Our aim is to show that $S(u) = T(u)$. First, we have $\operatorname{supp} u \subset K$ for some compact $K \subset U$. It follows that there exists a finite collection of $U_i$:s from the open cover, say $U_1, \ldots, U_N$, such that $K \subset \bigcup_{i=1}^N U_i$. By a smooth partition of unity (see e.g. [2], pp. 137), there are smooth functions $\phi_1, \ldots, \phi_N \colon U \to \mathbb{R}$ such that
1. $\operatorname{supp} \phi_i \subset U_i$ for all $i$,
2. $\phi_i(x) \in [0, 1]$ for all $x \in U$ and all $i$,
3. $\sum_{i=1}^N \phi_i(x) = 1$ for all $x \in K$.

From the first property, and from a property of the support of a function, it follows that $\operatorname{supp} \phi_i u \subset \operatorname{supp} \phi_i \cap \operatorname{supp} u \subset U_i$. Therefore, for each $i$, $S(\phi_i u) = T(\phi_i u)$, since $S$ and $T$ coincide on $U_i$. Since $\sum_{i=1}^N \phi_i = 1$ on $\operatorname{supp} u$, we have $u = \sum_{i=1}^N \phi_i u$, and thus
$$S(u) = \sum_{i=1}^N S(\phi_i u) = \sum_{i=1}^N T(\phi_i u) = T(u),$$
and the theorem follows. $\Box$

REFERENCES
1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., John Wiley & Sons, Inc., 1999.
2. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.
3. L. Hörmander, The Analysis of Linear Partial Differential Operators I (Distribution Theory and Fourier Analysis), 2nd ed., Springer-Verlag, 1990.
4. S. Igari, Real Analysis – With an Introduction to Wavelet Theory, American Mathematical Society, 1998.

Version: 4 Owner: matte Author(s): matte

509.9

operations on distributions

Let us assume that $U$ is an open set in $\mathbb{R}^n$. Then we can define the operations below for distributions in $\mathcal{D}'(U)$. To prove that these operations indeed give rise to other distributions, one can use condition (2) given on this page.


Vector space structure of $\mathcal{D}'(U)$ Suppose $S, T$ are distributions in $\mathcal{D}'(U)$ and $\alpha$ is a complex number. Then it is natural to define [4]
$$S + T \colon \mathcal{D}(U) \to \mathbb{C}, \qquad u \mapsto S(u) + T(u)$$
and
$$\alpha T \colon \mathcal{D}(U) \to \mathbb{C}, \qquad u \mapsto \alpha T(u).$$

It is readily shown that these are again distributions. Thus $\mathcal{D}'(U)$ is a complex vector space.

Restriction of distribution Suppose $T$ is a distribution in $\mathcal{D}'(U)$, and $V$ is an open subset of $U$. Then the restriction of the distribution $T$ to $V$ is the distribution $T|_V \in \mathcal{D}'(V)$ defined as [4]
$$T|_V \colon \mathcal{D}(V) \to \mathbb{C}, \qquad v \mapsto T(v).$$

Again, using condition (2) on this page, one can check that $T|_V$ is indeed a distribution.

Derivative of distribution Suppose $T$ is a distribution in $\mathcal{D}'(U)$, and $\alpha$ is a multi-index. Then the $\alpha$-derivative of $T$ is the distribution $\partial^\alpha T \in \mathcal{D}'(U)$ defined as
$$\partial^\alpha T \colon \mathcal{D}(U) \to \mathbb{C}, \qquad u \mapsto (-1)^{|\alpha|}\, T(\partial^\alpha u),$$
where the last $\partial^\alpha$ is the usual derivative defined here for smooth functions.

Suppose $\alpha$ is a multi-index, and $f \colon U \to \mathbb{C}$ is a locally integrable function all of whose partial derivatives up to order $|\alpha|$ are continuous. Then, if $T_f$ is the distribution induced by $f$, we have ([3], pp. 143)
$$\partial^\alpha T_f = T_{\partial^\alpha f}.$$
This means that the derivative of a distribution coincides with the usual definition of the derivative provided that the distribution is induced by a sufficiently smooth function. If $\alpha$ and $\beta$ are multi-indices, then for any $T \in \mathcal{D}'(U)$ we have
$$\partial^\alpha \partial^\beta T = \partial^\beta \partial^\alpha T.$$
This follows since the corresponding relation holds in $\mathcal{D}(U)$ (see this page).

Multiplication of distribution and smooth function Suppose $T$ is a distribution in $\mathcal{D}'(U)$, and $f$ is a smooth function on $U$, i.e., $f \in C^\infty(U)$. Then $fT$ is the distribution $fT \in \mathcal{D}'(U)$ defined as
$$fT \colon \mathcal{D}(U) \to \mathbb{C}, \qquad u \mapsto T(fu),$$
where $fu$ is the smooth mapping $fu \colon x \mapsto f(x)u(x)$. The proof that $fT$ is a distribution is an application of Leibniz' rule [3].

REFERENCES
1. L. Hörmander, The Analysis of Linear Partial Differential Operators I (Distribution Theory and Fourier Analysis), 2nd ed., Springer-Verlag, 1990.
2. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.

Version: 4 Owner: matte Author(s): matte

509.10

smooth distribution

Definition 1 Suppose $U$ is an open set in $\mathbb{R}^n$, suppose $T$ is a distribution on $U$, i.e., $T \in \mathcal{D}'(U)$, and suppose $V$ is an open set $V \subset U$. Then we say that $T$ is smooth on $V$ if there exists a smooth function $f \colon V \to \mathbb{C}$ such that $T|_V = T_f$. In other words, $T$ is smooth on $V$ if the restriction of $T$ to $V$ coincides with the distribution induced by some smooth function $f \colon V \to \mathbb{C}$.

Definition 2 [1, 2] Suppose $U$ is an open set in $\mathbb{R}^n$ and $T \in \mathcal{D}'(U)$. Then the singular support of $T$ (denoted by $\operatorname{sing\,supp} T$) is the complement of the largest open set where $T$ is smooth.

Examples
1. [2] On $\mathbb{R}$, let $f$ be the function
$$f(x) = \begin{cases} 1 & \text{when } x \text{ is irrational}, \\ 0 & \text{when } x \text{ is rational}. \end{cases}$$
Then the distribution induced by $f$, that is $T_f$, is smooth. Indeed, let $1$ be the smooth function $x \mapsto 1$. Since $f = 1$ almost everywhere, we have $T_f = T_1$ (see this page), so $T_f$ is smooth.
2. For the delta distribution $\delta$, we have $\operatorname{sing\,supp} \delta = \{0\}$.
3. For any distribution $T \in \mathcal{D}'(U)$, we have [1]
$$\operatorname{sing\,supp} T \subset \operatorname{supp} T,$$
where $\operatorname{supp} T$ is the support of $T$.
4. Let $f$ be a smooth function $f \colon U \to \mathbb{C}$. Then $\operatorname{sing\,supp} T_f$ is empty [1].

REFERENCES
1. J. Barros-Neto, An Introduction to the Theory of Distributions, Marcel Dekker, Inc., 1973.
2. A. Grigis, J. Sjöstrand, Microlocal Analysis for Differential Operators, Cambridge University Press, 1994.
3. J. Rauch, Partial Differential Equations, Springer-Verlag, 1991.

Version: 2 Owner: matte Author(s): matte

509.11

space of rapidly decreasing functions

Definition [4, 2] The space of rapidly decreasing functions is the function space
$$\mathcal{S}(\mathbb{R}^n) = \{f \in C^\infty(\mathbb{R}^n) \mid \|f\|_{\alpha,\beta} < \infty \text{ for all multi-indices } \alpha, \beta\},$$
where $C^\infty(\mathbb{R}^n)$ is the set of smooth functions from $\mathbb{R}^n$ to $\mathbb{C}$, and
$$\|f\|_{\alpha,\beta} = \|x^\alpha D^\beta f\|_\infty.$$
Here, $\|\cdot\|_\infty$ is the supremum norm, and we use multi-index notation. When the dimension $n$ is clear, it is convenient to write $\mathcal{S} = \mathcal{S}(\mathbb{R}^n)$. The space $\mathcal{S}$ is also called the Schwartz space, after Laurent Schwartz (1915-2002) [3]. The set $\mathcal{S}$ is closed under point-wise addition and under multiplication by a complex scalar. Thus $\mathcal{S}$ is a complex vector space.

Examples of functions in $\mathcal{S}$
1. If $k \in \{0, 1, 2, \ldots\}$ and $a$ is a positive real number, then [2]
$$x^k \exp\{-a x^2\} \in \mathcal{S}.$$
2. Any smooth function $f$ with compact support is in $\mathcal{S}$. This is clear since any derivative of $f$ is continuous, so $x^\alpha D^\beta f$ has a maximum in $\mathbb{R}^n$.

Properties
1. For any $1 \le p \le \infty$, we have [2, 4]
$$\mathcal{S}(\mathbb{R}^n) \subset L^p(\mathbb{R}^n),$$
where $L^p(\mathbb{R}^n)$ is the space of $p$-integrable functions.
2. Using Leibniz' rule, it follows that $\mathcal{S}$ is also closed under point-wise multiplication; if $f, g \in \mathcal{S}$, then $fg \colon x \mapsto f(x)g(x)$ is also in $\mathcal{S}$.

REFERENCES
1. L. Hörmander, The Analysis of Linear Partial Differential Operators I (Distribution Theory and Fourier Analysis), 2nd ed., Springer-Verlag, 1990.
2. S. Igari, Real Analysis – With an Introduction to Wavelet Theory, American Mathematical Society, 1998.
3. The MacTutor History of Mathematics archive, Laurent Schwartz.
4. M. Reed, B. Simon, Methods of Modern Mathematical Physics: Functional Analysis I, revised and enlarged edition, Academic Press, 1980.

Version: 2 Owner: matte Author(s): matte

509.12

support of distribution

Definition [1, 2, 3, 4] Let $U$ be an open set in $\mathbb{R}^n$ and let $T$ be a distribution $T \in \mathcal{D}'(U)$. Then the support of $T$ is the complement (in $U$) of the union of all open sets $V \subset U$ on which $T$ vanishes. This set is denoted by $\operatorname{supp} T$. If we denote by $T|_V$ the restriction of $T$ to the set $V$, then we have the formula
$$\operatorname{supp} T = U \setminus \bigcup \{V \subset U \mid V \text{ is open and } T|_V = 0\}.$$

Examples and properties [2, 1] Let $U$ be an open set in $\mathbb{R}^n$.
1. For the delta distribution, $\operatorname{supp} \delta = \{0\}$, provided that $0 \in U$.
2. For any distribution $T$, the support $\operatorname{supp} T$ is closed.
3. Suppose $T_f$ is the distribution induced by a continuous function $f \colon U \to \mathbb{C}$. Then the above definition of the support of $T_f$ is compatible with the usual definition of the support of the function $f$, i.e., $\operatorname{supp} T_f = \operatorname{supp} f$.
4. If $T \in \mathcal{D}'(U)$, then for any multi-index $\alpha$, $\operatorname{supp} D^\alpha T \subset \operatorname{supp} T$.
5. If $T \in \mathcal{D}'(U)$ and $f \in \mathcal{D}(U)$, then $\operatorname{supp}(fT) \subset \operatorname{supp} f \cap \operatorname{supp} T$.

Theorem [2, 3] Suppose $U$ is an open set in $\mathbb{R}^n$. If $T$ is a distribution with compact support in $U$, then $T$ is a distribution of finite order. What is more, if $\operatorname{supp} T$ is a point, say $\operatorname{supp} T = \{p\}$, then $T$ is of the form
$$T = \sum_{|\alpha| \le N} C_\alpha D^\alpha \delta_p$$
for some $N \ge 0$ and complex constants $C_\alpha$. Here, $\delta_p$ is the delta distribution at $p$: $\delta_p(u) = u(p)$ for $u \in \mathcal{D}(U)$.

REFERENCES
1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., John Wiley & Sons, Inc., 1999.
2. J. Rauch, Partial Differential Equations, Springer-Verlag, 1991.
3. W. Rudin, Functional Analysis, McGraw-Hill Book Company, 1973.
4. L. Hörmander, The Analysis of Linear Partial Differential Operators I (Distribution Theory and Fourier Analysis), 2nd ed., Springer-Verlag, 1990.
5. R.E. Edwards, Functional Analysis: Theory and Applications, Dover Publications, 1995.

Version: 3 Owner: matte Author(s): matte


Chapter 510 46H05 – General theory of topological algebras 510.1

Banach algebra

Definition 19. A Banach algebra is a Banach space with a multiplication law compatible with the norm, i.e.,
$$\|ab\| \le \|a\|\,\|b\| \qquad \text{(product inequality)}.$$
Definition 20. A Banach *-algebra is a Banach algebra with an involution ${}^*$ satisfying the following properties:
$$a^{**} = a, \qquad (510.1.1)$$
$$(ab)^* = b^* a^*, \qquad (510.1.2)$$
$$(\lambda a + \mu b)^* = \bar{\lambda} a^* + \bar{\mu} b^* \quad \forall \lambda, \mu \in \mathbb{C}, \qquad (510.1.3)$$
$$\|a^*\| = \|a\|. \qquad (510.1.4)$$
Example 25. The algebra of bounded operators on a Banach space is a Banach algebra for the operator norm.

Version: 4 Owner: mhale Author(s): mhale
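As a concrete finite-dimensional illustration (my addition, not part of the entry): the $n \times n$ real matrices with the Frobenius (Hilbert-Schmidt) norm form a Banach algebra, since that norm is submultiplicative. A small plain-Python check of the product inequality on random $3 \times 3$ matrices:

```python
import math, random

def frob(M):
    # Frobenius norm: square root of the sum of squared entries.
    return math.sqrt(sum(x * x for row in M for x in row))

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

random.seed(0)
for _ in range(100):
    A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    B = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    # product inequality ||AB|| <= ||A|| ||B||
    assert frob(matmul(A, B)) <= frob(A) * frob(B) + 1e-12
print("product inequality holds on all samples")
```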


Chapter 511 46L05 – General theory of C ∗-algebras 511.1

C ∗-algebra

A C*-algebra $A$ is a Banach *-algebra such that $\|a^* a\| = \|a\|^2$ for all $a \in A$.

Version: 2 Owner: mhale Author(s): mhale

511.2

Gelfand-Naimark representation theorem

Every C ∗ -algebra is isomorphic to a C ∗ -subalgebra (norm closed *-subalgebra) of some B(H), the algebra of bounded operators on some Hilbert space H. In particular, every finite dimensional C ∗ -algebra is isomorphic to a direct sum of matrix algebras. Version: 2 Owner: mhale Author(s): mhale

511.3

state

A state $\Psi$ on a C*-algebra $A$ is a positive linear functional $\Psi \colon A \to \mathbb{C}$, i.e., $\Psi(a^* a) \ge 0$ for all $a \in A$, with unit norm. The norm of a positive linear functional is defined by
$$\|\Psi\| = \sup_{a \in A}\{|\Psi(a)| : \|a\| \le 1\}. \qquad (511.3.1)$$
For a unital C*-algebra, $\|\Psi\| = \Psi(1\!\mathrm{I})$.

The space of states is a convex set. Let $\Psi_1$ and $\Psi_2$ be states; then the convex combination
$$\lambda \Psi_1 + (1 - \lambda)\Psi_2, \qquad \lambda \in [0, 1], \qquad (511.3.2)$$
is also a state.

A state is pure if it is not a convex combination of two other states. Pure states are the extreme points of the convex set of states. A pure state on a commutative C*-algebra is equivalent to a character.

When a C*-algebra is represented on a Hilbert space $H$, every unit vector $\psi \in H$ determines a (not necessarily pure) state in the form of an expectation value,
$$\Psi(a) = \langle \psi, a\psi \rangle. \qquad (511.3.3)$$
In physics, it is common to refer to such states by their vector $\psi$ rather than the linear functional $\Psi$. The converse is not always true; not every state need be given by an expectation value. For example, delta functions (which are distributions, not functions) give pure states on $C_0(X)$, but they do not correspond to any vector in a Hilbert space (such a vector would not be square-integrable).
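A minimal concrete check (my own illustration): on the C*-algebra of $2 \times 2$ complex matrices acting on $\mathbb{C}^2$, a unit vector $\psi$ gives a vector state $\Psi(a) = \langle \psi, a\psi \rangle$. The plain-Python sketch below verifies positivity, $\Psi(a^* a) = \|a\psi\|^2 \ge 0$, and unit norm, $\Psi(1\!\mathrm{I}) = 1$:

```python
import math

psi = [1 / math.sqrt(2), 1j / math.sqrt(2)]   # unit vector in C^2

def state(a):
    # Vector state Psi(a) = <psi, a psi>, conjugate-linear in the first slot.
    a_psi = [sum(a[i][j] * psi[j] for j in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * a_psi[i] for i in range(2))

def adjoint(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = [[1, 2 + 1j], [0, -1]]
val = state(matmul(adjoint(a), a))               # Psi(a* a) = ||a psi||^2
assert abs(val.imag) < 1e-12 and val.real >= 0   # positivity
assert abs(state([[1, 0], [0, 1]]) - 1) < 1e-12  # Psi(1I) = 1 (unit norm)
```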

REFERENCES
1. G. Murphy, C*-Algebras and Operator Theory, Academic Press, 1990.

Version: 1 Owner: mhale Author(s): mhale


Chapter 512 46L85 – Noncommutative topology 512.1

Gelfand-Naimark theorem

Let Haus be the category of locally compact Hausdorff spaces with continuous proper maps as morphisms, and let C*Alg be the category of commutative C*-algebras with proper *-homomorphisms (those that send approximate units to approximate units) as morphisms. There is a contravariant functor $C \colon \mathrm{Haus}^{op} \to \mathrm{C}^*\mathrm{Alg}$ which sends each locally compact Hausdorff space $X$ to the commutative C*-algebra $C_0(X)$ ($C(X)$ if $X$ is compact). Conversely, there is a contravariant functor $M \colon \mathrm{C}^*\mathrm{Alg}^{op} \to \mathrm{Haus}$ which sends each commutative C*-algebra $A$ to the space of characters on $A$ (with the Gelfand topology). The functors $C$ and $M$ form an equivalence of categories. Version: 1 Owner: mhale Author(s): mhale

512.2

Serre-Swan theorem

Let $X$ be a compact Hausdorff space. Let Vec($X$) be the category of complex vector bundles over $X$, and let ProjMod($C(X)$) be the category of finitely generated projective modules over the C*-algebra $C(X)$. There is a functor $\Gamma \colon \mathrm{Vec}(X) \to \mathrm{ProjMod}(C(X))$ which sends each complex vector bundle $E \to X$ to the $C(X)$-module $\Gamma(X, E)$ of continuous sections. The functor $\Gamma$ is an equivalence of categories. Version: 1 Owner: mhale Author(s): mhale


Chapter 513 46T12 – Measure (Gaussian, cylindrical, etc.) and integrals (Feynman, path, Fresnel, etc.) on manifolds 513.1

path integral

The path integral is a generalization of the integral that is very useful in theoretical and applied physics. Consider a vector field $\vec{F} \colon \mathbb{R}^n \to \mathbb{R}^n$ and a path $P \subset \mathbb{R}^n$. The path integral of $\vec{F}$ along the path $P$ is defined as a definite integral: it is the limit of Riemann sums of the values of $\vec{F}$ along the curve $P$. It is computed in terms of a parametrization $\vec{P}(t)$ of $P$, mapped into the domain $\mathbb{R}^n$ of $\vec{F}$. Analytically,
$$\int_P \vec{F} \cdot d\vec{x} = \int_a^b \vec{F}(\vec{P}(t)) \cdot d\vec{x},$$
where $\vec{P}(a)$, $\vec{P}(b)$ are the endpoints of the path, and $d\vec{x} = \langle \frac{dx_1}{dt}, \cdots, \frac{dx_n}{dt} \rangle\, dt$, where each $x_i$ is parametrized into a function of $t$.

Proof and existence of the path integral: Assume we have a parametrized curve $\vec{P}(t)$ with $t \in [a, b]$. We want to construct a sum of $\vec{F}$ over this interval on the curve $P$. Split the interval $[a, b]$ into $n$ subintervals of size $\Delta t = (b - a)/n$. This means that the path $P$ has been divided into $n$ segments, each subtending a small change in tangent vector. Note that the arc lengths need not be of equal length, though the intervals are of equal size. Let $t_i$ be an element of the $i$th subinterval. The quantity $|\vec{P}'(t_i)|$ gives the average magnitude of the vector tangent to the curve at a point in the interval $\Delta t$, so $|\vec{P}'(t_i)|\Delta t$ is approximately the arc length of the curve segment produced by the subinterval $\Delta t$. Since we want to sum $\vec{F}$ over our curve $\vec{P}$, we evaluate $\vec{F}$ at the points $\vec{P}(t_i)$ of the curve and dot it with the tangent vector there. To get the sum we want, we take the limit as $\Delta t$ approaches $0$:
$$\lim_{\Delta t \to 0} \sum_{i=1}^{n} \vec{F}(\vec{P}(t_i)) \cdot \vec{P}'(t_i)\, \Delta t.$$
This is a Riemann sum, and thus we can write it in integral form. This integral is known as a path or line integral (the older name):
$$\int_P \vec{F} \cdot d\vec{x} = \int_a^b \vec{F}(\vec{P}(t)) \cdot \vec{P}'(t)\, dt.$$
Note that the path integral only exists if the definite integral exists on the interval $[a, b]$.

Properties: A path integral that begins and ends at the same point is called a closed path integral, and is denoted with the integral sign with a centered circle: $\oint$. Such path integrals can also be evaluated using Green's theorem. Another property of path integrals is that the directed path integral along a path $C$ in a vector field is equal to the negative of the path integral in the opposite direction along the same path. A directed path integral on a closed path is denoted by an integral sign with a circle and an arrow denoting direction.

Visualization aids:

This is an image of a path P superimposed on a vector field F~ .

This is a visualization of what we are doing when we take the integral under the curve S : P → F~ . Version: 9 Owner: slider142 Author(s): slider142
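The defining Riemann sum translates directly into code. Below is a minimal plain-Python sketch (the helper names are my own) that approximates $\int_P \vec{F} \cdot d\vec{x}$ for $\vec{F}(x, y) = (y, x)$ along the parabola $\vec{P}(t) = (t, t^2)$, $0 \le t \le 1$; since $\vec{F} = \nabla(xy)$, the exact value is $xy$ at the endpoint $(1, 1)$, namely $1$:

```python
def line_integral(F, P, dP, a, b, n=100000):
    # Midpoint-rule version of the sum  sum_i F(P(t_i)) . P'(t_i) dt.
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += sum(f * d for f, d in zip(F(P(t)), dP(t))) * h
    return total

F  = lambda p: (p[1], p[0])     # F(x, y) = (y, x) = grad(xy)
P  = lambda t: (t, t * t)       # parabola from (0, 0) to (1, 1)
dP = lambda t: (1.0, 2 * t)     # tangent vector P'(t)

print(line_integral(F, P, dP, 0.0, 1.0))   # ~ 1.0
```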


Chapter 514 47A05 – General (adjoints, conjugates, products, inverses, domains, ranges, etc.) 514.1

Baker-Campbell-Hausdorff formula(e)

Given a linear operator $A$, we define
$$\exp A := \sum_{k=0}^{\infty} \frac{1}{k!} A^k. \qquad (514.1.1)$$
It follows that
$$\frac{\partial}{\partial \tau} e^{\tau A} = A e^{\tau A} = e^{\tau A} A. \qquad (514.1.2)$$
Consider another linear operator $B$. Let $B(\tau) = e^{\tau A} B e^{-\tau A}$. Then one can prove the following series representation for $B(\tau)$:
$$B(\tau) = \sum_{m=0}^{\infty} \frac{\tau^m}{m!} B_m, \qquad (514.1.3)$$
where $B_m = [A, B]_m := [A, [A, \ldots [A, B]]]$ ($m$ times) and $B_0 := B$. A very important special case of eq. (514.1.3) is known as the Baker-Campbell-Hausdorff (BCH) formula. Namely, for $\tau = 1$ we get:
$$e^A B e^{-A} = \sum_{m=0}^{\infty} \frac{1}{m!} B_m. \qquad (514.1.4)$$
Alternatively, this expression may be rewritten as
$$[B, e^{-A}] = e^{-A}\Big([A, B] + \frac{1}{2}[A, [A, B]] + \ldots\Big), \qquad (514.1.5)$$
or
$$[e^A, B] = \Big([A, B] + \frac{1}{2}[A, [A, B]] + \ldots\Big) e^A. \qquad (514.1.6)$$
There is a descendant of the BCH formula, which is often also referred to as the BCH formula. It provides us with the multiplication law for two exponentials of linear operators: suppose $[A, [A, B]] = [B, [B, A]] = 0$. Then
$$e^A e^B = e^{A+B}\, e^{\frac{1}{2}[A,B]}. \qquad (514.1.7)$$
Thus, if we want to commute two exponentials, we get an extra factor:
$$e^A e^B = e^B e^A\, e^{[A,B]}. \qquad (514.1.8)$$

Version: 5 Owner: msihl Author(s): msihl
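The special case (514.1.7) can be verified exactly when $A$ and $B$ are nilpotent, since then every exponential series terminates. The sketch below (plain Python with exact rational arithmetic; the helper names are mine) uses the Heisenberg-type matrices $A = E_{12}$ and $B = E_{23}$ in the $3 \times 3$ matrices, for which $[A, B] = E_{13}$ commutes with both $A$ and $B$:

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(3)] for i in range(3)]

def msub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(3)] for i in range(3)]

def expm(X, terms=6):
    # exp(X) as a power series; exact here, since these matrices are nilpotent.
    I = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
    result = [row[:] for row in I]
    power = [row[:] for row in I]
    fact = Fraction(1)
    for k in range(1, terms):
        power = matmul(power, X)
        fact *= k
        result = madd(result, [[x / fact for x in row] for row in power])
    return result

def E(i, j):
    # elementary matrix with a single 1 in position (i, j)
    return [[Fraction(int((r, c) == (i, j))) for c in range(3)] for r in range(3)]

A, B = E(0, 1), E(1, 2)
C = msub(matmul(A, B), matmul(B, A))   # C = [A, B] = E(0, 2)
assert matmul(A, C) == matmul(C, A)    # [A, [A, B]] = 0
assert matmul(B, C) == matmul(C, B)    # [B, [B, A]] = 0

lhs = matmul(expm(A), expm(B))
rhs = matmul(expm(madd(A, B)), expm([[x / 2 for x in row] for row in C]))
assert lhs == rhs                      # e^A e^B = e^{A+B} e^{[A,B]/2}
```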

514.2

adjoint

Let $H$ be a Hilbert space and let $A \colon \mathcal{D}(A) \subset H \to H$ be a densely defined linear operator. Suppose that for some $y \in H$, there exists $z \in H$ such that $(Ax, y) = (x, z)$ for all $x \in \mathcal{D}(A)$. Then such a $z$ is unique, for if $z'$ is another element of $H$ satisfying that condition, we have $(x, z - z') = 0$ for all $x \in \mathcal{D}(A)$, which implies $z - z' = 0$ since $\mathcal{D}(A)$ is dense. Hence we may define a new operator $A^* \colon \mathcal{D}(A^*) \subset H \to H$ by
$$\mathcal{D}(A^*) = \{y \in H : \text{there is } z \in H \text{ such that } (Ax, y) = (x, z)\}, \qquad A^*(y) = z.$$
It is easy to see that $A^*$ is linear, and it is called the adjoint of $A$.

Remark. The requirement for $A$ to be densely defined is essential, for otherwise we cannot guarantee $A^*$ to be well defined.

Version: 4 Owner: Koro Author(s): Koro

514.3

closed operator

Let $B$ be a Banach space. A linear operator $A \colon \mathcal{D}(A) \subset B \to B$ is said to be closed if for every sequence $\{x_n\}_{n\in\mathbb{N}}$ in $\mathcal{D}(A)$ converging to $x \in B$ such that $Ax_n \to y \in B$ as $n \to \infty$, it holds that $x \in \mathcal{D}(A)$ and $Ax = y$. Equivalently, $A$ is closed if its graph is closed in $B \oplus B$.

Given an operator $A$, not necessarily closed, if the closure of its graph in $B \oplus B$ happens to be the graph of some operator, we call that operator the closure of $A$ and denote it by $\overline{A}$. It follows easily that $A$ is the restriction of $\overline{A}$ to $\mathcal{D}(A)$.

The following properties are easily checked:
1. Any bounded linear operator defined on the whole space $B$ is closed;
2. If $A$ is closed then $A - \lambda I$ is closed;
3. If $A$ is closed and it has an inverse, then $A^{-1}$ is also closed;
4. An operator $A$ admits a closure if and only if for every pair of sequences $\{x_n\}$ and $\{y_n\}$ in $\mathcal{D}(A)$ converging to the same limit, such that both $\{Ax_n\}$ and $\{Ay_n\}$ converge, it holds that $\lim_n Ax_n = \lim_n Ay_n$.

Version: 2 Owner: Koro Author(s): Koro

514.4

properties of the adjoint operator

Let $A$ and $B$ be linear operators in a Hilbert space, and let $\lambda \in \mathbb{C}$. Assuming all the operators involved are densely defined, the following properties hold:
1. If $A^{-1}$ exists and is densely defined, then $(A^{-1})^* = (A^*)^{-1}$;
2. $(\lambda A)^* = \overline{\lambda} A^*$;
3. $A \subset B$ implies $B^* \subset A^*$;
4. $A^* + B^* \subset (A + B)^*$;
5. $B^* A^* \subset (AB)^*$;
6. $(A + \lambda I)^* = A^* + \overline{\lambda} I$;
7. $A^*$ is a closed operator.

Remark. The notation $A \subset B$ for operators means that $B$ is an extension of $A$, i.e., $A$ is the restriction of $B$ to a smaller domain.

Also, we have the following Proposition. If $A$ admits a closure $\overline{A}$, then $A^*$ is densely defined and $(A^*)^* = \overline{A}$.

Version: 5 Owner: Koro Author(s): Koro


Chapter 515 47A35 – Ergodic theory 515.1

ergodic theorem

Let $(X, \mathcal{B}, \mu)$ be a space with finite measure, $f \in L^1(X)$, and $T \colon X \to X$ an ergodic transformation, not necessarily invertible. The ergodic theorem (often called the pointwise or strong ergodic theorem) states that
$$\frac{1}{k} \sum_{j=0}^{k-1} f(T^j x) \;\longrightarrow\; \frac{1}{\mu(X)} \int f\, d\mu$$
for almost all $x$ as $k \to \infty$.

That is, for an ergodic transformation, the time average converges to the space average almost surely.

Version: 3 Owner: bbukh Author(s): bbukh, drummond
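A standard concrete example (my addition, not part of the entry): the irrational rotation $T(x) = x + \alpha \pmod 1$ on $[0, 1)$ with Lebesgue measure is ergodic, so time averages of $f(x) = \cos(2\pi x)$ should converge to the space average $\int_0^1 \cos(2\pi x)\, dx = 0$:

```python
import math

# Irrational rotation T(x) = x + alpha (mod 1) on [0, 1): ergodic for
# irrational alpha (here the golden-ratio rotation number).
alpha = (math.sqrt(5) - 1) / 2
f = lambda x: math.cos(2 * math.pi * x)

x, total, N = 0.1, 0.0, 100000
for _ in range(N):
    total += f(x)
    x = (x + alpha) % 1.0

time_avg = total / N
print(time_avg)   # ~ 0, the space average of cos(2 pi x) over [0, 1)
```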


Chapter 516 47A53 – (Semi-) Fredholm operators; index theories 516.1

Fredholm index

Let $P$ be a Fredholm operator. The index of $P$ is defined as
$$\operatorname{index}(P) = \dim \ker(P) - \dim \operatorname{coker}(P) = \dim \ker(P) - \dim \ker(P^*).$$
Note: this is well defined, as $\ker(P)$ and $\ker(P^*)$ are finite-dimensional vector spaces for $P$ Fredholm.

Properties
• $\operatorname{index}(P^*) = -\operatorname{index}(P)$.
• $\operatorname{index}(P + K) = \operatorname{index}(P)$ for any compact operator $K$.
• If $P_1 \colon H_1 \to H_2$ and $P_2 \colon H_2 \to H_3$ are Fredholm operators, then $\operatorname{index}(P_2 P_1) = \operatorname{index}(P_1) + \operatorname{index}(P_2)$.

Version: 2 Owner: mhale Author(s): mhale
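In finite dimensions the index degenerates: by rank-nullity, any linear map $P \colon \mathbb{R}^n \to \mathbb{R}^m$ has $\operatorname{index}(P) = (n - \operatorname{rank} P) - (m - \operatorname{rank} P) = n - m$, independently of $P$. This is one way to see that Fredholm theory is genuinely an infinite-dimensional subject. A small plain-Python check (the Gaussian-elimination rank helper is my own):

```python
def rank(M, tol=1e-10):
    # Row-reduce a copy of M and count the pivots.
    M = [row[:] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        piv = max(range(r, rows), key=lambda i: abs(M[i][c]), default=None)
        if piv is None or abs(M[piv][c]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def index(M):
    # M represents a linear map from R^cols to R^rows.
    m, n = len(M), len(M[0])
    rk = rank(M)
    return (n - rk) - (m - rk)    # dim ker - dim coker, always n - m

assert index([[1, 2, 3], [4, 5, 6]]) == 1    # R^3 -> R^2, rank 2
assert index([[1, 2, 3], [2, 4, 6]]) == 1    # rank 1: same index
```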

516.2

Fredholm operator

A Fredholm operator is a bounded operator that has a finite dimensional kernel and cokernel. Equivalently, it is invertible modulo compact operators: if $F \colon X \to Y$ is a Fredholm operator between two vector spaces $X$ and $Y$, then there exists a bounded operator $G \colon Y \to X$ such that
$$GF - 1\!\mathrm{I}_X \in K(X), \qquad FG - 1\!\mathrm{I}_Y \in K(Y), \qquad (516.2.1)$$
where $K(X)$ denotes the space of compact operators on $X$. If $F$ is Fredholm then so is its adjoint, $F^*$.

Version: 4 Owner: mhale Author(s): mhale



Chapter 517 47A56 – Functions whose values are linear operators (operator and matrix valued functions, etc., including analytic and meromorphic ones 517.1

Taylor’s formula for matrix functions

Let $p$ be a polynomial and suppose $A$ and $B$ commute, i.e., $AB = BA$. Then
$$p(A + B) = \sum_{k=0}^{n} \frac{1}{k!}\, p^{(k)}(A)\, B^k,$$
where $n = \deg(p)$. Version: 4 Owner: bwebste Author(s): bwebste, Johan
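For $B$ a polynomial in $A$ (which automatically commutes with $A$) the formula can be checked exactly in integer arithmetic. A plain-Python sketch with $p(x) = x^3$, so $p'(x) = 3x^2$, $p''(x) = 6x$, $p'''(x) = 6$, taking $A$ a $2 \times 2$ matrix and $B = A^2$ (all helper names are mine):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def scal(c, X):
    return [[c * x for x in row] for row in X]

I = [[1, 0], [0, 1]]
A = [[1, 1], [0, 1]]
B = matmul(A, A)                      # B = A^2 commutes with A

# Left side: p(A + B) with p(x) = x^3
S = madd(A, B)
lhs = matmul(S, matmul(S, S))

# Right side: sum_{k=0}^{3} p^{(k)}(A) B^k / k!
A2, A3 = matmul(A, A), matmul(A, matmul(A, A))
terms = [A3,                                   # k = 0: p(A)
         matmul(scal(3, A2), B),               # k = 1: p'(A) B
         matmul(scal(3, A), matmul(B, B)),     # k = 2: p''(A) B^2 / 2!
         matmul(I, matmul(B, matmul(B, B)))]   # k = 3: p'''(A) B^3 / 3!
rhs = terms[0]
for t in terms[1:]:
    rhs = madd(rhs, t)

assert lhs == rhs
```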


Chapter 518 47A60 – Functional calculus 518.1

Beltrami identity

Let $q(t)$ be a function $\mathbb{R} \to \mathbb{R}$, $\dot{q} = \frac{D}{Dt} q$, and $L = L(q, \dot{q}, t)$. Begin with the time-relative Euler-Lagrange condition
$$\frac{\partial}{\partial q} L - \frac{D}{Dt}\left(\frac{\partial}{\partial \dot{q}} L\right) = 0. \qquad (518.1.1)$$
If $\frac{\partial}{\partial t} L = 0$, then the Euler-Lagrange condition reduces to
$$L - \dot{q}\, \frac{\partial}{\partial \dot{q}} L = C, \qquad (518.1.2)$$
which is the Beltrami identity. In the calculus of variations, the ability to use the Beltrami identity can vastly simplify problems, and as it happens, many physical problems have $\frac{\partial}{\partial t} L = 0$.

In space-relative terms, with $q' := \frac{D}{Dx} q$, we have
$$\frac{\partial}{\partial q} L - \frac{D}{Dx}\left(\frac{\partial}{\partial q'} L\right) = 0. \qquad (518.1.3)$$
If $\frac{\partial}{\partial x} L = 0$, then the Euler-Lagrange condition reduces to
$$L - q'\, \frac{\partial}{\partial q'} L = C. \qquad (518.1.4)$$

To derive the Beltrami identity, note that
$$\frac{D}{Dt}\left(\dot{q}\, \frac{\partial}{\partial \dot{q}} L\right) = \ddot{q}\, \frac{\partial}{\partial \dot{q}} L + \dot{q}\, \frac{D}{Dt}\left(\frac{\partial}{\partial \dot{q}} L\right). \qquad (518.1.5)$$
Multiplying (1) by $\dot{q}$, we have
$$\dot{q}\, \frac{\partial}{\partial q} L - \dot{q}\, \frac{D}{Dt}\left(\frac{\partial}{\partial \dot{q}} L\right) = 0. \qquad (518.1.6)$$
Now, rearranging (5) and substituting in for the rightmost term of (6), we obtain
$$\dot{q}\, \frac{\partial}{\partial q} L + \ddot{q}\, \frac{\partial}{\partial \dot{q}} L - \frac{D}{Dt}\left(\dot{q}\, \frac{\partial}{\partial \dot{q}} L\right) = 0. \qquad (518.1.7)$$
Now consider the total derivative
$$\frac{D}{Dt} L(q, \dot{q}, t) = \dot{q}\, \frac{\partial}{\partial q} L + \ddot{q}\, \frac{\partial}{\partial \dot{q}} L + \frac{\partial}{\partial t} L. \qquad (518.1.8)$$
If $\frac{\partial}{\partial t} L = 0$, then we can substitute the left-hand side of (8) for the leading portion of (7) to get
$$\frac{D}{Dt} L - \frac{D}{Dt}\left(\dot{q}\, \frac{\partial}{\partial \dot{q}} L\right) = 0. \qquad (518.1.9)$$
Integrating with respect to $t$, we arrive at
$$L - \dot{q}\, \frac{\partial}{\partial \dot{q}} L = C, \qquad (518.1.10)$$
which is the Beltrami identity. Version: 4 Owner: drummond Author(s): drummond
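As a standard application (a sketch added for illustration; the constant and the sign conventions depend on how the problem is set up): for the brachistochrone with $y$ measured downward and the bead released from rest, the travel time is $\int \sqrt{(1 + y'^2)/(2gy)}\, dx$, so $L$ has no explicit $x$-dependence and the space-relative Beltrami identity applies:

```latex
% L(y, y') = \sqrt{(1 + y'^2)/(2 g y)}  (no explicit x-dependence)
\begin{align*}
  L - y'\,\frac{\partial L}{\partial y'}
    &= \sqrt{\frac{1 + y'^2}{2 g y}}
       - \frac{y'^2}{\sqrt{2 g y}\,\sqrt{1 + y'^2}}
     = \frac{1}{\sqrt{2 g y}\,\sqrt{1 + y'^2}} = C \\
    &\Longrightarrow\quad y\,\bigl(1 + y'^2\bigr) = \frac{1}{2 g C^2} = k
      \quad\text{(a constant)}.
\end{align*}
```

The solutions of $y(1 + y'^2) = k$ are cycloids, $x = \frac{k}{2}(\theta - \sin\theta)$, $y = \frac{k}{2}(1 - \cos\theta)$, which is the classical answer to the brachistochrone problem.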

518.2

Euler-Lagrange differential equation

Let $q(t)$ be a function $\mathbb{R} \to \mathbb{R}$, $\dot{q} = \frac{D}{Dt} q$, and $L = L(q, \dot{q}, t)$. The Euler-Lagrange differential equation (or Euler-Lagrange condition) is
$$\frac{\partial}{\partial q} L - \frac{D}{Dt}\left(\frac{\partial}{\partial \dot{q}} L\right) = 0. \qquad (518.2.1)$$
This is the central equation of the calculus of variations. In some cases, specifically for $\frac{\partial}{\partial t} L = 0$, it can be replaced by the Beltrami identity.

Version: 1 Owner: drummond Author(s): drummond
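For example (a standard illustration, not part of the original entry): taking $L = \frac{1}{2} m \dot{q}^2 - V(q)$ for a particle of mass $m$ in a potential $V$, the Euler-Lagrange condition reproduces Newton's second law:

```latex
% L = \tfrac{1}{2} m \dot{q}^2 - V(q):
%   \partial L/\partial q = -V'(q), \qquad \partial L/\partial \dot{q} = m \dot{q}
\frac{\partial}{\partial q} L
  - \frac{D}{Dt}\!\left(\frac{\partial}{\partial \dot{q}} L\right)
  = -V'(q) - m\ddot{q} = 0
  \quad\Longleftrightarrow\quad
  m\ddot{q} = -V'(q).
```

Here $\frac{\partial}{\partial t} L = 0$, so the Beltrami identity also applies and gives energy conservation: $L - \dot{q}\,\frac{\partial L}{\partial \dot{q}} = -\bigl(\frac{1}{2} m \dot{q}^2 + V(q)\bigr) = C$.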

518.3

calculus of variations

Imagine a bead of mass $m$ on a wire whose endpoints are at $a = (0, 0)$ and $b = (x_f, y_f)$, with $y_f$ lower than the starting position. If gravity acts on the bead with force $F = mg$, what path (arrangement of the wire) minimizes the bead's travel time from $a$ to $b$, assuming no friction? This is the famed "brachistochrone problem," and its solution was one of the first accomplishments of the calculus of variations. Many minimum problems can be solved using the techniques introduced here.

In its general form, the calculus of variations concerns quantities
$$S[q, \dot{q}, t] = \int_a^b L(q(t), \dot{q}(t), t)\, dt \qquad (518.3.1)$$
for which we wish to find a minimum or a maximum.

To make this concrete, let's consider a much simpler problem than the brachistochrone: what's the shortest distance between two points $p = (x_1, y_1)$ and $q = (x_2, y_2)$? Let the variable $s$ represent distance along the path, so that $\int_p^q ds = S$. We wish to find the path such that $S$ is a minimum. Zooming in on a small portion of the path, we can see that
$$ds^2 = dx^2 + dy^2, \qquad (518.3.2)$$
$$ds = \sqrt{dx^2 + dy^2}. \qquad (518.3.3)$$
If we parameterize the path by $t$, then we have
$$ds = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\, dt. \qquad (518.3.4)$$
Let's assume $y = f(x)$, so that we may simplify (518.3.4) to
$$ds = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\, dx = \sqrt{1 + f'(x)^2}\, dx. \qquad (518.3.5)$$
Now we have
$$S = \int_p^q L\, dx = \int_{x_1}^{x_2} \sqrt{1 + f'(x)^2}\, dx. \qquad (518.3.6)$$

In this case, $L$ is particularly simple. Converting to $q$'s and $t$'s to make the comparison easier, we have $L = L[f'(x)] = L[\dot{q}(t)]$, not the more general $L[q(t), \dot{q}(t), t]$ covered by the calculus of variations. We'll see later how to use our $L$'s simplicity to our advantage. For now, let's talk more generally. We wish to find the path described by $L$, passing through a point $q(a)$ at $t = a$ and through $q(b)$ at $t = b$, for which the quantity $S$ is a minimum, i.e., for which small perturbations in the path produce no first-order change in $S$; we'll call this a "stationary point." This is directly analogous to the idea that for a function $f(t)$, the minimum can be found where small perturbations $\delta t$ produce no first-order change in $f(t)$. This is where $f(t + \delta t) \approx f(t)$; taking a Taylor series expansion of $f(t)$ at $t$, we find
$$f(t + \delta t) = f(t) + \delta t\, f'(t) + O(\delta t^2) = f(t), \qquad (518.3.7)$$
with $f'(t) := \frac{D}{Dt} f(t)$. Of course, since the whole point is to consider $\delta t \neq 0$, once we neglect terms $O(\delta t^2)$ this is just the point where $f'(t) = 0$. This point, call it $t = t_0$, could be a minimum or a maximum, so in the usual calculus of a single variable we'd proceed by taking the second derivative, $f''(t_0)$, and seeing if it's positive or negative to see whether the function has a minimum or a maximum at $t_0$, respectively.

In the calculus of variations, we’re not considering small perturbations in t—we’re considering small perturbations in the integral of the relatively complicated function L(q, q, ˙ t), where D q(t). S is called a functional, essentially a mapping from functions to real numbers, q˙ = Dt and we can think of the minimization problem as the discovery of a minimum in S-space as we jiggle the parameters q and q. ˙ For the shortest-distance problem, it’s clear the maximum time doesn’t exist, since for any finite path length S0 we (intuitively) can always find a curve for which the path’s length is greater than S0 . This is often true, and we’ll assume for this discussion that finding a stationary point means we’ve found a minimum. Formally, we write the condition that small parameter perturbations produce no change in S as δS = 0. To make this precise, we simply write:

δS := S[q + δq, q̇ + δq̇, t] − S[q, q̇, t]

= ∫_a^b L(q + δq, q̇ + δq̇) dt − S[q, q̇, t]

How are we to simplify this mess? We are considering small perturbations to the path, which suggests a Taylor series expansion of L(q + δq, q̇ + δq̇) about (q, q̇):

L(q + δq, q̇ + δq̇) = L(q, q̇) + δq ∂L(q, q̇)/∂q + δq̇ ∂L(q, q̇)/∂q̇ + O(δq²) + O(δq̇²)

and since we make little error by discarding higher-order terms in δq and δq̇, we have

∫_a^b L(q + δq, q̇ + δq̇) dt = S[q, q̇, t] + ∫_a^b ( δq ∂L(q, q̇)/∂q + δq̇ ∂L(q, q̇)/∂q̇ ) dt

Keeping in mind that δq̇ = d(δq)/dt, and noting that

d/dt ( δq ∂L(q, q̇)/∂q̇ ) = δq d/dt ( ∂L(q, q̇)/∂q̇ ) + δq̇ ∂L(q, q̇)/∂q̇

(a simple application of the product rule d(fg)/dt = ḟg + fġ), we may substitute

δq̇ ∂L(q, q̇)/∂q̇ = d/dt ( δq ∂L(q, q̇)/∂q̇ ) − δq d/dt ( ∂L(q, q̇)/∂q̇ ),


we can rewrite the integral, shortening L(q, q̇) to L for convenience, as:

∫_a^b ( δq ∂L/∂q + δq̇ ∂L/∂q̇ ) dt = ∫_a^b ( δq ∂L/∂q − δq d/dt(∂L/∂q̇) + d/dt( δq ∂L/∂q̇ ) ) dt

= ∫_a^b δq ( ∂L/∂q − d/dt(∂L/∂q̇) ) dt + [ δq ∂L/∂q̇ ]_a^b

Substituting all of this progressively back into our original expression for δS, we obtain

δS = ∫_a^b L(q + δq, q̇ + δq̇) dt − S[q, q̇, t]

= S + ∫_a^b ( δq ∂L/∂q + δq̇ ∂L/∂q̇ ) dt − S

= ∫_a^b δq ( ∂L/∂q − d/dt(∂L/∂q̇) ) dt + [ δq ∂L/∂q̇ ]_a^b = 0.

Two conditions come to our aid. First, we're only interested in the neighboring paths that still begin at a and end at b, which corresponds to the condition δq = 0 at a and b; this cancels the boundary term. Second, between those two points the perturbation δq is otherwise arbitrary. This leads us to the condition

∫_a^b δq ( ∂L/∂q − d/dt(∂L/∂q̇) ) dt = 0.

(518.3.8)

The fundamental theorem of the calculus of variations states that, for a continuous function f(t), if

∫_a^b f(t) g(t) dt = 0

(518.3.9)

for every function g(t) with g(a) = g(b) = 0, then f(t) = 0 for all t ∈ (a, b).

Using this theorem, we obtain

∂L/∂q − d/dt ( ∂L/∂q̇ ) = 0.

(518.3.10)

This condition, one of the fundamental equations of the calculus of variations, is called the Euler-Lagrange condition. When presented with a problem in the calculus of variations, the first thing one usually does is simply plug the problem's L into this equation and solve. Recall our shortest-path problem, where we had arrived at

S = ∫_a^b L dx = ∫_{x1}^{x2} √(1 + f′(x)²) dx.

(518.3.11)

Here, x takes the place of t, f takes the place of q, and (518.3.10) becomes

∂L/∂f − d/dx ( ∂L/∂f′ ) = 0.

(518.3.12)

Even with ∂L/∂f = 0, this is still ugly. However, because L has no explicit dependence on the independent variable (∂L/∂x = 0), we can use the Beltrami identity,

L − q′ ∂L/∂q′ = C.

(518.3.13)

(For the derivation of this useful little trick, see the corresponding entry.) Now we must simply solve

√(1 + f′(x)²) − f′(x) ∂L/∂f′ = C

(518.3.14)

which looks just as daunting, but quickly reduces to

√(1 + f′(x)²) − f′(x) · 2f′(x) / ( 2√(1 + f′(x)²) ) = C    (518.3.15)

( 1 + f′(x)² − f′(x)² ) / √(1 + f′(x)²) = C    (518.3.16)

1 / √(1 + f′(x)²) = C    (518.3.17)

f′(x) = √(1/C² − 1) = m.    (518.3.18)

That is, the slope of the curve representing the shortest path between two points is a constant, which means the curve must be a straight line. Through this lengthy process, we've proved that a straight line is the shortest distance between two points. To find the actual function f(x) given endpoints (x1, y1) and (x2, y2), simply integrate with respect to x:

f(x) = ∫ f′(x) dx = ∫ m dx = mx + d

(518.3.19)

and then apply the boundary conditions

f(x1) = y1 = mx1 + d    (518.3.20)

f(x2) = y2 = mx2 + d    (518.3.21)

Subtracting the first condition from the second, we get m = (y2 − y1)/(x2 − x1), the standard equation for the slope of a line. Solving for d = y1 − mx1, we get

f(x) = ((y2 − y1)/(x2 − x1)) (x − x1) + y1

(518.3.22)

which is the basic equation for a line passing through (x1 , y1 ) and (x2 , y2 ). The solution to the brachistochrone problem, while slightly more complicated, follows along exactly the same lines. Version: 6 Owner: drummond Author(s): drummond
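The conclusion can be sanity-checked numerically (this check is not part of the original entry): discretize the functional S as a sum of segment lengths, and verify that perturbing the straight line, with the perturbation vanishing at the endpoints as δq must, never decreases the path length. The grid size and perturbation amplitudes below are arbitrary choices.

```python
import math
import random

# Discretized functional S[f]: total length of the polyline through the samples.
def path_length(ys, x1=0.0, x2=1.0):
    n = len(ys) - 1
    dx = (x2 - x1) / n
    return sum(math.hypot(dx, ys[i + 1] - ys[i]) for i in range(n))

n = 100
y1, y2 = 0.0, 1.0
# The straight line through the endpoints (the Euler-Lagrange solution).
line = [y1 + (y2 - y1) * i / n for i in range(n + 1)]
S_line = path_length(line)

random.seed(0)
for _ in range(20):
    bump = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    bump[0] = bump[-1] = 0.0  # delta q = 0 at the endpoints a and b
    perturbed = [l + b for l, b in zip(line, bump)]
    # Every admissible perturbation increases (or leaves unchanged) the length.
    assert path_length(perturbed) >= S_line

print(S_line)  # the straight-line distance, sqrt(2) for these endpoints
```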


Chapter 519 47B15 – Hermitian and normal operators (spectral measures, functional calculus, etc.) 519.1

self-adjoint operator

A linear operator A : D(A) ⊂ H → H in a Hilbert space H is an Hermitian operator if (Ax, y) = (x, Ay) for all x, y ∈ D(A). If A is densely defined, it is said to be a symmetric operator if it is the restriction of its adjoint A∗ to D(A), i.e. if A ⊂ A∗ ; and it is called a self-adjoint operator if A = A∗ . Version: 2 Owner: Koro Author(s): Koro


Chapter 520 47G30 – Pseudodifferential operators 520.1

Dini derivative

The upper Dini derivative of a continuous function f, denoted D⁺f or f′₊, is defined as

D⁺f(t) = f′₊(t) = lim sup_{h→0⁺} ( f(t + h) − f(t) ) / h.

The lower Dini derivative, D⁻f or f′₋, is defined as

D⁻f(t) = f′₋(t) = lim inf_{h→0⁺} ( f(t + h) − f(t) ) / h.

If f is defined on a vector space, then the upper Dini derivative at t in the direction d is denoted

f′₊(t, d) = lim sup_{h→0⁺} ( f(t + hd) − f(t) ) / h.

If f is locally Lipschitz then D⁺f is finite. If f is differentiable at t then the Dini derivative at t is the derivative at t.

Version: 5 Owner: lha Author(s): lha
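As an illustration of the definitions above (not part of the original entry), the Dini derivatives of f(t) = t·sin(1/t) at t = 0 can be exhibited numerically: f is Lipschitz near 0, so both Dini derivatives are finite, yet D⁺f(0) = 1 and D⁻f(0) = −1, so f is not differentiable there. The sampling sequences below are hand-picked to realize the lim sup and lim inf.

```python
import math

# f(t) = t*sin(1/t), with f(0) = 0: Lipschitz near 0, not differentiable at 0.
f = lambda t: t * math.sin(1.0 / t) if t != 0.0 else 0.0

# The difference quotient at 0 is (f(h) - f(0))/h = sin(1/h).
# Sample h along sequences where sin(1/h) equals +1 and -1 respectively,
# a crude stand-in for taking the lim sup and lim inf as h -> 0+.
hs_up = [1.0 / (math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]
hs_dn = [1.0 / (3 * math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]

upper = max((f(h) - f(0.0)) / h for h in hs_up)  # approximates D+ f(0)
lower = min((f(h) - f(0.0)) / h for h in hs_dn)  # approximates D- f(0)
print(upper, lower)
```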


Chapter 521 47H10 – Fixed-point theorems 521.1

Brouwer fixed point in one dimension

Theorem 1 [1] Suppose f is a continuous function f : [−1, 1] → [−1, 1]. Then f has a fixed point, i.e., there is an x such that f(x) = x.

Proof (Following [1]) We can assume that f(−1) > −1 and f(+1) < 1, since otherwise there is nothing to prove. Then, consider the function g : [−1, 1] → R defined by g(x) = f(x) − x. It satisfies g(−1) > 0, g(+1) < 0, so by the intermediate value theorem, there is a point x such that g(x) = 0, i.e., f(x) = x. ∎

Assuming slightly more of the function f yields the Banach fixed point theorem. In one dimension it states the following:

Theorem 2 Suppose f : [−1, 1] → [−1, 1] is a function that satisfies the following condition: for some constant C ∈ [0, 1), we have for each a, b ∈ [−1, 1],

|f(b) − f(a)| ≤ C|b − a|.

Then f has a unique fixed point in [−1, 1]. In other words, there exists one and only one point x ∈ [−1, 1] such that f(x) = x.

Remarks The fixed point in Theorem 2 can be found by iteration as follows: first choose some s ∈ [−1, 1]. Then form s1 = f(s), then s2 = f(s1), and generally sn = f(sn−1). As n → ∞, sn approaches the fixed point for f. More details are given on the

entry for the Banach fixed point theorem. A function that satisfies the condition in Theorem 2 is called a contraction mapping. Such mappings also satisfy the Lipschitz condition.
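The iteration described in the remarks can be sketched in a few lines (an illustration, not part of the original entry). Here f = cos, which maps [−1, 1] into itself and is a contraction there with constant C = sin(1) < 1, so Theorem 2 applies:

```python
import math

# Iterate s_{n+1} = f(s_n) until successive values agree to within tol.
def iterate_to_fixed_point(f, s, tol=1e-12, max_iter=1000):
    for _ in range(max_iter):
        s_next = f(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

# cos is a contraction on [-1, 1]: |cos'(x)| = |sin(x)| <= sin(1) < 1 there.
x = iterate_to_fixed_point(math.cos, 0.5)
print(x)                     # ~0.739085, the unique fixed point of cos
print(abs(math.cos(x) - x))  # residual, essentially zero
```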

REFERENCES 1. A. Mukherjea, K. Pothoven, Real and Functional analysis, Plenum press, 1978.

Version: 5 Owner: mathcam Author(s): matte

521.2

Brouwer fixed point theorem

Theorem Let B = {x ∈ Rn : kxk ≤ 1} be the closed unit ball in Rn . Any continuous function f : B → B has a fixed point.

Notes

Shape is not important The theorem also applies to anything homeomorphic to a closed disk, of course. In particular, we can replace B in the formulation with a square or a triangle.

Compactness counts (a) The theorem is not true if we drop a point from the interior of B. For example, the map f(x⃗) = (1/2)x⃗ has its single fixed point at 0; dropping it from the domain yields a map with no fixed points.

Compactness counts (b) The theorem is not true for an open disk. For instance, the map f(x⃗) = (1/2)x⃗ + (1/2, 0, . . . , 0) has its single fixed point on the boundary of B.

Version: 3 Owner: matte Author(s): matte, ariels

521.3

any topological space with the fixed point property is connected

Theorem Any topological space with the fixed point property is connected [3, 2].

Proof. Suppose X is a topological space with the fixed point property. We will show that X is connected by contradiction: suppose there are non-empty disjoint open sets A, B ⊂ X such that X = A ∪ B. Then there are elements a ∈ A and b ∈ B, and we can define a function f : X → X by

f(x) = a when x ∈ B, and f(x) = b when x ∈ A.

Since A ∩ B = ∅ and A ∪ B = X, the function f is well defined. Also, since f(x) and x always lie in disjoint sets, f can have no fixed point. To obtain a contradiction, we only need to show that f is continuous. However, if V is an open set in X, a short calculation shows that f⁻¹(V) is either ∅, A, B or X, which are all open sets. Thus f is continuous, and X must be connected. ∎

REFERENCES 1. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974. 2. L.E. Ward, Topology, An Outline for a First Course, Marcel Dekker, Inc., 1972.

Version: 5 Owner: matte Author(s): matte

521.4

fixed point property

Definition [2, 3, 2] Suppose X is a topological space. If every continuous function f : X → X has a fixed point, then X has the fixed point property.

Example 1. Brouwer’s fixed point theorem states that in Rn , the closed unit ball with the subspace topology has the fixed point property.

Properties 1. The fixed point property is preserved under a homeomorphism. In other words, suppose f : X → Y is a homeomorphism between topological spaces X and Y . If X has the fixed point property, then Y has the fixed point property [2]. 2. any topological space with the fixed point property is connected [3, 2]. 3. Suppose X is a topological space with the fixed point property, and Y is a retract of X. Then Y has the fixed point property [3].


REFERENCES 1. G.L. Naber, Topological methods in Euclidean spaces, Cambridge University Press, 1980. 2. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974. 3. L.E. Ward, Topology, An Outline for a First Course, Marcel Dekker, Inc., 1972.

Version: 5 Owner: matte Author(s): matte

521.5

proof of Brouwer fixed point theorem

Proof of the Brouwer fixed point theorem: Assume that there does exist a map f : Bⁿ → Bⁿ with no fixed point. Then let g(x) be the following map: start at f(x), draw the ray going through x, and let g(x) be the first intersection of that ray with the sphere. This map is continuous and well defined only because f fixes no point. Also, it is not hard to see that it must be the identity on the boundary sphere. Thus we have a map g : Bⁿ → Sⁿ⁻¹ which is the identity on Sⁿ⁻¹ = ∂Bⁿ, that is, a retraction. Now, if i : Sⁿ⁻¹ → Bⁿ is the inclusion map, g ∘ i = id_{Sⁿ⁻¹}. Applying the reduced homology functor, we find that g∗ ∘ i∗ = id on H̃ₙ₋₁(Sⁿ⁻¹), where ∗ indicates the induced map on homology. But it is a well-known fact that H̃ₙ₋₁(Bⁿ) = 0 (since Bⁿ is contractible), and that H̃ₙ₋₁(Sⁿ⁻¹) = Z. Thus we have an isomorphism of a non-zero group onto itself factoring through a trivial group, which is clearly impossible. Thus we have a contradiction, and no such map f exists.

Version: 3 Owner: bwebste Author(s): bwebste


Chapter 522 47L07 – Convex sets and cones of operators 522.1

convex hull of S is open if S is open

Theorem If S is an open set in a topological vector space, then the convex hull co(S) is open [1].

As the next example shows, the corresponding result does not hold for a closed set.

Example ([1], pp. 14) If S = {(x, 1/|x|) ∈ R² | x ∈ R \ {0}}, then S is closed, but co(S) is the open half-plane {(x, y) | x ∈ R, y ∈ (0, ∞)}. ∎

REFERENCES 1. F.A. Valentine, Convex sets, McGraw-Hill book company, 1964.

Version: 3 Owner: drini Author(s): matte


Chapter 523 47L25 – Operator spaces (= matricially normed spaces) 523.1

operator norm

Let A : V → W be a linear map between normed vector spaces V and W. We can define a function ‖·‖_op : A ↦ ℝ⁺ as

‖A‖_op := sup_{v ∈ V, v ≠ 0} ‖Av‖ / ‖v‖.

Equivalently, the above definition can be written as

‖A‖_op := sup_{v ∈ V, ‖v‖ = 1} ‖Av‖ = sup_{v ∈ V, ‖v‖ ≤ 1} ‖Av‖.

It turns out that ‖·‖_op satisfies all the properties of a norm and hence is called the operator norm (or the induced norm) of A. If ‖A‖_op exists and is finite, we say that A is a bounded linear map. The space L(V, W) of bounded linear maps from V to W also forms a vector space, with ‖·‖_op as the natural norm.

523.1.1

Example

Suppose that V = (ℝⁿ, ‖·‖_p) and W = (ℝⁿ, ‖·‖_p), where ‖·‖_p is the vector p-norm. Then the operator norm ‖·‖_op is the matrix p-norm.

Version: 3 Owner: igor Author(s): igor
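The definition can be illustrated numerically (an illustration, not part of the original entry; it assumes numpy is available). For p = 2 the induced norm is the largest singular value, and a brute-force supremum of ‖Av‖ over sampled unit vectors approaches it from below:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# The matrix 2-norm (largest singular value) as computed by numpy.
op_norm = np.linalg.norm(A, 2)

# Brute-force estimate of sup ||Av|| over unit vectors v: sample many
# random directions, normalize them, and take the largest ||Av||.
vs = rng.standard_normal((3, 100_000))
vs /= np.linalg.norm(vs, axis=0)          # columns now have ||v|| = 1
estimate = np.linalg.norm(A @ vs, axis=0).max()

print(op_norm, estimate)  # estimate <= op_norm, and close to it
```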

Chapter 524 47S99 – Miscellaneous 524.1

Drazin inverse

A Drazin inverse of an operator T is an operator S such that

T S = S T,  S² T = S,  T^{m+1} S = T^m

for any integer m ≥ 0. For example, a projection operator P is its own Drazin inverse, since P P = P² = P^m = P for any integer m ≥ 1.

Version: 2 Owner: lha Author(s): lha


Chapter 525 49K10 – Free problems in two or more independent variables 525.1

Kantorovitch’s theorem

Let a0 be a point in ℝⁿ, U an open neighborhood of a0 in ℝⁿ and f⃗ : U → ℝⁿ a differentiable mapping, with its derivative [Df⃗(a0)] invertible. Define

h⃗0 = −[Df⃗(a0)]⁻¹ f⃗(a0),  a1 = a0 + h⃗0,  U0 = {x : |x − a1| ≤ |h⃗0|}.

If U0 ⊂ U and the derivative [Df⃗(x)] satisfies the Lipschitz condition

|[Df⃗(u1)] − [Df⃗(u2)]| ≤ M |u1 − u2|

for all points u1, u2 ∈ U0, and if the inequality

|f⃗(a0)| · |[Df⃗(a0)]⁻¹|² · M ≤ 1/2

is satisfied, then the equation f⃗(x) = 0⃗ has a unique solution in U0, and Newton's method with initial guess a0 converges to it.

If we replace ≤ with <, then it can be shown that Newton's method superconverges! If you want an even stronger version, one can replace |...| with the norm ||...||.

Logic behind the theorem: Let's look at the useful part of the theorem:

|f⃗(a0)| · |[Df⃗(a0)]⁻¹|² · M ≤ 1/2

It is a product of three distinct properties of your function such that the product is less than or equal to a certain bound. If we call the product R, then it says that a0 must be within a ball of radius R. It also says that the solution x is within this same ball. How was this ball defined?

The first term, |f⃗(a0)|, is a measure of how far the function is from the domain; in the cartesian plane, it would be how far the function is from the x-axis. Of course, if we're solving for f⃗(x) = 0⃗, we want this value to be small, because it means we're closer to the axis. However, a function can be annoyingly close to the axis and yet just happily curve away from the axis. Thus we need more.

The second term, |[Df⃗(a0)]⁻¹|², is a little more difficult. This is obviously a measure of how fast the function is changing with respect to the domain (the x-axis in the plane). The larger the derivative, the faster it's approaching wherever it's going (hopefully the axis). Thus, we take the inverse of it, since we want this product to be less than a number. The reason it's squared is that it is the denominator where a product of two terms of like units is the numerator; to conserve units with the numerator, it is multiplied by itself. Combined with the first term, this also seems to be enough, but what if the derivative changes sharply, and changes the wrong way?

The third term is the Lipschitz ratio M. This measures sharp changes in the first derivative, so we can be sure that if this is small, the function won't curve away from our goal too sharply.

By the way, the number 1/2 is unitless, so all the units on the left side cancel. Checking units is essential in applications, such as physics and engineering, where Newton's method is used.

Version: 18 Owner: slider142 Author(s): slider142
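A one-dimensional check of the inequality can be written out directly (an illustration, not part of the original entry; the function and starting point are chosen by hand). For f(x) = x² − 2 the derivative is f′(x) = 2x, and |f′(u1) − f′(u2)| = 2|u1 − u2|, so the Lipschitz constant of the derivative is M = 2:

```python
# Kantorovitch's bound for f(x) = x^2 - 2 at the starting guess a0 = 1.5.
a0 = 1.5
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
M = 2.0  # Lipschitz constant of f' (computed by hand above)

# |f(a0)| * |[Df(a0)]^{-1}|^2 * M, the left-hand side of the inequality.
product = abs(f(a0)) * abs(1.0 / df(a0)) ** 2 * M
print(product)  # 0.0555... <= 1/2, so Newton's method from a0 converges
assert product <= 0.5
```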


Chapter 526 49M15 – Methods of Newton-Raphson, Galerkin and Ritz types 526.1

Newton’s method

Let f⃗ be a differentiable function from ℝⁿ to ℝⁿ. Newton's method consists of starting at a point a0 for the equation f⃗(x) = 0⃗. Then the function is linearized at a0 by replacing the increment f⃗(x) − f⃗(a0) by a linear function of the increment, [Df⃗(a0)](x − a0).

Now we can solve the linear equation f⃗(a0) + [Df⃗(a0)](x − a0) = 0⃗. Since this is a system of n linear equations in n unknowns, [Df⃗(a0)](x − a0) = −f⃗(a0) can be likened to the general linear system Ax⃗ = b⃗.

Therefore, if [Df⃗(a0)] is invertible, then x = a0 − [Df⃗(a0)]⁻¹ f⃗(a0). By renaming x to a1, you can reiterate Newton's method to get an a2. Thus, Newton's method states

a_{n+1} = a_n − [Df⃗(a_n)]⁻¹ f⃗(a_n)

Thus we get a sequence of a's that hopefully will converge to an x with f⃗(x) = 0⃗. When we solve an equation of the form f⃗(x) = 0⃗, we call the solution a root of the equation. Thus, Newton's method is used to find roots of nonlinear equations.

Unfortunately, Newton's method does not always converge. There are tests for neighborhoods of a0's where Newton's method will converge, however. One such test is Kantorovitch's theorem, which combines what is needed into a concise mathematical equation.

Corollary 1: Newton's method in one dimension - The above equation is simplified in one dimension to the well-used

a1 = a0 − f(a0)/f′(a0)

This intuitively cute equation is pretty much the equation of first year calculus. :)

Corollary 2: Finding a square root - So now that you know the equation, you need to know how to use it, as it is an algorithm. The construction of the primary equation, of course, is the important part. Let's see how you do it if you want to find a square root of a number b. We want to find a number x (x for unknown), such that x² = b. You might think "why not find a number such that x = √b?" Well, the problem with that approach is that we don't have a value for √b, so we'd be right back where we started. However, squaring both sides of the equation to get x² = b lets us work with the number we do know, b. Back to x² = b. With some manipulation, we see this means that x² − b = 0! Thus we have our f(x) = 0 scenario.

We can see that f′(x) = 2x; thus f′(a0) = 2a0 and f(a0) = a0² − b. Now we have all we need to carry out Newton's method. By renaming x to be a1, we have

a1 = a0 − (1/(2a0))(a0² − b) = (1/2)(a0 + b/a0)

The equation on the far right is also known as the divide and average method, for those who have not learned the full Newton's method and just want a fast way to find square roots. Let's see how this works out to find the square root of a number like 2:

Let x² = 2. Then x² − 2 = 0 = f(x). Thus, by Newton's method,

a1 = a0 − (a0² − 2)/(2a0)

All we did was plug in the expressions f(a0) and f′(a0) where Newton's method asks for them. Now we have to pick an a0. Hmm, since

1 < 2 < 4, we have 1 < √2 < 2,

so let's pick a reasonable number between 1 and 2, like 1.5:

a1 = 1.5 − (1.5² − 2)/(2 · 1.5) = 1.41666...

Looks like our guess was too high. Let's see what the next iteration says:

a2 = 1.41666... − (1.41666...² − 2)/(2 · 1.41666...) = 1.414215686...

getting better =) You can use your calculator to find that

√2 = 1.414213562...

Not bad for only two iterations! Of course, the more you iterate, the more decimal places your an will be accurate to. By the way, this is also how your calculator/computer finds square roots!

Geometric interpretation: Consider an arbitrary function f : ℝ → ℝ such as f(x) = x² − b. Say you wanted to find a root of this function. You know that in the neighborhood of x = a0, there is a root (maybe you used Kantorovitch's theorem, or tested and saw that the function changed signs in this neighborhood). We want to use our knowledge of a0 to find an a1 that is a better approximation to x0 (in this case, closer to it on the x-axis). So we know that x0 ≤ a1 ≤ a0 or, in another case, a0 ≤ a1 ≤ x0. What is an efficient algorithm to bridge the gap between a0 and x0? Let's look at a tangent line to the graph. Note that the line intercepts the x-axis between a0 and x0, which is exactly what we want. The slope of this tangent line is f′(a0), by definition of the derivative at a0, and we know one point on the line is (a1, 0), since that is the x-intercept. That is all we need to find the formula of the line and solve for a1.


y − y1 = m(x − x1)

f(a0) − 0 = f′(a0)(a0 − a1)    [substituting]

f(a0)/f′(a0) = a0 − a1

−a1 = −a0 + f(a0)/f′(a0)

a1 = a0 − f(a0)/f′(a0)    [Newton's method!]
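The worked square-root example above can be reproduced in a few lines of code (an illustration, not part of the original entry):

```python
# Newton's method for f(x) = x^2 - 2, starting from a0 = 1.5,
# exactly as in the worked example.
def newton(f, df, a, steps):
    for _ in range(steps):
        a = a - f(a) / df(a)
    return a

f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x

print(newton(f, df, 1.5, 1))  # 1.41666..., the a1 computed by hand
print(newton(f, df, 1.5, 2))  # 1.4142156..., already close to sqrt(2)
print(newton(f, df, 1.5, 6))  # agrees with sqrt(2) to machine precision
```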

Version: 17 Owner: slider142 Author(s): slider142


Chapter 527 51-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 527.1

Apollonius theorem

Let a, b, c be the sides of a triangle and m the length of the median to the side with length a. Then b² + c² = 2m² + a²/2.

Version: 2 Owner: drini Author(s): drini

527.2

Apollonius’ circle

Apollonius' circle. The locus of a point moving so that the ratio of its distances from two fixed points is fixed is a circle. If two circles C1 and C2 are fixed with radii r1 and r2, then the circle of Apollonius of the two centers with ratio r1/r2 is the circle whose diameter is the segment that joins the two homothety centers of the circles. Version: 1 Owner: drini Author(s): drini


527.3

Brahmagupta’s formula

If a cyclic quadrilateral has sides p, q, r, s then its area is given by

√((T − p)(T − q)(T − r)(T − s))

where T = (p + q + r + s)/2.

Note that if s → 0, Heron’s formula is recovered.

Version: 3 Owner: drini Author(s): drini
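Both formulas, and the degeneration of one into the other, are easy to check numerically (an illustration, not part of the original entry):

```python
import math

# Brahmagupta's formula for the area of a cyclic quadrilateral
# with sides p, q, r, s and semiperimeter T.
def brahmagupta(p, q, r, s):
    T = (p + q + r + s) / 2
    return math.sqrt((T - p) * (T - q) * (T - r) * (T - s))

# Heron's formula for a triangle with sides a, b, c.
def heron(a, b, c):
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

print(brahmagupta(1, 1, 1, 1))              # a unit square is cyclic: area 1.0
print(brahmagupta(3, 4, 5, 0), heron(3, 4, 5))  # s -> 0 recovers Heron: both 6.0
```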

527.4

Brianchon theorem

If a hexagon ABCDEF (not necessarily convex) is inscribed in a conic (in particular in a circle), then the three diagonals AD, BE, CF are concurrent. This theorem is the dual of the Pascal line theorem. (C. Brianchon, 1806)

Version: 2 Owner: vladm Author(s): vladm

527.5

Brocard theorem

Theorem: Let ABC be a triangle. Let A′, B′, C′ be three points such that A′ ∈ (BC), B′ ∈ (AC), C′ ∈ (AB). Then the circles circumscribed about the triangles AB′C′, BC′A′, CA′B′ have a point in common. This point is called the Brocard point.

Proof: Let M be the point at which the circles circumscribed about the triangles AB′C′ and BC′A′ meet. Because the quadrilateral AB′MC′ is cyclic, the angles ∠AB′M and ∠MC′B are congruent. Analogously, because the quadrilateral BA′MC′ is cyclic, the angles ∠MC′B and ∠MA′C are congruent. So ∠AB′M and ∠MA′C are congruent, hence MA′CB′ is cyclic, and M lies on the circle circumscribed about CA′B′ as well.

Version: 2 Owner: vladm Author(s): vladm 1878

527.6

Carnot circles

If ABC is a triangle and H is the orthocenter, then there are three circles such that each circle passes through two vertices of the triangle and the orthocenter. These three circles are called the Carnot circles.

Version: 2 Owner: vladm Author(s): vladm

527.7

Erdős-Anning Theorem

If an infinite number of points in a plane are all separated by integer distances, then all the points lie on a straight line. Version: 1 Owner: giri Author(s): giri

527.8

Euler Line

In any triangle, the orthocenter H, the centroid G and the circumcenter O are collinear, and OG/GH = 1/2. The line passing through these points is known as the Euler line of the triangle.

This line also passes through the center N of the nine-point circle (or Feuerbach circle), and N is the midpoint of OH.

Version: 9 Owner: drini Author(s): drini

527.9

Gergonne point

Let ABC be a triangle and D, E, F the points where the incircle touches the sides BC, CA, AB respectively. Then the lines AD, BE, CF are concurrent, and the common point is called the Gergonne point of the triangle. Version: 3 Owner: drini Author(s): drini

527.10

Gergonne triangle

Let ABC be a triangle and D, E, F the points where the incircle touches the sides BC, CA, AB respectively. Then triangle DEF is called the Gergonne triangle or contact triangle of ABC. Version: 2 Owner: drini Author(s): drini

527.11

Heron’s formula

The area of a triangle with side lengths a, b, c is

Δ = √(s(s − a)(s − b)(s − c))

where s = (a + b + c)/2 (the semiperimeter).

Version: 2 Owner: drini Author(s): drini

527.12

Lemoine circle

If parallels to the sides are drawn through the Lemoine point of a triangle, the six points where these parallels intersect the sides all lie on one circle. This circle is called the Lemoine circle of the triangle. Version: 1 Owner: drini Author(s): drini

527.13

Lemoine point

The Lemoine point of a triangle is the intersection point of its three symmedians (that is, the isogonal conjugate of the centroid). It is related to the Gergonne point by the following result: on any triangle ABC, the Lemoine point of its Gergonne triangle is the Gergonne point of ABC. In the picture, the blue lines are the medians, intersecting at the centroid G. The green lines are angle bisectors intersecting at the incentre I, and the red lines are symmedians. The symmedians intersect at the Lemoine point L. Version: 5 Owner: drini Author(s): drini

527.14

Miquel point

Let AECF be a complete quadrilateral; then the four circles circumscribed about the four triangles AED, AFB, BEC, CDF are concurrent in a point M. This point is called the Miquel point. The Miquel point is also the focus of the parabola inscribed in AECF.

Version: 2 Owner: vladm Author(s): vladm

527.15

Mollweide’s equations

In a triangle with sides a, b and c opposite to the angles α, β and γ respectively, the following equations hold:

(a + b) sin(γ/2) = c cos((α − β)/2)

and

(a − b) cos(γ/2) = c sin((α − β)/2).

Version: 2 Owner: mathwizard Author(s): mathwizard

527.16

Morley’s theorem

Morley's theorem. The points of intersection of the adjacent angle trisectors in any triangle are the vertices of an equilateral triangle.

Version: 3 Owner: drini Author(s): drini


527.17

Newton’s line

Let ABCD be a circumscribed quadrilateral. The midpoints M, N of the two diagonals and the center I of the inscribed circle are collinear. This line is called Newton's line.

Version: 1 Owner: vladm Author(s): vladm

527.18

Newton-Gauss line

Let AECF be a complete quadrilateral with diagonals AC, BD, EF. Let P be the midpoint of AC, Q the midpoint of BD, and R the midpoint of EF. Then P, Q, R lie on one line, called the Newton-Gauss line.

Version: 1 Owner: vladm Author(s): vladm

527.19

Pascal’s mystic hexagram

If a hexagon ADBFCE (not necessarily convex) is inscribed in a conic (in particular in a circle), then the points of intersection of opposite sides (AD with FC, DB with CE, and BF with EA) are collinear. This line is called the Pascal line of the hexagon. A very special case occurs when the conic degenerates into two lines; the theorem still holds, although this particular case is usually called Pappus' theorem.

Version: 5 Owner: drini Author(s): drini

527.20

Ptolemy’s theorem

If ABCD is a cyclic quadrilateral, then the product of the two diagonals is equal to the sum of the products of opposite sides.

AC · BD = AB · CD + AD · BC.

When the quadrilateral is not cyclic, we have the following inequality:

AB · CD + BC · AD > AC · BD

Version: 5 Owner: drini Author(s): drini
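The equality case is easy to verify numerically by placing four points in order on a circle (an illustration, not part of the original entry; the angular positions are arbitrary):

```python
import math

# Four points in cyclic order on the unit circle form a cyclic quadrilateral.
angles = [0.3, 1.1, 2.5, 4.0]
A, B, C, D = [(math.cos(t), math.sin(t)) for t in angles]

dist = lambda P, Q: math.hypot(P[0] - Q[0], P[1] - Q[1])

lhs = dist(A, C) * dist(B, D)                         # product of the diagonals
rhs = dist(A, B) * dist(C, D) + dist(A, D) * dist(B, C)  # sum of opposite-side products
print(lhs, rhs)  # equal, as Ptolemy's theorem asserts
```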

527.21

Pythagorean theorem

The Pythagorean theorem states: if △ABC is a right triangle, then the square of the length of the hypotenuse is equal to the sum of the squares of the two legs. In the following picture, the purple squares add up to the same area as the orange one.

AC² = AB² + BC².

The law of cosines is a generalization of the Pythagorean theorem to any triangle. Version: 12 Owner: drini Author(s): drini

527.22

Schooten theorem

Theorem: Let ABC be an equilateral triangle. If M is a point on the circumscribed circle (on the arc BC), then the equality AM = BM + CM holds.

Proof: Let B′ ∈ (MA) be such that MB′ = B′B. Because ∠BMA = ∠BCA = 60°, the triangle MBB′ is equilateral, so BB′ = MB = MB′. Because AB = BC, BB′ = BM and ∠ABB′ = ∠CBM, the triangles ABB′ and CBM are congruent. Since MC = AB′, we have AM = AB′ + B′M = MC + MB. ∎

Version: 1 Owner: vladm Author(s): vladm

527.23

Simson’s line

Let ABC be a triangle and P a point on its circumcircle (other than A, B, C). Then the feet of the perpendiculars drawn from P to the sides AB, BC, CA (or their prolongations) are collinear.

An interesting result from the realm of analytic geometry states that the envelope formed by Simson's lines as P varies is a circular hypocycloid with three cusps. Version: 9 Owner: drini Author(s): drini

527.24

Stewart’s theorem

Given a triangle △ABC with AB = c, BC = a, CA = b, and a point X on BC such that BX = m and XC = n, denote by p the length of AX. Then

a(p² + mn) = b²m + c²n.

Version: 3 Owner: drini Author(s): drini

527.25

Thales’ theorem

Let A and B be two points and C a point on the semicircle with diameter AB. Then the angle ACB is 90°.

Version: 3 Owner: mathwizard Author(s): mathwizard


527.26

alternate proof of parallelogram law

Proof of this is simple, given the cosine law:

c² = a² + b² − 2ab cos φ

where a, b, and c are the lengths of the sides of the triangle, and the angle φ is the corner angle opposite the side of length c. Let us define the largest interior angle of the parallelogram as θ. Applying the cosine law to the parallelogram, with side lengths u and v and diagonals d1 and d2, we find that

d1² = u² + v² − 2uv cos θ

d2² = u² + v² − 2uv cos(π − θ)

Knowing that

cos(π − θ) = −cos θ

we can add the two expressions together, and find ourselves with

d1² + d2² = 2u² + 2v² − 2uv cos θ + 2uv cos θ

d1² + d2² = 2u² + 2v²

which is the theorem we set out to prove. Version: 2 Owner: drini Author(s): fiziko
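The identity is straightforward to spot-check numerically (an illustration, not part of the original entry; side lengths and angle are arbitrary):

```python
import math

# A parallelogram with side lengths u, v and interior angle theta between them.
u, v, theta = 3.0, 5.0, 0.7

# Diagonal lengths from the law of cosines, as in the proof above.
d1 = math.sqrt(u * u + v * v - 2 * u * v * math.cos(theta))
d2 = math.sqrt(u * u + v * v - 2 * u * v * math.cos(math.pi - theta))

print(d1 * d1 + d2 * d2, 2 * u * u + 2 * v * v)  # both 68.0 for these values
```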

527.27

alternative proof of the sines law

The goal is to prove the sine law:

sin A / a = sin B / b = sin C / c = 1/(2R)

where a, b, c are the sides opposite the angles A, B, C in the triangle, and where R is the radius of the circumcircle that encloses our triangle.

[Figure: the triangle with sides a, b, c opposite angles A, B, C; a second copy adds the altitude of length y dropped onto side c.]

Let's add a couple of lines and define more variables. So, we now know that

sin A = y / b

and, therefore, we need to prove

sin B / b = y / (ab), or sin B = y / a.

From geometry, we can see that

sin(π − B) = y / a.

So the proof is reduced to proving that sin(π − B) = sin B. This is easily seen as true after examining the top half of the unit circle. So, putting all of our results together, we get

sin A / a = y / (ab) = sin(π − B) / b = sin B / b.    (527.27.1)

The same logic may be followed to show that each of these fractions is also equal to sin C / c.

For the final step of the proof, we must show that

2R = a / sin A.

We begin by defining our coordinate system. For this, it is convenient to find one side that is not shorter than the others and label it with length b. (The concept of a "longest" side is not well defined in equilateral and some isosceles triangles, but there is always at least one side that is not shorter than the others.) We then define our coordinate system such that the corners of the triangle that mark the ends of side b are at the coordinates (0, 0) and (b, 0). Our third corner (with sides labelled alphabetically clockwise) is at the point (c cos A, c sin A). Let the center of our circumcircle be at (x0, y0). We now have

x0² + y0² = R²    (527.27.2)

(b − x0)² + y0² = R²    (527.27.3)

(c cos A − x0)² + (c sin A − y0)² = R²    (527.27.4)

as each corner of our triangle is, by definition of the circumcircle, a distance R from the circle's center.

Combining equations (3) and (2), we find

(b − x0)² + y0² = x0² + y0²
b² − 2bx0 = 0
x0 = b/2

Substituting this into equation (2) we find that

y0² = R² − b²/4    (527.27.5)

Combining equations (4) and (5) leaves us with

(c cos A − x0)² + (c sin A − y0)² = x0² + y0²
c² cos²A − 2x0 c cos A + c² sin²A − 2y0 c sin A = 0
c − 2x0 cos A − 2y0 sin A = 0
(c − b cos A) / (2 sin A) = y0
(c − b cos A)² / (4 sin²A) = R² − b²/4
(c − b cos A)² + b² sin²A = 4R² sin²A
c² − 2bc cos A + b² = 4R² sin²A
a² = 4R² sin²A
a / sin A = 2R

where we have applied the cosines law in the second to last step.

Version: 3 Owner: drini Author(s): fiziko
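The relation a / sin A = 2R can also be spot-checked numerically by inscribing a triangle in a circle of known radius (an illustration, not part of the original entry; the vertex positions are arbitrary):

```python
import math

# Three vertices on a circle of radius R = 2, at hand-picked angular positions.
R = 2.0
ts = [0.2, 1.7, 3.9]
P = [(R * math.cos(t), R * math.sin(t)) for t in ts]

dist = lambda X, Y: math.hypot(X[0] - Y[0], X[1] - Y[1])
a, b, c = dist(P[1], P[2]), dist(P[0], P[2]), dist(P[0], P[1])

# Angle A at vertex P[0] (opposite side a), via the law of cosines.
A = math.acos((b * b + c * c - a * a) / (2 * b * c))

print(a / math.sin(A))  # 4.0, which is 2R
```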

527.28

angle bisector

For every angle, there exists a line that divides the angle into two equal parts. This line is called the angle bisector.

The interior bisector of an angle is the line or line segment that divides it into two equal angles on the same side as the angle. The exterior bisector of an angle is the line or line segment that divides it into two equal angles on the opposite side of the angle.

For a triangle, the point where the angle bisectors of the three angles meet is called the incenter. Version: 1 Owner: giri Author(s): giri

527.29

angle sum identity

It is desired to prove the identities sin(θ + φ) = sin θ cos φ + cos θ sin φ and cos(θ + φ) = cos θ cos φ − sin θ sin φ Consider the figure

where we have

◦ 4Aad ⇔ 4Ccb
◦ 4Bba ⇔ 4Ddc
◦ ad = dc = 1.

Also, everything is Euclidean, and in particular, the interior angles of any triangle sum to π. Call ∠Aad = θ and ∠baB = φ. From the triangle sum rule, we have ∠Ada = π/2 − θ and ∠Ddc = π/2 − φ, while the degenerate angle ∠AdD = π, so that

∠adc = θ + φ.

We have, therefore, that the area of the pink parallelogram is sin(θ + φ). On the other hand, we can rearrange things thus:

In this figure we see an equal pink area, but it is composed of two pieces, of areas sin φ cos θ and cos φ sin θ. Adding, we have

sin(θ + φ) = sin φ cos θ + cos φ sin θ,

which gives us the first. From definitions, it then also follows that sin(θ + π/2) = cos(θ), and sin(θ + π) = − sin(θ). Writing

cos(θ + φ) = sin(θ + φ + π/2)
= sin(θ) cos(φ + π/2) + cos(θ) sin(φ + π/2)
= sin(θ) sin(φ + π) + cos(θ) cos(φ)
= cos θ cos φ − sin θ sin φ

Version: 7 Owner: quincynoodles Author(s): quincynoodles
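The two identities are easy to spot-check numerically; the angle pairs below are arbitrary choices, not taken from the entry:

```python
import math

# Spot-check of the two addition identities at a few arbitrary angle pairs.
pairs = [(0.2, 1.1), (1.0, 2.5), (-0.7, 3.0)]
max_err = 0.0
for t, p in pairs:
    s_err = abs(math.sin(t + p) - (math.sin(t) * math.cos(p) + math.cos(t) * math.sin(p)))
    c_err = abs(math.cos(t + p) - (math.cos(t) * math.cos(p) - math.sin(t) * math.sin(p)))
    max_err = max(max_err, s_err, c_err)
print(max_err)
```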

527.30

annulus

An annulus is a two-dimensional shape which can be thought of as a disc with a smaller disc removed from its center. An annulus looks like:

Note that both the inner and outer radii may take on any values, so long as the outer radius is larger than the inner. Version: 9 Owner: akrowne Author(s): akrowne

527.31

butterfly theorem

Let M be the midpoint of a chord PQ of a circle, through which two other chords AB and CD are drawn. If AD intersects PQ at X and CB intersects PQ at Y, then M is also the midpoint of XY.

The theorem gets its name from the shape of the figure, which resembles a butterfly. Version: 5 Owner: giri Author(s): giri

527.32

centroid

The centroid of a triangle (also called center of gravity of the triangle) is the point where the three medians intersect each other.

In the figure, AA′, BB′ and CC′ are medians and G is the centroid of ABC. The centroid G has the property that it divides the medians in the ratio 2 : 1, that is,

AG = 2GA′,    BG = 2GB′,    CG = 2GC′.

Version: 5 Owner: drini Author(s): drini

527.33

chord

A chord is the line segment joining two points on a curve. Usually it is used to refer to a line segment whose end points lie on a circle. Version: 1 Owner: giri Author(s): giri

527.34

circle

Definition A circle in the plane is determined by a center and a radius. The center is a point in the plane, and the radius is a positive real number. The circle consists of all points whose distance from the center equals the radius. (In this entry, we only work with the standard Euclidean norm in the plane.) A circle determines a closed curve in the plane, and this curve is called the perimeter or circumference of the circle. If the radius of a circle is r, then the length of the perimeter is 2πr. Also, the area of the circle is πr². More precisely, the interior of the perimeter has area πr². The diameter of a circle is defined as d = 2r. The circle is a special case of an ellipse. Also, in three dimensions, the analogous geometric object to a circle is a sphere.

The circle in analytic geometry

Let us next derive an analytic equation for a circle in Cartesian coordinates (x, y). If the circle has center (a, b) and radius r > 0, we obtain the following condition for the points of the circle,

(x − a)² + (y − b)² = r².    (527.34.1)

In other words, the circle is the set of all points (x, y) that satisfy the above equation. In the special case that a = b = 0, the equation is simply x² + y² = r². The unit circle is the circle x² + y² = 1.


It is clear that equation (527.34.1) can always be reduced to the form

x² + y² + Dx + Ey + F = 0,    (527.34.2)

where D, E, F are real numbers. Conversely, suppose that we are given an equation of the above form where D, E, F are arbitrary real numbers. Next we derive conditions for these constants, so that equation (527.34.2) determines a circle [1]. Completing the squares yields

x² + Dx + D²/4 + y² + Ey + E²/4 = −F + D²/4 + E²/4,

whence

(x + D/2)² + (y + E/2)² = (D² − 4F + E²)/4.

There are three cases:

1. If D² − 4F + E² > 0, then equation (527.34.2) determines a circle with center (−D/2, −E/2) and radius (1/2)√(D² − 4F + E²).

2. If D² − 4F + E² = 0, then equation (527.34.2) determines the point (−D/2, −E/2).

3. If D² − 4F + E² < 0, then equation (527.34.2) has no (real) solution in the (x, y)-plane.

The circle in polar coordinates

Using polar coordinates for the plane, we can parameterize the circle. Consider the circle with center (a, b) and radius r > 0 in the plane R². It is then natural to introduce polar coordinates (ρ, φ) for R² \ {(a, b)} by

x(ρ, φ) = a + ρ cos φ,
y(ρ, φ) = b + ρ sin φ,

with ρ > 0 and φ ∈ [0, 2π). Since we wish to parameterize the circle, the point (a, b) does not pose a problem; it is not part of the circle. Plugging these expressions for x, y into equation (527.34.1) yields the condition ρ = r. The given circle is thus parameterized by

φ ↦ (a + r cos φ, b + r sin φ),    φ ∈ [0, 2π).

It follows that a circle is a closed curve in the plane.

Three point formula for the circle

Suppose we are given three points on a circle, say (x₁, y₁), (x₂, y₂), (x₃, y₃). We next derive expressions for the parameters D, E, F in terms of these points. We also derive equation (527.34.3), which gives an equation for a circle in terms of a determinant. First, from equation (527.34.2), we have

x₁² + y₁² + Dx₁ + Ey₁ + F = 0,
x₂² + y₂² + Dx₂ + Ey₂ + F = 0,
x₃² + y₃² + Dx₃ + Ey₃ + F = 0.

These equations form a linear set of equations for D, E, F, i.e.,

| x₁ y₁ 1 |   | D |     | x₁² + y₁² |
| x₂ y₂ 1 | · | E | = − | x₂² + y₂² |.
| x₃ y₃ 1 |   | F |     | x₃² + y₃² |

Let us denote the matrix on the left hand side by Λ. Also, let us assume that det Λ ≠ 0. Then, using Cramer's rule, we obtain

D = −(1/det Λ) · det | x₁² + y₁²  y₁  1 |
                     | x₂² + y₂²  y₂  1 |
                     | x₃² + y₃²  y₃  1 |

E = −(1/det Λ) · det | x₁  x₁² + y₁²  1 |
                     | x₂  x₂² + y₂²  1 |
                     | x₃  x₃² + y₃²  1 |

F = −(1/det Λ) · det | x₁  y₁  x₁² + y₁² |
                     | x₂  y₂  x₂² + y₂² |
                     | x₃  y₃  x₃² + y₃² |

These equations give the parameters D, E, F as functions of the three given points. Substituting these equations into equation (527.34.2) yields

(x² + y²) · det | x₁ y₁ 1 |   − x · det | x₁² + y₁²  y₁  1 |   − y · det | x₁  x₁² + y₁²  1 |   − det | x₁  y₁  x₁² + y₁² |   = 0.
                | x₂ y₂ 1 |             | x₂² + y₂²  y₂  1 |             | x₂  x₂² + y₂²  1 |         | x₂  y₂  x₂² + y₂² |
                | x₃ y₃ 1 |             | x₃² + y₃²  y₃  1 |             | x₃  x₃² + y₃²  1 |         | x₃  y₃  x₃² + y₃² |

Using the cofactor expansion, we can now write the equation for the circle passing through (x₁, y₁), (x₂, y₂), (x₃, y₃) as [2, 3]

    | x² + y²    x   y   1 |
det | x₁² + y₁²  x₁  y₁  1 | = 0.    (527.34.3)
    | x₂² + y₂²  x₂  y₂  1 |
    | x₃² + y₃²  x₃  y₃  1 |

See also

• Wikipedia's entry on the circle.


REFERENCES

1. J. H. Kindle, Schaum's Outline Series: Theory and Problems of Plane and Solid Analytic Geometry, Schaum Publishing Co., 1950.
2. E. Weisstein, Eric W. Weisstein's world of mathematics, entry on the circle.
3. L. Råde, B. Westergren, Mathematics Handbook for Science and Engineering, Studentlitteratur, 1995.

Version: 2 Owner: bbukh Author(s): bbukh, matte
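The three point formula above can be turned into a small computation. The sketch below (the function names are my own) solves the linear system for D, E, F by Cramer's rule and reads off the center (−D/2, −E/2) and the radius from D² + E² − 4F:

```python
import math

def det3(m):
    # Laplace expansion of a 3x3 determinant along the first row
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def circle_through(p1, p2, p3):
    pts = [p1, p2, p3]
    L = [[x, y, 1.0] for x, y in pts]
    rhs = [-(x * x + y * y) for x, y in pts]
    dL = det3(L)  # assumed nonzero, i.e. the three points are not collinear
    def col_replaced(j):
        # Cramer's rule: replace column j of L with the right-hand side
        return [[rhs[i] if k == j else L[i][k] for k in range(3)] for i in range(3)]
    D, E, F = (det3(col_replaced(j)) / dL for j in range(3))
    center = (-D / 2, -E / 2)
    radius = math.sqrt(D * D + E * E - 4 * F) / 2
    return center, radius

center, radius = circle_through((1, 0), (0, 1), (-1, 0))
print(center, radius)  # these three points lie on the unit circle
```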

527.35

collinear

A set of points are said to be collinear if they all lie on a straight line. In the following picture, A, P, B are collinear.

Version: 6 Owner: drini Author(s): drini

527.36

complete quadrilateral

Let ABCD be a quadrilateral. Let {F} = AB ∩ CD and {E} = BC ∩ AD. Then AFCE is a complete quadrilateral.

The complete quadrilateral has four sides: ABF, ADE, BCE, DCF, and six angles: A, B, C, D, E, F.

Version: 2 Owner: vladm Author(s): vladm

527.37

concurrent

A set of lines or curves is said to be concurrent if all of them pass through some point:

Version: 2 Owner: drini Author(s): drini

527.38

cosines law

Cosines Law. Let a, b, c be the sides of a triangle, and let A be the angle opposite to a. Then

a² = b² + c² − 2bc cos A.

Version: 9 Owner: drini Author(s): drini
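A numerical illustration (the side lengths and angle below are arbitrary test values, my choice): place the vertex of angle A at the origin with side c along the x-axis, and check that the side a opposite A satisfies the stated relation:

```python
import math

# Numerical check of the cosines law. With A's vertex at the origin and
# side c along the x-axis: B = (c, 0), C = (b cos A, b sin A).
b, c, A = 3.0, 5.0, 1.2
B = (c, 0.0)
C = (b * math.cos(A), b * math.sin(A))
a = math.hypot(C[0] - B[0], C[1] - B[1])   # side opposite the angle A
lhs = a**2
rhs = b**2 + c**2 - 2 * b * c * math.cos(A)
print(lhs, rhs)
```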

527.39

cyclic quadrilateral

Cyclic quadrilateral. A quadrilateral is cyclic when its four vertices lie on a circle.

A necessary and sufficient condition for a quadrilateral to be cyclic is that the sum of a pair of opposite angles be equal to 180◦. One of the main results about these quadrilaterals is Ptolemy's theorem. Also, of all the quadrilaterals with given sides p, q, r, s, the one that is cyclic has the greatest area. If the four sides of a cyclic quadrilateral are known, the area can be found using Brahmagupta's formula. Version: 4 Owner: drini Author(s): drini

527.40

derivation of cosines law

The idea is to prove the cosines law:

a² = b² + c² − 2bc cos θ

where the variables are defined by the triangle:

[figure: triangle with sides a, b, c, and θ the angle between sides b and c]


Let's add a couple of lines and two variables, to get

[figure: the same triangle with a perpendicular of length y dropped to the extension of side c, cutting off a horizontal segment x]

This is all we need. We can use Pythagoras' theorem to show that

a² = x² + y²

and

b² = y² + (c + x)².

So, combining these two we get

a² = x² + b² − (c + x)²
a² = x² + b² − c² − 2cx − x²
a² = b² − c² − 2cx

So, all we need now is an expression for x. Well, we can use the definition of the cosine function to show that

c + x = b cos θ
x = b cos θ − c

With this result in hand, we find that

a² = b² − c² − 2cx
a² = b² − c² − 2c(b cos θ − c)
a² = b² − c² − 2bc cos θ + 2c²
a² = b² + c² − 2bc cos θ    (527.40.1)

Version: 2 Owner: drini Author(s): fiziko

527.41

diameter

The diameter of a circle or a sphere is the length of the segment joining a point with the one symmetrical to it with respect to the center. That is, the length of the longest segment joining a pair of points. Also, we call any of these segments themselves a diameter. So, in the next picture, AB is a diameter.

The diameter is equal to twice the radius. Version: 17 Owner: drini Author(s): drini

527.42

double angle identity

The double-angle identities are

sin(2a) = 2 cos(a) sin(a)    (527.42.1)
cos(2a) = 2 cos²(a) − 1 = 1 − 2 sin²(a)    (527.42.2)
tan(2a) = 2 tan(a) / (1 − tan²(a))    (527.42.3)

These are all derived from their respective trig addition formulas. For example,

sin(2a) = sin(a + a) = cos(a) sin(a) + sin(a) cos(a) = 2 cos(a) sin(a)

The formula for cosine follows similarly, and tangent is derived by taking the ratio of sine to cosine, as always. The double-angle formulae can also be derived from the de Moivre identity. Version: 5 Owner: akrowne Author(s): akrowne
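A quick numeric check of the three identities at a sample angle (a = 0.7 is an arbitrary choice, kept away from points where the tangent is undefined):

```python
import math

# Spot-check of the double-angle formulas at one arbitrary angle.
a = 0.7
errs = [
    abs(math.sin(2 * a) - 2 * math.cos(a) * math.sin(a)),
    abs(math.cos(2 * a) - (2 * math.cos(a) ** 2 - 1)),
    abs(math.cos(2 * a) - (1 - 2 * math.sin(a) ** 2)),
    abs(math.tan(2 * a) - 2 * math.tan(a) / (1 - math.tan(a) ** 2)),
]
print(max(errs))
```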

527.43

equilateral triangle

A triangle with its three sides and its three angles equal.

Therefore, an equilateral triangle has 3 angles of 60◦. Due to the congruence criterion side-side-side, an equilateral triangle is completely determined by specifying its side.

In an equilateral triangle, the bisector of any angle coincides with the height, the median and the perpendicular bisector of the opposite side. If r is the length of the side, then the height is equal to r√3/2. Version: 3 Owner: drini Author(s): drini

527.44

fundamental theorem on isogonal lines

Let 4ABC be a triangle and AX, BY, CZ three lines concurrent at P. If AX′, BY′, CZ′ are the respective isogonal conjugate lines of AX, BY, CZ, then AX′, BY′, CZ′ are also concurrent at some point P′. An application of this theorem proves the existence of the Lemoine point (for it is the intersection point of the symmedians). This theorem is a direct consequence of Ceva's theorem (trigonometric version). Version: 1 Owner: drini Author(s): drini

527.45

height

Let ABC be a given triangle. A height of ABC is a line drawn from a vertex to the opposite side (or its prolongation) and perpendicular to it. So we have three heights in any triangle. The three heights are always concurrent, and the common point is called the orthocenter. In the following figure, AD, BE and CF are heights of ABC.

Version: 2 Owner: drini Author(s): drini

527.46

hexagon

A hexagon is a 6-sided polygon.

Figure. A regular hexagon.

Version: 2 Owner: drini Author(s): drini

527.47

hypotenuse

Let ABC be a right triangle with right angle at C. Then AB is called the hypotenuse.

The middle point P of the hypotenuse coincides with the circumcenter of the triangle, so it is equidistant from the three vertices. When the triangle is inscribed in its circumcircle, the hypotenuse becomes a diameter. So the distance from P to the vertices is precisely the circumradius. The hypotenuse's length can be calculated by means of the Pythagorean theorem:

c = √(a² + b²)

Sometimes, the longest side of a triangle is also called a hypotenuse, but this naming is seldom seen. Version: 5 Owner: drini Author(s): drini

527.48

isogonal conjugate

Let 4ABC be a triangle, AL the angle bisector of ∠BAC and AX any line passing through A. The isogonal conjugate line to AX is the line AY obtained by reflecting the line AX on the angle bisector AL. In the picture ∠YAL = ∠LAX. This is the reason why AX and AY are called isogonal conjugates, since they form the same angle with AL (iso = equal, gonal = angle).

Let P be a point on the plane. The lines AP, BP, CP are concurrent by construction. Consider now their isogonal conjugates (reflections on the inner angle bisectors). The isogonal conjugates will also concur by the fundamental theorem on isogonal lines, and their intersection point Q is called the isogonal conjugate of P.

If Q is the isogonal conjugate of P, then P is the isogonal conjugate of Q, so both are often referred to as an isogonal conjugate pair. An example of an isogonal conjugate pair is found by looking at the centroid of the triangle and the Lemoine point. Version: 4 Owner: drini Author(s): drini

527.49

isosceles triangle

A triangle with two equal sides. This definition implies that any equilateral triangle is isosceles too, but there are isosceles triangles that are not equilateral. In any isosceles triangle, the angles opposite to the equal sides are also equal. In an equilateral triangle, the height, the median and the bisector to the third side are the same line. Version: 5 Owner: drini Author(s): drini

527.50

legs

The legs of a triangle are the two sides which are not the hypotenuse. Above: Various triangles, with legs in red.

Note that legs are only defined for a right triangle, just as the notion of hypotenuse is only defined for a right triangle. Version: 3 Owner: akrowne Author(s): akrowne

527.51

medial triangle

The medial triangle of a triangle ABC is the triangle formed by joining the midpoints of the sides of the triangle 4ABC.

Here, 4A′B′C′ is the medial triangle. The incircle of the medial triangle is called the Spieker circle and its incenter is called the Spieker center. The circumcircle of the medial triangle is called the medial circle. An important property of the medial triangle is that the medial triangle 4DEF of the medial triangle 4A′B′C′ of 4ABC is similar to 4ABC.

Version: 2 Owner: giri Author(s): giri

527.52

median

The median of a triangle is a line joining a vertex with the midpoint of the opposite side. In the next figure, AA0 is a median. That is, BA0 = A0 C, or equivalently, A0 is the midpoint of BC.

Version: 7 Owner: drini Author(s): drini

527.53

midpoint

If AB is a segment, then its midpoint is the point P whose distances from A and B are equal. That is, AP = PB. With the notation of directed segments, it is the point on the line that contains AB such that the ratio AP/PB = 1.

Version: 2 Owner: drini Author(s): drini

527.54

nine-point circle

The nine point circle also known as the Euler’s circle or the Feuerbach circle is the circle that passes through the feet of perpendiculars from the vertices A, B and C of a triangle 4ABC.

Some of the properties of this circle are:

Property 1 : This circle also passes through the midpoints of the sides AB, BC and CA of 4ABC. This was shown by Euler.

Property 2 : Feuerbach showed that this circle also passes through the midpoints of the line segments AH, BH and CH which are drawn from the vertices of 4ABC to its orthocenter H.

These three triples of points make nine in all, giving the circle its name.

Property 3 : The radius of the nine-point circle is R/2, where R is the circumradius (radius of the circumcircle).

Property 4 : The center of the nine-point circle is the midpoint of the line segment joining the orthocenter and the circumcenter, and hence lies on the Euler Line.

Property 5 : All triangles inscribed in a given circle and having the same orthocenter have the same nine-point circle.

Version: 3 Owner: mathwizard Author(s): mathwizard, giri
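Properties 1 through 4 can be illustrated numerically. In the sketch below, the triangle and all helper names are my own; the orthocenter is obtained from the vector relation H = A + B + C − 2O (a standard Euler-line fact, assumed here rather than taken from the entry), and the nine points are checked to lie at distance R/2 from the midpoint N of segment OH:

```python
import math

# Illustration: the nine points all lie on one circle of radius R/2
# centered at the midpoint of the circumcenter-orthocenter segment.
A, B, C = (0.0, 0.0), (6.0, 0.0), (1.5, 4.0)

def mid(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def foot(p, q, r):
    # foot of the perpendicular from p onto the line qr
    dx, dy = r[0] - q[0], r[1] - q[1]
    t = ((p[0] - q[0]) * dx + (p[1] - q[1]) * dy) / (dx * dx + dy * dy)
    return (q[0] + t * dx, q[1] + t * dy)

def circumcenter(a, b, c):
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    ux = ((a[0]**2 + a[1]**2) * (b[1] - c[1]) + (b[0]**2 + b[1]**2) * (c[1] - a[1])
          + (c[0]**2 + c[1]**2) * (a[1] - b[1])) / d
    uy = ((a[0]**2 + a[1]**2) * (c[0] - b[0]) + (b[0]**2 + b[1]**2) * (a[0] - c[0])
          + (c[0]**2 + c[1]**2) * (b[0] - a[0])) / d
    return (ux, uy)

O = circumcenter(A, B, C)
R = math.hypot(A[0] - O[0], A[1] - O[1])
H = (A[0] + B[0] + C[0] - 2 * O[0], A[1] + B[1] + C[1] - 2 * O[1])  # orthocenter
N = mid(O, H)  # nine-point center

nine = [mid(A, B), mid(B, C), mid(C, A),                 # side midpoints
        foot(A, B, C), foot(B, C, A), foot(C, A, B),     # feet of the heights
        mid(A, H), mid(B, H), mid(C, H)]                 # vertex-to-H midpoints
radii = [math.hypot(p[0] - N[0], p[1] - N[1]) for p in nine]
spread = max(radii) - min(radii)
print(spread, radii[0], R / 2)
```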

527.55

orthic triangle

If ABC is a triangle and AD, BE, CF are its three heights, then the triangle DEF is called the orthic triangle of ABC. A remarkable property of orthic triangles says that the orthocenter of ABC is also the incenter of the orthic triangle DEF. That is, the heights of ABC are the angle bisectors of DEF.

Version: 2 Owner: drini Author(s): drini

527.56

orthocenter

The orthocenter of a triangle is the point of intersection of its three heights.

In the figure, H is the orthocenter of ABC. The orthocenter H lies inside, on a vertex of, or outside the triangle depending on the triangle being acute, right or obtuse respectively. The orthocenter is one of the most important triangle centers, and it is closely related to the orthic triangle (formed by the feet of the three heights). It lies on the Euler Line, and the quadrilaterals FHDB, DHEC, AFHE are cyclic. Version: 3 Owner: drini Author(s): drini

527.57

parallelogram

A quadrilateral whose opposite sides are parallel. Some special parallelograms have their own names: squares, rectangles, rhombuses. A rectangle is a parallelogram whose 4 angles are equal, a rhombus is a parallelogram whose 4 sides are equal, and a square is a parallelogram that is a rectangle and a rhombus at the same time. All parallelograms have their opposite sides and opposite angles equal (moreover, if a quadrilateral has a pair of opposite sides equal and parallel, the quadrilateral must be a parallelogram). Also, adjacent angles always add up to 180◦, and the diagonals bisect each other. There is also a neat relation between the lengths of the sides and the lengths of the diagonals, called the parallelogram law. Version: 2 Owner: drini Author(s): drini

527.58

parallelogram law

Let ABCD be a parallelogram with side lengths u, v, and whose diagonals have lengths d1 and d2. Then

2u² + 2v² = d1² + d2².

Version: 3 Owner: drini Author(s): drini

527.59

pedal triangle

The pedal triangle of any 4ABC is the triangle, whose vertices are the feet of perpendiculars from A, B, and C to the opposite sides of the triangle. In the figure 4DEF is the pedal triangle.

In general, for any point P inside a triangle, the pedal triangle of P is one whose vertices are the feet of perpendiculars from P to the sides of the triangle.

Version: 3 Owner: giri Author(s): giri

527.60

pentagon

A pentagon is a 5-sided polygon. Regular pentagons are of particular interest for geometers. In a regular pentagon, the inner angles are equal to 108◦. All the diagonals have the same length. If s is the length of a side and d is the length of a diagonal, then

d/s = (1 + √5)/2,

that is, the ratio between a diagonal and a side is the golden number. Version: 1 Owner: drini Author(s): drini
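The golden-ratio property is easy to verify numerically for a regular pentagon inscribed in the unit circle (this construction is mine, not part of the entry):

```python
import math

# Ratio of a diagonal to a side in a regular pentagon on the unit circle.
pts = [(math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5)) for k in range(5)]
side = math.dist(pts[0], pts[1])
diag = math.dist(pts[0], pts[2])
golden = (1 + math.sqrt(5)) / 2
print(diag / side, golden)
```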

527.61

polygon

A polygon is a plane region delimited by straight lines. Some polygons have special names:

Number of sides   Name of the polygon
3                 triangle
4                 quadrilateral
5                 pentagon
6                 hexagon
7                 heptagon
8                 octagon

In general, a polygon with n sides is called an n-gon. In an n-gon, there are n points where two sides meet. These are called the vertices of the n-gon. At each vertex, the two sides that meet determine two angles: the interior angle and the exterior angle. The former angle opens towards the interior of the polygon, and the latter towards the exterior of the polygon. Below are some properties of polygons.

1. The sum of all its interior angles is (n − 2)180◦.

2. Any polygon divides the plane into two components, one bounded (the inside of the polygon) and one unbounded. This result is the Jordan curve theorem for polygons. A direct proof can be found in [1], pp. 16-18.

3. In complex analysis, the Schwarz-Christoffel transformation [2] gives a conformal map from any polygon to the upper half plane. 4. The area of a polygon can be calculated using Pick’s theorem.

REFERENCES 1. E.E. Moise, Geometric Topology in Dimensions 2 and 3, Springer-Verlag, 1977. 2. R.A. Silverman, Introductory Complex Analysis, Dover Publications, 1972.

Version: 5 Owner: matte Author(s): matte, drini

527.62

proof of Apollonius theorem

Let b = CA, a = BC, c = AB, and m = AM. Let ∠CMA = θ, so that ∠BMA = π − θ. 2

By the law of cosines, b2 = m2 + a4 − am cos θ and c2 = m2 + 2 m2 + a4 + am cos θ, and adding gives b2 + c2 = 2m2 +

a2 . 2

QED Version: 1 Owner: quincynoodles Author(s): quincynoodles

527.63

proof of Apollonius theorem

Let m be a median of the triangle, as shown in the figure. By Stewart’s theorem we have  a a  a 2  2 + c2 a m + = b2 2 2 2 and thus 2

m +

 a 2 2

b2 + c2 . = 2

Multiplying both sides by 2 gives 2m2 +

a2 = b2 + c2 . 2 1904

a2 4

− am cos(π − θ) =

QED Version: 2 Owner: drini Author(s): drini

527.64

proof of Brahmagupta’s formula

We shall prove that the area of a cyclic quadrilateral with sides p, q, r, s is given by p (T − p)(T − q)(T − r)(T − s)

where T =

p+q+r+s . 2

Area of the cyclic quadrilateral = Area of 4ADB+ Area of 4BDC. 1 1 = pq sin A + rs sin C 2 2 But since ABCD is a cyclic quadrilateral, ∠DAB = 180◦ − ∠DCB. Hence sin A = sin C. Therefore area now is 1 1 Area = pq sin A + rs sin A 2 2 1 (Area)2 = sin2 A(pq + rs)2 4 2 4(Area) = (1 − cos2 A)(pq + rs)2 4(Area)2 = (pq + rs)2 − cos2 A(pq + rs)2

Applying cosines law for 4ADB and 4BDC and equating the expressions for side DB, we have p2 + q 2 − 2pq cos A = r 2 + s2 − 2rs cos C Substituting cos C = − cos A (since angles A and C are suppplementary) and rearranging, we have 2 cos A(pq + rs) = p2 + q 2 − r 2 − s2 substituting this in the equation for area, 1 4(Area)2 = (pq + rs)2 − (p2 + q 2 − r 2 − s2 )2 4 16(Area)2 = 4(pq + rs)2 − (p2 + q 2 − r 2 − s2 )2

which is of the form a2 − b2 and hence can be written in the form (a + b)(a − b) as (2(pq + rs) + p2 + q 2 − r 2 − s2 )(2(pq + rs) − p2 − q 2 + r 2 + s2 ) 1905

= ((p + q)2 − (r − s)2 )((r + s)2 − (p − q)2 ) = (p + q + r − s)(p + q + s − r)(p + r + s − q)(q + r + s − p) Introducing T =

p+q+r+s , 2

16(Area)2 = 16(T − p)(T − q)(T − r)(T − s) Taking square root, we get Area =

p

(T − p)(T − q)(T − r)(T − s)

Version: 3 Owner: giri Author(s): giri
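The formula can be spot-checked numerically: take four points in order on the unit circle (the angles below are an arbitrary choice of mine) to form a cyclic quadrilateral, and compare Brahmagupta's formula with the area computed by the shoelace formula:

```python
import math

# Brahmagupta vs. shoelace area for a cyclic quadrilateral on the unit circle.
angles = [0.3, 1.4, 2.9, 4.8]            # increasing, so the points are in order
pts = [(math.cos(t), math.sin(t)) for t in angles]
p, q, r, s = (math.dist(pts[i], pts[(i + 1) % 4]) for i in range(4))
T = (p + q + r + s) / 2
brahmagupta = math.sqrt((T - p) * (T - q) * (T - r) * (T - s))
shoelace = abs(sum(pts[i][0] * pts[(i + 1) % 4][1] - pts[(i + 1) % 4][0] * pts[i][1]
                   for i in range(4))) / 2
print(brahmagupta, shoelace)
```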

527.65

proof of Erdős-Anning Theorem

Let A, B and C be three non-collinear points. For an additional point P consider the triangle ABP. By using the triangle inequality for the sides PB and PA we find −|AB| ≤ |PB| − |PA| ≤ |AB|. Likewise, for triangle BCP we get −|BC| ≤ |PB| − |PC| ≤ |BC|. Geometrically, this means the point P lies on two hyperbolas with A and B or B and C respectively as foci. Since all the lengths involved here are by assumption integers, there are only 2|AB| + 1 possibilities for |PB| − |PA| and 2|BC| + 1 possibilities for |PB| − |PC|. These hyperbolas are distinct since they don't have the same major axis. So for each pair of hyperbolas we can have at most 4 points of intersection, and there can be no more than 4(2|AB| + 1)(2|BC| + 1) points satisfying the conditions. Version: 1 Owner: lieven Author(s): lieven

527.66

proof of Heron’s formula

Let α be the angle between the sides b and c; then we get from the cosines law:

cos α = (b² + c² − a²)/(2bc).

Using the equation

sin α = √(1 − cos²α)

we get:

sin α = √(−a⁴ − b⁴ − c⁴ + 2b²c² + 2a²b² + 2a²c²) / (2bc).

Now we know:

∆ = (1/2) bc sin α.

So we get:

∆ = (1/4) √(−a⁴ − b⁴ − c⁴ + 2b²c² + 2a²b² + 2a²c²)
  = (1/4) √((a + b + c)(b + c − a)(a + c − b)(a + b − c))
  = √(s(s − a)(s − b)(s − c)).

This is Heron's formula.

Version: 2 Owner: mathwizard Author(s): mathwizard
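A numerical comparison of Heron's formula with the (1/2) bc sin α area formula, for one sample triangle (the side lengths are my choice):

```python
import math

# Heron's formula vs. the (1/2) b c sin(alpha) formula for sides 7, 8, 5.
a, b, c = 7.0, 8.0, 5.0
s = (a + b + c) / 2
heron = math.sqrt(s * (s - a) * (s - b) * (s - c))
cos_alpha = (b**2 + c**2 - a**2) / (2 * b * c)   # cosines law
area = 0.5 * b * c * math.sqrt(1 - cos_alpha**2)
print(heron, area)
```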

527.67

proof of Mollweide’s equations

We transform the equation

(a + b) sin(γ/2) = c cos((α − β)/2)

to

a cos(α/2 + β/2) + b cos(α/2 + β/2) = c cos(α/2) cos(β/2) + c sin(α/2) sin(β/2),

using the fact that γ = π − α − β. The left hand side can be further expanded, so that we get:

a (cos(α/2) cos(β/2) − sin(α/2) sin(β/2)) + b (cos(α/2) cos(β/2) − sin(α/2) sin(β/2)) = c cos(α/2) cos(β/2) + c sin(α/2) sin(β/2).

Collecting terms we get:

(a + b − c) cos(α/2) cos(β/2) − (a + b + c) sin(α/2) sin(β/2) = 0.

Using s := (a + b + c)/2 and using the half-angle equations

sin(α/2) = √((s − b)(s − c)/(bc)),    cos(α/2) = √(s(s − a)/(bc)),

and their analogues for β, we get:

2 (s(s − c)/c) √((s − a)(s − b)/(ab)) − 2 (s(s − c)/c) √((s − a)(s − b)/(ab)) = 0,

which is obviously true. So we can prove the first equation by going backwards. The second equation can be proved in quite the same way. Version: 1 Owner: mathwizard Author(s): mathwizard

527.68

proof of Ptolemy’s inequality

Looking at the quadrilateral ABCD we construct a point E, such that the triangles ACD and AEB are similar (∠ABE = ∠CDA and ∠BAE = ∠CAD).

This means that:

BE/DC = AB/AD = AE/AC,

from which it follows that

BE = (AB · DC)/AD.

Also, because ∠EAC = ∠BAD and

AE/AB = AC/AD,

the triangles EAC and BAD are similar. So we get:

EC = (AC · DB)/AD.

Now if ABCD is cyclic we get ∠ABE + ∠CBA = ∠ADC + ∠CBA = 180◦. This means that the points C, B and E are on one line and thus:

EC = EB + BC.

Now we can use the formulas we already found to get:

(AC · DB)/AD = (AB · DC)/AD + BC.

Multiplication with AD gives:

AC · DB = AB · DC + BC · AD.

Now we look at the case that ABCD is not cyclic. Then

∠ABE + ∠CBA = ∠ADC + ∠CBA ≠ 180◦,

so the points E, B and C form a triangle and from the triangle inequality we know:

EC < EB + BC.

Again we use our formulas to get:

(AC · DB)/AD < (AB · DC)/AD + BC.

From this we get:

AC · DB < AB · DC + BC · AD.

Putting this together we get Ptolemy's inequality:

AC · DB ≤ AB · DC + BC · AD,

with equality iff ABCD is cyclic. Version: 1 Owner: mathwizard Author(s): mathwizard

527.69

proof of Ptolemy’s theorem

Let ABCD be a cyclic quadrilateral. We will prove that

AC · BD = AB · CD + BC · DA.

Find a point E on BD such that ∠BCA = ∠ECD. Since ∠BAC = ∠BDC, for opening the same arc, we have the triangle similarity 4ABC ∼ 4DEC and so

AB/DE = CA/CD,

which implies AC · ED = AB · CD. Also notice that 4ADC ∼ 4BEC since they have two pairs of equal angles. The similarity implies

AC/BC = AD/BE,

which implies AC · BE = BC · DA. So we finally have

AC · BD = AC(BE + ED) = AB · CD + BC · DA.

Version: 8 Owner: drini Author(s): drini
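The theorem can be spot-checked numerically for four points taken in order on the unit circle (the angles below are arbitrary choices of mine):

```python
import math

# Ptolemy's theorem for a cyclic quadrilateral ABCD on the unit circle.
A, B, C, D = [(math.cos(t), math.sin(t)) for t in (0.5, 1.8, 3.1, 5.0)]
lhs = math.dist(A, C) * math.dist(B, D)
rhs = math.dist(A, B) * math.dist(C, D) + math.dist(B, C) * math.dist(D, A)
print(lhs, rhs)
```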


527.70

proof of Pythagorean theorem

Let ABC be a right triangle with hypotenuse BC. Draw the height AT. Using the right angles ∠BAC and ∠ATB and the fact that the sum of the angles in any triangle is 180◦, it can be shown that

∠BAT = ∠ACT
∠TAC = ∠CBA

and therefore we have the following triangle similarities:

4ABC ∼ 4TBA ∼ 4TAC.

From those similarities, we have AB/BC = TB/BA and thus AB² = BC · TB. Also AC/BC = TC/AC and thus AC² = BC · TC. We have then

AB² + AC² = BC(BT + TC) = BC · BC = BC²

which concludes the proof.

Version: 5 Owner: drini Author(s): drini

527.71

proof of Pythagorean theorem

This is a geometrical proof of the Pythagorean theorem. We begin with our triangle:

[figure: right triangle with legs a, b and hypotenuse c]

Now we use the hypotenuse as one side of a square:

[figure: the square constructed on the hypotenuse]

and draw in four more identical triangles:

[figure: the tilted square of side c surrounded by four copies of the triangle, filling a larger square of side a + b]

Now for the proof. We have a large square, with each side of length a + b, which is subdivided into one smaller square and four triangles. The area of the large square must be equal to the combined area of the shapes it is made out of, so we have

(a + b)² = c² + 4·(1/2)ab
a² + b² + 2ab = c² + 2ab
a² + b² = c²    (527.71.1)

Version: 4 Owner: drini Author(s): fiziko

527.72

proof of Simson’s line

Given a 4ABC with a point P on its circumcircle (other than A, B, C), we will prove that the feet of the perpendiculars drawn from P to the sides AB, BC, CA (or their prolongations) are collinear.

Since PW is perpendicular to BW and PU is perpendicular to BU, the point P lies on the circumcircle of 4BUW. By similar arguments, P also lies on the circumcircles of 4AWV and 4CUV. This implies that PUBW, PUCV and PVWA are all cyclic quadrilaterals. Since PUBW is a cyclic quadrilateral, ∠UPW = 180◦ − ∠UBW implies

∠UPW = 180◦ − ∠CBA.

Also CPAB is a cyclic quadrilateral, therefore

∠CPA = 180◦ − ∠CBA

(opposite angles in a cyclic quadrilateral are supplementary). From these two, we get

∠UPW = ∠CPA.

Subtracting ∠CPW, we have

∠UPC = ∠WPA.

Now, since PVWA is a cyclic quadrilateral,

∠WPA = ∠WVA


also, since UPVC is a cyclic quadrilateral,

∠UPC = ∠UVC.

Combining these two results with the previous one, we have

∠WVA = ∠UVC.

This implies that the points U, V, W are collinear. Version: 6 Owner: giri Author(s): giri

527.73

proof of Stewart’s theorem

Let θ be the angle ∠AXB. The cosines law on 4AXB says

c² = m² + p² − 2pm cos θ

and thus

cos θ = (m² + p² − c²)/(2pm).

Using the cosines law on 4AXC and noting that ψ = ∠AXC = 180◦ − θ, and thus cos θ = − cos ψ, we get

cos θ = (b² − n² − p²)/(2pn).

From the expressions above we obtain

2pn(m² + p² − c²) = 2pm(b² − n² − p²).

By cancelling 2p on both sides and collecting we are led to

m²n + mn² + p²n + p²m = b²m + c²n

and from there

mn(m + n) + p²(m + n) = b²m + c²n.

Finally, we note that a = m + n so we conclude that

a(mn + p²) = b²m + c²n.

QED

Version: 3 Owner: drini Author(s): drini
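The conclusion a(mn + p²) = b²m + c²n can be verified numerically; the coordinates below are my choice, with X on BC so that BX = m and XC = n:

```python
import math

# Numerical check of Stewart's theorem: B = (0,0), C = (a,0), X = (m,0),
# and A an arbitrary point above the side BC.
a, m = 9.0, 4.0
n = a - m
A = (2.5, 6.0)
p = math.dist(A, (m, 0.0))     # cevian AX
b = math.dist(A, (a, 0.0))     # side AC
c = math.dist(A, (0.0, 0.0))   # side AB
lhs = a * (m * n + p**2)
rhs = b**2 * m + c**2 * n
print(lhs, rhs)
```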


527.74

proof of Thales’ theorem

Let M be the center of the circle through A, B and C.

Then AM = BM = CM and thus the triangles AMC and BMC are isosceles. If ∠BMC =: α, then ∠MCB = 90◦ − α/2 and ∠CMA = 180◦ − α. Therefore ∠ACM = α/2 and ∠ACB = ∠MCB + ∠ACM = 90◦. QED. Version: 3 Owner: mathwizard Author(s): mathwizard
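A quick numerical check: any point C on a circle with diameter AB sees AB under a right angle (the position of C below is my choice):

```python
import math

# Thales' theorem: the vectors CA and CB are perpendicular when C lies
# on a circle whose diameter is AB.
A, B = (-1.0, 0.0), (1.0, 0.0)           # a diameter of the unit circle
C = (math.cos(2.2), math.sin(2.2))
u = (A[0] - C[0], A[1] - C[1])
v = (B[0] - C[0], B[1] - C[1])
dot = u[0] * v[0] + u[1] * v[1]          # 0 exactly for a right angle
print(dot)
```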

527.75

proof of butterfly theorem

Given that M is the midpoint of a chord P Q of a circle and AB and CD are two other chords passing through M, we will prove that M is the midpoint of XY, where X and Y are the points where AD and BC cut P Q respectively.

Let O be the center of the circle. Since OM is perpendicular to XY (the line from the center of the circle to the midpoint of a chord is perpendicular to the chord), to show that XM = MY, we have to prove that ∠XOM = ∠YOM. Drop perpendiculars OK and ON from O onto AD and BC, respectively. Obviously, K is the midpoint of AD and N is the midpoint of BC. Further, ∠DAB = ∠DCB and ∠ADC = ∠ABC as angles subtending equal arcs. Hence triangles ADM and CBM are similar, and hence

AD/BC = AM/CM,

or, since K and N are midpoints,

AK/CN = AM/CM.

In other words, in triangles AKM and CNM, two pairs of sides are proportional. Also the angles between the corresponding sides are equal. We infer that the triangles AKM and CNM are similar. Hence ∠AKM = ∠CNM.

Now we find that the quadrilaterals OKXM and ONYM both have a pair of opposite right angles. This implies that they are both cyclic quadrilaterals. In OKXM, we have ∠AKM = ∠XOM and in ONYM, we have ∠CNM = ∠YOM. From these two, we get ∠XOM = ∠YOM. Therefore M is the midpoint of XY. Version: 2 Owner: giri Author(s): giri

527.76

proof of double angle identity

sine:

sin(2a) = sin(a + a) = sin(a) cos(a) + cos(a) sin(a) = 2 sin(a) cos(a).

cosine:

cos(2a) = cos(a + a) = cos(a) cos(a) − sin(a) sin(a) = cos²(a) − sin²(a).

By using the identity sin²(a) + cos²(a) = 1 we can change the expression above into the alternate forms

cos(2a) = 2 cos²(a) − 1 = 1 − 2 sin²(a).

tangent:

tan(2a) = tan(a + a)
= (tan(a) + tan(a)) / (1 − tan(a) tan(a))
= 2 tan(a) / (1 − tan²(a)).

Version: 1 Owner: drini Author(s): drini

527.77

proof of parallelogram law

The proof follows directly from Apollonius theorem, noticing that each diagonal is a median for the triangles into which the parallelogram is split by the other diagonal, and that the diagonals bisect each other. Therefore, Apollonius theorem implies

u² + v² = 2(d1/2)² + d2²/2.

Multiplying both sides by 2 and simplifying leads to the desired expression. Version: 1 Owner: drini Author(s): drini
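A numerical check of 2u² + 2v² = d1² + d2² for a sample parallelogram spanned by two vectors (the vectors are my choice); the diagonals are U + V and U − V:

```python
import math

# Parallelogram law for the parallelogram spanned by vectors U and V.
U, V = (4.0, 1.0), (1.5, 3.0)
u, v = math.hypot(*U), math.hypot(*V)
d1 = math.hypot(U[0] + V[0], U[1] + V[1])
d2 = math.hypot(U[0] - V[0], U[1] - V[1])
lhs = 2 * u**2 + 2 * v**2
rhs = d1**2 + d2**2
print(lhs, rhs)
```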

527.78

proof of tangents law

To prove that

tan((A − B)/2) / tan((A + B)/2) = (a − b)/(a + b),

we start with the sines law, which says that

a/sin(A) = b/sin(B).

This implies that

a sin(B) = b sin(A).

We can write sin(A) as

sin(A) = sin((A + B)/2) cos((A − B)/2) + cos((A + B)/2) sin((A − B)/2)

and sin(B) as

sin(B) = sin((A + B)/2) cos((A − B)/2) − cos((A + B)/2) sin((A − B)/2).

Therefore, we have

a (sin((A + B)/2) cos((A − B)/2) − cos((A + B)/2) sin((A − B)/2)) = b (sin((A + B)/2) cos((A − B)/2) + cos((A + B)/2) sin((A − B)/2)).

Dividing both sides by cos((A − B)/2) cos((A + B)/2), we have

a (tan((A + B)/2) − tan((A − B)/2)) = b (tan((A + B)/2) + tan((A − B)/2)).

This gives us

a/b = (tan((A + B)/2) + tan((A − B)/2)) / (tan((A + B)/2) − tan((A − B)/2)).

Hence, forming (a − b)/(a + b) from both sides (componendo and dividendo), we find that

(a − b)/(a + b) = tan((A − B)/2) / tan((A + B)/2).

Version: 2 Owner: giri Author(s): giri
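A quick numerical sanity check of the law of tangents: build the sides from the law of sines (the overall scale is irrelevant since only ratios appear) and compare the two sides of the identity. The angle values below are arbitrary test data:

```python
import math

# Numerical sanity check of the law of tangents (a sketch).
# Sides are taken proportional to the sines of the opposite angles,
# as the law of sines allows.
def tangents_law_holds(A, B):
    a, b = math.sin(A), math.sin(B)
    lhs = math.tan((A - B) / 2) / math.tan((A + B) / 2)
    rhs = (a - b) / (a + b)
    return math.isclose(lhs, rhs)

assert tangents_law_holds(1.0, 0.6)
assert tangents_law_holds(0.3, 1.9)
```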

527.79

quadrilateral

A four-sided polygon. A very special kind of quadrilateral is the parallelogram (squares, rhombuses, rectangles, etc.), although cyclic quadrilaterals are also interesting in their own right. Notice, however, that there are quadrilaterals that are neither parallelograms nor cyclic quadrilaterals. [Graphic will go here] Version: 2 Owner: drini Author(s): drini

527.80

radius

The radius of a circle or sphere is the distance from the center of the figure to the outer edge (or surface). This definition actually holds in n dimensions, so 4-, 5-, and k-dimensional “spheres” have radii. Since a circle is really a 2-dimensional sphere, its “radius” is merely an instance of the general definition. Version: 2 Owner: akrowne Author(s): akrowne

527.81

rectangle

A parallelogram whose four angles are equal, that is, whose 4 angles are equal to 90◦. Rectangles are the only parallelograms that are also cyclic. Notice that every square is also a rectangle, but there are rectangles that are not squares. [graphic]

Any rectangle has its 2 diagonals equal (and rectangles are the only parallelograms with this property). A nice result following from this is that joining the midpoints of the sides of a rectangle always gives a rhombus. Version: 1 Owner: drini Author(s): drini

527.82

regular polygon

A regular polygon is a polygon with all its sides equal and all its angles equal, that is, a polygon that is both equilateral and equiangular. Some regular polygons get special names: a regular triangle is also known as an equilateral triangle, and a regular quadrilateral is also known as a square. The symmetry group of a regular polygon with n sides is the dihedral group Dn, which has 2n elements. Any regular polygon can be inscribed in a circle, and a circle can be inscribed within it. Given a regular polygon with n sides whose side has length t, the radius of the circumscribed circle is R = t/(2 sin(180◦/n)) and the radius of the inscribed circle is r = t/(2 tan(180◦/n)). The area can also be calculated using the formula A = nt²/(4 tan(180◦/n)). Version: 3 Owner: drini Author(s): drini
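The formulas above translate directly into code. Here is a small sketch (working in radians, so 180◦/n becomes π/n), checked against the unit square, where all three values are known independently:

```python
import math

# Circumradius, inradius (apothem), and area of a regular n-gon
# with side length t, per the formulas above.
def regular_polygon(n, t):
    R = t / (2 * math.sin(math.pi / n))
    r = t / (2 * math.tan(math.pi / n))
    A = n * t**2 / (4 * math.tan(math.pi / n))
    return R, r, A

# Sanity check with the unit square (n = 4, t = 1):
R, r, A = regular_polygon(4, 1.0)
assert math.isclose(R, math.sqrt(2) / 2)  # half the diagonal
assert math.isclose(r, 0.5)               # half the side
assert math.isclose(A, 1.0)               # area of the unit square
```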

527.83

regular polyhedron

A regular polyhedron is a polyhedron such that • Every face is a regular polygon. • At each vertex, the same number of edges meet. • The dihedral angle between any two faces is always the same.

These polyhedra are also known as Platonic solids, since Plato described them in his work. There are only 5 regular polyhedra, and they are: Tetrahedron It has 4 vertices, 6 edges and 4 faces, each one being an equilateral triangle. Its symmetry group is S4. Hexahedron Also known as the cube. It has 8 vertices, 12 edges and 6 faces, each one being a square. Its symmetry group is S4 × C2. Octahedron It has 6 vertices, 12 edges and 8 faces, each one being an equilateral triangle. Its symmetry group is S4 × C2. Dodecahedron It has 20 vertices, 30 edges and 12 faces, each one being a regular pentagon. Its symmetry group is A5 × C2. Icosahedron It has 12 vertices, 30 edges and 20 faces, each one being an equilateral triangle. Its symmetry group is A5 × C2. Here An is the alternating group on n letters, Sn is the symmetric group on n letters, and Cn is the cyclic group of order n. Version: 6 Owner: drini Author(s): drini

527.84

rhombus

A rhombus is a parallelogram with its 4 sides equal. This is not the same as being a square, since the angles need not all be equal.

In any rhombus, the diagonals are always perpendicular. A nice result following from this is that joining the midpoints of the sides always gives a rectangle. If D and d are the diagonals' lengths, then the area of the rhombus can be computed using the formula A = Dd/2. Version: 5 Owner: drini Author(s): drini


527.85

right triangle

A triangle ABC is right when one of its angles is equal to 90◦ (and therefore has two perpendicular sides).

Version: 1 Owner: drini Author(s): drini

527.86

sector of a circle

A sector is a fraction of the interior of a circle, described by a central angle θ. If θ = 2π, the sector becomes a complete circle.

If the central angle is θ, and the radius of the circle is r, then the area of the sector is given by

Area = (1/2) r² θ.

This is obvious from the fact that the area of a sector is θ/(2π) times the area of the circle (which is πr²). Note that, in the formula, θ is in radians.

Version: 1 Owner: giri Author(s): giri
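The formula is easy to exercise in code; the following sketch checks that θ = 2π recovers the full disc and that a quarter turn gives a quarter of it:

```python
import math

# Area of a circular sector with radius r and central angle theta (radians).
def sector_area(r, theta):
    return 0.5 * r**2 * theta

# theta = 2*pi recovers the full disc:
assert math.isclose(sector_area(3.0, 2 * math.pi), math.pi * 3.0**2)
# a quarter circle of radius 2:
assert math.isclose(sector_area(2.0, math.pi / 2), math.pi)
```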

527.87

sines law

Sines Law. Let ABC be a triangle where a, b, c are the sides opposite to A, B, C respectively, and let R be the radius of the circumcircle. Then the following relation holds:

a/sin A = b/sin B = c/sin C = 2R.
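For a concrete check, one can compute the sides, angles, and circumradius of a coordinate triangle and verify each ratio. The triangle below is an arbitrary example, and R is obtained from the standard relation R = abc/(4 · AREA):

```python
import math

# Verify a/sin A = b/sin B = c/sin C = 2R on a coordinate triangle.
def check_sines_law(P, Q, S):
    (x1, y1), (x2, y2), (x3, y3) = P, Q, S
    a = math.dist(Q, S)   # side opposite P
    b = math.dist(P, S)   # side opposite Q
    c = math.dist(P, Q)   # side opposite S
    area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2
    R = a * b * c / (4 * area)
    # angles via the cosines law
    A = math.acos((b**2 + c**2 - a**2) / (2 * b * c))
    B = math.acos((a**2 + c**2 - b**2) / (2 * a * c))
    C = math.acos((a**2 + b**2 - c**2) / (2 * a * b))
    for side, angle in [(a, A), (b, B), (c, C)]:
        assert math.isclose(side / math.sin(angle), 2 * R)

check_sines_law((0, 0), (4, 0), (1, 3))
```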

Version: 10 Owner: drini Author(s): drini

527.88

sines law proof

Let ABC be a triangle, and let T be a point on the circumcircle such that BT is a diameter. Then ∠A = ∠CAB is equal to ∠CT B (they subtend the same arc). Since triangle CBT is a right triangle, from the definition of sine we get

sin ∠CT B = BC/BT = a/(2R).

On the other hand, ∠CAB = ∠CT B implies their sines are the same, so

sin ∠CAB = a/(2R)

and therefore

a/sin A = 2R.

Drawing the diameters passing through C and A lets us prove in a similar way the relations

b/sin B = 2R and c/sin C = 2R,

and we conclude that

a/sin A = b/sin B = c/sin C = 2R.

Q.E.D. Version: 5 Owner: drini Author(s): drini

527.89

some proofs for triangle theorems

The sum of the three angles is A + B + C = 180◦. The following triangle shows how the angles can be found to make a half revolution, which equals 180◦.

The area formula AREA = pr, where p = (a + b + c)/2 is the semiperimeter and r is the radius of the inscribed circle, is proved by creating the triangles BAO, BCO, ACO from the original triangle ABC, where O is the incenter.

Then

AREA_ABC = AREA_BAO + AREA_BCO + AREA_ACO = ra/2 + rb/2 + rc/2 = r(a + b + c)/2 = pr.

Version: 6 Owner: Gunnar Author(s): Gunnar

527.90

square

A square is the regular 4-gon, that is, a quadrilateral whose 4 angles and 4 sides are respectively equal. This implies a square is a parallelogram that is both a rhombus and a rectangle at the same time. Notice, however, that if a quadrilateral has its 4 sides equal, we cannot generally say it is a square, since it could be a rhombus as well. If r is the length of a side, the diagonals of a square (which are equal since it is a rectangle too) have length r√2. Version: 2 Owner: drini Author(s): drini

527.91

tangents law

Let ABC be a triangle with a, b and c being the sides opposite to A, B and C respectively. Then the following relation holds:

tan((A − B)/2) / tan((A + B)/2) = (a − b)/(a + b).

Version: 2 Owner: giri Author(s): giri

527.92

triangle

Triangle. A plane figure bounded by 3 straight lines.


The sum of its three (inner) angles is always 180◦ . In the figure: A + B + C = 180◦ . Triangles can be classified according to the number of their equal sides. So, a triangle with 3 equal sides is called equilateral, triangles with 2 equal sides are isosceles and finally a triangle with no equal sides is called scalene. Notice that an equilateral triangle is also isosceles, but there are isosceles triangles that are not equilateral.

Triangles can also be classified according to the size of the greatest of its three (inner) angles. If the greatest of them is less than 90◦ (and therefore all three) we say that the triangle is acute. If the triangle has a right angle, we say that it is a right triangle. If the greatest angle of the three is greater than 90◦ , we call the triangle obtuse.

There are several ways to calculate a triangle's area. Let a, b, c be the sides and A, B, C the interior angles opposite to them. Let ha, hb, hc be the heights drawn upon a, b, c respectively, r the inradius and R the circumradius. Finally, let p = (a + b + c)/2 be the semiperimeter. Then

AREA = aha/2 = bhb/2 = chc/2
     = ab sin C/2 = bc sin A/2 = ca sin B/2
     = pr = abc/(4R)
     = √(p(p − a)(p − b)(p − c)).

The last formula is known as Heron's formula.
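Several of these formulas can be cross-checked on a concrete triangle. The vertices below are an arbitrary choice, and the direct coordinate (shoelace) area serves as the reference value:

```python
import math

# Cross-check the area formulas on the triangle with vertices
# (0,0), (4,0), (1,3), an arbitrary example whose area is 6.
A_, B_, C_ = (0, 0), (4, 0), (1, 3)
a = math.dist(B_, C_)            # side opposite A
b = math.dist(A_, C_)            # side opposite B
c = math.dist(A_, B_)            # side opposite C
p = (a + b + c) / 2              # semiperimeter

shoelace = abs(4 * 3 - 1 * 0) / 2                        # direct area from coordinates
heron = math.sqrt(p * (p - a) * (p - b) * (p - c))       # Heron's formula
angle_C = math.acos((a**2 + b**2 - c**2) / (2 * a * b))  # angle at C
assert math.isclose(heron, shoelace)
assert math.isclose(a * b * math.sin(angle_C) / 2, shoelace)
# inradius and circumradius then follow from AREA = pr and AREA = abc/(4R):
r, R = heron / p, a * b * c / (4 * heron)
```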

Version: 18 Owner: drini Author(s): drini

527.93

triangle center

On every triangle there are points where special lines or circles intersect, and those points usually have very interesting geometrical properties. Such points are called triangle centers. Some examples of triangle centers are incenter, orthocenter, centroid, circumcenter, excenters, Feuerbach point, Fermat points, etc.

For an online reference please check the Triangle Centers page. Here’s a drawing I made showing the most important lines and centers of a triangle

(XEukleides source code for the drawing)

Version: 5 Owner: drini Author(s): drini


Chapter 528 51-01 – Instructional exposition (textbooks, tutorial papers, etc.) 528.1

geometry

Geometry, or literally, the measurement of land, is among the oldest and largest areas of mathematics. For this reason, a precise definition of what geometry is is quite difficult. Some approaches are listed below.

528.1.1

As invariants under certain transformations

One approach to geometry, first formulated by Felix Klein, is to describe it as the study of invariants under certain allowed transformations. This involves taking our space as a set S, and considering a subgroup G of the group Bij(S), the set of bijections of S. Objects are subsets of S, and we consider two objects A, B ⊂ S to be equivalent if there is an f ∈ G such that f (A) = B.

528.1.2

Basic Examples

Euclidean Geometry Euclidean geometry deals with Rn as a vector space along with a metric d. The allowed transformations are bijections f : Rn → Rn that preserve the metric, that is, d(x, y) = d(f (x), f (y)) for all x, y ∈ Rn. Such maps are called isometries, and the group is often denoted by Iso(Rn). Defining a norm by |x| = d(x, 0) for x ∈ Rn, we obtain a notion of length or distance. The polarization identity ⟨x, y⟩ = (|x|² + |y|² − |x − y|²)/2 then gives an inner product, leading to the definition of the angle between two vectors x, y ∈ Rn as ∠xy = cos⁻¹(⟨x, y⟩ / (|x| · |y|)). It is clear that since isometries preserve the metric, they preserve distance and angle. As an example, it can be shown that the group Iso(R2) consists of translations, reflections, glides, and rotations.
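As a minimal numeric illustration (not a classification proof), a plane rotation is an isometry and therefore preserves distances; the points and angle below are arbitrary:

```python
import math

# A rotation of R^2 about the origin preserves the Euclidean metric.
def rotate(p, theta):
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

p, q = (1.0, 2.0), (-3.0, 0.5)
theta = 0.83
assert math.isclose(math.dist(p, q),
                    math.dist(rotate(p, theta), rotate(q, theta)))
```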

Projective Geometry Projective geometry was motivated by how we see objects in everyday life. For example, parallel train tracks appear to meet at a point far away, even though they are always the same distance apart. In projective geometry, the primary invariant is that of incidence. The notion of parallelism and distance is not present as with Euclidean geometry. There are different ways of approaching projective geometry. One way is to add points at infinity to Euclidean space. For example, we may form the projective line by adding a point at infinity ∞, called the ideal point, to R. We can then create the projective plane where for each line l ⊂ R2, we attach an ideal point, and two ordinary lines have the same ideal point if and only if they are parallel. The projective plane then consists of the regular plane R2 along with the ideal line, which consists of all ideal points of all ordinary lines. The idea here is to make central projection from a point, sending one line to another, a bijective map. Another approach is more algebraic, where we form P (V ) where V is a vector space. When V = Rn+1, we take the quotient of Rn+1 − {0} by the relation v ∼ λ · v for v ∈ Rn+1 − {0}, λ ∈ R − {0}. The allowed transformations form the group PGL(Rn+1), which is the general linear group modulo the subgroup of scalar matrices.

Spherical Geometry Spherical geometry deals with restricting our attention in Euclidean space to the unit sphere S n . The role of straight lines is taken by great circles. Notions of distance and angles can be easily developed, as well as spherical laws of cosines, the law of sines, and spherical triangles.

528.1.3

Differential Geometry

Differential geometry studies geometrical objects using techniques of calculus. Gauss founded much of the area with his paper “Disquisitiones generales circa superficies curvas”. Objects of study in differential geometry are curves and surfaces in space. Some properties of curves that are examined include arc length and curvature, which tells us how quickly a curve changes shape. Many notions of hypersurface theory can be generalized to the setting of differentiable manifolds. The motivation is the desire to be able to work without coordinates, as they are often unimportant to the problem at hand. This leads to the study of Riemannian manifolds - manifolds with enough structure to be able to differentiate vectors in a natural way.


528.1.4

Axiomatic Method

This method develops geometry from a list of explicit axioms, in the manner of Euclid's Elements and Hilbert's Foundations of Geometry.

Note This entry is very rough at the moment, and requires work. I mainly wrote it to help motivate other entries and to let others work on this entry, if it is at all feasible. Please feel free to help out, including making suggestions, deleting things, adding things, etc. Version: 5 Owner: rmilson Author(s): yark, matte, dublisk


Chapter 529 51-XX – Geometry 529.1

non-Euclidean geometry

A non-Euclidean geometry is a geometry in which Euclid's parallel postulate fails, so that there is not a unique line through a given point that fails to meet a given line not containing that point. If there is more than one such “parallel”, the geometry is called hyperbolic; if there is no line which fails to meet the given one, the geometry is called elliptic. Version: 2 Owner: vladm Author(s): vladm

529.2

parallel postulate

The parallel postulate is Euclid’s fifth postulate: equivalent to the idea that there is a unique parallel to any line through a point not on the line. Version: 1 Owner: vladm Author(s): vladm


Chapter 530 51A05 – General theory and projective geometries 530.1

Ceva’s theorem

Let ABC be a given triangle and P any point of the plane. If X is the intersection point of AP with BC, Y the intersection point of BP with CA, and Z the intersection point of CP with AB, then

(AZ/ZB) · (BX/XC) · (CY /Y A) = 1.

Conversely, if X, Y, Z are points on BC, CA, AB respectively, and if

(AZ/ZB) · (BX/XC) · (CY /Y A) = 1,

then AX, BY, CZ are concurrent. Remarks: All the segments are directed line segments (that is, AB = −BA), and the intersection points may lie on the prolongations of the segments. Version: 8 Owner: drini Author(s): drini
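Ceva's theorem can be verified numerically for a concrete configuration. The triangle and the point P below are arbitrary, and each signed ratio is read off from the parameter of the point along its segment:

```python
import math

# Numeric check of Ceva's theorem on an arbitrary triangle and point P.
def intersect(P1, P2, P3, P4):
    """Intersection of lines P1P2 and P3P4 (assumed non-parallel)."""
    x1, y1 = P1; x2, y2 = P2; x3, y3 = P3; x4, y4 = P4
    d = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def ratio(P1, Q, P2):
    """Signed ratio P1Q/QP2 for a point Q on line P1P2."""
    dx, dy = P2[0] - P1[0], P2[1] - P1[1]
    t = ((Q[0] - P1[0]) * dx + (Q[1] - P1[1]) * dy) / (dx * dx + dy * dy)
    return t / (1 - t)

A, B, C, P = (0, 0), (5, 0), (1, 4), (2, 1)
X = intersect(A, P, B, C)   # AP meets BC
Y = intersect(B, P, C, A)   # BP meets CA
Z = intersect(C, P, A, B)   # CP meets AB
product = ratio(A, Z, B) * ratio(B, X, C) * ratio(C, Y, A)
assert math.isclose(product, 1.0)
```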

530.2

Menelaus’ theorem

If the points X, Y and Z are on the sides of a triangle ABC (including their prolongations) and collinear, then the equation

(AZ/ZB) · (BY /Y C) · (CX/XA) = −1

holds (all segments are directed line segments). The converse of this theorem also holds (thus: three points on the triangle's sides or their prolongations are collinear if the above equation holds).

Version: 3 Owner: mathwizard Author(s): mathwizard

530.3

Pappus’s theorem

Let A, B, C be points on a line (not necessarily in that order) and let D, E, F be points on another line (not necessarily in that order). Then the intersection points of AD with F C, DB with CE, and BF with EA are collinear. This is a special case of Pascal's mystic hexagram.
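A numeric check of the statement for one arbitrary configuration (two horizontal lines, with coordinates chosen so that no pair of the cross-lines is parallel):

```python
# Numeric check of Pappus's theorem: A, B, C lie on y = 0 and D, E, F
# on y = 2; the three intersection points named in the statement must
# be collinear.
def intersect(P1, P2, P3, P4):
    """Intersection of lines P1P2 and P3P4 (assumed non-parallel)."""
    x1, y1 = P1; x2, y2 = P2; x3, y3 = P3; x4, y4 = P4
    d = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def collinear(u, v, w, eps=1e-9):
    """True when the signed area of triangle uvw vanishes."""
    return abs((v[0]-u[0]) * (w[1]-u[1]) - (v[1]-u[1]) * (w[0]-u[0])) < eps

A, B, C = (0.0, 0.0), (3.0, 0.0), (4.0, 0.0)
D, E, F = (0.0, 2.0), (2.5, 2.0), (5.0, 2.0)
P1 = intersect(A, D, F, C)   # AD meets FC
P2 = intersect(D, B, C, E)   # DB meets CE
P3 = intersect(B, F, E, A)   # BF meets EA
assert collinear(P1, P2, P3)
```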

Version: 2 Owner: drini Author(s): drini

530.4

proof of Ceva’s theorem

As in the theorem, we will consider directed line segments. Let X, Y, Z be points on BC, CA, AB respectively such that AX, BY, CZ are concurrent, and let P be their point of concurrency. Draw a parallel to AB through the point C. Extend AX until it intersects the parallel at a point A′. Construct B′ in a similar way, extending BY. The triangles ABX and A′CX are similar, and so are ABY and CB′Y. Then the following equalities hold:

BX/XC = AB/CA′,   CY /Y A = CB′/BA

and thus

(BX/XC) · (CY /Y A) = (AB/CA′) · (CB′/BA) = CB′/A′C.   (530.4.1)

Notice that if directed segments are being used, AB and BA have opposite signs, and therefore when cancelled they change the sign of the expression. That is why we changed CA′ to A′C. Now we turn to consider the following similarities: AZP ∼ A′CP and BZP ∼ B′CP. From them we get the equalities

CP/ZP = A′C/AZ,   CP/ZP = CB′/ZB

which lead to

AZ/ZB = A′C/CB′.

Multiplying the last expression with (530.4.1) gives

(AZ/ZB) · (BX/XC) · (CY /Y A) = 1

and we conclude the proof.

To prove the converse, suppose that X, Y, Z are points on BC, CA, AB respectively, satisfying

(AZ/ZB) · (BX/XC) · (CY /Y A) = 1.

Let Q be the intersection point of AX with BY, and let Z′ be the intersection of CQ with AB. Since then AX, BY, CZ′ are concurrent, we have

(AZ′/Z′B) · (BX/XC) · (CY /Y A) = 1

and thus

AZ′/Z′B = AZ/ZB,

which implies Z = Z′, and therefore AX, BY, CZ are concurrent.

Version: 1 Owner: drini Author(s): drini

530.5

proof of Menelaus’ theorem

First we note that there are two different cases: either the line connecting X, Y and Z intersects two sides of the triangle, or none of them. In the first case, where it intersects two of the triangle's sides, we get the following picture:

From this we obtain (h1, h2 and h3 being undirected):

AZ/ZB = −h1/h2,   BY /Y C = h2/h3,   CX/XA = h3/h1.

Multiplying all these we get:

(AZ/ZB) · (BY /Y C) · (CX/XA) = −(h1/h2) · (h2/h3) · (h3/h1) = −1.

The second case is that the line connecting X, Y and Z does not intersect any of the triangle's sides:

In this case we get:

AZ/ZB = −h1/h2,   BY /Y C = −h2/h3,   CX/XA = −h3/h1.

So multiplication again yields Menelaus' theorem. Version: 1 Owner: mathwizard Author(s): mathwizard

530.6

proof of Pappus’s theorem

Pappus’s theorem says that if the six vertices of a hexagon lie alternately on two lines, then the three points of intersection of opposite sides are collinear. In the figure, the given lines are A11 A13 and A31 A33 , but we have omitted the letter A. The appearance of the diagram will depend on the order in which the given points appear on the two lines; two possibilities are shown. Pappus’s theorem is true in the affine plane over any (commutative) field. A tidy proof is available with the aid of homogeneous coordinates. No three of the four points A11 , A21 , A31 , and A13 are collinear, and therefore we can choose homogeneous coordinates such that A11 = (1, 0, 0)

A21 = (0, 1, 0)

A31 = (0, 0, 1)

A13 = (1, 1, 1)

That gives us equations for three of the lines in the figure: A13 A11 : y = z

A13 A21 : z = x

A13 A31 : x = y .

These lines contain A12 , A32 , and A22 respectively, so A12 = (p, 1, 1)

A32 = (1, q, 1)

A22 = (1, 1, r)

for some scalars p, q, r. So, we get equations for six more lines: A31 A32 : y = qx

A11 A22 : z = ry

A12 A21 : x = pz

(530.6.1)

A31 A12 : x = py

A11 A32 : y = qz

A21 A22 : z = rx

(530.6.2)

By hypothesis, the three lines (530.6.1) are concurrent, and therefore prq = 1. But that implies pqr = 1, and therefore the three lines (530.6.2) are concurrent, QED. Version: 3 Owner: mathcam Author(s): Larry Hammick

530.7

proof of Pascal’s mystic hexagram

We can choose homogeneous coordinates (x, y, z) such that the equation of the given nonsingular conic is yz + zx + xy = 0, or equivalently z(x + y) = −xy

(530.7.1)

and the vertices of the given hexagon A1 A5 A3 A4 A2 A6 are A1 = (x1 , y1 , z1 ) A2 = (x2 , y2 , z2 ) A3 = (x3 , y3 , z3 )

A4 = (1, 0, 0) A5 = (0, 1, 0) A6 = (0, 0, 1)

(see Remarks below). The equations of the six sides, arranged in opposite pairs, are then A1 A5 : x1 z = z1 x A4 A2 : y2 z = z2 y A5 A3 : x3 z = z3 x A2 A6 : y2 x = x2 y A3 A4 : z3 y = y3 z A6 A1 : y1 x = x1 y and the three points of intersection of pairs of opposite sides are A1 A5 · A4 A2 = (x1 z2 , z1 y2 , z1 z2 ) A5 A3 · A2 A6 = (x2 x3 , y2 x3 , x2 z3 ) A3 A4 · A6 A1 = (y3 x1 , y3 y1 , z3 y1 )

To say that these are collinear is to say that the determinant

D = | x1 z2   z1 y2   z1 z2 |
    | x2 x3   y2 x3   x2 z3 |
    | y3 x1   y3 y1   z3 y1 |

is zero. We have

D = x1 y1 y2 z2 z3 x3 − x1 y1 z2 x2 y3 z3 + z1 x1 x2 y2 y3 z3 − y1 z1 x2 y2 z3 x3 + y1 z1 z2 x2 x3 y3 − z1 x1 y2 z2 x3 y3.

Using (530.7.1) we get (x1 + y1)(x2 + y2)(x3 + y3)D = x1 y1 x2 y2 x3 y3 S, where (x1 + y1)(x2 + y2)(x3 + y3) ≠ 0 and

S = (x1 + y1)(y2 x3 − x2 y3) + (x2 + y2)(y3 x1 − x3 y1) + (x3 + y3)(y1 x2 − x1 y2) = 0.

QED. Remarks:For more on the use of coordinates in a projective plane, see e.g. Hirst (an 11-page PDF). A synthetic proof (without coordinates) of Pascal’s theorem is possible with the aid of cross ratios or the related notion of harmonic sets (of four collinear points). Pascal’s proof is lost; presumably he had only the real affine plane in mind. A proof restricted to that case, based on Menelaus’ theorem, can be seen at cut-the-knot.org. Version: 1 Owner: mathcam Author(s): Larry Hammick


Chapter 531 51A30 – Desarguesian and Pappian geometries 531.1

Desargues’ theorem

If ABC and XY Z are two triangles in perspective (that is, AX, BY and CZ are concurrent or parallel) then the points of intersection of the three pairs of lines (BC, Y Z), (CA, ZX), (AB, XY ) are collinear. Also, if ABC and XY Z are triangles with distinct vertices and the intersection of BC with Y Z, the intersection of CA with ZX and the intersection of AB with XY are three collinear points, then the triangles are in perspective. (XEukleides source code for the drawing)

Version: 6 Owner: drini Author(s): drini

531.2

proof of Desargues’ theorem

The claim is that if triangles ABC and XY Z are perspective from a point P , then they are perspective from a line (meaning that the three points AB · XY

BC · Y Z

CA · ZX

are collinear) and conversely. Since no three of A, B, C, P are collinear, we can lay down homogeneous coordinates such that A = (1, 0, 0) B = (0, 1, 0) C = (0, 0, 1) P = (1, 1, 1)

By hypothesis, there are scalars p, q, r such that X = (1, p, p)

Y = (q, 1, q)

Z = (r, r, 1)

The equation for a line through (x1, y1, z1) and (x2, y2, z2) is

(y1 z2 − z1 y2)x + (z1 x2 − x1 z2)y + (x1 y2 − y1 x2)z = 0,

giving us equations for six lines:

AB : z = 0
BC : x = 0
CA : y = 0
XY : (pq − p)x + (pq − q)y + (1 − pq)z = 0
YZ : (1 − qr)x + (qr − q)y + (qr − r)z = 0
ZX : (rp − p)x + (1 − rp)y + (rp − r)z = 0

whence

AB · XY = (pq − q, −pq + p, 0)
BC · Y Z = (0, qr − r, −qr + q)
CA · ZX = (−rp + r, 0, rp − p).

As claimed, these three points are collinear, since the determinant

| pq − q    −pq + p   0       |
| 0         qr − r    −qr + q |
| −rp + r   0         rp − p  |

is zero. (More precisely, all three points are on the line

p(q − 1)(r − 1)x + (p − 1)q(r − 1)y + (p − 1)(q − 1)rz = 0 .) Since the hypotheses are self-dual, the converse is true also, by the principle of duality. Version: 2 Owner: drini Author(s): Larry Hammick


Chapter 532 51A99 – Miscellaneous 532.1

Pick’s theorem

Let P ⊂ R2 be a polygon with all vertices on lattice points on the grid Z2. Let I be the number of lattice points that lie inside P , and let O be the number of lattice points that lie on the boundary of P . Then the area of P is

A(P ) = I + O/2 − 1.

In the above example, we have I = 5 and O = 13, so the area is A = 10 1/2; inspection shows this is true. Version: 1 Owner: ariels Author(s): ariels
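Pick's theorem is easy to test in code: the shoelace formula gives the area directly, the boundary lattice points on each edge are counted with a gcd, and the interior count predicted by Pick is compared against brute force. The rectangle below is an arbitrary example:

```python
from math import gcd

# Check Pick's theorem on a small lattice polygon (vertices in order).
def shoelace_area(vertices):
    n = len(vertices)
    s = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def boundary_points(vertices):
    # a segment between lattice points contains gcd(|dx|, |dy|) lattice
    # steps, so summing over the edges counts each boundary point once
    n = len(vertices)
    return sum(gcd(abs(vertices[(i + 1) % n][0] - vertices[i][0]),
                   abs(vertices[(i + 1) % n][1] - vertices[i][1]))
               for i in range(n))

poly = [(0, 0), (4, 0), (4, 3), (0, 3)]   # a 4 x 3 rectangle
A = shoelace_area(poly)                   # 12
O = boundary_points(poly)                 # 14
I = A - O / 2 + 1                         # interior points by Pick: 6
assert O == 14 and I == 6
# brute-force count of interior lattice points agrees
count = sum(1 for x in range(1, 4) for y in range(1, 3))
assert count == I
```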

532.2

proof of Pick’s theorem

Pick's theorem: Let P ⊂ R2 be a polygon with all vertices on lattice points on the grid Z2. Let I be the number of lattice points that lie inside P , and let O be the number of lattice points that lie on the boundary of P . Then the area of P is

A(P ) = I + O/2 − 1.


To prove, we shall first show that Pick’s theorem has an additive character. Suppose our polygon has more than 3 vertices. Then we can divide the polygon P into 2 polygons P1 and P2 such that their interiors do not meet. Both have fewer vertices than P. We claim that the validity of Pick’s theorem for P is equivalent to the validity of Pick’s theorem for P1 and P2 . Denote the area, number of interior lattice points and number of boundary lattice points for Pk by Ak , Ik and Ok , respectively, for k = 1, 2. Clearly A = A1 + A2 . Also, if we denote the number of lattice points on the edges common to P1 and P2 by L, then I = I1 + I2 + L − 2 and O = O1 + O2 − 2L + 2 Hence

I + O/2 − 1 = I1 + I2 + L − 2 + (O1 + O2)/2 − L + 1 − 1 = (I1 + O1/2 − 1) + (I2 + O2/2 − 1).

This proves the claim. Therefore we can triangulate P and it suffices to prove Pick's theorem for triangles. Moreover, by further triangulations we may assume that there are no lattice points on the boundary of the triangle other than the vertices. To prove Pick's theorem for such triangles, embed them into rectangles.

Again by additivity, it suffices to prove Pick's theorem for rectangles and for rectangular triangles which have no lattice points on the hypotenuse and whose other two sides are parallel to the coordinate axes. If these two sides have lengths a and b, respectively, we have A = ab/2 and O = a + b + 1. Furthermore, by thinking of the triangle as half of a rectangle, we get I = (a − 1)(b − 1)/2. (Note that here it is essential that no lattice points are on the hypotenuse.) From these equations for A, I and O, Pick's theorem is satisfied for these triangles.

Finally, for a rectangle whose sides have lengths a and b, we find that A = ab, I = (a − 1)(b − 1) and O = 2a + 2b. From these Pick's theorem follows for rectangles too. This completes our proof. Version: 2 Owner: giri Author(s): giri


Chapter 533 51F99 – Miscellaneous 533.1

Weizenbock’s Inequality

In a triangle with sides a, b, c, and with area A, the following inequality holds:

a² + b² + c² ≥ 4A√3.

The proof goes like this: if s = (a + b + c)/2 is the semiperimeter of the triangle, then from Heron's formula we have:

A = √(s(s − a)(s − b)(s − c)).

But by squaring the latter and expanding the parentheses we obtain:

16A² = 2(a²b² + a²c² + b²c²) − (a⁴ + b⁴ + c⁴).

Thus, we only have to prove that:

(a² + b² + c²)² ≥ 3[2(a²b² + a²c² + b²c²) − (a⁴ + b⁴ + c⁴)]

or equivalently:

4(a⁴ + b⁴ + c⁴) ≥ 4(a²b² + a²c² + b²c²)

which is trivially equivalent to:

(a² − b²)² + (a² − c²)² + (b² − c²)² ≥ 0.

Equality is achieved if and only if a = b = c (i.e. when the triangle is equilateral). See also the Hadwiger-Finsler inequality. Version: 2 Owner: mathcam Author(s): mathcam, slash
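The inequality can be checked numerically via Heron's formula; the side lengths below are arbitrary valid triangles, with the equilateral case confirming equality:

```python
import math

# Gap a^2 + b^2 + c^2 - 4*sqrt(3)*A, which the inequality says is >= 0.
def weitzenbock_gap(a, b, c):
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron
    return a**2 + b**2 + c**2 - 4 * math.sqrt(3) * area

assert weitzenbock_gap(3, 4, 5) > 0
assert weitzenbock_gap(2, 3, 4) > 0
# equality (up to rounding) for the equilateral triangle
assert abs(weitzenbock_gap(1, 1, 1)) < 1e-12
```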


Chapter 534 51M04 – Elementary problems in Euclidean geometries 534.1

Napoleon’s theorem

Theorem: If equilateral triangles are erected externally on the three sides of any given triangle, then their centres are the vertices of an equilateral triangle.

If we embed the statement in the complex plane, the proof is a mere calculation. In the notation of the figure, we can assume that A = 0, B = 1, and C is in the upper half plane. The hypotheses are

(C − 1)/(X − 1) = (0 − C)/(Y − C) = (1 − 0)/(Z − 0) = α   (534.1.1)

where α = exp(πi/3), and the conclusion we want is

(N − L)/(M − L) = α   (534.1.2)

where

L = (1 + X + C)/3,   M = (C + Y + 0)/3,   N = (0 + 1 + Z)/3.

From (534.1.1) and the relation α² = α − 1, we get X, Y, Z:

X = (C − 1)/α + 1 = (1 − α)C + α
Y = −C/α + C = αC
Z = 1/α = 1 − α

and so

3(M − L) = Y − 1 − X = (2α − 1)C − 1 − α

3(N − L) = Z − X − C
= (α − 2)C + 1 − 2α
= (2α − 2 − α)C − α + 1 − α
= (2α² − α)C − α − α²
= 3(M − L)α,

proving (534.1.2).

Remarks: The attribution to Napoléon Bonaparte (1769-1821) is traditional, but dubious. For more on the story, see MathPages. Version: 2 Owner: drini Author(s): Larry Hammick
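The calculation above is easy to replay numerically: with A = 0, B = 1 and α = exp(πi/3), the centres L, M, N come out equilateral for any choice of C (the sample values below are arbitrary points in the upper half plane):

```python
import cmath, math

# Following the proof's normalization A = 0, B = 1, C in the upper
# half plane: erect the outer equilateral triangles, take their
# centroids L, M, N, and check that LMN is equilateral.
alpha = cmath.exp(1j * math.pi / 3)

def napoleon_centers(C):
    X = (1 - alpha) * C + alpha    # apex erected on BC
    Y = alpha * C                  # apex erected on CA
    Z = 1 - alpha                  # apex erected on AB
    L = (1 + X + C) / 3
    M = (C + Y + 0) / 3
    N = (0 + 1 + Z) / 3
    return L, M, N

for C in [0.3 + 1.1j, -0.7 + 0.4j, 2.0 + 2.5j]:
    L, M, N = napoleon_centers(C)
    # equilateral: all three sides have equal length
    assert math.isclose(abs(M - L), abs(N - M))
    assert math.isclose(abs(N - M), abs(L - N))
```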

534.2

corollary of Morley’s theorem

We describe here, informally, a limiting case of Morley’s theorem. One of the vertices of the triangle ABC, namely C, has been pushed off to infinity. Instead of two segments BC and CA, plus two trisectors between them, we now have four parallel and equally spaced lines. The triangle P QR is still equilateral, and the three triangles adjacent to it are still isosceles, but one of those has become equilateral. We have AQ · BR = AR · BP . Version: 5 Owner: drini Author(s): drini, Larry Hammick

534.3

pivot theorem

If ABC is a triangle, D, E, F points on the sides BC, CA, AB respectively, then the circumcircles of the triangles AEF, BF D and CDE have a common point. Version: 4 Owner: drini Author(s): drini

534.4

proof of Morley’s theorem


The scheme of this proof, due to A. Letac, is to use the sines law to get formulas for the segments AR, AQ, BP , BR, CQ, and CP , and then to apply the cosines law to the triangles ARQ, BP R, and CQP , getting RQ, P R, and QP . To simplify some formulas, let us denote the angle π/3, or 60 degrees, by σ. Denote the angles at A, B, and C by 3a, 3b, and 3c respectively, and let R be the circumradius of ABC. We have BC = 2R sin(3a). Applying the sines law to the triangle BP C, BP/ sin(c) = BC/ sin(π − b − c) = 2R sin(3a)/ sin(b + c) = 2R sin(3a)/ sin(σ − a)

(534.4.1)

so BP = 2R sin(3a) sin(c)/ sin(σ − a) .

Combining that with the identity

sin(3a) = 4 sin(a) sin(σ + a) sin(σ − a), we get BP = 8R sin(a) sin(c) sin(σ + a). Similarly, BR = 8R sin(c) sin(a) sin(σ + c). Using the cosines law now, P R² = BP² + BR² − 2BP · BR cos(b) = 64R² sin²(a) sin²(c)[sin²(σ + a) + sin²(σ + c) − 2 sin(σ + a) sin(σ + c) cos(b)].

But we have

(σ + a) + (σ + c) + b = π . whence the cosines law can be applied to those three angles, getting sin2 (b) = sin2 (σ + a) + sin2 (σ + c) − 2 sin(σ + a) sin(σ + c) cos(b) whence P R = 8R sin(a) sin(b) sin(c) . Since this expression is symmetric in a, b, and c, we deduce P R = RQ = QP as claimed. Remarks:It is not hard to show that the triangles RY P , P ZQ, and QXR are isoscoles. By the sines law we have AR BR = sin b sin a

BP CP = sin c sin b 1942

CQ AQ = sin a sin c

whence AR · BP · CQ = AQ · BR · CP. This implies that if we identify the various vertices with complex numbers, then

(P − C)(Q − A)(R − B) = ((−1 + i√3)/2) (P − B)(Q − C)(R − A)

provided that the triangle ABC has positive orientation, i.e.

Im((C − A)/(B − A)) > 0.

I found Letac's proof at cut-the-knot.org, with the reference Sphinx, 9 (1939) 46. Several shorter and prettier proofs of Morley's theorem can also be seen at cut-the-knot. Version: 3 Owner: mathcam Author(s): Larry Hammick

534.5

proof of pivot theorem

Let △ABC be a triangle, and let D, E, and F be points on BC, CA, and AB, respectively. The circumcircles of △AEF and △BF D intersect in F and in another point, which we call P . Then AEP F and BF P D are cyclic quadrilaterals, so ∠A + ∠EP F = π and ∠B + ∠F P D = π. Combining this with ∠A + ∠B + ∠C = π and ∠EP F + ∠F P D + ∠DP E = 2π, we get ∠C + ∠DP E = π. This implies that CDP E is a cyclic quadrilateral as well, so that P lies on the circumcircle of △CDE. Therefore, the circumcircles of the triangles AEF , BF D, and CDE have a common point, P .

Version: 1 Owner: pbruin Author(s): pbruin


Chapter 535 51M05 – Euclidean geometries (general) and generalizations 535.1

area of the n-sphere

The area of S^n, the unit n-sphere (or hypersphere), is the same as the total solid angle it subtends at the origin. To calculate it, consider the following integral:

I(n) = ∫_{R^{n+1}} e^{−(x_1² + ··· + x_{n+1}²)} d^{n+1}x.

Switching to polar coordinates, we let r² = x_1² + ··· + x_{n+1}², and the integral becomes

I(n) = ∫_{S^n} dΩ ∫_0^∞ r^n e^{−r²} dr.

The first integral is the integral over all solid angles and is exactly what we want to evaluate; let us denote it by A(n). The second integral can be evaluated with the change of variable t = r²:

I(n)/A(n) = (1/2) ∫_0^∞ t^{(n−1)/2} e^{−t} dt = (1/2) Γ((n + 1)/2).

We can also evaluate I(n) directly in Cartesian coordinates:

I(n) = [∫_{−∞}^∞ e^{−x²} dx]^{n+1} = π^{(n+1)/2},

where we have used the standard Gaussian integral ∫_{−∞}^∞ e^{−x²} dx = √π. Finally, we can solve for the area:

A(n) = 2π^{(n+1)/2} / Γ((n + 1)/2).

If the radius of the sphere is R and not 1, the correct area is A(n)R^n. Note that this formula works only for n ≥ 1. The first few special cases are

n=1 Γ(1) = 1, hence A(1) = 2π (this is the familiar result for the circumference of the unit circle); √ n=2 Γ(3/2) = π/2, hence A(2) = 4π (this is the familiar result for the area of the unit sphere); n=3 Γ(2) = 1, hence A(3) = 2π 2 ; √ n=4 Γ(5/2) = 3 π/4, hence A(4) = 8π 2 /3. Version: 4 Owner: igor Author(s): igor
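The closed-form area is easy to check numerically. The following sketch (not part of the original entry; the function name is ours) evaluates A(n)R^n using Python's standard gamma function:

```python
import math

def sphere_area(n, R=1.0):
    # A(n) = 2 * pi^((n+1)/2) / Gamma((n+1)/2), scaled by R^n for radius R
    return 2 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2) * R ** n

# Special cases listed above:
# A(1) = 2*pi, A(2) = 4*pi, A(3) = 2*pi^2, A(4) = 8*pi^2/3
```
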

535.2

geometry of the sphere

Every loop on the sphere S^2 is contractible to a point, so its fundamental group, π_1(S^2), is trivial. Let H_n(S^2, Z) denote the n-th homology group of S^2. S^2 is a compact orientable smooth manifold, so H_2(S^2, Z) = Z. S^2 is connected, so H_0(S^2, Z) = Z. H_1(S^2, Z) is the abelianization of π_1(S^2), and so is again trivial. S^2 is two-dimensional, so for k > 2 we have H_k(S^2, Z) = 0. This generalizes nicely:

H_k(S^n; Z) = Z for k = 0 or k = n, and 0 otherwise.

This also provides the proof that the hyperspheres S^n and S^m are non-homotopic for n ≠ m, for a homotopy equivalence would imply an isomorphism between their homologies. Version: 6 Owner: mathcam Author(s): mathcam

535.3

sphere

A sphere is defined as the locus of the points in three dimensions that are equidistant from a particular point (the center). The equation for a sphere centered at the origin is

x^2 + y^2 + z^2 = r^2,

where r is the length of the radius. The formula for the volume of a sphere is

V = (4/3) π r^3,

and the formula for the surface area of a sphere is

A = 4 π r^2.

A sphere can be generalized to n dimensions. For n > 3, a generalized sphere is called a hypersphere (when no value of n is given, one can generally assume that "hypersphere" means n = 4). The formula for an n-dimensional sphere is

x_1^2 + x_2^2 + · · · + x_n^2 = r^2,

where r is the length of the radius. Note that when n = 2, the formula reduces to the formula for a circle, so a circle is a 2-dimensional "sphere". A one-dimensional (filled-in) sphere is a line segment!

The volume of an n-dimensional sphere is

V(n) = π^{n/2} r^n / Γ(n/2 + 1),

where Γ is the gamma function. Curiously, as n approaches infinity, the volume of the n-dimensional sphere approaches zero! Contrast this to the volume of an n-dimensional box, which always has a volume in proportion to s^n (with s the length of the longest dimension of the box), which clearly increases without bound. For r = 1, V(n) has a maximum at about n = 5.

In topology and other contexts, spheres are treated slightly differently. Let the n-sphere be the set

S^n = {x ∈ R^{n+1} : ||x|| = 1},

where || · || can be any norm, usually the Euclidean norm. Notice that S^n is defined here as a subset of R^{n+1}. Thus S^0 is two points on the real line; S^1 is the unit circle, and S^2 is the unit sphere in the everyday sense of the word. It might seem like a strange naming convention to say, for instance, that the 2-sphere is in three-dimensional space. The explanation is that 2 refers to the sphere's "intrinsic" dimension as a manifold, not the dimension of whatever space in which it happens to be immersed.

Sometimes this definition is generalized even more. In topology we usually fail to distinguish homeomorphic spaces, so all homeomorphic images of S^n into any topological space are also called S^n. It is usually clear from context whether S^n denotes the specific unit sphere in R^{n+1} or some arbitrary homeomorphic image. Version: 10 Owner: akrowne Author(s): akrowne, NeuRet
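The claims about V(n), that it peaks near n = 5 for r = 1 and then shrinks toward zero, can be checked in a few lines (a sketch; the function name is ours):

```python
import math

def ball_volume(n, r=1.0):
    # V(n) = pi^(n/2) * r^n / Gamma(n/2 + 1)
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

# For r = 1, the maximum over integer dimensions occurs at n = 5,
# and the volumes decay toward zero afterwards.
peak = max(range(1, 30), key=ball_volume)
```
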

535.4

spherical coordinates

Spherical coordinates are a system of coordinates on the sphere in R^3, or more generally, R^n. In R^3, they are given by

x = r sin φ cos θ,   y = r sin φ sin θ,   z = r cos φ,

where r is the radius of the sphere, θ is the azimuthal angle defined for θ ∈ [0, 2π), and φ ∈ [0, π] is the polar angle. Note that φ = 0 corresponds to the top of the sphere and φ = π corresponds to the bottom of the sphere. There is a clash between the mathematicians' and the physicists' definition of spherical coordinates, interchanging both the direction of φ and the choice of names for the two angles (physicists often use θ as the azimuthal angle and φ as the polar one).

Spherical coordinates are a generalization of polar coordinates, and can be further generalized to the n-sphere (or n-hypersphere) with n − 2 polar angles φ_i and one azimuthal angle θ:

x_1 = r cos φ_1
x_2 = r sin φ_1 cos φ_2
...
x_k = r (∏_{i=1}^{k−1} sin φ_i) cos φ_k
...
x_{n−1} = r sin φ_1 sin φ_2 · · · sin φ_{n−2} cos θ
x_n = r sin φ_1 sin φ_2 · · · sin φ_{n−2} sin θ.

Version: 3 Owner: mathcam Author(s): mathcam
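The R^3 change of coordinates above can be sketched directly (the function name is ours; the convention is the one just stated, with θ azimuthal and φ polar):

```python
import math

def spherical_to_cartesian(r, theta, phi):
    # theta: azimuthal angle in [0, 2*pi); phi: polar angle in [0, pi]
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))

# phi = 0 gives the top of the sphere (0, 0, r); phi = pi gives the bottom (0, 0, -r).
```
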

535.5

volume of the n-sphere

The volume contained inside S^n, the n-sphere (or hypersphere), is given by the integral

V(n) = ∫_{Σ_{i=1}^{n+1} x_i^2 ≤ 1} d^{n+1}x.

Going to polar coordinates (r^2 = Σ_{i=1}^{n+1} x_i^2) this becomes

V(n) = ∫_{S^n} dΩ ∫_0^1 r^n dr.

The first integral is the integral over all solid angles subtended by the sphere and is equal to its area A(n) = 2π^{(n+1)/2} / Γ((n+1)/2). The second integral is elementary and evaluates to ∫_0^1 r^n dr = 1/(n+1).

Finally, the volume is

V(n) = (2/(n+1)) π^{(n+1)/2} / Γ((n+1)/2) = π^{(n+1)/2} / Γ((n+3)/2).

If the sphere has radius R instead of 1, then the correct volume is V(n)R^{n+1}. Note that this formula works only for n ≥ 1. The first few cases are:

n = 1: Γ(2) = 1, hence V(1) = π (this is the familiar result for the area of the unit circle);
n = 2: Γ(5/2) = 3√π/4, hence V(2) = 4π/3 (this is the familiar result for the volume of the unit sphere);
n = 3: Γ(3) = 2, hence V(3) = π^2/2.

Version: 4 Owner: igor Author(s): igor
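The derivation says V(n) = A(n)/(n+1), which fits the closed form above. A quick numeric consistency check (function names are ours):

```python
import math

def nsphere_area(n):
    # A(n) = 2 * pi^((n+1)/2) / Gamma((n+1)/2)
    return 2 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2)

def nsphere_volume(n):
    # V(n) = pi^((n+1)/2) / Gamma((n+3)/2), the volume enclosed by the unit S^n
    return math.pi ** ((n + 1) / 2) / math.gamma((n + 3) / 2)

# V(n) = A(n) * integral_0^1 r^n dr = A(n) / (n + 1)
```
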


Chapter 536 51M10 – Hyperbolic and elliptic geometries (general) and generalizations 536.1

Lobachevsky’s formula

Let AB be a line and let M, T be two points such that M does not lie on AB, T lies on AB, and MT is perpendicular to AB. Let MD be any other line through M which meets AT in D. In a hyperbolic geometry, as D moves off to infinity along AT, the line MD approaches a limiting line MS, which is said to be parallel to AT. The angle ∠SMT is called the angle of parallelism for the perpendicular distance d = MT, and it is given by

Π(d) = 2 tan^{−1}(e^{−d}),

which is called Lobachevsky's formula.
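Numerically the formula behaves as hyperbolic geometry predicts: the angle of parallelism is π/2 at d = 0 (the Euclidean limit) and decreases toward 0 as d grows. A sketch (the function name is ours):

```python
import math

def angle_of_parallelism(d):
    # Lobachevsky's formula: Pi(d) = 2 * arctan(e^(-d))
    return 2.0 * math.atan(math.exp(-d))
```
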

Version: 2 Owner: vladm Author(s): vladm


Chapter 537 51M16 – Inequalities and extremum problems 537.1

Brunn-Minkowski inequality

Let A and B be non-empty compact subsets of R^d. Then

vol(A + B)^{1/d} ≥ vol(A)^{1/d} + vol(B)^{1/d},

where A + B denotes the Minkowski sum of A and B, and vol(S) denotes the volume of S.

REFERENCES

1. Jiří Matoušek. Lectures on Discrete Geometry, volume 212 of GTM. Springer, 2002. Zbl 0999.52006.

Version: 2 Owner: bbukh Author(s): bbukh

537.2

Hadwiger-Finsler inequality

In a triangle with sides a, b, c and area A the following inequality holds:

a^2 + b^2 + c^2 ≥ (a − b)^2 + (b − c)^2 + (c − a)^2 + 4A√3.

Version: 2 Owner: mathwizard Author(s): mathwizard



537.3

isoperimetric inequality

The classical isoperimetric inequality says that if a planar figure has perimeter P and area A, then

4πA ≤ P^2,

where the equality holds if and only if the figure is a circle. That is, the circle is the figure that encloses the largest area among all figures of the same perimeter. The analogous statement is true in arbitrary dimension: the d-dimensional ball has the largest volume among all figures of equal surface area.

The isoperimetric inequality can alternatively be stated using ε-neighborhoods. An ε-neighborhood of a set S, denoted here by S_ε, is the set of all points whose distance to S is at most ε. The isoperimetric inequality in terms of ε-neighborhoods states that vol(S_ε) ≥ vol(B_ε), where B is the ball of the same volume as S. The classical isoperimetric inequality can be recovered by taking the limit ε → 0.

The advantage of this formulation is that it does not depend on the notion of surface area, and so can be generalized to arbitrary measure spaces with a metric. An example when this general formulation proves useful is Talagrand's isoperimetric theory dealing with Hamming-like distances in product spaces. The theory has proven to be very useful in many applications of probability to combinatorics.
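For a concrete check of 4πA ≤ P^2, regular n-gons work well: the ratio 4πA/P^2 stays below 1 and approaches 1 as the polygon approaches a circle. A sketch (the helper name is ours):

```python
import math

def regular_ngon(n, r=1.0):
    # perimeter and area of a regular n-gon inscribed in a circle of radius r
    perimeter = 2 * n * r * math.sin(math.pi / n)
    area = 0.5 * n * r * r * math.sin(2 * math.pi / n)
    return perimeter, area

ratios = []
for n in (3, 4, 6, 12, 100, 1000):
    P, A = regular_ngon(n)
    ratios.append(4 * math.pi * A / (P * P))
```
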

REFERENCES

1. Noga Alon and Joel H. Spencer. The Probabilistic Method. John Wiley & Sons, Inc., second edition, 2000. Zbl 0996.05001.
2. Jiří Matoušek. Lectures on Discrete Geometry, volume 212 of GTM. Springer, 2002. Zbl 0999.52006.

Version: 7 Owner: bbukh Author(s): bbukh

537.4

proof of Hadwiger-Finsler inequality

From the law of cosines we get

a^2 = b^2 + c^2 − 2bc cos α,

α being the angle between b and c. This can be transformed into

a^2 = (b − c)^2 + 2bc(1 − cos α).

Since A = (1/2) bc sin α we have

a^2 = (b − c)^2 + 4A (1 − cos α)/sin α.

Now remember that

1 − cos α = 2 sin^2(α/2)  and  sin α = 2 sin(α/2) cos(α/2).

Using this we get

a^2 = (b − c)^2 + 4A tan(α/2).

Doing this for all sides of the triangle and adding up we get

a^2 + b^2 + c^2 = (a − b)^2 + (b − c)^2 + (c − a)^2 + 4A (tan(α/2) + tan(β/2) + tan(γ/2)),

β and γ being the other angles of the triangle. Now since the halves of the triangle's angles are less than π/2 and the function tan is convex on this interval, we have

tan(α/2) + tan(β/2) + tan(γ/2) ≥ 3 tan((α + β + γ)/6) = 3 tan(π/6) = √3.

Using this we get

a^2 + b^2 + c^2 ≥ (a − b)^2 + (b − c)^2 + (c − a)^2 + 4A√3.

This is the Hadwiger–Finsler inequality. ∎

Version: 2 Owner: mathwizard Author(s): mathwizard
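The inequality can be sanity-checked on random triangles (a numeric sketch, not part of the original proof):

```python
import math
import random

random.seed(2)
for _ in range(1000):
    # random triangle given by three vertices in the unit square
    (ax, ay), (bx, by), (cx, cy) = [(random.random(), random.random()) for _ in range(3)]
    a = math.hypot(bx - cx, by - cy)
    b = math.hypot(ax - cx, ay - cy)
    c = math.hypot(ax - bx, ay - by)
    area = abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2
    lhs = a * a + b * b + c * c
    rhs = (a - b) ** 2 + (b - c) ** 2 + (c - a) ** 2 + 4 * area * math.sqrt(3)
    assert lhs >= rhs - 1e-9
```

Equality holds exactly for the equilateral triangle, as the Jensen step in the proof shows.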


Chapter 538 51M20 – Polyhedra and polytopes; regular figures, division of spaces 538.1

polyhedron

In geometry, a polyhedron is a solid bounded by a finite number of plane faces, each of which is a polygon. The most special kind are the regular polyhedra, but there are others, like prisms, pyramids, and parallelepipeds, that are also worthy of study. Notice that a cone and a cylinder are not polyhedra, since they have "faces" that are not polygons. In the combinatorial definition, a polyhedron is not necessarily a bounded structure: it is defined as the solution set of a finite system of linear inequalities. Version: 3 Owner: mathcam Author(s): mathcam, drini


Chapter 539 51M99 – Miscellaneous 539.1

Euler line proof

Let O be the circumcenter of △ABC and G its centroid. Extend OG to a point P such that OG/GP = 1/2. We'll prove that P is the orthocenter H. Draw the median AA′, where A′ is the midpoint of BC. Triangles OGA′ and PGA are similar, since GP = 2OG, AG = 2GA′ and ∠OGA′ = ∠PGA. Then ∠OA′G = ∠GAP and OA′ ∥ AP. But OA′ ⊥ BC, so AP ⊥ BC, that is, AP is a height of the triangle. Repeating the same argument for the other medians proves that H lies on the three heights and therefore it must be the orthocenter. The ratio OG/GH = 1/2 holds since we constructed it that way. Q.E.D. Version: 3 Owner: drini Author(s): drini
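The collinearity of O, G, H and the ratio OG : GH = 1 : 2 can be verified with coordinates. The sketch below uses the standard vector identity H = A + B + C − 2O (equivalent to OH = OA + OB + OC), which is not part of the proof above; the helper name is ours:

```python
def circumcenter(A, B, C):
    # intersection of the perpendicular bisectors (triangle assumed non-degenerate)
    (ax, ay), (bx, by), (cx, cy) = A, B, C
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return (ux, uy)

A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
O = circumcenter(A, B, C)
G = ((A[0] + B[0] + C[0]) / 3, (A[1] + B[1] + C[1]) / 3)
H = (A[0] + B[0] + C[0] - 2 * O[0], A[1] + B[1] + C[1] - 2 * O[1])
```
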

539.2

SSA

SSA is a method for determining whether two triangles are congruent by comparing two sides and a non-included angle. However, unlike SAS, SSS, ASA, and SAA, this does not prove congruence in all cases. Suppose we have two triangles, △ABC and △PQR. The SSA test asks whether △ABC ≅ △PQR given that AB ≅ PQ, BC ≅ QR, and either ∠BAC ≅ ∠QPR or ∠BCA ≅ ∠QRP.

Since this method does not prove congruence, it is more useful for disproving it. If the SSA method is attempted between △ABC and △PQR and fails for every correspondence ABC, BCA, and CBA against every PQR, QRP, and RPQ, then △ABC ≇ △PQR.

Suppose △ABC and △PQR meet the SSA test. The specific case where SSA fails, known as the ambiguous case, occurs if the congruent angles, ∠BAC and ∠QPR, are acute. Let us illustrate this. Suppose we have a right triangle, △XYZ, with right angle ∠XZY. Let P and Q be two points on line XZ equidistant from Z such that P is between X and Z and Q is not. Since ∠XZY is right, this makes ∠PZY right, and P, Q are equidistant from Z; thus line YZ is the perpendicular bisector of PQ, and as such, every point on that line is equidistant from P and Q. From this, we know Y is equidistant from P and Q, thus YP ≅ YQ. Further, ∠YXP is in fact the same angle as ∠YXQ, thus ∠YXP ≅ ∠YXQ. Since XY ≅ XY, △XYP and △XYQ clearly meet the SSA test, and yet, just as clearly, are not congruent. This results from ∠YXZ being acute.

This example also reveals the exception to the ambiguous case, namely △XYZ itself. If R is a point on line XZ such that YR ≅ YZ, then R = Z. Proving this exception amounts to determining that ∠XZY is right, in which case the congruency could be proven instead with SAA. However, if the congruent angles are not acute, i.e., they are either right or obtuse, then SSA is definitive. Version: 3 Owner: mathcam Author(s): mathcam, greg

539.3

cevian

A cevian of a triangle is any line segment joining a vertex with a point of the opposite side. In the figure, AD is a cevian of △ABC. Version: 2 Owner: drini Author(s): drini

539.4

congruence

Two geometrical constructs are congruent if there is a sequence of rigid transformations mapping each one into the other. In the usual Euclidean space, the rigid transformations are translations, rotations, reflections (and, of course, compositions of them).

In a less formal sense, saying two constructs are congruent amounts to saying that the two constructs are essentially "the same" under the geometry that is being used.

In the particular case of triangles in the plane, there are some criteria that tell if two given triangles are congruent:

• SSS. If two triangles have their corresponding sides equal, they are congruent.
• SAS. If two triangles have two corresponding sides equal as well as the angle between them, the triangles are congruent.
• ASA. If two triangles have 2 pairs of corresponding angles equal, as well as the side between them, the triangles are congruent.

Version: 2 Owner: drini Author(s): drini

539.5

incenter

The incenter of a geometrical shape is the center of its incircle (if it has one). On a triangle the incenter always exists, and it is the intersection point of the three internal angle bisectors. So in the next picture, AX, BY, CZ are angle bisectors, and AB, BC, CA are tangent to the circle. Version: 3 Owner: drini Author(s): drini

539.6

incircle

The incircle or inscribed circle of a triangle is a circle interior to the triangle and tangent to its three sides. More generally, the incircle of a polygon is an interior circle tangent to all of the polygon's sides. Not every polygon has an inscribed circle, but triangles always do. The center of the incircle is called the incenter, and it is located at the point where the three angle bisectors intersect. Version: 3 Owner: drini Author(s): drini


539.7

symmedian

On any triangle, the three lines obtained by reflecting the medians in the (internal) angle bisectors are called the symmedians of the triangle. In the picture, BX is an angle bisector and BM a median. The reflection of BM in BX is BN, a symmedian. In other words, the symmedians are the isogonal conjugates of the medians. Version: 2 Owner: drini Author(s): drini


Chapter 540 51N05 – Descriptive geometry 540.1

curve

Summary The term "curve" is associated with two closely related notions. The first notion is kinematic: a (parameterized) curve is a function of one real variable taking values in some ambient geometric setting. This variable is commonly interpreted as time, and the function can be considered as the trajectory of a moving particle. The second notion is geometric; in this sense a curve, also called an arc, is a 1-dimensional subset of an ambient space. The two notions are related: the image of a parameterized curve describes an arc. Conversely, a given arc admits multiple parameterizations.

Kinematic definition Let I ⊂ R be an interval of the real line. A (parameterized) curve, a.k.a. a trajectory, a.k.a. a path, is a continuous mapping

γ : I → X

taking values in a topological space X. We say that γ is a simple curve if it has no self-intersections, that is, if the mapping γ is injective. We say that γ is a closed curve, a.k.a. a loop, whenever I = [a, b] is a closed interval, and the endpoints are mapped to the same value: γ(a) = γ(b). Equivalently, a loop may be defined to be a continuous mapping whose domain is the unit circle S^1. A simple closed curve is often called a Jordan curve.

In many instances the ambient space X is a differential manifold, in which case we can speak of differentiable curves. Let γ : I → X be a differentiable curve. For every t ∈ I we can speak of the derivative, equivalently the velocity, γ̇(t) of the curve at time t. The velocity is a tangent vector γ̇(t) ∈ T_{γ(t)}X, taking values in the tangent spaces of the manifold X. A differentiable curve γ(t) is called regular if its velocity γ̇(t) does not vanish for any t ∈ I.

It is also quite common to consider curves whose codomain is just R^n. In this case, a parameterized curve is regarded as a vector-valued function, that is, an n-tuple of functions

γ(t) = (γ_1(t), . . . , γ_n(t)),   γ_i : I → R, i = 1, . . . , n.

Geometric definition. A (non-singular) curve C, a.k.a. an arc, is a connected, 1-dimensional submanifold of a differential manifold X. This means that for every point p ∈ C there exists an open neighbourhood U ⊂ X of p and a chart α : U → R^n such that

α(C ∩ U) = {(t, 0, . . . , 0) ∈ R^n : −ε < t < ε}

for some real ε > 0.

An alternative, but equivalent, definition describes an arc as the image of a regular parameterized curve. To accomplish this, we need to define the notion of reparameterization. Let I_1, I_2 ⊂ R be intervals. A reparameterization is a continuously differentiable function

s : I_1 → I_2

whose derivative is never vanishing. Thus, s is either monotone increasing or monotone decreasing. Two regular, parameterized curves

γ_i : I_i → X,   i = 1, 2

are said to be related by a reparameterization if there exists a reparameterization s : I_1 → I_2 such that γ_1 = γ_2 ◦ s. The inverse of a reparameterization function is also a reparameterization. Likewise, the composition of two reparameterizations is again a reparameterization. Thus the reparameterization relation between curves is, in fact, an equivalence relation. An arc can now be defined as an equivalence class of regular, simple curves related by reparameterizations. In order to exclude pathological embeddings with wild endpoints we also impose the condition that the arc, as a subset of X, be homeomorphic to an open interval. Version: 12 Owner: rmilson Author(s): vypertd, rmilson


540.2

piecewise smooth

A curve α : [a, b] → R^k is said to be piecewise smooth if each component α_1, . . . , α_k of α has a bounded derivative α′_i (i = 1, . . . , k) which is continuous everywhere in [a, b] except (possibly) at a finite number of points at which left- and right-handed derivatives exist.

Notes (i) Every piecewise smooth curve is rectifiable. (ii) Every rectifiable curve can be approximated by piecewise smooth curves. Version: 1 Owner: vypertd Author(s): vypertd

540.3

rectifiable

Let α : [a, b] → R^k be a curve in R^k and let P = {a_0, . . . , a_n} be a partition of the interval [a, b]. The points in the set {α(a_0), α(a_1), . . . , α(a_n)} are called the vertices of the inscribed polygon Π(P) determined by P. A curve is rectifiable if there exists a positive number M such that the length of the inscribed polygon Π(P) is less than M for all possible partitions P of [a, b], where [a, b] is the interval the curve is defined on. The length of the inscribed polygon is defined as

Σ_{t=1}^{n} |α(a_t) − α(a_{t−1})|.

If α is rectifiable then the length of α is defined as the least upper bound of the lengths of inscribed polygons taken over all possible partitions. Version: 5 Owner: vypertd Author(s): vypertd
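The definition suggests an immediate numerical scheme: evaluate the curve on a partition and sum the chord lengths. A sketch (function names are ours), using the unit circle, whose inscribed-polygon lengths approach 2π from below:

```python
import math

def inscribed_polygon_length(curve, partition):
    # sum of |alpha(a_t) - alpha(a_{t-1})| over consecutive partition points
    pts = [curve(a) for a in partition]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

circle = lambda t: (math.cos(t), math.sin(t))
coarse = inscribed_polygon_length(circle, [2 * math.pi * k / 10 for k in range(11)])
fine = inscribed_polygon_length(circle, [2 * math.pi * k / 1000 for k in range(1001)])
```
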


Chapter 541 51N20 – Euclidean analytic geometry 541.1

Steiner’s theorem

Let ABC be a triangle and let M, N ∈ (BC) be two points such that m(∠BAM) = m(∠NAC). Then the cevians AM and AN are called isogonal cevians and the following relation holds:

(BM/MC) · (BN/NC) = AB^2/AC^2.

Version: 2 Owner: slash Author(s): slash

541.2

Van Aubel theorem

Let ABC be a triangle and AD, BE, CF cevians concurrent at P. Then

CP/PF = CD/DB + CE/EA.

Version: 4 Owner: drini Author(s): drini
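The relation is easy to test with coordinates: pick an interior point P, intersect the cevians with the opposite sides, and compare the ratios (a sketch; the helper names are ours):

```python
def line_intersection(p1, p2, p3, p4):
    # intersection of line p1p2 with line p3p4 (assumed not parallel)
    x1, y1 = p1; x2, y2 = p2; x3, y3 = p3; x4, y4 = p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

A, B, C, P = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0), (1.5, 1.0)
D = line_intersection(A, P, B, C)   # cevian AD meets BC at D
E = line_intersection(B, P, C, A)   # cevian BE meets CA at E
F = line_intersection(C, P, A, B)   # cevian CF meets AB at F
lhs = dist(C, P) / dist(P, F)
rhs = dist(C, D) / dist(D, B) + dist(C, E) / dist(E, A)
```
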

541.3

conic section

Definitions. In Euclidean 3-space, a conic section, or simply a conic, is the intersection of a plane with a right circular double cone.

But a conic can also be defined, in several equivalent ways, without using an enveloping 3-space. In the Euclidean plane, let d be a line and F a point not on d. Let ε be a positive real number. For an arbitrary point P, write |Pd| for the perpendicular (or shortest) distance from P to the line d. The set of all points P such that |PF| = ε|Pd| is a conic with eccentricity ε, focus F, and directrix d. An ellipse, parabola, or hyperbola has eccentricity ε < 1, ε = 1, or ε > 1 respectively. For a parabola, the focus and directrix are unique. Any ellipse other than a circle, or any hyperbola, may be defined by either of two focus-directrix pairs; the eccentricity is the same for both. The definition in terms of a focus and a directrix leaves out the case of a circle; still, the circle can be thought of as a limiting case: eccentricity zero, directrix at infinity, and two coincident foci. The chord through the given focus, parallel to the directrix, is called the latus rectum; its length is traditionally denoted by 2l. Given a conic σ which is the intersection of a circular cone C with a plane π, and given a focus F of σ, there is a unique sphere tangent to π at F and tangent also to C at all points of a circle. That sphere is called the Dandelin sphere for F. (Consider a spherical ball resting on a table. Suppose that a point source of light, at some point above the table and outside the ball, shines on the ball. The margin of the shadow of the ball is a conic, the ball is one of the Dandelin spheres of that conic, and the ball meets the table at the focus corresponding to that sphere.) Degenerate conics; coordinates in 2 or 3 dimensions. The intersection of a plane with a cone may consist of a single point, or a line, or a pair of lines. Whether we should regard these sets as conics is a matter of convention, but in general they are not so regarded.
In the Euclidean plane with the usual Cartesian coordinates, a conic is the set of solutions of an equation of the form P(x, y) = 0, where P is a polynomial of the second degree over R. For a degenerate conic, P has discriminant zero. In three dimensions, if a conic is defined as the intersection of the cone z^2 = x^2 + y^2 with a plane αx + βy + γz = δ, then, assuming γ ≠ 0, we can eliminate z to get a polynomial for the curve in terms of x and y only; a linear change of variables will then give Cartesian coordinates, within the plane, for the given conic. If γ = 0 we can eliminate x or y instead, with the same result.
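For a non-degenerate conic Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0, the sign of B^2 − 4AC distinguishes the three types. This classification criterion is standard, though not stated explicitly in the entry; a minimal sketch:

```python
def conic_type(A, B, C):
    # classify the non-degenerate conic Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0
    disc = B * B - 4 * A * C
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

# x^2 + y^2 = 1 -> ellipse (a circle); y = x^2 -> parabola; xy = 1 -> hyperbola
```
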

Conics in a projective plane. Conic sections are definable in a projective plane, even though, in such a plane, there is no notion of angle nor any notion of distance. In fact there are several equivalent ways to define them. The following elegant definition was discovered by von Staudt: a conic is the set of self-conjugate points of a hyperbolic polarity. In a little more detail, the polarity is a pair of bijections

f : P → L,   g : L → P,

where P is the set of points of the plane, L is the set of lines, f maps collinear points to concurrent lines, and g maps concurrent lines to collinear points. The set of fixed points of g ◦ f is a conic, and f (x) is the tangent to the given conic at the given point x. A projective conic has no focus, directrix, or eccentricity, for in a projective plane there is no notion of distance (nor angle). Indeed all projective conics are alike; there is no distinction between a parabola and a hyperbola, for example. Version: 5 Owner: drini Author(s): Larry Hammick, quincynoodles

541.4

proof of Steiner’s theorem

Using α, β, γ, δ to denote angles as in the diagram at left, the law of sines yields

AB/AC = sin(γ)/sin(β)   (541.4.1)
NB/NA = sin(α + δ)/sin(β)   (541.4.2)
MC/MA = sin(α + δ)/sin(γ)   (541.4.3)
MB/MA = sin(α)/sin(β)   (541.4.4)
NC/NA = sin(α)/sin(γ)   (541.4.5)

Dividing (2) by (3), and (4) by (5):

(MA · NB)/(NA · MC) = sin(γ)/sin(β) = (NA · MB)/(MA · NC)

and therefore

(NB · MB)/(MC · NC) = sin^2(γ)/sin^2(β) = AB^2/AC^2

by (1).

Version: 2 Owner: mathcam Author(s): Larry Hammick

541.5

proof of Van Aubel theorem

We want to prove

CP/PF = CD/DB + CE/EA.

On the picture, let us call φ the angle ∠ABE and ψ the angle ∠EBC. A generalization of the bisector's theorem states

CE/EA = (CB sin ψ)/(AB sin φ)  on △ABC

and

CP/PF = (CB sin ψ)/(FB sin φ)  on △FBC.

From the two equalities we can get

(CE · AB)/EA = (CP · FB)/PF

and thus

CP/PF = (CE · AB)/(EA · FB).

Since AB = AF + FB, substituting leads to

(CE · AB)/(EA · FB) = CE(AF + FB)/(EA · FB) = (CE · AF)/(EA · FB) + (CE · FB)/(EA · FB) = (CE · AF)/(EA · FB) + CE/EA.

But Ceva's theorem states

(CE/EA) · (AF/FB) · (BD/DC) = 1

and so

(CE · AF)/(EA · FB) = CD/DB.

Substituting into the last equality gives the desired result.

Version: 3 Owner: drini Author(s): drini

541.6

proof of Van Aubel’s Theorem

As in the figure, let us denote by u, v, w, x, y, z the areas of the six component triangles. Given any two triangles of the same height, their areas are in the same proportion as their bases (Euclid VI.1). Therefore

(u + v)/w = (y + z)/x,   (w + x)/v = (y + z)/u,   (u + v)/z = (w + x)/y,

and the conclusion we want is

(y + z + u)/(v + w + x) + (z + u + v)/(w + x + y) = (y + z)/x.

Clearing the denominators, the hypotheses are

w(y + z) = x(u + v)   (541.6.1)
y(u + v) = z(w + x)   (541.6.2)
u(w + x) = v(y + z)   (541.6.3)

which imply

vxz = uwy   (541.6.4)

and the conclusion says that

x(wy + wz + uw + xy + xz + ux + y^2 + yz + uy + vz + uv + v^2 + wz + uw + vw + xz + ux + vx)

equals

(y + z)(vw + vx + vy + w^2 + wx + wy + wx + x^2 + xy),

or equivalently (after cancelling terms)

x(uw + xz + ux + uy + vz + uv + v^2 + wz + uw + vw + ux + vx)

equals

(y + z)(vw + vx + vy + w^2 + wx + wy) = (y + z)(v + w)(w + x + y),

i.e.

x(u + v)(v + w + x) + x(xz + ux + uy + vz + wz + uw) = (y + z)w(v + w + x) + (y + z)(vx + vy + wy),

i.e. by (1)

x(xz + ux + uy + vz + wz + uw) = (y + z)(vx + vy + wy),

i.e. by (3)

x(xz + uy + vz + wz) = (y + z)(vy + wy).

Using (4), we are down to

x^2 z + xuy + uwy + xwz = (y + z)y(v + w),

i.e. by (3)

x^2 z + vy(y + z) + xwz = (y + z)y(v + w),

i.e.

xz(x + w) = (y + z)yw.

But in view of (2), this is the same as (4), and the proof is complete.

Remarks: Ceva's theorem is an easy consequence of (4). A better proof than the above, but still independent of Ceva, is in preparation.

Version: 3 Owner: mathcam Author(s): mathcam, Larry Hammick

541.7

three theorems on parabolas

In the Cartesian plane, pick a point with coordinates (0, 2f) (subtle hint!) and construct (1) the set S of segments s joining F = (0, 2f) with the points (x, 0), and (2) the set B of right-bisectors b of the segments s ∈ S.

Theorem 21. The envelope described by the lines of the set B is a parabola with the x-axis as directrix and focal length |f|.

We're lucky in that we don't need a fancy definition of envelope; considering a line to be a set of points, it's just the boundary of the set C = ⋃_{b∈B} b. Strategy: fix an x coordinate and find the max/minimum of possible y's in C with that x. But first we'll pick an s from S by picking a point p = (w, 0) on the x axis. The midpoint of the segment s ∈ S through p is M = (w/2, f). Also, the slope of this s is −2f/w. The corresponding right-bisector will also pass through (w/2, f) and will have slope w/(2f). Its equation is therefore

(2y − 2f)/(2x − w) = w/(2f).

Equivalently,

y = f + wx/(2f) − w^2/(4f).

By any of many very famous theorems (Euclid book II theorem twenty-something, Cauchy–Schwarz–Bunyakovski (overkill), differential calculus, what you will), for fixed x, y is an extremum for w = x only, and therefore the envelope has equation

y = f + x^2/(4f).
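The envelope claim can be made concrete numerically: parabola minus bisector equals (x − w)^2/(4f), so each right-bisector touches y = f + x^2/(4f) at x = w and (for f > 0) lies below it elsewhere. A sketch (function names are ours):

```python
f = 0.7  # any positive focal parameter for this check

def bisector_y(w, x):
    # right-bisector of the segment joining F = (0, 2f) to (w, 0), solved for y
    return f + w * x / (2 * f) - w * w / (4 * f)

def parabola_y(x):
    return f + x * x / (4 * f)
```
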

I could say I'm done right now because we "know" that this is a parabola, with focal length f and x-axis as directrix. I don't want to, though. The most popular definition of parabola I know of is "set of points equidistant from some line d and some point F." The line responsible for the point on the envelope with given abscissa x was found to bisect the segment s ∈ S through H = (x, 0). So pick an extra point Q ∈ b ∈ B where b is the perpendicular bisector of s. We then have ∠FMQ = ∠QMH because they're both right angles, lengths FM = MH, and QM is common to both triangles FMQ and HMQ. Therefore two sides and the angles they contain are respectively equal in the triangles FMQ and HMQ, and so respective angles and respective sides are all equal. In particular, FQ = QH. Also, since Q and H have the same x coordinate, the line QH is perpendicular to the x-axis, and so Q, a general point on the envelope, is equidistant from F and the x-axis. Therefore etc. Because of this construction, it is clear that the lines of B are all tangent to the parabola in question.

We're not done yet. Pick a random point P outside C ("inside" the parabola), and call the parabola π (just to be nasty). Here's a nice quicky:

Theorem 22 (The Reflector Law). For R ∈ π, the length of the path PRF is minimal when PR produced is perpendicular to the x-axis.

Quite simply, assume PR produced is not necessarily perpendicular to the x-axis. Because π is a parabola, the segment from R perpendicular to the x-axis has the same length as RF. So let this perpendicular hit the x-axis at H. We then have that the length of PRH equals that of PRF. But PRH (and hence PRF) is minimal when it's a straight line; that is, when PR produced is perpendicular to the x-axis. QED

Hey! I called that theorem the "reflector law". Perhaps it didn't look like one. (It is in the Lagrangian formulation.) But it's fairly easy to show (it's a similar argument) that the shortest path from a point to a line to a point makes "incident" and "reflected" angles equal.

One last marvelous tidbit. This will take more time, though. Let b be tangent to π at R, and let n be perpendicular to b at R. We will call n the normal to π at R. Let n meet the x-axis at G.

Theorem 23. The radius of the "best-fit circle" to π at R is twice the length RG.

(Note: the ≈'s need to be phrased in terms of upper and lower bounds, so I can use the sandwich theorem, but the proof schema is exactly what is required.)

Take two points R, R′ on π some small distance ε from each other (we don't actually use ε, it's just a psychological trick). Construct the tangent t and normal n at R, and the normal n′ at R′. Let n, n′ intersect at O, and t intersect the x-axis at G. Join RF, R′F. Erect perpendiculars g, g′ to the x-axis through R, R′ respectively. Join RR′. Let g intersect the x-axis at H. Let P, P′ be points on g, g′ not in C. Construct RE perpendicular to RF with E in R′F. We now have

i) ∠PRO = ∠ORF = ∠GRH ≈ ∠P′R′O = ∠OR′F
ii) ER ≈ FR · ∠EFR
iii) ∠R′RE + ∠ERO ≈ π/2 (that's the number π, not the parabola)
iv) ∠ERO + ∠ORF = π/2
v) ∠R′ER ≈ π/2
vi) ∠R′OR = (1/2) ∠R′FR
vii) R′R ≈ OR · ∠R′OR
viii) FR = RH

From (iii), (iv) and (i) we have ∠R′RE ≈ ∠GRH, and since R′ is close to R, if we let R′ approach R, the approximations approach equality. Therefore the triangle R′RE approaches similarity with GRH, and we have RR′ : ER ≈ RG : RH. Combining this with (ii), (vi), (vii), and (viii), it follows that RO ≈ 2RG, and in the limit R′ → R, RO = 2RG. QED

This last theorem is a very nice way of short-cutting all the messy calculus needed to derive the Schwarzschild "black-hole" solution to Einstein's field equations, and that's why I enjoy it so.

Version: 12 Owner: quincynoodles Author(s): quincynoodles
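Theorem 23 can be checked against the calculus formula for the radius of curvature, R_c = (1 + y′^2)^{3/2}/|y″|, applied to y = f + x^2/(4f). This is a numeric sketch, not part of the original argument:

```python
import math

f = 0.7  # arbitrary positive focal parameter

def radius_of_curvature(x):
    # standard formula (1 + y'^2)^(3/2) / |y''| for y = f + x^2/(4f)
    yp = x / (2 * f)
    ypp = 1 / (2 * f)
    return (1 + yp * yp) ** 1.5 / abs(ypp)

def RG(x):
    # distance from R = (x, y(x)) to G, the point where the normal at R meets the x-axis
    y = f + x * x / (4 * f)
    gx = x + y * x / (2 * f)  # foot of the normal on the x-axis
    return math.hypot(gx - x, y)
```
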


Chapter 542 52A01 – Axiomatic and generalized convexity 542.1

convex combination

Let V be some vector space over R. Let X be some set of elements of V. Then a convex combination of elements from X is a linear combination of the form

λ_1 x_1 + λ_2 x_2 + · · · + λ_n x_n

for some n > 0, where each x_i ∈ X, each λ_i ≥ 0, and Σ_i λ_i = 1.

Let co(X) be the set of all convex combinations from X. We call co(X) the convex hull, or convex envelope, or convex closure of X. It is a convex set, and is the smallest convex set which contains X. A set X is convex if and only if X = co(X). Version: 8 Owner: antizeus Author(s): antizeus
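A minimal computational illustration (function name is ours): forming a convex combination of the vertices of a square always lands inside the square, since the square is convex and hence equals its own convex hull:

```python
def convex_combination(points, weights):
    # sum_i lambda_i * x_i with lambda_i >= 0 and sum_i lambda_i = 1
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-12
    dim = len(points[0])
    return tuple(sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim))

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
x, y = convex_combination(square, [0.1, 0.2, 0.3, 0.4])
```
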


Chapter 543 52A07 – Convex sets in topological vector spaces 543.1

Fréchet space

We consider two classes of topological vector spaces, one more general than the other. Following Rudin [1] we will define a Fréchet space to be an element of the smaller class, and refer to an instance of the more general class as an F-space. After giving the definitions, we will explain why one definition is stronger than the other.

Definition 1. An F-space is a complete topological vector space whose topology is induced by a translation invariant metric. To be more precise, we say that U is an F-space if there exists a metric function d:U ×U →R such that d(x, y) = d(x + z, y + z),

x, y, z ∈ U;

and such that the collection of balls

Bε(x) = {y ∈ U : d(x, y) < ε},  x ∈ U, ε > 0

is a base for the topology of U.

Note 1. Recall that a topological vector space is a uniform space. The hypothesis that U is complete is formulated in reference to this uniform structure. To be more precise, we say that a sequence an ∈ U, n = 1, 2, . . . is Cauchy if for every neighborhood O of the origin there exists an N ∈ N such that an − am ∈ O for all n, m > N. The completeness condition then takes the usual form of the hypothesis that all Cauchy sequences possess a limit point.

Note 2. It is customary to include the hypothesis that U is Hausdorff in the definition of a topological vector space. Consequently, a Cauchy sequence in a complete topological space will have a unique limit.

Note 3. Since U is assumed to be complete, the pair (U, d) is a complete metric space. Thus, an equivalent definition of an F-space is that of a vector space equipped with a complete, translation-invariant (but not necessarily homogeneous) metric, such that the operations of scalar multiplication and vector addition are continuous with respect to this metric.

Definition 2. A Fréchet space is a complete topological vector space (either real or complex) whose topology is induced by a countable family of semi-norms. To be more precise, there exist semi-norm functions

‖ · ‖_n : U → R,  n ∈ N,

such that the collection of all balls

B_ε^(n)(x) = {y ∈ U : ‖x − y‖_n < ε},  x ∈ U, ε > 0, n ∈ N,

is a base for the topology of U.

Proposition 22. Let U be a complete topological vector space. Then, U is a Fréchet space if and only if it is a locally convex F-space.

Proof. First, let us show that a Fréchet space is a locally convex F-space, and then prove the converse. Suppose then that U is Fréchet. The semi-norm balls are convex; this follows directly from the semi-norm axioms. Therefore U is locally convex. To obtain the desired distance function we set

d(x, y) = Σ_{n=0}^{∞} 2^{−n} ‖x − y‖_n / (1 + ‖x − y‖_n),  x, y ∈ U.  (543.1.1)

We now show that d satisfies the metric axioms. Let x, y ∈ U such that x ≠ y be given. Since U is Hausdorff, there is at least one seminorm such that ‖x − y‖_n > 0. Hence d(x, y) > 0. Let a, b, c > 0 be three real numbers such that a ≤ b + c.


A straightforward calculation shows that

a/(1 + a) ≤ b/(1 + b) + c/(1 + c),  (543.1.2)

as well. The above trick underlies the definition (543.1.1) of our metric function. By the seminorm axioms we have that ‖x − z‖_n ≤ ‖x − y‖_n + ‖y − z‖_n,

x, y, z ∈ U

for all n. Combining this with (543.1.1) and (543.1.2) yields the triangle inequality for d. Next let us suppose that U is a locally convex F-space, and prove that it is Fréchet. For every n = 1, 2, . . . let Un be an open convex neighborhood of the origin, contained inside a ball of radius 1/n about the origin. Let ‖ · ‖_n be the seminorm with Un as the unit ball. By definition, the unit balls of these seminorms give a neighborhood base for the topology of U. QED.

Bibliography. Rudin, W. “Functional Analysis” Version: 42 Owner: matte Author(s): rmilson
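The key step in the proof is the inequality t/(1+t) ≤ b/(1+b) + c/(1+c) whenever t ≤ b + c, which makes the weighted sum of bounded seminorm terms into a metric. It can be spot-checked numerically (Python sketch; the sampling ranges are arbitrary):

```python
import random

def f(t):
    # t -> t/(1+t) is increasing and bounded on [0, oo); this is what
    # turns the countable family of seminorms into a single metric.
    return t / (1.0 + t)

random.seed(0)
for _ in range(10000):
    b, c = random.uniform(0, 10), random.uniform(0, 10)
    a = random.uniform(0, b + c)  # any a with a <= b + c
    assert f(a) <= f(b) + f(c) + 1e-12
print("f(a) <= f(b) + f(c) whenever a <= b + c: verified on samples")
```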


Chapter 544 52A20 – Convex sets in n dimensions (including convex hypersurfaces) 544.1

Carathéodory's theorem

Suppose a point p lies in the convex hull of points P ⊂ Rd. Then there is a subset P′ ⊂ P consisting of no more than d + 1 points such that p lies in the convex hull of P′. Version: 1 Owner: bbukh Author(s): bbukh
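A brute-force illustration of the theorem in Python/NumPy: try each subset of size d + 1 and solve for barycentric weights. The helper name caratheodory and the sample points are ours, for illustration only:

```python
import numpy as np
from itertools import combinations

def caratheodory(points, p, tol=1e-9):
    """Given p in conv(points), points in R^d, return <= d+1 of the
    points together with nonnegative weights expressing p."""
    points = np.asarray(points, dtype=float)
    d = points.shape[1]
    for subset in combinations(range(len(points)), d + 1):
        P = points[list(subset)]
        # Solve sum_i w_i P_i = p with sum_i w_i = 1 (d+1 equations, d+1 unknowns).
        A = np.vstack([P.T, np.ones(d + 1)])
        b = np.append(p, 1.0)
        w = np.linalg.lstsq(A, b, rcond=None)[0]
        if np.allclose(A @ w, b, atol=tol) and np.all(w >= -tol):
            return subset, w
    return None

# p lies in the hull of four points in R^2; three of them suffice.
pts = [[0, 0], [2, 0], [0, 2], [2, 2]]
subset, w = caratheodory(pts, np.array([1.0, 0.5]))
print(subset, np.round(w, 3))
```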


Chapter 545 52A35 – Helly-type theorems and geometric transversal theory 545.1

Helly’s theorem

Suppose A1, . . . , Am ⊂ Rd is a family of convex sets, and every d + 1 of them have a non-empty intersection. Then ∩_{i=1}^{m} Ai is non-empty.

The proof is by induction on m. If m = d + 1, then the statement is vacuous. Suppose the statement is true if m is replaced by m − 1. The sets Bj = ∩_{i≠j} Ai are non-empty by the inductive hypothesis. Pick a point pj from each Bj. By Radon's lemma, there is a partition of the p's into two sets P1 and P2 such that I = co(P1) ∩ co(P2) ≠ ∅. For every Aj, either every point in P1 belongs to Aj or every point in P2 belongs to Aj. Hence I ⊆ Aj for every j.

Version: 3 Owner: bbukh Author(s): bbukh
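In the simplest case d = 1 the convex sets are intervals, d + 1 = 2, and the theorem says that pairwise-intersecting intervals share a common point. A quick Python check on a sample family:

```python
# Intervals = convex subsets of R^1; here d + 1 = 2, so the hypothesis
# is that every pair of intervals intersects.
intervals = [(0, 5), (2, 8), (4, 9), (3, 6)]

# Pairwise intersection: two intervals meet iff max of lefts <= min of rights.
assert all(max(a1, a2) <= min(b1, b2)
           for i, (a1, b1) in enumerate(intervals)
           for (a2, b2) in intervals[i + 1:])

# A common point of all intervals exists iff lo <= hi below.
lo = max(a for a, _ in intervals)
hi = min(b for _, b in intervals)
assert lo <= hi
print(lo, hi)  # the whole segment [4, 5] is common
```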


Chapter 546 52A99 – Miscellaneous 546.1

convex set

Let S be a subset of Rn. We say that S is convex when, for any pair of points A, B in S, the segment AB lies entirely inside S. The former statement is equivalent to saying that for any pair of vectors u, v in S, the vector (1 − t)u + tv is in S for all t ∈ [0, 1]. If S is a convex set, then for any u1, u2, . . . , ur in S, and any positive numbers λ1, λ2, . . . , λr such that λ1 + λ2 + · · · + λr = 1, the vector

Σ_{k=1}^{r} λk uk

is in S. Examples of convex sets on the plane are circles, triangles, and ellipses. The definition given above can be generalized to any real vector space: Let V be a vector space (over R). A subset S of V is convex, if for all points x, y in S, the line segment {αx + (1 − α)y | α ∈ (0, 1)} is also in S. Version: 8 Owner: drini Author(s): drini


Chapter 547 52C07 – Lattices and convex bodies in n dimensions 547.1

Radon’s lemma

Every set A ⊂ Rd of d + 2 or more points can be partitioned into two disjoint sets A1 and A2 such that the convex hulls of A1 and A2 intersect.

Without loss of generality we assume that the set A consists of exactly d + 2 points, which we number a1, a2, . . . , ad+2. Denote by ai,j the j'th component of the i'th vector, and write the components in a matrix as

M = ( a1,1   a2,1   . . .   ad+2,1
      a1,2   a2,2   . . .   ad+2,2
       ...    ...   . . .    ...
      a1,d   a2,d   . . .   ad+2,d
       1      1     . . .    1    )

Since M has fewer rows than columns, there is a non-zero column vector λ such that Mλ = 0, which is equivalent to the existence of a solution to the system

0 = λ1 a1 + λ2 a2 + · · · + λd+2 ad+2
0 = λ1 + λ2 + · · · + λd+2   (547.1.1)

Let A1 be the set of those ai for which λi is positive, and A2 the rest. Set s to be the sum of the positive λi's. Then by the system (547.1.1) above,

(1/s) Σ_{ai ∈ A1} λi ai = (1/s) Σ_{ai ∈ A2} (−λi) ai

is a point of intersection of the convex hulls of A1 and A2, since each side is a convex combination.

Version: 2 Owner: bbukh Author(s): bbukh
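The proof is constructive, and can be followed directly in NumPy: build the matrix M, take a null vector λ, and split on the sign of its entries. The helper name radon_partition is ours:

```python
import numpy as np

def radon_partition(points):
    """Split d+2 points in R^d into two sets whose convex hulls meet,
    following the null-vector construction in the proof."""
    A = np.asarray(points, dtype=float)        # shape (d+2, d)
    M = np.vstack([A.T, np.ones(len(A))])      # the matrix M from the proof
    # A nonzero lambda with M @ lambda = 0: last right-singular vector.
    lam = np.linalg.svd(M)[2][-1]
    pos = lam > 0
    s = lam[pos].sum()
    # The common point (1/s) * sum over positive lambda_i of lambda_i a_i.
    common = (lam[pos] @ A[pos]) / s
    return A[pos], A[~pos], common

P1, P2, x = radon_partition([[0, 0], [4, 0], [0, 4], [2, 1]])
print(x)  # (2, 1) lies in both hulls
```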


Chapter 548 52C35 – Arrangements of points, flats, hyperplanes 548.1

Sylvester’s theorem

For every finite collection of non-collinear points in Euclidean space, there is a line that passes through exactly two of them. Consider all lines passing through two or more points in the collection. Since not all points lie on the same line, among pairs of points and lines that are non-incident we can find a point A and a line l such that the distance d(A, l) between them is minimal. Suppose the line l contained more than two points. Then at least two of them, say B and C, would lie on the same side of the foot of the perpendicular from A to l. But then either d(AB, C) or d(AC, B) would be smaller than the distance d(A, l), which contradicts the minimality of d(A, l).

Version: 2 Owner: bbukh Author(s): bbukh
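A naive search for such an "ordinary" line checks every pair of points against the rest (Python sketch; the helper name and sample points are ours, and integer coordinates keep the collinearity test exact):

```python
from itertools import combinations

def ordinary_lines(points):
    """Return the pairs spanning a line through exactly two of the points."""
    def collinear(p, q, r):
        # Cross-product test, exact for integer coordinates.
        return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])
    result = []
    for p, q in combinations(points, 2):
        if not any(collinear(p, q, r) for r in points if r != p and r != q):
            result.append((p, q))
    return result

# Not all collinear, so at least one ordinary line must exist.
pts = [(0, 0), (1, 0), (2, 0), (0, 1)]
print(ordinary_lines(pts))
```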


Chapter 549 53-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 549.1

Lie derivative

Let M be a smooth manifold, X a vector field, and T a tensor. Then the Lie derivative L_X T of T along X is a tensor of the same rank as T defined as

L_X T = (d/dt)(ρ_t^*(T)) |_{t=0}

where ρ is the flow of X, and ρ_t^* is pullback by ρ_t. The Lie derivative is a notion of directional derivative for tensors. Intuitively, this is the change in T in the direction of X. If X and Y are vector fields, then L_X Y = [X, Y], the standard Lie bracket of vector fields.

Version: 2 Owner: bwebste Author(s): bwebste
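In coordinates the bracket [X, Y] can be computed from the component formula [X, Y]^i = X^j ∂_j Y^i − Y^j ∂_j X^i. A small SymPy sketch (the helper name lie_bracket and the example fields are ours):

```python
import sympy as sp

x, y = sp.symbols('x y')

def lie_bracket(X, Y, coords):
    # [X, Y]^i = X^j d_j Y^i - Y^j d_j X^i
    return [sum(X[j] * sp.diff(Y[i], coords[j]) -
                Y[j] * sp.diff(X[i], coords[j]) for j in range(len(coords)))
            for i in range(len(coords))]

# X = d/dx, Y = x d/dy on R^2: [X, Y] = d/dy.
X = [sp.Integer(1), sp.Integer(0)]
Y = [sp.Integer(0), x]
print(lie_bracket(X, Y, [x, y]))  # [0, 1]
```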

549.2

closed differential forms on a simply connected domain

Theorem 24. Let ω(x, y) = a(x, y) dx + b(x, y) dy be a closed differential form defined on a simply connected open set D ⊂ R2. Then ω is an exact differential form. The proof of this result is a consequence of the following useful lemmata.

Lemma 12. Let ω(x, y) be a closed form defined on an open set D and suppose that γ0 and γ1 are two regular homotopic curves in D (with the same end points). Then ∫_{γ0} ω = ∫_{γ1} ω.

Lemma 13. Let ω(x, y) be a continuous differential form defined on a connected open set D. If, given any two curves γ0, γ1 in D with the same end-points, it holds that ∫_{γ0} ω = ∫_{γ1} ω, then ω is exact.

Version: 7 Owner: paolini Author(s): paolini
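As a concrete instance of the theorem, one can verify closedness and construct the potential symbolically, integrating along an axis-parallel path from the origin as in the proof of exactness. A Python/SymPy sketch (the form ω below is our example):

```python
import sympy as sp

x, y = sp.symbols('x y')

# Example closed form omega = a dx + b dy on the simply connected R^2.
a = 2*x*y + sp.sin(x)
b = x**2

assert sp.simplify(sp.diff(a, y) - sp.diff(b, x)) == 0  # a_y = b_x: closed

# Potential: integrate along the path (0,0) -> (x,0) -> (x,y).
F = sp.integrate(a.subs(y, 0), (x, 0, x)) + sp.integrate(b, (y, 0, y))

# dF = omega, so omega is exact.
assert sp.simplify(sp.diff(F, x) - a) == 0 and sp.simplify(sp.diff(F, y) - b) == 0
print(F)
```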

549.3

exact (differential form)

A differential form η is called exact if there exists another form ξ such that dξ = η. Exact forms are always closed, since d2 = 0. Version: 1 Owner: bwebste Author(s): bwebste

549.4

manifold

Summary. A manifold is a space that is locally like Rn, however lacking a preferred system of coordinates. Furthermore, a manifold can have global topological properties, such as non-contractible loops, that distinguish it from the topologically trivial Rn.

Standard Definition. An n-dimensional topological manifold M is a second countable, Hausdorff topological space 1 that is locally homeomorphic to open subsets of Rn. A differential manifold is a topological manifold with some additional structure information. A chart, also known as a system of coordinates, is a continuous injection from an open subset of M to Rn. Let α : Uα → Rn and β : Uβ → Rn be two charts with overlapping domains. The continuous injection

β ◦ α^{-1} : α(Uα ∩ Uβ) → Rn

1

For connected manifolds, the assumption that M is second-countable is logically equivalent to M being paracompact, or equivalently to M being metrizable. The topological hypotheses in the definition of a manifold are needed to exclude certain counter-intuitive pathologies. Standard illustrations of these pathologies are given by the long line (lack of paracompactness) and the forked line (points cannot be separated). These pathologies are fully described in Spivak[2].


is called a transition function, and is also called a change of coordinates. An atlas A is a collection of charts α : Uα → Rn whose domains cover M, i.e.

M = ∪_α Uα.

Note that each transition function is really just n real-valued functions of n real variables, and so we can ask whether these are continuously differentiable. The atlas A defines a differential structure on M if every transition function corresponding to A is continuously differentiable. More generally, for k = 1, 2, . . . , ∞, ω, the atlas A is said to define a Ck differential structure, and M is said to be of class Ck, if all the transition functions are k-times continuously differentiable, or real analytic in the case of Cω. Two differential structures of class Ck on M are said to be isomorphic if the union of the corresponding atlases is also a Ck atlas, i.e. if all the new transition functions arising from the merger of the two atlases remain of class Ck. More generally, two Ck manifolds M and N are said to be diffeomorphic, i.e. to have equivalent differential structure, if there exists a homeomorphism φ : M → N such that the atlas of M is equivalent to the atlas obtained as φ-pullbacks of charts on N. The atlas allows us to define differentiable mappings to and from a manifold. Let f : U → R,

U ⊂M

be a continuous function. For each α ∈ A we define fα : V → R,

V ⊂ Rn ,

called the representation of f relative to chart α, as the suitably restricted composition fα = f ◦ α−1 . We judge f to be differentiable if all the representations fα are differentiable. A path γ : I → M,

I⊂R

is judged to be differentiable, if for all differentiable functions f , the suitably restricted composition f ◦ γ is a differentiable function from R to R. Finally, given manifolds M, N, we judge a continuous mapping φ : M → N between them to be differentiable if for all differentiable functions f on N, the suitably restricted composition f ◦ φ is a differentiable function on M.

Classical Definition Historically, the data for a manifold was specified as a collection of coordinate domains related by changes of coordinates. The manifold itself could obtained by gluing the domains in accordance with the transition functions, provided the changes of coordinates were free of inconsistencies. 1981

In this formulation, a Ck manifold is specified by two types of information. The first item of information is a collection of open sets V α ⊂ Rn ,

α ∈ A,

indexed by some set A. The second item is a collection of transition functions, that is to say Ck diffeomorphisms σαβ : Vαβ → Rn ,

Vαβ ⊂ Vα , open,

α, β ∈ A,

obeying certain consistency and topological conditions. We call a pair (α, x),

α ∈ A, x ∈ Vα

the coordinates of a point relative to chart α, and define the manifold M to be the set of equivalence classes of such pairs modulo the relation (α, x) ≃ (β, σαβ(x)). To ensure that the above is an equivalence relation we impose the following hypotheses.

• For α ∈ A, the transition function σαα is the identity on Vα.
• For α, β ∈ A, the transition functions σαβ and σβα are inverses.
• For α, β, γ ∈ A, we have σβγ ◦ σαβ = σαγ on a suitably restricted domain.

We topologize M with the coarsest topology that will make the mappings from each Vα to M continuous. Finally, we demand that the resulting topological space be paracompact and Hausdorff.

Notes. To understand the role played by the notion of a differential manifold, one has to go back to classical differential geometry, which dealt with geometric objects such as curves and surfaces only in reference to some ambient geometric setting — typically a 2-dimensional plane or 3-dimensional space. Roughly speaking, the concept of a manifold was created in order to treat the intrinsic geometry of such an object, independent of any embedding. The motivation for a theory of intrinsic geometry can be seen in results such as Gauss's famous Theorema Egregium, which showed that a certain geometric property of a surface, namely the scalar curvature, is fully determined by intrinsic metric properties of the surface, and is independent of any particular embedding. Riemann[1] took this idea further in his habilitation lecture by describing the intrinsic metric geometry of n-dimensional space without recourse to an ambient Euclidean setting. The modern notion of manifold, as a general setting for geometry involving differential properties, evolved early in the twentieth century

from the works of mathematicians such as Hermann Weyl[3], who introduced the ideas of an atlas and transition functions, and Élie Cartan, who investigated global properties and geometric structures on differential manifolds. The modern definition of a manifold was introduced by Hassler Whitney[4].

References.

1. Riemann, B., "Über die Hypothesen welche der Geometrie zu Grunde liegen (On the hypotheses that lie at the foundations of geometry)", in M. Spivak, A comprehensive introduction to differential geometry, vol. II.
2. Spivak, M., A comprehensive introduction to differential geometry, vols. I & II.
3. Weyl, H., The concept of a Riemann surface (1913).
4. Whitney, H., "Differentiable Manifolds", Annals of Mathematics (1936).

Version: 23 Owner: rmilson Author(s): rmilson

549.5

metric tensor

At any point P in space we can consider the dot product as a mapping of any two vectors v⃗, w⃗ at P into the real numbers. We call this mapping the metric tensor and express it as

g(v⃗, w⃗) = v⃗ · w⃗  for all v⃗, w⃗ at P.

We can similarly compute the components of the metric tensor g relative to a basis {e⃗1, e⃗2, e⃗3}:

gij = g(e⃗i, e⃗j),  i, j ∈ {1, 2, 3}.

The components of the metric tensor depend on what basis is being used, whereas g(v⃗, w⃗) is independent of basis.

Version: 4 Owner: tensorking Author(s): tensorking
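Relative to a concrete (generally non-orthonormal) basis, the components gij = e⃗i · e⃗j can be tabulated with NumPy; the basis below is an arbitrary example:

```python
import numpy as np

# A non-orthonormal basis at a point P, written in Cartesian components.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([1.0, 1.0, 0.0])
e3 = np.array([0.0, 0.0, 2.0])
E = np.array([e1, e2, e3])

# g_ij = g(e_i, e_j) = e_i . e_j; the Gram matrix of the basis.
g = E @ E.T
print(g)
```

The off-diagonal entry g12 = 1 records that e⃗1 and e⃗2 are not orthogonal; for an orthonormal basis g would be the identity.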

549.6

proof of closed differential forms on a simply connected domain

[Lemma 1] Let γ0 and γ1 be two regular homotopic curves in D with the same end-points.

Let σ : [0, 1] × [0, 1] → D be the homotopy between γ0 and γ1, i.e.

σ(0, t) = γ0(t),  σ(1, t) = γ1(t).

Notice that we may (and shall) suppose that σ is regular too. In fact σ([0, 1] × [0, 1]) is a compact subset of D. Since D is open, this compact set has positive distance from the boundary ∂D. So we can regularize σ by mollification, leaving its image in D. Let ω(x, y) = a(x, y) dx + b(x, y) dy be our closed differential form and let σ(s, t) = (x(s, t), y(s, t)). Define

F(s) = ∫_0^1 [a(x(s, t), y(s, t)) x_t(s, t) + b(x(s, t), y(s, t)) y_t(s, t)] dt;

we only have to prove that F(1) = F(0). We have

F′(s) = (d/ds) ∫_0^1 (a x_t + b y_t) dt = ∫_0^1 (a_x x_s x_t + a_y y_s x_t + a x_ts + b_x x_s y_t + b_y y_s y_t + b y_ts) dt.

Notice now that since a_y = b_x we have

(d/dt)[a x_s + b y_s] = a_x x_t x_s + a_y y_t x_s + a x_st + b_x x_t y_s + b_y y_t y_s + b y_st
                      = a_x x_s x_t + b_x x_s y_t + a x_ts + a_y y_s x_t + b_y y_s y_t + b y_ts,

hence

F′(s) = ∫_0^1 (d/dt)[a x_s + b y_s] dt = [a x_s + b y_s]_{t=0}^{t=1}.

Notice, however, that σ(s, 0) and σ(s, 1) are constant, hence x_s = 0 and y_s = 0 for t = 0, 1. So F′(s) = 0 for all s, and F(1) = F(0).

[Lemma 2] Let us fix a point (x0, y0) ∈ D and define a function F : D → R by letting F(x, y) be the integral of ω on any curve joining (x0, y0) with (x, y). The hypothesis assures that F is well defined. Let ω = a(x, y) dx + b(x, y) dy. We only have to prove that ∂F/∂x = a and ∂F/∂y = b.

Let (x, y) ∈ D and suppose that h ∈ R is so small that (x + t, y) ∈ D for all t ∈ [0, h]. Consider the increment F(x + h, y) − F(x, y). From the definition of F we know that F(x + h, y) is equal to the integral of ω on a curve which starts from (x0, y0), goes to (x, y), and then goes to (x + h, y) along the straight segment (x + t, y), t ∈ [0, h]. So we understand that

F(x + h, y) − F(x, y) = ∫_0^h a(x + t, y) dt.

By the integral mean value theorem the last integral is equal to h a(x + ξ, y) for some ξ ∈ [0, h], and hence letting h → 0 we have

(F(x + h, y) − F(x, y))/h = a(x + ξ, y) → a(x, y)  as h → 0,

that is, ∂F(x, y)/∂x = a(x, y). With a similar argument (exchanging x with y) we prove that ∂F/∂y = b(x, y).

[Theorem] Just notice that if D is simply connected, then any two curves in D with the same end points are homotopic. Hence we can apply Lemma 1 and then Lemma 2 to obtain the desired result.

Version: 4 Owner: paolini Author(s): paolini

549.7

pullback of a k-form

pullback of a k-form 1: X, Y smooth manifolds 2: f : X → Y 3: ω k-form on Y 4: f ∗ (ω) k-form on X 5: (f ∗ ω)p (v1 , . . . , vk ) = ωf (p) (dfp (v1 ), . . . , dfp (vk )) Note: This is a “seed” entry written using a short-hand format described in this FAQ. Version: 3 Owner: bwebste Author(s): matte, apmxi

549.8

tangent space

Summary. The tangent space of a differential manifold M at a point x ∈ M is the vector space whose elements are velocities of trajectories that pass through x. The standard notation for the tangent space of M at the point x is Tx M. Definition (Standard). Let M be a differential manifold and x a point of M. Let γi : Ii → M,

Ii ⊂ R,

i = 1, 2

be two differentiable trajectories passing through x at times t1 ∈ I1 , t2 ∈ I2 , respectively. We say that these trajectories are in first order contact at x if for all differentiable functions f : U → R defined in some neighbourhood U ⊂ M of x, we have (f ◦ γ1 )0 (t1 ) = (f ◦ γ2 )0 (t2 ). 1985

First order contact is an equivalence relation, and we define Tx M, the tangent space of M at x, to be the set of corresponding equivalence classes. Given a trajectory γ : I → M,

I⊂R

passing through x at time t ∈ I, we define γ(t) ˙ the tangent vector , a.k.a. the velocity, of γ at time t, to be the equivalence class of γ modulo first order contact. We endow Tx M with the structure of a real vector space by identifying it with Rn relative to a system of local coordinates. These identifications will differ from chart to chart, but they will all be linearly compatible. To describe this identification, consider a coordinate chart α : Uα → Rn ,

Uα ⊂ M,

x ∈ U.

We call the real vector (α ◦ γ)0 (t) ∈ Rn the representation of γ(t) ˙ relative to the chart α. It is a simple exercise to show that two trajectories are in first order contact at x if and only if their velocities have the same representation. Another simple exercise will show that for every u ∈ Rn the trajectory t → α−1 (α(x) + tu) has velocity u relative to the chart α. Hence, every element of Rn represents some actual velocity, and therefore the mapping Tx M → Rn given by [γ] → (α ◦ γ)0 (t),

γ(t) = x,

is a bijection. Finally, if β : Uβ → Rn, Uβ ⊂ M, x ∈ Uβ is another chart, then for all differentiable trajectories with γ(t) = x we have

(β ◦ γ)′(t) = J (α ◦ γ)′(t),

where J is the Jacobian matrix at α(x) of the suitably restricted mapping β ◦ α^{-1} : α(Uα ∩ Uβ) → Rn. The linearity of the above relation implies that the vector space structure of Tx M is independent of the choice of coordinate chart.

Definition (Classical). Historically, tangent vectors were specified as elements of Rn relative to some system of coordinates, a.k.a. a coordinate chart. This point of view naturally leads to the definition of a tangent space as Rn modulo changes of coordinates. Let M be a differential manifold represented as a collection of parameterization domains

{Vα ⊂ Rn : α ∈ A}

indexed by labels belonging to a set A, and transition function diffeomorphisms

σαβ : Vαβ → Vβα,  α, β ∈ A,  Vαβ ⊂ Vα.

Set

M̂ = {(α, x) ∈ A × Rn : x ∈ Vα},

and recall that points of the manifold are represented by elements of M̂ modulo an equivalence relation imposed by the transition functions [see Manifold — Definition (Classical)]. For a transition function σαβ, let Jσαβ : Vαβ → Matn,n(R) denote the corresponding Jacobian matrix of partial derivatives. We call a triple (α, x, u),

α ∈ A, x ∈ Vα , u ∈ Rn

the representation of a tangent vector at x relative to coordinate system α, and make the identification (α, x, u) ' (β, σαβ (x), [Jσαβ ](x)(u)),

α, β ∈ A, x ∈ Vαβ , u ∈ Rn .

to arrive at the definition of a tangent vector at x. Notes. The notion of tangent space derives from the observation that there is no natural way to relate and compare velocities at different points of a manifold. This is already evident when we consider objects moving on a surface in 3-space, where the velocities take their value in the tangent planes of the surface. On a general surface, distinct points correspond to distinct tangent planes, and therefore the velocities at distinct points are not commensurate. The situation is even more complicated for an abstract manifold, where absent an ambient Euclidean setting there is, apriori, no obvious “tangent plane” where the velocities can reside. This point of view leads to the definition of a velocity as some sort of equivalence class. See also: tangent bundle, connection, parallel translation Version: 2 Owner: rmilson Author(s): rmilson


Chapter 550 53-01 – Instructional exposition (textbooks, tutorial papers, etc.) 550.1

curl

Let F⃗ be a vector field in R3. If V is the volume of a closed surface S enclosing the point p, then a coordinate-independent definition of the curl of the vector field F⃗ = F¹e⃗1 + F²e⃗2 + F³e⃗3 is the vector field given by

curl F⃗(p) = lim_{V→0} (1/V) ∬_S n⃗ × F⃗ dS,

where n⃗ is the outward unit normal vector and {e⃗1, e⃗2, e⃗3} is an arbitrary basis. curl F⃗ is often denoted ∇⃗ × F⃗. Although this cross product only computes the curl in an orthonormal coordinate system, the notation is accepted in any context. Curl is easily computed in an arbitrary orthogonal coordinate system by using the appropriate scale factors. That is,

curl F⃗ = (1/(h2 h3)) [∂(h3 F³)/∂q² − ∂(h2 F²)/∂q³] e⃗1
        + (1/(h3 h1)) [∂(h1 F¹)/∂q³ − ∂(h3 F³)/∂q¹] e⃗2
        + (1/(h1 h2)) [∂(h2 F²)/∂q¹ − ∂(h1 F¹)/∂q²] e⃗3

for the arbitrary orthogonal curvilinear coordinate system (q¹, q², q³) having scale factors (h1, h2, h3). Note the scale factors are given by

hi = ‖∂r⃗/∂q^i‖,  i ∈ {1, 2, 3},

where r⃗ is the position vector.

Non-orthogonal systems are more easily handled with tensor analysis. Curl is often used in physics in areas such as electrodynamics. Version: 6 Owner: tensorking Author(s): tensorking
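In Cartesian coordinates all scale factors equal 1 and the formula reduces to the familiar ∇ × F⃗, which can be checked symbolically; the field below (a rigid rotation about the z-axis) is our example:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def curl(F):
    # Cartesian curl (h1 = h2 = h3 = 1): components of nabla x F.
    Fx, Fy, Fz = F
    return (sp.diff(Fz, y) - sp.diff(Fy, z),
            sp.diff(Fx, z) - sp.diff(Fz, x),
            sp.diff(Fy, x) - sp.diff(Fx, y))

# Rigid rotation about the z-axis has constant curl (0, 0, 2).
print(curl((-y, x, sp.Integer(0))))  # (0, 0, 2)
```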


Chapter 551 53A04 – Curves in Euclidean space 551.1

Frenet frame

Let g : I → R3 be a parameterized space curve, assumed to be regular and free of points of inflection. The moving trihedron, also known as the Frenet frame 1, is an orthonormal basis of vectors (T(t), N(t), B(t)) defined and named as follows:

T(t) = g′(t)/‖g′(t)‖,  the unit tangent;
N(t) = T′(t)/‖T′(t)‖,  the unit normal;
B(t) = T(t) × N(t),  the unit binormal.

A straightforward application of the chain rule shows that these definitions are invariant with respect to reparameterizations. Hence, the above three vectors should be conceived as being attached to the point g(t) of the oriented space curve, rather than being functions of the parameter t. Corresponding to the above vectors are 3 planes, passing through each point of the space curve. The osculating plane is the plane spanned by T and N; the normal plane is the plane spanned by N and B; the rectifying plane is the plane spanned by T and B. Version: 10 Owner: rmilson Author(s): rmilson, slider142 1

Other names for this include the Frenet trihedron, the repère mobile, and the moving frame.


551.2

Serret-Frenet equations

Let g : I → R3 be an arclength parameterization of an oriented space curve — assumed to be regular, and free of points of inflection. Let T(t), N(t), B(t) denote the corresponding moving trihedron, and κ(t), τ(t) the corresponding curvature and torsion functions. The following differential relations, called the Serret-Frenet equations, hold between these three vectors:

T′(t) = κ(t)N(t);                    (551.2.1)
N′(t) = −κ(t)T(t) + τ(t)B(t);        (551.2.2)
B′(t) = −τ(t)N(t).                   (551.2.3)

Equation (551.2.1) follows directly from the definition of the normal N(t) and from the definition of the curvature κ(t). Taking the derivative of the relation N(t) · T(t) = 0 gives

N′(t) · T(t) = −T′(t) · N(t) = −κ(t).

Taking the derivative of the relation N(t) · N(t) = 1 gives

N′(t) · N(t) = 0.

By the definition of torsion, we have

N′(t) · B(t) = τ(t).

This proves equation (551.2.2). Finally, taking derivatives of the relations T(t) · B(t) = 0, N(t) · B(t) = 0, B(t) · B(t) = 1, and making use of (551.2.1) and (551.2.2), gives

B′(t) · T(t) = −T′(t) · B(t) = 0,
B′(t) · N(t) = −N′(t) · B(t) = −τ(t),
B′(t) · B(t) = 0.

This proves equation (551.2.3). It is also convenient to describe the Serret-Frenet equations by using matrix notation. Let F : I → SO(3) (see special orthogonal group) be the mapping defined by

F(t) = (T(t), N(t), B(t)),  t ∈ I

represent the Frenet frame as a 3 × 3 orthogonal matrix. Equations (551.2.1), (551.2.2), (551.2.3) can be succinctly given as

F(t)^{-1} F′(t) = (   0      κ(t)    0
                   −κ(t)     0     τ(t)
                      0    −τ(t)    0   )

In this formulation, the above relation is also known as the structure equations of an oriented space curve.

Version: 10 Owner: rmilson Author(s): rmilson, slider142

551.3

curvature (space curve)

Let g : I → R3 be a parameterized space curve, assumed to be regular and free of points of inflection. Physically, we conceive of g(t) as a particle moving through space. Let T(t), N(t), B(t) denote the corresponding moving trihedron. The speed of this particle is given by s(t) = ‖g′(t)‖. The quantity

κ(t) = ‖T′(t)‖ / s(t) = ‖g′(t) × g′′(t)‖ / ‖g′(t)‖³

is called the curvature of the space curve. It is invariant with respect to reparameterization, and is therefore a measure of an intrinsic property of the curve, a real number geometrically assigned to the point g(t). Physically, curvature may be conceived as the ratio of the normal acceleration of a particle to the particle's speed. This ratio measures the degree to which the curve deviates from the straight line at a particular point. Indeed, one can show that of all the circles passing through g(t) and lying on the osculating plane, the one of radius 1/κ(t) serves as the best approximation to the space curve at the point g(t). To treat curvature analytically, we take the derivative of the relation

g′(t) = s(t)T(t).

This yields the following decomposition of the acceleration vector:

g′′(t) = s′(t)T(t) + s(t)T′(t) = s(t) {(log s)′(t) T(t) + κ(t) N(t)}.

Thus, to change speed, one needs to apply acceleration along the tangent vector; to change heading, the acceleration must be applied along the normal.

Version: 6 Owner: slider142 Author(s): rmilson, slider142

551.4

fundamental theorem of space curves

Informal summary. The curvature and torsion of a space curve are invariant with respect to Euclidean motions. Conversely, a given space curve is determined up to a Euclidean motion by its curvature and torsion, expressed as functions of the arclength.

Theorem. Let g : I → R3 be a regular, parameterized space curve, without points of inflection. Let κ(t), τ(t) be the corresponding curvature and torsion functions. Let T : R3 → R3 be a Euclidean isometry. The curvature and torsion of the transformed curve T(g(t)) are given by κ(t) and τ(t), respectively.

Conversely, let κ, τ : I → R be continuous functions, defined on an interval I ⊂ R, and suppose that κ(t) never vanishes. Then, there exists an arclength parameterization g : I → R3 of a regular, oriented space curve, without points of inflection, such that κ(t) and τ(t) are the corresponding curvature and torsion functions. If ĝ : I → R3 is another such space curve, then there exists a Euclidean isometry T : R3 → R3 such that ĝ(t) = T(g(t)).

Version: 2 Owner: rmilson Author(s): rmilson

551.5

helix

The space curve traced out by the parameterization

g(t) = (cos(at), sin(at), bt),  t ∈ R, a, b ∈ R,

is called a circular helix.

Calculating the Frenet frame, we obtain

T = (1/√(a² + b²)) (−a sin(at), a cos(at), b);

N = (−cos(at), −sin(at), 0);

B = (1/√(a² + b²)) (b sin(at), −b cos(at), a).


The curvature and torsion are the following constants:

κ = a²/(a² + b²),  τ = ab/(a² + b²).

Indeed, a circular helix can be conceived of as a space curve with constant, non-zero curvature and constant, non-zero torsion: one can show that if a space curve satisfies these constraints, then there exists a system of Cartesian coordinates in which the curve has a parameterization of the form shown above. Version: 2 Owner: rmilson Author(s): rmilson
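The stated constants can be confirmed symbolically from the formulas κ = ‖g′ × g″‖/‖g′‖³ and τ = (g′ × g″) · g‴/‖g′ × g″‖², here for the sample values a = 2, b = 1 (SymPy sketch):

```python
import sympy as sp

t = sp.symbols('t', real=True)
a, b = 2, 1  # sample values; in general kappa = a^2/(a^2+b^2), tau = ab/(a^2+b^2)
g = sp.Matrix([sp.cos(a*t), sp.sin(a*t), b*t])

g1, g2, g3 = g.diff(t), g.diff(t, 2), g.diff(t, 3)
cr = g1.cross(g2)

kappa = sp.simplify(cr.norm() / g1.norm()**3)       # expect 4/5
tau = sp.simplify(cr.dot(g3) / cr.norm()**2)        # expect 2/5
print(kappa, tau)
```

Both come out constant in t, as the entry asserts.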

551.6

space curve

Kinematic definition. A parameterized space curve is a parameterized curve taking values in 3-dimensional Euclidean space. Physically, it may be conceived as a particle moving through space. Analytically, a smooth space curve is represented by a sufficiently differentiable mapping g : I → R3 of an interval I ⊂ R into 3-dimensional Euclidean space R3. Equivalently, a parameterized space curve can be considered a triple of functions:

g(t) = (γ1(t), γ2(t), γ3(t)),  t ∈ I.

Regularity hypotheses. To preclude the possibility of kinks and corners, it is necessary to add the hypothesis that the mapping be regular, that is to say that the derivative g′(t) never vanishes. Also, we say that g(t) is a point of inflection if the first and second derivatives g′(t), g′′(t) are linearly dependent. Space curves with points of inflection are beyond the scope of this entry. Henceforth we make the assumption that g(t) is both regular and lacks points of inflection.

Geometric definition. A space curve, per se, needs to be conceived of as a subset of R3 rather than a mapping. Formally, we could define a space curve to be the image of some parameterization g : I → R3. A more useful concept, however, is the notion of an oriented space curve, a space curve with a specified direction of motion. Formally, an oriented space curve is an equivalence class of parameterized space curves, with g1 : I1 → R3 and g2 : I2 → R3 being judged equivalent if there exists a smooth, monotonically increasing reparameterization function σ : I1 → I2 such that

γ1(t) = γ2(σ(t)),

t ∈ I1 .

Arclength parameterization. We say that g : I → R^3 is an arclength parameterization of an oriented space curve if ‖g′(t)‖ = 1, t ∈ I. With this hypothesis the length of the space curve between points g(t2) and g(t1) is just |t2 − t1|. In other words, the parameter in such a parameterization measures the relative distance along the curve.

Starting with an arbitrary parameterization g : I → R^3, one can obtain an arclength parameterization by fixing a t0 ∈ I, setting

σ(t) = ∫_{t0}^{t} ‖g′(x)‖ dx,

and using the inverse function σ^{-1} to reparameterize the curve. In other words,

ĝ(t) = g(σ^{-1}(t))

is an arclength parameterization. Thus, every space curve possesses an arclength parameterization, unique up to a choice of additive constant in the arclength parameter. Version: 6 Owner: rmilson Author(s): rmilson, slider142
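As a concrete illustration (a numerical sketch, assuming the unit helix g(t) = (cos t, sin t, t), which has constant speed √2), one can compute σ by quadrature and verify that the reparameterized curve has unit speed:

```python
import math

# For g(t) = (cos t, sin t, t): |g'(t)| = sqrt(2), so
# sigma(t) = t*sqrt(2) and sigma^{-1}(s) = s/sqrt(2); the
# reparameterized curve ghat(s) = g(sigma^{-1}(s)) has unit speed.

def g(t):
    return (math.cos(t), math.sin(t), t)

def speed(t):
    return math.sqrt(math.sin(t)**2 + math.cos(t)**2 + 1)

def sigma(t, steps=10000):
    # arclength integral by the trapezoid rule
    h = t / steps
    return h * (speed(0)/2 + sum(speed(k*h) for k in range(1, steps)) + speed(t)/2)

def ghat(s):
    return g(s / math.sqrt(2))  # sigma^{-1}(s) = s / sqrt(2) here

# check unit speed of ghat by a central difference
h = 1e-6
p = ghat(1.0 + h)
q = ghat(1.0 - h)
unit_speed = math.sqrt(sum((pi - qi)**2 for pi, qi in zip(p, q))) / (2*h)

print(abs(sigma(1.0) - math.sqrt(2)) < 1e-6)  # True
print(abs(unit_speed - 1.0) < 1e-6)           # True
```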


Chapter 552 53A45 – Vector and tensor analysis 552.1

closed (differential form)

A differential form η is called closed if its exterior derivative dη = 0. Version: 3 Owner: bwebste Author(s): bwebste


Chapter 553 53B05 – Linear and affine connections 553.1

Levi-Civita connection

On any Riemannian manifold, there is a unique connection ∇ which is

• compatible with the metric, that is, X(⟨Y, Z⟩) = ⟨∇_X Y, Z⟩ + ⟨Y, ∇_X Z⟩,

• and torsion-free, that is,

∇_X Y − ∇_Y X = [X, Y].

This is called the Levi-Civita connection. In local coordinates {x^1, . . . , x^n}, the Christoffel symbols Γ^i_{jk} are determined by

g_{iℓ} Γ^i_{jk} = (1/2) (∂g_{jℓ}/∂x^k + ∂g_{kℓ}/∂x^j − ∂g_{jk}/∂x^ℓ).

Version: 1 Owner: bwebste Author(s): bwebste

553.2

connection

Preliminaries. Let M be a smooth, differential manifold. Let F(M) denote the ring of smooth, real-valued functions on M, and let X(M) denote the real vector space of smooth vector fields. Recall that F(M) both acts and is acted upon by X(M). Given a function f ∈ F(M) and a vector field X ∈ X(M) we write f X ∈ X(M) for the vector field obtained by point-wise multiplying values of X by values of f, and write X(f) ∈ F(M) for the function obtained by taking the directional derivative of f with respect to X.

Main Definition. A connection on M is a bilinear mapping

∇ : X(M) × X(M) → X(M), (X, Y) ↦ ∇_X Y,

that for all X, Y ∈ X(M) and all f ∈ F(M) satisfies

– ∇_{fX} Y = f ∇_X Y,

– ∇_X(f Y) = X(f) Y + f ∇_X Y.

Note that the lack of tensoriality in the second argument means that a connection is not a tensor field. Also note that we can regard the connection as a mapping from X(M) to the space of type (1,1) tensor fields, i.e. for Y ∈ X(M) the object

∇Y : X(M) → X(M), X ↦ ∇_X Y,

is a type (1,1) tensor field called the covariant derivative of Y. In this capacity ∇ is often called the covariant derivative operator. [Note: I will define covariant derivatives of general tensor-fields (e.g. differential forms and metrics) once the background material on tensor fields is developed]

Classical definition. In local coordinates a connection is represented by means of the so-called Christoffel symbols Γ^k_{ij}. To that end, let x^1, . . . , x^n be a system of local coordinates on U ⊂ M, and ∂x^1, . . . , ∂x^n the corresponding frame of coordinate vector-fields. Using indices i, j, k = 1, . . . , n and invoking the usual tensor summation convention, the Christoffel symbols are defined by the following relation:

Γ^k_{ij} ∂x^k = ∇_{∂x^i} ∂x^j.

Recall that once a system of coordinates is chosen, a given vector field Y ∈ X(M) is represented by means of its components Y^i ∈ F(U) according to Y = Y^i ∂x^i. It is traditional to represent the components of the covariant derivative ∇Y like this: Y^i_{;j}, using the semi-colon to indicate that the extra index comes from covariant differentiation. The formula for the components follows directly from the defining properties of a connection and the definition of the Christoffel symbols. To wit:

Y^i_{;j} = Y^i_{,j} + Γ^i_{jk} Y^k

where the symbol with the comma,

Y^i_{,j} = ∂x^j(Y^i) = ∂Y^i/∂x^j,

denotes the derivative relative to the coordinate frame. A related and frequently encountered notation is ∇_i, which indicates the covariant derivative in direction ∂x^i, i.e. ∇_i Y = ∇_{∂x^i} Y,

Y ∈ X(M).

This notation jibes with the point of view that the covariant derivative is a certain generalization of the ordinary directional derivative. The partials ∂x^i are replaced by the covariant ∇_i, and the general directional derivative V^i ∂x^i relative to a vector-field V is replaced by the covariant derivative operator V^i ∇_i.

The above notation can lead to some confusion, and this danger warrants an extra comment. The symbol ∇_i acting on a function is customarily taken to mean the same thing as the corresponding partial derivative:

∇_i f = ∂x^i(f) = ∂f/∂x^i.

Furthermore classically oriented individuals always include the indices when writing vector fields and tensors; they never write Y, only Y^j. In particular, the traditionalist will never write ∇_i Y, but rather ∇_i Y^j, and herein lies the potential confusion. This latter symbol must be read as (∇_i Y)^j, not as ∇_i(Y^j), i.e. one takes the covariant derivative of Y with respect to ∂x^i and then looks at the j-th component, rather than the other way around. In other words, ∇_i Y^j means Y^j_{;i}, and not Y^j_{,i}.

Related Definitions. The torsion of a connection ∇ is a bilinear mapping T : X(M) × X(M) → X(M) defined by

T(X, Y) = ∇_X(Y) − ∇_Y(X) − [X, Y],

where the last term denotes the Lie bracket of X and Y. The curvature of a connection is a tri-linear mapping R : X(M) × X(M) × X(M) → X(M) defined by

R(X, Y, Z) = ∇_X ∇_Y Z − ∇_Y ∇_X Z − ∇_{[X,Y]} Z, X, Y, Z ∈ X(M).

We note the following facts:

– The torsion and curvature are tensorial (i.e. F(M)-linear) with respect to their arguments, and therefore define, respectively, a type (1,2) and a type (1,3) tensor field on M. This follows from the defining properties of a connection and the derivation property of the Lie bracket. – Both the torsion and the curvature are, quite evidently, anti-symmetric in their first two arguments. A connection is called torsionless if the corresponding torsion tensor vanishes. If the corresponding curvature tensor vanishes, then the connection is called flat. A connection that is both torsionless and flat is locally Euclidean, meaning that there exist local coordinates for which all of the Christoffel symbols vanish. Notes. The notion of connection is intimately related to the notion of parallel transport, and indeed one can regard the former as the infinitesimal version of the latter. To put it another way, when we integrate a connection we get parallel transport, and when we take the derivative of parallel transport we get a connection. Much more on this in the parallel transport entry. As far as I know, we have Elie Cartan to thank for the word connection. With some trepidation at putting words into the master’s mouth, my guess is that Cartan would lodge a protest against the definition of connection given above. To Cartan, a connection was first and foremost a geometric notion that has to do with various ways of connecting nearby tangent spaces of a manifold. Cartan might have preferred to refer to ∇ as the covariant derivative operator, or at the very least to call ∇ an affine connection, in deference to the fact that there exist other types of connections (e.g. projective ones). This is no longer the mainstream view, and these days, when one wants to speak of such matters, one is obliged to use the term Cartan connection. Indeed, many authors call ∇ an affine connection although they never explain the affine part. 
One can also define connections and parallel transport in terms of principal fiber bundles. This approach is due to Ehresmann. In this generalized setting an affine connection is just the type of connection that arises when working with a manifold’s frame bundle.

Bibliography. [Exact references coming.]
- Cartan’s book on projective connections.
- Ehresmann’s seminal mid-century papers.
- Kobayashi and Nomizu’s books.
- Spivak, as usual.

The silence is puzzling, and I must confess to wondering about the percentage of modern-day geometers who know exactly what is so affine about an affine connection. Has blind tradition taken over? Do we say “affine connection” because the previous person said “affine connection”? The meaning of “affine” is quite clearly explained by Cartan in his writings. There you go esteemed “everybody”: one more reason to go and read Cartan.


Version: 1 Owner: rmilson Author(s): rmilson

553.3

vector field along a curve

Let M be a differentiable manifold and γ : [a, b] → M be a differentiable curve in M. Then a vector field along γ is a differentiable map Γ : [a, b] → TM, the tangent bundle of M, which projects to γ under the natural projection π : TM → M. That is, it assigns to each point t0 ∈ [a, b] a vector tangent to M at the point γ(t0), in a continuous manner. A good example of a vector field along a curve is the speed vector γ̇. This is the pushforward of the constant vector field d/dt by γ, i.e., at t0, it is the derivation γ̇(f) = d/dt(f ◦ γ)|_{t=t0}. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 554 53B21 – Methods of Riemannian geometry 554.1

Hodge star operator

Let M be an n-dimensional orientable manifold with a Riemannian metric tensor g. The Hodge star operator (denoted by ∗) is a linear operator mapping p-forms on M to (n − p)-forms, i.e.,

∗ : Ω^p(M^n) → Ω^{n−p}(M^n).

In local coordinates {x^1, . . . , x^n}, where g = g_{ij} dx^i ⊗ dx^j, the ∗-operator is defined as the linear operator that maps the basis elements of Ω^p(M^n) as

∗(dx^{i_1} ∧ · · · ∧ dx^{i_p}) = (√|g| / (n − p)!) g^{i_1 l_1} · · · g^{i_p l_p} ε_{l_1···l_p l_{p+1}···l_n} dx^{l_{p+1}} ∧ · · · ∧ dx^{l_n}.

Here, |g| = det g_{ij}, and ε is the Levi-Civita permutation symbol.

Generally ∗∗ = (−1)^{p(n−p)} id, where id is the identity operator in Ω^p(M). In three dimensions, ∗∗ = id for all p = 0, . . . , 3. On R^3 with Cartesian coordinates, the metric tensor is

g = dx ⊗ dx + dy ⊗ dy + dz ⊗ dz,

and the Hodge star operator is

∗dx = dy ∧ dz,

∗dy = dz ∧ dx,

∗dz = dx ∧ dy.
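For the Euclidean metric (|g| = 1, g^{ij} the identity) the general formula reduces to ∗(dx^i) = Σ_{j<k} ε_{ijk} dx^j ∧ dx^k. The sketch below (not part of the entry), with indices 0, 1, 2 standing for dx, dy, dz, checks that this reproduces the three identities above:

```python
# Hodge star of the coordinate 1-forms on Euclidean R^3 via the
# Levi-Civita permutation symbol.

def eps(i, j, k):
    # Levi-Civita symbol computed by counting inversions
    idx = [i, j, k]
    if len(set(idx)) < 3:
        return 0
    sign = 1
    for a in range(3):
        for b in range(a + 1, 3):
            if idx[a] > idx[b]:
                sign = -sign
    return sign

def star_one_form(i):
    # *(dx^i) as a list of (coefficient, (j, k)) terms with j < k
    return [(eps(i, j, k), (j, k))
            for j in range(3) for k in range(3)
            if j < k and eps(i, j, k) != 0]

print(star_one_form(0))  # [(1, (1, 2))]   i.e. *dx =  dy ^ dz
print(star_one_form(1))  # [(-1, (0, 2))]  i.e. *dy = -dx ^ dz = dz ^ dx
print(star_one_form(2))  # [(1, (0, 1))]   i.e. *dz =  dx ^ dy
```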

Version: 3 Owner: matte Author(s): matte

554.2

Riemannian manifold

A Riemannian metric on a differentiable manifold M is a collection of inner products ⟨ , ⟩_p on the tangent spaces T_p(M) of M, one for each p ∈ M, satisfying the following smoothness condition:

Let f : U −→ M be any coordinate chart on M, where U ⊂ R^n is an open set. Let {e_1, . . . , e_n} denote the standard basis of R^n. Then the function g_{ij} : U −→ R defined by

g_{ij}(x) := ⟨(df)_x(e_i), (df)_x(e_j)⟩_{f(x)}

is a smooth function, for every 1 ≤ i ≤ n and 1 ≤ j ≤ n. The functions g_{ij} completely determine the Riemannian metric, and it is usual practice to define a Riemannian metric on a manifold M by specifying an atlas over M together with a matrix of functions g_{ij} on each coordinate chart which are symmetric and positive definite, with the proviso that the g_{ij}’s must be compatible with each other on overlaps. A manifold M together with a Riemannian metric ⟨ , ⟩ is called a Riemannian manifold.

Note: A Riemannian metric on M is not a distance metric on M. However, for a connected manifold, it is the case that every Riemannian metric on M induces a distance metric on M, given by

d(x, y) := inf { ∫_0^1 ⟨dc/dt, dc/dt⟩_{c(t)}^{1/2} dt : c : [0, 1] −→ M is a rectifiable curve from x to y },

for any pair of points x, y ∈ M.

It is perhaps more proper to call the collection of gij ’s a metric tensor, and use the term “Riemannian metric” to refer to the distance metric above. However, the practice of calling the collection of gij ’s by the misnomer “Riemannian metric” appears to have stuck. Version: 9 Owner: djao Author(s): djao


Chapter 555 53B99 – Miscellaneous 555.1

germ of smooth functions

If x is a point on a smooth manifold M, then a germ of smooth functions near x is represented by a pair (U, f ) where U ⊆ M is an open neighbourhood of x, and f is a smooth function U → R. Two such pairs (U, f ) and (V, g) are considered equivalent if there is a third open neighbourhood W of x, contained in both U and V , such that f |W = g|W . To be precise, a germ of smooth functions near x is an equivalence class of such pairs. In more fancy language: the set Ox of germs at x is the stalk at x of the sheaf O of smooth functions on M. It is clearly an R-algebra. Germs are useful for defining the tangent space Tx M in a coordinate-free manner: it is simply the space of all R-linear maps X : Ox → R satisfying Leibniz’ rule X(f g) = X(f )g + f X(g). (Such a map is called an R-linear derivation of Ox with values in R.) Version: 1 Owner: draisma Author(s): draisma


Chapter 556 53C17 – Sub-Riemannian geometry 556.1

Sub-Riemannian manifold

A Sub-Riemannian manifold is a triple (M, H, gH) where M is a manifold, H is a distribution (that is, a linear subbundle of the tangent bundle T M of M), and gH is a metric on H induced by a fiber inner product on H. The distribution H is often referred to as the horizontal distribution. Version: 1 Owner: RevBobo Author(s): RevBobo


Chapter 557 53D05 – Symplectic manifolds, general 557.1

Darboux’s Theorem (symplectic geometry)

If (M, ω) is a 2n-dimensional symplectic manifold, and m ∈ M, then there exists a neighborhood U of m with a coordinate chart x = (x_1, . . . , x_{2n}) : U → R^{2n}, such that

ω = ∑_{i=1}^{n} dx_i ∧ dx_{n+i}.

These are called canonical or Darboux coordinates. On U, ω is the pullback by x of the standard symplectic form on R^{2n}, so x is a symplectomorphism. Darboux’s theorem implies that there are no local invariants in symplectic geometry, unlike in Riemannian geometry, where there is curvature. Version: 1 Owner: bwebste Author(s): bwebste

557.2

Moser’s theorem

Let ω0 and ω1 be symplectic structures on a compact manifold M. If there is a path in the space of symplectic structures of a fixed DeRham cohomology class connecting ω0 and ω1 (in particular ω0 and ω1 must have the same class), then (M, ω0 ) and (M, ω1 ) are symplectomorphic, by a symplectomorphism isotopic to the identity. Version: 2 Owner: bwebste Author(s): bwebste


557.3

almost complex structure

An almost complex structure on a manifold M is a differentiable map J : T M → T M which preserves each fiber, is linear on each fiber, and whose square is minus the identity. That is π ◦ J = π, where π : T M → M is the standard projection, and J 2 = −I.

If M is a complex manifold, then multiplication by i on each tangent space gives an almost complex structure. Version: 1 Owner: bwebste Author(s): bwebste

557.4

coadjoint orbit

Let G be a Lie group, and g its Lie algebra. Then G has a natural action on g∗ called the coadjoint action, since it is dual to the adjoint action of G on g. The orbits of this action are submanifolds of g∗ which carry a natural symplectic structure, and are in a certain sense, the minimal symplectic manifolds on which G acts. The orbit through a point λ ∈ g∗ is typically denoted Oλ .

The tangent space T_λ O_λ is naturally identified by the action with g/r_λ, where r_λ is the Lie algebra of the stabilizer of λ. The symplectic form on O_λ is given by ω_λ(X, Y) = λ([X, Y]). This is obviously anti-symmetric and non-degenerate, since λ([X, Y]) = 0 for all Y ∈ g if and only if X ∈ r_λ. This also shows that the form is well-defined.

There is a close association between coadjoint orbits and the representation theory of G, with irreducible representations being realized as spaces of sections of line bundles on coadjoint orbits. For example, if G is compact, coadjoint orbits are partial flag manifolds, and this follows from the Borel-Bott-Weil theorem. Version: 2 Owner: bwebste Author(s): bwebste

557.5

examples of symplectic manifolds

Examples of symplectic manifolds: The most basic example of a symplectic manifold is R^{2n}. If we choose coordinate functions x_1, . . . , x_n, y_1, . . . , y_n, then

ω = ∑_{m=1}^{n} dx_m ∧ dy_m

is a symplectic form, and one can easily check that it is closed.

Any orientable 2-manifold is symplectic: any volume form is a symplectic form.

If M is any manifold, then the cotangent bundle T*M is symplectic. If x_1, . . . , x_n are coordinates on a coordinate patch U on M, and ξ_1, . . . , ξ_n are the functions T*(U) → R,

ξ_i(m, η) = η(∂/∂x_i)(m)

at any point (m, η) ∈ T*(M), then

ω = ∑_{i=1}^{n} dx_i ∧ dξ_i.

One can check that this behaves well under coordinate transformations, and thus defines a form on the whole manifold. One can easily check that this is closed and non-degenerate.

All orbits in the coadjoint action of a Lie group on the dual of its Lie algebra are symplectic. In particular, this includes complex Grassmannians and complex projective spaces.

Examples of non-symplectic manifolds: obviously, all odd-dimensional manifolds are non-symplectic. More subtly, if M is compact and 2n-dimensional, and ω is a closed 2-form, consider the form ω^n. If this form is exact, then ω^n must be 0 somewhere, and so ω is somewhere degenerate. Since the wedge of a closed and an exact form is exact, no power ω^m of ω can be exact. In particular, H^{2m}(M) ≠ 0 for all 0 ≤ m ≤ n, for any compact symplectic manifold. Thus, for example, S^n for n > 2 is not symplectic. Also, this means that any symplectic manifold must be orientable. Version: 2 Owner: bwebste Author(s): bwebste

557.6

hamiltonian vector field

Let (M, ω) be a symplectic manifold, and let ω̃ : TM → T*M be the map from the tangent bundle to the cotangent bundle given by

ω̃(X) = ω(·, X),

and let f : M → R be a smooth function. Then H_f = ω̃^{-1}(df) is the hamiltonian vector field of f. The vector field H_f is symplectic, and a symplectic vector field X is Hamiltonian if and only if the 1-form ω̃(X) = ω(·, X) is exact. If T*Q is the cotangent bundle of a manifold Q, which is naturally identified with the phase space of one particle on Q, and f is the Hamiltonian, then the flow of the Hamiltonian vector field H_f is the time flow of the physical system. Version: 1 Owner: bwebste Author(s): bwebste

557.7

isotropic submanifold

If (M, ω) is a symplectic manifold, then a submanifold L ⊂ M is isotropic if the symplectic form vanishes on the tangent space of L, that is, ω(v_1, v_2) = 0 for all v_1, v_2 ∈ T_ℓ L, for all ℓ ∈ L. Version: 1 Owner: bwebste Author(s): bwebste

557.8

lagrangian submanifold

If (M, ω) is a symplectic 2n-manifold, then a submanifold L is called lagrangian if it is isotropic, and of dimension n. This is the maximal dimension an isotropic submanifold can have, by the non-degeneracy of ω. Version: 1 Owner: bwebste Author(s): bwebste

557.9

symplectic manifold

A symplectic manifold is a pair (M, ω) consisting of a smooth 2n-manifold M and a closed 2-form ω ∈ Ω2 (M), which is non-degenerate at each point. Equivalently, the 2n-form ω n is non-vanishing (the superscript denotes powers of the wedge product). The differential form ω is often called a symplectic form. Let (M, ω) and (N, η) be symplectic manifolds. Then a diffeomorphism f : M → N is called a symplectomorphism if f ∗ η = ω, that is, if the symplectic form on N pulls back to the form on M. Version: 3 Owner: bwebste Author(s): bwebste

557.10

symplectic matrix

A real 2n × 2n matrix A is a symplectic matrix if AJA^T = J, where A^T is the transpose of A, and J is the matrix J = [0 I; −I 0]. Here I is the n × n identity matrix and 0 is the n × n zero matrix. Symplectic matrices satisfy the following properties:

1. The determinant of a symplectic matrix equals one.

2. With standard matrix multiplication, symplectic matrices form a group.

3. Suppose Ψ = [A B; C D], where A, B, C, D are n × n matrices. Then Ψ is symplectic if and only if

AD^T − BC^T = I, AB^T = BA^T, CD^T = DC^T.

4. If X and Y are real n × n matrices, then U = X + iY is unitary if and only if [X −Y; Y X] is symplectic.

Version: 2 Owner: matte Author(s): matte
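The defining relation is easy to test numerically. The sketch below (not from the entry) checks AJA^T = J for 2 × 2 matrices (n = 1), where being symplectic is equivalent to having determinant one:

```python
# Check A J A^T = J for 2x2 matrices using plain Python lists.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def is_symplectic(A, J):
    M = matmul(matmul(A, J), transpose(A))
    return all(abs(M[i][j] - J[i][j]) < 1e-12
               for i in range(2) for j in range(2))

J = [[0, 1], [-1, 0]]

# A shear with determinant 1 is symplectic ...
shear = [[1.0, 1.0], [0.0, 1.0]]
# ... while a scaling with determinant 2 is not.
scale = [[2.0, 0.0], [0.0, 1.0]]

print(is_symplectic(shear, J))  # True
print(is_symplectic(scale, J))  # False
```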


557.11

symplectic vector field

If (M, ω) is a symplectic manifold, then a vector field X ∈ X(M) is symplectic if its flow preserves the symplectic structure. That is, if the Lie derivative LX ω = 0. Version: 1 Owner: bwebste Author(s): bwebste

557.12

symplectic vector space

A symplectic vector space (V, ω) is a finite dimensional real vector space V equipped with an alternating non-degenerate 2-form ω. In other words, the 2-form ω should satisfy the following properties: 1. Alternating: For all a, b ∈ V , ω(a, b) = −ω(b, a).

2. Non-degenerate: If a ∈ V and ω(a, b) = 0 for all b ∈ V , then a = 0. On a symplectic vector space (V, ω), the form ω is called a symplectic form for V . One can show that a symplectic vector space is always even dimensional [1].
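The even-dimensionality can be made concrete: the matrix A of ω in any basis is skew-symmetric, and det A = det(−A^T) = (−1)^n det A forces det A = 0 when n is odd, so ω would be degenerate. A quick numerical check (an illustrative sketch, not from the text) for n = 3:

```python
# Any skew-symmetric 3x3 matrix has determinant 0, so no 3-dimensional
# symplectic vector space exists.

def det3(A):
    return (A[0][0]*(A[1][1]*A[2][2] - A[1][2]*A[2][1])
          - A[0][1]*(A[1][0]*A[2][2] - A[1][2]*A[2][0])
          + A[0][2]*(A[1][0]*A[2][1] - A[1][1]*A[2][0]))

a, b, c = 2.0, -5.0, 7.0   # arbitrary entries above the diagonal
A = [[0, a, b],
     [-a, 0, c],
     [-b, -c, 0]]

print(det3(A))  # 0.0 -- the form represented by A is degenerate
```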

REFERENCES 1. D. McDuff, D. Salamon, Introduction to Symplectic Topology, Clarendon Press, 1997.

Version: 1 Owner: matte Author(s): matte


Chapter 558 53D10 – Contact manifolds, general 558.1

contact manifold

Let M be a smooth manifold and α a one form on M. Then α is a contact form on M if

1. for each point m ∈ M, α_m ≠ 0, and

2. the restriction dαm |ker αm of the differential of α is nondegenerate.

Condition 1 ensures that ξ = ker α is a subbundle of the vector bundle TM. Condition 2 equivalently says dα is a symplectic structure on the vector bundle ξ → M. A contact structure ξ on a manifold M is a subbundle of TM so that for each m ∈ M, there is a contact form α defined on some neighborhood of m so that ξ = ker α. A co-oriented contact structure is a subbundle of TM of the form ξ = ker α for some globally defined contact form α. A (co-oriented) contact manifold is a pair (M, ξ) where M is a manifold and ξ is a (co-oriented) contact structure. Note, symplectic linear algebra implies that dim M is odd. If dim M = 2n + 1 for some positive integer n, then a one form α is a contact form if and only if α ∧ (dα)^n is everywhere nonzero.

Suppose now that (M_1, ξ_1 = ker α_1) and (M_2, ξ_2 = ker α_2) are co-oriented contact manifolds. A diffeomorphism φ : M_1 → M_2 is called a contactomorphism if the pullback along φ of α_2 differs from α_1 by some positive smooth function f : M_1 → R, that is, φ*α_2 = f α_1.

Examples:

1. R^3 is a contact manifold with the contact structure induced by the one form α = dz + x dy.

2. Denote by T^2 the two-torus T^2 = S^1 × S^1. Then R × T^2 (with coordinates t, θ_1, θ_2) is a contact manifold with the contact structure induced by α = cos t dθ_1 + sin t dθ_2.

Version: 1 Owner: RevBobo Author(s): RevBobo

Chapter 559 53D20 – Momentum maps; symplectic reduction 559.1

momentum map

Let (M, ω) be a symplectic manifold, G a Lie group acting on that manifold, g its Lie algebra, and g∗ the dual of the Lie algebra. This action induces a map α : g → X(M) where X(M) is the Lie algebra of vector fields on M, such that exp(tX)(m) = ρt (m) where ρ is the flow of α(X). Then a moment map µ : M → g∗ for the action of G is a map such that Hµ(X) = α(X). Here µ(X)(m) = µ(m)(X), that is, µ(m) is a covector, so we apply it to the vector X and get a scalar function µ(X), and Hµ(X) is its hamiltonian vector field. Generally, the moment maps we are interested in are equivariant with respect to the coadjoint action, that is, they satisfy Ad∗g ◦ µ = µ ◦ g. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 560 54-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 560.1

Krull dimension

If R is a ring, the Krull dimension (or simply dimension) of R, dim R is the supremum of all integers n such that there is an increasing sequence of prime ideals p0 ( · · · ( pn of length n in R. If X is a topological space, the Krull dimension (or simply dimension) of X, dim X is the supremum of all integers n such that there is a decreasing sequence of irreducible closed subsets F0 ) · · · ) Fn of X. Version: 3 Owner: mathcam Author(s): mathcam, nerdy2

560.2

Niemytzki plane

Let Γ be the Euclidean half plane Γ = {(x, y) | y ≥ 0} ⊆ R^2, with the usual subspace topology. We enrich the topology on Γ by throwing in open sets of the form {(x, 0)} ∪ B_r(x, r), that is, an open ball of radius r around (x, r) together with its point of tangency with R × {0} (Fig. 560.1). The space Γ endowed with the enriched topology is called the Niemytzki plane.

Figure 560.1: A new open set in Γ. It consists of an open disc and the point tangent to y = 0.

Some miscellaneous properties of the Niemytzki plane are

– the subspace R × {0} of Γ is discrete, hence the only convergent sequences in this subspace are constant ones; – it is Hausdorff; – it is completely regular; – it is not normal. Version: 4 Owner: igor Author(s): igor

560.3

Sorgenfrey line

The Sorgenfrey line is a nonstandard topology on the real line R. Its topology is defined by the following base of half open intervals B = {[a, b[ | a, b ∈ R, a < b}. Another name is lower limit topology, since a sequence xα converges only if it converges in the standard topology and its limit is a limit from above (which, in this case, means that at most finitely many points of the sequence lie below the limit). For example, the sequence {1/n}n converges to 0, while {−1/n}n does not.
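The convergence criterion can be mimicked computationally: a sequence converges to L in the lower limit topology iff it eventually enters every basic neighborhood [L, L + ε). The sketch below (only a heuristic, since it checks finite truncations of the sequences and finitely many ε) reproduces the behavior of the two sequences above:

```python
# Heuristic convergence check in the Sorgenfrey (lower limit) topology:
# the tail of the sequence must lie in [L, L + eps) for every tested eps.

def converges_sorgenfrey(seq, L, eps_list=(1.0, 0.1, 0.01)):
    for eps in eps_list:
        tail_ok = any(all(L <= x < L + eps for x in seq[n:])
                      for n in range(len(seq)))
        if not tail_ok:
            return False
    return True

pos = [1 / n for n in range(1, 1000)]   # 1, 1/2, 1/3, ...
neg = [-1 / n for n in range(1, 1000)]  # -1, -1/2, -1/3, ...

print(converges_sorgenfrey(pos, 0))  # True: approaches 0 from above
print(converges_sorgenfrey(neg, 0))  # False: every [0, eps) misses it
```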

This topology contains the standard topology on R. The Sorgenfrey line is first countable, separable, but not second countable. It is also not metrizable. Version: 3 Owner: igor Author(s): igor

560.4

boundary (in topology)

Definition. Let X be a topological space and let A be a subset of X. The boundary of A is the set ∂A = cl(A) ∩ cl(X\A), where cl denotes closure. From the definition, it follows that ∂A = ∂(X\A).

Version: 1 Owner: matte Author(s): matte

560.5

closed set

Let (X, τ ) be a topological space. Then a subset C ⊆ X is closed if its complement X\C is open under the topology τ . Example: – In any topological space (X, τ ), the sets X and ∅ are always closed.

– Consider R with the standard topology. Then [0, 1] is closed since its complement (−∞, 0) ∪ (1, ∞) is open (for it being a union of two open sets).

– Consider R with the lower limit topology. Then [0, 1) is closed since its complement (−∞, 0) ∪ [1, ∞) is open.

Closed subsets can also be characterized as follows:

A subset C ⊆ X is closed if and only if C contains all of its cluster points. That is, C′ ⊆ C. So the set {1, 1/2, 1/3, 1/4, . . .} is not closed under the standard topology on R since 0 is a cluster point not contained in the set. Version: 2 Owner: drini Author(s): drini
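The example can be checked mechanically. The snippet below (a finite heuristic sketch: it inspects only finitely many points and radii) confirms that 0 is a cluster point of S = {1, 1/2, 1/3, . . .} while 0 ∉ S:

```python
# 0 is a cluster point of S: every interval around 0 contains points of
# S other than 0 itself, yet 0 is not an element of S, so S is not closed.

S = [1 / n for n in range(1, 10**4)]

def is_cluster_point(p, points, eps_list=(0.1, 0.01, 0.001)):
    # checks (finitely many) shrinking neighbourhoods of p for points
    # of the set distinct from p
    return all(any(x != p and abs(x - p) < eps for x in points)
               for eps in eps_list)

print(is_cluster_point(0, S))  # True
print(0 in S)                  # False
```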

560.6

coarser

If U and V are two topologies defined on the set E, we say that U is weaker than V (or U is coarser than V) if U ⊂ V (or, what is equivalent, if the identity map id : (E, V) → (E, U) is continuous). V is then finer than, or a refinement of, U.

Version: 2 Owner: vypertd Author(s): vypertd

560.7

compact-open topology

Let X and Y be topological spaces, and let C(X, Y ) be the set of continuous maps from X to Y. Given a compact subspace K of X and an open set U in Y, let UK,U := {f ∈ C(X, Y ) : f (x) ∈ U whenever x ∈ K} . Define the compact-open topology on C(X, Y ) to be the topology generated by the subbasis {UK,U : K ⊂ X compact, U ⊂ Y open} . If Y is a uniform space (for example, if Y is a metric space), then this is the topology of uniform convergence on compact sets. That is, a sequence (fn ) converges to f in the compact-open topology if and only if for every compact subspace K of X, (fn ) converges to f uniformly on K. If in addition X is a compact space, then this is the topology of uniform convergence. Version: 5 Owner: antonio Author(s): antonio

560.8

completely normal

Let X be a topological space. X is said to be completely normal if whenever A, B ⊆ X with cl(A) ∩ B = ∅ and A ∩ cl(B) = ∅ (where cl denotes closure), then there are disjoint open sets U and V such that A ⊆ U and B ⊆ V.

Equivalently, a topological space X is completely normal if and only if every subspace is normal. Version: 2 Owner: Evandar Author(s): Evandar

560.9

continuous proper map

Let X, Y be topological spaces and p : X → Y a continuous map. The map p is called continuous proper if for every compact K ⊆ Y the preimage p−1 (K) is compact in X. Version: 1 Owner: igor Author(s): igor

560.10

derived set

Definition. Let X be a topological space and let A be a subset of X. The derived set [1] of A is the set of all limit points of A.

REFERENCES 1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.

Version: 1 Owner: bwebste Author(s): matte

560.11

diameter

Let A a subset of a pseudometric space (X, d). The diameter of A is defined to be sup{d(x, y) : x ∈ A, y ∈ A} whenever the supremum exists. If the supremum doesn’t exist, diameter of A is defined to be infinite. Having finite diameter is not a topological invariant. Version: 1 Owner: drini Author(s): drini

560.12

every second countable space is separable

Every second countable space is separable. Let (X, τ ) be a second countable space and let B be a countable base.


For every basic set B in B, choose a point xB . The set A of all such xB points is clearly countable and it’s also dense since any open set intersects it and thus the whole space is the closure of A. That is, A is a countably dense subset of X. Therefore, X is separable. Version: 1 Owner: drini Author(s): drini

560.13

first axiom of countability

A topological space (X, τ) satisfies the first axiom of countability if the neighborhood system of every point x ∈ X has a countable local base.

Version: 1 Owner: drini Author(s): drini

560.14

homotopy groups

The homotopy groups are an infinite series of (covariant) functors π_n indexed by non-negative integers from based topological spaces to groups for n > 0 and sets for n = 0. π_n(X, x_0) as a set is the set of all homotopy classes of maps of pairs (D^n, ∂D^n) → (X, x_0), that is, maps of the disk into X, taking the boundary to the point x_0. Alternatively, these can be thought of as maps from the sphere S^n into X, taking a basepoint on the sphere to x_0. These sets are given a group structure by declaring the product of two maps f, g to be the result of attaching two disks D_1, D_2 with the right orientation along part of their boundaries to get a new disk D_1 ∪ D_2, and mapping D_1 by f and D_2 by g, to get a map of D_1 ∪ D_2. This is continuous because we required that the boundary go to a fixed point, and well defined up to homotopy.

If f : X → Y satisfies f(x_0) = y_0, then we get a homomorphism of homotopy groups f_* : π_n(X, x_0) → π_n(Y, y_0) by simply composing with f. If g is a map D^n → X, then f_*([g]) = [f ◦ g]. More algebraically, we can define homotopy groups inductively by π_n(X, x_0) ≅ π_{n−1}(ΩX, y_0), where ΩX is the loop space of X, and y_0 is the constant path sitting at x_0. If n > 1, the groups we get are abelian. Homotopy groups are invariant under homotopy equivalence, and higher homotopy groups (n > 1) are not changed by the taking of covering spaces. Some examples are: π_n(S^n) = Z. π_m(S^n) = 0 if m < n. π_n(S^1) = 0 if n > 1. π_n(M) = 0 for n > 1 where M is any surface of nonpositive Euler characteristic (not a sphere or projective plane). Version: 9 Owner: bwebste Author(s): bwebste, nerdy2

560.15

indiscrete topology

If X is a set and it is endowed with a topology defined by τ = {X, ∅}

(560.15.1)

then X is said to have the indiscrete topology. Furthermore, τ is the coarsest topology a set can possess, since τ would be a subset of any other possible topology. This topology gives X many properties: it makes every subset of X sequentially compact, and every function to a space with the indiscrete topology is continuous. (X, τ) is path connected and hence connected, but is arc connected only if it is uncountable. However, it is both hyperconnected and ultraconnected. Version: 7 Owner: tensorking Author(s): tensorking

560.16

interior

If (X, τ) is an arbitrary topological space and A ⊆ X, then the union of all open sets contained in A is defined to be the interior of A. Equivalently, one could define the interior of A to be the largest open set contained in A. We denote the interior as int(A). Moreover, int(A) is one of the standard sets derived from A in a topological space; others include the boundary, closure, etc. Version: 5 Owner: GaloisRadical Author(s): GaloisRadical

560.17 invariant forms on representations of compact groups Let G be a real Lie group. TFAE: 1. Every real representation of G has an invariant positive definite form, and G has at least one faithful representation. 2. One faithful representation of G has an invariant positive definite form. 3. G is compact. Also, any group satisfying these criteria is reductive, and its Lie algebra is the direct sum of simple algebras and an abelian algebra (such an algebra is often called reductive). T

hat (1) ⇒ (2): Obvious.

That (2) ⇒ (3): Let Ω be the invariant form on a faithful representation V . Let then representation gives an embedding ρ : G → SO(V, Ω), the group of automorphisms of V preserving Ω. Thus, G is homeomorphic to a closed subgroup of SO(V, Ω). Since this group is compact, G must be compact as well. 2018

(Proof that SO(V, Ω) is compact: By induction on dim V . Let v ∈ V be an arbitrary vector. Then there is a map, evaluation on v, from SO(V, Ω) → S dim V −1 ⊂ V (this is topologically a sphere, since (V, ω) is isometric to Rdim V with the standard norm). This is a a fiber bundle, and the fiber over any point is a copy of SO(v ⊥ , Ω), which is compact by the inductive hypothesis. Any fiber bundle over a compact base with compact fiber has compact total space. Thus SO(V, Ω) is compact). That (3) ⇒ (1): Let V be an arbitrary representation of G. Choose an arbitrary positive definite form Ω on V . Then define ˜ w) = intG Ω(gv, gw)dg, Ω(v, where dg is Haar measure (normalized so that intG dg = 1). Since K is compact, this gives a well defined form. It is obviously bilinear, bSO(V, Ω)y the linearity of integration, and positive definite since ˜ Ω(gv, gv) = intG Ω(gv, gv)dg > inf g∈G Ω(gv, gv) > 0. ˜ is invariant, since Furthermore, Ω ˜ ˜ w). Ω(hv, hw) = intG Ω(ghv, ghw)dg = intG Ω(ghv, ghw)d(gh) = Ω(v, For representation ρ : T → GL(V ) of the maximal torus T ⊂ K, there exists a representation ρ0 of K, with ρ a T -subrepresentation of ρ0 . Also, since every conjugacy class of K intersects any maximal torus, a representation of K is faithful if and only if it restricts to a faithful representation of T . Since any torus has a faithful representation, K must have one as well. Given that these criteria hold, let V be a representation of G, Ω is positive definite real form, and W a subrepresentation. Now consider W ⊥ = {v ∈ V |Ω(v, w) = 0 ∀w ∈ W }. By the positive definiteness of Ω, V = W ⊕W ⊥ . By induction, V is completely reducible. Applying this to the adjoint representation of G on g, its Lie algebra, we find that g in the direct sum of simple algebras g1 , . . . , gn , in the sense that gi has no proper nontrivial ideals, meaning that gi is simple in the usual sense or it is abelian. Version: 7 Owner: bwebste Author(s): bwebste

560.18

ladder connected

Definition Suppose X is a topological space. Then X is called ladder connected provided that any open cover U of X has the following property: if p, q ∈ X, then there exists a finite number of open sets U1 , . . . , UN from U such that p ∈ U1 , U1 ∩ U2 ≠ ∅, . . . , UN −1 ∩ UN ≠ ∅, and q ∈ UN . Version: 3 Owner: matte Author(s): matte, apmxi

560.19

local base

A base for a neighborhood system Vx around x, or local base around x, is a family F of neighborhoods of x such that every neighborhood of x contains a member of the family. Version: 2 Owner: drini Author(s): drini

560.20

loop

A loop based at x0 in a topological space X is simply a continuous map f : [0, 1] → X with f (0) = f (1) = x0 . The collection of all such loops, modulo homotopy equivalence, forms a group known as the fundamental group. More generally, the space of loops in X based at x0 with the compact-open topology, represented by Ωx0 , is known as the loop space of X. And one has the homotopy groups πn (X, x0 ) = πn−1 (Ωx0 , ι), where πn represents the higher homotopy groups, and ι is the basepoint in Ωx0 consisting of the constant loop at x0 . Version: 2 Owner: nerdy2 Author(s): nerdy2

560.21

loop space

Let X be a topological space, and give the space of continuous maps [0, 1] → X the compact-open topology; that is, a subbasis for the topology is the collection of sets {σ : σ(K) ⊂ U} for K ⊂ [0, 1] compact and U ⊂ X open. Then for x ∈ X, let Ωx X be the subset of loops based at x (that is, those σ with σ(0) = σ(1) = x), with the relative topology.

Ωx X is called the loop space of X at x. Version: 4 Owner: mathcam Author(s): mathcam, nerdy2

560.22

metrizable

A topological space (X, T) is said to be metrizable if there is a metric d : X × X → [0, ∞) such that the topology induced by d is T. Version: 1 Owner: Evandar Author(s): Evandar


560.23

neighborhood system

Let (X, τ ) be a topological space and x ∈ X. The neighborhood system around x is the family of all the neighborhoods of x. Version: 2 Owner: drini Author(s): drini

560.24

paracompact topological space

A topological space X is said to be paracompact if every open cover of X has a locally finite open refinement. In more detail, if (Ui )i∈I is any family of open subsets of X such that [ Ui = X , i∈I

then there exists another family (Vi )i∈I of open sets such that [ Vi = X i∈I

Vi ⊂ Ui for all i ∈ I

and each x ∈ X has a neighbourhood that intersects Vi for only finitely many i (local finiteness).

Any metric or metrizable space is paracompact (A. H. Stone). Also, given an open cover of a paracompact space X, there exists a (continuous) partition of unity on X subordinate to that cover. Version: 3 Owner: matte Author(s): Larry Hammick, Evandar

560.25

pointed topological space

Definition Suppose X is a non-empty topological space and x0 is an element of X. Then the pair (X, x0 ) is called a pointed topological space. The idea with pointed topological spaces is simply that one fixes a base point in the space. This is necessary, for instance, when defining the fundamental group of a topological space. Version: 2 Owner: bwebste Author(s): matte, apmxi

560.26

proper map

Definition Suppose X and Y are topological spaces, and f is a map f : X → Y . Then f is a proper map if the inverse image of every compact subset of Y is a compact subset of X. Version: 2 Owner: matte Author(s): matte, apmxi

560.27

quasi-compact

A topological space is called quasi-compact if any open cover of it has a finite subcover. (Some people require a space to be Hausdorff to be compact, hence the distinction.) Version: 1 Owner: nerdy2 Author(s): nerdy2

560.28

regularly open

Given a topological space (X, τ ), a regularly open set is an open set A ∈ τ such that int(cl A) = A (the interior of the closure of the set is the set itself).

An example of a non-regularly open set in the standard topology of R is A = (0, 1) ∪ (1, 2), since int(cl A) = (0, 2). Version: 1 Owner: drini Author(s): drini

560.29

separated

Definition [1, 2] Suppose A and B are subsets of a topological space X. Then A and B are separated provided that

A ∩ cl(B) = ∅ and cl(A) ∩ B = ∅,

where cl denotes the closure operator in X. When the ambient topological space is clear from the context, the notation A | B indicates that A and B are separated sets [2].

Properties

Theorem 1 Suppose X is a topological space, Y is a subset of X equipped with the subspace topology, and A, B are subsets of Y . If A and B are separated in the topology of Y , then A and B are separated in the topology of X. In other words, if A | B in Y , then A | B in X [3].

Theorem 2 Suppose X is a topological space, and A, B, C are subsets of X. If A | B in the topology of X, then (A ∩ C) | (B ∩ C) in C when C is given the subspace topology from X [2].

Theorem 3 Suppose A, B, C are sets in a topological space X. Then we have [2]

1. ∅ | A.
2. If A | B, then B | A.
3. If B | C and A ⊂ B, then A | C.
4. If A | B and A | C, then A | (B ∪ C).

Theorem 4 [1] Suppose A, B are subsets of a topological space X. If A and B are both closed, or both open, then (A \ B) | (B \ A).

Remarks In [3], separated sets are called strongly disjoint sets.

REFERENCES

1. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., 1976.
2. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.
3. L.E. Ward, Topology, An Outline for a First Course, Marcel Dekker, Inc., 1972.
4. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974.

Version: 2 Owner: matte Author(s): matte

560.30

support of function

Definition [1, 2] Suppose X is a topological space, and f : X → C is a function. Then the support of f (written as supp f ) is the set

supp f = cl{x ∈ X | f (x) ≠ 0}.

In other words, supp f is the closure of the set where f does not vanish.

Properties Suppose f, g : X → C are functions as above. Then we have

supp(f g) ⊂ supp f ∩ supp g,

where f g is the function x 7→ f (x)g(x).

REFERENCES 1. R.E. Edwards, Functional Analysis: Theory and Applications, Dover Publications, 1995. 2. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.

Version: 2 Owner: matte Author(s): matte

560.31

topological invariant

A topological invariant of a space X is a property that depends only on the topology of the space, i.e. it is shared by any topological space homeomorphic to X. Common examples include compactness, connectedness, Hausdorffness, Euler characteristic, orientability, dimension, and algebraic invariants like homology, homotopy groups, and K-theory. Properties of a space depending on an extra structure such as a metric (i.e. volume, curvature, symplectic invariants) typically are not topological invariants, though sometimes there are useful interpretations of topological invariants which seem to depend on extra information like a metric (for example, the Gauss-Bonnet theorem). Version: 2 Owner: bwebste Author(s): bwebste

560.32

topological space

A topological space is a set X together with a set T whose elements are subsets of X, such that

– ∅ ∈ T
– X ∈ T
– If Uj ∈ T for all j ∈ J, then ∪j∈J Uj ∈ T
– If U ∈ T and V ∈ T, then U ∩ V ∈ T

Elements of T are called open sets of X. The set T is called a topology on X. A subset C ⊂ X is called a closed set if the complement X \ C is an open set.

A topology T 0 is said to be finer (respectively, coarser) than T if T 0 ⊃ T (respectively, T 0 ⊂ T). Examples

– The discrete topology is the topology T = P(X) on X, where P(X) denotes the power set of X. This is the largest, or finest, possible topology on X. – The indiscrete topology is the topology T = {∅, X}. It is the smallest or coarsest possible topology on X. – subspace topology – product topology – metric topology

REFERENCES 1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955. 2. J. Munkres, Topology (2nd edition), Prentice Hall, 1999.

Version: 5 Owner: djao Author(s): djao

560.33

topology

topology The origin of topology can be traced to the work of Euler, who wrote a paper detailing a solution to the Königsberg bridge problem. Topology can be thought of as the study of sets of objects together with a notion of continuity. Here is Euler's original polyhedral formula, which you may find of some use:

v − e + f = 2,

where v is the number of vertices, e the edges, and f the faces of a closed polyhedron.

A topology t on a set X may be specified as follows:

1: X a set
2: t a set of subsets of X
3: ∅ ∈ t
4: X ∈ t
5: {V1 , . . . , Vn } ⊂ t ⇒ V1 ∩ V2 ∩ · · · ∩ Vn ∈ t
6: {Vα }α∈I ⊂ t ⇒ ∪α∈I Vα ∈ t

Version: 7 Owner: bhaire Author(s): bhaire, apmxi, bbukh

560.34

triangle inequality

Let (X, d) be a metric space. The triangle inequality states that for any three points x, y, z ∈ X we have

d(x, y) ≤ d(x, z) + d(z, y).

The name comes from the special case of Rn with the standard metric, where the inequality expresses geometrically that in any triangle, the sum of the lengths of two sides is greater than (or equal to) the third. Actually, the triangle inequality is one of the properties that define a metric, so it holds in any metric space. Two important cases are R with d(x, y) = |x − y| and C with d(x, y) = |x − y| (here we are using the complex modulus, not the absolute value).

There is a second triangle inequality, which also holds in any metric space and derives from the definition of a metric:

d(x, y) ≥ |d(x, z) − d(z, y)|.

In planar geometry, this is expressed by saying that each side of a triangle is greater than the difference of the other two.

Proof: Let x, y, z ∈ X be given. For any a, b, c ∈ X, from the first triangle inequality we have d(a, b) ≤ d(a, c) + d(c, b), and thus (using d(b, c) = d(c, b) for any b, c ∈ X):

d(a, c) ≥ d(a, b) − d(b, c)  (560.34.1)

Writing (560.34.1) for a = x, b = z, c = y:

d(x, y) ≥ d(x, z) − d(z, y)  (560.34.2)

while writing (560.34.1) for a = y, b = z, c = x we get:

d(y, x) ≥ d(y, z) − d(z, x), or d(x, y) ≥ d(z, y) − d(x, z);  (560.34.3)

from (560.34.2) and (560.34.3), using the properties of the absolute value, it finally follows that

d(x, y) ≥ |d(x, z) − d(z, y)|,

which is the second triangle inequality. Version: 5 Owner: drini Author(s): drini, Oblomov

560.35

universal covering space

Let X be a topological space. A universal covering space of X is a covering space X̃ of X which is connected and simply connected. If X is based, with basepoint x, then a based cover of X is a cover of X which is also a based space with a basepoint x̃, such that the covering map is a map of based spaces. Note that any cover can be made into a based cover by choosing a basepoint from the pre-images of x. The universal covering space has the following universal property: if π : (X̃, x̃) → (X, x) is a based universal cover, then for any connected based cover π′ : (X′, x′) → (X, x), there is a unique covering map π″ : (X̃, x̃) → (X′, x′) such that π = π′ ◦ π″. Clearly, if a universal covering exists, it is unique up to unique isomorphism. But not every topological space has a universal cover. In fact X has a universal cover if and only if it is semi-locally simply connected (for example, if it is a locally finite CW-complex or a manifold). Version: 3 Owner: bwebste Author(s): bwebste, nerdy2

Chapter 561 54A05 – Topological spaces and generalizations (closure spaces, etc.)

561.1

characterization of connected compact metric spaces

Let (A, d) be a compact metric space. Then A is connected ⇐⇒ for all x, y ∈ A and all ε > 0 there exist n ∈ N and p1 , . . . , pn ∈ A such that p1 = x, pn = y, and d(pi , pi+1 ) < ε (i = 1, . . . , n − 1). Version: 9 Owner: gumau Author(s): gumau

561.2

closure axioms

A closure operator on a set X is an operator which assigns a set Ac to each subset A of X, and such that the following (Kuratowski’s closure axioms) hold for any subsets A and B of X: 1. ∅c = ∅;

2. A ⊂ Ac ;

3. (Ac )c = Ac ;
4. (A ∪ B)c = Ac ∪ B c .

The following theorem due to Kuratowski says that a closure operator characterizes a unique topology on X: Theorem. Let c be a closure operator on X, and let T = {X − A : A ⊆ X, Ac = A}. Then T is a topology on X, and Ac is the T-closure of A for each subset A of X. Version: 4 Owner: Koro Author(s): Koro
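The theorem can be illustrated concretely on a small finite set. The closure operator below (which adds the point 3 whenever 2 is present) is a made-up example satisfying Kuratowski's axioms; the induced topology is then read off as the complements of the c-closed sets:

```python
# Verify Kuratowski's axioms for a hypothetical closure operator on
# X = {1, 2, 3}, then build the topology it characterizes.
from itertools import combinations

X = frozenset({1, 2, 3})

def powerset(S):
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

def c(A):
    # made-up closure operator: adds the point 3 whenever 2 is present
    return A | (frozenset({3}) if 2 in A else frozenset())

assert c(frozenset()) == frozenset()           # axiom 1
for A in powerset(X):
    assert A <= c(A)                           # axiom 2
    assert c(c(A)) == c(A)                     # axiom 3
    for B in powerset(X):
        assert c(A | B) == c(A) | c(B)         # axiom 4

# induced topology: open sets are complements of c-closed sets
T = {X - A for A in powerset(X) if c(A) == A}
print(sorted(sorted(U) for U in T))  # [[], [1], [1, 2], [1, 2, 3], [2], [2, 3]]
```

One can check by hand that the printed family is closed under unions and finite intersections, as the theorem guarantees.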

561.3

neighborhood

A neighborhood of a point x in a topological space X is an open subset U of X which contains x. If X is a metric space, then an open ball around x is one example of a neighborhood 1 . A deleted neighborhood of x is an open set of the form U \ {x}, where U is an open subset of X which contains x. Version: 5 Owner: djao Author(s): djao

561.4

open set

In a metric space M a set O is called open, if for every x ∈ O there is an open ball S around x such that S ⊂ O. If d(x, y) is the distance from x to y then the open ball Br with radius r around x is given as: Br = {y ∈ M|d(x, y) < r}. Using the idea of an open ball one can define a neighborhood of a point x. A set containing x is called a neighborhood of x if there is an open ball around x which is a subset of the neighborhood. These neighborhoods have some properties, which can be used to define a topological space using the Hausdorff axioms for neighborhoods, by which again an open set within a topological space can be defined. In this way we drop the metric and get the more general topological space. We can define a topological space X with a set of neighborhoods of x called Ux for every x ∈ X, which satisfy 1. x ∈ U for every U ∈ Ux

2. If U ∈ Ux and V ⊂ X and U ⊂ V then V ∈ Ux (every set containing a neighborhood of x is a neighborhood of x itself).
3. If U, V ∈ Ux then U ∩ V ∈ Ux .
4. For every U ∈ Ux there is a V ∈ Ux such that V ⊂ U and V ∈ Up for every p ∈ V .

The last point leads us back to open sets, indeed a set O is called open if it is a neighborhood of every of its points. Using the properties of these open sets we arrive at the usual definition of a topological space using open sets, which is equivalent to the above definition. In this definition we look at a set X and a set of subsets of X, which we call open sets, called O, having the following properties:

1. ∅ ∈ O and X ∈ O.

1 In fact, some analysis texts, including Rudin's Principles of Mathematical Analysis, actually define a neighborhood to be an open ball in the metric space context.

2. Any union of open sets is open.
3. Finite intersections of open sets are open.

Note that a topological space is more general than a metric space, i.e. on every metric space a topology can be defined using the open sets from the metric, yet we cannot always define a metric on a topological space such that all open sets remain open.

Examples: – On the real axis the interval I = (0; 1) is open because for every a ∈ I the open ball with radius min(a, 1 − a) is always a subset of I.

– The open ball Br around x is open. Indeed, for every y ∈ Br the open ball with radius r − d(x, y) around y is a subset of Br , because for every z within this ball we have d(x, z) ≤ d(x, y) + d(y, z) < d(x, y) + r − d(x, y) = r. So d(x, z) < r and thus z is in Br . This holds for every z in the ball around y and therefore it is a subset of Br .
– A non-metric topology would be the finite complement topology on infinite sets, in which a set is called open if its complement is finite. Version: 13 Owner: mathwizard Author(s): mathwizard


Chapter 562 54A20 – Convergence in general topology (sequences, filters, limits, convergence spaces, etc.)

562.1

Banach fixed point theorem

Let (X, d) be a complete metric space. A function T : X → X is said to be a contraction mapping if there is a constant q with 0 ≤ q < 1 such that d(T x, T y) ≤ q · d(x, y) for all x, y ∈ X. Contractions have an important property.

Theorem 13 (Banach Fixed Point Theorem). Every contraction has a unique fixed point.

There is an estimate to this fixed point that can be useful in applications. Let T be a contraction mapping on (X, d) with constant q and unique fixed point x∗ ∈ X. For any x0 ∈ X, define recursively the following sequence

x1 := T x0
x2 := T x1
...
xn+1 := T xn .

The following inequality then holds:

d(x∗ , xn ) ≤ (q^n /(1 − q)) · d(x1 , x0 ).

So the sequence (xn ) converges to x∗ . This recursive estimate is occasionally responsible for this result being known as the method of successive approximations. Version: 15 Owner: mathwizard Author(s): mathwizard, NeuRet
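As a numerical illustration, here is the method of successive approximations for a made-up contraction, T(x) = cos x on [0, 1], where |T′(x)| = |sin x| ≤ sin 1 =: q < 1; the a priori estimate above is checked along the way:

```python
# Successive approximations for the hypothetical contraction T(x) = cos x
# on [0, 1], with contraction constant q = sin(1) < 1.
import math

q = math.sin(1.0)
x0 = 0.5
a = abs(math.cos(x0) - x0)        # a = d(T x0, x0)

x, iterates = x0, []
for _ in range(200):
    x = math.cos(x)
    iterates.append(x)
x_star = x                        # numerically converged fixed point

# the a priori estimate from the entry: d(x*, x_n) <= q^n/(1-q) * d(x1, x0)
for n, xn in enumerate(iterates, start=1):
    assert abs(x_star - xn) <= q**n / (1 - q) * a + 1e-12

print(round(x_star, 6))  # 0.739085, the unique solution of cos x = x
```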

562.2

Dini’s theorem

If a monotonically increasing net {fn } of continuous real-valued functions on a topological space (X, τ ) converges pointwise to a continuous function f , then the net converges to f uniformly on compacts. Version: 1 Owner: drini Author(s): drini

562.3

another proof of Dini’s theorem

This is the version of Dini's theorem I will prove:

Let K be a compact metric space and (f_n)_{n∈N} ⊂ C(K) a sequence which converges pointwise to f ∈ C(K). Suppose moreover that f_n(x) ≥ f_{n+1}(x) for all x ∈ K and all n. Then (f_n)_{n∈N} converges uniformly on K.

Proof. Suppose that the sequence does not converge uniformly. Then, by definition, there exists ε > 0 such that for every m ∈ N there are n_m > m and x_m ∈ K with

|f_{n_m}(x_m) − f(x_m)| ≥ ε.

So: for m = 1 there are n_1 > 1 and x_1 ∈ K such that |f_{n_1}(x_1) − f(x_1)| ≥ ε; there are n_2 > n_1 and x_2 ∈ K such that |f_{n_2}(x_2) − f(x_2)| ≥ ε; and in general there are n_m > n_{m−1} and x_m ∈ K such that |f_{n_m}(x_m) − f(x_m)| ≥ ε.

Then we have a sequence (x_m)_m ⊂ K, and (f_{n_m})_m is a subsequence of the original sequence of functions. K is compact, so there is a subsequence (x_{m_j})_j of (x_m)_m which converges in K, say x_{m_j} → x ∈ K.

I will prove that f is not continuous at x (a contradiction with one of the hypotheses). To do this, I will show that (f(x_{m_j}))_j does not converge to f(x), using the above ε.

Let j_0 be such that j ≥ j_0 implies |f_{n_{m_j}}(x) − f(x)| < ε/4; such a j_0 exists by the pointwise convergence of the sequence. Then, in particular, |f_{n_{m_{j_0}}}(x) − f(x)| < ε/4.

Note that |f_{n_{m_j}}(x_{m_j}) − f(x_{m_j})| = f_{n_{m_j}}(x_{m_j}) − f(x_{m_j}), because (using the hypothesis f_n(y) ≥ f_{n+1}(y) for all y ∈ K and all n) it is easy to see that f_n(y) ≥ f(y) for all y ∈ K and all n. Then f_{n_{m_j}}(x_{m_j}) − f(x_{m_j}) ≥ ε for all j. The hypothesis also implies f_{n_{m_j}}(y) ≥ f_{n_{m_{j+1}}}(y) for all y ∈ K and all j. So, for j ≥ j_0 we have f_{n_{m_{j_0}}}(x_{m_j}) ≥ f_{n_{m_j}}(x_{m_j}), which implies

f_{n_{m_{j_0}}}(x_{m_j}) − f(x_{m_j}) ≥ ε for all j ≥ j_0.

Now,

|f_{n_{m_{j_0}}}(x_{m_j}) − f(x)| + |f(x) − f(x_{m_j})| ≥ f_{n_{m_{j_0}}}(x_{m_j}) − f(x_{m_j}) ≥ ε for all j ≥ j_0,

and so

|f(x_{m_j}) − f(x)| ≥ ε − |f_{n_{m_{j_0}}}(x_{m_j}) − f(x)| for all j ≥ j_0.

On the other hand,

|f_{n_{m_{j_0}}}(x_{m_j}) − f(x)| ≤ |f_{n_{m_{j_0}}}(x_{m_j}) − f_{n_{m_{j_0}}}(x)| + |f_{n_{m_{j_0}}}(x) − f(x)|,

and as f_{n_{m_{j_0}}} is continuous, there is a j_1 such that

j ≥ j_1 ⇒ |f_{n_{m_{j_0}}}(x_{m_j}) − f_{n_{m_{j_0}}}(x)| < ε/4.

Then,

j ≥ j_1 ⇒ |f_{n_{m_{j_0}}}(x_{m_j}) − f(x)| ≤ |f_{n_{m_{j_0}}}(x_{m_j}) − f_{n_{m_{j_0}}}(x)| + |f_{n_{m_{j_0}}}(x) − f(x)| < ε/2,

which implies

|f(x_{m_j}) − f(x)| ≥ ε − |f_{n_{m_{j_0}}}(x_{m_j}) − f(x)| ≥ ε/2 for all j ≥ max(j_0, j_1).

In particular, (f(x_{m_j}))_j does not converge to f(x). QED. Version: 8 Owner: gumau Author(s): gumau

562.4

continuous convergence

Let (X, d) and (Y, ρ) be metric spaces, and let fn : X −→ Y be a sequence of functions. We say that fn converges continuously to f at x if fn (xn ) −→ f (x) for every sequence (xn )n ⊂ X such that xn −→ x ∈ X. We say that fn converges continuously to f if it does for every x ∈ X.

Version: 3 Owner: gumau Author(s): gumau

562.5

contractive maps are uniformly continuous

Theorem A contraction mapping is uniformly continuous. Proof Let T : X → X be a contraction mapping in a metric space X with metric d. Thus, for some q ∈ [0, 1), we have for all x, y ∈ X, d(T x, T y) ≤ qd(x, y). To prove that T is uniformly continuous, let ε > 0 be given. There are two cases. If q = 0, our claim is trivial, since then for all x, y ∈ X, d(T x, T y) = 0 < ε. On the other hand, suppose q ∈ (0, 1). Then for all x, y ∈ X with d(x, y) < ε/q, we have d(T x, T y) ≤ qd(x, y) < ε. In conclusion, T is uniformly continuous. 2 The result is stated without proof in [1], pp. 221.

REFERENCES 1. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., 1976.

Version: 1 Owner: mathcam Author(s): matte

562.6

net

Let X be a set. A net is a map from a directed set to X. In other words, it is a pair (A, γ) where A is a directed set and γ is a map from A to X. If a ∈ A then γ(a) is normally written xa , and then the net is written (xa )a∈A . Note that (xa )a∈A is directed under the preorder xa ≤ xb iff a ≤ b. Now suppose X is a topological space, A is a directed set, and (xa )a∈A is a net. Let x ∈ X. (xa ) is said to converge to x iff whenever U is an open neighbourhood of x, there is some b ∈ A such that xa ∈ U whenever a ≥ b; that is, (xa ) is residual in every open neighbourhood of x. Similarly, x is said to be an accumulation point of (xa ) iff whenever U is an open neighbourhood of x and b ∈ A there is a ∈ A such that a ≥ b and xa ∈ U; that is, (xa ) is cofinal in every open neighbourhood of x. Now let B be another directed set, and let δ : B → A be an increasing map such that δ(B) is cofinal in A. Then the pair (B, γ ◦ δ) is said to be a subnet of (A, γ).

Under these definitions, nets become a generalisation of sequences to arbitrary topological spaces. For example:

– if X is Hausdorff then any net in X converges to at most one point
– if Y is a subspace of X then x ∈ cl(Y) iff there is a net in Y converging to x

– if X′ is another topological space and f : X → X′ is a map, then f is continuous at x iff whenever (xa ) is a net converging to x, (f (xa )) is a net converging to f (x)
– X is compact iff every net has a convergent subnet Version: 2 Owner: Evandar Author(s): Evandar

562.7

proof of Banach fixed point theorem

Let (X, d) be a non-empty, complete metric space, and let T be a contraction mapping on (X, d) with constant q. Pick an arbitrary x0 ∈ X, and define the sequence (xn )∞n=0 by xn := T^n x0 . Let a := d(T x0 , x0 ). We first show by induction that for any n ≥ 0,

d(T^n x0 , x0 ) ≤ ((1 − q^n)/(1 − q)) a.

For n = 0, this is obvious. For any n ≥ 1, suppose that d(T^{n−1} x0 , x0 ) ≤ ((1 − q^{n−1})/(1 − q)) a. Then

d(T^n x0 , x0 ) ≤ d(T^n x0 , T^{n−1} x0 ) + d(x0 , T^{n−1} x0 )
≤ q^{n−1} d(T x0 , x0 ) + ((1 − q^{n−1})/(1 − q)) a
= ((q^{n−1} − q^n)/(1 − q)) a + ((1 − q^{n−1})/(1 − q)) a
= ((1 − q^n)/(1 − q)) a

by the triangle inequality and repeated application of the property d(T x, T y) ≤ q d(x, y) of T . By induction, the inequality holds for all n ≥ 0.

Given any ε > 0, it is possible to choose a natural number N such that (q^n/(1 − q)) a < ε for all n ≥ N, because (q^n/(1 − q)) a → 0 as n → ∞. Now, for any m, n ≥ N (we may assume that m ≥ n),

d(xm , xn ) = d(T^m x0 , T^n x0 ) ≤ q^n d(T^{m−n} x0 , x0 ) ≤ q^n ((1 − q^{m−n})/(1 − q)) a < (q^n/(1 − q)) a < ε,

so the sequence (xn ) is a Cauchy sequence. Because (X, d) is complete, this implies that the sequence has a limit in (X, d); define x∗ to be this limit. We now prove that x∗ is a fixed point of T . Suppose it is not; then δ := d(T x∗ , x∗ ) > 0. However, because (xn ) converges to x∗ , there is a natural number N such that d(xn , x∗ ) < δ/2 for all n ≥ N . Then

d(T x∗ , x∗ ) ≤ d(T x∗ , xN +1 ) + d(x∗ , xN +1 ) ≤ q d(x∗ , xN ) + d(x∗ , xN +1 ) < δ/2 + δ/2 = δ,

a contradiction. So x∗ is a fixed point of T . It is also unique. Suppose there is another fixed point x′ of T ; because x′ ≠ x∗ , d(x′ , x∗ ) > 0. But then

d(x′ , x∗ ) = d(T x′ , T x∗ ) ≤ q d(x′ , x∗ ) < d(x′ , x∗ ),

a contradiction. Therefore, x∗ is the unique fixed point of T . Version: 1 Owner: pbruin Author(s): pbruin

562.8

proof of Dini’s theorem

Without loss of generality we will assume that X is compact and, by replacing fn with f − fn , that the net converges monotonically to 0.

Let ε > 0. For each x ∈ X, we can choose an nx such that fnx (x) < ε. Since fnx is continuous, there is an open neighbourhood Ux of x such that for each y ∈ Ux we have fnx (y) < ε. The open sets Ux cover X, which is compact, so we can choose finitely many x1 , . . . , xk such that the Uxi also cover X. Then, if N ≥ nx1 , . . . , nxk , we have fn (x) < ε for each n ≥ N and x ∈ X, since the net fn is monotonically decreasing. Thus, {fn } converges to 0 uniformly on X, which was to be proven. Version: 1 Owner: gabor sz Author(s): gabor sz

562.9

theorem about continuous convergence

Let (X, d) and (Y, ρ) be metric spaces, and let fn : X −→ Y be a sequence of continuous functions. Then fn converges continuously to f if and only if fn converges uniformly to f on every compact subset of X. Version: 2 Owner: gumau Author(s): gumau

562.10

ultrafilter

Let X be a set. A collection U of subsets of X is an ultrafilter if U is a filter, and whenever A ⊆ X then either A ∈ U or X r A ∈ U.

Version: 1 Owner: Evandar Author(s): Evandar

562.11

ultranet

A net (xa )a∈A on a set X is said to be an ultranet if whenever E ⊆ X, (xa ) is either residually in E or residually in X r E. Version: 1 Owner: Evandar Author(s): Evandar


Chapter 563 54A99 – Miscellaneous

563.1

basis

Let (X, T) be a topological space. A subset B of T is a basis for T if every member of T is a union of members of B. Equivalently, B is a basis if and only if whenever U is open and x ∈ U then there is some V ∈ B such that x ∈ V ⊆ U.

A basis also satisfies the following:

Whenever B1 , B2 ∈ B and x ∈ B1 ∩ B2 , there is B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2 .

Conversely, any collection B of subsets of X satisfying this condition and covering X is a basis for some topology T on X. Specifically, T is the collection of all unions of elements of B. T is called the topology generated by B. Version: 5 Owner: Evandar Author(s): Evandar
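On a finite set the generated topology can be computed literally as the set of all unions of basis members. The basis below is a made-up example:

```python
# Compute the topology generated by a basis: all unions of subfamilies of B
# (the empty subfamily contributes the empty set).
from itertools import combinations

def topology_from_basis(B):
    members = list(B)
    T = set()
    for r in range(len(members) + 1):
        for sub in combinations(members, r):
            T.add(frozenset().union(*sub))
    return T

X = frozenset({1, 2, 3})
B = {frozenset({1}), frozenset({2}), X}   # covers X; intersection condition holds
T = topology_from_basis(B)
print(sorted(sorted(U) for U in T))  # [[], [1], [1, 2], [1, 2, 3], [2]]
```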

563.2

box topology

Let {(Xα , Tα )}α∈A be a collection of topological spaces. Let Y denote the generalized cartesian product of the sets Xα , that is

Y = ∏α∈A Xα .

Let B denote the set of all products of open sets of the corresponding spaces, that is

B = { ∏α∈A Uα | ∀α ∈ A : Uα ∈ Tα } .

Now we can construct the product space (Y, S), where S, referred to as the box topology, is the topology generated by the base B.

When A is a finite set, the box topology coincides with the product topology. Version: 2 Owner: igor Author(s): igor

563.3

closure

The closure of a subset A of a topological space X, written cl_X(A), is the intersection of all closed sets containing A. Equivalently, cl_X(A) consists of A together with all limit points of A in X.

Note that if Y is a subspace of X, then cl_X(A) may not be the same as cl_Y(A). For example, if X = R, Y = (0, 1) and A = (0, 1), then cl_X(A) = [0, 1] while cl_Y(A) = (0, 1). Many authors simply write Ā for the closure where the larger space is clear. Version: 2 Owner: Evandar Author(s): Evandar

563.4

cover

Definition ([1], pp. 49) Let Y be a subset of a set X. A cover for Y is a collection of sets U = {Ui }i∈I such that each Ui is a subset of X, and [ Y ⊂ Ui . i∈I

The collection of sets can be arbitrary, i.e., I can be finite, countable, or infinite. The cover is correspondingly called a finite cover, countable cover, or uncountable cover. A subcover of U is a subset U0 ⊂ U such that U0 is also a cover of Y.

A refinement V of U is a cover of Y such that for every V ∈ V there is some U ∈ U such that V ⊂ U.

If X is a topological space and the members of U are open sets, then U is said to be an open cover. Open subcovers and open refinements are defined similarly. Examples 1. If X is a set, then {X} is a cover for X.

2. A topology for a set is a cover of it.

REFERENCES 1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.

Version: 7 Owner: matte Author(s): matte, Evandar

563.5

dense

A subset D of a topological space X is said to be dense in X iff the closure of D is all of X. Equivalently, D is dense iff D intersects every nonempty open set. In the special case that X is a metric space with metric d, say, then this can be rephrased as: for all ε > 0 and all x ∈ X there is y ∈ D such that d(x, y) < ε. Version: 3 Owner: Evandar Author(s): Evandar
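For instance, the rationals are dense in R under this metric characterization: for every real x and ε > 0 there is a rational within ε of x. A small sketch (the search strategy via Fraction.limit_denominator is our own choice, not from the entry):

```python
# Density of Q in R, metric version: find a rational within eps of a given
# real, by letting the allowed denominator grow until the approximation
# is close enough.
from fractions import Fraction

def rational_within(x, eps):
    """Return a rational q with |x - q| < eps."""
    n = 1
    while True:
        q = Fraction(x).limit_denominator(n)
        if abs(x - q) < eps:
            return q
        n *= 10

q = rational_within(3.14159265358979, 1e-6)
print(q)  # 355/113, a classical rational approximation of pi within 1e-6
```

Termination is guaranteed because Fraction(x) is exact for a float, so the approximation error eventually reaches zero.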

563.6

examples of filters

– If X is any set and A ⊆ X then F = {F ⊆ X : A ⊆ F } is a fixed filter on X; F is an ultrafilter iff A consists of a single point.
– If X is any infinite set, then {F ⊆ X : X \ F is finite} is a free filter on X, called the cofinite filter.
– The filter on R generated by the filter base {(n, ∞) : n ∈ N} is called the Fréchet filter on R; it is a free filter which does not converge or have any accumulation points.
– The filter on R generated by the filter base {(0, ε) : ε > 0} is a free filter on R which converges to 0. Version: 3 Owner: Evandar Author(s): Evandar

563.7

filter

Let X be a set. A filter on X is a set F of subsets of X such that

– X ∈ F
– The intersection of any two elements of F is an element of F.
– ∅ ∉ F
– If F ∈ F and F ⊂ G ⊂ X then G ∈ F.

The first two axioms can be replaced by one:

– Any finite intersection of elements of F is an element of F,

with the usual understanding that the intersection of an empty family of subsets of X is the whole set X. A filter F is said to be fixed or principal if the intersection of all elements of F is nonempty; otherwise, F is said to be free or non-principal. If x is any point (or any subset) of any topological space X, the set of neighbourhoods of x in X is a filter, called the neighbourhood filter of x. If F is any filter on the space X, F is said to converge to x, and we write F → x, if Nx ⊂ F. If every neighbourhood of x meets every set of F, then x is called an accumulation point or cluster point of F. Remarks: The notion of filter (due to H. Cartan) has a simplifying effect on various proofs in analysis and topology. Tychonoff's theorem would be one example. Also, the two kinds of limit that one sees in elementary real analysis – the limit of a sequence at infinity, and the limit of a function at a point – are both special cases of the limit of a filter: the Fréchet filter and the neighbourhood filter respectively. The notion of a Cauchy sequence can be extended with no difficulty to any uniform space (but not just a topological space), getting what is called a Cauchy filter; any convergent filter on a uniform space is a Cauchy filter, and if the converse holds then we say that the uniform space is complete. Version: 10 Owner: Koro Author(s): Larry Hammick, Evandar
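A quick finite check of the axioms for a fixed (principal) filter; the set X and the generating set A below are made-up examples:

```python
# The principal filter F = {F subset of X : A subset of F} on a small set,
# checked against the filter axioms from this entry.
from itertools import combinations

X = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})

def powerset(S):
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

F = {S for S in powerset(X) if A <= S}   # principal filter generated by A

assert X in F and frozenset() not in F
for F1 in F:
    for F2 in F:
        assert F1 & F2 in F              # closed under intersection
    for G in powerset(X):
        if F1 <= G:
            assert G in F                # upward closed
print(len(F))  # 4: the supersets of {1, 2} inside X
```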

563.8

limit point

Let X be a topological space, and let A ⊆ X. An element x ∈ X is said to be a limit point of A if x is in the closure (taken in X) of A with x removed: x ∈ cl(A \ {x}). Equivalently:

– x is a limit point of A if and only if whenever U is open with x ∈ U, then (U ∩ A) \ {x} ≠ ∅.
– x is a limit point of A if and only if there is a net in A converging to x which is not residually constant.
– x is a limit point of A if and only if there is a filter on A converging to x.
– If X is a metric (or first countable) space, x is a limit point of A if and only if there is a sequence in A converging to x which is not ultimately constant.

Version: 5 Owner: Evandar Author(s): Evandar
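In the metric-space case, the first condition can be checked numerically. A sketch (the set A = {1/n} and the tolerances are illustrative choices) showing that 0 is a limit point of {1/n : n ∈ N} in R:

```python
# 0 is a limit point of A = {1/n : n in N}: every ball (-eps, eps)
# around 0 meets A \ {0}.  We verify this for a few shrinking eps.
A = [1.0 / n for n in range(1, 100001)]
for eps in (0.1, 0.01, 0.001, 0.0001):
    hits = [a for a in A if 0 < abs(a) < eps]
    assert hits, "ball of radius %g around 0 misses A" % eps
print("0 is a limit point of {1/n : n in N}")
```

The witnessing sequence 1/n converges to 0 and is not ultimately constant, as the last equivalent condition requires.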

563.9

nowhere dense

In a topological space X, a set A is called nowhere dense if the interior of its closure is empty: int(cl(A)) = ∅. Version: 2 Owner: ariels Author(s): ariels


563.10

perfect set

A set is called perfect if it is equal to the set of its limit points. A non-trivial example of a perfect set is the middle-thirds Cantor set. In fact, a more general class of sets is referred to as Cantor sets, all of which have (among other properties) the property of being perfect. Version: 3 Owner: mathwizard Author(s): mathwizard

563.11

properties of the closure operator

Suppose X is a topological space, and A is a subset of X. Writing cl(A) for the closure of A in X, the closure operator satisfies the following properties [1, 2]:

1. cl(∅) = ∅ and cl(X) = X,
2. A ⊂ cl(A),
3. cl(cl(A)) = cl(A),
4. cl(A) = A ∪ A′, where A′ is the derived set of A,
5. if B ⊂ X, then cl(A ∪ B) = cl(A) ∪ cl(B) and cl(A ∩ B) ⊂ cl(A) ∩ cl(B),
6. cl(A) is closed.

REFERENCES
1. R.E. Edwards, Functional Analysis: Theory and Applications, Dover Publications, 1995.
2. R. Abraham, J.E. Marsden, and T. Ratiu, Manifolds, Tensors, Analysis, and Applications, Second Edition, Springer-Verlag, 1988. (available online)

Version: 1 Owner: matte Author(s): matte

563.12

subbasis

Let (X, T) be a topological space. A subset A ⊆ T is said to be a subbasis if the collection B of intersections of finitely many elements of A is a basis for T. Conversely, given an arbitrary collection A of subsets of X, a topology can be formed by first taking the collection B of finite intersections of members of A and then taking the topology T generated by B as basis. T will then be the smallest topology such that A ⊆ T. Version: 4 Owner: Evandar Author(s): Evandar
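The converse construction can be carried out mechanically on a finite set. A Python sketch (X and the collection A are arbitrary illustrative choices): form the basis B of finite intersections, then close under unions to get the generated topology:

```python
from itertools import combinations

# Generate the topology on X from an arbitrary collection A of subsets.
X = frozenset({1, 2, 3})
A = [frozenset({1, 2}), frozenset({2, 3})]

# Basis: all finite intersections (the empty intersection gives X itself).
B = {X}
for r in range(1, len(A) + 1):
    for combo in combinations(A, r):
        S = X
        for c in combo:
            S &= c
        B.add(S)

# Topology: all unions of basis elements (the empty union gives the empty set).
T = {frozenset()}
B = list(B)
for r in range(1, len(B) + 1):
    for combo in combinations(B, r):
        U = frozenset()
        for c in combo:
            U |= c
        T.add(U)

print(sorted(sorted(t) for t in T))
```

Here the resulting topology is {∅, {2}, {1, 2}, {2, 3}, X}: the only new basis element produced by intersection is {1, 2} ∩ {2, 3} = {2}.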

Chapter 564 54B05 – Subspaces

564.1 irreducible

A subset F of a topological space X is reducible if it can be written as a union F = F₁ ∪ F₂ of two closed proper subsets F₁, F₂ of F (closed in the subspace topology). That is, F is reducible if it can be written as a union F = (G₁ ∩ F) ∪ (G₂ ∩ F), where G₁, G₂ are closed subsets of X, neither of which contains F. A subset of a topological space is irreducible if it is not reducible.

As an example, consider {(x, y) ∈ R2 : xy = 0} with the subspace topology from R2 . This space is a union of two lines {(x, y) ∈ R2 : x = 0} and {(x, y) ∈ R2 : y = 0}, which are proper closed subsets. So this space is reducible, and thus not irreducible. Version: 7 Owner: drini Author(s): drini, nerdy2

564.2

irreducible component

An irreducible component of a topological space is a maximal irreducible subset. If a subset is irreducible, then so is its closure; hence irreducible components are closed. Version: 3 Owner: nerdy2 Author(s): nerdy2, muqabala

564.3

subspace topology

Let X be a topological space, and let Y ⊂ X be a subset. The subspace topology on Y is the topology whose open sets are those subsets of Y which equal U ∩ Y for some open set U ⊂ X. In this context, the topological space Y obtained by taking the subspace topology is called a topological subspace, or simply subspace, of X.

Version: 3 Owner: djao Author(s): djao

Chapter 565 54B10 – Product spaces

565.1 product topology

Let {(Xα, Tα)}α∈A be a collection of topological spaces, and let Y be the generalized cartesian product of the sets Xα, that is,

Y = ∏α∈A Xα.

There are at least two ways to topologize Y: using the box topology, or the product topology. If A is finite, these topologies coincide. If A is not finite, then the product topology is, in general, weaker than the box topology [1]. One motivation for the product topology is that it preserves compactness: if all the Xα are compact, then Y is compact. This result is known as Tychonoff's theorem, and for this reason the product topology is also known as the Tychonoff topology. Another motivation for the product topology comes from category theory: the product topology is simply the categorical product of topological spaces.

Next we define the product topology for Y. Let us first recall that an element y ∈ Y is a mapping y : A → ∪α∈A Xα such that y(α) ∈ Xα for each α ∈ A. For each α, we can then define the projection operator πα : Y → Xα by πα(y) = y(α). With this notation, the product topology T for Y can be defined in three equivalent ways [1]:

1. T is the weakest topology such that each πα is continuous.
2. T is the topology induced by the subbasis A = {πα⁻¹(U) | α ∈ A, U ∈ Tα}.
3. T is the topology induced by the basis B = { ∩α∈F πα⁻¹(Uα) | F a finite subset of A, Uα ∈ Tα for each α ∈ F }.


REFERENCES
1. J. Väisälä, Topologia II (in Finnish), Limes, 1999.

Version: 6 Owner: matte Author(s): matte, igor

565.2 product topology preserves the Hausdorff property

Theorem Suppose {Xα}α∈A is a collection of Hausdorff spaces. Then the generalized cartesian product Y = ∏α∈A Xα equipped with the product topology is a Hausdorff space.

Proof. Let x, y be distinct points in Y. Then there is an index β ∈ A such that x(β) and y(β) are distinct points in the Hausdorff space Xβ. It follows that there are open sets U and V in Xβ such that x(β) ∈ U, y(β) ∈ V, and U ∩ V = ∅. Let πβ be the projection operator Y → Xβ defined in the previous entry. By the definition of the product topology, πβ is continuous, so πβ⁻¹(U) and πβ⁻¹(V) are open sets in Y. Also, since the preimage commutes with set operations, we have

πβ⁻¹(U) ∩ πβ⁻¹(V) = πβ⁻¹(U ∩ V) = ∅.

Finally, since x(β) ∈ U, i.e., πβ(x) ∈ U, it follows that x ∈ πβ⁻¹(U). Similarly, y ∈ πβ⁻¹(V). We have shown that πβ⁻¹(U) and πβ⁻¹(V) are disjoint open neighborhoods of x and y respectively. In other words, Y is a Hausdorff space. □

Version: 1 Owner: matte Author(s): matte


Chapter 566 54B15 – Quotient spaces, decompositions

566.1 Klein bottle

Where a Möbius strip is a two-dimensional object with only one side and one edge, a Klein bottle is a two-dimensional object with a single side and no edges. Consider, for comparison, that a sphere is a two-dimensional surface with no edges, but with two sides. A Klein bottle can be constructed by taking a rectangular subset of R² and identifying opposite edges with each other, in the following fashion: consider the rectangular subset [−1, 1] × [−1, 1]. Identify the points (x, 1) with (x, −1), and the points (1, y) with the points (−1, −y). Doing these two operations simultaneously will give you the Klein bottle. Visually, the above is accomplished by taking a rectangle and matching up the arrows on the edges so that their orientations agree:

This of course is completely impossible to do physically in 3-dimensional space; to properly create a Klein bottle, one would need to build it in 4-dimensional space. To construct a pseudo-Klein bottle in 3-dimensional space, first take a cylinder and cut a hole at one point on the side. Next, bend one end of the cylinder through that hole, and attach it to the other end of the cylinder.


A Klein bottle may be parametrized by the following equations:

x = a cos(u)(1 + sin(u)) + r cos(u) cos(v),   0 ≤ u < π
x = a cos(u)(1 + sin(u)) + r cos(v + π),      π < u ≤ 2π

y = b sin(u) + r sin(u) cos(v),   0 ≤ u < π
y = b sin(u),                     π < u ≤ 2π

z = r sin(v)

where u, v ∈ [0, 2π], r = c(1 − cos(u)/2), and a, b, c are chosen arbitrarily.

Version: 7 Owner: vernondalhart Author(s): vernondalhart

566.2

M¨ obius strip

A Möbius strip is a non-orientable 2-dimensional surface with a 1-dimensional boundary. It can be embedded in R³, but it has only a single side. We can parameterize the Möbius strip by

x = r cos θ,   y = r sin θ,   z = (r − 2) tan(θ/2).

The Möbius strip is therefore a subset of the torus.

Topologically, the Möbius strip is formed by taking a quotient space of I² = [0, 1] × [0, 1] ⊂ R². We do this by first letting M be the partition of I² formed by the equivalence relation

(1, x) ∼ (0, 1 − x), where 0 ≤ x ≤ 1,

with every other point in I² related only to itself.

By giving M the quotient topology given by the quotient map p : I² → M we obtain the Möbius strip. Schematically we can represent this identification as follows:

Diagram 1: The identifications made on I² to make a Möbius strip. We identify two opposite sides, but with different orientations.

Since the Möbius strip is homotopy equivalent to a circle, it has Z as its fundamental group. It is not, however, homeomorphic to the circle, although its boundary is. The famous artist M.C. Escher depicted a Möbius strip in one of his well-known works.

Version: 8 Owner: dublisk Author(s): dublisk

566.3

cell attachment

Let X be a topological space, and let Y be the adjunction Y := X ∪φ D^k, where D^k is a closed k-ball and φ : S^(k−1) → X is a continuous map, with S^(k−1) the (k − 1)-sphere considered as the boundary of D^k. Then we say that Y is obtained from X by the attachment of a k-cell, by the attaching map φ. The image e^k of D^k in Y is called a closed k-cell, and the image of the interior D^k \ S^(k−1) of D^k is the corresponding open k-cell. Note that for k = 0 the above definition reduces to the statement that Y is the disjoint union of X with a one-point space. More generally, we say that Y is obtained from X by cell attachment if Y is homeomorphic to an adjunction X ∪{φi} D^(ki), where the maps φi into X are defined on the boundary spheres of the closed balls D^(ki).

Version: 7 Owner: antonio Author(s): antonio

566.4

quotient space

Let X be a topological space, and let ∼ be an equivalence relation on X. Write X* for the set of equivalence classes of X under ∼. The quotient topology on X* is the topology whose open sets are the subsets U ⊂ X* such that the union ∪U ⊂ X is an open subset of X. The space X* is called the quotient space of the space X with respect to ∼. It is often written X/∼.

The projection map π : X → X* which sends each element of X to its equivalence class is always a continuous map. In fact, the map π satisfies the stronger property that a subset U of X* is open if and only if the subset π⁻¹(U) of X is open. In general, any surjective map p : X → Y that satisfies this stronger property is called a quotient map, and given such a quotient map, the space Y is always homeomorphic to the quotient space of X under the equivalence relation x ∼ x′ ⇔ p(x) = p(x′). As a set, the construction of a quotient space collapses each of the equivalence classes of ∼ to a single point. The topology on the quotient space is then chosen to be the strongest topology such that the projection map π is continuous. For A ⊂ X, one often writes X/A for the quotient space obtained by identifying all the points of A with each other.

Version: 2 Owner: djao Author(s): djao


566.5

torus

Visually, the torus looks like a doughnut. Informally, we take a rectangle, identify two edges to form a cylinder, and then identify the two ends of the cylinder to form the torus. Doing this gives us a surface of genus one. It can also be described as the cartesian product of two circles, that is, S¹ × S¹. The torus can be parameterized in cartesian coordinates by

x = cos(s) · (R + r · cos(t)),
y = sin(s) · (R + r · cos(t)),
z = r · sin(t),

with R and r constants and s, t ∈ [0, 2π).

Figure 1: A torus generated with Mathematica 4.1

To create the torus mathematically, we start with the closed subset X = [0, 1] × [0, 1] ⊆ R². Let X* be the set whose elements are the pairs

{x × 0, x × 1} for 0 < x < 1,
{0 × y, 1 × y} for 0 < y < 1,

the four-point set {0 × 0, 1 × 0, 0 × 1, 1 × 1}, and the singletons of the remaining points of X. This can be schematically represented in the following diagram.

Diagram 1: The identifications made on I² to make a torus. Opposite sides are identified with equal orientations, and the four corners are identified to one point.

Note that X* is a partition of X, where we have identified opposite sides of the square together, and all four corners together. We can then form the quotient topology induced by the quotient map p : X → X* sending each element x ∈ X to the element of X* containing it.

Version: 9 Owner: dublisk Author(s): dublisk
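The parametrization in this entry can be checked numerically against the implicit torus equation (√(x² + y²) − R)² + z² = r², valid when R > r > 0. A sketch (the values of R and r are arbitrary choices):

```python
import math

# Sample the parametrization and verify each point lies on the surface
# (sqrt(x^2 + y^2) - R)^2 + z^2 = r^2.
R, r = 2.0, 0.5  # arbitrary constants with R > r > 0
for k in range(200):
    s = 2 * math.pi * k / 200
    t = 2 * math.pi * ((7 * k) % 200) / 200
    x = math.cos(s) * (R + r * math.cos(t))
    y = math.sin(s) * (R + r * math.cos(t))
    z = r * math.sin(t)
    lhs = (math.hypot(x, y) - R) ** 2 + z ** 2
    assert abs(lhs - r * r) < 1e-12
print("all sample points satisfy the torus equation")
```

The check works because √(x² + y²) = R + r cos(t) when R > r, so the left-hand side reduces to (r cos t)² + (r sin t)² = r².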


Chapter 567 54B17 – Adjunction spaces and similar constructions

567.1 adjunction space

Let X and Y be topological spaces, and let A be a subspace of Y. Given a continuous function f : A → X, define the space Z := X ∪f Y to be the quotient space (X ⊔ Y)/∼, where the symbol ⊔ stands for disjoint union and the equivalence relation ∼ is generated by y ∼ f(y) for all y ∈ A. Z is called an adjunction of Y to X along f (or along A, if the map f is understood). This construction has the effect of gluing the subspace A of Y to its image in X under f. Version: 4 Owner: antonio Author(s): antonio


Chapter 568 54B40 – Presheaves and sheaves

568.1 direct image

If f : X → Y is a continuous map of topological spaces and F is a sheaf on X, the direct image sheaf f∗F on Y is defined by

(f∗F)(V) = F(f⁻¹(V))

for open sets V ⊂ Y , with the restriction maps induced from those of F.

Version: 4 Owner: nerdy2 Author(s): nerdy2


Chapter 569 54B99 – Miscellaneous

569.1 cofinite and cocountable topology

The cofinite topology on a set X is defined to be the topology T where

T = {A ⊆ X : A = ∅ or X \ A is finite}.

In other words, the closed sets in the cofinite topology are X and the finite subsets of X. Analogously, the cocountable topology on X is defined to be the topology in which the closed sets are X and the countable subsets of X. If X is finite, the cofinite topology on X is the discrete topology. Similarly, if X is countable, the cocountable topology on X is the discrete topology. A set X together with the cofinite topology forms a compact topological space. Version: 11 Owner: saforres Author(s): saforres
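Why the topology axioms hold can be seen by representing each nonempty open set by its finite complement; the example sets below are arbitrary illustrative choices:

```python
# In the cofinite topology on an infinite X, a nonempty open set is
# determined by its finite complement.  By De Morgan's laws, a union of
# opens has complement the intersection of the complements, and a finite
# intersection of opens has complement the union of the complements --
# both again finite, so both resulting sets are open.
c1 = frozenset({1, 2, 3})   # complement of an open set U1
c2 = frozenset({3, 4})      # complement of an open set U2

complement_of_union = c1 & c2          # (U1 u U2)^c = c1 n c2
complement_of_intersection = c1 | c2   # (U1 n U2)^c = c1 u c2

assert complement_of_union == frozenset({3})
assert complement_of_intersection == frozenset({1, 2, 3, 4})
print(sorted(complement_of_union), sorted(complement_of_intersection))
```

The same complement bookkeeping also makes compactness plausible: once one set of an open cover is chosen, only finitely many points remain to be covered.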

569.2

cone

Given a topological space X, the cone on X (sometimes denoted by CX) is the quotient space X × [0, 1]/(X × {0}). Note that there is a natural inclusion X ↪ CX which sends x to (x, 1). If (X, x0) is a based topological space, there is a similar reduced cone construction, given by X × [0, 1]/((X × {0}) ∪ ({x0} × [0, 1])). With this definition, the natural inclusion x ↦ (x, 1) becomes a based map, where we take (x0, 0) to be the basepoint of the reduced cone. Version: 2 Owner: antonio Author(s): antonio


569.3

join

Given two topological spaces X and Y, their join, denoted by X ∗ Y, is defined to be the quotient space X ∗ Y := X × [0, 1] × Y / ∼, where the equivalence relation ∼ is generated by (x, 0, y1) ∼ (x, 0, y2) for any x ∈ X, y1, y2 ∈ Y, and (x1, 1, y) ∼ (x2, 1, y) for any y ∈ Y, x1, x2 ∈ X. Intuitively, X ∗ Y is formed by taking the disjoint union of the two spaces and attaching a line segment joining every point in X to every point in Y. Version: 2 Owner: antonio Author(s): antonio

569.4

order topology

Let (X, ≤) be a linearly ordered set. The order topology on X is defined to be the topology T generated by the subbasis consisting of open rays, that is, sets of the form

(x, ∞) = {y ∈ X | y > x},
(−∞, x) = {y ∈ X | y < x},

for some x ∈ X.

This is equivalent to saying that T is generated by the basis of open intervals; that is, the open rays as defined above, together with sets of the form (x, y) = {z ∈ X|x < z < y}

for some x, y ∈ X.

The standard topologies on R, Q and N are the same as the order topologies on these sets.

If Y is a subset of X, then Y is a linearly ordered set under the order induced from X. Therefore, Y has an order topology defined by this ordering, the induced order topology. Moreover, Y has a subspace topology T′ which it inherits as a subspace of the topological space X. The subspace topology is always finer than the induced order topology, but they are not in general the same. For example, consider the subset Y = {−1} ∪ {1/n | n ∈ N} ⊆ Q. Under the subspace topology, the singleton set {−1} is open in Y, but under the order topology on Y, any open set containing −1 must contain all but finitely many members of the space. Version: 3 Owner: Evandar Author(s): Evandar


569.5

suspension

569.5.1

The unreduced suspension

Given a topological space X, the suspension of X, often denoted by SX, is defined to be the quotient space X × [0, 1]/ ∼, where (x, 0) ∼ (y, 0) and (x, 1) ∼ (y, 1) for any x, y ∈ X.

Given a continuous map f : X → Y, there is a map Sf : SX → SY defined by Sf([x, t]) := [f(x), t]. This makes S into a functor from the category of topological spaces into itself. Note that SX is homeomorphic to the join X ∗ S⁰, where S⁰ is a discrete space with two points. The space SX is sometimes called the unreduced or free suspension of X, to distinguish it from the reduced suspension described below.

569.5.2

The reduced suspension

If (X, x0) is a based topological space, the reduced suspension of X, often denoted ΣX (or Σx0 X when the basepoint needs to be explicit), is defined to be the quotient space X × [0, 1]/((X × {0}) ∪ (X × {1}) ∪ ({x0} × [0, 1])). Setting the basepoint of ΣX to be the equivalence class of (x0, 0), the reduced suspension is a functor from the category of based topological spaces into itself. An important property of this functor is that it is a left adjoint to the functor Ω taking a (based) space X to the space ΩX of loops in X starting and ending at the basepoint. In other words, Maps∗(ΣX, Y) ≅ Maps∗(X, ΩY) naturally, where Maps∗(X, Y) stands for continuous maps which preserve basepoints. Version: 5 Owner: antonio Author(s): antonio


Chapter 570 54C05 – Continuous maps

570.1 Inverse Function Theorem (topological spaces)

Let X and Y be topological spaces, with X compact and Y Hausdorff. Suppose f : X → Y is a continuous bijection. Then f is a homeomorphism, i.e. f⁻¹ is continuous. Note that if Y is a metric space, then it is Hausdorff, and the theorem holds. Version: 4 Owner: mathcam Author(s): mathcam, vitriol

570.2

continuity of composition of functions

All functions are functions from R to R.

Example 1 Let f(x) = 1 for x ≤ 0 and f(x) = 0 for x > 0, let h(x) = 0 when x ∈ Q and h(x) = 1 when x is irrational, and let g(x) = h(f(x)). Then g(x) = 0 for all x ∈ R, so the composition of two discontinuous functions can be continuous.

Example 2 If g(x) = h(f(x)) is continuous for all functions f, then h is continuous: simply put f(x) = x. Similarly, if g(x) = h(f(x)) is continuous for all functions h, then f is continuous: simply put h(x) = x.

Example 3 Suppose g(x) = h(f(x)) is continuous and f is continuous. Then h need not be continuous. For a counterexample, put h(x) = 0 for all x ≠ 0 and h(0) = 1, and f(x) = 1 + |x|. Now h(f(x)) = 0 is continuous, but h is not.

Example 4 Suppose g(x) = h(f(x)) is continuous and h is continuous. Then f need not be continuous. For a counterexample, put f(x) = 0 for all x ≠ 0 and f(0) = 1, and h(x) = 0 for all x. Now h(f(x)) = 0 is continuous, but f is not.

Version: 2 Owner: matte Author(s): matte
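Example 1 can be checked directly in code. A small sketch (here f only ever outputs the rational values 0 and 1, so membership in Q reduces to an exact-rationality check on the representation):

```python
from fractions import Fraction

# f and h are each discontinuous, yet g = h o f is constantly zero.
def f(x):
    return 1 if x <= 0 else 0

def h(x):
    # 0 on the rationals, 1 on the irrationals; exact int/Fraction values
    # are treated as rational for this illustration.
    return 0 if isinstance(x, (int, Fraction)) else 1

def g(x):
    return h(f(x))

samples = [-2.0, -0.5, 0.0, 0.3, 1.7]
values = [g(x) for x in samples]
assert all(v == 0 for v in values)
print(values)
```

The point of the example survives the finite check: g is the constant zero function, hence continuous, even though both factors jump.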


570.3

continuous

Let X and Y be topological spaces. A function f : X −→ Y is continuous if, for every open set U ⊂ Y , the inverse image f −1 (U) is an open subset of X.

In the case where X and Y are metric spaces (e.g. Euclidean space, or the space of real numbers), a function f : X → Y is continuous at x if and only if for each real number ε > 0, there exists a real number δ > 0 such that whenever a point z ∈ X has distance less than δ to x, the point f(z) ∈ Y has distance less than ε to f(x).

Continuity at a point

A related notion is that of local continuity, or continuity at a point (as opposed to the whole space X at once). When X and Y are topological spaces, we say f is continuous at a point x ∈ X if, for every open subset V ⊂ Y containing f(x), there is an open subset U ⊂ X containing x whose image f(U) is contained in V. Of course, the function f : X → Y is continuous in the first sense if and only if f is continuous at every point x ∈ X in the second sense (for students who haven't seen this before, proving it is a worthwhile exercise). In the common case where X and Y are metric spaces (e.g., Euclidean spaces), a function f is continuous at a ∈ X if and only if

lim_{x→a} f(x) = f(a).

Version: 4 Owner: djao Author(s): djao

570.4

discontinuous

Definition Suppose A is an open set in R (say an interval A = (a, b), or A = R), and f : A → R is a function. Then f is discontinuous at x ∈ A if f is not continuous at x. We know that f is continuous at x if and only if lim_{z→x} f(z) = f(x). Thus, from the properties of the one-sided limits, which we denote by f(x+) and f(x−), it follows that f is discontinuous at x if and only if f(x+) ≠ f(x) or f(x−) ≠ f(x). If f is discontinuous at x, we can then distinguish four types of discontinuities as follows [1, 2]:

1. If f(x+) = f(x−), but f(x) ≠ f(x±), then x is called a removable discontinuity of f. If we modify the value of f at x to f(x) = f(x±), then f will become continuous at x. Indeed, since the modified f satisfies f(x) = f(x+) = f(x−), it follows that f is continuous at x.

2. If f(x+) = f(x−), but x is not in A (so f(x) is not defined), then x is also called a removable discontinuity. If we assign f(x) = f(x±), then this modification renders f continuous at x.

3. If f(x−) ≠ f(x+), then f has a jump discontinuity at x. The number f(x+) − f(x−) is then called the jump, or saltus, of f at x.

4. If either (or both) of f(x+) or f(x−) does not exist, then f has an essential discontinuity at x (or a discontinuity of the second kind).

Examples

1. Consider the function f : R → R,

   f(x) = 1 when x ≠ 0,   f(x) = 0 when x = 0.

   Since f(0−) = 1, f(0) = 0, and f(0+) = 1, it follows that f has a removable discontinuity at x = 0. If we modify f(0) to f(0) = 1, then f becomes the continuous function f(x) = 1.

2. Let us consider the function defined by the formula

   f(x) = sin(x)/x,

   where x is real. When x = 0, the formula is undefined, so f is only determined for x ≠ 0. Let us show that this point is a removable discontinuity. Indeed, it is easy to see that f is continuous for all x ≠ 0, and using L'Hôpital's rule we have f(0+) = f(0−) = 1. Thus, if we assign f(0) = 1, then f becomes a continuous function defined for all real x.

3. The signum function sign : R → R is defined as

   sign(x) = −1 when x < 0,   sign(x) = 0 when x = 0,   sign(x) = 1 when x > 0.

   Since sign(0+) = 1, sign(0) = 0, and sign(0−) = −1, it follows that sign has a jump discontinuity at x = 0 with jump sign(0+) − sign(0−) = 2.

4. The function f : R → R [2],

   f(x) = 1 when x = 0,   f(x) = sin(1/x) when x ≠ 0,

   has an essential discontinuity at x = 0.

A more general definition is as follows:

Definition Let X, Y be topological spaces, and let f be a mapping f : X → Y. Then f is discontinuous at x ∈ X if f is not continuous at x.

Notes A jump discontinuity is also called a simple discontinuity, or a discontinuity of the first kind. An essential discontinuity is also called a discontinuity of the second kind.

REFERENCES 1. R.F. Hoskins, Generalised functions, Ellis Horwood Series: Mathematics and its applications, John Wiley & Sons, 1979. 2. P. B. Laval, http://science.kennesaw.edu/ plaval/spring2003/m4400 02/Math4400/contwork.pdf.

Version: 7 Owner: matte Author(s): matte
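The one-sided limits used throughout this entry can be estimated numerically. A sketch for Example 3 (the sampling radii are arbitrary choices):

```python
# Estimate the one-sided limits of the signum function at 0, confirming
# the jump sign(0+) - sign(0-) = 2 computed in Example 3.
def sign(x):
    return -1 if x < 0 else (0 if x == 0 else 1)

radii = [10.0 ** -k for k in range(1, 9)]
right = [sign(e) for e in radii]    # values approaching 0 from the right
left = [sign(-e) for e in radii]    # values approaching 0 from the left

assert all(v == 1 for v in right)   # sign(0+) = 1
assert all(v == -1 for v in left)   # sign(0-) = -1
print(right[0] - left[0])           # the jump
```

Both one-sided limits exist but disagree with each other, so the discontinuity is a jump discontinuity rather than removable or essential.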

570.5

homeomorphism

A homeomorphism f of topological spaces is a continuous, bijective map such that f −1 is also continuous. We also say that two spaces are homeomorphic if such a map exists. Version: 5 Owner: RevBobo Author(s): RevBobo

570.6 proof of Inverse Function Theorem (topological spaces)

We only have to prove that whenever A ⊂ X is an open set, then B = (f⁻¹)⁻¹(A) = f(A) ⊂ Y is also open (that is, f is an open mapping). Equivalently, it is enough to prove that B′ = Y \ B is closed. Since f is bijective we have

B′ = Y \ B = f(X \ A).

As A′ = X \ A is closed and X is compact, A′ is compact too (this and the following are well-known properties of compact spaces). Moreover, since f is continuous, B′ = f(A′) is also compact. Finally, since Y is Hausdorff, B′ is closed. Version: 1 Owner: paolini Author(s): paolini

570.7 restriction of a continuous mapping is continuous

Theorem Suppose X and Y are topological spaces, and suppose f : X → Y is a continuous function. For a subset A ⊂ X, the restriction of f to A (that is, f|A) is a continuous mapping f|A : A → Y, where A is given the subspace topology from X.

Proof. We need to show that for any open set V ⊂ Y, we can write (f|A)⁻¹(V) = A ∩ U for some set U that is open in X. However, by the properties of the inverse image (see this page), we have, for any open set V ⊂ Y,

(f|A)⁻¹(V) = A ∩ f⁻¹(V).

Since f : X → Y is continuous, f⁻¹(V) is open in X, and our claim follows. □

Version: 2 Owner: matte Author(s): matte


Chapter 571 54C10 – Special maps on topological spaces (open, closed, perfect, etc.)

571.1 densely defined

Given a topological space X, we say that a map f : Y → X is densely defined if its domain Y is a dense subset of X. This terminology is commonly used in the theory of linear operators with the following meaning: in a normed space X, a linear operator A : D(A) ⊂ X → X is said to be densely defined if D(A) is a dense vector subspace of X. Version: 3 Owner: Koro Author(s): Koro

571.2

open mapping

Let X and Y be two topological spaces. A function f : X → Y is said to be open if f (U) is open for each open subset U of X. Accordingly, if f (C) is closed for each closed subset C of X, we say that f is closed. Version: 1 Owner: Koro Author(s): Koro


Chapter 572 54C15 – Retraction

572.1 retract

Let X be a topological space and Y a subspace of X. If there exists a continuous map r : X → Y such that r(y) = y for all y ∈ Y , then we say Y is a retract of X and r is a retraction. Version: 1 Owner: RevBobo Author(s): RevBobo


Chapter 573 54C70 – Entropy

573.1 differential entropy

Let (X, B, µ) be a probability space, and let f ∈ Lᵖ(X, B, µ), ||f||ₚ = 1, be a function. The differential entropy h(f) is defined as

h(f) := −∫_X |f|ᵖ log |f|ᵖ dµ.    (573.1.1)

Differential entropy is the continuous version of the Shannon entropy, H[p] = −∑ᵢ pᵢ log pᵢ.

Consider first u_a, the uniform 1-dimensional distribution on (0, a). Its differential entropy is

h(u_a) = −∫₀ᵃ (1/a) log(1/a) dµ = log a.    (573.1.2)

Next consider probability distributions such as the function

g(t) = (1/(√(2π) σ)) e^(−(t−µ)²/(2σ²)),    (573.1.3)

the 1-dimensional Gaussian. This pdf has differential entropy

h(g) = −∫_R g log g dt = (1/2) log(2πeσ²).    (573.1.4)

For a general n-dimensional Gaussian Nₙ(µ, K) with mean vector µ and covariance matrix K, Kᵢⱼ = cov(xᵢ, xⱼ), we have

h(Nₙ(µ, K)) = (1/2) log((2πe)ⁿ |K|),    (573.1.5)

where |K| = det K.

Version: 9 Owner: mathcam Author(s): mathcam, drummond
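The closed form for the Gaussian entropy can be verified numerically. A sketch (the value of σ and the truncation window are arbitrary choices) approximating −∫ g log g dt by a midpoint Riemann sum:

```python
import math

# Numerical check of h(g) = (1/2) log(2 pi e sigma^2) for the 1-D Gaussian
# with mean 0; the truncated grid [-12, 12] covers more than 9 sigma.
sigma = 1.3

def g(t):
    return math.exp(-t * t / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

a, b, n = -12.0, 12.0, 200000
dt = (b - a) / n
h = 0.0
for i in range(n):
    gi = g(a + (i + 0.5) * dt)
    h -= gi * math.log(gi) * dt  # midpoint Riemann sum of -g log g

exact = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
assert abs(h - exact) < 1e-6
print(round(h, 4))
```

With σ = 1.3 both sides come out to approximately 1.6813 nats, and the agreement is well inside the asserted tolerance.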

Chapter 574 54C99 – Miscellaneous

574.1 Borsuk-Ulam theorem

Call a continuous map f : S m → S n antipode preserving if f (−x) = −f (x) for all x ∈ S m. Theorem: There exists no continuous map f : S n → S n−1 which is antipode preserving for n > 0.

Some interesting consequences of this theorem have real-world applications. For example, this theorem implies that at any time there exist antipodal points on the surface of the earth which have exactly the same barometric pressure and temperature. It is also interesting to note a corollary to this theorem which states that no subset of Rⁿ is homeomorphic to Sⁿ. Version: 3 Owner: RevBobo Author(s): RevBobo

574.2

ham sandwich theorem

Let A1, . . ., Am be measurable bounded subsets of Rᵐ. Then there exists an (m − 1)-dimensional hyperplane which divides each Aᵢ into two subsets of equal measure. This theorem has such a colorful name because in the case m = 3 it can be viewed as cutting a ham sandwich in half. For example, A1 and A3 could be two pieces of bread and A2 a piece of ham. According to this theorem it is possible to make one cut that simultaneously cuts all three objects exactly in half. Version: 3 Owner: mathcam Author(s): mathcam, bs
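In dimension m = 1 the "hyperplane" is a single point, and the theorem reduces to finding a median: a point with half of the set's measure on each side. A bisection sketch for a finite union of intervals (the intervals and search window are arbitrary choices):

```python
# Find a point c with measure_left_of(c) = total/2 for a union of intervals.
def measure_left_of(c, intervals):
    # total length of the parts of the intervals lying to the left of c
    return sum(max(0.0, min(b, c) - a) for a, b in intervals)

def halving_point(intervals, lo=-100.0, hi=100.0):
    total = sum(b - a for a, b in intervals)
    for _ in range(200):  # bisection; measure_left_of is monotone in c
        mid = (lo + hi) / 2
        if measure_left_of(mid, intervals) < total / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = halving_point([(0.0, 1.0), (2.0, 3.0)])  # total length 2, half = 1
print(round(c, 6))  # any c in [1, 2] works; bisection finds the infimum, 1.0
```

The higher-dimensional statement is genuinely harder: it follows from the Borsuk-Ulam theorem of the previous entry, and no such one-dimensional search suffices.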

574.3

proof of Borsuk-Ulam theorem

Proof of the Borsuk-Ulam theorem: I'm going to prove a stronger statement than the one given in the statement of the Borsuk-Ulam theorem here, which is:

Every odd (that is, antipode-preserving) map f : S n → S n has odd degree.

Proof: We go by induction on n. Consider the pair (Sⁿ, A) where A is the equatorial sphere. f defines a map f̃ : RPⁿ → RPⁿ. By cellular approximation, this may be assumed to take the hyperplane at infinity (the (n − 1)-cell of the standard cell structure on RPⁿ) to itself. Since whether a map lifts to a covering depends only on its homotopy class, f is homotopic to an odd map taking A to itself. We may assume that f is such a map. The map f gives us a morphism of the long exact sequences:

[Commutative diagram: two copies of the row

H_n(A; Z₂) → H_n(Sⁿ; Z₂) → H_n(Sⁿ, A; Z₂) → H_{n−1}(A; Z₂) → H_{n−1}(Sⁿ, A; Z₂),

connected by the vertical maps f∗.]

Clearly, the map f|A is odd, so by the induction hypothesis, f|A has odd degree. Note that a map has odd degree if and only if f∗ : H_n(Sⁿ; Z₂) → H_n(Sⁿ; Z₂) is an isomorphism. Thus f∗ : H_{n−1}(A; Z₂) → H_{n−1}(A; Z₂) is an isomorphism. By the commutativity of the diagram, the map f∗ : H_n(Sⁿ, A; Z₂) → H_n(Sⁿ, A; Z₂) is not trivial. I claim it is an isomorphism. H_n(Sⁿ, A; Z₂) is generated by cycles [R₊] and [R₋], which are the fundamental classes of the upper and lower hemispheres, and the antipodal map exchanges these. Both of these map to the fundamental class of A, [A] ∈ H_{n−1}(A; Z₂). By the commutativity of the diagram, ∂(f∗([R±])) = f∗(∂([R±])) = f∗([A]) = [A]. Thus f∗([R₊]) = [R±] and f∗([R₋]) = [R∓], since f commutes with the antipodal map. Thus f∗ is an isomorphism on H_n(Sⁿ, A; Z₂). Since H_n(A; Z₂) = 0, by the exactness of the sequence the map H_n(Sⁿ; Z₂) → H_n(Sⁿ, A; Z₂) is injective, and so by the commutativity of the diagram (or equivalently by the 5-lemma) f∗ : H_n(Sⁿ; Z₂) → H_n(Sⁿ; Z₂) is an isomorphism. Thus f has odd degree.

The other statement of the Borsuk-Ulam theorem is: There is no odd map Sⁿ → Sⁿ⁻¹.

Proof: If f were such a map, consider f restricted to the equator A of Sⁿ. This is an odd map from Sⁿ⁻¹ to Sⁿ⁻¹ and thus has odd degree. But the map f∗ : H_{n−1}(A) → H_{n−1}(Sⁿ⁻¹) factors through H_{n−1}(Sⁿ) = 0, and so must be zero. Thus f|A has degree 0, a contradiction. Version: 2 Owner: bwebste Author(s): bwebste

Chapter 575 54D05 – Connected and locally connected spaces (general aspects)

575.1 Jordan curve theorem

Informally, the Jordan curve theorem states that every Jordan curve divides the Euclidean plane into an “outside” and an “inside”. The proof of this geometrically plausible result requires surprisingly heavy machinery from topology. The difficulty lies in the great generality of the statement and inherent difficulty in formalizing the exact meaning of words like “curve”, “inside”, and “outside.” There are several equivalent formulations. Theorem 14. If Γ is a simple closed curve in R2 , then R2 \ Γ has precisely two connected components. Theorem 15. If Γ is a simple closed curve in the sphere S 2 , then S 2 \ Γ consists of precisely two connected components. Theorem 16. Let h : R → R2 be a one-to-one continuous map such that |h(t)| → ∞ as |t| → ∞. Then R2 \ h(R) consists of precisely two connected components. The two connected components mentioned in each formulation are, of course, the inside and the outside the Jordan curve, although only in the first formulation is there a clear way to say what is out and what is in. There we can define “inside” to be the bounded connected component, as any picture can easily convey. Version: 4 Owner: rmilson Author(s): rmilson, NeuRet
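A computational shadow of the theorem is the even-odd (ray-casting) point-in-polygon test: for a polygonal Jordan curve, a horizontal ray from a point not on the curve crosses it an odd number of times exactly when the point lies in the bounded ("inside") component. A sketch (the polygon and test points are arbitrary choices):

```python
def inside(point, polygon):
    """Even-odd rule: cast a horizontal ray to the right of `point`
    and count its crossings with the edges of the simple closed polygon."""
    x, y = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                crossings += 1
    return crossings % 2 == 1

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
assert inside((2, 2), square)
assert not inside((5, 2), square)
print(inside((2, 2), square), inside((5, 2), square))
```

That the parity of the crossing count is well defined, independent of the chosen ray, is exactly what the two-component statement of the theorem guarantees for simple closed curves.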

575.2

clopen subset

A subset C of a topological space X is called clopen if it is both open and closed.

Theorem 17. The clopen subsets form a Boolean algebra under the operations of union, intersection and complement. In other words:

– X and ∅ are clopen,
– the complement of a clopen set is clopen,
– finite unions and intersections of clopen sets are clopen.

Both open and closed sets have these properties.

Examples of clopen sets are the connected components of a space, so that a space is connected if and only if its only clopen subsets are itself and the empty set. Version: 2 Owner: Dr Absentius Author(s): Dr Absentius
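The clopen characterization of connectedness is easy to check by brute force on a finite topology. The following sketch (the helper name is ours, not from the entry) lists the clopen subsets of a small disconnected space:

```python
def clopen_sets(points, topology):
    """Return the subsets that are both open and closed.

    `topology` is the collection of open sets; a subset is closed
    exactly when its complement is open.
    """
    X = frozenset(points)
    opens = {frozenset(U) for U in topology}
    return {U for U in opens if X - U in opens}

# A 4-point space built from two "pieces" {a, b} and {c, d}; tau is a topology.
X = {"a", "b", "c", "d"}
tau = [set(), {"a"}, {"a", "b"}, {"c", "d"}, {"a", "c", "d"}, X]

# The clopen sets come out as exactly ∅, {a,b}, {c,d}, X, so the space is
# disconnected: {a,b} is a nontrivial clopen subset.
for s in sorted(clopen_sets(X, tau), key=sorted):
    print(sorted(s))
```

The same brute-force check on a connected finite space would return only the empty set and the whole space.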

575.3

connected component

Two points x, y in a topological space X are said to be in the same connected component if there exists a subspace of X containing x and y which is connected. This relation is an equivalence relation, and the equivalence classes of X under this relation are called the connected components of X. Version: 2 Owner: djao Author(s): rmilson, djao

575.4

connected set

Definition [1, 2, 3] Let E be a set in a topological space X. Then E is a connected set in X if E is a connected space with the subspace topology of X.

Properties

Theorem 1 (connected sets and the closure operator)
1. If E is a connected set in a topological space X, and E ⊂ B ⊂ Ē, then B is connected. [2, 1]
2. (By 1.) If E is a connected set in a topological space X, then Ē (the closure of E) is also a connected set [1, 3].
3. (By 2.) If a topological space X has a subset which is dense and connected, then X is connected [3].

Theorem 2 [1, 2] Suppose {Eα | α ∈ I} is an arbitrary collection of connected sets that pairwise intersect. In other words, for distinct α and β, we have Eα ∩ Eβ ≠ ∅. Then the union ⋃α∈I Eα is connected.

REFERENCES 1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955. 2. E.E. Moise, Geometric Topology in Dimensions 2 and 3, Springer-Verlag, 1977. 3. G.L. Naber, Topological methods in Euclidean spaces, Cambridge University Press, 1980. 4. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974. 5. A. Mukherjea, K. Pothoven, Real and Functional analysis, Plenum press, 1978. 6. I.M. Singer, J.A.Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, 1967.

Version: 5 Owner: drini Author(s): matte

575.5

connected set in a topological space

Let Y be a topological space and X ⊂ Y be given the subspace topology. We say that X is connected iff we cannot find nonempty open sets U, V ⊂ X such that U ∩ V = ∅ and U ∪ V = X. Version: 2 Owner: ack Author(s): ack, apmxi

575.6

connected space

A topological space X is said to be connected if there is no pair of nonempty subsets U, V such that both U and V are open in X, U ∩ V = ∅ and U ∪ V = X. If X is not connected, that is, if there are sets U and V with the above properties, then we say that X is disconnected. Version: 8 Owner: mathcam Author(s): mathcam, RevBobo

575.7 connectedness is preserved under a continuous map

Theorem [3, 2] Suppose f : X → Y is a continuous map between topological spaces X and Y . If X is a connected space, and f is surjective, then Y is a connected space.

Proof of theorem. (Following [3].) For a contradiction, suppose there are disjoint nonempty open sets A, B in Y such that Y = A ∪ B. Then f −1 (A) and f −1 (B) are open disjoint sets in X. (See the properties of the inverse image.) Since f is surjective, we have f (X) = Y = A ∪ B. Hence X = f −1 f (X) = f −1 (A) ∪ f −1 (B), and since A and B are nonempty and f is surjective, both f −1 (A) and f −1 (B) are nonempty. This contradicts the assumption that X is connected. □


REFERENCES 1. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974. 2. G.L. Naber, Topological methods in Euclidean spaces, Cambridge University Press, 1980.

Version: 1 Owner: drini Author(s): matte

575.8

cut-point

Definition Suppose X is a connected space and x is a point in X. If X \ {x} is a disconnected set in X, then x is a cut-point of X [3, 2].

Examples
1. Any point of R with the usual topology is a cut-point.
2. If X is a normed vector space with dim X > 1, then X has no cut-points [3].

REFERENCES 1. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974. 2. L.E. Ward, Topology, An Outline for a First Course, Marcel Dekker, Inc., 1972.

Version: 1 Owner: mathcam Author(s): matte

575.9 example of a connected space that is not path-connected

This standard example shows that a connected topological space need not be path-connected (the converse implication does hold: every path-connected space is connected). Consider the topological spaces

X1 = {(0, y) : y ∈ [−1, 1]}
X2 = {(x, sin(1/x)) : x > 0}
X = X1 ∪ X2

with the topology induced from R2 . X2 is often called the “topologist’s sine curve”, and X is its closure.


X is not path-connected. Indeed, assume to the contrary that there exists a path γ : [0, 1] → X with γ(0) = (1/π, 0) and γ(1) = (0, 0). Let c = inf {t : γ(t) = (0, y) for some y} ≤ 1.

Then γ([0, c]) contains only a single point on the Y axis, while its closure contains all of X1 . So γ([0, c]) is not closed, hence not compact, and γ cannot be continuous (a continuous image of a compact set is compact, and a compact subset of R2 is closed). But X is connected. Since both “parts” of the topologist’s sine curve are themselves connected, neither can be partitioned into two open sets. And any open set which contains points of the line segment X1 must contain points of X2 . So no partition of X into two nonempty disjoint open sets is possible – X is connected. Version: 4 Owner: ariels Author(s): ariels
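The closure claim used above — that points of X2 come arbitrarily close to every point of X1 — can be illustrated numerically. This is only a sketch, and the function name is made up for the illustration:

```python
import math

def point_near(y0, n):
    """A point of X2 = {(x, sin(1/x)) : x > 0} close to (0, y0) in X1.

    Solving sin(1/x) = y0 gives x = 1/(asin(y0) + 2*pi*n); as n grows,
    the point (x, sin(1/x)) lies on X2 and tends to (0, y0).
    """
    x = 1.0 / (math.asin(y0) + 2.0 * math.pi * n)
    return (x, math.sin(1.0 / x))

for n in (1, 10, 100):
    x, y = point_near(0.5, n)
    # distance from this X2-point to the point (0, 0.5) on the segment X1
    print(n, math.hypot(x, y - 0.5))
```

The printed distances shrink toward zero, which is exactly why the closure of X2 picks up the whole segment X1.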

575.10 example of a semilocally simply connected space which is not locally simply connected

Let HR be the Hawaiian rings, and define X to be the cone over HR. Then X is connected, locally connected, and semilocally simply connected, but not locally simply connected. To see this, let p ∈ HR be the point to which the circles converge in HR, and represent X as HR × [0, 1]/HR × {0}. Then every small enough neighborhood of q := (p, 1) ∈ X fails to be simply connected. However, since X is a cone, it is contractible, so all loops (in particular, loops in a neighborhood of q) can be contracted to a point within X. Version: 2 Owner: antonio Author(s): antonio

575.11 example of a space that is not semilocally simply connected

An example of a space that is not semilocally simply connected is the following. Let

HR = ⋃n∈N { (x, y) ∈ R2 | (x − 1/(2n))2 + y 2 = (1/(2n))2 }

endowed with the subspace topology. Then (0, 0) has no simply connected neighborhood. Indeed, every neighborhood of (0, 0) contains (ever diminishing) homotopically nontrivial loops. Furthermore, these loops are homotopically nontrivial even when considered as loops in HR. It is essential in this example that HR is endowed with the topology induced by its inclusion in the plane. In contrast, the same set endowed with the CW topology is just a bouquet of countably many circles and (as any CW complex) it is semilocally simply connected. Version: 6 Owner: Dr Absentius Author(s): Dr Absentius

Figure 575.1: The Hawaiian rings

575.12

locally connected

A topological space X is locally connected at a point x ∈ X if every neighborhood U of x contains a connected neighborhood V of x. The space X is locally connected if it is locally connected at every point x ∈ X.

A topological space X is locally path connected at a point x ∈ X if every neighborhood U of x contains a path connected neighborhood V of x. The space X is locally path connected if it is locally path connected at every point x ∈ X.

Version: 2 Owner: djao Author(s): djao

575.13

locally simply connected

Let X be a topological space and x ∈ X. X is said to be locally simply connected at x, if every neighborhood of x contains a simply connected neighborhood of x. X is said to be locally simply connected if it is locally simply connected at every point. Version: 2 Owner: Dr Absentius Author(s): Dr Absentius

575.14

path component

Two points x, y in a topological space X are said to be in the same path component if there exists a path from x to y in X. The equivalence classes of X under this equivalence relation are called the path components of X. Version: 2 Owner: djao Author(s): djao


575.15

path connected

Let I = [0, 1] ⊂ R. A topological space X is path connected if for any x, y ∈ X there is a continuous map f : I → X such that f (0) = x and f (1) = y. We call the function f a path (in X). A path connected space is always a connected space, but a connected space need not be path connected. Version: 6 Owner: RevBobo Author(s): RevBobo
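For instance, any convex subset of R^n is path connected: the straight-line path between two of its points stays inside the set. A minimal sketch (the function name is ours, not from the entry):

```python
def straight_line_path(x, y):
    """The path f : [0, 1] -> R^n with f(t) = (1 - t) x + t y.

    For any convex set containing x and y (a ball, an interval, all of
    R^n), the whole image of f stays inside the set, so convex sets
    are path connected.
    """
    def f(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("a path is defined on I = [0, 1]")
        return tuple((1.0 - t) * a + t * b for a, b in zip(x, y))
    return f

f = straight_line_path((0.0, 0.0), (2.0, 4.0))
print(f(0.0), f(0.5), f(1.0))  # the endpoints x and y, with the midpoint between
```

Here f(0) = x and f(1) = y, exactly as the definition requires.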

575.16

products of connected spaces

Theorem [2, 1] Let (Xi )i∈I be a family of topological spaces. Then the product space ∏i∈I Xi with the product topology is connected if and only if each space Xi is connected.

REFERENCES 1. S. Lang, Analysis II, Addison-Wesley Publishing Company Inc., 1969. 2. A. Mukherjea, K. Pothoven, Real and Functional analysis, Plenum press, 1978.

Version: 1 Owner: drini Author(s): matte

575.17 proof that a path connected space is connected

Let X be a path connected topological space. Suppose that X = A ∪ B, where A and B are nonempty, disjoint, open sets. Let a ∈ A, b ∈ B, and let γ : I → X denote a path from a to b. We have I = γ −1 (A) ∪ γ −1 (B), where γ −1 (A) and γ −1 (B) are nonempty, open and disjoint. Since I is connected, this is a contradiction, which concludes the proof.

Version: 3 Owner: n3o Author(s): n3o

Version: 3 Owner: n3o Author(s): n3o

575.18

quasicomponent

Let X be a topological space. Define a relation ∼ on X as follows: x ∼ y if there is no partition of X into disjoint open sets U and V such that U ∪ V = X, x ∈ U and y ∈ V . This is an equivalence relation on X. The equivalence classes are called the quasicomponents of X. Version: 1 Owner: Evandar Author(s): Evandar

575.19

semilocally simply connected

A topological space X is semilocally simply connected if, for every point x ∈ X, there exists a neighborhood U of x such that the map of fundamental groups π1 (U, x) −→ π1 (X, x) induced by the inclusion map U ,→ X is the trivial homomorphism. A topological space X is connected, locally path connected, and semilocally simply connected if and only if it has a universal cover. Version: 3 Owner: djao Author(s): djao


Chapter 576 54D10 – Lower separation axioms (T0–T3, etc.) 576.1

T0 space

A topological space (X, τ ) is said to be T0 (or said to satisfy the T0 axiom) if, given x, y ∈ X with x ≠ y, there exists an open set U ∈ τ such that (x ∈ U and y ∉ U ) or (x ∉ U and y ∈ U ). An example of a T0 space is the Sierpinski space, which is not T1 . Version: 8 Owner: drini Author(s): drini
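Both the T0 and T1 axioms can be checked mechanically on the Sierpinski space. A brute-force sketch (the helper names are ours):

```python
def is_T0(points, opens):
    # T0: for every pair of distinct points, some open set contains
    # exactly one of the two.
    return all(any((x in U) != (y in U) for U in opens)
               for x in points for y in points if x != y)

def is_T1(points, opens):
    # T1: for every ordered pair of distinct points (x, y), some open
    # set contains x but not y.
    return all(any(x in U and y not in U for U in opens)
               for x in points for y in points if x != y)

# Sierpinski space: points {0, 1}, open sets ∅, {0}, {0, 1}.
points = {0, 1}
opens = [set(), {0}, {0, 1}]
print(is_T0(points, opens), is_T1(points, opens))  # True False
```

No open set contains 1 without 0, so the space is T0 but not T1, as stated above.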

576.2

T1 space

A topological space (X, τ ) is said to be T1 (or said to satisfy the T1 axiom) if for all distinct points x, y ∈ X (x ≠ y), there exists an open set U ∈ τ such that x ∈ U and y ∉ U . A space being T1 is equivalent to the following statements:

– For every x ∈ X, the set {x} is closed.
– Every subset of X is equal to the intersection of all the open sets that contain it.

Version: 6 Owner: drini Author(s): drini

576.3

T2 space

A topological space (X, τ ) is said to be T2 (or said to satisfy the T2 axiom) if given x, y ∈ X with x ≠ y, there exist disjoint open sets U, V ∈ τ (that is, U ∩ V = ∅) such that x ∈ U and y ∈ V .

A T2 space is also known as a Hausdorff space. A space being Hausdorff is equivalent to the diagonal ∆ = {(x, y) ∈ X × X : x = y} being closed in the product topology on X × X.

A very important class of Hausdorff spaces are the metric spaces. A Hausdorff topology for a set X is a topology τ such that (X, τ ) is a Hausdorff space. Version: 12 Owner: drini Author(s): drini

576.4

T3 space

A topological space is said to be T3 if it is regular and also T0 . Every T3 space is also a T2 or Hausdorff space. Version: 2 Owner: drini Author(s): drini

576.5

a compact set in a Hausdorff space is closed

Theorem. A compact set in a Hausdorff space is closed.

Proof. Let A be a compact set in a Hausdorff space X. The case when A is empty is trivial, so let us assume that A is non-empty. Using this theorem, it follows that each point y in Ac has a neighborhood Uy which is disjoint from A. (Here, we denote the complement of A by Ac .) We can therefore write

Ac = ⋃y∈Ac Uy .

Since an arbitrary union of open sets is open, it follows that A is closed. □

Note. The above theorem can, for instance, be found in [1] (page 141), or [2] (section 2.1, Theorem 2).

REFERENCES 1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955. 2. I.M. Singer, J.A.Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, 1967.

Version: 3 Owner: mathcam Author(s): matte


576.6 proof of A compact set in a Hausdorff space is closed

Let X be a Hausdorff space, and C ⊆ X a compact subset. We are to show that C is closed. We will do so by showing that the complement U = X − C is open. To prove that U is open, it suffices to demonstrate that, for each x ∈ U , there exists an open set V with x ∈ V and V ⊆ U .

Fix x ∈ U . For each y ∈ C, using the Hausdorff assumption, choose disjoint open sets Ay and By with x ∈ Ay and y ∈ By .

Since every y ∈ C is an element of By , the collection {By | y ∈ C} is an open covering of C. Since C is compact, this open cover admits a finite subcover. So choose y1 , . . . , yn ∈ C such that C ⊆ By1 ∪ · · · ∪ Byn .

Notice that Ay1 ∩ · · · ∩ Ayn , being a finite intersection of open sets, is open, and contains x. Call this neighborhood of x by the name V . All we need to do is show that V ⊆ U .

For any point z ∈ C, we have z ∈ By1 ∪ · · · ∪ Byn , and therefore z ∈ Byk for some k. Since Ayk and Byk are disjoint, z ∉ Ayk , and therefore z ∉ Ay1 ∩ · · · ∩ Ayn = V . Thus C is disjoint from V , and V is contained in U . Version: 1 Owner: jay Author(s): jay

576.7

regular

A topological space X is said to be regular if whenever C ⊆ X is closed and x ∈ X \ C there are disjoint open sets U and V with x ∈ U and C ⊆ V .

Some authors also require point sets to be closed for a space to be called either regular or T3 . Version: 3 Owner: Evandar Author(s): Evandar

576.8

regular space

A topological space (X, τ ) is said to be regular if given a closed set C ⊂ X and a point x ∈ X − C, there exist disjoint open sets U, V ∈ τ such that x ∈ U and C ⊂ V .

Example.

Consider the set R with the topology σ generated by the basis β = {U = V − C : V is open in the standard topology and C is countable}. Since Q is countable and R is open in the standard topology, the set of irrational numbers R − Q belongs to β, so it is open in σ, and therefore Q is closed.

Take any irrational number x. Any open set containing x and any open set containing all of Q must intersect: each contains a basis element of the form V − C with V open in the standard topology and C countable, and any two such sets meet, since the intersection of two nonempty standard open sets is uncountable. So the regular space property cannot be satisfied. Therefore, (R, σ) is not a regular space. Version: 6 Owner: drini Author(s): drini

576.9

separation axioms

The separation axioms are additional conditions which may be required of a topological space in order to ensure that some particular types of sets can be separated by open sets, thus avoiding certain pathological cases.

axiom — definition
T0 : given two distinct points, there is an open set containing exactly one of them;
T1 : given two distinct points, there is a neighborhood of each of them which does not contain the other point;
T2 : given two distinct points, there are two disjoint open sets each of which contains one of the points;
T2½ : given two distinct points, there are two open sets, each of which contains one of the points, whose closures are disjoint;
T3 : given a closed set A and a point x ∉ A, there are two disjoint open sets U and V such that x ∈ U and A ⊆ V ;
T3½ : given a closed set A and a point x ∉ A, there is an Urysohn function for A and {x};
T4 : given two disjoint closed sets A and B, there are two disjoint open sets U and V such that A ⊆ U and B ⊆ V ;
T5 : given two separated sets A and B, there are two disjoint open sets U and V such that A ⊆ U and B ⊆ V .

If a topological space satisfies a Ti axiom, it is called a Ti -space. The following table shows other common names for topological spaces with these or other additional separation properties.

Name — separation properties
Kolmogorov space : T0
Fréchet space : T1
Hausdorff space : T2
Completely Hausdorff space : T2½
Regular space : T3 and T0
Tychonoff or completely regular space : T3½ and T0
Normal space : T4 and T1
Perfectly T4 space : T4 and every closed set is a Gδ
Perfectly normal space : T1 and perfectly T4
Completely normal space : T5 and T1

The following implications hold strictly:

(T2 and T3 ) ⇒ T2½
(T3 and T4 ) ⇒ T3½
T3½ ⇒ T3
T5 ⇒ T4
Completely normal ⇒ normal ⇒ completely regular ⇒ regular ⇒ T2½ ⇒ T2 ⇒ T1 ⇒ T0

Remark. Some authors define T3 spaces in the way we defined regular spaces, and T4 spaces in the way we defined normal spaces (and vice-versa); there is no consensus on this issue. Bibliography: Counterexamples in Topology, L. A. Steen, J. A. Seebach Jr., Dover Publications Inc. (New York) Version: 11 Owner: Koro Author(s): matte, Koro

576.10 topological space is T1 if and only if every singleton is closed

Theorem. Let X be a topological space. Then X is a T1 space if and only if every singleton in X is a closed set.

Proof. Suppose first that X is a T1 space. Let x be a point in X. We claim that {x} is closed, or, equivalently, that the complement {x}c is open. Since X is T1 , it follows that for any y ∈ {x}c there exists a neighborhood Uy of y such that x ∉ Uy . We can then write

{x}c = ⋃y∈{x}c Uy .

Since arbitrary unions of open sets are open, the first claim follows. Next, suppose every singleton in X is closed. If a and b are distinct points in X, then {a}c is a neighborhood of b such that a ∉ {a}c . Similarly, {b}c is a neighborhood of a such that b ∉ {b}c . □

The above result with proof can be found as Theorem 1 (Section 2.1) in [1].

REFERENCES 1. I.M. Singer, J.A.Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, 1967.

Version: 6 Owner: matte Author(s): matte
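On a finite space both conditions of the theorem can be tested directly, and they agree, as a quick brute-force check illustrates (a sketch with made-up helper names):

```python
def is_T1(points, opens):
    # The T1 axiom, checked over all ordered pairs of distinct points.
    return all(any(x in U and y not in U for U in opens)
               for x in points for y in points if x != y)

def singletons_closed(points, opens):
    # {x} is closed exactly when its complement is open.
    op = {frozenset(U) for U in opens}
    return all(frozenset(points - {x}) in op for x in points)

X = {0, 1, 2}
discrete = [set(), {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
coarse = [set(), {0}, {0, 1}, {0, 1, 2}]   # a non-T1 topology on X

for tau in (discrete, coarse):
    print(is_T1(X, tau), singletons_closed(X, tau))  # the two always agree
```

The discrete topology passes both tests; the coarse one fails both, in line with the theorem.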


Chapter 577 54D15 – Higher separation axioms (completely regular, normal, perfectly or collectionwise normal, etc.) 577.1

Tietze extension theorem

Let X be a topological space. Then the following are equivalent:

1. X is normal.
2. If A is a closed subset of X, and f : A → [−1, 1] is a continuous function, then f has a continuous extension to all of X. (In other words, there is a continuous function f ∗ : X → [−1, 1] such that f and f ∗ coincide on A.)

The present result can be found in [1].

REFERENCES 1. A. Mukherjea, K. Pothoven, Real and Functional analysis, Plenum press, 1978.

Version: 1 Owner: matte Author(s): matte

577.2

Tychonoff

A topological space X is said to be Tychonoff (or T3½ or completely regular) if whenever C ⊆ X is closed and x ∈ X \ C, there is a continuous function f : X → [0, 1] with f (x) = 0 and f (C) ⊆ {1}.


Some authors require point sets to be closed in X for X to be called either Tychonoff or T3½ or completely regular.

Version: 2 Owner: Evandar Author(s): Evandar

577.3

Urysohn’s lemma

Let X be a normal topological space, and let C, D ⊆ X be disjoint closed subsets. Then there is a continuous function f : X → [0, 1] such that f (C) ⊆ {0} and f (D) ⊆ {1}. Version: 3 Owner: Evandar Author(s): Evandar

577.4

normal

Let X be a topological space. X is said to be normal (or T4 ) if whenever C, D ⊆ X are disjoint closed subsets then there are disjoint open sets U and V such that C ⊆ U and D ⊆ V .

Some authors also require point sets to be closed for a space to be called either normal or T4 . Version: 5 Owner: Evandar Author(s): Evandar

577.5

proof of Urysohn’s lemma

First we construct a family Up of open sets of X, indexed by the rationals, such that if p < q, then Ūp ⊆ Uq . These are the sets we will use to define our continuous function. Let P = Q ∩ [0, 1]. Since P is countable, we can use induction (or recursive definition if you prefer) to define the sets Up . List the elements of P in an infinite sequence in some way; let us assume that 1 and 0 are the first two elements of this sequence. Now, define U1 = X\D (the complement of D in X). Since C is a closed set of X contained in U1 , by normality of X we can choose an open set U0 such that C ⊆ U0 and Ū0 ⊆ U1 . In general, let Pn denote the set consisting of the first n rationals in our sequence. Suppose that Up is defined for all p ∈ Pn and if p < q, then Ūp ⊆ Uq .

(577.5.1)

Let r be the next rational number in the sequence. Consider Pn+1 = Pn ∪ {r}. It is a finite subset of [0, 1], so it inherits the usual ordering < of R. In such a set, every element (other than the smallest or largest) has an immediate predecessor and successor. We know that 0 is the smallest element and 1 the largest of Pn+1 , so r cannot be either of these. Thus r has an immediate predecessor p and an immediate successor q in


Pn+1 . The sets Up and Uq are already defined by the inductive hypothesis so using the normality of X, there exists an open set Ur of X such that U¯p ⊆ Ur and U¯r ⊆ Uq . We now show that (1) holds for every pair of elements in Pn+1 . If both elements are in Pn , then (1) is true by the inductive hypothesis. If one is r and the other s ∈ Pn , then if s ≤ p we have U¯s ⊆ U¯p ⊆ Ur and if s ≥ q we have

U¯r ⊆ Uq ⊆ Us .

Thus (1) holds for every pair of elements in Pn+1 , and therefore, by induction, Up is defined for all p ∈ P .

We have defined Up for all rationals in [0, 1]. Extend this definition to every rational p ∈ R by defining Up = ∅ if p < 0 and Up = X if p > 1. Then it is easy to check that (1) still holds. Now, given x ∈ X, define Q(x) = {p : x ∈ Up }. This set contains no number less than 0 and contains every number greater than 1, by the definition of Up for p < 0 and p > 1. Thus Q(x) is bounded below and its infimum is an element of [0, 1]. Define f (x) = inf Q(x). Finally, we show that this function f satisfies the conditions of the lemma. If x ∈ C, then x ∈ Up for all p ≥ 0, so Q(x) equals the set of all nonnegative rationals and f (x) = 0. If x ∈ D, then x ∉ Up for p ≤ 1, so Q(x) equals the set of all rationals greater than 1 and f (x) = 1.

To show that f is continuous, we first prove two smaller results: (a) x ∈ U¯r ⇒ f (x) ≤ r

Proof. If x ∈ U¯r , then x ∈ Us for all s > r so Q(x) contains all rationals greater than r. Thus f (x) ≤ r by definition of f .

(b) x ∈ / Ur ⇒ f (x) ≥ r.

Proof. If x ∉ Ur , then x ∉ Us for all s < r, so Q(x) contains no rational less than r. Thus f (x) ≥ r. Let x0 ∈ X and let (c, d) be an open interval of R containing f (x0 ). We will find a neighborhood U of x0 such that f (U ) ⊆ (c, d). Choose p, q ∈ Q such that c < p < f (x0 ) < q < d. Let U = Uq \Ūp . Then since f (x0 ) < q, (b) implies that x0 ∈ Uq , and since f (x0 ) > p, (a) implies that x0 ∉ Ūp . Hence x0 ∈ U .

Finally, let x ∈ U . Then x ∈ Uq ⊆ Ūq , so f (x) ≤ q by (a). Also, x ∉ Ūp , so x ∉ Up and f (x) ≥ p by (b). Thus f (x) ∈ [p, q] ⊆ (c, d), as desired. Therefore f is continuous and we are done. Version: 1 Owner: scanez Author(s): scanez
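The final step f (x) = inf Q(x) can be mimicked in a toy model on the real line, where the nested sets Up = (−∞, p) already satisfy the key property that p < q implies Ūp ⊆ Uq. This sketch (the names are ours) uses a finite grid of rationals in place of all of Q:

```python
from fractions import Fraction

def in_U(p, x):
    # Toy model: U_p = (-inf, p); its closure is (-inf, p], so p < q
    # indeed gives closure(U_p) ⊆ U_q, the nesting used in the proof.
    return x < p

def f(x, grid):
    # f(x) = inf Q(x) with Q(x) = {p : x ∈ U_p}, the infimum taken over
    # a finite grid of rationals instead of all of Q.
    Q = [p for p in grid if in_U(p, x)]
    return min(Q) if Q else max(grid)

grid = [Fraction(k, 16) for k in range(-16, 33)]   # rationals in [-1, 2]
# Up to the grid spacing 1/16, f recovers x itself, mirroring how the
# lemma's f interpolates continuously between the values 0 on C and 1 on D.
print(f(Fraction(1, 2), grid), f(Fraction(5, 4), grid))
```

Refining the grid makes the approximation of inf Q(x) as tight as desired, just as the countable family Up does in the lemma.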


Chapter 578 54D20 – Noncompact covering properties (paracompact, Lindelöf, etc.) 578.1

Lindelöf

A topological space X is said to be Lindelöf if every open cover has a countable subcover. Version: 3 Owner: Evandar Author(s): Evandar

578.2

countably compact

A topological space X is said to be countably compact if every countable open cover has a finite subcover. Countable compactness is equivalent to limit point compactness if X is a T1 space, and is equivalent to compactness if X is a metric space. Version: 4 Owner: Evandar Author(s): Evandar

578.3

locally finite

A collection U of subsets of a topological space X is said to be locally finite if whenever x ∈ X there is an open set V with x ∈ V such that V ∩ U = ∅ for all but finitely many U ∈ U. Version: 2 Owner: Evandar Author(s): Evandar


Chapter 579 54D30 – Compactness 579.1

Y is compact if and only if every open cover of Y has a finite subcover

Theorem. Let X be a topological space and Y a subset of X. Then the following statements are equivalent.
1. Y is compact as a subset of X.
2. Every open cover of Y (with open sets in X) has a finite subcover.

Proof. Suppose Y is compact, and {Ui }i∈I is an arbitrary open cover of Y , where the Ui are open sets in X. Then {Ui ∩ Y }i∈I is a collection of open sets in Y with union Y . Since Y is compact, there is a finite subset J ⊂ I such that Y = ⋃i∈J (Ui ∩ Y ). Now Y = (⋃i∈J Ui ) ∩ Y ⊂ ⋃i∈J Ui , so {Ui }i∈J is a finite open cover of Y .

Conversely, suppose every open cover of Y has a finite subcover, and {Ui }i∈I is an arbitrary collection of open sets (in Y ) with union Y . By the definition of the subspace topology, each Ui is of the form Ui = Vi ∩ Y for some open set Vi in X. Now Ui ⊂ Vi , so {Vi }i∈I is a cover of Y by open sets in X. By assumption, it has a finite subcover {Vi }i∈J . It follows that {Ui }i∈J covers Y , and Y is compact. □

The above proof follows the proof given in [1].

REFERENCES 1. B. Ikenaga, Notes on Topology, August 16, 2000, available online: http://www.millersv.edu/ bikenaga/topology/topnote.html.

Version: 3 Owner: mathcam Author(s): matte


579.2

Heine-Borel theorem

A subset A of Rn is compact if and only if A is closed and bounded. Version: 4 Owner: Evandar Author(s): Evandar
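Both hypotheses matter. The half-open interval (0, 1] is bounded but not closed, and the cover {(1/n, 1] : n = 1, 2, . . .} witnesses non-compactness: any finite subfamily only reaches down to 1/N for the largest N used. A small sketch of that computation (the function name is ours):

```python
from fractions import Fraction

def union_left_endpoint(ns):
    """Left endpoint of the union of the intervals (1/n, 1] for n in ns.

    The intervals are nested, so the union of a finite subfamily is
    (1/N, 1] where N = max(ns); it never exhausts (0, 1].
    """
    return Fraction(1, max(ns))

left = union_left_endpoint([2, 5, 17])
print(left)
# The point 1/(2*17) lies in (0, 1] but is missed by this finite subfamily.
assert Fraction(1, 34) <= left
```

Since every finite subfamily misses some point of (0, 1], no finite subcover exists, so (0, 1] is not compact.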

579.3

Tychonoff ’s theorem

Let (Xi )i∈I be a family of nonempty topological spaces. The product space (see product topology)

∏i∈I Xi

is compact if and only if each of the spaces Xi is compact.

Not surprisingly, if I is infinite, the proof requires the axiom of choice. Conversely, one can show that Tychonoff’s theorem implies that any product of nonempty sets is nonempty, which is one form of the Axiom of Choice. Version: 8 Owner: matte Author(s): matte, Larry Hammick, Evandar

579.4 a space is compact if and only if the space has the finite intersection property

Theorem. A topological space X is compact if and only if X has the finite intersection property.

The above theorem is essentially the definition of a compact space rewritten using de Morgan’s laws. The usual definition of a compact space is based on open sets and unions. The above characterization, on the other hand, is written using closed sets and intersections.

Proof. Suppose X is compact, i.e., any collection of open subsets that cover X has a finite subcollection that also covers X. Further, suppose {Fi }i∈I is an arbitrary collection of closed subsets with the finite intersection property. We claim that ⋂i∈I Fi is non-empty. Suppose otherwise, i.e., suppose ⋂i∈I Fi = ∅. Then

X = (⋂i∈I Fi )c = ⋃i∈I Fic .

(Here, the complement of a set A in X is written as Ac .) Since each Fi is closed, the collection {Fic }i∈I is an open cover for X. By compactness, there is a finite subset J ⊂ I such that X = ⋃i∈J Fic . But then X = (⋂i∈J Fi )c , so ⋂i∈J Fi = ∅, which contradicts the finite intersection property of {Fi }i∈I .

The proof in the other direction is analogous. Suppose X has the finite intersection property. To prove that X is compact, let {Ui }i∈I be a collection of open sets in X that cover X. We claim that this collection contains a finite subcollection of sets that also cover X. The proof is by contradiction. Suppose that X ≠ ⋃i∈J Ui holds for all finite J ⊂ I. Let us first show that the collection of closed subsets {Uic }i∈I has the finite intersection property. If J is a finite subset of I, then

⋂i∈J Uic = (⋃i∈J Ui )c ≠ ∅,

where the last assertion follows from the assumption that X ≠ ⋃i∈J Ui for finite J. Then, since X has the finite intersection property,

∅ ≠ ⋂i∈I Uic = (⋃i∈I Ui )c .

This contradicts the assumption that {Ui }i∈I is a cover for X. □

REFERENCES 1. B. Ikenaga, Notes on Topology, August 16, 2000, available online: http://www.millersv.edu/ bikenaga/topology/topnote.html.


Version: 8 Owner: mathcam Author(s): matte
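The de Morgan duality driving the proof can be seen concretely on a finite example (a sketch; the variable names are ours): a family of open sets covers X exactly when the complementary closed sets have empty intersection.

```python
X = frozenset(range(6))
opens = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]
closed = [X - U for U in opens]          # complements F_i = U_i^c

# De Morgan: (⋂ F_i)^c = ⋃ U_i, so "the U_i cover X" and
# "the F_i have empty intersection" are the same condition.
covers = frozenset().union(*opens) == X
empty_meet = X.intersection(*closed) == frozenset()
print(covers, empty_meet)   # the two booleans always agree
```

Dropping one of the open sets breaks the cover and simultaneously makes the remaining closed sets intersect, again in agreement with the duality.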

579.5

closed set in a compact space is compact

Proof. Let A be a closed set in a compact space X. To show that A is compact, we show that an arbitrary open cover has a finite subcover. For this purpose, let {Ui }i∈I be an arbitrary open cover for A. Since A is closed, the complement of A, which we denote by Ac , is open. Hence Ac and {Ui }i∈I together form an open cover for X. Since X is compact, this cover has a finite subcover that covers X. Let D be this subcover. Either Ac is part of D or it is not. In either case, D\{Ac } is a finite open cover for A, and D\{Ac } is a subcover of {Ui }i∈I . The claim follows. □ Version: 6 Owner: mathcam Author(s): matte

579.6

closed subsets of a compact set are compact

Theorem [1, 2, 3, 2] Suppose X is a topological space. If K is a compact subset of X, and C is a closed set in X, and C ⊂ K, then C is a compact set in X.

In general, the converse of the above theorem does not hold. Indeed, suppose X is a set with the indiscrete topology, that is, only X and the empty set are open sets. Then any non-empty set A with A ≠ X is compact, but not closed [1]. However, if

we assume that X is a Hausdorff space, then any compact set is also closed. For the details, see this entry. The proof below follows e.g. [3]. An alternative proof, based on the finite intersection property, is given in [2]. Proof. Suppose F = {Vα | α ∈ I} is an arbitrary open cover for C. Since X \ C is open, it follows that F together with X \ C is an open cover for K. Thus K can be covered by a finite number of sets, say V1 , . . . , VN , from F together with possibly X \ C. Since C ⊂ K, it follows that V1 , . . . , VN cover C, whence C is compact. □

REFERENCES 1. 2. 3. 4.

J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955. S. Lang, Analysis II, Addison-Wesley Publishing Company Inc., 1969. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974. I.M. Singer, J.A. Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, 1967.

Version: 4 Owner: bbukh Author(s): bbukh, matte

579.7

compact

A topological space X is compact if, for every collection {Ui }i∈I of open sets in X whose union is X, there exists a finite subcollection {Uij }nj=1 whose union is also X.

A subset Y of a topological space X is said to be compact if Y with its subspace topology is a compact topological space. Note: Some authors require that a compact topological space be Hausdorff as well, and use the term quasi-compact to refer to a non-Hausdorff compact space. Version: 5 Owner: djao Author(s): djao

579.8 compactness is preserved under a continuous map

Theorem [2, 1] Suppose f : X → Y is a continuous map between topological spaces X and Y . If X is a compact space, and f is surjective, then Y is a compact space.

Corollary [2, 3] Suppose X, Y are topological spaces, and f : X → Y is a continuous map. If X is a compact space, then f (X) is compact in Y .

Proof of theorem. (Following [2].) Suppose {Vα | α ∈ I} is an arbitrary open cover for f (X). Since f is continuous, it follows that {f −1 (Vα ) | α ∈ I} is a collection of open sets in X. Since A ⊂ f −1 f (A) for any A ⊂ X, and since the inverse image commutes with unions (see this page), we have

X ⊂ f −1 f (X) = f −1 (⋃α∈I Vα ) = ⋃α∈I f −1 (Vα ).

Thus {f −1 (Vα ) | α ∈ I} is an open cover for X. Since X is compact, there exists a finite subset J ⊂ I such that {f −1 (Vα ) | α ∈ J} is a finite open cover for X. Since f is a surjection, we have f f −1 (A) = A for any A ⊂ Y (see this page). Thus

f (X) = f (⋃α∈J f −1 (Vα )) = ⋃α∈J f f −1 (Vα ) = ⋃α∈J Vα .

Thus {Vα | α ∈ J} is an open cover for f (X), and f (X) is compact. □

REFERENCES 1. I.M. Singer, J.A.Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, 1967. 2. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955. 3. G.J. Jameson, Topology and Normed Spaces, Chapman and Hall, 1974.

Version: 3 Owner: matte Author(s): matte

579.9

examples of compact spaces

Here are some examples of compact spaces: – The unit interval [0,1] is compact. This follows from the Heine-Borel theorem. Proving that theorem is about as hard as proving directly that [0,1] is compact. The half-open interval (0,1] is not compact: the open cover (1/n, 1] for n = 1, 2, ... does not have a finite subcover. – Again from the Heine-Borel Theorem, we see that the closed unit ball of any finite-dimensional normed vector space is compact. This is not true for infinite dimensions; in fact, a normed vector space is finite-dimensional if and only if its closed unit ball is compact. – Any finite topological space is compact. 2086

– Consider the set 2N of all infinite sequences with entries in {0, 1}. We can turn it into a metric space by defining d((xn), (yn)) = 1/k, where k is the smallest index such that xk ≠ yk (if there is no such index, then the two sequences are the same, and we define their distance to be zero). Then 2N is a compact space, a consequence of Tychonoff’s theorem. In fact, 2N is homeomorphic to the Cantor set (which is compact by Heine-Borel). This construction can be performed for any finite set, not just {0, 1}.

– Consider the set K of all functions f : R → [0, 1] and define a topology on K so that a sequence (fn) in K converges towards f ∈ K if and only if (fn(x)) converges towards f(x) for all x ∈ R. (There is only one such topology; it is called the topology of pointwise convergence.) Then K is a compact topological space, again a consequence of Tychonoff’s theorem.

– Take any set X, and define the cofinite topology on X by declaring a subset of X to be open if and only if it is empty or its complement is finite. Then X is a compact topological space.

– The prime spectrum of any commutative ring with the Zariski topology is a compact space important in algebraic geometry. These prime spectra are almost never Hausdorff spaces.

– If H is a Hilbert space and A : H → H is a continuous linear operator, then the spectrum of A is a compact subset of C. If H is infinite-dimensional, then any nonempty compact subset of C arises in this manner from some continuous linear operator A on H.

– If A is a complex C*-algebra which is commutative and contains a one, then the set X of all non-zero algebra homomorphisms φ : A → C carries a natural topology (the weak-* topology) which turns it into a compact Hausdorff space. A is isomorphic to the C*-algebra of continuous complex-valued functions on X with the supremum norm.

– Any profinite group is compact Hausdorff: finite discrete spaces are compact Hausdorff, therefore their product is compact Hausdorff, and a profinite group is a closed subset of such a product.

– Any locally compact Hausdorff space can be turned into a compact space by adding a single point to it (Alexandroff one-point compactification). The one-point compactification of R is homeomorphic to the circle S¹; the one-point compactification of R² is homeomorphic to the sphere S². Using the one-point compactification, one can also easily construct compact spaces which are not Hausdorff, by starting with a non-Hausdorff space.
– Other non-Hausdorff compact spaces are given by the left order topology (or right order topology) on bounded totally ordered sets. Version: 13 Owner: alinabi Author(s): AxelBoldt, Larry Hammick, yark, iwnbap
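The metric on 2N in the example above can be computed on finite prefixes. The following Python sketch (the helper name seq_dist is ours, and the sequences are truncated to finite prefixes for the demonstration) illustrates it:

```python
def seq_dist(x, y):
    """Metric on {0,1}^N described above: d(x, y) = 1/k, where k is the
    first (1-based) index at which the sequences differ; 0 if equal.
    Here x and y are finite prefixes assumed long enough to reveal a
    difference if there is one."""
    for i, (a, b) in enumerate(zip(x, y), start=1):
        if a != b:
            return 1.0 / i
    return 0.0

x = [0, 1, 1, 0]
y = [0, 1, 0, 0]
print(seq_dist(x, y))  # first difference at index 3 -> 1/3
print(seq_dist(x, x))  # identical prefixes -> 0.0
```

Note that distances here take only the values 1/k, so two sequences are close exactly when they agree on a long initial segment.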


579.10

finite intersection property

Definition Let X be a set, and let A = {Ai }i∈I be a collection of subsets of X. Then A has the finite intersection property if, for any finite J ⊂ I, the intersection ⋂_{i∈J} Ai is non-empty.

A topological space X has the finite intersection property if the following implication holds: if {Ai }i∈I is a collection of closed subsets of X with the finite intersection property, then the intersection ⋂_{i∈I} Ai is non-empty. The finite intersection property is usually abbreviated f.i.p.

Examples.
1. In N = {1, 2, . . .}, the subsets Ai = {i, i + 1, . . .} with i ∈ N form a collection with the finite intersection property. However, ⋂_{i∈N} Ai = ∅.

2. For a topological space X, the finite intersection property is equivalent to X being compact.

Version: 1 Owner: matte Author(s): matte
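Example 1 above can be illustrated numerically. The sketch below truncates the infinite sets Ai at a bound M (an assumption made only for the demonstration; the actual sets are infinite):

```python
# Illustration of example 1: A_i = {i, i+1, ...}, truncated at M.
M = 100

def A(i):
    return set(range(i, M + 1))

# Any finite subfamily has non-empty intersection: the intersection of
# A(i) over i in J equals A(max(J)), which contains max(J).
J = {3, 7, 19}
inter = set.intersection(*(A(i) for i in J))
print(min(inter))  # -> 19, i.e. max(J)
```

Only the full (infinite) intersection is empty: for every n there is some Ai not containing n, namely A(n+1).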

579.11

limit point compact

A topological space X is said to be limit point compact if every infinite subset of X has a limit point. Limit point compactness is equivalent to countable compactness if X is T1 and is equivalent to compactness if X is a metric space. Version: 3 Owner: Evandar Author(s): Evandar

579.12 point and a compact set in a Hausdorff space have disjoint open neighborhoods

Theorem. Let X be a Hausdorff space, let A be a compact non-empty set in X, and let y be a point in the complement of A. Then there exist disjoint open sets U and V in X such that A ⊂ U and y ∈ V .

Proof. First we use the fact that X is a Hausdorff space. Thus, for all x ∈ A there exist disjoint open sets Ux and Vx such that x ∈ Ux and y ∈ Vx . Then {Ux }x∈A is an open cover for A. Using this characterization of compactness, it follows that there exists a finite set A0 ⊂ A such that {Ux }x∈A0 is a finite open cover for A. Let us define

    U = ⋃_{x∈A0} Ux ,    V = ⋂_{x∈A0} Vx .

Next we show that these sets satisfy the given conditions for U and V . First, it is clear that U and V are open. We also have that A ⊂ U and y ∈ V . To see that U and V are disjoint, suppose z ∈ U. Then z ∈ Ux for some x ∈ A0 . Since Ux and Vx are disjoint, z cannot be in Vx , and consequently z cannot be in V . □

The above result and proof follow [1] (Chapter 5, Theorem 7) or [2] (page 27).

REFERENCES
1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.
2. I.M. Singer, J.A. Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, 1967.

Version: 8 Owner: drini Author(s): matte

579.13

proof of Heine-Borel theorem

We shall prove the result for R, and then generalize to Rn .

Proof for R: Note that the result is trivial if A = ∅, so we may assume A is nonempty. First we’ll assume that A is compact, and then show it is closed and bounded.

We first show that A must be bounded. Let C = {B(α, 1) : α ∈ A}, where B(α, 1) = (α − 1, α + 1). Since A is compact there exists a finite subcover C′. Since C′ is finite, the set S = {α ∈ A : B(α, 1) ∈ C′} is finite. Let α1 = min(S) and α2 = max(S). Then A ⊂ (α1 − 1, α2 + 1), and so A is bounded.

Next, we show A must be closed. Suppose it is not; i.e. suppose there exists an accumulation point b with b ∉ A. Since b is an accumulation point of A, for any n ∈ N there exists a point an ∈ A such that |an − b| < 1/n. Define Un = (−∞, b − 1/n) ∪ (b + 1/n, ∞), and let C = {Un | n ∈ N}. Note that each set Un is the union of two open intervals and is therefore open. Note also that C covers A, since for any a ∈ A one may simply choose n > 1/|a − b| and see that a ∈ Un . Thus C is an open cover of A. Since A is compact, C has a finite subcover C′. Let N = max{n ∈ N : Un ∈ C′}. Note that since the point a_{N+1} satisfies |a_{N+1} − b| < 1/(N + 1), then a_{N+1} ∈ (b − 1/(N+1), b + 1/(N+1)), and so a_{N+1} is covered by C but not by C′. But this is a contradiction, since a_{N+1} ∈ A. Thus A is closed.


Now we’ll prove the other direction and show that a closed and bounded set must be compact. Let A be closed and bounded, and let C be an open cover for A. Let a = inf(A) and b = sup(A); note these are well-defined because A is nonempty and bounded. Also note that a, b ∈ A since a and b are accumulation points of A. Define B as

    B = {x ∈ [a, b] : a finite subcollection of C covers [a, x] ∩ A}.

Clearly B is nonempty since a ∈ B. Define c = sup(B). Since B ⊂ [a, b], we have c = sup(B) ≤ sup([a, b]) = b.

Suppose that c < b. First assume c ∉ A. Since A is closed, R \ A is open, so there is a neighbourhood N ⊂ R \ A of c contained in [a, b]. But this contradicts the fact that c = sup(B), so we must have c ∈ A. Hence c ∈ U for some open set U ∈ C. Pick w, z ∈ U such that w < c < z. Then, by definition of B and c, there is a finite subcollection C′ of C which covers [a, w] ∩ A, but no finite subcollection which covers [a, z] ∩ A. However, C′ ∪ {U} covers [a, z] ∩ A, so z ∈ B, which is a contradiction. So we must have c = b.

Note this doesn’t immediately give us the desired result, for we don’t know that b ∈ B. Let U0 be the member of the open cover C which contains b. Since U0 is open, we may find a neighbourhood (b − ε, b + ε) ⊂ U0 . Since b − ε < sup(B) = c, there exists d ∈ B such that b − ε < d ≤ sup(B). Then there is a finite subcollection C′′ of C which covers [a, d] ∩ A, and then C′′ ∪ {U0 } forms a finite subcollection of C which covers A.

Generalization to Rn :

Finally, we generalize the proof to Rn:

Lemma: Let A ⊂ Rn. Define the projection map πi : Rn → R by πi(a1 , . . . , an) = ai . The following are true:

1. πi is continuous for i = 1, . . . , n.

2. A ⊂ ⋂_{i=1}^n πi⁻¹(πi(A)).

3. A is closed and bounded if and only if πi(A) is closed and bounded for each i = 1, . . . , n.

4. A is compact if and only if πi(A) is compact for each i = 1, . . . , n.

The proof has several steps.


1. To see that πi is continuous, take any x, y ∈ A and pick ε > 0. Write x = (x1 , . . . , xn) and y = (y1 , . . . , yn). Then

    ‖x − y‖² = Σ_{j=1}^n |xj − yj|² ≥ |xi − yi|² = |πi(x) − πi(y)|²,

so |πi(x) − πi(y)| ≤ ‖x − y‖, and hence ‖x − y‖ < ε implies |πi(x) − πi(y)| < ε.

2. Let a = (a1 , . . . , an) ∈ A. Then πi(a) = ai ∈ πi(A). Hence a ∈ πi⁻¹(πi(A)) for each i = 1, . . . , n, and so a ∈ ⋂_{i=1}^n πi⁻¹(πi(A)).

3. Suppose A is closed and bounded. Then there exist (m1 , . . . , mn) ∈ Rn and (M1 , . . . , Mn) ∈ Rn such that A ⊂ [m1 , M1] × · · · × [mn , Mn]. Hence, if (a1 , . . . , an) ∈ A, then πi(a1 , . . . , an) = ai ∈ [mi , Mi], and so πi(A) is bounded for i = 1, . . . , n. Conversely, assume πi(A) is closed and bounded for each i. Then for each i = 1, . . . , n there exist mi , Mi ∈ R such that πi(A) ⊂ [mi , Mi]. Then if (a1 , . . . , an) ∈ A we have πi(a1 , . . . , an) = ai ∈ [mi , Mi], so (a1 , . . . , an) ∈ [m1 , M1] × · · · × [mn , Mn]. Hence A is bounded. To see that A is closed, take a convergent sequence {xj}_{j=1}^∞ in A. Then {πi(xj)}_{j=1}^∞ is a convergent sequence in πi(A), and since πi(A) is closed, lim_{j→∞} πi(xj) ∈ πi(A). Thus there exists ℓ = (ℓ1 , . . . , ℓn) ∈ A such that lim_{j→∞} πi(xj) = ℓi for each i, and since each πi is continuous, lim xj = ℓ. So A is closed.

4. If A is compact, then since the continuous image of a compact set is compact, we see that πi(A) is compact for each i. Thus A is compact if and only if πi(A) is compact for each i, which by our previous result is true if and only if each πi(A) is closed and bounded. This, in turn, is true if and only if A is closed and bounded.

Version: 17 Owner: saforres Author(s): saforres

579.14

properties of compact spaces

We list the following commonly quoted properties of compact topological spaces.

– A closed subset of a compact space is compact.
– A compact subspace of a Hausdorff space is closed.
– The continuous image of a compact space is compact.
– Compactness is equivalent to sequential compactness in the metric topology.

Version: 1 Owner: vitriol Author(s): vitriol

579.15

relatively compact

Definition [1, 2] A set K in a topological space X is relatively compact if the closure of K is a compact set in X. For metric spaces, we have the following theorem due to Hausdorff [1].

Theorem Suppose K is a set in a complete metric space X. Then K is relatively compact if and only if for any ε > 0 there is a finite ε-net for K.

REFERENCES 1. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977. 2. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons, 1978.

Version: 2 Owner: matte Author(s): matte

579.16

sequentially compact

A topological space X is sequentially compact if every sequence in X has a convergent subsequence. When X is a metric space, the following are equivalent:

– X is sequentially compact.
– X is limit point compact.
– X is compact.
– X is totally bounded and complete.

Version: 2 Owner: mps Author(s): mps

579.17 two disjoint compact sets in a Hausdorff space have disjoint open neighborhoods. Theorem. Let A and B be two disjoint compact sets in a Hausdorff space X. Then there exist open sets U and V such that A ⊂ U, B ⊂ V , and U and V are disjoint.

Proof. Let us start by covering the trivial cases. First, if A = B = ∅, we can set U = A and V = B. Second, if either of A or B, say A, is empty and B is nonempty, we can set U = ∅ and V = X.

Let us then assume that A and B are both non-empty. By this theorem, it follows that for each a ∈ A, there exist disjoint open sets Ua and Va such that a ∈ Ua and B ⊂ Va . Then {Ua }a∈A is an open cover for A. Using this characterization of compactness, it follows that there exists a finite set A0 ⊂ A such that {Ua }a∈A0 is a finite open cover for A. Let us define

    U = ⋃_{a∈A0} Ua ,    V = ⋂_{a∈A0} Va .

We next show that these sets satisfy the given conditions for U and V . First, it is clear that U and V are open. We also have that A ⊂ U and B ⊂ V . To see that U and V are disjoint, suppose z ∈ U. Then z ∈ Ua for some a ∈ A0 . Since Ua and Va are disjoint, z cannot be in Va , and consequently z cannot be in V . □

Note. The above result can, for instance, be found in [1] (page 141) or [2] (Section 2.1, Theorem 3).

REFERENCES
1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.
2. I.M. Singer, J.A. Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, 1967.

Version: 2 Owner: matte Author(s): matte


Chapter 580 54D35 – Extensions of spaces (compactifications, supercompactifications, completions, etc.) 580.1

Alexandrov one-point compactification

The Alexandrov one-point compactification of a non-compact topological space X is obtained by adjoining a new point ∞ and defining the topology on X ∪ {∞} to consist of the open sets of X together with the sets of the form U ∪ {∞}, where U is an open subset of X with compact complement. With this topology, X ∪ {∞} is always compact. Furthermore, it is Hausdorff if and only if X is Hausdorff and locally compact. Version: 3 Owner: yark Author(s): yark

580.2

compactification

Let X be a topological space. A (Hausdorff) compactification of X is a pair (K, h) where K is a Hausdorff topological space and h : X → K is a continuous function such that

– K is compact,
– h is a homeomorphism between X and h(X),
– cl_K(h(X)) = K, where cl_K(A) denotes the closure in K of a subset A of K.

h is often considered to be the inclusion map, so that X ⊆ K with cl_K(X) = K.

Version: 4 Owner: Evandar Author(s): Evandar

Chapter 581 54D45 – Local compactness, σ-compactness 581.1

σ-compact

A topological space is σ-compact if it is a countable union of compact sets. Version: 1 Owner: Koro Author(s): Koro

581.2 examples of locally compact and not locally compact spaces Examples of locally compact spaces include: – The Euclidean spaces Rn with the standard topology: their local compactness follows from the Heine-Borel theorem. The complex plane C carries the same topology as R2 and is therefore also locally compact. – All topological manifolds are locally compact since locally they look like Euclidean space. – Any closed or open subset of a locally compact space is locally compact. In fact, a subset of a locally compact Hausdorff space X is locally compact if and only if it is the difference of two closed subsets of X (equivalently: the intersection of an open and a closed subset of X). – The space of p-adic rationals is homeomorphic to the Cantor set minus one point, and since the Cantor set is compact as a closed bounded subset of R, we see that the p-adic rationals are locally compact. – Any discrete space is locally compact, since the singletons can serve as compact neighborhoods. – The long line is a locally compact topological space. 2095

– If you take any unbounded totally ordered set and equip it with the left order topology (or right order topology), you get a locally compact space. This space, unlike all the others we have looked at, is not Hausdorff.

Examples of spaces which are not locally compact include:

– The rational numbers Q with the standard topology inherited from R: every compact subset of Q has empty interior, so no compact subset of Q is a neighborhood.
– All infinite-dimensional normed vector spaces: a normed vector space is finite-dimensional if and only if its closed unit ball is compact.
– The subset X = {(0, 0)} ∪ {(x, y) | x > 0} of R2: no compact subset of X contains a neighborhood of (0, 0).

Version: 13 Owner: AxelBoldt Author(s): AxelBoldt

581.3

locally compact

A topological space X is locally compact at a point x ∈ X if there exists a compact set K which contains a nonempty neighborhood U of x. The space X is locally compact if it is locally compact at every point x ∈ X.

Note that local compactness at x does not require that x have a neighborhood which is actually compact, since compact open sets are fairly rare and the more relaxed condition turns out to be more useful in practice. Version: 1 Owner: djao Author(s): djao


Chapter 582 54D65 – Separability 582.1

separable

A topological space is said to be separable if it has a countable dense subset. Version: 4 Owner: Evandar Author(s): Evandar


Chapter 583 54D70 – Base properties 583.1

second countable

A topological space is said to be second countable if it has a countable basis. It can be shown that a second countable space is both Lindelöf and separable, although the converses fail. Version: 9 Owner: Evandar Author(s): Evandar


Chapter 584 54D99 – Miscellaneous 584.1

Lindelöf theorem

If a topological space (X, τ ) satisfies the second axiom of countability and A is any subset of X, then any open cover for A has a countable subcover. In particular, (X, τ ) is a Lindelöf space. Version: 2 Owner: drini Author(s): drini

584.2

first countable

Let X be a topological space and let x ∈ X. X is said to be first countable at x if there is a sequence (Bn )n∈N of open sets such that whenever U is an open set containing x, there is n ∈ N such that x ∈ Bn ⊆ U.

The space X is said to be first countable if for every x ∈ X, X is first countable at x. Version: 1 Owner: Evandar Author(s): Evandar

584.3

proof of Lindelöf theorem

Let X be a second countable topological space, A ⊆ X any subset, and U an open cover of A. Let B be a countable basis for X; then B0 = {B ∩ A : B ∈ B} is a countable basis for the subspace topology on A. For each a ∈ A there is some Ua ∈ U with a ∈ Ua , and so there is Ba ∈ B0 such that a ∈ Ba ⊆ Ua .

Then {Ba : a ∈ A} ⊆ B0 is a countable open cover of A. For each Ba , choose UBa ∈ U such that Ba ⊆ UBa . Then {UBa : a ∈ A} is a countable subcover of A from U.

Version: 2 Owner: Evandar Author(s): Evandar

584.4

totally disconnected space

A topological space X is called totally disconnected if all connected components of X are singletons. Examples of totally disconnected spaces include:

– every discrete space,
– the Cantor set,
– the set of rational numbers Q endowed with the subspace topology induced by Q ↪ R, and
– the set of irrational numbers R \ Q, again with the subspace topology.

Version: 4 Owner: Dr Absentius Author(s): Dr Absentius


Chapter 585 54E15 – Uniform structures and generalizations 585.1

topology induced by uniform structure

Let U be a uniform structure on a set X. We define a subset A to be open if and only if for each x ∈ A there exist an entourage U ∈ U such that whenever (x, y) ∈ U, then y ∈ A. Let’s verify that this defines a topology on X.

Clearly, the subsets ∅ and X are open. If A and B are two open sets, then for each x ∈ A ∩ B there exist an entourage U such that, whenever (x, y) ∈ U, then y ∈ A, and an entourage V such that, whenever (x, y) ∈ V , then y ∈ B. Consider the entourage U ∩ V : whenever (x, y) ∈ U ∩ V , then y ∈ A ∩ B, hence A ∩ B is open. Suppose F is an arbitrary family of open subsets. For each x ∈ ⋃F, there exists A ∈ F such that x ∈ A. Let U be the entourage whose existence is granted by the definition of open set. We have that whenever (x, y) ∈ U, then y ∈ A; hence y ∈ ⋃F, which concludes the proof. Version: 2 Owner: n3o Author(s): n3o

585.2

uniform space

A uniform structure (or uniformity) on a set X is a non-empty set U of subsets of X × X which satisfies the following axioms:

1. Every subset of X × X which contains a set of U belongs to U.

2. Every finite intersection of sets of U belongs to U.

3. Every set of U is the graph of a reflexive relation (i.e. contains the diagonal).

4. If V belongs to U, then V ′ = {(y, x) : (x, y) ∈ V } belongs to U.

5. If V belongs to U, then there exists V ′ in U such that, whenever (x, y), (y, z) ∈ V ′, then (x, z) ∈ V . The sets of U are called entourages. The set X, together with the uniform structure U, is called a uniform space.

Every uniform space can be considered a topological space with a natural topology induced by the uniform structure. The uniformity, however, provides in general a richer structure, which formalizes the concept of relative closeness: in a uniform space we can say that x is as close to y as z is to w, which makes no sense in a topological space. It follows that uniform spaces are the most natural setting for uniformly continuous functions and Cauchy sequences, in which these concepts are naturally involved. Examples of uniform spaces are metric spaces and topological groups. Version: 4 Owner: n3o Author(s): n3o

585.3

uniform structure of a metric space

Let (X, d) be a metric space. There is a natural uniform structure on X, which induces the same topology as the metric. We define a subset V of the cartesian product X × X to be an entourage if and only if it contains a subset of the form Vε = {(x, y) ∈ X × X : d(x, y) < ε} for some ε > 0. Version: 2 Owner: n3o Author(s): n3o

585.4

uniform structure of a topological group

Let G be a topological group. There is a natural uniform structure on G which induces its topology. We define a subset V of the cartesian product G × G to be an entourage if and only if it contains a subset of the form VN = {(x, y) ∈ G × G : xy⁻¹ ∈ N} for some neighborhood N of the identity element. This is called the right uniformity of the topological group, with which right multiplication becomes a uniformly continuous map. The left uniformity is defined in a similar fashion, but in general the two do not coincide, although they both induce the same topology on G. Version: 2 Owner: n3o Author(s): n3o


585.5

ε-net

Definition Suppose X is a metric space with a metric d, and suppose S is a subset of X. Let ε be a positive real number. A subset N ⊂ S is an ε-net for S if, for all x ∈ S, there is an y ∈ N, such that d(x, y) < ε.

For any ε > 0 and S ⊂ X, the set S is trivially an ε-net for itself.

Theorem Suppose X is a metric space with a metric d, and suppose S is a subset of X. Let ε be a positive real number. Then N is an ε-net for S if and only if {Bε (y) | y ∈ N} is a cover for S. (Here Bε (x) is the open ball with center x and radius ε.)

Proof. Suppose N is an ε-net for S. If x ∈ S, there is a y ∈ N such that x ∈ Bε (y). Thus, x is covered by some set in {Bε (y) | y ∈ N}. Conversely, suppose {Bε (y) | y ∈ N} is a cover for S, and suppose x ∈ S. By assumption, there is a y ∈ N such that x ∈ Bε (y). Hence d(x, y) < ε with y ∈ N. □

Example In X = R² with the usual Cartesian metric, the set N = {(a, b) | a, b ∈ Z} is an ε-net for X, assuming that ε > √2/2. □

The above definition and example can be found in [1], pages 64–65.
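The example can be checked numerically. The Python sketch below (helper names are ours) rounds each coordinate to the nearest lattice point and confirms that no sampled point of the plane is farther than √2/2 from the grid:

```python
import math
import random

# Sketch of the example above: the integer grid Z x Z is an eps-net for
# R^2 whenever eps > sqrt(2)/2, since every point lies within sqrt(2)/2
# of the lattice point obtained by rounding each coordinate.
def dist_to_grid(p):
    x, y = p
    return math.hypot(x - round(x), y - round(y))

random.seed(0)
worst = max(dist_to_grid((random.uniform(-5, 5), random.uniform(-5, 5)))
            for _ in range(10000))
print(worst <= math.sqrt(2) / 2)  # -> True
```

The bound is sharp: a point such as (0.5, 0.5) is exactly √2/2 from the grid, which is why the example requires ε > √2/2 rather than ε ≥ √2/2.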

REFERENCES 1. G. Bachman, L. Narici, Functional analysis, Academic Press, 1966.

Version: 1 Owner: Koro Author(s): matte

585.6

Euclidean distance

If u = (x1 , y1 ) and v = (x2 , y2 ) are two points on the plane, their Euclidean distance is given by

    √((x1 − x2)² + (y1 − y2)²).    (585.6.1)

Geometrically, it is the length of the segment joining u and v, and also the norm of the difference vector (considering R² as a vector space).

This distance induces a metric (and therefore a topology) on R2 , called the Euclidean metric (on R2 ) or standard metric (on R2 ). The topology so induced is called the standard topology, and one basis can be obtained by considering the set of all open balls.


If a = (x1 , x2 , . . . , xn ) and b = (y1 , y2 , . . . , yn ), then formula (585.6.1) can be generalized to Rn by defining the Euclidean distance from a to b as

    d(a, b) = √((x1 − y1)² + (x2 − y2)² + · · · + (xn − yn)²).    (585.6.2)

Notice that this distance coincides with the absolute value when n = 1. Euclidean distance on Rn is also a metric (Euclidean or standard metric), and therefore we can give Rn a topology, which is called the standard topology of Rn . The resulting (topological and vectorial) space is known as Euclidean space. Version: 7 Owner: drini Author(s): drini
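Formula (585.6.2) translates directly into code; a minimal sketch (the helper name euclid is ours):

```python
import math

# Euclidean distance on R^n, as in formula (585.6.2).
def euclid(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclid((0, 0), (3, 4)))  # -> 5.0 (the 3-4-5 right triangle)
print(euclid((1,), (4,)))      # -> 3.0, coinciding with |1 - 4| when n = 1
```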

585.7

Hausdorff metric

Let (X, d) be a metric space. We denote the distance from a point x to a set A by d(x, A) = inf{d(x, y) : y ∈ A}. The Hausdorff metric is a metric dH defined on the family F of compact sets in X by

    dH(A, B) = max{ sup{d(A, x) : x ∈ B}, sup{d(B, x) : x ∈ A} }

for any A and B in F. This is usually stated in the following equivalent way: if K(A, r) denotes the set of points which are less than r apart from A (i.e. an r-neighborhood of A), then dH(A, B) is the smallest r such that A ⊂ K(B, r) and B ⊂ K(A, r). Version: 1 Owner: Koro Author(s): Koro
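For finite sets the inf and sup in the definition become min and max, so dH can be computed directly; a sketch (helper names are ours):

```python
# Hausdorff distance between two finite point sets on the line, computed
# directly from the definition (inf -> min, sup -> max).
def hausdorff(A, B, d=lambda x, y: abs(x - y)):
    def point_to_set(x, S):
        return min(d(x, y) for y in S)
    return max(max(point_to_set(x, B) for x in A),
               max(point_to_set(y, A) for y in B))

A = {0, 1, 2}
B = {0, 5}
print(hausdorff(A, B))  # -> 3: the point 5 in B is 3 away from A
```

Note the asymmetry of the two inner terms: every point of A is within 2 of B, but the point 5 ∈ B is 3 away from A, so the maximum of the two directed distances is 3.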

585.8

Urysohn metrization theorem

Let X be a topological space which is regular and second countable and in which singleton sets are closed. Then X is metrizable. Version: 3 Owner: Evandar Author(s): Evandar

585.9

ball

Let X be a metric space, and c ∈ X. A ball around c with radius r > 0 is the set Br (c) = {x ∈ X : d(c, x) < r}, where d(c, x) is the distance from c to x.

On the Euclidean plane, balls are open discs and in the line they are open intervals. So, on R (with the standard topology), the ball with radius 1 around 5 is the open interval given by {x : |5 − x| < 1}, that is, (4, 6).

It should be noted that the definition of ball depends on the metric attached to the space. If we had considered R2 with the taxicab metric, the ball with radius 1 around zero would be the rhombus with vertices at (−1, 0), (0, −1), (1, 0), (0, 1).

Balls are open sets under the topology induced by the metric, and therefore are examples of neighborhoods. Version: 8 Owner: drini Author(s): drini
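The dependence of the ball on the metric can be seen computationally. The sketch below (helper names are ours) finds the integer points at taxicab distance exactly 1 from the origin, recovering the rhombus vertices mentioned above:

```python
# Taxicab metric on R^2: the unit "sphere" around the origin is a rhombus,
# not a circle as it would be for the Euclidean metric.
def taxicab(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

pts = [(x, y) for x in range(-2, 3) for y in range(-2, 3)
       if taxicab((0, 0), (x, y)) == 1]
print(sorted(pts))  # -> [(-1, 0), (0, -1), (0, 1), (1, 0)]
```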

585.10

bounded

Let (X, d) be a metric space. A subset A ⊆ X is said to be bounded if there is some positive real number M such that d(x, y) 6 M whenever x, y ∈ A.

A function f : X → Y from a set X to a metric space Y is said to be bounded if its range is bounded in Y . Version: 5 Owner: Evandar Author(s): Evandar

585.11

city-block metric

The city-block metric, defined on Rn , is

    d(a, b) = Σ_{i=1}^n |bi − ai|

where a and b are vectors in Rn with a = (a1 , . . . , an ) and b = (b1 , . . . , bn ). In two dimensions and with discrete-valued vectors, when we can picture the set of points in Z × Z as a grid, this is simply the number of edges between points that must be traversed to get from a to b within the grid. This is the same problem as getting from corner a to b in a rectilinear downtown area, hence the name “city-block metric.” Version: 5 Owner: akrowne Author(s): akrowne
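The formula translates directly into code; a minimal sketch (the helper name city_block is ours):

```python
# City-block (taxicab) metric on R^n: sum of coordinatewise absolute
# differences.
def city_block(a, b):
    return sum(abs(bi - ai) for ai, bi in zip(a, b))

# In the grid picture: going from (0, 0) to (2, 3) requires traversing
# 2 + 3 = 5 edges.
print(city_block((0, 0), (2, 3)))  # -> 5
```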

585.12

completely metrizable

Let X be a topological space. X is said to be completely metrizable if there is a metric d on X under which X is complete. In particular, a completely metrizable space is metrizable. Version: 1 Owner: Evandar

585.13

distance to a set

Let X be a metric space with a metric d. If A is a non-empty subset of X and x ∈ X, then the distance from x to A [1] is defined as

    d(x, A) := inf_{a∈A} d(x, a).

We also write d(x, A) = d(A, x). Suppose that x, y are points in X, and A ⊂ X is non-empty. Then we have the following triangle inequality:

    d(x, A) = inf_{a∈A} d(x, a) ≤ d(x, y) + inf_{a∈A} d(y, a) = d(x, y) + d(y, A).

If X is only a pseudo-metric space, then the above definition and triangle inequality also hold.
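For a finite set the infimum is a minimum, so d(x, A) and the triangle inequality can be checked directly; a sketch (helper names are ours):

```python
# d(x, A) = inf over a in A of d(x, a), for a finite A on the real line,
# together with a check of d(x, A) <= d(x, y) + d(y, A).
def dist_to_set(x, A, d=lambda p, q: abs(p - q)):
    return min(d(x, a) for a in A)

A = {3, 7, 10}
x, y = 0, 5
print(dist_to_set(x, A))                                    # -> 3
print(dist_to_set(x, A) <= abs(x - y) + dist_to_set(y, A))  # -> True
```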

REFERENCES 1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.

Version: 1 Owner: matte Author(s): matte

585.14

equibounded

Let X and Y be metric spaces. A family F of functions from X to Y is said to be equibounded if there exists a bounded subset B of Y such that for all f ∈ F and all x ∈ X it holds f (x) ∈ B.

Notice that if F ⊂ Cb (X, Y ) (continuous bounded functions) then F is equibounded if and only if F is bounded (with respect to the metric of uniform convergence). Version: 2 Owner: paolini Author(s): paolini

585.15

isometry

Let (X1 , d1 ), (X2 , d2 ) be metric spaces. A function f : X1 → X2 is said to be an isometry if f is surjective and whenever x, y ∈ X1 , d1 (x, y) = d2 (f (x), f (y)). Note that an isometry is necessarily injective; for if x, y ∈ X1 with x ≠ y then d1 (x, y) > 0, and so d2 (f (x), f (y)) > 0, and then f (x) ≠ f (y).

In the case where there is an isometry between spaces (X1 , d1 ) and (X2 , d2), they are said to be isometric. Isometric spaces are essentially identical as metric spaces. Moreover, an isometry between X1 and X2 induces a homeomorphism between the underlying sets in the induced topologies, and so in particular isometric spaces are homeomorphic. Warning: some authors do not require isometries to be surjective (and in this case, the isometry will not necessarily be a homeomorphism). It’s generally best to check the definition when looking at a text for the first time. Version: 3 Owner: Evandar Author(s): Evandar

585.16

metric space

A metric space is a set X together with a real valued function d : X × X → R (called a metric, or sometimes a distance function) such that, for every x, y, z ∈ X,

– d(x, y) ≥ 0, with equality if and only if x = y
– d(x, y) = d(y, x)
– d(x, z) ≤ d(x, y) + d(y, z)

For x ∈ X and ε > 0, the open ball around x of radius ε is the set Bε(x) := {y ∈ X | d(x, y) < ε}. An open set in X is a set which equals an arbitrary union of open balls in X, and X together with these open sets forms a Hausdorff topological space. The topology on X formed by these open sets is called the metric topology. Similarly, the set {y ∈ X | d(x, y) ≤ ε} is called a closed ball around x of radius ε. Every closed ball is a closed subset of X in the metric topology. The prototype example of a metric space is R itself, with the metric defined by d(x, y) := |x − y|. More generally, any normed vector space has an underlying metric space structure; when the vector space is finite dimensional, the resulting metric space is isomorphic to Euclidean space.

REFERENCES 1. J.L. Kelley, General Topology, D. van Nostrand Company, Inc., 1955.

Version: 7 Owner: djao Author(s): djao

585.17

non-reversible metric

Definition A non-reversible metric [1] on a set X is a function d : X × X → [0, ∞) that satisfies the properties

1. d(x, y) ≥ 0, with equality if and only if x = y,

2. d(x, y) ≠ d(y, x) for some x, y ∈ X,

3. d(x, z) ≤ d(x, y) + d(y, z).

In other words, a non-reversible metric satisfies all the properties of a metric except the condition d(x, y) = d(y, x) for all x, y ∈ X. To distinguish a non-reversible metric from a metric (with the usual definition), a metric is sometimes called a reversible metric. Any non-reversible metric d induces a reversible metric d′ given by

    d′(x, y) = (1/2)(d(x, y) + d(y, x)).

An example of a non-reversible metric is the Funk-metric [1]. The reversible metric induced by the Funk metric is sometimes called the Klein metric [1].

REFERENCES 1. Z. Shen, Lectures on Finsler Geometry, World Scientific, 2001.

Version: 3 Owner: matte Author(s): matte

585.18

open ball

Let (X, d) be a metric space. The open ball with center a ∈ X and radius r > 0 is the set

    B(a, r) := {x ∈ X : d(a, x) < r}.

This is sometimes referred to simply as a ball. The closed ball with center a and radius r is defined as

    D(a, r) := {x ∈ X : d(a, x) ≤ r}.

This is sometimes referred to as the disc (or closed disc) with center a and radius r. Version: 2 Owner: drini Author(s): drini, apmxi

585.19

some structures on Rn

Let n ∈ {1, 2, . . .}. Then, as a set, Rn is the n-fold cartesian product of the real numbers.

2108

Vector space structure of Rn

If u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) are points in Rn , we define their sum as u + v = (u1 + v1 , . . . , un + vn ). Also, if λ is a scalar (real number), then scalar multiplication is defined as λ · u = (λu1 , . . . , λun ). With these operations, Rn becomes a vector space (over R) of dimension n. In other words, with this structure we can talk about vectors, lines, and subspaces of different dimensions.

Inner product for Rn

For u and v as above, we define the inner product as ⟨u, v⟩ = u1 v1 + · · · + un vn . With this product, Rn is called a Euclidean space. We also have an induced norm ‖u‖ = √⟨u, u⟩, which gives Rn the structure of a normed space (and thus a metric space). This inner product lets us talk about length, the angle between vectors, and orthogonal vectors.

Topology for Rn

The usual topology for Rn is the topology induced by the metric d(x, y) = ‖x − y‖. As a basis for the topology induced by the above norm, one can take the open balls B(x, r) = {y ∈ Rn | ‖x − y‖ < r}, where r > 0 and x ∈ Rn .

Properties of the topological space Rn are:

1. Rⁿ is second countable, i.e., Rⁿ has a countable basis.

2. (Heine-Borel theorem) A set in Rⁿ is compact if and only if it is closed and bounded.

3. Since Rⁿ is a metric space, Rⁿ is a Hausdorff space.

Version: 6 Owner: drini Author(s): drini, matte
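The operations above translate directly into code. The following Python sketch (an editorial addition, not part of the original entry) checks the componentwise operations, the inner product, the induced norm, and two of the basic inequalities on sample vectors:

```python
# Illustrative check of the structures on R^n using plain Python lists.
import math

def add(u, v):      return [ui + vi for ui, vi in zip(u, v)]
def scale(lam, u):  return [lam * ui for ui in u]
def inner(u, v):    return sum(ui * vi for ui, vi in zip(u, v))
def norm(u):        return math.sqrt(inner(u, u))
def dist(u, v):     return norm(add(u, scale(-1, v)))

u, v = [1.0, 2.0, 2.0], [3.0, 0.0, 4.0]
assert add(u, v) == [4.0, 2.0, 6.0]
assert inner(u, v) == 11.0          # 1*3 + 2*0 + 2*4
assert norm(u) == 3.0               # sqrt(1 + 4 + 4)

# Cauchy-Schwarz: |<u,v>| <= ||u|| ||v||
assert abs(inner(u, v)) <= norm(u) * norm(v)

# triangle inequality for the induced metric
w = [0.0, 0.0, 0.0]
assert dist(u, w) <= dist(u, v) + dist(v, w)
```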


585.20

totally bounded

A metric space X is said to be totally bounded if and only if for every ε ∈ R⁺ there exists a finite subset {x1, x2, . . . , xn} of X such that X ⊆ ⋃_{k=1}^{n} B(xk, ε), where B(xk, ε) denotes the open ball around xk with radius ε.

An alternate definition using ε-nets

Let X be a metric space with a metric d. A subset S ⊂ X is totally bounded if, for any ε > 0, S has a finite ε-net.

REFERENCES 1. G. Bachman, L. Narici, Functional analysis, Academic Press, 1966.

page 65. Version: 7 Owner: drini Author(s): drini, mnemo
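As an added illustration of total boundedness (not part of the original entry), the following Python sketch constructs a finite ε-net for the interval [0, 1] and verifies the covering condition on a grid of sample points:

```python
# The interval [0, 1] is totally bounded: for any eps > 0, the finitely
# many centers eps, 3*eps, 5*eps, ... form an eps-net, since every point
# of [0, 1] lies within eps of one of them.

def eps_net(eps):
    """Finite set of centers whose eps-balls cover [0, 1]."""
    centers = []
    c = eps
    while c - eps < 1.0:
        centers.append(c)
        c += 2 * eps
    return centers

eps = 0.07
net = eps_net(eps)
samples = [i / 1000.0 for i in range(1001)]
assert all(min(abs(x - c) for c in net) <= eps + 1e-12 for x in samples)
print(len(net), "centers suffice for eps =", eps)
```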

585.21

ultrametric

Any metric d : X × X → R on a set X must satisfy the triangle inequality:

(∀x, y, z) d(x, z) ≤ d(x, y) + d(y, z)

An ultrametric must additionally satisfy a stronger version of the triangle inequality:

(∀x, y, z) d(x, z) ≤ max{d(x, y), d(y, z)}

Here is an example of an ultrametric on a space with 5 points, labeled a, b, c, d, e:

      a    b    c    d    e
 a    0   12    4    6   12
 b   12    0   12   12    5
 c    4   12    0    6   12
 d    6   12    6    0   12
 e   12    5   12   12    0

The ultrametric condition is equivalent to the ultrametric three point condition:

(∀x, y, z) x, y, z can be renamed such that d(x, z) ≤ d(x, y) = d(y, z)

Ultrametrics can be used to model bifurcating hierarchical systems. The distance between nodes in a weight-balanced binary tree is an ultrametric. Similarly, an ultrametric can be modelled by a weight-balanced binary tree, although the choice of tree is not necessarily unique. Tree models of ultrametrics are sometimes called “ultrametric trees”. The metrics induced by non-archimedean valuations are ultrametrics. Version: 11 Owner: bshanks Author(s): bshanks
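The 5-point example above can be verified mechanically. The following Python check (an editorial addition, not part of the original entry) confirms the strong triangle inequality for every triple of points:

```python
# Mechanical check that the 5-point table defines an ultrametric:
# d is symmetric, vanishes exactly on the diagonal, and satisfies
# d(x, z) <= max(d(x, y), d(y, z)) for every triple.
from itertools import product

pts = "abcde"
D = {
    ("a", "b"): 12, ("a", "c"): 4,  ("a", "d"): 6,  ("a", "e"): 12,
    ("b", "c"): 12, ("b", "d"): 12, ("b", "e"): 5,
    ("c", "d"): 6,  ("c", "e"): 12,
    ("d", "e"): 12,
}

def d(x, y):
    if x == y:
        return 0
    return D.get((x, y)) or D[(y, x)]

assert all(d(x, z) <= max(d(x, y), d(y, z))
           for x, y, z in product(pts, repeat=3))
print("the table defines an ultrametric")
```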

585.22

Lebesgue number lemma

Lebesgue number lemma: For every open cover U of a compact metric space X, there exists a real number δ > 0 such that every open ball in X of radius δ is contained in some element of U. The number δ above is called a Lebesgue number for the covering U in X. Version: 1 Owner: djao Author(s): djao

585.23

proof of Lebesgue number lemma

By way of contradiction, suppose that no Lebesgue number existed. Then there exists an open cover U of X such that for all δ > 0 there exists an x ∈ X such that no U ∈ U contains Bδ (x) (the open ball of radius δ around x). Specifically, for each n ∈ N, since 1/n > 0 we can choose an xn ∈ X such that no U ∈ U contains B1/n (xn ). Now, X is compact so there exists a subsequence (xnk ) of the sequence of points (xn ) that converges to some y ∈ X. Also, U being an open cover of X implies that there exists λ > 0 and U ∈ U such that Bλ (y) ⊆ U. Since the sequence (xnk ) converges to y, for k large enough it is true that d(xnk , y) < λ/2 (d is the metric on X) and 1/nk < λ/2. Thus after an application of the triangle inequality, it follows that B1/nk (xnk ) ⊆ Bλ (y) ⊆ U, contradicting the assumption that no U ∈ U contains B1/nk (xnk ). Hence a Lebesgue number for U does exist. Version: 2 Owner: scanez Author(s): scanez

585.24

complete

A metric space X is complete if every Cauchy sequence in X is a convergent sequence. Examples:

– The space Q of rational numbers is not complete: the sequence 3, 3.1, 3.14, 3.141, 3.1415, 3.14159, . . . of finite decimals converging to π ∈ R is a Cauchy sequence in Q that does not converge in Q.

– The space R of real numbers is complete, as it is the completion of Q. More generally, the completion of any metric space is a complete metric space. – Every Banach space is complete. For example, the Lp –space of p-integrable functions is a complete metric space. Version: 3 Owner: djao Author(s): djao
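As an added numerical illustration of the first example above (not part of the original entry): the decimal truncations of π satisfy |a_m − a_n| < 10^(−min(m, n)), which is exactly the Cauchy condition, yet the limit π is irrational:

```python
# The truncations 3, 3.1, 3.14, ... of pi form a Cauchy sequence of
# rationals whose limit, pi, is not rational.
import math

def a(n):
    """pi truncated to n decimal places (a rational number)."""
    return math.floor(math.pi * 10**n) / 10**n

for m in range(0, 12):
    for n in range(m, 12):
        assert abs(a(m) - a(n)) < 10.0**(-m)   # Cauchy estimate
print([a(n) for n in range(5)])
```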

585.25

completeness principle

The completeness principle is a property of the real numbers, and is the foundation of analysis. There are a number of equivalent formulations:

1. The limit of every infinite decimal sequence is a real number.

2. Every bounded monotonic sequence is convergent.

3. A sequence is convergent iff it is a Cauchy sequence.

4. Every non-empty set bounded above has a supremum.

Version: 7 Owner: mathcam Author(s): mathcam, vitriol

585.26

uniformly equicontinuous

A family F of functions from a metric space (X, d) to a metric space (X′, d′) is uniformly equicontinuous if, for each ε > 0 there exists δ > 0 such that, for all f ∈ F and all x, y ∈ X, d(x, y) < δ ⇒ d′(f(x), f(y)) < ε. Version: 3 Owner: Koro Author(s): Koro

585.27

Baire category theorem

In a non-empty complete metric space, any countable intersection of dense, open subsets is non-empty. In fact, such countable intersections of dense, open subsets are dense. So the theorem holds also for any non-empty open subset of a complete metric space. Alternative formulations: Call a set first category, or a meagre set, if it is a countable union of nowhere dense sets, otherwise second category. The Baire category theorem is often stated as “no non-empty complete metric space is of first category”, or, trivially, as “a non-empty, complete metric space is of second category”.


In functional analysis, this property of complete metric spaces forms the basis for the proofs of the fundamental principles of Banach spaces: the open mapping theorem and the closed graph theorem. It may also be taken as giving a concept of “small sets”, similar to sets of measure zero: a countable union of these sets remains “small”. However, the real line R may be partitioned into a set of measure zero and a set of first category; the two concepts are distinct. Note that, apart from the requirement that the set be a complete metric space, all conditions and conclusions of the theorem are phrased topologically. This “metric requirement” is thus something of a disappointment. As it turns out, there are two ways to reduce this requirement. First, if a topological space T is homeomorphic to a non-empty open subset of a complete metric space, then we can transfer the Baire property through the homeomorphism, so in T too any countable intersection of open dense sets is non-empty (and, in fact, dense). The other formulations also hold in this case. Second, the Baire category theorem holds for a locally compact, Hausdorff topological space T. Version: 7 Owner: ariels Author(s): ariels

585.28

Baire space

A Baire space is a topological space such that the intersection of any countable family of open and dense sets is dense. Version: 1 Owner: Koro Author(s): Koro

585.29

equivalent statement of Baire category theorem

Let (X, d) be a complete metric space. Then every subset B ⊂ X of first category has empty interior. Corollary: Every non-empty complete metric space is of second category. Version: 5 Owner: gumau Author(s): gumau

(Footnote to the Baire category theorem above: some authors only define a locally compact space to be a Hausdorff space; that is the sense required for that theorem.)


585.30

generic

A property that holds for all x in some residual subset of a Baire space X is said to be generic in X, or to hold generically in X. In the study of generic properties, it is common to state “generically, P (x)”, where P (x) is some proposition about x ∈ X. The useful fact about generic properties is that, given countably many generic properties Pn , all of them hold simultaneously in a residual set, i.e. we have that, generically, Pn (x) holds for each n. Version: 7 Owner: Koro Author(s): Koro

585.31

meager

A meager set is a countable union of nowhere dense sets. Version: 2 Owner: Koro Author(s): Koro

585.32

proof for one equivalent statement of Baire category theorem

First, let us assume Baire's category theorem and prove the alternative statement. We have B = ⋃_{n=1}^{∞} Bn, with int(cl(Bk)) = ∅ for all k ∈ N (here cl denotes closure). Then

X = X − int(cl(Bk)) = cl(X − cl(Bk)) for all k ∈ N.

Then X − cl(Bk) is dense in X for every k. Besides, X − cl(Bk) is open, because cl(Bk) is closed. So, by Baire's category theorem, we have that

⋂_{n=1}^{∞} (X − cl(Bn)) = X − ⋃_{n=1}^{∞} cl(Bn)

is dense in X. But B ⊆ ⋃_{n=1}^{∞} cl(Bn) implies X − ⋃_{n=1}^{∞} cl(Bn) ⊆ X − B, and then

X = cl(X − ⋃_{n=1}^{∞} cl(Bn)) ⊆ cl(X − B) = X − int(B),

so int(B) = ∅.

Now, let us assume our alternative statement as the hypothesis, and let (Bk)_{k∈N} be a collection of open dense sets in a complete metric space X. Then

int(cl(X − Bk)) = int(X − Bk) = X − cl(Bk) = ∅,

and so X − Bk is nowhere dense for every k. Then X − ⋂_{n=1}^{∞} Bn = ⋃_{n=1}^{∞} (X − Bn) is of first category, so int(X − ⋂_{n=1}^{∞} Bn) = ∅ by our hypothesis; that is, ⋂_{n=1}^{∞} Bn is dense in X. Hence Baire's category theorem holds. QED

Version: 4 Owner: gumau Author(s): gumau

585.33

proof of Baire category theorem

Let (X, d) be a complete metric space, and Uk a countable collection of dense, open subsets. Let x0 ∈ X and ε0 > 0 be given. We must show that there exists an x ∈ ⋂_k Uk such that d(x0, x) < ε0.

Since U1 is dense and open, we may choose an ε1 > 0 and an x1 ∈ U1 such that

d(x0, x1) < ε0/2,  ε1 < ε0/2,

and such that the open ball of radius ε1 about x1 lies entirely in U1. We then choose an ε2 > 0 and an x2 ∈ U2 such that

d(x1, x2) < ε1/2,  ε2 < ε1/2,

and such that the open ball of radius ε2 about x2 lies entirely in U2. We continue by induction, and construct a sequence of points xk ∈ Uk and positive εk such that

d(xk−1, xk) < εk−1/2,  εk < εk−1/2,

and such that the open ball of radius εk about xk lies entirely in Uk. By construction, for 0 ≤ j < k we have

d(xj, xk) < εj (1/2 + · · · + 1/2^{k−j}) < εj ≤ ε0/2^j.

Hence the sequence xk, k = 1, 2, . . . is Cauchy, and converges by hypothesis to some x ∈ X. It is clear that for every k we have d(x, xk) ≤ εk. Moreover it follows that

d(x, xk) ≤ d(x, xk+1) + d(xk, xk+1) < εk+1 + εk/2,

and hence a fortiori d(x, xk) < εk for every k. By construction then, x ∈ Uk for all k = 1, 2, . . ., as well. QED

Version: 6 Owner: rmilson Author(s): rmilson

585.34

residual

In a topological space X, a set A is called residual if A is second category and X \ A is first category. Version: 1 Owner: bs Author(s): bs

585.35

six consequences of Baire category theorem

1. Rⁿ is not a countable union of proper vector subspaces.

2. Let (X, d) be a complete metric space with no isolated points, and let D ⊂ X be a countable dense set. Then D is not a Gδ.

3. There is no function f : R → R continuous exactly at the points of Q.

4. There are continuous functions on the interval [0, 1] which are not monotonic on any subinterval.

5. Let E be a Banach space of infinite dimension. Then it does not have a countable algebraic basis.

6. There is no continuous function f : R → R such that f(Q) ⊂ R − Q and f(R − Q) ⊂ Q.

Version: 5 Owner: gumau Author(s): gumau

585.36

Hahn-Mazurkiewicz theorem

Let X be a Hausdorff space. Then there is a continuous map from [0, 1] onto X if and only if X is compact, connected, locally connected and metrizable. Version: 1 Owner: Evandar Author(s): Evandar

585.37

Vitali covering

A collection of sets V in a metric space X is called a Vitali covering (or Vitali class) for X if for each x ∈ X and δ > 0 there exists U ∈ V such that x ∈ U and 0 < diam(U) ≤ δ. Version: 2 Owner: Koro Author(s): Koro

585.38

compactly generated

A Hausdorff topological space X is said to be compactly generated if for each A ⊂ X, A is closed if and only if A ∩ C is closed for every compact C ⊂ X. Version: 1 Owner: RevBobo Author(s): RevBobo


Chapter 586 54G05 – Extremally disconnected spaces, F -spaces, etc. 586.1

extremally disconnected

A topological space X is said to be extremally disconnected if every open set in X has an open closure. It can be shown that X is extremally disconnected iff any two disjoint open sets in X have disjoint closures. Version: 1 Owner: igor Author(s): igor


Chapter 587 54G20 – Counterexamples 587.1

Sierpinski space

Sierpinski space is the topological space given by ({x, y}, {{x, y}, {x}, ∅}).

In other words, the set consists of the two elements x and y and the open sets are {x, y}, {x} and ∅. Sierpinski space is T0 but not T1 .

Version: 2 Owner: Evandar Author(s): Evandar

587.2

long line

The long line is a non-paracompact Hausdorff 1-dimensional manifold constructed as follows. Let Ω be the first uncountable ordinal and consider the set L := Ω × [0, 1) endowed with the order topology induced by the lexicographical order, that is, the order defined by

(α1, t1) < (α2, t2) ⇔ α1 < α2, or (α1 = α2 and t1 < t2).

Intuitively L is obtained by “filling the gaps” between consecutive ordinals in Ω with intervals, much the same way that the nonnegative reals are obtained by filling the gaps between consecutive natural numbers with intervals. Some of the properties of the long line:

– L is not compact; in fact L is not Lindelöf. Indeed {[0, α) : α < Ω} is an open cover of L that has no countable subcovering. To see this notice that

⋃ {[0, αx) : x ∈ X} = [0, sup{αx : x ∈ X})

and since the supremum of a countable collection of countable ordinals is a countable ordinal, such a union can never be [0, Ω).

– However, L is sequentially compact. Indeed every sequence has a convergent subsequence. To see this notice that given a sequence a := (an) of elements of L there is an ordinal α such that all the terms of a are in the subset [0, α]. Such a subset is compact since it is homeomorphic to [0, 1].
– L therefore is not metrizable.
– L is a 1-dimensional manifold with boundary.
– L therefore is not paracompact.
– L is first countable.
– L is not separable.
– All homotopy groups of L are trivial.
– However, L is not contractible.

Variants. There are several variations of the above construction.

– Instead of [0, Ω) one can use (0, Ω) or [0, Ω]. The latter (obtained by adding a single point to L) is compact.
– One can consider the “double” of the above construction, that is, the space obtained by gluing two copies of L along 0. The resulting open manifold is not homeomorphic to L \ {0}.

Version: 7 Owner: Dr Absentius Author(s): AxelBoldt, yark, igor, Dr Absentius


Chapter 588 55-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 588.1

Universal Coefficient Theorem

The universal coefficient theorem for homology expresses the homology groups with coefficients in an arbitrary abelian group G in terms of the homology groups with coefficients in Z.

Theorem (Universal Coefficients for Homology) Let K be a chain complex of free abelian groups, and let G be any abelian group. Then there exists a split short exact sequence

0 → Hn(K) ⊗Z G −α→ Hn(K ⊗Z G) −β→ Tor(Hn−1(K), G) → 0.

As well, α and β are natural with respect to chain maps and homomorphisms of coefficient groups. The sequence splits naturally with respect to coefficient homomorphisms but not with respect to chain maps. Here, the functor Tor(−, G) is Tor_1^Z(−, G), the first left derived functor of − ⊗Z G.

We can define the map α as follows: choose a cycle [u] ∈ Hn(K) represented by u ∈ Kn. Then u ⊗ x ∈ Kn ⊗ G is a cycle, so we set α([u] ⊗ x) to be the homology class of u ⊗ x. Of course, one must check that this is well defined, in that it does not depend on our representative for [u].

The universal coefficient theorem for cohomology expresses the cohomology groups of a complex in terms of its homology groups. More specifically we have the following

Theorem (Universal Coefficients for Cohomology) Let K be a chain complex of free abelian groups, and let G be any abelian group. Then there exists a split short exact sequence

0 → Ext(Hn−1(K), G) −β→ H^n(Hom(K, G)) −α→ Hom(Hn(K), G) → 0.

The homomorphisms β and α are natural with respect to coefficient homomorphisms and chain maps. The sequence splits naturally with respect to coefficient homomorphisms but not with respect to chain maps. Here Ext(−, G) is Ext^1_Z(−, G), the first right derived functor of Hom_Z(−, G). The map α is defined in the following manner: let [u] ∈ H^n(Hom(K, G)) be represented by the cocycle u ∈ Hom(Kn, G). For [x] a cycle in Hn(K) represented by x ∈ Kn, we have u(x) ∈ G. We therefore set α([u])([x]) = u(x). Again, it is necessary to check that this does not depend on the chosen representatives x and u.

REFERENCES 1. W. Massey, Singular Homology theory, Springer-Verlag, 1980

Version: 9 Owner: dublisk Author(s): dublisk

588.2

invariance of dimension

The following non-trivial result was proven by Brouwer [1] around 1910 [2]. Theorem (Invariance of dimension) Suppose U and V are open subsets of Rⁿ and Rᵐ, respectively. If U and V are non-empty and homeomorphic, then n = m.

REFERENCES 1. The MacTutor History of Mathematics archive, entry on Luitzen Egbertus Jan Brouwer 2. A. Hatcher, Algebraic Topology, Cambridge University Press, 2002. Also available online.

Version: 4 Owner: matte Author(s): matte


Chapter 589 55M05 – Duality 589.1

Poincar´ e duality

If M is a compact, oriented, n-dimensional manifold, then there is a canonical isomorphism D : H^q(M, Z) → Hn−q(M, Z) (where Hq(M, Z) is the qth homology of M with integer coefficients and H^q(M, Z) the qth cohomology) for all q, which is given by cap product with a generator of Hn(M, Z) (a choice of a generator here corresponds to an orientation). This isomorphism exists with coefficients in Z2 regardless of orientation.

This isomorphism gives a nice interpretation to the cup product. If X, Y are transverse submanifolds of M, then X ∩ Y is also a submanifold. All of these submanifolds represent homology classes of M in the appropriate dimensions, and

D⁻¹([X]) ∪ D⁻¹([Y]) = D⁻¹([X ∩ Y]),

where ∪ is cup product, and ∩ is intersection, not cap product.

Version: 4 Owner: bwebste Author(s): bwebste


Chapter 590 55M20 – Fixed points and coincidences 590.1

Sperner’s lemma

Let ABC be a triangle, and let S be the set of vertices of some triangulation T of ABC. Let f be a mapping of S into a three-element set, say {1, 2, 3} = T (indicated by red/green/blue respectively in the figure), such that: – any point P of S, if it is on the side AB, satisfies f (P ) ∈ {1, 2} – similarly if P is on the side BC, then f (P ) ∈ {2, 3} – if P is on the side CA, then f (P ) ∈ {3, 1}

(It follows that f(A) = 1, f(B) = 2, f(C) = 3.) Then some (triangular) simplex of T, say UVW, satisfies

f(U) = 1,  f(V) = 2,  f(W) = 3.   (1)

We will informally sketch a proof of a stronger statement: Let M (resp. N) be the number of simplexes satisfying (1) and whose vertices have the same orientation as ABC (resp. the opposite orientation). Then M − N = 1 (whence M > 0). The proof is in the style of well-known proofs of, for example, Stokes' theorem in the plane, or Cauchy's theorems about a holomorphic function. Define an antisymmetric function d : T × T → Z by

d(1, 1) = d(2, 2) = d(3, 3) = 0,
d(1, 2) = d(2, 3) = d(3, 1) = 1,
d(2, 1) = d(3, 2) = d(1, 3) = −1.

Let’s define a “circuit” of size n as an injective mapping z of the cyclic group Z/nZ into V such that z(n) is adjacent to z(n + 1) for all n in the group. Any circuit z has what we will call a contour integral Iz, namely X Iz = d(z(n), z(n + 1)) . n

Let us say that two vertices P and Q are equivalent if f(P) = f(Q). There are four steps:

1) Contour integrals are added when their corresponding circuits are juxtaposed.

2) A circuit of size 3, hitting the vertices of a simplex PQR, has contour integral
– 0 if any two of P, Q, R are equivalent, else
– +3 if they are inequivalent and have the same orientation as ABC, else
– −3.

3) If y is a circuit which travels once around the perimeter of the whole triangle ABC, with the same orientation as ABC, then Iy = 3.

4) Combining the above results, we get

3 = Σ Iw = 3M − 3N,

where the sum contains one summand for each simplex PQR.

Remarks: In the figure, M = 2 and N = 1: there are two “red-green-blue” simplexes and one blue-green-red. With the same hypotheses as in Sperner's lemma, there is such a simplex UVW which is connected (along edges of the triangulation) to the side AB (resp. BC, CA) by a set of vertices v for which f(v) ∈ {1, 2} (resp. {2, 3}, {3, 1}). The figure illustrates that result: one of the red-green-blue simplexes is connected to the red-green side by a red-green “curve”, and to the other two sides likewise. The original use of Sperner's lemma was in a proof of Brouwer's fixed point theorem in two dimensions.

Version: 7 Owner: mathcam Author(s): Larry Hammick
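As an added sanity check (not part of the original entry), the count M − N = 1 can be verified on the smallest nontrivial example, the midpoint subdivision of ABC; the labels chosen below for the midpoints are one admissible choice under the boundary rules:

```python
# Subdivide triangle ABC at the edge midpoints, label the vertices
# consistently with the boundary conditions of Sperner's lemma, and
# count the completely labelled small triangles with orientation.

A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
mAB, mBC, mCA = (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)

label = {A: 1, B: 2, C: 3,        # corner labels are forced
         mAB: 1, mBC: 2, mCA: 3}  # edge midpoints obey the edge rules

triangles = [(A, mAB, mCA), (mAB, B, mBC), (mCA, mBC, C), (mAB, mBC, mCA)]

def orient(p, q, r):
    """+1 for counterclockwise, -1 for clockwise."""
    cross = (q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])
    return 1 if cross > 0 else -1

M = N = 0
for t in triangles:
    if sorted(label[v] for v in t) == [1, 2, 3]:   # completely labelled
        if orient(*t) == orient(A, B, C):
            M += 1
        else:
            N += 1
assert M - N == 1
print("complete triangles: M =", M, ", N =", N)
```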


Chapter 591 55M25 – Degree, winding number 591.1

degree (map of spheres)

Given a non-negative integer n, let Sⁿ denote the n-dimensional sphere. Suppose f : Sⁿ → Sⁿ is a continuous map. Applying the nth reduced homology functor H̃n(−), we obtain a homomorphism f∗ : H̃n(Sⁿ) → H̃n(Sⁿ). Since H̃n(Sⁿ) ≈ Z, it follows that f∗ is a homomorphism Z → Z. Such a map must be multiplication by an integer d. We define the degree of the map f to be this d.

591.1.1

Basic Properties

1. If f, g : S n → S n are continuous, then deg(f ◦ g) = deg(f )·deg(g).

2. If f, g : S n → S n are homotopic, then deg(f ) = deg(g).

3. The degree of the identity map is +1. 4. The degree of the constant map is 0.

5. The degree of a reflection through an n-dimensional hyperplane through the origin of the ambient R^{n+1} is −1.

6. The antipodal map, sending x to −x, has degree (−1)^{n+1}. This follows since the map fi sending (x1, . . . , xi, . . . , xn+1) ↦ (x1, . . . , −xi, . . . , xn+1) has degree −1 by (5), and the composition f1 ◦ · · · ◦ fn+1 yields the antipodal map.

591.1.2

Examples

If we identify S 1 ⊂ C, then the map f : S 1 → S 1 defined by f (z) = z k has degree k. It is also possible, for any positive integer n, and any integer k, to construct a map f : S n → S n of degree k.

Using degree, one can prove several theorems, including the so-called “hairy ball theorem”, which states that there exists a continuous non-zero vector field on Sⁿ if and only if n is odd.
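As an added numerical illustration of the example f(z) = z^k (not part of the original entry), one can recover the degree by accumulating the change of argument of f around the circle and dividing by 2π:

```python
# Numerically compute the degree of a circle map f : S^1 -> S^1 by
# summing the (unwrapped) jumps in arg f(z) as z traverses the circle.
import cmath
import math

def degree(f, steps=2000):
    total = 0.0
    prev = cmath.phase(f(1 + 0j))          # start at z = 1
    for i in range(1, steps + 1):
        z = cmath.exp(2j * math.pi * i / steps)
        cur = cmath.phase(f(z))
        delta = cur - prev
        # unwrap the jump across the branch cut of phase
        while delta > math.pi:
            delta -= 2 * math.pi
        while delta < -math.pi:
            delta += 2 * math.pi
        total += delta
        prev = cur
    return round(total / (2 * math.pi))

for k in (-2, -1, 0, 1, 3):
    assert degree(lambda z, k=k: z**k) == k
print("degree of z -> z^k is k, as expected")
```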

Version: 7 Owner: dublisk Author(s): dublisk

591.2

winding number

Winding numbers are a basic notion in algebraic topology, and play an important role in connection with analytic functions of a complex variable. Intuitively, given a closed curve t ↦ S(t) in an oriented Euclidean plane (such as the complex plane C), and a point p not in the image of S, the winding number (or index) of S with respect to p is the net number of times S surrounds p. It is not altogether easy to make this notion rigorous. Let us take C for the plane. We have a continuous mapping S : [a, b] → C where a and b are some reals with a < b and S(a) = S(b). Denote by θ(t) the angle from the positive real axis to the ray from z0 to S(t). As t moves from a to b, we expect θ to increase or decrease by a multiple of 2π, namely 2ωπ where ω is the winding number. One therefore thinks of using integration. And indeed, in the theory of functions of a complex variable, it is proved that the value

(1/(2πi)) ∫S dz/(z − z0)

is an integer and has the expected properties of a winding number around z0. To define the winding number in this way, we need to assume that the closed path S is rectifiable (so that the path integral is defined). An equivalent condition is that the real and imaginary parts of the function S are of bounded variation. But if S is any continuous mapping [a, b] → C having S(a) = S(b), the winding number is still definable, without any integration. We can break up the domain of S into a finite number of intervals such that the image of S, on any of those intervals, is contained in a disc which does not contain z0. Then 2ωπ emerges as a finite sum: the sum of the angles subtended at z0 by the sides of a polygon. Let A, B, and C be any three distinct rays from z0. The three sets

S⁻¹(A), S⁻¹(B), S⁻¹(C)

are closed in [a, b], and they determine the winding number of S around z0. This result can provide an alternative definition of winding numbers in C, and a definition in some other spaces also, but the details are rather subtle. For one more variation on the theme, let S be any topological space homeomorphic to a circle, and let f : S → S be any continuous mapping. Intuitively we expect that if a point x travels once around S, the point f(x) will travel around S some integral number of times, say n times. The notion can be made precise. Moreover, the number n is determined by the three closed sets

f⁻¹(a), f⁻¹(b), f⁻¹(c)

where a, b, and c are any three distinct points in S. Version: 5 Owner: vypertd Author(s): Larry Hammick, vypertd

Chapter 592 55M99 – Miscellaneous 592.1

genus of topological surface

Theorem 18. Let Σ be a compact, orientable, connected 2-dimensional manifold (a.k.a. surface) without boundary. Then the following two numbers are equal (in particular the first number is an integer):

(i) half the first Betti number of Σ,

(1/2) dim H¹(Σ; Q),

(ii) the cardinality of a minimal set C of mutually non-isotopic simple closed curves with the property that Σ \ C is a connected planar surface.

Definition 48. The integer of the above theorem is called the genus of the surface.

Theorem 19. Any compact orientable surface without boundary is a connected sum of g tori, where g is its genus.

Remark 3. The previous theorem is the reason why genus is sometimes referred to as “the number of handles”.

Theorem 20. The genus is a complete homeomorphism invariant, i.e. two compact orientable surfaces without boundary are homeomorphic if and only if they have the same genus.

Version: 16 Owner: Dr Absentius Author(s): Dr Absentius, rmilson
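As an added computational remark (not part of the original entry): for a closed orientable surface the Euler characteristic χ = V − E + F of any triangulation satisfies χ = 2 − 2g, so the genus can be read off from vertex, edge, and face counts:

```python
# Read off the genus of a closed orientable surface from a triangulation
# via chi = V - E + F = 2 - 2g.

def genus(V, E, F):
    chi = V - E + F
    assert (2 - chi) % 2 == 0   # chi is even for closed orientable surfaces
    return (2 - chi) // 2

# boundary of a tetrahedron (a sphere): V=4, E=6, F=4, so g = 0
assert genus(4, 6, 4) == 0
# the minimal 7-vertex triangulation of the torus: V=7, E=21, F=14, so g = 1
assert genus(7, 21, 14) == 1
print("sphere: genus 0; torus: genus 1")
```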


Chapter 593 55N10 – Singular theory 593.1

Betti number

Let X denote a topological space, and let Hk (X, Z) denote the k-th homology group of X. If Hk (X, Z) is finitely generated, then its rank is called the k-th Betti number of X. Version: 2 Owner: mathcam Author(s): mathcam
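As an added illustration (not part of the original entry): over Q, Betti numbers are computable from ranks of boundary matrices, via b_k = dim ker ∂_k − rank ∂_{k+1}. The Python sketch below does this for a hollow triangle, a simplicial model of the circle:

```python
# Compute Betti numbers over Q from ranks of boundary matrices,
# using exact Gaussian elimination with rational arithmetic.
from fractions import Fraction

def rank(mat):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# boundary d_1 for edges [0,1], [1,2], [0,2] of a hollow triangle:
# d[v0, v1] = v1 - v0, columns indexed by edges, rows by vertices
d1 = [[-1,  0, -1],
      [ 1, -1,  0],
      [ 0,  1,  1]]
n_vertices, n_edges = 3, 3
b0 = n_vertices - rank(d1)        # dim ker d_0 - rank d_1 (d_0 = 0)
b1 = (n_edges - rank(d1)) - 0     # dim ker d_1 - rank d_2 (d_2 = 0)
assert (b0, b1) == (1, 1)         # one component, one 1-dimensional "hole"
```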

593.2

Mayer-Vietoris sequence

Let X be a topological space, and let A, B ⊂ X be such that X = int(A) ∪ int(B), and C = A ∩ B. Then there is an exact sequence:

· · · → Hn(C) −(i∗ ⊕ −j∗)→ Hn(A) ⊕ Hn(B) −(j∗ + i∗)→ Hn(X) −∂∗→ Hn−1(C) → · · ·

Here, i∗ is induced by the inclusion i : (B, C) ,→ (X, A) and j∗ by j : (A, C) ,→ (X, B), and ∂∗ is the following map: if x is in Hn (X), then it can be written as the sum of a chain in A and one in B, x = a + b. ∂a = −∂b, since ∂x = 0. Thus, ∂a is a chain in C, and so represents a class in Hn−1 (C). This is ∂∗ x. One can easily check (by standard diagram chasing) that this map is well defined on the level of homology. Version: 2 Owner: bwebste Author(s): bwebste

593.3

cellular homology

If X is a cell space, then let (C∗(X), d) be the cell complex where the n-th group Cn(X) is the free abelian group on the cells of dimension n, and the boundary map is as follows: if eⁿ is an n-cell, then we can define a map ϕf : ∂eⁿ → f^{n−1}, where f^{n−1} is any cell of dimension n − 1, by the following rule: let ϕ : ∂eⁿ → skn−1 X be the attaching map for eⁿ, where skn−1 X is the (n − 1)-skeleton of X. Then let πf be the natural projection

πf : skn−1 X → skn−1 X/(skn−1 X − f) ≅ f/∂f.

Let ϕf = πf ◦ ϕ. Now, f/∂f is an (n − 1)-sphere, so the map ϕf has a degree deg ϕf, which we use to define the boundary operator:

d([eⁿ]) = Σ_{dim f = n−1} (deg ϕf)[f^{n−1}].

The resulting chain complex is called the cellular chain complex.

Theorem 21. The homology of the cellular complex is the same as the singular homology of the space. That is, H∗(C, d) = H∗(C, ∂).

Cellular homology is tremendously useful for computations because the groups involved are finitely generated.

Version: 3 Owner: bwebste Author(s): bwebste

593.4

homology (topological space)

Homology is a name by which a number of functors from topological spaces to abelian groups (or more generally modules over a fixed ring) go by. It turns out that in most reasonable cases a large number of these (singular homology, cellular homology, simplicial homology, Morse homology) all coincide. There are other generalized homology theories, but I won't consider those. In an intuitive sense, homology measures “holes” in topological spaces. The idea is that we want to measure the topology of a space by looking at sets which have no boundary, but are not the boundary of something else. These are things that have wrapped around “holes” in our topological space, allowing us to detect those “holes.” Here I don't mean boundary in the formal topological sense, but in an intuitive sense. Thus a loop has no boundary as I mean here, even though it does in the general topological definition. You will see the formal definition below. Singular homology is defined as follows: We define the standard n-simplex to be the subset

∆ⁿ = {(x1, . . . , xn) ∈ Rⁿ : xi ≥ 0, x1 + · · · + xn ≤ 1}

of Rⁿ. The 0-simplex is a point, the 1-simplex a line segment, the 2-simplex a triangle, and the 3-simplex a tetrahedron.

A singular n-simplex in a topological space X is a continuous map f : ∆ⁿ → X. A singular n-chain is a formal linear combination (with integer coefficients) of a finite number of singular n-simplices. The n-chains in X form a group under formal addition, denoted Cn(X). Next, we define a boundary operator ∂n : Cn(X) → Cn−1(X). Intuitively, this is just taking all the faces of the simplex, and considering their images as simplices of one lower dimension with the appropriate sign to keep orientations correct. Formally, we let v0, v1, . . . , vn be the vertices of ∆ⁿ, pick an order on the vertices of the (n − 1)-simplex, and let [v0, . . . , v̂i, . . . , vn] be the face spanned by all vertices other than vi, identified with the (n − 1)-simplex by mapping the vertices v0, . . . , vn except for vi, in that order, to the vertices of the (n − 1)-simplex in the order you have chosen. Then if ϕ : ∆ⁿ → X is an n-simplex, ϕ([v0, . . . , v̂i, . . . , vn]) is the map ϕ, restricted to the face [v0, . . . , v̂i, . . . , vn], made into a singular (n − 1)-simplex by the identification with the standard (n − 1)-simplex defined above. Then

∂n(ϕ) = Σ_{i=0}^{n} (−1)ⁱ ϕ([v0, . . . , v̂i, . . . , vn]).

If one is bored, or disinclined to believe me, one can check that ∂n ◦ ∂n+1 = 0. This is simply an exercise in reindexing. For example, if ϕ is a singular 1-simplex (that is, a path), then ∂(ϕ) = ϕ(1) − ϕ(0). That is, it is the difference of the endpoints (thought of as 0-simplices). Now, we are finally in a position to define homology groups. Let Hn(X), the nth homology group of X, be the quotient

Hn(X) = ker ∂n / im ∂n+1.

The association X ↦ Hn(X) is a functor from topological spaces to abelian groups, and the maps f∗ : Hn(X) → Hn(Y) induced by a map f : X → Y are simply those induced by composition of a singular n-simplex with the map f. From this definition, it is not at all clear that homology is at all computable. But, in fact, homology is often much more easily computed than homotopy groups or most other topological invariants. Important tools in the calculation of homology are long exact sequences, the Mayer-Vietoris sequence, cellular homology, and homotopy invariance. Some examples of homology groups:

Hm(Rⁿ) = Z for m = 0, and 0 for m ≠ 0.

This reflects the fact that Rn has “no holes”
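The identity ∂n ◦ ∂n+1 = 0 asserted above can also be checked mechanically. The following Python sketch (an editorial addition, not part of the original entry) represents a simplex by its ordered vertex tuple and a chain by a dictionary of integer coefficients:

```python
# Boundary of a chain of abstract simplices: drop the i-th vertex with
# sign (-1)^i, and verify that applying the boundary twice gives zero.

def boundary(chain):
    out = {}
    for simplex, coeff in chain.items():
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i+1:]
            out[face] = out.get(face, 0) + (-1)**i * coeff
    return {s: c for s, c in out.items() if c != 0}

tetra = {("v0", "v1", "v2", "v3"): 1}   # a single 3-simplex
assert boundary(boundary(tetra)) == {}  # each face cancels in pairs

# and for an arbitrary-looking 2-chain:
chain = {("a", "b", "c"): 2, ("b", "c", "d"): -5}
assert boundary(boundary(chain)) == {}
print("boundary of a boundary is zero")
```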


For n even,

Hm(RPⁿ) = Z for m = 0; Z2 for m ≡ 1 (mod 2), 0 < m < n; and 0 otherwise;

and for n odd,

Hm(RPⁿ) = Z for m = 0 and m = n; Z2 for m ≡ 1 (mod 2), 0 < m < n; and 0 otherwise.

Version: 6 Owner: bwebste Author(s): bwebste

593.5

homology of RP3.

We need for this problem knowledge of the homology groups of S² and RP². We will simply assume the former:

Hk(S²; Z) = Z for k = 0, 2, and 0 otherwise.

Now, for RP², we can argue without Mayer-Vietoris. X = RP² is connected, so H0(X; Z) = Z. X is non-orientable, so H2(X; Z) is 0. Last, H1(X; Z) is the abelianization of the already abelian fundamental group π1(X) = Z/2Z, so we have:

Hk(RP²; Z) = Z for k = 0; Z/2Z for k = 1; and 0 for k ≥ 2.

Now that we have the homology of RP², we can compute the homology of RP³ from Mayer-Vietoris. Let X = RP³, V = RP³ \ {pt} ∼ RP² (by viewing RP³ as a CW-complex), U ∼ D³ ∼ {pt}, and U ∩ V ∼ S², where ∼ denotes equivalence through a deformation retract. Then the Mayer-Vietoris sequence gives


0 → H3 (X; Z) → H2 (S 2 ; Z) → H2 (pt; Z) ⊕ H2 (RP 2 ; Z) → H2 (X; Z) → H1 (S 2 ; Z) → H1 (pt; Z) ⊕ H1 (RP 2 ; Z) → H1 (X; Z) → H0 (S 2 ; Z) → H0 (pt; Z) ⊕ H0 (RP 2 ; Z) → H0 (X; Z) → 0

From here, we substitute in the information from above, use the fact that the k-th homology group of an n-dimensional object is 0 if k > n, and begin to compute using the theory of short exact sequences. Since we have as a subsequence the short exact sequence 0 → H3 (X; Z) → Z → 0, we can conclude H3 (X; Z) = Z. Since we have as a subsequence the short exact sequence 0 → H2 (X; Z) → 0, we can conclude H2 (X; Z) = 0. Since the bottom sequence splits, we get 0 → Z/2Z → H1 (X; Z) → 0, so that H1 (X; Z) = Z/2Z. We thus conclude that

Hk (RP 3 ; Z) = Z for k = 0; Z/2Z for k = 1; 0 for k = 2; Z for k = 3; and 0 for k > 3.

Version: 4 Owner: mathcam Author(s): mathcam
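One cheap consistency check on such a computation (not part of the original entry): tensoring with Q is exact, so in any exact sequence of finitely generated abelian groups the alternating sum of the free ranks vanishes; torsion summands like Z/2Z contribute rank zero. A sketch for the sequence above:

```python
# Free ranks (Z-ranks) of the twelve terms of the Mayer-Vietoris sequence
# 0 -> H3(X) -> H2(S^2) -> H2(pt)+H2(RP^2) -> H2(X) -> H1(S^2)
#   -> H1(pt)+H1(RP^2) -> H1(X) -> H0(S^2) -> H0(pt)+H0(RP^2) -> H0(X) -> 0,
# using the groups computed above; the Z/2Z summands have rank 0.
ranks = [0, 1, 1, 0, 0, 0, 0, 0, 1, 2, 1, 0]

# Exactness forces the alternating sum of the ranks to vanish.
alt_sum = sum((-1) ** i * r for i, r in enumerate(ranks))
print(alt_sum)  # -> 0
```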

593.6

long exact sequence (of homology groups)

If X is a topological space, and A and B are subspaces with X ⊃ A ⊃ B, then there is a long exact sequence:

· · · → Hn (A, B) −i∗→ Hn (X, B) −j∗→ Hn (X, A) −∂∗→ Hn−1 (A, B) → · · ·

where i∗ is induced by the inclusion i : (A, B) ,→ (X, B), j∗ by the inclusion j : (X, B) ,→ (X, A), and ∂∗ is the following map: given a ∈ Hn (X, A), choose a chain representing it. ∂a is an (n − 1)-chain of A, so it represents an element of Hn−1 (A, B). This is ∂∗ a. When B is the empty set, we get the long exact sequence of the pair (X, A):

· · · → Hn (A) −i∗→ Hn (X) −j∗→ Hn (X, A) −∂∗→ Hn−1 (A) → · · ·


The existence of this long exact sequence follows from the short exact sequence

0 → C∗ (A, B) −i♯→ C∗ (X, B) −j♯→ C∗ (X, A) → 0

where i♯ and j♯ are the maps on chains induced by i and j, by the Snake lemma. Version: 5 Owner: mathcam Author(s): mathcam, bwebste
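As a worked example, not taken from the original entry, the long exact sequence of the pair computes the relative homology of the disk modulo its boundary circle:

```latex
% Long exact sequence of the pair (D^2, S^1).  D^2 is contractible, so
% H_n(D^2) = 0 for n >= 1, and for n >= 2 the sequence reads
\underbrace{H_n(D^2)}_{=0} \to H_n(D^2, S^1)
    \xrightarrow{\;\partial_*\;} H_{n-1}(S^1)
    \to \underbrace{H_{n-1}(D^2)}_{=0},
% so the connecting map \partial_* is an isomorphism.  In particular
% H_2(D^2, S^1) \cong H_1(S^1) \cong \mathbb{Z},
% matching H_2(D^2/S^1) = H_2(S^2) under the identification above.
```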

593.7

relative homology groups

If X is a topological space, and A a subspace, then the inclusion map A ,→ X makes Cn (A) into a subgroup of Cn (X). Since the boundary map on C∗ (X) restricts to the boundary map on C∗ (A), we can take the quotient complex C∗ (X, A),

· · · ←∂− Cn (X)/Cn (A) ←∂− Cn+1 (X)/Cn+1 (A) ←∂− · · ·

The homology groups Hn (X, A) of this complex are called the relative homology groups of the pair (X, A). Under relatively mild hypotheses, Hn (X, A) = Hn (X/A) where X/A is the set of equivalence classes of the relation x ∼ y if x = y or if x, y ∈ A, given the quotient topology (this is essentially X, with A reduced to a single point). Relative homology groups are important for a number of reasons, principally for computational ones, since they fit into long exact sequences, which are powerful computational tools in homology. Version: 2 Owner: bwebste Author(s): bwebste


Chapter 594 55N99 – Miscellaneous

594.1 suspension isomorphism

Proposition 2. Let X be a topological space. There is a natural isomorphism s : Hn+1 (SX) → Hn (X), where SX stands for the unreduced suspension of X. If X has a basepoint, there is a natural isomorphism

s : H̃n+1 (ΣX) → H̃n (X),

where ΣX is the reduced suspension.

A similar proposition holds with homology replaced by cohomology. In fact, these propositions follow from the Eilenberg-Steenrod axioms without the dimension axiom, so they hold for any generalized (co)homology theory in place of integral (co)homology. Version: 1 Owner: antonio Author(s): antonio


Chapter 595 55P05 – Homotopy extension properties, cofibrations

595.1 cofibration

If X is a topological space and A is a subspace of X, the inclusion map i : A → X is a cofibration if it has the homotopy extension property with respect to any space Y . Version: 2 Owner: RevBobo Author(s): RevBobo

595.2

homotopy extension property

Let X be a topological space and A a subspace of X. Suppose there is a continuous map f : X → Y and a homotopy of maps F : A × I → Y with F (a, 0) = f (a) for all a ∈ A. The inclusion map i : A → X is said to have the homotopy extension property if there always exists a continuous map F ′ : X × I → Y such that F ′ ◦ (i × idI ) = F and F ′ ◦ i0 = f . Here, i0 : X → X × I is given by i0 (x) = (x, 0) for all x ∈ X.

Version: 11 Owner: RevBobo Author(s): RevBobo


Chapter 596 55P10 – Homotopy equivalences

596.1 Whitehead theorem

Theorem 22 (J.H.C. Whitehead). If f : X → Y is a weak homotopy equivalence and X and Y are path-connected and of the homotopy type of CW complexes, then f is a strong homotopy equivalence. Remark 1. It is essential to the theorem that isomorphisms between πk (X) and πk (Y ) for all k are induced by a map f : X → Y ; if an isomorphism exists which is not induced by a map, it need not be the case that the spaces are homotopy equivalent. For example, let X = RP m × S n and Y = RP n × S m . Then the two spaces have isomorphic homotopy groups because they both have a universal covering space homeomorphic to S m × S n , and it is a double covering in both cases. However, for m < n, X and Y are not homotopy equivalent, as can be seen, for example, by using homology: Hm (X; Z/2Z) ' Z/2Z, but Hm (Y ; Z/2Z) ' Z/2Z ⊕ Z/2Z. (Here, RP n is n-dimensional real projective space, and S n is the n-sphere.) Version: 6 Owner: antonio Author(s): antonio

596.2

weak homotopy equivalence

A continuous map f : X → Y between path-connected based topological spaces is said to be a weak homotopy equivalence if for each k ≥ 1 it induces an isomorphism f∗ : πk (X) → πk (Y ) between the kth homotopy groups. X and Y are then said to be weakly homotopy equivalent. Remark 2. It is not enough for πk (X) to be isomorphic to πk (Y ) for all k. The definition requires these isomorphisms to be induced by a space-level map f. Version: 4 Owner: antonio Author(s): antonio

Chapter 597 55P15 – Classification of homotopy type

597.1 simply connected

A topological space is said to be simply connected if it is path connected and the fundamental group of the space is trivial (i.e. the one element group). Version: 4 Owner: RevBobo Author(s): RevBobo


Chapter 598 55P20 – Eilenberg-Mac Lane spaces

598.1 Eilenberg-Mac Lane space

Let π be a discrete group. A based topological space X is called an Eilenberg-Mac Lane space of type K(π, n), where n ≥ 1, if all the homotopy groups πk (X) are trivial except for πn (X), which is isomorphic to π. Clearly, for such a space to exist when n ≥ 2, π must be abelian. Given any group π, with π abelian if n ≥ 2, there exists an Eilenberg-Mac Lane space of type K(π, n). Moreover, this space can be constructed as a CW complex. It turns out that any two Eilenberg-Mac Lane spaces of type K(π, n) are weakly homotopy equivalent. The Whitehead theorem then implies that there is a unique K(π, n) space up to homotopy equivalence in the category of topological spaces of the homotopy type of a CW complex. We will henceforth restrict ourselves to this category. With a slight abuse of notation, we refer to any such space as K(π, n). An important property of K(π, n) is that, for π abelian, there is a natural isomorphism H n (X; π) ' [X, K(π, n)] of contravariant set-valued functors, where [X, K(π, n)] is the set of homotopy classes of based maps from X to K(π, n). Thus one says that the K(π, n) are representing spaces for cohomology with coefficients in π. Remark 3. Even when the group π is nonabelian, it can be seen that the set [X, K(π, 1)] is naturally isomorphic to Hom(π1 (X), π)/π; that is, to conjugacy classes of homomorphisms from π1 (X) to π. In fact, this is a way to define H 1 (X; π) when π is nonabelian. Remark 4. Though the above description does not include the case n = 0, it is natural to define a K(π, 0) to be any space homotopy equivalent to π. The above statement about cohomology then becomes true for the reduced zeroth cohomology functor. Version: 2 Owner: antonio Author(s): antonio


Chapter 599 55P99 – Miscellaneous

599.1 fundamental groupoid

Definition 49. Given a topological space X the fundamental groupoid Π1 (X) of X is defined as follows: – The objects of Π1 (X) are the points of X, Obj(Π1 (X)) = X, – morphisms are homotopy classes of paths “rel endpoints”, that is, HomΠ1 (X) (x, y) = Paths(x, y)/ ∼, where ∼ denotes homotopy rel endpoints, and,

– composition of morphisms is defined via concatenation of paths. It is easily checked that the above defined category is indeed a groupoid with the inverse of (a morphism represented by) a path being (the homotopy class of) the “reverse” path. Notice that for x ∈ X, the group of automorphisms of x is the fundamental group of X with basepoint x, HomΠ1 (X) (x, x) = π1 (X, x) . Definition 50. Let f : X → Y be a continuous function between two topological spaces. Then there is an induced functor Π1 (f ) : Π1 (X) → Π1 (Y ) defined as follows – on objects Π1 (f ) is just f ,


– on morphisms Π1 (f ) is given by “composing with f ”; that is, if α : I → X is a path representing the morphism [α] : x → y, then a representative of Π1 (f )([α]) : f (x) → f (y) is the composite f ◦ α : I → Y .

It is straightforward to check that the above indeed defines a functor. Therefore Π1 can (and should) be regarded as a functor from the category of topological spaces to the category of groupoids. This functor is not really homotopy invariant but it is “homotopy invariant up to homotopy” in the sense that the following holds. Theorem 23. A homotopy between two continuous maps induces a natural transformation between the corresponding functors. A reader who understands the meaning of the statement should be able to give an explicit construction and supply the proof without much trouble. Version: 3 Owner: Dr Absentius Author(s): Dr Absentius


Chapter 600 55Pxx – Homotopy theory

600.1 nullhomotopic map

Let f : X → Y be a continuous map. We say that f is nullhomotopic if there is a homotopy from f to a constant map c, that is, f ' c.

Version: 2 Owner: drini Author(s): drini, apmxi

2141

Chapter 601 55Q05 – Homotopy groups, general; sets of homotopy classes

601.1 Van Kampen’s theorem

Van Kampen’s theorem is usually stated as follows: Theorem 24. Let X be a connected topological space, Xk , k = 1, 2, two connected subspaces such that X = X1 ∪ X2 , and X0 := X1 ∩ X2 is connected. Let further ∗ ∈ X0 and ik : π1 (X0 , ∗) → π1 (Xk , ∗), jk : π1 (Xk , ∗) → π1 (X, ∗) be induced by the inclusions for k = 1, 2. Then

π1 (X, ∗) = π1 (X1 , ∗) ∗π1 (X0 ,∗) π1 (X2 , ∗) ,

that is, the fundamental group of X is the free product of the fundamental groups of X1 and X2 with amalgamated subgroup the fundamental group of X0 . There is also a “basepoint-free” version about fundamental groupoids: Theorem 25. The fundamental groupoid functor preserves pushouts. That is, given a commutative square of spaces in which all maps are inclusions, with X0 included into X1 and X2 by i1 , i2 , and X1 and X2 included into X by j1 , j2 ,


there is an induced pushout square in the category of groupoids, with Π1 (X0 ) mapping into Π1 (X1 ) and Π1 (X2 ) by Π1 (i1 ), Π1 (i2 ), and Π1 (X1 ), Π1 (X2 ) mapping into Π1 (X) by Π1 (j1 ), Π1 (j2 ). Notice that in the basepoint-free version it is not required that the spaces are connected. Version: 2 Owner: Dr Absentius Author(s): Dr Absentius
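A standard first application, added here as an illustration rather than taken from the original entry, is the wedge of two circles (with X1 , X2 small open thickenings of the two circles and X0 contractible):

```latex
% X = S^1 \vee S^1,\quad X_1 \simeq X_2 \simeq S^1,\quad X_0 \simeq \{*\}:
\pi_1(S^1 \vee S^1, *)
  \;=\; \pi_1(S^1) \ast_{\pi_1(\{*\})} \pi_1(S^1)
  \;=\; \mathbb{Z} \ast \mathbb{Z}
  \;=\; F_2,
% the free group on two generators, since the amalgamating subgroup is trivial.
```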

601.2

category of pointed topological spaces

A pointed topological space, written as (X, x0 ), consists of a non-empty topological space X together with an element x0 ∈ X. A morphism from (X, x0 ) to (Y, y0) is a continuous map f : X → Y satisfying f (x0 ) = y0 . With these morphisms, the pointed topological spaces form a category.

Two pointed topological spaces (X, x0 ) and (Y, y0) are isomorphic in this category if there exists a homeomorphism f : X → Y with f (x0 ) = y0 .

Every singleton (a pointed topological space of the form ({x0 }, x0 )) is a zero object in this category. For every pointed topological space (X, x0 ), we can construct the fundamental group π(X, x0 ) and for every morphism f : (X, x0 ) → (Y, y0) we obtain a group homomorphism π(f ) : π(X, x0 ) → π(Y, y0 ). This yields a functor from the category of pointed topological spaces to the category of groups. Version: 2 Owner: nobody Author(s): AxelBoldt, apmxi

601.3

deformation retraction

Let X and Y be topological spaces such that Y ⊂ X. A deformation retraction of X onto Y is a collection of mappings ft : X → X, t ∈ [0, 1] such that

1. f0 = idX , the identity mapping on X,
2. f1 (X) = Y ,
3. ft |Y = idY for all t,
4. the mapping X × I → X, (x, t) 7→ ft (x) is continuous.

Examples

– If x0 ∈ Rn , then ft (x) = (1 − t)x + tx0 , x ∈ Rn , shows that Rn deformation retracts onto {x0 }. Since {x0 } ⊂ Rn , it follows that deformation retraction is not an equivalence relation.
– Setting ft (x) = (1 − t)x + t x/||x||, x ∈ Rn \{0}, n > 0, we obtain a deformation retraction of Rn \{0} onto the (n − 1)-sphere S n−1 .

– The Möbius strip deformation retracts onto the circle S 1 .

– The 2-torus with one point removed deformation retracts onto two copies of S 1 joined at one point. (The circles can be chosen to be longitudinal and latitudinal circles of the torus.) Version: 2 Owner: matte Author(s): matte
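The sphere retraction ft (x) = (1 − t)x + t x/||x|| above can be sanity-checked numerically. This sketch is not part of the original entry; it verifies the three defining properties at sample points:

```python
import numpy as np

def f(t, x):
    """The retraction f_t(x) = (1-t)x + t x/||x|| of R^n \\ {0} onto S^{n-1}."""
    return (1 - t) * x + t * x / np.linalg.norm(x)

x = np.array([3.0, 4.0])   # a point of R^2 \ {0}
y = np.array([0.6, 0.8])   # a point already on S^1

assert np.allclose(f(0.0, x), x)                   # f_0 is the identity
assert np.isclose(np.linalg.norm(f(1.0, x)), 1.0)  # f_1 lands on the sphere
for t in np.linspace(0.0, 1.0, 11):
    assert np.allclose(f(t, y), y)                 # every f_t fixes S^1
print("all three properties hold at the sample points")
```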

601.4

fundamental group

Let (X, x0 ) be a pointed topological space (i.e. a topological space with a chosen basepoint x0 ). Denote by [(S 1 , 1), (X, x0 )] the set of homotopy classes of maps σ : S 1 → X such that σ(1) = x0 . Here, 1 denotes the basepoint (1, 0) ∈ S 1 . Define a product [(S 1 , 1), (X, x0 )] × [(S 1 , 1), (X, x0 )] → [(S 1 , 1), (X, x0 )] by [σ][τ ] = [στ ], where στ means “travel along σ and then τ ”. This gives [(S 1 , 1), (X, x0 )] a group structure and we define the fundamental group of X to be π1 (X, x0 ) = [(S 1 , 1), (X, x0 )]. The fundamental group of a topological space is an example of a homotopy group. Two homotopically equivalent spaces have the same fundamental group. Moreover, it can be shown that π1 is a functor from the category of (small) pointed topological spaces to the category of (small) groups. Thus the fundamental group is a topological invariant in the sense that if X is homeomorphic to Y via a basepoint preserving map, π1 (X, x0 ) is isomorphic to π1 (Y, y0 ). Examples of the fundamental groups of some familiar spaces are: π1 (Rn ) ∼ = {0} for each n, π1 (S 1 ) ∼ = Z and π1 (T ) ∼ = Z ⊕ Z where T is the torus. Version: 7 Owner: RevBobo Author(s): RevBobo
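The isomorphism π1 (S 1 ) ≅ Z can be made concrete numerically. The following sketch, not part of the original entry, computes the winding number of a sampled loop in S 1 ⊂ C, which is exactly its class in π1 (S 1 ):

```python
import numpy as np

def winding_number(loop, samples=2000):
    """Winding number of a sampled loop I -> S^1 (its class in pi_1(S^1) = Z)."""
    t = np.linspace(0.0, 1.0, samples)
    z = loop(t)
    # Sum the small angle increments between consecutive sample points.
    return int(round(np.angle(z[1:] / z[:-1]).sum() / (2 * np.pi)))

def loop_k(k):
    """The standard loop winding k times around the circle."""
    return lambda t: np.exp(2j * np.pi * k * t)

print(winding_number(loop_k(3)))   # -> 3
print(winding_number(loop_k(-2)))  # -> -2
```

Concatenating loops adds winding numbers, mirroring the group law [σ][τ] = [στ].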

601.5

homotopy of maps

Let X, Y be topological spaces, A a closed subspace of X and f, g : X → Y continuous maps. A homotopy of maps is a continuous function F : X × I → Y satisfying 1. F (x, 0) = f (x) for all x ∈ X

2. F (x, 1) = g(x) for all x ∈ X

3. F (x, t) = f (x) = g(x) for all x ∈ A, t ∈ I.


We say that f is homotopic to g relative to A and denote this by f ' g rel A. If A = ∅, this can be written f ' g. If g is a constant map (i.e. g(x) = y0 for all x ∈ X and some fixed y0 ∈ Y ), then we say that f is nullhomotopic. Version: 5 Owner: RevBobo Author(s): RevBobo

601.6

homotopy of paths

Let X be a topological space and p, q paths in X with the same initial point x0 and terminal point x1 . If there exists a continuous function F : I × I → X such that 1. F (s, 0) = p(s) for all s ∈ I

2. F (s, 1) = q(s) for all s ∈ I

3. F (0, t) = x0 for all t ∈ I

4. F (1, t) = x1 for all t ∈ I

we call F a homotopy of paths in X and say p, q are homotopic paths in X. F is also called a continuous deformation. Version: 4 Owner: RevBobo Author(s): RevBobo

601.7

long exact sequence (locally trivial bundle)

Let π : E → B be a locally trivial bundle, with fiber F . Then there is a long exact sequence of homotopy groups

· · · → πn (F ) −i∗→ πn (E) −π∗→ πn (B) −∂∗→ πn−1 (F ) → · · ·

Here i∗ is induced by the inclusion i : F ,→ E as the fiber over the basepoint of B, and ∂∗ is the following map: if [ϕ] ∈ πn (B), then ϕ lifts to a map of (D n , ∂D n ) into (E, F ) (that is, a map of the n-disk into E, taking its boundary to F ), sending the basepoint on the boundary to the basepoint of F ⊂ E. Thus the map on ∂D n = S n−1 , the (n − 1)-sphere, defines an element of πn−1 (F ). This is ∂∗ [ϕ]. The covering homotopy property of a locally trivial bundle shows that this is well-defined. Version: 3 Owner: bwebste Author(s): bwebste
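A classical application, added here as an illustration (it is not in the original entry), is the Hopf fibration:

```latex
% Hopf fibration S^1 \hookrightarrow S^3 \to S^2.  Since \pi_n(S^1) = 0
% for n \ge 2, the long exact sequence
\pi_n(S^1) \to \pi_n(S^3) \to \pi_n(S^2) \to \pi_{n-1}(S^1)
% gives \pi_n(S^3) \cong \pi_n(S^2) for n \ge 3; in particular
% \pi_3(S^2) \cong \pi_3(S^3) \cong \mathbb{Z}.  The segment
% \pi_2(S^3) = 0 \to \pi_2(S^2) \to \pi_1(S^1) = \mathbb{Z} \to \pi_1(S^3) = 0
% likewise shows \pi_2(S^2) \cong \mathbb{Z}.
```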


Chapter 602 55Q52 – Homotopy groups of special spaces

602.1 contractible

A topological space is said to be contractible if it is homotopy equivalent to a point. Equivalently, the space is contractible if the constant map is homotopic to the identity map. A contractible space has a trivial fundamental group. Version: 3 Owner: RevBobo Author(s): RevBobo


Chapter 603 55R05 – Fiber spaces

603.1 classification of covering spaces

Let X be a connected, locally path connected and semilocally simply connected space. Assume furthermore that X has a basepoint ∗.

A covering p : E → X is called based if E is endowed with a basepoint e and p(e) = ∗. Two based coverings pi : (Ei , ei ) → (X, ∗), i = 1, 2, are called equivalent if there is a basepoint preserving equivalence T : (E1 , e1 ) → (E2 , e2 ) that covers the identity, i.e. T is a homeomorphism and p2 ◦ T = p1 .

Theorem 1 (Classification of connected coverings). – Equivalence classes of based coverings p : (E, e) → (X, ∗) with connected total space E are in bijective correspondence with subgroups of the fundamental group π1 (X, ∗). The bijection assigns to the based covering p the subgroup p∗ (π1 (E, e)).

– Equivalence classes of coverings (not based) are in bijective correspondence with conjugacy class of subgroups of π1 (X, ∗).

Under the bijection of the above theorem normal coverings correspond to normal subgroups of π1 (X, ∗); in particular the universal covering π̃ : X̃ → X corresponds to the trivial subgroup, while the trivial covering id : X → X corresponds to the whole group.

[Rough sketch of proof] We describe the based version. Clearly the set of equivalences of two based coverings forms a torsor of the group of deck transformations Aut(p). From our discussion of that group it then follows that equivalent (based) coverings give the same subgroup. Thus the map is well defined. To see that it is a bijection, construct its inverse as follows: there is a universal covering π̃ : X̃ → X, and a subgroup π of π1 (X, ∗) acts on X̃ by the restriction of the monodromy action. The covering which corresponds to π is then X̃/π. Version: 1 Owner: Dr Absentius Author(s): Dr Absentius

603.2

covering space

Let X and E be topological spaces and suppose there is a (continuous) map p : E → X which satisfies the following condition: for each x ∈ X, there is an open neighborhood U of x such that p−1 (U) is a union of disjoint sets Ei ⊂ E and each Ei is mapped homeomorphically onto U via p. Then E is called a covering space, p is a covering map, the Ei ’s are sheets of the covering and for each x ∈ X, p−1 (x) is the fiber of p above x. The open set U is said to be evenly covered. If E is simply connected, it is called the universal covering space. From this we can derive that p is a local homeomorphism, so that any local property E has is inherited by X (local connectedness, local path connectedness etc.). Covering spaces are foundational in the study of the fundamental group of a topological space. Version: 10 Owner: matte Author(s): RevBobo

603.3

deck transformation

Let p : E → X be a covering map. A deck transformation or covering transformation is a map D : E → E such that p ◦ D = p, that is, such that the triangle formed by D : E → E and the two copies of p : E → X commutes. It is straightforward to check that the set of deck transformations is closed under composition and the operation of taking inverses. Therefore the set of deck transformations is a subgroup of the group of homeomorphisms of E. This group will be denoted by Aut(p) and referred to as the group of deck transformations or as the automorphism group of p. In the more general context of fiber bundles deck transformations correspond to isomorphisms over the identity, since the above triangle can be expanded to a square: D : E → E lying over id : X → X, with p as the projection on both sides.

An isomorphism not necessarily over the identity is called an equivalence. In other words an equivalence between two covering maps p : E → X and p′ : E ′ → X ′ is a pair of maps (f˜, f ), with f˜ : E ′ → E and f : X ′ → X, that make the corresponding square commute, i.e. such that p ◦ f˜ = f ◦ p′ .

Deck transformations should be perceived as the symmetries of p (hence the notation Aut(p)), and therefore they should be expected to preserve any concept that is defined in terms of p. Most of what follows is an instance of this meta–principle.

Properties of deck transformations For this section we assume that the total space E is connected and locally path connected. Notice that a deck transformation is a lifting of p : E → X and therefore (according to the lifting theorem) it is uniquely determined by the image of a point. In other words: Proposition 3. Let D1 , D2 ∈ Aut(p). If there is e ∈ E such that D1 (e) = D2 (e) then D1 = D2 . In particular if D1 (e) = e for some e ∈ E then D1 = id. Another simple (or should I say double?) application of the lifting theorem gives Proposition 4. Given e, e′ ∈ E with p(e) = p(e′ ), there is a D ∈ Aut(p) such that D(e) = e′ if and only if p∗ (π1 (E, e)) = p∗ (π1 (E, e′ )), where p∗ denotes π1 (p). Proposition 5. Deck transformations commute with the monodromy action. That is, if ∗ ∈ X, e ∈ p−1 (∗), γ ∈ π1 (X, ∗) and D ∈ Aut(p) then D(e · γ) = D(e) · γ, where · denotes the monodromy action. If γ˜ is a lifting of γ starting at e, then D ◦ γ˜ is a lifting of γ starting at D(e).

We simplify notation by using πe to denote the fundamental group π1 (E, e) for e ∈ E. Theorem 26. For all e ∈ E, Aut(p) ≅ N (p∗ (πe )) /p∗ (πe ), where N(p∗ πe ) denotes the normalizer of p∗ πe inside π1 (X, p(e)).


Denote N(p∗ πe ) by N. Note that if γ ∈ N then p∗ (πe·γ ) = p∗ (πe ). Indeed, recall that p∗ (πe ) is the stabilizer of e under the monodromy action and therefore we have

p∗ (πe·γ ) = Stab(e · γ) = γ Stab(e)γ −1 = γp∗ (πe )γ −1 = p∗ (πe ),

where the last equality follows from the definition of the normalizer. One can then define a map ϕ : N → Aut(p)

as follows: for γ ∈ N let ϕ(γ) be the deck transformation that maps e to e · γ. Notice that Proposition 4 ensures the existence of such a deck transformation while Proposition 3 guarantees its uniqueness. Now
– ϕ is a homomorphism. Indeed ϕ(γ1 γ2 ) and ϕ(γ1 ) ◦ ϕ(γ2 ) are deck transformations that map e to e · (γ1 γ2 ).
– ϕ is onto. Indeed given D ∈ Aut(p), since E is path connected one can find a path α in E connecting e and D(e). Then p ◦ α is a loop in X and D = ϕ(p ◦ α).
– ker(ϕ) = p∗ (πe ). Obvious.
Therefore the theorem follows by the first isomorphism theorem. Corollary 2. If p is a regular covering then Aut(p) ≅ π1 (X, ∗)/p∗ (π1 (E, e)). Corollary 3. If p is the universal cover then Aut(p) ≅ π1 (X, ∗). Version: 11 Owner: mathcam Author(s): mathcam, Dr Absentius, rmilson

603.4

lifting of maps

Let p : E → B and f : X → B be (continuous) maps. Then a lifting of f to E is a (continuous) map f˜ : X → E such that p ◦ f˜ = f . The terminology is justified by the commutative triangle formed by f˜ : X → E, the projection p : E → B, and f : X → B,

which expresses this definition. f˜ is also said to lift f or to be over f . This notion is especially useful if p : E → B is a fiber bundle. In particular lifting of paths is instrumental in the investigation of covering spaces. This terminology is used in more general contexts: X, E and B could be objects (and p, f and f˜ be morphisms) in any category. Version: 2 Owner: Dr Absentius Author(s): Dr Absentius

603.5

lifting theorem

Let p : E → B be a covering map and f : X → B be a (continuous) map, where X, B and E are path connected and locally path connected. Also let x ∈ X and e ∈ E be points such that f (x) = p(e). Then f lifts to a map f˜ : X → E with f˜(x) = e, if and only if, π1 (f ) maps π1 (X, x) inside the image π1 (p) (π1 (E, e)), where π1 denotes the fundamental group functor. Furthermore f˜ is unique (provided it exists of course). In other words, to check whether a lifting f˜ : (X, x) → (E, e) of f : (X, x) → (B, b) along p : (E, e) → (B, b) exists, one only needs to check that

π1 (f ) (π1 (X, x)) ⊂ π1 (p) (π1 (E, e))

inside π1 (B, b).

Corollary 4. Every map from a simply connected space X lifts. In particular: 1. a path γ : I → B lifts,

2. a homotopy of paths H : I × I → B lifts, and

3. a map σ : S n → B lifts if n ≥ 2.

Note that (3) is not true for n = 1 because the circle is not simply connected. So although by (1) every closed path in B lifts to a path in E it does not necessarily lift to a closed path. Version: 3 Owner: Dr Absentius Author(s): Dr Absentius

603.6

monodromy

Let (X, ∗) be a connected and locally connected based space and p : E → X a covering map. We will denote p−1 (∗), the fiber over the basepoint, by F , and the fundamental group π1 (X, ∗) by π. Given a loop γ : I → X with γ(0) = γ(1) = ∗ and a point e ∈ F there exists a unique γ˜ : I → E, with γ˜ (0) = e such that p ◦ γ˜ = γ, that is, a lifting of γ starting at e. Clearly, the endpoint γ˜ (1) is also a point of the fiber, which we will denote by e · γ.

Theorem 27. With notation as above we have:

1. If γ1 and γ2 are homotopic relative ∂I then e · γ1 = e · γ2 for all e ∈ F .
2. The map F × π → F , (e, γ) 7→ e · γ, defines a right action of π on F .
3. The stabilizer of a point e is the image of the fundamental group π1 (E, e) under the map induced by p: Stab(e) = p∗ (π1 (E, e)) .

(1) Let e ∈ F , γ1 , γ2 : I → X two loops homotopic relative ∂I, and γ˜1 , γ˜2 : I → E their liftings starting at e. Then there is a homotopy H : I × I → X with the following properties:
– H(•, 0) = γ1 ,
– H(•, 1) = γ2 ,
– H(0, t) = H(1, t) = ∗ for all t ∈ I.
According to the lifting theorem H lifts to a homotopy H̃ : I × I → E with H̃(0, 0) = e. Notice that H̃(•, 0) = γ˜1 (respectively H̃(•, 1) = γ˜2 ) since they both are liftings of γ1 (respectively γ2 ) starting at e. Also notice that H̃(1, •) is a path that lies entirely in the fiber (since it lifts the constant path ∗). Since the fiber is discrete this means that H̃(1, •) is a constant path. In particular H̃(1, 0) = H̃(1, 1), or equivalently γ˜1 (1) = γ˜2 (1).
(2) By (1) the map is well defined. To prove that it is an action notice that firstly the constant path ∗ lifts to constant paths and therefore e · 1 = e for all e ∈ F . Secondly the concatenation of two paths lifts to the concatenation of their liftings (as is easily verified by projecting). In other words, the lifting of γ1 γ2 that starts at e is the concatenation of γ˜1 , the lifting of γ1 that starts at e, and γ˜2 , the lifting of γ2 that starts at γ˜1 (1). Therefore e · (γ1 γ2 ) = (e · γ1 ) · γ2 .
(3) This is a tautology: γ fixes e if and only if its lifting starting at e is a loop.

Definition 51. The action described in the above theorem is called the monodromy action and the corresponding homomorphism ρ : π → Sym(F ) is called the monodromy of p. Version: 4 Owner: mathcam Author(s): mathcam, Dr Absentius
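A finite model may help; the following sketch is not from the original entry. For the n-fold cover p : S 1 → S 1 , p(z) = z n , the fiber over the basepoint is the set of n-th roots of unity, indexed by Z/nZ, and the generator of π1 (S 1 ) ≅ Z acts by e 7→ e + 1 (mod n):

```python
n = 6  # a made-up choice of covering degree

def act(e, gamma):
    """Right monodromy action of gamma in pi_1(S^1) = Z on the fiber Z/nZ."""
    return (e + gamma) % n

# e . 1 = e  and  e . (g1 g2) = (e . g1) . g2:
assert all(act(e, 0) == e for e in range(n))
assert all(act(act(e, 2), 3) == act(e, 2 + 3) for e in range(n))

# The action is transitive (the total space of this cover is connected),
# and the stabilizer of every point is nZ = p_*(pi_1(S^1, e)):
assert {act(0, g) for g in range(n)} == set(range(n))
assert all((act(e, g) == e) == (g % n == 0)
           for e in range(n) for g in range(-12, 13))
print("monodromy of the degree-6 cover behaves as expected")
```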

603.7

properly discontinuous action

Let G be a group and E a topological space on which G acts by homeomorphisms, that is, there is a homomorphism ρ : G → Aut(E), where the latter denotes the group of self-homeomorphisms of E. The action is said to be properly discontinuous if each point e ∈ E has a neighborhood U with the property that all non-trivial elements of G move U outside itself:

∀g ∈ G, g ≠ id ⇒ gU ∩ U = ∅.

For example, let p : E → X be a covering map; then the group of deck transformations of p acts properly discontinuously on E. Indeed if e ∈ E and D ∈ Aut(p) then one can take as U any neighborhood of e with the property that p(U) is evenly covered. The following shows that this is the only example: Theorem 2. Assume that E is a connected and locally path connected Hausdorff space. If the group G acts properly discontinuously on E then the quotient map p : E → E/G is a covering map and Aut(p) = G. Version: 5 Owner: Dr Absentius Author(s): Dr Absentius

603.8

regular covering

Theorem 28. Let p : E → X be a covering map where E and X are connected and locally path connected and let X have a basepoint ∗. The following are equivalent: 1. The action of Aut(p), the group of covering transformations of p, is transitive on the fiber p−1 (∗), 2. for some e ∈ p−1 (∗), p∗ (π1 (E, e)) is a normal subgroup of π1 (X, ∗), where p∗ denotes π1 (p), 3. ∀e, e0 ∈ p−1 (∗),

p∗ (π1 (E, e)) = p∗ (π1 (E, e0 )),

4. there is a discrete group G such that p is a principal G-bundle.


All the elements for the proof of this theorem are contained in the articles about the monodromy action and the deck transformations. Definition 52. A covering with the properties described in the previous theorem is called a regular or normal covering. The term Galois covering is also used sometimes. Version: 2 Owner: Dr Absentius Author(s): Dr Absentius


Chapter 604 55R10 – Fiber bundles

604.1 associated bundle construction

Let G be a topological group, π : P → X a (right) principal G-bundle, F a topological space and ρ : G → Aut(F ) a representation of G as homeomorphisms of F . Then the fiber bundle associated to P by ρ is a fiber bundle πρ : P ×ρ F → X with fiber F and group G that is defined as follows: – The total space is defined as P ×ρ F := P × F/G where the (left) action of G on P × F is defined by g · (p, f ) := (pg −1 , ρ(g)(f )), ∀g ∈ G, p ∈ P, f ∈ F .

– The projection πρ is defined by

πρ [p, f ] := π(p) , where [p, f ] denotes the G–orbit of (p, f ) ∈ P × F . Theorem 29. The above is well defined and defines a G–bundle over X with fiber F . Furthermore P ×ρ F has the same transition functions as P . [Sketch of proof] To see that πρ is well defined just notice that for p ∈ P and g ∈ G, π(pg) = π(p). To see that the fiber is F notice that since the principal action is simply transitive, given p ∈ P any orbit of the G–action on P × F contains a unique representative of the form (p, f ) for some f ∈ F . It is clear that an open cover that trivializes P trivializes P ×ρ F as well. To see that P ×ρ F has the same transition functions as P notice that transition functions of P act on the left and thus commute with the principal G–action on P .


Notice that if G is a Lie group, P a smooth principal bundle and F is a smooth manifold and ρ maps inside the diffeomorphism group of F , the above construction produces a smooth bundle. Also quite often F has extra structure and ρ maps into the homeomorphisms of F that preserve that structure. In that case the above construction produces a “bundle of such structures.” For example when F is a vector space and ρ(G) ⊂ GL(F ), i.e. ρ is a linear representation of G we get a vector bundle; if ρ(G) ⊂ SL(F ) we get an oriented vector bundle, etc. Version: 2 Owner: Dr Absentius Author(s): rmilson, Dr Absentius

604.2 bundle map

Let π1 : E1 → B1 and π2 : E2 → B2 be fiber bundles for which there is a continuous map f : B1 → B2 of base spaces. A bundle map (or bundle morphism) is a commutative square, i.e. a map f˜ : E1 → E2 with π2 ◦ f˜ = f ◦ π1 , such that the induced map E1 → f −1 E2 is a homeomorphism (here f −1 E2 denotes the pullback of f along the bundle projection π2 ). Version: 2 Owner: RevBobo Author(s): RevBobo

604.3

fiber bundle

Let F be a topological space and G be a topological group which acts on F on the left. A fiber bundle with fiber F and structure group G consists of the following data: – a topological space B called the base space, a space E called the total space and a continuous surjective map π : E → B called the projection of the bundle,

– an open cover {Ui } of B along with a collection of continuous maps {φi : π −1 Ui → F } called local trivializations, and
– a collection of continuous maps {gij : Ui ∩ Uj → G} called transition functions

which satisfy the following properties

1. the map π −1 Ui → Ui × F given by e 7→ (π(e), φi (e)) is a homeomorphism for each i,
2. for all indices i, j and e ∈ π −1 (Ui ∩ Uj ), gji (π(e)) · φi (e) = φj (e), and
3. for all indices i, j, k and b ∈ Ui ∩ Uj ∩ Uk , gij (b)gjk (b) = gik (b).

ˇ Readers familiar with Cech cohomology may recognize condition 3), it is often called the cocycle condition. Note, this imples that gii (b) is the identity in G for each b, and gij (b) = gji (b)−1 . If the total space E is homeomorphic to the product B×F so that the bundle projection is essentially projection onto the first factor, then π : E → B is called a trivial bundle. Some examples of fiber bundles are vector bundles and covering spaces. There is a notion of morphism of fiber bundles E, E 0 over the same base B with the same structure group G. Such a morphism is a G-equivariant map ξ : E → E 0 , making the following diagram commute ξ

E π

E0 . π0

B Thus we have a category of fiber bundles over a fixed base with fixed structure group. Version: 7 Owner: bwebste Author(s): bwebste, RevBobo
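The consequence gij (b) = gji (b)−1 of the cocycle condition can be checked numerically in a concrete case. The sketch below (an illustration of my own, not part of the entry) uses the transition function of the tangent bundle of S 2 with respect to its two stereographic charts: the chart change is z 7→ 1/z on C \ {0}, so the transition function is its Jacobian, complex multiplication by −1/z 2 viewed as a real 2 × 2 matrix. The names g12, g21 are hypothetical.

```python
import numpy as np

def complex_to_matrix(w):
    # multiplication by the complex number w, as a real 2x2 matrix
    return np.array([[w.real, -w.imag], [w.imag, w.real]])

def g21(z):
    # Jacobian of the chart change z -> 1/z, at the chart-1 coordinate z
    return complex_to_matrix(-1 / z**2)

def g12(w):
    # Jacobian of w -> 1/w, at the chart-2 coordinate w = 1/z of the same base point
    return complex_to_matrix(-1 / w**2)

for z in [1 + 2j, -0.5 + 0.3j, 3 - 1j]:
    w = 1 / z
    # g12(b) g21(b) should be the identity at every point b of the overlap
    assert np.allclose(g12(w) @ g21(z), np.eye(2))
```

With only two charts the triple-overlap condition 3 reduces to exactly this pairwise identity.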

604.4

locally trivial bundle

A locally trivial bundle is a map π : E → B of topological spaces satisfying the following condition: for any x ∈ B, there is a neighborhood U ∋ x and a homeomorphism g : π −1 (U) → U × π −1 (x) such that the diagram

π −1 (U) −−g−→ U × π −1 (x)
     π ↘          ↙ projection
            U

commutes. Locally trivial bundles are useful because of their covering homotopy property, and because of the associated long exact sequence and Serre spectral sequence. Version: 4 Owner: bwebste Author(s): bwebste

604.5

principal bundle

Let E be a topological space on which a topological group G acts continuously and freely. The map π : E → E/G = B is called a principal bundle (or principal G-bundle) if the projection map π : E → B is a locally trivial bundle.

Any principal bundle with a section σ : B → E is trivial, since the map φ : B × G → E given by φ(b, g) = g · σ(b) is an isomorphism. In particular, any G-bundle which is topologically trivial is also isomorphic to B × G as a G-space. Thus any local trivialization of π : E → B as a topological bundle is an equivariant trivialization.

Version: 5 Owner: bwebste Author(s): bwebste, RevBobo

604.6

pullback bundle

If π : E → B is a bundle and f : B ′ → B is an arbitrary continuous map, then there exists a pullback, or induced, bundle f ∗ (π) : E ′ → B ′ , where

E ′ = {(e, b′ ) ∈ E × B ′ | f (b′ ) = π(e)},

and f ∗ (π) is the restriction to E ′ of the projection map E × B ′ → B ′ . There is a natural bundle map from f ∗ (π) to π covering f , with the map ϕ : E ′ → E given by the restriction of the projection E × B ′ → E. If π is locally trivial, a principal G-bundle, or a fiber bundle, then f ∗ (π) is as well. The pullback satisfies the following universal property:

X ⇢ E ′ −−ϕ−→ E
  ↘    ↓ f ∗π       ↓ π
       B ′ −−f−→ B

(i.e. given a commutative diagram with the solid arrows, a map satisfying the dashed arrow exists). Version: 4 Owner: bwebste Author(s): bwebste
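At the level of sets the pullback construction is very concrete. The following finite toy model (my own sketch; it ignores all topology, and the sets and maps are hypothetical) builds E ′ = {(e, b′ ) : f (b′ ) = π(e)} and checks that the fiber of f ∗ (π) over b′ matches the fiber of π over f (b′ ).

```python
# Finite set-level model of the pullback bundle construction.
E = {0, 1, 2, 3}              # total space
B = {0, 1}                    # base
pi = lambda e: e % 2          # projection with two-point fibers
Bp = {"a", "b", "c"}          # hypothetical new base B'
f = {"a": 0, "b": 1, "c": 0}  # a map f : B' -> B

# E' = {(e, b') : f(b') = pi(e)}, with projection f*(pi) : (e, b') -> b'
Ep = {(e, bp) for e in E for bp in Bp if f[bp] == pi(e)}
f_star_pi = lambda pair: pair[1]

# the fiber of f*(pi) over b' is a copy of the fiber of pi over f(b')
for bp in Bp:
    fiber = {e for (e, b2) in Ep if b2 == bp}
    assert fiber == {e for e in E if pi(e) == f[bp]}
```

In particular the pulled-back "bundle" has the same two-point fibers as the original one, as the statement about local triviality being preserved suggests.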

604.7

reduction of structure group

Given a fiber bundle p : E → B with typical fiber F and structure group G (henceforth called an (F, G)-bundle over B), we say that the bundle admits a reduction of its structure group to H, where H < G is a subgroup, if it is isomorphic to an (F, H)-bundle over B. Equivalently, E admits a reduction of structure group to H if there is a choice of local trivializations covering E such that the transition functions all take values in H. Remark 5. Here, the action of H on F is the restriction of the G-action; in particular, this means that an (F, H)-bundle is automatically an (F, G)-bundle. The bundle isomorphism in the definition then becomes meaningful in the category of (F, G)-bundles over B.

Example 1. Let H be the trivial subgroup. Then the existence of a reduction of structure group to H is equivalent to the bundle being trivial.

For the following examples, let E be an n-dimensional vector bundle, so that F ≅ Rn with G = GL(n, R), the general linear group acting as usual.

Example 2. Set H = SL(n, R), the special linear group. A reduction to H is equivalent to an orientation of the vector bundle. In the case where B is a smooth manifold and E = T B is its tangent bundle, this coincides with other definitions of an orientation of B.

Example 3. Set H = O(n), the orthogonal group. A reduction to H is called a Riemannian or Euclidean structure on the vector bundle. It coincides with a continuous fiberwise choice of a positive definite inner product, and for the case of the tangent bundle, with the usual notion of a Riemannian metric on a manifold. When B is paracompact, an argument with partitions of unity shows that a Riemannian structure always exists on any given vector bundle. For this reason, it is often convenient to start out assuming the structure group to be O(n).

Example 4. Let n = 2m be even, and let H = U(m), the unitary group, embedded in GL(n, R) by means of the usual identification of Cm with R2m . A reduction to H is called a complex structure on the vector bundle, and it is equivalent to a continuous fiberwise choice of an endomorphism J satisfying J 2 = −I. A complex structure on a tangent bundle is called an almost-complex structure on the manifold. This is to distinguish it from the more restrictive notion of a complex structure on a manifold, which requires the existence of an atlas with charts in Cm such that the transition functions are holomorphic.

Example 5. Let H = GL(1, R) × GL(n − 1, R), embedded in GL(n, R) by (A, B) 7→ A ⊕ B. A reduction to H is equivalent to the existence of a splitting E ≅ E1 ⊕ E2 , where E1 is a line bundle, or equivalently to the existence of a nowhere-vanishing section of the vector bundle. For the tangent bundle, this is a nowhere-vanishing tangent vector field. More generally, a reduction to GL(k, R) × GL(n − k, R) is equivalent to a splitting E ≅ E1 ⊕ E2 , where E1 is a k-plane bundle.

Remark 6. These examples all have two features in common, namely:

– the subgroup H can be interpreted as being precisely the subgroup of G which preserves a particular structure, and

– a reduction to H is equivalent to a continuous fiber-by-fiber choice of a structure of the same kind.

For example, O(n) is the subgroup of GL(n, R) which preserves the standard inner product of Rn , and reduction of structure to O(n) is equivalent to a fiberwise choice of inner products. This is not a coincidence. The intuition behind this is as follows. There is no obstacle to choosing a fiberwise inner product in a neighborhood of any given point x ∈ B: we simply choose a neighborhood U on which the bundle is trivial, and with respect to a trivialization p−1 (U) ≅ Rn × U, we can let the inner product on each p−1 (y) be the standard inner product. However, if we make these choices locally around every point in B, there is no guarantee that they “glue together” properly to yield a global continuous choice, unless the transition functions preserve the standard inner product. But this is precisely what reduction of structure to O(n) means. A similar explanation holds for subgroups preserving other kinds of structure. Version: 4 Owner: antonio Author(s): antonio
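The statement that O(n) is exactly the subgroup of GL(n, R) preserving the standard inner product is easy to test numerically. The sketch below (my own illustration; the specific matrices and vectors are arbitrary choices) compares an orthogonal matrix, obtained from a QR factorization, with an invertible but non-orthogonal one.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
Q, _ = np.linalg.qr(A)        # Q is orthogonal: Q^T Q = I
u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 1.0, 0.0])

# an element of O(3) preserves the standard inner product...
assert np.isclose((Q @ u) @ (Q @ v), u @ v)

# ...while a generic element of GL(3, R) does not
M = np.diag([2.0, 1.0, 1.0])
assert not np.isclose((M @ u) @ (M @ v), u @ v)
```

This is the fiberwise computation that a reduction to O(n) makes consistent across overlapping trivializations.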

604.8

section of a fiber bundle

Let p : E → B be a fiber bundle, denoted by ξ.

A section of ξ is a continuous map s : B → E such that the composition p ◦ s equals the identity. That is, for every b ∈ B, s(b) is an element of the fiber over b. More generally, given a topological subspace A of B, a section of ξ over A is a section of the restricted bundle p|A : p−1 (A) → A.

The set of sections of ξ over A is often denoted by Γ(A; ξ), or by Γ(ξ) for sections defined on all of B. Elements of Γ(ξ) are sometimes called global sections, in contrast with the local sections Γ(U; ξ) defined on an open set U.

Remark 7. If E and B have, for example, smooth structures, one can talk about smooth sections of the bundle. According to the context, the notation Γ(ξ) often denotes smooth sections, or some other set of suitably restricted sections.

Example 6. If ξ is a trivial fiber bundle with fiber F, so that E = F × B and p is projection to B, then sections of ξ are in a natural bijective correspondence with continuous functions B → F.

Example 7. If B is a smooth manifold and E = T B its tangent bundle, a (smooth) section of this bundle is precisely a (smooth) tangent vector field.

In fact, any tensor field on a smooth manifold M is a section of an appropriate vector bundle. For instance, a contravariant k-tensor field is a section of the bundle T M ⊗k obtained by repeated tensor product from the tangent bundle, and similarly for covariant and mixed tensor fields.

Example 8. If B is a smooth manifold which is smoothly embedded in a Riemannian manifold M, we can let the fiber over b ∈ B be the orthogonal complement in Tb M of the tangent space Tb B of B at b. These choices of fiber turn out to make up a vector bundle ν(B) over B, called the normal bundle of B. A section of ν(B) is a normal vector field on B.

Example 9. If ξ is a vector bundle, the zero section is defined simply by s(b) = 0, the zero vector on the fiber. It is interesting to ask if a vector bundle admits a section which is nowhere zero. The answer is yes, for example, in the case of a trivial vector bundle, but in general it depends on the topology of the spaces involved. A well-known case of this question is the hairy ball theorem, which says that there are no nonvanishing tangent vector fields on the sphere.

Example 10. If ξ is a principal G-bundle, the existence of any section is equivalent to the bundle being trivial.

Remark 8. The correspondence taking an open set U in B to Γ(U; ξ) is an example of a sheaf on B. Version: 6 Owner: antonio Author(s): antonio
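The correspondence of Example 6 between sections of a trivial bundle and functions B → F can be modelled at the level of sets. In this sketch (my own; the finite sets and the function g are arbitrary) a function g : B → F is turned into the section s(b) = (g(b), b), and we check that p ◦ s is the identity.

```python
# Finite toy model: sections of a trivial bundle E = F x B <-> functions B -> F.
B = {0, 1, 2}
F = {"x", "y"}
p = lambda e: e[1]                   # projection (f, b) -> b

def section_from_function(g):
    # the section corresponding to a function g : B -> F
    return lambda b: (g(b), b)

g = {0: "x", 1: "y", 2: "x"}         # an arbitrary function B -> F
s = section_from_function(lambda b: g[b])

for b in B:
    assert p(s(b)) == b              # p o s = identity, so s is a section
```

Conversely, composing any section with the projection onto the F factor recovers a function B → F, which is the bijection the example describes.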

604.9

some examples of universal bundles

The universal bundle for a topological group G is usually written as π : EG → BG. Any principal G-bundle for which the total space is contractible is universal; this will help us to find universal bundles without worrying about Milnor’s construction of EG involving infinite joins.

– G = Z2 : EZ2 = S ∞ and BZ2 = RP ∞ .

– G = Zn , the cyclic group of order n: EZn = S ∞ and BZn = S ∞ /Zn . Here Zn acts on S ∞ (considered as a subset of separable complex Hilbert space) via multiplication with an n-th root of unity.

– G = Zn , the free abelian group of rank n: EZn = Rn and BZn = T n , the n-torus.

– More generally, if G is any discrete group then one can take BG to be any Eilenberg–Mac Lane space K(G, 1) and EG to be its universal cover. Indeed EG is simply connected, and it follows from the lifting theorem that πn (EG) = 0 for n > 0. This example includes the previous three and many more.

– G = S 1 : ES 1 = S ∞ and BS 1 = CP ∞ .

– G = SU(2): ESU(2) = S ∞ and BSU(2) = HP ∞ .

– G = O(n), the n-th orthogonal group: EO(n) = V (∞, n), the Stiefel manifold of frames of n orthonormal vectors in R∞ , and BO(n) = G(∞, n), the Grassmannian of n-planes in R∞ . The projection map takes a frame of vectors to the subspace it spans.

Version: 7 Owner: bwebste Author(s): AxelBoldt, mhale, Dr Absentius, bwebste

604.10

universal bundle

Let G be a topological group. A universal bundle for G is a principal G-bundle p : EG → BG such that for any principal G-bundle π : E → B, with B a CW-complex, there is a map ϕ : B → BG, unique up to homotopy, such that the pullback bundle ϕ∗ (p) is equivalent to π; that is, such that there is a bundle map (ϕ′ , ϕ) from π to p,

E −−ϕ′−→ EG
π ↓           ↓ p
B −−ϕ−→ BG

with the property that any bundle map from any bundle over B extending ϕ factors uniquely through (ϕ′ , ϕ). As is obvious from the universal property, the universal bundle for a group G is unique up to unique homotopy equivalence.

The base space BG is often called a classifying space of G, since homotopy classes of maps to it from a given space classify G-bundles over that space. There is a useful criterion for universality: a bundle is universal if and only if all the homotopy groups of its total space EG are trivial.

In 1956, John Milnor gave a general construction of the universal bundle for any topological group G (see Annals of Mathematics, Second series, Volume 63, Issues 2 and 3 for details). His construction uses the infinite join of the group G with itself to define the total space of the universal bundle. Version: 9 Owner: bwebste Author(s): bwebste, RevBobo


Chapter 605 55R25 – Sphere bundles and vector bundles 605.1

Hopf bundle

Consider S 3 ⊂ R4 = C2 . The structure of C2 gives a map C2 − {0} → CP 1 , the complex projective line, by the natural projection. Since CP 1 is homeomorphic to S 2 , by restriction to S 3 we get a map π : S 3 → S 2 . We call this the Hopf bundle. This is a principal S 1 -bundle, and a generator of π3 (S 2 ). From the long exact sequence of the bundle

· · · → πn (S 1 ) → πn (S 3 ) → πn (S 2 ) → πn−1 (S 1 ) → · · ·

we get that πn (S 3 ) ≅ πn (S 2 ) for all n ≥ 3. In particular, π3 (S 2 ) ≅ π3 (S 3 ) ≅ Z. Version: 2 Owner: bwebste Author(s): bwebste
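The Hopf map admits a standard explicit formula, not given in the entry: h(z1 , z2 ) = (2 Re(z1 z̄2 ), 2 Im(z1 z̄2 ), |z1 |2 − |z2 |2 ). The sketch below (my own numerical check) verifies two of its defining features: it sends points of S 3 to points of S 2 , and it is constant on the circles (eit z1 , eit z2 ), which are the fibers.

```python
import numpy as np

def hopf(z1, z2):
    # an explicit formula for the Hopf map S^3 -> S^2
    w = 2 * z1 * np.conj(z2)
    return np.array([w.real, w.imag, abs(z1)**2 - abs(z2)**2])

rng = np.random.default_rng(1)
for _ in range(5):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)                   # a random point of S^3 in R^4 = C^2
    z1, z2 = complex(v[0], v[1]), complex(v[2], v[3])
    p = hopf(z1, z2)
    assert np.isclose(np.linalg.norm(p), 1)  # the image lies on S^2
    # the whole circle (e^{it} z1, e^{it} z2) maps to the same point: fiber = S^1
    t = 0.7
    assert np.allclose(hopf(np.exp(1j*t) * z1, np.exp(1j*t) * z2), p)
```

The algebraic reason for the first assertion is |h|2 = 4|z1 |2 |z2 |2 + (|z1 |2 − |z2 |2 )2 = (|z1 |2 + |z2 |2 )2 = 1 on S 3 .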

605.2

vector bundle

A vector bundle is a fiber bundle having a vector space as a fiber and the general linear group of that vector space (or some subgroup) as structure group. Common examples of a vector bundle include the tangent bundle of a differentiable manifold. Version: 1 Owner: RevBobo Author(s): RevBobo


Chapter 606 55U10 – Simplicial sets and complexes 606.1

simplicial complex

An abstract simplicial complex K is a collection of nonempty finite sets with the property that for any element σ ∈ K, if τ ⊂ σ is a nonempty subset, then τ ∈ K. An element of K of cardinality n + 1 is called an n–simplex. An element of an element of K is called a vertex. In what follows, we may occasionally identify a vertex V with its corresponding singleton set {V } ∈ K; the reader will be alerted when this is the case. Although there is an established notion of infinite simplicial complexes, the treatment is much simpler in the finite case and so for now we will assume that K is a finite set. The standard n–simplex, denoted by ∆n , is the simplicial complex consisting of all nonempty subsets of {0, 1, . . . , n}.

606.1.1

Geometry of a simplicial complex

Let K be a simplicial complex, and let V be the set of vertices of K. We introduce the vector space RV of formal R–linear combinations of elements of V ; i.e.,

RV := {a1 V1 + a2 V2 + · · · + ak Vk | ai ∈ R, Vi ∈ V },

and the vector space operations are defined by formal addition and scalar multiplication. Note that we may regard each vertex in V as a one–term formal sum, and thus as a point in RV . The geometric realization of K, denoted |K|, is the subset of RV consisting of the union, over all σ ∈ K, of the convex hull of σ ⊂ RV . The set |K| inherits a metric from RV , making it into a metric space and topological space.

Examples:

1. ∆2 = {{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}} has |V | = 3, so its realization |∆2 | is a subset of R3 , consisting of all points on the hyperplane x + y + z = 1 that lie inside or on the boundary of the first octant. These points form a triangle in R3 with one face, three edges, and three vertices (for example, the convex hull of {0, 1} ∈ ∆2 is the edge of this triangle that lies in the xy–plane).

2. Similarly, the realization of the standard n–simplex ∆n is an n–dimensional tetrahedron contained inside Rn+1 .

3. A triangle without interior (a “wire frame” triangle) can be geometrically realized by starting from the simplicial complex {{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}}.

Notice that, under this procedure, an element of K of cardinality 1 is geometrically a vertex; an element of cardinality 2 is an edge; cardinality 3, a face; and, in general, an element of cardinality n is realized as an (n − 1)–face inside RV . In general, a triangulation of a topological space X is a simplicial complex K together with a homeomorphism from |K| to X.

606.1.2

Homology of a simplicial complex

In this section we define the homology and cohomology groups associated to a simplicial complex K. We do so not because the homology of a simplicial complex is so intrinsically interesting in and of itself, but because the resulting homology theory is identical to the singular homology of the associated topological space |K|, and therefore provides an accessible way to calculate the latter homology groups (and, by extension, the homology of any space X admitting a triangulation by K).

As before, let K be a simplicial complex, and let V be the set of vertices in K. Let the chain group Cn (K) be the subgroup of the exterior algebra Λ(RV ) generated by all elements of the form V0 ∧ V1 ∧ · · · ∧ Vn such that Vi ∈ V and {V0 , V1 , . . . , Vn } ∈ K. Note that we are ignoring here the R–vector space structure of RV ; the group Cn (K) under this definition is merely a free abelian group, generated by the alternating products of the above form and with the relations that are implied by the properties of the wedge product. Define the boundary map ∂n : Cn (K) −→ Cn−1 (K) by the formula

∂n (V0 ∧ V1 ∧ · · · ∧ Vn ) := ∑_{j=0}^{n} (−1)^j (V0 ∧ · · · ∧ V̂j ∧ · · · ∧ Vn ),

where the hat notation means the term under the hat is left out of the product, and extending linearly to all of Cn (K). Then one checks easily that ∂n−1 ◦ ∂n = 0, so the collection of chain groups Cn (K) and boundary maps ∂n forms a chain complex C(K). The simplicial homology and cohomology groups of K are defined to be those of C(K).

Theorem: The simplicial homology and cohomology groups of K, as defined above, are canonically isomorphic to the singular homology and cohomology groups of the geometric realization |K| of K.

The proof of this theorem is considerably more difficult than what we have done to this point, requiring the techniques of barycentric subdivision and simplicial approximation, and so for now we refer the reader who wants to learn more to [1].
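The definitions above translate directly into a small computation. The following Python sketch (my own illustration, not part of the entry) builds the boundary matrices over R, where the rank-nullity theorem gives the Betti numbers bn = dim Cn − rank ∂n − rank ∂n+1 ; for the “wire frame” triangle of the earlier example (a simplicial circle) it finds one connected component and one loop.

```python
import numpy as np

def boundary_matrix(simplices_n, simplices_nm1):
    # matrix of the boundary map d_n : C_n -> C_{n-1}, in bases of sorted simplices;
    # the (face, simplex) entry is (-1)^k for the face omitting the k-th vertex
    index = {s: i for i, s in enumerate(simplices_nm1)}
    D = np.zeros((len(simplices_nm1), len(simplices_n)))
    for j, s in enumerate(simplices_n):
        for k in range(len(s)):
            D[index[s[:k] + s[k+1:]], j] = (-1) ** k
    return D

def betti_numbers(K):
    # K: simplices as sorted vertex tuples, closed under taking faces
    by_dim = {}
    for s in K:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    ranks = {n: np.linalg.matrix_rank(
                 boundary_matrix(sorted(by_dim[n]), sorted(by_dim[n - 1])))
             for n in range(1, top + 1)}
    return [int(len(by_dim[n]) - ranks.get(n, 0) - ranks.get(n + 1, 0))
            for n in range(top + 1)]

# the "wire frame" triangle: a simplicial circle
circle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(betti_numbers(circle))  # [1, 1]: one component, one loop
```

Adding the 2-simplex (0, 1, 2) fills in the loop and the Betti numbers become [1, 0, 0], matching the solid triangle being contractible.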

REFERENCES 1. Munkres, James. Elements of Algebraic Topology, Addison–Wesley, New York, 1984.

Version: 2 Owner: djao Author(s): djao


Chapter 607 57-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 607.1

connected sum

Let M and N be two n-manifolds. Choose points m ∈ M and n ∈ N, and let U, V be neighborhoods of these points, respectively. Since M and N are manifolds, we may assume (as in Rn ) that U and V are balls, with boundaries homeomorphic to (n − 1)-spheres. Then let ϕ : ∂U → ∂V be a homeomorphism. If M and N are oriented, this should be orientation preserving with respect to the induced orientation (that is, degree 1). Then the connected sum M#N is M − U and N − V glued along the boundaries by ϕ. That is, it is the disjoint union of M − U and N − V modulo the equivalence relation x ∼ y if x ∈ ∂U, y ∈ ∂V and ϕ(x) = y. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 608 57-XX – Manifolds and cell complexes 608.1

CW complex

A Hausdorff topological space X is said to be a CW complex if it satisfies the following conditions:

1. There exists a filtration by subspaces

X (−1) ⊆ X (0) ⊆ X (1) ⊆ X (2) ⊆ · · ·

with X = ⋃_{n≥−1} X (n) .

2. X (−1) is empty, and, for n ≥ 0, X (n) is obtained from X (n−1) by attachment of a collection {enι : ι ∈ In } of n-cells.

3. (“closure-finite”) Every closed cell is contained in a finite union of open cells.

4. (“weak topology”) X has the weak topology with respect to the collection of all cells. That is, A ⊂ X is closed in X if and only if the intersection of A with every closed cell e is closed in e with respect to the subspace topology.

The letters ‘C’ and ‘W’ stand for “closure-finite” and “weak topology,” respectively. In particular, this means that one shouldn’t look too closely at the initials of J.H.C. Whitehead, who invented CW complexes.

The subspace X (n) is called the n-skeleton of X. Note that there normally are many possible choices of a filtration by skeleta for a given CW complex. A particular choice of skeleta and attaching maps for the cells is called a CW structure on the space. Intuitively, X is a CW complex if it can be constructed, starting from a discrete space, by first attaching one-cells, then two-cells, and so on. Note that the definition above does not allow

one to attach k-cells before h-cells if k > h. While some authors allow this in the definition, it seems to be common usage to restrict CW complexes to the definition given here, and to call a space constructed by cell attachment with unrestricted order of dimensions a cell complex. This is not essential for homotopy purposes, since any cell complex is homotopy equivalent to a CW complex.

CW complexes are a generalization of simplicial complexes, and have some of the same advantages. In particular, they allow inductive reasoning on the basis of skeleta. However, CW complexes are far more flexible than simplicial complexes. For a space X drawn from “everyday” topological spaces, it is a good bet that it is homotopy equivalent, or even homeomorphic, to a CW complex. This includes, for instance, smooth finite-dimensional manifolds, algebraic varieties, certain smooth infinite-dimensional manifolds (such as Hilbert manifolds), and loop spaces of CW complexes. This makes the category of spaces homotopy equivalent to a CW complex a very popular category for doing homotopy theory.

Remark 9. There is potential for confusion in the way words like “open” and “interior” are used for cell complexes. If ek is a closed k-cell in a CW complex X, it does not follow that the corresponding open cell e̊k is an open set of X. It is, however, an open set of the k-skeleton. Also, while e̊k is often referred to as the “interior” of ek , it is not necessarily the case that it is the interior of ek in the sense of point-set topology. In particular, any closed 0-cell is its own corresponding open 0-cell, even though it has empty interior in most cases. Version: 7 Owner: antonio Author(s): antonio
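One payoff of a CW structure, not discussed in the entry but standard, is that the Euler characteristic can be read off as the alternating sum of cell counts, independently of the chosen structure. A minimal sketch (the cell counts below are the usual ones for these spaces):

```python
# Euler characteristic from a CW structure: chi = sum_n (-1)^n * (number of n-cells)
def euler_characteristic(cells_per_dim):
    return sum((-1) ** n * c for n, c in enumerate(cells_per_dim))

# S^2 admits a CW structure with one 0-cell and one 2-cell; the torus has 1, 2, 1 cells
assert euler_characteristic([1, 0, 1]) == 2   # the 2-sphere
assert euler_characteristic([1, 2, 1]) == 0   # the torus
```

The same spaces admit many other CW structures (e.g. S 2 with 2 cells in each of dimensions 0, 1, 2), and the alternating sum is the same for all of them.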


Chapter 609 57M25 – Knots and links in S 3 609.1

connected sum

The connected sum of knots K and J is a knot constructed by removing a segment from K and a segment from J and joining the free ends to form a knot (that is, joining the ends so as to create a link of one component). The connected sum of oriented knots K and J is the knot constructed by removing a segment from K and a segment from J and joining the ends to form a knot with a consistent orientation inherited from K and J. The connected sum of K and J is denoted K#J. The connected sum of two knots always exists but is not necessarily unique. The connected sum of two oriented knots exists and is unique. Version: 1 Owner: basseykay Author(s): basseykay

609.2

knot theory

Knot theory is the study of knots and links. Roughly a knot is a simple closed curve in R3 , and two knots are considered equivalent if and only if one can be smoothly deformed into another. This will often be used as a working definition as it is simple and appeals to intuition. Unfortunately this definition can not be taken too seriously because it includes many pathological cases, or wild knots, such as the connected sum of an infinite number of trefoils. Furthermore one must be careful about defining a “smooth deformation”, or all knots might turn out to be equivalent! (We shouldn’t be allowed to shrink part of a knot down to nothing.)


Links are defined in terms of knots, so once we have a definition for knots we have no trouble defining them. Definition 53. A link is a set of disjoint knots. Each knot is a component of the link. In particular a knot is a link of one component. Luckily the knot theorist is not usually interested in the exact form of a knot or link, but rather the in its equivalence class. (Even so a possible formal definition for knots is given at the end of this entry.) All the “interesting” information about a knot or link can be described using a knot diagram. (It should be noted that the words “knot” and “link” are often used to mean an equivalence class of knots or links respectively. It is normally clear from context if this usage is intended.) A knot diagram is a projection of a link onto a plane such that no more than two points of the link are projected to the same point on the plane and at each such point it is indicated which strand is closest to the plane (usually by erasing part of the lower strand). This can best be explained with some examples:

[Figure: some knot diagrams]

Two different knot diagrams may both represent the same knot — for example the last two diagrams both represent the unknot, although this is not obvious. Much of knot theory is devoted to telling when two knot diagrams represent the same link. In one sense this problem is easy — two knot diagrams represent equivalent links if and only if there exists a sequence of Reidemeister moves transforming one diagram to another.

Definition 54. A Reidemeister move consists of modifying a portion of a knot diagram in one of the following ways:

1. A single strand may be twisted, adding a crossing of the strand with itself.

2. A single twist may be removed from a strand, removing a crossing of the strand with itself.

3. When two strands run parallel to each other, one may be pushed under the other, creating two new over-crossings.

4. When one strand has two consecutive over-crossings with another strand, the strands may be straightened so that the two run parallel.

5. Given three strands A, B and C so that A passes below B and C, B passes between A and C, and C passes above A and B, the strand A may be moved to either side of the crossing of B and C.

Note that move 1 is the inverse of move 2, and move 3 is the inverse of move 4. Move 5 is its own inverse.

[Figure: pictures of the Reidemeister moves]

Finding such a sequence of Reidemeister moves is generally not easy, and proving that no such sequence exists can be very difficult, so other approaches must be taken. Knot theorists have accumulated a large number of knot invariants, values associated with a knot diagram which are unchanged when the diagram is modified by a Reidemeister move. Two diagrams with the same invariant need not represent the same knot, but two diagrams with different invariants never represent the same knot. Knot theorists also study ways in which a complex knot may be described in terms of simple pieces — for example, every knot is the connected sum of nontrivial prime knots, and many knots can be described simply using Conway notation.
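A classical example of such an invariant, not discussed further in this entry, is the number of 3-colorings of a diagram: assign an element of Z/3 to each arc so that at every crossing 2·c(over) = c(under1 ) + c(under2 ) (mod 3). This count is unchanged by Reidemeister moves. The sketch below (my own; the crossing data encodes the standard trefoil diagram, an assumption of this illustration) distinguishes the trefoil from the unknot.

```python
from itertools import product

def count_3_colorings(num_arcs, crossings):
    # crossings: list of (over_arc, under_arc_1, under_arc_2) index triples;
    # count colorings of the arcs by Z/3 satisfying the crossing relation
    total = 0
    for c in product(range(3), repeat=num_arcs):
        if all((2 * c[o] - c[u1] - c[u2]) % 3 == 0 for o, u1, u2 in crossings):
            total += 1
    return total

unknot = count_3_colorings(1, [])                                  # crossingless diagram
trefoil = count_3_colorings(3, [(0, 1, 2), (1, 2, 0), (2, 0, 1)])  # standard 3-crossing diagram
assert unknot == 3 and trefoil == 9
# different invariant values, so the trefoil cannot be the unknot
```

The three constant colorings always satisfy the relation, so every diagram counts at least 3; the trefoil's extra 6 colorings are what certify it is knotted.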

formal definitions of knot

polygonal knots

This definition is used by Charles Livingston in his book Knot Theory. It avoids the problem of wild knots by restricting knots to piecewise linear (polygonal) curves. Every knot that is intuitively “tame” can be approximated by such a knot. We also define the vertices, elementary deformation, and equivalence of knots.

Definition 55. A knot is a simple closed polygonal curve in S3 .

Definition 56. The vertices of a knot are the smallest ordered set of points such that the knot can be constructed by connecting them.

Definition 57. A knot J is an elementary deformation of a knot K if one is formed from the other by adding a single vertex v0 not on the knot, such that the triangle formed by v0 together with its adjacent vertices v1 and v2 intersects the knot only along the segment [v1 , v2 ].

Definition 58. A knot K0 is equivalent to a knot Kn if there exists a sequence of knots K1 , . . . , Kn−1 such that Ki is an elementary deformation of Ki−1 for 1 ≤ i ≤ n.

smooth submanifolds

This definition is used by Raymond Lickorish in An Introduction to Knot Theory.

Definition 59. A link is a smooth one dimensional submanifold of the 3-sphere S 3 . A knot is a link consisting of one component.

Definition 60. Links L1 and L2 are defined to be equivalent if there is an orientation-preserving homeomorphism h : S 3 → S 3 such that h(L1 ) = L2 .

Version: 5 Owner: basseykay Author(s): basseykay

609.3

unknot

The unknot is the knot with a projection containing no crossings, or equivalently the knot with three vertices. The unknot forms an identity when working with the connected sum of knots; that is, if U is the unknot, then K = K#U for any knot K. Version: 3 Owner: basseykay Author(s): basseykay


Chapter 610 57M99 – Miscellaneous 610.1

Dehn surgery

Let M be a smooth 3-manifold, and K ⊂ M a smooth knot. Since K is an embedded submanifold, by the tubular neighborhood theorem there is a closed neighborhood U of K diffeomorphic to the solid torus D 2 × S 1 . We let U ′ denote the interior of U. Now let ϕ : ∂U → ∂U be an automorphism of the torus, and consider the manifold M ′ = (M \ U ′ ) ⊔ϕ U, which is the disjoint union of M \ U ′ and U, with points in the boundary of U identified with their images in the boundary of M \ U ′ under ϕ. It’s a bit hard to visualize how this actually results in a different manifold, but it generally does. For example, if M = S 3 , the 3-sphere, K is the trivial knot, and ϕ is the automorphism exchanging meridians and parallels (i.e., since U ≅ D 2 × S 1 , we get an isomorphism ∂U ≅ S 1 × S 1 , and ϕ is the map interchanging the two copies of S 1 ), then one can check that M ′ ≅ S 1 × S 2 (S 3 \ U ′ is also a solid torus, and after our automorphism we glue the two solid tori meridians to meridians and parallels to parallels, so the two copies of D 2 paste along their edges to make S 2 ). Every compact 3-manifold can be obtained from S 3 by surgery around finitely many knots. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 611 57N16 – Geometric structures on manifolds 611.1

self-intersections of a curve

Let X be a topological manifold and γ : [0, 1] → X a segment of a curve in X. Then the curve is said to have a self-intersection at a point p ∈ X if γ fails to be injective there, i.e. if there exist a, b ∈ (0, 1), with a ≠ b, such that γ(a) = γ(b) = p. Usually the case when the curve is closed, i.e. γ(0) = γ(1), is not counted as a self-intersection. Version: 4 Owner: mike Author(s): mike, apmxi


Chapter 612 57N70 – Cobordism and concordance 612.1

h-cobordism

A cobordism (N, M, M 0 ) is called an h-cobordism if the inclusion maps i : M → N and i0 : M 0 → N are homotopy equivalences. Version: 1 Owner: bwebste Author(s): bwebste

612.2

Smale’s h-cobordism theorem

Let (N, M, M ′ ) be an h-cobordism of smooth manifolds with N (and hence M and M ′ ) simply connected, and dim(M) ≥ 5. Then N ≅ M × [0, 1], and in particular M and M ′ are diffeomorphic. Version: 2 Owner: bwebste Author(s): bwebste

612.3

cobordism

Two oriented n-manifolds M and M ′ are cobordant if there is an oriented (n + 1)-manifold N with boundary such that ∂N = M ⊔ M ′opp , where M ′opp is M ′ with orientation reversed. The triple (N, M, M ′ ) is called a cobordism. Cobordism is an equivalence relation, and a very coarse invariant of manifolds. For example, all closed orientable surfaces are cobordant to the empty set (and hence to each other). There is a cobordism category, where the objects are manifolds and the morphisms are cobordisms between them. This category is important in topological quantum field theory.


Version: 2 Owner: nobody Author(s): bwebste


Chapter 613 57N99 – Miscellaneous 613.1

orientation

There are many definitions of an orientation of a manifold. The most general, in the sense that it doesn’t require any extra structure on the manifold, is based on (co-)homology theory. For this article manifold means a connected, topological manifold, possibly with boundary.

Theorem 30. Let M be a closed, n–dimensional manifold. Then Hn (M ; Z), the top dimensional homology group of M, is either trivial ({0}) or isomorphic to Z.

Definition 61. A closed n–manifold is called orientable if its top homology group is isomorphic to the integers. An orientation of M is a choice of a particular isomorphism o : Z → Hn (M ; Z). An oriented manifold is a (necessarily orientable) manifold M endowed with an orientation. If (M, o) is an oriented manifold then o(1) is called the fundamental class of M, or the orientation class of M, and is denoted by [M].

Remark 4. Notice that since Z has exactly two automorphisms, an orientable manifold admits two possible orientations.

Remark 5. The above definition could be given using cohomology instead of homology.

The top dimensional homology of a non-closed manifold is always trivial, so it is trickier to define orientation for those beasts. One approach (which we will not follow) is to use a special kind of homology (for example, relative to the boundary for compact manifolds with boundary). The approach we follow defines (global) orientation as a compatible fitting together of local orientations. We start with manifolds without boundary.

Theorem 31. Let M be an n-manifold without boundary and x ∈ M. Then the relative homology group Hn (M, M \ x ; Z) ≅ Z.

Definition 62. Let M be an n-manifold and x ∈ M. An orientation of M at x is a choice of an isomorphism ox : Z → Hn (M, M \ x ; Z).

One way to make precise the notion of nicely fitting together of orientations at points is to require that for nearby points the orientations are defined in a uniform way.

Theorem 32. Let U be an open subset of M that is homeomorphic to Rn (e.g. the domain of a chart). Then Hn (M, M \ U ; Z) ≅ Z.

Definition 63. Let U be an open subset of M that is homeomorphic to Rn . A local orientation of M on U is a choice of an isomorphism oU : Hn (M, M \ U ; Z) → Z.

Now notice that with U as above and x ∈ U, the inclusion ıUx : M \ U ↪ M \ x

induces a map (actually an isomorphism)

(ı_x^U)_* : H_n(M, M \ U; Z) → H_n(M, M \ x; Z),

and therefore a local orientation on U induces (by composing with the above isomorphism) an orientation at each point x ∈ U. It is natural to declare that all these orientations fit nicely together.

Definition 64. Let M be a manifold without boundary. An orientation of M is a choice of an orientation o_x for each point x ∈ M, with the property that each point x has a coordinate neighborhood U such that for each y ∈ U, the orientation o_y is induced by a local orientation on U. A manifold is called orientable if it admits an orientation.

Remark 6. Although we avoided using this terminology, what we did was to indicate how a sheaf of orientations could be defined, and then we defined an orientation to be a global section of that sheaf.

Definition 65. Let M be a manifold with non-empty boundary, ∂M ≠ ∅. M is called orientable if its double

M̂ := M ∪_{∂M} M

is orientable, where ∪_{∂M} denotes gluing along the boundary. An orientation of M is determined by an orientation of M̂.

Version: 8 Owner: Dr Absentius Author(s): Dr Absentius

Chapter 614 57R22 – Topology of vector bundles and fiber bundles 614.1

hairy ball theorem

Theorem 33. If X is a vector field on S^{2n}, then X has a zero. Alternatively, there are no continuous unit vector fields on the sphere. Furthermore, the tangent bundle of the sphere is non-trivial.

First, the low tech proof. Think of S^{2n} as a subset of R^{2n+1}. Let X : S^{2n} → S^{2n} be a unit vector field. Now consider

F : S^{2n} × [0, 1] → S^{2n},   F(v, t) = (cos πt)v + (sin πt)X(v).

For any v ∈ S^{2n}, X(v) ⊥ v, so ‖F(v, t)‖ = 1 for all v ∈ S^{2n}. Clearly, F(v, 0) = v and F(v, 1) = −v. Thus F is a homotopy between the identity and the antipodal map. But the identity is orientation preserving, and the antipodal map is orientation reversing, since it is the composition of 2n + 1 reflections in hyperplanes. Alternatively, one has degree 1, and the other degree −1. Thus they are not homotopic.

This also implies that the tangent bundle of S^{2n} is non-trivial, since any trivial bundle has a non-zero section.

It is not difficult to show that S^{2n+1} has non-vanishing vector fields for all n. A much harder result of Adams shows that the tangent bundle of S^m is trivial if and only if m = 0, 1, 3, 7, corresponding to the unit spheres in the 4 real division algebras.

The hairy ball theorem is, in fact, a consequence of a much more general theorem about vector fields on smooth compact manifolds. Near a zero of a vector field, we can consider a small sphere around the zero, and restrict the vector field to it. By normalizing, we get a map from the sphere to itself. We define the index of the vector field at a zero to be the degree of that map.

Theorem 34. (Poincaré–Hopf) If X is a vector field on a compact manifold M with isolated zeroes, then

χ(M) = Σ_{v ∈ Z(X)} ι(v),

where Z(X) is the set of zeroes of X, ι(v) is the index of X at v, and χ(M) is the Euler characteristic of M.

A corollary of this is that if M has a nonvanishing vector field, the above sum is empty and the Euler characteristic is 0. Since χ(S^2) = 2, the hairy ball theorem follows immediately. This is a much more difficult theorem (the only proof I know requires Morse theory and Poincaré duality).

Version: 5 Owner: bwebste Author(s): bwebste


Chapter 615 57R35 – Differentiable mappings 615.1

Sard’s theorem

Let φ : X^n → Y^m be a smooth map of smooth manifolds. A critical point of φ is a point p ∈ X such that the differential φ_* : T_p X → T_{φ(p)} Y, considered as a linear transformation of real vector spaces, has rank < m. A critical value of φ is the image of a critical point. A regular value of φ is a point q ∈ Y which is not the image of any critical point. In particular, q is a regular value of φ if q ∈ Y \ φ(X).

Sard's Theorem. Let φ : X → Y be a smooth map of smooth manifolds. Then the set of critical values of φ has measure zero. Version: 4 Owner: mathcam Author(s): mathcam, bs

615.2

differentiable function

Let f : V → W be a function, where V and W are Banach spaces. (A Banach space has just enough structure for differentiability to make sense; V and W could also be differentiable manifolds. The most familiar case is when f is a real function, that is V = W = R. See the derivative entry for details.) For x ∈ V, the function f is said to be differentiable at x if its derivative exists at that point. Differentiability at x ∈ V implies continuity at x. If S ⊆ V, then f is said to be differentiable on S if f is differentiable at every point x ∈ S.

For the most common example, a real function f : R → R is differentiable if its derivative df/dx exists for every point in the region of interest. For another common case of a real function of n variables f(x_1, x_2, ..., x_n) (more formally f : R^n → R), it is not sufficient that the partial derivatives ∂f/∂x_i exist for f to be differentiable. The derivative of f must exist in the original sense at every point in the region of interest, where R^n is treated as a Banach space under the usual Euclidean vector norm.

If the derivative of f is continuous, then f is said to be C^1. If the kth derivative of f is continuous, then f is said to be C^k. By convention, if f is only continuous but does not have a continuous derivative, then f is said to be C^0. Note the inclusion property C^{k+1} ⊆ C^k. And if the kth derivative of f is continuous for all k, then f is said to be C^∞. In other words, C^∞ is the intersection C^∞ = ∩_{k=0}^∞ C^k.

Differentiable functions are often referred to as smooth. If f is C^k, then f is said to be k-smooth. Most often a function is called smooth (without qualifiers) if f is C^∞ or C^1, depending on the context. Version: 17 Owner: matte Author(s): igor


Chapter 616 57R42 – Immersions 616.1

immersion

Let X and Y be manifolds, and let f be a mapping f : X → Y . Let x ∈ X. Let y = f (x). Let dfx : Tx (X) → Ty (Y ) be the derivative of f at point x (where Tx (X) means the tangent space of manifold X at point x). If dfx is injective, then f is said to be an immersion at x. If f is an immersion at every point, it is called an immersion. Version: 3 Owner: bshanks Author(s): bshanks


Chapter 617 57R60 – Homotopy spheres, Poincar´ e conjecture 617.1

Poincar´ e conjecture

Conjecture (Poincaré) Every 3-manifold which is homotopy equivalent to the 3-sphere is in fact homeomorphic to it. Or, in a more elementary form: every simply-connected compact 3-manifold is homeomorphic to S^3.

The first statement is known to be true when 3 is replaced by any other number, but has thus far resisted proof in the 3-dimensional case. The Poincaré conjecture is one of the Clay Institute Millennium Prize Problems. In 2003, Grisha Perelman announced results which would imply the Poincaré conjecture if they prove to be true, but since they are highly technical, they are still being reviewed. Version: 3 Owner: bwebste Author(s): bwebste

617.2

The Poincar´ e dodecahedral space

Poincaré originally conjectured that a homology 3-sphere must be homeomorphic to S^3. (See "Second complément à l'analysis situs", Proceedings of the LMS, 1900.) He later found a counterexample based on the group of rotations of the dodecahedron, and restated his conjecture in terms of the fundamental group. (See "Cinquième complément à l'analysis situs", Proceedings of the LMS, 1904.) This conjecture is one of the Clay Mathematics Institute's Millennium Problems. It remains unresolved as of December 2003, although Grisha Perelman has circulated manuscripts which purport to solve Thurston's Geometrization Conjecture, from which the Poincaré conjecture follows.

Let Γ be the group of rotations of the regular dodecahedron. It is easy to check that Γ ≅ A_5. (Indeed, Γ permutes transitively the 6 pairs of opposite faces, and the stabilizer of any pair induces a dihedral group of order 10.) In particular, Γ is perfect.

Let P be the quotient space P = SO_3(R)/Γ. We check that P is a homology sphere. To do this it is easier to work in the universal cover SU(2) of SO_3(R), since SU(2) ≅ S^3. The lift of Γ to SU(2) will be denoted Γ̂. Hence P = SU(2)/Γ̂. Γ̂ is a nontrivial central extension of A_5 by {±I}, which means that there is no splitting of the surjection Γ̂ → Γ. This is because A_5 has no nonidentity 2-dimensional unitary representations. In particular, Γ̂, like Γ, is perfect.

Now π_1(P) ≅ Γ̂, whence H_1(P, Z), the abelianization of Γ̂, is 0. By Poincaré duality and the universal coefficient theorem, H_2(P, Z) = 0 as well. Thus, P is indeed a homology sphere.

The dodecahedron is a fundamental cell in a tiling of hyperbolic 3-space, and hence P can also be realized by gluing the opposite faces of a solid dodecahedron. Alternatively, Dehn showed how to construct it using surgery around a trefoil. Dale Rolfsen's fun book "Knots and Links" (Publish or Perish Press, 1976) has more on the surgical view of Poincaré's example. Version: 19 Owner: livetoad Author(s): livetoad, bwebste

617.3

homology sphere

A compact n-manifold M is called a homology sphere if its homology is that of the n-sphere S^n, i.e. H_0(M; Z) ≅ H_n(M; Z) ≅ Z, and H_i(M; Z) = 0 otherwise.

An application of the Hurewicz theorem and the homological Whitehead theorem shows that any simply connected homology sphere is in fact homotopy equivalent to S^n, and hence homeomorphic to S^n for n ≠ 3, by the higher dimensional equivalent of the Poincaré conjecture. The original version of the Poincaré conjecture stated that every 3-dimensional homology sphere is homeomorphic to S^3, but Poincaré himself found a counter-example. There are, in fact, a number of interesting 3-dimensional homology spheres. Version: 1 Owner: bwebste Author(s): bwebste


Chapter 618 57R99 – Miscellaneous 618.1

transversality

Transversality is a fundamental concept in differential topology. We say that two smooth submanifolds A, B of a smooth manifold M intersect transversely if at any point x ∈ A ∩ B, we have

T_x A + T_x B = T_x M,

where T_x denotes the tangent space at x, and we naturally identify T_x A and T_x B with subspaces of T_x M.

In this case, A and B intersect properly in the sense that A ∩ B is a submanifold of M, and

codim(A ∩ B) = codim(A) + codim(B).

A useful generalization is obtained if we replace the inclusion A ↪ M with a smooth map f : A → M. In this case we say that f is transverse to B ⊂ M if for each point a ∈ f^{-1}(B), we have

df_a(T_a A) + T_{f(a)} B = T_{f(a)} M.

In this case it turns out that f^{-1}(B) is a submanifold of A, and codim(f^{-1}(B)) = codim(B). Note that if B is a single point b, then the condition of f being transverse to B is precisely that b is a regular value of f. The result is that f^{-1}(b) is a submanifold of A.

A further generalization can be obtained by replacing the inclusion of B by a smooth function as well. We leave the details to the reader.

The importance of transversality is that it is a stable and generic condition. This means, in broad terms, that if f : A → M is transverse to B ⊂ M, then small perturbations of f are

also transverse to B. Also, given any smooth map A → M, it can be perturbed slightly to obtain a smooth map which is transverse to a given submanifold B ⊂ M. Version: 3 Owner: mathcam Author(s): mathcam, gabor sz


Chapter 619 57S25 – Groups acting on specific manifolds 619.1

Isomorphism of the group P SL2(C) with the group of Mobius transformations

We identify the group G of Möbius transformations with the projective special linear group PSL_2(C). The isomorphism Ψ (of topological groups) is given by

Ψ : [(a b; c d)] ↦ (z ↦ (az + b)/(cz + d)).

This mapping is:

Well-defined: If [(a b; c d)] = [(a′ b′; c′ d′)], then (a′, b′, c′, d′) = t(a, b, c, d) for some t, so z ↦ (a′z + b′)/(c′z + d′) is the same transformation as z ↦ (az + b)/(cz + d).

A homomorphism: Calculating the composition with w ↦ (ew + f)/(gw + h), we get

(a·(ew + f)/(gw + h) + b) / (c·(ew + f)/(gw + h) + d) = ((ae + bg)w + (af + bh)) / ((ce + dg)w + (cf + dh)),

so we see that Ψ([(a b; c d)]) ∘ Ψ([(e f; g h)]) = Ψ([(a b; c d)] · [(e f; g h)]).

A monomorphism: If Ψ([(a b; c d)]) = Ψ([(a′ b′; c′ d′)]), then it follows that (a′, b′, c′, d′) = t(a, b, c, d), so that [(a b; c d)] = [(a′ b′; c′ d′)].

An epimorphism: Any Möbius transformation z ↦ (az + b)/(cz + d) is the image Ψ([(a b; c d)]).

Version: 2 Owner: ariels Author(s): ariels
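The well-definedness and homomorphism properties above can be spot-checked numerically with Python's complex arithmetic. This is an illustrative sketch, not part of the original entry; the matrices and the test point are arbitrary choices.

```python
# Spot-check that the Mobius map of a matrix product equals the
# composition of the individual Mobius maps (illustrative values).

def mobius(a, b, c, d):
    """Return the Mobius transformation z -> (az + b)/(cz + d)."""
    return lambda z: (a * z + b) / (c * z + d)

# Two invertible matrices A = [[a, b], [c, d]] and B = [[e, f], [g, h]].
a, b, c, d = 1 + 2j, 3, 0.5j, 1
e, f, g, h = 2, -1j, 1, 1 + 1j

# Matrix product A * B, computed entrywise.
p, q = a * e + b * g, a * f + b * h
r, s = c * e + d * g, c * f + d * h

z = 0.3 + 0.7j  # arbitrary test point
lhs = mobius(p, q, r, s)(z)                       # Psi([A][B])
rhs = mobius(a, b, c, d)(mobius(e, f, g, h)(z))   # Psi([A]) o Psi([B])
assert abs(lhs - rhs) < 1e-9

# Scaling the matrix by t != 0 gives the same transformation
# (well-definedness on the projective class).
t = 2 - 3j
assert abs(mobius(t * a, t * b, t * c, t * d)(z) - mobius(a, b, c, d)(z)) < 1e-9
```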

Chapter 620 58A05 – Differentiable manifolds, foundations 620.1

partition of unity

Let M be a manifold with a locally finite cover {U_i}.

Definition 66. A partition of unity is a collection of continuous functions ε_i(x) such that

1. 0 ≤ ε_i(x) ≤ 1,
2. ε_i(x) = 0 if x ∉ U_i,
3. Σ_i ε_i(x) = 1 for any x ∈ M.

Example 26 (Circle). A partition of unity for S^1 is given by {sin^2(θ/2), cos^2(θ/2)}, subordinate to the covering {(0, 2π), (−π, π)}.

Application to integration. Let M be an orientable manifold with volume form ω and a partition of unity {ε_i(x)}. Then the integral of a function f(x) over M is given by

∫_M f(x) ω = Σ_i ∫_{U_i} ε_i(x) f(x) ω.

It is independent of the choice of partition of unity. Version: 2 Owner: mhale Author(s): mhale
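For Example 26, the defining properties can be checked numerically on the circle: the two functions sum to 1 everywhere, and each vanishes at the point its chart omits. A quick illustrative sketch (not part of the original entry):

```python
# Check the partition-of-unity properties for {sin^2(t/2), cos^2(t/2)}
# on the circle, with charts (0, 2*pi) and (-pi, pi).
import math

for k in range(100):
    theta = 2 * math.pi * k / 100
    e1 = math.sin(theta / 2) ** 2   # supported on (0, 2*pi)
    e2 = math.cos(theta / 2) ** 2   # supported on (-pi, pi)
    assert 0.0 <= e1 <= 1.0 and 0.0 <= e2 <= 1.0
    assert abs(e1 + e2 - 1.0) < 1e-12  # they sum to 1

# e1 vanishes at theta = 0, the point missing from (0, 2*pi) ...
assert math.sin(0.0 / 2) ** 2 == 0.0
# ... and e2 vanishes at theta = pi, the point missing from (-pi, pi).
assert abs(math.cos(math.pi / 2) ** 2) < 1e-12
```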


Chapter 621 58A10 – Differential forms 621.1

differential form

Let M be a differential manifold. Let F denote the ring of smooth functions on M; let V = Γ(TM) denote the F-module of sections of the tangent bundle, i.e. the vector fields on M; and let A = Γ(T*M) denote the F-module of sections of the cotangent bundle, i.e. the differentials on M.

Definition 1. A differential form is an element of Λ_F(A), the exterior algebra over F generated by the differentials. A differential k-form, or simply a k-form, is a homogeneous element of Λ^k(A). The F-module of differential k-forms is typically denoted by Ω^k(M); typically Ω*(M) is used to denote the full exterior algebra.

Definition 2. The exterior derivative d : Ω*(M) → Ω*(M) is the unique first-order differential operator satisfying

df : V ↦ V[f],

f ∈ F, V ∈ V,

where V [f ] denotes the directional derivative of a function f with respect to a vector field V , as well as satisfying d(α ∧ β) = d(α) ∧ β + (−1)deg α α ∧ d(β),

α, β ∈ Ω∗ (M).

Let (x^1, ..., x^n) be a system of local coordinates, and let {∂/∂x^1, ..., ∂/∂x^n} be the corresponding basis of vector fields:

∂/∂x^i [x^j] = δ_i^j   (Kronecker deltas).

By the definition of the exterior derivative, it follows that

dx^j(∂/∂x^i) = δ_i^j,

and hence the differentials dx^1, ..., dx^n give the corresponding dual basis. Taking exterior products of these basic 1-forms one obtains basic k-forms

dx^{j_1} ∧ ... ∧ dx^{j_k}.

An arbitrary k-form is a linear combination of these basic forms with smooth function coefficients. Thus, every α ∈ Ω^k(M) has a unique expression of the form

α = Σ_{j_1 < j_2 < ... < j_k} α_{j_1...j_k} dx^{j_1} ∧ ... ∧ dx^{j_k},   α_{j_1...j_k} ∈ F.

Sample calculations. Let α = Σ_j α_j dx^j and β = Σ_j β_j dx^j be 1-forms, expressed relative to a system of local coordinates. The corresponding expressions for the exterior derivative and the exterior product are:

dα = Σ_{j,k} (∂α_j/∂x^k) dx^k ∧ dx^j = Σ_{j<k} (∂α_k/∂x^j − ∂α_j/∂x^k) dx^j ∧ dx^k,

α ∧ β = Σ_{j<k} (α_j β_k − α_k β_j) dx^j ∧ dx^k.
Grad, Div, and Curl. On a Riemannian manifold, and in Euclidean space in particular, one can express the gradient, the divergence, and the curl operators in terms of the exterior derivative. The metric tensor g_ij and its inverse g^ij define isomorphisms

g : V → A,   g^{-1} : A → V

between vectors and 1-forms. In a system of local coordinates these isomorphisms are expressed as

∂/∂x^i → Σ_j g_ij dx^j,   dx^i → Σ_j g^ij ∂/∂x^j.

Commonly, this isomorphism is expressed by raising and lowering the indices of a tensor field, using contractions with g_ij and g^ij. The gradient operator, which in tensor notation is expressed as

(∇f)^i = g^ij f_{,j},   f ∈ F,

can now be defined as

∇f = g^{-1}(df),   f ∈ F.

Another natural structure on an n-dimensional Riemannian manifold is the volume form, ω ∈ Ω^n(M), defined by

ω = √(det g_ij) dx^1 ∧ ... ∧ dx^n.

Multiplication by the volume form defines a natural isomorphism between functions and n-forms:

f → f ω,   f ∈ F.

Contraction with the volume form defines a natural isomorphism between vector fields and (n − 1)-forms:

X → X · ω,   X ∈ V,

or equivalently

∂/∂x^i → (−1)^{i+1} √(det g_ij) dx^1 ∧ ... ∧ d̂x^i ∧ ... ∧ dx^n,

where d̂x^i indicates an omitted factor. The divergence operator, which in tensor notation is expressed as

div X = X^i_{;i},   X ∈ V,

can also be defined by the following relation:

(div X) ω = d(X · ω),   X ∈ V.

Note: the proof of the equivalence is non-trivial. Finally, if the manifold is 3-dimensional we may define the curl operator by means of the following relation:

(curl X) · ω = d(g(X)),   X ∈ V.

Version: 10 Owner: rmilson Author(s): rmilson, quincynoodles


Chapter 622 58A32 – Natural bundles 622.1

conormal bundle

Let X be an immersed submanifold of M, with immersion i : X → M. Then, as with the normal bundle, we can pull the cotangent bundle back to X, forming a bundle i*T*M. This has a canonical pairing with i*TM, essentially by definition. Since TX is a natural subbundle of i*TM, we can consider its annihilator: the subbundle of i*T*M given by

{(x, λ) | x ∈ X, λ ∈ T*_{i(x)}M, λ(v) = 0 for all v ∈ T_x X}.

This subbundle is denoted N*X, and called the conormal bundle of X. The conormal bundle to any submanifold is a natural Lagrangian submanifold of T*M. Version: 1 Owner: bwebste Author(s): bwebste

This subbundle is denoted N ∗ X, and called the conormal bundle of X. The conormal bundle to any submanifold is a natural lagrangian submanifold of T ∗ M. Version: 1 Owner: bwebste Author(s): bwebste

622.2

cotangent bundle

Let M be a differentiable manifold. Analogously to the construction of the tangent bundle, we can make the set of covectors on a given manifold into a vector bundle over M, denoted T*M and called the cotangent bundle of M. This is simply the vector bundle dual to TM. On any differentiable manifold, T*M ≅ TM (for example, by the existence of a Riemannian metric), but this identification is by no means canonical, and thus it is useful to distinguish between these two objects. For example, the cotangent bundle to any manifold has a natural symplectic structure, which is in some sense unique. This is not true of the tangent bundle. Version: 1 Owner: bwebste Author(s): bwebste


622.3

normal bundle

Let X be an immersed submanifold of M, with immersion i : X → M. Then we can restrict the tangent bundle of M to X or, more properly, take the pullback i*TM. This, as an abstract vector bundle over X, should contain a lot of information about the embedding of X into M. But there is a natural injection TX → i*TM, and the subbundle which is its image only has information on the intrinsic properties of X, and thus is useless in obtaining information about the embedding of X into M. Instead, to get information on this, we take the quotient i*TM/TX = NX, the normal bundle of X.

The normal bundle is very strongly dependent on the immersion i. If E is any vector bundle on X, then E is the normal bundle for the embedding of X into E as the zero section.

The normal bundle determines the local geometry of the embedding of X into M in the following sense: in M, there exists an open neighborhood U ⊃ X which is diffeomorphic to NX by a diffeomorphism taking X to the zero section. Version: 2 Owner: bwebste Author(s): bwebste

622.4

tangent bundle

Let M be a differentiable manifold. Let the tangent bundle TM of M be (as a set) the disjoint union ⊔_{m∈M} T_m M of all the tangent spaces to M, i.e., the set of pairs

{(m, x) | m ∈ M, x ∈ T_m M}.

This naturally has a manifold structure, given as follows. For M = R^n, T R^n is obviously isomorphic to R^{2n}, and is thus obviously a manifold. By the definition of a differentiable manifold, for any m ∈ M, there is a neighborhood U of m and a diffeomorphism φ : R^n → U. Since this map is a diffeomorphism, its derivative is an isomorphism at all points. Thus Tφ : T R^n = R^{2n} → TU is bijective, which endows TU with a natural structure of a differentiable manifold. Since the transition maps for M are differentiable, they are for TM as well, and TM is a differentiable manifold. In fact, the projection π : TM → M, forgetting the tangent vector and remembering the point, is a vector bundle. A vector field on M is simply a section of this bundle.

The tangent bundle is functorial in the obvious sense: if f : M → N is differentiable, we get a map Tf : TM → TN, defined by f on the base, and its derivative on the fibers. Version: 2 Owner: bwebste Author(s): bwebste


Chapter 623 58C35 – Integration on manifolds; measures on manifolds 623.1

general Stokes theorem

Let M be an oriented r-dimensional differentiable manifold with a piecewise differentiable boundary ∂M. Further, let ∂M have the orientation induced by M. If ω is an (r − 1)-form on M with compact support, whose components have continuous first partial derivatives in any coordinate chart, then

∫_M dω = ∫_{∂M} ω.

Version: 8 Owner: matte Author(s): matte, quincynoodles
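As a low-dimensional sanity check (not from the original entry), for M = [0, 1]^2 and a 1-form ω = P dx + Q dy the theorem reduces to Green's theorem. The sketch below verifies it numerically for illustrative choices of P and Q:

```python
# Numerically check Stokes' theorem (Green's theorem form) on M = [0,1]^2
# for the 1-form omega = P dx + Q dy, with illustrative choices of P, Q.

P = lambda x, y: x**2 * y
Q = lambda x, y: x + y**3
# d(omega) = (dQ/dx - dP/dy) dx^dy = (1 - x^2) dx^dy for these choices.
d_omega = lambda x, y: 1 - x**2

n = 400
h = 1.0 / n
# Interior integral of d(omega) by the midpoint rule.
interior = sum(d_omega((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

# Boundary integral, traversed counterclockwise.
mid = [(k + 0.5) * h for k in range(n)]
boundary = (sum(P(x, 0.0) for x in mid) * h      # bottom: y = 0, dx > 0
            + sum(Q(1.0, y) for y in mid) * h    # right:  x = 1, dy > 0
            - sum(P(x, 1.0) for x in mid) * h    # top:    y = 1, dx < 0
            - sum(Q(0.0, y) for y in mid) * h)   # left:   x = 0, dy < 0

assert abs(interior - boundary) < 1e-3   # both are approximately 2/3
```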

623.2

proof of general Stokes theorem

We divide the proof in several steps.

Step One. Suppose M = (0, 1] × (0, 1)^{n−1} and

ω(x_1, ..., x_n) = f(x_1, ..., x_n) dx_1 ∧ ... ∧ d̂x_j ∧ ... ∧ dx_n

(i.e. the term dx_j is missing). Hence we have

dω(x_1, ..., x_n) = (∂f/∂x_1 dx_1 + ... + ∂f/∂x_n dx_n) ∧ dx_1 ∧ ... ∧ d̂x_j ∧ ... ∧ dx_n
                  = (−1)^{j−1} ∂f/∂x_j dx_1 ∧ ... ∧ dx_n,

and from the definition of the integral on a manifold we get

∫_M dω = ∫_0^1 ... ∫_0^1 (−1)^{j−1} ∂f/∂x_j dx_1 ... dx_n.

From the fundamental theorem of calculus we get

∫_M dω = (−1)^{j−1} ∫_0^1 ... ∫_0^1 [f(x_1, ..., 1, ..., x_n) − f(x_1, ..., 0, ..., x_n)] dx_1 ... d̂x_j ... dx_n.

Since ω, and hence f, has compact support in M, we obtain

∫_M dω = ∫_0^1 ... ∫_0^1 f(1, x_2, ..., x_n) dx_2 ... dx_n   if j = 1,
∫_M dω = 0                                                    if j > 1.

On the other hand we notice that ∫_{∂M} ω is to be understood as ∫_{∂M} i*ω, where i : ∂M → M is the inclusion map. Hence it is trivial to verify that when j ≠ 1 then i*ω = 0, while if j = 1 it holds that

i*ω(x) = f(1, x_2, ..., x_n) dx_2 ∧ ... ∧ dx_n,

and hence, as wanted,

∫_{∂M} i*ω = ∫_0^1 ... ∫_0^1 f(1, x_2, ..., x_n) dx_2 ... dx_n.

Step Two. Suppose now that M = (0, 1] × (0, 1)^{n−1} and let ω be any differential form. We can always write

ω(x) = Σ_j f_j(x) dx_1 ∧ ... ∧ d̂x_j ∧ ... ∧ dx_n,

and by the additivity of the integral we can reduce ourselves to the previous case.

Step Three. When M = (0, 1)^n we can follow the proof of the first case and end up with ∫_M dω = 0, while, in fact, ∂M = ∅.

Step Four. Consider now the general case. First of all we consider an oriented atlas (U_i, φ_i) such that either φ_i(U_i) is the cube (0, 1] × (0, 1)^{n−1} or φ_i(U_i) = (0, 1)^n. This is always possible. In fact, given any open set U in [0, +∞) × R^{n−1} and a point x ∈ U, up to translations and rescaling it is possible to find a "cubic" neighbourhood of x contained in U. Then consider a partition of unity α_i for this atlas.

From the properties of the integral on manifolds we have

∫_M dω = Σ_i ∫_{U_i} α_i φ* dω = Σ_i ∫_{U_i} α_i d(φ*ω)
        = Σ_i ∫_{U_i} d(α_i · φ*ω) − Σ_i ∫_{U_i} (dα_i) ∧ (φ*ω).

The second integral in the last equality is zero, since Σ_i dα_i = d(Σ_i α_i) = 0, while applying the previous steps to the first integral we have

∫_M dω = Σ_i ∫_{∂U_i} α_i · φ*ω.

On the other hand, since (∂U_i, φ_i|_{∂U_i}) is an oriented atlas for ∂M and α_i|_{∂U_i} is a partition of unity, we have

∫_{∂M} ω = Σ_i ∫_{∂U_i} α_i φ*ω,

and the theorem is proved.

Version: 4 Owner: paolini Author(s): paolini


Chapter 624 58C40 – Spectral theory; eigenvalue problems 624.1

spectral radius

If V is a vector space over C, the spectrum of a linear mapping T : V → V is the set

σ(T) = {λ ∈ C : T − λI is not invertible},

where I denotes the identity mapping. If V is finite dimensional, the spectrum of T is precisely the set of its eigenvalues. For infinite dimensional spaces this is not generally true, although it is true that each eigenvalue of T belongs to σ(T). The spectral radius of T is

ρ(T) = sup{|λ| : λ ∈ σ(T)}.

Version: 7 Owner: Koro Author(s): Koro
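In the finite-dimensional case, where the spectrum is the set of eigenvalues, the spectral radius can be estimated by power iteration when a dominant eigenvalue exists. A minimal sketch (not from the original entry), using an arbitrary example matrix whose eigenvalues 2 and 3 are known in advance:

```python
# Estimate the spectral radius of a small matrix by power iteration.
A = [[2.0, 1.0],
     [0.0, 3.0]]   # upper triangular: eigenvalues 2 and 3, so rho(A) = 3

def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

v = [1.0, 1.0]
for _ in range(100):
    w = mat_vec(A, v)
    norm = max(abs(x) for x in w)   # sup-norm normalization
    v = [x / norm for x in w]

# After convergence, |A v| / |v| in the sup norm approximates rho(A).
rho = max(abs(x) for x in mat_vec(A, v))
assert abs(rho - 3.0) < 1e-6
```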


Chapter 625 58E05 – Abstract critical point theory (Morse theory, Ljusternik-Schnirelman (Lyusternik-Shnirelman) theory, etc.) 625.1

Morse complex

Let M be a smooth manifold, and u : M → R a Morse function. Let C_n^u(M) be the vector space of formal C-linear combinations of critical points of u with index n. Then there exists a differential ∂_n : C_n → C_{n−1}, based on the Morse flow, making C_* into a chain complex called the Morse complex, such that the homology of the complex is the singular homology of M. In particular, the number of critical points of u of index n on M is at least the n-th Betti number, and the alternating sum of the numbers of critical points of u is the Euler characteristic of M. Version: 1 Owner: bwebste Author(s): bwebste

625.2

Morse function

Let M be a smooth manifold. A critical point of a map u : M → R at x ∈ M is called nondegenerate if the Hessian matrix Hu (in any local coordinate system) at x is non-degenerate. A smooth function u : M → R is called Morse if all its critical points are non-degenerate. Morse functions exist on any smooth manifold, and in fact form an open dense subset of smooth functions on M (this fact is often phrased ”a generic smooth function is Morse”).


Version: 4 Owner: bwebste Author(s): bwebste

625.3

Morse lemma

Let u : R^n → R be a smooth map with a non-degenerate critical point at the origin. Then there exist neighborhoods U and V of the origin and a diffeomorphism f : U → V such that u = u_0 ∘ f, where

u_0 = −x_1^2 − ... − x_m^2 + x_{m+1}^2 + ... + x_n^2.

The integer m is called the index of the critical point at the origin. Version: 4 Owner: bwebste Author(s): bwebste

625.4

centralizer

Let G be a group acting on itself by conjugation. Let X be a subset of G. The stabilizer of X is then called the centralizer of X; it is the set

C_G(X) = {g ∈ G : gxg^{-1} = x for all x ∈ X}.

Abelian subgroups of G can be characterized as those which are also subgroups of C_G(G). Version: 2 Owner: drini Author(s): drini, apmxi


Chapter 626 60-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 626.1

Bayes’ theorem

Let (A_n) be a sequence of mutually exclusive events whose union is the sample space (i.e., a partition of the sample space), and let E be any event. Bayes' Theorem states

P(A_j | E) = P(A_j) P(E | A_j) / Σ_i P(A_i) P(E | A_i)

for any A_j ∈ (A_n).

Version: 3 Owner: akrowne Author(s): akrowne
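A quick numeric illustration of the formula; the prior and likelihood values below are made up for the example:

```python
# Bayes' theorem for a three-event partition A1, A2, A3 and an observed
# event E. Prior and likelihood values are illustrative only.
prior = [0.5, 0.3, 0.2]        # P(A_j); sums to 1
likelihood = [0.1, 0.6, 0.3]   # P(E | A_j)

# Denominator: P(E) by the total probability theorem.
total = sum(p * l for p, l in zip(prior, likelihood))

# Posterior probabilities P(A_j | E).
posterior = [p * l / total for p, l in zip(prior, likelihood)]

assert abs(sum(posterior) - 1.0) < 1e-9  # posteriors form a distribution
```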

626.2

Bernoulli random variable

X is a Bernoulli random variable with parameter p if

f_X(x) = p^x (1 − p)^{1−x},   x ∈ {0, 1}

Parameters:

? p ∈ [0, 1]

syntax: X ∼ Bernoulli(p)

Notes:

1. X represents the number of successful results in a Bernoulli trial. A Bernoulli trial is an experiment in which only two outcomes are possible: success, with probability p, and failure, with probability 1 − p.
2. E[X] = p
3. Var[X] = p(1 − p)
4. M_X(t) = p e^t + (1 − p)

Version: 2 Owner: Riemann Author(s): Riemann
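The moment formulas in the notes can be checked directly from the density for a sample value of p (an illustrative sketch, not part of the original entry):

```python
# Check E[X], Var[X], and the MGF for a Bernoulli density
# f(x) = p^x (1-p)^(1-x) on {0, 1}, for a sample parameter p.
import math

p = 0.3
f = {x: p**x * (1 - p) ** (1 - x) for x in (0, 1)}

mean = sum(x * f[x] for x in f)
var = sum((x - mean) ** 2 * f[x] for x in f)
t = 0.7
mgf = sum(math.exp(t * x) * f[x] for x in f)

assert abs(mean - p) < 1e-12
assert abs(var - p * (1 - p)) < 1e-12
assert abs(mgf - (p * math.exp(t) + (1 - p))) < 1e-12
```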

626.3

Gamma random variable

A Gamma random variable with parameters α > 0 and λ > 0 is one whose probability density function is given by

f_X(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx}

for x > 0, and is denoted by X ∼ Gamma(α, λ).

Notes:

1. Gamma random variables are widely used in many applications. Taking α = 1 reduces the form to that of an exponential random variable. If α = n/2 and λ = 1/2, this is a chi-squared random variable.
2. The function Γ : (0, ∞) → R is the gamma function, defined as Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx.
3. The expected value of a Gamma random variable is given by E[X] = α/λ, and the variance by Var[X] = α/λ².
4. The moment generating function of a Gamma random variable is given by M_X(t) = (λ/(λ − t))^α.

If the first parameter is a positive integer, the variate is usually called Erlang random variate. The sum of n exponentially distributed variables with parameter λ is a Gamma (Erlang) variate with parameters n, λ. Version: 7 Owner: mathcam Author(s): mathcam, Riemann
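A numeric sanity check of the density and mean formulas, with arbitrary illustrative parameters (not part of the original entry):

```python
# Numerically confirm that the Gamma(alpha, lambda) density integrates
# to 1 and has mean alpha/lambda. Parameter values are arbitrary examples.
import math

alpha, lam = 3.0, 2.0

def pdf(x):
    return lam**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-lam * x)

n, upper = 200000, 40.0   # truncate the tail; mass beyond 40 is negligible
h = upper / n
mass = mean = 0.0
for i in range(n):
    x = (i + 0.5) * h     # midpoint rule
    mass += pdf(x) * h
    mean += x * pdf(x) * h

assert abs(mass - 1.0) < 1e-6
assert abs(mean - alpha / lam) < 1e-6
```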

626.4

beta random variable

X is a beta random variable with parameters a and b if

f_X(x) = x^{a−1} (1 − x)^{b−1} / β(a, b),   x ∈ [0, 1]

Parameters:

? a > 0
? b > 0

syntax: X ∼ Beta(a, b)

Notes:

1. X is used in many statistical models.
2. The function β : R × R → R is defined as β(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx. β(a, b) can be calculated as β(a, b) = Γ(a)Γ(b)/Γ(a + b). (For information on the Γ function, see the Gamma random variable.)
3. E[X] = a/(a + b)
4. Var[X] = ab / ((a + b + 1)(a + b)²)
5. M_X(t) not useful

Version: 4 Owner: Riemann Author(s): Riemann
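The identity β(a, b) = Γ(a)Γ(b)/Γ(a + b) can be checked against the defining integral for sample parameters (an illustrative sketch):

```python
# Compare beta(a, b) computed via the gamma function with the defining
# integral int_0^1 x^(a-1) (1-x)^(b-1) dx (sample parameters).
import math

a, b = 2.5, 4.0
beta_gamma = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# Midpoint-rule approximation of the defining integral.
n = 200000
h = 1.0 / n
beta_int = sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
               for i in range(n)) * h

assert abs(beta_gamma - beta_int) < 1e-5
```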

626.5

chi-squared random variable

X is a chi-squared random variable with n degrees of freedom if

f_X(x) = ((1/2)^{n/2} / Γ(n/2)) x^{n/2 − 1} e^{−x/2},   x > 0,

where Γ represents the gamma function.

Parameters:

? n ∈ N

syntax: X ∼ χ²(n)

Notes:

1. This distribution is very widely used in statistics, like in hypothesis tests and confidence intervals.
2. The chi-squared distribution with n degrees of freedom is a result of evaluating the gamma distribution with α = n/2 and λ = 1/2.
3. E[X] = n
4. Var[X] = 2n
5. M_X(t) = (1/(1 − 2t))^{n/2}

Version: 1 Owner: Riemann Author(s): Riemann
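Note 2 can be checked pointwise: the chi-squared density coincides with the Gamma(n/2, 1/2) density. A small illustrative sketch with a sample n:

```python
# The chi-squared density with n degrees of freedom equals the
# Gamma(alpha = n/2, lambda = 1/2) density; check at a few points.
import math

n = 5

def chi2_pdf(x):
    return 0.5 ** (n / 2) / math.gamma(n / 2) * x ** (n / 2 - 1) * math.exp(-x / 2)

def gamma_pdf(x, alpha, lam):
    return lam**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-lam * x)

for x in (0.1, 1.0, 2.5, 7.0):
    assert abs(chi2_pdf(x) - gamma_pdf(x, n / 2, 0.5)) < 1e-12
```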

626.6

continuous density function

Let X be a continuous random variable. The function f_X : R → [0, ∞) defined as f_X(x) = dF_X(x)/dx, where F_X(x) is the cumulative distribution function of X, is called the continuous density function of X.

Please note that if X is a continuous random variable, then f_X(x) does NOT equal P[X = x] (for more information read the topic on cumulative distribution functions).

Analogously to the discrete case, this function must satisfy:

1. f_X(x) ≥ 0 for all x
2. ∫ f_X(x) dx = 1

Version: 2 Owner: Riemann Author(s): Riemann

626.7

expected value

Consider a discrete random variable X. The expected value, or expectation, of X, denoted E[X], is a weighted average of the possible values of X by their corresponding probabilities. This is:

E[X] = Σ_x x f_X(x)

(recall that f_X(x) is the density function of X).

If X is a continuous random variable, then E[X] = ∫ x f_X(x) dx.

Note that the expectation does not always exist (if the corresponding sum or integral does not converge, the expectation does not exist; one example of this situation is the Cauchy random variable). The expectation of a random variable gives an idea of "what to expect" from repeated outcomes of the experiment represented by X.

Expectation is a linear function. As such, it satisfies E[aX + b] = aE[X] + b.

Consider Y = g(X). Then Y is a new random variable. The expectation of Y can be computed easily:

E[Y] = E[g(X)] = Σ_x g(x) f_X(x), if X is discrete, or
E[Y] = E[g(X)] = ∫ g(x) f_X(x) dx, if X is continuous.

Expectation is the first moment of a random variable (for more information about moments of a random variable, read the related topic).

Technical note: An expectation is a special case of the Riemann–Stieltjes integral ∫ x dF_X. If F_X(x) is differentiable, then that integral turns out to be ∫ x f_X(x) dx. Version: 7 Owner: mathwizard Author(s): mathwizard, Riemann
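The linearity property and the formula E[g(X)] = Σ g(x) f_X(x) can be illustrated with a small made-up discrete density (not part of the original entry):

```python
# Check E[aX + b] = aE[X] + b and E[g(X)] = sum g(x) f(x)
# for a small illustrative discrete density.
f = {1: 0.2, 2: 0.5, 3: 0.3}           # density of X; probabilities sum to 1
assert abs(sum(f.values()) - 1.0) < 1e-12

E = lambda g: sum(g(x) * p for x, p in f.items())

mean = E(lambda x: x)                   # E[X] = 0.2 + 1.0 + 0.9 = 2.1
a, b = 4.0, -1.0
assert abs(E(lambda x: a * x + b) - (a * mean + b)) < 1e-12  # linearity

second = E(lambda x: x**2)              # second moment: 0.2 + 2.0 + 2.7 = 4.9
assert abs(second - 4.9) < 1e-12
```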

626.8

geometric random variable

A geometric random variable with parameter p ∈ [0, 1] is one whose density function is given by

f_X(x) = p(1 − p)^x,   x ∈ {0, 1, 2, ...}.

This is denoted by X ∼ Geo(p).

Notes:

1. A standard application of geometric random variables is where X represents the number of failed Bernoulli trials before the first success.
2. The expected value of a geometric random variable is given by E[X] = (1 − p)/p, and the variance by Var[X] = (1 − p)/p².
3. The moment generating function of a geometric random variable is given by M_X(t) = p / (1 − (1 − p)e^t).

Version: 4 Owner: mathcam Author(s): mathcam, Riemann
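The mean formula can be checked by truncating the defining series for a sample p (an illustrative sketch):

```python
# Check that the geometric density f(x) = p(1-p)^x sums to 1 and has
# mean (1-p)/p, by truncating the series (sample parameter p).
p = 0.4
f = lambda x: p * (1 - p) ** x

N = 200  # (1-p)^200 is negligibly small, so truncation error is tiny
mass = sum(f(x) for x in range(N))
mean = sum(x * f(x) for x in range(N))

assert abs(mass - 1.0) < 1e-12
assert abs(mean - (1 - p) / p) < 1e-12
```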

626.9

proof of Bayes’ Theorem

The proof of Bayes' theorem is no more than an exercise in substituting the definition of conditional probability into the formula, and applying the total probability theorem:

P{B} P{E|B} / Σ_i P{A_i} P{E|A_i} = P{E ∩ B} / Σ_i P{E ∩ A_i} = P{E ∩ B} / P{E} = P{B|E}.

Version: 2 Owner: ariels Author(s): ariels


626.10

random variable

Let A be a σ-algebra and Ω the space of events relative to the experiment. A function X : (Ω, A, P) → R is a random variable if for every subset A_r = {ω : X(ω) ≤ r}, r ∈ R, the condition A_r ∈ A is satisfied.

A random variable X is said to be discrete if the set {X(ω) : ω ∈ Ω} (i.e. the range of X) is finite or countable.

A random variable Y is said to be continuous if it has a cumulative distribution function which is absolutely continuous.

Example: Consider the experiment of throwing a coin. Thus Ω = {H, T}, where H is the event in which the coin falls heads and T the event in which it falls tails. Let X = number of tails in the experiment. Then X is a (discrete) random variable. Version: 9 Owner: mathcam Author(s): mathcam, Riemann

626.11

uniform (continuous) random variable

X is a uniform (continuous) random variable with parameters a and b if

fX(x) = 1/(b − a),  x ∈ [a, b].

Parameters:

? a, b ∈ R, a < b

Notes:

1. Also called the rectangular distribution; it considers that all points in the interval [a, b] have the same mass.

2. E[X] = (a + b)/2

3. Var[X] = (b − a)²/12

4. MX(t) = (e^{bt} − e^{at}) / ((b − a)t)

Version: 3 Owner: Riemann Author(s): Riemann

626.12

uniform (discrete) random variable

X is a uniform (discrete) random variable with parameter N if

fX(x) = 1/N,  x ∈ {1, 2, ..., N}.

Parameters:

? N ∈ {1, 2, ...}

syntax: X ∼ U{N}

Notes:

1. X represents the experiment in which all N outcomes are equally likely to occur.

2. E[X] = (N + 1)/2

3. Var[X] = (N² − 1)/12

4. MX(t) = (1/N) Σ_{j=1}^{N} e^{jt}

Version: 2 Owner: Riemann Author(s): Riemann
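As a quick sanity check (not part of the original entry), the mean and variance formulas for the discrete uniform distribution can be verified by direct summation:

```python
# Check E[X] = (N + 1)/2 and Var[X] = (N^2 - 1)/12 for the
# discrete uniform distribution on {1, ..., N}.
N = 10
support = range(1, N + 1)

mean = sum(x / N for x in support)                 # direct first moment
var = sum((x - mean) ** 2 / N for x in support)    # direct second central moment

mean_formula = (N + 1) / 2
var_formula = (N * N - 1) / 12
```

For N = 10 both computations give mean 5.5 and variance 8.25.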


Chapter 627 60A05 – Axioms; other general questions 627.1

example of pairwise independent events that are not totally independent

Consider a fair tetrahedral die whose sides are painted red, green, blue, and white (the white side counting as containing all three color components). Roll the die. Let Xr, Xg, Xb be the events that the die falls on a side that has a red, green, or blue color component, respectively, and let Xw be the event that it falls on the white side. Then

P(Xr) = P(Xg) = P(Xb) = 1/2,

and each pair of events is independent, e.g.

P(Xr ∩ Xg) = P(Xw) = 1/4 = P(Xr)P(Xg),

but

P(Xr ∩ Xg ∩ Xb) = 1/4 ≠ 1/8 = P(Xr)P(Xg)P(Xb).

Version: 2 Owner: bbukh Author(s): bbukh
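The example above can be verified exhaustively; this sketch (my own illustration) enumerates the four sides, with the white side modeled as carrying all three color components, and computes the relevant probabilities exactly with rational arithmetic.

```python
from fractions import Fraction

# The four sides and the color components each carries;
# the white side is modeled as containing red, green, and blue.
sides = [{"r"}, {"g"}, {"b"}, {"r", "g", "b"}]
P = Fraction(1, 4)  # fair tetrahedral die

def prob(colors):
    """Probability the die lands on a side containing all given components."""
    return sum(P for s in sides if colors <= s)

p_r, p_g, p_b = prob({"r"}), prob({"g"}), prob({"b"})
p_rg = prob({"r", "g"})          # pairwise intersection
p_rgb = prob({"r", "g", "b"})    # triple intersection
```

Pairwise independence holds (1/4 = 1/2 · 1/2) while total independence fails (1/4 ≠ 1/8).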

627.2

independent

In a probability space, we say that the random events A1, ..., An are independent if

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) ... P(An),

and the same product rule holds for every subcollection of them. An arbitrary family of random events is independent if every finite subfamily is independent. The random variables X1, ..., Xn are independent if, given any Borel sets B1, ..., Bn, the random events [X1 ∈ B1], ..., [Xn ∈ Bn] are independent. This is equivalent to saying that FX1,...,Xn = FX1 ... FXn, where FX1, ..., FXn are the distribution functions of X1, ..., Xn, respectively, and FX1,...,Xn is the joint distribution function. When the density functions fX1, ..., fXn and fX1,...,Xn exist, an equivalent condition for independence is that fX1,...,Xn = fX1 ... fXn. An arbitrary family of random variables is independent if every finite subfamily is independent. Version: 3 Owner: Koro Author(s): Koro, akrowne

627.3

random event

In a probability space, measurable sets are usually called (random) events. Version: 1 Owner: Koro Author(s): Koro


Chapter 628 60A10 – Probabilistic measure theory 628.1

Cauchy random variable

X is a Cauchy random variable with parameters θ ∈ R and β > 0, commonly denoted X ∼ Cauchy(θ, β), if

fX(x) = 1 / (πβ[1 + ((x − θ)/β)²]).

Cauchy random variables are used primarily for theoretical purposes, the key point being that the values E[X] and Var[X] are undefined for Cauchy random variables. Version: 5 Owner: mathcam Author(s): mathcam, Riemann

628.2

almost surely

Let (Ω, B, µ) be a probability space. A condition holds almost surely on Ω if it holds "with probability 1", i.e. if it holds everywhere except for a subset of Ω with measure 0. For example, let X and Y be nonnegative random variables on Ω. Suppose we want the condition

∫_Ω X dµ ≤ ∫_Ω Y dµ   (628.2.1)

to hold. Certainly X(ω) ≤ Y(ω) for all ω ∈ Ω would work. But in fact it is enough to have

X(λ) ≤ Y(λ) for all λ ∈ Λ ⊆ Ω,   (628.2.2)

where

µ(Λ) = µ(Ω) = 1.   (628.2.3)

If Ω = [0, 1], then Y might be less than X on the Cantor set, an uncountable set with measure 0, and still satisfy the condition. We say that X ≤ Y almost surely (often abbreviated a.s.). In fact, we need only have that X and Y are almost surely nonnegative as well. Note that this term is the probabilistic equivalent of the term "almost everywhere" from non-probabilistic measure theory. Version: 3 Owner: mathcam Author(s): mathcam, drummond


Chapter 629 60A99 – Miscellaneous 629.1

Borel-Cantelli lemma

Let A1, A2, ... be random events in a probability space.

1. If Σ_{n=1}^{∞} P(An) < ∞, then P(An i. o.) = 0.

2. If A1, A2, ... are independent, and Σ_{n=1}^{∞} P(An) = ∞, then P(An i. o.) = 1.

Here A = [An i. o.] represents the event "An happens for infinitely many values of n." Formally, A = lim sup An, which is a limit superior of sets. Version: 3 Owner: Koro Author(s): Koro

629.2

Chebyshev’s inequality

Let X ∈ L² be a real-valued random variable with mean µ = E X and variance σ² = Var X. Then for any standard of accuracy t > 0,

P{|X − µ| ≥ t} ≤ σ²/t².

Note: There is another Chebyshev's inequality, which is unrelated. Version: 2 Owner: ariels Author(s): ariels
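The bound can be checked exactly on a small discrete distribution (my own illustrative example, not from the entry): compute the tail probability P(|X − µ| ≥ t) directly and compare it with σ²/t².

```python
# Exact check of Chebyshev's inequality P(|X - mu| >= t) <= sigma^2 / t^2
# on an arbitrary discrete distribution.
pmf = {-2: 0.1, 0: 0.8, 2: 0.1}

mu = sum(x * p for x, p in pmf.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())

t = 1.5
tail = sum(p for x, p in pmf.items() if abs(x - mu) >= t)
bound = sigma2 / t ** 2
```

Here µ = 0, σ² = 0.8, the true tail probability is 0.2, and the Chebyshev bound is 0.8/2.25 ≈ 0.356, so the inequality holds with room to spare.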


629.3

Markov’s inequality

For a non-negative random variable X and a standard of accuracy d > 0, Markov's inequality states that

P[X ≥ d] ≤ (1/d) E[X].

Version: 2 Owner: aparna Author(s): aparna

629.4

cumulative distribution function

Let X be a random variable. Define FX : R → [0, 1] as FX(x) = Pr[X ≤ x] for all x. The function FX(x) is called the cumulative distribution function of X. Every cumulative distribution function satisfies the following properties:

1. lim_{x→−∞} FX(x) = 0 and lim_{x→+∞} FX(x) = 1,

2. FX is a monotonically nondecreasing function,

3. FX is continuous from the right,

4. Pr[a < X ≤ b] = FX(b) − FX(a).

If X is a discrete random variable, then the cumulative distribution can be expressed as FX(x) = Σ_{k≤x} Pr[X = k]. Similarly, if X is a continuous random variable, then FX(x) = ∫_{−∞}^{x} fX(y) dy, where fX is the density function. Version: 4 Owner: bbukh Author(s): bbukh, Riemann

629.5

limit superior of sets

Let A1, A2, ... be a sequence of sets. The limit superior of sets is defined by

lim sup An = ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ak.

It is easy to see that x ∈ lim sup An if and only if x ∈ An for infinitely many values of n. Because of this, in probability theory the notation [An i. o.] is often used to refer to lim sup An, where i.o. stands for infinitely often.

The limit inferior of sets is defined by

lim inf An = ⋃_{n=1}^{∞} ⋂_{k=n}^{∞} Ak,

and it can be shown that x ∈ lim inf An if and only if x belongs to An for all values of n large enough. Version: 3 Owner: Koro Author(s): Koro
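The two definitions can be illustrated with an eventually periodic sequence of sets, where the tail unions and intersections stabilize so a finite truncation already gives the true lim sup and lim inf (this demo is my own, not from the entry):

```python
# A_n = {0, 1} for even n, {0} for odd n.
# Then lim sup A_n = {0, 1} (1 occurs infinitely often)
# and  lim inf A_n = {0}   (only 0 is eventually always present).
def A(n):
    return {0, 1} if n % 2 == 0 else {0}

HORIZON = 20  # enough for this periodic sequence: tails repeat

def tail_union(n):          # union of A_k for k >= n (truncated)
    out = set()
    for k in range(n, HORIZON):
        out |= A(k)
    return out

def tail_intersection(n):   # intersection of A_k for k >= n (truncated)
    out = set(A(n))
    for k in range(n, HORIZON):
        out &= A(k)
    return out

limsup = set.intersection(*[tail_union(n) for n in range(HORIZON // 2)])
liminf = set.union(*[tail_intersection(n) for n in range(HORIZON // 2)])
```

The element 1 belongs to An for infinitely many n, so it lands in the lim sup but not the lim inf.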

629.6

proof of Chebyshev’s inequality

The proof of Chebyshev's inequality follows from the application of Markov's inequality. Define Y = (X − µ)². Then Y ≥ 0 is a random variable in L¹, and E Y = Var X = σ². Applying Markov's inequality to Y, we see that

P{|X − µ| ≥ t} = P{Y ≥ t²} ≤ (1/t²) E Y = σ²/t².

Version: 1 Owner: ariels Author(s): ariels

629.7

proof of Markov’s inequality

Define

Y = d if X ≥ d, and Y = 0 otherwise.

Then 0 ≤ Y ≤ X. Additionally, it follows immediately from the definition that Y is a random variable (i.e., that it is measurable). Computing the expected value of Y, we have that

E X ≥ E Y = d · P{X ≥ d},

and the inequality follows. Version: 2 Owner: ariels Author(s): ariels


Chapter 630 60E05 – Distributions: general theory 630.1

Cramér-Wold theorem

Let Xn = (Xn1, ..., Xnk) and X = (X1, ..., Xk) be k-dimensional random vectors. Then Xn converges to X in distribution if and only if

Σ_{i=1}^{k} ti Xni →D Σ_{i=1}^{k} ti Xi  as n → ∞,

for each (t1, ..., tk) ∈ R^k. That is, if every fixed linear combination of the coordinates of Xn converges in distribution to the corresponding linear combination of the coordinates of X. Version: 1 Owner: Koro Author(s): Koro

630.2

Helly-Bray theorem

Let F, F1, F2, ... be distribution functions. If Fn converges weakly to F, then

∫_R g(x) dFn(x) → ∫_R g(x) dF(x)  as n → ∞,

for each continuous bounded function g : R → R. Remark. The integrals involved are Riemann-Stieltjes integrals. Version: 3 Owner: Koro Author(s): Koro


Figure 630.1: A typical Zipf-law rank distribution. The y-axis represents occurrence frequency, and the x-axis represents rank (highest at the left)

630.3

Scheffé's theorem

Let X, X1, X2, ... be continuous random variables in a probability space, whose probability density functions are f, f1, f2, ..., respectively. If fn → f almost everywhere (relative to Lebesgue measure), then Xn converges to X in distribution: Xn →D X. Version: 2 Owner: Koro Author(s): Koro

630.4

Zipf's law

Zipf's law (named for Harvard linguistics professor George Kingsley Zipf) models the occurrence of distinct objects in particular sorts of collections. Zipf's law says that the ith most frequent object will appear 1/i^θ times the frequency of the most frequent object, or that the ith most frequent object from an object "vocabulary" of size V occurs

O(i) = n / (i^θ Hθ(V))

times in a collection of n objects, where Hθ(V) is the harmonic number of order θ of V.

Zipf’s law typically holds when the “objects” themselves have a property (such as length or size) which is modelled by an exponential distribution or other skewed distribution that places restrictions on how often “larger” objects can occur. An example of where Zipf’s law applies is in English texts, to frequency of word occurrence. The commonality of English words follows an exponential distribution, and the nature of communication is such that it is more efficient to place emphasis on using shorter words. Hence the most common words tend to be short and appear often, following Zipf’s law. The value of θ typically ranges between 1 and 2, and is between 1.5 and 2 for the English text case. Another example is the populations of cities. These follow Zipf’s law, with a few very populous cities, falling off to very numerous cities with a small population. In this case,


there are societal forces which supply the same type of “restrictions” that limited which length of English words are used most often. A final example is the income of companies. Once again the ranked incomes follow Zipf’s law, with competition pressures limiting the range of incomes available to most companies and determining the few most successful ones. The underlying theme is that efficiency, competition, or attention with regards to resources or information tends to result in Zipf’s law holding to the ranking of objects or datum of concern.

630.4.1

References

• References on Zipf’s law - http://linkage.rockefeller.edu/wli/zipf/ • Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, ACM Press, 1999. Version: 4 Owner: akrowne Author(s): akrowne
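The occurrence formula above can be made concrete with a short sketch (parameter values are illustrative): generate the expected counts for a vocabulary, and check that they sum to the collection size n and fall off as 1/i^θ.

```python
# Expected occurrence counts under Zipf's law with exponent theta,
# vocabulary size V, collection size n (all values illustrative).
theta, V, n = 1.5, 100, 10_000

# Harmonic number of order theta of V
H_theta = sum(1 / i ** theta for i in range(1, V + 1))

def occurrences(i):
    """Expected count of the i-th most frequent object."""
    return n / (i ** theta * H_theta)

counts = [occurrences(i) for i in range(1, V + 1)]
total = sum(counts)                       # should equal n
ratio = occurrences(2) / occurrences(1)   # should equal 1 / 2**theta
```

The second-ranked object occurs 1/2^θ as often as the first, as the law states.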

630.5

binomial distribution

Consider an experiment with two possible outcomes (success and failure), which happen randomly. Let p be the probability of success. If the experiment is repeated n times, the probability of having exactly x successes is

f(x) = C(n, x) p^x (1 − p)^{n−x},

where C(n, x) denotes the binomial coefficient. The distribution determined by the probability function f(x) is called a Bernoulli distribution or binomial distribution. Here are some plots for f(x) with n = 20 and p = 0.3, p = 0.5. The corresponding distribution function is

F(x) = Σ_{k≤x} C(n, k) p^k q^{n−k},

where q = 1 − p. Notice that if we calculate F(n) we get the binomial expansion of (p + q)^n, and this is the reason for the distribution being called binomial. We will use the moment generating function to calculate the mean and variance for the distribution. The mentioned function is

G(t) = Σ_{x=0}^{n} e^{tx} C(n, x) p^x q^{n−x},

which simplifies to G(t) = (pe^t + q)^n. Differentiating gives us

G′(t) = n(pe^t + q)^{n−1} pe^t,

and therefore the mean is µ = E[X] = G′(0) = np. Now for the variance we need the second derivative,

G″(t) = n(n − 1)(pe^t + q)^{n−2}(pe^t)² + n(pe^t + q)^{n−1} pe^t,

so we get E[X²] = G″(0) = n(n − 1)p² + np, and finally the variance (recall q = 1 − p):

σ² = E[X²] − E[X]² = npq.

For large values of n, the binomial coefficients are hard to compute; however, in these cases we can use either the Poisson distribution or the normal distribution to approximate the probabilities. Version: 11 Owner: drini Author(s): drini
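The mean np and variance npq derived above via the moment generating function can be confirmed by computing the pmf directly (a sketch of mine, using the n = 20, p = 0.3 case mentioned in the entry):

```python
from math import comb

# Binomial pmf, with direct checks of mu = np and sigma^2 = npq.
n, p = 20, 0.3
q = 1 - p

pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

total = sum(pmf)                                            # (p + q)^n = 1
mean = sum(x * f for x, f in enumerate(pmf))                # should be np
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))   # should be npq
```

For these parameters the mean is 6 and the variance 4.2.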

630.6

convergence in distribution

A sequence of distribution functions F1, F2, ... converges weakly to a distribution function F if Fn(t) → F(t) for each point t at which F is continuous. If the random variables X, X1, X2, ... have associated distribution functions F, F1, F2, ..., respectively, then we say that Xn converges in distribution to X, and denote this by Xn →D X. This definition holds for joint distribution functions and random vectors as well. This is probably the weakest type of convergence of random variables, but it has proved to be very useful. Some results involving this type of convergence are the central limit theorems, the Helly-Bray theorem, the Paul Lévy continuity theorem, the Cramér-Wold theorem and Scheffé's theorem. Version: 7 Owner: Koro Author(s): Koro


630.7

density function

Let X be a discrete random variable with sample space {x1, x2, ...}. Let pk be the probability of X taking the value xk. The function

f(x) = pk if x = xk, and f(x) = 0 otherwise,

is called the probability function or density function. It must hold that

Σ_{j=1}^{∞} f(xj) = 1.

If the density function for a random variable is known, we can calculate the probability of X being on a certain interval:

P[a < X ≤ b] = Σ_{a < xj ≤ b} f(xj) = Σ_{a < xj ≤ b} pj.

The definition can be extended to continuous random variables in a direct way: let px be the probability of X taking a particular value x, and then make f(x) equal to px (and 0 if x is not in the sample space). In this case, the probability of X being on a given interval is calculated with an integral instead of a summation:

P[a < X ≤ b] = ∫_a^b f(x) dx.

For a more formal approach using measure theory, look at the probability distribution function entry. Version: 6 Owner: drini Author(s): drini

630.8

distribution function

Let X be a real random variable with density function f(x). For each number x we can consider the probability of X taking a value smaller than or equal to x. Such a probability depends on the particular value of x, so it is a function of x:

F(x) = P[X ≤ x].

If X is a discrete random variable, we then have

F(x) = Σ_{xj ≤ x} f(xj),

and if X is a continuous random variable, then

F(x) = ∫_{−∞}^{x} f(t) dt.

Due to the properties of integrals and summations, we can use the distribution function to calculate the probability of X being on a given interval:

P[a < X ≤ b] = F(b) − F(a).

Two special cases arise from the definition: P[X ≤ b] = F(b), and P[a < X] = 1 − F(a), since the complement of the interval (a, ∞) is the interval (−∞, a] and together they cover the whole sample space. For the continuous case, we have a relationship linking the density function and the distribution function: F′(x) = f(x). Version: 3 Owner: drini Author(s): drini

630.9

geometric distribution

A random experiment has two possible outcomes: success, with probability p, and failure, with probability q = 1 − p. The experiment is repeated until a success happens. The number of trials before the success is a random variable X with density function

f(x) = q^x p.

The distribution determined by f(x) is called a geometric distribution with parameter p, and it is given by

F(x) = Σ_{k≤x} q^k p.

The picture shows the graph for f(x) with p = 0.4. Notice the quick decreasing. An interpretation is that a long run of failures is very unlikely. We can use the moment generating function method in order to get the mean and variance. This function is

G(t) = Σ_{k=0}^{∞} e^{tk} q^k p = p Σ_{k=0}^{∞} (e^t q)^k.

The last expression can be simplified as

G(t) = p / (1 − e^t q).

The first derivative is

G′(t) = e^t pq / (1 − e^t q)²,

so the mean is

µ = E[X] = G′(0) = q/p.

In order to find the variance, we get the second derivative and thus

E[X²] = G″(0) = 2q²/p² + q/p,

and therefore the variance is

σ² = E[X²] − E[X]² = G″(0) − G′(0)² = q/p².

Version: 5 Owner: drini Author(s): drini
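The mean q/p and variance q/p² obtained from the moment generating function can be double-checked by summing the pmf directly; the sketch below (my own, with the p = 0.4 case) truncates the infinite sums where the tail is astronomically small.

```python
# Geometric distribution f(x) = q^x p: numerical check of
# E[X] = q/p and Var[X] = q/p^2 via truncated sums.
p = 0.4
q = 1 - p
CUTOFF = 200  # q**200 is ~1e-44, so the tail is negligible

pmf = [q ** x * p for x in range(CUTOFF)]
mean = sum(x * f for x, f in enumerate(pmf))
second = sum(x * x * f for x, f in enumerate(pmf))   # E[X^2]
var = second - mean ** 2
```

For p = 0.4 this gives mean 1.5 and variance 3.75, matching q/p and q/p².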

630.10

relative entropy

Let p and q be probability distributions with supports X and Y, respectively. The relative entropy or Kullback-Leibler distance between two probability distributions p and q is defined as

D(p||q) := Σ_{x∈X} p(x) log (p(x)/q(x)).   (630.10.1)

While D(p||q) is often called a distance, it is not a true metric because it is not symmetric and does not satisfy the triangle inequality. However, we do have D(p||q) ≥ 0, with equality iff p = q.


−D(p||q) = −Σ_{x∈X} p(x) log (p(x)/q(x))
         = Σ_{x∈X} p(x) log (q(x)/p(x))
         ≤ log ( Σ_{x∈X} p(x) · q(x)/p(x) )
         = log ( Σ_{x∈X} q(x) )
         ≤ log ( Σ_{x∈Y} q(x) )
         = 0,

where the first inequality follows from the concavity of log(x) and the second from expanding the sum over the support of q rather than p. Relative entropy also comes in a continuous version which looks just as one might expect. For continuous distributions f and g, with Sf the support of f, we have

D(f||g) := ∫_{Sf} f log(f/g).   (630.10.8)

Version: 4 Owner: mathcam Author(s): mathcam, drummond
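A small numerical companion (my own, with made-up distributions): compute D(p||q) directly from the definition, and observe the two properties mentioned above, nonnegativity with equality iff p = q, and asymmetry.

```python
from math import log

# Kullback-Leibler distance D(p||q) for two small distributions.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

def kl(p, q):
    """D(p||q) = sum over supp(p) of p(x) log(p(x)/q(x))."""
    return sum(px * log(px / q[x]) for x, px in p.items())

d_pq = kl(p, q)   # positive, since p != q
d_qp = kl(q, p)   # differs from d_pq: KL is not symmetric
d_pp = kl(p, p)   # zero
```

Here D(p||q) ≈ 0.511 while D(q||p) ≈ 0.368, illustrating why KL is not a true metric.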

630.11

Paul Lévy continuity theorem

Let F1, F2, ... be distribution functions with characteristic functions φ1, φ2, ..., respectively. If φn converges pointwise to a limit φ, and if φ(t) is continuous at t = 0, then there exists a distribution function F such that Fn → F weakly, and the characteristic function associated to F is φ. Remark. The converse of this theorem is a simple corollary to the Helly-Bray theorem; hence Fn → F weakly if and only if φn → φ pointwise. But this theorem says something stronger than the sufficiency of that proposition: it says that the limit of a sequence of characteristic functions is a characteristic function whenever it is continuous at 0. Version: 2 Owner: Koro Author(s): Koro


630.12

characteristic function

Let X be a random variable. The characteristic function of X is a function φX : R → C defined by

φX(t) = E e^{itX} = E cos(tX) + iE sin(tX),

that is, φX(t) is the expectation of the random variable e^{itX}. Given a random vector X = (X1, ..., Xn), the characteristic function of X, also called the joint characteristic function of X1, ..., Xn, is a function φX : R^n → C defined by φX(t) = E e^{it·X}, where t = (t1, ..., tn) and t·X = t1X1 + ··· + tnXn (the dot product.) Remark. If FX is the distribution function associated to X, by the properties of expectation we have

φX(t) = ∫_R e^{itx} dFX(x),

which is known as the Fourier-Stieltjes transform of FX, and provides an alternate definition of the characteristic function. From this, it is clear that the characteristic function depends only on the distribution function of X, hence one can define the characteristic function associated to a distribution even when there is no random variable involved. This implies that two random variables with the same distribution must have the same characteristic function. It is also true that each characteristic function determines a unique distribution; hence the name, since it characterizes the distribution function (see property 6.)

Properties

1. The characteristic function is bounded by 1, i.e. |φX(t)| ≤ 1 for all t;

2. φX(0) = 1;

3. φX(−t) is the complex conjugate of φX(t);

4. φX is uniformly continuous in R;

5. If X and Y are independent random variables, then φ_{X+Y} = φX φY;

6. The characteristic function determines the distribution function; hence, φX = φY if and only if FX = FY. This is a consequence of the inversion formula: given a random variable X with characteristic function φ and distribution function F, if x and y are continuity points of F such that x < y, then

F(y) − F(x) = (1/2π) lim_{s→∞} ∫_{−s}^{s} ((e^{−itx} − e^{−ity})/(it)) φ(t) dt;

7. A random variable X has a symmetric distribution (i.e. one such that FX = F−X) if and only if φX(t) ∈ R for all t ∈ R;

8. For real numbers a, b, φ_{aX+b}(t) = e^{itb} φX(at);

9. If E|X|^n < ∞, then φX has continuous n-th derivatives and

φX^{(k)}(t) = (d^k/dt^k) φX(t) = ∫_R (ix)^k e^{itx} dFX(x),  1 ≤ k ≤ n.

Particularly, φX^{(k)}(0) = i^k E X^k; characteristic functions are similar to moment generating functions in this sense.

Similar properties hold for joint characteristic functions. Another important result related to characteristic functions is the Paul Lévy continuity theorem. Version: 1 Owner: Koro Author(s): Koro

630.13

Kolmogorov’s inequality

Let X1, ..., Xn be independent random variables in a probability space, such that E Xk = 0 and Var Xk < ∞ for k = 1, ..., n. Then, for each λ > 0,

P( max_{1≤k≤n} |Sk| ≥ λ ) ≤ (1/λ²) Var Sn = (1/λ²) Σ_{k=1}^{n} Var Xk,

where Sk = X1 + ··· + Xk. Version: 2 Owner: Koro Author(s): Koro

630.14

discrete density function

Let X be a discrete random variable. The function fX : R → [0, 1] defined as fX(x) = P[X = x] is called the discrete probability function of X. Sometimes the syntax pX(x) is used, to mark the difference between this function and the continuous density function. If X has discrete density function fX(x), it is said that the random variable X has the distribution or is distributed fX(x), and this fact is denoted X ∼ fX(x). Discrete density functions are required to satisfy the following properties:

• fX(x) ≥ 0 for all x;

• Σ_x fX(x) = 1.

Version: 11 Owner: drini Author(s): drini, yark, Riemann

630.15

probability distribution function

630.15.1

Definition

Let (Ω, B, µ) be a measure space, and let (R, Bλ, λ) be the measure space of real numbers with Lebesgue measure λ. A probability distribution function on Ω is a function f : Ω → R such that:

1. f is measurable,

2. f is nonnegative almost everywhere with respect to µ,

3. f satisfies the equation ∫_Ω f(x) dµ = 1.

The main feature of a probability distribution function is that it induces a probability measure P on the measurable space (Ω, B), given by

P(X) := ∫_X f(x) dµ = ∫_Ω 1_X f(x) dµ,

for all X ∈ B. The measure P is called the associated probability measure of f. Note that P and µ are different measures, even though they both share the same underlying measurable space (Ω, B).

630.15.2

Examples

630.15.3

Discrete case

Let I be a countable set, and impose the counting measure on I (µ(A) := |A|, the cardinality of A, for any subset A ⊂ I). A probability distribution function on I is then a nonnegative function f : I → R satisfying the equation

Σ_{i∈I} f(i) = 1.

One example is the Poisson distribution Pr on N (for any real number r > 0), which is given by

Pr(i) := e^{−r} r^i / i!

for any i ∈ N. Given any probability space (Ω, B, µ) and any random variable X : Ω → I, we can form a distribution function on I by taking f(i) := µ({X = i}). The resulting function is called the distribution of X on I.
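As a numerical sanity check on the Poisson example (a sketch of mine; r is arbitrary), the pmf values should sum to 1 over N; truncating the sum far past r makes the tail negligible. The mean, which equals r for a Poisson distribution, is computed the same way.

```python
from math import exp, factorial

# Poisson distribution P_r(i) = e^{-r} r^i / i!: check normalization
# and the mean over a truncation of N.
r = 3.0
CUTOFF = 100  # tail beyond 100 is negligible for r = 3

def poisson(i):
    return exp(-r) * r ** i / factorial(i)

total = sum(poisson(i) for i in range(CUTOFF))        # should be ~1
mean = sum(i * poisson(i) for i in range(CUTOFF))     # should be ~r
```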

630.15.4

Continuous case

Suppose (Ω, B, µ) equals (R, Bλ, λ). Then a probability distribution function f : R → R is simply a measurable, nonnegative almost everywhere function such that

∫_{−∞}^{∞} f(x) dx = 1.

The associated measure has Radon–Nikodym derivative with respect to λ equal to f:

dP/dλ = f.

One defines the cumulative distribution function F of f by the formula

F(x) := P({X ≤ x}) = ∫_{−∞}^{x} f(t) dt,

for all x ∈ R. A well known example of a probability distribution function on R is the Gaussian distribution, or normal distribution,

f(x) := (1/(σ√(2π))) e^{−(x−m)²/2σ²}.

Version: 3 Owner: drummond Author(s): djao, drummond


Chapter 631 60F05 – Central limit and other weak theorems 631.1

Lindeberg’s central limit theorem

Theorem (Lindeberg's central limit theorem) Let X1, X2, ... be independent random variables with distribution functions F1, F2, ..., respectively, such that E Xn = µn and Var Xn = σn² < ∞, with at least one σn > 0. Let

Sn = X1 + ··· + Xn  and  sn = √(Var Sn) = √(σ1² + ··· + σn²).

Then the normalized partial sums (Sn − E Sn)/sn converge in distribution to a random variable with normal distribution N(0, 1) (i.e. the normal convergence holds), if the following Lindeberg condition is satisfied:

∀ε > 0,  lim_{n→∞} (1/sn²) Σ_{k=1}^{n} ∫_{|x−µk|>εsn} (x − µk)² dFk(x) = 0.

Corollary 1 (Lyapunov's central limit theorem) If the Lyapunov condition

(1/sn^{2+δ}) Σ_{k=1}^{n} E|Xk − µk|^{2+δ} → 0  as n → ∞

is satisfied for some δ > 0, the normal convergence holds.

Corollary 2 If X1, X2, ... are identically distributed random variables, E Xn = µ and Var Xn = σ², with 0 < σ < ∞, then the normal convergence holds; i.e. (Sn − nµ)/(σ√n) converges in distribution to a random variable with distribution N(0, 1).

Reciprocal (Feller) The converse of Lindeberg's central limit theorem holds under the following additional assumption:

max_{1≤k≤n} (σk²/sn²) → 0  as n → ∞.

Historical remark The normal distribution was historically called the law of errors. It was used by Gauss to model errors in astronomical observations, which is why it is usually referred to as the Gaussian distribution. Gauss derived the normal distribution not as a limit of sums of independent random variables, but from the consideration of certain "natural" hypotheses for the distribution of errors; e.g. considering the arithmetic mean of the observations to be the "most probable" value of the quantity being observed. Nowadays, the central limit theorem supports the use of the normal distribution as a distribution of errors, since in many real situations it is possible to consider the error of an observation as the result of many independent small errors. There are, too, many situations which are not subject to observational errors in which the use of the normal distribution can still be justified by the central limit theorem. For example, the distribution of heights of mature men of a certain age can be considered normal, since height can be seen as the sum of many small and independent effects. The normal distribution did not have its origins with Gauss. It appeared, at least discretely, in the work of De Moivre, who proved the central limit theorem for the case of Bernoulli trials with p = 1/2 (e.g. when the n-th random variable is the result of tossing a coin.) Version: 15 Owner: Koro Author(s): Koro
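Corollary 2 lends itself to a quick Monte Carlo illustration (my own sketch; sample sizes and the uniform(0,1) choice are arbitrary): the normalized sums should behave approximately like N(0, 1), whose variance is 1 and which puts about 68.3% of its mass within one standard deviation.

```python
import random

# Normalized sums of iid uniform(0,1) variables, which have
# mu = 1/2 and sigma = sqrt(1/12). Seeded for reproducibility.
random.seed(0)

n, trials = 400, 2000
mu = 0.5
sigma = (1 / 12) ** 0.5

normalized = []
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    normalized.append((s - n * mu) / (sigma * n ** 0.5))

sample_mean = sum(normalized) / trials
sample_var = sum(z * z for z in normalized) / trials
frac_within_1 = sum(1 for z in normalized if abs(z) <= 1) / trials
```

The empirical mean is near 0, the variance near 1, and roughly 68% of the normalized sums fall in [−1, 1].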


Chapter 632 60F15 – Strong theorems 632.1

Kolmogorov’s strong law of large numbers

Let X1, X2, ... be a sequence of independent random variables with finite expectations. The strong law of large numbers holds if one of the following conditions is satisfied:

1. The random variables are identically distributed;

2. For each n, the variance of Xn is finite, and

Σ_{n=1}^{∞} Var(Xn)/n² < ∞.

Version: 3 Owner: Koro Author(s): Koro

632.2

strong law of large numbers

A sequence of random variables X1, X2, ... with finite expectations in a probability space is said to satisfy the strong law of large numbers if

(1/n) Σ_{k=1}^{n} (Xk − E Xk) → 0  almost surely,

where a.s. stands for convergence almost surely. When the random variables are identically distributed, with expectation µ, the law becomes

(1/n) Σ_{k=1}^{n} Xk → µ  almost surely.

Kolmogorov's strong law of large numbers theorems give conditions on the random variables under which the law is satisfied. Version: 5 Owner: Koro Author(s): Koro
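The identically distributed form of the law can be illustrated with a single long simulated realization (a sketch of mine; the Bernoulli parameter and sample size are arbitrary): the running average of iid Bernoulli(p) variables settles down to p.

```python
import random

# Strong law of large numbers sketch: running average of one long
# realization of iid Bernoulli(p) variables approaches p.
random.seed(42)

p = 0.3
n = 100_000
running_sum = 0
for _ in range(n):
    running_sum += 1 if random.random() < p else 0

average = running_sum / n   # should be close to p
```

With n = 100000 the average lands within a few thousandths of p = 0.3.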


Chapter 633 60G05 – Foundations of stochastic processes 633.1

stochastic process

Intuitively, if random variables are the stochastic version of (deterministic) variables then stochastic processes are the stochastic version of functions. Hence, they are also called random functions. Formalizing this notion, f : T × Ω → S is a stochastic process where Ω is the set of all possible states of the world, T is the index set (often time), and S is the state space. Often one takes T to be either N, or R, speaking respectively of a discrete- or continuous-time stochastic process. Another way to define a stochastic process is as a set of random variables {X(t), t ∈ T } where T again is the index set. Important stochastic processes are Markov chains, Wiener processes, and Poisson processes. Version: 4 Owner: nobody Author(s): yark, igor, Manoj, armbrusterb


Chapter 634 60G99 – Miscellaneous 634.1

stochastic matrix

Definition Let I be a finite or countable set, let P = (pij : i, j ∈ I) be a matrix, and let all pij be nonnegative. We say P is stochastic if

Σ_{i∈I} pij = 1

for every j ∈ I. We call P doubly stochastic if, in addition,

Σ_{j∈I} pij = 1.

Equivalently, P is stochastic if every column is a distribution, and doubly stochastic if, in addition, every row is a distribution. Stochastic and doubly stochastic matrices are common in discussions of random processes, particularly Markov chains. Version: 4 Owner: mathwizard Author(s): mathwizard, drummond


Chapter 635 60J10 – Markov chains with discrete parameter 635.1

Markov chain

Definition We begin with a probability space (Ω, F, P). Let I be a countable set, (Xn : n ∈ Z) be a collection of random variables taking values in I, T = (tij : i, j ∈ I) be a stochastic matrix, and λ be a distribution. We call (Xn )n≥0 a Markov chain with initial distribution λ and transition matrix T if: 1. X0 has distribution λ. 2. For n ≥ 0, P(Xn+1 = in+1 |X0 = i0 , . . . , Xn = in ) = tin in+1 . That is, the next value of the chain depends only on the current value, not any previous values. This is often summed up in the pithy phrase, “Markov chains have no memory.”

Discussion Markov chains are arguably the simplest examples of random processes. They come in discrete and continuous versions; the discrete version is presented above. Version: 1 Owner: drummond Author(s): drummond
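A minimal two-state simulation makes the definition concrete (my own sketch; the initial distribution, transition probabilities, and run length are all illustrative). Here the transition matrix is indexed by rows, matching the definition above where t_{i,j} = P(Xn+1 = j | Xn = i); the long-run state frequencies approach the stationary distribution.

```python
import random

# Two-state Markov chain with initial distribution lam and
# row-indexed transition matrix T. Seeded for reproducibility.
random.seed(1)

lam = [0.5, 0.5]           # initial distribution
T = [[0.9, 0.1],           # T[i][j] = P(next = j | current = i)
     [0.5, 0.5]]

def step(i):
    return 0 if random.random() < T[i][0] else 1

x = 0 if random.random() < lam[0] else 1
visits = [0, 0]
N = 50_000
for _ in range(N):
    visits[x] += 1
    x = step(x)             # next value depends only on the current one

freq0 = visits[0] / N
# The stationary distribution pi solves pi = pi T; here pi = (5/6, 1/6).
```

The empirical frequency of state 0 approaches 5/6 ≈ 0.833.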


Chapter 636 62-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 636.1

covariance

The covariance of two random variables X1 and X2, with means µ1 and µ2 respectively, is defined as

cov(X1, X2) := E[(X1 − µ1)(X2 − µ2)].   (636.1.1)

The covariance of a random variable X with itself is simply the variance, E[(X − µ)²]. Covariance captures a measure of the correlation of two variables. Positive covariance indicates that as X1 increases, so does X2. Negative covariance indicates X1 decreases as X2 increases, and vice versa. Zero covariance can indicate that X1 and X2 are uncorrelated. The correlation coefficient provides a normalized view of correlation based on covariance:

corr(X, Y) := cov(X, Y) / √(var(X) var(Y)).   (636.1.2)

corr(X, Y) ranges from −1 (for negatively correlated variables) through zero (for uncorrelated variables) to +1 (for positively correlated variables). While if X and Y are independent we have corr(X, Y) = 0, the latter does not imply the former.


Version: 3 Owner: drummond Author(s): drummond
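The definitions (636.1.1) and (636.1.2) can be computed directly from a joint pmf; the following sketch uses an arbitrary joint distribution of my own choosing over pairs (x1, x2).

```python
# Covariance and correlation from a joint pmf over pairs (x1, x2).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def E(g):
    """Expectation of g(X1, X2) under the joint distribution."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

mu1 = E(lambda x1, x2: x1)
mu2 = E(lambda x1, x2: x2)
cov = E(lambda x1, x2: (x1 - mu1) * (x2 - mu2))        # (636.1.1)
var1 = E(lambda x1, x2: (x1 - mu1) ** 2)
var2 = E(lambda x1, x2: (x2 - mu2) ** 2)
corr = cov / (var1 * var2) ** 0.5                      # (636.1.2)
```

For this joint distribution cov = 0.15 and corr = 0.6: the variables tend to move together.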

636.2

moment

Moments Given a random variable X, the kth moment of X is the value E[X k ], if the expectation exists. Note that the expected value is the first moment of a random variable, and the variance is the second moment minus the first moment squared. The kth moment of X is usually obtained by using the moment generating function. Central moments Given a random variable X, the kth central moment of X is the value E[(X − E[X])k ], if the expectation exists. Note that the first central moment is equal to zero. The second central moment is the variance. The third central moment is the skewness. The fourth central moment is called kurtosis. Version: 1 Owner: Riemann Author(s): Riemann

636.3

variance

Consider a discrete random variable X. The variance of X is defined as

Var[X] = E[(X − E[X])²].

Note that (X − E[X])² is a new random variable (it is a function of X). The variance is also denoted σ². A useful formula that follows immediately from the definition is

Var[X] = E[X²] − E[X]².

In words, the variance of X is the second moment of X minus the first moment squared.


The variance of a random variable determines a level of variation of the possible values of X around its mean. However, as this measure is squared, the standard deviation is used instead when one wants to talk about how much a random variable varies around its expected value. Variance is not a linear function. It satisfies the relation

Var[aX + b] = a² Var[X].

However, using also covariance, we can express the variance of a linear combination:

Var[aX + bY] = a² Var[X] + b² Var[Y] + 2ab Cov[X, Y].

If we cannot analyze a whole population but have to take a sample, we define its variance (denoted s²) with the formula

s² = (1/(n − 1)) Σ_{j=1}^{n} (xj − x̄)²,

where x̄ is the mean for the sample. The value s² is an estimator for σ². Version: 6 Owner: drini Author(s): drini, Riemann


Chapter 637 62E15 – Exact distribution theory 637.1

Pareto random variable

X is a Pareto random variable with parameters a, b if its density is

f_X(x) = \frac{a b^a}{x^{a+1}}, \qquad x \ge b

Parameters:

• a ∈ {1, 2, ...}
• b ∈ {1, 2, ...}

Syntax: X ∼ Pareto(a, b)

Notes:

1. X represents a random variable with shape parameter a and scale parameter b.
2. The expected value of X is E[X] = \frac{ab}{a-1}.
3. The variance of X is Var[X] = \frac{ab^2}{(a-1)^2(a-2)}, for a ∈ {3, 4, ...}.
4. The cumulative distribution function of X is F(x) = 1 - \left(\frac{b}{x}\right)^a.
5. The moments of X around 0 are E[X^n] = \frac{ab^n}{a-n}, for n ∈ {1, 2, ..., a − 1}.

Version: 4 Owner: mathcam Author(s): mathcam, aparna
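The stated mean and CDF can be sanity-checked numerically (a Python sketch, not part of the original entry; the choice a = 3, b = 2 and the integration cutoff are illustrative assumptions):

```python
# Numerically integrate the Pareto density f(x) = a b^a / x^(a+1) on [b, R]
# for a large cutoff R, and compare with E[X] = ab/(a-1) = 3.
a, b = 3, 2          # illustrative shape and scale parameters

def pdf(x):
    return a * b**a / x**(a + 1)

def integrate(g, lo, hi, n=200000):
    # simple midpoint rule
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

total = integrate(pdf, b, 2000.0)                  # close to 1 (tail is ~1e-9)
mean = integrate(lambda x: x * pdf(x), b, 2000.0)  # close to ab/(a-1) = 3
print(total, mean)
```

The total mass approaching 1 also matches F(2000) = 1 − (2/2000)^3.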

637.2

exponential random variable

X is an exponential random variable with parameter λ if its density is

f_X(x) = \lambda e^{-\lambda x}, \qquad x > 0

Parameters:

• λ > 0

Syntax: X ∼ Exp(λ)

Notes:

1. X is commonly used to model lifetimes and the waiting times between Poisson events.
2. The expected value of X is E[X] = \frac{1}{\lambda}.
3. The variance of X is Var[X] = \frac{1}{\lambda^2}.
4. The moment generating function of X is M_X(t) = \frac{\lambda}{\lambda - t}.
5. It is interesting to note that X is a Gamma random variable with an α parameter of 1.

Version: 3 Owner: aparna Author(s): aparna, Riemann

637.3

hypergeometric random variable

X is a hypergeometric random variable with parameters M, K, n if

f_X(x) = \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}, \qquad x = 0, 1, \ldots, n

Parameters:

• M ∈ {1, 2, ...}
• K ∈ {0, 1, ..., M}
• n ∈ {1, 2, ..., M}

Syntax: X ∼ Hypergeo(M, K, n)

Notes:

1. X represents the number of “special” items (from the K special items) present in a sample of size n drawn without replacement from a population with M items.
2. The expected value of X is E[X] = n\frac{K}{M}.
3. The variance of X is Var[X] = n\frac{K}{M}\,\frac{M-K}{M}\,\frac{M-n}{M-1}.

Approximation techniques:

If K^2 \ll n, M − K + 1 − n, then X can be approximated as a binomial random variable with parameters n = K and p = \frac{M-K+1-n}{M-K+1}. This approximation simplifies the distribution by looking at a system with replacement for large values of M and K.

Version: 4 Owner: aparna Author(s): aparna, Riemann
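The pmf, mean and variance above are easy to verify by direct enumeration (a Python sketch, not part of the original entry; the parameter values are arbitrary):

```python
from math import comb

M, K, n = 20, 8, 5   # illustrative parameters

def pmf(x):
    return comb(K, x) * comb(M - K, n - x) / comb(M, n)

total = sum(pmf(x) for x in range(n + 1))
mean = sum(x * pmf(x) for x in range(n + 1))
var = sum(x * x * pmf(x) for x in range(n + 1)) - mean ** 2

assert abs(total - 1) < 1e-12                              # probabilities sum to 1
assert abs(mean - n * K / M) < 1e-12                       # E[X] = nK/M
assert abs(var - n * (K / M) * ((M - K) / M) * ((M - n) / (M - 1))) < 1e-12
print(mean, var)
```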

637.4

negative hypergeometric random variable

X is a negative hypergeometric random variable with parameters W, B, b if

f_X(x) = \frac{\binom{x+b-1}{x}\binom{W+B-b-x}{W-x}}{\binom{W+B}{W}}, \qquad x = 0, 1, \ldots, W

Parameters:

• W ∈ {1, 2, ...}
• B ∈ {1, 2, ...}
• b ∈ {1, 2, ..., B}

Syntax: X ∼ NegHypergeo(W, B, b)

Notes:

1. X represents the number of “special” items (from the W special items) drawn before the bth object of the other type from a population with B such items.
2. The expected value of X is E[X] = \frac{Wb}{B+1}.
3. The variance of X is Var[X] = \frac{Wb(B-b+1)(W+B+1)}{(B+2)(B+1)^2}.

Approximation techniques:

If x^2 \ll W and b^2 \ll B, then X can be approximated as a negative binomial random variable with parameters r = b and p = \frac{W}{W+B}. This approximation simplifies the distribution by looking at a system with replacement for large values of W and B.

Version: 11 Owner: aparna Author(s): aparna

637.5

negative hypergeometric random variable, example of

Suppose you have 7 black marbles and 10 white marbles in a jar. You pull marbles until you have 3 black marbles in your hand, and let X be the number of white marbles in your hand. Here W = 10, B = 7 and b = 3.

• The expected value of X is E[X] = \frac{Wb}{B+1} = \frac{3(10)}{7+1} = 3.75.

• The variance of X is Var[X] = \frac{Wb(B-b+1)(W+B+1)}{(B+2)(B+1)^2} = \frac{10(3)(7-3+1)(10+7+1)}{(7+2)(7+1)^2} = 4.6875.

• The probability of having 3 white marbles in your hand is

f_X(3) = \frac{\binom{3+3-1}{3}\binom{10+7-3-3}{10-3}}{\binom{10+7}{10}} \approx 0.1697.

Version: 1 Owner: aparna Author(s): aparna
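The three quantities in this example can be recomputed directly from the formulas of the preceding entry (a Python check, not part of the original; note that the variance formula yields 4.6875 for these parameters):

```python
from math import comb

W, B, b = 10, 7, 3   # white marbles, black marbles, stop at the 3rd black

mean = W * b / (B + 1)
var = W * b * (B - b + 1) * (W + B + 1) / ((B + 2) * (B + 1) ** 2)

def pmf(x):
    return comb(x + b - 1, x) * comb(W + B - b - x, W - x) / comb(W + B, W)

assert abs(sum(pmf(x) for x in range(W + 1)) - 1) < 1e-12  # valid pmf
assert mean == 3.75
assert var == 4.6875
print(round(pmf(3), 4))   # probability of exactly 3 white marbles: 0.1697
```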

637.6

proof of expected value of the hypergeometric distribution

We will first prove a useful property of binomial coefficients. We know

\binom{n}{k} = \frac{n!}{k!(n-k)!}.

This can be transformed to

\binom{n}{k} = \frac{n}{k}\,\frac{(n-1)!}{(k-1)!\,(n-1-(k-1))!} = \frac{n}{k}\binom{n-1}{k-1}.   (637.6.1)

Now we can start with the definition of the expected value:

E(X) = \sum_{x=0}^{n} \frac{x\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}.

Since for x = 0 we add a 0 in this formula we can say

E(X) = \sum_{x=1}^{n} \frac{x\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}.

Applying equation (637.6.1) to \binom{K}{x} and \binom{M}{n} we get:

E(X) = \frac{nK}{M} \sum_{x=1}^{n} \frac{\binom{K-1}{x-1}\binom{M-1-(K-1)}{n-1-(x-1)}}{\binom{M-1}{n-1}}.

Setting l := x − 1 we get:

E(X) = \frac{nK}{M} \sum_{l=0}^{n-1} \frac{\binom{K-1}{l}\binom{M-1-(K-1)}{n-1-l}}{\binom{M-1}{n-1}}.

The sum in this equation is 1, as it is the sum over all probabilities of a hypergeometric distribution. Therefore we have

E(X) = \frac{nK}{M}.

Version: 2 Owner: mathwizard Author(s): mathwizard

637.7

proof of variance of the hypergeometric distribution

We will first prove a useful property of binomial coefficients. We know

\binom{n}{k} = \frac{n!}{k!(n-k)!}.

This can be transformed to

\binom{n}{k} = \frac{n}{k}\,\frac{(n-1)!}{(k-1)!\,(n-1-(k-1))!} = \frac{n}{k}\binom{n-1}{k-1}.   (637.7.1)

The variance V(X) of X is given by:

V(X) = \sum_{x=0}^{n} \left(x - \frac{nK}{M}\right)^2 \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}.

We expand the right hand side:

V(X) = \sum_{x=0}^{n} x^2 \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}} - \frac{2nK}{M}\sum_{x=0}^{n} x\,\frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}} + \frac{n^2K^2}{M^2}\sum_{x=0}^{n} \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}.

The second of these sums is the expected value of the hypergeometric distribution, the third sum is 1 as it sums up all probabilities in the distribution. So we have:

V(X) = -\frac{n^2K^2}{M^2} + \sum_{x=0}^{n} x^2 \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}.

In the last sum for x = 0 we add nothing so we can write:

V(X) = -\frac{n^2K^2}{M^2} + \sum_{x=1}^{n} x^2 \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}.

Applying equation (637.7.1) and x = (x − 1) + 1 we get:

V(X) = -\frac{n^2K^2}{M^2} + \frac{nK}{M}\sum_{x=1}^{n} (x-1)\frac{\binom{K-1}{x-1}\binom{M-K}{n-x}}{\binom{M-1}{n-1}} + \frac{nK}{M}\sum_{x=1}^{n} \frac{\binom{K-1}{x-1}\binom{M-K}{n-x}}{\binom{M-1}{n-1}}.

Setting l := x − 1, the first sum is the expected value of a hypergeometric distribution with parameters M − 1, K − 1, n − 1, and is therefore given as \frac{(n-1)(K-1)}{M-1}. The second sum is the sum over all the probabilities of a hypergeometric distribution and is therefore equal to 1. So we get:

V(X) = \frac{nK(n-1)(K-1)}{M(M-1)} + \frac{nK}{M} - \frac{n^2K^2}{M^2}
     = \frac{-n^2K^2(M-1) + Mn(n-1)K(K-1) + KnM(M-1)}{M^2(M-1)}
     = \frac{nK(M^2 + (-K-n)M + nK)}{M^2(M-1)}
     = \frac{nK(M-K)(M-n)}{M^2(M-1)}
     = n\frac{K}{M}\left(1 - \frac{K}{M}\right)\frac{M-n}{M-1}.

This formula is the one we wanted to prove. Version: 3 Owner: mathwizard Author(s): mathwizard

637.8

proof that normal distribution is a distribution

\int_{-\infty}^{\infty} \frac{e^{-\frac{(x-\mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}\,dx
= \sqrt{\int_{-\infty}^{\infty} \frac{e^{-\frac{(x-\mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}\,dx \int_{-\infty}^{\infty} \frac{e^{-\frac{(y-\mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}\,dy}
= \sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{e^{-\frac{(x-\mu)^2+(y-\mu)^2}{2\sigma^2}}}{\sigma^2\,2\pi}\,dx\,dy}

Substitute x' = x − µ and y' = y − µ. Since the bounds are infinite, they don't change, and dx' = dx and dy' = dy, so we have:

\sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{e^{-\frac{(x')^2+(y')^2}{2\sigma^2}}}{\sigma^2\,2\pi}\,dx'\,dy'}

Next switch to polar coordinates with r^2 = (x')^2 + (y')^2, so that dx'\,dy' becomes 2\pi r\,dr:

\sqrt{\int_{0}^{\infty} \frac{2\pi r\,e^{-\frac{r^2}{2\sigma^2}}}{\sigma^2\,2\pi}\,dr}
= \sqrt{\left[-e^{-\frac{r^2}{2\sigma^2}}\right]_0^{\infty}}
= \sqrt{1} = 1

Version: 1 Owner: Henry Author(s): Henry
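The conclusion can also be checked numerically (a Python illustration, not part of the original proof): integrate the density over a wide interval, beyond which the tails are negligible, and compare with 1. The parameter values are arbitrary:

```python
import math

mu, sigma = 1.5, 2.0   # arbitrary illustrative parameters

def density(x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Midpoint rule over [mu - 10 sigma, mu + 10 sigma]; the mass beyond is ~1e-23.
lo, hi, n = mu - 10 * sigma, mu + 10 * sigma, 100000
h = (hi - lo) / n
integral = sum(density(lo + (i + 0.5) * h) for i in range(n)) * h
print(integral)
assert abs(integral - 1.0) < 1e-9
```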

Chapter 638 65-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 638.1

normal equations

Normal Equations. We consider the problem Ax ≈ b, where A is an m × n matrix with m ≥ n and rank(A) = n, b is an m × 1 vector, and x is the n × 1 vector to be determined. The sign ≈ stands for the least squares approximation, i.e. a minimization of the norm of the residual r = Ax − b,

\|Ax - b\|_2 = \|r\|_2 = \left[\sum_{i=1}^{m} r_i^2\right]^{1/2}

or of its square

F(x) = \frac{1}{2}\|Ax - b\|_2^2 = \frac{1}{2}(Ax - b)^T(Ax - b) = \frac{1}{2}\left(x^TA^TAx - 2x^TA^Tb + b^Tb\right),

i.e. a differentiable function of x. The necessary condition for a minimum is:

\nabla F(x) = 0 \quad\text{or}\quad \frac{\partial F}{\partial x_i} = 0 \quad (i = 1, \ldots, n).

These equations are called the normal equations, which become in our case:

A^TAx = A^Tb

The solution x = (A^TA)^{-1}A^Tb is usually computed with the following algorithm: First (the lower triangular portion of) the symmetric matrix A^TA is computed, then its Cholesky decomposition LL^T. Thereafter one solves Ly = A^Tb for y, and finally x is computed from L^Tx = y. Unfortunately A^TA is often ill-conditioned and strongly influenced by roundoff errors (see [Golub89]). Other methods, which do not compute A^TA and solve Ax ≈ b directly, are QR decomposition and singular value decomposition.

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Version: 3 Owner: akrowne Author(s): akrowne
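The algorithm described above (form A^TA, Cholesky-factorize, then two triangular solves) can be sketched for a tiny least-squares fit; this Python example is purely illustrative and fits a line y = c0 + c1 t through three assumed data points:

```python
import math

# Fit y = c0 + c1*t through (0,1), (1,2), (2,4) in the least squares
# sense, via the normal equations A^T A x = A^T b.
ts = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 4.0]
A = [[1.0, t] for t in ts]

# Form A^T A (2x2) and A^T b (2x1).
AtA = [[sum(A[i][r] * A[i][c] for i in range(3)) for c in range(2)] for r in range(2)]
Atb = [sum(A[i][r] * ys[i] for i in range(3)) for r in range(2)]

# Cholesky decomposition A^T A = L L^T for the 2x2 case.
l11 = math.sqrt(AtA[0][0])
l21 = AtA[1][0] / l11
l22 = math.sqrt(AtA[1][1] - l21 ** 2)

# Solve L y = A^T b, then L^T x = y.
y0 = Atb[0] / l11
y1 = (Atb[1] - l21 * y0) / l22
c1 = y1 / l22
c0 = (y0 - l21 * c1) / l11
print(c0, c1)
```

For these points the exact least-squares solution is c0 = 5/6, c1 = 3/2.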

638.2

principal components analysis

The principal components analysis is a mathematical way of determining that linear transformation of a sample of points in N-dimensional space which exhibits the properties of the sample most clearly along the coordinate axes. Along the new axes the sample variances are extremes (maxima and minima), and uncorrelated. The name comes from the principal axes of an ellipsoid (e.g. the ellipsoid of inertia), which are just the coordinate axes in question.

By their definition, the principal axes will include those along which the point sample has little or no spread (minima of variance). Hence, an analysis in terms of principal components can show (linear) interdependence in data. A point sample of N dimensions for whose N coordinates M linear relations hold will show only (N − M) axes along which the spread is non-zero. Using a cutoff on the spread along each axis, a sample may thus be reduced in its dimensionality (see [Bishop95]).

The principal axes of a point sample are found by choosing the origin at the “centre of gravity” and forming the dispersion matrix

t_{ij} = \frac{1}{N}\sum \left[(x_i - \langle x_i\rangle)(x_j - \langle x_j\rangle)\right]

where the sum is over the N points of the sample, the x_i are the ith components of the point coordinates, and \langle\cdot\rangle stands for the average over the sample. The principal axes and the variance along each of them are then given by the eigenvectors and associated eigenvalues of the dispersion matrix.

Principal component analysis has in practice been used to reduce the dimensionality of problems, and to transform interdependent coordinates into significant and independent ones. An example used in several particle physics experiments is that of reducing redundant observations of a particle track in a detector to a low-dimensional subspace whose axes correspond to parameters describing the track. Another example is in image processing, where it can be used for color quantization. Principal components analysis is described in [O’Connel74].

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Bishop95 C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.

O’Connel74 M.J. O’Connel, Search Program for Significant Variables, Comp. Phys. Comm. 8 (1974) 49.

Version: 1 Owner: akrowne Author(s): akrowne
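For a two-dimensional point sample the recipe above (centre the data, form the dispersion matrix, take its eigen-decomposition) fits in a few lines. This Python sketch is purely illustrative: the point sample is invented, and the closed-form eigenvalues of a symmetric 2 × 2 matrix stand in for a general eigensolver:

```python
import math

# A small 2-D point sample, strongly correlated along the line y ~ x.
pts = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.9), (4.0, 4.1)]
N = len(pts)

# Centre of gravity and dispersion matrix t_ij = (1/N) sum (x_i - <x_i>)(x_j - <x_j>).
mx = sum(p[0] for p in pts) / N
my = sum(p[1] for p in pts) / N
txx = sum((p[0] - mx) ** 2 for p in pts) / N
tyy = sum((p[1] - my) ** 2 for p in pts) / N
txy = sum((p[0] - mx) * (p[1] - my) for p in pts) / N

# Eigenvalues of [[txx, txy], [txy, tyy]] in closed form: these are the
# variances along the two principal axes.
tr, det = txx + tyy, txx * tyy - txy ** 2
disc = math.sqrt(tr * tr / 4 - det)
lam_max, lam_min = tr / 2 + disc, tr / 2 - disc

print(lam_max, lam_min)
assert lam_max > 100 * lam_min   # almost all variance lies along one axis
```

The tiny second eigenvalue is exactly the "little or no spread" situation described above: the sample is effectively one-dimensional.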

638.3

pseudoinverse

The inverse A^{-1} of a matrix A exists only if A is square and has full rank. In this case, Ax = b has the solution x = A^{-1}b.

The pseudoinverse A^{+} (beware, it is often denoted otherwise) is a generalization of the inverse, and exists for any m × n matrix. We assume m > n. If A has full rank (n) we define:

A^{+} = (A^TA)^{-1}A^T

and the solution of Ax = b is x = A^{+}b.

The best way to compute A^{+} is to use singular value decomposition. With A = USV^T, where U (m × n) and V (n × n) are orthogonal and S (n × n) is diagonal with real, non-negative singular values σ_i, i = 1, ..., n, we find

A^{+} = V(S^TS)^{-1}S^TU^T

If the rank r of A is smaller than n, the inverse of S^TS does not exist, and one uses only the first r singular values; S then becomes an r × r matrix and U, V shrink accordingly.

See also Linear Equations.

References • Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html) Version: 1 Owner: akrowne Author(s): akrowne
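For a full-rank matrix the defining formula A^+ = (A^TA)^{-1}A^T can be exercised directly. This small Python sketch (illustrative, with exact fractions and a hand-inverted 2 × 2 matrix) checks the characteristic left-inverse property A^+A = I:

```python
from fractions import Fraction as F

# A 3x2 full-rank matrix.
A = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)]]

# A^T A (2x2) and its inverse via the 2x2 cofactor formula.
AtA = [[sum(A[i][r] * A[i][c] for i in range(3)) for c in range(2)] for r in range(2)]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
inv = [[AtA[1][1] / det, -AtA[0][1] / det],
       [-AtA[1][0] / det, AtA[0][0] / det]]

# A^+ = (A^T A)^{-1} A^T, a 2x3 matrix.
Aplus = [[sum(inv[r][k] * A[i][k] for k in range(2)) for i in range(3)] for r in range(2)]

# The pseudoinverse is a left inverse: A^+ A = I.
AplusA = [[sum(Aplus[r][i] * A[i][c] for i in range(3)) for c in range(2)] for r in range(2)]
assert AplusA == [[F(1), F(0)], [F(0), F(1)]]
print(Aplus)
```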


Chapter 639 65-01 – Instructional exposition (textbooks, tutorial papers, etc.) 639.1

cubic spline interpolation

Suppose we are given N + 1 data points {(x_k, y_k)} such that

a = x_0 < x_1 < \cdots < x_N.   (639.1.1)

Then the function S(x) is called a cubic spline interpolation if there exist N cubic polynomials S_k(x) with coefficients s_{k,i}, 0 ≤ i ≤ 3, such that the following hold.

1. S(x) = S_k(x) = \sum_{i=0}^{3} s_{k,i}(x - x_k)^i for all x ∈ [x_k, x_{k+1}], 0 ≤ k ≤ N − 1.
2. S(x_k) = y_k, 0 ≤ k ≤ N.
3. S_k(x_{k+1}) = S_{k+1}(x_{k+1}), 0 ≤ k ≤ N − 2.
4. S'_k(x_{k+1}) = S'_{k+1}(x_{k+1}), 0 ≤ k ≤ N − 2.
5. S''_k(x_{k+1}) = S''_{k+1}(x_{k+1}), 0 ≤ k ≤ N − 2.

The points (639.1.1) are called the knots. The set of cubic splines on a fixed set of knots forms a vector space under cubic spline addition and scalar multiplication. So we see that the cubic spline not only interpolates the data {(x_k, y_k)} but matches the first and second derivatives at the knots. Notice, from the above definition, one is free to specify constraints on the endpoints. One common endpoint constraint is S''(a) = 0, S''(b) = 0, which is called the natural spline. Other popular choices are the clamped cubic spline, the parabolically terminated spline and the curvature-adjusted spline.

Cubic splines are frequently used in numerical analysis to fit data. Matlab uses the command spline to find cubic spline interpolations with not-a-knot endpoint conditions. For example, the following commands would find the cubic spline interpolation of the curve 4 cos(x) + 1 and plot the curve and the interpolation marked with o’s.

x = 0:2*pi;
y = 4*cos(x)+1;
xx = 0:.001:2*pi;
yy = spline(x,y,xx);
plot(x,y,’o’,xx,yy)

Version: 3 Owner: tensorking Author(s): tensorking


Chapter 640 65B15 – Euler-Maclaurin formula 640.1

Euler-Maclaurin summation formula

Let B_r be the rth Bernoulli number, and B_r(x) the rth Bernoulli periodic function. For any integer k ≥ 0 and for any function f of class C^{k+1} on [a, b], a, b ∈ Z, we have

\sum_{a < n \le b} f(n) = \int_a^b f(t)\,dt + \sum_{r=0}^{k} \frac{(-1)^{r+1}B_{r+1}}{(r+1)!}\left(f^{(r)}(b) - f^{(r)}(a)\right) + \frac{(-1)^k}{(k+1)!}\int_a^b B_{k+1}(t)f^{(k+1)}(t)\,dt.

Version: 4 Owner: KimJ Author(s): KimJ

640.2

proof of Euler-Maclaurin summation formula

Let a and b be integers such that a < b, and let f : [a, b] → R be continuous. We will prove by induction that for all integers k ≥ 0, if f is a C^{k+1} function,

\sum_{a < n \le b} f(n) = \int_a^b f(t)\,dt + \sum_{r=0}^{k} \frac{(-1)^{r+1}B_{r+1}}{(r+1)!}\left(f^{(r)}(b) - f^{(r)}(a)\right) + \frac{(-1)^k}{(k+1)!}\int_a^b B_{k+1}(t)f^{(k+1)}(t)\,dt   (640.2.1)

where B_r is the rth Bernoulli number and B_r(t) is the rth Bernoulli periodic function.

To prove the formula for k = 0, we first rewrite \int_{n-1}^{n} f(t)\,dt, where n is an integer, using integration by parts:

\int_{n-1}^{n} f(t)\,dt = \int_{n-1}^{n} \frac{d}{dt}\left(t - n + \frac{1}{2}\right)f(t)\,dt
= \left[\left(t - n + \frac{1}{2}\right)f(t)\right]_{n-1}^{n} - \int_{n-1}^{n} \left(t - n + \frac{1}{2}\right)f'(t)\,dt
= \frac{1}{2}\left(f(n) + f(n-1)\right) - \int_{n-1}^{n} \left(t - n + \frac{1}{2}\right)f'(t)\,dt.

Because t − n + 1/2 = B_1(t) on the interval (n − 1, n), this is equal to

\int_{n-1}^{n} f(t)\,dt = \frac{1}{2}\left(f(n) + f(n-1)\right) - \int_{n-1}^{n} B_1(t)f'(t)\,dt.

From this, we get

f(n) = \int_{n-1}^{n} f(t)\,dt + \frac{1}{2}\left(f(n) - f(n-1)\right) + \int_{n-1}^{n} B_1(t)f'(t)\,dt.

Now we take the sum of this expression for n = a + 1, a + 2, \ldots, b, so that the middle term on the right telescopes away for the most part:

\sum_{n=a+1}^{b} f(n) = \int_a^b f(t)\,dt + \frac{1}{2}\left(f(b) - f(a)\right) + \int_a^b B_1(t)f'(t)\,dt,

which is the Euler-Maclaurin formula for k = 0, since B_1 = -\frac{1}{2}.

Suppose that k > 0 and the formula is correct for k − 1, that is

\sum_{a < n \le b} f(n) = \int_a^b f(t)\,dt + \sum_{r=0}^{k-1} \frac{(-1)^{r+1}B_{r+1}}{(r+1)!}\left(f^{(r)}(b) - f^{(r)}(a)\right) + \frac{(-1)^{k-1}}{k!}\int_a^b B_k(t)f^{(k)}(t)\,dt.   (640.2.2)

We rewrite the last integral using integration by parts and the facts that B_k is continuous for k ≥ 2 and B'_{k+1}(t) = (k+1)B_k(t) for k ≥ 0:

\int_a^b B_k(t)f^{(k)}(t)\,dt = \int_a^b \frac{B'_{k+1}(t)}{k+1}f^{(k)}(t)\,dt = \frac{1}{k+1}\left[B_{k+1}(t)f^{(k)}(t)\right]_a^b - \frac{1}{k+1}\int_a^b B_{k+1}(t)f^{(k+1)}(t)\,dt.

Using the fact that B_k(n) = B_k for every integer n if k ≥ 2, we see that the last term in Eq. (640.2.2) is equal to

\frac{(-1)^{k+1}B_{k+1}}{(k+1)!}\left(f^{(k)}(b) - f^{(k)}(a)\right) + \frac{(-1)^k}{(k+1)!}\int_a^b B_{k+1}(t)f^{(k+1)}(t)\,dt.

Substituting this and absorbing the left term into the summation yields Eq. (640.2.1), as required.

Version: 2 Owner: pbruin Author(s): pbruin
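As a concrete check of the formula (a Python illustration, not part of the original proof), take f(t) = t^2, a = 0, b = 10 and k = 1. The remainder integral vanishes because f'' is constant and the periodic B_2 integrates to zero over each unit interval, so the boundary terms must reproduce the sum of squares exactly:

```python
# Euler-Maclaurin with f(t) = t^2, a = 0, b = 10, k = 1.
# Bernoulli numbers: B_1 = -1/2, B_2 = 1/6.
a, b = 0, 10
lhs = sum(n * n for n in range(a + 1, b + 1))     # sum over a < n <= b

integral = (b**3 - a**3) / 3                      # int_a^b t^2 dt
r0 = (-1) ** 1 * (-0.5) / 1 * (b**2 - a**2)       # r = 0 term: -B_1 (f(b) - f(a))
r1 = (-1) ** 2 * (1 / 6) / 2 * (2 * b - 2 * a)    # r = 1 term: (B_2 / 2!)(f'(b) - f'(a))
# remainder: (-1/2!) int_a^b B_2(t) f''(t) dt = 0 here

rhs = integral + r0 + r1
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-9
```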

Chapter 641 65C05 – Monte Carlo methods 641.1

Monte Carlo methods

Monte Carlo methods are the systematic use of samples of random numbers in order to estimate parameters of an unknown distribution by statistical simulation. Methods based on this principle of random sampling are indicated in cases where the dimensionality and/or complexity of a problem make straightforward numerical solutions impossible or impractical. The method is ideally adapted to computers, its applications are varied and many, its main drawbacks are potentially slow convergence (large variances of the results), and often the difficulty of estimating the statistical error (variance) of the result.

Monte Carlo problems can be formulated as integration of a function f = f(\vec{x}) over a (multi-dimensional) volume V, with the result

\int_V f\,dV = V\bar{f}

where \bar{f}, the average of f, is obtained by exploring randomly the volume V.

Most easily one conceives a simple (and inefficient) hit-and-miss Monte Carlo: assume, for example, a three-dimensional volume V to be bounded by surfaces difficult to intersect and describe analytically; on the other hand, given a point (x, y, z) ∈ V, it is easy to decide whether it is inside or outside the boundary. In this case, a simply bounded volume which fully includes V can be sampled uniformly (the components x, y, z are generated as random numbers with uniform probability density function), and for each point a weight is computed, which is zero if the point is outside V, 1 otherwise. After N random numbers, n (≤ N) will have been found inside V, and the ratio n/N is the fraction of the sampled volume which corresponds to V.

Another method, crude Monte Carlo, may be used for integration: assume now the volume V is bounded by two functions z(x, y) and z'(x, y), both not integrable, but known for any x, y, over an interval ∆x and ∆y. Taking random pairs (x, y), evaluating ∆z = |z(x, y) − z'(x, y)| at each point, averaging to ⟨∆z⟩ and forming ∆x∆y⟨∆z⟩ gives an approximation of the volume (in this example, sampling the area with quasi-random numbers or, better, using standard numerical integration methods will lead to more precise results).

Often, the function to be sampled is, in fact, a probability density function, e.g. a matrix element in phase space. In the frequent case that regions of small values of the probability density function dominate, unacceptably many points will have to be generated by crude Monte Carlo; in other words, the convergence of the result to small statistical errors will be slow. Variance reducing techniques will then be indicated, like importance sampling or stratified sampling.

For more reading, see [Press95], [Hammersley64], [Kalos86].

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Press95 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, Second edition, Cambridge University Press, 1995. (The same book exists for the Fortran language). There is also an Internet version which you can work from.

Hammersley64 J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods, Methuen, London, 1964.

Kalos86 M.H. Kalos and P.A. Whitlock, Monte Carlo Methods, Wiley, New York, 1986.

Version: 3 Owner: akrowne Author(s): akrowne
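The hit-and-miss scheme described above is easy to sketch (a Python illustration, not part of the original entry): the unit sphere is enclosed in the cube [−1, 1]^3, points are sampled uniformly, and the hit fraction times the cube volume estimates the sphere volume 4π/3 ≈ 4.19:

```python
import math
import random

random.seed(12345)          # fixed seed so the run is reproducible
N = 200000
hits = 0
for _ in range(N):
    x = random.uniform(-1.0, 1.0)
    y = random.uniform(-1.0, 1.0)
    z = random.uniform(-1.0, 1.0)
    if x * x + y * y + z * z <= 1.0:   # the inside/outside test is cheap
        hits += 1                      # even when the boundary is awkward

volume = 8.0 * hits / N               # cube volume times hit fraction
print(volume, 4 * math.pi / 3)
assert abs(volume - 4 * math.pi / 3) < 0.05
```

The statistical error here scales like 1/√N, which is exactly the "potentially slow convergence" mentioned above.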


Chapter 642 65D32 – Quadrature and cubature formulas 642.1

Simpson’s rule

Simpson’s rule is a method of (approximate) numerical definite integration (or quadrature). Simpson’s rule is based on a parabolic model of the function to be integrated (in contrast to the trapezoidal model of the trapezoidal rule). Thus, a minimum of three points and three function values are required, and the definite integral is

\int_{x_0}^{x_2} f(x)\,dx \approx I = \frac{h}{3}\left(f(x_0) + 4f(x_1) + f(x_2)\right)

where h is the spacing between adjacent points. Note that x_1 = \frac{x_0 + x_2}{2}, the midpoint of the quadrature domain.

We can extend this to greater precision by breaking our target domain into n (even) equal-length fragments. The quadrature is then the weighted sum of the above formula for every pair of adjacent regions, which works out to

I = \frac{h}{3}\left(f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{n-3}) + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\right)

Version: 3 Owner: akrowne Author(s): akrowne


Chapter 643 65F25 – Orthogonalization 643.1

Givens rotation

Let A be an m × n matrix with m ≥ n and full rank (viz. rank n). An orthogonal matrix triangularization (QR decomposition) consists of determining an m × m orthogonal matrix Q such that

Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix}

with the n × n upper triangular matrix R. One only has then to solve the triangular system Rx = Py, where P consists of the first n rows of Q.

Householder transformations clear whole columns except for the first element of a vector. If one wants to clear parts of a matrix one element at a time, one can use Givens rotations, which are particularly practical for parallel implementation. A matrix

G = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}

with properly chosen c = cos(ϕ) and s = sin(ϕ) for some rotation angle ϕ can be used to zero the element a_{ki}. The elements can be zeroed column by column from the bottom up in the following order:

(m, 1), (m − 1, 1), \ldots, (2, 1), (m, 2), \ldots, (3, 2), \ldots, (m, n), \ldots, (n + 1, n).

Q is then the product of g = n(2m − n − 1)/2 Givens matrices, Q = G_1 G_2 \cdots G_g.

To annihilate the bottom element of a 2 × 1 vector:

\begin{bmatrix} c & s \\ -s & c \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}

the conditions −sa + cb = 0 and c^2 + s^2 = 1 give:

c = \frac{a}{\sqrt{a^2 + b^2}}, \qquad s = \frac{b}{\sqrt{a^2 + b^2}}

For “Fast Givens”, see [Golub89].

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The John Hopkins University Press, 1989.

Version: 3 Owner: akrowne Author(s): akrowne
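The 2 × 1 annihilation step is the heart of the method, and can be sketched in a few lines of Python (illustrative only):

```python
import math

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] applied to (a, b) giving (r, 0)."""
    r = math.hypot(a, b)
    return a / r, b / r

a, b = 3.0, 4.0
c, s = givens(a, b)
top = c * a + s * b        # first component: r = sqrt(a^2 + b^2)
bottom = -s * a + c * b    # second component: zeroed
print(top, bottom)
assert abs(top - 5.0) < 1e-12 and abs(bottom) < 1e-12
```

(A production routine would also guard against a = b = 0 and scale to avoid overflow, as discussed in [Golub89].)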

643.2

Gram-Schmidt orthogonalization

Any set of linearly independent vectors v_1, \ldots, v_n can be converted into a set of orthogonal vectors q_1, \ldots, q_n by the Gram-Schmidt process. In three dimensions, v_1 determines a line; the vectors v_1 and v_2 determine a plane. The vector q_1 is the unit vector in the direction of v_1. The (unit) vector q_2 lies in the plane of v_1, v_2, and is normal to v_1 (on the same side as v_2). The (unit) vector q_3 is normal to the plane of v_1, v_2, on the same side as v_3, etc.

In general, first set u_1 = v_1, and then each u_i is made orthogonal to the preceding u_1, \ldots, u_{i-1} by subtraction of the projections of v_i in the directions of u_1, \ldots, u_{i-1}:

u_i = v_i - \sum_{j=1}^{i-1} \frac{u_j^T v_i}{u_j^T u_j}\,u_j

The i vectors u_i span the same subspace as the v_i. The vectors q_i = u_i/\|u_i\| are orthonormal. This leads to the following theorem:

Theorem. Any m × n matrix A with linearly independent columns can be factorized into a product, A = QR. The columns of Q are orthonormal and R is upper triangular and invertible.

This “classical” Gram-Schmidt method is often numerically unstable; see [Golub89] for a “modified” Gram-Schmidt method.

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The John Hopkins University Press, 1989.

Version: 4 Owner: akrowne Author(s): akrowne
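The projection formula is all that is needed for a small implementation. This Python sketch applies the classical process to three vectors in R^3 and checks orthonormality (illustrative only; for serious numerical work the modified variant cited in the entry is preferable):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vs):
    """Classical Gram-Schmidt: returns orthonormal vectors q_i."""
    us = []
    for v in vs:
        u = list(v)
        for w in us:
            coef = dot(w, v) / dot(w, w)     # projection coefficient u_j^T v / u_j^T u_j
            u = [a - coef * b for a, b in zip(u, w)]
        us.append(u)
    return [[a / math.sqrt(dot(u, u)) for a in u] for u in us]

qs = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert abs(dot(qs[i], qs[j]) - expected) < 1e-12
print(qs)
```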

643.3

Householder transformation

The most frequently applied algorithm for QR decomposition uses the Householder transformation u = Hv, where the Householder matrix H is a symmetric and orthogonal matrix of the form:

H = I - 2xx^T

with the identity matrix I and any normalized vector x with \|x\|_2^2 = x^Tx = 1.

Householder transformations zero the m − 1 elements of a column vector v below the first element:

\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{bmatrix} \to \begin{bmatrix} c \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad\text{with}\quad c = \pm\|v\|_2 = \pm\left(\sum_{i=1}^{m} v_i^2\right)^{1/2}

One can verify that

x = f\begin{bmatrix} v_1 - c \\ v_2 \\ \vdots \\ v_m \end{bmatrix} \quad\text{with}\quad f = \frac{1}{\sqrt{2c(c - v_1)}}

fulfils x^Tx = 1 and that with H = I − 2xx^T one obtains the vector \begin{bmatrix} c & 0 & \cdots & 0 \end{bmatrix}^T.

To perform the decomposition of the m × n matrix A = QR (with m ≥ n) we construct in this way an m × m matrix H^{(1)} to zero the m − 1 elements of the first column. An (m − 1) × (m − 1) matrix G^{(2)} will zero the m − 2 elements of the second column. With G^{(2)} we produce the m × m matrix

H^{(2)} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & G^{(2)} & \\ 0 & & & \end{bmatrix}, \quad\text{etc.}

After n (n − 1 for m = n) such orthogonal transforms H^{(i)} we obtain:

R = H^{(n)} \cdots H^{(2)} H^{(1)} A

R is upper triangular and the orthogonal matrix Q becomes:

Q = H^{(1)} H^{(2)} \cdots H^{(n)}

In practice the H^{(i)} are never explicitly computed.

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Version: 3 Owner: akrowne Author(s): akrowne
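One elimination step can be spelled out directly (a Python illustration, not part of the original entry): build x from v as above and verify that (I − 2xx^T)v has zeros below the first entry. Following common practice, the sign of c is chosen opposite to v_1 to avoid cancellation:

```python
import math

v = [3.0, 1.0, 5.0, 1.0]
c = -math.copysign(math.hypot(*v), v[0])   # sign opposite v_1 for stability
f = 1.0 / math.sqrt(2 * c * (c - v[0]))
x = [f * (v[0] - c)] + [f * vi for vi in v[1:]]

# Apply H = I - 2 x x^T to v:  Hv = v - 2 (x . v) x
xv = sum(a * b for a, b in zip(x, v))
Hv = [a - 2 * xv * b for a, b in zip(v, x)]

print(Hv)   # [c, 0, 0, 0] up to rounding
assert abs(Hv[0] - c) < 1e-12
assert all(abs(t) < 1e-12 for t in Hv[1:])
```

Note that H is never formed as a matrix; only the vector x is needed, which is exactly the point of the closing remark above.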


643.4

orthonormal

643.4.1

Basic Definition

Short for orthogonal and normalized. If a set S of vectors is orthonormal, then for every pair of distinct u, v in S we have u · v = 0, and ‖u‖ = 1 for every u in S. Taken as the columns of a square matrix, such vectors form an orthogonal matrix. Perhaps counter-intuitively, a lone vector is considered an orthonormal set, provided the vector is normalized. The empty set is unqualifiedly considered an orthonormal set as well.

643.4.2

Complete Definition

Let V be an inner product space. A subset of vectors S ⊂ V is an orthonormal set if the following conditions are met: 1. If u, v ∈ S and u ≠ v, then ⟨u, v⟩ = 0, where ⟨·, ·⟩ is the inner product of V. 2. For all u ∈ S, we have ‖u‖ = 1, where ‖·‖ is the norm induced by the inner product.

643.4.3

Applications

A standard application is finding an orthonormal basis for a vector space, such as by Gram-Schmidt orthonormalization. Orthonormal bases are computationally simple to work with. Version: 7 Owner: akrowne Author(s): akrowne


Chapter 644 65F35 – Matrix norms, conditioning, scaling 644.1

Hilbert matrix

A Hilbert matrix H of order n is a square matrix defined by

H_{ij} = \frac{1}{i + j - 1}

An example of a Hilbert matrix with n = 5 is

\begin{bmatrix}
1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\
\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} \\
\frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} \\
\frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} \\
\frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9}
\end{bmatrix}

Hilbert matrices are ill-conditioned. Version: 1 Owner: akrowne Author(s): akrowne
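The ill-conditioning is visible already for small n. This Python sketch (illustrative, not part of the original entry) builds H_n with exact fractions and shows how quickly its determinant collapses toward zero, one symptom of near-singularity:

```python
from fractions import Fraction

def hilbert(n):
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)]

def det(m):
    """Exact determinant by cofactor expansion (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))

for n in (1, 2, 3, 4, 5):
    print(n, det(hilbert(n)))   # 1, 1/12, 1/2160, ... shrinking extremely fast
```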

644.2

Pascal matrix

Definition The Pascal matrix P of order n is the real square n × n matrix whose entries are [1]

P_{ij} = \binom{i + j - 2}{j - 1}.

For n = 5,

P = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 \\
1 & 2 & 3 & 4 & 5 \\
1 & 3 & 6 & 10 & 15 \\
1 & 4 & 10 & 20 & 35 \\
1 & 5 & 15 & 35 & 70
\end{bmatrix},

so we see that the Pascal matrix contains the Pascal triangle on its antidiagonals. Pascal matrices are ill-conditioned. However, the inverse of the n × n Pascal matrix is known explicitly and given in [1]. The characteristic polynomial of a Pascal matrix is a reciprocal polynomial [1].

REFERENCES 1. N.J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, 2002.

Version: 2 Owner: bbukh Author(s): bbukh, matte

644.3

Toeplitz matrix

A Toeplitz matrix is any n × n matrix with values constant along each (top-left to lower-right) diagonal. That is, a Toeplitz matrix has the form

\begin{bmatrix}
a_0 & a_1 & a_2 & \cdots & a_{n-1} \\
a_{-1} & a_0 & a_1 & \ddots & \vdots \\
a_{-2} & a_{-1} & a_0 & \ddots & a_2 \\
\vdots & \ddots & \ddots & \ddots & a_1 \\
a_{-(n-1)} & \cdots & a_{-2} & a_{-1} & a_0
\end{bmatrix}

Numerical problems involving Toeplitz matrices typically have fast solutions. For example, the inverse of a symmetric, positive-definite n × n Toeplitz matrix can be found in O(n^2) time.


644.3.1

References

1. Golub and Van Loan, Matrix Computations, Johns Hopkins University Press 1993 Version: 2 Owner: akrowne Author(s): akrowne

644.4

matrix condition number

The matrix condition number κ(A) of a square matrix A is defined as

κ(A) = \|A\|\,\|A^{-1}\|

where \|\cdot\| is any valid matrix norm. The condition number is basically a measure of the stability or sensitivity of a matrix (or the linear system it represents) to numerical operations. In other words, we may not be able to trust the results of computations on an ill-conditioned matrix. Matrices with condition numbers near 1 are said to be well-conditioned. Matrices with condition numbers much greater than one (such as around 10^5 for a 5 × 5 Hilbert matrix) are said to be ill-conditioned. If κ_p(A) is the condition number of A in the p-norm, then κ_p(A) measures the relative p-norm distance from A to the set of singular matrices.

644.4.1

References

1. Golub and Van Loan, Matrix Computations, Johns Hopkins University Press 1993 Version: 2 Owner: akrowne Author(s): akrowne

644.5

matrix norm

A matrix norm is a function f : Rm×n → R that satisfies the following properties:


f(A) ≥ 0, and f(A) = 0 ⇔ A = 0,   for A ∈ R^{m×n}
f(A + B) ≤ f(A) + f(B),   for A, B ∈ R^{m×n}
f(αA) = |α| f(A),   for α ∈ R, A ∈ R^{m×n}

Thus a matrix norm is a norm on the set of matrices; in other words, a matrix norm is simply a vector norm on the space R^{m×n}. Such a function is denoted \|A\|. Particular norms are distinguished by subscripts, such as \|A\|_F and \|A\|_p. The most frequently used matrix norms are the Frobenius matrix norm and the matrix p-norm. Version: 4 Owner: mps Author(s): mps, Logan

644.6

pivoting

Pivoting is a process performed on a matrix in order to improve numerical stability. Partial pivoting of an n × n matrix is the sorting of the rows of the matrix so that row i contains the maximum absolute column value for column i, among all rows i, . . . , n. That is, we begin by swapping row 1 with the row that has the largest absolute value for the first column, then swap row 2 with the row that has the largest magnitude for the second column (among rows 2 and below), and so on. Complete pivoting is a reordering of both rows and columns, using the same method as above. It is usually not necessary, since partial pivoting typically suffices to ensure numerical stability. Pivoting can be represented as multiplication by permutation matrices. Version: 3 Owner: akrowne Author(s): akrowne


Chapter 645 65R10 – Integral transforms 645.1

integral transform

A generic integral transform takes the form

F(s) = \int_{\alpha}^{\beta} K(s, t) f(t)\,dt

Note that the transform takes a function f(t) and produces a new function F(s). The function K(s, t) is called the kernel of the transform. The kernel of an integral transform, along with the limits α and β, distinguish a particular integral transform from another. For example, if K(s, t) = e^{-st}, α = 0, and β = ∞, then we get

F(s) = \int_0^{\infty} e^{-st} f(t)\,dt

which is known as the Laplace transform. Version: 2 Owner: vampyr Author(s): vampyr
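As a small numerical illustration (Python, not part of the original entry): for f(t) = e^{−t} the Laplace transform is F(s) = 1/(s + 1), which a truncated midpoint quadrature reproduces. The truncation point and step count below are illustrative choices:

```python
import math

def laplace(f, s, upper=40.0, n=200000):
    # Midpoint rule on [0, upper]; for s > 0 the integrand decays fast
    # enough that the truncated tail is negligible.
    h = upper / n
    return sum(math.exp(-s * (i + 0.5) * h) * f((i + 0.5) * h) for i in range(n)) * h

for s in (0.5, 1.0, 2.0):
    approx = laplace(lambda t: math.exp(-t), s)
    assert abs(approx - 1.0 / (s + 1.0)) < 1e-6
    print(s, approx)
```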


Chapter 646 65T50 – Discrete and fast Fourier transforms 646.1

Vandermonde matrix

A Vandermonde matrix is any (n + 1) × (n + 1) matrix of the form

\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{bmatrix}

Vandermonde matrices usually arise when considering systems of polynomials evaluated at specific points (i.e. in interpolation or approximation). This may happen, for example, when trying to solve for constants from initial conditions in systems of differential equations or recurrence relations. Vandermonde matrices also appear in the computation of FFTs (Fast Fourier Transforms). Here the fact that Vandermonde systems V z = b can be solved in O(n log n) flops by taking advantage of their recursive block structure comes into play.

646.1.1

References

1. Golub and Van Loan, Matrix Computations, Johns Hopkins University Press 1993 Version: 1 Owner: akrowne Author(s): akrowne


646.2

discrete Fourier transform

Summary. Suppose we have a function of time g(t) that has been discretely sampled at N regular intervals with frequency f, i.e.,

g_j = g\left(\frac{j}{f}\right)

The M-point discrete Fourier transform of g is given by

G_k = \sum_{j=0}^{N} g_j e^{-2\pi \nu_k i j/f}

where

\nu_k = k\frac{f}{2M}

Then G_k is the component of g corresponding to the frequency ν_k. Similarly, given G, we can reconstruct g via the discrete inverse Fourier transform:

g_j = \sum_{k=0}^{M} G_k e^{2\pi \nu_k i j/f}

Explanation. Generically, if we have some vector v = (v_1, v_2, \ldots, v_n) that we wish to project onto a set of unit basis vectors \hat{B}_1, \hat{B}_2, \ldots, \hat{B}_m, we find that the kth component of the projection v' is given by

v'_k = \langle v, \hat{B}_k\rangle

where ⟨u, v⟩ is an inner product. Using the standard dot product of complex vectors, we have

v'_k = \sum_{j=0}^{m} v_j \bar{B}_{k,j}

(where \bar{z} represents the complex conjugate of z). We may use a similar procedure to project a function onto basis functions. Here, the components of our vectors are the functions sampled at certain time intervals.

The idea of the Fourier transform is to project our discrete function g onto a basis made up of sinusoidal functions of varying frequency. We know from Euler’s identity that

e^{ix} = \cos x + i \sin x


so we can construct a sinusoidal function of any frequency ν:

$$e^{2\pi\nu i x} = \cos(2\pi\nu x) + i\sin(2\pi\nu x)$$

By Nyquist’s theorem, any frequency components of g above the Nyquist frequency f/2 will be aliased into lower frequencies, so we will only concern ourselves with the frequency components 0 ≤ ν < f/2. We select a sequence of M evenly spaced frequencies from this interval for our basis functions:

$$\nu_k = k\,\frac{f}{2M}$$

So, given a frequency ν_k and a time t, the value of the corresponding basis function at t is

$$e^{2\pi\nu_k i t} = \cos(2\pi\nu_k t) + i\sin(2\pi\nu_k t)$$

and hence the complex conjugate of this is

$$e^{-2\pi\nu_k i t} = \cos(-2\pi\nu_k t) + i\sin(-2\pi\nu_k t) = \cos(2\pi\nu_k t) - i\sin(2\pi\nu_k t)$$

Now we substitute these results into the standard formula for projection to reveal that

$$G_k = \sum_{j=0}^{N} g_j e^{-2\pi\nu_k i j/f}$$

If you take the limit of the discrete Fourier transform as the number of time divisions increases without bound, you get the integral form of the continuous Fourier transform. Version: 2 Owner: vampyr Author(s): vampyr
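To make the formulas concrete, here is a naive O(NM) Python sketch of the forward transform (not an FFT; the function name, and the choice to sum over the N available samples, are our own):

```python
import cmath

def dft(g, f, M):
    """Naive M-point discrete Fourier transform of samples g_j = g(j/f):
    G_k = sum_j g_j * exp(-2*pi*nu_k*i*j/f), with nu_k = k*f/(2M)."""
    N = len(g)
    G = []
    for k in range(M):
        nu_k = k * f / (2 * M)
        G.append(sum(g[j] * cmath.exp(-2j * cmath.pi * nu_k * j / f)
                     for j in range(N)))
    return G
```

For a constant signal, all of the energy lands in the zero-frequency component G_0, as expected.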


Chapter 647 68M20 – Performance evaluation; queueing; scheduling 647.1

Amdahl’s Law

Amdahl’s Law reveals the maximum speedup that can be expected from parallel algorithms given the proportion of parts that must be computed sequentially. It gives the speedup S as

$$S \le \frac{1}{f + (1 - f)/N}$$

where f is the fraction of the problem that must be computed sequentially and N is the number of processors. Note that as f approaches zero, S nears N, which we’d expect from a perfectly parallelizable algorithm. Version: 11 Owner: akrowne Author(s): akrowne
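The bound is easy to evaluate numerically; a minimal sketch (the function name is ours):

```python
def amdahl_speedup(f, N):
    """Maximum speedup with serial fraction f on N processors,
    per Amdahl's Law: S <= 1 / (f + (1 - f)/N)."""
    return 1.0 / (f + (1.0 - f) / N)
```

For instance, with f = 0.1 no number of processors can push the speedup past 10.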

647.2

efficiency

Efficiency is a measure of parallel algorithm performance per processor. The efficiency φ is defined as

$$\phi = \frac{S}{N}$$

where S is the speedup associated with the algorithm and N is the number of processors. Version: 2 Owner: akrowne Author(s): akrowne

647.3

proof of Amdahl’s Law

Suppose an algorithm needs n operations to compute the result. With 1 processor, the algorithm will take n time units. With N processors, the (1 − f)n parallelizable operations will take (1 − f)n/N time units and the remaining fn non-parallelizable operations will take fn time units, for a total running time of fn + (1 − f)n/N time units. So the speedup S is

$$S = \frac{n}{fn + \frac{(1-f)n}{N}} = \frac{1}{f + \frac{1-f}{N}}.$$

Version: 1 Owner: lieven Author(s): lieven


Chapter 648 68P05 – Data structures 648.1

heap insertion algorithm

The heap insertion algorithm inserts a new value into a heap, maintaining the heap property. Let H be a heap, storing n elements over which the relation ⊗ imposes a total ordering. Insertion of a value x consists of initially adding x to the bottom of the tree, and then sifting it upwards until the heap property is regained. Sifting consists of comparing x to its parent y. If x ⊗ y holds, then the heap property is violated. If this is the case, x and y are swapped and the operation is repeated for the new parent of x.

Since H is a balanced binary tree, it has a maximum depth of ⌈log₂ n⌉ + 1. Since the maximum number of times that the sift operation can occur is constrained by the depth of the tree, the worst-case time complexity for heap insertion is O(log n). This means that a heap can be built from scratch to hold a multiset of n values in O(n log n) time.

What follows is the pseudocode for implementing a heap insertion. For the given pseudocode, we presume that the heap is actually represented implicitly in an array (see the binary tree entry for details).

Algorithm HeapInsert(H, n, ⊗, x)
Input: A heap (H, ⊗) (represented as an array) containing n values and a new value x to be inserted into H
Output: H and n, with x inserted and the heap property preserved
begin
    n ← n + 1
    H[n] ← x
    child ← n
    parent ← n div 2
    while parent ≥ 1 do
        if H[child] ⊗ H[parent] then
            swap(H[parent], H[child])
            child ← parent
            parent ← parent div 2
        else
            parent ← 0

end Version: 3 Owner: Logan Author(s): Logan
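The sift-up procedure above can be sketched in Python on a 0-indexed list, with a comparison function standing in for ⊗ (both conventions are ours, not the entry's):

```python
def heap_insert(H, x, op=lambda a, b: a <= b):
    """Insert x into the list-based heap H (0-indexed), sifting upwards.
    op plays the role of ⊗: op(parent, child) must hold everywhere."""
    H.append(x)                      # add x at the bottom of the tree
    child = len(H) - 1
    while child > 0:
        parent = (child - 1) // 2
        if op(H[child], H[parent]):  # heap property violated: swap and repeat
            H[parent], H[child] = H[child], H[parent]
            child = parent
        else:
            break
```

With the default `op`, the smallest element ends up at `H[0]`, as in a min-heap.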

648.2

heap removal algorithm

Definition  Let H be a heap storing n elements, and let ⊗ be an operator that defines a total order over all of the values in H. If H is non-empty, then there is a value x in H such that x ⊗ y holds for all values y in H. The heap removal algorithm removes x from H and returns it, while maintaining the heap property of H.

The process of this algorithm is similar in nature to that of the heap insertion algorithm. First, if H only holds one value, then that value is x, which can simply be removed and returned. Otherwise, let z be the value stored in the right-most leaf of H. Since x is defined by the heap property to be at the root of the tree, the value of x is saved, z is removed, and the value at the root of the tree is set to z. Then z is sifted downwards into the tree until the heap property is regained.

The sifting process is similar to that of the heap insertion algorithm, only in reverse. First, if z is a leaf, the process ends. Otherwise, let a, b be the two children of z, chosen such that a ⊗ b holds. If z ⊗ a holds, the process ends. Otherwise, a and z are swapped and the process repeats for z.

Analysis  Since H is a balanced binary tree, it has a maximum depth of ⌈log₂ n⌉ + 1. Since the maximum number of times the sift operation can occur is constrained by the depth of the tree, the worst-case time complexity for heap removal is O(log n).

Pseudocode  What follows is the pseudocode for implementing heap removal. For the given pseudocode, we presume that the heap is actually represented implicitly in an array (see the binary tree entry for details), and that the heap contains at least one value.

Algorithm HeapRemove(H, n, ⊗)
Input: A heap (H, ⊗) (represented as an array) containing n > 0 values
Output: Removes and returns a value x from H, such that x ⊗ y holds for all y in H
begin
    top ← H[1]
    H[1] ← H[n]
    n ← n − 1
    parent ← 1
    child ← 2
    while child ≤ n do
        if child < n and H[child + 1] ⊗ H[child] then
            child ← child + 1
        if not H[parent] ⊗ H[child] then
            swap(H[parent], H[child])
            parent ← child
            child ← 2 · child
        else
            child ← n + 1
    return top
end

Version: 3 Owner: Logan Author(s): Logan
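A companion Python sketch of removal with sift-down, again on a 0-indexed list with a comparison function standing in for ⊗ (our conventions, not the entry's):

```python
def heap_remove(H, op=lambda a, b: a <= b):
    """Remove and return the root of the list-based heap H (0-indexed)."""
    top = H[0]
    last = H.pop()                   # value from the right-most leaf
    if H:
        H[0] = last                  # move it to the root, then sift down
        parent = 0
        n = len(H)
        while True:
            child = 2 * parent + 1
            if child >= n:
                break                # parent is a leaf
            if child + 1 < n and op(H[child + 1], H[child]):
                child += 1           # pick the ⊗-extremal child
            if op(H[parent], H[child]):
                break                # heap property holds again
            H[parent], H[child] = H[child], H[parent]
            parent = child
    return top
```

Repeatedly removing from a min-heap yields the stored values in sorted order.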


Chapter 649 68P10 – Searching and sorting 649.1

binary search

The Problem  Let ≤ be a total ordering on the set S. Given a sequence of n elements, L = {x_1 ≤ x_2 ≤ ... ≤ x_n}, and a value y ∈ S, locate the position of any elements in L that are equal to y, or determine that none exist.

The Algorithm  The binary search technique is a fundamental method for locating an element of a particular value within a sequence of sorted elements (see Sorting Problem). The idea is to eliminate half of the search space with each comparison. First, the middle element of the sequence is compared to the value we are searching for. If this element matches the value we are searching for, we are done. If, however, the middle element is “less than” the value we are searching for (as specified by the relation used to specify a total order over the set of elements), then we know that, if the value exists in the sequence, it must exist somewhere after the middle element. Therefore we can eliminate the first half of the sequence from our search and simply repeat the search in the exact same manner on the remaining half of the sequence. If, however, the value we are searching for comes before the middle element, then we repeat the search on the first half of the sequence.


Pseudocode

Algorithm Binary_Search(L, n, key)
Input: A list L of n elements, and key (the search key)
Output: Position (such that L[Position] = key)
begin
    Position ← Find(L, 1, n, key);
end

function Find(L, bottom, top, key)
begin
    if bottom = top then
        if L[bottom] = key then
            Find ← bottom
        else
            Find ← 0
    else
    begin
        middle ← ⌊(bottom + top)/2⌋;
        if key ≤ L[middle] then
            Find ← Find(L, bottom, middle, key)
        else
            Find ← Find(L, middle + 1, top, key)
    end
end

Analysis We can specify the runtime complexity of this binary search algorithm by counting the number of comparisons required to locate some element y in L. Since half of the list is eliminated with each comparison, there can be no more than log2 n comparisons before either the positions of all the y elements are found or the entire list is eliminated and y is determined to not exist in L. Thus the worst-case runtime complexity of the binary search is O(log n). It can also be shown that the average-case runtime complexity of the binary search is approximately log2 n − 1 comparisons. This means that any single entry in a phone book containing one million entries can be located with at most 20 comparisons (and on average 19). Version: 7 Owner: Logan Author(s): Logan
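An iterative Python rendering of the same idea (recursion replaced by a loop; we return −1 rather than 0 for "not found", since Python lists are 0-indexed):

```python
def binary_search(L, key):
    """Return an index i with L[i] == key in the sorted list L, or -1.
    Each comparison halves the remaining search space."""
    bottom, top = 0, len(L) - 1
    while bottom <= top:
        middle = (bottom + top) // 2
        if L[middle] == key:
            return middle
        elif key < L[middle]:
            top = middle - 1         # key, if present, lies in the first half
        else:
            bottom = middle + 1      # key, if present, lies in the second half
    return -1
```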

649.2

bubblesort

The bubblesort algorithm is a simple and naïve approach to the sorting problem. Let ⊗ define a total ordering over a list A of n values. The bubblesort consists of advancing through A, swapping adjacent values A[i] and A[i + 1] if A[i + 1] ⊗ A[i] holds. By going through all of A in this manner n times, one is guaranteed to achieve the proper ordering.

Pseudocode  The following is pseudocode for the bubblesort algorithm. Note that it keeps track of whether or not any swaps occur during a traversal, so that it may terminate as soon as A is sorted.

Algorithm BubbleSort(A, n, ⊗)
Input: List A of n values
Output: A sorted with respect to relation ⊗
begin
    repeat
        done ← true
        for i ← 1 to n − 1 do
            if A[i + 1] ⊗ A[i] then
                swap(A[i], A[i + 1])
                done ← false
    until done
end

Analysis The worst-case scenario is when A is given in reverse order. In this case, exactly one element can be put in order during each traversal, and thus all n traversals are required. Since each traversal consists of n − 1 comparisons, the worst-case complexity of bubblesort is O(n2 ). Bubblesort is perhaps the simplest sorting algorithm to implement. Unfortunately, it is also the least efficient, even among O(n2 ) algorithms. Bubblesort can be shown to be a stable sorting algorithm (since two items of equal keys are never swapped, initial relative ordering of items of equal keys is preserved), and it is clearly an in-place sorting algorithm. Version: 3 Owner: Logan Author(s): Logan
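A direct Python rendering, with the same early-exit flag; here ⊗ is simply the strict order <:

```python
def bubblesort(A):
    """Sort list A in place by repeatedly swapping out-of-order neighbors,
    stopping as soon as a full pass makes no swaps."""
    n = len(A)
    done = False
    while not done:
        done = True
        for i in range(n - 1):
            if A[i + 1] < A[i]:
                A[i], A[i + 1] = A[i + 1], A[i]
                done = False
```

Using the strict comparison keeps equal elements unswapped, which is what makes the sort stable.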

649.3

heap

Let ⊗ be a total order on some set A. A heap is then a data structure for storing elements in A. A heap is a balanced binary tree, with the property that if y is a descendent of x in the heap, then x ⊗ y must hold. This property is often referred to as the heap property. If ⊗ is ≤, then the root of the heap always gives the smallest element of the heap, and if ⊗ is ≥, then the root of the heap always gives the largest element of the heap. More generally, the root of the heap is some a ∈ A such that a ⊗ x holds for all x in the heap. For example, the following heap represents the multiset {1, 2, 4, 4, 6, 8} for the total order ≥ on ℤ.

        8
       / \
      4   6
     / \  /
    1  4 2

Due to the heap property, heaps have a very elegant application to the sorting problem. The heapsort is an in-place sorting algorithm centered entirely around a heap. Heaps are also used to implement priority queues. Version: 2 Owner: Logan Author(s): Logan

649.4

heapsort

The heapsort algorithm is an elegant application of the heap data structure to the sorting problem. It consists of building a heap out of some list of n elements, and then removing a maximal value one at a time.

The Algorithm  The following pseudocode illustrates the heapsort algorithm. It builds upon the heap insertion and heap removal algorithms.

Algorithm HeapSort(A, ⊗, n)
Input: List A of n elements
Output: A sorted, such that ⊗ is a total order over A
begin
    for i ← 2 to n do
        HeapInsert(A, i − 1, ⊗, A[i])
    for i ← n downto 2 do
        A[i] ← HeapRemove(A, i, ⊗)
end

Analysis Note that the algorithm given is based on a top-down heap insertion algorithm. It is possible to get better results through bottom-up heap construction.


Each step of each of the two for loops in this algorithm has a runtime complexity of O(log i). Thus overall the heapsort algorithm is O(n log n). Heapsort is not quite as fast as quicksort in general, but it is not much slower, either. Also, like quicksort, heapsort is an in-place sorting algorithm, but not a stable sorting algorithm. Unlike quicksort, its performance is guaranteed, so regardless of the ordering of its input its worst-case complexity is O(n log n). Given its simple implementation and reasonable performance, heapsort is ideal for quickly implementing a decent sorting algorithm. Version: 4 Owner: mathcam Author(s): yark, Logan
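An in-place Python sketch using the bottom-up heap construction mentioned in the Analysis (helper names are ours):

```python
def heapsort(A):
    """In-place heapsort: build a max-heap over A bottom-up, then repeatedly
    swap the root (current maximum) to the end and restore the heap."""
    n = len(A)

    def sift_down(start, end):
        # Restore the heap property for the subtree rooted at `start`,
        # considering only A[0:end].
        parent = start
        while True:
            child = 2 * parent + 1
            if child >= end:
                break
            if child + 1 < end and A[child + 1] > A[child]:
                child += 1                    # pick the larger child
            if A[parent] >= A[child]:
                break
            A[parent], A[child] = A[child], A[parent]
            parent = child

    for start in range(n // 2 - 1, -1, -1):   # bottom-up heap construction
        sift_down(start, n)
    for end in range(n - 1, 0, -1):
        A[0], A[end] = A[end], A[0]           # move current max into place
        sift_down(0, end)
```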

649.5

in-place sorting algorithm

A sorting algorithm is said to be in-place if it requires no additional space besides the initial array holding the elements that are to be sorted. Such algorithms are useful, particularly for large data sets, because they impose no extra memory requirements. Examples of in-place sorting algorithms are the quicksort and heapsort algorithms. An example of a sorting algorithm that is not in-place is the mergesort algorithm. Version: 2 Owner: Logan Author(s): Logan

649.6

insertion sort

The Problem See the Sorting Problem.

The Algorithm  Suppose L = {x_1, x_2, ..., x_n} is the initial list of unsorted elements. The insertion sort algorithm will construct a new list, containing the elements of L in order, which we will call L′. The algorithm constructs this list one element at a time. Initially L′ is empty. We then take the first element of L and put it in L′. We then take the second element of L and also add it to L′, placing it before any elements in L′ that should come after it. This is done one element at a time until all n elements of L are in L′, in sorted order. Thus, each step i consists of looking up the position in L′ where the element x_i should be placed and inserting it there (hence the name of the algorithm). This requires a search, and then the shifting of all the elements in L′ that come after x_i (if L′ is stored in


an array). If storage is in an array, then the binary search algorithm can be used to quickly find x_i’s new position in L′. Since at step i, the length of list L′ is i and the length of list L is n − i, we can implement this algorithm as an in-place sorting algorithm. Each step i results in L[1..i] becoming fully sorted.

Pseudocode  This algorithm uses a modified binary search algorithm to find the position in L where an element key should be placed to maintain ordering.

Algorithm Insertion_Sort(L, n)
Input: A list L of n elements
Output: The list L in sorted order
begin
    for i ← 1 to n do
    begin
        value ← L[i]
        position ← Binary_Search(L, 1, i − 1, value)
        for j ← i downto position + 1 do
            L[j] ← L[j − 1]
        L[position] ← value
    end
end

function Binary_Search(L, bottom, top, key)
begin
    if bottom > top then
        Binary_Search ← bottom
    else
    begin
        middle ← ⌊(bottom + top)/2⌋
        if key < L[middle] then
            Binary_Search ← Binary_Search(L, bottom, middle − 1, key)
        else
            Binary_Search ← Binary_Search(L, middle + 1, top, key)
    end
end

Analysis  In the worst case, each step i requires a shift of i − 1 elements for the insertion (consider an input list that is sorted in reverse order). Thus the runtime complexity is O(n²). Even the optimization of using a binary search does not help us here, because the deciding factor in this case is the insertion. It is possible to use a data type with O(log n) insertion time, giving O(n log n) runtime, but then the algorithm can no longer be done as an in-place sorting algorithm. Such data structures are also quite complicated.

A similar algorithm to the insertion sort is the selection sort, which requires fewer data movements than the insertion sort, but requires more comparisons. Version: 4 Owner: mathcam Author(s): mathcam, Logan
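A Python sketch of the same idea; here the standard-library bisect module plays the role of the modified binary search (using it instead of a hand-rolled search is our choice):

```python
import bisect

def insertion_sort(L):
    """In-place insertion sort. bisect_right finds the insertion position in
    the already-sorted prefix L[0:i]; slice assignment does the shifting."""
    for i in range(1, len(L)):
        value = L[i]
        position = bisect.bisect_right(L, value, 0, i)
        # shift L[position:i] one slot to the right, then drop value in
        L[position + 1:i + 1] = L[position:i]
        L[position] = value
```

`bisect_right` places equal elements after their duplicates, so the sort is stable.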

649.7

lower bound for sorting

Several well-known sorting algorithms have average or worst-case running times of O(n log n) (heapsort, quicksort). One might ask: is it possible to do better? The answer to this question is no, at least for comparison-based sorting algorithms. To prove this, we must prove that no algorithm can perform better (even algorithms that we do not know!). Often this sort of proof is intractable, but for comparison-based sorting algorithms we can construct a model that corresponds to the entire class of algorithms. The model that we will use for this proof is a decision tree. The root of the decision tree corresponds to the state of the input of the sorting problem (e.g. an unsorted sequence of values). Each internal node of the decision tree represents such a state, as well as two possible decisions that can be made as a result of examining that state, leading to two new states. The leaf nodes are the final states of the algorithm, which in this case correspond to states where the input list is determined to be sorted. The worst-case running time of an algorithm modelled by a decision tree is the height or depth of that tree. A sorting algorithm can be thought of as generating some permutation of its input. Since the input can be in any order, every permutation is a possible output. In order for a sorting algorithm to be correct in the general case, it must be possible for that algorithm to generate every possible output. Therefore, in the decision tree representing such an algorithm, there must be one leaf for every one of n! permutations of n input values. Since each comparison results in one of two responses, the decision tree is a binary tree. A binary tree with n! leaves must have a minimum depth of log2 (n!). Stirling’s formula gives us

$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + O(1/n)\right)$$

so

$$\log_2(n!) = \Omega\!\left(\log\left((n/e)^n\right)\right) = \Omega(n \log n)$$

Thus any general sorting algorithm has a lower bound of Ω(n log n) (see Landau notation).

This result does not necessarily apply to non-comparison-based sorts, such as the radix sort. Comparison-based sorts – such as heapsort, quicksort, bubble sort, and insertion sort – are more general, in that they depend only upon the assumption that two values can be compared (which is a necessary condition for the sorting problem to be defined for a particular input anyway; see total order). Sorting algorithms such as the radix sort take advantage of special properties that need to hold for the input values in order to reduce the number of comparisons necessary. Version: 1 Owner: Logan Author(s): Logan

649.8

quicksort

Quicksort is a divide-and-conquer algorithm for sorting in the comparison model. Its expected running time is O(n lg n) for sorting n values.

Algorithm Quicksort can be implemented recursively, as follows: Algorithm Quicksort(L) Input: A list L of n elements Output: The list L in sorted order if n > 1 then p ← random element of L A ← {x|x ∈ L, x < p} B ← {z|z ∈ L, z = p} C ← {y|y ∈ L, y > p} SA ← Quicksort(A) SC ← Quicksort(C) return Concatenate(SA , B, SC ) else return L

Analysis  The behavior of quicksort can be analyzed by considering the computation as a binary tree. Each node of the tree corresponds to one recursive call to the quicksort procedure. Consider the initial input to the algorithm, some list L. Call the sorted list S, with ith and jth elements S_i and S_j. These two elements will be compared with some probability


p_ij. This probability can be determined by considering two preconditions on S_i and S_j being compared:

• S_i or S_j must be chosen as a pivot p, since comparisons only occur against the pivot.

• No element between S_i and S_j can have already been chosen as a pivot before S_i or S_j is chosen. Otherwise, S_i and S_j would be separated into different sublists in the recursion.

The probability of any particular element being chosen as the pivot is uniform. Therefore, the chance that S_i or S_j is chosen as the pivot before any element between them is 2/(j − i + 1). This is precisely p_ij.

The expected number of comparisons is just the summation over all possible comparisons of the probability of that particular comparison occurring. By linearity of expectation, no independence assumptions are necessary. The expected number of comparisons is therefore

$$\sum_{i=1}^{n}\sum_{j>i} p_{ij} = \sum_{i=1}^{n}\sum_{j>i} \frac{2}{j-i+1} \tag{649.8.1}$$

$$= \sum_{i=1}^{n}\sum_{k=2}^{n-i+1} \frac{2}{k} \tag{649.8.2}$$

$$\le 2\sum_{i=1}^{n}\sum_{k=1}^{n} \frac{1}{k} \tag{649.8.3}$$

$$= 2nH_n = O(n \lg n), \tag{649.8.4}$$

where H_n is the nth harmonic number. The worst case behavior is Θ(n²), but this almost never occurs (with high probability it does not occur) with random pivots. Version: 8 Owner: thouis Author(s): thouis
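The pseudocode translates almost line for line into Python (three-way partition around a random pivot):

```python
import random

def quicksort(L):
    """Randomized quicksort: partition L around a random pivot p into
    elements below, equal to, and above p, then recurse on the outer parts."""
    if len(L) <= 1:
        return L
    p = random.choice(L)
    A = [x for x in L if x < p]
    B = [x for x in L if x == p]
    C = [x for x in L if x > p]
    return quicksort(A) + B + quicksort(C)
```

Unlike the in-place variants, this sketch allocates new sublists at each level; it trades memory for clarity.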

649.9

sorting problem

Let ≤ be a total ordering on the set S. Given a sequence of n elements, x_1, x_2, ..., x_n ∈ S, find a sequence of distinct indices 1 ≤ i_1, i_2, ..., i_n ≤ n such that x_{i_1} ≤ x_{i_2} ≤ ... ≤ x_{i_n}. The sorting problem is a heavily studied area of computer science, and many sorting algorithms exist to solve it. The most general algorithms depend only upon the relation ≤, and are called comparison-based sorts. It can be proved that the lower bound for sorting by

any comparison-based sorting algorithm is Ω(n log n). A few other specialized sort algorithms rely on particular properties of the values of elements in S (such as their structure) in order to achieve lower time complexity for sorting certain sets of elements. Examples of such a sorting algorithm are the bucket sort and the radix sort. Version: 2 Owner: Logan Author(s): Logan


Chapter 650 68P20 – Information storage and retrieval 650.1

Browsing service

A browsing service is a set of scenarios sc1 , ..., scn over a hypertext (meaning that events are defined by edges of the hypertext graph (VH , EH )), such that traverse link events ei are associated with a function TraverseLink : VH × EH → Contents, which given a node and a link retrieves the content of the target node, i.e., TraverseLink for eki = (vk , vt ) ∈ EH . Version: 1 Owner: seonyoung Author(s): seonyoung

650.2

Digital Library Index

Let I : 2^T → 2^H be an index function, where T is a set of indexing features and H is a set of handles. An index is a set of index functions. Version: 1 Owner: kemyers3 Author(s): kemyers3

650.3

Digital Library Scenario

A scenario is a sequence of related transition events ⟨e_1, e_2, ..., e_n⟩ on a state set S such that e_k = (s_k, s_{k+1}) for 1 ≤ k ≤ n. Version: 2 Owner: grouprly Author(s): grouprly


650.4

Digital Library Space

A space is a measurable space, measure space, probability space, vector space, topological space, or a metric space. Version: 1 Owner: kemyers3 Author(s): kemyers3

650.5

Digital Library Searching Service

A searching service is a set of searching scenarios sc_1, sc_2, ..., sc_t, where for each query q ∈ Q there is a searching scenario sc_k = ⟨e_0, ..., e_n⟩ such that e_0 is the start event triggered by a query q and event e_n is the final event of returning the matching function values M_1(q, d) for all d ∈ C. Version: 1 Owner: grouprly Author(s): grouprly

650.6

Service, activity, task, or procedure

A service, activity, task, or procedure is a set of scenarios. Version: 1 Owner: seonyoung Author(s): seonyoung

650.7

StructuredStream

Given a structure (G, L, F), G = (V, E), and a Stream, a StructuredStream is a function V → (ℕ × ℕ) that associates each node v_k ∈ V with a pair of natural numbers (a, b), a < b, corresponding to a contiguous subsequence [S_a, S_b] (a segment) of the Stream S. Version: 1 Owner: jamika chris Author(s): jamika chris

650.8

collection

A collection C = {do_1, do_2, ..., do_k} is a set of digital objects. Version: 1 Owner: ruiyang Author(s): ruiyang


650.9

digital library stream

A stream is a sequence whose codomain is a non-empty set. Version: 1 Owner: jamika chris Author(s): jamika chris

650.10

digital object

A digital object is a tuple do = (h, SM, ST, StructuredStreams) where

1. h ∈ H, where H is a set of universally unique handles (labels);
2. SM = {sm_1, sm_2, ..., sm_n} is a set of streams;
3. ST = {st_1, st_2, ..., st_n} is a set of structural metadata specifications;
4. StructuredStreams = {stsm_1, stsm_2, ..., stsm_n} is a set of StructuredStream functions defined from the streams in the SM set (the second component) of the digital object and from the structures in the ST set (the third component).

Version: 3 Owner: ruiyang Author(s): ruiyang

650.11

good hash table primes

In the course of designing a good hashing configuration, it is helpful to have a list of prime numbers for the hash table size. The following is such a list. It has the properties that:

1. each number in the list is prime (as you no doubt expected by now)
2. each number is slightly less than twice the size of the previous
3. each number is as far as possible from the nearest two powers of two

Using primes for hash tables is a good idea because it minimizes clustering in the hashed table. Item (2) is nice because it is convenient for growing a hash table in the face of expanding data.

Item (3) has, allegedly, been shown to yield especially good results in practice. And here is the list:

lwr    upr    %err       prime
2^5    2^6    10.416667  53
2^6    2^7    1.041667   97
2^7    2^8    0.520833   193
2^8    2^9    1.302083   389
2^9    2^10   0.130208   769
2^10   2^11   0.455729   1543
2^11   2^12   0.227865   3079
2^12   2^13   0.113932   6151
2^13   2^14   0.008138   12289
2^14   2^15   0.069173   24593
2^15   2^16   0.010173   49157
2^16   2^17   0.013224   98317
2^17   2^18   0.002543   196613
2^18   2^19   0.006358   393241
2^19   2^20   0.000127   786433
2^20   2^21   0.000318   1572869
2^21   2^22   0.000350   3145739
2^22   2^23   0.000207   6291469
2^23   2^24   0.000040   12582917
2^24   2^25   0.000075   25165843
2^25   2^26   0.000010   50331653
2^26   2^27   0.000023   100663319
2^27   2^28   0.000009   201326611
2^28   2^29   0.000001   402653189
2^29   2^30   0.000011   805306457
2^30   2^31   0.000000   1610612741

The columns are, in order, the lower bounding power of two, the upper bounding power of two, the relative deviation (in percent) of the prime number from the optimal middle of the first two, and finally the prime itself. Bon hashétite! Version: 5 Owner: akrowne Author(s): akrowne


650.12

hashing

650.12.1

Introduction

Hashing refers to an information storage and retrieval technique which is very widely used in real-world applications. There are many more potential places it could profitably be applied as well. In fact, some programming languages these days (such as Perl) are designed with hashing built-in, so the programmer does not even have to know about hash tables (or know much about them) to benefit. Hashing is inspired by both the classical searching and sorting problems. We know that with comparison-based sorting, the quickest we can put a set of n items in lexicographic order is O(n log n). We can then update the sorted structure with new values either by clumsy reallocations of memory and shifting of elements, or by maintaining list structures. Searching for an item in a sorted set of n items is then no faster than O(log n). While fast compared to other common algorithms, these time complexities pose some problems for very large data sets and real-time applications. In addition, there is no hassle-free way to add items to a sorted list of elements; some overhead always must be maintained. Hashing provides a better way, utilizing a bit of simple mathematics. With hashing, we can typically achieve an average-case time complexity of O(1) for storage and retrieval, with none of the updating overhead we see for either lists or arrays.

650.12.2

The Basics of Hashing

We begin with a set of objects which are referenced by keys. The keys are just handles or labels which uniquely describe the objects. The objects could be any sort of digital information, from a single number to an entire book worth of text 1 . The key could also be anything (textual, or numerical), but what is important for hashing is that it can be efficiently reduced to a numerical representation. The invariance of hashing with respect to the character of the underlying data is one of the main reasons it is such a useful technique. We ”hash” an object by using a function h to place it in a hash table (which is just an array). Thus, h takes the object’s key and returns a memory location in a hash table. If our set of keys is contained in a key space K, and T is the set of memory locations in the hash table, then we have

h : K ↦ T

¹ In fact, modern filesystems use hashing. The keys are the file names, and the objects are the files themselves.


Since T is memory locations, we must have T ⊆ Z. Since the size of the object set (n) has nothing to do with h, evaluating h is O(1) in n. In other words, optimally we can keep throwing stuff into the hash table (until it’s full) without expecting any slowdown. Similarly, we can “look up” an element in O(1) time, because our query consists of (1) sending our query key through h, and then (2) checking to see if anything is in the hash table location. Of course, in order to be able to tell if a location in the hash table is occupied, we need a special value called a “tombstone” to delimit an empty spot. A value such as “0” will do, assuming the objects we are storing cannot possibly have the digital representation “0”2 .

650.12.3

The Hash Function

The hash function³ has two main characteristics:

1. it should evaluate in a small amount of time (it should be mathematically simple)
2. it should be relatively unlikely that two different keys are mapped to the same hash table location.

There are two classes of hash functions: those that use integer modulo (division hashing) and those which use multiplication by a real number and then truncation to integer (multiplication hashing). In general, division hash functions look like:

h(k) = f(k) mod n

where n is the hash table size and f is a function which expresses the key in numerical form. This works because the value of f(k) mod n will be a “random” number between 0 and n − 1.

² An abstract hash table implementation, however, will have to go to extra lengths to ensure that the tombstone is an “out-of-band” value so that no extra restrictions are put on the values of the objects which the client can store in the hash table.
³ There is another kind of “hash” commonly referred to that has nothing to do with storage and retrieval, but instead is commonly used for checksums and verification. This kind of hash has to do with the production of a summary key for (usually large) sized objects. This kind of hash function maps the digital object space (infinitely large, but working with some functional range of sizes) into a much smaller but still astronomically large key space (typically something like 128 bits). Like the hash functions discussed above, these hash functions are also very sensitive to change in the input data, so a single bit changed is likely to drastically change the output of the hash. This is the reason they are useful for verification. As for collisions, they are possible, but rather than being routine, the chances of them are infinitesimally small because of the large size of the output key space.


For example, f(k) = k (for integer key values k) and n = p (a prime) is a very simple and widely used class of hash function. Were k a different type of data than integer (say, strings), we’d need a more complicated f to produce integers. Multiplication hash functions look like:

h(k) = ⌊n · ((f(k) · r) mod 1)⌋

where 0 < r < 1. Intuitively, we can expect that multiplying a “random” real number between 0 and 1 with an integer key should give us another “random” real number. Taking the decimal part of this should give us most of the digits of precision (i.e., randomness) of the original, and at the same time act as the analog of the modulo in the division hash to restrict output to a range of values. Multiplying the resulting “random” number between 0 and 1 with the size of the hash table (n) should then give us a “random” index into it.
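Both classes can be sketched in a few lines of Python, using the built-in hash as the key-to-integer function f (that choice, and the golden-ratio constant for r, are our assumptions, not the entry's):

```python
def division_hash(key, n):
    """Division hashing: h(k) = f(k) mod n, with f = Python's built-in
    hash reducing an arbitrary (hashable) key to an integer."""
    return hash(key) % n

def multiplication_hash(key, n, r=0.6180339887):
    """Multiplication hashing: h(k) = floor(n * ((f(k) * r) mod 1)).
    r is the golden-ratio conjugate, a common but not mandatory choice."""
    return int(n * ((hash(key) * r) % 1.0))
```

Both functions always land inside the table, i.e. in the range 0 … n − 1.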

650.12.4

Collision

In general, the hash function is not a one-to-one function. This means two different keys can hash to the same table entry. Because of this, some policy for placing a colliding key is needed. This is called a collision resolution policy. The collision resolution policy must be designed such that if there is a free space in the hash table, it must eventually find it. If not, it must indicate failure. One collision resolution policy is to use a collision resolution function to compute a new location. A simple collision resolution function is to add a constant integer to the hash table location until a free space is found (linear probing). In order to guarantee that this will eventually get us to an empty space, hashing using this policy works best with a prime-sized hash table. Collisions are the reason the hash function needs to spread the objects out in the hash table so they are distant from each other and “evenly” distributed. The more bunched-up they are, the more likely collisions are. If n is the hash table size and t is the number of places in the hash table which are taken, then the value

l = t/n

is called the load factor of the hash table, and tells us how “full” it is. As the load factor nears 1, collisions are unavoidable, and the collision resolution policy will be invoked more often. There are three ways to avoid this quagmire:

• Make the hash table much bigger than the object set.
• Maintain a list at each hash table location instead of individual objects.
• Use a special hash function which is one-to-one.

Option 1 makes it statistically unlikely that we will have to use the collision resolution function. This is the most common solution. If the key space is K, the set of actual keys we are hashing is A (A ⊆ K), and the table space is T, then this solution can be phrased as: |A| < c |T| with c some fractional constant. In practice, 1/2 < c < 3/4 gives good results⁴.

Option 2 is called chaining and eliminates the need for a collision resolution function⁵. Note that to the extent that it is not combined with #1, we are replacing a pure hashing solution with a hash-and-list hybrid solution. Whether this is useful depends on the application.

Option 3 is referring to a perfect hash function, and also eliminates the need for a collision resolution function. Depending on how much data we have and how much memory is available to us, it may be possible to select such a hash function. For a trivial example, h(k) = k suffices as a perfect hash function if our keys are integers and our hash table is an array which is larger than the key space. The downside to perfect hash functions is that you can never use them in the generic case; you always have to know something about the keyspace and the data set size to guarantee perfectness.
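A minimal sketch of linear probing (the policy from the text) and of chaining (option 2). The keys and the prime table size are illustrative choices of ours.

```python
# Two collision-handling strategies: linear probing and chaining.
def probe_insert(table, key, n):
    """Linear probing: step by a constant (1) until a free slot is found."""
    i = key % n
    for _ in range(n):
        if table[i] is None:
            table[i] = key
            return i
        i = (i + 1) % n           # wrap around; a prime n visits every slot
    raise RuntimeError("hash table is full")

def chain_insert(buckets, key, n):
    """Chaining: keep a list at each location; collisions just append."""
    buckets[key % n].append(key)

n = 7                             # a prime-sized table, as the text recommends
table = [None] * n
for k in (3, 10, 17):             # all hash to index 3: forces probing
    probe_insert(table, k, n)
assert table[3] == 3 and table[4] == 10 and table[5] == 17

buckets = [[] for _ in range(n)]
for k in (3, 10, 17):
    chain_insert(buckets, k, n)
assert buckets[3] == [3, 10, 17]
```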

650.12.5

Variations

In addition to perfect hash functions, there is also a class of hash functions called minimal perfect hash functions. These functions are one-to-one and onto. Thus, they map a key space of size n to a hash table space of size n, all of our objects will map perfectly to hash table locations, and there will be no leftover spots wasted and no collisions. Once again an array indexed by integers is a simple example, but obviously there is more utility to be had from keys which are not identical to array indices. Minimal perfect hash functions may seem too good to be true, but they are only applicable in special situations.

There is also a class of hash functions called order-preserving hash functions. This just means that the lexicographic order of the hashed object locations is the same as the lexicographic order of the keys:

⁴ For |T| prime, this condition inspires a search for “good primes” that are approximately one-half greater than or double |A|. Finding large primes is a non-trivial task; one cannot just make one up on the spot. See good hash table primes.
⁵ It also allows over-unity load factors, since we can have more items “in” the hash table than actual “locations” in the hash table.


k1 < k2 → h(k1) < h(k2)

for keys k1 and k2. The benefit of an order-preserving hash is that it performs a sort with no “extra” work⁶; we can print out the hashed objects in sorted order simply by scanning the table linearly. In principle, an order-preserving hash is always possible when the used key space is smaller than the hash table size. However, it may be tricky to design the right hash function, and it is sure to be very specific to the key space. In addition, some complexity is likely to be introduced in the hash function which will slow down its evaluation.

650.12.6

References

Coming soon! Version: 3 Owner: akrowne Author(s): akrowne

650.13

metadata format

Let D_{L_MF} = {D1, D2, ..., Di} be the set of domains that make up a set of literals L_MF = ⋃_{j=1}^{i} Dj. As for metadata specifications, let R_MF and P_MF represent sets of labels for resources and properties, respectively. A metadata format for descriptive metadata specifications is a tuple MF = (V_MF, def_MF) with V_MF = {R1, R2, ..., Rk} ⊂ 2^{R_MF} a family of subsets of the resource labels R_MF, and def_MF : V_MF × P_MF → V_MF ∪ D_{L_MF} a property definition function. Version: 3 Owner: npolys Author(s): npolys

650.14

system state

A system state is a function s : L → V, from labels L to values V. A state set S consists of a set of state functions s : L → V. Version: 1 Owner: mikestaflogan Author(s): mikestaflogan

⁶ In other words, we escape the O(n log n) bounds on sorting because we aren’t doing a comparison-based sort at all.


650.15

transition event

A transition event (or simply event) on a state set S is an element e = (si, sj) ∈ (S × S) of a binary relation on state set S that signifies the transition from one state to another. An event e is defined by a condition function c(si), which evaluates a Boolean function in state si, and by an action function p. Version: 1 Owner: mikestaflogan Author(s): mikestaflogan


Chapter 651 68P30 – Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) 651.1

Huffman coding

Huffman coding is a method of lossless data compression, and a form of entropy encoding. The basic idea is to map an alphabet to a representation for that alphabet, composed of strings of variable size, so that symbols that have a higher probability of occurring have a smaller representation than those that occur less often. The key to Huffman coding is Huffman’s algorithm, which constructs an extended binary tree of minimum weighted path length from a list of weights. For this problem, our list of weights consists of the probabilities of symbol occurrence. From this tree (which we will call a Huffman tree for convenience), the mapping to our variable-sized representations can be defined. The mapping is obtained by the path from the root of the Huffman tree to the leaf associated with a symbol’s weight. The method can be arbitrary, but typically a value of 0 is associated with an edge to any left child and a value of 1 with an edge to any right child (or vice-versa). By concatenating the labels associated with the edges that make up the path from the root to a leaf, we get a binary string. Thus the mapping is defined. In order to recover the symbols that make up a string from its representation after encoding, an inverse mapping must be possible. It is important that this mapping is unambiguous. We can show that all possible strings formed by concatenating any number of path labels in a Huffman tree are indeed unambiguous, due to the fact that it is a complete binary tree. That is, given a string composed of Huffman codes, there is exactly one possible way to

decompose it into the individual codes. Ambiguity occurs if there is any path to some symbol whose label is a prefix of the label of a path to some other symbol. In the Huffman tree, every symbol is a leaf. Thus it is impossible for the label of a path to a leaf to be a prefix of any other path label, and so the mapping defined by Huffman coding has an inverse and decoding is possible.

Example

For a simple example, we will take a short phrase and derive our probabilities from a frequency count of letters within that phrase. The resulting encoding should be good for compressing this phrase, but of course will be inappropriate for other phrases with a different letter distribution. We will use the phrase “math for the people by the people”. The frequency count of characters in this phrase are as follows (let ␣ denote the space character).

Letter   Count
␣        6
e        6
p        4
h        3
o        3
t        3
l        2
a        1
b        1
f        1
m        1
r        1
y        1
Total    33

We will simply let the frequency counts be the weights. If we pair each symbol with its weight, and pass this list of weights to Huffman’s algorithm, we will get something like the following tree (edge labels have been added).

From this tree we obtain the following mapping.


Letter   Count   Huffman code   Weight
␣        6       111            18
e        6       01             12
p        4       101            12
h        3       1100           12
o        3       1101           12
t        3       001            9
l        2       0001           8
a        1       00000          5
b        1       00001          5
f        1       10000          5
m        1       10001          5
r        1       10010          5
y        1       10011          5
Total    33      –              113

If we were to use a fixed-sized encoding, our original string would have to be 132 bits in length. This is because there are 13 symbols, requiring 4 bits of representation, and the length of our string is 33. The weighted path length of this Huffman tree is 113. Since these weights came directly from the frequency count of our string, the number of bits required to represent our string using this encoding is the same as the weighted path length, 113. Thus the Huffman encoded string is about 86% of the length of the fixed-sized encoding. Arithmetic encoding can in most cases obtain even greater compression, although it is not quite as simple to implement. Version: 4 Owner: Logan Author(s): Logan
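The tree construction and code assignment above can be sketched with a binary heap. This is an illustrative sketch of ours: tie-breaking may produce individual codes different from the table above, but the total encoded length is always the minimum weighted path length, 113.

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a Huffman tree with a heap and return {symbol: bitstring}."""
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)                      # tie-breaker keeps tuples comparable
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tick, (left, right)))
        tick += 1
    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):       # internal node: 0 left, 1 right
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:
            codes[node] = path or "0"
        return codes
    return walk(heap[0][2], "")

phrase = "math for the people by the people"
codes = huffman_codes(Counter(phrase))
encoded_bits = sum(len(codes[c]) for c in phrase)
assert encoded_bits == 113                # the weighted path length above
# prefix-free: no code is a prefix of another, so decoding is unambiguous
cs = list(codes.values())
assert not any(a != b and b.startswith(a) for a in cs for b in cs)
```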

651.2

Huffman’s algorithm

Huffman’s algorithm is a method for building an extended binary tree with minimum weighted path length from a set of given weights. Initially construct a forest of singleton trees, one associated with each weight. If there are at least two trees, choose the two trees with the least weight associated with their roots and replace them with a new tree, constructed by creating a root node whose weight is the sum of the weights of the roots of the two trees removed, and setting the two trees just removed as this new node’s children. This process is repeated until the forest consists of one tree.


Pseudocode

Algorithm Huffman(W, n)
Input: A list W of n (positive) weights
Output: An extended binary tree T with weights taken from W that gives the minimum weighted path length

begin
    Create list F from singleton trees formed from elements of W
    while F has more than one element do
        Find T1, T2 in F that have minimum values associated with their roots
        Remove T1, T2 from F
        Construct new tree T by creating a new node and setting T1 and T2 as its children
        Let the sum of the values associated with the roots of T1 and T2 be associated with the root of T
        Add T to F
    end
    Huffman ← tree stored in F
end

Example

Let us work through an example for the set of weights {1, 2, 3, 3, 4}. Initially our forest is

During the first step, the two trees with weights 1 and 2 are merged, to create a new tree with a root of weight 3.

We now have three trees with weights of 3 at their roots. It does not matter which two we choose to merge. Let us choose the tree we just created and one of the singleton nodes of weight 3.

Now our two minimum trees are the two singleton nodes of weights 3 and 4. We will combine these to form a new tree of weight 7.

Finally we merge our last two remaining trees.


The result is an extended binary tree of minimum path length 29.

Analysis

Each iteration of Huffman’s algorithm reduces the size of the problem by 1, and so there are exactly n − 1 iterations. The ith iteration consists of locating the two minimum values in a list of length n − i + 1. This is a linear operation, and so Huffman’s algorithm clearly has a time complexity of O(n²). However, it would be faster to sort the weights initially, and then maintain two lists. The first list consists of weights that have not yet been combined, and the second list consists of trees that have been formed by combining weights. This initial ordering is obtained at a cost of O(n log n). Obtaining the minimum two trees at each step then consists of two comparisons (compare the heads of the two lists, and then compare the larger to the item after the smaller). The ordering of the second list can be maintained cheaply by using a binary search to insert new elements. Since at step i there are i − 1 elements in the second list, O(log i) comparisons are needed for insertion. Over the entire duration of the algorithm the cost of keeping this list sorted is O(n log n). Therefore the overall time complexity of Huffman’s algorithm is O(n log n). In terms of space complexity, the algorithm constructs a complete binary tree with exactly n leaves. Therefore the output can only have at most 2n − 1 nodes. Thus Huffman’s algorithm requires linear space. Version: 2 Owner: Logan Author(s): Logan
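Since the weighted path length equals the sum of the weights of all internal nodes created by the merges, the algorithm can be sketched with a binary heap, which gives the O(n log n) behaviour discussed above. The sketch and its name are ours.

```python
import heapq

def min_weighted_path_length(weights):
    """Run Huffman's algorithm on a list of weights with a binary heap,
    tracking only the weighted path length (the sum of all merged weights)."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        w = heapq.heappop(heap) + heapq.heappop(heap)  # two minimum roots
        total += w                                     # new internal node
        heapq.heappush(heap, w)
    return total

assert min_weighted_path_length([1, 2, 3, 3, 4]) == 29   # the example above
```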

651.3

arithmetic encoding

Arithmetic coding is a technique for achieving near-optimal entropy encoding. An arithmetic encoder takes a string of symbols as input and produces a rational number in the interval [0, 1) as output. As each symbol is processed, the encoder will restrict the output to a smaller interval. Let N be the number of distinct symbols in the input; let x1, x2, . . . , xN represent the symbols, and let P1, P2, . . . , PN represent the probability of each symbol appearing. At each step in the process, the output is restricted to the current interval [y, y + R). Partition this interval into N disjoint subintervals:

I1 = [y, y + P1 R)
I2 = [y + P1 R, y + P1 R + P2 R)
...
IN = [y + (P1 + · · · + PN−1) R, y + R)

Therefore the size of Ii is Pi R. If the next symbol is xi, then restrict the output to the new interval Ii. Note that at each stage, all the possible intervals are pairwise disjoint. Therefore a specific sequence of symbols produces exactly one unique output range, and the process can be reversed. Since arithmetic encoders are typically implemented on binary computers, the actual output of the encoder is generally the shortest sequence of bits representing the fractional part of a rational number in the final interval. Suppose our entire input string contains M symbols: then xi appears exactly Pi M times in the input. Therefore, the size of the final interval will be

Rf = ∏_{i=1}^{N} Pi^{Pi M}

The number of bits necessary to write a binary fraction in this range is

−log2 Rf = −log2 ∏_{i=1}^{N} Pi^{Pi M}
         = ∑_{i=1}^{N} −log2 Pi^{Pi M}
         = ∑_{i=1}^{N} −Pi M log2 Pi

By Shannon’s theorem, this is the total entropy of the original message. Therefore arithmetic encoding is near-optimal entropy encoding. Version: 1 Owner: vampyr Author(s): vampyr
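The interval-narrowing process above can be sketched with exact rational arithmetic. The three-symbol distribution is an illustrative choice of ours, not from the entry.

```python
from fractions import Fraction as F

# probs and the symbol order are illustrative assumptions.
probs = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)}
cum, acc = {}, F(0)
for s, p in probs.items():           # cumulative probabilities: left edges
    cum[s], acc = acc, acc + p

def encode(msg):
    y, r = F(0), F(1)                # current interval [y, y + R)
    for s in msg:
        y += cum[s] * r              # restrict to the subinterval I_i
        r *= probs[s]
    return y, r                      # any number in [y, y + r) identifies msg

def decode(x, m):
    out = []
    for _ in range(m):
        s = max((s for s in probs if cum[s] <= x), key=lambda s: cum[s])
        out.append(s)
        x = (x - cum[s]) / probs[s]  # rescale and continue
    return "".join(out)

y, r = encode("abca")
assert decode(y, 4) == "abca"        # round trip from the interval's left edge
```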

651.4

binary Gray code

An n-bit binary Gray code is a non-repeating sequence of the integers from 0 to 2n − 1 inclusive such that the binary representation of each number in the sequence differs by exactly

one bit from the binary representation of the previous number: that is, the Hamming distance between consecutive elements is 1. In addition, the last number in the sequence must differ by exactly one bit from the first number in the sequence. For example, one 3-bit Gray code is:

000₂ 010₂ 011₂ 001₂ 101₂ 111₂ 110₂ 100₂

There is a one-to-one correspondence between all possible n-bit Gray codes and all possible Hamiltonian cycles on an n-dimensional hypercube. (To see why this is so, imagine assigning a binary number to each vertex of a hypercube where an edge joins each pair of vertices that differ by exactly one bit.) Version: 4 Owner: vampyr Author(s): vampyr
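A sketch of the standard “reflected” binary Gray code, which is a different (but equally valid) sequence from the example above: the k-th element is k XOR (k >> 1).

```python
# Reflected binary Gray code: the k-th element is k ^ (k >> 1).
def gray(k):
    return k ^ (k >> 1)

n = 3
seq = [gray(k) for k in range(2 ** n)]
# consecutive elements (cyclically, including last-to-first) differ in one bit
for a, b in zip(seq, seq[1:] + seq[:1]):
    assert bin(a ^ b).count("1") == 1
assert sorted(seq) == list(range(2 ** n))   # a permutation of 0 .. 2^n - 1
```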

651.5

entropy encoding

An entropy encoding is a coding scheme that involves assigning codes to symbols so as to match code lengths with the probabilities of the symbols. Typically, entropy encoders are used to compress data by replacing symbols represented by equal-length codes with symbols represented by codes proportional in length to the negative logarithm of the probability. Therefore, the most common symbols use the shortest codes. According to Shannon’s theorem, the optimal code length for a symbol is −log_b P, where b is the number of symbols used to make output codes and P is the probability of the input symbol. Two of the most common entropy encoding techniques are Huffman encoding and arithmetic encoding. Version: 1 Owner: vampyr Author(s): vampyr
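The optimal-length formula can be checked directly for the binary case b = 2. A minimal sketch; the function name is ours.

```python
import math

# Shannon's optimal code length -log_b(P), specialized to b = 2 (binary codes).
def optimal_length(p):
    return -math.log2(p)

assert optimal_length(0.5) == 1.0     # a 1-in-2 symbol deserves 1 bit
assert optimal_length(0.25) == 2.0    # a 1-in-4 symbol deserves 2 bits
```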

Chapter 652 68Q01 – General 652.1

currying

Currying is the technique of emulating multiple-parametered functions with higher-order functions. The notion is that a function of n arguments can be thought of as a function of 1 argument that maps to a function of n − 1 arguments. A curried function is a function represented by currying, e.g. f : R → (R → R). For conciseness, the mapping operator → is usually considered right-associative, so that f : R → R → R is equivalent. In contrast, an uncurried function is usually specified as a mapping from a cartesian product, such as f : R × R → R. The term currying is derived from the name of Haskell Curry, a 20th century logician. However, Curry was not the first person to discover this notion, as it was first introduced by Gottlob Frege in 1893 and expanded by Moses Schönfinkel in the 1920s. Hence the notion is sometimes referred to as schönfinkeling. Version: 2 Owner: Logan Author(s): Logan
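The curried/uncurried correspondence can be sketched in Python; the function names are illustrative choices of ours.

```python
# Currying: a two-argument function as a chain of one-argument functions.
def add_uncurried(x, y):        # f : R × R → R
    return x + y

def add_curried(x):             # f : R → (R → R)
    return lambda y: x + y

add2 = add_curried(2)           # partial application falls out for free
assert add_curried(2)(3) == add_uncurried(2, 3) == 5
assert add2(40) == 42

# functools.partial gives the same effect for existing uncurried functions
from functools import partial
assert partial(add_uncurried, 2)(3) == 5
```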


652.2

higher-order function

Any function that maps a function to anything or maps anything to a function is a higherorder function. In programming language terms, a higher-order function is any function that takes one or more functions as arguments and/or returns a function. For example, a predicate which makes some statement about a function would be a higherorder function. Version: 1 Owner: Logan Author(s): Logan
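Both directions of the definition can be illustrated in a short sketch; the example functions and their names are ours.

```python
# A higher-order function takes a function as an argument or returns one.
def is_monotone_on(f, points):
    """A predicate about a function: higher-order because f is an argument."""
    values = [f(x) for x in points]
    return all(a <= b for a, b in zip(values, values[1:]))

def compose(f, g):
    """Higher-order in both directions: consumes two functions, returns one."""
    return lambda x: f(g(x))

assert is_monotone_on(lambda x: x * x, [0, 1, 2, 3])
assert not is_monotone_on(lambda x: -x, [0, 1])
assert compose(lambda x: x + 1, lambda x: 2 * x)(5) == 11
```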


Chapter 653 68Q05 – Models of computation (Turing machines, etc.) 653.1

Cook reduction

Given two (search or decision) problems π1 and π2 and a complexity class C, a C Cook reduction of π1 to π2 is a Turing machine appropriate for C which solves π1 using π2 as an oracle (the Cook reduction itself is not in C, since it is a Turing machine, not a problem, but it should be in the class of bounded Turing machines corresponding to C). The most common type are P Cook reductions, which are often just called Cook reductions. If a Cook reduction exists then π2 is in some sense “at least as hard” as π1, since a machine which solves π2 could be used to construct one which solves π1. When C is closed under appropriate operations, if π2 ∈ C and π1 is C-Cook reducible to π2 then π1 ∈ C. A C Karp reduction is a special kind of C Cook reduction for decision problems L1 and L2. It is a function g ∈ C such that: x ∈ L1 ↔ g(x) ∈ L2. Again, P Karp reductions are just called Karp reductions. A Karp reduction provides a Cook reduction, since a Turing machine could decide L1 by calculating g(x) on any input and determining whether g(x) ∈ L2. Note that it is a stronger condition than a Cook reduction. For instance, this machine requires only one use of the oracle. Version: 3 Owner: Henry Author(s): Henry


653.2

Levin reduction

If R1 and R2 are search problems and C is a complexity class then a C Levin reduction of R1 to R2 consists of three functions f, g, h ∈ C which satisfy:

• f is a C Karp reduction of L(R1) to L(R2)
• If R1(x, y) then R2(f(x), g(x, y))
• If R2(f(x), z) then R1(x, h(x, z))

Note that a C Cook reduction can be constructed by calculating f(x), using the oracle to find z, and then calculating h(x, z). P Levin reductions are just called Levin reductions. Version: 2 Owner: Henry Author(s): Henry

653.3

Turing computable

A function is Turing computable if the function’s value can be computed with a Turing machine. For example, all primitive recursive functions are Turing computable. Version: 3 Owner: akrowne Author(s): akrowne

653.4

computable number

A real number is called computable if its digit sequence can be produced by some algorithm (or Turing machine). The algorithm takes a natural number n as input and produces the n-th digit of the real number’s decimal expansion as output. A complex number is called computable if its real and imaginary parts are computable. The computable numbers form an algebraically closed field, and arguably this field contains all the numbers we ever need in practice. It contains all algebraic numbers as well as many known transcendental constants. There are however many real numbers which are not computable: the set of all computable numbers is countable (because the set of algorithms is) while the set of real numbers is uncountable. Every computable number is definable, but not vice versa. An example of a definable, non-computable real is Chaitin’s constant, Ω.

Computable numbers were introduced by Alan Turing in 1936. Version: 4 Owner: AxelBoldt Author(s): AxelBoldt, yark
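For a concrete instance, √2 is computable in exactly the sense defined above: a sketch (ours) of an algorithm that, given n, returns the n-th decimal digit using only exact integer arithmetic.

```python
import math

# sqrt(2) is computable: given n, produce the n-th decimal digit.
def sqrt2_digit(n):
    """n = 0 gives the units digit; n >= 1 gives the n-th digit after the
    decimal point, via the integer square root of 2 * 10^(2n)."""
    return math.isqrt(2 * 10 ** (2 * n)) % 10

digits = [sqrt2_digit(n) for n in range(6)]
assert digits == [1, 4, 1, 4, 2, 1]     # sqrt(2) = 1.41421...
```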

653.5

deterministic finite automaton

A deterministic finite automaton (or DFA) can be formally defined as a 5-tuple (Q, Σ, T, q0, F), where Q is a finite set of states, Σ is the alphabet (defining what set of input strings the automaton operates on), T : Q × Σ → Q is the transition function, q0 ∈ Q is the starting state, and F ⊂ Q is a set of final (or accepting) states. Operation of the DFA begins at q0, and movement from state to state is governed by the transition function T. T must be defined for every possible state in Q and every possible symbol in Σ. A DFA can be represented visually as a directed graph. Circular vertices denote states, and the set of directed edges, labelled by symbols in Σ, denotes T. The transition function takes the current state and the first symbol of the input string as its input, and after the transition this first symbol is removed. If the input string is λ (the empty string), then the operation of the DFA is halted. If the final state when the DFA halts is in F, then the DFA can be said to have accepted the input string it was originally given. The starting state q0 is usually denoted by an arrow pointing to it that points from no other vertex. States in F are usually denoted by double circles. DFAs represent regular languages, and can be used to test whether any string in Σ∗ is in the language it represents. Consider the following regular language over the alphabet Σ := {a, b} (represented by the regular expression aa*b):

<S> ::= a <A>
<A> ::= b | a <A>

This language can be represented by the following DFA, given here as a transition table (→ marks the initial state, * the accepting state; state 3 is a dead state):

State   a   b
→ 0     1   3
  1     1   2
* 2     3   3
  3     3   3

The vertex 0 is the initial state q0, and the vertex 2 is the only state in F. Note that for every vertex there is an edge leading away from it with a label for each symbol in Σ. This is a requirement of DFAs, which guarantees that operation is well-defined for any finite string.

If given the string aaab as input, operation of the DFA above is as follows. The first a is removed from the input string, so the edge from 0 to 1 is followed. The resulting input string is aab. For each of the next two as, the edge is followed from 1 to itself. Finally, b is read from the input string and the edge from 1 to 2 is followed. Since the input string is now λ, the operation of the DFA halts. Since it has halted in the accepting state 2, the string aaab is accepted as a sentence in the regular language implemented by this DFA. Now let us trace operation on the string aaaba. Execution is as above, until state 2 is reached with a remaining in the input string. The edge from 2 to 3 is then followed and the operation of the DFA halts. Since 3 is not an accepting state for this DFA, aaaba is not accepted. Although the operation of a DFA is much easier to compute than that of a non-deterministic automaton, it is non-trivial to directly generate a DFA from a regular grammar. It is much easier to generate a non-deterministic finite automaton from the regular grammar, and then transform the non-deterministic finite automaton into a DFA. Version: 2 Owner: Logan Author(s): Logan
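The traced run above can be reproduced directly from the transition function. A minimal sketch (ours); the table encodes the aa*b DFA described in the text, with the same state numbering.

```python
# T maps (state, symbol) -> state; 2 is the accepting state, 3 the dead state.
T = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 3,
    (3, "a"): 3, (3, "b"): 3,
}

def dfa_accepts(s, q0=0, F={2}):
    q = q0
    for c in s:                 # consume one input symbol per transition
        q = T[(q, c)]
    return q in F               # halted: accept iff the final state is in F

assert dfa_accepts("aaab")      # the traced example
assert not dfa_accepts("aaaba")
assert not dfa_accepts("b") and not dfa_accepts("")
```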

653.6

non-deterministic Turing machine

The definition of a non-deterministic Turing machine is the same as the definition of a deterministic Turing machine except that δ is a relation, not a function. Hence, for any particular state and symbol, there may be multiple possible legal moves. If S ∈ Γ+ we say T accepts S if, when S is the input, there is some finite sequence of legal moves such that δ is undefined on the state and symbol pair which results from the last move in the sequence and such that the final state is an element of F. If T does not accept S then it rejects S. An alternative definition of a non-deterministic Turing machine is as a deterministic Turing machine with an extra one-way, read-only tape, the guess tape. Then we say T accepts S if there is any string c(S) such that, when c(S) is placed on the guess tape, T accepts S, and otherwise that it rejects S. We call c(S) a certificate for S. In some cases the guess tape is allowed to be two-way; this generates different time and space complexity classes than the one-way case (the one-way case is equivalent to the original definition). Version: 3 Owner: Henry Author(s): Henry

653.7

non-deterministic finite automaton

A non-deterministic finite automaton (or NDFA) can be formally defined as a 5-tuple (Q, Σ, T, q0, F), where Q is a finite set of states, Σ is the alphabet (defining what set of input strings the automaton operates on), T : Q × (Σ ∪ {λ}) → P(Q) is the transition function, q0 ∈ Q is the starting state, and F ⊂ Q is a set of final (or accepting) states. Note how this definition differs from that of a deterministic finite automaton (DFA) only by the definition of the transition function T. Operation of the NDFA begins at q0, and movement from state to state is governed by the transition function T. The transition function takes the first symbol of the (remaining) input string and the current state as its input, and after the transition this first symbol is removed only if the transition is defined for a symbol in Σ instead of λ. Conceptually, all possible transitions from a current state are followed simultaneously (hence the non-determinism). Once every possible transition has been executed, the NDFA is halted. If any of the states reached upon halting are in F for some input string, and the entire input string is consumed to reach that state, then the NDFA accepts that string. An NDFA can be represented visually as a directed graph. Circular vertices denote states, and the set of directed edges, labelled by symbols in Σ ∪ {λ}, denotes T. The starting state q0 is usually denoted by an arrow pointing to it that points from no other vertex. States in F are usually denoted by double circles. NDFAs represent regular languages, and can be used to test whether any string in Σ∗ is in the language it represents. Consider the following regular language over the alphabet Σ := {a, b} (represented by the regular expression aa*b):

<S> ::= a <A>
<A> ::= λ <B> | a <A>
<B> ::= b

This language can be represented by the following NDFA, given here as a list of transitions (state 0 is initial, state 3 is the accepting state):

0 —a→ 1
1 —a→ 1
1 —λ→ 2
2 —b→ 3
The vertex 0 is the initial state q0, and the vertex 3 is the only state in F. If given the string aaab as input, operation of the NDFA is as follows. Let X ⊆ (Q × Σ∗) indicate the set of “current” states and the remaining input associated with them. Initially X := {(0, aaab)}. For state 0 with a leading a as its input, the only possible transition to follow is to 1 (which consumes the a). This transforms X to {(1, aab)}. Now there are two possible transitions to follow for state 1 with a leading a. One transition is back to 1, consuming the a, while the other is to 2, leaving the a. Thus X is then {(1, ab), (2, aab)}. Again, the same transitions are possible for state 1, while no transition at all is available for state 2 with a leading a, so X is then {(1, b), (2, aab), (2, ab)}. At this point, there is still no possible transition from 2, and the only possible transition from 1 is to 2 (leaving the input string as it is). This then gives {(2, aab), (2, ab), (2, b)}. Only state 2 with remaining

input of b has a transition leading from it, giving {(2, aab), (2, ab), (3, λ)}. At this point no further transitions are possible, and so the NDFA is halted. Since 3 is in F , and the input string can be reduced to λ when it reached 3, the NDFA accepts aaab. If the input string were instead aaaba, processing would occur as before until {(2, aaba), (2, aba), (3, a)} is reached and the NDFA halts. Although 3 is in F , it is not possible to reduce the input string completely before reaching 3. Therefore aaaba is not accepted by this NDFA. Any regular grammar can be represented by an NDFA. Any string accepted by the NDFA is in the language represented by that NDFA. Furthermore, it is a straight-forward process to generate an NDFA for any regular grammar. Actual operation of an NDFA is generally intractable, but there is a simple process to transform any NDFA into a DFA, the operation of which is very tractable. Regular expression matchers tend to operate in this manner. Version: 1 Owner: Logan Author(s): Logan
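The set-tracking procedure traced above is how NDFAs are simulated in practice; a standard equivalent formulation keeps only the set of reachable states and closes it under λ-transitions at each step. A sketch (ours), encoding the aa*b NDFA from the text, with "" standing for λ:

```python
# T maps (state, symbol) to a set of successor states.
T = {(0, "a"): {1}, (1, "a"): {1}, (1, ""): {2}, (2, "b"): {3}}

def closure(states):
    """All states reachable from `states` by λ-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in T.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def ndfa_accepts(s, q0=0, F={3}):
    current = closure({q0})
    for c in s:                 # follow every possibility simultaneously
        current = closure({r for q in current for r in T.get((q, c), ())})
    return bool(current & F)    # accept iff some branch reached a final state

assert ndfa_accepts("aaab")     # the traced example
assert not ndfa_accepts("aaaba") and not ndfa_accepts("a")
```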

653.8

non-deterministic pushdown automaton

A non-deterministic pushdown automaton (or PDA) is a variation on the idea of a non-deterministic finite automaton (NDFA). Unlike an NDFA, a PDA is associated with a stack (hence the name pushdown). The transition function must also take into account the state of the stack. Formally defined, a non-deterministic pushdown automaton is a 6-tuple (Q, Σ, Γ, T, q0, F). Q, Σ, q0, and F are the same as for an NDFA. Γ is the stack alphabet, specifying the set of symbols that can be pushed onto the stack. The transition function is T : Q × (Σ ∪ {λ}) × Γ → P(Q). Like an NDFA, a PDA can be presented visually as a directed graph. Instead of simply labelling edges representing transitions with the leading symbol, two additional symbols are added, representing what symbol must be matched and removed from the top of the stack (or λ if none) and what symbol should be pushed onto the stack (or λ if none). For instance, the notation a A/B for an edge label indicates that a must be the first symbol in the remaining input string and A must be the symbol at the top of the stack for this transition to occur, and after the transition, A is replaced by B at the top of the stack. If the label had been a λ/B, then the symbol at the top of the stack would not matter (the stack could even be empty), and B would be pushed on top of the stack during the transition. If the label had been a A/λ, A would be popped from the stack and nothing would replace it during the transition. When a PDA halts, it is considered to have accepted the input string if and only if there is some final state where the entire input string has been consumed and the stack is empty. For example, consider the alphabet Σ := {(, )}. Let us define a context-free language L that consists of strings where the parentheses are fully balanced. If we define Γ := {A}, then a PDA for accepting such strings is:

a PDA with a single state 0, which is both the initial state and a final state, and two self-loop transitions at 0:

( λ/A     (read “(”, push A onto the stack)
) A/λ     (read “)”, pop A from the stack)

Another simple example is a PDA to accept binary palindromes (that is, {w ∈ {0, 1}∗ | w = w^R}). It has an initial state 0 and a final state 1. At state 0 there are self-loops 0 λ/A and 1 λ/B, which push a marker for each symbol read. Three transitions lead from 0 to 1: λ λ/λ, 0 λ/λ and 1 λ/λ, which guess the middle of the string, consuming at most one symbol. At state 1 there are self-loops 0 A/λ and 1 B/λ, which match each remaining symbol against the top of the stack and pop it.

It can be shown that the language of strings accepted by any PDA is a context-free language, and that any context-free language can be represented by a PDA. Version: 4 Owner: Henry Author(s): Henry, yark, Logan
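The balanced-parentheses PDA can be simulated directly: since it has a single state, only the stack matters, and acceptance means the input is consumed with an empty stack. A minimal sketch of ours:

```python
# The single-state balanced-parentheses PDA: "(" pushes A, ")" pops A.
def pda_accepts(s):
    stack = []
    for c in s:
        if c == "(":
            stack.append("A")           # transition ( λ/A
        elif c == ")":
            if not stack:
                return False            # no A to pop: the machine is stuck
            stack.pop()                 # transition ) A/λ
    return not stack                    # accept: input consumed, stack empty

assert pda_accepts("") and pda_accepts("()") and pda_accepts("(()())")
assert not pda_accepts("(()") and not pda_accepts(")(")
```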

653.9

oracle

An oracle is a way to allow Turing machines access to information they cannot necessarily calculate. It makes it possible to consider whether solving one problem would make it possible to solve another problem, and therefore to ask whether one problem is harder than another. If T is a Turing machine and π is a problem (either a search problem or a decision problem) with appropriate alphabet then T^π denotes the machine which runs T using π as an oracle. A Turing machine with an oracle has three special states ?, y and n. Whenever T^π enters state ? it queries the oracle about x, the series of non-blank symbols to the right of the tape head. If π is a decision problem then the machine immediately changes to state y if x ∈ π and n otherwise. If π is a search problem then it switches to state n if there is no z such that π(x, z) and to state y if there is such a z. In the latter case the section of the tape containing x is changed to contain z. In either case, the oracle allows the machine to behave as if it could solve π, since each time it accesses the oracle, the result is the same as a machine which solves π. Alternate but equivalent definitions use a special tape to contain the query and response. Clearly if L ≠ L′ then L(T^L) may not be equal to L(T^L′).
By definition, if T is a Turing machine appropriate for a complexity class C then T^L is a C Cook reduction of L(T^L) to L.

If C is a complexity class then T^C is a complexity class with T^C = {L | L = L(T^A) ∧ A ∈ C}. If D is another complexity class then D^C = {L | L ∈ T^C ∧ L(T^∅) ∈ D}. Version: 3 Owner: Henry Author(s): Henry

653.10

self-reducible

A search problem R is C self-reducible if there is a C Cook reduction of R to L(R). That is, if the decision problem for L(R) is in C then so is the search problem for R. If R is polynomially self-reducible then it is called self-reducible. Note that L(R) is trivially Cook reducible to R. Version: 2 Owner: Henry Author(s): Henry

653.11

universal Turing machine

A universal Turing machine U is a Turing machine with a single binary one-way read-only input tape, on which it expects to find the encoding of an arbitrary Turing machine M. The set of all Turing machine encodings must be prefix-free, so that no special end-marker or 'blank' is needed to recognize a code's end. Having transferred the description of M onto its worktape, U then proceeds to simulate the behaviour of M on the remaining contents of the input tape. If M halts, then U cleans up its worktape, leaving it with just the output of M, and halts too. If we denote by M() the partial function computed by machine M, and by ⟨M⟩ the encoding of machine M as a binary string, then we have U(⟨M⟩x) = M(x).

There are two kinds of universal Turing machine, depending on whether the input tape alphabet of the simulated machine is {0, 1, #} or just {0, 1}. The first kind is a plain universal Turing machine, while the second is a prefix universal Turing machine, which has the nice property that the set of inputs on which it halts is prefix-free. The letter U is commonly used to denote a fixed universal machine, whose type is either mentioned explicitly or assumed clear from context. Version: 2 Owner: tromp Author(s): tromp


Chapter 654 68Q10 – Modes of computation (nondeterministic, parallel, interactive, probabilistic, etc.) 654.1

deterministic Turing machine

The formal definition of a deterministic Turing machine is a tuple:

T = (Q, Σ, Γ, δ, q0, B, F)

where Q is a finite set of states, Γ is the finite set of tape symbols, B ∈ Γ is the blank symbol, Σ ⊆ Γ is the set of input symbols, δ : Q × Γ → Q × Γ × {L, R} is the move function, q0 ∈ Q is the start state and F ⊆ Q is the set of final states.

A Turing machine is conceived to be a box with a tape and a tape head. The tape consists of an infinite number of cells stretching in both directions, with the tape head always located over exactly one of these cells. Each cell has a symbol from Γ written on it.

At the beginning of its computation, T is in state q0 and a finite string S (the input string) over the alphabet Σ is written on the tape, with the tape head located over the first letter of S. Each cell before the beginning or after the end of S is blank. For each move, if the current state is q and the value of the cell under the tape head is A, then suppose δ(q, A) = (q′, A′, D). The value of the cell under the tape head is changed to A′, the state changes to q′, and the tape head moves to the left if D = L and to the right if D = R. If δ(q, A) is undefined then the machine halts.

For any S ∈ Σ+, we say T accepts S if T halts in a state q ∈ F when S is the input string, that T rejects S if T halts in a state q ∉ F, and otherwise that T does not halt on S.

This definition can easily be extended to multiple tapes with various rules. Here δ should be a function from Q × Γ^n to Q × Γ^m × {L, R}^n, where n is the number of tapes, m is the number of input tapes, and L is interpreted to mean not moving (rather than moving to the left) for one-way tapes. Version: 4 Owner: Henry Author(s): Henry
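A single-tape deterministic machine as defined above is straightforward to simulate. The Python sketch below is illustrative (the `max_steps` cutoff is an addition of the sketch, since the machine need not halt): the tape is stored sparsely, and the simulation stops when δ is undefined for the current (state, symbol) pair.

```python
def run_tm(delta, tape_input, q0='q0', blank='B', final={'qf'}, max_steps=10000):
    """Simulate a deterministic single-tape Turing machine.

    delta maps (state, symbol) -> (new_state, new_symbol, 'L' or 'R');
    the machine halts when delta is undefined for the current pair.
    Returns ('accept' or 'reject', final non-blank tape contents)."""
    tape = dict(enumerate(tape_input))   # sparse tape; blank everywhere else
    state, head, steps = q0, 0, 0
    while (state, tape.get(head, blank)) in delta and steps < max_steps:
        state, sym, move = delta[(state, tape.get(head, blank))]
        tape[head] = sym
        head += 1 if move == 'R' else -1
        steps += 1
    verdict = 'accept' if state in final else 'reject'
    cells = ''.join(tape[i] for i in sorted(tape))
    return verdict, cells.strip(blank)
```

For example, a machine that overwrites every symbol with 1 until it hits a blank and then enters qf accepts every binary string, leaving a string of 1s on the tape.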

654.2

random Turing machine

A random Turing machine is defined the same way as a non-deterministic Turing machine, but with different rules governing when it accepts or rejects. Whenever there are multiple legal moves, instead of always guessing right, a random machine selects one of the possible moves at random. As with non-deterministic machines, this can also be viewed as a deterministic machine with an extra input, which corresponds to the random selections.

There are several different ways of defining what it means for a random Turing machine to accept or reject an input. Let Prob_T(x) be the probability that T halts in an accepting state when the input is x.

A positive one-sided error machine is said to accept x if Prob_T(x) > 1/2 and to reject x if Prob_T(x) = 0. A negative one-sided error machine accepts x if Prob_T(x) = 1 and rejects x if Prob_T(x) ≤ 1/2. So a single run of a positive one-sided error machine never misleadingly accepts but may misleadingly reject, while a single run of a negative one-sided error machine never misleadingly rejects.

The definition of a positive one-sided error machine is stricter than the definition of a non-deterministic machine, since a non-deterministic machine rejects when there is no certificate and accepts when there is at least one, while a positive one-sided error machine requires that half of all possible guess inputs be certificates.

A two-sided error machine accepts x if Prob_T(x) > 2/3 and rejects x if Prob_T(x) ≤ 1/3. The constants in any of the definitions above can be adjusted, although this will affect the time and space complexity classes.

A minimal error machine accepts x if Prob_T(x) > 1/2 and rejects x if Prob_T(x) < 1/2.

One additional variant defines, in addition to accepting states, rejecting states. Such a machine is called zero error if on at least half of all possible guess inputs it halts in either an accepting or a rejecting state. It accepts x if there is any sequence of guesses which causes it to end in an accepting state, and rejects if there is any sequence of guesses which causes it to end in a rejecting state. In other words, such a machine is never wrong when it provides an answer, but does not produce a decisive answer on all input. The machine can emulate a positive (resp. negative) one-sided error machine by accepting (resp. rejecting) when the result is indecisive.

It is a testament to the robustness of the definition of the Turing machine (and the Church-Turing thesis) that each of these definitions computes the same functions as a standard Turing machine. The point of defining all these types of machines is that some are more efficient than others, and therefore they define different time and space complexity classes. Version: 1 Owner: Henry Author(s): Henry
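The remark that the constants can be adjusted rests on error amplification: running a two-sided error machine several times independently and taking the majority vote drives the error down exponentially (by a Chernoff bound). The following Python sketch is illustrative only; `amplify` and `noisy_is_even` are hypothetical names, not part of the text.

```python
import random
from collections import Counter

def amplify(decider, x, runs=51):
    """Majority vote over independent runs of a randomized decider.
    If each run is correct with probability >= 2/3, the majority answer
    is wrong with probability exponentially small in `runs`."""
    votes = Counter(bool(decider(x)) for _ in range(runs))
    return votes[True] > votes[False]

# toy two-sided error decider: answers "is x even?" but errs with probability 1/4
def noisy_is_even(x, rng):
    truth = (x % 2 == 0)
    return truth if rng.random() < 0.75 else not truth
```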


Chapter 655 68Q15 – Complexity classes (hierarchies, relations among complexity classes, etc.) 655.1

NP-complete

A problem π ∈ NP is NP-complete if for any π′ ∈ NP there is a Cook reduction of π′ to π. Hence if π ∈ P then every NP problem would be in P. A slightly stronger definition requires a Karp reduction (of the corresponding decision problems, as appropriate). A search problem R is NP-hard if for any R′ ∈ NP there is a Levin reduction of R′ to R. Version: 1 Owner: Henry Author(s): Henry

655.2

complexity class

If f(n) is any function and T is a Turing machine (of any kind) which halts on all inputs, we say that T is time bounded by f(n) if for any input x with length |x|, T halts after at most f(|x|) steps. For a decision problem L and a class K of Turing machines, we say that L ∈ KTIME(f(n)) if there is a Turing machine T ∈ K time bounded by f(n) which decides L. If R is a search problem then R ∈ KTIME(f(n)) if L(R) ∈ KTIME(f(n)).

The most common classes are all restricted to one read-only input tape and one output/work tape (and in some cases a one-way, read-only guess tape) and are defined as follows:

• D is the class of deterministic Turing machines (in this case KTIME is written DTIME).
• N is the class of non-deterministic Turing machines (and so KTIME is written NTIME).
• R is the class of positive one-sided error Turing machines, and coR the class of negative one-sided error machines.
• BP is the class of two-sided error machines.
• P is the class of minimal error machines.
• ZP is the class of zero error machines.

Although KTIME(f(n)) is a time complexity class for any f(n), in actual use time complexity classes are usually the union of KTIME(f(n)) for many f. If Φ is a class of functions then KTIME(Φ) = ∪_{f∈Φ} KTIME(f(n)). Most commonly this is used when Φ = O(f(n)). The most important time complexity classes are the polynomial classes:

KP = ∪_{i∈ℕ} KTIME(n^i)

When K = D this is called just P, the class of problems decidable in polynomial time. One of the major outstanding problems in mathematics is the question of whether P = NP.

We say a problem π ∈ KSPACE(f(n)) if there is a Turing machine T ∈ K which solves π, always halts, and never uses more than f(n) cells of its output/work tape. As above, if Φ is a class of functions then KSPACE(Φ) = ∪_{f∈Φ} KSPACE(f(n)). The most common space complexity classes are KL = KSPACE(O(log n)). When K = D this is just called L.

If C is any complexity class then π ∈ coC if π is a decision problem and its complement is in C, or π is a search problem and L(π) ∈ coC. Of course, this coincides with the definition of coR above. Clearly co(coC) = C.

Since a machine with a time complexity f(n) cannot possibly use more than f(n) cells, KTIME(f(n)) ⊆ KSPACE(f(n)). If K ⊆ K′ then KTIME(f(n)) ⊆ K′TIME(f(n)) and similarly for space. The following are all trivial, following from the fact that some classes of machines accept and reject under stricter circumstances than others:

D ⊆ ZP = R ∩ coR
R ∪ coR ⊆ BP ∩ N
BP ⊆ P

Version: 1 Owner: Henry Author(s): Henry

655.3

constructible

A function f : ℕ → ℕ is time constructible if there is a deterministic Turing machine T (with alphabet {0, 1, B}) such that when T receives as input a string of n ones, it halts after exactly f(n) steps. Similarly, f is space constructible if there is a similar Turing machine which halts after using exactly f(n) cells. Most 'natural' functions are both time and space constructible, including constant functions, polynomials, and exponentials, for example. Version: 4 Owner: Henry Author(s): Henry

655.4

counting complexity class

If NC is a complexity class associated with non-deterministic machines then #C = {#R | R ∈ NC} is the set of counting problems associated with each search problem in NC. In particular, #P is the class of counting problems associated with NP search problems. Version: 1 Owner: Henry Author(s): Henry

655.5

polynomial hierarchy

The polynomial hierarchy is a hierarchy of complexity classes generalizing the relationship between P and NP.

We let Σ^p_0 = Π^p_0 = ∆^p_0 = P, then ∆^p_{i+1} = P^{Σ^p_i} and Σ^p_{i+1} = NP^{Σ^p_i}. Define Π^p_i to be coΣ^p_i. For instance Σ^p_1 = NP^P = NP.

The complexity class PH = ∪_{i∈ℕ} Σ^p_i.

The polynomial hierarchy is closely related to the arithmetical hierarchy; indeed, an alternate definition is almost identical to the definition of the arithmetical hierarchy but with stricter rules on what quantifiers can be used.

When there is no risk of confusion with the arithmetical hierarchy, the superscript p can be dropped. Version: 3 Owner: Henry Author(s): Henry

655.6

polynomial hierarchy is a hierarchy

The polynomial hierarchy is a hierarchy. Specifically:

Σ^p_i ∪ Π^p_i ⊆ ∆^p_{i+1} ⊆ Σ^p_{i+1} ∩ Π^p_{i+1}

Proof

To see that Σ^p_i ∪ Π^p_i ⊆ ∆^p_{i+1} = P^{Σ^p_i}, observe that the machine which checks its input against its oracle and accepts or rejects when the oracle accepts or rejects (respectively) is easily in P, as is the machine which rejects or accepts when the oracle accepts or rejects (respectively). These easily emulate Σ^p_i and Π^p_i respectively.

Since P ⊆ NP, it is clear that ∆^p_{i+1} ⊆ Σ^p_{i+1}. Since P^C is closed under complementation for any complexity class C (the associated machines are deterministic and always halt, so the complementary machine just reverses which states are accepting), if L ∈ P^{Σ^p_i} ⊆ Σ^p_{i+1} then so is the complement of L, and therefore L ∈ Π^p_{i+1}.

Unlike the arithmetical hierarchy, the polynomial hierarchy is not known to be proper. Indeed, if P = NP then P = PH, so a proof that the hierarchy is proper would be quite significant. Version: 1 Owner: Henry Author(s): Henry

655.7

time complexity

Time Complexity

Time complexity refers to a function describing, based on the parameters of an algorithm, how much time it will take to execute. The exact value of this function is usually ignored in favour of its order, in the so-called big-O notation.

Example. Comparison-based sorting has time complexity no better than O(n log n), where n is the number of elements to be sorted. The exact expression for the time complexity of a particular sorting algorithm may be something like T(n) = cn log n, with c a constant, which is still of order O(n log n). Version: 4 Owner: akrowne Author(s): akrowne


Chapter 656 68Q25 – Analysis of algorithms and problem complexity 656.1

counting problem

If R is a search problem then c_R(x) = |{y | R(x, y)}| is the corresponding counting function and #R = {(x, y) | y ≤ c_R(x)} denotes the corresponding counting problem. Note that c_R is a search problem while #R is a decision problem, however c_R can be C Cook reduced to #R (for appropriate C) using a binary search (the reason #R is defined the way it is, rather than being the graph of c_R, is to make this binary search possible). Version: 3 Owner: Henry Author(s): Henry
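The binary search mentioned above can be made concrete. In the Python sketch below, `counting_oracle(x, y)` decides the counting problem #R (that is, whether y ≤ c_R(x)) and `upper_bound` is any a priori bound on c_R(x); all names here are illustrative, not from the text.

```python
def count_via_oracle(x, counting_oracle, upper_bound):
    """Recover c_R(x) with a binary search over queries "is y <= c_R(x)?".

    Each query is one call to the decision problem #R, so the whole
    computation is a Cook reduction of c_R to #R."""
    lo, hi = 0, upper_bound            # invariant: lo <= c_R(x) <= hi
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if counting_oracle(x, mid):    # mid <= c_R(x)?
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Only O(log(upper bound)) oracle calls are made, which is why the counting problem is defined via the threshold y ≤ c_R(x) rather than as the graph of c_R.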

656.2

decision problem

Let T be a Turing machine and let L ⊆ Γ+ be a language. We say T decides L if for any x ∈ L, T accepts x, and for any x ∉ L, T rejects x. We say T enumerates L if:

x ∈ L iff T accepts x

For some Turing machines (for instance non-deterministic machines) these definitions are equivalent, but for others they are not. For example, in order for a deterministic Turing machine T to decide L, it must be that T halts on every input. On the other hand, T could enumerate L if it does not halt on some strings which are not in L.

L is sometimes said to be a decision problem, and a Turing machine which decides it is said to solve the decision problem. The set of strings which T accepts is denoted L(T). Version: 9 Owner: Henry Author(s): Henry

656.3

promise problem

A promise problem is a generalization of a decision problem. It is defined by two decision problems L1 and L2 with L1 ∩ L2 = ∅. A Turing machine decides a promise problem if, for any x ∈ L1 ∪ L2, it accepts when x ∈ L1 and rejects when x ∈ L2. Behavior is undefined when x ∉ L1 ∪ L2 (this is the promise: that x is in one of the two sets). If L2 = Γ+ \ L1 then this is just the decision problem for L1. Version: 1 Owner: Henry Author(s): Henry

656.4

range problem

A range problem is a weakened form of a search problem. It consists of two functions f_l and f_u (the lower and upper bounds) and a linear ordering < on their ranges. A Turing machine solves a range problem if, for any x, the machine eventually halts with an output y such that f_l(x) < y < f_u(x).

For example, given any function f with range in ℝ and any g : ℕ → ℝ, the strong range problem StrongRange_g(f) is given by lower bound f(x) · (1 − 1/(1 − g(|x|))) and upper bound f(x) · (1 − 1/(1 + g(|x|))) (note that g is passed the length of x, not the value, which need not even be a number). Version: 1 Owner: Henry Author(s): Henry

656.5

search problem

If R is a binary relation such that field(R) ⊆ Γ+ and T is a Turing machine, then T calculates R if:

• If x is such that there is some y such that R(x, y), then T accepts x with output z such that R(x, z) (there may be multiple y, and T need only find one of them).
• If x is such that there is no y such that R(x, y), then T rejects x.


Note that the graph of a partial function is a binary relation, and if T calculates a partial function then there is at most one possible output. A relation R can be viewed as a search problem, and a Turing machine which calculates R is also said to solve it. Every search problem has a corresponding decision problem, namely L(R) = {x | ∃yR(x, y)}. This definition may be generalized to n-ary relations using any suitable encoding which allows multiple strings to be compressed into one string (for instance by listing them consecutively with a delimiter). Version: 3 Owner: Henry Author(s): Henry


Chapter 657 68Q30 – Algorithmic information theory (Kolmogorov complexity, etc.) 657.1

Kolmogorov complexity

Consider flipping a coin 50 times to obtain the binary string

000101000001010100010000010101000100000001

Can we call this random? The string has rather an abundance of 0s, and on closer inspection every other bit is 0. We wouldn't expect even a biased coin to come up with such a pattern. Still, this string has probability 2^(−50), just like any other binary string of the same length, so how can we call it any less random?

Kolmogorov complexity provides an answer to these questions in the form of a measure of information content in individual objects. Objects with low information content may be considered non-random. The topic was founded in the 1960s independently by three people: Ray Solomonoff, Andrei Kolmogorov, and Gregory Chaitin. Version: 5 Owner: tromp Author(s): tromp

657.2

Kolmogorov complexity function

The (plain) complexity C(x) of a binary string x is the length of a shortest program p such that U(p) = x, i.e. the (plain) universal Turing machine on input p outputs x and halts. The lexicographically least such p is denoted x∗. The prefix complexity K(x) is defined similarly in terms of the prefix universal machine. When clear from context, x∗ is also used to denote the lexicographically least prefix program for x. Plain and prefix conditional complexities C(x|y), K(x|y) are defined similarly but with U(p|y) = x, i.e. the universal machine starts out with y written on its worktape.

Subscripting these functions with a Turing machine M, as in KM (x|y), denotes the corresponding complexity in which we use machine M in place of the universal machine U. Version: 4 Owner: tromp Author(s): tromp

657.3

Kolmogorov complexity upper bounds

C(x) ≤ l(x) + O(1). This follows from the invariance theorem applied to a machine that copies the input (which is delimited with a blank) to the worktape and halts.

C(x|y) ≤ C(x) + O(1). This follows from a machine that works like U but which starts out by erasing the given string y from its worktape.

K(x) ≤ K(y) + K(x|y) + O(1). This follows from a machine that expects an input of the form pq, where p is a self-delimiting program for y and q a self-delimiting program to compute x given y. After simulating U to obtain y on its worktape, it continues to simulate U again, thus obtaining x.

K(x|l(x)) ≤ l(x) + O(1). This follows from a machine M which uses a given number n to copy n bits from input to its worktape. Version: 1 Owner: tromp Author(s): tromp

657.4

computationally indistinguishable

If {Dn}n∈ℕ and {En}n∈ℕ are distribution ensembles (on Ω) then we say they are computationally indistinguishable if for any probabilistic, polynomial time algorithm A and any polynomial function p there is some m such that for all n > m:

|Prob_A(Dn) − Prob_A(En)| < 1/p(n)

where Prob_A(Dn) is the probability that A accepts x where x is chosen according to the distribution Dn. Version: 2 Owner: Henry Author(s): Henry
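The quantity being bounded can be estimated empirically for any fixed n. The Python sketch below is illustrative only (the names `advantage`, `sample_d`, `sample_e` and `distinguisher` are assumptions of the sketch): sample from each distribution, run a candidate distinguisher in the role of A, and compare the acceptance frequencies.

```python
import random

def advantage(sample_d, sample_e, distinguisher, trials=10000, rng=None):
    """Monte Carlo estimate of |Prob_A(D_n) - Prob_A(E_n)| for one n.

    sample_d / sample_e draw one value from each distribution given an
    rng; `distinguisher` plays the role of the algorithm A."""
    rng = rng or random.Random(0)
    acc_d = sum(bool(distinguisher(sample_d(rng))) for _ in range(trials))
    acc_e = sum(bool(distinguisher(sample_e(rng))) for _ in range(trials))
    return abs(acc_d - acc_e) / trials
```

For computationally indistinguishable ensembles this estimate stays below 1/p(n) for every efficient distinguisher once n is large enough.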

657.5

distribution ensemble

A distribution ensemble is a sequence {Dn }n∈N where each Dn is a distribution with finite support on some set Ω. Version: 2 Owner: Henry Author(s): Henry

657.6

hard core

If f is a function and R is a polynomial time computable relation, we say R is a hard-core of f if for any probabilistic, polynomial time computable relation S and any polynomial function p, there is some m such that for all n > m:

Pr[S(f(x)) = R(x)] < 1/p(n)

In particular, if f is one-to-one and polynomial time computable then f is one-way, since if it were not, f could be reversed to find x, and then R(x) calculated (since R is polynomial time computable). Version: 1 Owner: Henry Author(s): Henry

657.7

invariance theorem

The Invariance Theorem states that a universal machine provides an optimal means of description, up to a constant. Formally, for every machine M there exists a constant c such that for all binary strings x we have C(x) = C_U(x) ≤ C_M(x) + c. This follows trivially from the definition of a universal Turing machine, taking c = l(⟨M⟩) as the length of the encoding of M. The Invariance Theorem holds likewise for prefix and conditional complexities.

Version: 1 Owner: tromp Author(s): tromp

657.8

natural numbers identified with binary strings

It is convenient to identify a natural number n with the nth binary string in lexicographic ordering:

0 ↔ λ, 1 ↔ 0, 2 ↔ 1, 3 ↔ 00, 4 ↔ 01, 5 ↔ 10, 6 ↔ 11, 7 ↔ 000, ...

The more common binary notation for numbers fails to be a bijection because of leading zeroes. Yet, there is a close relation: the nth binary string is the result of stripping the leading 1 from the binary notation of n + 1. With this correspondence in place, we can talk about such things as the length l(n) of a number n, which can be seen to equal ⌊log(n + 1)⌋. Version: 1 Owner: tromp Author(s): tromp
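The stripping rule and the length formula above are each a one-liner in Python; this sketch just restates them.

```python
from math import floor, log2

def nth_binary_string(n):
    """The nth binary string in the ordering above: strip the leading 1
    from the binary notation of n + 1."""
    return bin(n + 1)[3:]   # bin() yields '0b1...'; drop '0b' and the leading 1

def string_length(n):
    """l(n) = floor(log2(n + 1)), the length of the nth binary string."""
    return floor(log2(n + 1))
```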

657.9

one-way function

A function f is a one-way function if for any probabilistic, polynomial time computable function g and any polynomial function p there is some m such that for all n > m:

Pr[f(g(f(x))) = f(x)] < 1/p(n)

where x has length n and all numbers of length n are equally likely. That is, no probabilistic, polynomial time function can effectively compute f⁻¹.

Note that, since f need not be injective, this is a stricter requirement than

Pr[g(f(x)) = x] < 1/p(n)

since not only is g(f(x)) (almost always) not x, it is (almost always) not even a value y such that f(y) = f(x). Version: 2 Owner: Henry Author(s): Henry

657.10

pseudorandom

A distribution ensemble {Dn }n∈N is pseudorandom if it is computationally indistinguishable from the ensemble {Un }n∈N where each Un is the uniform distribution on the support of Dn . That is, no reasonable procedure can make meaningful predictions about what element will be chosen. Version: 5 Owner: Henry Author(s): Henry

657.11

pseudorandom generator

Let G be a deterministic polynomial-time function with stretch function l : ℕ → ℕ, so that if x has length n then G(x) has length l(n). Then let Gn be the distribution on strings of length l(n) defined by the output of G on a string of length n selected by the uniform distribution. Then we say G is a pseudorandom generator if {Gn}n∈ℕ is pseudorandom. In effect, G translates a random input of length n to a pseudorandom output of length l(n). Assuming l(n) > n, this expands a random sequence (and can be applied multiple times, since Gn can be replaced by the distribution of G(G(x))). Version: 1 Owner: Henry Author(s): Henry

657.12

support

If D is a distribution on Ω then the support of D is {x ∈ Ω | D(x) > 0}. That is, the set of x which have a positive probability under the distribution. Version: 2 Owner: Henry Author(s): Henry


Chapter 658 68Q45 – Formal languages and automata 658.1

automaton

An automaton is a general term for any formal model of computation. Typically, an automaton is represented as a state machine. That is, it consists of a set of states, a set of transitions from state to state, a set of starting states, a set of acceptable terminating states, and an input string. A state transition usually has some rules associated with it that govern when the transition may occur, and are able to remove symbols from the input string. An automaton may even have some sort of data structure associated with it, besides the input string, with which it may interact. A famous automaton is the Turing machine, invented by Alan Turing in 1935. It consists of a (usually infinitely long) tape, capable of holding symbols from some alphabet, and a pointer to the current location in the tape. There is also a finite set of states, and transitions between these states, that govern how the tape pointer is moved and how the tape is modified. Each state transition is labelled by a symbol in the tape’s alphabet, and also has associated with it a replacement symbol and a direction to move the tape pointer (either left or right). At each iteration, the machine reads the current symbol from the tape. If a transition can be found leading from the current state that is labelled by the current symbol, it is “executed.” Execution of the transition consists of writing the transition’s replacement symbol to the tape, moving the tape pointer in the direction specified, and making the state pointed to by the transition the new current state. There are many variations of this model that are all called Turing machines, but fundamentally they all model the same set of possible computations. This abstract construct is very useful in computability theory.

2328

Other automata prove useful in the area of formal languages. Any context-free language may be represented by a pushdown automaton, and any regular language can be represented by a deterministic or non-deterministic finite automaton. Version: 4 Owner: Logan Author(s): Logan

658.2

context-free language

Formally, a context-free language is a 4-tuple (Σ, N, P, S), where Σ is an alphabet of terminal symbols, N is an alphabet of non-terminal symbols, P is a finite subset of the cartesian product N × (N ∪ Σ)∗, and S ∈ N is the start symbol. A context-free language defines a language over Σ, by specifying a set of non-terminals and a set of rules for combining terminals and non-terminals into new non-terminals. Any sentence that can be generated by following the productions that lead up to the non-terminal S is in the context-free language defined by (Σ, N, P, S).

This formal definition is not initially very enlightening. As an example, consider typical infix arithmetic notation. An expression typically consists of one or more terms, joined together by operators with varying precedence. Parentheses can be used to override operator precedence. This syntax is a context-free language. Let us describe such expressions with the operators + and - with lowest precedence, and * and / with highest precedence, where all operators are left-associative and the only other allowed symbols are integer literals and parentheses.

Σ := {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, (, ), +, -, *, /}
N := {S, A, B, C, D}
P := {(S, A), (S, S+A), (S, S-A), (A, B), (A, A*B), (A, A/B), (B, C), (B, (S)), (C, D), (C, CD), (D, 0), (D, 1), (D, 2), (D, 3), (D, 4), (D, 5), (D, 6), (D, 7), (D, 8), (D, 9)} (658.2.1)

A context-free grammar is a grammar that generates a context-free language. Context-free grammars are also known as Type-2 grammars in the Chomsky hierarchy. The Chomsky hierarchy specifies Type-2 grammars as consisting only of production rules of the form A → γ, where A is a non-terminal and γ is a string of terminals and non-terminals (i.e., A ∈ N, γ ∈ (N ∪ Σ)∗).

A context-free grammar can be represented by a pushdown automaton. The automaton serves both as an acceptor for the language (that is, it can decide whether or not any arbitrary sentence is in the language) and as a generator for the language (that is, it can generate any finite sentence in the language in finite time). 2329

Backus-Naur Form (or BNF as it is commonly denoted) is a convenient notation used to represent context-free grammars in a much more intuitive manner than the definition above. In Backus-Naur Form, there are only four symbols that have special meaning: <, >, ::=, and |. A non-terminal (that is, a symbol in the alphabet N) is always enclosed in < and > (e.g. <expression>). A terminal (that is, a symbol in the alphabet Σ) is often represented as itself, though in the context of computer languages a terminal symbol is often enclosed in single quotes. A production (non-terminal, symbols) in P is then represented as

<non-terminal> ::= symbols

The symbol | is used in BNF to combine multiple productions in P into one rule. For instance, if P := {(S, A), (S, B)}, then P in BNF is

<S> ::= A | B

Let us transform the context-free grammar specified in (658.2.1) to BNF. For readability, we will call S expression, A term, B factor, C number, and D digit. The BNF for (658.2.1) is then

<expression> ::= <term> | <expression> + <term> | <expression> - <term>
<term> ::= <factor> | <term> * <factor> | <term> / <factor>
<factor> ::= <number> | ( <expression> )
<number> ::= <digit> | <number> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (658.2.2)

The syntaxes of most programming languages are context-free grammars (or very close to it). In fact, BNF was invented to specify the syntax of ALGOL 60. A very useful subset of context-free languages are regular languages. Version: 5 Owner: Logan Author(s): Logan
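The pushdown automaton for grammar (658.2.2) can be approximated by a recursive-descent recognizer, one procedure per non-terminal, with the procedure call stack playing the role of the PDA's stack. The Python sketch below is illustrative only; it recognizes the expression language of the grammar but does not handle whitespace.

```python
def parse_expression(s):
    """Recursive-descent recognizer for grammar (658.2.2);
    returns True iff s is in the language."""
    pos = 0
    def peek():
        return s[pos] if pos < len(s) else None
    def expression():   # <expression> ::= <term> | <expression> +/- <term>
        nonlocal pos
        if not term():
            return False
        while peek() in ('+', '-'):
            pos += 1
            if not term():
                return False
        return True
    def term():         # <term> ::= <factor> | <term> */ <factor>
        nonlocal pos
        if not factor():
            return False
        while peek() in ('*', '/'):
            pos += 1
            if not factor():
                return False
        return True
    def factor():       # <factor> ::= <number> | ( <expression> )
        nonlocal pos
        if peek() == '(':
            pos += 1
            if not expression() or peek() != ')':
                return False
            pos += 1
            return True
        return number()
    def number():       # <number> ::= <digit> | <number> <digit>
        nonlocal pos
        if peek() is None or not peek().isdigit():
            return False
        while peek() is not None and peek().isdigit():
            pos += 1
        return True
    return expression() and pos == len(s)
```

Note how the left-recursive BNF rules become loops: left recursion cannot be coded directly as recursion, but for this grammar the loop form accepts exactly the same language.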


Chapter 659 68Q70 – Algebraic theory of languages and automata 659.1

Kleene algebra

A Kleene algebra (A, ·, +, ∗, 0, 1) is an idempotent semiring (A, ·, +, 0, 1) with an additional (right-associative) unary operator ∗, called the Kleene star, which satisfies

1 + aa∗ ≤ a∗,  ac + b ≤ c ⇒ a∗b ≤ c,
1 + a∗a ≤ a∗,  ca + b ≤ c ⇒ ba∗ ≤ c,

for all a, b, c ∈ A. Regular expressions are a form (or close variant) of a Kleene algebra. Version: 2 Owner: Logan Author(s): Logan
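The four axioms can be checked mechanically on the smallest nontrivial example, the two-element Kleene algebra on {0, 1} where + is "or", · is "and", a∗ = 1 for every a, and a ≤ b means a + b = b. A brute-force Python sketch (illustrative only):

```python
def check_axioms():
    """Verify the four Kleene-star axioms over the two-element Kleene algebra."""
    A = [0, 1]
    plus = lambda a, b: a | b          # + is "or"
    times = lambda a, b: a & b         # . is "and"
    star = lambda a: 1                 # a* = 1 for every a
    leq = lambda a, b: plus(a, b) == b # a <= b  iff  a + b == b
    for a in A:
        if not leq(plus(1, times(a, star(a))), star(a)):   # 1 + a a* <= a*
            return False
        if not leq(plus(1, times(star(a), a)), star(a)):   # 1 + a* a <= a*
            return False
    for a in A:
        for b in A:
            for c in A:
                if leq(plus(times(a, c), b), c) and not leq(times(star(a), b), c):
                    return False                           # ac + b <= c => a* b <= c
                if leq(plus(times(c, a), b), c) and not leq(times(b, star(a)), c):
                    return False                           # ca + b <= c => b a* <= c
    return True
```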

659.2

Kleene star

If Σ is an alphabet (a set of symbols), then the Kleene star of Σ, denoted Σ∗ , is the set of all strings of finite length consisting of symbols in Σ, including the empty string λ. If S is a set of strings, then the Kleene star of S, denoted S ∗ , is the smallest superset of S that contains λ and is closed under the string concatenation operation. That is, S ∗ is the set of all strings that can be generated by concatenating zero or more strings in S. The definition of Kleene star can be generalized so that it operates on any monoid (M, ++), where ++ is a binary operation on the set M. If e is the identity element of (M, ++) and S is a subset of M, then S ∗ is the smallest superset of S that contains e and is closed under ++. 2331

Examples:

Σ = {a, b}:  Σ∗ = {λ, a, b, aa, ab, ba, bb, aaa, . . . }
S = {ab, cd}:  S∗ = {λ, ab, cd, abab, abcd, cdab, cdcd, ababab, . . . }
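A length-bounded slice of S∗ can be generated by closing S under concatenation, as in this illustrative Python sketch (λ is written as the empty string ''):

```python
def kleene_star(strings, max_len):
    """All elements of S* of length at most max_len: the closure of S
    under concatenation, including the empty string."""
    result = {''}
    frontier = {''}
    while frontier:
        nxt = {w + s for w in frontier for s in strings if len(w + s) <= max_len}
        frontier = nxt - result        # only genuinely new strings
        result |= frontier
    return sorted(result, key=lambda w: (len(w), w))
```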

Version: 2 Owner: Logan Author(s): Logan

659.3

monad

A monad over a category C is a triple (T, η, µ), where T is an endofunctor of C and η and µ are natural transformations. These natural transformations are defined as

η : id_C → T
µ : T² → T

Note that T² is simply shorthand for T ◦ T. Such a triple (T, η, µ) is only a monad if the following laws hold.

• associative law of a monad: µ ◦ (µ ◦ T) ⇔ µ ◦ (T ◦ µ)
• left and right identity laws of a monad: µ ◦ (T ◦ η) ⇔ id_C ⇔ µ ◦ (η ◦ T)

[These laws are illustrated by two commutative diagrams: a square from T³(C) to T(C) expressing µ ◦ (µ ◦ T) ⇔ µ ◦ (T ◦ µ), and a triangle pair expressing µ ◦ (T ◦ η) ⇔ id_C ⇔ µ ◦ (η ◦ T).]

As an application, monads have been successfully used in the field of functional programming. A pure functional program can have no side effects, but some computations are frequently much simpler with such behavior, so a mathematical model of computation such as a monad is needed. In this case, monads serve to represent state transformations, mutable variables, and interactions between a program and its environment. Version: 5 Owner: mathcam Author(s): mathcam, Logan
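For a concrete instance, take the list monad (a standard example, sketched here in Python rather than a typed functional language): T(A) is lists over A, η is the singleton map, and µ is the one-level flatten. The monad laws can then be checked on sample data.

```python
def eta(x):                 # unit: A -> T(A)
    return [x]

def fmap(f, xs):            # the functor T acting on a morphism f
    return [f(x) for x in xs]

def mu(xss):                # multiplication (join): T^2(A) -> T(A)
    return [x for xs in xss for x in xs]

def check_monad_laws(xsss, xs):
    """xsss is an element of T^3(A), xs an element of T(A).
    Checks the associative law and the left/right identity laws."""
    associative = mu(mu(xsss)) == mu(fmap(mu, xsss))
    identities = mu(fmap(eta, xs)) == xs == mu(eta(xs))
    return associative and identities
```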

Chapter 660 68R05 – Combinatorics 660.1

switching lemma

A very useful tool in circuit complexity theory, used to transform a CNF formula into a DNF formula while avoiding exponential blow-up: by a probabilistic argument, it can be shown that such a transformation occurs with high probability if we apply a random restriction to the original CNF. This applies also to transforming from a DNF to a CNF. Version: 5 Owner: iddo Author(s): iddo


Chapter 661 68R10 – Graph theory 661.1

Floyd’s algorithm

Floyd's algorithm is also known as the all pairs shortest path algorithm. It will compute the shortest path between all possible pairs of vertices in a (possibly weighted) graph or digraph simultaneously in O(n³) time (where n is the number of vertices in the graph).

Algorithm Floyd(V)
Input: A weighted graph or digraph with vertices V
Output: A matrix cost of shortest paths and a matrix pred of predecessors in the shortest path

for (a, b) ∈ V² do
    if adjacent(a, b) then
        cost(a, b) ← weight(a, b)
        pred(a, b) ← a
    else
        cost(a, b) ← ∞
        pred(a, b) ← null
for c ∈ V do
    for (a, b) ∈ V² do
        if cost(a, c) < ∞ and cost(c, b) < ∞ then
            if cost(a, b) = ∞ or cost(a, c) + cost(c, b) < cost(a, b) then
                cost(a, b) ← cost(a, c) + cost(c, b)
                pred(a, b) ← pred(c, b)

Version: 3 Owner: vampyr Author(s): vampyr
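The pseudocode translates almost line for line into Python; in this sketch, dicts stand in for the cost and pred matrices, and the absence of negative cycles is assumed.

```python
def floyd(vertices, weight):
    """All-pairs shortest paths, following Floyd's algorithm.

    `weight` maps ordered pairs (a, b) to edge weights; a pair absent from
    `weight` means the vertices are not adjacent. Returns (cost, pred);
    pred[(a, b)] is b's predecessor on a shortest a -> b path."""
    INF = float('inf')
    cost, pred = {}, {}
    for a in vertices:
        for b in vertices:
            if (a, b) in weight:            # adjacent(a, b)
                cost[(a, b)] = weight[(a, b)]
                pred[(a, b)] = a
            else:
                cost[(a, b)] = INF
                pred[(a, b)] = None
    for c in vertices:
        for a in vertices:
            for b in vertices:
                if cost[(a, c)] < INF and cost[(c, b)] < INF:
                    if cost[(a, b)] == INF or cost[(a, c)] + cost[(c, b)] < cost[(a, b)]:
                        cost[(a, b)] = cost[(a, c)] + cost[(c, b)]
                        pred[(a, b)] = pred[(c, b)]
    return cost, pred
```

The triple loop over the intermediate vertex c, then all pairs (a, b), gives the O(n³) running time quoted above.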

661.2

digital library structural metadata specification

A digital library structural metadata specification is a digital library structure.

Version: 1 Owner: gaurminirick Author(s): gaurminirick

661.3

digital library structure

A digital library structure is a tuple (G, L, F ), where G = (V, E) is a directed graph with vertex set V and edge set E, L is a set of label values, and F is a labeling function F : (V ∪ E) → L. Version: 1 Owner: gaurminirick Author(s): gaurminirick

661.4

digital library substructure

A digital library substructure of a digital library structure (G, L, F ) is another digital library structure (G′ , L′ , F ′ ) where G′ = (V ′ , E ′ ) is a subgraph of G, L′ ⊆ L and F ′ : (V ′ ∪ E ′ ) → L′ . Version: 2 Owner: gaurminirick Author(s): gaurminirick


Chapter 662 68T10 – Pattern recognition, speech recognition 662.1

Hough transform

Hough Transform The Hough transform is a general technique for identifying the locations and orientations of certain types of features in a digital image. Developed by Paul Hough in 1962 and patented by IBM, the transform consists of parameterizing a description of a feature at any given location in the original image’s space. A mesh in the space defined by these parameters is then generated, and at each mesh point a value is accumulated, indicating how well an object generated by the parameters defined at that point fits the given image. Mesh points that accumulate relatively larger values then describe features that may be projected back onto the image, fitting to some degree the features actually present in the image.

Hough Line Transform The simplest form of the Hough transform is the Hough line transform. Suppose we are attempting to describe lines that match edges in a two-dimensional image. If we perform an edge-detection technique on the image, we might obtain an image like the following.


To use the Hough transform, we need a way to characterize a line. One representation of a line is the slope-intercept form

y = mx + b,   (662.1.1)

where m is the slope of the line and b is the y-intercept (that is, the y component of the coordinate where the line intersects the y-axis). Given this characterization of a line, we can then iterate through any number of lines that pass through any given point (x, y). By iterating through fixed values of m, we can solve for b by b = y − mx. However, this method is not very stable. As lines get more and more vertical, the magnitudes of m and b grow towards infinity. A more useful representation of a line is its normal form

x cos θ + y sin θ = ρ.   (662.1.2)

This equation specifies a line passing through (x, y) that is perpendicular to the line drawn from the origin to (ρ, θ) in polar space (i.e., (ρ cos θ, ρ sin θ) in rectangular space). For each point (x, y) on a line, θ and ρ are constant. Now, for any given point (x, y), we can obtain lines passing through that point by solving for ρ and θ. By iterating through possible angles for θ, we can solve for ρ by (662.1.2) directly. This method proves to be more effective than (662.1.1), as it is numerically stable for matching lines of any angle.

The Accumulator To generate the Hough transform for matching lines in our image, we choose a particular granularity for our lines (e.g., 10◦ ), and then iterate through the angles defined by that granularity. For each angle θ, we then solve for ρ = x cos θ + y sin θ, and then increment the value located at (ρ, θ). This process can be thought of as a “vote” by the point (x, y) for the line defined by (ρ, θ), and variations on how votes are generated exist for making the transform more robust. The result is a new image defined on a polar mesh, such as the one below (generated from the previous image).
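The voting loop just described can be sketched in a few lines of Python; the granularity, the ρ range and the edge points below are arbitrary illustrative choices, not values from the text:

```python
import math

def hough_lines(points, theta_steps=180, rho_max=100.0, rho_steps=200):
    """Accumulate votes in (rho, theta) space for a set of edge points.
    The resolution and the rho range are arbitrary choices."""
    acc = {}
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps            # theta in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            r = int(round((rho + rho_max) * rho_steps / (2 * rho_max)))
            acc[r, t] = acc.get((r, t), 0) + 1           # one "vote"
    return acc

# Ten collinear points on the line y = x all vote for the same cell:
# rho = 0 (bin 100) at theta = 3*pi/4 (index 135).
acc = hough_lines([(i, i) for i in range(10)])
print(acc[100, 135])  # 10
```

The brightest accumulator cells are then thresholded to recover line parameters, as described next.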


Note that in this representation we have projected the polar space onto a rectangular space. The origin is in the upper-left corner, the θ axis extends to the right, and the ρ axis extends downward. The image has been normalized, so that each pixel’s intensity represents the ratio of the original value at that location to the brightest original value. The intensity of a pixel corresponds to the number of votes it received relative to the pixel that received the most votes. Iterating through each value of θ for a particular (x, y) in the original image generates a curve in the rectangular representation of the Hough transform. Lines that intersect in the same location are generated by colinear points. Thus, if we locate the brightest points in the image, we will obtain parameters describing lines that pass through many points in our original image. Many methods exist for doing this; we may simply threshold the Hough transform image, or we can locate the local maxima. We then get a set of infinite lines, as below. This notion corresponds neatly with the notion of a voting process; the brighter pixels represented coordinates in ρθ space that got the most votes, and thus are more likely to generate lines that fit many points. For each (ρ, θ) that we deem bright enough in the Hough transform, we can then generate a line from the parameterization given earlier. Here is the projection of some lines obtained from the previous image’s Hough transform, drawn on top of the original image.

We can also use this information to break these lines up into line segments that fit our original binary image well. Below is an example of the line segments we might obtain, drawn on top of the original image.

Hough Circle Transform The Hough transform can be used for representing objects besides lines. For instance, a circle can be parameterized as (x − a)2 + (y − b)2 = r 2 . Here, (a, b) is the coordinate of the center of the circle that passes through (x, y), and r is its radius. Since there are three parameters for this equation, it follows that the Hough transform will be a three-dimensional image. Therefore circles require more computation to find than lines. For this reason the Hough transform is more typically used for simpler curves, such as straight lines and parabolas.

General Hough Transform The general Hough transform is used when an analytical description of the feature we are searching for is not possible. Instead of a parametric equation describing the feature, we use some sort of lookup table, correlating locations and orientations of potential features in the original image to some set of parameters in the Hough transform space.

REFERENCES 1. Gonzalez, R.C. and Woods, R.E., Digital Image Processing, Prentice Hall, 1993.

Version: 4 Owner: Logan Author(s): Logan


Chapter 663 68U10 – Image processing 663.1

aliasing

Aliasing Used in the context of processing digitized signals (e.g. audio) and images (e.g. video), aliasing describes the effect of undersampling during digitization, which can generate a false (apparent) low frequency for signals, or staircase steps along edges in images (jaggies). Aliasing can be avoided by an antialiasing (analogue) low-pass filter applied before sampling. The term antialiasing is also in use for a posteriori signal smoothing intended to remove the effect. References • Based on content from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html) Version: 4 Owner: akrowne Author(s): akrowne


Chapter 664 68W01 – General 664.1

Horner’s rule

Horner’s rule is a technique to reduce the work required for the computation of a polynomial at a particular value. Its simplest form makes use of the repeated factorization

y = a0 + a1 x + a2 x2 + · · · + an xn = a0 + x(a1 + x(a2 + · · · + x(an−1 + xan ) · · · ))

of the terms of the nth degree polynomial in x in order to reduce the computation of the polynomial y(a) (at some value x = a) to n multiplications and n additions.

The rule can be generalized to a finite series

y = a0 p0 + a1 p1 + · · · + an pn

of orthogonal polynomials pk = pk (x). Using the recurrence relation

pk = (Ak + Bk x)pk−1 + Ck pk−2

for orthogonal polynomials, one obtains

y(a) = (a0 + C2 b2 )p0 (a) + b1 p1 (a)

with

bn+1 = bn+2 = 0,    bk−1 = (Ak + Bk · a)bk + Ck+1 bk+1 + ak−1

for the evaluation of y at some particular a. This is a simpler calculation than the straightforward approach, since a0 and C2 are known, p0 (a) and p1 (a) are easy to compute (possibly themselves by Horner’s rule), and b1 and b2 are given by a backwards recurrence which is linear in n.

References

• Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)

Version: 7 Owner: akrowne Author(s): akrowne
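The simple form of the rule is a one-line loop; the polynomial below is just an example:

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n with n multiplications
    and n additions; `coeffs` lists a0, ..., an."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# 1 + 2x + 3x^2 at x = 2: 1 + 4 + 12
print(horner([1, 2, 3], 2))  # 17
```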


Chapter 665 68W30 – Symbolic computation and algebraic computation 665.1

algebraic computation

Algebraic Computation Also called Formula Manipulation or Symbolic Computation. Existing programs or systems in this area allow one to transform mathematical expressions in symbolic form, hence in an exact way, as opposed to numerical and hence limited-precision floating point computation. Primarily designed for applications in theoretical physics or mathematics, these systems, which are usually interactive, can be used in any area where straightforward but tedious or lengthy calculations with formulae are required. Typical operations include differentiation and integration, linear algebra and matrix calculus, operations on polynomials, or the simplification of algebraic expressions. Well known systems for algebraic computation are, amongst others, Macsyma, Maple, Mathematica, or Reduce. These systems have different scope and facilities, and some are easier to use or to access than others. Mathematica is a commercial package; Maple is available through another commercial package, Matlab (Symbolic Math Toolbox).

References • Based on content from the Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html) Version: 8 Owner: akrowne Author(s): akrowne


Chapter 666 68W40 – Analysis of algorithms 666.1

speedup

Speedup is a way to quantify the advantage of using a parallel algorithm over a sequential algorithm. The speedup S is defined as

S = R/P

where R is the running time of the best available sequential algorithm and P is the running time of the parallel algorithm.

Ideally, on a system with N processors, the speedup for any algorithm would be N. Amdahl’s Law deals with the speedup in more realistic situations. Version: 2 Owner: akrowne Author(s): akrowne
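As a small illustration (the timings are made up), together with the usual statement of Amdahl's law, S = 1/((1 − f) + f/N) for a parallelizable fraction f:

```python
def speedup(R, P):
    """S = R / P for sequential time R and parallel time P."""
    return R / P

def amdahl(f, N):
    """Amdahl's law in its usual statement: the best possible speedup
    on N processors when only a fraction f of the work parallelizes."""
    return 1.0 / ((1.0 - f) + f / N)

print(speedup(10.0, 2.5))        # 4.0
print(round(amdahl(0.9, 8), 2))  # 4.71: far from the ideal N = 8
```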


Chapter 667 74A05 – Kinematics of deformation 667.1

body

One distinct physical property of every body is to occupy a region of Euclidean space E. Even though none of these regions can be intrinsically associated with the body, we find it useful to choose one of them, which we call B, as a reference, so that we can identify points of the body with their positions in B; formally speaking, B is a regular region in E. We call B the reference configuration; the points p ∈ B are called material points, and bounded regular subregions of B are called parts. Version: 5 Owner: ottocolori Author(s): ottocolori

667.2

deformation

Let E be a Euclidean space and B a body in E. A map f : B ⊆ E → f(B) ⊆ E is a deformation iff f is a diffeomorphism. The vector u(p) = f(p) − p, defined on B, is called the displacement of p, and the tensor F(p) = ∇f(p) is called the deformation gradient; it belongs to Lin+ , the set of all tensors S with det S > 0. If F is constant, then the deformation is homogeneous.

REFERENCES 1. M. Gurtin, Introduction to continuum mechanichs, Academic Press, 1992.

Version: 14 Owner: ottocolori Author(s): ottocolori 2345

Chapter 668 76D05 – Navier-Stokes equations 668.1

Navier-Stokes equations

Let (β, C) be a Newtonian fluid, with β a body and

C = {(x, T) : x isochoric, T = −πI + C(L)}, ρ = ρ0 constant,

its constitutive class. Then the differential equations

ρ0 v̇(x, t) = µ∆v(x, t) − ∇π(x, t) + b(x, t)
div(v) = 0

where ρ is the density, v(x, t) the velocity field, µ the viscosity, and b(x, t) the body force, are the Navier-Stokes equations for a Newtonian fluid. Version: 4 Owner: ottocolori Author(s): ottocolori


Chapter 669 81S40 – Path integrals 669.1

Feynman path integral

A generalisation of a multi-dimensional integral, written

∫ Dφ exp(F [φ])

where φ ranges over some restricted set of functions from a measure space X to some space with reasonably nice algebraic structure. The simplest example is the case where φ ∈ L2 [X, R] and

F [φ] = −π ∫X φ2 (x) dµ(x)

in which case it can be argued that the result is 1. The argument is by analogy to the Gaussian integral ∫Rn dx1 · · · dxn e−π Σ x2j = 1. Alas, one can absorb the π into the measure on X, and this kind of ruins the niceness of the argument. Alternatively, following Pierre Cartier and others, one can use this analogy to define a measure on L2 and proceed axiomatically.

One can bravely trudge onward and hope to come up with something, say à la Riemann integral, by partitioning X, picking some representative of each partition, approximating the functional F based on these and calculating a multi-dimensional integral as usual over the sample values of φ. This leads to some integral

∫ · · · ∫ dφ(x1 ) · · · dφ(xn ) ef (φ(x1 ),...,φ(xn )) .

One hopes that taking successively finer partitions of X will give a sequence of integrals which converge on some nice limit. I believe Pierre Cartier has shown that this doesn’t usually happen, except for the trivial kind of example given above.

The Feynman path integral was constructed as part of a re-formulation of quantum field theory, based on the sum-over-histories postulate of quantum mechanics, and can be thought of as an adaptation of Green’s function methods for solving initial value problems. Bibliography to come soon. (I hope). Version: 9 Owner: quincynoodles Author(s): quincynoodles


Chapter 670 90C05 – Linear programming 670.1

linear programming

A linear programming problem, or LP, is the problem of optimizing a given linear objective function over some polyhedron. The standard maximization LP, sometimes called the primal problem, is

maximize cT x
s.t. Ax ≤ b, x ≥ 0        (P)

Here cT x is the objective function and the remaining conditions define the polyhedron which is the feasible region over which the objective function is to be optimized. The dual of (P) is the LP

minimize y T b
s.t. y T A ≥ cT , y ≥ 0        (D)

The weak duality theorem states that if x̂ is feasible for (P) and ŷ is feasible for (D), then cT x̂ ≤ ŷ T b. This follows readily from the above:

cT x̂ ≤ (ŷ T A)x̂ = ŷ T (Ax̂) ≤ ŷ T b.

The strong duality theorem states that if both LPs are feasible, then the two objective functions have the same optimal value. As a consequence, if either LP has unbounded objective function value, the other must be infeasible. It is also possible for both LPs to be infeasible.

The simplex method of G. B. Dantzig is the algorithm most commonly used to solve LPs; in practice it runs in polynomial time, but the worst-case running time is exponential. Two polynomial-time algorithms for solving LPs are the ellipsoid method of L. G. Khachian and the interior-point method of N. Karmarkar.

Bibliography

• Chvátal, V., Linear programming, W. H. Freeman and Company, 1983.
• Cormen, T. H., Leiserson, C. E., Rivest, R. L., and C. Stein, Introduction to algorithms, MIT Press, 2001.
• Korte, B. and J. Vygen, Combinatorial optimization: theory and algorithms, Springer-Verlag, 2002.

Version: 4 Owner: mps Author(s): mps
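Weak duality is easy to check numerically. The sketch below uses a small made-up LP (the data c, A, b and the two feasible points are invented for illustration); since the two objective values coincide, strong duality certifies that both points are optimal:

```python
# A small hypothetical primal (P): maximize 3x1 + 2x2
# subject to x1 + x2 <= 4, x1 <= 2, x >= 0.
c = [3.0, 2.0]
A = [[1.0, 1.0],
     [1.0, 0.0]]
b = [4.0, 2.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x_hat = [2.0, 2.0]   # feasible for (P)
y_hat = [2.0, 1.0]   # feasible for (D): y^T A >= c^T and y >= 0

# check feasibility of both points
assert all(dot(row, x_hat) <= bi for row, bi in zip(A, b)) and min(x_hat) >= 0
assert all(sum(y_hat[i] * A[i][j] for i in range(2)) >= c[j]
           for j in range(2)) and min(y_hat) >= 0

# weak duality: c^T x_hat <= y_hat^T b
print(dot(c, x_hat), dot(y_hat, b))  # 10.0 10.0
```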

670.2

simplex algorithm

The simplex method is an algorithm due to G. B. Dantzig for solving linear programming problems. For simplicity, we will first consider the standard maximization LP of the form

maximize cT x
s.t. Ax ≤ b, x ≥ 0        (P)

where A ∈ Mm×n (R), c ∈ Rn , b ∈ Rm , and b ≥ 0. The objective function cT x is linear, so if the LP is bounded, an optimal value will occur at some vertex of the feasible polyhedron. The simplex method works by selecting vertices of the feasible polyhedron with successively greater objective function values until an optimal solution is reached, if one exists. If the LP is unbounded, then the simplex method will provide a certificate by which one can easily check that the LP is indeed unbounded. Version: 5 Owner: ruiyang Author(s): mps, Johan


Chapter 671 91A05 – 2-person games 671.1

examples of normal form games

A few example normal form games:

Prisoner’s dilemma Probably the most famous game theory example, the prisoner’s dilemma is a two player game where S1 = S2 = {C, D} and:

u1 (s1 , s2 ) = 5 if s1 = C and s2 = C; 10 if s1 = D and s2 = C; −5 if s1 = C and s2 = D; 0 if s1 = D and s2 = D

u2 (s1 , s2 ) = 5 if s1 = C and s2 = C; 10 if s1 = C and s2 = D; −5 if s1 = D and s2 = C; 0 if s1 = D and s2 = D

The (much more convenient) normal form is:

       C       D
C     5,5    −5,10
D    10,−5    0,0

Notice that (C, C) Pareto dominates (D, D), however (D, D) is the only Nash equilibrium.

Battle of the Sexes Another traditional two player game. The normal form is:

       O      F
O     2,1    0,0
F     0,0    1,2

A Deviant Example One more, rather pointless, example which illustrates a game where one player has no choice:

       X       Y      Z
A    2,100    1,7   14,−5

Undercut A game which illustrates an infinite (indeed, uncountable) strategy space. There are two players and S1 = S2 = R+ .

u1 (s1 , s2 ) = 1 if s1 < s2 , 0 if s1 > s2

u2 (s1 , s2 ) = 1 if s2 < s1 , 0 if s2 > s1

Version: 4 Owner: Henry Author(s): Henry

671.2

normal form game

A normal form game is a game of complete information in which there is a list of n players, numbered 1, . . . , n. Each player has a strategy set Si and a utility function ui : ∏i≤n Si → R. In such a game each player simultaneously selects a move si ∈ Si and receives ui ((s1 , . . . , sn )).

Normal form games with two players and finite strategy sets can be represented in normal form, a matrix where the rows each stand for an element of S1 and the columns for an element of S2 . Each cell of the matrix contains an ordered pair which states the payoffs for each player. That is, the cell i, j contains (u1 (si , sj ), u2 (si , sj )) where si is the i-th element of S1 and sj is the j-th element of S2 . Version: 2 Owner: Henry Author(s): Henry
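A two-player normal form game with finite strategy sets can be stored directly as such a matrix of payoff pairs; the sketch below encodes the prisoner's dilemma payoffs from the examples entry:

```python
# Strategy sets and payoff matrix for the prisoner's dilemma,
# indexed by pure strategy profiles (s1, s2).
S1 = S2 = ['C', 'D']
payoff = {('C', 'C'): (5, 5),   ('C', 'D'): (-5, 10),
          ('D', 'C'): (10, -5), ('D', 'D'): (0, 0)}

def u1(s1, s2):
    return payoff[s1, s2][0]

def u2(s1, s2):
    return payoff[s1, s2][1]

# D strictly improves on C for player 1 whatever player 2 does:
assert all(u1('D', s2) > u1('C', s2) for s2 in S2)
print(u1('D', 'D'), u2('D', 'D'))  # 0 0
```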


Chapter 672 91A10 – Noncooperative games 672.1

dominant strategy

For any player i, a strategy s∗ ∈ Si weakly dominates another strategy s0 ∈ Si if:

∀s−i ∈ S−i [ui (s∗ , s−i ) ≥ ui (s0 , s−i )]

(Remember that S−i represents the product of all strategy sets other than i’s.) s∗ strongly dominates s0 if:

∀s−i ∈ S−i [ui (s∗ , s−i ) > ui (s0 , s−i )]

Version: 2 Owner: Henry Author(s): Henry


Chapter 673 91A18 – Games in extensive form 673.1

extensive form game

A game in extensive form is one that can be represented as a tree, where each node corresponds to a choice by one of the players. Unlike a normal form game, in an extensive form game players make choices sequentially. However players do not necessarily always know which node they are at (that is, what moves have already been made). Formally, an extensive form game is a set of nodes together with a function for each nonterminal node. The function specifies which player moves at that node, what actions are available, and which node comes next for each action. For each terminal node, there is instead a function defining utilities for each player when that node is the one the game results in. Finally the nodes are partitioned into information sets, where any two nodes in the same information set must have the same actions and the same moving player. A pure strategy for each player is a function which, for each information set, selects one of the available actions. That is, if player i’s information sets are h1 , h2 , . . . , hm with corresponding action sets a1 , a2 , . . . , am , then Si = ∏x ax , an element of which selects one action from ax for each information set hx . Version: 1 Owner: Henry Author(s): Henry


Chapter 674 91A99 – Miscellaneous 674.1

Nash equilibrium

A Nash equilibrium of a game is a set of (possibly mixed) strategies σ = (σ1 , . . . , σn ) such that, if each player i believes that every other player j will play σj , then i should play σi . That is, when ui is the utility function for the i-th player:

ui (σi , σ−i ) ≥ ui (σi′ , σ−i ) for all i ≤ n and all σi′ ∈ Σi

Translated, this says that if any player plays any strategy other than the one in the Nash equilibrium then that player would do no better than by playing the Nash equilibrium. Version: 5 Owner: Henry Author(s): Henry

674.2

Pareto dominant

An outcome s∗ strongly Pareto dominates s0 if:

∀i ≤ n [ui (s∗ ) > ui (s0 )]

An outcome s∗ weakly Pareto dominates s0 if:

∀i ≤ n [ui (s∗ ) ≥ ui (s0 )]

s∗ is strongly Pareto optimal if it strongly Pareto dominates all other outcomes, and weakly Pareto optimal if it weakly Pareto dominates all other outcomes. Version: 1 Owner: Henry Author(s): Henry

674.3

common knowledge

In a game, a fact (such as the rules of the game) is common knowledge for the players if:

(a) All players know the fact
(b) All the players know that all the players know the fact
(c) All the players know that all the players know that all the players know the fact
(d) · · ·

This is a much stronger condition than merely having all the players know the fact. By way of illustration, consider the following example: There are three participants and an experimenter. The experimenter informs them that a hat, either blue or red, will be placed on their head so that the other participants can see it but the wearer cannot. The experimenter then puts red hats on each person and asks whether any of them know what color hat they have. Of course, none of them do. The experimenter then whispers to each of them that at least two people have red hats, and then asks out loud whether any of them know what color hat they have. Again, none of them do. Finally the experimenter announces out loud that at least two people have red hats, and asks whether any of them know what color hat they have. After a few seconds, all three realize that they must have red hats, since if they had a blue hat then both of the other people could have figured out that their own hat was red. The significant thing about this example is that the fact that at least two of the participants have red hats was known to every participant from the beginning, but only once they knew that the other people also knew that could they figure out their own hat’s color. (This is only the second requirement in the list above, but more complicated examples can be constructed for any level in the infinite list). Version: 1 Owner: Henry Author(s): Henry

674.4

complete information

A game has complete information if each player knows the complete structure of the game (that is, the strategy sets and the payoff functions for each player). A game without complete information has incomplete information.

Version: 2 Owner: Henry Author(s): Henry

674.5

example of Nash equilibrium

Consider the first two games given as examples of normal form games. In Prisoner’s Dilemma the only Nash equilibrium is for both players to play D: it’s apparent that, no matter what player 1 plays, player 2 does better playing D, and vice-versa for 1. Battle of the sexes has three Nash equilibria. Both (O, O) and (F, F ) are Nash equilibria, since it should be clear that if player 2 expects player 1 to play O, player 2 does best by playing O, and vice-versa, while the same situation holds if player 2 expects player 1 to play F . The third is a mixed equilibrium; player 1 plays O with probability 2/3 and player 2 plays O with probability 1/3. We confirm that these are equilibria by testing the first derivatives (if 0 then the strategy is either maximal or minimal). Technically we also need to check the second derivative to make sure that it is a maximum, but with simple games this is not really necessary. Let player 1 play O with probability p and player 2 play O with probability q.

u1 (p, q) = 2pq + (1 − p)(1 − q) = 2pq + 1 − p − q + pq = 3pq − p − q + 1
u2 (p, q) = pq + 2(1 − p)(1 − q) = 3pq − 2p − 2q + 2

∂u1 /∂p = 3q − 1
∂u2 /∂q = 3p − 2

And indeed the derivatives are 0 at p = 2/3 and q = 1/3.
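A quick numerical check (an illustrative sketch): at p = 2/3 and q = 1/3 each player is indifferent between the pure strategies O and F, which is exactly the first-derivative condition above:

```python
def u1(p, q):   # player 1's expected utility when 1 plays O with
    return 2 * p * q + (1 - p) * (1 - q)   # probability p, 2 with q

def u2(p, q):
    return p * q + 2 * (1 - p) * (1 - q)

p, q = 2 / 3, 1 / 3
# In a mixed equilibrium each player is indifferent between O and F:
assert abs(u1(1, q) - u1(0, q)) < 1e-9   # player 1, against q
assert abs(u2(p, 1) - u2(p, 0)) < 1e-9   # player 2, against p
print(u1(p, q), u2(p, q))  # both are approximately 2/3
```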

Version: 1 Owner: Henry Author(s): Henry

674.6

game

In general, a game is a way of describing a situation in which players make choices with the intent of optimizing their utility. Formally, a game includes three features: • a set of pure strategies for each player (their strategy space) 2357

• a way of determining an outcome from the strategies selected by the players • a utility function for each player specifying their payoff for each outcome Version: 9 Owner: Henry Author(s): Henry

674.7

game theory

Game theory is the study of games in a formalized setting. Games are broken down into players and rules which define what the players can do and how much the players want each outcome. Typically, game theory assumes that players are rational, a requirement not only that players always make the decision which most benefits them based on the information available (as defined by that game), but also that players are always capable of making that decision (regardless of the amount of calculation which might be necessary in practice). Branches of game theory include cooperative game theory, in which players can negotiate and enforce bargains, and non-cooperative game theory, in which the only meaningful agreements are those which are ”self-enforcing,” that is, which the players have an incentive not to break. Many fields of mathematics (set theory, recursion theory, topology, and combinatorics, among others) apply game theory by representing problems as games and then use game theoretic techniques to find a solution. (To see how an application might work, consider that a proof can be viewed as a game between a ”prover” and a ”refuter,” where every universal quantifier represents a move by the refuter, and every existential one a move by the prover; the proof is valid exactly when the prover can always win the corresponding game.) Version: 2 Owner: Henry Author(s): Henry

674.8

strategy

A pure strategy provides a complete definition for a way a player can play a game. In particular, it defines, for every possible choice a player might have to make, which option the player picks. A player’s strategy space is the set of pure strategies available to that player. A mixed strategy is an assignment of a probability to each pure strategy. It defines a probability over the strategies, and reflect that, rather than choosing a particular pure strategy, the player will randomly select a pure strategy based on the distribution given by their mixed strategy. Of course, every pure strategy is a mixed strategy (the function which takes that strategy to 1 and every other one to 0).


The following notation is often used:

• Si for the strategy space of the i-th player
• si for a particular element of Si ; that is, a particular pure strategy
• σi for a mixed strategy. Note that σi : Si → [0, 1] and Σsi ∈Si σi (si ) = 1.
• Σi for the set of all possible mixed strategies for the i-th player
• S for ∏i Si , the set of all possible combinations of pure strategies (essentially the possible outcomes of the game)
• Σ for ∏i Σi
• σ for a strategy profile, a single element of Σ
• S−i for ∏j≠i Sj and Σ−i for ∏j≠i Σj , the sets of possible pure and mixed strategies for all players other than i.
• s−i for an element of S−i and σ−i for an element of Σ−i .

Version: 4 Owner: Henry Author(s): Henry

674.9

utility

Utility is taken to be an absolute, accurate measurement of how desirable something is; in particular, it differs from money in three key ways. First, desire for it is linear (generally in economics and game theory a person with a lot of money receives less utility from an additional fixed amount of money than someone with very little money does). Second, when modeling a real situation, utility should include all external factors (the happiness received from doing a good deed, for instance). Third, different people’s utility is incomparable. It is meaningless to ask whether one person gets more utility from a situation than another person does. Utilities for a given person can be compared only to other utilities for that person. A utility function is a function which specifies how much utility a player gets for a particular outcome. It maps the space S of all possible strategy profiles to R. Version: 4 Owner: Henry Author(s): Henry


Chapter 675 92B05 – General biology and biomathematics 675.1

Lotka-Volterra system

The Lotka-Volterra system was derived by Volterra in 1926 to describe the relationship between a predator and a prey, and independently by Lotka in 1920 to describe a chemical reaction. Suppose that N(t) is the prey population at time t, and P (t) is the predator population. Then the system is

dN/dt = N(a − bP )
dP/dt = P (cN − d)

where a, b, c and d are positive constants. The term aN is the birth of prey, −bNP represents the diminution of prey due to predation, which is converted into new predators at a rate cNP . Finally, predators die at the natural death rate d. Local analysis of this system is not very complicated (see, e.g., [1]). It is easily shown that it admits the zero equilibrium (unstable) as well as a positive equilibrium, which is neutrally stable. Hence, in the neighborhood of this equilibrium there exist periodic solutions (with period T = 2π(ad)−1/2 ). This system is very simple, and has obvious limitations, one of the most important being that in the absence of predators, the prey population grows unbounded. But many improvements and generalizations have been proposed, making the Lotka-Volterra system one of the most studied systems in mathematical biology.
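A crude numerical integration (forward Euler, with made-up parameters and initial data) illustrates the oscillations around the positive equilibrium:

```python
def lotka_volterra(N0, P0, a, b, c, d, dt=1e-3, steps=20000):
    """Integrate the predator-prey system with forward Euler,
    a deliberately crude scheme used only for illustration."""
    N, P = N0, P0
    traj = [(N, P)]
    for _ in range(steps):
        dN = N * (a - b * P)
        dP = P * (c * N - d)
        N, P = N + dt * dN, P + dt * dP
        traj.append((N, P))
    return traj

# With a = b = c = d = 1 the positive equilibrium is (N, P) = (1, 1);
# starting nearby, both populations stay positive and oscillate around it.
traj = lotka_volterra(1.2, 0.8, 1, 1, 1, 1)
print(traj[-1])
```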


REFERENCES 1. J.D. Murray (2002). Mathematical Biology. I. An Introduction. Springer.

Version: 4 Owner: jarino Author(s): jarino


Chapter 676 93A10 – General systems 676.1

transfer function

The transfer function of a linear dynamical system is the ratio of the Laplace transform of its output to the Laplace transform of its input. In systems theory, the Laplace transform is called the “frequency domain” representation of the system. Consider a canonical dynamical system

ẋ(t) = Ax(t) + Bu(t)
y(t) = Cx(t) + Du(t)

with input u : R → Rn , output y : R → Rm and state x : R → Rp , where (A, B, C, D) are constant matrices of conformable dimensions. The frequency domain representation is

y(s) = (D + C(sI − A)−1 B)u(s),

and thus the transfer function matrix is D + C(sI − A)−1 B. Version: 4 Owner: lha Author(s): lha
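For a one-dimensional state the formula reduces to G(s) = D + CB/(s − A), which is easy to evaluate directly; the numbers below are a hypothetical first-order system, not from the entry:

```python
# A hypothetical first-order system x' = -2x + u, y = 2x, so that
# G(s) = D + C*B/(s - A) = 2/(s + 2).
def G(s, A=-2.0, B=1.0, C=2.0, D=0.0):
    return D + C * B / (s - A)

print(G(0))         # 1.0 (the DC gain)
print(abs(G(2j)))   # gain at frequency omega = 2
```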


Chapter 677 93B99 – Miscellaneous 677.1

passivity

UNDER CONSTRUCTION. . . Important concepts in the definition of passivity are positive realness (PR) and strict positive realness (SPR). A rational function f (s) = p(s)/q(s) is said to be (wide sense) strictly positive real (SPR) if

1. the degree of p(s) is equal to the degree of q(s),
2. f (s) is analytic in Re[s] ≥ 0,
3. Re[f (jω)] > 0 for all ω ∈ R,

where j = √−1.

The function is said to be strictly positive real in the strict sense (SSPR) if 1. the degree of p(s) is equal to the degree of q(s), 2. f (s) is analytic in Re[s] ≥ 0, 3. there exists a δ > 0 such that Re[f (jω)] > δ for all ω ∈ R. A square transfer function matrix, X(s), is (wide sense) SPR if 1. X(s) is analytic in Re[s] ≥ 0,

2. herm{X(jω)} > 0 for all ω ∈ (−∞, ∞) where herm{X} = X + X ∗ , X ∗ denotes Hermitian transpose, and X > 0 means X is positive definite (not componentwise positive). (Note that X(jω) + X ∗ (jω) = X(jω) + X T (−jω), where X T denotes transpose.) A square transfer function matrix, X(s), is strict-sense SPR if 1. X(s) is analytic in Re[s] ≥ 0, 2. herm{X(jω)} > 0 for all ω ∈ (−∞, ∞) 3. herm{X(∞)} ≥ 0 4. limω→∞ ω 2herm{X(jω)} > 0 if herm{X(∞)} is singular. The strict sense definitions correspond to X(s − ) being positive real for some real  > 0. See [1] “Time domain and frequency domain conditions for strict positive realness”, J. T. Wen, IEEE Trans. Automat. Contr., 33:988-992, 1988. [2] “Strictly positive real transfer functions revisited,” Lozano-Leal, R.; Joshi, S.M., IEEE Transactions on Automatic Control, 35(11):1243 -1245, 1990. [3] “Nonlinear Systems”, Hassan K. Khalil, Prentice Hall, 2002. Version: 4 Owner: nobody Author(s): lha


Chapter 678 93D99 – Miscellaneous 678.1

Hurwitz matrix

A square matrix A is called a Hurwitz matrix if all eigenvalues of A have strictly negative real part, Re[λᵢ] < 0; A is also called a stability matrix, because the feedback system ẋ = Ax is then stable. If G(s) is a (matrix-valued) transfer function, then G is called Hurwitz if the poles of all elements of G have negative real part. Note that it is not necessary that G(s), for a specific argument s, be a Hurwitz matrix; it need not even be square. The connection is that if A is a Hurwitz matrix, then the dynamical system

ẋ(t) = Ax(t) + Bu(t)
y(t) = Cx(t) + Du(t)

has a Hurwitz transfer function.

Reference: Hassan K. Khalil, Nonlinear Systems, Prentice Hall, 2002

Version: 1 Owner: lha Author(s): lha
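The eigenvalue condition is easy to test numerically. A minimal sketch (the helper name is an assumption, not from the original entry):

```python
import numpy as np

def is_hurwitz(A):
    """True iff every eigenvalue of A has strictly negative real part."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    return bool(np.all(eigenvalues.real < 0))

print(is_hurwitz([[-1.0, 2.0], [0.0, -3.0]]))   # eigenvalues -1, -3: stable
print(is_hurwitz([[0.0, 1.0], [-1.0, 0.0]]))    # eigenvalues +/-j, real part 0: not Hurwitz
```

The second example shows that eigenvalues on the imaginary axis are excluded: the definition requires strictly negative real parts.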


Chapter 679 94A12 – Signal theory (characterization, reconstruction, etc.) 679.1

rms error

Short for “root mean square error”: the square root of the mean of the squared errors, commonly used as an estimator of the standard deviation. Version: 1 Owner: akrowne Author(s): akrowne
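The definition translates directly into code; this small sketch (helper name chosen here for illustration) computes the square root of the mean squared error between paired sequences:

```python
import math

def rms_error(estimates, targets):
    """Root mean square error between two equal-length sequences."""
    n = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / n)

print(rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # sqrt(4/3)
```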


Chapter 680 94A17 – Measures of information, entropy 680.1

conditional entropy

Definition (Discrete) Let (Ω, F, µ) be a discrete probability space, and let X and Y be discrete random variables on Ω. The conditional entropy H[X|Y], read as “the conditional entropy of X given Y,” is defined as

H[X|Y] = − Σ_{x∈X} Σ_{y∈Y} µ(X = x, Y = y) log µ(X = x | Y = y)   (680.1.1)

where µ(X|Y) denotes the conditional probability.

Definition (Continuous) Let (Ω, B, µ) be a continuous probability space, and let X and Y be continuous random variables on Ω. The continuous conditional entropy h[X|Y] is defined as

h[X|Y] = − ∫_{x∈Ω} ∫_{y∈Ω} µ(X = x, Y = y) log µ(X = x | Y = y) dy dx   (680.1.2)

where µ(X|Y) denotes the conditional probability.

Discussion The results for discrete conditional entropy will be assumed to hold for the continuous case unless we indicate otherwise.

2367

With H[X, Y] the joint entropy and f a function, we have the following results:

H[X|Y] + H[Y] = H[X, Y]                                     (680.1.3)
H[X|Y] ≤ H[X]          (conditioning reduces entropy)       (680.1.4)
H[X, Y] = H[Y, X]      (symmetry)                           (680.1.5)
H[X, Y] ≤ H[X] + H[Y]  (equality iff X, Y independent)      (680.1.6)
H[X|Y] ≤ H[X|f(Y)]                                          (680.1.7)
H[X|Y] = 0 ⇔ X = f(Y)  (special case H[X|X] = 0)            (680.1.8)

The conditional entropy H[X|Y] may be interpreted as the uncertainty in X given knowledge of Y. (Try reading the above equalities and inequalities with this interpretation in mind.)

Version: 3 Owner: drummond Author(s): drummond
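The discrete definition can be evaluated directly from a joint probability table. The sketch below (function and table names are illustrative, not from the original entry) computes H[X|Y] in bits and exercises two of the properties above: independent variables leave the full uncertainty, and X = f(Y) gives zero conditional entropy.

```python
import math

def conditional_entropy(joint):
    """H[X|Y] in bits, from a joint pmf given as {(x, y): probability}."""
    # Marginal of Y: mu(Y = y) = sum_x mu(X = x, Y = y).
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    h = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / p_y[y])   # mu(x|y) = mu(x, y) / mu(y)
    return h

# Two independent fair bits: knowing Y leaves one full bit of uncertainty in X.
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_entropy(indep))   # 1 bit
# X is a copy of Y: H[X|Y] = 0, illustrating H[X|Y] = 0 iff X = f(Y).
copy = {(0, 0): 0.5, (1, 1): 0.5}
print(conditional_entropy(copy))    # 0 bits
```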

680.2

gaussian maximizes entropy for given covariance

Let f : Rⁿ → R be a probability density with mean 0 and covariance matrix K, K_ij = cov(x_i, x_j). Let φ be the density of the multidimensional Gaussian N(0, K) with mean 0 and covariance matrix K. The Gaussian maximizes the differential entropy for a given covariance matrix K. That is, h(φ) ≥ h(f). Version: 1 Owner: drummond Author(s): drummond
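The maximum itself has a standard closed form, h(φ) = ½ log((2πe)ⁿ |K|), so the upper bound on h(f) can be evaluated explicitly. A minimal sketch (the closed form is a well-known fact, but the helper below is an illustration added here, not part of the original entry):

```python
import math

def gaussian_entropy(det_K, n):
    """Differential entropy (in nats) of N(0, K): (1/2) log((2 pi e)^n det K)."""
    return 0.5 * math.log((2 * math.pi * math.e) ** n * det_K)

# Standard normal in one dimension: about 1.419 nats.
print(gaussian_entropy(1.0, 1))
```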

680.3

mutual information

Let (Ω, F, µ) be a discrete probability space, and let X and Y be discrete random variables on Ω. The mutual information I[X; Y], read as “the mutual information of X and Y,” is defined as

I[X; Y] = Σ_{x∈Ω} Σ_{y∈Ω} µ(X = x, Y = y) log [ µ(X = x, Y = y) / (µ(X = x) µ(Y = y)) ]
        = D(µ(x, y) || µ(x)µ(y)),

where D denotes the relative entropy. Mutual information, or just information, is measured in bits if the logarithm is to the base 2, and in “nats” when using the natural logarithm.

Discussion The most obvious characteristic of mutual information is that it depends on both X and Y. There is no information in a vacuum: information is always about something. In this case, I[X; Y] is the information in X about Y. As its name suggests, mutual information is symmetric, I[X; Y] = I[Y; X], so any information X carries about Y, Y also carries about X.

The definition in terms of relative entropy gives a useful interpretation of I[X; Y] as a kind of “distance” between the joint distribution µ(x, y) and the product distribution µ(x)µ(y). Recall, however, that relative entropy is not a true distance, so this is just a conceptual tool. However, it does capture another intuitive notion of information. Remember that for X, Y independent, µ(x, y) = µ(x)µ(y). Thus the relative entropy “distance” goes to zero, and we have I[X; Y] = 0 as one would expect for independent random variables.

A number of useful expressions, most apparent from the definition, relate mutual information to the entropy H:

0 ≤ I[X; Y] ≤ H[X]                     (680.3.1)
I[X; Y] = H[X] − H[X|Y]                (680.3.2)
I[X; Y] = H[Y] − H[Y|X]                (680.3.3)
I[X; Y] = H[X] + H[Y] − H[X, Y]        (680.3.4)
I[X; X] = H[X]                         (680.3.5)

Recall that the entropy H[X] quantifies our uncertainty about X. The last line justifies the description of entropy as “self-information.”

Historical Notes Mutual information, or simply information, was introduced by Shannon in his landmark 1948 paper “A Mathematical Theory of Communication.”

Version: 1 Owner: drummond Author(s): drummond
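The definition can likewise be computed from a joint probability table. This sketch (names chosen here for illustration) returns I[X; Y] in bits and checks the two extremes discussed above: perfectly correlated fair bits share one full bit, and independent bits share none.

```python
import math

def mutual_information(joint):
    """I[X;Y] in bits, from a joint pmf given as {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    # Relative entropy between the joint and the product of the marginals.
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

copy = {(0, 0): 0.5, (1, 1): 0.5}                       # X determines Y
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # independent fair bits
print(mutual_information(copy))    # 1 bit
print(mutual_information(indep))   # 0 bits
```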

680.4

proof of gaussian maximizes entropy for given covariance

Let f : Rⁿ → R be a probability density with mean 0 and covariance matrix K, K_ij = cov(x_i, x_j). Let g be the density of the multidimensional Gaussian N(0, K) with mean 0 and covariance matrix K. The Gaussian maximizes the differential entropy for a given covariance matrix K. That is, h(g) ≥ h(f), where h is the differential entropy.

The proof uses the nonnegativity of relative entropy D(f||g), and an interesting (if simple) property of quadratic forms. If A is a quadratic form and p, q are probability distributions each with mean 0 and covariance matrix K, we have

∫ p x_i x_j dx = K_ij = ∫ q x_i x_j dx   (680.4.1)

and thus

∫ A p = ∫ A q.   (680.4.2)

Now note that since

g(x) = ((2π)ⁿ |K|)^(−1/2) exp(−½ xᵀ K⁻¹ x),   (680.4.3)

we see that log g is a quadratic form plus a constant. Therefore

0 ≤ D(f||g)                                                        (680.4.4)
  = ∫ f log (f/g)                                                  (680.4.5)
  = ∫ f log f − ∫ f log g                                          (680.4.6)
  = −h(f) − ∫ f log g                                              (680.4.7)
  = −h(f) − ∫ g log g   (by the quadratic form property above)     (680.4.8)
  = −h(f) + h(g),                                                  (680.4.9)

and thus h(g) ≥ h(f).

Version: 2 Owner: drummond Author(s): drummond

Chapter 681 94A20 – Sampling theory 681.1

sampling theorem

Sampling Theorem The greyvalues of digitized one- or two-dimensional signals are typically generated by an analogue-to-digital converter (ADC), by sampling a continuous signal at fixed intervals (e.g. in time), and quantizing (digitizing) the samples. The sampling (or point sampling) theorem states that a band-limited analogue signal x_a(t), i.e. a signal in a finite frequency band (e.g. between 0 and B Hz), can be completely reconstructed from its samples x(n) = x_a(nT), if the sampling frequency is greater than 2B (the Nyquist rate); expressed in the time domain, this means that the sampling interval T is at most 1/(2B) seconds. Undersampling can produce serious errors (aliasing) by introducing artifacts of low frequencies, both in one-dimensional signals and in digital images.

References • Originally from the Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html) Version: 4 Owner: akrowne Author(s): akrowne
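The aliasing mentioned above can be demonstrated with a small numerical example (constructed here, not from the original text): sampling a 3 Hz sine at only 4 Hz, below the 6 Hz Nyquist rate, yields exactly the same sample values as a 1 Hz sine of opposite sign, so the two signals are indistinguishable from their samples.

```python
import math

fs = 4.0       # sampling frequency in Hz, below the Nyquist rate 2B = 6 Hz
f_true = 3.0   # true signal frequency (B = 3 Hz)
f_alias = abs(f_true - fs)   # the 3 Hz tone aliases to |3 - 4| = 1 Hz

samples_true = [math.sin(2 * math.pi * f_true * n / fs) for n in range(32)]
samples_alias = [-math.sin(2 * math.pi * f_alias * n / fs) for n in range(32)]

# The two sequences of samples agree to machine precision.
print(max(abs(u - v) for u, v in zip(samples_true, samples_alias)))
```

Sampling the same 3 Hz tone at any rate above 6 Hz would instead produce samples from which the original signal is uniquely recoverable.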


Chapter 682 94A60 – Cryptography 682.1

Diffie-Hellman key exchange

The Diffie-Hellman key exchange is a cryptographic protocol for symmetric key exchange. There are various implementations of this protocol. The following interchange between Alice and Bob demonstrates the elliptic curve Diffie-Hellman key exchange.

• 1) Alice and Bob publicly agree on an elliptic curve E over a large finite field F and a point P on that curve.

• 2) Alice and Bob each privately choose large random integers, denoted a and b.

• 3) Using elliptic curve point-addition, Alice computes aP on E and sends it to Bob. Bob computes bP on E and sends it to Alice.

• 4) Both Alice and Bob can now compute the point abP, Alice by multiplying the received value of bP by her secret number a, and Bob vice-versa.

• 5) Alice and Bob agree that the x coordinate of this point will be their shared secret value.

An evil interloper Eve observing the communications will be able to intercept only the objects E, P, aP, and bP. She can succeed in determining the final secret value only by gaining knowledge of either of the values a or b. Thus, the security of the exchange depends on the hardness of that problem, known as the elliptic curve discrete logarithm problem. For large a and b, it is a computationally “difficult” problem. As a side note, some care has to be taken to choose an appropriate curve E. Singular curves and ones with “bad” numbers of points on them (over the given field) have simplified solutions to the discrete log problem.

Version: 1 Owner: mathcam Author(s): mathcam
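Steps 1–5 above can be sketched with toy parameters. Everything below (the tiny curve over F_97, the base point, and the secrets 13 and 31) is a made-up miniature for illustration only; it offers no security, since real deployments use standardized curves over fields of roughly 256 bits.

```python
# Toy elliptic-curve Diffie-Hellman over F_97 (illustrative parameters only).
p, a, b = 97, 2, 3   # curve: y^2 = x^3 + 2x + 3 over F_97 (non-singular here)
P = (0, 10)          # base point: 10^2 = 100 = 3 (mod 97), so P lies on the curve

def ec_add(P1, P2):
    """Elliptic-curve point addition; None represents the point at infinity."""
    if P1 is None: return P2
    if P2 is None: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P1 == P2:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def ec_mul(k, Q):
    """Scalar multiplication kQ by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, Q)
        Q = ec_add(Q, Q)
        k >>= 1
    return R

alice_secret, bob_secret = 13, 31                        # private integers a, b
aP, bP = ec_mul(alice_secret, P), ec_mul(bob_secret, P)  # exchanged in the clear
shared_alice = ec_mul(alice_secret, bP)                  # Alice computes a(bP)
shared_bob = ec_mul(bob_secret, aP)                      # Bob computes b(aP)
print(shared_alice == shared_bob)                        # both hold abP
```

Eve sees only P, aP, and bP; recovering a or b from them is exactly the discrete logarithm problem of the next entry (trivial on this toy curve, infeasible at real parameter sizes).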

682.2

elliptic curve discrete logarithm problem

The elliptic curve discrete logarithm problem is the cornerstone of much of present-day elliptic curve cryptography. It relies on the natural group law on a non-singular elliptic curve which allows one to add points on the curve together. Given an elliptic curve E over a finite field F, a point on that curve, P, and another point you know to be an integer multiple of that point, Q, the “problem” is to find the integer n such that nP = Q.

The problem is computationally difficult unless the curve has a “bad” number of points over the given field, where the term “bad” encompasses various collections of numbers of points which make the elliptic curve discrete logarithm problem breakable. For example, if the number of points on E over F is the same as the number of elements of F, then the curve is vulnerable to attack. It is because of these issues that point-counting on elliptic curves is such a hot topic in elliptic curve cryptography. For an introduction to point-counting, read up on Schoof’s algorithm.

Version: 1 Owner: mathcam Author(s): mathcam
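For intuition, the analogous problem in the multiplicative group of integers mod p (find n with gⁿ = h mod p) can be solved by exhaustive search when the group is tiny; the sketch below (an analogy added here, not the elliptic-curve setting of the entry) makes plain why small groups are insecure while search over 2²⁵⁶-sized groups is infeasible.

```python
def brute_force_dlog(g, h, p):
    """Find the smallest n >= 0 with g**n = h (mod p), by exhaustive search."""
    x = 1
    for n in range(p):
        if x == h:
            return n
        x = (x * g) % p
    return None   # h is not in the subgroup generated by g

print(brute_force_dlog(3, pow(3, 7, 101), 101))   # recovers the exponent 7
```

Elliptic-curve groups admit the same brute-force attack in principle, with point addition in place of multiplication; the whole point of choosing large, well-behaved curves is to make every known shortcut (and the brute-force baseline) hopeless.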


Chapter 683 94A99 – Miscellaneous 683.1

Heaps’ law

Heaps’ law describes the portion of a vocabulary which is represented by an instance document (or set of instance documents) consisting of words chosen from the vocabulary. This can be formulated as

V_R(n) = K n^β

where V_R is the portion of the vocabulary V (V_R ⊆ V) represented by the instance text of size n. K and β are free parameters determined empirically. With English text corpuses, typically K is between 10 and 100, and 0.4 ≤ β ≤ 0.6.

Figure 683.1: A typical Heaps-law plot (generated by GNU Octave and gnuplot). The y-axis represents the text size, and the x-axis represents the number of distinct vocabulary elements present in the text. Compare the values of the two axes.

Heaps’ law means that as more instance text is gathered, there will be diminishing returns in terms of discovery of the full vocabulary from which the distinct terms are drawn. It is interesting to note that Heaps’ law applies in the general case where the “vocabulary” is just some set of distinct types which are attributes of some collection of objects. For example, the objects could be people, and the types could be country of origin of the person. If persons are selected randomly (that is, we are not selecting based on country of origin), then Heaps’ law says we will quickly have representatives from most countries (in proportion to their population) but it will become increasingly difficult to cover the entire set of countries by continuing this method of sampling.
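The diminishing returns fall directly out of the formula. With the hypothetical parameter choices K = 44 and β = 0.5 (inside the typical English ranges quoted above), doubling the text size grows the predicted vocabulary by only a factor of √2 ≈ 1.41:

```python
def heaps_vocabulary(n, K=44.0, beta=0.5):
    """Predicted number of distinct terms V_R(n) = K * n^beta."""
    return K * n ** beta

for n in (10_000, 20_000, 40_000):
    print(n, round(heaps_vocabulary(n)))
# Each doubling of n multiplies V_R by 2**0.5, not 2.
```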

683.1.1

References

• Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, ACM Press, 1999. Version: 3 Owner: akrowne Author(s): akrowne


History Free Encyclopedia of Mathematics Version 0.0.1 was edited by Joe Corneli and Aaron Krowne. The Free Encyclopedia of Mathematics is based on the PlanetMath online Encyclopedia. – jac, Tuesday, Jan 27, 2004.

