Diophantine Equations

  • Uploaded by: Nguyen Ha Duc Thinh
  • 0
  • 0
  • May 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Diophantine Equations as PDF for free.

More details

  • Words: 8,592
  • Pages: 16
Perspectives of Mathematics Module on “Diophantine Equations and Geometry” Fall 2004 Michael Stoll 1. Introduction and Examples What is a diophantine equation? Being diophantine is not so much a property of the equation (or, more generally, system of equations), but rather related to the kind of solutions we want to know about. Definition. A diophantine equation (or system of equations) is an algebraic equation (or system of equations) that we want to solve in integers or rational numbers. An algebraic equation is one that only uses addition, multiplication and constants (which will be integers for us); in other words, the equations take the form F (X1 , . . . , Xn ) = 0 where F ∈ Z[X1 , . . . , Xn ] is a polynomial with integral coefficients in a certain number of variables. The reason why these equations are named after Diophantus of Alexandria (of whom only very little is known; even his lifespan has only vaguely be determined to likely have been in the third century AD) is that he has written a series of books called “Arithmetica”, where he deals with certain types of such equations. His books constitute the earliest known fairly systematic treatment of problems of this kind. Here are some examples. (1) A linear equation. aX + bY = c (where a, b, c are given integers). (2) Pythagorean triples. X 2 + Y 2 = Z 2. (3) Fermat’s equation. X n + Y n = Z n (where n ≥ 3 is a specified integer). (4) Sums of four squares. X12 + X22 + X32 + X42 = n (where n > 0 is a specified integer). (5) An instance of Pell’s equation. X 2 − 409 Y 2 = 1. (6) The “rational box”. X 2 + Y 2 = U 2, X 2 + Z 2 = V 2,

Y 2 + Z 2 = W 2,

(7) An elliptic curve. Y 2 = X 3 + 7823. (8) A generalized Fermat equation. X 2 + Y 3 = Z 7.

X 2 + Y 2 + Z 2 = T 2.

2

In example (7), we are interested in rational solutions, whereas in examples (1), (4) and (5), we are interested in integral solutions. For examples (2), (3) and (6), it doesn’t really matter, since we can scale solutions by a common factor. In these cases, we will be interested in primitive integral solutions, i.e., solutions in integers without common factor. For example, (X, Y, Z) = (3, −4, 5) is a primitive integral solution of X 2 + Y 2 = Z 2 , while (15, 36, 39) is not. In these “homogeneous” cases (where in every equation, all terms have the same total degree), we only consider non-trivial solutions — the trivial solution is when all variables take the value zero (the trivial solution is obviously not primitive, so it is ruled out automatically when we restrict to primitive integral solutions). In example (8), we are also interested in primitive integral solutions (for less obvious, but very good reasons). Now, associated to such an equation or system of equations, there are two basic problems. (1) Are there (rational/integral/primitive integral) solutions at all? (2) If so, can we describe them all? Is there anything we can say in general about these problems? Yes, there is. The question whether there exists an algorithm to solve the first problem in the case of integral solutions is known as “Hilbert’s tenth problem”. David Hilbert, who was a famous German mathematician and one of the last people who had an overview over mathematics as a whole, gave a famous talk at the International Congress of Mathematicians in 1900, where he listed 23 problems, whose solution he thought would move mathematics ahead. His tenth problem was eventually solved in the 1970s, but the solution was negative: there is no such procedure. Interestingly, Matijasevich, who finished the proof, used the theory of the Pell Equation in his work. (NB: This might be a term paper project!) In the case of rational solutions, the problem is open; it is the fundamental open problem in the area. It is likely that at least for certain classes of equations (for example, equations in two variables), there is an algorithm, but for the general problem, it is hard to judge. The second problem is, of course, even harder. We may easily spot one (or several) solutions, but to find them all (or to prove that there are no others besides the ones we found) can be really difficult. Of course, if the answer to the first problem is “no”, then the second is solved at the same time! To finish this introduction, let me give you some indications about the answers to the two problems for the examples I gave earlier. (1) aX + bY = c (where a, b, c are given integers). This has solutions if and only if c is a multiple of the gcd of a and b. In this case, we can find one solution (x0 , y0 ) by the Euclidean algorithm, and all solutions are then given by (x, y) = (x0 − bt, y0 + at) for t ∈ Z. (2) X 2 + Y 2 = Z 2 . There is a well-known parametrization of the primitive integral solutions, which we will derive soon in various ways. (3) X n + Y n = Z n (where n ≥ 3 is a specified integer). The only solutions are those where one of the variables vanishes. This is the statement of “Fermat’s Last Theorem”, which only became a theorem about ten years ago.

3

(4) X12 + X22 + X32 + X42 = n (where n > 0 is a specified integer). Lagrange proved that this always has solutions. There is also a formula for the number of solutions for any given n; it is ( ) X 8 if n is odd · d 24 if n is even d|n,d odd (5) X 2 − 409 Y 2 = 1. Besides the obvious solutions with Y = 0, this equation has in fact infinitely many integral solutions. They are all “generated” (by a recurrence formula) by the rather large basic solution X = 1238789998647218582160 ,

Y = 25052977273092427986049 .

(6) X 2 + Y 2 = U 2 , X 2 + Z 2 = V 2 , Y 2 + Z 2 = W 2 , X 2 + Y 2 + Z 2 = T 2 . Whether this has any nontrivial solution is an open problem. (There are solutions known when one of the variables is allowed to take an irrational value.) The main reason why this problem is hard is that the associated geometric object is a surface, whereas in most of the cases that can be solved, it is a curve. (7) Y 2 = X 3 + 7823. This has infinitely many rational solutions, again “generated” by a basic solution, which is 2263582143321421502100209233517777 X= 119816734100955612 186398152584623305624837551485596770028144776655756 Y = 119816734100955613 (Found be me a few years ago.) (8) X 2 + Y 3 = Z 7 . This has exactly the following primitive integral solutions. (±1, −1, 0),

(±1, 0, 1),

(±2213459, 1414, 65),

±(0, 1, 1),

(±3, −2, 1),

(±15312283, 9262, 113),

(±71, −17, 2),

(±21063928, −76271, 17) .

This is a recent result of myself together with Bjorn Poonen and Ed Schaefer. 2. Pythagorean Triples Let us start with the rather well-known case of Pythagorean Triples to show various methods to solve diophantine equations. First Approach: Algebraic/Arithmetic. We first remind ourselves that we are looking for primitive integral solutions of our equation X2 + Y 2 = Z2 . This implies that X, Y , Z cannot be all even. Can they all be odd? No, since then the left hand side is even, but the right hand side is odd. Arguing like this, we see that two of the variables have to be odd, the last one even. Now I claim that Z must be odd. Otherwise X and Y are both odd, and then X 2 and Y 2 are both of the form 4k + 1, so their sum is of the form 4k + 2; however, Z 2 is divisible by 4, a contradiction.

4

So we can assume that X is even, Y and Z are odd (but we have to bear in mind that there is a symmetric case with Y even). Now we write X 2 = Z 2 − Y 2 = (Z − Y )(Z + Y ) and observe that each of the two factors on the right is even: Z − Y = 2U ,

Z + Y = 2V .

We can also set X = 2W . This leads to W2 = UV , and I claim that U and V have no common divisors. For any common divisor d would also divide Y = V − U and Z = V + U and thence X, contradicting the assumption that the solution is primitive. Now we need the following key fact. Lemma. If a and b are coprime integers such that ab = c2 is a square, then either a = u2 , b = v 2 , or a = −u2 , b = −v 2 with suitable integers u and v. Proof. We consider the prime factorizations of a, b and c2 . In the latter, all primes occur with an even exponent. Now, since a and b do not have a common prime divisor, any prime p occurring in c can only occur either in a or in b (but not in both). So the power of p dividing whichever of a or b must be the same as the one dividing c2 , hence the exponent is even. So all primes dividing a or b occur in the factorization with an even exponent; this implies a = ±u2 and b = ±v 2 with integers u and v. Finally, c2 is positive, so a and b must have the same sign.  Note that this proof is based on two facts. (1) The ring Z of integers has unique factorization: every integer can be written as a unit times a product of prime numbers, which is uniquely defined up to the ordering of the factors. (This is the so-called “fundamental theorem of arithmetic”.) (2) Z has just the two units 1 and −1. (A unit is an element that has a multiplicative inverse in the ring.) Let me note here that this fairly innocent-looking lemma is in fact very strong and (together with its generalizations) can be used to deal with many diophantine problems. For example, this is true for Example (7) above. Let us now use this lemma for the solution of our equation. We had rewritten it in the form W 2 = U V , where U and V are coprime. Applying the lemma, we find integers S and T such that U = S2 ,

V = T2

or U = −S 2 ,

V = −T 2 .

Going back to the original variables, we find Y = T 2 − S2 ,

Z = T 2 + S2 ,

X = 2T S

or Y = S 2 − T 2 , Z = −(S 2 + T 2 ) , X = 2ST . In each case, S and T must be of different parity in order to get a primitive solution. For the complete solution, we have to add another pair of such formulas, where the roles of X and Y are switched.

5

Second Approach: Geometric. Now let us prove the same result in a rather different way. We first observe that Z 6= 0 in any primitive solution, which allows us to divide the equation by Z 2 . Setting x = X/Z and y = Y /Z, we obtain x2 + y 2 = 1 , and we are now looking for rational solutions. You certainly know from school that this equation has a geometric meaning: it characterizes those points (x, y) in the cartesian plane whose distance from the origin is 1, i.e., the points on the unit circle. In this view, we are asking about the rational points on the unit circle, i.e., the points with rational coordinates. Now there are four easy points to spot: (1, 0), (0, 1), (−1, 0), (0, −1). Can we use this to find more? The idea is to use lines. Fix one of the points above, say P = (−1, 0). If Q = (x, y) is any other rational point on the circle, then the line joining P and Q will have rational slope y/(x + 1) (and intercept, which happens to be the same number). Conversely, if we consider the line through P with rational slope t, then it will intersect the circle again at another rational point. Let’s do the math: The equation of the line is y = t(x + 1). Plugging this into the equation of the circle, we get x2 + t2 (x + 1)2 = 1 or (1 + t2 )x2 + 2t2 x + t2 − 1 = 0 . Now we know one solution of this, namely x = −1, corresponding to P . Dividing by x + 1, we get (1 + t2 )x + t2 − 1 = 0 ,

and so x =

1 − t2 . 1 + t2

In this way, we get a bijection between the rational points on the circle except P and the rational numbers: Q ←→ {(x, y) ∈ Q2 : x2 + y 2 = 1, x 6= −1}  1−t2 t 7−→ , 2t 1+t2 1+t2 y ←− (x, y) x+1 The missing point P on the right can be considered as corresponding to the “limiting case” t → ∞. To translate this back into primitive integral solutions to our original equation in X, Y and Z, we have to look at numerator and denominator of x and y. We write t = V /U as a fraction in lowest terms; then we have X U2 − V 2 = 2 , Z U +V2

Y 2U V = 2 . Z U +V2

Now if U and V are not both odd (so they are of different parity, since they cannot be both even either), then U 2 − V 2 and U 2 + V 2 are coprime, and we get, taking Z to be positive, X = U2 − V 2 ,

Y = 2U V ,

Z = U2 + V 2

as before. If U and V are both odd, then we can write U + V = 2S ,

U − V = 2T ;

6

this gives U 2 − V 2 = 4ST , U 2 + V 2 = (S + T )2 + (S − T )2 = 2(S 2 + T 2 ) and 2U V = 2(S 2 − T 2 ), hence 2ST Y S2 − T 2 X = 2 , = . Z S + T2 Z S2 + T 2 Since U = S + T is odd, S and T are of different parity, and the fractions on the right are in lowest terms. This gives the other parametrization, taking care of the case that X is even. Note that we get the point P by taking U = 0, V = 1, which corresponds to “t = 1/0 = ∞”. 3. Ternary Quadratic Forms and Lattice Points Ternary Quadratic Forms. We now want to generalize the Pythagorean Triples example to more general equations of the shape Q(X, Y, Z) = a X 2 + b Y 2 + c Z 2 = 0 . We again observe that if (X, Y, Z) is a nontrivial solution, then also (λX, λY, λZ) is a nontrivial solution, for any λ 6= 0. So we can either restrict to primitive integral solutions as before, or we can consider all these scaled solutions as being essentially the same solution, and therefore ask for solutions up to scaling. If we do this, it does not make a difference whether we ask for integral or for rational solutions. Given an equation like the one above, there are three problems one might want to solve. (1) Does there exist a (nontrivial integral or rational) solution? (2) If one exists, find one! (3) Describe all solutions, given one solution! We have seen how the last problem can be solved using geometry in the special case X 2 + Y 2 − Z 2 = 0. The only property of the equation we need for this is that it is of degree 2. So this procedure readily generalizes. As an example, consider X 2 + 2 Y 2 = 17 Z 2 . After a few tries, we find a solution (X, Y, Z) = (3, 2, 1). We divide by Z 2 and obtain (again writing x = X/Z and y = Y /Z) the equation of an ellipse, x2 + 2 y 2 = 17 , which has a rational point P = (3, 2). Taking the line through P of slope t, with equation y = t(x − 3) + 2, and computing the other point of intersection with the ellipse, we get: x2 + 2(t(x − 3) + 2)2 − 17 = 0 (1 + 2t2 )x2 + 4t(2 − 3t)x + (18t2 − 24t − 9) = 0 (x − 3)((1 + 2t2 )x + (−6t2 + 8t + 3)) = 0 and so

6t2 − 8t − 3 −4t2 − 6t + 2 y = t(x − 3) + 2 = . 2t2 + 1 2t2 + 1 In this way, we get again a bijection between the rational points on the ellipse (which are in bijection with integral/rational solutions to the original equation, x=

7

up to scaling) and the set of rational numbers, plus “infinity” (which corresponds to the point (x, y) = (3, −2)). (Question: which value of t corresponds to the point P ?) Before we look at the first problem, let us make some reductions. We assume that a, b, c are nonzero integers. Now if a, b, c have a common divisor d, we can simply divide the equation by d and thus obtain a “smaller” equation. But we can also do something if just two coefficients, a and b, say, have a common divisor d: write a = a0 d, b = b0 d and multiply the equation by d; then you get a0 d2 X 2 + b0 d2 Y 2 + cd Z 2 = a0 (dX)2 + b0 (dY )2 + cd Z 2 = 0 . So, scaling the variables appropriately, we obtain the smaller equation a0 X 2 + b0 Y 2 + cd Z 2 (if we measure size by the absolute value of the product of the coefficients). Continuing in this way, we arrive at an equation where the coefficients a, b, c are coprime in pairs. Finally, again by scaling the variables, we can get rid of any square factors in the coefficients; therefore we can also assume that a, b, c are squarefree (i.e., not divisible by a square > 1). Now, coming back to the first problem, we can easily prove that solutions exist by exhibiting one. On the other hand, how can we prove that there are no solutions? Consider for example the case that all of a, b, c are positive. Then there is obviously no nontrivial solution, since the expression on the left hand side will always be positive: there are not even nontrivial solutions in real numbers. On the other hand, we can also use divisibility arguments. Take for example X2 + Y 2 − 3 Z2 = 0 . There are certainly solutions in real numbers here, but still there are no integral solutions. For assume (X, Y, Z) is an integral solution; then by scaling, we can also assume that not all of X, Y, Z are divisible by 3. From the equation, we get that 3 divides X 2 + Y 2 , and since X 2 is either of the form 3k or of the form 3k + 1, this is only possible when 3 divides both X and Y . But then 9 divides 3 Z 2 , and so 3 also divides Z, a contradiction. So no solution can exist. (Exercise: prove this by an argument using divisibility by 4!) What necessary conditions do we get in this way? Proposition.

If there is a nontrivial integral or rational solution of a X2 + b Y 2 + c Z2 = 0 ,

where a, b, c are nonzero integers, coprime in pairs and squarefree, then a, b, c cannot all have the same sign. Furthermore, there exist integers u, v, w such that a | u2 + bc ,

b | v 2 + ac ,

c | w2 + ab .

Proof. The first assertion is clear. For the other statement, assume (without loss of generality) that (X, Y, Z) is a primitive integral solution. Then a divides (bY )2 + bcZ 2 . If Z and a had a common prime divisor p, then p would have to divide bY and hence Y (since gcd(a, b) = 1). But then p2 will divide aX 2 , hence (since a is squarefree, so p2 - a) p also divides X, contradicting the assumption that (X, Y, Z) is a primitive solution. So we have gcd(a, Z) = 1; this implies that there is an integer r such that a | rZ − 1. Taking u = rbY , we see that a | u2 + bc. The existence of v and w is proved in the same way. 

8

Note that we can efficiently check these necessary conditions if we know the prime factorizations of a, b, and c. (For all prime divisors p of a, we need to check that −bc is a square mod p, etc. If we want to find u, v, w, we have to extract square roots modulo primes and then combine the results using the Chinese Remainder Theorem.) The surprising fact now is that these conditions are already sufficient! Theorem (Legendre). If a, b, c are nonzero integers, squarefree and coprime in pairs, not all of the same sign, and if there exist integers u, v, w such that a | u2 + bc ,

b | v 2 + ac ,

c | w2 + ab ,

then a X 2 + b Y 2 + c Z 2 = 0 has a nontrivial integral solution. The basic idea of the proof is the following. We want to show that (for a suitable choice of (X, Y, Z) 6= (0, 0, 0)), a certain integer z = a X2 + b Y 2 + c Z2 is zero. This can be done by showing two properties of z: (1) z is a multiple of d for some non-zero integer d; (2) |z| < |d| for the same d. In the proof, we will first construct a suitable subset of Z3 such that its elements (x, y, z) satisfy 2abc | a x2 + b y 2 + c z 2 . Then we will use a nice theorem to conclude that for at least one non-zero element, we also have |a x2 + b y 2 + c z 2 | < 2|abc| . Lattices and Minkowski’s Theorem. What is a lattice? For our purposes, we will use the following definition (the usual definition is a bit more general). A lattice in Rn is a subgroup Λ of finite index in Zn . (The “finite index” condition is equivalent to saying that there is some integer N ≥ 1 such that (N Z)n ⊂ Λ.) The index ∆(Λ) = (Zn : Λ) = #(Zn /Λ) is called the covolume of Λ. Examples of lattices are Zn itself or the “even” lattice Λ = {(x, y, z) ∈ Z3 : x + y + z is even} with ∆(Λ) = 2. More generally, we have the following. Lemma. (1) If gcd(a, r, s, t) = 1 and a 6= 0, then Λ = {(x, y, z) ∈ Z3 : a | rx + sy + tz} is a lattice of covolume |a|. (2) If Λ1 , Λ2 ⊂ Zn are lattices in Rn , then Λ = Λ1 ∩ Λ2 is again a lattice. If we also have that gcd(∆(Λ1 ), ∆(Λ2 )) = 1, then ∆(Λ) = ∆(Λ1 )∆(Λ2 ). Let us use these facts in order to come up with a lattice that will help prove the theorem. Lemma. Under the hypotheses of the theorem, there exists a lattice Λ such that ∆(Λ) = 2|abc| and such that 2abc | a x2 + b y 2 + c z 2

9

for all (x, y, z) ∈ Λ. Proof. For simplicity, we will assume here that abc is odd. The even case is very similar, but a little bit more involved. Let us define four lattices: Λ1 = {(x, y, z) ∈ Z3 : a | by − uz} Λ2 = {(x, y, z) ∈ Z3 : b | cz − vx} Λ3 = {(x, y, z) ∈ Z3 : c | ax − wy} Λ4 = {(x, y, z) ∈ Z3 : 2 | x + y + z} By the first part of the lemma, we have ∆(Λ1 ) = |a|, ∆(Λ2 ) = |b|, ∆(Λ3 ) = |c|, and ∆(Λ4 ) = 2. So if we set Λ = Λ1 ∩ Λ2 ∩ Λ3 ∩ Λ4 , we get by repeated application of the second part of the lemma that ∆(Λ) = 2|abc|. It remains to show that the divisibility statement holds. Let (x, y, z) ∈ Λ1 . Then we have the following chain of implications. a | u2 + bc =⇒ a | (u2 + bc)z 2 =⇒ a | (uz)2 + bc z 2 =⇒ a | (by)2 + bc z 2 =⇒ a | b(b y 2 + c z 2 ) =⇒ a | b y 2 + c z 2 =⇒ a | a x2 + b y 2 + c z 2 Similarly, we see that for (x, y, z) ∈ Λ2 (resp. Λ3 ), we have that b (resp. c) divides ax2 + by 2 + cz 2 . Since a, b and c are odd, we also see that ax2 + by 2 + cz 2 is always even for (x, y, z) ∈ Λ4 . Taking all of this together, for (x, y, z) ∈ Λ, the value ax2 + by 2 + cz 2 is divisible by a, b, c and 2. Since these four numbers are relatively prime in pairs, it follows that 2abc divides ax2 + by 2 + cz 2 .  Now we want to prove that we can find (x, y, z) such that ax2 +by 2 +cz 2 is actually zero. We know that we can at least arrange to have the value divisible by 2abc. So in order to ensure to get a zero value, it suffices to show that there is some (0, 0, 0) 6= (x, y, z) ∈ Λ such that |ax2 + by 2 + cz 2 | < 2abc. The following theorem will allow us to produce that such an element. Theorem (Minkowski). Let Λ ⊂ Rn be a lattice of covolume ∆(Λ), and let S ⊂ Rn be convex and symmetric (meaning that if s ∈ S, we also have −s ∈ S), such that vol S > 2n ∆(Λ). Then S contains a non-zero lattice point from Λ. In order to use this result for the proof of Legendre’s Theorem, we have to come up with a sufficiently large convex symmetric set that produces the bound on |ax2 + by 2 + cz 2 |. At this point, we have to use the “not all signs equal” condition. Let us assume (without loss of generality) that a, b > 0 and c < 0. Then we can take S to be the elliptic cylinder S = {(x, y, z) ∈ R3 : ax2 + by 2 < 2|abc| ,

z 2 < 2ab} .

Let us check the volume condition. We have √ 2π|abc| √ vol S = √ 2 2ab = 4 2π|abc| > 16|abc| = 23 ∆(Λ) , ab so this is OK, and Minkowski’s Theorem tells us that there is some (0, 0, 0) 6= (x, y, z) ∈ S ∩ Λ. For this (x, y, z), we have on the one hand that 2abc | ax2 + by 2 + cz 2 .

10

On the other hand, by the definition of S, 0 ≤ ax2 + by 2 < 2|abc| and 0 ≥ cz 2 > −2|abc| , so |ax2 + by 2 + cz 2 | < 2|abc| . Together, these conditions imply that ax2 + by 2 + cz 2 = 0, finishing the proof. Here is a sketch of how you prove Minkowski’s Theorem. The first step is to consider X = 21 S (i.e., the set S shrunk by a factor of 1/2) and to show that X meets one of its translates X + λ for some λ ∈ Λ \ {0}. The idea for this is to note that vol X = 2−n vol S > ∆(Λ), and so if all translates X +λ (λ ∈ Λ) were pairwise disjoint, then the union would be “too large”. More precisely, a big hypercube of sidelength N contains about N n /∆(Λ) lattice points, and so it contains about this number of translated copies of X. If they are disjoint, the volume of their union would be approximately vol X · N n /∆(Λ) > CN n with some C > 1. On the other hand, the volume of the hypercube is only N n , leading to a contradiction. (To make this rigorous, one has to estimate the errors due to boundary effects, but this can easily be done.) This shows that there are distinct λ1 , λ2 ∈ Λ such that (X + λ1 ) ∩ (X + λ2 ) 6= ∅. But then (translating by −λ1 ), we also have that X ∩ (X + λ) 6= ∅ with λ = λ2 − λ1 . Concretely, this means that there are x, y ∈ X such that x = y+λ, or x−y = λ 6= 0. Now S = 2X, hence 2x ∈ S, 2y ∈ S, then (since S is symmetric) −2y ∈ S, and finally λ = 21 (2x + (−2y)) ∈ S (since S is convex: with two points, it contains the midpoint of the line segment joining them). 4. Sums of Four Squares In this section, we want to state and prove Lagrange’s Theorem on sums of four squares. Theorem (Lagrange). Every positive integer n is the sum of (at most) four integer squares. For example, 1 = 12 ,

2 = 12 + 12 ,

5 = 12 + 22 ,

3 = 12 + 12 + 12 ,

6 = 12 + 12 + 22 ,

9 = 12 + 22 + 22 = 32 ,

4 = 12 + 12 + 12 + 12 = 22 ,

7 = 12 + 12 + 12 + 22 ,

8 = 22 + 22 ,

10 = 12 + 12 + 22 + 22 = 12 + 32 , . . .

Note that 7 cannot be written as a sum of less than four squares, so four squares will be necessary in general. How can we prove a statement like this? The proof we will see has two main parts. The first part uses a multiplicative property of sums of four squares to reduce to the case that n is a prime number p. The second part is somewhat in the spirit of the proof of Legendre’s Theorem: we set up a suitable lattice that gives us divisibility by p and then use Minkowski’s Theorem to deduce that some lattice point does what we want. Let us look at the first part. The result we need is as follows. Lemma. If m and n are both sums of four squares, then so is mn.

11

Proof. Suppose that m = x21 + x22 + x23 + x24

and n = y12 + y22 + y32 + y42 .

Then one can easily check that mn = (x1 y1 − x2 y2 − x3 y3 − x4 y4 )2 + (x1 y2 + x2 y1 + x3 y4 − x4 y3 )2 + (x1 y3 − x2 y4 + x3 y1 + x4 y2 )2 + (x1 y4 + x2 y3 − x3 y2 + x4 y1 )2 .  Remark. What lies behind this is the existence of the skew field of quaternions, H = R + Ri + Rj + Rk where i2 = j 2 = k 2 = ijk = −1; in particular, (a + bi + cj + dk)(a − bi − cj − dk) = a2 + b2 + c2 + d2 . By the lemma, using induction and the prime factorization of positive integers, we see that it suffices to show that every prime number p is a sum of four squares. The first step in approaching this is to show that at least a number divisible by p can be written in this way; more precisely, we show the following. Lemma. Let p be a prime number. Then there are integers u and v such that p divides 1 + u2 + v 2 . Proof. This is clear when p = 2 (take u = 1, v = 0), so we can assume that p is odd. The best way of looking at this is to compute with residue classes modulo p: we identify integers whose difference is a multiple of p. In this way, we get p different classes of integers (each comprising integers which leave the same remainder when divided by p), and we can add, subtract and multiply them by working with representative elements. Now the fact is: There are exactly (p + 1)/2 residue classes that are squares (of residue classes). To see this, we note that the classes of k and −k have the same square, and the two classes are distinct unless both are the “zero” class of numbers divisible by p. On the other hand, two classes having the same square have to be identical or negatives of each other. (If p divides x2 − y 2 , then p divides x − y or x + y.) Since there are (p + 1)/2 (unordered) pairs consisting of a class and its negative, the claim follows. Using this, we see that there are (p + 1)/2 classes of the form 1 + u2 and also (p + 1)/2 classes of the form −v 2 . Since there are only p classes altogether, there must be one class that is common to both sets of classes. But this exactly means that there are integers u and v such that p divides (1+u2 )−(−v 2 ) = 1+u2 +v 2 .  We now use these numbers u and v to define a lattice: Λ = {(x1 , x2 , x3 , x4 ) ∈ Z4 : p | x2 − ux1 − vx4 ,

p | x3 − vx1 + ux4 }

Since there are two independent “mod-p” conditions, we get that ∆(Λ) = p2 . Also, I claim that for every lattice element (x1 , x2 , x3 , x4 ) ∈ Λ, we have that p divides x21 + x22 + x23 + x24 . To see this, note that p | x2 − ux1 − vx4 =⇒ p | x22 − (ux1 + vx4 )2 p | x3 − vx1 + ux4 =⇒ p | x23 − (vx1 − ux4 )2 and so p | x21 + x22 + x23 − x24 ⇐⇒ p | x21 + (ux1 + vx4 )2 + (vx1 − ux4 )2 + x24 ⇐⇒ p | (x21 + x24 )(1 + u2 + v 2 ) ,

12

which is true. Now we know that the sum of squares is divisible by p on Λ, we only need to know that Λ contains a sufficiently “small” element. We take S = {(x1 , x2 , x3 , x4 ) ∈ R4 : x21 + x22 + x23 + x24 < 2p} ; √ this is the open four-dimensional ball of radius 2p. What is its volume? Well, Z vol S = dx1 dx2 dx3 dx4 x21 +x22 +x23 +x24 <2p

Z



=

Z



dx3 dx4 dx1 dx2

x21 +x22 <2p x23 +x24 <2p−x21 −x22 √

Z 2p (2p − r2 ) r dr π(2p − x21 − x22 ) dx1 dx2 = 2π 2

Z =

0

x21 +x22 <2p



= 2π 2 pr2 −

√ r4  2p



= 2π 2 (2p2 − p2 ) = 2π 2 p2 .

4 0 Since π > 8, this is larger than 24 ∆(Λ) = 16p2 , so by Minkowski’s Theorem, there is some 0 6= (x1 , x2 , x3 , x4 ) ∈ S ∩ Λ. Then we have that 2

p | x21 + x22 + x23 + x24

and 0 < x21 + x22 + x23 + x24 < 2p ,

hence x21 + x22 + x23 + x24 = p, as was to be shown. 5. Pell’s Equation In this section, we will discuss the equation X2 − d Y 2 = 1 , to be solved in integers. Here, d > 1 is a squarefree integer. Let us look at some examples. For d = 2, we have the following solutions with x, y ≥ 0. x 1 3 17 99 577 3363 19601 y 0 2 12 70 408 2378 13860 For d = 3, we have x 1 2 7 26 97 362 1351 y 0 1 4 15 56 209 780 And for d = 5, we get x 1 9 161 2889 51841 930249 y 0 4 72 1292 23184 416020 We observe that there are nontrivial solutions, and also that they grow quite fast (the number of digits seems to grow linearly, so the numbers grow exponentially). The sequence of solutions seems to go on, so it appears that there are infinitely many solutions. Let us first prove an important fact about the structure of the solution set. More generally, we have the following.

13

Lemma. Let u, v, x, y be numbers such that u2 − dv 2 = a, x2 − dy 2 = b. Then (ux + d vy)2 − d (uy + vx)2 = ab . Proof. Simple calculation.

 √ What is behind this is the following. We consider numbers of the form x + y d with x, y ∈ Z. The map √ √ √ N : x + y d 7−→ (x + y d)(x − y d) = x2 − dy 2 √ is called the norm (on the ring Z[ d] whose elements are these numbers). The lemma then says that the norm is multiplicative, i.e. N (αβ) = N (α)N (β) — note that √ √ √ (u + v d)(x + y d) = (ux + dvy) + (uy + vx) d . Now let Sd = {(x, y) ∈ Z2 : x2 − d y 2 = 1} be the solution set we are interested in. Then the lemma implies the following. Proposition. The set Sd is an abelian group with zero element (1, 0) and “addition” (x, y) + (x0 , y 0 ) = (xx0 + d yy 0 , xy 0 + x0 y) . The subset Sd+ of solutions with x > 0 forms a subgroup of index 2. When Sd+ contains more than one element, then there is a fundamental solution with x, y > 0 and x as small as possible, and all other solutions in Sd+ are integral multiples of the fundamental solution (and all these integral multiples are distinct). Proof. By the lemma, the addition as defined, applied to elements of Sd , yields solutions again. The group axioms are easily verified (note that the negative is −(x, y) = (x, −y)). In the context of the map φ : Sd 3 (x, y) 7−→ x + y



d ∈ R× ,

we get √ a group homomorphism. The map is injective — note that 1/φ(x, y) √ = x − y d, and so x = (φ(x, y) + 1/φ(x, y))/2 and y = (φ(x, y) − 1/φ(x, y))/(2 d). Under√φ, Sd+ is the preimage of the positive reals — we have dy 2 = x2 + 1, so |y| d < |x|, hence the sign of φ(x, y) is that of x. Since the positive reals form a subgroup of the multiplicative group of nonzero reals, Sd+ is a subgroup of Sd . Its index is 2, since the index of the positive reals is 2, and for example φ(−1, 0) = −1 < 0. Now assume that Sd+ 6= {(1, 0)}. Then among all solutions with x, y > 0, there will be one with smallest x. Call it (x1 , y1 ). This is the fundamental solution (and it is uniquely determined, since for any x, there are at most two values of y satisfying the equation, and only one can be positive). I claim that φ(x1 , y1 ) is the smallest element in φ(Sd ) that is greater than 1. This follows from √ x + y d > 1 ⇐⇒ x, y > 0 for all (x, y) ∈ Sd (by the formulas for x and y above) and from φ(x0 , y 0 ) > φ(x, y) if x0 > x > 0 and y, y 0 > 0 for all (x, y), (x0 , y 0 ) ∈ Sd . Since α := φ(x1 , y1 ) > 1, all its powers are distinct, hence so are the integral multiples of (x1 , y1 ) in the group structure of Sd .

14

Now assume that there is some (x, y) ∈ Sd+ that is not an integral multiple of (x1 , y1 ). Then there is a unique n ∈ Z such that αn < β := φ(x, y) < αn+1 . But then (x, y) − n(x1 , y1 ) ∈ Sd+ maps to βα−n under φ, which is strictly between 1 and α, contradicting the choice of α = φ(x1 , y1 ).  In more concrete terms, this means that if there are nontrivial solutions, then there is a smallest positive fundamental solution (x1 , y1 ), and all nontrivial solutions are then of the form (±xn , ±yn ), where for n ≥ 1, xn+1 = x1 xn + d y1 yn ,

yn+1 = y1 xn + x1 yn .

Equivalently,

√ √ xn + yn d = (x1 + y1 d)n . Note (exercise!) that xn is a polynomial of degree n in x1 alone, independent of d, e.g., x2 = 2x21 − 1, x3 = 4x31 − 3x1 .

The next question, of course, is whether there are indeed nontrivial solutions. The answer is: Theorem. For every squarefree integer d > 1, the equation X 2 − d Y 2 = 1 has infinitely many nontrivial solutions. They are obtained from the smallest positive solution in the way described above. In order to prove that theorem, we have to look at how well one can approximate irrational numbers by rational ones. This is quite natural in the context of Pell’s Equation, √ since for any (positive) solution (x, y), the fraction x/y must be quite close to d:  x 2 1 −d= 2 , y y so x √ 1 1 √ < √ . − d = 2 y y (x/y + d) 2 d y2 Can we get such good approximations? Lemma. Let α ∈ R \ Q be an irrational real number. Then there are infinitely many fractions p/q ∈ Q such that p 1 − α < 2 . q q Proof. The proof is combinatorial. We will in fact show that for every n ≥ 1, there is a fraction p/q with q ≤ n such that |p/q − α| < 1/qn. This implies the statement of the lemma. Define the fractional part of a real number x to be hxi = x − bxc (where bxc is the largest integer ≤ x); we always have 0 ≤ hxi < 1 and x − hxi ∈ Z. Consider the n+1 numbers 0, hαi, h2αi, . . . hnαi in the interval [0, 1). We subdivide this interval into n subintervals [0, 1/n), [1/n, 2/n), . . . , [(n − 1)/n, 1). Then there must be two of the n + 1 numbers falling into the same subinterval, therefore we have |hlαi − hkαi| < 1/n for some 0 ≤ k < l ≤ n. Let m = blαc − bkαc, then we get m 1 1 |(l − k)α − m| < =⇒ − α < , n l−k (l − k)n and 0 < l − k ≤ n. 

15

Now from this we obtain at least the following consequence. Lemma. There are infinitely many pairs (x, y) of positive integers such that √ |x2 − d y 2 | < 2 d + 1 . Proof. By the previous lemma, there are infinitely many pairs (x, y) √ of positive √ 2 integers such that |x/y √ − d|√< 1/y . Take such a pair. Then |x − y d| < 1/y and therefore |x + y d| < 2y d + 1/y. Multiplying, we get √ √ √ 1 d + 1/y 2y = 2 d + 2 ≤ 2 d + 1. |x2 − d y 2 | < y y  √ √ There are only finitely many integers between −2 d − 1 and 2 d + 1, so for some of them, m say, there must be infinitely many pairs (x, y) such that x2 − dy 2 = m. Note that m 6= 0, since d is not a square. Now there are only finitely many residue classes modulo m, and hence there must be two distinct pairs (x, y) and (u, v) such that x2 − d y 2 = u2 − d v 2 = m and m | x − u , m | y − v . Then, “adding” (x, −y) and (u, v), we have (xu − dyv)2 − d(xv − yu)2 = m2 , but also that m divides xu − dyv and xv − yu: xu − dyv ≡ x2 − dy 2 = m ≡ 0 mod m ,

xv − yu ≡ xy − xy = 0 mod m .

On the other hand, since (x, y) and (u, v) are distinct, xv − yu 6= 0. So, setting s = (xu − dyv)/m and t = (xv − yu)/m, we have found a nontrivial solution of our equation X 2 − d Y 2 = 1. Now we know that there always is a fundamental solution. But how can we find it? The answer is provided by continued fractions. Continued fractions provide a method of constructing good rational approximations. The continued fraction expansion of an irrational real number α is obtained as follows. Let α0 = α, and then for n ≥ 0, 1 1 an = bαn c , αn+1 = = . αn − an hαn i Note that for n ≥ 1, αn > 1, and so an ≥ 1. In addition, define p−2 = 0 , q−2 = 1 , p−1 = 1 , q−1 = 0 pn+1 = an pn + pn−1 , qn+1 = an qn + qn−1 . Then for n ≥ 0, pn = a0 + qn

1

=: [a0 ; a1 , a2 , . . . , an ] ,

1

a1 + a2 +

1

1 an and these fractions are in a certain sense the “best” rational approximations to α. In particular, we have p 1 n − α < 2 . qn qn ··· +

16

√ Fact. If α = d with d > 1 squarefree, then the sequence a1 , a2 , . . . is periodic, i.e., there is some (smallest) k ≥ 1 such that for all n ≥ 1, one has an+k = an . We √ can recognize k by the property that it is the first n ≥ 1 such that αn = m + d for some m ∈ Z. Fact. Let k be the period as above, then x = pk−1 , y = qk−1 is the smallest positive solution to X 2 − d Y 2 = ±1. If the sign is positive (this is the case when k is even), then it is the fundamental solution. If the sign is negative (when k is odd), then its “double” (2x2 + 1, 2xy) = (p2k−1 , q2k−1 ) is the fundamental solution of X 2 − d Y 2 = 1. Example. Let us compute the fundamental solution of X 2 − 23 Y 2 = 1. We find n 0 1 2 3 4

αn

1 23 − 4 7 √ 23 − 3 2 √ 23 − 3 7 √ 23 − 4 √

= = = =



23 √ 23 + 4 7 √ 23 + 3 2 √ 23 + 3 7 √ 23 + 4

an p n q n 4 4 1 1

5

1

3

19

4

1

24

5

So k = 4, and x = 24, y = 5 is the fundamental solution. Some Ideas for Term Paper Projects (1) Look at, understand and perhaps implement the algorithm described by Cremona and Rusin (paper obtainable from me) for solving diagonal ternary quadratic equations aX 2 + bY 2 + cZ 2 = 0. (2) Learn about quadratic reciprocity and present one proof of the Law of Quadratic Reciprocity. See for example the notes for my “Introductory Number Theory” class (you can find them under “Teaching at IUB”/“IntroNumTh2004-Spring” on my home page), or Ireland-Rosen, § 5, or Nathanson, § 3. (3) Learn about the “proof by descent” of Lagrange’s Theorem (my notes [see above] or Ireland-Rosen, § 17.7). Same for representing p ≡ 1 mod 4 as a sum of two squares, plus some other examples of “descent” (e.g., X 4 ± Y 4 = Z 2 ). (4) Learn about the formula for the number of ways to represent a positive integer as a sum of four squares. (Ireland-Rosen, § 17.7) (Either use the formula for two squares, or understand its proof, too.) (5) Pell’s Equation and Hilbert’s Tenth Problem. (Chapter 2 of Matiyasevich’s book. Somewhat involved but elementary.) (6) Continued fractions and Pell’s Equation. (Chapter 7 of Bressoud-Wagon, for example.)

Related Documents

Diophantine Equations
May 2020 14
Equations
November 2019 31
Equations
October 2019 28
Equations
July 2020 19

More Documents from ""