Proof sketch of Thue’s theorem

In this post a proof of the following theorem is going to be sketched, following the treatment in Borevich and Shafarevich’s Number Theory. This sketch is by no means meant to be highly detailed and I am writing it mostly for my own purposes, so I avoid proving some things, even if they aren’t that straightforward.

Thue’s theorem: Suppose \(f(x,y)=a_0x^n+a_1x^{n-1}y+\dots+a_ny^n\) is a binary form which has degree \(n\geq 3\), is irreducible (i.e. \(f(x,1)\) is an irreducible polynomial in \(x\)) and \(f(x,1)\) has at least one nonreal root in \(\mathbb C\). Then for any nonzero integer \(c\) the equation \(f(x,y)=c\) has only finitely many integral solutions.

Proof: Suppose otherwise…


First of all, we may suppose \(a_0=1\), for otherwise, we replace \(f(x,y)\) with \(a_0^{n-1}f(\frac{1}{a_0}x,y)\), which still has integer coefficients. Write


The numbers \(\theta_1,\dots,\theta_n\) are all conjugates of \(\theta=\theta_1\), since we assumed \(f(x,1)\) is irreducible. It’s then easy to see

\(f(x,y)=(x+y\theta_1)(x+y\theta_2)\dots(x+y\theta_n)=N(x+y\theta) \qquad (1)\)

where \(N\) is the norm of the field \(k=\mathbb Q(\theta)\). Also put \(K=\mathbb Q(\theta_1,\dots,\theta_n)\). Hence we are interested in the solutions of \(N(\alpha)=c\), where \(\alpha\) is in the module (i.e. the additive subgroup) \(M\) generated by \(1,\theta\). Extend this two-element set to a basis of \(k\) \(\mu_1=1,\mu_2=\theta,\mu_3,\dots,\mu_n\) and denote by \(\overline{M}\) the module generated by these. To recover elements of \(M\) among these, we use the dual basis, i.e. elements \(\mu_1^*,\dots,\mu_n^*\) such that \(T(\mu_i\mu_j^*)=0\) for \(i\neq j\) and \(T(\mu_i\mu_i^*)=1\). Trace of \(\alpha\mu_i^*\) recovers then the coefficient of \(\mu_i\) in \(\alpha\), hence we want

\(T(\alpha\mu_i^*)=0\) for \(i=3,\dots,n\).

A general result (Theorem 1, Section 5.2, Chapter 2 in Borevich-Shafarevich, slightly rephrased) about elements of fixed norm in a module states the following.

Theorem 1: For a module \(\overline{M}\) of rank \(n\) in a field \(k\) of degree \(n\) there are elements \(\gamma_1,\dots,\gamma_k\in\overline{M}\) and \(\varepsilon_1,\dots,\varepsilon_r\in k\) such that every solution of \(N(\alpha)=c,\alpha\in\overline{M}\) can be uniquely written as


Moreover, \(r=s+t-1\), where \(s\) is the number of real embeddings of \(k\) into \(\mathbb C\) and \(2t\) is the number of complex embeddings.

Therefore \(\alpha\) as above is in \(M\) if it satisfies the system of equations

\(T(\gamma_a\mu_i^*\varepsilon_1^{u_1}\dots\varepsilon_r^{u_r})=0\) for \(i=3,\dots,n\).

Since we assume there are infinitely many \(\alpha\) solving the above system, and \(\gamma_a\) ranges over a finite set, we can choose one of the \(\gamma\) such that infinitely many solutions of the above have \(\gamma_a=\gamma\). We can now write this system as

\(\displaystyle\sum_{j=1}^n\sigma_j(\gamma\mu_i^*)\sigma_j(\varepsilon_1)^{u_1}\dots\sigma_j(\varepsilon_r)^{u_r}=0\) for \(i=3,\dots,n\qquad (2)\),

where \(\sigma_j\) are embeddings of \(k\) into \(K\) ordered so that \(\sigma_j(\theta)=\theta_j\).

So now we want to derieve a contradiction from the assumption that \((2)\) has infinitely many solutions in integers \(a_1,\dots, a_r\).

Entering the \(\frak P\)-adic world

The idea now is to prove that \((2)\) not only has finitely many integral solutions, but it has finitely many solutions in \(\frak P\)-adic integers, where \(\frak P\) is some prime of \(K\). More precisely, we take any prime (= prime ideal in the ring of integers) \(\frak P\) and the corresponding valuation \(\nu=\nu_{\frak P}\). Then we construct the completion \(K_{\frak P}\) of \(K\) with respect to this valuation. By a “\(\frak P\)-adic number” we mean any element of \(K_{\frak P}\), and ones with nonnegative valuations are going to be called “\(\frak P\)-adic integers”.

We now want to make sense of equations \((2)\) for \(a_i\) not necessarily integers, but also \(\frak P\)-adic integers. The problem reduces to making sense of \(a^b\) for \(a\) a fixed \(\frak P\)-adic number and \(b\) a \(\frak P\)-adic integer, which is meant to vary. For this, we employ exponential and logarithmic functions: we will write \(a^b=\exp(b\log a)\). \(\exp\) and \(\log\) are defined using their power series:

\(\displaystyle\exp x=\sum_{n=0}^\infty\frac{x^n}{n!}\),


These two functions are each other’s inverses, that is,

\(\exp\log(1+x)=1+x,\log\exp x=x\).

There are many ways to justify this, the most straightforward one being that we know these equalities hold for complex numbers, hence they are formal equalities of power series, hence they must also hold for \(\frak P\)-adic numbers. However, these functions are not defined everywhere. Nevertheless, they can be shown to have positive radius of convergence. More precisely:

Lemma 1: There is a rational integer \(\kappa\) such that both \(\exp x\) and \(\log(1+x)\) are defined for \(\nu(x)\geq\kappa\). Moreover, \(\nu(\log(1+x))\geq\kappa\), so \((1+x)^b=\exp(b\log(1+x))\) is defined for any \(\frak P\)-adic integer \(b\).

Unfortunately, there is no reason to expect numbers \(\varepsilon_i\) suit our purposes. However, we can change them so that this is the case. First of all, we may suppose that \(\frak P\) is such that all of \(\sigma_j(\varepsilon_i)\) have valuation zero (there are finitely many of these numbers, and they have nonzero valuation only with respect to finitely many prime ideals). Now we look at reduction modulo \(\frak P^\kappa\) (or, more precisely, modulo any element with valuation \(\kappa\)). The quotient ring is finite, say it’s of size \(d\). Then \(\varepsilon_i^d\) always is congruent to \(1\) modulo \(\frak P^\kappa\), i.e. \(\varepsilon_i=1+x\) for \(x\) of valuation at least \(\kappa\).

Moreover, we can replace the set of \(\gamma_i\) by products of \(\gamma_i\) and suitable powers of \(\varepsilon_i\). we only need to multiply by powers between \(0\) and \(d-1\). To avoid introducing more notation, we will just assume that \(\varepsilon_i\), and hence also \(\sigma_j(\varepsilon_i)\), are of the form which allows us to speak of their exponential functions.

Analytic fluff

The exponential function on \(\frak P\)-adic numbers satisfies all the familiar properties. Thanks to this, equations \((2)\) can be rewritten as

\(\displaystyle\sum_{j=1}^nA_{ij}\exp L_j(u_1,\dots,u_r)=0\) for \(i=3,\dots,n,\qquad (3)\)

where \(A_{ij}=\sigma_j(\gamma\mu_i^*)\) and \(L_j(u_1,\dots,u_r)=\displaystyle\sum_{k=1}^ru_k\log\sigma_j(\varepsilon_k)\). Note that the involved functions are all continuous functions of \(u_k\).

Now we use the fact that \(\frak P\)-adic integers are compact (under the topology induced by the valuation). Since we assumed \((3)\) has infinitely many (\(\frak P\)-adic) integral solutions, there must be a subsequence of these solutions which converges to some tuple \((u_1^*,\dots,u_r^*)\). By continuity, this tuple constitutes another solution to \((3)\). By a change of variables \(v_i=u_i-u_i^*\), we get a system of equations

\(\displaystyle\sum_{j=1}^nA_{ij}^*\exp L_j(v_1,\dots,v_r)=0\) for \(i=3,\dots,n,\qquad (4)\)

where \(A_{ij}^*=A_{ij}\exp L_j(u_1^*,\dots,u_r^*)\), which by above has a sequence of solutions converging to the origin. We point out at this point that the equations in \((4)\) are linearly independent, i.e. the matrix \((A_{ij}^*)\) of coefficients has rank \(n-2\). This is because \(A_{ij}\) is the product of \(\exp L_j(u_1^*,\dots,u_r^*)\sigma_j(\gamma)\) and \(\sigma_j(\mu_i^*)\), and the matrix of all \(\sigma_j(\mu_i^*)\) is invertible, as square of its determinant is discriminant of linearly independent tuple, hence is nonzero.

We consider the local analytic manifold \(V\) of \((4)\), i.e. the set of solutions of this system in some small neighbourhood of the origin. By assumption on the sequence of solutions converging to the origin, this manifold consists of more than one point. Hence, by a general theorem, it must contain an analytic curve – there is a system of \(r\) (formal) power series \(\omega_1(t),\dots,\omega_r(t)\), not all identically zero and all with no constant term, which plugged in for \(v_k\) in \((4)\). Equivalently, if we put \(P_j(t)=L_j(\omega_1(t),\dots,\omega_r(t))\), we get

\(\displaystyle\sum_{j=1}^nA_{ij}^*P_j(t)=0\) for \(i=3,\dots,n. \qquad (5)\)

where \(P_j(t)\) are power series with no constant terms.

Finishing the proof

We have the system \((5)\) of equations involving (exponentials of) \(P_j(t)\). However, \(P_j(t)\) are also linear combinations of \(r\) power series. Therefore, by linear algebra, we can find a system of \(n-r\) independent linear equations

\(\displaystyle\sum_{j=1}^nP_j(t)=0\) for \(i=1,\dots,n-r\qquad (6)\)

satisfied by these power series. We will now use the assumption we haven’t used yet: that \(f(x,1)\) has a complex root. Recall this implies the field \(k\) has at least one complex embedding, i.e. \(t\geq 1\) (see statement of theorem 1). Therefore \(n-r=s+2t-s-t+1=t+1\geq 2\). Using \((5)\) and \((6)\) we can therefore use the following lemma:

Lemma 2: Suppose formal power series (over some field of characteristic zero) \(P_1(t),\dots,P_n(t)\) with no constant term satisfy a system of \(n-2\) equations of the form

\(\displaystyle\sum_{j=1}^nA_{ij}^*\exp P_j(t)=0\)

and also a system of two equations of the form


Then \(P_j(t)=P_k(t)\) for some \(j\neq k\).

Before we provide a proof of this lemma, we will show why it helps us complete the proof. Recalling the definition of \(P_j(t)\), this implies that any analytic curve contained in the manifold \(V\) is also contained in the manifold \(W\) defined by the equation

\(\displaystyle\prod_{1\leq j<k\leq n}(L_j(v_1,\dots,v_r)-L_k(v_1,\dots,v_r))\).

It follows (though not immediately) that \(V\subseteq W\). We will obtain a contradiction as soon as we deduce \(W\) contains only finitely many points \((v_1,\dots,v_r)\) corresponding to the solutions of \((3)\), since we assumed that \(V\) contains infinitely many such points. Equivalently, since product in the definition of \(W\) consists of finitely many terms, we need to show only finitely many tuples can satisfy


for \(j\neq k\).

Let \((u_1,\dots,u_r)\) be a solution of \((3)\) coming from \(\alpha=x+y\theta,x,y\in\mathbb Q\), and \(u_i=u_i^*+v_i\). We have

\(=c_j\exp L_j(v_1,\dots,v_r)\)

where \(c_j\) is a constant independent of \(\alpha\). Similarly,

\(\sigma_k(\alpha)=c_k\exp L_k(v_1,\dots,v_r)\).

Assuming \(L_j(v_1,\dots,v_r)=L_k(v_1,\dots,v_r)\), this implies


Taking \(\alpha’=x’+y’\theta\) to be a different such solution, this implies


and hence \((xy’-x’y)(\theta_j-\theta_k)=0\) and \(xy’=x’y,\frac{x}{x’}=\frac{y}{y’}\) (\(x’,y’\) can’t be both zero, so neither can be). It follows that \(\alpha’\) is a rational multiple of \(\alpha\), say \(\alpha’=d\alpha\). But recall that \(\alpha,\alpha’\) have the same norm, so \(d\) has norm \(1\), hence it is \(\pm 1\). Therefore \(\alpha,\alpha’\) are equal or opposite. Hence there are only two possible values of \(\alpha\), which is certainly a finite amount! As explained above, this gives us a contradiction. \(\square\)

Proof of lemma 2

Since \(n\) power series \(\exp P_j\) satisfy \(n-2\) independent linear equations, we can express all of them in terms of just two, say \(\exp P_{n-1}\) and \(P_n\). Put

\(\exp P_i=a_i\exp P_{n-1}+b_i\exp P_n\qquad (7)\).

Suppose \(a_i=0\). Then \(\exp P_i\) and \(b_i\exp P_n\) are equal. They have constant terms equal to, respectively, \(1,b_i\) since \(P_i\) have no constant term, so \(\exp P_i=\exp P_n\) and we can deduce from this (computing coefficients one-by-one) that \(P_i=P_n\). Hence we may assume \(a_i\neq 0\) (as otherwise we are already done). Putting \(Q_i=P_i-P_n\) we then have

\(\exp Q_i=a_i\exp Q_{n-1}+b_i\)

and we may also assume \(Q_i\) are nonzero. Differentiation gives

\(Q_i’\exp Q_i=a_iQ_{n-1}’\exp Q_{n-1}\).

Previous two equations combined give

\(\displaystyle Q_i’=Q_{n-1}’\exp Q_{n-1}\frac{1}{c_i+\exp Q_{n-1}}\qquad (8)\)

with \(c_i=\frac{b_i}{a_i}\) for \(i=1,\dots,n-2\). We now use the other pair of assumed equations. By subtracting suitable multiples of \(P_n\) from them we find

\(\displaystyle\sum_{j=1}^{n-1} B_{ij}Q_j=k_iP_n\) dla \(i=1,2\).

If either \(k_i\) is zero, this gives us a nontrivial linear relation between \(Q_j\). Otherwise, subtracting suitable multiples and using independence we again get a nontrivial linear relation. In either case, we get


for \(d_i\) not all zero. Differentiation and \((8)\) give us

\(Q_{n-1}’\exp Q_{n-1}\left(\displaystyle\sum_{i=1}^{n-2}\frac{d_i}{c_i+\exp Q_{n-1}}+\frac{d_i}{\exp Q_{n-1}}\right)=Q_{n-1}’\exp Q_{n-1}\left(\sum_{i=1}^{n-1}\frac{d_i}{c_i+\exp Q_{n-1}}\right)=0\)

(setting \(c_{n-1}=0\)). As \(Q_{n-1}’,\exp Q_{n-1}\neq 0\) we deduce

\(\displaystyle\sum_{i=1}^{n-1}\frac{d_i}{c_i+\exp Q_{n-1}}=0\).

Hence we get that the rational function


vanishes when we put \(z=\exp Q_{n-1}\). But unless this function vanishes identically, this would imply \(\exp Q_{n-1}\) is algebraic overits field of coefficients. But no nonconstant power series over a field is algebraic, so this can’t be as \(Q_{n-1}\neq 0\). Thus this rational function is identically zero. This means that some two \(c_i\) are equal (otherwise this function would have a pole as \(z\rightarrow -c_i\) for any \(c_i\) with \(d_i\neq 0\). Therefore \(c_j=c_k\) for some \(j\neq k\).

Since \(\frac{b_j}{a_j}=c_j=c_k=\frac{b_k}{a_k}\), \((7)\) gives us

\(\frac{1}{a_k}\exp P_j=\frac{1}{a_k}\exp P_k\).

Comparing constant coefficients and then other coefficients, we get \(P_j=P_k\) with \(j\neq k\). \(\square\)

Ultrabrief summary

The proof goes roughly as follows:

  1. Suppose otherwise.
  2. Using (a variation of) Dirichlet’s unit theorem and general results on modules, reduce the problem to showing finiteness of certain exponential equation in many variables.
  3. Generalize the context of the question to \(\frak P\)-adic-analytic setting so that we can speak of exponentials of (some) non-rational-integers.
  4. Using some difficult words like “local analytic manifold” reduce (a big part of) the problem to (essentially) showing it cannot contain an analytic curve.
  5. Use a fancy lemma to deduce the manifold is too algebraically constrained to contain infinitely many integral points.
  6. Write an ultrabrief summary.

Clearly two of these steps are (arguably) the most ingenious and crucial ones: passing from a number field to its completion and then reducing the problem to analoguous problem in functional setting (i.e. there is no formal power series blah blah). Both the complete fields (called more precisely local fields) and functional questions have many times in mathematics proven themselves to be much easier to work with than in number fields. The former’s advantage is mainly ability for us to use analytic tools (and difficult words), while in functional setting we have an incredibely useful tool – differentiation.

You can see simplicity of working in functional setting e.g. in the proof of Riemann hypothesis. In the future I will probably make more posts showcasing the local methods like this one, possibly less difficult ones (or perhaps more).


Abstract divisors


Finiteness of subsets of arbitrary finite sets


  1. Nathan Ho

    any plans for new posts?

Leave a Reply

Powered by WordPress & Theme by Anders Norén