In response to a question on Quora about the nonlinear nature of gravity, someone noted that the famous perihelion advance of Mercury, one of Einstein's classical tests of gravity, is an example demonstrating that gravity is nonlinear.

I was incredulous. I was ready to respond, no, that's stupid! Everybody knows that you can compute the perihelion advance from the linearized approximation. After all, Einstein already figured out the correct value long before he had the final version of general relativity.

Thankfully, I restrained myself. Instead of posting, I went searching, and soon I realized that my "received wisdom" was wrong: that in fact, the nonlinear nature of gravity does play a role in Mercury's perihelion anomaly.

But just as I was ready to forget this question, I came across a paper by a K. Kraus from 1968*, one that earned "honorable mention" in the Gravity Research Foundation's annual essay contest that year, which demonstrated that Mercury's perihelion advance, in fact, can be derived easily using the linearized approximation. Ouch! So who is right, then?

The conventional wisdom, explored in depth in books like Weinberg's Gravitation and Cosmology or Will's Theory and Experiment in Gravitational Physics, is that the nonlinear term does play a role. Indeed, in his delightful Lectures on Gravitation, Feynman first derives the wrong result (overestimating the perihelion shift by a factor of 4/3) by assuming that gravity is a linear spin-2 field theory, demonstrating how such a theory is inadequate when confronted with observational data. So I had to assume that Kraus was wrong, but why?

I actually wrote a computer algebra script for Maxima for this to verify Kraus's derivation and also to rederive the conventional result. The funny thing is, Kraus's calculation is not wrong per se; what is wrong is that near the very end, he uses the wrong approximation scheme, and misinterprets the resulting equation. But let's do it one step at a time, starting with the spherically symmetric, static metric in the following very generic form:

\begin{align*}
ds^2=(1+u)dr^2+(1+u)r^2d\theta^2+(1+u)r^2\sin^2\theta d\phi^2-(1-v)dt^2,
\end{align*}

where $u$ and $v$ are functions of $r$ alone and are assumed to be much less than 1. (We work in units such that $G=c=1$). Consequently, we can write the inverse metric approximately as $g^{\mu\nu}\simeq{\rm diag}(1-u,(1-u)/r^2,(1-u)/r^2\sin^2\theta,-1-v)$.

To the same level of approximation, the nonzero Christoffel-symbols that correspond to this metric are given by:

\begin{align*}
\begin{matrix}
\Gamma_{rr}^r=\frac{1}{2}(1-u)\partial_ru,&\Gamma_{r\theta}^\theta=\Gamma_{r\phi}^\phi=\frac{1}{2}(1-u)\partial_ru+\dfrac{1}{r},&\Gamma_{rt}^t=-\frac{1}{2}(1+v)\partial_rv,\\
\Gamma_{\theta\theta}^r=-r^2\Gamma_{r\theta}^\theta,&\Gamma_{\theta\phi}^\phi=\dfrac{1}{\tan\theta},&\Gamma_{\phi\phi}^r=-r^2\sin^2\theta\Gamma_{r\phi}^\phi,\\
\Gamma_{\phi\phi}^\theta=-\sin\theta\cos\theta,&\Gamma_{tt}^r=\frac{1}{2}(u-1)\partial_rv.
\end{matrix}
\end{align*}

The standard derivation proceeds with the geodesic equation

\begin{align*}
\ddot{x}^\mu+\Gamma_{\alpha\beta}^\mu\dot{x}^\alpha\dot{x}^\beta=0,
\end{align*}

where the overdot represents differentiation with respect to $s$. For $\theta$, after some trivial algebra, this yields the equation

\begin{align*}
\ddot{\theta}+\left(\frac{\partial_ru}{1+u}+\frac{2}{r}\right)\dot{r}\dot\theta-\dot\phi^2\sin\theta\cos\theta=0.
\end{align*}

This equation is trivially solved by $\theta=\pi/2$. The solution amounts to selecting coordinates such that the orbital plane coincides with this angle. This greatly simplifies the geodesic equations for $t$ and $\phi$:

\begin{align*}
(1-u^2)\ddot\phi+(1-u)\left(\partial_ru+2\frac{u+1}{r}\right)\dot r\dot\phi&=0,\\
(1-v^2)\ddot t-(1+v)\partial_rv\dot{r}\dot{t}&=0.
\end{align*}

We note that both equations can be brought to readily integrable forms. The equation for $t$ can be written as

\begin{align*}
(1+v)\frac{d}{ds}\left[(1-v)\dot{t}\right]=0,
\end{align*}

or, after integrating the part within square brackets,

\begin{align*}
\dot{t}=\frac{C}{1-v}.
\end{align*}

If we require that at infinity, we have Minkowski space with $(dt/ds)^2=-1$, the integration constant becomes fixed at $C^2=-1$.

The equation for $\phi$ can be written as

\begin{align*}
\frac{1-u}{r^2}\frac{d}{ds}\left[(1+u)r^2\dot{\phi}\right]=0,
\end{align*}

or, after integration,

\begin{align*}
\dot{\phi}=\frac{iJ}{(1+u)r^2},
\end{align*}

where we recognize the integration constant $J$ as the angular momentum per unit mass; the extra factor of $i=\sqrt{-1}$ arises because of the metric signature.

Instead of solving the last remaining geodesic equation directly, we use instead the identity $g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu=1$. Spelled out (again setting $\theta=\pi/2$), it yields the equation

\begin{align*}
(1+u)\dot{\phi}^2\left(\frac{dr}{d\phi}\right)^2+(1+u)\dot{\phi}^2r^2-(1-v)\dot{t}^2-1=0.
\end{align*}

After some rearrangement, and using the results for $\phi$ and $t$, we get:

\begin{align*}
\left(\frac{dr}{d\phi}\right)^2+r^2-\frac{(1+u)v}{(1-v)J^2}r^4=0.
\end{align*}

Substituting $r=1/z$ and multiplying by $z^4$ yields

\begin{align*}
\left(\frac{dz}{d\phi}\right)^2+z^2-\frac{(1+u)v}{(1-v)J^2}=0.
\end{align*}

Differentiating with respect to $\phi$ and dividing by $2dz/d\phi$, we obtain

\begin{align*}
\frac{d^2z}{d\phi^2}+z-\frac{(1+u)\partial_zv+(1-v)v\partial_zu}{2(1-v)^2J^2}=0.
\end{align*}

Now we can assign values to $u$ and $v$. In the standard PPN metric, we have $u=2\gamma mz$ and $v=2mz-2\beta m^2z^2$, where $\beta$ and $\gamma$ are two of the PPN parameters; $\gamma$ represents the amount of spatial curvature due to gravity, whereas $\beta$ represents the nonlinearity of superposition (for general relativity, $\beta=\gamma=1$). After rearranging and dropping terms quadratic or higher in $z$ (that is, we are assuming a weak gravitational field), we get

\begin{align*}
\frac{d^2z}{d\phi^2}+z-2(2+2\gamma-\beta)\frac{m^2}{J^2}z=\frac{m}{J^2}.
\end{align*}

This equation is readily solvable, with the solution given by

\begin{align*}
z=z_0\sin\left(\phi\sqrt{1-2(2+2\gamma-\beta)(m/J)^2}-\phi_0\right)+\frac{m}{J^2-2(2+2\gamma-\beta)m^2},
\end{align*}

with $z_0$ and $\phi_0$ as constants of integration. This solution is periodic; $z$ will return to its original value every time $\phi$ advances by $2\pi/\sqrt{1-2(2+2\gamma-\beta)(m/J)^2}$${}\simeq 2\pi+2\pi(2+2\gamma-\beta)(m/J)^2$. After setting $\beta=\gamma=1$ and restoring units, we find that the perihelion advance for Mercury is given by $6\pi(GM_\odot/crv)^2$, which, given $r\sim 58\times 10^6~{\rm km}$ and $v\sim 47~{\rm km}/{\rm s}$ yields about $5\times 10^{-7}$ radians per revolution (about $0.241$ years), or $43"$ per century.

This is the correct result, which appears in the literature. But then, where did Kraus go so badly wrong?

His derivation, as it turns out, is identical to the above up to the point of substituting $z$ in place of $1/r$. Instead, Kraus chooses the substitution $1/z=r+m$ or $r=1/z-m$, which yields the equation

\begin{align*}
\left(\frac{dz}{d\phi}\right)^2+(1-mz)^2z^2-\frac{(1+u)v(1-mz)^4}{(1-v)J^2}=0.
\end{align*}

Substituting $u=2\gamma m/(1/z-m)$, $v=2m/(1/z-m)-2\beta m^2/(1/z-m)^2$, differentiating with respect to $\phi$, dividing through by $2dz/d\phi$, and dropping terms higher order in m yields the equation

\begin{align*}
\frac{d^2z}{d\phi^2}+z-\frac{m}{J^2}=3mz^2,
\end{align*}

which, Kraus claims, is the standard form of the perihelion advance that appears in the literature.

But it is not. Nor is it legitimate to expand in terms of $m$, which is just a constant. There also appears to be no dependence on the spatial curvature, represented by $\gamma$, despite Kraus's attempt to show such dependence in Section 3 of his paper. Therefore, we must conclude that Kraus's claim that the perihelion shift is purely an effect of linearized gravity is manifestly false.

Here is the Maxima computer algebra script used to reproduce and prove the results above:

load(ctensor);
derivabbrev:true;
ct_coords:[r,theta,phi,t];
lg:ident(4);
depends([u,v],r);
lg[1,1]:1+u;
lg[2,2]:(1+u)*r^2;
lg[3,3]:(1+u)*r^2*sin(theta)^2;
lg[4,4]:-(1-v);
cmetric();
ug[1,1]:1-u;
ug[2,2]:(1-u)/r^2;
ug[3,3]:(1-u)/r^2/sin(theta)^2;
ug[4,4]:-(1+v);
christof(mcs);
for i thru dim do for j thru dim do for k thru dim do
for i thru dim do for j thru dim do for k thru dim do
cgeodesic(false);
geod:geod,theta=%pi/2,diff;
geod:geod,theta=%pi/2,diff;
geod:geod,theta=%pi/2,diff;
geod:geod,theta=%pi/2,diff;
g4:'diff((v-1)*diff(t,s),s);
geod+(v+1)*g4,diff,expand;
g4:integrate(g4=0,s)/(v-1);
g3:'diff(diff(phi,s)*r^2*(u+1),s);
geod+g3*(u-1)/r^2,diff,factor;
g3:integrate(g3=0,s)/(u+1)/r^2\$g3:g3,%c2=%i*J;
depends([r,z],phi);
sum(ev(lg[i,i]*diff(ct_coords[i],s)^2,theta=%pi/2),i,1,4)-1;
%,g3,g4;
%-ev(%,J=0)+factor(ev(%,J=0));
X:sum(part(%,i)/((%-ev(%,diff(r,phi)=0))/diff(r,phi)^2),i,1,length(%));
X:X,%c1^2=-1;

depends([u,v],z);
sum(factor(ev(part(X,i),r=1/z,diff)*z^4),i,1,length(X));
sum(factor(diff(part(%,i),phi)/diff(z,phi)/2),i,1,length(%));
sum(factor(ev(part(%,i),u=2*gamma*m*z,v=2*m*z-2*beta*m^2*z^2,diff)),i,1,length(%));
taylor(part(%,1),z,0,1);
sum(part(%th(2),i),i,2,length(%th(2)))+factor(ev(%,z=0))+factor(%-ev(%,z=0));

assume(J^2-expand(2*m^2*(2*gamma-beta+2))<0);
ode2(%th(2),z,phi);
%,sqrt(4*m^2*gamma-2*beta*m^2+4*m^2-J^2)=%i*k;
demoivre(%);
%,%k1=a-b*%i,%k2=a+b*%i;
lhs(%)=part(rhs(%),1)+factor(sum(part(rhs(%),i),i,2,length(rhs(%))));
%,k=sqrt(1-2/J^2*m^2*(2*gamma+2-beta))*J;

X,r=1/z-m,diff;
sum(factor(part(%,i)*z^4),i,1,length(%));
%,u=2*gamma*m/(1/z-m),v=2*m/(1/z-m)-2*beta*m^2/(1/z-m)^2;
%-ev(%,diff(z,phi)=0)+taylor(ev(%,diff(z,phi)=0),m,0,1);
expand(%);
sum(factor(diff(part(%,i),phi)/2/diff(z,phi)),i,1,length(%));
(%=0)+3*m*z^2;

6*%pi*(G*M/c/r/v)^2/T,G=6.674e-11*m^3/kg/s^2,M=2e30*kg,r=58e9*m,v=47e3*m/s,c=3e8*m/s,T=0.241*yr;
%/%pi*180*deg;
%,deg=3600*sec;
float(%),yr=0.01*cy;

*Kraus's paper is missing a page on the GRF Web site. The GRF was kind enough to send me the missing page, and they promised that the online version will be fixed soon. Until then, here is a PDF with the missing page included.