Computing tables of R(n) and p(n) by lists-of-primes methods

Recall that R(n) is the number of ways of writing n as a sum of odd primes, and p(n) is the smallest odd prime with n-p(n) also prime.

Finding R(n) for the first few thousand n is trivial: produce an array whose entries are one for primes and zero otherwise, then run i through the odd numbers between 1 and n, counting how often P[i] and P[n-i] are simultaneously one. This clearly takes time O(n), even if you store the array efficiently (one bit per odd number) and take advantage of symmetry by something like

if ((i%4 == 2) && !get_bit(prime_array, (i-2)/4)) r=1; // 2p 
bool ok=false;
for (unsigned int j=1;j<i/2;j+=2)
if (!get_bit(prime_array,(j-1)/2) && !get_bit(prime_array, (i-j-1)/2))
{
if(!ok) {ok=true;first=j;}
r+=2;
}

and so is infeasible for large N; it takes almost exactly 45 seconds for N=3x105 on my newly-accelerated desktop PC, or four and a quarter hours for N=6x106. This is the function goldbach_slowest in the source code.

Getting to O(N2(log N)-1)

You'll notice that the inner loop above is iterating over primes by iterating over odd integers and checking which are prime. If instead you construct a table of (odd) primes, and replace the loop by

for (primi = prima.begin(); 2*(*primi)<i; primi++)
if (!get_bit(prime_array, (i-(*primi)-1)/2))
r+=2;

the inner loop will run π(i) times, and by the Prime Number Theorem the total run-time will be reduced by about a factor log N. On implementation, this takes nine seconds for N=3x105. This is the function goldbach_slower. But I didn't bother timing an overnight run, because ...

Getting to O(N2(log N)-2)

If we consider the improved inner loop, we are basically iterating backwards over a list of integers by checking which are prime. We could instead try to use the table of primes; the cost is O(N) memory usage (for an array of N counters, at 32 bits each, rather than N is-it-prime flags at one bit each), but generally, when N2 is a feasible amount of computation, N is a feasible amount of storage.

In pseudo-code, the loop looks like

and in C++, as seen in the function goldbach_slow, it comes out as

vector<unsigned int> R(N/2+1);
for (i=0; i<np; i++)
{
unsigned int pi = prime[i];
if (pi+pi <= N) R[pi]++;
int pj;
for (int j=0; j<i && (pj=prime[j])<=(N-pi); j++)
{
R[(pi+pj)/2]+=2;
if(first[(pi+pj)/2]==-1) first[(pi+pj)/2]=pj;
}
}

This takes 1.6 seconds for N=3x105 and just over 22 minutes for N=6x106: an overnight run can handle N=27 x 106. We have an interestingly-shaped graph if we plot i against time, with the calculation running fairly slowly in the middle and then speeding up at the end as N-pi gets small.

S-shaped curve

The overnight run mentioned above produced the quite stunningly insignificant fact that the smallest number that can be written in more than half a million ways (counting order; a quarter-million, if you don't) as a sum of two primes is 23130030.