jwood000/RcppAlgos

Jump in prime counting

srdeaton opened this issue · 7 comments

Hi,
Was working with your implementation, doing some detailed calculations. Thought had bug in my code, but tracked it to the prime counting function; specifically to region [110^9,1.110^9]. The start is hard to find, has somewhat gradual increase, but the end is not, a sharper fall. The end jump is in [1.07210^9,1.07410^9] with an error of about 12,400.
Attached some code showing this, and a plot of my function's output which alerted me. For the paper I'm working on, plan to show output for entire computable range; I can skip over this region, but would look strange in plots w/o explanation. Have not decide how to proceed; skip & explain or show & explain. Either way, it's still a good implementation. Don't think there a too many such regions, < 10^16, especially that far off. At least from what I could tell so far.
If you can fix it within the next month, at most, then I can redo calculations in my paper before submission.
Thanks,
Shaun
PrimeTruncation_Error
PrimeCounting_ErrorDeviations.txt

Note: In the plot above, it not the prime count function, but another I was working on that altered me to the issue. The following is a plot of the prime counting function itself, for the end of the region shown above.
PrimeCount_Error_EndOfRegion

Hi @srdeaton,

Thanks for reporting this. Could you provide your sessionInfo() (Just type sessionInfo() in the console)?

Thanks again,
Joseph

@srdeaton,

TL;DR

This was fixed in version 2.3.0. N.B. the current version is 2.4.1. You might want to update R as well. The current version is 4.0.0 (Arbor Day).

The Nitty Gritty

Your problem is that you are using a very old version (almost 2 years old). I recall running into the very same problem while I was doing a major overhaul to primeCount and primeSieve. The code change was pretty drastic from version 2.2.0 to 2.3.0. I scoured the commits to see if there were any notes like "Fixed prime count issue around a billion", but was unsuccessful. I should have opened up an issue when I found it (that's on me). Since then, I've tried to be better about documenting bugs fixes and documentation in general.

I will update the NEWS to reflect this correction as of version 2.3.0.

I can confirm your results. Below is an example using version 2.2.0:

packageurl <- "http://cran.r-project.org/src/contrib/Archive/RcppAlgos/RcppAlgos_2.2.0.tar.gz"
install.packages(packageurl, repos=NULL, type="source")

library(RcppAlgos)
M1 <- 0.9 * 10^9
M2 <- 1.1 * 10^9
n <- round(seq(from=M1,to=M2,by=100000))
length(n)
# [1] 2001
plot(n,primeCount_Eval(n),type='l')

algos2 2 0

Here is an example using version 2.3.0 (I first restarted my R session):

packageurl <- "http://cran.r-project.org/src/contrib/Archive/RcppAlgos/RcppAlgos_2.3.0.tar.gz"
install.packages(packageurl, repos=NULL, type="source")

library(RcppAlgos)
M1 <- 1.073*10^9
M2 <- 1.0735*10^9
n <- round(seq(from=M1,to=M2,by=200))
length(n)
# [1] 2500
plot(n,primeCount_Eval(n),type='l')

algos2 3 0

Validate Using Kim Walisch's primecount library

First, we generate some random data:

set.seed(123)
a <- sample(M1:M2, 10000)
write.csv(sort(a), file = "testWithWalisch.csv", row.names = F)

Then, on the command line, we run the Kim Walisch primecount on all of the values generated above:

touch primecount_walisch_output.txt
while IFS=, read -r col1; do primecount-4/primecount "$col1" >> primecount_walisch_output.txt; done < RcppAlgos/testWithWalisch.csv

Finally, we read the results into R and validate RcppAlgos::primeCount on the same input:

fromWalisch <- read.table("../primecount_walisch_output.txt")
dim(fromWalisch)
# [1] 10000     1
head(fromWalisch)
#         V1
# 1 54364435
# 2 54364436
# 3 54364441
# 4 54364444
# 5 54364445
# 6 54364447
canon <- fromWalisch$V1
head(canon)
# [1] 54364435 54364436 54364441 54364444 54364445 54364447
algosImpl <- sapply(sort(a), primeCount)
head(algosImpl)
# [1] 54364435 54364436 54364441 54364444 54364445 54364447

identical(canon, algosImpl)
# [1] TRUE

@srdeaton,

The shout-out is much appreciated, however I would be remiss if I did not mention @kimwalisch. Quite simply, the prime number capabilities in RcppAlgos would not be possible without his excellent work.

Also, to address your question: "Is there a version of the prime count function for
vector inputs, containing increasing integers, such that the function will
not restart its count for each integer?"

First off, that is a great question. I wondered that myself, and sadly to my knowledge, I don't think it is possible to use Legendre's method (the method used in RcppAlgos::primeCount) to count primes between two bounds.

I think your best bet is to call primeCount first then use primeSieve in chunks. That sounds like what you are already doing.

Anywho, I hope that helps.

Best of luck,
Joseph