CurtTilmes/raku-primesieve

Math::Primesieve::iterator dumps core

mscha opened this issue · 11 comments

mscha commented

Using the following script:

#!/usr/bin/env perl6

use v6.c;

use Math::Primesieve;

my $iterator = Math::Primesieve::iterator.new;
my @primes = { $iterator.next } ... *;

say @primes[^25];
for 2980 .. 3100 -> $n {
    say "$n: @primes[$n]";
}

I get the following output:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
2980: 27239
2981: 27241
2982: 27253
2983: 27259
2984: 27271
2985: 27277
2986: 27281
2987: 27283
2988: 27299
2989: 27329
2990: 27337
2991: 27361
2992: 27367
2993: 27397
2994: 27407
2995: 27409
2996: 27427
2997: 27431
2998: 27437
2999: 27449
3000: 27457
3001: 27479
3002: 27481
3003: 27487
3004: 27509
zsh: segmentation fault (core dumped)  ./sieve

This is consistent, but when I change the starting point of the loop (e.g. to 3000) the point it crashes changes (in this case after 3011). Also adding a sleep 0.01 in the loop affects the crash point.

When I change the my @primes line to

my @primes = lazy gather loop { take $iterator.next };

the “crash point” is much later, somewhere between 11,000 and 12,000. So it's definitely very timing-dependent.

I can replicate this, but I can't figure it out. I suspect a Rakudo/MoarVM bug, but I can't prove it...
It feels like a memory/garbage collection problem, but I can't see where GC could affect this. We're holding on to a pointer to the CStruct, so it shouldn't be deallocated. I think the GC can move it, but even that would be fine, since we always pass the pointer to every function that uses it.

I printed the pointer value for every iteration, and it doesn't change from the previous working iterations until the one that fails.

Something broken between Perl <-> C <-> C++, needs someone who understands them all to debug

First several calls to generate work fine.

I changed the next method to this:

    method next()
    {
        say "---------";
        say "self pointer {+nativecast(Pointer, self)}";
        say "i = $!i, last = $!last_idx";
        if $!i++ == $!last_idx {
            say "calling generate function";
            primesieve_generate_next_primes(self);
            say "returned from generate function";
            say "primes pointer {+nativecast(Pointer, $!primes)}";
            say "i = $!i, last = $!last_idx";
            print "$_ of $!last_idx $!primes[$_] " for 0..$!last_idx;
            print "\n";
        }
        return $!primes[$!i];
    }

After several successful calls to primesieve_generate_next_primes(), I get to this one:

---------
self pointer 84332512
i = 1779, last = 1779
calling generate function
returned from generate function
primes pointer 86612240
i = 0, last = 7420

and it can access the first 1775 elements in the primes array, but seg faults when it accesses the 1775th element. The array should have 7420 primes in it Perl can access (curiously close to the previous allocation of that data structure for 1779 on the last call to generate).

The C code is stashing an opaque C++ vector in part of the C struct, which should work fine. I don't touch it. The C++ code turns that pointer back into a vector, then calls a C++ function to fill it with primes, then makes a C pointer to the array of primes which gets put back into the C struct which I can access from Perl. Somewhere in that round trip things aren't working.

I ran valgrind (memory memory error detector) on all tests programs using the primesieve C API and all test programs using primesieve::iterator and found no error so far.

and it can access the first 1775 elements in the primes array, but seg faults when it accesses the 1775th element.

Can you please give me the values of all variables of the primesieve_iterator struct just before calling primesieve_generate_next_primes() after which you get a segmentation fault later on:

 std::size_t i_;
 std::size_t last_idx_;
 uint64_t start_;
 uint64_t stop_;
 uint64_t stop_hint_;
 uint64_t tiny_cache_size_;

Using this information I should be able to write a C program that reproduces your issue (if it is a libprimesieve issue) and then I will debug it using valgrind and Clang sanitizers.

I'm not convinced it is a libprimesieve issue... but thanks for looking into it.

I made these changes to next:

    method next()
    {
        say "---------";
        say "self pointer {+nativecast(Pointer, self)}";
        say "i = $!i, last = $!last_idx";
        if $!i++ == $!last_idx {
            say "calling generate function";
            say "i_        = $!i";
            say "last_idx_ = $!last_idx";
            say "start_    = $!start";
            say "stop_     = $!stop";
            say "stop_hint_ = $!stop_hint";
            say "tiny_cache_size_ = $!tiny_cache_size";
            primesieve_generate_next_primes(self);
            say "returned from generate function";
            say "primes pointer {+nativecast(Pointer, $!primes)}";
            say "i = $!i, last = $!last_idx";
            print "$_ of $!last_idx $!primes[$_] " for 0..$!last_idx;
            print "\n";
        }
        return $!primes[$!i];
    }

And the last bit before the crash looks like this:

---------
self pointer 44922768
i = 1779, last = 1779
calling generate function
i_        = 1780
last_idx_ = 1779
start_    = 3210
stop_     = 19745
stop_hint_ = -1
tiny_cache_size_ = 65536
returned from generate function
primes pointer 79983136
i = 0, last = 7420

This time it printed the first 2341 of 7420 primes, then seg-faulted.

If you could expose primesieve_next_prime() (and primesieve_prev_prime()) in the library, I could just call them and treat the whole iterator struct as opaque. With them "static inline", I have to re-write them in Perl.

I know you don't want to slow down the C version, but even if you made extra functions primesieve_next_prime_slow() or something that I could call directly, that would work.

Thanks for the variable values. I will try to reproduce your bug tomorrow.

I know you don't want to slow down the C version, but even if you made extra functions primesieve_next_prime_slow() or something that I could call directly, that would work.

That's actually a good idea as I have seen other people struggling to write correct C bindings especially for the C primesieve_iterator. But lets first try to find the reason for your bug, maybe we don't need primesieve_next_prime_slow() anymore once we fixed the bug.

I have attempted to reproduce your bug using the the following C program:

#include <primesieve.h>
#include <inttypes.h>
#include <stdio.h>

int main()
{
  primesieve_iterator it;
  primesieve_init(&it);
  uint64_t prime = 0;
  uint64_t count = 0;

  prime = primesieve_next_prime(&it);

  it.i_ = 1779;
  it.last_idx_ = 1779;
  it.start_ = 3210;
  it.stop_ = 19745;
  it.stop_hint_ = -1;
  it.tiny_cache_size_ = 65536;

  prime = primesieve_next_prime(&it);

  printf("%" PRIu64 "\n", prime);
  printf("i = %" PRIu64 "\n", it.i_);
  printf("last_idx = %" PRIu64 "\n", it.last_idx_);
  printf("is_error = %d\n", it.is_error_);
  count = it.last_idx_ + 100;

  for (uint64_t i = 0; i < count; i++)
  {
    prime = primesieve_next_prime(&it);
    printf("%" PRIu64 ": %" PRIu64 "\n", i, prime);
  }

  primesieve_free_iterator(&it);

  return 0;
}

On my system this program runs correctly without segmentation fault. I have also tested the program using valgrind and found no issues (using both GCC and Clang compiler). Hence my current thinking is that your bug is not caused by a bug in libprimesieve.

I will try to read your perl source code and see if I can spot any obvious issue.

I have found an issue which is unrelated to your segmentation fault:

my $p = primesieve_generate_primes($start, $stop, $size, UINT64_PRIMES);

die X::Math::Primsieve.new if $p == PRIMESIEVE_ERROR;

The primesieve_generate_primes() and primesieve_generate_n_primes() functions return NULL and set errno = EDOM if any error occurs. Only primesieve function with ùint64_t return type return PRIMESIEVE_ERROR if an error occurs.

Below is how we do error checking for primesieve_generate_primes() in primesieve-python:

cpdef np.ndarray[np.int64_t, ndim=1] primes(int64_t a, int64_t b = 0):
    """Generate a numpy primes array"""

    # Rest errno
    global errno
    errno = 0

    cdef size_t size = 0
    cdef void* c_primes = cpp_numpy.primesieve_generate_primes(a, b, &size, INT64_PRIMES)

    if errno != 0:
        raise RuntimeError("Failed generating primes, most likely due to insufficient memory.")

    primes = c_to_numpy_array(c_primes, size, np.NPY_INT64)
    return primes

Your bindings look correct to me. The only code which I am uncertain about is:

has CArray[uint64] $.primes;
has CArray[uint64] $.primes_pimpl;

If CArray is a Perl class what happens if this array gets modified in libprimesieve?! For example if the Perl CArray class has an associated size member variable this variable will not be updated when libprimesieve updates its *primes pointer (new size = last_idx_ + 1).

I could imagine that you need to update your primes CArray after each primesieve_generate_next_primes() and primesieve_generate_prev_primes() call:

method next()
{
    primesieve_generate_next_primes(self) if $!i++ == $!last_idx;
    # update primes CArray if primesieve_generate_next_primes(self) has been called
    $!primes[$!i]
}

method prev()
{
    primesieve_generate_prev_primes(self) if $!i-- == 0;
    # update primes CArray if primesieve_generate_prev_primes(self) has been called
    $!primes[$!i]
}

This is just an educated guess however as I don't know Perl.

Another possible explanation that comes to my mind:

Is CArray garbage collected?

If so then when libprimesieve updates the *primes pointer (of primesieve_iterator) Perl notices that its own primes CArray has been updated and after some time the garbage collector frees the old primes CArray. This may cause an error as libprimesieve also frees its *primes pointer, so *primes would be freed twice!?

Isn't there a pointer type in Perl which could be used instead of CArray to prevent this issue (if the issue is related to the garbage collector)?

I fixed the error check.

CArray is not garbage collected, but the GC can move the whole structure around. I think that was the problem. I changed the memory allocation for the iterator struct, and dereference it on every call to next().

That fixed the seg fault.

Thanks for your help Kim!