kylessmith/ailist

Performance bug? Collecting results after intersect takes 2.5× as long as intersecting

endrebak opened this issue · 4 comments

This is the output from the code below. Example data used follows at the end.

timing query_from_array: 6.431662082672119
timing fetch results: 16.93972611427307
timing ailist destroy: 0.7660536766052246

As you can see, fetching the results takes much longer than finding overlaps. And this is not due to Python interop! See the annotated code here:

[Screenshot 2019-11-09 at 10:10:57: annotated timing of the code below]

And using one np.array with two entries per row is actually faster than using two separate arrays (which was the first thing I tried). And it is not that fetching one field (e.g. value) is much slower than fetching the other (e.g. index).
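For what it's worth, the layout comparison can be poked at with a toy copy loop in plain Python/NumPy (a rough micro-benchmark on made-up data, not the ailist code; interpreter overhead dominates here, so the numbers will not match the compiled Cython behaviour):

```python
import time

import numpy as np

np.random.seed(0)
n = 200_000
values = np.random.randint(0, 100, n).astype(np.int64)
indices = np.arange(n, dtype=np.int64)

# Layout 1: one 2-row array, both result fields in a single allocation.
combined = np.zeros((2, n), dtype=np.int64)
t0 = time.perf_counter()
for i in range(n):
    combined[0, i] = values[i]
    combined[1, i] = indices[i]
print("combined:", time.perf_counter() - t0)

# Layout 2: two separate 1-D output arrays.
out_values = np.zeros(n, dtype=np.int64)
out_indices = np.zeros(n, dtype=np.int64)
t0 = time.perf_counter()
for i in range(n):
    out_values[i] = values[i]
    out_indices[i] = indices[i]
print("separate:", time.perf_counter() - t0)

# Both layouts end up holding the same data.
assert np.array_equal(combined[0], out_values)
assert np.array_equal(combined[1], out_indices)
```

Which layout wins in compiled Cython comes down to allocation and cache behaviour, not what this interpreted loop measures.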

# Imports needed by this snippet: time for the ad-hoc timing, numpy for the result buffer.
from time import time

import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.initializedcheck(False)
cdef long[:,::1] _intersect_from_array(AIList self, const long[::1] starts, const long[::1] ends, const long[::1] indices):
    cdef int length = len(starts)
    start = time()
    cdef ailist_t *total_overlaps = ailist_query_from_array(self.interval_list, &starts[0], &ends[0], &indices[0], length)
    end = time()
    print("timing query_from_array:", end - start)

    # np.int64 matches the C long on 64-bit Linux/macOS; np.long is deprecated in newer NumPy.
    cdef long[:,::1] results = np.zeros((2, total_overlaps.nr), dtype=np.int64)

    start = time()
    cdef int i
    # Copy each overlap's value and index out of the C array of structs, one field at a time.
    for i in range(total_overlaps.nr):
        results[0, i] = <long>total_overlaps.interval_list[i].value
        results[1, i] = total_overlaps.interval_list[i].index
    end = time()

    print("timing fetch results:", end - start)

    start = time()
    ailist_destroy(total_overlaps)
    end = time()

    print("timing ailist destroy:", end - start)

    #return np.asarray(results)
    return results
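One way to see that the per-element fetch loop itself is the expensive part (a plain-NumPy analogy with made-up field names, not the ailist C API): if the overlaps lived in a NumPy structured array, copying out whole fields at once is far cheaper than copying element by element, because the strided pass runs in C:

```python
import numpy as np

# Toy stand-in for ailist's C array of interval structs (field names are made up).
np.random.seed(0)
n = 100_000
overlaps = np.zeros(n, dtype=[("value", np.float64), ("index", np.int64)])
overlaps["value"] = np.random.rand(n)
overlaps["index"] = np.arange(n)

# Element-by-element copy, like the Cython loop above.
slow = np.zeros((2, n), dtype=np.int64)
for i in range(n):
    slow[0, i] = int(overlaps[i]["value"])
    slow[1, i] = overlaps[i]["index"]

# Whole-field copy: one vectorized pass per field.
fast = np.empty((2, n), dtype=np.int64)
fast[0] = overlaps["value"].astype(np.int64)
fast[1] = overlaps["index"]

assert np.array_equal(slow, fast)
```

The compiled Cython loop should already be close to the vectorized version, which is why the 16-second fetch phase above is surprising.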

Example used for timing:

import mkl
mkl.set_num_threads(1)

import pyranges as pr

size = int(1e6)
length = 1

import numpy as np

np.random.seed(0)

l1 = np.random.randint(int(1e5))
l2 = np.random.randint(int(1e5))


c = pr.data.chromsizes()
# c1 = c[c.Chromosome == "chr21"]
c1 = c.head(1)

a = pr.random(size, length=length, strand=False, chromsizes=c1, int64=True)
a2 = pr.random(size, length=length, strand=False, chromsizes=c1, int64=True)

a.End += l1
a2.End += l2

print(a)
print(a2)

a = a.df.drop("Chromosome", axis=1).to_numpy()
a2 = a2.df.drop("Chromosome", axis=1).to_numpy()

values = np.ones(size)
ix = np.arange(size, dtype=np.int64)

from ailist import AIList
i = AIList()

print("time to build ailist:")
i.from_array(a[:, 0], a[:, 1], ix, values)
i.construct()

print("time to query ailist:")
ai_res = i.intersect_from_array(a2[:, 0], a2[:, 1], ix)

Actually, I take this back. When I just find the overlaps with the NCLS, without actually fetching them, the NCLS takes 2.5 seconds, less than 50% of the AIList's time. So for both methods it seems like actually retrieving the intervals after intersecting is the slow part.

I have no ego connected to the NCLS; I did not write it. So if you are able to get the AIList to be faster than the NCLS I would love to switch :)

I have some ideas for how to fix this. Give me a few days (without any guarantees) :)

My idea was actually slower. Who knows why. Here is my attempt: https://gist.github.com/endrebak/d11b65079ac5aba8e6ca44b8f3613ca2

------------------------------
Subject file: chainOrnAna1.bed (1.5), Query file: chainRn4.bed (1.5)
Building NCLS took 0.37735605239868164
Querying NCLS took 92.85288500785828
Building ai_endre took 0.5927338600158691
Querying ai_endre took 78.94602131843567
Building ai took 0.3491032123565674
Querying ai took 75.86743712425232
------------------------------
Subject file: chainVicPac2.bed (1.5), Query file: chainRn4.bed (1.5)
Building NCLS took 0.39214491844177246
Querying NCLS took 92.57744908332825
Building ai_endre took 0.3825538158416748
Querying ai_endre took 80.12227296829224
Building ai took 0.3954188823699951
Querying ai took 73.98339509963989
------------------------------
Subject file: chainXenTro3Link.bed (1.5), Query file: chainRn4.bed (1.5)
Building NCLS took 0.3832888603210449
Querying NCLS took 33.25918006896973
Building ai_endre took 0.2506730556488037
Querying ai_endre took 29.237913131713867
Building ai took 0.25615978240966797
Querying ai took 27.772515296936035
------------------------------
Subject file: chainMonDom5Link.bed (1.5), Query file: chainRn4.bed (1.5)
Building NCLS took 0.26062488555908203
Querying NCLS took 10.006813049316406
Building ai_endre took 0.19189214706420898
Querying ai_endre took 7.796525001525879
Building ai took 0.20964312553405762
Querying ai took 4.56371283531189

I really am stumped as to why, since it only does one pass through the data instead of two: one for finding the intervals and one for collecting them into an array afterwards 🤔
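One possible explanation (a guess, sketched in plain Python rather than the gist's Cython): a single pass has to grow the output as it goes, and that append/reallocation overhead can cost more than a cheap extra counting pass into a preallocated buffer. On toy data:

```python
import numpy as np

np.random.seed(0)
hits = np.random.rand(200_000) < 0.3  # made-up overlap mask
values = np.arange(hits.size, dtype=np.int64)

# Two passes: count first, then fill a preallocated buffer.
n_hits = int(hits.sum())                 # pass 1: just count
out_two = np.empty(n_hits, dtype=np.int64)
j = 0
for i in range(hits.size):               # pass 2: fill
    if hits[i]:
        out_two[j] = values[i]
        j += 1

# One pass: append to a growing container, convert at the end.
acc = []
for i in range(hits.size):
    if hits[i]:
        acc.append(values[i])
out_one = np.array(acc, dtype=np.int64)

assert np.array_equal(out_two, out_one)
```

The counting pass touches memory sequentially and allocates nothing, so it can be nearly free compared with repeatedly growing the result.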