Bug (I think) in counts_in_cylinders when using multiple threads

When running counts_in_cylinders with more than one thread, the returned array is of size num_threads * len(sample1) rather than just len(sample1).

I think the issue is in this line:

halotools/halotools/mock_observables/counts_in_cells/counts_in_cylinders.py

Line 171 in 926621b

counts = np.vstack(result)

counts = np.sum(result, axis=0)     # What I think it should be
counts = np.vstack(result)              # Current

I'm happy to put in a fix for this (and perhaps change one of the tests to use multiple threads - though I'm not sure whether the CI this runs on has multiple threads?). But thought I would open an issue to check that I hadn't missed something before opening a PR this time :)

The docs for this function also mention rbins in the shape of the returned counts? I'm guessing at some point in the past there was an option to bin this.

Some code to repro:

import numpy as np
import halotools.mock_observables as mo


def generate_locus_of_3d_points(npts, xc=0.1, yc=0.1, zc=0.1, epsilon=0.001):
    x = np.random.uniform(xc - epsilon/2., xc + epsilon/2., npts)
    y = np.random.uniform(yc - epsilon/2., yc + epsilon/2., npts)
    z = np.random.uniform(zc - epsilon/2., zc + epsilon/2., npts)

    return np.vstack([x, y, z]).T

def generate_3d_regular_mesh(npts, dmin=0, dmax=1):
    x = np.linspace(0, 1, npts+1)
    y = np.linspace(0, 1, npts+1)
    z = np.linspace(0, 1, npts+1)
    delta = np.diff(x)[0]/2.
    x, y, z = np.array(np.meshgrid(x[:-1], y[:-1], z[:-1]))
    return np.vstack([x.flatten()+delta, y.flatten()+delta, z.flatten()+delta]).T

# This code is just the test_counts_in_cylinders0
period = 1
sample1 = generate_3d_regular_mesh(5)

npts2 = 100
sample2 = generate_locus_of_3d_points(npts2, xc=0.101, yc=0.101, zc=0.101)

proj_search_radius, cylinder_half_length = 0.02, 0.02

result1 = mo.counts_in_cylinders(sample1, sample2, proj_search_radius, cylinder_half_length, period=period)
result2 = mo.counts_in_cylinders(sample1, sample2, proj_search_radius, cylinder_half_length, period=period, num_threads=4)

assert result1.shape == result2.shape, "Shapes are actually {} and {}".format(result1.shape, result2.shape)

Good catch @Christopher-Bradshaw - and your proposed fix for the bug looks correct to me. The code snippet you posted is perfect for a regression test as well (yes, all the CI environments we use come with multiple cores.) As part of the testing suite for mock_observables, @duncandc and I try to always enforce exact agreement between functions called in serial and parallel, but sometimes it slips through the cracks. If you'd like to have the fix in master right away, then do feel free to submit a PR, which is always appreciated. But otherwise, I'm tagging this issue as a bug and so a fix will make it into the v0.7 release this summer. Either way, thanks for catching the bug and helping out.