samplics-org/samplics

wrong sample size in SelectMethod.sys

Opened this issue · 4 comments

Hello!

I'm trying to select a second-stage sample with `stage2_design2 = SampleSelection(method = SelectMethod.sys, strat = True, wr = False)` and `ssu_sample_r, ssu_hits, _ = stage2_design2.select(samp_unit = ssu_frame['household'], samp_size = 10, stratum = ssu_frame["cluster"])`. I have 5,000 strata (clusters), so the sample should have n = 50,000 units, but only 49,548 SSUs were selected. When I checked the result, I found that in the strata with 77 or 154 (77 × 2) SSUs, 9 units were selected instead of 10. Any idea why? Thank you in advance.

I'm using Python 3.9.13 and samplics 0.4.11.

Hi @JuanVeraF
Always difficult to debug without data. Is it possible that some of your second-stage "strata" do not have 10 units?

Hello, thanks for answering. Here is an example:

import pandas as pd
from samplics import SelectMethod
from samplics.sampling import SampleSelection
import numpy as np

np.random.seed(33)

df = {
    "id_strata" : list(range(10)),
    "Ssu" : np.random.randint(70, 220, 10)
}

df = pd.DataFrame.from_dict(df)
df.loc[df.id_strata == 1, 'Ssu'] = 77
df.loc[df.id_strata == 3, 'Ssu'] = 154

df = {
    "id_strata" : np.repeat(df.id_strata.values, df.Ssu.values, axis = 0)
}

df = pd.DataFrame.from_dict(df)
df['id_ssu'] = list(range(df.shape[0]))

d = SampleSelection(
    method = SelectMethod.sys,
    strat = True, 
    wr = False
)

np.random.seed(22)

sel, hits, _ = d.select(
    samp_unit = df["id_ssu"],
    samp_size = 10,
    stratum = df["id_strata"]
)

df["ssu_sel"] = sel
df["ssu_hits"] = hits

df.ssu_sel.sum() # 98
df.groupby('id_strata')['ssu_sel'].sum()  # selected units per stratum
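
To list only the strata that miss the requested size, a check along these lines may help. It is a minimal sketch in plain Python, not part of samplics; the helper name `strata_with_wrong_size` is hypothetical, and it assumes the selection flags are truthy for selected units.

```python
from collections import Counter


def strata_with_wrong_size(stratum, selected, expected):
    """Return {stratum: selected_count} for strata whose count differs from `expected`.

    Hypothetical helper: `stratum` and `selected` are equal-length iterables
    (lists, NumPy arrays, or pandas Series all work).
    """
    counts = Counter()
    for s, flag in zip(stratum, selected):
        # Adding 0 still registers the stratum, so strata with no selected
        # units are reported as well.
        counts[s] += int(bool(flag))
    return {s: c for s, c in counts.items() if c != expected}


# Toy data: stratum "a" has 2 selected units, stratum "b" has only 1.
stratum = ["a"] * 3 + ["b"] * 4
selected = [True, True, False, True, False, False, False]
print(strata_with_wrong_size(stratum, selected, expected=2))  # {'b': 1}
```

With the example above, this would be called as `strata_with_wrong_size(df["id_strata"], df["ssu_sel"], expected=10)`.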

Hi @JuanVeraF
Please retry with the latest version of samplics.
Let me know if you are still having issues.
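
One way to confirm which version is actually installed in the active environment is the standard library's `importlib.metadata` (Python 3.8+). This is only a sketch; the wrapper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed samplics version, or None if samplics is missing.
print(installed_version("samplics"))
```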

Hi @MamadouSDiallo
I tried again with the latest version, and the result is now correct.