wrong sample size in SelectMethod.sys
Opened this issue · 4 comments
Hello!
I'm trying to select a second-stage sample with `stage2_design2 = SampleSelection(method = SelectMethod.sys, strat = True, wr = False)`
and `ssu_sample_r, ssu_hits, _ = stage2_design2.select(samp_unit = ssu_frame['household'], samp_size = 10, stratum = ssu_frame["cluster"])`. I have 5,000 strata (clusters), so the sample should have n = 50,000 units, but only 49,548 SSUs are selected. When I checked the result, I found that in the strata with 77 and 154 (77 × 2) SSUs, 9 units were selected instead of 10. Any idea why? Thank you in advance.
I'm using Python 3.9.13 and samplics 0.4.11.
Hi @JuanVeraF
Always difficult to debug without data. Is it possible that some of your second-stage "strata" do not have 10 units?
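One way to answer that question before sharing data is to count the units in each stratum and list any that fall below the requested sample size. The snippet below is a minimal sketch using plain pandas; the helper name `strata_smaller_than` and the toy frame are made up for illustration and are not part of samplics.

```python
import pandas as pd


def strata_smaller_than(frame: pd.DataFrame, stratum_col: str, samp_size: int) -> pd.Series:
    """Return the size of every stratum that has fewer than samp_size units."""
    sizes = frame.groupby(stratum_col).size()
    return sizes[sizes < samp_size]


# Toy frame (illustrative only): stratum "a" has 12 units, stratum "b" has 3.
toy = pd.DataFrame({"cluster": ["a"] * 12 + ["b"] * 3})

small = strata_smaller_than(toy, "cluster", samp_size=10)
print(small)  # only stratum "b" is listed
```

An empty result means every stratum can supply the requested number of units, so a shortfall would have to come from the selection step itself.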
Hello, thanks for answering. Here is an example:
import pandas as pd
from samplics import SelectMethod
from samplics.sampling import SampleSelection
import numpy as np
np.random.seed(33)
# Ten strata with random sizes; strata 1 and 3 are forced to 77 and 154 SSUs
df = {
    "id_strata": list(range(10)),
    "Ssu": np.random.randint(70, 220, 10),
}
df = pd.DataFrame.from_dict(df)
df.loc[df.id_strata == 1, "Ssu"] = 77
df.loc[df.id_strata == 3, "Ssu"] = 154
# Expand to one row per SSU
df = {
    "id_strata": np.repeat(df.id_strata.values, df.Ssu.values, axis=0)
}
df = pd.DataFrame.from_dict(df)
df["id_ssu"] = list(range(df.shape[0]))
# Stratified systematic selection without replacement, 10 SSUs per stratum
d = SampleSelection(
    method=SelectMethod.sys,
    strat=True,
    wr=False,
)
np.random.seed(22)
sel, hits, _ = d.select(
    samp_unit=df["id_ssu"],
    samp_size=10,
    stratum=df["id_strata"],
)
df["ssu_sel"] = sel
df["ssu_hits"] = hits
df.ssu_sel.sum()  # 98
df.groupby("id_strata")["ssu_sel"].sum()  # selected SSUs per stratum
Hi @JuanVeraF
Please retry with the latest version of samplics.
Let me know if you are still having issues.
Hi @MamadouSDiallo
I tried again with the latest version, and the result is now correct.
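The thread does not say what caused the shortfall in samplics 0.4.11. For readers who see a similar symptom elsewhere, here is a sketch of one generic way a systematic sampler can come up one unit short when the stratum size is not a multiple of the sample size. Both functions are hypothetical illustrations written for this note, not samplics code: the first rounds the sampling interval up to an integer, the second keeps the fractional interval N/n.

```python
import math


def sys_integer_interval(pop_size: int, samp_size: int, start: int) -> list:
    """Systematic selection with the interval rounded up to an integer.

    Hypothetical illustration; `start` is an integer in [0, ceil(N/n)).
    """
    step = math.ceil(pop_size / samp_size)
    return list(range(start, pop_size, step))


def sys_fractional_interval(pop_size: int, samp_size: int, start: float) -> list:
    """Systematic selection with the fractional interval N/n.

    Hypothetical illustration; `start` is a float in [0, N/n).
    """
    step = pop_size / samp_size
    return [math.floor(start + i * step) for i in range(samp_size)]


# N = 77, n = 10: the rounded interval is 8, so a late start runs off the end.
print(len(sys_integer_interval(77, 10, start=0)))       # 10 units
print(len(sys_integer_interval(77, 10, start=5)))       # 9 units
print(len(sys_fractional_interval(77, 10, start=5.0)))  # 10 units
```

With the rounded interval, whether a stratum yields 9 or 10 units depends on the random start, which would be consistent with only some strata coming up short. The fractional interval always returns exactly n distinct positions as long as N is at least n.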