samplics-org/samplics

Sample designing


Hi! I came across your package and I'm not entirely sure whether it does what I'm looking for. If not, you can treat my post as a suggestion. I'm looking for a library to design a sample for a survey that has not been conducted yet, so there is no existing data. I have desired proportions for key variables and would like to get the full sample design, i.e., all combinations of the variables with their proportions and counts. Let's say my goal is n = 2000, stratified by three key variables (gender, age, education) with the following target proportions:

expected_coverage: dict = {
    "woman": 0.64,
    "man": 0.36,
    "18-30": 0.20,
    "31-50": 0.50,
    "50+": 0.30,
    "lower": 0.30,
    "mid": 0.50,
    "higher": 0.20
}

As you can see, I treat these variables separately. For example, I need to know the proportion and count of men aged 18-30 with higher education. Within a group, the categories of a variable should obviously not be combined with each other, so each group takes exactly one category per variable. Is there any way to get such a sample design with samplics? You can read about the whole issue in my Stack Overflow post: SO post.
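The post gives only one-way proportions, so one simple way to fill in the 18 groups is to assume that gender, age, and education are independent. Under that assumption each group's share is the product of its three marginal shares. The sketch below uses plain Python under that assumption, which the post does not state. `MARGINALS` and `independent_design` are hypothetical names, and counts are rounded with the largest-remainder method so they sum exactly to n.

```python
from itertools import product
from math import floor

# Target shares from the post, grouped by variable (the grouping is an
# assumption; the post lists all categories in one flat dict).
MARGINALS = {
    "gender": {"woman": 0.64, "man": 0.36},
    "age": {"18-30": 0.20, "31-50": 0.50, "50+": 0.30},
    "education": {"lower": 0.30, "mid": 0.50, "higher": 0.20},
}


def independent_design(marginals, n):
    """Return {cell: (share, count)}, assuming the variables are independent.

    Each cell takes exactly one category per variable, and its share is the
    product of the marginal shares. Counts use largest-remainder rounding,
    so they sum exactly to n.
    """
    names = list(marginals)
    cells = list(product(*(marginals[v] for v in names)))
    shares = {}
    for cell in cells:
        share = 1.0
        for var, cat in zip(names, cell):
            share *= marginals[var][cat]
        shares[cell] = share

    # Floor every quota, then give the leftover units to the cells
    # with the largest fractional parts.
    quotas = {c: s * n for c, s in shares.items()}
    counts = {c: floor(q) for c, q in quotas.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return {c: (shares[c], counts[c]) for c in cells}


design = independent_design(MARGINALS, 2000)
print(len(design))                          # 18 cells
print(design[("man", "18-30", "higher")])   # share ≈ 0.0144, count 29
```

With these targets the rounded counts also reproduce the gender totals exactly (1,280 women and 720 men). In general, largest-remainder rounding guarantees only the grand total, so one-way margins can drift by a unit.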

As a side note, I'm fully aware that this problem could be solved quite easily with random/numpy/pandas, but I'm still curious whether samplics could offer a more convenient solution. Additionally, a comprehensive and reliable tool for complex sample designs would be invaluable.

Hi @meh-wzdech
I do not fully understand your question. Let me know if this is what you want to achieve: you have a total sample size of 2,000 and want to allocate it across the 18 strata formed by gender, age, and education? If so, do you want to allocate it proportionally or in some other way?

Hello @MamadouSDiallo. Yes, I would like to create a sample design with those strata (e.g., woman, 18-30, lower education; man, 50+, higher education; etc.). The problem is to allocate the sample across the strata so that the total sample contains roughly 64% women and 36% men; 20% aged 18-30, 50% aged 31-50, and 30% aged 50+; and 30% with lower, 50% with middle, and 20% with higher education.
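What this reply describes is essentially raking, also called iterative proportional fitting (IPF). The goal is to find a three-way table whose one-way margins match the targets. The sketch below uses NumPy and is not the samplics API. `rake`, `TARGETS`, and the uniform `seed` are hypothetical. With a uniform seed, IPF simply reproduces the independence product. If you pass a reference cross-tabulation as `seed` instead (for example, one from a census), IPF keeps that table's interaction structure while matching the targets.

```python
import numpy as np

# Target margins from the thread, in axis order:
# gender (woman, man), age (18-30, 31-50, 50+), education (lower, mid, higher).
TARGETS = [
    np.array([0.64, 0.36]),
    np.array([0.20, 0.50, 0.30]),
    np.array([0.30, 0.50, 0.20]),
]


def rake(seed, targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting.

    Repeatedly rescales `seed` (a strictly positive array) along each axis
    until every one-way margin matches its target share. Returns the last
    table even if `max_iter` is reached without converging.
    """
    table = seed / seed.sum()
    for _ in range(max_iter):
        max_gap = 0.0
        for axis, target in enumerate(targets):
            other = tuple(i for i in range(table.ndim) if i != axis)
            current = table.sum(axis=other)  # current margin on this axis
            max_gap = max(max_gap, np.abs(current - target).max())
            shape = [1] * table.ndim
            shape[axis] = -1
            table = table * (target / current).reshape(shape)
        if max_gap < tol:
            break
    return table


seed = np.ones((2, 3, 3))        # no prior knowledge of the joint distribution
shares = rake(seed, TARGETS)     # cell shares whose margins match the targets
counts = shares * 2000           # expected (non-integer) counts per cell
print(round(counts[1, 0, 2], 1)) # man, 18-30, higher -> 28.8
```

The counts are not integers. Rounding them (e.g., with the largest-remainder method) is a separate step.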