Question: Is it possible to draw a one-stage PPS sample? (no stratum)
I have a list of schools I want to sample proportionally to the number of students. How would I do this?
This is the code that I am using:
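import pandas as pd

# Fake data
school_pop_df = pd.DataFrame(dict(id=(1, 2, 3), n_students=(10, 20, 100)))
n = 2

# pps_design: the samplics selection object, created beforehand (its construction is not shown in this post)
school_pop_df['samplics_prob'] = pps_design.inclusion_probs(
    samp_unit=school_pop_df['id'],
    samp_size=n,
    mos=school_pop_df['n_students'],
)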
- If I leave stratum=None (the default), it throws an error:
def _anycertainty(
210 samp_size: Union[DictStrInt, int],
211 stratum: Optional[np.ndarray],
212 mos: np.ndarray,
213 ) -> bool:
215 certainty = False
--> 216 if stratum.shape not in ((), (0,)) and isinstance(samp_size, dict):
217 for s in np.unique(stratum):
218 stratum_units = stratum == s
AttributeError: 'NoneType' object has no attribute 'shape'
- If I use stratum=1, it runs and the result seems accurate.
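As a cross-check on those probabilities, the usual PPS inclusion probability, n * mos_i / sum(mos), can be computed by hand. This is a minimal NumPy sketch that does not use samplics; it assumes inclusion_probs follows the same textbook formula, which this thread does not confirm.

```python
import numpy as np

# Measure of size from the fake data above (students per school)
mos = np.array([10, 20, 100], dtype=float)
n = 2  # sample size

# Textbook PPS inclusion probability: pi_i = n * mos_i / sum(mos)
raw_probs = n * mos / mos.sum()
print(raw_probs.round(3))  # roughly 0.154, 0.308, 1.538

# A value of 1 or more cannot be a probability: that unit is a "certainty"
is_certainty = raw_probs >= 1
print(is_certainty)
```

With these numbers the third school's value exceeds 1, so it cannot be sampled strictly proportionally to its size.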
But then, to select the sample, if I try to run:
pps_design.select(
    samp_unit=school_pop_df['id'],
    samp_size=n,
    stratum=1,
    mos=school_pop_df['n_students'],
)
I get an error saying that some clusters are certainties:
770 elif _mos.shape not in ((), (0,)) and self.method in (
771 SelectMethod.pps_brewer,
772 SelectMethod.pps_hv,
(...)
775 SelectMethod.pps_sys,
776 ):
777 if self._anycertainty(samp_size=self.samp_size, stratum=_stratum, mos=_mos):
--> 778 raise CertaintyError("Some clusters are certainties.")
780 _samp_ids = np.linspace(
781 start=0, stop=_samp_unit.shape[0] - 1, num=_samp_unit.shape[0], dtype="int"
782 )
784 if remove_nan:
CertaintyError: Some clusters are certainties.
Hi @cuchoi
One of your clusters is much larger than the other two. It therefore becomes a certainty cluster, meaning its probability of inclusion is 1. You will have to handle it manually. In this case, the sample is the certainty one. If you were selecting more than one, you could remove the certainty unit from the frame, count it as selected, and sample the rest of the units from the remaining frame. In the future I have plans to handle this situation better but for now it is a manual process.
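The manual process described above could look roughly like the following sketch. It is plain NumPy rather than samplics code, and `split_certainties` and `pps_systematic` are hypothetical helper names introduced here for illustration. It assumes a unit is a certainty when n * mos_i / sum(mos) >= 1, and it repeats the check after each removal because taking out a large unit can turn another unit into a certainty.

```python
import numpy as np

def split_certainties(mos, n):
    """Iteratively set aside units whose PPS inclusion probability is >= 1.

    Returns (certainty indices, remaining indices, sample size still to draw).
    """
    mos = np.asarray(mos, dtype=float)
    remaining = np.arange(mos.size)
    certain = []
    while n - len(certain) > 0 and remaining.size > 0:
        n_left = n - len(certain)
        probs = n_left * mos[remaining] / mos[remaining].sum()
        hit = probs >= 1
        if not hit.any():
            break
        certain.extend(remaining[hit].tolist())
        remaining = remaining[~hit]
    return np.array(certain, dtype=int), remaining, n - len(certain)

def pps_systematic(mos, n, rng):
    """Systematic PPS draw of n positions; assumes no unit is a certainty."""
    cum = np.cumsum(mos)
    step = cum[-1] / n
    points = rng.uniform(0, step) + step * np.arange(n)
    idx = np.searchsorted(cum, points, side="right")
    return np.minimum(idx, len(cum) - 1)  # guard against float round-off

mos = np.array([10, 20, 100], dtype=float)  # fake data from the question
n = 2
rng = np.random.default_rng(12345)

certain, remaining, n_left = split_certainties(mos, n)
if n_left > 0:
    drawn = remaining[pps_systematic(mos[remaining], n_left, rng)]
else:
    drawn = np.array([], dtype=int)

# Positions (0-based) of the selected schools: certainties first, then the draw
sample = np.concatenate([certain, drawn])
print(sample)
```

For the data in the question with n = 2, the third school is set aside as a certainty and one more school is drawn from the other two, with probabilities 1/3 and 2/3. The reduced frame could also be passed back to the library's own selection method instead of the hand-rolled systematic draw.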
I will improve the code to better handle the case where stratum is None.
Best
That makes sense; thanks for the answer!
> In the future I have plans to handle this situation better but for now it is a manual process
Do you have any reference implementations or papers? I could try submitting a PR.