raphaelvallat/yasa

Hypnogram.hypno as a pd.Categorical?

raphaelvallat opened this issue · 2 comments

@remrama, related to our discussion about the output of yasa.Hypnogram.upsample_to_data in #124, I was thinking that we could in fact set the default dtype of hyp.hypno to pandas.Categorical. This reduces the memory size of an upsampled hypnogram (5 stages, 8-hour TIB, freq="10ms") from 22 MB (dtype=str or dtype=np.int64) to 2.7 MB.

pd.Categorical(hyp.hypno, categories=hyp.labels, ordered=False)
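
For reference, here is a rough sketch of where those numbers come from. It does not use yasa itself: the stage labels and the random stage sequence are made up for illustration, and the str case only counts the 8-byte object pointers (a shallow size), which is an assumption about how the 22 MB figure was measured.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-stage label set, standing in for hyp.labels
labels = ["WAKE", "N1", "N2", "N3", "REM"]

# 8 hours in bed sampled every 10 ms -> 100 samples per second
n_samples = 8 * 3600 * 100

# Random stage sequence (illustration only, not a realistic hypnogram)
rng = np.random.default_rng(0)
hypno_int = rng.integers(0, len(labels), size=n_samples, dtype=np.int64)

# Same sequence as Python strings in an object array (8-byte pointer per sample)
hypno_str = np.array(labels, dtype=object)[hypno_int]

# Same sequence as a Categorical: one small integer code per sample
hypno_cat = pd.Categorical(hypno_str, categories=labels, ordered=False)

mib = 1024 ** 2
print(f"int64:       {hypno_int.nbytes / mib:.1f} MiB")
print(f"str (ptrs):  {hypno_str.nbytes / mib:.1f} MiB")
print(f"categorical: {hypno_cat.codes.nbytes / mib:.1f} MiB "
      f"(codes dtype: {hypno_cat.codes.dtype})")
```

With only 5 categories, pandas stores the codes as int8, so each sample takes 1 byte instead of 8, which accounts for the roughly 8x reduction.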

Oh, great idea! I didn't realize that it reduces memory so much. It seems like a "go" at first thought, though there might be some subtle, unforeseen limitations. I can't remember any specific examples, but that has been my prior experience with pandas.Categorical. I think it's definitely worth trying.

Great! I have also had some issues with it before, especially when using groupby functions. But I don't think they apply to us, so let's make the switch! I can create a separate PR once we've merged #124.
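
As an illustration of the kind of groupby behavior that can surprise with categoricals (not necessarily the issue referred to above), here is a minimal sketch. The column names and stage labels are hypothetical. Grouping on a categorical column with observed=False returns a row for every declared category, including stages that never occur in the data.

```python
import pandas as pd

# Hypothetical stage labels and a short hypnogram that never visits N1 or N3
labels = ["WAKE", "N1", "N2", "N3", "REM"]
df = pd.DataFrame({
    "stage": pd.Categorical(["WAKE", "N2", "N2", "REM"], categories=labels),
    "epochs": [1, 1, 1, 1],
})

# observed=False keeps all declared categories; unobserved ones sum to 0
all_stages = df.groupby("stage", observed=False)["epochs"].sum()

# observed=True keeps only the stages that actually appear
seen_stages = df.groupby("stage", observed=True)["epochs"].sum()

print(all_stages)
print(seen_stages)
```

Passing observed explicitly avoids depending on the pandas default, which has changed between versions.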