carpenter-singh-lab/2022_Haghighi_NatureMethods

Clarify that there are 4, not 5 datasets

Our abstract says:

"we provide a collection of four datasets with both gene expression and morphological profile data useful for developing and testing multimodal methodologies."

but the GitHub repo says:

"We have gathered the following five available data sets that had both Cell Painting morphological (CP) and L1000 gene expression (GE) profiles, preprocessed the data from different sources and in different formats in a unified .csv format."

We should reconcile the two counts, using the context below:

One of the chemical datasets (CDRP-BBBC047-Bray) has a subset of compounds that are known to be bioactive. We referred to this subset as CDRP-bio-BBBC036-Bray and reported the details independently for this dataset (Supplementary Data 1 and 2). We only used CDRP-bio and not the full CDRP set for the analysis, because we believe that the quality of CDRP is insufficient for either of these analyses given that very few data points remained after filtering for replicate reproducibility across both modalities (Supplementary Fig. 1).