YingfanWang/PaCMAP

Required characteristics of the input data


Firstly, thank you for the insightful paper and interesting DR approach.

My team are attempting to run PaCMAP on scRNA-seq data, but we are encountering a number of oddities. When the data is pre-processed (log-normalised and/or scaled), the known and expected biological structure is lost. When the data is provided as unnormalised raw counts (and thus with an unequal number of counts per cell), the structure is retained.

Is PaCMAP internally normalizing/scaling data or is it best suited to working on raw data?

Basically, what is the optimal format of data to provide PaCMAP?

Hi there! Thank you for using PaCMAP!

PaCMAP does not internally normalize or scale data; any preprocessing has to be done outside of the algorithm. If pre-processing results in loss of the expected biological structure for your data, you can feed the raw data into PaCMAP. Our team is interested in investigating the relationship between scRNA-seq data preprocessing and the behavior of DR algorithms, and we wonder whether your data is available to be studied, or whether you could suggest some datasets with similar behavior. Thank you so much in advance!
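To make "preprocessing outside of the algorithm" concrete, here is a minimal sketch of how the raw and pre-processed variants could be compared side by side. The helpers `log_normalize`, `scale_genes`, and `embed` are hypothetical names written for this illustration and are not part of PaCMAP. The `target_sum=1e4` value is only an assumed, commonly used scaling target, and the input is assumed to be a dense cells-by-genes count matrix. Only the `pacmap.PaCMAP(...)` constructor and its `fit_transform` method come from the library.

```python
import numpy as np


def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p.

    `target_sum` is an assumed convention, not something PaCMAP requires.
    """
    counts = np.asarray(counts, dtype=np.float64)
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / lib_size * target_sum)


def scale_genes(X):
    """Center each gene (column) to zero mean and scale to unit variance."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes at zero after centering
    return (X - mean) / std


def embed(X, n_components=2, random_state=0):
    """Run PaCMAP on whatever matrix it is given (requires `pip install pacmap`)."""
    import pacmap  # imported here so the helpers above work without it

    reducer = pacmap.PaCMAP(n_components=n_components, random_state=random_state)
    return reducer.fit_transform(X)


if __name__ == "__main__":
    # Toy stand-in for a cells x genes raw count matrix.
    rng = np.random.default_rng(0)
    raw = rng.poisson(lam=2.0, size=(200, 50))

    variants = {
        "raw": raw.astype(np.float64),
        "log-normalised": log_normalize(raw),
        "log-normalised + scaled": scale_genes(log_normalize(raw)),
    }
    for name, X in variants.items():
        embedding = embed(X)
        print(name, embedding.shape)
```

Embedding each variant with the same `random_state` and plotting the results next to each other should make it easier to see at which preprocessing step the expected structure disappears.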