YingfanWang/PaCMAP

Issue Inputting Dataframe Data into PaCMAP

Closed this issue · 5 comments

I have a pandas dataframe with mostly numerical data that is high-dimensional relative to its size (37 variables, 56 rows). I am struggling to convert the dataframe to an ndarray that works with PaCMAP. The variable of interest, 'Intact DNA/million CD4 T cells logscale binarized', holds the labels, while the rest of the variables are predictors. Also, how should I handle categorical variables?

[Image: PaCMAPGithubError]

Hi Alex

PaCMAP currently accepts only numpy arrays as input. For converting a pandas dataframe to a numpy array, this link may help: https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array

For categorical variables, you may convert them to dummy variables.
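As a rough sketch of both steps together, the snippet below builds a small stand-in dataframe, splits off the label column, turns the categorical column into dummy variables with pd.get_dummies, and converts the result to a float numpy array. The columns marker_a and marker_b and all the values are made up for illustration; only "Group" and the label column name come from this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataframe. "marker_a" and "marker_b"
# are invented column names; the values are invented too.
label_col = "Intact DNA/million CD4 T cells logscale binarized"
df = pd.DataFrame({
    "Group": ["mj", "non", "mj", "non"],
    "marker_a": [0.1, 0.4, 0.35, 0.8],
    "marker_b": [1.2, 0.7, 0.9, 1.5],
    label_col: [1, 0, 0, 1],
})

# Keep the labels separate so they are not embedded along with the predictors.
y = df[label_col].to_numpy()
predictors = df.drop(columns=[label_col])

# Replace the categorical column with one indicator column per category.
predictors = pd.get_dummies(predictors, columns=["Group"])

# Convert to a 2-D numeric array; forcing a float dtype avoids ending up
# with an object or boolean array.
X = predictors.to_numpy(dtype=np.float32)

print(X.shape, X.dtype)

# X can then be passed to PaCMAP, for example (requires the pacmap package):
# import pacmap
# embedding = pacmap.PaCMAP(n_components=2).fit_transform(X)
```

The labels in y are only needed afterwards, for example to color the points in a scatter plot of the embedding.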

I converted the dataframe into an ndarray with to_numpy() and converted a categorical variable to a dummy variable, but I still get a dimension mismatch error when fitting the data. The code is the same as in my previous question, but with to_numpy() in place of .values. Any idea what could be causing this? The error message is below.

[Image: screenshot of the error message]

I guess the reason is that you have converted a categorical variable to a dummy variable, which has a different dtype. Would you be able to elaborate more on the conversion process? You can try to cast the type of the data to float via X = X.astype(float) and see if that solves the problem.
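A few checks before casting may help narrow this down. The sketch below is not a diagnosis of the error in this thread; it only shows, on made-up data, how to find columns that are not numeric, coerce them, and confirm that the final array is 2-D, float, and free of unexpected NaNs. The column names marker_a and marker_b are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with two common problems: numbers stored as strings
# and a missing value.
df = pd.DataFrame({
    "Group": ["mj", "non", "mj"],
    "marker_a": ["0.1", "0.4", "0.35"],  # numeric values stored as strings
    "marker_b": [1.2, np.nan, 0.9],
})

# Encode the two-level categorical column as 0/1.
df["Group"] = df["Group"].map({"mj": 1, "non": 0})

# List the columns pandas does not consider numeric.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print("non-numeric columns:", non_numeric)

# Coerce those columns to numbers; values that cannot be parsed become NaN.
df[non_numeric] = df[non_numeric].apply(pd.to_numeric, errors="coerce")

# Cast the whole table to a float array and inspect it.
X = df.to_numpy(dtype=float)
print("ndim:", X.ndim, "shape:", X.shape, "dtype:", X.dtype)
print("NaN count:", int(np.isnan(X).sum()))
```

If the array comes out with dtype object, fewer columns than expected, or NaNs, that is worth fixing before calling fit_transform.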

I have been using processedunstimulateddf["Group"].replace({"mj": 1, "non": 0}, inplace=True) to convert the categorical variable. I tried casting processedunstimulateddf["Group"] to float as you suggested, but it did not solve the problem. Do you have any advice now that I have sent you the data as a CSV?

This problem is now resolved in release 0.5.5.