Issue Inputting Dataframe Data into PaCMAP

Question

Closed this issue 2 years ago · 5 comments

AlexRichardson2001 commented 2 years ago

I have a pandas dataframe with mostly numerical, high-dimensional data (37 variables, 56 rows). I am struggling to convert the dataframe to an ndarray that works with PaCMAP. The variable of interest, 'Intact DNA/million CD4 T cells logscale binarized', holds the labels, while the rest of the variables are predictors. Also, how should I handle categorical variables?

Answer 1 · 2022-03-04T20:10:01.000Z

Hi Alex

To convert a pandas dataframe to a NumPy array (PaCMAP currently accepts only NumPy arrays as input), this link may help: https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array

For categorical variables, you may convert them to dummy variables.
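As a rough sketch of both steps, the snippet below splits off the label column, turns a categorical column into dummy variables with pd.get_dummies, and converts the result to a float NumPy array. The toy dataframe and the column names "label" and "age" are made up for illustration; they are assumptions, not the data from this issue.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({
    "label": [0, 1, 0, 1],
    "age": [34, 51, 29, 46],
    "Group": ["mj", "non", "mj", "non"],
})

# Keep the labels separate from the predictors.
y = df["label"].to_numpy()
features = df.drop(columns=["label"])

# One-hot encode the categorical column; dtype=float keeps the
# dummy columns numeric instead of boolean.
features = pd.get_dummies(features, columns=["Group"], dtype=float)

# Convert the predictors to a 2-D float array.
X = features.to_numpy(dtype=float)
print(X.shape, X.dtype)
```

The resulting X (and y, if the labels are needed for coloring a plot) can then be passed on to PaCMAP.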

Answer 2 · 2022-03-11T19:49:16.000Z

I converted the dataframe into an ndarray with to_numpy() and converted a categorical variable to a dummy variable, but I still get a dimension mismatch error when fitting the data. The code is the same as in my previous question, but with to_numpy() in place of .values. Any idea what could be causing this? The error message is below.

Answer 3 · 2022-03-12T21:34:06.000Z

I guess the reason is that the categorical variable you converted to a dummy variable has a different dtype from the rest of the data. Could you elaborate on the conversion process? You can try casting the data to float via X = X.astype(float) and see if that solves the problem.
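A minimal way to check this, assuming a toy dataframe in which one column ended up with object dtype (the column names and values here are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Group" holds numbers but is stored as object dtype.
df = pd.DataFrame({
    "measure": [0.5, 1.2, 3.4],
    "Group": pd.Series([1, 0, 1], dtype=object),
})
y = np.array([0, 1, 0])

X = df.to_numpy()
dtype_before = X.dtype  # object, because the columns have mixed dtypes

# Cast the whole array to float, as suggested above.
X = X.astype(float)

# Basic sanity checks before fitting: 2-D input, one row per label,
# and no NaN or infinite values.
assert X.ndim == 2
assert X.shape[0] == y.shape[0]
assert np.isfinite(X).all()
print(dtype_before, "->", X.dtype, X.shape)
```

If the cast raises a ValueError, some column still contains non-numeric values and needs to be encoded first.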

Answer 4 · 2022-03-15T01:00:43.000Z

I have been using processedunstimulateddf["Group"].replace({"mj": 1, "non": 0}, inplace=True) to convert the categorical variable. I tried converting processedunstimulateddf["Group"] to float like you said but it did not solve the problem. Do you have any advice now that I have sent you the data as a csv?
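One thing that may be worth ruling out: calling replace(..., inplace=True) on a selected column is a chained assignment, and depending on the pandas version it may not update the original dataframe. The sketch below assigns the mapped column back explicitly and checks for unmapped values. The toy dataframe and the "value" column are assumptions for illustration, and this is not confirmed to be the cause of the error in this issue.

```python
import pandas as pd

# Hypothetical stand-in for processedunstimulateddf.
df = pd.DataFrame({
    "Group": ["mj", "non", "mj"],
    "value": [0.1, 0.2, 0.3],
})

# Map the categories to floats and assign the result back to the column.
mapping = {"mj": 1.0, "non": 0.0}
df["Group"] = df["Group"].map(mapping)

# Categories missing from the mapping become NaN, so count them.
unmapped = int(df["Group"].isna().sum())

# Convert the fully numeric dataframe to a float array.
X = df.to_numpy(dtype=float)
print(unmapped, X.dtype, X.shape)
```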

Answer 5 · 2022-04-11T16:11:16.000Z

This problem is now resolved in release 0.5.5.