Data format of our own data

Question

leihouyeung opened this issue 3 years ago · 11 comments

Could you explain the data format expected for each of our own input files (scRNAseq_data.RDS, spatial_data.RDS and scRNAseq_label.RDS)?
What should each file contain, and what are its rows and columns? Thanks!

Answer 1 · 2021-03-20T23:21:10.000Z

'scRNAseq_data.RDS' is the single-cell RNA-seq data that you use for deconvolution. It is a data matrix with genes as rows and cells as columns. 'scRNAseq_label.RDS' is a data frame whose row names are the cell names and whose single column gives the cell type. 'spatial_data.RDS' is the spatial transcriptomics data matrix with genes as rows and spots as columns.
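For readers who find it easier to see the layout than to read it, here is a rough sketch of the three tables. It uses pandas purely as an illustration; the real inputs are R objects saved as .RDS files, and every gene, cell, spot and cell type name below is made up. The `check_inputs` helper is hypothetical and not part of the repository. The checks it performs (label rows in the same order as the count columns, at least one shared gene) are assumptions about what a consistent input probably needs, not documented requirements.

```python
import numpy as np
import pandas as pd

# Made-up names, only to show the orientation of each table.
genes = ["GeneA", "GeneB", "GeneC"]
cells = ["cell_1", "cell_2", "cell_3", "cell_4"]
spots = ["spot_1", "spot_2"]

rng = np.random.default_rng(0)

# scRNAseq_data: genes as rows, cells as columns.
sc_data = pd.DataFrame(rng.poisson(2, (3, 4)), index=genes, columns=cells)

# scRNAseq_label: row names are cell names, one column with the cell type.
sc_label = pd.DataFrame({"cell_type": ["T", "B", "T", "NK"]}, index=cells)

# spatial_data: genes as rows, spots as columns.
st_data = pd.DataFrame(rng.poisson(5, (3, 2)), index=genes, columns=spots)


def check_inputs(sc_data, sc_label, st_data):
    """Hypothetical sanity check; returns a list of problems found."""
    problems = []
    if sc_label.shape[1] != 1:
        problems.append("label table should have exactly one column (cell type)")
    if list(sc_label.index) != list(sc_data.columns):
        problems.append("label row names should match the scRNA-seq column names")
    if len(sc_data.index.intersection(st_data.index)) == 0:
        problems.append("scRNA-seq and spatial matrices share no gene names")
    return problems


print(check_inputs(sc_data, sc_label, st_data))
```

An empty list means the three tables are at least oriented consistently with the description above.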

Answer 2 · 2021-03-22T05:47:39.000Z

Thanks for your response! I have another question. I have tried running my own data (not the example data). What is the purpose of the second data frame in the "st_labels" returned by the function "data_process"? I know the first one holds the labels of the pseudo-ST data mixed from the raw scRNA-seq data. I am just confused about the second one. How could we already have the real-ST labels?

Answer 3 · 2021-03-25T22:20:55.000Z

Hey, good question. Yes, the returned list includes two elements. The 1st holds the mixed labels of the pseudo-ST data, but the 2nd is not the real-ST labels. The 2nd one is only there to keep the size of the data structure consistent and is not used in the learning process. You will obtain the real-ST labels after you finish the whole pipeline.

Answer 4 · 2021-03-30T06:23:03.000Z

Thanks for your explanation.
FYI, when I ran convert_data.R with my own data, the function "SPOTlight::test_spot_fun" raised an error: missing values are not allowed in subscripted assignments of data frames. I solved it by running the following code before "test_spot_fun": rownames(st_label[[1]]) = colnames(st_count[[1]]). I hope this is helpful.
Nice work :)

Answer 5 · 2021-04-05T07:49:11.000Z

Could you please explain the purpose of the "filterEdge" function in gutils.py? What is it used for?

Answer 6 · 2021-04-05T07:49:30.000Z

I am also confused by the adjacency matrix construction part in utils.py.
id_grp1 = np.array([ np.concatenate((np.where(find1 == id_graph2.iloc[i, 1])[0], np.where(find1 == id_graph2.iloc[i, 0])[0])) for i in range(len(id_graph2)) ])
I think it should be
id_grp1 = np.array([ np.concatenate((np.where(find1 == id_graph2.iloc[i, 2])[0], np.where(find1 == id_graph2.iloc[i, 1])[0])) for i in range(len(id_graph2)) ])
because id_graph2.iloc[i, 0] holds the indices of the edges.

Answer 7 · 2021-04-05T13:16:46.000Z

Thanks for the comments. I will check the names and update the code.

Answer 8 · 2021-04-05T13:22:12.000Z

The link graph between pseudo-ST and real-ST data is built primarily in the reduced-dimension space. The 'filterEdge' function then filters the link graph for reliability based on the original pseudo-ST and real-ST data. Hope my explanation helps.

Answer 9 · 2021-04-05T13:24:40.000Z

As the code is written, the variable id_graph2 is expected to have two columns. If your id_graph2 has three columns, then its first column holds the edge indices, and you should change '1' to '2' and '0' to '1' accordingly.
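To make the two-column versus three-column point concrete, here is a small sketch. The node names, the column names and the helper `edge_positions` are all made up for illustration; only the list comprehension mirrors the line quoted from utils.py. The `first_col` argument is a hypothetical way to express the "change '1' to '2' and '0' to '1'" adjustment without editing the expression by hand.

```python
import numpy as np
import pandas as pd

# Made-up node names standing in for find1.
find1 = np.array(["a", "b", "c", "d"])

# Two-column edge table: the layout the code expects.
id_graph2 = pd.DataFrame({"node_x": ["a", "b", "c"],
                          "node_y": ["b", "d", "a"]})


def edge_positions(edges, names, first_col=0):
    """Look up the position in `names` of both endpoints of every edge.

    first_col is the column holding the first endpoint; use 1 when the
    table carries an extra leading column of edge indices.
    """
    return np.array([
        np.concatenate((
            np.where(names == edges.iloc[i, first_col + 1])[0],
            np.where(names == edges.iloc[i, first_col])[0],
        ))
        for i in range(len(edges))
    ])


id_grp1 = edge_positions(id_graph2, find1)

# Three-column variant: reset_index() prepends a column of edge indices,
# so both column positions shift by one.
id_graph3 = id_graph2.reset_index()
id_grp1_shifted = edge_positions(id_graph3, find1, first_col=1)

print(np.array_equal(id_grp1, id_grp1_shifted))
```

Both calls yield the same node positions, which is the sense in which the two versions of the line in the question are equivalent once the extra column is accounted for.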

Answer 10 · 2021-04-05T14:05:44.000Z

I am confused about the "position" vector in this function. It holds indices into "nn[1]", so why can it be used to index "edge"? They are different data frames. I am also confused about the meaning of "fedge" in this function.

Answer 11 · 2021-04-05T14:21:15.000Z

If I understand your question correctly, the indices of nn[1] represent node indices, so they apply to the edges as well. That line of code identifies the neighbor pairs, between a given pseudo-ST node and a real-ST node, that appear in "edges" as well as in "nn".
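The idea of keeping only the pairs that appear in both structures can be sketched as follows. This is not the repository's filterEdge, just a minimal illustration under assumptions: `edges` is taken to be an array of (node, neighbor) index pairs, and `nn_indices` a per-node list of nearest-neighbor indices. The function name `filter_edges` and the toy data are hypothetical.

```python
import numpy as np


def filter_edges(edges, nn_indices):
    """Keep only edges (i, j) where j is listed among the neighbours of i.

    edges:      array of shape (n_edges, 2) holding node indices.
    nn_indices: array of shape (n_nodes, k); row i lists neighbours of node i.
    """
    edges = np.asarray(edges)
    keep = np.array([j in nn_indices[i] for i, j in edges], dtype=bool)
    return edges[keep]


# Toy data: three candidate edges, four nodes with two neighbours each.
edges = np.array([[0, 2], [0, 3], [1, 2]])
nn_indices = np.array([[2, 1], [3, 0], [0, 1], [1, 0]])

filtered = filter_edges(edges, nn_indices)
print(filtered)
```

Because both arrays are indexed by the same node numbering, an index taken from the neighbor table can be compared directly with an endpoint in the edge table, which is why the two "different data frames" can be combined.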