The question about metadata

Question

Closed this issue 8 months ago · 7 comments

Hello, I have the contents of several organic acids as phenotype data. How should I prepare my "metadata"? This is what the data currently looks like:
$ lccZT0.sorted.bam : num 3610 21900 19400 10900 59600 2330000 49600 18400 1700000 6530 ...
$ lccZT4.sorted.bam : num 13000 21100 29600 4150 30600 2500000 12300 9710 2200000 674 ...
$ lccZT8.sorted.bam : num 9290 28000 20900 8920 26500 2370000 12000 9350 1220000 2270 ...
$ lccZT12.sorted.bam: num 8590 31200 32000 8400 37000 2340000 16200 6350 2540000 7740 ...
$ lccZT16.sorted.bam: num 13800 17000 18700 6720 27200 2860000 14000 9960 2050000 5060 ...

Answer 1 · 2024-03-11T18:39:25.000Z

Hi, @409199

Thank you for using BioNERO. ;-)

Sample metadata must be formatted as a data frame with sample names in row names and any relevant variables in columns. Sample names must match the colnames() of your expression matrix.

In your case, it seems like transposing the data frame would do the job (with the t() function).
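A minimal base R sketch of that transposition, assuming the data are laid out as in the str() output above (samples in columns, one row per organic acid). The values and acid names are copied from this thread; the object names `acids`, `metadata`, and `expr`, as well as the gene IDs and counts in `expr`, are hypothetical stand-ins for your own data:

```r
# Organic acid table as currently stored: samples in columns, acids in rows
acids <- data.frame(
    lccZT0.sorted.bam = c(3610, 21900, 19400),
    lccZT4.sorted.bam = c(13000, 21100, 29600),
    lccZT8.sorted.bam = c(9290, 28000, 20900),
    row.names = c(
        "Pyruvic_acid", "e_Hydroxypropanoic_acid", "e_Aminobutyric_acid"
    )
)

# Transpose so that samples become rows and each acid becomes a column.
# t() returns a matrix, so convert it back to a data frame.
metadata <- as.data.frame(t(acids))

# Hypothetical expression matrix (genes in rows, samples in columns)
expr <- matrix(
    1:6, nrow = 2,
    dimnames = list(c("gene1", "gene2"), colnames(acids))
)

# Sample names in the metadata must match colnames() of the expression
# matrix, so reorder the rows to follow the matrix and check explicitly.
metadata <- metadata[colnames(expr), , drop = FALSE]
stopifnot(identical(rownames(metadata), colnames(expr)))

metadata
```

If the stopifnot() call fails on real data, the sample names differ between the two objects and should be fixed before going further.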

However, instead of storing the expression matrix and sample metadata data frame in separate objects, I'd strongly recommend working with SummarizedExperiment objects (see docs here), because it's way easier to have all data in a single object.
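As a rough sketch of what that could look like, the code below builds a SummarizedExperiment from an expression matrix and a sample metadata data frame. The sample names and acid values come from this thread; the gene IDs, the counts, and the assay name `exp` are made up for illustration:

```r
library(SummarizedExperiment)

samples <- c("lccZT0.sorted.bam", "lccZT4.sorted.bam", "lccZT8.sorted.bam")

# Hypothetical expression matrix: genes in rows, samples in columns
expr <- matrix(
    c(10, 0, 25, 12, 3, 30, 8, 1, 22),
    nrow = 3,
    dimnames = list(c("gene1", "gene2", "gene3"), samples)
)

# Sample metadata: one row per sample, one column per organic acid
metadata <- data.frame(
    Pyruvic_acid = c(3610, 13000, 9290),
    e_Aminobutyric_acid = c(19400, 29600, 20900),
    row.names = samples
)

# Reorder the metadata rows to follow the matrix columns, then store
# both in a single object: the matrix as an assay, the metadata as colData
se <- SummarizedExperiment(
    assays = list(exp = expr),
    colData = DataFrame(metadata[colnames(expr), , drop = FALSE])
)

colData(se)
```

With everything in one object, subsetting samples (for example `se[, 1:2]`) keeps the expression data and the metadata in sync.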

Best,
Fabricio

Answer 2 · 2024-03-12T09:55:15.000Z

First of all, thank you very much for your kind reply.

While applying BioNERO, I found that my phenotypic data differs from the data in your tutorial. Your phenotypic data describes the organ each sample comes from, whereas ours is the content of several organic acids, so I don't know how to prepare the metadata and correlate modules with organic acid content by following your tutorial.

My metadata

My MEs

My exp
                   Pyruvic_acid e_Hydroxypropanoic_acid e_Aminobutyric_acid
lccZT0.sorted.bam          3610                   21900               19400
lccZT4.sorted.bam         13000                   21100               29600
lccZT8.sorted.bam          9290                   28000               20900
lccZT12.sorted.bam         8590                   31200               32000

Answer 3 · 2024-03-12T10:10:32.000Z

Hi,

Could you please store your expression data and sample metadata in a SummarizedExperiment object? The way it is, it's very hard to figure out what your data looks like, especially because you're pasting screenshots, and not a reprex.

Right now, I don't understand what you have as gene IDs, and if sample IDs match between your expression matrix and your sample metadata data frame.

Best,
Fabricio

Answer 4 · 2024-03-14T05:33:48.000Z

First of all, thank you very much for your quick reply.

I've stored the expression data and sample metadata in a SummarizedExperiment object. After carefully reading the other questions you have answered, I found that the core problem is that my variables (organic acid contents) are continuous, while the variables in your examples are categorical (roots, stems, and leaves). Can BioNERO correlate modules with continuous variables? If so, how?

Thank you for your kind help.

Answer 5 · 2024-03-14T09:01:58.000Z

Hi,

Continuous variables can also be handled by BioNERO in the same way categorical or ordinal variables are; you add them as columns in the colData slot of your SummarizedExperiment object.

The code below was extracted from the vignette, but I edited the code to create a simulated continuous variable named compound_content:

set.seed(123)
suppressPackageStartupMessages({
    library(BioNERO)
    library(SummarizedExperiment)
})
data(zma.se)

final_exp <- exp_preprocess(
    zma.se, min_exp = 10, variance_filter = TRUE, n = 2000
)
#> Number of removed samples: 1

sft <- SFT_fit(final_exp, net_type = "signed hybrid", cor_method = "pearson")
#> Warning: executing %dopar% sequentially: no parallel backend registered
#>    Power SFT.R.sq    slope truncated.R.sq mean.k. median.k. max.k.
#> 1      3 0.293000  0.27100         0.1180   384.0     386.0    689
#> 2      4 0.000141 -0.00465        -0.2750   290.0     272.0    584
#> 3      5 0.210000 -0.20100         0.0542   227.0     202.0    509
#> 4      6 0.427000 -0.35900         0.2990   184.0     155.0    452
#> 5      7 0.583000 -0.48400         0.4780   153.0     121.0    407
#> 6      8 0.665000 -0.58300         0.5720   129.0      96.0    370
#> 7      9 0.697000 -0.66500         0.6110   111.0      77.8    339
#> 8     10 0.786000 -0.71800         0.7260    95.8      64.1    313
#> 9     11 0.787000 -0.77600         0.7310    83.8      53.4    290
#> 10    12 0.821000 -0.82800         0.7810    73.9      44.7    270
#> 11    13 0.857000 -0.86700         0.8290    65.6      37.5    252
#> 12    14 0.884000 -0.89500         0.8660    58.6      31.5    236
#> 13    15 0.890000 -0.91400         0.8710    52.7      26.7    221
#> 14    16 0.884000 -0.93900         0.8630    47.6      22.9    208
#> 15    17 0.886000 -0.96300         0.8630    43.1      19.7    196
#> 16    18 0.896000 -0.97500         0.8740    39.2      17.0    185
#> 17    19 0.905000 -0.98400         0.8840    35.8      14.8    175
#> 18    20 0.914000 -0.99300         0.8930    32.8      12.8    166
net <- exp2gcn(
    final_exp, net_type = "signed hybrid", SFTpower = sft$power, 
    cor_method = "pearson"
)
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.

# Add a fake continuous variable in the colData slot
final_exp$compound_content <- rnorm(ncol(final_exp), 20, 2)
colData(final_exp)
#> DataFrame with 27 rows and 2 columns
#>                    Tissue compound_content
#>               <character>        <numeric>
#> SRX339756       endosperm          22.3282
#> SRX339757       endosperm          19.6960
#> SRX339758       endosperm          25.0386
#> SRX339762       endosperm          18.5401
#> SRX339764       endosperm          24.2687
#> ...                   ...              ...
#> SRX2792107 whole_seedling          20.3020
#> SRX2792108 whole_seedling          15.3818
#> SRX2792102 whole_seedling          18.0599
#> SRX2792103 whole_seedling          18.7434
#> SRX2792104 whole_seedling          20.6909

me_trait <- module_trait_cor(exp = final_exp, MEs = net$MEs)
me_trait
#>                ME            trait         cor       pvalue            group
#> 1          MEblue        endosperm  0.48007104 0.0112681646           Tissue
#> 2          MEblue           pollen  0.30284517 0.1246651473           Tissue
#> 3          MEblue   whole_seedling -0.66012782 0.0001791887           Tissue
#> 4          MEcyan        endosperm  0.17353097 0.3866980847           Tissue
#> 5          MEcyan           pollen  0.24563232 0.2168395572           Tissue
#> 6          MEcyan   whole_seedling -0.34505954 0.0779446760           Tissue
#> 7          MEgrey        endosperm  0.47614330 0.0120525998           Tissue
#> 8          MEgrey           pollen  0.22461961 0.2599972506           Tissue
#> 9          MEgrey   whole_seedling -0.59551215 0.0010486148           Tissue
#> 10 MEmidnightblue        endosperm -0.15025972 0.4544057353           Tissue
#> 11 MEmidnightblue           pollen  0.01734256 0.9315803304           Tissue
#> 12 MEmidnightblue   whole_seedling  0.11895932 0.5545294876           Tissue
#> 13       MEpurple        endosperm  0.34624222 0.0768633212           Tissue
#> 14       MEpurple           pollen  0.15124540 0.4514169837           Tissue
#> 15       MEpurple   whole_seedling -0.42359091 0.0276840445           Tissue
#> 16          MEred        endosperm  0.04884068 0.8088429045           Tissue
#> 17          MEred           pollen  0.11272883 0.5755964071           Tissue
#> 18          MEred   whole_seedling -0.13119761 0.5142119592           Tissue
#> 19       MEsalmon        endosperm  0.26568876 0.1804225913           Tissue
#> 20       MEsalmon           pollen  0.24020439 0.2274917146           Tissue
#> 21       MEsalmon   whole_seedling -0.42209187 0.0282986267           Tissue
#> 22         MEblue compound_content  0.07975065 0.6925339219 compound_content
#> 23         MEcyan compound_content  0.20683423 0.3006063556 compound_content
#> 24         MEgrey compound_content  0.04589558 0.8201785431 compound_content
#> 25 MEmidnightblue compound_content -0.26328857 0.1845399040 compound_content
#> 26       MEpurple compound_content  0.33186280 0.0908131856 compound_content
#> 27          MEred compound_content  0.32037601 0.1032658856 compound_content
#> 28       MEsalmon compound_content  0.11437710 0.5699887999 compound_content

As you can see, BioNERO automatically recognizes that the variable compound_content is continuous, so it calculates ME-variable correlations accordingly.
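For intuition only, here is a base R sketch of what correlating module eigengenes with a continuous variable amounts to, assuming a Pearson correlation. This is not BioNERO's internal code; the eigengenes and the trait below are simulated, and all object names are hypothetical:

```r
set.seed(1)
n_samples <- 12

# Simulated module eigengenes: one column per module, one row per sample
MEs <- data.frame(
    MEblue = rnorm(n_samples),
    MEred = rnorm(n_samples)
)

# Simulated continuous trait, built to track MEblue closely
compound_content <- 2 * MEs$MEblue + rnorm(n_samples, sd = 0.1)

# One Pearson correlation test per eigengene against the trait
res <- do.call(rbind, lapply(names(MEs), function(me) {
    ct <- cor.test(MEs[[me]], compound_content, method = "pearson")
    data.frame(
        ME = me,
        trait = "compound_content",
        cor = unname(ct$estimate),
        pvalue = ct$p.value
    )
}))

res
```

Each row of `res` is read like the compound_content rows in the output above: one correlation coefficient and one P-value per module.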

Does this solve your issue?

Created on 2024-03-14 with reprex v2.1.0

Answer 6 · 2024-03-15T08:54:15.000Z

I successfully solved my problem under your guidance, thank you very much!

Answer 7 · 2024-03-15T09:27:31.000Z

Great to know it worked for you! I'll close the issue, then.

Thank you for using BioNERO. ;-)

Best,
Fabricio