d3b-center/hope-cohort-analysis

October 2023 figures updates

Closed this issue · 31 comments

per Nicole:
'''
please help me out by following these guidelines

Use Arial font. Otherwise, you must figure out how to embed your fonts. To embed fonts, you might follow this procedure: (1) Open the file you want to embed fonts in. (2) On the application (PowerPoint or Word) menu, select Preferences. (3) In the dialog box, under Output and Sharing, select Save. (4) Under Font Embedding, select Embed fonts in the file.

Prefer vectorized file formats (e.g., PDF or SVG) whenever possible. Vectorized formats maintain individual object distinctions allowing for independent modification. Avoid settings such as use_raster = TRUE in preparing heatmaps and avoid using ggrastr unless necessary. Especially avoid rasterizing text.

When saving figures in a vectorized format, avoid using Dingbats. For example, in R, you can save your files like this: pdf(file = "myfile.pdf", useDingbats = FALSE)

If used, color should be encoded as RGB, and to accommodate all viewers, red and green should not be used together.
'''

Yes, will do. When do you need this by?

Could you do this by Friday morning?

Yes, they have been removed for subtypes with < 3 cases. The survival analysis was given an OK by @jharenza.
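
For reference, a filter like the one described above could look roughly like this. It is only a sketch in Python (the repository's own code may well differ), and the sample IDs and subtype labels are made up for illustration; only the threshold of 3 comes from the comment.

```python
from collections import Counter

MIN_CASES = 3  # threshold mentioned in the comment above

# Hypothetical (sample_id, subtype) pairs; not real cohort data.
samples = [
    ("S1", "DMG"), ("S2", "DMG"), ("S3", "DMG"),
    ("S4", "HGG"), ("S5", "HGG"), ("S6", "HGG"), ("S7", "HGG"),
    ("S8", "PXA"),
]

def drop_small_subtypes(rows, min_cases=MIN_CASES):
    """Keep only rows whose subtype has at least `min_cases` samples."""
    counts = Counter(subtype for _, subtype in rows)
    return [(sid, st) for sid, st in rows if counts[st] >= min_cases]

kept = drop_small_subtypes(samples)
print(sorted({st for _, st in kept}))
```

In this toy input the single PXA sample is dropped and the other two subtypes are kept.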

@komalsrathi
Could I ask for a cascade plot with tumor-only samples by age with two age groups (there is already a plot with three age groups)?
https://github.com/d3b-center/hope-cohort-analysis/tree/rerun-analyses/analyses/oncoplots/results/cascade_plots_add_tumor_only

This is super short notice but I will try to do it in the morning. Thanks.

@mkoptyra I have added the additional plots in the same folder with the suffix _two_age_groups.pdf

Thank you @komalsrathi - is the cut-off for the somatic calls a 6% cohort frequency?

@komalsrathi - thank you so much for the ALT and age/sex correlation with the adjusted cohort.
It seems that there is an age correlation with two groups.
Do you recall whether that was significant before (not super important, but I'm curious since I don't recall)?

Please also remember that subtypes are enriched in certain age groups so this by default makes the conclusions age-related, but I'm inclined to say these are tumor intrinsic rather than age intrinsic factors. We would need additional analyses - within age group analyses, I believe, to tease out specific questions related to phenotypes. So while we can say "ALT status is different between age groups"- we are not comparing apples to apples here. It's not all HGG, H3 wildtype here.

We can review this next week before we present anything in the hope meetings - please let's review all before any discussions in hope meetings

It is telomere content (not the ALT status), and the main question of the first figure relates to any age-related features. I see that as a transparently indicated tendency, not necessarily a big statement.

@komalsrathi
For the oncoplot and the somatic calls presented in the figure, can I ask to change the genes to the top 20? (I think the current cut-off is >6%.)

And could you add annotation with the genomic subtypes?

I have added top 20 alterations to the oncoplots + added molecular subtypes to the annotation.
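
To make the difference between the two selection rules concrete, here is a minimal sketch in Python. It is not the code used for the oncoplots; the function names, the (sample, gene) input format, and the toy calls are all assumptions for illustration.

```python
from collections import Counter

def gene_frequencies(alterations, n_samples):
    """Fraction of samples with at least one alteration in each gene."""
    # De-duplicate so a sample with two hits in one gene is counted once.
    per_gene = Counter(gene for _, gene in set(alterations))
    return {gene: n / n_samples for gene, n in per_gene.items()}

def genes_above_cutoff(freqs, cutoff=0.06):
    """Frequency-threshold selection (the earlier >6% rule)."""
    return sorted(g for g, f in freqs.items() if f > cutoff)

def top_n_genes(freqs, n=20):
    """Top-N selection; ties are broken alphabetically for reproducibility."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:n]]

# Hypothetical (sample_id, gene) calls for a 4-sample toy cohort.
calls = [
    ("S1", "TP53"), ("S2", "TP53"), ("S3", "TP53"),
    ("S1", "ATRX"), ("S2", "ATRX"),
    ("S4", "NF1"), ("S4", "NF1"),  # duplicate hit in the same sample
]
freqs = gene_frequencies(calls, n_samples=4)
print(top_n_genes(freqs, n=2))
```

A cutoff returns however many genes pass it, while top-N always returns a fixed number of rows, which is easier to fit on a main figure.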

Thank you so much. This looks great - I added some flags :)
[Image: cascade_orderby_age_two_age_groups=FLAGs copy]

@komalsrathi a few more questions and comments:

  1. We will need an additional oncoplot similar to the one you created (the one that is the base for the picture above):
    https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/oncoplots/results/cascade_plots_add_tumor_only/cascade_orderby_age_two_age_groups.pdf
    with all the oncogenes on the list (not only top 20). That long version of figure will go to the supplementary data for the Hope manuscript.

  2. For the following graph:
    https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/alt-analysis/results/telomere_content_vs_age_two_groups.pdf
    Could you embed also tumor location on this graph? I am thinking of adding tumor_location colors to this graph

  3. Can I confirm the ALT status and age correlation was done with the refreshed cohort (same as used for recent circos and oncoprint plots)?
    https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/alt-analysis/results/alt_status_chisq_output.tsv

  4. Just a heads up - Jo Lynne is working on a modified list of tumor locations to adopt from OpenPedCan for the Hope cohort. That may require creating additional versions (not replacements) of a few graphs (circos plot, oncoprint, and ALT vs age with tumor location as in point 2).

Sure - will work on this.

  • with all the oncogenes on the list (not only top 20). That long version of figure will go to the supplementary data for the Hope manuscript.

Oncogenes from what source? Would the reference gene list provided in annoFuse be ok: https://github.com/d3b-center/annoFuseData/blob/master/inst/extdata/genelistreference.txt? I would filter by any rows that have Oncogene under the type column.

Clarifying so that there are no conflicts later on.
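
To be explicit about the filter I have in mind, here is a rough sketch in Python. The column names (`Gene_Symbol`, `type`), the tab delimiter, and the idea that a row can carry several comma-separated types are assumptions about the file layout and should be checked against the actual genelistreference.txt; the gene names below are placeholders.

```python
import csv
import io

def oncogenes_from_reference(handle, type_column="type", gene_column="Gene_Symbol"):
    """Return sorted unique genes whose type field includes 'Oncogene'."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumed tab-delimited
    genes = []
    for row in reader:
        # Assumed: multiple types may be listed in one field, comma-separated.
        types = [t.strip() for t in row[type_column].split(",")]
        if "Oncogene" in types:
            genes.append(row[gene_column])
    return sorted(set(genes))

# Placeholder rows standing in for the real reference file.
example = io.StringIO(
    "Gene_Symbol\ttype\n"
    "GENE_A\tOncogene\n"
    "GENE_B\tTumorSuppressorGene\n"
    "GENE_C\tKinase, Oncogene\n"
)
print(oncogenes_from_reference(example))
```

Matching on the split type values rather than on a substring avoids accidentally picking up any other label that merely contains the word.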

Could you embed also tumor location on this graph? I am thinking of adding tumor_location colors to this graph

Confused how you are envisioning this. Could you draw or explain as text and send it over?

Can I confirm the ALT status and age correlation was done with the refreshed cohort (same as used for recent circos and oncoprint plots)?

Yes everything has been updated.
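
For anyone who wants to sanity-check an association like this by hand, below is a minimal Python sketch of a Pearson chi-square test on a 2x2 table. The counts are made up, and this makes no claim about how alt_status_chisq_output.tsv was produced. Note that R's chisq.test applies a continuity correction to 2x2 tables by default, so its numbers would differ slightly from this uncorrected version.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts: rows = age group, columns = ALT+ / ALT-.
table = [[10, 20], [20, 10]]
stat, p = chi_square_2x2(table)
print(round(stat, 3), p)
```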

cc: @jharenza if you have any inputs.

Could you embed also tumor location on this graph?

Did you mean something like this?
telomere_content_vs_age_two_groups_by_tumor_loc.pdf

I was able to get this done with the help of @zzgeng. A few points to note here:

  1. Version 1: The boxplot colors are not retained (shades of green in the original boxplot); they are now black.

telomere_content_vs_age_two_groups_by_tumor_loc.pdf

  2. Version 2: The boxplot colors are retained and are also used for the point borders.

telomere_content_vs_age_two_groups_by_tumor_loc.pdf

@mkoptyra I thought what we wanted to do here was color by subtype, since the caveat to this is that all of the DHG will be in the older age group and are also ALT+, and we want to call that out. @komalsrathi for that, you can use the cancer_group_short. For the oncoprint subtype field, please use that (in v2) since it will be fewer groups.

Notes from our meeting (11/29/23):

Oncoplots:

  1. Liftover genes from PNOC003 to Gencode v39.
  2. Generate oncoplots with full gene list (i.e. OpenPedCan + MMR genes + PNOC003) + top 20 genes only.
  3. Add cancer_group_short and CNS_region to oncoplot top annotations

Data availability circos plots:

  1. Create an additional version of circos plot (with continuous age) with CNS_region instead of HOPE_Tumor.Location.condensed.

cc: @jharenza @mkoptyra

Please add/edit where applicable.

Adding:
Create an additional version of the ALT telomere content by two age groups plot with CNS_region instead of HOPE_Tumor.Location.condensed:
https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/alt-analysis/results/telomere_content_vs_age_two_groups_by_tumor_loc.pdf

@mkoptyra please check and let me know if I missed anything:

  1. Updated gene lists to hg38

  2. Updated Cascade plots with 2 age groups + tumor only samples + all genes + CNS_region and Cancer_Group (i.e. cancer_group_short) annotations:
    https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/oncoplots/results/cascade_plots_add_tumor_only/cascade_two_age_groups_allgenes.pdf

  3. Updated Cascade plots with 2 age groups + tumor only samples + top 20 genes + CNS_region and Cancer_Group (i.e. cancer_group_short) annotations:
    https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/oncoplots/results/cascade_plots_add_tumor_only/cascade_two_age_groups_top20genes.pdf

  4. Telomere content vs two age groups colored by CNS_region:
    https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/alt-analysis/results/telomere_content_vs_age_two_groups_by_cns_region.pdf

  5. Data availability circos plot with CNS_region instead of Tumor location condensed:
    https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/data-availability/results/hope_clinical_data_availability_age_continuous_cns_region.pdf

Hi Komal, for the data availability circos plot with tumor location:
https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/data-availability/results/hope_clinical_data_availability_age_continuous.pdf

  1. Is it possible to integrate the annotation between the rings? (Below is a modification I made in Photoshop with the annotation on the rings, but ideally the annotations would be above/between the rings.)
    [Image: hope_clinical_data_availability_age_continuous-MOIDF]

  2. The color of the outer age ring: is it possible to use a gradient from one color into another? We got feedback that changes in the blue gradient in the peds section of the ring may be hard to notice. Wondering if you have any thoughts about this.

Regarding the Updated Cascade plots with 2 age groups + tumor only samples + top 20 genes + CNS_region and Cancer_Group (i.e. cancer_group_short) annotations:
https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/oncoplots/results/cascade_plots_add_tumor_only/cascade_two_age_groups_orderby_age_top20genes.pdf

I made the following modifications in Photoshop:
[Image: cascade_two_age_groups_orderby_age_top20genes (1) copy]
These changes included the following:

  • removal of CNS_region and Molecular_subtype
  • adding PXA as an additional Cancer group and adding the corresponding color to the annotation column
  • introducing the flag

Could you apply these in the new version? (I can make the flag in Photoshop.)

Additional changes needed:

  • replacing the abbreviations DHG, DMG, HGG, IHG and PXA with:
    (DHG) Diffuse Hemispheric Glioma
    (DMG) Diffuse Midline Glioma
    (HGG) High Grade Glioma (not otherwise specified)
    (IHG) Infantile High Grade Glioma
    (PXA) Pleomorphic Xanthoastrocytoma
  • removal of underscores from the column annotations and side annotations
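
The two label changes above amount to a lookup table plus a string replacement. A small Python sketch, with the mapping copied from the list above and hypothetical helper names (the plotting code itself is not shown here and may handle labels differently):

```python
# Full names copied from the request above.
FULL_NAMES = {
    "DHG": "Diffuse Hemispheric Glioma",
    "DMG": "Diffuse Midline Glioma",
    "HGG": "High Grade Glioma (not otherwise specified)",
    "IHG": "Infantile High Grade Glioma",
    "PXA": "Pleomorphic Xanthoastrocytoma",
}

def legend_label(code):
    """Format a cancer group as '(CODE) Full Name'; unknown codes pass through."""
    if code in FULL_NAMES:
        return f"({code}) {FULL_NAMES[code]}"
    return code

def clean_annotation_name(name):
    """Replace underscores with spaces in an annotation title."""
    return name.replace("_", " ")

print(legend_label("PXA"))
print(clean_annotation_name("Cancer_Group"))
```

Leaving unknown codes unchanged means a new group added later still shows up in the legend instead of raising an error.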

For the Updated Cascade plots with 2 age groups + tumor only samples + all genes + CNS_region and Cancer_Group (i.e. cancer_group_short) annotations:
https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/oncoplots/results/cascade_plots_add_tumor_only/cascade_two_age_groups_allgenes.pdf

I made the following modifications in Photoshop:
[Image: cascade_two_age_groups_allgene-MODIF]

These changes included the following:

  • removal of CNS_region (molecular_subtype stays)
  • adding PXA as an additional Cancer_group and adding the corresponding color to the annotation column
  • introducing the flag to molecular_subtype and Cancer_group

Could you apply these in the newer version? (I can make the flag in Photoshop.)

Additional changes needed:

  • For the Cancer_group replacing the abbreviations DHG, DMG, HGG, IHG and PXA with:
    (DHG) Diffuse Hemispheric Glioma
    (DMG) Diffuse Midline Glioma
    (HGG) High Grade Glioma (not otherwise specified)
    (IHG) Infantile High Grade Glioma
    (PXA) Pleomorphic Xanthoastrocytoma
  • removal of underscores from the column annotations and side annotations

For the Telomere content vs two age groups colored by Tumor_location:
https://github.com/d3b-center/hope-cohort-analysis/blob/rerun-analyses/analyses/alt-analysis/results/telomere_content_vs_age_two_groups_by_tumor_loc.pdf

Is the p-value adjusted for tumor location? I ask because midline tumors are mostly in the 0-15 age group, so this may bias the ALT telomere content difference.
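
One possible way to account for location, sketched in Python, is a stratified permutation test: age-group labels are shuffled only within each tumor location, so any location effect is held fixed under the null. This is not what the current plot's p-value does (that is the open question), the records below are invented, and a regression with location as a covariate would be another reasonable option.

```python
import random
from collections import defaultdict
from statistics import mean

def mean_difference(records):
    """Mean telomere content of the older group minus the younger group."""
    old = [r["value"] for r in records if r["age_group"] == "old"]
    young = [r["value"] for r in records if r["age_group"] == "young"]
    return mean(old) - mean(young)

def stratified_permutation_p(records, n_perm=2000, seed=0):
    """Two-sided p-value, permuting age labels within each location only."""
    rng = random.Random(seed)
    observed = abs(mean_difference(records))
    # Group record indices by location so labels never cross strata.
    strata = defaultdict(list)
    for i, r in enumerate(records):
        strata[r["location"]].append(i)
    hits = 0
    for _ in range(n_perm):
        permuted = [dict(r) for r in records]
        for idx in strata.values():
            labels = [records[i]["age_group"] for i in idx]
            rng.shuffle(labels)
            for i, label in zip(idx, labels):
                permuted[i]["age_group"] = label
        if abs(mean_difference(permuted)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Invented records: location is confounded with age group on purpose.
records = [
    {"location": "midline", "age_group": "young", "value": 0.8},
    {"location": "midline", "age_group": "young", "value": 0.9},
    {"location": "midline", "age_group": "young", "value": 1.0},
    {"location": "midline", "age_group": "old", "value": 1.1},
    {"location": "hemispheric", "age_group": "young", "value": 1.4},
    {"location": "hemispheric", "age_group": "old", "value": 1.5},
    {"location": "hemispheric", "age_group": "old", "value": 1.6},
    {"location": "hemispheric", "age_group": "old", "value": 1.7},
]
print(stratified_permutation_p(records))
```

Because the shuffling stays within a location, a difference driven purely by midline tumors clustering in the younger group would not by itself produce a small p-value.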