Sprint 13 Task List

Question

cristinaetrv opened this issue 4 months ago · 1 comment

Proteomics:
Goal: Wrap up proteomics methods

Extend the data cleaning analysis with a preprocessing step that separates SomaScan and TMT data; compare the effect of domain adaptation applied before combining vs. on the combined data @akotlar 6/11/2024
Filtering needs to be generalized to SomaScan @akotlar 6/14/2024
Harmonize SomaScan/TMT datasets - latent variable model with two sets of covariates; impute on each, then harmonize by minimizing the discrepancy @austinTalbot7241993 6/14/2024
Demonstrate network analysis on ~300 sample dataset @akotlar 6/14/2024

GIN 6/17 work:

Ability to download or stream ancestry json - @akotlar - done: https://github.com/bystrogenomics/bystro-web/pull/453
Ability to convert ancestry json to tsv/csv - @akotlar - done: #525
Ability to explode annotation tsv/tsv.gz by gene name - @akotlar - done: #523
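For reference, "exploding by gene name" means emitting one row per gene when a record lists several genes. Below is a minimal sketch of that idea, not the code in #523. It assumes a hypothetical `gene_name` column whose values are separated by `;`; the real annotation schema and delimiter may differ.

```python
import csv
import gzip
from typing import IO


def _open_text(path: str, mode: str) -> IO[str]:
    """Open plain or gzip-compressed text, chosen by the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", newline="")
    return open(path, mode, newline="")


def explode_by_gene(in_path: str, out_path: str,
                    column: str = "gene_name", sep: str = ";") -> int:
    """Write one output row per gene listed in `column`.

    `column` and `sep` are hypothetical defaults; adjust them to the real
    annotation schema. Returns the number of rows written.
    """
    written = 0
    with _open_text(in_path, "r") as src, _open_text(out_path, "w") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            # Split the gene field; keep rows with no gene as a single empty entry
            genes = [g.strip() for g in row[column].split(sep) if g.strip()] or [""]
            # dict.fromkeys drops duplicate genes while preserving order
            for gene in dict.fromkeys(genes):
                writer.writerow({**row, column: gene})
                written += 1
    return written
```

Input and output can each be `.tsv` or `.tsv.gz`; streaming row by row keeps memory flat for large annotation files.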

PRS

Covariance Matrix Estimation/ML library
Goal: Hand off POE method to Mike by end of sprint

Create more computational and alternative-hypothesis tests for Ilha to benchmark @austinTalbot7241993 6/27/2024
Updates to loss functions - @IlhaH 6/27/2024
Computational benchmarking (compared to POIROT) - @IlhaH 6/27/2024

Platform

Documentation

Separate out the annotator description (Perl side), including performance figures; describe every component of the repo, including a Machine Learning subsection and a Bioinformatics tools subsection (installation first) - 6/27/2024
GIF showing how to use the general-purpose ML library - 6/27/2024

Answer 1 · 2024-06-12T19:18:31.000Z

6/12/2024

Proteomics:

Alex met with Erik Dammer; Erik will send more information about which files we should be analyzing.
Erik hadn't normalized within batch in the dataset Alex had been using, because they were comparing tissue types and looking at total abundance numbers. Instead, the two data types (SomaScan and TMT) were treated as 'batches', so the data are normalized by platform. Erik will provide the name of the dataset that was used for network analysis.

POE:

The test is anti-conservative, but we can use a bootstrap approach to look at the distribution of the coefficient estimates and see which ones have a POE
Getting benchmarks on speed
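As a rough illustration of the bootstrap idea, here is a minimal pairs-bootstrap sketch. It uses ordinary least squares as a stand-in model, not the POE method itself; `bootstrap_coefficients` is a hypothetical helper. It resamples rows with replacement, refits, and takes percentile intervals, so coefficients whose interval excludes zero can be flagged without relying on the anti-conservative test.

```python
import numpy as np


def bootstrap_coefficients(X, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap intervals for least-squares coefficients.

    OLS is a stand-in for the real model. Returns (beta_hat, lower, upper).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    boots = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boots[b], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    # Percentile interval for each coefficient across bootstrap refits
    lower, upper = np.quantile(boots, [alpha / 2, 1 - alpha / 2], axis=0)
    return beta_hat, lower, upper


def excludes_zero(lower, upper):
    """Flag coefficients whose bootstrap interval does not contain zero."""
    return (lower > 0) | (upper < 0)
```

A pairs bootstrap makes no assumption about homoscedastic errors, which is one reason to prefer it over resampling residuals here; the percentile interval is the simplest choice, and BCa intervals could be swapped in if coverage looks off.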