russHyde/coxpresdbr

rewrite to work with a collapsed all-gene database


In the current version, the coxpresdb files look like

# file for target gene_a
gene_b    MR_ab    COR_ab
gene_c    MR_ac    COR_ac
gene_d    MR_ad    COR_ad
...

This leads to inefficient sampling: for each gene sampled you have to re-read its coexpression data.

It would be more efficient to read in the data for all genes at one time from a single file

The file should look like

gene_a    gene_b    MR_ab    COR_ab
gene_a    gene_c    MR_ac    COR_ac
gene_a    gene_d    MR_ad    COR_ad
...
gene_b    ...
...

Ditch the correlation column
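
A rough sketch of how the per-gene files could be collapsed into the single table described above. The helper name `collapse_coex_files` is hypothetical, and it assumes (the issue does not confirm this) that each per-gene file is tab-separated, has no header row, and is named after the gene it describes. The correlation column is read but not kept.

```r
# Hypothetical helper: collapse per-gene coexpression files into one table.
# Assumptions (not confirmed by the issue): each file is tab-separated, has no
# header, holds (partner id, mutual rank, correlation), and its file name is
# the id of the gene it describes.
collapse_coex_files <- function(file_paths) {
  per_gene <- lapply(file_paths, function(path) {
    coex <- utils::read.table(
      path,
      sep = "\t", header = FALSE, stringsAsFactors = FALSE,
      col.names = c("target_id", "mutual_rank", "correlation"),
      colClasses = c("character", "numeric", "numeric")
    )
    # Prefix each row with the gene the file belongs to; drop the correlation
    data.frame(
      source_id = rep(basename(path), nrow(coex)),
      target_id = coex$target_id,
      mutual_rank = coex$mutual_rank,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_gene)
}

# Toy demonstration with two made-up gene files in a temporary directory
tmp_dir <- tempfile("coex_")
dir.create(tmp_dir)
writeLines(
  c("gene_b\t1.5\t0.9", "gene_c\t3.0\t0.7"),
  file.path(tmp_dir, "gene_a")
)
writeLines("gene_a\t1.5\t0.9", file.path(tmp_dir, "gene_b"))

coex_db <- collapse_coex_files(list.files(tmp_dir, full.names = TRUE))
```

The resulting data frame can be written out once and re-read in a single call, so sampling no longer has to touch one file per gene.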

The CoxpresDbImporter object should return a dataframe with columns (source_id, target_id, mutual_rank) for each gene requested.

  • should the user pass a data-frame or a filepath containing the data-frame?

It should also be possible to construct a CoxpresDbImporter from a data-frame; perhaps it should be called CoxpresDbAccessor?

Plan:

  • - rename CoxpresDbImporter as CoxpresDbAccessor
  • - add a proper class definition for CoxpresDbAccessor (rather than just calling methods::new())
  • - add subclasses
    • - CoxpresDbArchiveAccessor
      • - Move slots from Accessor to ArchiveAccessor
      • - IO produces an ArchiveAccessor
      • - get_file_paths_[for_gene] should be specific to ArchiveAccessor
      • - get_[raw|uncompressed]_archive should be specific to ArchiveAccessor
    • - CoxpresDbDataframeAccessor
      • - Add class and slots etc
      • - add validity test over the enclosed dataframe
      • - implement get_gene_ids for DataframeAccessor
      • - implement get_all_coex_partners for DataframeAccessor
  • - CoxpresDbAccessor(db, ...) function should decide which class to construct based on the input
    • - if db is a dataframe
      • return a CoxpresDbDataframeAccessor that internally stores the dataframe
      • note: if the user has presummarised the db into a single file, they should read it in and pass the dataframe
    • - if db is an archive
      • return a CoxpresDbArchiveAccessor
      • use current code present in CoxpresDbImporter
  • - Rename import_all_coex_partners as get_all_coex_partners - this should work for any type of CoxpresDbAccessor
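
The data-frame half of the plan above could be sketched roughly as follows. This is not the code merged in the fix, only an illustration: the slot name `df`, the argument order of the generics, and the error raised for non-data-frame input are all assumptions, and the archive branch is left out because it would reuse the existing CoxpresDbImporter code.

```r
library(methods)

# Virtual parent: holds no data, only anchors the shared generics
setClass("CoxpresDbAccessor", representation("VIRTUAL"))

# Subclass wrapping an in-memory data frame (slot name `df` is an assumption)
setClass(
  "CoxpresDbDataframeAccessor",
  contains = "CoxpresDbAccessor",
  slots = c(df = "data.frame"),
  validity = function(object) {
    required <- c("source_id", "target_id", "mutual_rank")
    missing_cols <- setdiff(required, colnames(object@df))
    if (length(missing_cols) > 0) {
      return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
    }
    TRUE
  }
)

# Generics shared by every accessor subclass (signatures are assumptions)
setGeneric("get_gene_ids", function(x, ...) standardGeneric("get_gene_ids"))
setGeneric(
  "get_all_coex_partners",
  function(x, gene_ids, ...) standardGeneric("get_all_coex_partners")
)

setMethod("get_gene_ids", "CoxpresDbDataframeAccessor", function(x, ...) {
  sort(unique(x@df$source_id))
})

setMethod(
  "get_all_coex_partners", "CoxpresDbDataframeAccessor",
  function(x, gene_ids, ...) {
    # Keep only rows for the requested genes, and only the three agreed columns
    keep <- x@df$source_id %in% gene_ids
    x@df[keep, c("source_id", "target_id", "mutual_rank"), drop = FALSE]
  }
)

# Constructor: picks the subclass from the type of `db`
CoxpresDbAccessor <- function(db, ...) {
  if (is.data.frame(db)) {
    return(new("CoxpresDbDataframeAccessor", df = db))
  }
  # An archive path would return a CoxpresDbArchiveAccessor here
  stop("only the data-frame branch is sketched here")
}

# Toy usage with made-up gene ids
coex_df <- data.frame(
  source_id = c("gene_a", "gene_a", "gene_b"),
  target_id = c("gene_b", "gene_c", "gene_a"),
  mutual_rank = c(1.5, 3.0, 1.5),
  stringsAsFactors = FALSE
)
accessor <- CoxpresDbAccessor(coex_df)
gene_a_partners <- get_all_coex_partners(accessor, "gene_a")
```

Because the validity function runs when `new()` is given slot values, a data frame lacking any of the three columns is rejected at construction time rather than at the first query.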

Fixed in #22