rewrite to work with a collapsed all-gene database
Closed this issue · 5 comments
russHyde commented
In the current version, the coxpresdb files look like:
# file for target gene_a
gene_b MR_ab COR_ab
gene_c MR_ac COR_ac
gene_d MR_ad COR_ad
...
This leads to inefficient sampling: for each gene sampled, you have to re-read its coexpression data.
It would be more efficient to read in the data for all genes at one time from a single file.
The file should look like:
gene_a gene_b MR_ab COR_ab
gene_a gene_c MR_ac COR_ac
gene_a gene_d MR_ad COR_ad
...
gene_b ...
...
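One way the per-gene files could be collapsed into that single table is sketched below. It is only a rough sketch: the helper name `collapse_coex_files`, the tab-separated layout, the absence of a header row, and the idea that each file is named after its target gene are all assumptions, not details confirmed in this thread.

```r
# Hypothetical helper: collapse per-gene coexpression files into one table.
# Assumptions (not confirmed above): files are tab-separated, have no header
# row, and each file is named after the gene it describes.
collapse_coex_files <- function(file_paths) {
  per_gene <- lapply(file_paths, function(path) {
    partners <- utils::read.table(
      path,
      sep = "\t", header = FALSE, comment.char = "#",
      col.names = c("target_id", "mutual_rank", "correlation"),
      colClasses = c("character", "numeric", "numeric"),
      stringsAsFactors = FALSE
    )
    # Take the source gene from the file name (assumption).
    source_id <- tools::file_path_sans_ext(basename(path))
    data.frame(
      source_id = rep(source_id, nrow(partners)),
      partners,
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-gene tables so each row reads: source, target, MR, COR.
  do.call(rbind, per_gene)
}
```

Calling this once and saving the result would let later sampling steps work from a single in-memory table rather than re-reading one file per gene.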
russHyde commented
Ditch the correlation column
russHyde commented
The CoxpresDbImporter object should return a dataframe with columns (source_id, target_id, mutual_rank) for each gene requested.
- should the user pass a data-frame or a filepath containing the data-frame?
russHyde commented
Also should be able to use a data-frame when making a CoxpresDbImporter; perhaps it should be called CoxpresDbAccessor?
russHyde commented
Plan:
- rename CoxpresDbImporter as CoxpresDbAccessor
- add a proper class definition for CoxpresDbAccessor (rather than just calling methods::new())
- add subclasses
  - CoxpresDbArchiveAccessor
    - Move slots from Accessor to ArchiveAccessor
    - IO produces an ArchiveAccessor
    - get_file_paths_[for_gene] should be specific to ArchiveAccessor
    - get_[raw|uncompressed]_archive should be specific to ArchiveAccessor
  - CoxpresDbDataframeAccessor
    - Add class and slots etc
    - add validity test over the enclosed dataframe
    - implement get_gene_ids for DataframeAccessor
    - implement get_all_coex_partners for DataframeAccessor
- CoxpresDbAccessor(db, ...) function should decide which class is dispatched based on the input
  - if db is a dataframe
    - return a CoxpresDbDataframeAccessor that internally stores the dataframe
    - note if the user has presummarised the db into a single file, they should read-in and pass the dataframe
  - if db is an archive
    - return a CoxpresDbArchiveAccessor
    - use current code present in CoxpresDbImporter
- Rename import_all_coex_partners as get_all_coex_partners
  - this should work for any type of CoxpresDbAccessor
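The plan above could be sketched in S4 roughly as follows. This is a minimal illustration, not the package's actual code: the slot names `db_archive` and `df`, the generic signatures, and the rule that a single character string is treated as an archive path are all assumptions, and the archive-specific methods are left out.

```r
library(methods)

# Virtual parent class: only the subclasses can be instantiated.
setClass("CoxpresDbAccessor", representation("VIRTUAL"))

# Archive-backed accessor; the slot name `db_archive` is an assumption.
setClass(
  "CoxpresDbArchiveAccessor",
  contains = "CoxpresDbAccessor",
  slots = c(db_archive = "character")
)

# Dataframe-backed accessor with a validity test over the enclosed dataframe.
setClass(
  "CoxpresDbDataframeAccessor",
  contains = "CoxpresDbAccessor",
  slots = c(df = "data.frame"),
  validity = function(object) {
    required <- c("source_id", "target_id", "mutual_rank")
    missing_cols <- setdiff(required, colnames(object@df))
    if (length(missing_cols) > 0) {
      return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
    }
    TRUE
  }
)

# Constructor: pick the subclass from the type of `db`.
CoxpresDbAccessor <- function(db, ...) {
  if (is.data.frame(db)) {
    methods::new("CoxpresDbDataframeAccessor", df = db)
  } else if (is.character(db) && length(db) == 1) {
    # Assumption: a single string is the path to an archive.
    methods::new("CoxpresDbArchiveAccessor", db_archive = db)
  } else {
    stop("`db` should be a data frame or a single archive path")
  }
}

# Generics shared by every accessor type (signatures are assumptions).
setGeneric("get_gene_ids", function(x, ...) standardGeneric("get_gene_ids"))
setGeneric(
  "get_all_coex_partners",
  function(x, gene_ids, ...) standardGeneric("get_all_coex_partners")
)

setMethod("get_gene_ids", "CoxpresDbDataframeAccessor", function(x, ...) {
  unique(x@df$source_id)
})

# Returns only (source_id, target_id, mutual_rank) for the requested genes,
# so any correlation column in the stored dataframe is dropped.
setMethod(
  "get_all_coex_partners", "CoxpresDbDataframeAccessor",
  function(x, gene_ids, ...) {
    keep <- x@df$source_id %in% gene_ids
    x@df[keep, c("source_id", "target_id", "mutual_rank"), drop = FALSE]
  }
)
```

Keeping the parent class virtual means callers always go through `CoxpresDbAccessor(db, ...)`, and code written against the two generics should not need to know which backing store is in use.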