scverse/anndataR

Coordinating with `scKirby`


Hi there @rcannood, I was just alerted by @mtmorgan about the anndataR project. Really excited to see further development in this vein, as I'm a huge fan of anndata, both in its R and Python incarnations.

I wanted to make you aware that I've independently been working on an R package called scKirby, which aims to automate reading/converting/saving single-cell data from/to any format. It's still under active development, but with regard to anndata I currently use the conversion functions from sceasy and zellkonverter.

Some notes on interfacing with conda/python:

  • A tricky aspect of using the anndata R package is getting it to work with reticulate. To help with this, I use another R package I've been working on called echoconda to ensure the right conda env is activated.
  • That said, I find basilisk much easier to use, and I really like how it is implemented within zellkonverter. So I would definitely support any movement towards using basilisk over reticulate.

Additional scKirby features that may be of interest:

  • Automated file type inference: scKirby does a lot of automatic inference of file types, so that users don't have to specify which reader function to use. It just figures it out and uses the appropriate function to get the single-cell data into R. The goal is to make it so that people don't have to become experts in every single-cell format out there to use the data (regardless of what format it happens to be provided in).
  • Intra-species data aggregation: scKirby::map_data builds upon my other package orthogene to enable standardisation of gene names (or aggregate transcript IDs to gene-level data). This currently supports several single-cell formats including anndata.
  • Inter-species data conversion: scKirby::map_data also lets you easily convert single-cell objects between species, by taking advantage of the various ortholog mapping strategies provided via orthogene (e.g. 1:1, many:1, 1:many, many:many). I use this quite a bit in my comparative work.
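To make the file type inference idea concrete, here is a minimal Python sketch of suffix-based dispatch. It is not scKirby's implementation: the suffix table and the reader names (`read_h5ad`, `read_mtx`, etc.) are hypothetical placeholders, and a real tool would probably also need to inspect file contents (e.g. to tell apart different HDF5-based formats).

```python
from pathlib import Path

# Hypothetical table: file suffix -> name of the reader that should handle it.
READERS = {
    ".h5ad": "read_h5ad",
    ".zarr": "read_zarr",
    ".loom": "read_loom",
    ".rds": "read_rds",
    ".mtx": "read_mtx",
    ".mtx.gz": "read_mtx",
}


def infer_reader(path):
    """Return the reader name for `path`, preferring the longest known suffix."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Try compound suffixes first (".mtx.gz" before ".gz"), then shorter ones.
    for i in range(len(suffixes)):
        candidate = "".join(suffixes[i:])
        if candidate in READERS:
            return READERS[candidate]
    raise ValueError(f"Cannot infer a reader for {path!r}")


print(infer_reader("pbmc3k.v2.h5ad"))  # read_h5ad
print(infer_reader("matrix.mtx.gz"))   # read_mtx
```

Checking the longest compound suffix first keeps versioned names like `pbmc3k.v2.h5ad` and compressed files like `matrix.mtx.gz` from being misclassified.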

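The many:1 case mentioned above (e.g. collapsing transcript IDs to gene-level data, or several genes onto one ortholog) amounts to combining all rows that map to the same target ID. Below is a minimal Python sketch, assuming a plain dict of per-cell counts and summation as the aggregation rule. `aggregate_many_to_one` and the toy IDs are hypothetical and do not reflect the scKirby or orthogene (R) APIs, which may offer other aggregation methods.

```python
def aggregate_many_to_one(counts, mapping):
    """Sum per-cell counts of source IDs that share a target ID.

    counts:  dict of source ID -> list of per-cell counts (equal lengths)
    mapping: dict of source ID -> target ID (e.g. transcript -> gene)
    Source IDs missing from `mapping` are dropped in this sketch.
    """
    aggregated = {}
    for source_id, values in counts.items():
        target_id = mapping.get(source_id)
        if target_id is None:
            continue  # unmapped feature: drop it
        if target_id not in aggregated:
            aggregated[target_id] = list(values)
        else:
            # Element-wise sum across cells for rows mapping to the same target.
            aggregated[target_id] = [a + b for a, b in zip(aggregated[target_id], values)]
    return aggregated


# Toy data: 4 transcripts x 3 cells; tx4 has no gene mapping.
counts = {
    "tx1": [1, 0, 3],
    "tx2": [2, 5, 0],
    "tx3": [0, 1, 1],
    "tx4": [4, 4, 4],
}
tx2gene = {"tx1": "GENE_A", "tx2": "GENE_A", "tx3": "GENE_B"}
print(aggregate_many_to_one(counts, tx2gene))
# {'GENE_A': [3, 5, 3], 'GENE_B': [0, 1, 1]}
```

A 1:1 mapping is the degenerate case where every target receives exactly one row; the 1:many and many:many strategies additionally need a rule for splitting or duplicating a source row, which this sketch does not cover.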
Let me know if you have any questions or suggestions for improvements on my end. PRs to any of my packages are always welcome as well!

All the best,
Brian

Hey Brian!

Thanks for the detailed post! It's clear you already put a lot of thought into ensuring people can convert from one file format to another ☺️

It sounds like a major endeavour, supporting the conversion between all these file formats. Is there some documentation on what conversions between file formats are working, and which fields/slots mappings (from-to) are supported? Do you have plans for unit testing between file formats in a pairwise manner?

We'd be happy to give scKirby a shot. If we succeed in our cause of building a native H5AD and H5Zarr reader in R, this could represent an easier way of reading in h5ad and h5zarr files in scKirby/sceasy. That is, if we manage to read/write h5ad files reliably without any discrepancies between the R and Python implementations ;)

Would you want to join one of our upcoming meetings to discuss how we could collaborate?

Kind regards,
Robrecht

You have probably heard this from Martin/Robrecht already but just in case anyone else comes across this thread. The goal of this new {anndataR} is to consolidate the R AnnData interfaces we have already worked on, along with conversion from those interfaces to common R objects (SingleCellExperiment/Seurat). Most of the work is around building native R H5AD/Zarr readers/writers but we will also have support for an in-memory representation (and maybe {reticulate}). Once this works we should try to replace the existing implementations in {zellkonverter}, {anndata}, {MuData} etc. so that we aren't duplicating things.

I haven't looked too much at what you have done already but hopefully that can be incorporated somehow and if there is anything you can contribute here that would also be awesome.

@rcannood

> It sounds like a major endeavour, supporting the conversion between all these file formats. Is there some documentation on what conversions between file formats are working, and which fields/slots mappings (from-to) are supported?

One thing on my to-do list is to get the docs website back up. I'll try to make that a priority now that I'm not the only one using it!

> Do you have plans for unit testing between file formats in a pairwise manner?

This is definitely the plan! Currently I only have Roxygen examples for each of the conversion functions, but I'm working on proper unit tests for all of them.
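One way to structure the pairwise tests discussed here is to loop over every ordered pair of formats, push a small fixture through "read as format A, write as format B, read back", and compare the result with the original. A minimal Python sketch follows; JSON and CSV merely stand in for real single-cell formats (H5AD, SingleCellExperiment, Seurat, ...), and all function names are hypothetical rather than part of scKirby or anndataR.

```python
import csv
import io
import json
from itertools import permutations


# Toy stand-in "formats": each has a writer (dict -> str) and a reader (str -> dict).
def write_json(data):
    return json.dumps(data)


def read_json(text):
    return json.loads(text)


def write_csv(data):
    buf = io.StringIO()
    writer = csv.writer(buf)
    for feature, values in data.items():
        writer.writerow([feature, *values])
    return buf.getvalue()


def read_csv(text):
    return {row[0]: [int(v) for v in row[1:]] for row in csv.reader(io.StringIO(text))}


FORMATS = {"json": (write_json, read_json), "csv": (write_csv, read_csv)}


def check_pairwise(data):
    """Return the (source, target) format pairs whose round trip changes `data`."""
    failures = []
    for src, dst in permutations(FORMATS, 2):
        write_src, read_src = FORMATS[src]
        write_dst, read_dst = FORMATS[dst]
        # Load from the source format, convert to the target format, load again.
        converted = read_dst(write_dst(read_src(write_src(data))))
        if converted != data:
            failures.append((src, dst))
    return failures


fixture = {"gene1": [1, 0, 3], "gene2": [2, 5, 0]}
print(check_pairwise(fixture))  # [] means every pair round-trips losslessly
```

Reporting the failing pairs, rather than stopping at the first mismatch, doubles as the kind of "which conversions work" table asked about above.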

> We'd be happy to give scKirby a shot. If we succeed in our cause of building a native H5AD and H5Zarr reader in R, this could represent an easier way of reading in h5ad and h5zarr files in scKirby/sceasy. If we manage to read/write h5ad files reliably without any discrepancies between the R and Python implementations ;)
>
> Would you want to join one of our upcoming meetings to discuss how we could collaborate?

That would be amazing, I'm definitely keen to collaborate on this and synchronize our efforts! Definitely loop me into your next meeting: brian.schilder [at] alumni.brown.edu

@lazappi Thanks so much for the recap on this, I think this will be extremely valuable to the bioinf community! If you like, I can give you all a tour of what I've done so far at one of your meetings.