scverse/anndataR

obs_names and var_names as character/str or not

Opened this issue · 3 comments

During the hackathon, I think it was mentioned that obs_names and var_names might not be restricted to arrays of strings, and that other dtypes would also be supported. This would cause an interoperability issue in other languages -- at least in R, since standard data frames do not support non-string row names.

This has resulted in a somewhat clunky approach to storing obs_names and var_names in an anndataR::AnnData object: we can't assume that the names can simply be attached to any of the slots (X, var, obs, ...), because doing so would throw a conversion warning and lose information.

We should figure out what the planned roadmap for this functionality is (probably related to scverse/anndata#777?), and whether there is a different way of resolving this in R, because not being able to add any dimnames to X or rownames to obs and var is quite cumbersome.

A strategy following https://anndata.readthedocs.io/en/latest/fileformat-prose.html#dataframe-specification-v0-2-0 might define obs_attrs(), with obs_attrs()[["_name"]] holding the column name of the index. obs would then return a data.frame that includes the named column. Is it actually a problem not to have row names? This is the norm in the tidyverse world.
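To make the idea concrete, here is a minimal, language-agnostic sketch, written in Python only for brevity. It is not anndataR code: the `Obs` class, its `names()` method, and the `cell_id` column are hypothetical, and the `"_name"` key simply follows the wording above. The point is that the names live in an ordinary column, so their dtype survives.

```python
# Hypothetical sketch; none of these names are part of the anndataR API.
class Obs:
    def __init__(self, columns, index_column):
        # columns: dict mapping column name -> list of values (equal lengths)
        self.columns = columns
        # attrs plays the role of obs_attrs(); "_name" holds the index column's name
        self.attrs = {"_name": index_column}

    def names(self):
        # Names are read from a regular column, so no coercion to strings happens
        return self.columns[self.attrs["_name"]]


obs = Obs({"cell_id": [101, 102, 103], "cluster": ["a", "b", "a"]}, "cell_id")
print(obs.names())
```

With this layout nothing ever has to go through row names, which is the tidyverse-style arrangement mentioned above.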

Perhaps we would implement (on AbstractAnnData) a single-square-bracket subset method ad[cidx, ridx] that creates a subset / view based on the corresponding row / column index.
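A rough sketch of what such a subset method could look like, again in Python for brevity and under stated assumptions: `TinyAnnData` is a hypothetical stand-in for an in-memory container, selection here is by name rather than by position, and it returns a copy rather than a view.

```python
class TinyAnnData:
    """Hypothetical in-memory container; not the anndataR class."""

    def __init__(self, X, obs_names, var_names):
        self.X = X                        # list of rows, one per observation
        self.obs_names = list(obs_names)  # names may be of any dtype
        self.var_names = list(var_names)

    def __getitem__(self, key):
        obs_sel, var_sel = key
        # Resolve the requested names to positions using the separately stored names
        rows = [self.obs_names.index(n) for n in obs_sel]
        cols = [self.var_names.index(n) for n in var_sel]
        X = [[self.X[r][c] for c in cols] for r in rows]
        return TinyAnnData(X, obs_sel, var_sel)


ad = TinyAnnData([[1, 2], [3, 4], [5, 6]], obs_names=[10, 20, 30], var_names=["g1", "g2"])
sub = ad[[30, 10], ["g2"]]
print(sub.X, sub.obs_names, sub.var_names)
```

Because the lookup goes through the stored names and not through dimnames on X, integer names work here without any coercion.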

I think it should be possible to support most things by storing names separately. Indexing is maybe only an issue for the in-memory backend anyway; I don't think we want to try to implement indexing on the file-backed backends. Conversion to R objects might be more difficult, because as soon as we put things into colnames/rownames they will be coerced to characters.
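The coercion concern can be illustrated with a small sketch (Python for brevity; `to_character` and `round_trips` are hypothetical helpers that only mimic what happens when names are pushed into rownames/colnames):

```python
def to_character(names):
    # Mimics putting names into rownames/colnames: every value becomes a string
    return [str(n) for n in names]


def round_trips(names):
    # True only when every name was already a string, so coercion loses nothing
    return all(isinstance(n, str) for n in names)


int_names = [1, 2, 3]
exported = to_character(int_names)
print(exported, round_trips(int_names))

# Names that differ only by dtype collapse into duplicates after coercion
mixed = to_character([1, "1"])
print(mixed)
```

A check along the lines of `round_trips()` at conversion time would let the converter decide whether to warn or to keep the original names in a separate slot.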

Maybe @ivirshup can give us an idea of how soon this might happen (and how much we need to worry about it now)?

It's not imminent. At soonest, my guess would be late this year. But I can ping maintainers of this library ahead of that.