Connect Sopa and Segger
Here are the tasks to connect segger with sopa. FYI @LucaMarconato
1. Raw data to SpatialData
This is handled by spatialdata-io, and does not directly concern sopa or segger
2. SpatialData to segger input
Three options:
- add a function inside sopa to create inputs for segger
- add a function in segger that takes as input a dask dataframe (points) and a geopandas dataframe (cells). This way, SpatialData is not a dependency of segger, and Sopa makes the link between the two.
- add a function in segger to read from a zarr store representing a SpatialData object
I think option 2 is the best. For instance, Sopa stores a SpatialData object on disk; it could then call the segger function below, passing only the objects that segger needs (already in the right coordinate system).
# inside segger
import dask.dataframe as dd
import geopandas as gpd

def from_spatialdata(transcripts: dd.DataFrame, cells: gpd.GeoDataFrame):
    ...  # the transcripts and cells provided by Sopa should live in the same coordinate system

# inside Sopa
import segger
from spatialdata import SpatialData

def segger_segmentation(sdata: SpatialData, points_key: str, shapes_key: str):
    segger.from_spatialdata(sdata[points_key], sdata[shapes_key])
    ...
The toy dataset from Sopa might be useful to work on this:
import sopa
sdata = sopa.io.uniform()
sdata.write("toy_data.zarr") # store the SpatialData object on disk
sdata["transcripts"] # dask dataframe with columns "x", "y", "z", "genes"
sdata["cells"] # geopandas dataframe, with column "geometry"
# shapes can be read with geopandas, without using spatialdata
import geopandas as gpd
gpd.read_parquet(sdata.path / "shapes" / "cells" / "shapes.parquet")
3. Segger output to Sopa
We will need two things:
- a column in the dask dataframe containing the cell ID (this is already done in segger). Then, sopa can create a cell-by-gene table based on this
- the shapes as boundaries. Ideas: use the same approach as Baysor, or use alphashape
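To make the two outputs above concrete, here is a minimal sketch on an in-memory pandas dataframe. It is only an illustration, not what sopa or segger actually do: the column name `cell_id` and both helper names are hypothetical, and the boundaries use a plain convex hull (via shapely) as a simpler stand-in for the Baysor-like or alphashape approaches mentioned above.

```python
import pandas as pd
from shapely.geometry import MultiPoint


def cell_by_gene(transcripts: pd.DataFrame, cell_key: str = "cell_id", gene_key: str = "genes") -> pd.DataFrame:
    """Count transcripts per (cell, gene): rows are cells, columns are genes."""
    # Transcripts without a cell ID are considered unassigned and are dropped
    assigned = transcripts.dropna(subset=[cell_key])
    return pd.crosstab(assigned[cell_key], assigned[gene_key])


def convex_hull_boundaries(transcripts: pd.DataFrame, cell_key: str = "cell_id", min_transcripts: int = 3) -> dict:
    """Build one polygon per cell as the convex hull of its transcripts (stand-in for alphashape)."""
    boundaries = {}
    assigned = transcripts.dropna(subset=[cell_key])
    for cell_id, group in assigned.groupby(cell_key):
        if len(group) < min_transcripts:
            continue  # too few points to form a polygon
        hull = MultiPoint(list(zip(group["x"], group["y"]))).convex_hull
        if hull.geom_type == "Polygon":  # skip degenerate (collinear) cells
            boundaries[cell_id] = hull
    return boundaries


# Toy transcripts: "x", "y", "genes" as in the Sopa toy dataset, plus a hypothetical "cell_id" column
toy = pd.DataFrame({
    "x": [0.0, 1.0, 0.0, 1.0, 5.0, 6.0, 5.0, 9.0],
    "y": [0.0, 0.0, 1.0, 1.0, 5.0, 5.0, 6.0, 9.0],
    "genes": ["A", "A", "B", "B", "A", "B", "B", "A"],
    "cell_id": ["c1", "c1", "c1", "c1", "c2", "c2", "c2", None],
})

counts = cell_by_gene(toy)
boundaries = convex_hull_boundaries(toy)
```

The resulting dict of polygons could be wrapped in a geopandas dataframe with a "geometry" column to match the shapes format shown earlier. A convex hull overestimates non-convex cells, which is why alphashape or the Baysor approach may be preferable in practice.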
Thanks @quentinblampey for the explanation, I agree that plan 2 is the shortest path forward and should be pretty quick to implement.
#20 will be a continuation of this: once there is a bridge to SpatialData, we have the bridge to sopa. @LucaMarconato, @rukhovich, @andrewmoorman and @quentinblampey: let's continue the discussion there. I'm closing this one.