EliHei2/segger_dev

Connect Sopa and Segger

Here are the tasks to connect segger with sopa. FYI @LucaMarconato

1. Raw data to SpatialData

This is handled by spatialdata-io and does not directly concern Sopa or segger.

2. SpatialData to segger input

Three options:

  1. Add a function inside Sopa that creates the inputs for segger.
  2. Add a function in segger that takes a dask dataframe (points) and a geopandas dataframe (cells) as input. This way, SpatialData is not a dependency of segger, and Sopa makes the link between the two.
  3. Add a function in segger that reads from a zarr store representing a SpatialData object.

I think option 2 is the best. For instance, Sopa stores a SpatialData object on disk; it could then call the segger function below and pass it the objects segger needs (already in the right coordinate system).

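# inside segger
import dask.dataframe as dd
import geopandas as gpd

def from_spatialdata(transcripts: dd.DataFrame, cells: gpd.GeoDataFrame):
    ...  # the transcripts and cells provided by Sopa should live in the same coordinate system

# inside Sopa
import segger
from spatialdata import SpatialData

# points_key / shapes_key: names of the transcripts and cells elements in sdata
def segger_segmentation(sdata: SpatialData, points_key: str, shapes_key: str):
    segger.from_spatialdata(sdata[points_key], sdata[shapes_key])
    ...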
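
Under option 2, segger receives plain dataframes, so it cannot rely on SpatialData to guarantee that the two inputs are aligned. One cheap sanity check is that the bounding box of the transcripts overlaps the bounding box of the cells. The sketch below is a minimal illustration in plain Python; `bboxes_overlap` and `check_same_coordinate_system` are hypothetical names, not part of segger or Sopa. In practice the boxes could presumably come from `cells.total_bounds` (geopandas) and from the min/max of the transcripts' "x" and "y" columns.

```python
from typing import Tuple

# (min_x, min_y, max_x, max_y), the same order as geopandas' `total_bounds`
BBox = Tuple[float, float, float, float]


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """Return True if the two axis-aligned boxes share at least one point."""
    a_min_x, a_min_y, a_max_x, a_max_y = a
    b_min_x, b_min_y, b_max_x, b_max_y = b
    return (
        a_min_x <= b_max_x and b_min_x <= a_max_x
        and a_min_y <= b_max_y and b_min_y <= a_max_y
    )


def check_same_coordinate_system(transcripts_bbox: BBox, cells_bbox: BBox) -> None:
    """Hypothetical guard: fail early when the inputs clearly do not overlap.

    Overlap is only a heuristic: it cannot prove that both inputs use the
    same coordinate system, it can only detect obvious mismatches.
    """
    if not bboxes_overlap(transcripts_bbox, cells_bbox):
        raise ValueError(
            "transcripts and cells do not overlap; "
            "were they transformed to the same coordinate system?"
        )
```

A guard like this would catch the common mistake of passing transcripts in microns and cells in pixels, at the cost of two cheap reductions over the inputs.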

The toy dataset from Sopa might be useful to work on this:

import sopa

sdata = sopa.io.uniform()
sdata.write("toy_data.zarr") # store the SpatialData object on disk

sdata["transcripts"] # dask dataframe with columns "x", "y", "z", "genes"
sdata["cells"] # geopandas dataframe, with column "geometry"

# shapes can be read with geopandas, without using spatialdata
import geopandas as gpd
gpd.read_parquet(sdata.path / "shapes" / "cells" / "shapes.parquet")

3. Segger output to Sopa

We will need two things:

  • A column in the dask dataframe containing the cell ID (this is already done in segger). Sopa can then create a cell-by-gene table from it.
  • The cell shapes, as boundaries. Ideas: use the same approach as Baysor, or use alphashape.
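
Both outputs can be illustrated on toy data. The sketch below is plain Python with hypothetical helper names (`cell_by_gene`, `convex_hull`); it is not how Sopa, segger, or Baysor implement this. The count table is built from (cell ID, gene) pairs, and the boundary is a convex hull of a cell's transcripts, which is a simpler stand-in for the alpha-shape idea and cannot capture concave cells.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def cell_by_gene(assignments: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count transcripts per (cell, gene) from (cell_id, gene) pairs."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for cell_id, gene in assignments:
        table[cell_id][gene] += 1
    return dict(table)


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # the last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]
```

For real data, the counting would more likely be a groupby on the dask dataframe, and the hull vertices would be wrapped into polygons stored in a geopandas dataframe.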

Thanks @quentinblampey for the explanation. I agree that option 2 is the shortest path forward and should be pretty quick to implement.

#20 will be a continuation of this: once there's a bridge to SpatialData, we have the bridge to Sopa. @LucaMarconato, @rukhovich, @andrewmoorman, and @quentinblampey, let's continue the discussion there. I'm closing this one.