EliHei2/segger_dev

[Visualisation] Baysor-like output

Opened this issue · 3 comments

Hi, @EliHei2! Hope you've been well.

I've been testing segger-dev on an in-house Xenium dataset for a head-to-head comparison with Baysor.
I have the final output shown below, but it is missing any polygon masks.

segment(
    model,
    dm,
    save_dir='benchmarks',
    seg_tag='segger_embedding_1001',
    transcript_file='transcripts.parquet',
    receptive_field=receptive_field,
    min_transcripts=5,
    cell_id_col='segger_cell_id',
    use_cc=False,
    knn_method='cuda',
    verbose=True,
)
image

Is there a quick way to convert the current segger output into polygon masks like the ones Baysor provides?
[segmentation_borders.html]
image

Any guidance would be much appreciated!
J

Hey @jpark27, thanks for reaching out. You can use the boundary module in segger.prediction, as in the following script, to generate non-convex cell boundaries. It's the same algorithm implemented by Baysor.

from segger.prediction.boundary import generate_boundary
import geopandas as gpd
import dask.dataframe as dd
from tqdm import tqdm
from pqdm.processes import pqdm  # or use pqdm.threads for threading-based parallelism

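# Load the transcripts into memory as a pandas DataFrame: the groups are iterated
# below, and a lazy Dask groupby does not support iteration
ddf = dd.read_parquet('path/to/segger_transcripts.parquet').compute()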

# Modify the function to work with a single group to use with pqdm
def process_group(group):
    cell_id, t = group
    return {
        "cell_id": cell_id,
        "length": len(t),
        "geom": generate_boundary(t, x="x_location", y="y_location")
    }

def generate_boundaries(df, x="x_location", y="y_location", cell_id="segger_cell_id", n_jobs=10):
    # Group by cell_id
    group_df = df.groupby(cell_id)
    # Use pqdm to process each group in parallel
    results = pqdm(tqdm(group_df, desc="Processing Groups"), process_group, n_jobs=n_jobs)
    # Convert results to GeoDataFrame
    return gpd.GeoDataFrame(
        data=[[res["cell_id"], res["length"]] for res in results],
        geometry=[res["geom"] for res in results],
        columns=["cell_id", "length"],
    )

bb = generate_boundaries(ddf, x="x_location", y="y_location", cell_id="segger_cell_id", n_jobs=8)
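Before calling generate_boundaries, it may help to drop transcripts that cannot contribute to a polygon. The following is a minimal sketch, not part of segger: it assumes unassigned transcripts carry a null segger_cell_id and that a cell needs at least three transcripts to form a polygon. The helper name keep_polygonizable and the min_points parameter are hypothetical, and the toy table stands in for the real transcripts.

```python
import pandas as pd

def keep_polygonizable(df, cell_id="segger_cell_id", min_points=3):
    """Keep transcripts assigned to cells with at least `min_points` transcripts.

    Hypothetical helper: `min_points=3` is an assumption (a polygon needs at
    least three points), not a documented segger requirement.
    """
    # Drop transcripts with no cell assignment (NaN or None)
    assigned = df[df[cell_id].notna()]
    # Count the transcripts of each cell and broadcast the count back to every row
    per_cell = assigned[cell_id].map(assigned[cell_id].value_counts())
    return assigned[per_cell >= min_points]

# Toy data standing in for segger_transcripts.parquet
toy = pd.DataFrame({
    "segger_cell_id": ["a", "a", "a", "b", None],
    "x_location": [0.0, 1.0, 0.0, 5.0, 9.0],
    "y_location": [0.0, 0.0, 1.0, 5.0, 9.0],
})
kept = keep_polygonizable(toy)
print(kept["segger_cell_id"].unique())
```

If the filtered table comes back empty on real data, that would suggest no transcripts were assigned to any cell, which is worth checking before generating boundaries.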

Hi, @EliHei2! Many thanks for the suggestion.
I followed your attached script and ran into the following error, which I haven't been able to resolve yet. Have you seen it before with output from the current version of segger-dev?

image
image

Hi @EliHei2, I am running the 'Introduction to Segger' tutorial from the Tutorial section with the Xenium example data. I had the exact same issue as @jpark27 when trying to plot the boundaries. I loaded the segger_transcripts.parquet file with the pandas read_parquet function and checked the column values: 'score', 'segger_cell_id' and 'bound' were either NaN or None in every row I checked (including rows with assigned cell_IDs). Could this be the cause of the issue? If so, is something going wrong in the segment function so that the segger_cell_IDs are not written to the parquet file?

Thanks!
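The column check described above can be done over the whole file instead of a sample of rows. The following is a minimal sketch using only pandas; the helper name summarize_missing is hypothetical, the column names are the ones mentioned in this thread, and the toy table stands in for the real file.

```python
import pandas as pd

def summarize_missing(df, columns=("score", "segger_cell_id", "bound")):
    """Report how many values are missing (NaN or None) in the given columns."""
    # Only inspect the columns that actually exist in this table
    present = [c for c in columns if c in df.columns]
    missing = df[present].isna()
    return pd.DataFrame({
        "n_rows": len(df),
        "n_missing": missing.sum(),
        "frac_missing": missing.mean(),
    })

# Toy data; for the real file, something like:
#   df = pd.read_parquet("path/to/segger_transcripts.parquet")
toy = pd.DataFrame({
    "score": [None, 0.9],
    "segger_cell_id": [None, None],
    "x_location": [1.0, 2.0],
})
summary = summarize_missing(toy)
print(summary)
```

A frac_missing of 1.0 for segger_cell_id would mean no transcript received an assignment, in which case there are no groups to build boundaries from.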