[Visualisation] Baysor-like output
Hi @EliHei2! Hope you've been well.
I've been testing segger-dev on an in-house Xenium dataset for a head-to-head comparison with Baysor.
The call below produces the final output, but it does not include any polygon masks.
segment(
    model,
    dm,
    save_dir='benchmarks',
    seg_tag='segger_embedding_1001',
    transcript_file='transcripts.parquet',
    receptive_field=receptive_field,
    min_transcripts=5,
    cell_id_col='segger_cell_id',
    use_cc=False,
    knn_method='cuda',
    verbose=True,
)
Is there a quick way to convert the current segger output into polygon masks like the ones Baysor provides?
[segmentation_borders.html]
Any guidance would be much appreciated!
J
Hey @jpark27, thanks for reaching out. You can use the boundary module in segger.prediction as follows to generate non-convex cell boundaries; it's the same algorithm implemented by Baysor.
from functools import partial

import dask.dataframe as dd
import geopandas as gpd
from pqdm.processes import pqdm  # or use pqdm.threads for threading-based parallelism
from segger.prediction.boundary import generate_boundary

# Load the segger output and compute it to a pandas DataFrame,
# so that the groups below can be iterated one by one
df = dd.read_parquet('path/to/segger_transcripts.parquet').compute()


# Works on a single (cell_id, transcripts) group so it can be used with pqdm
def process_group(group, x="x_location", y="y_location"):
    cell_id, t = group
    return {
        "cell_id": cell_id,
        "length": len(t),
        "geom": generate_boundary(t, x=x, y=y),
    }


def generate_boundaries(df, x="x_location", y="y_location", cell_id="segger_cell_id", n_jobs=10):
    # Group by cell_id (rows with a missing cell_id are dropped by groupby)
    groups = list(df.groupby(cell_id))
    # Use pqdm to process each group in parallel; it shows its own progress bar
    results = pqdm(groups, partial(process_group, x=x, y=y), n_jobs=n_jobs)
    # Convert results to GeoDataFrame
    return gpd.GeoDataFrame(
        data=[[res["cell_id"], res["length"]] for res in results],
        geometry=[res["geom"] for res in results],
        columns=["cell_id", "length"],
    )


bb = generate_boundaries(df, x="x_location", y="y_location", cell_id="segger_cell_id", n_jobs=8)
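If the end goal is a polygon file that other tools can open, the grouping-and-outline idea can also be sketched without any geospatial dependencies. Note the assumptions: this uses a plain convex hull (Andrew's monotone chain) as a stand-in, not the non-convex algorithm in `generate_boundary`; the helper names `convex_hull` and `cells_to_geojson` are hypothetical; and the GeoJSON FeatureCollection layout is a generic choice, not necessarily Baysor's exact schema.

```python
import json
from collections import defaultdict


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def cells_to_geojson(transcripts, min_transcripts=3):
    """transcripts: iterable of (cell_id, x, y); a None/NaN cell_id means unassigned."""
    by_cell = defaultdict(list)
    for cell_id, x, y in transcripts:
        if cell_id is None or cell_id != cell_id:  # skip None and NaN ids
            continue
        by_cell[cell_id].append((float(x), float(y)))

    features = []
    for cell_id, pts in by_cell.items():
        hull = convex_hull(pts)
        if len(pts) < min_transcripts or len(hull) < 3:
            continue  # too few or collinear points to form a polygon
        ring = [list(p) for p in hull + hull[:1]]  # GeoJSON rings are closed
        features.append({
            "type": "Feature",
            "properties": {"cell_id": cell_id, "n_transcripts": len(pts)},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


# Toy input: one square cell with an interior point, one singleton, one unassigned
rows = [("a", 0, 0), ("a", 2, 0), ("a", 2, 2), ("a", 0, 2), ("a", 1, 1),
        ("b", 5, 5), (None, 9, 9)]
polygons = cells_to_geojson(rows)
geojson_text = json.dumps(polygons)
```

With a transcripts table in pandas, the rows could be built with `zip(df["segger_cell_id"], df["x_location"], df["y_location"])`, and `geojson_text` written to a `.json` file for viewing.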
Hi @EliHei2! Many thanks for the suggestion.
I followed your script and ran into the following error, which I haven't been able to resolve yet. Have you seen it before with the current version of the segger-dev output?
Hi @EliHei2, I am running the "Introduction to Segger" notebook from the Tutorial section with the Xenium example data, and I hit the exact same issue as @jpark27 when trying to plot the boundaries. I checked the columns of the segger_transcripts.parquet file (loaded with the pandas read_parquet function), and the columns 'score', 'segger_cell_id' and 'bound' had either NaN or None values in every row I checked, including rows that have an assigned cell ID. Could this be the cause of the issue? If so, is something going wrong in the segment function so that it does not write the segger_cell_id values into the parquet file?
Thanks!
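To tell apart "some transcripts are unassigned" from "nothing was written at all", it may help to measure the missing fraction over the whole column rather than spot-checking rows. The sketch below is only a diagnostic under assumptions: `summarize_assignment` is a hypothetical helper, the column names are the ones mentioned above, and the small table stands in for the real file.

```python
import pandas as pd


def summarize_assignment(df, columns=("score", "segger_cell_id", "bound")):
    """Return, per column, the fraction of rows that are missing (NaN or None)."""
    report = {}
    for col in columns:
        if col in df.columns:
            report[col] = float(df[col].isna().mean())
        else:
            report[col] = None  # column is absent from the table
    return report


# Tiny stand-in table; in practice something like:
#   df = pd.read_parquet("path/to/segger_transcripts.parquet")
df = pd.DataFrame({
    "segger_cell_id": ["c1", None, "c2", None],
    "score": [0.9, float("nan"), 0.7, float("nan")],
})
report = summarize_assignment(df)
```

A value of 1.0 for `segger_cell_id` would mean that no transcript carries an assignment, in which case grouping by that column yields no cells and no boundaries can be drawn.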