feat: add SegTraQ QC module #129
Open
priyalT wants to merge 18 commits into nf-core:dev from priyalT:add-segtraq-baseline
Commits:
3731432 feat: add SegTraQ baseline QC module (priyalT)
771dbe6 fix: trailing whitespace, newline, and workflow name mismatch (priyalT)
0ae7d00 fix: update test snapshots for new segtraq config params (priyalT)
1fe10de fix: trailing newlines and restore opt_flip_track_stat (priyalT)
91f650d fix: update image_mode test snapshot hash (priyalT)
8f58fdd feat: add SegTraQ clustering stability module (priyalT)
a4ae654 feat: add SegTraQ region similarity module and module-level toggles (priyalT)
c13cd09 fix: trailing whitespaces (priyalT)
f80f99e fix: nf-core test errors (priyalT)
331dea1 fix: lint errors (priyalT)
6110dcb Update modules/local/segtraq/region_similarity/main.nf (priyalT)
4c56e1f Update modules/local/segtraq/clustering_stability/main.nf (priyalT)
4aea8da feat: add SegTraQ region similarity and 3D volume module (priyalT)
44554e9 feat: add supervise module for QC (priyalT)
be6fa8b update: snapshots (priyalT)
332f760 feat: add point statistics module (priyalT)
cb12690 update: nf-test snapshot (priyalT)
a440d87 feat: integrate SegTraQ point statistics and plotting (priyalT)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| 3.12.8 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| # Dockerfile to create container with segtraq | ||
| # SegTraQ: A Python toolkit for quantitative and visual quality control | ||
| # of segmentation and transcript assignment in spatial omics data. | ||
| # https://github.com/LazDaria/SegTraQ | ||
|
|
||
| FROM python:3.12-slim | ||
|
|
||
| LABEL authors="Priyal Tripathi" \ | ||
| description="Docker image containing SegTraQ and its dependencies for segmentation QC" | ||
|
|
||
| # Set environment variables | ||
| ENV PYTHONUNBUFFERED=1 | ||
| ENV NUMBA_CACHE_DIR='tmp' | ||
| ENV MPLCONFIGDIR='tmp/matplotlib' | ||
| ENV XDG_CACHE_HOME='tmp' | ||
|
|
||
| # Install system dependencies required by geopandas, rasterio, rtree, igraph | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| build-essential \ | ||
| libgeos-dev \ | ||
| libgdal-dev \ | ||
| libspatialindex-dev \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Set the working directory | ||
| WORKDIR /app | ||
|
|
||
| # Install segtraq (pinned) and its key dependencies; specifiers are quoted so '>' is not parsed as a shell redirection | ||
| RUN pip install --no-cache-dir \ | ||
| segtraq==0.0.3 \ | ||
| "spatialdata>=0.7.2" \ | ||
| "spatialdata-io>=0.1.4" \ | ||
| "anndata>=0.12" \ | ||
| scanpy \ | ||
| "squidpy>=1.6.2" \ | ||
| geopandas \ | ||
| igraph \ | ||
| rtree \ | ||
| rasterio \ | ||
| "ovrlpy>=1.1.0" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| process SEGTRAQ_BASELINE { | ||
| tag "${meta.id}" | ||
| label 'process_medium' | ||
|
|
||
| container "quay.io/priyal_tripathi/segtraq:0.0.3" | ||
|
|
||
| input: | ||
| tuple val(meta), path(spatialdata_zarr) | ||
|
|
||
| output: | ||
| tuple val(meta), path("segtraq_qc/${prefix}/"), emit: qc_results | ||
| path ("versions.yml") , emit: versions | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| // Exit if running this module with -profile conda / -profile mamba | ||
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
| error("SEGTRAQ_BASELINE module does not support Conda. Please use Docker / Singularity / Podman instead.") | ||
| } | ||
|
|
||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| template 'baseline.py' | ||
|
|
||
| stub: | ||
| // Exit if running this module with -profile conda / -profile mamba | ||
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
| error("SEGTRAQ_BASELINE module does not support Conda. Please use Docker / Singularity / Podman instead.") | ||
| } | ||
|
|
||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| """ | ||
| mkdir -p "segtraq_qc/${prefix}" | ||
| touch "segtraq_qc/${prefix}/baseline_summary.json" | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| segtraq: \$(pip show segtraq | grep Version | cut -d' ' -f2) | ||
| END_VERSIONS | ||
| """ | ||
| } |
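
For readers less familiar with Groovy, the Conda guard in the `script:` and `stub:` blocks above is a set-intersection test on the comma-separated `-profile` string. A minimal Python sketch of the same logic follows; `uses_conda` is a hypothetical helper name used only for illustration and is not part of the module.

```python
def uses_conda(profile: str) -> bool:
    """Return True if a comma-separated profile string names conda or mamba."""
    # Mirrors: workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1
    # Empty tokens are dropped, as Groovy's tokenize() does.
    active = {name for name in profile.split(",") if name}
    return bool(active & {"conda", "mamba"})


print(uses_conda("docker,test"))  # False
print(uses_conda("test,mamba"))   # True
```

The match is on whole profile names, so a profile called `miniconda` would not trigger the guard.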
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/yaml-schema.json | ||
| name: segtraq_baseline | ||
| description: Run SegTraQ baseline QC metrics on a SpatialData object to assess | ||
| segmentation quality including cell counts, unassigned transcripts, transcripts/genes | ||
| per cell, transcript density, morphological features, and more. | ||
| keywords: | ||
| - segtraq | ||
| - quality control | ||
| - segmentation | ||
| - spatial transcriptomics | ||
| - baseline metrics | ||
| tools: | ||
| - custom: | ||
| description: SegTraQ - A Python toolkit for quantitative and visual quality | ||
| control of segmentation and transcript assignment in spatial omics data. | ||
| homepage: https://github.com/LazDaria/SegTraQ | ||
| documentation: https://lazdaria.github.io/SegTraQ | ||
| licence: ["MIT"] | ||
| input: | ||
| - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. [ id:'sample' ] | ||
| - spatialdata_zarr: | ||
| type: directory | ||
| description: | | ||
| Path to a SpatialData .zarr directory containing the spatial omics | ||
| data with segmentation results. | ||
| pattern: "*.zarr" | ||
| output: | ||
| - qc_results: | ||
| - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. [ id:'sample' ] | ||
| - "segtraq_qc/${prefix}/": | ||
| type: directory | ||
| description: | | ||
| Directory containing SegTraQ baseline QC results including | ||
| baseline_summary.json with cell counts, transcript/gene stats, | ||
| and morphological features. | ||
| pattern: "segtraq_qc/*/" | ||
| - versions: | ||
| type: file | ||
| description: File containing software versions | ||
| pattern: "versions.yml" | ||
| authors: | ||
| - "@priyaltripathi" | ||
| maintainers: | ||
| - "@priyaltripathi" |
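
A downstream consumer might load the summary described in this meta file roughly as follows. This is a sketch only: `load_baseline_summary` is a hypothetical helper, the sample prefix and cell count are made up for the demo, and the `num_cells` key is taken from the template script in this PR rather than from a documented schema.

```python
import json
import tempfile
from pathlib import Path


def load_baseline_summary(results_dir: str, prefix: str) -> dict:
    """Load segtraq_qc/<prefix>/baseline_summary.json from a results directory."""
    summary_path = Path(results_dir) / "segtraq_qc" / prefix / "baseline_summary.json"
    with summary_path.open() as handle:
        return json.load(handle)


# Demo against a throwaway directory that mimics the module's output layout.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "segtraq_qc" / "sample1"
    out_dir.mkdir(parents=True)
    (out_dir / "baseline_summary.json").write_text(json.dumps({"num_cells": 1200}))
    summary = load_baseline_summary(tmp, "sample1")

print(summary["num_cells"])  # 1200
```

Note that the stub only touches `baseline_summary.json`, so a stub run produces an empty file that `json.load` would reject.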
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| #!/usr/bin/env python | ||
|
|
||
| """Compute baseline statistics on spatialdata object for QC.""" | ||
|
|
||
| import os | ||
| import segtraq | ||
| import spatialdata as sd | ||
| import json | ||
| import subprocess | ||
|
|
||
| def main(): | ||
| print("[START] SegTraQ Baseline QC") | ||
| input_path = "${spatialdata_zarr}" | ||
| prefix = "${prefix}" | ||
| centroid_x_key = "${params.segtraq_centroid_x_key}" | ||
| centroid_y_key = "${params.segtraq_centroid_y_key}" | ||
| output_dir = f"segtraq_qc/{prefix}" | ||
| os.makedirs(output_dir, exist_ok=True) | ||
|
|
||
| # Read the SpatialData object | ||
| print(f"[INFO] Reading SpatialData object from: {input_path}") | ||
| sdata = sd.read_zarr(input_path) | ||
|
|
||
| # Initialize the SegTraQ object; unset params arrive as placeholder strings and are mapped to None | ||
| cx_key = centroid_x_key if centroid_x_key not in ("null", "", "None") else None | ||
| cy_key = centroid_y_key if centroid_y_key not in ("null", "", "None") else None | ||
| print("[INFO] Initializing SegTraQ object") | ||
| st = segtraq.SegTraQ( | ||
| sdata, | ||
| images_key=None, | ||
| tables_area_key=None, | ||
| points_background_id=0, | ||
| tables_centroid_x_key=cx_key, | ||
| tables_centroid_y_key=cy_key, | ||
| ) | ||
|
|
||
| print("[INFO] Computing baseline QC metrics") | ||
| summary = {} | ||
|
|
||
| #number of cells | ||
| n_cells = st.bl.num_cells() | ||
| summary["num_cells"] = int(n_cells) | ||
| print(f" num_cells: {n_cells}") | ||
|
|
||
| #number of transcripts | ||
| n_transcripts = st.bl.num_transcripts() | ||
| summary["num_transcripts"] = int(n_transcripts) | ||
| print(f" num_transcripts: {n_transcripts}") | ||
|
|
||
| #number of genes | ||
| n_genes = st.bl.num_genes() | ||
| summary["num_genes"] = int(n_genes) | ||
| print(f" num_genes: {n_genes}") | ||
|
|
||
| # Percentage of unassigned transcripts (stored as returned; int() would truncate it) | ||
| percentage_unassgn_transcripts = st.bl.perc_unassigned_transcripts() | ||
| summary["percent_unassigned_transcripts"] = percentage_unassgn_transcripts | ||
| print(f" percent_unassigned_transcripts: {percentage_unassgn_transcripts}") | ||
|
|
||
| # Unassigned transcripts per gene | ||
| unassgn_transcripts_per_gene = st.bl.perc_unassigned_transcripts_per_gene() | ||
| summary["unassigned_transcripts_per_gene"] = unassgn_transcripts_per_gene | ||
| print(f" unassigned_transcripts_per_gene: {unassgn_transcripts_per_gene}") | ||
|
|
||
| # Transcripts per cell | ||
| transcripts_per_cell = st.bl.transcripts_per_cell() | ||
| summary["transcripts_per_cell"] = transcripts_per_cell | ||
| print(f" transcripts_per_cell: {transcripts_per_cell}") | ||
|
|
||
| # Genes per cell | ||
| genes_per_cell = st.bl.genes_per_cell() | ||
| summary["genes_per_cell"] = genes_per_cell | ||
| print(f" genes_per_cell: {genes_per_cell}") | ||
|
|
||
| # Transcript density | ||
| transcript_density = st.bl.transcript_density() | ||
| summary["transcript_density"] = transcript_density | ||
| print(f" transcript_density: {transcript_density}") | ||
|
|
||
| # Mean transcripts per gene per cell | ||
| mean_transcripts_per_gene_per_cell = st.bl.mean_transcripts_per_gene_per_cell() | ||
| summary["mean_transcripts_per_gene_per_cell"] = mean_transcripts_per_gene_per_cell | ||
| print(f" mean_transcripts_per_gene_per_cell: {mean_transcripts_per_gene_per_cell}") | ||
|
|
||
| # Morphological features | ||
| morpho_features = st.bl.morphological_features() | ||
| summary["morpho_features"] = morpho_features | ||
| print(f" morpho_features: {morpho_features}") | ||
|
|
||
| # Write the summary. The return types of the per-gene and per-cell metrics are not verified here (assumption), | ||
| # so pandas-like objects (to_dict) and NumPy values (tolist) are converted at dump time instead of via int(). | ||
| with open(f"{output_dir}/baseline_summary.json", "w") as f: | ||
| json.dump( | ||
| summary, | ||
| f, | ||
| indent=2, | ||
| default=lambda o: o.to_dict() if hasattr(o, "to_dict") else o.tolist() if hasattr(o, "tolist") else str(o), | ||
| ) | ||
| print(f"[INFO] Summary written to {output_dir}/baseline_summary.json") | ||
|
|
||
| version = subprocess.check_output( | ||
| ["pip", "show", "segtraq"], text=True | ||
| ) | ||
| segtraq_version = [l for l in version.splitlines() if l.startswith("Version:")][0].split(": ")[1] | ||
|
|
||
| with open("versions.yml", "w") as f: | ||
| f.write('"${task.process}":\n') | ||
| f.write(f' segtraq: "{segtraq_version}"\n') | ||
| f.write(f' spatialdata: "{sd.__version__}"\n') | ||
| print("[FINISH] SegTraQ Baseline QC") | ||
|
|
||
| if __name__ == "__main__": | ||
| main() |
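
The centroid keys in the template above arrive as strings because Nextflow interpolates `params` into the script before Python runs, so an unset parameter shows up as the text `null`. A minimal sketch of that normalization as a reusable function; `normalize_optional` and `UNSET_MARKERS` are hypothetical names, not part of SegTraQ or this PR.

```python
from typing import Optional

# String forms an unset parameter can take after template interpolation.
UNSET_MARKERS = ("null", "", "None")


def normalize_optional(raw: str) -> Optional[str]:
    """Map placeholder strings to None and pass real keys through unchanged."""
    return None if raw in UNSET_MARKERS else raw


print(normalize_optional("null"))        # None
print(normalize_optional("centroid_x"))  # centroid_x
```

Keeping the check in one place would let the x and y keys, and any later optional parameter, share the same list of placeholders.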
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| process SEGTRAQ_CLUSTERING_STABILITY { | ||
| tag "${meta.id}" | ||
| label 'process_medium' | ||
|
|
||
| container "quay.io/priyal_tripathi/segtraq:0.0.3" | ||
|
|
||
| input: | ||
| tuple val(meta), path(spatialdata_zarr) | ||
|
|
||
| output: | ||
| tuple val(meta), path("segtraq_qc/${prefix}/"), emit: qc_results | ||
| path("versions.yml") , emit: versions | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| // Exit if running this module with -profile conda / -profile mamba | ||
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
| error("SEGTRAQ_CLUSTERING_STABILITY module does not support Conda. Please use Docker / Singularity / Podman instead.") | ||
| } | ||
|
|
||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| template('clustering_stability.py') | ||
|
|
||
| stub: | ||
| // Exit if running this module with -profile conda / -profile mamba | ||
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
| error("SEGTRAQ_CLUSTERING_STABILITY module does not support Conda. Please use Docker / Singularity / Podman instead.") | ||
| } | ||
|
|
||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| """ | ||
| mkdir -p "segtraq_qc/${prefix}" | ||
| touch "segtraq_qc/${prefix}/clustering_stability_summary.json" | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| segtraq: \$(pip show segtraq | grep Version | cut -d' ' -f2) | ||
| END_VERSIONS | ||
| """ | ||
| } | ||
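
The stub's `pip show segtraq | grep Version | cut -d' ' -f2` pipeline pulls the value from the `Version:` line of the `pip show` output. The same extraction in Python is sketched below on a hard-coded sample string rather than real `pip` output; `parse_pip_version` is a hypothetical helper name.

```python
def parse_pip_version(pip_show_output: str) -> str:
    """Return the value of the 'Version:' line from `pip show` style text."""
    for line in pip_show_output.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no 'Version:' line found")


# Illustrative sample text, not captured from an actual run.
sample = "Name: segtraq\nVersion: 0.0.3\nSummary: example text"
print(parse_pip_version(sample))  # 0.0.3
```

When the package is installed in the same interpreter, the standard-library call `importlib.metadata.version("segtraq")` would avoid the subprocess entirely.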
Just write
template 'clustering_stability.py'