Sbp 364 dataset by vtnphan · Pull Request #51 · AustralianBioCommons/sbp-backend

vtnphan · 2026-05-28T05:44:32Z

Pull Request

Summary

SBP-XXX Add interaction screening dataset upload endpoint. This introduces a dedicated samplesheet builder and API endpoint for the interaction screening workflow, which requires a structured CSV with sequence file paths and group assignments (query/target → g1/g2).

Changes

New endpoint POST /datasets/interaction-screening/upload in routes/workflows.py — accepts a list of sequences with id/group and a runId, creates a named Seqera dataset, and uploads the generated samplesheet
New service function upload_interaction_screening_dataset() in services/datasets.py — builds the samplesheet CSV (id, sequence, group, type columns) and uploads it to Seqera; sequence file paths are resolved under /g/data/yz52/sbp-service/input/interaction_screening/<unique-run-path>/; groups map query → g1, target → g2
New helper build_unique_dataset_name() in services/datasets.py — generates a timestamped, slug-safe name with a 4-char random suffix; replaces the previous millisecond-timestamp approach
New schemas SequenceItem (id: str, group: Literal["query", "target"]) and InteractionScreeningDatasetUploadRequest (sequences, runId) in schemas/workflows.py
Breaking DatasetUploadRequest — removed datasetName and datasetDescription fields; dataset naming is now handled automatically by build_unique_dataset_name
Breaking create_seqera_dataset() signature changed from (name: str | None, description: str | None) to (name: str = "dataset"); description is no longer sent to the Seqera API
Security /datasets/upload now requires require_workflow_execution_role (previously unprotected)
Tests added coverage for the new service function, route, and schemas in test_services_datasets.py, test_additional_coverage.py, and test_schemas.py

How to Test

Start the backend locally with valid Seqera env vars (SEQERA_API_URL, SEQERA_ACCESS_TOKEN, WORK_SPACE)
Call the new endpoint:

curl -X POST http://localhost:8000/api/workflows/datasets/interaction-screening/upload \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "runId": "my-test-run",
    "sequences": [
      {"id": "seq_A", "group": "query"},
      {"id": "seq_B", "group": "target"}
    ]
  }'

Verify the response contains a datasetId and "success": true
In Seqera Platform, confirm a dataset was created with a uniquely named samplesheet CSV containing the correct g1/g2 groups and file paths under interaction_screening/
Run tests:

.venv/bin/python -m pytest tests/test_services_datasets.py tests/test_additional_coverage.py tests/test_schemas.py -v

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have added tests that prove my fix is effective or that my feature works
I have added or updated documentation where necessary
I have run linting and unit tests locally
The code follows the project's style guidelines

upload endpoint

marius-mather

Looks good, just a few suggestions for making the code a bit cleaner

marius-mather · 2026-05-28T22:42:16Z

+
+    await asyncio.sleep(2)
+
+    sequences = [{"id": s.id, "group": s.group} for s in payload.sequences]


You shouldn't need to convert this to dicts, just keep it as the pydantic type. That will make the type clearer on upload_interaction_screening_dataset too

marius-mather · 2026-05-28T22:43:00Z

-        dataset = await create_seqera_dataset(
-            name=payload.datasetName, description=payload.datasetDescription
-        )
+        dataset = await create_seqera_dataset()


supply a name explicitly, even if it's just dataset

marius-mather · 2026-05-28T22:44:11Z

+@router.post(
+    "/datasets/upload",
+    response_model=DatasetUploadResponse,
+    dependencies=[Depends(require_workflow_execution_role)],


Do all the endpoints under workflows require the workflow execution role? If so, apply it to the router

marius-mather · 2026-05-28T22:46:07Z

    return str(value)


+def build_unique_dataset_name(name: str) -> str:


Add a docstring with an example of what the name looks like

marius-mather · 2026-05-28T22:52:44Z

+        for s in sequences
+    ]
+
+    output = io.StringIO()


It's a good habit to call output.close() after you're done with the file (or you can use with io.iStringIO() as output: to automatically close the file after the with: block

vtnphan added 8 commits May 28, 2026 11:55

feat: multi-row dataset upload

65da782

fix: add interaction screening dataset

1a1360d

upload endpoint

feat: add build unique name for dataset

5208d77

fix: fix DatasetUploadRequest model

676ab48

fix: fix unique path

3c29662

fix: remove dataset description

5f948a9

fix: group number

b351aac

fix: add interaction screening dataset tests

3364238

vtnphan marked this pull request as ready for review May 28, 2026 06:54

vtnphan requested a review from marius-mather May 28, 2026 06:54

marius-mather requested changes May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sbp 364 dataset#51

Sbp 364 dataset#51
vtnphan wants to merge 8 commits into
mainfrom
sbp-364-dataset

vtnphan commented May 28, 2026 •

edited

Loading

Uh oh!

marius-mather left a comment

Uh oh!

marius-mather May 28, 2026

Uh oh!

marius-mather May 28, 2026

Uh oh!

marius-mather May 28, 2026

Uh oh!

marius-mather May 28, 2026

Uh oh!

marius-mather May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		await asyncio.sleep(2)

		sequences = [{"id": s.id, "group": s.group} for s in payload.sequences]

		return str(value)


		def build_unique_dataset_name(name: str) -> str:

Conversation

vtnphan commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Summary

Changes

How to Test

Type of change

Checklist

Uh oh!

marius-mather left a comment

Choose a reason for hiding this comment

Uh oh!

marius-mather May 28, 2026

Choose a reason for hiding this comment

Uh oh!

marius-mather May 28, 2026

Choose a reason for hiding this comment

Uh oh!

marius-mather May 28, 2026

Choose a reason for hiding this comment

Uh oh!

marius-mather May 28, 2026

Choose a reason for hiding this comment

Uh oh!

marius-mather May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vtnphan commented May 28, 2026 •

edited

Loading