Sbp 364 dataset upload endpoint #51
marius-mather left a comment
Looks good, just a few suggestions for making the code a bit cleaner.
await asyncio.sleep(2)
sequences = [{"id": s.id, "group": s.group} for s in payload.sequences]
You shouldn't need to convert these to dicts; just keep them as the pydantic type. That will also make the parameter type clearer on `upload_interaction_screening_dataset`.
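A minimal sketch of the suggestion, assuming `SequenceItem` has the `id` and `group` fields listed in the PR summary. A stdlib dataclass stands in for the pydantic model so the snippet runs without dependencies, and `group_codes_by_id` is a hypothetical helper (not code from this PR) that uses the `query` → `g1`, `target` → `g2` mapping described below. The point is that the function takes `list[SequenceItem]` and uses attribute access, so the route can pass `payload.sequences` through unchanged.

```python
from dataclasses import dataclass
from typing import Literal

# Group mapping as described in the PR summary.
GROUP_CODES = {"query": "g1", "target": "g2"}


# Stand-in for the pydantic SequenceItem model (assumed fields: id, group).
@dataclass(frozen=True)
class SequenceItem:
    id: str
    group: Literal["query", "target"]


def group_codes_by_id(sequences: list[SequenceItem]) -> dict[str, str]:
    """Hypothetical helper: map each sequence id to its samplesheet group code.

    Typing the parameter as list[SequenceItem] instead of list[dict] lets
    editors and type checkers verify the .id / .group attribute access.
    """
    return {s.id: GROUP_CODES[s.group] for s in sequences}


# In the route, pass the validated models straight through:
#     codes = group_codes_by_id(payload.sequences)
items = [SequenceItem(id="seq1", group="query"), SequenceItem(id="seq2", group="target")]
print(group_codes_by_id(items))  # {'seq1': 'g1', 'seq2': 'g2'}
```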
dataset = await create_seqera_dataset(
    name=payload.datasetName, description=payload.datasetDescription
)
dataset = await create_seqera_dataset()
Supply a name explicitly, even if it's just `"dataset"`.
@router.post(
    "/datasets/upload",
    response_model=DatasetUploadResponse,
    dependencies=[Depends(require_workflow_execution_role)],
Do all the endpoints under `workflows` require the workflow execution role? If so, apply the dependency to the router instead of to each endpoint.
return str(value)
def build_unique_dataset_name(name: str) -> str:
Add a docstring with an example of what the generated name looks like.
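One possible shape for the requested docstring, together with a sketch implementation. The PR summary only says the name is timestamped, slug-safe and carries a 4-character random suffix, so the slug rules, the UTC timestamp format, the `-` separator and the `"dataset"` fallback below are assumptions, not the code in this PR.

```python
import re
import secrets
import string
from datetime import datetime, timezone


def build_unique_dataset_name(name: str) -> str:
    """Return a slug-safe dataset name with a timestamp and random suffix.

    Example (illustrative; timestamp and suffix vary per call):
        build_unique_dataset_name("Interaction Screening")
        -> "interaction-screening-20240131-142501-a7k2"
    """
    # Lower-case and collapse each run of non-alphanumerics into one "-".
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "dataset"
    # Sortable UTC timestamp, to the second.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    # Random 4-char suffix makes same-second collisions unlikely.
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(4))
    return f"{slug}-{stamp}-{suffix}"
```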
    for s in sequences
]
output = io.StringIO()
It's a good habit to call `output.close()` when you're done with the buffer (or use `with io.StringIO() as output:` to close it automatically at the end of the `with` block).
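A sketch of the suggested pattern, assuming the samplesheet is written with the standard `csv` module (the PR's writer code is not shown in this excerpt). `getvalue()` has to be called inside the `with` block: once the buffer is closed, it raises `ValueError`. `rows_to_csv` is a hypothetical helper name, and `lineterminator="\n"` is a choice made here (the `csv` default is `\r\n`), not something the PR specifies.

```python
import csv
import io


def rows_to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialise rows to CSV text; the buffer is closed when the block exits."""
    with io.StringIO() as output:
        writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        # Read the contents before the with-block closes the buffer.
        return output.getvalue()


print(rows_to_csv([{"id": "s1", "group": "g1"}], ["id", "group"]), end="")
```

For the samplesheet, `fieldnames` would be `["id", "sequence", "group", "type"]`, per the PR summary.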
Pull Request
Summary
SBP-XXX Add interaction screening dataset upload endpoint. This introduces a dedicated samplesheet builder and API endpoint for the interaction screening workflow, which requires a structured CSV with sequence file paths and group assignments (`query`/`target` → `g1`/`g2`).

Changes
- `POST /datasets/interaction-screening/upload` in `routes/workflows.py`: accepts a list of sequences with `id`/`group` and a `runId`, creates a named Seqera dataset, and uploads the generated samplesheet.
- `upload_interaction_screening_dataset()` in `services/datasets.py`: builds the samplesheet CSV (`id`, `sequence`, `group`, `type` columns) and uploads it to Seqera. Sequence file paths are resolved under `/g/data/yz52/sbp-service/input/interaction_screening/<unique-run-path>/`; groups map `query` → `g1` and `target` → `g2`.
- `build_unique_dataset_name()` in `services/datasets.py`: generates a timestamped, slug-safe name with a 4-char random suffix, replacing the previous millisecond-timestamp approach.
- `SequenceItem` (`id: str`, `group: Literal["query", "target"]`) and `InteractionScreeningDatasetUploadRequest` (`sequences`, `runId`) in `schemas/workflows.py`.
- `DatasetUploadRequest`: removed the `datasetName` and `datasetDescription` fields; dataset naming is now handled automatically by `build_unique_dataset_name`.
- `create_seqera_dataset()`: signature changed from `(name: str | None, description: str | None)` to `(name: str = "dataset")`; `description` is no longer sent to the Seqera API.
- `/datasets/upload` now requires `require_workflow_execution_role` (previously unprotected).
- Tests: `test_services_datasets.py`, `test_additional_coverage.py`, and `test_schemas.py`.

How to Test
- Start the backend locally with valid Seqera env vars (`SEQERA_API_URL`, `SEQERA_ACCESS_TOKEN`, `WORK_SPACE`).
- Call the new endpoint:
- Verify the response contains a `datasetId` and `"success": true`.
- In Seqera Platform, confirm a dataset was created with a uniquely named samplesheet CSV containing the correct `g1`/`g2` groups and file paths under `interaction_screening/`.
- Run tests:
Type of change
Checklist