feat: native TSV output - bypass mzIdentML for OpenMS/Percolator pipelines #9
Open
Write search results directly to TSV from in-memory objects, bypassing mzIdentML serialization. Output is column-identical to MzIDToTsv (verified by diff on a test.mgf search). This avoids generating large .mzid files when only TSV is needed downstream (e.g. OpenMS MSGFPlusAdapter, Percolator).

- New DirectTSVWriter class with the same score/protein/mod logic as MZIdentMLGen, but streaming tab-delimited output
- New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both
- Includes fixed + variable mods, MGF Title column, decoy filtering
- Backwards compatible: default remains mzid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
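A minimal sketch of how the three -outputFormat values could be parsed and turned into output decisions. The names OutputFormat, fromArg, writeMzid() and writeTsv() are illustrative assumptions modeled on the SearchParams helpers named in the PR description, not the actual MS-GF+ code; accepting both numeric codes and names is also an assumption.

```java
import java.util.Locale;

public class OutputFormatDemo {

    // Hypothetical enum mirroring "-outputFormat 0=mzid (default), 1=tsv, 2=both".
    enum OutputFormat {
        MZID(0), TSV(1), BOTH(2);

        final int code;

        OutputFormat(int code) {
            this.code = code;
        }

        // Accepts the numeric code ("0", "1", "2") or the name ("mzid", "tsv", "both"), case-insensitively.
        static OutputFormat fromArg(String arg) {
            String s = arg.trim().toLowerCase(Locale.ROOT);
            for (OutputFormat f : values()) {
                if (s.equals(Integer.toString(f.code)) || s.equals(f.name().toLowerCase(Locale.ROOT))) {
                    return f;
                }
            }
            throw new IllegalArgumentException("Unknown -outputFormat value: " + arg);
        }

        // mzid is written for MZID and BOTH.
        boolean writeMzid() {
            return this != TSV;
        }

        // TSV is written for TSV and BOTH.
        boolean writeTsv() {
            return this != MZID;
        }
    }

    public static void main(String[] args) {
        // With no argument, fall back to MZID so the default behaviour is unchanged.
        OutputFormat format = args.length > 0 ? OutputFormat.fromArg(args[0]) : OutputFormat.MZID;
        System.out.println(format + ": mzid=" + format.writeMzid() + ", tsv=" + format.writeTsv());
    }
}
```

Deriving both booleans from a single enum keeps MZID as the default and means BOTH can never drift out of sync with the individual flags.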
When -addFeatures 1 is used with -outputFormat tsv, the TSV now includes all PSMFeatureFinder columns needed for Percolator: ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio, MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency, NumMatchedMainIons, and all error statistics (MeanError/StdevError for All and Top7, both absolute and relative).

These features were previously available only as UserParams in the mzid and were not extracted by OpenMS's addMSGFFeatures(); they are now directly accessible as TSV columns.

The peptide modification format (M+15.995) is already compatible with OpenMS MSGFPlusAdapter's modifySequence_() converter, which transforms it into the bracket notation M[+15.995] used by AASequence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
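For illustration, the M+15.995 → M[+15.995] conversion can be approximated with a single regular expression. This is a hypothetical re-implementation, not OpenMS's modifySequence_(): it assumes every delta is a signed decimal, ignores flanking residues, and may not match how OpenMS actually places N-terminal or stacked modifications.

```java
import java.util.regex.Pattern;

public class ModNotation {
    // A signed decimal mass delta as it appears inline, e.g. "+15.995" or "-17.027".
    private static final Pattern DELTA = Pattern.compile("[+-]\\d+(?:\\.\\d+)?");

    // Wraps every inline delta in square brackets: "M+15.995" -> "M[+15.995]".
    // "$0" in the replacement refers to the whole match; the brackets are literal.
    static String toBracketNotation(String peptide) {
        return DELTA.matcher(peptide).replaceAll("[$0]");
    }

    public static void main(String[] args) {
        for (String p : new String[] {"PEPM+15.995TIDEK", "C+57.021PEPTIDEK", "PEPTIDEK"}) {
            System.out.println(p + " -> " + toBracketNotation(p));
        }
    }
}
```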
Summary
- New `-outputFormat` parameter (`mzid`, `tsv`, `both`) to write search results directly to TSV from in-memory objects, bypassing mzIdentML XML serialization and the JAXB round-trip
- When `-addFeatures 1` is used, 15 Percolator feature columns (ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio, MS2IonCurrent, NumMatchedMainIons, error statistics) are appended to the TSV
- The peptide modification format (`M+15.995`) is already compatible with OpenMS `MSGFPlusAdapter.modifySequence_()`, so zero changes are needed in OpenMS
Motivation
In quantms/OpenMS, MS-GF+ results flow through:
1. MS-GF+ writes .mzid (JAXB XML serialization)
2. MzIDToTsv reads the mzid back via the JAXB unmarshaller → .tsv
3. MSGFPlusAdapter reads the TSV → idXML

With `-outputFormat tsv`, steps 1 and 2 are eliminated entirely: no JAXB object graph, no XML serialization, no multi-GB mzid files.
TSV columns (base)
TSV columns (with `-addFeatures 1`)
All base columns plus:
Files changed
- `DirectTSVWriter.java`: streams TSV rows from in-memory search results
- `MSGFPlus.java`: output format branching after Q-value computation
- `ParamManager.java`: new `-outputFormat` parameter (enum: mzid/tsv/both)
- `SearchParams.java`: `outputFormat` field, `writeMzid()`/`writeTsv()` helpers
Verification
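The zero-difference check against MzIDToTsv output can also be scripted in Java. This sketch compares two TSV files line by line and reports the first mismatching line; TsvCompare and firstDifference are hypothetical names, not part of this PR.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class TsvCompare {
    // Returns -1 if both files contain identical lines, otherwise the 1-based
    // number of the first differing line (a missing line counts as a difference).
    static int firstDifference(Path a, Path b) throws IOException {
        List<String> left = Files.readAllLines(a, StandardCharsets.UTF_8);
        List<String> right = Files.readAllLines(b, StandardCharsets.UTF_8);
        int n = Math.max(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            String l = i < left.size() ? left.get(i) : null;
            String r = i < right.size() ? right.get(i) : null;
            if (l == null || !l.equals(r)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TsvCompare <direct.tsv> <mzidtotsv.tsv>");
            return;
        }
        int diff = firstDifference(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(diff < 0 ? "identical" : "first difference at line " + diff);
    }
}
```

Because Files.readAllLines drops line terminators, this is a line-level rather than byte-for-byte comparison; on Java 12 or later, Files.mismatch(a, b) gives a strict byte-level check.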
- `diff` between direct TSV and MzIDToTsv output: zero differences (byte-for-byte identical on test.mgf with 818 PSMs)
- mzid
Test plan
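The test-plan item checking that all 15 Percolator feature columns are populated could be automated with a small header-driven check. FeatureColumnCheck and problems are hypothetical names; the sketch assumes the first line is a tab-delimited header and treats an empty cell as a feature that was not written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FeatureColumnCheck {
    // Returns a description for every required column that is missing from the
    // header or empty in at least one data row (reporting the first such row).
    static List<String> problems(List<String> lines, List<String> required) {
        List<String> bad = new ArrayList<>();
        if (lines.isEmpty()) {
            bad.addAll(required);
            return bad;
        }
        // limit -1 keeps trailing empty cells instead of discarding them
        List<String> header = Arrays.asList(lines.get(0).split("\t", -1));
        for (String col : required) {
            int idx = header.indexOf(col);
            if (idx < 0) {
                bad.add(col + " (missing)");
                continue;
            }
            for (int i = 1; i < lines.size(); i++) {
                String[] cells = lines.get(i).split("\t", -1);
                if (idx >= cells.length || cells[idx].isEmpty()) {
                    bad.add(col + " (empty at line " + (i + 1) + ")");
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> tsv = Arrays.asList(
            "Peptide\tExplainedIonCurrentRatio\tNumMatchedMainIons",
            "PEPTIDEK\t0.42\t7",
            "PEPM+15.995TIDEK\t\t5");
        System.out.println(problems(tsv, Arrays.asList("ExplainedIonCurrentRatio", "NumMatchedMainIons")));
    }
}
```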
- `mvn test`: 119 tests, 0 failures
- `-outputFormat 2` (both): mzid + TSV written correctly
- `-addFeatures 1 -outputFormat 1`: all 15 Percolator feature columns populated
🤖 Generated with Claude Code