feat: native TSV output — bypass mzIdentML for OpenMS/Percolator pipelines#9

Open
ypriverol wants to merge 2 commits into dev from feature/native-tsv-output

@ypriverol
Member

Summary

  • Add -outputFormat parameter (mzid, tsv, both) to write search results directly to TSV from in-memory objects, bypassing mzIdentML XML serialization + JAXB round-trip
  • When -addFeatures 1 is used, 15 Percolator feature columns (ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio, MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency, NumMatchedMainIons, and eight error statistics) are appended to the TSV
  • Peptide modification format (M+15.995) is already compatible with OpenMS MSGFPlusAdapter.modifySequence_() — zero changes needed in OpenMS

Motivation

In quantms/OpenMS, MS-GF+ results flow through:

  1. MS-GF+ writes .mzid (JAXB XML serialization)
  2. MzIDToTsv reads mzid back via JAXB unmarshaller → .tsv
  3. OpenMS MSGFPlusAdapter reads the TSV → idXML

With -outputFormat tsv, steps 1+2 are eliminated entirely. No JAXB object graph, no XML serialization, no multi-GB mzid files.

TSV columns (base)

#SpecFile  SpecID  ScanNum  [Title]  FragMethod  Precursor  IsotopeError
PrecursorError(ppm|Da)  Charge  Peptide  Protein  DeNovoScore  MSGFScore
SpecEValue  EValue  [QValue  PepQValue]
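A minimal sketch of how the base header above could be assembled, assuming the bracketed columns are optional and that PrecursorError(ppm|Da) means one of the two units is emitted. The class name, method name, and the three boolean flags are hypothetical and are not taken from DirectTSVWriter.

```java
import java.util.ArrayList;
import java.util.List;

class TsvHeaderSketch {
    /**
     * Builds the tab-delimited base header.
     * The flags are illustrative assumptions: hasTitle for spectra that carry a
     * title (e.g. MGF), errorInPpm for the precursor tolerance unit, and
     * hasQValues for searches where Q-values were computed.
     */
    static String baseHeader(boolean hasTitle, boolean errorInPpm, boolean hasQValues) {
        List<String> cols = new ArrayList<>();
        cols.add("#SpecFile");
        cols.add("SpecID");
        cols.add("ScanNum");
        if (hasTitle) {
            cols.add("Title");
        }
        cols.add("FragMethod");
        cols.add("Precursor");
        cols.add("IsotopeError");
        cols.add(errorInPpm ? "PrecursorError(ppm)" : "PrecursorError(Da)");
        cols.add("Charge");
        cols.add("Peptide");
        cols.add("Protein");
        cols.add("DeNovoScore");
        cols.add("MSGFScore");
        cols.add("SpecEValue");
        cols.add("EValue");
        if (hasQValues) {
            cols.add("QValue");
            cols.add("PepQValue");
        }
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        // Prints the widest variant of the header: all optional columns present.
        System.out.println(baseHeader(true, true, true));
    }
}
```

Keeping the optional columns behind explicit flags makes it easy to see why the column count varies between 14 and 17 depending on the input and search settings.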

TSV columns (with -addFeatures 1)

All base columns plus:

ExplainedIonCurrentRatio  NTermIonCurrentRatio  CTermIonCurrentRatio
MS2IonCurrent  MS1IonCurrent  IsolationWindowEfficiency
NumMatchedMainIons  MeanErrorAll  StdevErrorAll  MeanErrorTop7
StdevErrorTop7  MeanRelErrorAll  StdevRelErrorAll  MeanRelErrorTop7
StdevRelErrorTop7
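For a downstream consumer, the feature columns listed above can be looked up by header name rather than by position. The following is a rough sketch under two assumptions: every feature value parses as a floating-point number, and the header line is the first line of the file. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class FeatureColumnsSketch {
    // The 15 feature column names as listed in this PR description.
    static final String[] FEATURES = {
        "ExplainedIonCurrentRatio", "NTermIonCurrentRatio", "CTermIonCurrentRatio",
        "MS2IonCurrent", "MS1IonCurrent", "IsolationWindowEfficiency",
        "NumMatchedMainIons", "MeanErrorAll", "StdevErrorAll", "MeanErrorTop7",
        "StdevErrorTop7", "MeanRelErrorAll", "StdevRelErrorAll", "MeanRelErrorTop7",
        "StdevRelErrorTop7"
    };

    /** Maps each column name in the header line to its zero-based position. */
    static Map<String, Integer> indexHeader(String headerLine) {
        Map<String, Integer> index = new HashMap<>();
        String[] names = headerLine.split("\t", -1);
        for (int i = 0; i < names.length; i++) {
            index.put(names[i], i);
        }
        return index;
    }

    /** Extracts the 15 feature values from one data row, in listing order. */
    static Map<String, Double> features(Map<String, Integer> index, String row) {
        String[] cells = row.split("\t", -1);
        Map<String, Double> out = new LinkedHashMap<>();
        for (String name : FEATURES) {
            Integer col = index.get(name);
            if (col == null || col >= cells.length) {
                throw new IllegalArgumentException("Missing feature column: " + name);
            }
            // Assumption: feature cells are numeric.
            out.put(name, Double.parseDouble(cells[col]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up header and row, for illustration only.
        String header = "Peptide\t" + String.join("\t", FEATURES);
        StringBuilder row = new StringBuilder("PEPTIDE");
        for (int i = 0; i < FEATURES.length; i++) {
            row.append('\t').append("0.0");
        }
        System.out.println(features(indexHeader(header), row.toString()).keySet());
    }
}
```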

Files changed

  • New: DirectTSVWriter.java — streams TSV rows from in-memory search results
  • Modified: MSGFPlus.java — output format branching after Q-value computation
  • Modified: ParamManager.java — new -outputFormat parameter (enum: mzid/tsv/both)
  • Modified: SearchParams.java — outputFormat field, writeMzid()/writeTsv() helpers
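As an illustration of the output-format branching, here is a hedged sketch of a three-valued format with writeMzid()/writeTsv() style helpers. It is not the code in ParamManager.java or SearchParams.java: the enum name and the parse method are hypothetical, and the parser accepts both the names (mzid, tsv, both) and the numeric codes (0, 1, 2) only because both spellings appear in this PR.

```java
import java.util.Locale;

enum OutputFormatSketch {
    MZID, TSV, BOTH;

    /** Parses either a name (mzid/tsv/both) or a numeric code (0/1/2). */
    static OutputFormatSketch parse(String value) {
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "0":
            case "mzid":
                return MZID;
            case "1":
            case "tsv":
                return TSV;
            case "2":
            case "both":
                return BOTH;
            default:
                throw new IllegalArgumentException("Unknown -outputFormat value: " + value);
        }
    }

    /** True when an mzIdentML file should be written. */
    boolean writeMzid() {
        return this == MZID || this == BOTH;
    }

    /** True when a TSV file should be written. */
    boolean writeTsv() {
        return this == TSV || this == BOTH;
    }

    public static void main(String[] args) {
        OutputFormatSketch format = parse(args.length > 0 ? args[0] : "mzid");
        System.out.println(format + " mzid=" + format.writeMzid() + " tsv=" + format.writeTsv());
    }
}
```

With two boolean helpers, the caller can branch once per writer instead of switching on the format in several places.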

Verification

  • diff between direct TSV and MzIDToTsv output: zero differences (byte-for-byte identical on test.mgf with 818 PSMs)
  • All 119 tests pass
  • Backwards compatible: default remains mzid
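The byte-for-byte comparison can be reproduced without an external diff tool. The sketch below is one possible check, not the procedure used for this PR; it relies on java.nio.file.Files.mismatch, which requires Java 12 or later, and the class name and command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TsvDiffSketch {
    /**
     * Returns -1 when both files have identical bytes, otherwise the
     * zero-based offset of the first differing byte.
     */
    static long firstDifference(Path direct, Path converted) throws IOException {
        return Files.mismatch(direct, converted);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TsvDiffSketch <direct.tsv> <mzid-to-tsv.tsv>");
            return;
        }
        long offset = firstDifference(Path.of(args[0]), Path.of(args[1]));
        System.out.println(offset < 0 ? "identical" : "first difference at byte " + offset);
    }
}
```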

Test plan

  • mvn test — 119 tests, 0 failures
  • Direct TSV vs MzIDToTsv: identical output verified by diff
  • -outputFormat 2 (both): mzid + TSV written correctly
  • -addFeatures 1 -outputFormat 1: all 15 Percolator feature columns populated
  • Fixed mods (C+57.021), variable mods (M+15.995), MGF Title column all correct
  • OpenMS MSGFPlusAdapter: feed TSV, verify idXML output

🤖 Generated with Claude Code

ypriverol and others added 2 commits April 14, 2026 21:08
Write search results directly to TSV from in-memory objects, bypassing
mzIdentML serialization. Output is column-identical to MzIDToTsv
(verified by diff on test.mgf search). This avoids generating large
.mzid files when only TSV is needed downstream (e.g. OpenMS
MSGFPlusAdapter, Percolator).

- New DirectTSVWriter class with same score/protein/mod logic as
  MZIdentMLGen but streaming tab-delimited output
- New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both
- Includes fixed + variable mods, MGF Title column, decoy filtering
- Backwards compatible: default remains mzid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When -addFeatures 1 is used with -outputFormat tsv, the TSV now
includes all PSMFeatureFinder columns needed for Percolator:
ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio,
MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency,
NumMatchedMainIons, and all error statistics (MeanError/StdevError
for All and Top7, both absolute and relative).

These features were previously only available as UserParams in mzid
and were not extracted by OpenMS's addMSGFFeatures() — now they are
directly accessible as TSV columns.

The peptide modification format (M+15.995) is already compatible with
OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms
it to bracket notation M[+15.995] for AASequence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
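
To make the notation change described in the commit message concrete, here is an illustrative sketch of rewriting M+15.995 as M[+15.995]. It is not the OpenMS modifySequence_() implementation (OpenMS is written in C++); the class name, method name, and regular expression are assumptions, and only the simple case of a signed mass delta embedded in the peptide string is handled.

```java
import java.util.regex.Pattern;

class ModNotationSketch {
    // A signed mass delta such as +15.995 or -17.027 (assumed pattern).
    private static final Pattern MASS_DELTA = Pattern.compile("([+-]\\d+(?:\\.\\d+)?)");

    /** Wraps every signed mass delta in square brackets. */
    static String toBracketNotation(String peptide) {
        return MASS_DELTA.matcher(peptide).replaceAll("[$1]");
    }

    public static void main(String[] args) {
        // Made-up peptide carrying the fixed and variable mods named in the test plan.
        System.out.println(toBracketNotation("AC+57.021DM+15.995K"));
    }
}
```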
@coderabbitai

coderabbitai bot commented Apr 14, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
