Conversation
Remove Numpy version pin; general linting and typing updates
There was a problem hiding this comment.
Pull request overview
This PR introduces IM2Deep 2.0 API, a major refactoring from version 1.2.0 to 2.0.0-beta. The changes modernize the codebase with comprehensive test coverage, refactored architecture using PyTorch Lightning, and an improved API design with better separation of concerns.
Changes:
- Complete API refactoring with new modular structure (core, calibration, model_ops, constants)
- Added comprehensive test suite with 9 test modules covering ~90% of functionality
- Replaced DeepLC-based models with native PyTorch Lightning implementations
- Updated dependencies: removed version constraint on deeplc, added torch and lightning as core dependencies
- Enhanced CLI with better argument handling and profiling support
Reviewed changes
Copilot reviewed 27 out of 36 changed files in this pull request and generated 29 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/* | New comprehensive test suite with fixtures, unit tests, and integration tests |
| pytest.ini | Pytest configuration with markers for integration and slow tests |
| pyproject.toml | Updated dependencies: removed deeplc constraint and er extras, added torch/lightning |
| im2deep/utils.py | Major expansion with input parsing, validation, CCS conversion, and CLI utilities |
| im2deep/_model_ops.py | New file for PyTorch model loading and prediction operations |
| im2deep/core.py | New high-level API with predict() and predict_and_calibrate() functions |
| im2deep/constants.py | New file centralizing model paths, configurations, and physical constants |
| im2deep/calibration.py | Refactored calibration with abstract base class and LinearCCSCalibration implementation |
| im2deep/_architecture.py | New file with PyTorch Lightning model architectures (IM2Deep, IM2DeepMulti, IM2DeepMultiTransfer) |
| im2deep/_exceptions.py | Minor formatting improvements to exception classes |
| im2deep/main.py | Major CLI refactor with DefaultCommandGroup, improved argument handling, and profiling |
| im2deep/init.py | Updated to version 2.0.0-beta with new API exports |
| README.md | Comprehensive documentation update with CLI examples and API usage |
| .github/workflows/test.yml | New CI/CD workflow for automated testing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
RalfG
left a comment
There was a problem hiding this comment.
Some general comments:
- I think we could reconsider making some more modules private or split them into a public and private part (for instance utils.py)
- _architecture.py is quite a long file. Perhaps it could be split up into submodules for loss functions and each of the models?
- Exceptions raised in public functions should be also be public, so they can be caught if needed in downstream applications. (This was also wrong in DeepLC 4.0)
- Are the functions in _model_ops.py still necessary, or could they be replaced with a Lightning Trainer class?
- Use ruff for faster linting, including formatting checks - Use uv for faster installs and builds, including caching - Fix issue where test workflow would be run on 'master' branch, while default branch is 'main'
…optional dependencies to dependency groups
…aining typing issues; add some to-do's.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 37 out of 47 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| from im2deep import _model_ops | ||
| from im2deep.calibration import Calibration, LinearCCSCalibration | ||
| from im2deep.constants import DEFAULT_MODEL, DEFAULT_MULTI_MODEL | ||
| from im2deep.utils import validate_psm_list |
| LOGGER.info("Predicting CCS values using IM2Deep.") | ||
| psm_list = validate_psm_list(psm_list) | ||
| return _model_ops.predict( | ||
| model=model or DEFAULT_MODEL if not multi else DEFAULT_MULTI_MODEL, |
| ) | ||
|
|
||
| # Assign the predicted CCS to the PSM metadata | ||
| for idx, psm in enumerate(psm_list): |
| psm_list_filtered = psm_list[charges != None] # noqa: E711 | ||
| psm_list_filtered = psm_list_filtered[charges <= 6] |
| if ( | ||
| psm.ion_mobility is not None | ||
| and psm.metadata is not None | ||
| and psm.metadata.get("CCS") is None | ||
| ): | ||
| psm.metadata["CCS"] = str( | ||
| im2ccs( | ||
| psm.ion_mobility, | ||
| psm.peptidoform.theoretical_mz, | ||
| psm.peptidoform.precursor_charge, | ||
| ) | ||
| ) | ||
| # Ensure CCS is always stored as float | ||
| elif psm.metadata.get("CCS") is not None: | ||
| ccs_value = psm.metadata["CCS"] | ||
| if not isinstance(ccs_value, float): | ||
| psm.metadata["CCS"] = float(ccs_value) | ||
|
|
| def test_ccs2im_basic(self): | ||
| """Test basic CCS to ion mobility conversion.""" | ||
| ccs = 450.0 | ||
| charge = 2 | ||
| mz = 500.0 | ||
| im = ccs2im(ccs, charge, mz) | ||
|
|
||
| assert isinstance(im, float) | ||
| assert im > 0 | ||
|
|
||
| def test_ccs2im_array(self): | ||
| """Test CCS to ion mobility conversion with arrays.""" | ||
| ccs = np.array([450.0, 520.0, 480.0]) | ||
| charge = np.array([2, 3, 2]) | ||
| mz = np.array([500.0, 600.0, 550.0]) | ||
|
|
||
| im = ccs2im(ccs, charge, mz) | ||
|
|
||
| assert isinstance(im, np.ndarray) | ||
| assert len(im) == len(ccs) | ||
| assert np.all(im > 0) | ||
|
|
||
| def test_im2ccs_basic(self): | ||
| """Test basic ion mobility to CCS conversion.""" | ||
| im = 1.0 | ||
| charge = 2 | ||
| mz = 500.0 | ||
| ccs = im2ccs(im, charge, mz) | ||
|
|
| predictions = core.predict_and_calibrate(psm_list, psm_list_cal, *args, **kwargs) | ||
| else: | ||
| LOGGER.info( | ||
| "No calibration file provided (calibration is HIGHLY recommended), performing prediction only..." | ||
| ) | ||
| predictions = core.predict(*args, **kwargs) |
| # TODO: Implement load_model function here (also config) and path to default model? | ||
| model = _get_architecture( | ||
| multi=multi, | ||
| ).load_from_checkpoint( | ||
| checkpoint_path=model, # type: ignore # TODO: Match with function signature | ||
| config=_get_model_config(multi=multi), | ||
| criterion=_get_loss_function(multi=multi), | ||
| ) | ||
| model.eval() |
| ) | ||
| model.eval() | ||
| LOGGER.debug(f"Model loaded on device: {device}") | ||
|
|
||
| data_loader = DataLoader( | ||
| data, | ||
| batch_size=batch_size, | ||
| shuffle=False, | ||
| num_workers=num_workers, | ||
| ) | ||
| LOGGER.debug("DataLoader created for prediction.") | ||
| LOGGER.debug("Starting prediction loop.") | ||
| predictions = _predict_loop(model, data_loader, device) | ||
| return predictions.cpu().detach() | ||
|
|
| @@ -0,0 +1,26 @@ | |||
| """Architecture callbacks.""" | |||
|
|
|||
| import pytorch_lightning as L | |||
No description provided.