
Refactor TopoDiff recipe to use diffusion framework#1562

Open
CharlelieLrt wants to merge 4 commits into NVIDIA:main from CharlelieLrt:topodiff-diffusion-refactor

Conversation

@CharlelieLrt
Collaborator

PhysicsNeMo Pull Request

Description

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

Review AI-generated feedback critically for usefulness. You are not required to respond to every AI comment; the comments are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@greptile-apps
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR refactors the TopoDiff example scripts to use the physicsnemo.diffusion framework, replacing the custom Diffusion class with DDPMLinearNoiseScheduler, MSEDSMLoss, DPSScorePredictor, and the framework sample() loop. New adapters (DDPMSolver, ClassifierGuidance) and a component test file are added in utils.py and test_components.py.

There are two P1 bugs affecting training correctness:

  • Double normalization in train_classifier.py and train_regressor.py: both scripts normalize pixel values to [-1, 1] before the training loop and then apply the same * 2 - 1 transform again inside the loop, producing a [-3, 1] input range throughout training.
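As a sanity check, the effect described above can be reproduced with a few lines of plain Python. The values below are illustrative, not taken from the actual data loader:

```python
# Pixel values as returned by the data loader: already in [0, 1]
pixels = [0.0, 0.5, 1.0]

# First normalization at load time: [0, 1] -> [-1, 1]
train_img = [2 * p - 1 for p in pixels]

# Redundant second application inside the training loop: [-1, 1] -> [-3, 1]
batch = [x * 2 - 1 for x in train_img]

print(min(batch), max(batch))  # -3.0 1.0
```

Applying the affine map twice composes to `4p - 3`, which is why the lower bound lands at -3 while the upper bound stays at 1.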

Important Files Changed

  • examples/generative/topodiff/inference.py: Replaces the hand-rolled DDPM loop with DPSScorePredictor + DDPMSolver + the framework sample(); gradient flow is preserved via torch.inference_mode(False); plotting improved to handle dynamic batch sizes.

  • examples/generative/topodiff/utils.py: Adds DDPMLinearNoiseScheduler, DDPMSolver, and ClassifierGuidance adapters wrapping the new diffusion framework; the DDPMSolver stochastic-noise check uses .sum(), which is fragile, and timesteps() has a divide-by-zero edge case when num_steps=1.

  • examples/generative/topodiff/train.py: Replaces the old Diffusion class with MSEDSMLoss + DDPMLinearNoiseScheduler; the license header has a stray "cd .." appended.

  • examples/generative/topodiff/train_classifier.py: Replaces manual noise injection with noise_scheduler.add_noise(); contains a double-normalization bug: images are mapped [0, 1] → [-1, 1] at load time, then the same transform is applied again inside the training loop, producing a [-3, 1] range.

  • examples/generative/topodiff/train_regressor.py: Same double-normalization bug as train_classifier.py: topologies are scaled [0, 1] → [-1, 1] before the loop, then scaled again inside it, yielding an incorrect [-3, 1] training range.

  • examples/generative/topodiff/test_components.py: New test file validating the scheduler, noise/epsilon round-trips, DDPMSolver, MSEDSMLoss, and sample() against the original Diffusion class; well structured and covers the key refactored components.
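The timesteps() divide-by-zero edge case flagged for utils.py can be illustrated with a minimal sketch. The function name matches the summary above, but the linear-schedule body and the guard are assumptions for illustration, not the actual implementation:

```python
def timesteps(num_steps: int, t_max: float = 1.0, t_min: float = 0.0) -> list[float]:
    """Hypothetical linear schedule from t_max down to t_min (inclusive)."""
    if num_steps == 1:
        # Without this guard, (num_steps - 1) below would be zero and the
        # division would raise ZeroDivisionError -- the reported edge case.
        return [t_max]
    step = (t_max - t_min) / (num_steps - 1)
    return [t_max - i * step for i in range(num_steps)]

print(timesteps(3))  # [1.0, 0.5, 0.0]
print(timesteps(1))  # [1.0]
```

Any schedule that spaces num_steps points over a closed interval with a `/(num_steps - 1)` divisor has this failure mode; a single-step sample is a legitimate call, so the guard (or an equivalent formulation) is needed.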

Comments Outside Diff (3)

  1. examples/generative/topodiff/train_classifier.py, line 79-81 (link)

    P1 Same double normalization in validation branch

    The validation batch has the same issue — valid_img was already mapped to [-1, 1] on line 40, so the * 2 - 1 here produces [-3, 1]:

  2. examples/generative/topodiff/train_regressor.py, line 77-79 (link)

    P1 Double normalization — topologies scaled to [-3, 1] instead of [-1, 1]

    Line 40 already normalizes topologies to [-1, 1] with topologies = topologies * 2 - 1. Applying * 2 - 1 again in the training loop maps the range to [-3, 1], feeding the regressor incorrectly scaled inputs throughout training:

  3. examples/generative/topodiff/train.py, line 15 (link)

    P2 Stray shell command in license header

    cd .. was accidentally appended to the closing license comment line:

Reviews (1): Last reviewed commit: "Refactor TopoDiff recipe to use diffusio..."

@@ -64,8 +63,8 @@ def main(cfg: DictConfig) -> None:
batch = torch.tensor(train_img[idx]).float().unsqueeze(1).to(device) * 2 - 1
Contributor


P1 Double normalization — training data scaled to [-3, 1] instead of [-1, 1]

load_data_classifier returns pixel values in [0, 1]. Lines 39–40 already map those to [-1, 1] via train_img = 2 * train_img - 1. The training loop then applies * 2 - 1 a second time, mapping [-1, 1] → [-3, 1]. The classifier therefore never sees correctly scaled inputs, which will hurt convergence. Remove the redundant transform:

Suggested change
batch = torch.tensor(train_img[idx]).float().unsqueeze(1).to(device) * 2 - 1
batch = torch.tensor(train_img[idx]).float().unsqueeze(1).to(device)

Collaborator Author


That's a valid point. However, the scope of this PR is only to refactor the TopoDiff recipe so that it uses the diffusion toolkit. Fixing pre-existing bugs is out of scope for this PR.

Collaborator Author

@CharlelieLrt CharlelieLrt Apr 17, 2026


I'll let @mnabian chime in, but it does look like a bug to me.

Collaborator


I agree that this is a valid point. But I suggest we leave it as is (or add a comment to the code about this issue) since the results are already validated.

Signed-off-by: CharlelieLrt <claurent@nvidia.com>
@CharlelieLrt CharlelieLrt requested a review from mnabian April 17, 2026 02:04
Collaborator

@coreyjadams coreyjadams left a comment


approved for pyproject.toml

Collaborator

@mnabian mnabian left a comment


I think the changes look good overall; I could not spot any major issues. It would be good to track the known bugs in the code in an issue. Hopefully at some point in the future we can revisit this example and extend it to 3D cases.

