
Refactor TopoDiff recipe to use diffusion framework#1562

Open
CharlelieLrt wants to merge 4 commits into NVIDIA:main from CharlelieLrt:topodiff-diffusion-refactor

Conversation

@CharlelieLrt
Collaborator

PhysicsNeMo Pull Request

Description

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

Review AI-generated feedback critically for usefulness. You are not required to respond to every AI comment; the comments are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@greptile-apps
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR refactors the TopoDiff example scripts to use the physicsnemo.diffusion framework, replacing the custom Diffusion class with DDPMLinearNoiseScheduler, MSEDSMLoss, DPSScorePredictor, and the framework sample() loop. New adapters (DDPMSolver, ClassifierGuidance) and a component test file are added in utils.py and test_components.py.

There are two P1 bugs affecting training correctness:

  • Double normalization in train_classifier.py and train_regressor.py: both scripts normalize pixel values to [-1, 1] before the training loop and then apply the same * 2 - 1 transform again inside the loop, producing a [-3, 1] input range throughout training.
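As a sanity check, the effect described above can be reproduced with a few lines of plain Python. The values below are illustrative, not taken from the actual data loader:

```python
# Pixel values as returned by the data loader: already in [0, 1]
pixels = [0.0, 0.5, 1.0]

# First normalization at load time: [0, 1] -> [-1, 1]
train_img = [2 * p - 1 for p in pixels]

# Redundant second application inside the training loop: [-1, 1] -> [-3, 1]
batch = [x * 2 - 1 for x in train_img]

print(min(batch), max(batch))  # -3.0 1.0
```

Applying the affine map twice composes to `4p - 3`, which is why the lower bound lands at -3 while the upper bound stays at 1.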

Important Files Changed

  • examples/generative/topodiff/inference.py: Replaces the hand-rolled DDPM loop with DPSScorePredictor + DDPMSolver + the framework sample(); gradient flow is preserved via torch.inference_mode(False); plotting improved to handle dynamic batch sizes.

  • examples/generative/topodiff/utils.py: Adds DDPMLinearNoiseScheduler, DDPMSolver, and ClassifierGuidance adapters wrapping the new diffusion framework; the DDPMSolver stochastic-noise check uses .sum(), which is fragile, and timesteps() has a divide-by-zero edge case when num_steps=1.

  • examples/generative/topodiff/train.py: Replaces the old Diffusion class with MSEDSMLoss + DDPMLinearNoiseScheduler; the license header has a stray "cd .." appended.

  • examples/generative/topodiff/train_classifier.py: Replaces manual noise injection with noise_scheduler.add_noise(); contains a double-normalization bug: images are mapped [0, 1] → [-1, 1] at load time, then the same transform is applied again inside the training loop, producing a [-3, 1] range.

  • examples/generative/topodiff/train_regressor.py: Same double-normalization bug as train_classifier.py: topologies are scaled [0, 1] → [-1, 1] before the loop, then scaled again inside it, yielding an incorrect [-3, 1] training range.

  • examples/generative/topodiff/test_components.py: New test file validating the scheduler, noise/epsilon round-trips, DDPMSolver, MSEDSMLoss, and sample() against the original Diffusion class; well structured and covers the key refactored components.
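The timesteps() divide-by-zero edge case flagged for utils.py can be illustrated with a minimal sketch. The function name matches the summary above, but the linear-schedule body and the guard are assumptions for illustration, not the actual implementation:

```python
def timesteps(num_steps: int, t_max: float = 1.0, t_min: float = 0.0) -> list[float]:
    """Hypothetical linear schedule from t_max down to t_min (inclusive)."""
    if num_steps == 1:
        # Without this guard, (num_steps - 1) below would be zero and the
        # division would raise ZeroDivisionError -- the reported edge case.
        return [t_max]
    step = (t_max - t_min) / (num_steps - 1)
    return [t_max - i * step for i in range(num_steps)]

print(timesteps(3))  # [1.0, 0.5, 0.0]
print(timesteps(1))  # [1.0]
```

Any schedule that spaces num_steps points over a closed interval with a `/(num_steps - 1)` divisor has this failure mode; a single-step sample is a legitimate call, so the guard (or an equivalent formulation) is needed.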

Comments Outside Diff (3)

  1. examples/generative/topodiff/train_classifier.py, line 79-81 (link)

    P1 Same double normalization in validation branch

    The validation batch has the same issue — valid_img was already mapped to [-1, 1] on line 40, so the * 2 - 1 here produces [-3, 1]:

  2. examples/generative/topodiff/train_regressor.py, line 77-79 (link)

    P1 Double normalization — topologies scaled to [-3, 1] instead of [-1, 1]

    Line 40 already normalizes topologies to [-1, 1] with topologies = topologies * 2 - 1. Applying * 2 - 1 again in the training loop maps the range to [-3, 1], feeding the regressor incorrectly scaled inputs throughout training:

  3. examples/generative/topodiff/train.py, line 15 (link)

    P2 Stray shell command in license header

    cd .. was accidentally appended to the closing license comment line:

Reviews (1): Last reviewed commit: "Refactor TopoDiff recipe to use diffusio..."

@@ -64,8 +63,8 @@ def main(cfg: DictConfig) -> None:
batch = torch.tensor(train_img[idx]).float().unsqueeze(1).to(device) * 2 - 1
Contributor


P1 Double normalization — training data scaled to [-3, 1] instead of [-1, 1]

load_data_classifier returns pixel values in [0, 1]. Lines 39–40 already map those to [-1, 1] via train_img = 2 * train_img - 1. The training loop then applies * 2 - 1 a second time, mapping [-1, 1] → [-3, 1]. The classifier therefore never sees correctly scaled inputs, which will hurt convergence. Remove the redundant transform:

Suggested change
batch = torch.tensor(train_img[idx]).float().unsqueeze(1).to(device) * 2 - 1
batch = torch.tensor(train_img[idx]).float().unsqueeze(1).to(device)

Collaborator Author


That's a valid point. However, the scope of this PR is only to refactor the TopoDiff recipe so that it uses the diffusion toolkit. Fixing pre-existing bugs is out of scope for this PR.

Collaborator Author

@CharlelieLrt CharlelieLrt Apr 17, 2026


I'll let @mnabian chime in, but it does look like a bug to me.

Collaborator


I agree that this is a valid point. But I suggest we leave it as is (or add a comment to the code about this issue) since the results are already validated.

Signed-off-by: CharlelieLrt <claurent@nvidia.com>
@CharlelieLrt CharlelieLrt requested a review from mnabian April 17, 2026 02:04
Collaborator

@coreyjadams coreyjadams left a comment


approved for pyproject.toml

Collaborator

@mnabian mnabian left a comment


I think the changes look good overall; I could not spot any major issues. It would be good to track the known bugs in the code in an issue. Hopefully at some point in the future we can revisit this example and extend it to 3D cases.

