adding underfill case #1528

Open
ghasemiAb wants to merge 5 commits into NVIDIA:main from ghasemiAb:underfill-ag
Conversation

@ghasemiAb

PhysicsNeMo Pull Request

Description

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@greptile-apps
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR adds a new CFD example (examples/cfd/underfill/) that trains a GeoTransolver surrogate model to autoregressively predict epoxy Volume-of-Fluid (VOF) flow fronts from transient CFD simulation data stored as VTP files. The architecture, training loop, interface-band loss, and VTP reader are all well-conceived additions, but the PR has several issues — two of which affect correctness — that should be resolved before merging.

Key issues found:

  • [Logic] inference.py double-counts ground-truth statistics (MAE/RMSE) per timestep when both gt_seq and source VTP files are available, because the two extraction blocks are not mutually exclusive. The reported overall error metrics will be incorrect whenever both sources exist.
  • [Logic] conf/reader/vtp.yaml has _target_: vtu_reader.Reader (a non-existent module in this PR) instead of _target_: vtp_reader.Reader, causing an ImportError for any workflow using this Hydra config override.
  • [Logic] datapipe.py saves normalization stats to a relative path ("stats") that is resolved relative to Hydra's chdir-ed output directory. Inference and validation runs in a different working directory will silently fall back to computing stats from the wrong split, potentially producing a normalization mismatch.
  • [Style] Both conf/training/default.yaml and conf/inference/default.yaml contain hardcoded personal absolute paths (/workspace/aghasemi/...) that will fail immediately for any other user.
  • [Style] config.yaml retains a stale experiment description copied from a crash-models example.
  • [Style] datapipe.py contains several commented-out code paths (_compute_feature_stats, _log_statistics) that leave dead code and obscure the intentional decision to skip feature normalization.
  • [Style] train.py uses torch.optim.Muon, which requires PyTorch ≥ 2.7 but this version requirement is not documented in the README or referenced requirements.txt.

Important Files Changed

Filename | Overview
examples/cfd/underfill/inference.py | Contains a logic bug where ground-truth statistics are double-appended when both gt_seq and source VTP files are available, corrupting the final MAE/RMSE/MSE statistics.
examples/cfd/underfill/conf/reader/vtp.yaml | Wrong _target_: references vtu_reader.Reader instead of vtp_reader.Reader, causing an ImportError for any workflow using this config override.
examples/cfd/underfill/conf/training/default.yaml | Contains hardcoded personal workspace paths that will fail for other users; also absolute_expansion: 0 silently overrides band_fraction without clear documentation.
examples/cfd/underfill/conf/config.yaml | Stale experiment description copied from crash-models example; reader config block defined inline but conf/reader/ override also exists with a broken target.
examples/cfd/underfill/datapipe.py | Normalization statistics directory saved to a relative path that is sensitive to Hydra's chdir; feature normalization is silently hardcoded via commented-out code, leaving dead code.
examples/cfd/underfill/train.py | Well-structured training loop with per-timestep interface loss and AMP support; uses torch.optim.Muon (requires PyTorch ≥ 2.7) without documenting the version requirement.
examples/cfd/underfill/rollout.py | Clean autoregressive rollout implementation with gradient checkpointing; compute_interface_band logic is correct and well-documented.
examples/cfd/underfill/vtp_reader.py | Robust VTP reader with pattern-based time-series discovery and natural sort; no issues found.
examples/cfd/underfill/conf/inference/default.yaml | Contains hardcoded personal workspace path and a malformed duplicate-key comment on line 20.

Reviews (1): Last reviewed commit: "adding underfill case"

Comment on lines +259 to +273
if compute_error and gt_seq is not None and len(gt_seq) >= timestep:
    try:
        gt_np = gt_seq[t].detach().cpu().numpy().squeeze()
        gt_stats = TimestepStats.from_array(gt_np)
        all_gt_stats.append(gt_stats)
        error = pred_np - gt_np
        mae = float(np.abs(error).mean())
        rmse = float(np.sqrt((error ** 2).mean()))
        all_mae.append(mae)
        all_rmse.append(rmse)
        gt_available_count += 1
    except Exception:
        pass

if os.path.exists(source_file):
Contributor

P2 Double-counting statistics when both gt_seq and source files are available

The two ground-truth extraction blocks (lines 259–269 and 273–298) are not mutually exclusive. When gt_seq is provided and the source VTP file also exists, both blocks succeed and each one appends to all_gt_stats, all_mae, all_rmse, and increments gt_available_count. The result is that every timestep with both sources present is counted twice, making the reported total_mae, total_rmse, and total_mse incorrect.

The comment "Prefer dataset-provided ground truth if available" signals the intent, but the second block lacks an elif/else guard. It should only run when the first block did not already populate gt_stats.

Suggested change
if compute_error and gt_seq is not None and len(gt_seq) >= timestep:
    try:
        gt_np = gt_seq[t].detach().cpu().numpy().squeeze()
        gt_stats = TimestepStats.from_array(gt_np)
        all_gt_stats.append(gt_stats)
        error = pred_np - gt_np
        mae = float(np.abs(error).mean())
        rmse = float(np.sqrt((error ** 2).mean()))
        all_mae.append(mae)
        all_rmse.append(rmse)
        gt_available_count += 1
    except Exception:
        pass

if os.path.exists(source_file):

# Prefer dataset-provided ground truth if available
if compute_error and gt_seq is not None and len(gt_seq) >= timestep:
    try:
        gt_np = gt_seq[t].detach().cpu().numpy().squeeze()
        gt_stats = TimestepStats.from_array(gt_np)
        all_gt_stats.append(gt_stats)
        error = pred_np - gt_np
        mae = float(np.abs(error).mean())
        rmse = float(np.sqrt((error ** 2).mean()))
        all_mae.append(mae)
        all_rmse.append(rmse)
        gt_available_count += 1
    except Exception:
        pass

if os.path.exists(source_file) and gt_stats is None:
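The effect of the missing guard can be reproduced with a toy stand-in for the two extraction blocks (illustrative names only, not the PR's actual code):

```python
# Toy reproduction of the non-exclusive ground-truth blocks in inference.py.
# A simple mean stands in for TimestepStats; names are illustrative.
def collect_stats(gt_seq, source_exists):
    all_mae = []
    gt_stats = None
    # Block 1: dataset-provided ground truth
    if gt_seq is not None:
        gt_stats = sum(gt_seq) / len(gt_seq)
        all_mae.append(gt_stats)
    # Block 2 (buggy): runs even when block 1 already appended
    if source_exists:
        all_mae.append(gt_stats if gt_stats is not None else 0.0)
    return all_mae


def collect_stats_fixed(gt_seq, source_exists):
    all_mae = []
    gt_stats = None
    if gt_seq is not None:
        gt_stats = sum(gt_seq) / len(gt_seq)
        all_mae.append(gt_stats)
    # Guarding on gt_stats makes the two branches mutually exclusive
    if source_exists and gt_stats is None:
        all_mae.append(0.0)
    return all_mae
```

With both sources present, the buggy version appends two entries per timestep while the guarded one appends exactly one, which is what keeps the aggregate MAE/RMSE honest.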

# See the License for the specific language governing permissions and
# limitations under the License.

_target_: vtu_reader.Reader
Contributor

P2 Wrong _target_ in reader config

The target class is vtu_reader.Reader, but the file is vtp.yaml and the reader module introduced in this PR is vtp_reader.py, whose class is vtp_reader.Reader. Any workflow that uses this override config (e.g., reader=vtp) will fail with an ImportError / ModuleNotFoundError.

Suggested change
_target_: vtu_reader.Reader
_target_: vtp_reader.Reader

Comment on lines +22 to +23
raw_data_dir: "/workspace/aghasemi/isv/ansys/data/converted_output_singleVTU-VTP2/train_all"
raw_data_dir_validation: "/workspace/aghasemi/isv/ansys/data/converted_output_singleVTU-VTP2/val"
Contributor

P2 Hardcoded personal workspace paths

Both raw_data_dir and raw_data_dir_validation point to a specific user's workspace directory (/workspace/aghasemi/...). These will fail for any other user or machine. They should be replaced with descriptive placeholder values, consistent with how similar examples in this repository are documented.

Suggested change
raw_data_dir: "/workspace/aghasemi/isv/ansys/data/converted_output_singleVTU-VTP2/train_all"
raw_data_dir_validation: "/workspace/aghasemi/isv/ansys/data/converted_output_singleVTU-VTP2/val"
raw_data_dir: "/path/to/train_data"
raw_data_dir_validation: "/path/to/val_data"

Comment on lines +19 to +20
raw_data_dir_test: "/workspace/aghasemi/isv/ansys/data/converted_output_singleVTU-VTP2/val"
#raw_data_dir_test: raw_data_dir_test:"/workspace/isv/ansys/data/converted_output_singleVTU-VTP/val" No newline at end of file
Contributor

P2 Hardcoded personal path and malformed comment

raw_data_dir_test contains the same personal absolute path. Additionally, line 20 is a malformed comment that contains what looks like an accidental key duplication (raw_data_dir_test: raw_data_dir_test:"/..."); this should either be cleaned up or removed.

Suggested change
raw_data_dir_test: "/workspace/aghasemi/isv/ansys/data/converted_output_singleVTU-VTP2/val"
#raw_data_dir_test: raw_data_dir_test:"/workspace/isv/ansys/data/converted_output_singleVTU-VTP/val"
raw_data_dir_test: "/path/to/test_data"

Comment on lines +23 to +25
experiment_name: "Unified-Training"
experiment_desc: "unified training recipe for crash models"
run_desc: "unified training recipe for crash models"
Contributor

P2 Stale experiment description copied from another example

experiment_desc and run_desc both read "unified training recipe for crash models", which describes a completely different application. These should describe the underfill use case.

Suggested change
experiment_name: "Unified-Training"
experiment_desc: "unified training recipe for crash models"
run_desc: "unified training recipe for crash models"
experiment_name: "Underfill-Training"
experiment_desc: "GeoTransolver autoregressive rollout for transient epoxy VOF prediction"
run_desc: "GeoTransolver autoregressive rollout for transient epoxy VOF prediction"

Comment thread examples/cfd/underfill/datapipe.py Outdated
Comment on lines +383 to +418
    #self.feature_stats = self._compute_feature_stats()
    # Hardcode feature stats to make normalization a no-op
    self.feature_stats = {
        "feature_mean": torch.zeros(1, dtype=torch.float32),
        "feature_std": torch.ones(1, dtype=torch.float32),
    }

    # Save for validation/inference (convert to pure Python types)
    node_stats_serializable = _stats_to_serializable(self.node_stats)
    feat_stats_serializable = _stats_to_serializable(self.feature_stats)

    save_json(node_stats_serializable, node_stats_path)
    save_json(feat_stats_serializable, feat_stats_path)
    self._log(f" Saved statistics to {self._stats_dir}/")

else:
    # Load from saved training stats
    if os.path.exists(node_stats_path) and os.path.exists(feat_stats_path):
        self._log(f"\n Loading statistics from {self._stats_dir}/")
        self.node_stats = _stats_from_serializable(load_json(node_stats_path))
        #self.feature_stats = _stats_from_serializable(load_json(feat_stats_path))
        # Hardcode feature stats to make normalization a no-op
        self.feature_stats = {
            "feature_mean": torch.zeros(1, dtype=torch.float32),
            "feature_std": torch.ones(1, dtype=torch.float32),
        }
    else:
        self._log("\n WARNING: No saved statistics found, computing from current split")
        self._log(" Run training first to generate statistics!")
        self.node_stats = self._compute_node_stats()
        #self.feature_stats = self._compute_feature_stats()
        # Hardcode feature stats to make normalization a no-op
        self.feature_stats = {
            "feature_mean": torch.zeros(1, dtype=torch.float32),
            "feature_std": torch.ones(1, dtype=torch.float32),
        }
Contributor

P2 Commented-out statistics code leaves dead code and silent hardcoding

The calls to _compute_feature_stats (lines 383, 403, 412) and _log_statistics (line 421) are commented out, and feature stats are unconditionally hardcoded to mean=0, std=1 (identity). While this appears intentional (VOF is already in [0,1]), the dead code is misleading to future contributors who may not realise the normalization is a deliberate no-op. The _compute_feature_stats method (lines 458–466) remains entirely unreachable.

Consider either:

  • Removing the unused method and commented lines, and adding a clear comment explaining why feature normalization is skipped, or
  • Restoring the normalization path with appropriate documentation.
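If the first option is taken, the three hardcoded stats dicts could collapse into one clearly-documented helper. A sketch with illustrative names, using plain Python lists in place of the PR's torch tensors:

```python
def identity_feature_stats():
    """Deliberate no-op feature normalization.

    The epoxy VOF field is already bounded in [0, 1], so mean/std
    normalization is intentionally skipped: mean=0 and std=1 make
    (x - mean) / std an identity transform.
    """
    return {"feature_mean": [0.0], "feature_std": [1.0]}
```

Calling this from all three code paths would both remove the duplication and make the "normalization is intentionally a no-op" decision self-documenting.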

Comment thread examples/cfd/underfill/datapipe.py Outdated
Comment on lines +291 to +292
self._stats_dir = STATS_DIRNAME
os.makedirs(self._stats_dir, exist_ok=True)
Contributor

P2 Stats directory saved relative to Hydra's working directory

self._stats_dir = STATS_DIRNAME (i.e., "stats") is a relative path. Because config.yaml sets hydra.job.chdir: True, Hydra changes the working directory to ./outputs/ at runtime, so training saves stats to ./outputs/stats/. When inference or validation is later run, their working directory may be different, causing the stats load to fall back to computing from the current split — which may produce different normalization than training used. This silent normalization mismatch can degrade inference accuracy.

Consider making the stats path configurable (e.g., tied to ckpt_path) or using an absolute path derived from the configured data directory.

Comment on lines +279 to +284
muon_opt = torch.optim.Muon(
    muon_params,
    lr=base_lr,
    weight_decay=weight_decay,
    adjust_lr_fn="match_rms_adamw",
)
Contributor

P2 torch.optim.Muon requires PyTorch ≥ 2.7

torch.optim.Muon was added in PyTorch 2.7. Users on earlier versions will get an AttributeError with no actionable error message. The requirements.txt (referenced in the README) should pin torch>=2.7 and the README prerequisites section should document this dependency explicitly to avoid a confusing failure at runtime.
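Beyond pinning the requirement, a runtime guard could fail gracefully. A hedged sketch (helper names and the AdamW fallback are illustrative assumptions, not the PR's behavior); note it compares parsed version tuples rather than strings, since "2.10" > "2.7" fails lexicographically:

```python
def version_tuple(v: str) -> tuple:
    """Parse a version string like '2.7.1+cu121' into comparable ints."""
    core = v.split("+")[0]  # drop local build tags such as +cu121
    return tuple(int(p) for p in core.split(".")[:2])


def pick_optimizer_name(torch_version: str) -> str:
    """Choose Muon when available, otherwise fall back (illustrative)."""
    if version_tuple(torch_version) >= (2, 7):
        return "Muon"
    return "AdamW"  # safe fallback; still document torch>=2.7 in requirements.txt
```

In the actual example, the version string would come from `torch.__version__` and the names would map onto the existing optimizer-construction code.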

Collaborator

@coreyjadams left a comment

Hi @ghasemiAb ,

Thanks for this PR! Overall I think it looks really good. I have scattered some comments here and there, mostly it is request for more organization / code clean up / better comments / docstrings etc. I think this example will get a lot of attention so I am pushing a little to encourage you to give it an extra level of polish.

The addition of a rollout example for GeoTransolver on a static mesh is pretty cool. Users have been asking about this type of workload, so this will get attention. It's really a nice addition to physicsnemo and I'm happy you've contributed it! Your choice to drive the loss function by the boundary layer, rather than the static components of the mesh, is also good.

I have a couple of mandatory updates for approval, if you don't mind:

  • Please add a requirements.txt file. There are some things that have to be included there (like pyvista, for IO), and torch>(whenever muon was included, maybe that was 2.9?).
  • Your readme has some math rendering errors, can you make sure to go through it again and fix?
  • Please add some example convergence plots and visualizations to the README, if you can.

Hopefully, these aren't too much!

run:
dir: ./outputs/

experiment_name: "Unified-Training"
Collaborator

I think these names are outdated?

Comment on lines +58 to +134
def _to_python_native(value: Any) -> Any:
    """
    Recursively convert tensor/numpy values to Python native types.

    This ensures JSON serialization works without any numpy/torch dependencies.

    Args:
        value: Any value (tensor, numpy array, list, dict, scalar, etc.)

    Returns:
        Python native type (list, dict, float, int, etc.)
    """
    if isinstance(value, torch.Tensor):
        # Convert tensor to Python list
        return value.detach().cpu().tolist()
    elif isinstance(value, np.ndarray):
        # Convert numpy array to Python list
        return value.tolist()
    elif isinstance(value, (np.floating, np.float32, np.float64)):
        # Convert numpy float to Python float
        return float(value)
    elif isinstance(value, (np.integer, np.int32, np.int64)):
        # Convert numpy int to Python int
        return int(value)
    elif isinstance(value, dict):
        # Recursively convert dict values
        return {k: _to_python_native(v) for k, v in value.items()}
    elif isinstance(value, (list, tuple)):
        # Recursively convert list/tuple elements
        return [_to_python_native(v) for v in value]
    elif hasattr(value, 'item'):
        # Handle any other type with .item() method (scalars)
        return value.item()
    else:
        # Already a Python native type
        return value


def _to_tensor(value: Any, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """
    Safely convert a value to a torch tensor.

    Handles: torch.Tensor, numpy.ndarray, list, scalar values.

    Args:
        value: Input value to convert
        dtype: Target dtype

    Returns:
        torch.Tensor
    """
    if isinstance(value, torch.Tensor):
        return value.to(dtype=dtype)
    elif isinstance(value, np.ndarray):
        return torch.from_numpy(value.copy()).to(dtype=dtype)
    elif isinstance(value, (list, tuple)):
        return torch.tensor(value, dtype=dtype)
    else:
        return torch.tensor(value, dtype=dtype)


def _to_numpy(value: Any) -> np.ndarray:
    """
    Safely convert a value to a numpy array.

    Args:
        value: Input value (tensor, array, list, etc.)

    Returns:
        numpy.ndarray
    """
    if isinstance(value, torch.Tensor):
        return value.detach().cpu().numpy()
    elif isinstance(value, np.ndarray):
        return value
    else:
        return np.asarray(value)
Collaborator

What is all of this conversion code doing? Is it necessary? Is this to save statistics or something else?

Comment on lines +30 to +37
_TIME_SERIES_PATTERNS: list[re.Pattern] = [
    # <field>_step00, <field>_step01, ...
    re.compile(r"^(?P<field>.+?)_step(?P<idx>\d+)$"),
    # <field>_t0.000, <field>_t0.005, ... (float time label)
    re.compile(r"^(?P<field>.+?)_t(?P<idx>\d+\.\d+)$"),
    # <field>_00, <field>_01, ... (bare numeric suffix)
    re.compile(r"^(?P<field>.+?)_(?P<idx>\d+)$"),
]
Collaborator

Is there a better way to do this? In general, unless there is a performance reason to want to use regular expressions I'm not a huge fan. I don't know how to cast spells so easily and its harder to maintain. Can we use some sort of glob to pattern match, unless that is too slow?
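A regex-free alternative along the lines the reviewer suggests could split each key on its last underscore. This sketch is a hypothetical helper covering only the integer-suffix patterns (`_stepNN` and bare `_NN`), not the float time labels:

```python
def parse_series_key(key: str):
    """Return (field, index) for keys like 'epoxy_vof_step07' or
    'epoxy_vof_12'; return None for non-series keys like 'coords'."""
    if "_" not in key:
        return None
    field, suffix = key.rsplit("_", 1)  # split on the LAST underscore
    if suffix.startswith("step"):
        suffix = suffix[len("step"):]
    if suffix.isdigit():
        return field, int(suffix)
    return None
```

Sorting the resulting `(field, index)` tuples by the integer index also gives the natural ordering for free, without a separate natural-sort pass. Whether this is actually clearer than three anchored regexes is a judgment call; the float `_t0.005` labels would need one extra branch.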


arrays: list[np.ndarray] = []
for _idx, key in entries:
    arr = np.asarray(mesh.point_data[key], dtype=np.float64)
Collaborator

Are you sure float64 is necessary?
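The float64 question is ultimately about a 2x memory cost per node. A dependency-free illustration using the stdlib array module (for a VOF scalar bounded in [0, 1], float32 precision is typically more than enough):

```python
from array import array

# 'd' is C double (float64), 'f' is C float (float32)
f64 = array("d", [0.0] * 1000)
f32 = array("f", [0.0] * 1000)

bytes_f64 = f64.itemsize * len(f64)  # 8 bytes per element
bytes_f32 = f32.itemsize * len(f32)  # 4 bytes per element
```

On meshes with millions of nodes and many timesteps, halving per-element storage also halves I/O and host-to-device transfer volume.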

(train.py), not here.
"""

def __init__(self, *args, **kwargs):
Collaborator

I would prefer this function and constructor to have a proper syntax with named arguments and docstring, unless there is a compelling reason to do it this way.

Comment on lines +54 to +91
class CombinedOptimizer(Optimizer):
    """Combine multiple PyTorch optimizers under a single Optimizer-like interface."""

    def __init__(
        self,
        optimizers: Sequence[Optimizer],
        torch_compile_kwargs: dict[str, Any] | None = None,
    ):
        if not optimizers:
            raise ValueError("`optimizers` must contain at least one optimizer.")
        self.optimizers = optimizers
        param_groups = [g for opt in optimizers for g in opt.param_groups]
        super().__init__(param_groups, defaults={})
        if torch_compile_kwargs is None:
            self.step_fns: list[Callable] = [opt.step for opt in optimizers]
        else:
            self.step_fns: list[Callable] = [
                torch.compile(opt.step, **torch_compile_kwargs) for opt in optimizers
            ]

    def zero_grad(self, *args, **kwargs) -> None:
        for opt in self.optimizers:
            opt.zero_grad(*args, **kwargs)

    def step(self, closure=None) -> None:
        for step_fn in self.step_fns:
            if closure is None:
                step_fn()
            else:
                step_fn(closure)

    def state_dict(self):
        return {"optimizers": [opt.state_dict() for opt in self.optimizers]}

    def load_state_dict(self, state_dict):
        for opt, sd in zip(self.optimizers, state_dict["optimizers"]):
            opt.load_state_dict(sd)
        self.param_groups = [g for opt in self.optimizers for g in opt.param_groups]
Collaborator

This is upstreamed in physicsnemo and needs to be removed here. Look in physicsnemo.optim now, please :)

Comment on lines +99 to +103
"""
Per-timestep interface-only loss.

Uses absolute_expansion for predictable behavior on normalized coords.
"""
Collaborator

Please expand this docstring with more explanation about this loss construction and its action in the roll-out example?
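For context, the band selection such a docstring should describe amounts to restricting the loss to partially-filled nodes near the moving flow front. A minimal pure-Python sketch with illustrative names and thresholds, standing in for the PR's tensor implementation (which additionally supports absolute_expansion and an interface axis):

```python
def interface_band_mask(vof, vof_lo=0.05, vof_hi=0.95):
    """Boolean mask selecting interface (partially-filled) nodes.

    Nodes with VOF near 0 (empty) or near 1 (filled) are static and carry
    little training signal; only the band in between tracks the flow front.
    """
    return [vof_lo < v < vof_hi for v in vof]
```

Computing the loss only over this mask at each timestep concentrates the gradient signal on the advancing epoxy front rather than the bulk of the static mesh.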

Comment on lines +486 to +606
# ══════════════════════════════════════════════════════════════════════
# Pre-training summary (rank 0 only)
# ══════════════════════════════════════════════════════════════════════
if dist.rank == 0:
    # ── Model parameters ──────────────────────────────────────────────
    model_raw = (
        trainer.model.module
        if isinstance(trainer.model, DistributedDataParallel)
        else trainer.model
    )
    total_params = sum(p.numel() for p in model_raw.parameters())
    trainable_params = sum(
        p.numel() for p in model_raw.parameters() if p.requires_grad
    )
    muon_params = sum(
        p.numel() for p in model_raw.parameters() if p.ndim == 2
    )
    other_params = trainable_params - muon_params

    logger0.info("")
    logger0.info("=" * 72)
    logger0.info(" TRAINING CONFIGURATION")
    logger0.info("=" * 72)

    # ── Data ──────────────────────────────────────────────────────────
    logger0.info("")
    logger0.info(" ┌─ Data ────────────────────────────────────────────────┐")
    logger0.info(f" │ Train dir: {cfg.training.raw_data_dir}")
    logger0.info(f" │ Validation dir: {cfg.training.raw_data_dir_validation}")
    logger0.info(f" │ Train samples: {cfg.training.num_samples}")
    logger0.info(f" │ Validation samples: {cfg.training.num_validation_samples}")
    logger0.info(f" │ Time steps (T): {cfg.training.num_time_steps}")
    logger0.info(f" │ Rollout steps: {trainer.rollout_steps}")
    logger0.info(f" │ Dataloader workers: {cfg.training.num_dataloader_workers}")
    logger0.info(" └────────────────────────────────────────────────────────┘")

    # ── Model ─────────────────────────────────────────────────────────
    logger0.info("")
    logger0.info(" ┌─ Model ───────────────────────────────────────────────┐")
    logger0.info(f" │ Architecture: {model_raw.__class__.__name__}")
    logger0.info(f" │ Total parameters: {total_params:,}")
    logger0.info(f" │ Trainable: {trainable_params:,}")
    logger0.info(f" │ Muon (2D): {muon_params:,}")
    logger0.info(f" │ AdamW (other): {other_params:,}")
    if hasattr(model_raw, "rollout_steps"):
        logger0.info(f" │ Rollout steps: {model_raw.rollout_steps}")
    if hasattr(model_raw, "num_fourier_frequencies"):
        logger0.info(f" │ Fourier freqs: {model_raw.num_fourier_frequencies}")
    if hasattr(cfg, "model"):
        model_cfg = cfg.model
        for key in [
            "functional_dim", "out_dim", "geometry_dim",
            "slice_num", "n_layers",
        ]:
            val = getattr(model_cfg, key, None)
            if val is not None:
                logger0.info(f" │ {key + ':' :<20} {val}")
    logger0.info(" └────────────────────────────────────────────────────────┘")

    # ── Optimization ──────────────────────────────────────────────────
    scheduler_T0 = getattr(cfg.training, "scheduler_T0", 50)
    scheduler_T_mult = getattr(cfg.training, "scheduler_T_mult", 2)

    logger0.info("")
    logger0.info(" ┌─ Optimization ────────────────────────────────────────┐")
    logger0.info(f" │ Epochs: {cfg.training.epochs}")
    logger0.info(f" │ Start LR: {cfg.training.start_lr}")
    logger0.info(f" │ End LR (eta_min): {cfg.training.end_lr}")
    logger0.info(" │ Scheduler: CosineAnnealingWarmRestarts")
    logger0.info(f" │ T_0: {scheduler_T0}")
    logger0.info(f" │ T_mult: {scheduler_T_mult}")
    logger0.info(f" │ Weight decay: {getattr(cfg.training, 'weight_decay', 1e-4)}")
    logger0.info(" │ Grad clip max_norm: 25.0")
    logger0.info(f" │ AMP enabled: {cfg.training.amp}")
    logger0.info(" └────────────────────────────────────────────────────────┘")

    # ── Interface loss ────────────────────────────────────────────────
    c = trainer.criterion
    logger0.info("")
    logger0.info(" ┌─ Interface Loss ──────────────────────────────────────┐")
    logger0.info(f" │ VOF thresholds: ({c.vof_lo}, {c.vof_hi})")
    logger0.info(f" │ Band fraction: {c.band_fraction}")
    logger0.info(f" │ Absolute expansion: {c.absolute_expansion}")
    logger0.info(f" │ Interface axis: {c.interface_axis} (-1 = auto)")
    logger0.info(" └────────────────────────────────────────────────────────┘")

    # ── Infrastructure ────────────────────────────────────────────────
    logger0.info("")
    logger0.info(" ┌─ Infrastructure ──────────────────────────────────────┐")
    logger0.info(f" │ World size: {dist.world_size}")
    logger0.info(f" │ Device: {dist.device}")
    logger0.info(f" │ Checkpoint dir: {cfg.training.ckpt_path}")
    logger0.info(f" │ TensorBoard dir: {cfg.training.tensorboard_log_dir}")
    logger0.info(f" │ Save every: {cfg.training.save_chckpoint_freq} epochs")
    logger0.info(f" │ Validate every: {cfg.training.validation_freq} epochs")
    if trainer.epoch_init > 0:
        logger0.info(f" │ Resumed from epoch: {trainer.epoch_init}")
    logger0.info(" └────────────────────────────────────────────────────────┘")

    # ── Per-layer parameter breakdown (compact) ───────────────────────
    logger0.info("")
    logger0.info(" ┌─ Layer Parameter Breakdown ───────────────────────────┐")
    logger0.info(f" │ {'Layer':<40} {'Params':>10} │")
    logger0.info(f" │ {'─' * 40} {'─' * 10} │")
    for name, param in model_raw.named_parameters():
        if param.requires_grad:
            logger0.info(
                f" │ {name:<40} {param.numel():>10,} │"
            )
    logger0.info(" └────────────────────────────────────────────────────────┘")

    logger0.info("")
    logger0.info(f" Total parameters: {total_params:>12,}")
    logger0.info(f" Trainable parameters: {trainable_params:>12,}")
    logger0.info(f" Model size: {total_params * 4 / 1024**2:>11.2f} MB (fp32)")

    logger0.info("")
    logger0.info("=" * 72)
    logger0.info(" STARTING TRAINING")
    logger0.info("=" * 72)
    logger0.info("")
Collaborator

This is a lot of boilerplate printout. You could consider a cleanup with something like tabulate but it's not mandatory.

)


def print_header(title: str, width: int = 80):
Collaborator

There are plenty of libraries to simply work like this, FYI, that can reduce clutter in physicsnemo.

Comment on lines +160 to +169
def _to_tensor(value, dtype=torch.float32) -> torch.Tensor:
    """Safely convert a value to a torch tensor."""
    if isinstance(value, torch.Tensor):
        return value.to(dtype=dtype)
    return torch.as_tensor(value, dtype=dtype)


def _stats_to_device(stats: dict, device: torch.device, dtype=torch.float32) -> dict:
    """Convert stats dict to tensors and move to device."""
    return {k: _to_tensor(v, dtype=dtype).to(device) for k, v in stats.items()}
Collaborator

Duplicated in train.py, no?

@ghasemiAb
Author

Hi @coreyjadams @ram-cherukuri, I have addressed the comments you made. Please take another look and, if everything is okay, approve the PR.

@RishikeshRanade self-requested a review April 7, 2026 17:56
Collaborator

Are we allowed to publish this result?

Collaborator

We will share the PR with Arvind so they can call it out if they are not ok with sharing. May be now is a good time to share the link to the readme to make sure they are ok with everything.

- feature_stats: {"feature_mean": [1], "feature_std": [1]}
"""

NUM_FEATURES = 1 # Scalar field (epoxy_vof)
Collaborator

Should this be part of config?

Collaborator

Please change the name of the example to underfill_dispensing
