[stacked 2/3, multi-datapipes] transform updates #1506
coreyjadams wants to merge 11 commits into NVIDIA:main from …
…ms; add an in-memory NumPy reader for small single-file datasets like Darcy flow
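The reader added by this commit is not shown in this excerpt. As a rough illustration of the idea only, here is a minimal sketch, assuming a single `.npz` file whose arrays all share a leading sample dimension. `InMemoryNpzReader` and the field names `coeff`/`sol` are hypothetical and are not the PR's API.

```python
import os
import tempfile

import numpy as np


class InMemoryNpzReader:
    """Hypothetical sketch: read every array of one .npz file into RAM up front."""

    def __init__(self, path):
        with np.load(path) as npz:
            # npz[key] reads the full array, so nothing refers back to the file
            # once this block exits.
            self._arrays = {key: npz[key] for key in npz.files}
        # Assumes every array is at least 1-D and indexed by sample on axis 0.
        lengths = {arr.shape[0] for arr in self._arrays.values()}
        if len(lengths) != 1:
            raise ValueError("all arrays must share one leading (sample) dimension")
        self._length = lengths.pop()

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Indexing the preloaded arrays touches memory only; no disk I/O here.
        return {key: arr[index] for key, arr in self._arrays.items()}


# Usage with a small synthetic file (made-up field names).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "small_dataset.npz")
    rng = np.random.default_rng(0)
    np.savez(path, coeff=rng.random((4, 8, 8)), sol=rng.random((4, 8, 8)))
    reader = InMemoryNpzReader(path)

# The temporary file is gone, but the reader still serves samples from memory.
sample = reader[2]
print(len(reader), sample["coeff"].shape)  # 4 (8, 8)
```

The trade-off is memory for speed: the whole file must fit in RAM, which is why this approach suits only small single-file datasets.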
Greptile Summary

This PR adds several features to the datapipes subsystem: a new …
Last reviewed commit: 6c0d513
```diff
 arr = arr[:]
- data[field] = torch.from_numpy(np.array(arr))
+ data[field] = torch.from_numpy(np.asarray(arr, dtype=np.float32))
```
Forced float32 cast breaks integer fields
The change from `torch.from_numpy(np.array(arr))` to `torch.from_numpy(np.asarray(arr, dtype=np.float32))` unconditionally casts every loaded field to float32. This silently corrupts integer fields such as labels, masks, and indices stored as int64/int32: they come back as floating-point tensors, and integers larger than `2**24` can no longer be represented exactly. The conftest fixture `numpy_data_dir` even creates `label=np.array([i], dtype=np.int64)`.
Consider casting only floating-point arrays to float32 and leaving integer arrays untouched:
```diff
- data[field] = torch.from_numpy(np.asarray(arr, dtype=np.float32))
+ data[field] = torch.from_numpy(np.asarray(arr, dtype=np.float32 if np.issubdtype(arr.dtype, np.floating) else arr.dtype))
```
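To see why the cast matters, and to check the suggested dtype rule in isolation, here is a small NumPy-only sketch. `cast_floats_only` is a hypothetical helper name, not code from the PR; in the loader its result would still be wrapped in `torch.from_numpy`. The demonstration relies on float32 having a 24-bit significand, so `2**24 + 1` is the first positive integer it cannot store exactly.

```python
import numpy as np


def cast_floats_only(arr):
    """Cast floating-point arrays to float32; leave every other dtype unchanged."""
    arr = np.asarray(arr)
    target = np.float32 if np.issubdtype(arr.dtype, np.floating) else arr.dtype
    return np.asarray(arr, dtype=target)


labels = np.array([2**24 + 1], dtype=np.int64)

# Current behaviour: the forced float32 cast rounds 16777217 to 16777216.
forced = np.asarray(labels, dtype=np.float32)
print(int(forced[0]))  # 16777216

# Suggested behaviour: integers keep dtype and value; floats become float32.
print(cast_floats_only(labels).dtype, int(cast_floats_only(labels)[0]))  # int64 16777217
print(cast_floats_only(np.zeros(3)).dtype)  # float32
```

Under this rule float16 and float64 inputs both end up as float32, while bool and integer arrays pass through unchanged.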
```python
NormalizeVectors,
Purge,
Rename,
Resize,
```
Reshape missing from top-level exports
`Reshape` is exported from `physicsnemo.datapipes.transforms` (added to both the import and `__all__` in `transforms/__init__.py`), but it is not imported or listed in this top-level `datapipes/__init__.py`. As a result, users cannot write `from physicsnemo.datapipes import Reshape` or `dp.Reshape(...)`, unlike `Resize`, which was properly added. This appears to be an oversight.
```python
from physicsnemo.datapipes.transforms import (
    ...
    Rename,
    Resize,
    Reshape,  # <-- add this
    Scale,
    ...
)

__all__ = [
    ...
    "ConstantField",
    "Reshape",  # <-- add this
    ...
]
```
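A small regression test could catch this class of oversight. The sketch below compares a subpackage's `__all__` with what a parent package re-exports; it builds stand-in modules with `types.ModuleType` so it runs on its own. In a real test you would pass the imported `physicsnemo.datapipes.transforms` and `physicsnemo.datapipes` modules instead, which assumes every public transform is meant to be re-exported at the top level. `missing_reexports` is a hypothetical helper, not part of PhysicsNeMo.

```python
import types


def missing_reexports(subpackage, toplevel):
    """Return names in subpackage.__all__ that toplevel does not re-export."""
    exported = set(getattr(toplevel, "__all__", ()))
    return sorted(
        name
        for name in subpackage.__all__
        if name not in exported or not hasattr(toplevel, name)
    )


# Stand-in modules mirroring the situation described in the comment above.
transforms = types.ModuleType("transforms")
transforms.__all__ = ["Rename", "Resize", "Reshape", "Scale"]
for name in transforms.__all__:
    setattr(transforms, name, type(name, (), {}))

datapipes = types.ModuleType("datapipes")
datapipes.__all__ = ["Rename", "Resize", "Scale"]
for name in datapipes.__all__:
    setattr(datapipes, name, getattr(transforms, name))

print(missing_reexports(transforms, datapipes))  # ['Reshape']
```

Checking both `__all__` and the attribute itself catches the two halves of the fix separately: a name listed in `__all__` but never imported is still reported.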
```python
arr = np.array(self._preloaded[field][index], copy=False)
if subsample_slice is not None and field in target_keys_set:
    arr = arr[subsample_slice]
data[field] = torch.from_numpy(np.asarray(arr, dtype=np.float32))
```
Same float32 cast issue in preloaded path
Same issue as in `_load_from_npz`: the unconditional `dtype=np.float32` cast will corrupt integer fields loaded from preloaded arrays. The fix should mirror whichever approach is chosen for `_load_from_npz` (line 318).
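One way to keep the two paths consistent is to route both through the same cast. The sketch below reproduces the preloaded-path logic from the excerpt using NumPy only; it assumes `preloaded` maps field names to NumPy arrays indexed by sample, and `cast_for_torch` and `load_preloaded_sample` are hypothetical names rather than the PR's methods. The real loader would wrap each returned array in `torch.from_numpy`.

```python
import numpy as np


def cast_for_torch(arr):
    """Float arrays become float32; integer, bool and other dtypes are kept as-is."""
    arr = np.asarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        return arr.astype(np.float32, copy=False)
    return arr


def load_preloaded_sample(preloaded, fields, index, subsample_slice=None, target_keys=()):
    """Hypothetical preloaded-path loader sharing cast_for_torch with the npz path."""
    target_keys_set = set(target_keys)
    data = {}
    for field in fields:
        # np.asarray returns the indexed array without copying when it already is one.
        arr = np.asarray(preloaded[field][index])
        # Subsample only the fields listed as targets, as in the excerpt.
        if subsample_slice is not None and field in target_keys_set:
            arr = arr[subsample_slice]
        data[field] = cast_for_torch(arr)
    return data


preloaded = {
    "x": np.arange(12, dtype=np.float64).reshape(2, 6),
    "label": np.array([[0], [1]], dtype=np.int64),
}
sample = load_preloaded_sample(preloaded, ["x", "label"], 1, slice(0, 3), target_keys=["x"])
print(sample["x"].dtype, sample["x"].tolist())          # float32 [6.0, 7.0, 8.0]
print(sample["label"].dtype, sample["label"].tolist())  # int64 [1]
```

If `_load_from_npz` calls the same `cast_for_torch`, any later change to the dtype policy applies to both loaders at once.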
Updated the documentation for PhysicsNeMo Datapipes to improve clarity and consistency. Adjusted wording and structure for better readability.
[stack 2/3]
This will be rebased, but it also contains the transform updates.
PhysicsNeMo Pull Request
Description
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted or rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.