Add a Flag to build_ocean so Raylib can work on Debian 11
Environments sharing the same map binary now share read-only road elements, grid maps, and neighbor caches via a reference-counted SharedMapData cache. This eliminates duplicate allocations (~36-73 MB per CARLA map) across envs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
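As a rough illustration of the reference-counting idea described in this commit, here is a minimal Python sketch. The actual cache lives in the C binding; `SharedMapCache`, `acquire`, `release`, and the `loader` callback are hypothetical names used only for this example, and the cache is keyed by map path as an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SharedEntry:
    data: object       # read-only payload shared by every env using this map
    refcount: int = 0  # number of envs currently holding the payload


class SharedMapCache:
    """Hypothetical reference-counted cache keyed by map path."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader
        self._entries: Dict[str, SharedEntry] = {}

    def acquire(self, map_path: str) -> object:
        # Load the map only the first time it is requested.
        entry = self._entries.get(map_path)
        if entry is None:
            entry = SharedEntry(data=self._loader(map_path))
            self._entries[map_path] = entry
        entry.refcount += 1
        return entry.data

    def release(self, map_path: str) -> None:
        # Drop the payload once the last user lets go of it.
        entry = self._entries[map_path]
        entry.refcount -= 1
        if entry.refcount == 0:
            del self._entries[map_path]

    def __len__(self) -> int:
        return len(self._entries)
```

Two envs that acquire the same path receive the same object, so the per-map allocation happens once rather than once per env.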
Walk pufferlib/ in Python and create per-file symlinks, skipping resources/drive/binaries (60K+ map files). Symlink that dir as a single entry instead. Removes dependency on rsync/cp -rs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
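The walk described above might look roughly like the following sketch. It is not the code from `scripts/submit_cluster.py`; `link_tree` and its parameters are hypothetical, and `bulk_dir` is assumed to be given relative to the source root.

```python
import os
from pathlib import Path


def link_tree(src: Path, dst: Path, bulk_dir: Path) -> None:
    """Mirror src into dst with per-file symlinks.

    bulk_dir (relative to src) is linked as a single directory entry
    instead of being walked file by file.
    """
    src = src.resolve()
    for root, dirs, files in os.walk(src):
        rel = Path(root).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        for d in list(dirs):
            if rel / d == bulk_dir:
                # One symlink for the whole directory, then prune it so
                # os.walk never descends into its many files.
                os.symlink(src / rel / d, dst / rel / d, target_is_directory=True)
                dirs.remove(d)
        for f in files:
            os.symlink(Path(root) / f, dst / rel / f)
```

Pruning `dirs` in place is what keeps the walk from touching the large map directory, so the cost scales with the number of source files rather than the number of maps.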
After fork, child processes inherit g_map_cache pointers from the parent. Calling free_shared_map_data on these corrupts the heap since the memory belongs to the parent's address space. Track the creating PID and skip freeing if PID doesn't match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds NULL checks and bounds checks with stderr output to identify the root cause of worker segfaults after fork. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ocess The bug: each PufferDrive.__init__ calls binding.shared() which freed and rebuilt g_map_cache. When multiple PufferDrive instances exist in one process (Serial workers), earlier instances' Drive structs had shared_map pointers to freed SharedMapData, causing use-after-free crashes in checkNeighbors/compute_agent_metrics. Fix: only rebuild the cache after fork (PID mismatch) or on first call. Same-process calls reuse the existing cache, which is correct since the map data doesn't change — only agent-to-map assignment varies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
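The control flow from the two fork-related commits (rebuild only on first call or PID mismatch, and free only in the creating process) can be sketched in Python as below. This only mirrors the decision logic; the real cache is a C global, and `PidGuardedCache`, `build`, and `free` are hypothetical names for illustration.

```python
import os


class PidGuardedCache:
    """Hypothetical cache that is owned by the process that built it."""

    def __init__(self, build, free):
        self._build = build
        self._free = free
        self.data = None
        self.owner_pid = None  # PID of the process that built self.data

    def get(self):
        pid = os.getpid()
        if self.data is None or self.owner_pid != pid:
            # First call, or we are a forked child holding an inherited
            # cache: build a fresh one and do NOT free the inherited data.
            self.data = self._build()
            self.owner_pid = pid
        # Same-process calls reuse the cache, so earlier users keep
        # valid references.
        return self.data

    def release(self):
        # Only the creating process is allowed to free the data.
        if self.data is not None and self.owner_pid == os.getpid():
            self._free(self.data)
        self.data = None
        self.owner_pid = None
```

Repeated `get()` calls in one process return the same object, which is the property that prevents the use-after-free described above.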
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR is a broad “infrastructure + tooling” update centered on reducing memory usage and improving robustness when running multiple PufferDrive instances (notably via a shared map cache), while also adding cluster submission utilities and model export/rendering helpers.
Changes:
- Introduces a module-level shared map cache in the Drive C binding and wires cache release into the Python env lifecycle.
- Adds SLURM/Submitit cluster launch scripts + container setup helpers, and updates docs to match.
- Expands WOSAC evaluation plumbing/metrics and adds ONNX / `.bin` export utilities.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| visualize.dSYM/Contents/Resources/Relocations/aarch64/visualize.yml | Adds macOS dSYM relocation metadata for visualize (build artifact). |
| visualize.dSYM/Contents/Info.plist | Adds macOS dSYM plist for visualize (build artifact). |
| setup.py | Adds tqdm to training-related install requirements. |
| scripts/verify_onnx.py | New script to sanity-check ONNX model structure and inference with dummy inputs. |
| scripts/train_sanity.sh | Removes legacy training helper script. |
| scripts/train_procgen.sh | Removes legacy training helper script. |
| scripts/train_ocean.sh | Removes legacy training helper script. |
| scripts/train_atari.sh | Removes legacy training helper script. |
| scripts/sweep_atari.sh | Removes legacy sweep helper script. |
| scripts/submit_cluster.py | New Submitit-based SLURM launcher with code isolation + optional Singularity wrapping. |
| scripts/setup_container.sh | New helper to create/install/rebuild a Singularity overlay for cluster runs. |
| scripts/export_onnx.py | New utility to export a trained policy checkpoint to ONNX + verify with ORT. |
| scripts/export_model_bin.py | New utility to export policy weights to a flat .bin for the C backend. |
| scripts/cluster_status.sh | New helper to summarize SLURM partition/node availability + user jobs. |
| scripts/cluster_configs/train_base.yaml | Adds a baseline cluster training YAML for program args. |
| scripts/cluster_configs/nyu_greene.yaml | Adds a baseline NYU Greene compute YAML for SLURM resources. |
| scripts/build_simple.sh | Removes a generic C build helper script. |
| pyproject.toml | Adds a cluster extra (submitit/pyyaml). |
| pufferlib/utils.py | Updates WOSAC subprocess invocation args and refactors/extends video rendering pipeline (incl. async support hooks). |
| pufferlib/pufferl.py | Adds async rendering support, truncation handling changes, logger init changes for distributed, adds render CLI mode. |
| pufferlib/ocean/torch.py | Switches Drive model’s ego dimension source to env.ego_features. |
| pufferlib/ocean/env_config.h | Extends Drive env init config parsing with many new fields (reward conditioning/randomization, spawn settings, etc.). |
| pufferlib/ocean/env_binding.h | Updates trajectory extraction bindings (scenario IDs as strings, adds vehicle/track-to-predict flags). |
| pufferlib/ocean/drive/visualize.c | Updates visualization to use new agent structures/config parsing and adds more CLI/config integration. |
| pufferlib/ocean/drive/error.h | Extends error enum/strings and changes error formatting. |
| pufferlib/ocean/drive/drivenet.h | Updates drivenet init signature and adjusts road feature layout handling. |
| pufferlib/ocean/drive/drive.py | Large env API/config expansion, resampling refactor, scenario-id propagation, and cache release on close. |
| pufferlib/ocean/drive/drive.c | Refactors demo/CLI parsing and integrates INI config parsing for the C demo entrypoint. |
| pufferlib/ocean/drive/datatypes.h | New shared constants/types/helpers for reward conditioning, road/agent types, and cleanup helpers. |
| pufferlib/ocean/drive/binding.c | Implements shared map cache + new shared-map selection logic and extends env init kwargs handling. |
| pufferlib/ocean/benchmark/wosac.ini | Updates WOSAC metric weights for 2024-vs-2025 differences. |
| pufferlib/ocean/benchmark/visual_sanity_check.py | Adjusts WOSAC visual sanity check config wiring. |
| pufferlib/ocean/benchmark/metrics_sanity_check.py | Switches sanity check to use a random baseline rollout path. |
| pufferlib/ocean/benchmark/evaluator.py | Adds batched evaluation loop with progress reporting + changes how likelihood/meta-metrics are aggregated. |
| pufferlib/ocean/benchmark/evaluate_imported_trajectories.py | Replaces KDTree alignment with (scenario_id, agent_id) alignment. |
| pufferlib/config/ocean/drive.ini | Major config expansion (reward conditioning/randomization, spawn settings, render config, WOSAC batch params, etc.). |
| drive.dSYM/Contents/Resources/Relocations/aarch64/drive.yml | Adds macOS dSYM relocation metadata for drive (build artifact). |
| drive.dSYM/Contents/Info.plist | Adds macOS dSYM plist for drive (build artifact). |
| docs/theme/extra.css | Adjusts table styling (removes alternating row backgrounds). |
| docs/src/wosac.md | Updates baseline table/results description and adds clarification of baselines. |
| docs/src/visualizer.md | Expands docs for rendering mode + CLI flags and puffer render. |
| docs/src/train.md | Reformats and updates training docs content. |
| docs/src/simulator.md | Clarifies control modes and adds important notes about expert/static agent behavior. |
| docs/src/pufferdrive-2.0.md | Updates author list + citation block. |
| docs/src/interact-with-agents.md | Adds CLI argument documentation for drive tool. |
| docs/src/export-onnx.md | New documentation for ONNX export and .bin weight export. |
| docs/src/data.md | Updates troubleshooting message for missing .bin maps. |
| docs/src/cluster.md | New end-to-end documentation for SLURM + container-based cluster training. |
| docs/src/SUMMARY.md | Adds new docs pages (cluster + ONNX export) to nav. |
| data_utils/carla/generate_carla_agents.py | Fixes heading/velocity computation and changes defaults/logging for dataset generation. |
| README.md | Adds CI badge and updates citation author list. |
| CLAUDE.local.md | Adds local cluster notes (developer-specific ops doc). |
| .github/workflows/utest.yml | Changes CI triggers from main to 2.0. |
| .github/workflows/train-ci.yml | Changes CI triggers from main to 2.0. |
| .github/workflows/render-ci.yml | Changes CI triggers from main to 2.0. |
| .github/workflows/perf-ci.yml | Changes CI triggers from main to 2.0. |
| .github/workflows/docs.yml | Changes docs deploy branch from main to 2.0. |
Comments suppressed due to low confidence (2)
pufferlib/ocean/drive/binding.c:335
`total_agent_count` is always clamped to `num_agents` even when `use_all_maps` is true. In `use_all_maps` mode the returned `agent_offsets[-1]` should reflect the full active-agent count across all maps; clamping will silently truncate agents and produce incorrect offsets/map_ids. Restore the previous `!use_all_maps` guard or pass an explicit limit variable for the non-`use_all_maps` case.
total_agent_count = num_agents;
}
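The guard the comment asks for can be expressed as a small Python sketch. The surrounding C code is not shown in this excerpt, so the function below is an assumption about the intended behavior, not the binding's implementation; `clamp_total_agents` is a hypothetical name.

```python
def clamp_total_agents(total_agent_count: int, num_agents: int, use_all_maps: bool) -> int:
    """Clamp to num_agents only when use_all_maps is off (assumed intent)."""
    if not use_all_maps and total_agent_count > num_agents:
        return num_agents
    # With use_all_maps, keep the full count so the final offset
    # covers every active agent across all maps.
    return total_agent_count
```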
scripts/submit_cluster.py:231
`cpus_per_task` is computed with integer division: `from_config.get('cpus', 8) // args.task_per_node`. If `task_per_node` exceeds `cpus`, this becomes 0 and SLURM will reject the job; if it's not divisible, you may under-allocate CPUs. Clamp to at least 1 and/or validate that `cpus >= task_per_node` (and maybe require divisibility) with a clear error message.
slurm_account=from_config.get("account"),
slurm_partition=from_config.get("partition"),
cpus_per_task=from_config.get("cpus", 8) // args.task_per_node,
tasks_per_node=args.task_per_node,
nodes=from_config.get("nodes", 1),
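One way to apply the suggested validation is sketched below. `resolve_cpus_per_task` is a hypothetical helper, not part of `scripts/submit_cluster.py`, and requiring exact divisibility is the stricter of the options the comment mentions.

```python
def resolve_cpus_per_task(cpus: int, tasks_per_node: int) -> int:
    """Split a node's CPU budget evenly across tasks, failing loudly otherwise."""
    if tasks_per_node < 1:
        raise ValueError(f"tasks_per_node must be >= 1, got {tasks_per_node}")
    if cpus < tasks_per_node:
        # Integer division would yield 0 here, which SLURM rejects.
        raise ValueError(
            f"cpus ({cpus}) must be >= tasks_per_node ({tasks_per_node})"
        )
    if cpus % tasks_per_node != 0:
        # A remainder would leave some requested CPUs unallocated.
        raise ValueError(
            f"cpus ({cpus}) must be divisible by tasks_per_node ({tasks_per_node})"
        )
    return cpus // tasks_per_node
```

The result could then be passed as `cpus_per_task=resolve_cpus_per_task(from_config.get("cpus", 8), args.task_per_node)`, so a bad configuration fails before the job is submitted.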
Summary
Test plan
HEAVILY LLM generated, just a WIP