STOP behavior reward and learning masking. by riccardosavorgnan · Pull Request #353 · Emerge-Lab/PufferDrive

riccardosavorgnan · 2026-03-20T16:10:30Z

This PR updates the code so that we neither accumulate returns nor update the policy on steps where the car is STOPPED (STOP behavior triggered by either a collision or going offroad).

High level:

We add a flag for the stopped condition and pass it up to the Python hierarchy.
We set rewards to 0 when the flag is true.
We set losses (and therefore gradients) to 0 when the flag is true, so the policy is not updated from those steps.

NOTE: we do not mask the entropy bonus at the moment.
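As a rough illustration of the masking described above, here is a minimal pure-Python sketch. It is not the code from this PR: the function names (`mask_rewards`, `total_loss`), the `ent_coef` parameter, and the choice to average the masked loss over all steps (rather than only over non-stopped steps) are assumptions made for the example. It mirrors the note above in that the entropy term is left unmasked.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def mask_rewards(rewards, stopped):
    """Return rewards with every stopped step set to 0.0."""
    return [0.0 if s else r for r, s in zip(rewards, stopped)]


def total_loss(per_step_loss, per_step_entropy, stopped, ent_coef=0.01):
    """Combine a masked policy loss with an unmasked entropy bonus.

    Assumption: the masked loss is averaged over all steps, so stopped
    steps contribute zero loss (and hence zero gradient) but still count
    in the denominator. The entropy bonus is NOT masked.
    """
    masked = [0.0 if s else l for l, s in zip(per_step_loss, stopped)]
    return mean(masked) - ent_coef * mean(per_step_entropy)


if __name__ == "__main__":
    # Hypothetical rollout of four steps; steps 1 and 2 are stopped.
    rewards = [1.0, -0.5, 0.25, 2.0]
    stopped = [False, True, True, False]
    print(mask_rewards(rewards, stopped))
```

In a tensor-based training loop the same idea would typically be an elementwise multiply by `1 - stopped`; whether to normalize by the total number of steps or by the number of non-stopped steps is a design choice that changes the effective learning rate when many agents are stopped.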

stopped is now passed through to the python hierarchy and used as a m…

6496492

…ask for rewards and loss

riccardosavorgnan marked this pull request as ready for review March 20, 2026 16:15

riccardosavorgnan wants to merge 1 commit into 3.0 from ricky/stop_behavior_gradient_fix
