STOP behavior reward and learning masking.#353

Open
riccardosavorgnan wants to merge 1 commit into 3.0 from ricky/stop_behavior_gradient_fix

Conversation

@riccardosavorgnan (Collaborator) commented Mar 20, 2026

This PR updates the code so that we neither accumulate returns nor update the policy on steps where the car is STOPPED (the STOP behavior triggered by either a collision or going offroad).

High level:

  • we add a flag for the stopped condition and pass it up through the Python hierarchy
  • we set rewards to 0 when the flag is true
  • we set losses (and therefore gradients) to 0 when the flag is true, so the policy is not updated from those steps.

NOTE: we do not mask the entropy bonus at the moment.
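
For reviewers, here is a minimal NumPy sketch of the masking idea described above. It is not the code in this PR: the function names (`masked_returns`, `masked_policy_loss`), the `stopped` boolean array, and the choice to average the loss over non-stopped steps only are all assumptions made for illustration. The entropy term is deliberately left out, matching the note that it is not masked.

```python
import numpy as np


def masked_returns(rewards, stopped, gamma=0.99):
    """Discounted returns with rewards zeroed on stopped steps.

    `stopped` is a hypothetical boolean array, True where the car is STOPPED.
    """
    rewards = np.where(stopped, 0.0, rewards)  # reward is 0 when the flag is true
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def masked_policy_loss(per_step_loss, stopped):
    """Zero the per-step loss on stopped steps, then average.

    Averaging over active steps only is an assumption of this sketch;
    a plain mean over all steps would also zero the stopped-step gradients.
    """
    active = (~stopped).astype(np.float64)
    masked = per_step_loss * active  # stopped steps contribute exactly 0
    return float(masked.sum() / max(active.sum(), 1.0))


rewards = np.array([1.0, 1.0, 1.0])
stopped = np.array([False, True, False])
returns = masked_returns(rewards, stopped, gamma=0.5)
loss = masked_policy_loss(np.array([2.0, 4.0, 6.0]), stopped)
```

In an autodiff framework the same multiplication by the active mask gives zero gradient for the stopped steps, since their loss terms are constant zeros.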

@riccardosavorgnan riccardosavorgnan marked this pull request as ready for review March 20, 2026 16:15

1 participant