
rai-opensource/q2rl


🍋 When Life Gives You BC, Make Q-functions: 🍹
Extracting Q-values from Behavior Cloning
for On-Robot Reinforcement Learning

Lakshita Dodeja¹,², Ondrej Biza¹, Shivam Vats², Stephen Hart¹, Stefanie Tellex², Robin Walters³, Karl Schmeckpeper¹, Thomas Weng¹

¹Robotics and AI Institute, ²Brown University, ³Northeastern University

Q2RL

Installation

We provide installation instructions with conda and uv.

Install with conda

  1. Set up the conda environment:

    conda create -n q2rl python=3.10 -y
    conda activate q2rl
  2. Install other requirements and JAX:

    pip install -r requirements.txt
  3. Install D4RL and Adroit Envs:

    We use the D4RL and Adroit env versions from the WSRL repo. The instructions below are copied from WSRL:

    This fork incorporates the antmaze-ultra environments and fixes the kitchen environment rewards to be consistent between the offline dataset and the environment.

    git clone git@github.com:zhouzypaul/D4RL.git
    cd D4RL
    pip install -e .
    

    To use MuJoCo, you also need to install it manually under ~/.mujoco/ (for download instructions, see here) and set the following environment variables:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia

    To use the Adroit envs, you also need:

    git clone --recursive https://github.com/nakamotoo/mj_envs.git
    cd mj_envs
    git submodule update --remote
    pip install -e .
    

    Download the Adroit dataset from here and unzip the files into ~/adroit_data/. To put the datasets in another directory, set the environment variable DATA_DIR_PREFIX (check out the code here for details).

    export DATA_DIR_PREFIX=/path/to/your/data
  4. Install Robomimic:

    We extract log probabilities and entropy from GMM policies in robomimic/robomimic/algo/bc.py, which is included in this repo.

    cd robomimic 
    pip install -e .
    

    If other repos change the installed versions, pin numpy and mujoco back to these:

    pip install mujoco==3.1.6 numpy==1.26.1
    
  5. Activate the conda environment before running any scripts.
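As a sanity check for step 4, the two quantities extracted from a GMM policy head (log-probability and entropy) can be sketched in plain numpy. This is a hypothetical illustration, not the repo's robomimic/robomimic/algo/bc.py code: gmm_log_prob and gmm_entropy_mc are made-up names, and the entropy is a Monte-Carlo estimate because a Gaussian mixture has no closed-form entropy.

```python
import numpy as np

def _logsumexp(a):
    """Numerically stable log-sum-exp over a 1-D array."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def gmm_log_prob(x, log_weights, means, log_stds):
    """Log-density of action x under a diagonal-covariance GMM.

    log_weights: (K,) log mixture weights; means, log_stds: (K, D); x: (D,).
    """
    stds = np.exp(log_stds)
    # Per-component diagonal Gaussian log-density, summed over action dims.
    comp_logp = -0.5 * np.sum(
        ((x - means) / stds) ** 2 + 2.0 * log_stds + np.log(2.0 * np.pi), axis=-1
    )
    return _logsumexp(log_weights + comp_logp)

def gmm_entropy_mc(log_weights, means, log_stds, n=20000, seed=0):
    """Monte-Carlo estimate of the GMM entropy H = -E[log p(x)]."""
    rng = np.random.default_rng(seed)
    weights = np.exp(log_weights - _logsumexp(log_weights))
    # Sample a component per draw, then sample from that Gaussian.
    ks = rng.choice(len(weights), size=n, p=weights)
    eps = rng.standard_normal((n, means.shape[1]))
    samples = means[ks] + np.exp(log_stds[ks]) * eps
    return -np.mean([gmm_log_prob(s, log_weights, means, log_stds) for s in samples])
```

In the repo these quantities come from the GMM action head of the Robomimic BC policy; this sketch only shows the underlying math.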

Install with uv

  1. Install uv

  2. Clone this repository with git clone --recursive.

  3. Install MuJoCo with ./scripts/install_mujoco.sh.

     a. Ensure that your system has the requisite Mesa development headers installed; on Ubuntu, run sudo apt install libosmesa6-dev.

     b. Note that you will need to add the environment variables printed by install_mujoco.sh to your .bashrc or .zshrc, or export them manually in your shell before running any scripts.

  4. Install dependencies with uv sync.

  5. To use the Adroit envs, you also need:

    git clone --recursive https://github.com/nakamotoo/mj_envs.git
    cd mj_envs
    git submodule update --remote
    uv pip install -e .
    

    Download the Adroit dataset from here and unzip the files into ~/adroit_data/. To put the datasets in another directory, set the environment variable DATA_DIR_PREFIX (check out the code here for details).

    export DATA_DIR_PREFIX=/path/to/your/data
  6. Either run scripts with uv run or execute source .venv/bin/activate to enter the virtual environment before running any scripts.
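The DATA_DIR_PREFIX override mentioned in step 5 is typically resolved along these lines. This is a hypothetical sketch, not the repo's actual lookup (which lives in its data-loading code), and adroit_data_dir is a made-up helper name:

```python
import os

def adroit_data_dir() -> str:
    """Return the Adroit dataset directory, honoring DATA_DIR_PREFIX.

    Falls back to the default ~/adroit_data location when the
    environment variable is unset.
    """
    default = os.path.expanduser("~/adroit_data")
    return os.environ.get("DATA_DIR_PREFIX", default)
```

With export DATA_DIR_PREFIX=/path/to/your/data set in the shell, a loader using this pattern reads datasets from the override instead of ~/adroit_data/.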

Running Experiments

All BC policies and datasets are uploaded to Hugging Face here. Download them with bash scripts/download.sh.

We follow a similar structure to WSRL.

All experiment scripts for q2rl and baselines are in the experiments/ directory. You can modify the paths in the example scripts based on your setup.

Also add the repo to your Python path: export PYTHONPATH=/path/to/q2rl:$PYTHONPATH. The example scripts do this for you.

To kill a running experiment, find the wandb group name from logs/, then run pkill -f "[wandb-group-name]".

Citation

If you find our work useful, please cite us:

@inproceedings{dodeja2026q2rl,
  title     = {When Life Gives You BC, Make Q-functions:
               Extracting Q-values from Behavior Cloning
               for On-Robot Reinforcement Learning},
  author    = {Dodeja, Lakshita and Biza, Ondrej and Vats, Shivam and
               Hart, Stephen and Tellex, Stefanie and Walters, Robin and
               Schmeckpeper, Karl and Weng, Thomas},
  booktitle = {Robotics: Science and Systems (RSS)},
  year      = {2026},
}

Credits

This repo is built upon the WSRL and SERL repositories.


This repository is released as-is to accompany a paper submission. If you find any bugs, corrections, or issues that should be resolved for anyone looking to reproduce the results in this repository, please file an issue, and we will look at it as soon as we can. For other improvements, including new features, we recommend creating your own fork of the repository.
