🍋 When Life Gives You BC, Make Q-functions:🍹
Extracting Q-values from Behavior Cloning
for On-Robot Reinforcement Learning
Lakshita Dodeja1,2, Ondrej Biza1, Shivam Vats2, Stephen Hart1, Stefanie Tellex2, Robin Walters3, Karl Schmeckpeper1, Thomas Weng1
1Robotics and AI Institute, 2Brown University, 3Northeastern University
We provide installation instructions with conda and uv.
- Set up the conda environment:
    conda create -n q2rl python=3.10 -y
    conda activate q2rl
- Install other requirements and JAX:
    pip install -r requirements.txt
- Install the D4RL and Adroit environments:
  We use the D4RL and Adroit environment versions from the WSRL repo. The following instructions are copied from WSRL:
  This fork incorporates the antmaze-ultra environments and fixes the kitchen environment rewards to be consistent between the offline dataset and the environment.
    git clone git@github.com:zhouzypaul/D4RL.git
    cd D4RL
    pip install -e .
  To use Mujoco, you also need to install mujoco manually to ~/.mujoco/ (for more instructions on download see here), and set the following environment variables:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
  To use the Adroit environments, you also need:
    git clone --recursive https://github.com/nakamotoo/mj_envs.git
    cd mj_envs
    git submodule update --remote
    pip install -e .
  Download the Adroit dataset from here and unzip the files into ~/adroit_data/. To put the Adroit datasets in another directory, set the environment variable DATA_DIR_PREFIX (check out the code here for more details):
    export DATA_DIR_PREFIX=/path/to/your/data
- Install Robomimic:
  We extract log probabilities and entropy for GMM policies in robomimic/robomimic/algo/bc.py, which is included in this repo.
    cd robomimic
    pip install -e .
  Install numpy and mujoco with these versions if other repos change them:
    pip install mujoco==3.1.6 numpy==1.26.1
- Activate the conda environment before running any scripts.
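The version pins in the conda steps can be checked after installation. The following is a minimal sketch, assuming only the two pins from the install command above; `PINS`, `installed_version`, and `check_pins` are hypothetical names for illustration and are not part of this repo.

```python
from importlib import metadata

# Pins taken from the install command in the conda steps.
PINS = {"mujoco": "3.1.6", "numpy": "1.26.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, get_version=installed_version):
    """Map each mismatched package to (expected, found); an empty dict means all pins hold."""
    mismatches = {}
    for package, expected in pins.items():
        found = get_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from its pin.
    for package, (expected, found) in check_pins(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

Running it inside the activated environment prints nothing when both pins match.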
Alternatively, to install with uv:
- Install uv.
- Clone this repository with git clone --recursive.
- Install Mujoco with ./scripts/install_mujoco.sh.
  a. Ensure that your system has the requisite Mesa development headers installed; on Ubuntu, run sudo apt install libosmesa6-dev.
  b. Export the environment variables printed by install_mujoco.sh in your .bashrc or .zshrc, or export them manually in your shell before running any scripts.
- Install dependencies with uv sync.
- To use the Adroit environments, you also need:
    git clone --recursive https://github.com/nakamotoo/mj_envs.git
    cd mj_envs
    git submodule update --remote
    uv pip install -e .
  Download the Adroit dataset from here and unzip the files into ~/adroit_data/. To put the Adroit datasets in another directory, set the environment variable DATA_DIR_PREFIX (check out the code here for more details):
    export DATA_DIR_PREFIX=/path/to/your/data
- Either run scripts with uv run, or execute source .venv/bin/activate to enter the virtual environment before running any scripts.
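Both installation paths expect the Adroit datasets in ~/adroit_data/ unless DATA_DIR_PREFIX is set. The sketch below shows one way to confirm which directory will be used before launching an experiment. It assumes a simple "environment variable, else default" lookup; how mj_envs actually reads the variable is not shown here, and `resolve_adroit_data_dir` is a hypothetical helper, not a function from this repo or from mj_envs.

```python
import os
from pathlib import Path

# Default location named in the installation steps.
DEFAULT_ADROIT_DIR = Path.home() / "adroit_data"


def resolve_adroit_data_dir(env=None):
    """Return the DATA_DIR_PREFIX directory if set and non-empty, else ~/adroit_data."""
    env = os.environ if env is None else env
    prefix = env.get("DATA_DIR_PREFIX")
    if prefix:
        return Path(prefix).expanduser()
    return DEFAULT_ADROIT_DIR


if __name__ == "__main__":
    data_dir = resolve_adroit_data_dir()
    status = "found" if data_dir.is_dir() else "missing"
    print(f"Adroit data directory: {data_dir} ({status})")
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real environment.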
All BC policies and datasets are uploaded to Hugging Face here. Download them with bash scripts/download.sh.
We follow a similar structure to WSRL.
All experiment scripts for q2rl and baselines are in the experiments/ directory.
You can modify the paths in the example scripts based on your setup.
Also add the repo to the Python path:
    export PYTHONPATH=/path/to/q2rl:$PYTHONPATH
The example scripts do this for you.
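When running a script outside the provided examples, it can help to confirm that the repo is actually on PYTHONPATH. A minimal sketch, assuming the repo lives at /path/to/q2rl as in the export above; `on_python_path` is a hypothetical helper, not part of this repo.

```python
import os
from pathlib import Path


def on_python_path(repo_root, env=None):
    """Return True if repo_root appears among the PYTHONPATH entries."""
    env = os.environ if env is None else env
    target = Path(repo_root).expanduser().resolve()
    # PYTHONPATH entries are separated by os.pathsep (':' on Linux and macOS).
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    return any(Path(e).expanduser().resolve() == target for e in entries if e)


if __name__ == "__main__":
    print(on_python_path("/path/to/q2rl"))
```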
To kill a running experiment, find the wandb group name in logs/, then run pkill -f "[wandb-group-name]", replacing [wandb-group-name] with that name.
If you like our work, please cite us:
@inproceedings{dodeja2026q2rl,
title = {When Life Gives You BC, Make Q-functions:
Extracting Q-values from Behavior Cloning
for On-Robot Reinforcement Learning},
author = {Dodeja, Lakshita and Biza, Ondrej and Vats, Shivam and
Hart, Stephen and Tellex, Stefanie and Walters, Robin and
Schmeckpeper, Karl and Weng, Thomas},
booktitle = {Robotics: Science and Systems (RSS)},
year = {2026},
}
This repo is built upon the WSRL and SERL repositories.
This repository is released as-is to accompany a paper submission. If you find any bugs, corrections, or issues that should be resolved for anyone looking to reproduce the results in this repository, please file an issue, and we will look at it as soon as we can. For other improvements, including new features, we recommend creating your own fork of the repository.
