
rai-opensource/q2rl


🍋 When Life Gives You BC, Make Q-functions: 🍹
Extracting Q-values from Behavior Cloning
for On-Robot Reinforcement Learning

Lakshita Dodeja¹,², Ondrej Biza¹, Shivam Vats², Stephen Hart¹, Stefanie Tellex², Robin Walters³, Karl Schmeckpeper¹, Thomas Weng¹

¹Robotics and AI Institute, ²Brown University, ³Northeastern University

Q2RL

Installation

We provide installation instructions with conda and uv.

Install with conda

  1. Set up the conda environment:

    conda create -n q2rl python=3.10 -y
    conda activate q2rl
  2. Install other requirements and JAX:

    pip install -r requirements.txt
  3. Install D4RL and Adroit Envs:

    We use the D4RL and Adroit env versions from the WSRL repo. The instructions below are copied from WSRL:

    This fork incorporates the antmaze-ultra environments and fixes the kitchen environment rewards to be consistent between the offline dataset and the environment.

    git clone git@github.com:zhouzypaul/D4RL.git
    cd D4RL
    pip install -e .
    

    To use MuJoCo, you also need to install it manually under ~/.mujoco/ (for download instructions, see here) and set the following environment variables:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia

    To use the Adroit envs, you also need:

    git clone --recursive https://github.com/nakamotoo/mj_envs.git
    cd mj_envs
    git submodule update --remote
    pip install -e .
    

    Download the Adroit dataset from here and unzip the files into ~/adroit_data/. To put the datasets in another directory, set the environment variable DATA_DIR_PREFIX (check out the code here for details).

    export DATA_DIR_PREFIX=/path/to/your/data
  4. Install Robomimic:

    We extract log probabilities and entropy from GMM policies in robomimic/robomimic/algo/bc.py, which is included in this repo.

    cd robomimic 
    pip install -e .
    

    If other repos change the installed versions, pin numpy and mujoco back to these:

    pip install mujoco==3.1.6 numpy==1.26.1
    
  5. Activate the conda environment before running any scripts.
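As a sanity check for step 4, the two quantities extracted from a GMM policy head (log-probability and entropy) can be sketched in plain numpy. This is a hypothetical illustration, not the repo's robomimic/robomimic/algo/bc.py code: gmm_log_prob and gmm_entropy_mc are made-up names, and the entropy is a Monte-Carlo estimate because a Gaussian mixture has no closed-form entropy.

```python
import numpy as np

def _logsumexp(a):
    """Numerically stable log-sum-exp over a 1-D array."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def gmm_log_prob(x, log_weights, means, log_stds):
    """Log-density of action x under a diagonal-covariance GMM.

    log_weights: (K,) log mixture weights; means, log_stds: (K, D); x: (D,).
    """
    stds = np.exp(log_stds)
    # Per-component diagonal Gaussian log-density, summed over action dims.
    comp_logp = -0.5 * np.sum(
        ((x - means) / stds) ** 2 + 2.0 * log_stds + np.log(2.0 * np.pi), axis=-1
    )
    return _logsumexp(log_weights + comp_logp)

def gmm_entropy_mc(log_weights, means, log_stds, n=20000, seed=0):
    """Monte-Carlo estimate of the GMM entropy H = -E[log p(x)]."""
    rng = np.random.default_rng(seed)
    weights = np.exp(log_weights - _logsumexp(log_weights))
    # Sample a component per draw, then sample from that Gaussian.
    ks = rng.choice(len(weights), size=n, p=weights)
    eps = rng.standard_normal((n, means.shape[1]))
    samples = means[ks] + np.exp(log_stds[ks]) * eps
    return -np.mean([gmm_log_prob(s, log_weights, means, log_stds) for s in samples])
```

In the repo these quantities come from the GMM action head of the Robomimic BC policy; this sketch only shows the underlying math.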

Install with uv

  1. Install uv

  2. Clone this repository with git clone --recursive.

  3. Install MuJoCo with ./scripts/install_mujoco.sh.

     a. Ensure that your system has the requisite Mesa development headers installed; on Ubuntu, run sudo apt install libosmesa6-dev.

     b. Note that you will need to add the environment variables printed by install_mujoco.sh to your .bashrc or .zshrc, or export them manually in your shell before running any scripts.

  4. Install dependencies with uv sync.

  5. To use the Adroit envs, you also need:

    git clone --recursive https://github.com/nakamotoo/mj_envs.git
    cd mj_envs
    git submodule update --remote
    uv pip install -e .
    

    Download the Adroit dataset from here and unzip the files into ~/adroit_data/. To put the datasets in another directory, set the environment variable DATA_DIR_PREFIX (check out the code here for details).

    export DATA_DIR_PREFIX=/path/to/your/data
  6. Either run scripts with uv run or execute source .venv/bin/activate to enter the virtual environment before running any scripts.
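The DATA_DIR_PREFIX override mentioned in step 5 is typically resolved along these lines. This is a hypothetical sketch, not the repo's actual lookup (which lives in its data-loading code), and adroit_data_dir is a made-up helper name:

```python
import os

def adroit_data_dir() -> str:
    """Return the Adroit dataset directory, honoring DATA_DIR_PREFIX.

    Falls back to the default ~/adroit_data location when the
    environment variable is unset.
    """
    default = os.path.expanduser("~/adroit_data")
    return os.environ.get("DATA_DIR_PREFIX", default)
```

With export DATA_DIR_PREFIX=/path/to/your/data set in the shell, a loader using this pattern reads datasets from the override instead of ~/adroit_data/.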

Running Experiments

All BC policies and datasets are uploaded to Hugging Face here. Download them with bash scripts/download.sh.

We follow a similar structure to WSRL.

All experiment scripts for q2rl and baselines are in the experiments/ directory. You can modify the paths in the example scripts based on your setup.

Also add the repo to your Python path: export PYTHONPATH=/path/to/q2rl:$PYTHONPATH. The example scripts do this for you.

To kill a running experiment, find the wandb group name from logs/, then run pkill -f "[wandb-group-name]".

Citation

If you find our work useful, please cite us:

@inproceedings{dodeja2026q2rl,
  title     = {When Life Gives You BC, Make Q-functions:
               Extracting Q-values from Behavior Cloning
               for On-Robot Reinforcement Learning},
  author    = {Dodeja, Lakshita and Biza, Ondrej and Vats, Shivam and
               Hart, Stephen and Tellex, Stefanie and Walters, Robin and
               Schmeckpeper, Karl and Weng, Thomas},
  booktitle = {Robotics: Science and Systems (RSS)},
  year      = {2026},
}

Credits

This repo is built upon the WSRL and SERL repositories.


This repository is released as-is to accompany a paper submission. If you find any bugs, corrections, or issues that should be resolved for anyone looking to reproduce the results in this repository, please file an issue, and we will look at it as soon as we can. For other improvements, including new features, we recommend creating your own fork of the repository.
