Official code release for the PEFT-Arena paper:
PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
Yangyi Huang, Ruotian Peng, Zeju Qiu, Jiale Kang, Yandong Wen, Bernhard Schölkopf, Weiyang Liu
Project page: https://spherelab.ai/PEFT-Arena
GitHub: https://github.com/Sphere-AI-Lab/PEFT-Arena
Project site source: docs/
PEFT-Arena studies parameter-efficient finetuning through the stability-plasticity dilemma: how much a post-trained model improves on the target domain, and how much of its pretrained general capability it retains.
This repository contains the official training, evaluation, and analysis code for the paper's main experimental workflows:
- supervised finetuning (SFT)
- reinforcement learning with verifiable rewards (RLVR / GRPO)
- evaluation on target-domain and general-retention benchmarks
- spectral retention-adaptation profiling and plotting
The benchmark covers two target domains:
- mathematical reasoning
- medical reasoning
and measures general capability retention on:
- BBH
- IFEval
- NQ
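The two axes above can be made concrete with a toy scoring sketch. The benchmark names and numbers below are illustrative only (not the paper's results), and the metric definitions are one simple choice, not necessarily the ones PEFT-Arena reports: plasticity as the target-domain gain, stability as the average retention ratio on general benchmarks.

```python
# Toy stability-plasticity accounting. All scores are made up;
# the metric definitions are illustrative, not the paper's.
def stability_plasticity(base_scores, ft_scores, target, general):
    """Plasticity: absolute gain on the target benchmark.
    Stability: mean finetuned/base score ratio on general benchmarks."""
    plasticity = ft_scores[target] - base_scores[target]
    stability = sum(ft_scores[g] / base_scores[g] for g in general) / len(general)
    return plasticity, stability

base = {"math500": 42.0, "bbh": 60.0, "ifeval": 55.0, "nq": 30.0}
ft   = {"math500": 68.0, "bbh": 54.0, "ifeval": 50.0, "nq": 27.0}
p, s = stability_plasticity(base, ft, "math500", ["bbh", "ifeval", "nq"])
# Large p with s near 1.0 would indicate adaptation with little forgetting.
```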
The paper reports experiments on Qwen2.5-7B and Llama3.2-3B-Instruct, comparing full finetuning and representative PEFT families including LoRA variants, OFT, IA3, VeRA, MiSS, and KeepLoRA.
The release covers:
- post-training with SFT and RLVR
- evaluation on target-domain and general-retention benchmarks
- spectral analysis and figure generation used in the paper
- checkpoints and data
- A unified benchmark for evaluating PEFT beyond downstream accuracy alone.
- A stability-plasticity view of SFT and RLVR post-training.
- Spectral analysis tools for studying retention and adaptation structure in weight updates.
- Reproducible training and evaluation entrypoints through a single CLI in `run.py`.
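The spectral profiling idea can be illustrated with a minimal numpy sketch: inspect the singular value spectrum of the weight update ΔW = W_ft − W_base. This is a toy stand-in for `tools/spectral_analysis.py`, not the repo's implementation; the matrices are random and the "finetuning" delta is a simulated rank-4 LoRA-style update.

```python
import numpy as np

# Toy spectral retention-adaptation profile: how much of the weight
# update's energy is concentrated in a few directions?
rng = np.random.default_rng(0)
W_base = rng.standard_normal((64, 64))

# Simulate a low-rank finetuning update (rank-4, LoRA-style delta).
B = rng.standard_normal((64, 4))
A = rng.standard_normal((4, 64))
W_ft = W_base + 0.05 * (B @ A)

# Singular value spectrum of the update.
s = np.linalg.svd(W_ft - W_base, compute_uv=False)
rank = int(np.sum(s > 1e-8 * s[0]))                 # numerical rank of dW
energy_top4 = float(np.sum(s[:4] ** 2) / np.sum(s ** 2))
# For a true low-rank method, nearly all update energy sits in the top
# directions; full finetuning typically spreads it across the spectrum.
```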
Key components:
- `run.py`: unified CLI for training, evaluation, and adapter merge
- `train/`: SFT and RL training wrappers plus PEFT-Arena-owned trainer code
- `eval/`: math, medical, and general evaluation pipelines
- `tools/`: checkpoint preparation, merge, spectral analysis, and plotting
- `third_party/math_eval` and `third_party/med_eval`: bundled target-domain evaluation code
- `third_party/opencompass` and `third_party/verl`: external dependencies used by general evaluation and RL training
- `docs/`: project website / GitHub Pages source
Run commands from the repository root.
You should start from a Python environment with a compatible CUDA / PyTorch stack. The release setup script installs the PEFT-Arena-side dependencies on top of that environment.
Typical setup:
```bash
bash setup_env.sh
```
This script:
- validates and backfills required `third_party/` components
- installs training and evaluation dependencies
- installs the patched `math_eval/latex2sympy` package with `antlr4-python3-runtime==4.9.3`
- installs OpenCompass, VeRL, `vllm`, `human-eval`, and `evalplus`
If you only want to fetch or validate the third-party trees:
```bash
bash setup_third_party.sh
```
Notes:
- `third_party/math_eval` and `third_party/med_eval` are included in this release.
- `third_party/human-eval` is not tracked in git; `setup_third_party.sh` will copy or fetch it when needed.
- OpenCompass benchmark data is not bundled. Some datasets download automatically on first use, while others, such as IFEval and `mmlu`, still require local dataset preparation under `third_party/opencompass/data`.
- Math: `math500`, `amc23`, `aime24`
- Medical: packaged `med_eval` benchmark set covering the medical QA and reasoning tasks used in the paper
- General: `bbh`, `ifeval_nq`
- Extended wrappers: `humaneval`, `hellaswag`, `winogrande`, `mmlu`, `arc`, `gsm8k`, and `xcopa`
- SFT
  - math data: filtered 50k samples from OpenR1-Math
  - medical data: 23k samples from MedThink
- RLVR
  - PEFT-Arena RL training with GRPO
  - the release code keeps the RL dataset and async rollout / agent-loop path aligned with the current `peft_arena` implementation
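The group-relative advantage at the heart of GRPO can be sketched in a few lines: each sampled completion's verifiable reward is normalized against the other completions for the same prompt, so no learned value model is needed. This is a toy illustration; the actual RL path in this repo lives in `third_party/verl`.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: z-score each completion's reward
    within its own prompt group. `eps` guards against zero std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, four sampled completions, binary verifiable rewards
# (e.g. 1.0 if the math answer checks out, 0.0 otherwise).
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get positive advantage, incorrect ones negative,
# and advantages within a group sum to zero.
```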
Train with SFT:
```bash
python run.py train sft \
    --model Qwen/Qwen2.5-7B \
    --adapter lora \
    --lora_rank 16 \
    --lora_alpha 32 \
    --data_train data/openr1-50k/train.parquet \
    --data_val data/openr1-50k/test.parquet \
    --output_dir checkpoints/sft/math/qwen2.5-7b/lora-r16
```
Train with RLVR (GRPO):
```bash
python run.py train rl \
    --model Qwen/Qwen2.5-7B \
    --adapter oft \
    --oft_block_size 32 \
    --data_train data/openr1-50k/train.parquet \
    --data_val data/openr1-50k/test.parquet \
    --output_dir checkpoints/rl/math/qwen2.5-7b/oft-b32
```
Evaluate one checkpoint on all supported domains:
```bash
python run.py eval \
    --checkpoint_path checkpoints/sft/math/qwen2.5-7b/lora-r16/global_step_780 \
    --domain all
```
Evaluate general-retention benchmarks only:
```bash
python run.py eval \
    --checkpoint_path checkpoints/sft/med/qwen2.5-7b/oft-b16/global_step_364 \
    --domain general \
    --benchmarks bbh,ifeval_nq,humaneval
```
Convenience wrappers remain available:
```bash
bash eval/eval_math.sh --checkpoint_path <ckpt>
bash eval/eval_med.sh --checkpoint_path <ckpt>
bash eval/eval_general.sh --checkpoint_path <ckpt>
```
Prepare a training checkpoint for evaluation:
```bash
python tools/prepare_eval_checkpoint.py \
    --checkpoint_path checkpoints/sft/med/qwen2.5-7b/oft-b16/global_step_364
```
Merge a PEFT adapter into a standalone Hugging Face checkpoint:
```bash
python run.py merge \
    --adapter_path checkpoints/sft/math/qwen2.5-7b/lora-r16/global_step_780 \
    --output_path checkpoints/sft/math/qwen2.5-7b/lora-r16/global_step_780_merged
```
Summarize evaluation results:
```bash
python eval/summarize_results.py --results_dir results
```
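What merging does for a LoRA-style adapter is easy to state mathematically: fold the scaled low-rank update into the frozen base weight, leaving a plain dense checkpoint. A hedged numpy sketch under the standard LoRA parameterization (this is not the repo's merge code; use `run.py merge` for real checkpoints):

```python
import numpy as np

def merge_lora(W, A, B, rank, alpha):
    """W: (out, in); B: (out, r); A: (r, in); LoRA scaling = alpha / r.
    Returns the dense merged weight W + (alpha / r) * B @ A."""
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
B = rng.standard_normal((16, 2))
A = rng.standard_normal((2, 16))
W_merged = merge_lora(W, A, B, rank=2, alpha=32)

# The merged delta is exactly the scaled low-rank update: rank <= r.
delta = W_merged - W
```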
```bash
python eval/extract_new_benchmark_summary.py \
    --input results/summary.csv \
    --output results/new_benchmark_summary.csv
```
Run spectral analysis for one base / finetuned pair:
```bash
python tools/spectral_analysis.py \
    --base_model Qwen/Qwen2.5-7B \
    --finetuned_model checkpoints/sft/math/qwen2.5-7b/lora-r8/global_step_780 \
    --output_dir analysis/math/sft/qwen2.5-7b/lora-r8/global_step_780 \
    --layers 18 \
    --modules down_proj
```
Plot the analysis outputs:
```bash
python tools/plot_spectral_analysis.py \
    --input_dirs analysis/math/sft/qwen2.5-7b/full/global_step_780 analysis/math/sft/qwen2.5-7b/oft-b32/global_step_780 analysis/math/sft/qwen2.5-7b/lora-r8/global_step_780 \
    --labels SFT-FullFT SFT-OFT-b32 SFT-LoRA-r8 \
    --output_dir analysis/plot_sft_spectrum \
    --plot_type curves \
    --layer_names "model.layers.18.mlp.down_proj.weight" \
    --log_scale
```
Composite and utility plots:
```bash
python tools/plot_spectral_composite.py \
    --output analysis/plot_composite/composite_layer18_mlp_down_proj.pdf
python tools/plot_spectral_method_grid.py \
    --output analysis/plot_composite/method_grid_layer18_mlp_down_proj.pdf
python tools/compare_model_norms.py \
    --base-model Qwen/Qwen2.5-7B \
    --sft-model checkpoints/sft/math/qwen2.5-7b/lora-r8/global_step_780 \
    --output analysis/model_norms/qwen2.5-7b_lora-r8
```
Repository structure:
```
peft_arena_release/
├── run.py
├── setup_env.sh
├── setup_third_party.sh
├── configs/
├── docs/
├── train/
├── eval/
├── tools/
├── tests/
└── third_party/
```
If you find PEFT-Arena useful in your research, please cite:
```bibtex
@misc{huang2026peftarena,
      title={PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective},
      author={Yangyi Huang and Ruotian Peng and Zeju Qiu and Jiale Kang and Yandong Wen and Bernhard Sch\"olkopf and Weiyang Liu},
      year={2026},
}
```
This repository is released under the MIT License. See LICENSE.
