Ceno + Jagged PCS
The goal is to break the linear relation between proof size and the number of chips.
The current workflow of the prover
- batch_commit: Commit to each chip's trace matrix in a batch.
- tower_prove: For each chip, run the tower prover to accumulate its memory read/write records (over the extension field $E$) and lookup records. Note that the read/write records are placed as leaves in a complete binary tree (if the total number of records is not a power of two, pad with E::ONE). Also note that the tower prover works in the GKR way, meaning it reduces the output layer to the input layer through a sequence of layer sumchecks.
- main sumcheck: For each chip, run the main sumcheck to prove that the tower's input layers are constructed from the trace polynomials with proper padding, and also to enforce the zerocheck constraints over the trace polynomials.
- batch open: At this phase, we need to open a sequence of multilinear polynomial evaluations per trace matrix $M_i$ produced by step 3:
$$\vec{v}: v_{i,j} = M_i[j](r_i)$$
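The leaf padding in tower_prove can be sketched as follows (a minimal illustration; `pad_to_pow2` is a hypothetical helper, not Ceno's actual API — padding with the multiplicative identity keeps the product tree's output unchanged):

```python
def pad_to_pow2(records, one=1):
    """Pad records to the next power-of-two length with the identity `one`,
    so the extra leaves do not affect the tower's product."""
    n = len(records)
    size = 1 if n == 0 else 1 << (n - 1).bit_length()
    return records + [one] * (size - n)

# 3 records -> padded to 4 leaves; the product over leaves is unchanged.
assert pad_to_pow2([2, 3, 5]) == [2, 3, 5, 1]
assert pad_to_pow2([2, 3, 5, 7]) == [2, 3, 5, 7]
```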

Optimization 1: batch multiple main sumchecks
In this version, we plan to prove the main sumcheck for multiple chips in a batch.
At the end, the batched main sumcheck prover will output a sequence of evaluations
$$\vec{v}: v_{i,j} = M_i[j](r_i)$$
with $r_i = r[(n-s)..n]$ for chip $i$'s trace matrix $M_i$ (assuming $r$ has $n$ variables and all polynomials in $M_i$ have $s$ variables).
Optimization 2: pack all polynomials
The main idea, borrowed from SP1's Jagged PCS paper, is to concatenate/pack all the trace polynomials together to form a new "giga" multilinear polynomial. That is, $q = p_0 \| p_1 \| \cdots \| p_N$.
The main lemmas of the jagged PCS paper are:
- Define the cumulative heights $\vec{t}$ by $t_{-1} = 0$ and $t_i = t_{i-1} + h_i$, where $h_i$ is the number of non-zero evaluations of $p_i(x)$ on the boolean hypercube.
- There is a bijective mapping between $(i,r)$ and $b$: $$b = t_{i-1} + r.$$
- It's important to see that $$q[b] = \sum_r \sum_{0 \leq i \leq N} p_i(r) * g(r,b,t_{i-1}, t_i)$$ where $g(r,b,c,d) = 1$ if and only if $r + c = b$ and $b < d$. Substituting $c = t_{i-1}, d = t_i$, this means $g(r,b,t_{i-1},t_i) = 1$ if and only if
- $b < t_i$ and $b \geq t_{i-1}$;
- $r = b - t_{i-1}$ is the exact index of $q[b]$ in $p_i(x)$.
Key insight: given $b$, we can locate the corresponding $(i,r)$ by summing over $g(r,b,t_{i-1},t_i)$.
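The lemmas above can be illustrated with a small sketch (toy heights; `locate` is a hypothetical helper): summing the indicator $g$ over all $(i,r)$ picks out the unique pair mapped to $b$.

```python
def g(r, b, c, d):
    """Indicator from the lemma: 1 iff r + c == b and b < d."""
    return 1 if (r + c == b and b < d) else 0

def locate(b, t):
    """Recover (i, r) from the flat index b via the cumulative heights t;
    exactly one term of the double sum over g is non-zero."""
    t = [0] + list(t)               # prepend t_{-1} = 0
    for i in range(len(t) - 1):
        for r in range(t[i + 1] - t[i]):
            if g(r, b, t[i], t[i + 1]):
                return (i, r)
    raise ValueError("b out of range")

heights = [3, 1, 4]                 # toy h_i per column
t = [3, 4, 8]                       # t_i = t_{i-1} + h_i
assert locate(0, t) == (0, 0)       # first row of p_0
assert locate(3, t) == (1, 0)       # first row of p_1
assert locate(5, t) == (2, 1)       # second row of p_2
```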
Verifying $N+1$ evaluations by a single evaluation
In order to reduce the evaluations $v_i = p_i(z_r)$ to a single evaluation of $q(\cdot)$, the reduction proceeds as follows:
- Sample a new random point $z_i$;
- Compute $v = \sum_i \textrm{eq}(z_i,i) * v_i$;
- It's easy to see $$v = \sum_{i,r} p_i(r) * \textrm{eq}(z_i, i) * \textrm{eq}(z_r, r).$$
- The above equation can be rewritten as $v = \sum_b q(b) * f(b)$ where $$f(b) = \textrm{eq}(z_i, \textrm{col}(b)) * \textrm{eq}(z_r, \textrm{row}(b)),$$ with $\textrm{col}(b) = i$ and $\textrm{row}(b) = r$ under the bijection above.
- By using lemma 3 above, we get $$f(b) = \sum_{i,r} \textrm{eq}(z_i,i)*\textrm{eq}(z_r,r)*g(r,b,t_{i-1},t_i).$$
Step 4 is the so-called "jagged" sumcheck, which reduces the validity of $\vec{v}$ to $q(z_b)$ and $f(z_b)$.
From step 5, we know that $$f(z_b) = \sum_i \textrm{eq}(z_i, i) * g(z_r, z_b, t_{i-1}, t_i).$$
The main technical result of the jagged paper is that $g(z_r,z_b, t_{i-1}, t_i)$ can be succinctly evaluated by the verifier itself.
Note: we could also use a generic sparse polynomial commitment like SPARK to evaluate $E_i(b) = \textrm{eq}(z_i, \textrm{col}(b))$ and $E_r(b) = \textrm{eq}(z_r, \textrm{row}(b))$, but that comes at a pretty large overhead (the Memory In The Head technique).
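The identity chain in steps 2–5 can be sanity-checked numerically on a toy instance (a sketch using rationals via `Fraction` as a stand-in for the field; the column heights, values, and points are all hypothetical):

```python
from fractions import Fraction as F

def eq(z, x, nbits):
    """Multilinear eq(z, x) for an integer index x with nbits little-endian bits."""
    out = F(1)
    for k in range(nbits):
        xk = (x >> k) & 1
        out *= z[k] * xk + (1 - z[k]) * (1 - xk)
    return out

# Toy columns: rows past h_i are zero, matching the jagged structure.
cols = [[F(1), F(2), F(5), F(0)],   # p_0 with h_0 = 3
        [F(7), F(0), F(0), F(0)],   # p_1 with h_1 = 1
        [F(2), F(4), F(6), F(8)]]   # p_2 with h_2 = 4
heights = [3, 1, 4]
t = [0, 3, 4, 8]                    # cumulative heights, starting at t_{-1} = 0
q = [x for i in range(3) for x in cols[i][:heights[i]]]  # packed non-zero rows

z_r = [F(1, 3), F(2, 5)]            # random row point (s = 2 variables)
z_i = [F(3, 7), F(1, 2)]            # random column point

# Step 2: v = sum_i eq(z_i, i) * v_i with v_i = p_i(z_r) evaluated as an MLE.
v = sum(eq(z_i, i, 2) * sum(eq(z_r, r, 2) * cols[i][r] for r in range(4))
        for i in range(3))

# Steps 3-5: walk q and accumulate q(b) * f(b) via the bijection b = t_{i-1} + r.
rhs = F(0)
for i in range(3):
    for r in range(heights[i]):
        b = t[i] + r
        rhs += q[b] * eq(z_i, i, 2) * eq(z_r, r, 2)

assert v == rhs                     # the two sides of the identity agree
```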
New workflow on GPU
- Copy the row-major trace matrices to the GPU and then transpose them to column-major (aka polynomials).
- batch commit: concat the above polynomials to get $q$, and then commit to it.
- (optional) Since the multilinear polynomial $q$ could be very large, we may want to reshape it as a fixed-height matrix (or even as several smaller matrices, each with fixed height $h$ and fixed width $w$). Then we can commit to it in a streaming way ($q = M_1 \| M_2 \| \cdots$, just like Brakedown/Ligero does).
- tower_prove: for each chip, run the tower prover.
- batch main sumcheck: prove the main sumchecks of all chips in a batch.
- jagged sumcheck: reduce $v = \sum_i \textrm{eq}(z_i,i)*v_i$ to the evaluations of $q$ and $f$ at point $z_b$. Note that to evaluate $f(z_b)$ the verifier needs $N+1$ evaluations of the form $u_i = g(z_r, z_b, t_{i-1}, t_i)$.
- Since $N$ is pretty large, we need to run another sumcheck (the so-called "jagged assist" sumcheck in the paper) to reduce $\vec{u}$ to a single evaluation $g(z)$.
- (optional) If we do the reshape process, then we need to reduce $q(z_b)$ to $M_j(z_2)$, where $z_b = (z_1, z_2)$.
- open: open $q(z_b)$ using Basefold/WHIR.
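The optional reshape reduction above can be checked on a toy instance (a sketch over rationals standing in for the field; sizes and points are hypothetical, and chunks are 0-indexed here): with $q = M_0 \| M_1$ of fixed height, $q(z_b) = \sum_j \textrm{eq}(z_1, j) \cdot M_j(z_2)$.

```python
from fractions import Fraction as F

def eq(z, x):
    """Multilinear eq(z, x): x an integer index with len(z) little-endian bits."""
    out = F(1)
    for k, zk in enumerate(z):
        xk = (x >> k) & 1
        out *= zk * xk + (1 - zk) * (1 - xk)
    return out

# Toy packed vector q of 8 evaluations, reshaped into two chunks of height 4;
# the chunk index j sits in the high bit of b, the row index r in the low bits.
q = [F(v) for v in [3, 1, 4, 1, 5, 9, 2, 6]]
M = [q[0:4], q[4:8]]

z2 = [F(1, 3), F(2, 5)]             # row point (low bits of z_b)
z1 = [F(3, 7)]                      # chunk point (high bit of z_b)
z_b = z2 + z1

lhs = sum(q[b] * eq(z_b, b) for b in range(8))          # q(z_b) as an MLE
rhs = sum(eq(z1, j) * sum(eq(z2, r) * M[j][r] for r in range(4))
          for j in range(2))                            # sum_j eq(z1,j) * M_j(z2)
assert lhs == rhs
```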
Suffix to prefix
The main idea is from @hero78119 .
Since the evaluations output by the batched main sumcheck prover are suffix-aligned, we need to make some changes to fit the jagged protocol:
- $p_i' = \textrm{bitrev}(p_i)$.
- $q' = p_1' \| p_2' \| \cdots$
- $v_i = p_i(z_r[s..]) = p_i'(z_r')$ with $z_r' = \textrm{reverse}(z_r)$.
- In the batch commit step, we commit to $q'$.
- In the jagged sumcheck step, we reduce $v$ to the validity of $q'(z_b)$ and $f(z_b)$.
- In the jagged assist sumcheck step, we reduce $u_i = g(z_r^{'}, z_b, t_{i-1}, t_i)$ to a single evaluation of $g$.
- In the open step, we open $q'(z_b)$.
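The bit-reversal identity in the third bullet can be sanity-checked on a toy table (a sketch over rationals standing in for the field; `mle` and `bitrev` are illustrative helpers, not Ceno's API): reversing the bits of every evaluation index and then evaluating at the reversed point gives back the original evaluation.

```python
from fractions import Fraction as F

def eq(z, x):
    """Multilinear eq(z, x): x an integer index with len(z) little-endian bits."""
    out = F(1)
    for k, zk in enumerate(z):
        xk = (x >> k) & 1
        out *= zk * xk + (1 - zk) * (1 - xk)
    return out

def mle(p, z):
    """Evaluate the multilinear extension of the evaluation table p at z."""
    return sum(p[b] * eq(z, b) for b in range(len(p)))

def bitrev(p):
    """Permute the evaluation table by reversing each index's bits."""
    s = len(p).bit_length() - 1
    return [p[int(format(b, f'0{s}b')[::-1], 2)] for b in range(len(p))]

p = [F(v) for v in [3, 1, 4, 1, 5, 9, 2, 6]]   # toy table, s = 3 variables
z = [F(1, 3), F(2, 5), F(3, 7)]

# bitrev(p) evaluated at the reversed point equals p at the original point.
assert mle(bitrev(p), list(reversed(z))) == mle(p, z)
```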
