Ceno + Jagged PCS
The goal is to break the linear relation between proof size and the number of chips.
The current workflow of the prover
- batch_commit: Commit to each chip's trace matrix in a batch.
- tower_prove: For each chip, run the tower prover to accumulate its memory read/write records (over the extension field $E$) and lookup records. Note that the read/write records are placed as leaves in a complete binary tree (if the total number of records is not a power of two, pad with E::ONE). Also note that the tower prover works in the GKR way, meaning it reduces the output layer to the input layer through a sequence of layer sumchecks.
- main sumcheck: For each chip, run the main sumcheck to prove that the tower's input layers are constructed from the trace polynomials with proper padding, and also to enforce the zerocheck constraints over the trace polynomials.
- batch open: At this phase, we need to open a sequence of multilinear polynomial evaluations per trace matrix $M_i$ produced by step 3:
$$\vec{v}: v_{i,j} = M_i[j](r_i)$$
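The leaf padding in tower_prove can be sketched as follows (a minimal illustration; `pad_to_pow2` is a hypothetical helper, not Ceno's actual API — padding with the multiplicative identity keeps the product tree's output unchanged):

```python
def pad_to_pow2(records, one=1):
    """Pad records to the next power-of-two length with the identity `one`,
    so the extra leaves do not affect the tower's product."""
    n = len(records)
    size = 1 if n == 0 else 1 << (n - 1).bit_length()
    return records + [one] * (size - n)

# 3 records -> padded to 4 leaves; the product over leaves is unchanged.
assert pad_to_pow2([2, 3, 5]) == [2, 3, 5, 1]
assert pad_to_pow2([2, 3, 5, 7]) == [2, 3, 5, 7]
```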

Optimization 1: batch multiple main sumchecks
In this version, we plan to prove the main sumcheck for multiple chips in a batch.
At the end, the batched main sumcheck prover will output a sequence of evaluations
$$\vec{v}: v_{i,j} = M_i[j](r_i)$$
with $r_i = r[(n-s)..n]$ for chip $i$'s trace matrix $M_i$ (assuming $r$ has $n$ variables and all polynomials in $M_i$ have $s$ variables).
Optimization 2: pack all polynomials
The main idea, borrowed from SP1's Jagged PCS paper, is to concatenate/pack all the trace polynomials together to form a new "giga" multilinear polynomial. That is, $q = p_0 \| p_1 \| \cdots \| p_N$.
The main lemmas of the jagged PCS paper are:
- Define the cumulative heights $\vec{t}$ by $t_{-1} = 0$ and $t_i = t_{i-1} + h_i$, where $h_i$ is the number of non-zero evaluations of $p_i(x)$ on the boolean hypercube.
- There is a bijective mapping between $(i,r)$ and $b$: $$b = t_{i-1} + r.$$
- It's important to see that $$q[b] = \sum_r \sum_{0 \leq i \leq N} p_i(r) * g(r,b,t_{i-1}, t_i)$$ where $g(r,b,c,d) = 1$ if and only if $r + c = b$ and $b < d$. Substituting $c = t_{i-1}, d = t_i$, this means $g(r,b,t_{i-1},t_i) = 1$ if and only if
- $b < t_i$ and $b \geq t_{i-1}$;
- $r = b - t_{i-1}$ is the exact index of $q[b]$ in $p_i(x)$.
Key insight: given $b$, we can locate the corresponding $(i,r)$ by summing over $g(r,b,t_{i-1},t_i)$.
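The lemmas above can be illustrated with a small sketch (toy heights; `locate` is a hypothetical helper): summing the indicator $g$ over all $(i,r)$ picks out the unique pair mapped to $b$.

```python
def g(r, b, c, d):
    """Indicator from the lemma: 1 iff r + c == b and b < d."""
    return 1 if (r + c == b and b < d) else 0

def locate(b, t):
    """Recover (i, r) from the flat index b via the cumulative heights t;
    exactly one term of the double sum over g is non-zero."""
    t = [0] + list(t)               # prepend t_{-1} = 0
    for i in range(len(t) - 1):
        for r in range(t[i + 1] - t[i]):
            if g(r, b, t[i], t[i + 1]):
                return (i, r)
    raise ValueError("b out of range")

heights = [3, 1, 4]                 # toy h_i per column
t = [3, 4, 8]                       # t_i = t_{i-1} + h_i
assert locate(0, t) == (0, 0)       # first row of p_0
assert locate(3, t) == (1, 0)       # first row of p_1
assert locate(5, t) == (2, 1)       # second row of p_2
```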
Verifying $N+1$ evaluations by a single evaluation
In order to reduce the evaluations $v_i = p_i(z_r)$ to a single evaluation of $q(\cdot)$, the reduction proceeds as follows:
- Sample a new random point $z_i$;
- Compute $v = \sum_i \textrm{eq}(z_i,i) * v_i$;
- It's easy to see $$v = \sum_{i,r} p_i(r) * \textrm{eq}(z_i, i) * \textrm{eq}(z_r, r).$$
- The above equation can be rewritten as $v = \sum_b q(b) * f(b)$ where $$f(b) = \textrm{eq}(z_i, \textrm{col}(b)) * \textrm{eq}(z_r, \textrm{row}(b)),$$ with $\textrm{col}(b) = i$ and $\textrm{row}(b) = r$ under the bijection above.
- By using lemma 3 above, we get $$f(b) = \sum_{i,r} \textrm{eq}(z_i,i)*\textrm{eq}(z_r,r)*g(r,b,t_{i-1},t_i).$$
Step 4 is the so-called "jagged" sumcheck, which reduces the validity of $\vec{v}$ to $q(z_b)$ and $f(z_b)$.
From step 5, we know that $$f(z_b) = \sum_i \textrm{eq}(z_i, i) * g(z_r, z_b, t_{i-1}, t_i).$$
The main technical result of the jagged paper is that $g(z_r,z_b, t_{i-1}, t_i)$ can be succinctly evaluated by the verifier itself.
Note: we could also use a generic sparse polynomial commitment like SPARK to evaluate $E_i(b) = \textrm{eq}(z_i, \textrm{col}(b))$ and $E_r(b) = \textrm{eq}(z_r, \textrm{row}(b))$, but that comes at a pretty large overhead (the Memory In The Head technique).
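The identity chain in steps 2–5 can be sanity-checked numerically on a toy instance (a sketch using rationals via `Fraction` as a stand-in for the field; the column heights, values, and points are all hypothetical):

```python
from fractions import Fraction as F

def eq(z, x, nbits):
    """Multilinear eq(z, x) for an integer index x with nbits little-endian bits."""
    out = F(1)
    for k in range(nbits):
        xk = (x >> k) & 1
        out *= z[k] * xk + (1 - z[k]) * (1 - xk)
    return out

# Toy columns: rows past h_i are zero, matching the jagged structure.
cols = [[F(1), F(2), F(5), F(0)],   # p_0 with h_0 = 3
        [F(7), F(0), F(0), F(0)],   # p_1 with h_1 = 1
        [F(2), F(4), F(6), F(8)]]   # p_2 with h_2 = 4
heights = [3, 1, 4]
t = [0, 3, 4, 8]                    # cumulative heights, starting at t_{-1} = 0
q = [x for i in range(3) for x in cols[i][:heights[i]]]  # packed non-zero rows

z_r = [F(1, 3), F(2, 5)]            # random row point (s = 2 variables)
z_i = [F(3, 7), F(1, 2)]            # random column point

# Step 2: v = sum_i eq(z_i, i) * v_i with v_i = p_i(z_r) evaluated as an MLE.
v = sum(eq(z_i, i, 2) * sum(eq(z_r, r, 2) * cols[i][r] for r in range(4))
        for i in range(3))

# Steps 3-5: walk q and accumulate q(b) * f(b) via the bijection b = t_{i-1} + r.
rhs = F(0)
for i in range(3):
    for r in range(heights[i]):
        b = t[i] + r
        rhs += q[b] * eq(z_i, i, 2) * eq(z_r, r, 2)

assert v == rhs                     # the two sides of the identity agree
```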
New workflow on GPU
- Copy the row-major trace matrices to the GPU and then transpose them to column-major (aka polynomials).
- batch commit: concat the above polynomials to get $q$, and then commit to it.
- (optional) Since the multilinear polynomial $q$ could be very large, we may want to reshape it as a fixed-height matrix (or even as several smaller matrices, each with fixed height $h$ and fixed width $w$). Then we can commit to it in a streaming way ($q = M_1 \| M_2 \| \cdots$, just like Brakedown/Ligero does).
- tower_prove: for each chip, run the tower prover.
- batch main sumcheck: prove the main sumchecks of all chips in a batch.
- jagged sumcheck: reduce $v = \sum_i \textrm{eq}(z_i,i)*v_i$ to the evaluations of $q$ and $f$ at point $z_b$. Note that to evaluate $f(z_b)$ the verifier needs $N+1$ evaluations of the form $u_i = g(z_r, z_b, t_{i-1}, t_i)$.
- Since $N$ is pretty large, we need to run another sumcheck (the so-called "jagged assist" sumcheck in the paper) to reduce $\vec{u}$ to a single evaluation $g(z)$.
- (optional) If we do the reshape process, then we need to reduce $q(z_b)$ to $M_j(z_2)$, where $z_b = (z_1, z_2)$.
- open: open $q(z_b)$ using Basefold/WHIR.
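The optional reshape reduction above can be checked on a toy instance (a sketch over rationals standing in for the field; sizes and points are hypothetical, and chunks are 0-indexed here): with $q = M_0 \| M_1$ of fixed height, $q(z_b) = \sum_j \textrm{eq}(z_1, j) \cdot M_j(z_2)$.

```python
from fractions import Fraction as F

def eq(z, x):
    """Multilinear eq(z, x): x an integer index with len(z) little-endian bits."""
    out = F(1)
    for k, zk in enumerate(z):
        xk = (x >> k) & 1
        out *= zk * xk + (1 - zk) * (1 - xk)
    return out

# Toy packed vector q of 8 evaluations, reshaped into two chunks of height 4;
# the chunk index j sits in the high bit of b, the row index r in the low bits.
q = [F(v) for v in [3, 1, 4, 1, 5, 9, 2, 6]]
M = [q[0:4], q[4:8]]

z2 = [F(1, 3), F(2, 5)]             # row point (low bits of z_b)
z1 = [F(3, 7)]                      # chunk point (high bit of z_b)
z_b = z2 + z1

lhs = sum(q[b] * eq(z_b, b) for b in range(8))          # q(z_b) as an MLE
rhs = sum(eq(z1, j) * sum(eq(z2, r) * M[j][r] for r in range(4))
          for j in range(2))                            # sum_j eq(z1,j) * M_j(z2)
assert lhs == rhs
```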
Suffix to prefix
The main idea is from @hero78119 .
Since the evaluations output by the batched main sumcheck prover are suffix-aligned, we need to make some changes to fit the jagged protocol:
- $p_i' = \textrm{bitrev}(p_i)$.
- $q' = p_1' \| p_2' \| \cdots$
- $v_i = p_i(z_r[s..]) = p_i'(z_r')$ with $z_r' = \textrm{reverse}(z_r)$.
- In the batch commit step, we commit to $q'$.
- In the jagged sumcheck step, we reduce $v$ to the validity of $q'(z_b)$ and $f(z_b)$.
- In the jagged assist sumcheck step, we reduce $u_i = g(z_r^{'}, z_b, t_{i-1}, t_i)$ to a single evaluation of $g$.
- In the open step, we open $q'(z_b)$.
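The bit-reversal identity in the third bullet can be sanity-checked on a toy table (a sketch over rationals standing in for the field; `mle` and `bitrev` are illustrative helpers, not Ceno's API): reversing the bits of every evaluation index and then evaluating at the reversed point gives back the original evaluation.

```python
from fractions import Fraction as F

def eq(z, x):
    """Multilinear eq(z, x): x an integer index with len(z) little-endian bits."""
    out = F(1)
    for k, zk in enumerate(z):
        xk = (x >> k) & 1
        out *= zk * xk + (1 - zk) * (1 - xk)
    return out

def mle(p, z):
    """Evaluate the multilinear extension of the evaluation table p at z."""
    return sum(p[b] * eq(z, b) for b in range(len(p)))

def bitrev(p):
    """Permute the evaluation table by reversing each index's bits."""
    s = len(p).bit_length() - 1
    return [p[int(format(b, f'0{s}b')[::-1], 2)] for b in range(len(p))]

p = [F(v) for v in [3, 1, 4, 1, 5, 9, 2, 6]]   # toy table, s = 3 variables
z = [F(1, 3), F(2, 5), F(3, 7)]

# bitrev(p) evaluated at the reversed point equals p at the original point.
assert mle(bitrev(p), list(reversed(z))) == mle(p, z)
```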
