Supporting High-Arity and High-Depth Gates with Constant Cost

This is a proposal to support gates of high multiplicative depth \mathcal{O}(\sqrt{n}) with additional-wire number of \mathbb{G}_1 elements increase in proof size over an equivalent Plonk circuit of size n. Proving cost is essentially the same or less and SRS is the same, lower bounded by the number of witness values due to permutation argument.

The idea is to take multi-polynomial commitments to the extreme, and adaptively choosing a gate depth based on necessity, so that a single row in the evaluation domain can accomodate multiple constraints.

Seemingly, one can also reduce the size of the evaluation domain, and hence reduce the addicity requirements on the underlying field.

The polynomial t(X) will have degree n, having in the worst case a summand consisting of the product of \sqrt{n} polynomials each of degree \sqrt{n}. The key summand in t(X) will be:

\sum_{i=0}^k q_i \sum_{j=0}^{2^{k-i}-1} \alpha^jG_{i, j}\left(\deg = 2^i, (q_{i,j,l})_{l=1}^{l_{j,i}}, (W_v)_{v=j\cdot2^i}^{(j+1)\cdot 2^i-1}\right)

where k = 1/2 \log_2 n, where \alpha is a random oracle value.

In practice, one can set G_{i, j} = G_i, and q_{i, j, l} = q_{i, l}.

The \sqrt{n} permutation summands (across \sqrt{n} polynomials) will also each have degree \sqrt{n}.

Similarly one will have \sqrt{n} witness polynomials/wires (Y_i)_{i=1}^{2^k} each of degree/length \sqrt{n}.

I suspect this has a relation to SPLONK, but it remains to be clarified. The relation seems to be in how wire values relate to one another, especially in terms of intermediate terms. A major saving seems to be to get rid of intermediate terms.

The cost is that the verifier must perform more exponentiations in the individual polynomials q. Hence, in practice, one would choose the maximum multiplicative depth to be constant. For instance, a sensible choice like 6 can create a gate that dynamically multiplexes across 2^5= 32 different execution paths in a single gate.

Although for a circuit of size 2^{20}, 2^{10} exponentiations is still relatively small.

Furthermore, the communication complexity/proof size will also increase, to the number of distinct wire values, which here can be up to \sqrt{n}. Again, this is relatively small, being up to 32-48KB for n=2^{20}.

One major application I see for such a circuit is to express complicated boolean logic, either in CNF or DNF, in a single gate, rather than have to define multiple intermediate variables. The nice thing is that the q values can be configured to give any boolean circuit with a given arity.

These boolean circuits can help express logic for complicated gates involving many branching conditions in a single gate (e.g. floating point), potentially saving a great number of constraints.

Another application is to express a product \prod_{i=1}^{2^k} a_i of distinct values. This can save up to 2^{k-1} intermediate values per such gate.

Overall, the choice whether to use such optimisations depends on how dominant such high multiplicative-depth gates are in a circuit. Any optimising compiler must make such choices guided by a cost model.

A potentially more elegant solution to achieving moderate multiplicative depth is just to use the existing z\omega evaluation point to combine two rows into one constraint, and permit a much larger arity. One is able to evaluate an up to 128-way multiplexing function.

If one intends not to increase the proof size and verifier cost in terms of splitting t(x), this comes at the cost of larger SRS and more prover exponentiations. So one gets tradeoffs between proof size and prover costs.

My favourite solution is to combine wires into evaluation points. This applies pleasing square-root arguments.

With 4 wire-commitment elements a, b, c, d and 4 evaluation points z, z\omega, z\omega^2, z\omega^3, just 2 more \mathbb{G}_1 than currently used for 4-wire PLONK, one can make use of 16 variables in one gate.

With just 2 more than that, one gets 25, 36, 49, 64 and so on, an n^2/2n scaling in gate arity. Note that increasing arity in this way does not increase prover exponentiations at all.

Taking into account t(X) size as well, one gets a factor \sqrt{n} scaling. For instance, against the base case of fast prover, for SRS of size 4x and prover exp. of 4a v.s. a, and the same cost of 4 \mathbb{G_1}, one can get a maximum gate degree of 16 v.s. 4. This is relatively little.

TODO: Take into account \mathbb{F} into proof size as well.

A first application to test the high-arity idea is the bilinear form over a field (\mathbb{F} or \mathbb{F}_2). To my knowledge, R1CS will be k times as inefficient for a k-dimensional bilinear form.
\sum_{i,j=1}^k a_{i,j}x_iy_j, where x, y are witness values.
R1CS: \sum_{i=1}^k x_i\left(\sum_{j=1}^ka_{i,j}y_j\right). k+1 constraints.

An application in my mind of high-arity, high-depth gates is to be used in any rollup or recursion type proof system. The inner proofs can have large proof sizes but smaller proving times, while the outer proof can be smaller.