This is a proposal to support gates of high multiplicative depth \mathcal{O}(\sqrt{n}) with additional-wire number of \mathbb{G}_1 elements increase in proof size over an equivalent Plonk circuit of size n. Proving cost is essentially the same or less and SRS is the same, lower bounded by the number of witness values due to permutation argument.
The idea is to take multi-polynomial commitments to the extreme, and adaptively choosing a gate depth based on necessity, so that a single row in the evaluation domain can accomodate multiple constraints.
Seemingly, one can also reduce the size of the evaluation domain, and hence reduce the addicity requirements on the underlying field.
The polynomial t(X) will have degree n, having in the worst case a summand consisting of the product of \sqrt{n} polynomials each of degree \sqrt{n}. The key summand in t(X) will be:
\sum_{i=0}^k q_i \sum_{j=0}^{2^{k-i}-1} \alpha^jG_{i, j}\left(\deg = 2^i, (q_{i,j,l})_{l=1}^{l_{j,i}}, (W_v)_{v=j\cdot2^i}^{(j+1)\cdot 2^i-1}\right)
where k = 1/2 \log_2 n, where \alpha is a random oracle value.
In practice, one can set G_{i, j} = G_i, and q_{i, j, l} = q_{i, l}.
The \sqrt{n} permutation summands (across \sqrt{n} polynomials) will also each have degree \sqrt{n}.
Similarly one will have \sqrt{n} witness polynomials/wires (Y_i)_{i=1}^{2^k} each of degree/length \sqrt{n}.
I suspect this has a relation to SPLONK, but it remains to be clarified. The relation seems to be in how wire values relate to one another, especially in terms of intermediate terms. A major saving seems to be to get rid of intermediate terms.
The cost is that the verifier must perform more exponentiations in the individual polynomials q. Hence, in practice, one would choose the maximum multiplicative depth to be constant. For instance, a sensible choice like 6 can create a gate that dynamically multiplexes across 2^5= 32 different execution paths in a single gate.
Although for a circuit of size 2^{20}, 2^{10} exponentiations is still relatively small.
Furthermore, the communication complexity/proof size will also increase, to the number of distinct wire values, which here can be up to \sqrt{n}. Again, this is relatively small, being up to 32-48KB for n=2^{20}.
One major application I see for such a circuit is to express complicated boolean logic, either in CNF or DNF, in a single gate, rather than have to define multiple intermediate variables. The nice thing is that the q values can be configured to give any boolean circuit with a given arity.
These boolean circuits can help express logic for complicated gates involving many branching conditions in a single gate (e.g. floating point), potentially saving a great number of constraints.
Another application is to express a product \prod_{i=1}^{2^k} a_i of distinct values. This can save up to 2^{k-1} intermediate values per such gate.
Overall, the choice whether to use such optimisations depends on how dominant such high multiplicative-depth gates are in a circuit. Any optimising compiler must make such choices guided by a cost model.