## The material here is outdated. An exclusive survey is now given in this book with Igor Sergeev.

### On XOR versus OR circuits

Let $A$ be a boolean $n\times n$ matrix. We are interested in the circuit complexity of computing the linear transformation $y=Ax$ using, respectively, only OR, only XOR, or only SUM operations. That is, we want to compute this transformation, respectively, over boolean semiring $(\{0,1\},\lor,\land)$, over the field $(\{0,1\},\oplus,\cdot)$, and over semiring $(N,+,\cdot)$ of natural numbers. We allow unbounded fanin gates. The size of a circuit is the number of wires. So, let $\OR(A)$, $\XOR(A)$ and $\SUM(A)$ denote, respectively the smallest number of wires in circuit with unbounded fanin OR, XOR and SUM gates computing $y=Ax$. If we speak only about circuits of depth $\leq d$, then the corresponding measures are $\OR_d(A)$, $\XOR_d(A)$ and $\SUM_d(A)$. Algebraically, these measures give the smallest weight $\sum_{i=1}^d|B_i|$ of the presentation of $A$ as a product $A=B_1\cdot B_2\cdots B_d$ of boolean matrices over the corresponding semirings, where $|B_i|$ is the number of $1$s in $B_i$. In particular, this observation implies the following simple (but handy) fact concerning depth-2 circuits.
Fact 1: If $A=P\cdot Q^T$ for $n\times r$ matrices $P$ and $Q$, then all $\SUM_2(A)$, $\OR_2(A)$ and $\XOR_2(A)$ are at most $2rn$.

SUM, OR and XOR circuits computing $y=Ax$ may be viewed as rectifier networks representing the matrix $A$. A rectifier network is directed acyclic graph with $n$ input nodes (fanin = $0$), and $n$ output nodes (fanout = $0$). Let $p(i,j)$ be the number of paths from input $i$ to output $j$. Then a network represents $A$ if $\mbox{in SUM circuit}\ \ A[i,j]=\begin{cases} 1 & \mbox{if p(i,j)=1,}\\ 0 & \mbox{if p(i,j)=0;} \end{cases}$ $\mbox{in OR circuit}\ \ A[i,j]=\begin{cases} 1 & \mbox{if p(i,j)>0,}\\ 0 & \mbox{if p(i,j)=0;} \end{cases}$ $\mbox{in XOR circuit}\ \ A[i,j]=\begin{cases} 1 & \mbox{if p(i,j) is odd,}\\ 0 & \mbox{if p(i,j) is even.} \end{cases}$

Note that SUM-circuits constitute the weakest of these models: in such a circuit, there cannot be more than one path from an input variable to an output gate. So, both $\OR(A)$ and $\XOR(A)$ are at most $\SUM(A)$. The most general circuit model is that of "general" circuits, where arbitrary(!) boolean functions are allowed to be used as gates. The corresponding measure $\GEN(A)$ is an obvious lower bound for all three measures above, in this case we want to compute $y=Ax$ over $GF(2)$ using arbitrary gates. Thus, we have the obvious inequalities $\GEN(A)\leq \OR(A), \XOR(A)\leq \SUM(A).$ for all matrices $A$. We are interested in maximal possible gaps between these measures on $n\times n$ matrices. For the maximum $\OR(n),\XOR(n)$ and $\SUM(n)$ over all such matrices, Lupanov's result (Thm. 13.18 in the book) implies that all they are $\Theta(n^2/\log n)$: $\SUM(n)\approx \OR(n) \approx \XOR(n)\approx \frac{n^2}{\log n}.$ Thus, there can be no nonconstant "worst-case" gaps between these three measures. [It remains still open whether $\GEN(n)=\Omega(n^2/\log n)$ also holds.] But if we measure the complexity of the same matrix $A$, then large gaps $\OR(A)/XOR(A)=\Omega(n/\log^2 n)$ and $\SUM(A)/\OR(A)=\Omega(\sqrt{n}/\log^2n)$ were shown in [1]. The goal of this comment is to show that the OR/XOR gap actually follows from some results in the book, and the proofs are elementary. Moreover, the constructed matrix $A$ has the property that $\XOR_d(A)=O(n)$ holds even in depth $d=3$.

Theorem 1: There exists an $n\times n$ matrix $A$ such that $\XOR_3(A)=O(n)$ but $\OR(A)=\Omega(n^2/\log^2 n)$.

Let us now give some details. Recall that a matrix $A$ is $k$-Ramsey if it contains no monochromatic $k\times k$ submatrices, and is $k$-free ($k\geq 2$) if it does not contain a $k\times k$ all-$1$ submatrix. Hence, $A$ is $k$-Ramsey iff both the matrix $A$ and its complement$\overline{A}$ are $k$-free. Let $|A|$ denote the number of $1$-entries in $A$. The results from the book we use are the following; for convenience of readers who do not have the book, we recall their proofs in the appendix bellow.

(i) Lupanov (Lemma 1.2 in the book)
For every $n\times m$ matrix $A$, and every positive integer $k$, $\SUM_2(A)\leq nm/k+\min\{n,m\}2^k$. By taking $k=\log n - \log\log n$, we obtain $\SUM_2(A)\leq 2nm/\log n$ . In particular, $\SUM_2(A)=O(n)$ if $m=O(\log n)$.
(ii) Nechiporuk-Wegener-Mehlhorn-Pippenger (Theorem 13.21 in the book)
If $A$ is $k$-free then $\OR(A)\geq |A|/(k-1)^2$ and $\OR_2(A)\geq |A|/(k-1)$.
(iii) Pudlák-Rödl-Jukna (Theorem 11.18 in the book)
There exists an $n\times r$ matrix $B$ such that $r=2\log n$, and the matrix $A=B\cdot B^T$ (over $GF(2)$) is $r$-Ramsey.
The matrix $A$ whose existence is shown in (iii) is just a submatrix of an $n^2\times n^2$ Sylvester matrix $H=H_{n^2}$ with $n=2^r$. Recall that rows and columns of $H$ correspond to the vectors in $GF(2)^{2r}$, and the value $H[u,v]$ is the scalar product of vectors $u$ and $v$ over $GF(2)$: $H_2=\begin{bmatrix} 0 & 0 \\ 0 & 1 \\ \end{bmatrix},\qquad H_4=\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ \end{bmatrix},\qquad H_{2n}=\begin{bmatrix} H_{n} & H_{n} \\ H_{n} & \overline{H}_{n} \\ \end{bmatrix}$

Proof of Theorem 1:
Take the matrix $A=B\cdot B^T$ guaranteed by (iii). Then (i) implies that $\XOR_4(A)\leq \XOR_2(B)+\XOR_2(B^T)=O(nr/\log n)=O(n)$. To obtain the lower bound, we may assume that $A$ has $|A|\geq n^2/2$ ones; if not, then take the complement matrix $\overline{A}$. Since $A$ is $r$-free, (ii) implies that $\OR(A)\geq |A|/r^2=\Omega(n^2/\log^2n)$. $\Box$

Sergeev has improved $\XOR_4(A)=O(n)$ to $\XOR_3(A)=O(n)$.

Lemma 1 (Sergeev [3]) Let $m=a\log_2 n$. Then for every $n\times m$ matrix $A$ and for every $m\times n$ matrix $B$, we have $\XOR_3(A\cdot B)\leq\max\{3,6a\}\cdot n$.
The same upper bound holds also for SUM- and OR-circuits, if the product $A\cdot B$ is taken over the corresponding semiring.

This, in particular, implies the following interesting property of "narrow" depth-2 circuits: If an $n\times n$ matrix can be computed by a depth-2 circuit with $O(\log n)$ nodes on the middle level (and an arbitrary size!), then it has a depth-3 circuit of linear size.

Proof: See his self-explaining sketch (in it, the roles of $n$ and $m$ are interchanged). $\Box$

### Notes:

Below we give an overview of other known results in that direction -- establishing complexity gaps for linear transformations over different semirings.
OR versus XOR
1. A historical note. Fact (ii) was first proved by Nechiporuk in 1969. The proof of the fact (ii) in [13] is short (just 2 pages); this is Thm. 1.3 in [13] (Thm. 1.2 gives the stronger lower bound $|A|/k$ in the case of depth-2 circuits). The entire paper is long, and this important short piece was probably overseen by subsequent authors. So, as it often happens with important results, Thm. 1.3 from [13] was later re-proved in about 1980 (almost simultaneously) by several authors: Mehlhorn, Pippenger and Wegener. Interestingly, the same volume [13] also contains Nechiporuk's short note "On a Boolean matrix" with an $OR(A)=\Omega(n^{3/2})$ lower bound for an explicit 2-free matrix $A$. Unlike the long paper [13], this note became well-known, and was later cited by many authors.

2. XOR is powerful even in depth-$2$. Since the matrix $A$ from (iii) has the form $A=B\cdot B^T$ for an $n\times r$ matrix $B$ with $r=2\log n$, we have by Fact 1 that $\XOR_2(A)\leq 2rn =O(n\log n)$ even for depth-2 circuits. Thus, the gap $\OR(A)/\XOR_2(A)=\Omega(n/\log^3n)$ is still large, even if XOR circuits are required to have depth $2$, and the depth of OR-circuits is not restricted. If both OR- and XOR-circuits are required to have depth $2$, then the gap $\OR_2(A)/\XOR_2(A)=\Omega(n/\log^2n)$ is even larger.

3. Explicit gaps The gaps established in [1] as well as in Theorem 1 above are not quite satisfying: they are for random (albeit constructed in a special way) matrices, not for explicit ones. Strongest known explicit gaps were established by Gashkov and Sergeev in [10]. They showed that $\XOR(A)=O(n\ln n\ln\ln n)$ holds for known explicit $2$-free matrices with the largest possible number $|A|=\Omega(n^{3/2})$ of ones. Actually, for these matrices, $\XOR_{2t-1}(A)=O(n^{1+1/t})$ holds for every integer $t\geq 1$. On the other hand, by (ii), we have $\OR(A)=\Omega(n^{3/2})$ for these matrices. Thus, the gap $\OR(A)/\XOR_5(A)=\Omega(n^{1/6})$ shows it already at depth $5$ (for $t=3$). For so-called "norm matrices" constructed by Kollar, Ronyai and Szabo, they showed the gap $\OR(A)/\XOR(A)=\Omega(n/\exp(\sqrt{\ln n\ln\ln n}))$, even if AND-gates are allowed in OR-circuits.

Actually, Sergeev has shown the following upper XOR-bounds for arbitrary circulant matrices (many of known explicit $k$-free matrices are such). Recall that a matrix $A$ is circulant each its row is a cyclic shift of the previous one. That is, each circular matrix is defined by its first row.

Theorem (Sergeev [3]): For every integer constant $t\geq 1$ there is a constant $c=c(t)$ such that, for every circulant $n\times n$ matrix $A$:
• $\XOR_{2t-1}(A)\leq c n^{1+1/t}$
• $\XOR_{2t}(A)\leq c n^{1+1/t}/\log^{1/t}n$.
Improving an earlier result of Grinchuk [16], Grinchuk and Sergeev [17] have proved that, for every $k\geq 2$, $k$-free circulant matrices $A$ with $|A|=\Omega(k^{-4}n^{2-\alpha})$ ones exist, where $\alpha=2/k-2/k^2$; this is almost optimal, because $|A|\leq n^{2-1/k}$ holds for any $k$-free matrix, not only for circulant. Thus, taking $k\to\infty$ slowly growing, we see that circulant matrices achieving the gap $\OR(A)/\XOR_3(A)=\Omega(n^{1/2-o(1)})$ exist. Recall that we already have shown the much larger gap $\OR(A)/\XOR_2(A)=\Omega(n/\log^3n)$ even for depth-2 XOR-circuits. But this was achieved for matrices that are not circular. The class of circular matrices is much smaller: we have only $2^n$ out of $2^{n^2}$ such matrices.

For circular 2-free matrices $A$ (including explicit ones), the theorem gives a non-constant gap $\OR(A)/\XOR_4(A)=\Omega(\sqrt{\log n})$ also in depth $4$ (for $t=2$). It remains open whether there is a growing gap for 2-free matrices already in depth $3$. Concerning depth $2$, it is conjectured (already by Valiant 1977) that $\XOR_2(A)\geq n^{1+\epsilon}$ must hold for dense 2-free matrices, that is, that dense 2-free matrices are hard for depth-$2$ XOR-circuits.

4. XOR/OR gap remains unclear. It remains not clear whether, for some matrices, OR-circuits may be more powerful than XOR-circuits. So, a problem: prove or disprove that matrices $A$ with $\XOR(A)/\OR(A) \to\infty$, or at least, $\XOR_2(A)/\OR_2(A) \to\infty$ exist. Even existence of matrices $A$ with $\XOR(A)/\OR(A) > 3+c$ for a constant $c>0$ is known.

5. Intersection matrix or Kneser matrix $K=K_n$. This matrix could be a possible candidate for a non-constant XOR/OR. This is a boolean $n\times n$ matrix with $n=2^r$. Rows and columns of $K$ correspond to subsets $u\subseteq [r]=\{1,\ldots,r\}$, and $K[u,v]=1$ iff $u\cap v\neq \emptyset$. Kneser matrices can be defined inductively as follows: $K_2=\begin{bmatrix} 0 & 0 \\ 0 & 1 \\ \end{bmatrix},\qquad K_4=\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ \end{bmatrix},\qquad K_{2n}=\begin{bmatrix} K_{n} & K_{n} \\ K_{n} & 1 \\ \end{bmatrix}$ By identifying subsets $u$ with their characteristic vectors, we see that $K$ is just a "boolean version" of the Sylvester matrix: $K[u,v]=\bigvee_{i=1}^r u_i\land v_i$. Thus, over the boolean semiring, we have that $K=B\cdot B^T$ for an $n\times \log n$ matrix $B$. Hence, $\OR_2(K)=O(n\log n)$ by Fact 1, and $\OR_3(K)=O(n)$ by Lemma 1. In fact, Sergeev [3] observed that the well-known results of Hansel and Krichevski imply that the trivial OR_2-circuit for $K$ is also optimal: $\OR_2(K)=\Theta(n\log n).$ Proof: See here. $\Box$.

Does $\XOR_3(K)=\omega(n)$ or at least $\XOR_2(K)=\omega(n)$?

The power of XOR
1. Parity-check matrices. An upper bound on the XOR-complexity of parity-check matrices of good codes was proved by Chashkin.
Theorem (Chashkin [12]): For any constant $c > 0$ and any subset $S\subseteq \{0,1\}^n$ of size $|S|\le 2^{n/2}$, there exists a boolean $m\times n$ matrix $A$ such that:
• $Ax\neq Ay$ for all $x\neq y\in S$ (the mapping $y=Ax$ is injective on $S$)
• $m \le \lfloor 2\log_2 |S|\rfloor -1$
• $\XOR(A)=O(n)$.
I found this result quite surprising. If we consider arbitrary operators $f:S\to\{0,1\}^m$, then none of them can be injective, unless $m\geq \log_2|S|$. But by Chashkin's result, if we allow the dimension of the range be just twice larger (than this "counting border"), then an even linear operator of linear XOR-complexity can do the job!

In particular, by taking $S$ to be the set of all vectors with Hamming weight $\le k$, we obtain the following interesting consequence.

Corollary: For every $1\leq k\leq n/\log_2 n$, there is an $m\times n$ matrix $A$ such that:
• $m =O(k\ln n)$
• any $k$ columns of $A$ are linear independent
• $\XOR(A)=O(n)$
Note that the matrix in this corollary is a parity-check matrix of a linear self-correcting code with very good parameters.

2. Generator matrices. For generator matrices of good self-correcting codes (i.e. a linear code which has constant rate and can correct a constant fraction of errors), the following (also surprising) upper bounds were proved in [8].
Theorem ([8]): There are boolean $n\times m$ matrices $A$ such that:
• $|Ax\oplus Ay|\geq n/8$ for all $x\neq y\in \{0,1\}^m$ (Hamming distance is at least $n/8$)
• $m\geq n/32$ (rate $m/n$ is constant)
• $\XOR_2(A)=O(n(\tfrac{\ln n}{\ln\ln n})^2)$
• $\XOR_3(A)=O(n\ln\ln n)$.
• $\XOR(A)=O(n)$.
Moreover, the upper bounds on $\XOR_2(A)$ and $\XOR_3(A)$ cannot be improved, even if arbitrary(!) boolean functions are allowed as gates; this (lower bounds for general circuits) is their main result. Today, these are the strongest explicit lower bounds for general circuits.

The above results give yet another explanation for why lower bounds for XOR-circuits are so difficult to prove: these circuits may be unexpectedly powerful!

SUM versus OR
1. Disjointness matrix or the Sierpinski gasket matrix $D_n$ is the complement of the Kneser (or intersection) matrix $K_n$. That is, $D=D_n$ is a boolean $n\times n$ matrix with $n=2^r$. Rows and columns correspond to subsets $u\subseteq [r]=\{1,\ldots,r\}$, and $D_n[u,v]=1$ iff $u\cap v= \emptyset$. The matrix $D$ can contain a $t\times t$ all-$1$ submatrix only if $t\leq \max_i \min\{2^i ,2^{r-i}\}=2^{r/2}=\sqrt{n}$. Thus, the matrix is $k$-free for $k=\sqrt{n}$. Since the number of 1-entries in $D$ is $|D|=\sum_{u\subseteq [r]} 2^{r-|u|}= \sum_{i=0}^l\binom{r}{i}2^{r-i}=3^{r} \geq n^{1.58}$ fact (ii) above gives a lower bound $\OR_2(D)\geq n^{1.58}/\sqrt{n}= n^{1.08}$, which is much larger than the upper bound $\OR_2(\overline{D})=O(n\log n)$ for the complement of $D$. We thus have a gap $\OR_2(D)/\OR_2(\overline{D})=\Omega(n^{c}/\log n)$ for $c=0.08$. A trivial upper bound on $\OR_2(D)$ is $|D|=n^{\log_23}\approx n^{1.58}$, achieved in depth-1. In depth 2 we have a much better upper bound.
Lemma (Sergeev [3]): There is a constant $c$ such that $\SUM_2(D_n)\leq c n^{\log_2(1+\sqrt{2})} \approx n^{1.28}.$
Proof: See here. $\Box$

2. If we allow logarithmic depth, then the recursive construction of the disjointness matrix $D_2=\begin{bmatrix} 1 & 1 \\ 1 & 0 \\ \end{bmatrix},\qquad D_4=\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ \end{bmatrix},\qquad D_{2n}=\begin{bmatrix} D_{n} & D_{n} \\ D_{n} & 0 \\ \end{bmatrix}$ gives $\SUM(D_n)\leq \tfrac{1}{2}n\log n$. Indeed, to compute $\vec{y}=D_{2n}\vec{x}$, split the vectors $\vec{y}=(\vec{y}_1,\vec{y}_2)$, and $\vec{x}=(\vec{x}_1,\vec{x}_2)$, and compute $\vec{y}_2=D_{n}\vec{x_1}$ and $\vec{a}=D_{n}\vec{x_2}$. Then just use $n$ additional wires to compute $\vec{y}_1= \vec{y}_2+\vec{a}$. This gives $\SUM(D_{2n})\leq 2\cdot \SUM(D_n)+n =\tfrac{1}{2}(2n)\log (2n)$, as desired.

In [6], it is show that this bound is tight: $\SUM(D)\geq \tfrac{1}{2}n\log n$. Note that (ii) cannot yield any nontrivial lower bounds, because $D$ is not $k$-free for $k \ll \sqrt{n}$. In fact, the argument of [6] also yields $\OR(D)\geq \tfrac{1}{2}n\log n$. Recall that $\OR(\overline{D})=O(n)$, already in depth 3.

It is easy to see that the upper bound $\SUM(M_n)=O(n\log n)$ holds for any matrix which can be constructed by some recursion $M_{2n}=\begin{bmatrix} f_1(M_{n}) & f_2(M_{n}) \\ f_3(M_{n}) & f_4(M_n) \\ \end{bmatrix} \qquad \mbox{where f_i(M) is either 0,1, M or \overline{M}.}$

3. Sylvester matrix. Since the Sylvester $n\times n$ matrix $H=H_r$ with $n=2^{2r}$ is $2^{r}+1$-free and has $|H|=\Omega(n^2)$ ones, (ii) yields $\SUM_2(H)\geq \OR_2(H)=\Omega(n^{1.5})$. Sergeev has a recursive construction showing that this is tight.
Lemma (Sergeev [3]): $\SUM_2(H)\leq 2^{3r+1}=2n^{1.5}.$
A lower bound of Morgenstern [15] yields $\SUM(H)=\Theta(n\log n)$.

Since $H=P\cdot P^T$ for an $n\times r$ matrix $P$ with $r=\log n$, Fact 1 and Lemma 1 imply that $\XOR_2(H)=O(n\log n)$ and $\XOR_3(H)=O(n)$. A lower bound $\XOR_2(H)=\Omega(n\log n)$ for every $n\times n$ Hadamard matrix $H$ (including Sylvester matrices) was proved in [9].

A lower bound $\OR(H)=\Omega(n\log n)$ was proved by Grigor'ev [19]. Sergeev [3] has found a refinement of (ii) which yields an even more general lower bound. Define the area of an $a\times b$ rectangle to be $ab$, and let $r(A)$ be the maximal area of a rectangle in $A$.

Theorem (Sergeev [3]): For every $n\times n$ boolean matrix $A$, $\OR(A)\geq \frac{|A|}{r(A)}\log\frac{|A|}{n}.$
Proof: See here. $\Box$

Every $n\times n$ Hadamard matrix (including the Sylvester matrix) $H$ has $|H|=\Omega(n^2)$ and $r(H)\leq n$. Hence, $\OR(H)=\Theta(n\log n)$.

4. A SUM/OR gap. The existence of matrices $A$ achieving the SUM/OR gap $\SUM(A)/\OR(A)=\Omega(\sqrt{n}/\log^2n)$ was proved in [1]. This is done by first proving the following interesting version of (ii) for SUM-circuits. Define the integer rank $\pr(B)$ as the minimum $r$ such that matrix $B$ can be written as $B = PQ^T$ over the semiring of non-negative integers, where $P$ and $Q$ are $n\times r$ matrices. If we work over the boolean semiring, then the boolean rank $\br(B)$ is defined similarly. It is easy to see that $\br(B)$ is the smallest number of rectangles (all-1 submatrices of $B$) covering all ones of $B$, and $\pr(B)$ is the smallest number of pairwise disjoint rectangles doing the same job. In communication complexity, $\log \br(B)$ is exactly the nondeterministic communication complexity of $B$, whereas logarithm of $\pr(B)+\pr(\overline{B})$ is the deterministic communication complexity of $B$.

Let $B\otimes A$ denote the Kronecker product, that is a block-matrix obtained by replacing 1-entries of $B$ by copies of $A$.

Theorem (Find et al. [1]): If $A$ is $k$-free, then $\SUM(B\otimes A)\geq \pr(B)\cdot\frac{|A|}{(k-1)^2}.$
On the other hand, using (i) and the fact that the transformation $x\mapsto (B\otimes A)x$ can be written as a result $(A(XQ))P^T$ of three matrix products, where $B=PQ^T$ and $P,Q$ are $m\times \br(B)$ matrices, and $X$ is the $m\times m$ matrix of variables, it is shown in [1] that $\OR(B\otimes A)\lesssim \br(B)\cdot \OR_2(m)\approx \br(B)\cdot m^2/\log m$ holds for any $m\times m$ matrix $A$. If we take $B=\overline{I_m}$ to be the complement of the $m\times m$ identity matrix $I_m$, where $m=\sqrt{n}$, then $\pr(B)=\Omega(m)$, while $\br(B)=O(\log m)$, and the gap $\pr(B)/\br(B) \gtrsim m/\log m$ is already large. Thus, if we let $B=\overline{I_m}$ and $A$ be a random $m\times m$ matrix, then $\OR(B\otimes A)\lesssim m^2 = n$, whereas $\SUM(B\otimes A) \gtrsim m^3/\log^2 m\approx n^{3/2}/\log^2 n$, and the desired SUM/OR gap follows.

5. The OR-complexity of Kronecker products was earlier considered by Gál [18]. The term rank $\tr(B)$ of a boolean matrix $B$ is the largest number of its ones, no two of which lie in the same row or column. By König-Egerváry theorem, this is exactly the smallest number of rows and columns covering all ones of $B$. It is easy to see that $\tr(B)\geq \pr(B)\geq \br(B)$. Indeed, $\tr(B)$ is the smallest number $a+b$ such that, after some permutation of rows and columns, the matrix $B$ can be written in the form $B=\begin{bmatrix} C & D \\ F & 0 \end{bmatrix}$ where $0$ is the $(n-a)\times (n-b)$ all-0 matrix. We can therefore write $B$ as a sum of $a+b$ pairwise disjoint rectangles, each corresponding to one row or column of $B$.
Theorem (Gál [18]): For every boolean $m\times m$ matrix $A$ and $k\times l$ matrix $B$, $\tr(B)\cdot \OR_2(A)\leq \OR_2(B\otimes A) \lesssim \tr(B)\max\{k,l\}\cdot \OR_2(m).$
The proof of the lower bound is based on a simple fact that, in depth 2, we have that $\OR_2\begin{bmatrix} X & C \\ D & Y \\ \end{bmatrix} \geq \OR_2\begin{bmatrix} X & 0 \\ 0 & Y \\ \end{bmatrix}$ Using this, the lower bound can be easily shown by induction on $\tr(B)$. In larger depth, the fact does not hold anymore. Here we only have that $\OR\begin{bmatrix} X & C \\ D & Y \\ \end{bmatrix} \geq \max\{ \OR(X), \OR(Y)\}\geq \frac{1}{2}\cdot \OR\begin{bmatrix} X & 0 \\ 0 & Y \\ \end{bmatrix}$

In particular, if $B=\overline{I_m}$ and $A$ is a random $m\times m$ matrix with $m=\sqrt{n}$, then $\OR_2(\overline{I_m}\otimes A) =\Omega(n^{3/2}/\log n)$. This does not contradict the upper bound $\OR(\overline{I_m}\otimes A) \lesssim m^2=n$ given above, because here circuits of depth larger than 2 are used (depth 6 is enough).

Sergeev [3] improved the upper bound as follows. For a covering $\sigma$ of $B$ by rectangles (all-1 submatrices), let $s(\sigma)$ denote the sum of the lengths of shorter sides, and $S(\sigma)$ the sum of the lengths of longer sides of rectangles in $\sigma$. Hence, $OR_2(B)=\min \{s(\sigma)+S(\sigma)\}$ where the minimum is over all coverings $\sigma$ of $B$.

Theorem (Sergeev [3]): For every boolean $m\times m$ matrix $A$, every covering $\sigma$ of a matrix $B$, and every positive integer $t$, $\OR_2(B\otimes A) \leq s(\sigma)\frac{m^2}{t} + S(\sigma)m2^t.$
Proof: By (i), both $A$ and $A^T$ can be realized by a depth-2 OR-circuit of size $m^2/t+m2^t$ with $m2^t$ wires on the first level, and $m^2/t$ wires on the second level. If $R$ is a $k\times l$ rectangle, then $R\otimes A$ can be realized by a depth-2 circuit of size $\min\{k,l\}m^2/t + \max\{k,l\}m2^t.$ Taking the union of these circuits over all rectangles $R\in \sigma$, the desired upper bound follows. $\Box$

He then showed that $B=\overline{I_m}$ has a covering $\sigma$ with $s(\sigma)=O(m)$ and $S(\sigma)=O(m^{3/2})$; see here for a proof. By taking $t=(1/2)\log m - \log\log m$, this yields the tight upper bound $\OR_2(\overline{I_m}\otimes A) =O(m^{3}/\log m)=O(n^{3/2}/\log n)$.

6. Let us mention that OR/XOR gaps may be much larger than SUM/OR gaps. Fact (ii) implies that $\SUM(A)/\OR(A)\leq k^2$ holds for every matrix, where $k=k(A)$ is the largest number such that $A$ contains a $k\times k$ all-1 submatrix. On the other hand, as we mentioned above, there are explicit $k$-free matrices $A$ such that the gap $\OR(A)/\XOR(A)$ is at least $n^{1/2-\epsilon}$, for constant $k$, and even $n^{1-o(1)}$ for $k=O(\log n)$.
Circuit with arbitrary gates
1. Representing matrices. Instead of computing the transformation $y=Ax$ on all input vectors $x$, we can relax the problem and only require that the circuit computes it correctly on $n$ unit vectors $e_1,\ldots,e_n$ with exactly one $1$; if one allows the "pathological" situation that some output gates may be redundant (are constant gates), then one should also require that the circuit outputs $0$ on input $x=0$. In this case, one says that the circuit "represents" the matrix. Note that in each of the three models -- OR, XOR and SUM circuits -- this is no relaxation at all: every circuit representing $A$ must correctly compute the whole transformation $y=Ax$(!) And, by easy counting, we know that in all these three models, $n\times n$ matrices requiring $\Omega(n^2/\log n)$ wires to represent them exist (see, e.g. Theorem 13.18), without any restrictions on the circuit-depth. On the other hand, in Theorem 13.28 we have shown that the situation changes drastically, if we allow non-linear gates: every matrix can be represented by a depth-2 XOR-circuit using only $O(n\log n)$ wires, if we allow polynomials of larger degree (up to $\log n$) be used as gates on the last (output) level. Thus, if $\GEN[A]$ denotes the smallest number of wires in a general circuit representing $A$, then $\GEN_2[A]=O(n\log n)$ holds for every $n\times n$ matrix $A$. Drucker [7] (among other results) showed that this upper bound is optimal: matrices $A$ with $\GEN_2[A]=\Omega(n\log n)$ exist.

2. Do non-linear gates can help? For the Sylvester $n\times n$ matrix $H$ with $n=2^r$, Theorem 13.29 in the book implies that $\GEN_2[H]=\Omega(n\log n/\log\log n)$. Since $\XOR_2(H)=O(n\log n)$, this means that, at least in depth $2$, the use of non-linear gates does not help much even to represent $H$. In contrast, the use of more general gates helps much for, say, the disjointness matrix $D$, because $\OR_2[D]\geq n^{1.08}$ wires are necessary to represent $D$ in depth-2, if only OR gates are allowed.

3. Representation vs. Computation.We mentioned above that $\SUM[A]=\SUM(A)$, $\OR[A]=\OR(A)$, and $\XOR[A]=\XOR(A)$ hold for any matrix $A$. This is, however, no more the case if we allow general gates. Namely, in [8] explicit matrices $A$ such that $\GEN_2(A)=\Omega(n(\ln n/\ln\ln n)^2)$ are given. But we know that $\GEN_2[A]=O(n\ln n)$ holds for any matrix. Thus, unlike for XOR-circuits, in the case of general circuits it may be easier to compute $y=Ax$ on only $n$ basis vectors $e_1,\ldots,e_n$ than to do this for all inputs $x$.
In the next table we summarize our knowledge.
Matrix $\SUM$ $\OR$ $\OR_2$ $\XOR_2$ $\OR_3$ $\XOR_3$ $\GEN_2$
Submatrix $A$ of Sylvester $H_{n^2}$ $\Omega(\frac{n^2}{\ln^2 n})$ $\Omega(\frac{n^2}{\ln^2 n})$ $\Theta(\frac{n^2}{\ln n})$ $O(n\ln n)$ $\Omega(\frac{n^2}{\ln^2 n})$ $O(n)$ $O(n\ln n)$
$\bar{I}\otimes A$, $A$ random $\sqrt{n}\times \sqrt{n}$ $\Omega(\frac{n^{1.5}}{\ln^2 n})$ $O(n)$ $\Theta(\frac{n^{1.5}}{\ln n})$ $O(n\ln n)$ $O(n\ln n)$ $O(n)$ $O(n\ln n)$
Sierpinski matrix $D$ $\Theta(n\ln n)$ $\Theta(n\ln n)$ $\Omega(n^{1.08})$ $O(n^{1.28})$ $\omega(n)$? ? ? $O(n\ln n)$
Kneser matrix $K$ $O(n\ln n)$ $O(n)$ $\Theta(n\ln n)$ $\omega(n)$? $O(n)$ ? $O(n\ln n)$
Sylvester matrix $H$ $\Theta(n\ln n)$ $\Theta(n\ln n)$ $\Theta(n^{1.5})$ $\Theta(n\ln n)$ [9] ? $O(n)$ $\Omega(n\frac{\ln n}{\ln\ln n})$
Code matrices from [8] ? ? ? $\Theta(n(\frac{\ln n}{\ln\ln n})^2)$ ? $\Theta(n\ln\ln n)$ $\Theta(n(\frac{\ln n}{\ln\ln n})^2)$
Random $\mathbf{A}$ $\Theta(\frac{n^2}{\ln n})$ $\Theta(\frac{n^2}{\ln n})$ $\Theta(\frac{n^2}{\ln n})$ $\Theta(\frac{n^2}{\ln n})$ $\Theta(\frac{n^2}{\ln n})$ $\Theta(\frac{n^2}{\ln n})$ $\Omega(\frac{n^2}{\ln n})$ ?
A positive answer to the last question (in last lower right cell) is only known when either all gates on the middle layer or all gates on the last (output) layer must be XORs (see [11]).

Acknowledgment:
My thanks to Igor Sergeev for very interesting discussions and for telling his upper bounds mentioned above.

### References:1

1. M. Find, M. Göös, M. Järvisalo, P. Kaski, M. Koivisto, J.H. Korhonen (2013): Separating OR, SUM, and XOR Circuits, arXiv:1304.0513v1
2. P.Pudlák and V. Rödl (2004): Pseudorandom sets and explicit constructions of Ramsey graphs, in: Quaderni di Matematica, Vol. 13, Dipartimanto di Matematica, Seconda Universita di Napoli, Caserta - 2004, editor J. Krajicek, pp.327-346. Manuscript
3. I. Sergeev, personal communication.
4. S. Jukna (2006): Disproving the single level conjecture, SIAM J. Comput. 36(1), 83-98. Manuscript
5. S. Jukna (2006): On graph complexity, Combinatorics, Probability & Computing 15, 855-876. Manuscript
6. J. Boyar and M. Find (2012): Cancellation-free circuits: An approach for proving superlinear lower bounds for linear Boolean operators, arXiv:1207.5321.
7. A. Drucker (2012): Limitations of lower-bound methods for the wire complexity of boolean operators, in: Proc. of 27th IEEE Conf. on Comput. Complexity (CCC 2012). Recipient of the Ronald V. Book Prize for Best Student Paper. Conference version: pdf     Full version on ECCC: link     Slides: pdf
8. A. Gál, K.A. Hansen, M. Koucky, P. Pudlák, and E. Viola (2012): Tight bounds on computing error-correcting codes by bounded-depth circuits with arbitrary gates, in: Prof of 44th Ann. ACM Symp. on Theory of Computing, STOC'12, pp. 479-494. Manuscript
9. N. Alon, M. Karchmer, and A. Wigderson (1990) Linear circuits over GF(2). SIAM J. Comput. 19(6), 1064-1067.
10. S. B. Gashkov, I. S. Sergeev (2010): On the complexity of linear Boolean operators with thin matrixes, Diskretn. Anal. Issled. Oper., 17:3 (2010), 3-18 Paper (in Russian).
11. S. Jukna and G. Schnitger (2010): Circuits with arbitrary gates for random operators, arXiv:1004.5236
12. A.V. Chashkin (2009): Perfect linear hashing in Boolean cube, in: Transactions "Discrete mathematics and its applications", issue V, p. 56-67. (publisher: Keldysh Institute of Applied Mathematics). Manuscript (preliminary version) (in Russian)
13. E.I. Nechiporuk (1969): On topological principles of self-correction, Problemy Kibernetiki 21, 5-102. Paper (in Russian). English translation in: Systems Theory Res. 21 (1971).
14. E.I. Nechiporuk (1964): Self-correcting diode networks, Soviet Physics Doklady, Vol. 9, no. 6, 422-425.
15. J. Morgenstern (1973): Note on a lower bound on the linear complexity of the fast Fourier transform, J. ACM 20(2), 305-306.
16. M.I. Grinchuk (1988): Complexity of the realization of cyclic Boolean matrices by gate circuits, Izv. Vyssh. Uchebn. Zaved. Mat., no. 7, 39-44 (in Russian).
17. M.I. Grinchuk, I. S. Sergeev (2011): Thin circulant matrixes and lower bounds on complexity of some Boolean operators, Diskretn. Anal. Issled. Oper., 18:5, 38–53 (in Russian)
18. A. Gál (1988): On the complexity of realization of some classes of matrices by rectifier networks, Matematicheskije Voprosy Kibernetiki, vol. 1, 234-235. Paper (in Russian).
19. D. Yu. Grigor'ev (1981): A lower bound for the computational complexity of a set of disjunctives in a monotone basis, Journal of Soviet Mathematics, 15:1, 11-14. Paper.

### Appendix

Here we recall the proof of the second lower bound in (ii), as well as the construction of the matrix in (iii).

1. To (i). The main step is to prove that, for every $n\times k$ matrix $B$, we have that $\SUM_2(B)\leq n+k2^{k-1}\tag{*}$ Having this, we can finish the proof of (i) as follows. Split our $n\times m$ matrix $A$ (column-wise) into $t\leq m/k$ submatrices of dimension $n\times k$. By (*), we then have that $\OR_2(A)\leq tn + tk2^{k-1}\leq nm/k + m 2^{k-1}$. Taking $k:=\log n -\log\log n$, gives $\SUM_2(A)\leq 2 nm/\log n$, as claimed. So, it remains to prove (*).

Split the rows of $B$ into groups, where the rows in one group all have the same values. This gives us a decomposition of $B$ into $t\leq 2^k$ rectangles. For the $i$-th of these matrices $B_i$, let $r_i$ be the number of its nonzero rows, and $c_i$ the number of its nonzero columns. Hence, $\SUM_2(B_i)\leq r_i+c_i$. Since each nonzero row of $A$ lies in exactly one of the these matrices, we obtain: $\SUM_2(B)\leq \sum_{i=1}^t \SUM_2(B_i) \leq \sum_{i=1}^t r_i+\sum_{i=1}^t c_i\leq n+\sum_{j=0}^k\sum_{i:c_i=j}j \leq n+\sum_{j=0}^m\binom{k}{j}\cdot j = n+k2^{k-1}.$ $\Box$

2. To (ii): As in the book, we use Nechiporuk-Pippenger's argument, because it not just lower bounds $\OR(A)$ for $k$-free matrices $A$, but also gives the following structural fact. An $a\times b$ rectangle $R$ is $k$-balanced if $a < k$ and $b < k$. The decomposition number $\Dec_k(A)$ of $A$ is the smallest number of pairwise disjoint $k$-balanced rectangles in $A$ covering all 1-entries of $A$. The covering number $\Cov_k(A)$ is defined in the same way, but this time the rectangles are not required be disjoint. (Note that this time we count the number of rectangles, not the sum of their dimensions.) Since no $k$-balanced rectangle can cover more than $(k-1)^2$ ones of $A$, we have that $\Dec_k(A)\geq \Cov_k(A)\geq \frac{|A|}{(k-1)^2}.$ On the other hand, we have the following relations between OR/SUM complexity and the covering/decomposition numbers.
Fact: If $A$ is $k$-free, then \begin{align} & \Cov_k(A)\leq \OR(A)\leq \OR_2(A)\leq 2k\cdot \Cov_k(A)\\ & \\ & \Dec_k(A)\leq \SUM(A)\leq \SUM_2(A)\leq 2k\cdot \Dec_k(A) \end{align}
Proof: We only need to prove the lower bounds $\SUM(A)\geq \Dec_k(A)$ and $\OR(A)\geq \Cov_k(A)$. Let $F$ be an OR-circuit for $A$. For a node $u$ in $F$, let $s_u$ be the number of inputs from which $u$ is reachable, and $t_u$ be the number of outputs reachable from $u$. A wire $(u,v)$ "eligible" if both $s_u < k$ and $t_v < k$ hold. An eligible cut is a set $C$ of eligible edges such that every input-output path has a wire in $C$. We claim that such a cut exists, that is, every input-output path contains an eligible wire.

To see this, take an input-output path $P$. Nodes $u$ of this path with $s_u < k$ constitute the initial part $P_1$, and those with $t_u < k$ the final part $P_2$ of $P$. Had these parts no wire in common, then $P$ would contain a node $w$ such that $s_w\geq k$ and $t_w\geq k$. But this would mean that $A$ contains a $k\times k$ rectangle, contradicting the $k$-freeness of $A$. Thus, $P_1$ and $P_2$ share some wire $(u,v)$ in common, meaning that $s_u < k$ and $t_v < k$, that is, the wire $(u,v)$ is eligible.

Take now a minimal cut $C$ of $F$ consisting of eligible wires. By our claim, such a cut exists. Each wire $(u,v)$ in $F$ gives an $s_u\times t_v$ rectangle in $A$. If the wire is eligible, then this rectangle is $k$-balanced. Thus, $C$ gives a covering (or decomposition, if we had a SUM-circuit) of $A$ into $|C|$ $k$-balanced rectangles, as desired. $\Box$

The stronger lower bound $\OR_2(A)\geq |A|/(k-1)$ in the case of depth-2 circuits follows from the observation that then either $s_u=1$ or $t_v=1$.

3. To (iii). Each subset $S\subseteq GF(2)^{2r}$ gives us an $|S|\times |S|$ submatrix $H_S$ of the Sylvester matrix $H$ whose rows and columns correspond to vectors in $S$. The following simple (but neat) combinatorial condition on $S$ leading to Ramsey submatrices $H_S$ was observed by Pudlák and Rödl [2], and is stated as Lemma 11.19 in the book:
Let $S\subseteq GF(2)^{2r}$. If $|S\cap V|< t$ holds for every subspace $V\subseteq GF(2)^{2r}$ of dimension $\leq r$, then $H_S$ is $t$-Ramsey.
Proof: Suppose that $H_S$ contains a monochromatic $t\times t$ submatrix $T$. Our goal is to show that then there is a subspace $V\subseteq GF(2)^{2r}$ of dimension $r$ such that $|S\cap V|\geq t$. The submatrix $T$ corresponds to two subsets of vectors $X,Y\subseteq S$ such that $\langle u,v\rangle=a$ for some $a\in\{0,1\}$ and all $u\in X$ and $v\in Y$. Viewing the vectors in $X$ as the rows of the coefficient matrix and the vectors in $Y$ as (columns of) unknowns, we obtain that the sum $\mathrm{dim}(X')+\mathrm{dim}(Y')$ of the dimensions of vector spaces $X'$ and $Y'$, spanned by $X$ and by $Y$, cannot exceed $2r+a\leq 2r+1$. Hence, at least one of these dimensions, say $\mathrm{dim}(X')$ must be $\leq r$. So, we can take $V=X'$, and obtain that $|S\cap V|\geq |X|=t$, as claimed. $\Box$

It is difficult to explicitly construct large sets $S$ with this property, but one can show their existence using a simple probabilistic argument (given in [4], Section 4). Namely, pick a subset $S\subseteq GF(2)^{2r}$ at random, by including each vector at random and independently in $S$ with probability $p=2^{1-r}=2/n$. Chernoff´s inequalities, together with the fact that there are at most $\tbinom{2r}{r}\leq 2^{2r}/\sqrt{2r}$ subspaces $V\subseteq GF(2)^{2r}$ of dimension $r$, then imply that with probability $1-o(1)$, $|S|\geq pn/2=2^r=n$ and $|S\cap V| < 2r$ holds for every such subspace $V$. Thus,

Almost all $n\times n$ submatrices $H_S$ of $H_{n^2}$ are $t$-Ramsey for $t\leq 2r=2\log n$.

1. [Not ordered list. I don't know how to make bibliography in MaxJax.]?

S. Jukna, April 2013