Fact 1: If A=P\cdot Q^T for n\times r matrices P and Q, then \SUM_2(A), \OR_2(A) and \XOR_2(A) are all at most 2rn.
SUM, OR and XOR circuits computing y=Ax may be viewed as rectifier networks representing the matrix A. A rectifier network is a directed acyclic graph with n input nodes (fanin = 0) and n output nodes (fanout = 0). Let p(i,j) be the number of paths from input i to output j. Then a network represents A if \mbox{in SUM circuit}\ \ A[i,j]=\begin{cases} 1 & \mbox{if $p(i,j)=1$,}\\ 0 & \mbox{if $p(i,j)=0$;} \end{cases} \mbox{in OR circuit}\ \ A[i,j]=\begin{cases} 1 & \mbox{if $p(i,j)>0$,}\\ 0 & \mbox{if $p(i,j)=0$;} \end{cases} \mbox{in XOR circuit}\ \ A[i,j]=\begin{cases} 1 & \mbox{if $p(i,j)$ is odd,}\\ 0 & \mbox{if $p(i,j)$ is even.} \end{cases}
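To make the path-counting semantics concrete, here is a small Python sketch (the tiny example network and the node names are made up for illustration): it counts p(i,j) in a wire-list DAG and reads off the matrices that the same network represents under the OR- and XOR-semantics.

```python
# A tiny rectifier network given as a list of wires (u, v), meaning "u feeds v".
# Inputs x0, x1 have fanin 0; outputs y0, y1 have fanout 0; g is an inner node.
wires = [("x0", "g"), ("x1", "g"), ("g", "y0"), ("x0", "y0"), ("g", "y1")]
inputs, outputs = ["x0", "x1"], ["y0", "y1"]

def num_paths(src, dst):
    """Count directed paths from src to dst (the DAG is tiny, plain recursion suffices)."""
    if src == dst:
        return 1
    return sum(num_paths(v, dst) for (u, v) in wires if u == src)

# p(i, j) = number of paths from input i to output j.
p = {(i, j): num_paths(i, j) for i in inputs for j in outputs}

# One and the same network represents (possibly different) matrices in the three models:
A_or  = [[1 if p[(i, j)] > 0 else 0 for i in inputs] for j in outputs]
A_xor = [[p[(i, j)] % 2 for i in inputs] for j in outputs]
is_sum = all(v <= 1 for v in p.values())  # valid as a SUM circuit only if all p(i,j) <= 1

print(A_or)   # x0 reaches y0 along two paths, so the entry survives under OR...
print(A_xor)  # ...but cancels mod 2 under XOR
print(is_sum)
```

Note how the doubled path from x0 to y0 makes the network invalid as a SUM circuit, while the OR- and XOR-circuits it defines represent two different matrices.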
Note that SUM-circuits constitute the weakest of these models: in such a circuit, there cannot be more than one path from an input variable to an output gate. So, both \OR(A) and \XOR(A) are at most \SUM(A). The most general model is that of "general" circuits, where arbitrary(!) boolean functions are allowed as gates; in this case we want to compute y=Ax over GF(2) using arbitrary gates. The corresponding measure \GEN(A) is an obvious lower bound for all three measures above. Thus, we have the obvious inequalities \GEN(A)\leq \OR(A), \XOR(A)\leq \SUM(A) for all matrices A. We are interested in the maximal possible gaps between these measures on n\times n matrices. For the maxima \OR(n),\XOR(n) and \SUM(n) over all such matrices, Lupanov's result (Thm. 13.18 in the book) implies that they are all \Theta(n^2/\log n): \SUM(n)\approx \OR(n) \approx \XOR(n)\approx \frac{n^2}{\log n}. Thus, there can be no nonconstant "worst-case" gaps between these three measures. [It still remains open whether \GEN(n)=\Omega(n^2/\log n) also holds.] But if we measure the complexity of the same matrix A, then large gaps \OR(A)/\XOR(A)=\Omega(n/\log^2 n) and \SUM(A)/\OR(A)=\Omega(\sqrt{n}/\log^2n) were shown in [1]. The goal of this comment is to show that the OR/XOR gap actually follows from some results in the book, and the proofs are elementary. Moreover, the constructed matrix A has the property that \XOR_d(A)=O(n) holds even in depth d=3.
Theorem 1: There exists an n\times n matrix A such that \XOR_3(A)=O(n) but \OR(A)=\Omega(n^2/\log^2 n).
Let us now give some details. Recall that a matrix A is k-Ramsey if it contains no monochromatic k\times k submatrix, and is k-free (k\geq 2) if it contains no k\times k all-1 submatrix. Hence, A is k-Ramsey iff both the matrix A and its complement \overline{A} are k-free. Let |A| denote the number of 1-entries in A. The results from the book we use are the following; for the convenience of readers who do not have the book, we recall their proofs in the appendix below.
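For small matrices, both properties can be checked by brute force directly from the definitions; the following Python sketch (exhaustive, so feasible only for tiny matrices) does exactly that.

```python
from itertools import combinations

def is_k_free(M, k):
    """True iff M contains no k x k all-1 submatrix (brute force over submatrices)."""
    n, m = len(M), len(M[0])
    for rows in combinations(range(n), k):
        for cols in combinations(range(m), k):
            if all(M[i][j] == 1 for i in rows for j in cols):
                return False
    return True

def is_k_ramsey(M, k):
    """True iff neither M nor its complement contains a k x k monochromatic submatrix."""
    comp = [[1 - v for v in row] for row in M]
    return is_k_free(M, k) and is_k_free(comp, k)

I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(is_k_free(I4, 2))    # True: the identity has no 2x2 all-1 submatrix
print(is_k_ramsey(I4, 2))  # False: its complement contains 2x2 all-1 submatrices
```

The identity matrix illustrates the difference between the two notions: it is 2-free, but far from 2-Ramsey.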
The matrix A whose existence is shown in (iii) below is just a submatrix of an n^2\times n^2 Sylvester matrix H=H_{n^2} with n=2^r. Recall that the rows and columns of H correspond to the vectors in GF(2)^{2r}, and the value H[u,v] is the scalar product of the vectors u and v over GF(2): H_2=\begin{bmatrix} 0 & 0 \\ 0 & 1 \\ \end{bmatrix},\qquad H_4=\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ \end{bmatrix},\qquad H_{2n}=\begin{bmatrix} H_{n} & H_{n} \\ H_{n} & \overline{H}_{n} \\ \end{bmatrix}
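The scalar-product definition and the block recursion are easy to cross-check; here is a Python sketch (the helper name sylvester is made up) that builds H from inner products over GF(2) and verifies H_{2n}=[[H_n, H_n], [H_n, complement(H_n)]] on the smallest case.

```python
import itertools

def sylvester(r):
    """n x n Sylvester matrix, n = 2^r: rows/columns are the vectors of GF(2)^r
    in lexicographic order, and H[u][v] = <u, v> over GF(2)."""
    vecs = list(itertools.product([0, 1], repeat=r))
    return [[sum(a * b for a, b in zip(u, v)) % 2 for v in vecs] for u in vecs]

H2, H4 = sylvester(1), sylvester(2)
print(H2)  # [[0, 0], [0, 1]]
print(H4)  # [[0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 0]]

# Check the recursion H_{2n} = [[H_n, H_n], [H_n, complement(H_n)]].
n = 2
for i in range(n):
    for j in range(n):
        assert H4[i][j] == H2[i][j]              # top-left block
        assert H4[i][n + j] == H2[i][j]          # top-right block
        assert H4[n + i][j] == H2[i][j]          # bottom-left block
        assert H4[n + i][n + j] == 1 - H2[i][j]  # bottom-right block: complement
```

The recursion holds because prepending a leading bit b to u and c to v changes the inner product by bc, which is 1 exactly in the bottom-right block.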
- (i) Lupanov (Lemma 1.2 in the book)
- For every n\times m matrix A, and every positive integer k, \SUM_2(A)\leq nm/k+\min\{n,m\}2^k. By taking k=\log n - \log\log n, we obtain \SUM_2(A)\leq 2nm/\log n . In particular, \SUM_2(A)=O(n) if m=O(\log n).
- (ii) Nechiporuk-Wegener-Mehlhorn-Pippenger (Theorem 13.21 in the book)
- If A is k-free then \OR(A)\geq |A|/(k-1)^2 and \OR_2(A)\geq |A|/(k-1).
- (iii) Pudlák-Rödl-Jukna (Theorem 11.18 in the book)
- There exists an n\times r matrix B such that r=2\log n, and the matrix A=B\cdot B^T (over GF(2)) is r-Ramsey.
Proof of Theorem 1:
Take the matrix A=B\cdot B^T guaranteed by (iii). Since \XOR_2(B)\leq\SUM_2(B), (i) implies that
\XOR_4(A)\leq \XOR_2(B)+\XOR_2(B^T)=O(nr/\log n)=O(n).
To obtain the lower bound, we may assume that A has |A|\geq n^2/2 ones; if not,
then take the complement matrix \overline{A}. Since A is r-free, (ii) implies that
\OR(A)\geq |A|/r^2=\Omega(n^2/\log^2n). \Box
Sergeev has improved \XOR_4(A)=O(n) to \XOR_3(A)=O(n).
Lemma 1 (Sergeev [3]): Let m=a\log_2 n. Then for every n\times m matrix A and every m\times n matrix B, we have \XOR_3(A\cdot B)\leq\max\{3,6a\}\cdot n. The same upper bound also holds for SUM- and OR-circuits, if the product A\cdot B is taken over the corresponding semiring.
This, in particular, implies the following interesting property of "narrow" depth-2 circuits: if an n\times n matrix can be computed by a depth-2 circuit with O(\log n) nodes on the middle level (and of arbitrary size!), then it has a depth-3 circuit of linear size.
Proof: See his self-explanatory sketch (in it, the roles of n and m are interchanged). \Box
Actually, Sergeev has shown the following upper XOR-bounds for arbitrary circulant matrices (many of the known explicit k-free matrices are such). Recall that a matrix A is circulant if each of its rows is a cyclic shift of the previous one. That is, every circulant matrix is determined by its first row.
Theorem (Sergeev [3]): For every integer constant t\geq 1 there is a constant c=c(t) such that, for every circulant n\times n matrix A:

- \XOR_{2t-1}(A)\leq c n^{1+1/t}
- \XOR_{2t}(A)\leq c n^{1+1/t}/\log^{1/t}n.

Improving an earlier result of Grinchuk [16], Grinchuk and Sergeev [17] have proved that, for every k\geq 2, k-free circulant matrices A with |A|=\Omega(k^{-4}n^{2-\alpha}) ones exist, where \alpha=2/k-2/k^2; this is almost optimal, because |A|\leq n^{2-1/k} holds for any k-free matrix, not only for circulant ones. Thus, taking k\to\infty slowly growing, we see that circulant matrices achieving the gap \OR(A)/\XOR_3(A)=\Omega(n^{1/2-o(1)}) exist. Recall that we have already shown the much larger gap \OR(A)/\XOR_2(A)=\Omega(n/\log^3n), even for depth-2 XOR-circuits, but this was achieved for matrices that are not circulant. The class of circulant matrices is much smaller: there are only 2^n of them among all 2^{n^2} matrices.
For circulant 2-free matrices A (including explicit ones), the theorem gives a non-constant gap \OR(A)/\XOR_4(A)=\Omega(\sqrt{\log n}) also in depth 4 (take t=2). It remains open whether there is a growing gap for 2-free matrices already in depth 3. Concerning depth 2, it is conjectured (already by Valiant in 1977) that \XOR_2(A)\geq n^{1+\epsilon} must hold for dense 2-free matrices, that is, that dense 2-free matrices are hard for depth-2 XOR-circuits.
Does \XOR_3(K)=\omega(n), or at least \XOR_2(K)=\omega(n), hold for the Kneser matrix K?
Theorem (Chashkin [12]): For any constant c > 0 and any subset S\subseteq \{0,1\}^n of size |S|\le 2^{n/2}, there exists a boolean m\times n matrix A such that:

- Ax\neq Ay for all x\neq y\in S (the mapping y=Ax is injective on S)
- m \le \lfloor 2\log_2 |S|\rfloor -1
- \XOR(A)=O(n).

I found this result quite surprising. If we consider arbitrary operators f:S\to\{0,1\}^m, then none of them can be injective unless m\geq \log_2|S|. But by Chashkin's result, if we allow the dimension of the range to be just twice as large as this "counting border", then even a linear operator of linear XOR-complexity can do the job!
In particular, by taking S to be the set of all vectors with Hamming weight \le k, we obtain the following interesting consequence.
Corollary: For every 1\leq k\leq n/\log_2 n, there is an m\times n matrix A such that:

- m =O(k\ln n)
- any k columns of A are linearly independent
- \XOR(A)=O(n).

Note that the matrix in this corollary is a parity-check matrix of a linear error-correcting code with very good parameters.
Theorem ([8]): There are boolean n\times m matrices A such that:

- |Ax\oplus Ay|\geq n/8 for all x\neq y\in \{0,1\}^m (the Hamming distance is at least n/8)
- m\geq n/32 (the rate m/n is constant)
- \XOR_2(A)=O(n(\tfrac{\ln n}{\ln\ln n})^2)
- \XOR_3(A)=O(n\ln\ln n)
- \XOR(A)=O(n).

Moreover, the upper bounds on \XOR_2(A) and \XOR_3(A) cannot be improved, even if arbitrary(!) boolean functions are allowed as gates; this (lower bounds for general circuits) is their main result. Today, these are the strongest known explicit lower bounds for general circuits.
The above results give yet another explanation for why lower bounds for XOR-circuits are so difficult to prove: these circuits may be unexpectedly powerful!
Lemma (Sergeev [3]): For the Sierpinski matrix D_n, there is a constant c such that \SUM_2(D_n)\leq c n^{\log_2(1+\sqrt{2})}, where \log_2(1+\sqrt{2})\approx 1.28. Proof: See here. \Box
In [6], it is shown that the upper bound \SUM(D)=O(n\log n) is tight: \SUM(D)\geq \tfrac{1}{2}n\log n. Note that (ii) cannot yield any nontrivial lower bound here, because D is not k-free for k \ll \sqrt{n}. In fact, the argument of [6] also yields \OR(D)\geq \tfrac{1}{2}n\log n. Recall that \OR(\overline{D})=O(n), already in depth 3.
It is easy to see that the upper bound \SUM(M_n)=O(n\log n) holds for any matrix which can be constructed by some recursion M_{2n}=\begin{bmatrix} f_1(M_{n}) & f_2(M_{n}) \\ f_3(M_{n}) & f_4(M_n) \\ \end{bmatrix} \qquad \mbox{where $f_i(M)$ is either $0,1, M$ or $\overline{M}$.}
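For illustration, here is a Python sketch of this generic recursion (the helper name recursive_matrix is made up); with f_1=f_2=f_3=M and f_4=\overline{M}, it reproduces the Sylvester matrices from above, so \SUM(H)=O(n\log n) follows from this scheme.

```python
def recursive_matrix(base, fs, levels):
    """Iterate M_{2n} = [[f1(M), f2(M)], [f3(M), f4(M)]] starting from base;
    each f is one of '0', '1', 'M' (keep) or 'comp' (complement)."""
    def apply_f(f, M):
        if f == "0":
            return [[0] * len(M) for _ in M]
        if f == "1":
            return [[1] * len(M) for _ in M]
        if f == "M":
            return M
        return [[1 - v for v in row] for row in M]  # 'comp'

    M = base
    for _ in range(levels):
        f1, f2, f3, f4 = (apply_f(f, M) for f in fs)
        M = [r1 + r2 for r1, r2 in zip(f1, f2)] + [r3 + r4 for r3, r4 in zip(f3, f4)]
    return M

# The Sylvester recursion H_{2n} = [[H, H], [H, comp(H)]] fits the scheme:
H4 = recursive_matrix([[0, 0], [0, 1]], ("M", "M", "M", "comp"), 1)
print(H4)  # [[0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
```

Each level of the recursion doubles the dimension, which is where the \log n factor in the O(n\log n) wire count comes from.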
Lemma (Sergeev [3]): \SUM_2(H)\leq 2^{3r+1}=2n^{1.5}. A lower bound of Morgenstern [15], together with the recursive upper bound above, yields \SUM(H)=\Theta(n\log n).
Since H=P\cdot P^T for an n\times r matrix P with r=\log n, Fact 1 and Lemma 1 imply that \XOR_2(H)=O(n\log n) and \XOR_3(H)=O(n). A lower bound \XOR_2(H)=\Omega(n\log n) for every n\times n Hadamard matrix H (including Sylvester matrices) was proved in [9].
A lower bound \OR(H)=\Omega(n\log n) was proved by Grigor'ev [19]. Sergeev [3] has found a refinement of (ii) which yields an even more general lower bound. Define the area of an a\times b rectangle to be ab, and let r(A) be the maximal area of a rectangle (all-1 submatrix) in A.
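For tiny matrices, r(A) can be computed by brute force directly from the definition; a Python sketch (the helper name max_rect_area is made up):

```python
from itertools import combinations

def max_rect_area(M):
    """r(M): the maximal area a*b of an all-1 a x b submatrix, by brute force."""
    n, m = len(M), len(M[0])
    best = 0
    for a in range(1, n + 1):
        for rows in combinations(range(n), a):
            # the columns that are 1 in every chosen row form the widest rectangle
            b = sum(1 for j in range(m) if all(M[i][j] for i in rows))
            best = max(best, a * b)
    return best

H4 = [[0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
print(max_rect_area(H4))  # 2: H4 has all-1 rectangles of area 2, but no 2x2 one
```

For each fixed row set, taking all columns that are 1 on it is clearly optimal, so the enumeration over row sets alone suffices.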
Theorem (Sergeev [3]): For every n\times n boolean matrix A, \OR(A)\geq \frac{|A|}{r(A)}\log\frac{|A|}{n}. Proof: See here. \Box
Every n\times n Hadamard matrix (including the Sylvester matrix) H has |H|=\Omega(n^2) and r(H)\leq n. Hence, \OR(H)=\Theta(n\log n).
Let B\otimes A denote the Kronecker product of B and A, that is, the block matrix obtained by replacing the 1-entries of B by copies of A (and the 0-entries by all-0 blocks).
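A direct Python sketch of this definition, using index arithmetic instead of explicit blocks (the helper name kron is made up):

```python
def kron(B, A):
    """Kronecker product B (x) A: the (i, j) block of the result is a copy of A
    if B[i][j] = 1, and the all-0 block otherwise."""
    m, n = len(A), len(A[0])
    return [[B[i // m][j // n] * A[i % m][j % n]
             for j in range(len(B[0]) * n)]
            for i in range(len(B) * m)]

B = [[0, 1], [1, 1]]
A = [[1, 0], [1, 1]]
C = kron(B, A)
print(C[0])               # [0, 0, 1, 0]: the top-left block is all-0 since B[0][0] = 0
print(len(C), len(C[0]))  # 4 4
```

Entry (i, j) of B\otimes A is B[i div m][j div n] \cdot A[i mod m][j mod n], which is exactly what the comprehension computes.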
Theorem (Find et al. [1]): If A is k-free, then \SUM(B\otimes A)\geq \pr(B)\cdot\frac{|A|}{(k-1)^2}.

On the other hand, using (i) and the fact that the transformation x\mapsto (B\otimes A)x can be written as the result (A(XQ))P^T of three matrix products, where B=PQ^T for m\times \br(B) matrices P,Q, and X is the m\times m matrix of variables, it is shown in [1] that \OR(B\otimes A)\lesssim \br(B)\cdot \OR_2(m)\approx \br(B)\cdot m^2/\log m holds for any m\times m matrix A. If we take B=\overline{I_m} to be the complement of the m\times m identity matrix I_m, where m=\sqrt{n}, then \pr(B)=\Omega(m), while \br(B)=O(\log m), so the gap \pr(B)/\br(B) \gtrsim m/\log m is already large. Thus, if we let B=\overline{I_m} and A be a random m\times m matrix, then \OR(B\otimes A)\lesssim m^2 = n, whereas \SUM(B\otimes A) \gtrsim m^3/\log^2 m\approx n^{3/2}/\log^2 n, and the desired SUM/OR gap follows.
Theorem (Gál [18]): For every boolean m\times m matrix A and every k\times l matrix B, \tr(B)\cdot \OR_2(A)\leq \OR_2(B\otimes A) \lesssim \tr(B)\max\{k,l\}\cdot \OR_2(m).

The proof of the lower bound is based on the simple fact that, in depth 2, we have \OR_2\begin{bmatrix} X & C \\ D & Y \\ \end{bmatrix} \geq \OR_2\begin{bmatrix} X & 0 \\ 0 & Y \\ \end{bmatrix} Using this, the lower bound can be easily shown by induction on \tr(B). In larger depth, this fact no longer holds. There we only have \OR\begin{bmatrix} X & C \\ D & Y \\ \end{bmatrix} \geq \max\{ \OR(X), \OR(Y)\}\geq \frac{1}{2}\cdot \OR\begin{bmatrix} X & 0 \\ 0 & Y \\ \end{bmatrix}
In particular, if B=\overline{I_m} and A is a random m\times m matrix with m=\sqrt{n}, then \OR_2(\overline{I_m}\otimes A) =\Omega(n^{3/2}/\log n). This does not contradict the upper bound \OR(\overline{I_m}\otimes A) \lesssim m^2=n given above, because here circuits of depth larger than 2 are used (depth 6 is enough).
Sergeev [3] improved the upper bound as follows. For a covering \sigma of B by rectangles (all-1 submatrices), let s(\sigma) denote the sum of the lengths of the shorter sides, and S(\sigma) the sum of the lengths of the longer sides of the rectangles in \sigma. Note that \OR_2(B)=\min_\sigma\{s(\sigma)+S(\sigma)\}, where the minimum is over all coverings \sigma of B.
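As an illustration, here is a Python sketch computing s(\sigma) and S(\sigma) for a covering of \overline{I_m} by 2\log m rectangles, obtained by splitting indices according to one bit position (this particular covering is the standard one, used here only as an example; it is not the covering from [3]).

```python
from math import log2

def covers(m, rects):
    """Check that the rectangles (R, C) cover exactly the 1-entries of comp(I_m)."""
    covered = {(i, j) for R, C in rects for i in R for j in C}
    return covered == {(i, j) for i in range(m) for j in range(m) if i != j}

def s_and_S(rects):
    """s(sigma): sum of shorter side lengths; S(sigma): sum of longer side lengths."""
    s = sum(min(len(R), len(C)) for R, C in rects)
    S = sum(max(len(R), len(C)) for R, C in rects)
    return s, S

m = 8
# i != j iff the indices differ in some bit position t; the rectangle for (t, e)
# pairs the rows {i: bit_t(i) = e} with the columns {j: bit_t(j) = 1 - e}.
rects = [([i for i in range(m) if (i >> t) & 1 == e],
          [j for j in range(m) if (j >> t) & 1 == 1 - e])
         for t in range(int(log2(m))) for e in (0, 1)]

print(covers(m, rects))  # True
print(s_and_S(rects))    # (24, 24): every side has length m/2 = 4
```

This covering gives s(\sigma)=S(\sigma)=m\log m, hence \OR_2(\overline{I_m})=O(m\log m); the covering from [3] mentioned below is far more unbalanced, with s(\sigma)=O(m) but S(\sigma)=O(m^{3/2}).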
Theorem (Sergeev [3]): For every boolean m\times m matrix A, every covering \sigma of a matrix B, and every positive integer t, \OR_2(B\otimes A) \leq s(\sigma)\frac{m^2}{t} + S(\sigma)m2^t.

Proof: By (i), both A and A^T can be realized by depth-2 OR-circuits of size m^2/t+m2^t with m2^t wires on the first level and m^2/t wires on the second level. If R is a k\times l rectangle, then R\otimes A can be realized by a depth-2 circuit of size \min\{k,l\}m^2/t + \max\{k,l\}m2^t. Taking the union of these circuits over all rectangles R\in \sigma, the desired upper bound follows. \Box
He then showed that B=\overline{I_m} has a covering \sigma with s(\sigma)=O(m) and S(\sigma)=O(m^{3/2}); see here for a proof. By taking t=(1/2)\log m - \log\log m, this yields the tight upper bound \OR_2(\overline{I_m}\otimes A) =O(m^{3}/\log m)=O(n^{3/2}/\log n).
| Matrix | \SUM | \OR | \OR_2 | \XOR_2 | \OR_3 | \XOR_3 | \GEN_2 |
|---|---|---|---|---|---|---|---|
| Submatrix A of Sylvester H_{n^2} | \Omega(\frac{n^2}{\ln^2 n}) | \Omega(\frac{n^2}{\ln^2 n}) | \Theta(\frac{n^2}{\ln n}) | O(n\ln n) | \Omega(\frac{n^2}{\ln^2 n}) | O(n) | O(n\ln n) |
| \bar{I}\otimes A, A random \sqrt{n}\times \sqrt{n} | \Omega(\frac{n^{1.5}}{\ln^2 n}) | O(n) | \Theta(\frac{n^{1.5}}{\ln n}) | O(n\ln n) | O(n\ln n) | O(n) | O(n\ln n) |
| Sierpinski matrix D | \Theta(n\ln n) | \Theta(n\ln n) | \Omega(n^{1.08}), O(n^{1.28}) | \omega(n)? | ? | ? | O(n\ln n) |
| Kneser matrix K | O(n\ln n) | O(n) | \Theta(n\ln n) | \omega(n)? | O(n) | ? | O(n\ln n) |
| Sylvester matrix H | \Theta(n\ln n) | \Theta(n\ln n) | \Theta(n^{1.5}) | \Theta(n\ln n) [9] | ? | O(n) | \Omega(n\frac{\ln n}{\ln\ln n}) |
| Code matrices from [8] | ? | ? | ? | \Theta(n(\frac{\ln n}{\ln\ln n})^2) | ? | \Theta(n\ln\ln n) | \Theta(n(\frac{\ln n}{\ln\ln n})^2) |
| Random \mathbf{A} | \Theta(\frac{n^2}{\ln n}) | \Theta(\frac{n^2}{\ln n}) | \Theta(\frac{n^2}{\ln n}) | \Theta(\frac{n^2}{\ln n}) | \Theta(\frac{n^2}{\ln n}) | \Theta(\frac{n^2}{\ln n}) | \Omega(\frac{n^2}{\ln n})? |
Acknowledgment: My thanks go to Igor Sergeev for very interesting discussions and for communicating his upper bounds mentioned above.
Proof of (i): Since reversing all wires of a circuit for A yields a circuit for A^T, we may assume m\leq n. Split the columns of A into m/k blocks of k columns each; it then suffices to show that \SUM_2(B)\leq n+k2^{k-1} holds for every n\times k matrix B. Split the rows of B into groups, where the rows in one group all have the same values. This gives us a decomposition of B into t\leq 2^k rectangles. For the i-th of these matrices B_i, let r_i be the number of its nonzero rows, and c_i the number of its nonzero columns. Hence, \SUM_2(B_i)\leq r_i+c_i. Since each nonzero row of B lies in exactly one of these matrices, we obtain: \SUM_2(B)\leq \sum_{i=1}^t \SUM_2(B_i) \leq \sum_{i=1}^t r_i+\sum_{i=1}^t c_i\leq n+\sum_{j=0}^k\sum_{i:c_i=j}j \leq n+\sum_{j=0}^k\binom{k}{j}\cdot j = n+k2^{k-1}. \Box
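The decomposition in this proof is easy to simulate; the following Python sketch (the helper name lupanov_depth2_wires is made up) computes the number of wires of the resulting depth-2 SUM circuit for a random n\times k matrix and checks it against the bound n+k2^{k-1}.

```python
import random

def lupanov_depth2_wires(B):
    """Number of wires in the depth-2 SUM circuit obtained by grouping equal rows
    of the n x k matrix B: one middle gate per distinct nonzero row pattern."""
    wires = 0
    for pattern in {tuple(row) for row in B}:
        c = sum(pattern)  # number of nonzero columns of this rectangle
        if c == 0:
            continue      # all-zero rows need no wires at all
        r = sum(1 for row in B if tuple(row) == pattern)  # rows in the group
        wires += r + c    # c wires into the middle gate, r wires out of it
    return wires

random.seed(0)
n, k = 100, 4
B = [[random.randint(0, 1) for _ in range(k)] for _ in range(n)]
bound = n + k * 2 ** (k - 1)  # n + k*2^{k-1} = 132 here
print(lupanov_depth2_wires(B) <= bound)  # True
```

The bound holds for every B: the r_i sum to at most n, and the c_i sum to at most \sum_j \binom{k}{j} j = k2^{k-1}, since each of the 2^k patterns occurs in at most one group.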
Fact: If A is k-free, then \begin{align} & \Cov_k(A)\leq \OR(A)\leq \OR_2(A)\leq 2k\cdot \Cov_k(A)\\ & \\ & \Dec_k(A)\leq \SUM(A)\leq \SUM_2(A)\leq 2k\cdot \Dec_k(A) \end{align}

Proof: We only need to prove the lower bounds \SUM(A)\geq \Dec_k(A) and \OR(A)\geq \Cov_k(A). Let F be an OR-circuit for A. For a node u in F, let s_u be the number of inputs from which u is reachable, and t_u the number of outputs reachable from u. Call a wire (u,v) "eligible" if both s_u < k and t_v < k hold. An eligible cut is a set C of eligible wires such that every input-output path contains a wire in C. We claim that such a cut exists, that is, that every input-output path contains an eligible wire.
To see this, take an input-output path P. The nodes u of this path with s_u < k constitute an initial part P_1 of P, and those with t_u < k a final part P_2. If these parts had no wire in common, then P would contain a node w with s_w\geq k and t_w\geq k. But then A would contain a k\times k all-1 submatrix, contradicting the k-freeness of A. Thus, P_1 and P_2 share some wire (u,v), meaning that s_u < k and t_v < k, that is, the wire (u,v) is eligible.
Now take a minimal cut C of F consisting of eligible wires; by our claim, such a cut exists. Each wire (u,v) in F gives an s_u\times t_v rectangle in A. If the wire is eligible, then this rectangle is k-balanced. Thus, C gives a covering (or a decomposition, if we started with a SUM-circuit) of A into |C| k-balanced rectangles, as desired. \Box
The stronger lower bound \OR_2(A)\geq |A|/(k-1) in the case of depth-2 circuits follows from the observation that then either s_u=1 or t_v=1.
For a set S\subseteq GF(2)^{2r}, let H_S denote the submatrix of H_{n^2} whose rows and columns are indexed by the vectors in S. If |S\cap V|< t holds for every subspace V\subseteq GF(2)^{2r} of dimension \leq r, then H_S is t-Ramsey.

Proof: Suppose that H_S contains a monochromatic t\times t submatrix T. Our goal is to show that then there is a subspace V\subseteq GF(2)^{2r} of dimension \leq r such that |S\cap V|\geq t. The submatrix T corresponds to two subsets of vectors X,Y\subseteq S such that \langle u,v\rangle=a for some a\in\{0,1\} and all u\in X and v\in Y. Viewing the vectors in X as the rows of a coefficient matrix and the vectors in Y as (columns of) unknowns, we obtain that the sum \mathrm{dim}(X')+\mathrm{dim}(Y') of the dimensions of the vector spaces X' and Y', spanned by X and by Y, cannot exceed 2r+a\leq 2r+1. Hence, at least one of these dimensions, say \mathrm{dim}(X'), must be \leq r. So, we can take V=X', and obtain |S\cap V|\geq |X|=t, as claimed. \Box
It is difficult to explicitly construct large sets S with this property, but one can show their existence using a simple probabilistic argument (given in [4], Section 4). Namely, pick a subset S\subseteq GF(2)^{2r} at random, by including each vector in S independently with probability p=2^{1-r}=2/n. Chernoff's inequalities, together with the fact that there are at most \tbinom{2r}{r}\leq 2^{2r}/\sqrt{2r} subspaces V\subseteq GF(2)^{2r} of dimension r, then imply that, with probability 1-o(1), we have |S|\geq pn^2/2=2^r=n and |S\cap V| < 2r for every such subspace V. Thus,
Almost all n\times n submatrices H_S of H_{n^2} are t-Ramsey for t\leq 2r=2\log n.
S. Jukna, April 2013