Exponential lower bound for semantic read-once branching programs

We consider nondeterministic branching programs (NBP) computing functions $f:D^n\to\{0,1\}$ over a given finite domain $D$. Recall that such a program is a directed acyclic graph, some edges of which are labeled by tests $x_i=d$ for elements $d\in D$. The program accepts an input $x\in D^n$ iff there is an $s$-$t$ path all whose tests are consistent with $x$. The program is semantic read-once if along every consistent $s$-$t$ path no variable is tested more than once. (In the book, we called such programs weakly read-once.) Let $\nbp{f}$ denote the minimum number of nodes in such a program computing $f$. The main difficulty in proving large lower bounds on $\nbp{f}$ is in there being no restrictions on inconsistent paths, i.e. on those containing some two contradictory tests $x_i=d$ and $x_i=d'$ for $d\neq d'$. Research Problem 16.12 asks to prove an exponential lower bound on $\nbp{f}$ when $D=\{0,1\}$. So far, this was done only for huge domains $D$. The smallest was $|D|=2^{13}$ (see Chapter 17).

Recently, Stephen Cook, Jeff Edmonds, Venkatesh Medabalimi, and Toniann Pitassi [1] have made a substantial progress by reducing the domain till $D=\{0,1,2\}$. Albeit their argument does not work for the boolean domain $D=\{0,1\}$, the result makes a great decrease of the domain-size from a messy $2^{13}$ to a "just next to boolean" size $3$. Bellow I present their argument. Warning: the presentation is somewhat different from the original paper [1]---I'we tried to remain as close to the book as possible---so, all potential errors in computations are entirely my fault.

A general lower bound

An $n_1$-by-$n_2$ rectangle is a set $R\subseteq D^n$ of vectors for which there are disjoint subsets $X,Y\subseteq [n]$ of sizes $|X|\geq n_1$ and $|Y|\geq n_2$, sets $A\subseteq D^X$ and $B\subseteq D^Y$ of vectors, and a vector $w\in D^{[n]-X-Y}$ such that (by permuting the index-set $[n]$, if necessary) the set $R$ can be written as a Cartesian product \[ R=A\times\{w\}\times B\,. \]

Sets $X$ and $Y$ are left and right feets, sets $A$ and $B$ left and right legs, and vector $w$ the spine of the rectangle. (In the book, we called $R$ a "broom", and vector $w$ the "broomstick" of this broom.) As in the book, call a function $f:D^n\to\{0,1\}$ sensitive if every two vectors in $f^{-1}(1)$ differ in at last two positions. Finally, let $\R_{m,\ell}$ denote the set of all $m$-by-$4m$ rectangles $R\subseteq D^n$ with both legs of size $\ell$.

Rectangle Bound: Let $m=cn$ for some constant $0 < c \leq 0.01$. Let $f:D^n\to\{0,1\}$ be a sensitive function such that $f(R)\not\equiv 1$ for all rectangles $R\in\R_{m,\ell}$. Then \[ \nbp{f} \geq \frac{(1/2-c)^m|f^{-1}(1)|}{2\ell|D|^{n-m}} =\frac{|f^{-1}(1)|}{|D|^{n-m}2^m}\cdot \frac{(1-2c)^m}{2r} \,. \]

Proof: See here . $\Box$

Note that in the boolean case (where $|D|=2$), the resulting lower bound is trivial, because then $|f^{-1}(1)|\leq 2^n$. But it may be non-trivial already for $|D|=3$, if $f$ is dense enough, i.e. accepts many of all $3^n$ inputs.

A bit of algebra

We will construct our "hard" function $f:D^n\to\{0,1\}$ using small-degree polynomials over a finite field. For this, we need the following two quite simple properties of such polynomials.

Let $u(z)=u_0+u_1z+u_2z^2+\cdots+u_{d-1}z^{d-1}$ be a random polynomial of degree $d-1$ over a field $\FF$ with $K$ elements whose coefficients $u_i\in\FF$ are chosen uniformly at random in $\FF$. In both two lemmas below, we only use one of the first results in the interpolation theory:

the values of a polynomial of degree $d-1$ on $d$ points uniquely determine the polynomial (its coefficients).

Mark some subset of $M\leq K$ field elements.

Lemma 1: Let $R\subseteq \FF$ be a subset of $|R|=d$ field elements. Then the probability that $u$ takes only marked values on $R$ does not exceed $(M/K)^d$.

Proof: Fix an order $z_1,\ldots,z_d$ of elements in $R$, and consider the $d$-tuples $u(z_1),\ldots,u(z_d)$ of values of polynomials $u$ on these elements. Since we have $d$ elements in each $d$-tuple, and the degree is $d-1$, the interpolation property (i) implies that each $d$-tuple of values will be taken by exactly one polynomial. Thus, the probability to take any fixed tuple is exactly $K^{-d}$. Since we have $M^d$ (entirely) marked $d$-tuples of field elements, the union bound gives that at least one $d$-tuple is be marked with probability at most $K^{-d}M^d$. $\Box$

Lemma 2: If $M \ll K$, then with probability at least $1-o(1)$, at least $(1-o(1))M$ values $u(z)$ of $u$ are marked.

Proof: For an element $z\in\FF$, let $X_z$ be an indicator random $0$-$1$ variable for the event that $u(z)$ is marked, and let $X=\sum_{z\in\FF} X_z$. Then, for every $z$, we have $p:=\prob{X_z=1}=M/K$. By the linearity of expectation, the expectation of $X$ is $\mu=K\cdot p= M$. The goal is to show that $X$ is at least $(1-o(1))M$ with probability at last $1-o(1)$. Using the interpolation property (i), it is easy to show that the r.v.'s $X_z$ are pairwise independent (actually, even $d$-wise independent); this important fact was observed by Joffe already 40 years ago. Using this, one can show that the variance of $X$ is \[ \sigma^2=\mathrm{Var}(X)=K\cdot p(1-p)=K\cdot \frac{M}{K}\cdot\left(1- \frac{M}{K}\right) = M - \frac{M^2}{K} \approx M =\mu\,. \] By Chebychev's inequality, for every $a>0$ we have $\prob{|X-\mu| < a\sigma} < 1/a^2$. Setting $a:=\mu^{1/6}$ gives $\prob{X < M - M^{2/3}} < M^{-1/3}$ ,as desired. $\Box$

An explicit lower bound

Let $|D|=q$ be a prime number, and $\FF=GF(q^n)$ be a field with $K=q^n$ elements. Let $0<\delta < 1$ be a constant (to be specified later), and mark the first $M = K^{1-\delta} -1$ field elements. Associate with every polynomial $u(z)$ of degree $d-1$ over $\FF$ the function $\poly_u:\FF\to\{0,1\}$ such that $\poly_u(x)=1$ which iff the value $u(x)$ is marked. Assume now that $u$ is a random polynomial of degree $d-1$ over $\FF$. By viewing elements $x$ of $\FF$ as vectors $x\in D^n$, we thus have a random function $\poly_u:D^n\to\{0,1\}$.

Recall that $\R=\R_{m,\ell}$ denotes the set of all $m$-by-$4m$ rectangles $R\subseteq D^n$ with both legs of size $\ell$.

Lemma 3: Let $m=c n$ for some constant $0 < c \leq 0.01$, and $\ell^2\geq d$. Suppose that \[ (e/c)^{5cn}|D|^{5\ell cn + n-\delta dn}<1/4 \qquad \qquad \qquad (*) \] Then there is a polynomial $u$ of degree $d-1$ such that the set $\poly_u^{-1}(1)$ has at least $\tfrac{1}{2}|D|^{n-\delta n}$ vectors, and contains no rectangle in $\R_{m,\ell}$.

Proof: Let $u$ be a random polynomial of degree $d-1$. By Lemma 2, we have that with probability $\geq 1/2$, $f_u$ accepts at least $\tfrac{1}{2}M=\tfrac{1}{2} K^{1-\delta}=\tfrac{1}{2}|D|^{n-\delta n}$ input vectors. Take now a rectangle $R\in \R_{m,\ell}$. Since $|R|\geq \ell^2\geq d$, Lemma 1 implies that $\prob{\poly_u(R)\equiv 1}\leq p:= (M/K)^d=|D|^{-\delta dn}$. On the other hand, the total number of rectangles in $\R_{m,\ell}$ can be upper-bounded as follows: \[ |\R_{m,\ell}|\leq (e/c)^{5 cn}|D|^{5 c\ell n + n}\,. \qquad \qquad \qquad (**) \]

Proof: Just count. Recall that $\R_{m,\ell}$ consists of $m$-by-$4m$ rectangles $R=A\times\{w\}\times B$ with legs $A\subseteq D^X$ and $B\subseteq D^Y$ of sizes $|A|,|B|= \ell$, feets $X$ and $Y$ of sizes $|X|=m$ and $|Y|=4m$, and a spine $w\in D^{[n]-X-Y}$. The number of choices for $X$ is $\binom{n}{m}$. Given $X$, we choose $\ell$ vectors in $A$ from the $|D|^m$ possible values. Thus, the total number of left-legs $A$ is at most $\binom{n}{m}\cdot |D|^{\ell m}$. Similarly, the total number of right-legs $B$ is at most $\binom{n}{4m}\cdot |D|^{4\ell m}$. The number of choices for the spine $w$ is $|D|^{n-5m}$. Thus, \[ |\R_{m,\ell}|\leq \binom{n}{m}\binom{n}{4m}|D|^{\ell m}|D|^{4\ell m}|D|^{n-4m} < \binom{n}{m}\binom{n}{4m}|D|^{5\ell m +n} \] Using $m=cn$ and the inequality $\binom{n}{k}\leq (en/k)^k$ we conclude that \[ |\R_{m,\ell}|\leq \left(\frac{e}{c} \right)^{5m}|D|^{5\ell m+n} \leq \left(\frac{e}{c} \right)^{5 cn}|D|^{5c\ell n+n}\,.\qquad\qquad \Box \] Thus, by our assumption (*) and the union bound, the probability that $\poly_u(R)\equiv 1$ for at lest one rectangle $R\in \R_{m,\ell}$ does not exceed \[ p\cdot |\R_{m,\ell}|=|\R_{m,\ell}|/|D|^{\delta dn} \leq \left(\frac{e}{c} \right)^{5 cn}|D|^{5c\ell n+n -\delta dn} < \frac{1}{4}\,. \] So, there exists a polynomial $u$ of degree $d-1$ such that $\poly_n^{-1}(1)$ has at least $\tfrac{1}{2}|D|^{n-\delta n}$ vectors, and $\poly_u(R)\not\equiv 1$ for all rectangles $R\in \R_{m,\ell}$. $\Box$

Fix a polynomial $u$ guaranteed by Lemma 3. We cannot directly apply the Rectangle Bound to the corresponding function $\poly_u(x)$ because it may be not sensitive. To ensure the sensitivity, we just shrink the number of accepted inputs by a factor of $1/q$ where $q=|D|$. For an element $a\in GF(q)$, let $h_a:D^n\to\{0,1\}$ be the characteristic function of the hyperplane at $a$: \[ h_a(x)=1\ \ \mbox{ iff }\ \ x_1+x_2+\cdots+x_n = a \bmod{q}\,. \] Fix an element $a$ for which $h_a$ accepts the largest number of vectors accepted by $\poly_u$, and consider the function $f(x):=\poly_u(x)\land h_a(x)$. Then, clearly, $f(R)\not\equiv 1$ still holds for all rectangles $R\in \R_{m,\ell}$, and $f$ accepts at least $\tfrac{1}{q}|f^{-1}(1)|\geq \tfrac{1}{2q}|D|^{n-\delta n}$ vectors. The reason to add the hyperplane $h_a$ was that now the resulting function is sensitive: any two accepted vectors are at Hamming distance at last $2$. By the Rectangle Bound, we have that, if the parameters $c,\ell, \delta$ and $d$ satisfy the condition (*) of Lemma 3, then \[ \nbp{f} \geq \frac{(1/2-c)^m|f^{-1}(1)|}{2\ell|D|^{n-m}} \geq \frac{(1/2-c)^{cn}}{4\ell q}\cdot |D|^{(c-\delta)n} \,. \] Assume now that $q=|D|=3$. To ensure the condition (*), take $c:=0.01$, $\delta:=c/300=1/30000$, $\ell:=3000$ and $d=\ell^2$. By factoring out $n$ in the condition (*), it is enough to show that \[ |D|^{\delta d-5rc-1} > (4e/c)^{5c} \ \ \mbox{ that is }\ \ |D|^{300-150-1} > 1200^{0.05} =1.42547 ... \] which clearly holds for any $|D|\geq 2$. Thus, the condition (*) is fulfilled, and the lower bound for $|D|=3$ is \[ \nbp{f} \geq \frac{(1/2-c)^{cn}}{4\ell q}\cdot |D|^{(c-\delta)n} \geq \frac{0.449^{0.01 n}}{12(3000)^2}\cdot 3^{0.009 n} > \frac{1.0018^n}{12(3000)^2} =2^{\Omega(n)}\,. \] This lower bound is for a function $f(x)=\poly_u(x)$ where polynomial $u$ is not specified: we have only shown that a polynomial $u$ with large $\nbp{f}$ exists. To get an explicit bound, just consider the function $\poly(u,x):=\poly_u(x)$ of $N=dn+n$ variables, the first $dn$ of which specify the polynomial $u$ as a sequence of its $d$ coefficients over the field $GF(3^n)$, each being a vector of length $n$ in $D=GF(3)$. Then the "hard" function $\poly_u(x)$ is a subfunction of $\poly(u,x)$ obtained by fixing the coefficients of $u$. Thus, every 1NBP for the explicit function $\poly(u,x)$ of $N=O(n)$ variables must also have size $2^{\Omega(n)}$.

Reference:

S. Cook, J. Edmonds, V. Medabalimi, and T. Pitassi, Lower Bounds for Nondeterministic Semantic Read-Once Branching Programs, available at Edmonds' home page. Local copy is here.