\( \def\<#1>{\left<#1\right>} \newcommand{\tens}[1]{\mathsf{#1}} \newcommand{\g}{\tens{C}} \newcommand{\h}{\tens{h}} \newcommand{\C}{\tens{C}} \newcommand{\err}{\epsilon} \newcommand{\euler}{\mathrm{e}} \newcommand{\maj}{\mathrm{Maj}} \newcommand{\prob}[1]{\mathrm{Pr}\left\{#1\right\}} \newcommand{\NN}{\mathbb{N}} \newcommand{\RR}{\mathbb{R}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\f}{{\cal F}} \newcommand{\pprob}{\mathrm{Pr}} \newcommand{\Z}[2]{Z_{#2}(#1)} \newcommand{\vc}[1]{\mathrm{dim}(#1)} \newcommand{\VC}[1]{\mathrm{VC}(#1)} \newcommand{\pred}[1]{\left[#1\right]} \)

A general upper bound on the VC dimension

We will need the following neat upper bound on the number of zero-patterns of polynomials. Let $R$ be an arbitrary field. A zero-pattern of a sequence $(p_1,\ldots,p_m)$ of $m$ polynomials in $R[x_1,\ldots,x_n]$ is a subset $S\subseteq \{1,\ldots,m\}$ for which there exists an input $x\in R^n$ such that $p_i(x)= 0$ if and only if $i\in S$.
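To illustrate the definition, take $n=1$, $m=2$ and the real polynomials $p_1(x)=x$ and $p_2(x)=x-1$. Exactly three zero-patterns occur: $\emptyset$ (witnessed by $x=2$), $\{1\}$ (witnessed by $x=0$) and $\{2\}$ (witnessed by $x=1$); the pattern $\{1,2\}$ is impossible, since $p_1$ and $p_2$ have no common root.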
Theorem 1 [Ronyai, Babai and Ganapathy]: Let $R$ be a field. Then no sequence of $m$ polynomials of degree at most $d$ in $n$ variables can have more than $\binom{md}{n}\leq (\euler md/n)^n$ zero-patterns.
Proof: To stress the simplicity of their argument, we will only show a slightly worse upper bound $\binom{n+md}{n}$. Assume that a sequence $(f_1,\ldots,f_m)$ of such polynomials has $p$ different zero-patterns, and let $v_1,\ldots,v_p\in R^n$ be witnesses to these zero-patterns. Since distinct zero-patterns have distinct complements, we may count the complements instead: let $S_i=\{k~\colon~f_k(v_i)\neq 0\}$ be the set of indices of the polynomials not vanishing at the $i$-th witness $v_i$, and consider the polynomials $g_i:=\prod_{k\in S_i}f_k$. We claim that these polynomials are linearly independent over our field. This claim completes the proof of the theorem, since each $g_i$ has degree at most $D:=md$, and the dimension of the space of polynomials of degree at most $D$ in $n$ variables is exactly $\binom{n+D}{n}$. To prove the claim, first note that $g_i(v_j)\neq 0$ if and only if $S_i\subseteq S_j$. Now suppose, for the sake of contradiction, that a nontrivial linear relation $\lambda_1 g_1(x)+\cdots+\lambda_p g_p(x)=0$ exists. Let $j$ be a subscript such that $|S_j|$ is minimal among the $S_i$ with $\lambda_i\neq 0$, and substitute $v_j$ into the relation. Then $\lambda_jg_j(v_j)\neq 0$, whereas $\lambda_ig_i(v_j)=0$ for every other $i$ with $\lambda_i\neq 0$: the sets $S_i$ are distinct, so $S_i\subseteq S_j$ would force $|S_i|<|S_j|$, contradicting the minimality of $|S_j|$. This contradicts the relation, and the claim follows. $\Box$
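As a quick sanity check of Theorem 1, one can count zero-patterns by brute force for concrete polynomials and compare with both bounds. The following small sketch of my own (not part of the argument; the particular polynomials over the finite field GF(5) are arbitrary choices) does exactly that.

```python
# Brute-force count of zero-patterns over the finite field GF(5); the chosen
# polynomials are arbitrary examples of degree <= 2 in n = 2 variables.
from itertools import product
from math import comb, e

q = 5          # field size
n = 2          # number of variables
d = 2          # maximal degree
polys = [      # m = 4 polynomials, evaluated modulo q
    lambda x: (x[0] * x[1]) % q,
    lambda x: (x[0] ** 2 + x[1]) % q,
    lambda x: (x[0] + x[1] + 1) % q,
    lambda x: (x[1] ** 2 + 3) % q,
]
m = len(polys)

# The zero-pattern at a point v is the set of indices i with p_i(v) = 0.
patterns = {
    frozenset(i for i, p in enumerate(polys) if p(v) == 0)
    for v in product(range(q), repeat=n)
}

print("zero-patterns found   :", len(patterns))
print("weaker bound C(n+md,n):", comb(n + m * d, n))        # bound from the proof above
print("bound (e*m*d/n)^n     :", round((e * m * d / n) ** n, 1))
```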

Let $R$ be a field. An algebraic formula $\Phi(x)$ of size $s$ and degree $d$ over $R$ is an arbitrary boolean combination of $s$ atomic predicates, each being of the form $\pred{p>0}$ or $\pred{p=0}$ or $\pred{p<0}$ for some polynomial $p\in R[x_1,\ldots,x_n]$ of degree at most $d$.
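For example, $\Phi(x_1,x_2)=\pred{x_1^2+x_2^2-1<0}\vee\pred{x_1x_2=0}$ is an algebraic formula over $\RR$ of size $s=2$ and degree $d=2$: it accepts exactly the points lying in the open unit disk or on one of the two coordinate axes.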

The following result was proved by Goldberg and Jerrum 1995 for algebraic formulas over the field of real numbers. Their proof was based on a result about sign-patterns of real polynomials, proved earlier by Warren 1967 using heavy tools from algebraic geometry. I recently realized that the much simpler Theorem 1 yields an even more general result (holding over any field).

Theorem 2: Let $R$ be a field, and let $\Phi(x,y)$ be an algebraic formula in $n+m$ variables over $R$, where $x$ ranges over $R^n$ and $y$ over $R^m$. Let $\f$ be the family of all functions $F_x\colon R^m\to\{0,1\}$ with $x\in R^n$, defined by: $F_x(y)=1$ iff $\Phi(x,y)=1$. Then $\VC{\f}\leq 2n\log(\euler ks)$, where $s$ is the size and $k$ the degree of $\Phi$, and $\log$ denotes the logarithm to base $2$.
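Before turning to the proof, let us see what the bound gives in a familiar special case: halfspaces. Take $R=\RR$, $n=m+1$, write $x=(x_0,x_1,\ldots,x_m)$, and let $\Phi(x,y)=\pred{x_0+x_1y_1+\cdots+x_my_m>0}$. This formula has size $s=1$ and degree $k=2$ (the polynomial has degree $2$ in the variables $x$ and $y$ taken together), so Theorem 2 yields $\VC{\f}\leq 2(m+1)\log(2\euler)\approx 4.9\,(m+1)$, within a constant factor of the true VC dimension $m+1$ of halfspaces in $\RR^m$.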
Proof: Recall that $s$ is the size of the formula $\Phi$; hence, $\Phi$ contains at most $s$ distinct polynomials. We can assume that all the atomic predicates are of the form $\pred{p=0}$ or $\pred{p>0}$: just replace each predicate $\pred{p<0}$ by the predicate $\pred{-p>0}$. What we achieve by this assumption is that now the value of a predicate defined by a polynomial $p$ only depends on whether $p=0$ or not. The size of the formula (the number of distinct atomic predicates) remains unchanged. Let $v=\VC{\f}$ be the VC dimension of $\f$. Then there are $v$ vectors $a_1,\ldots,a_v$ in $R^m$ such that \[ \Big\{\big(\Phi(x,a_1),\ldots,\Phi(x,a_v)\big)\colon x\in R^n\Big\}=\{0,1\}^v \ \ \ \ (*) \] Let $p_1(x),\ldots,p_t(x)$ be the polynomials contained in at least one of the formulas $\Phi(x,a_i)$; so $t\leq v\cdot s$, and each of them is a polynomial in $n$ variables of degree at most $k$. In order to fulfill (*), the polynomials $p_1,\ldots,p_t$ must have at least $2^v$ zero-patterns. On the other hand, Theorem 1 implies that they cannot have more than $(\euler kt/n)^n$ zero-patterns. This yields the inequality $2^v\leq (\euler kt/n)^n\leq (\euler kvs/n)^n$. By taking logarithms, we obtain that $v$ cannot be larger than $n\log(\euler ks) + n\log(v/n)$. If $\euler ks\geq v/n$, then $\log(v/n)\leq\log(\euler ks)$, and hence $v\leq 2 n\log(\euler ks)$. If $\euler ks \leq v/n$, then $v\leq 2n\log(v/n)$, or equivalently $v/n\leq 2\log (v/n)$, implying that $v/n\leq 4$, and hence also $v\leq 4n$. Thus, in both cases we have $v\leq 2 n\log(\euler ks)$, as desired. $\Box$
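The final calculation can also be checked numerically. Here is a small sketch of my own (the parameter triples are arbitrary choices) that, for given $n$, $k$ and $s$, scans for the largest $v$ satisfying $2^v\leq(\euler kvs/n)^n$ and prints it next to the closed-form bound $2n\log(\euler ks)$:

```python
# Compare the largest v with 2^v <= (e*k*v*s/n)^n against 2*n*log2(e*k*s)
# for a few arbitrary parameter choices (n = #parameters, k = degree, s = size).
from math import e, log2

def largest_v(n, k, s, v_limit=1000):
    """Largest v in [1, v_limit) with v <= n*log2(e*k*v*s/n), found by scanning."""
    return max((v for v in range(1, v_limit)
                if v <= n * log2(e * k * v * s / n)), default=0)

for n, k, s in [(2, 2, 5), (5, 1, 10), (10, 3, 20)]:
    print(f"n={n:2d} k={k} s={s:2d}  largest v = {largest_v(n, k, s):3d}"
          f"  bound 2n*log2(eks) = {2 * n * log2(e * k * s):.1f}")
```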
S. Jukna, March 28, 2017