Let $x \in \mathbb{R}^n$, and let $\sigma : \mathbb{R} \to \mathbb{R}$ be a function. A single-hidden-layer neural network is a finite linear combination of the form
\[
\Phi(x) = \sum_{j=1}^{M} \alpha_j \,\sigma\bigl(w_j^\top x + b_j\bigr),
\]
where $\alpha_j \in \mathbb{R}$ are the output weights, $w_j \in \mathbb{R}^n$ are the input weights, $b_j \in \mathbb{R}$ are the biases, and $M \in \mathbb{N}$ is the number of hidden units.
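As a concrete illustration, the definition above can be evaluated directly. The following is a minimal sketch using NumPy, with $\tanh$ as an example activation; the function name `phi` and the stacking of the $w_j$ into a matrix `W` are illustrative choices, not part of the definition.

```python
import numpy as np

def phi(x, alpha, W, b, sigma=np.tanh):
    """Evaluate Phi(x) = sum_j alpha_j * sigma(w_j^T x + b_j).

    x:     input vector, shape (n,)
    alpha: output weights, shape (M,)
    W:     input weights stacked row-wise, shape (M, n)
    b:     biases, shape (M,)
    sigma: elementwise activation function
    """
    # W @ x + b computes all M pre-activations w_j^T x + b_j at once;
    # alpha @ sigma(...) forms the linear combination over hidden units.
    return alpha @ sigma(W @ x + b)
```

For example, `phi(np.zeros(2), np.array([1.0, -1.0]), np.eye(2), np.zeros(2))` evaluates a network with $n = 2$ and $M = 2$ at the origin.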
Then the set of all single-hidden-layer neural networks with M hidden units is
\[
\mathcal{N}_n^{(M)}(\sigma) = \Bigl\{ \Phi : \mathbb{R}^n \to \mathbb{R} \;\Bigm|\; \Phi(x) = \sum_{j=1}^{M} \alpha_j \,\sigma\bigl(w_j^\top x + b_j\bigr),\ \alpha_j, b_j \in \mathbb{R},\ w_j \in \mathbb{R}^n \Bigr\}.
\]
Now the set of all single-hidden-layer neural networks with an arbitrarily large number of hidden units is
\[
\mathcal{N}_n(\sigma) = \bigcup_{m=1}^{\infty} \mathcal{N}_n^{(m)}(\sigma).
\]
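A useful consequence of this definition is that the classes are nested: any $\Phi \in \mathcal{N}_n^{(m)}(\sigma)$ also lies in $\mathcal{N}_n^{(m+1)}(\sigma)$, since appending a hidden unit with output weight $\alpha_{m+1} = 0$ leaves the function unchanged. The sketch below checks this numerically; the helper `phi` and the specific weights are illustrative assumptions.

```python
import numpy as np

def phi(x, alpha, W, b, sigma=np.tanh):
    # Evaluate Phi(x) = sum_j alpha_j * sigma(w_j^T x + b_j).
    return alpha @ sigma(W @ x + b)

# A network in N_1^(2): two hidden units, scalar input (n = 1).
alpha2 = np.array([1.0, -0.5])
W2 = np.array([[1.0], [2.0]])
b2 = np.array([0.0, 1.0])

# The same function viewed in N_1^(3): append a unit with zero output weight.
alpha3 = np.append(alpha2, 0.0)
W3 = np.vstack([W2, [[3.0]]])
b3 = np.append(b2, -1.0)

x = np.array([0.7])
# Both parameterizations define the same function value at x.
assert np.isclose(phi(x, alpha2, W2, b2), phi(x, alpha3, W3, b3))
```

This nesting is what makes the union over $m$ well behaved: $\mathcal{N}_n(\sigma)$ is an increasing union, so membership only requires some finite width $m$.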