
Construction of Neural Networks

Single Hidden Layer Neural Network

Let $x \in \mathbb{R}^n$, and let $\sigma: \mathbb{R} \to \mathbb{R}$ be a function. A single hidden layer neural network is defined as a finite linear combination of the form

$$\Phi(x) = \sum_{j=1}^{M} \alpha_j \, \sigma(w_j^\top x + \beta_j),$$

where $\alpha_j \in \mathbb{R}$ are the output weights, $w_j \in \mathbb{R}^n$ are the input weights, $\beta_j \in \mathbb{R}$ are the biases, and $M \in \mathbb{N}$ is the number of neurons.

Then the set of all single-hidden-layer neural networks with $M$ hidden units is

$$\mathcal{N}^{(M)}_n(\sigma) = \left\{ \Phi: \mathbb{R}^n \to \mathbb{R} \,\middle|\, \Phi(x) = \sum_{j=1}^{M} \alpha_j \, \sigma(w_j^\top x + \beta_j)\right\}$$

The set of all single-hidden-layer neural networks with an arbitrarily large number of hidden units is then

$$\mathcal{N}_n(\sigma)=\bigcup_{m=1}^{\infty}\mathcal{N}^{(m)}_n(\sigma)$$
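
The definition above leaves $\sigma$ unspecified. As a minimal, purely illustrative sketch, the following NumPy snippet evaluates a member of $\mathcal{N}^{(M)}_n(\sigma)$ using $\tanh$ as a stand-in activation; the function name and parameter values are assumptions made only for this example.

```python
import numpy as np

def single_hidden_layer(x, alpha, W, beta, sigma=np.tanh):
    """Phi(x) = sum_j alpha_j * sigma(w_j^T x + beta_j).

    W has shape (M, n) with row j equal to w_j; alpha and beta have shape (M,).
    """
    return float(np.sum(alpha * sigma(W @ x + beta)))

# A tiny example with n = 2 inputs and M = 3 hidden neurons (arbitrary values).
x = np.array([0.5, -1.0])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
beta = np.array([0.1, -0.2, 0.3])
alpha = np.array([2.0, -1.0, 0.5])
print(single_hidden_layer(x, alpha, W, beta))
```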

Multilayer Neural Networks

The definition of a neural network can be extended beyond a single hidden layer. A multilayer network is built by composing several affine transformations, each followed by an activation function.

Example of two hidden layers

The following function defines a neural network with two hidden layers and a scalar output (a small numerical sketch in Python follows the parameter list below):

$$\Phi(x) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \sigma^{(2)} \left( \sum_{j=1}^{M^{(1)}} \alpha^{(1)}_{k j} \sigma^{(1)}\left(w_j^\top x + \beta^{(1)}_j\right) + \beta^{(2)}_k \right)$$
  • $x \in \mathbb{R}^n$ is the input vector,
  • $\sigma^{(1)}$, $\sigma^{(2)}$ are the activation functions of the first and second hidden layers, respectively,
  • $w_j \in \mathbb{R}^n$, $\beta^{(1)}_j \in \mathbb{R}$: input weights and biases of the first hidden layer,
  • $\alpha^{(1)}_{k j} \in \mathbb{R}$, $\beta^{(2)}_k \in \mathbb{R}$: input weights and biases of the second hidden layer,
  • $\alpha_k^{(2)} \in \mathbb{R}$: output weights from the second hidden layer to the output,
  • $M^{(1)}$, $M^{(2)}$ are the number of neurons in the first and second hidden layers, respectively.
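
To make the nested sums concrete, here is a rough Python sketch that evaluates the formula literally with explicit loops; $\tanh$ stands in for both $\sigma^{(1)}$ and $\sigma^{(2)}$ since no activation is prescribed, and every name and numeric value is illustrative only.

```python
import math

def two_hidden_layer_net(x, w, beta1, a1, beta2, alpha2,
                         sigma1=math.tanh, sigma2=math.tanh):
    """Literal translation of the double-sum formula above.

    w[j]      : input weights w_j (a list of length n for each j)
    beta1[j]  : first-layer biases beta^(1)_j
    a1[k][j]  : second-layer weights alpha^(1)_{kj}
    beta2[k]  : second-layer biases beta^(2)_k
    alpha2[k] : output weights alpha^(2)_k
    """
    out = 0.0
    for k in range(len(alpha2)):                      # outer sum over second-layer neurons
        inner = 0.0
        for j in range(len(beta1)):                   # inner sum over first-layer neurons
            pre1 = sum(w[j][i] * x[i] for i in range(len(x))) + beta1[j]
            inner += a1[k][j] * sigma1(pre1)
        out += alpha2[k] * sigma2(inner + beta2[k])
    return out

# n = 2 inputs, M1 = 2 and M2 = 2 neurons (arbitrary illustrative values).
x = [0.5, -1.0]
w = [[1.0, 0.0], [0.0, 1.0]]
beta1 = [0.1, -0.2]
a1 = [[1.0, -1.0], [0.5, 2.0]]
beta2 = [0.0, 0.3]
alpha2 = [1.0, -0.5]
print(two_hidden_layer_net(x, w, beta1, a1, beta2, alpha2))
```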

We can reinterpret the previous example recursively by defining each layer as a function. To do this, let $\Phi^{(1)}: \mathbb{R}^n \to \mathbb{R}^{M^{(1)}}$ be the first hidden layer, defined as:

$$\Phi^{(1)}(x) = \left[\sigma^{(1)}\left(w_1^\top x + \beta^{(1)}_1\right), \, \dots, \, \sigma^{(1)}\left(w_{M^{(1)}}^\top x + \beta^{(1)}_{M^{(1)}}\right)\right]$$

and let $\Phi^{(2)}: \mathbb{R}^{M^{(1)}} \to \mathbb{R}$ be the second hidden layer (connected to the output), defined as:

$$\Phi^{(2)}(z) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \, \sigma^{(2)} \left( \sum_{j=1}^{M^{(1)}} \alpha^{(1)}_{k j} z_j + \beta^{(2)}_k \right)$$

If we define the weight vector of the $k$-th neuron in the second layer as

$$\boldsymbol{\alpha}^{(1)}_k = \left( \alpha^{(1)}_{k1}, \, \dots, \, \alpha^{(1)}_{k M^{(1)}} \right) \in \mathbb{R}^{M^{(1)}},$$

then the expression for $\Phi^{(2)}$ becomes:

$$\Phi^{(2)}(z) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \, \sigma^{(2)} \left( (\boldsymbol{\alpha}^{(1)}_k)^\top z + \beta^{(2)}_k \right)$$

Finally, the full network is the composition:

$$\Phi(x) = \left(\Phi^{(2)}\circ\Phi^{(1)}\right)(x)$$
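
A minimal NumPy sketch of this decomposition, again with $\tanh$ standing in for both activations and arbitrary parameter values: phi1 and phi2 play the roles of $\Phi^{(1)}$ and $\Phi^{(2)}$, and the full network is their composition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M1, M2 = 2, 3, 2                 # illustrative layer sizes

# Phi^(1): R^n -> R^{M1}, the first hidden layer.
W, beta1 = rng.normal(size=(M1, n)), rng.normal(size=M1)
phi1 = lambda x: np.tanh(W @ x + beta1)

# Phi^(2): R^{M1} -> R, the second hidden layer together with the output weights.
A1, beta2, alpha2 = rng.normal(size=(M2, M1)), rng.normal(size=M2), rng.normal(size=M2)
phi2 = lambda z: alpha2 @ np.tanh(A1 @ z + beta2)

# The full network is the composition Phi = Phi^(2) o Phi^(1).
phi = lambda x: phi2(phi1(x))
print(phi(np.array([0.5, -1.0])))
```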

In the same way that this construction defines a two-hidden-layer network, it can be extended recursively to define a neural network with any number of hidden layers. For $l$ layers we have

$$\Phi(x) = \left(\Phi^{(l)} \circ \cdots \circ \Phi^{(2)} \circ \Phi^{(1)}\right)(x).$$
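
Keeping the layer-as-function viewpoint, here is a short sketch of how $l$ layers can be composed programmatically; layer sizes, the $\tanh$ activation, and all values are again illustrative assumptions.

```python
import numpy as np
from functools import reduce

def make_layer(A, b, sigma=np.tanh):
    """Return the map z -> sigma(A @ z + b) computed by one layer."""
    return lambda z: sigma(A @ z + b)

def compose(layers):
    """Apply the layer maps in order: x -> Phi^(l)( ... Phi^(1)(x) ... )."""
    return lambda x: reduce(lambda z, layer: layer(z), layers, x)

# Three illustrative layers with widths 2 -> 4 -> 3 -> 1.
rng = np.random.default_rng(1)
widths = [2, 4, 3, 1]
layers = [make_layer(rng.normal(size=(w_out, w_in)), rng.normal(size=w_out))
          for w_in, w_out in zip(widths[:-1], widths[1:])]
phi = compose(layers)
print(phi(np.array([0.5, -1.0])))    # output of the l = 3 layer network
```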

Multioutput Neural Networks

The notion of a neural network can be naturally extended to functions with multiple outputs. A single hidden layer multioutput neural network is a function $\Phi: \mathbb{R}^n \to \mathbb{R}^m$, where $m \geq 1$, defined by the matrix expression below (a minimal code sketch follows the parameter list):

$$\Phi(x) = A \, \sigma\left(W x + B\right),$$

where:

  • $x \in \mathbb{R}^n$ is the input vector and $\Phi(x)\in\mathbb{R}^m$ is the output,
  • $W \in \mathbb{R}^{q \times n}$: weights from the input to the hidden layer,
  • $A \in \mathbb{R}^{m \times q}$: weights from the hidden layer to the output layer,
  • $B \in \mathbb{R}^q$: biases of the hidden layer,
  • $\sigma: \mathbb{R} \to \mathbb{R}$ is the activation function, extended component-wise to $\mathbb{R}^q$,
  • $q$ is the number of hidden neurons.
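
A minimal matrix-form sketch under the same assumptions as before ($\tanh$ as a stand-in for $\sigma$, arbitrary illustrative sizes and values):

```python
import numpy as np

def multioutput_network(x, A, W, B, sigma=np.tanh):
    """Phi(x) = A sigma(W x + B) with W: (q, n), B: (q,), A: (m, q)."""
    return A @ sigma(W @ x + B)

# Illustrative sizes n = 2, q = 3, m = 2 with arbitrary parameters.
rng = np.random.default_rng(2)
W, B, A = rng.normal(size=(3, 2)), rng.normal(size=3), rng.normal(size=(2, 3))
print(multioutput_network(np.array([0.5, -1.0]), A, W, B))   # a vector in R^2
```
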
Example: Multioutput Neural Network from $\mathbb{R}^3$ to $\mathbb{R}^2$ with 4 hidden neurons

Consider a single-hidden-layer multioutput neural network $\Phi(x)$ where:

  • $x \in \mathbb{R}^3$ is the input vector and $\Phi(x)\in\mathbb{R}^2$,
  • $W \in \mathbb{R}^{4 \times 3}$ is the weight matrix from the input to the hidden layer,
  • $A \in \mathbb{R}^{2 \times 4}$ is the weight matrix from the hidden layer to the output,
  • $B \in \mathbb{R}^4$ is the bias vector of the hidden layer,
  • $\sigma$ is applied component-wise to $Wx + B$.

Explicitly:

$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{bmatrix}, \quad B = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix}$$

The vector $Wx + B$ is:

$$\begin{bmatrix} w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + \beta_1 \\ w_{21} x_1 + w_{22} x_2 + w_{23} x_3 + \beta_2 \\ w_{31} x_1 + w_{32} x_2 + w_{33} x_3 + \beta_3 \\ w_{41} x_1 + w_{42} x_2 + w_{43} x_3 + \beta_4 \end{bmatrix}$$

After applying the activation function component-wise:

$$\sigma(Wx+B) = \begin{bmatrix} \sigma(w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + \beta_1) \\ \sigma(w_{21} x_1 + w_{22} x_2 + w_{23} x_3 + \beta_2) \\ \sigma(w_{31} x_1 + w_{32} x_2 + w_{33} x_3 + \beta_3) \\ \sigma(w_{41} x_1 + w_{42} x_2 + w_{43} x_3 + \beta_4) \end{bmatrix} = \begin{bmatrix} \sigma_{1} \\ \sigma_{2} \\ \sigma_{3} \\ \sigma_{4} \end{bmatrix}$$

Now let the output weight matrix be:

$$A = \begin{bmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} & \alpha_{24} \end{bmatrix}$$

Then the output of the network is:

$$\Phi(x) = \begin{bmatrix} \alpha_{11} \sigma_{1} + \alpha_{12} \sigma_{2} + \alpha_{13} \sigma_{3} + \alpha_{14} \sigma_{4} \\ \alpha_{21} \sigma_{1} + \alpha_{22} \sigma_{2} + \alpha_{23} \sigma_{3} + \alpha_{24} \sigma_{4} \end{bmatrix}$$

Or in fully expanded summation form:

$$\Phi(x) = \begin{bmatrix} \displaystyle \sum_{j=1}^{4} \alpha_{1j} \, \sigma\left( \sum_{k=1}^{3} w_{jk} x_k + \beta_j \right) \\\\ \displaystyle \sum_{j=1}^{4} \alpha_{2j} \, \sigma\left( \sum_{k=1}^{3} w_{jk} x_k + \beta_j \right) \end{bmatrix}$$
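
As a quick illustrative check, the following sketch evaluates this $\mathbb{R}^3 \to \mathbb{R}^2$ example both in matrix form and in the expanded summation form and confirms that they coincide; $\tanh$ and the random parameter values are assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = np.tanh                       # illustrative choice; the text leaves sigma unspecified

# Arbitrary parameters with the shapes used above: W in R^{4x3}, B in R^4, A in R^{2x4}.
W = rng.normal(size=(4, 3))
B = rng.normal(size=4)
A = rng.normal(size=(2, 4))
x = np.array([0.5, -1.0, 2.0])

# Matrix form: Phi(x) = A sigma(W x + B).
phi_matrix = A @ sigma(W @ x + B)

# Fully expanded summation form, component by component.
phi_sum = np.array([
    sum(A[i, j] * sigma(sum(W[j, k] * x[k] for k in range(3)) + B[j]) for j in range(4))
    for i in range(2)
])

print(phi_matrix)
print(np.allclose(phi_matrix, phi_sum))   # True: the two forms agree
```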
Note: A multioutput neural network with several hidden layers can be constructed using function composition as described in the Multilayer Neural Networks section, with each layer represented in matrix form.

References

While the definition of neural networks used here is quite general and does not specify any particular activation function $\sigma$, it is worth noting that the foundational work [1] focused specifically on sigmoid activation functions.

For the construction of multilayer neural networks, see [2].

  1. G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, 1989.

  2. K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.