
Construction of Neural Networks

Single Hidden Layer Neural Network

Let $x \in \mathbb{R}^n$, and let $\sigma: \mathbb{R} \to \mathbb{R}$ be a function. A single hidden layer neural network is defined as a finite linear combination of the form

$$\Phi(x) = \sum_{j=1}^{M} \alpha_j \, \sigma(w_j^\top x + \beta_j),$$

where $\alpha_j \in \mathbb{R}$ are the output weights, $w_j \in \mathbb{R}^n$ are the input weights, $\beta_j \in \mathbb{R}$ are the biases, and $M \in \mathbb{N}$ is the number of neurons.

Then the set of all single-hidden-layer neural networks with $M$ hidden units is

$$\mathcal{N}^{(M)}_n(\sigma) = \left\{ \Phi: \mathbb{R}^n \to \mathbb{R} \,\middle|\, \Phi(x) = \sum_{j=1}^{M} \alpha_j \, \sigma(w_j^\top x + \beta_j)\right\}$$

The set of all single-hidden-layer neural networks with an arbitrarily large number of hidden units is then

$$\mathcal{N}_n(\sigma)=\bigcup_{m=1}^{\infty}\mathcal{N}^{(m)}_n(\sigma)$$
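
The definition above leaves $\sigma$ unspecified. As a minimal, purely illustrative sketch, the following NumPy snippet evaluates a member of $\mathcal{N}^{(M)}_n(\sigma)$ using $\tanh$ as a stand-in activation; the function name and parameter values are assumptions made only for this example.

```python
import numpy as np

def single_hidden_layer(x, alpha, W, beta, sigma=np.tanh):
    """Phi(x) = sum_j alpha_j * sigma(w_j^T x + beta_j).

    W has shape (M, n) with row j equal to w_j; alpha and beta have shape (M,).
    """
    return float(np.sum(alpha * sigma(W @ x + beta)))

# A tiny example with n = 2 inputs and M = 3 hidden neurons (arbitrary values).
x = np.array([0.5, -1.0])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
beta = np.array([0.1, -0.2, 0.3])
alpha = np.array([2.0, -1.0, 0.5])
print(single_hidden_layer(x, alpha, W, beta))
```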

Multilayer Neural Networks

The definition of a neural network can be extended beyond a single hidden layer. A multilayer network is built by composing several affine transformations, each followed by an activation function.

Example of two hidden layers

The following function defines a neural network with two hidden layers and a scalar output (a small numerical sketch in Python follows the parameter list below):

$$\Phi(x) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \sigma^{(2)} \left( \sum_{j=1}^{M^{(1)}} \alpha^{(1)}_{k j} \sigma^{(1)}\left(w_j^\top x + \beta^{(1)}_j\right) + \beta^{(2)}_k \right)$$
  • $x \in \mathbb{R}^n$ is the input vector,
  • $\sigma^{(1)}$, $\sigma^{(2)}$ are the activation functions of the first and second hidden layers, respectively,
  • $w_j \in \mathbb{R}^n$, $\beta^{(1)}_j \in \mathbb{R}$: input weights and biases of the first hidden layer,
  • $\alpha^{(1)}_{k j} \in \mathbb{R}$, $\beta^{(2)}_k \in \mathbb{R}$: input weights and biases of the second hidden layer,
  • $\alpha_k^{(2)} \in \mathbb{R}$: output weights from the second hidden layer to the output,
  • $M^{(1)}$, $M^{(2)}$ are the number of neurons in the first and second hidden layers, respectively.
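
To make the nested sums concrete, here is a rough Python sketch that evaluates the formula literally with explicit loops; $\tanh$ stands in for both $\sigma^{(1)}$ and $\sigma^{(2)}$ since no activation is prescribed, and every name and numeric value is illustrative only.

```python
import math

def two_hidden_layer_net(x, w, beta1, a1, beta2, alpha2,
                         sigma1=math.tanh, sigma2=math.tanh):
    """Literal translation of the double-sum formula above.

    w[j]      : input weights w_j (a list of length n for each j)
    beta1[j]  : first-layer biases beta^(1)_j
    a1[k][j]  : second-layer weights alpha^(1)_{kj}
    beta2[k]  : second-layer biases beta^(2)_k
    alpha2[k] : output weights alpha^(2)_k
    """
    out = 0.0
    for k in range(len(alpha2)):                      # outer sum over second-layer neurons
        inner = 0.0
        for j in range(len(beta1)):                   # inner sum over first-layer neurons
            pre1 = sum(w[j][i] * x[i] for i in range(len(x))) + beta1[j]
            inner += a1[k][j] * sigma1(pre1)
        out += alpha2[k] * sigma2(inner + beta2[k])
    return out

# n = 2 inputs, M1 = 2 and M2 = 2 neurons (arbitrary illustrative values).
x = [0.5, -1.0]
w = [[1.0, 0.0], [0.0, 1.0]]
beta1 = [0.1, -0.2]
a1 = [[1.0, -1.0], [0.5, 2.0]]
beta2 = [0.0, 0.3]
alpha2 = [1.0, -0.5]
print(two_hidden_layer_net(x, w, beta1, a1, beta2, alpha2))
```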

We can reinterpret the previous example recursively by defining each layer as a function. To do this, let $\Phi^{(1)}: \mathbb{R}^n \to \mathbb{R}^{M^{(1)}}$ be the first hidden layer, defined as:

$$\Phi^{(1)}(x) = \left[\sigma^{(1)}\left(w_1^\top x + \beta^{(1)}_1\right), \, \dots, \, \sigma^{(1)}\left(w_{M^{(1)}}^\top x + \beta^{(1)}_{M^{(1)}}\right)\right]$$

and let $\Phi^{(2)}: \mathbb{R}^{M^{(1)}} \to \mathbb{R}$ be the second hidden layer (connected to the output), defined as:

$$\Phi^{(2)}(z) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \, \sigma^{(2)} \left( \sum_{j=1}^{M^{(1)}} \alpha^{(1)}_{k j} z_j + \beta^{(2)}_k \right)$$

If we define the weight vector of the $k$-th neuron in the second layer as

$$\boldsymbol{\alpha}^{(1)}_k = \left( \alpha^{(1)}_{k1}, \, \dots, \, \alpha^{(1)}_{k M^{(1)}} \right) \in \mathbb{R}^{M^{(1)}},$$

then the expression for $\Phi^{(2)}$ becomes:

$$\Phi^{(2)}(z) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \, \sigma^{(2)} \left( (\boldsymbol{\alpha}^{(1)}_k)^\top z + \beta^{(2)}_k \right)$$

Finally, the full network is the composition:

$$\Phi(x) = \left(\Phi^{(2)}\circ\Phi^{(1)}\right)(x)$$
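
A minimal NumPy sketch of this decomposition, again with $\tanh$ standing in for both activations and arbitrary parameter values: phi1 and phi2 play the roles of $\Phi^{(1)}$ and $\Phi^{(2)}$, and the full network is their composition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M1, M2 = 2, 3, 2                 # illustrative layer sizes

# Phi^(1): R^n -> R^{M1}, the first hidden layer.
W, beta1 = rng.normal(size=(M1, n)), rng.normal(size=M1)
phi1 = lambda x: np.tanh(W @ x + beta1)

# Phi^(2): R^{M1} -> R, the second hidden layer together with the output weights.
A1, beta2, alpha2 = rng.normal(size=(M2, M1)), rng.normal(size=M2), rng.normal(size=M2)
phi2 = lambda z: alpha2 @ np.tanh(A1 @ z + beta2)

# The full network is the composition Phi = Phi^(2) o Phi^(1).
phi = lambda x: phi2(phi1(x))
print(phi(np.array([0.5, -1.0])))
```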

In the same way that this construction defines a two-hidden-layer network, it can be extended recursively to define a neural network with any number of hidden layers. For $l$ layers we have

$$\Phi(x) = \left(\Phi^{(l)} \circ \cdots \circ \Phi^{(2)} \circ \Phi^{(1)}\right)(x).$$
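
Keeping the layer-as-function viewpoint, here is a short sketch of how $l$ layers can be composed programmatically; layer sizes, the $\tanh$ activation, and all values are again illustrative assumptions.

```python
import numpy as np
from functools import reduce

def make_layer(A, b, sigma=np.tanh):
    """Return the map z -> sigma(A @ z + b) computed by one layer."""
    return lambda z: sigma(A @ z + b)

def compose(layers):
    """Apply the layer maps in order: x -> Phi^(l)( ... Phi^(1)(x) ... )."""
    return lambda x: reduce(lambda z, layer: layer(z), layers, x)

# Three illustrative layers with widths 2 -> 4 -> 3 -> 1.
rng = np.random.default_rng(1)
widths = [2, 4, 3, 1]
layers = [make_layer(rng.normal(size=(w_out, w_in)), rng.normal(size=w_out))
          for w_in, w_out in zip(widths[:-1], widths[1:])]
phi = compose(layers)
print(phi(np.array([0.5, -1.0])))    # output of the l = 3 layer network
```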

Multioutput Neural Networks

The notion of a neural network can be naturally extended to functions with multiple outputs. A single hidden layer multioutput neural network is a function $\Phi: \mathbb{R}^n \to \mathbb{R}^m$, where $m \geq 1$, defined by the matrix expression below (a minimal code sketch follows the parameter list):

$$\Phi(x) = A \, \sigma\left(W x + B\right),$$

where:

  • $x \in \mathbb{R}^n$ is the input vector and $\Phi(x)\in\mathbb{R}^m$ is the output,
  • $W \in \mathbb{R}^{q \times n}$: weights from the input to the hidden layer,
  • $A \in \mathbb{R}^{m \times q}$: weights from the hidden layer to the output layer,
  • $B \in \mathbb{R}^q$: biases of the hidden layer,
  • $\sigma: \mathbb{R} \to \mathbb{R}$ is the activation function, extended component-wise to $\mathbb{R}^q$,
  • $q$ is the number of hidden neurons.
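
A minimal matrix-form sketch under the same assumptions as before ($\tanh$ as a stand-in for $\sigma$, arbitrary illustrative sizes and values):

```python
import numpy as np

def multioutput_network(x, A, W, B, sigma=np.tanh):
    """Phi(x) = A sigma(W x + B) with W: (q, n), B: (q,), A: (m, q)."""
    return A @ sigma(W @ x + B)

# Illustrative sizes n = 2, q = 3, m = 2 with arbitrary parameters.
rng = np.random.default_rng(2)
W, B, A = rng.normal(size=(3, 2)), rng.normal(size=3), rng.normal(size=(2, 3))
print(multioutput_network(np.array([0.5, -1.0]), A, W, B))   # a vector in R^2
```
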
Example: Multioutput Neural Network from $\mathbb{R}^3$ to $\mathbb{R}^2$ with 4 hidden neurons

Consider a single-hidden-layer multioutput neural network $\Phi(x)$ where:

  • $x \in \mathbb{R}^3$ is the input vector and $\Phi(x)\in\mathbb{R}^2$,
  • $W \in \mathbb{R}^{4 \times 3}$ is the weight matrix from the input to the hidden layer,
  • $A \in \mathbb{R}^{2 \times 4}$ is the weight matrix from the hidden layer to the output,
  • $B \in \mathbb{R}^4$ is the bias vector of the hidden layer,
  • $\sigma$ is applied component-wise to $Wx + B$.

Explicitly:

$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{bmatrix}, \quad B = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix}$$

The vector $Wx + B$ is:

$$\begin{bmatrix} w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + \beta_1 \\ w_{21} x_1 + w_{22} x_2 + w_{23} x_3 + \beta_2 \\ w_{31} x_1 + w_{32} x_2 + w_{33} x_3 + \beta_3 \\ w_{41} x_1 + w_{42} x_2 + w_{43} x_3 + \beta_4 \end{bmatrix}$$

After applying the activation function component-wise:

$$\sigma(Wx+B) = \begin{bmatrix} \sigma(w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + \beta_1) \\ \sigma(w_{21} x_1 + w_{22} x_2 + w_{23} x_3 + \beta_2) \\ \sigma(w_{31} x_1 + w_{32} x_2 + w_{33} x_3 + \beta_3) \\ \sigma(w_{41} x_1 + w_{42} x_2 + w_{43} x_3 + \beta_4) \end{bmatrix} = \begin{bmatrix} \sigma_{1} \\ \sigma_{2} \\ \sigma_{3} \\ \sigma_{4} \end{bmatrix}$$

Now let the output weight matrix be:

$$A = \begin{bmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} & \alpha_{24} \end{bmatrix}$$

Then the output of the network is:

$$\Phi(x) = \begin{bmatrix} \alpha_{11} \sigma_{1} + \alpha_{12} \sigma_{2} + \alpha_{13} \sigma_{3} + \alpha_{14} \sigma_{4} \\ \alpha_{21} \sigma_{1} + \alpha_{22} \sigma_{2} + \alpha_{23} \sigma_{3} + \alpha_{24} \sigma_{4} \end{bmatrix}$$

Or in fully expanded summation form:

$$\Phi(x) = \begin{bmatrix} \displaystyle \sum_{j=1}^{4} \alpha_{1j} \, \sigma\left( \sum_{k=1}^{3} w_{jk} x_k + \beta_j \right) \\\\ \displaystyle \sum_{j=1}^{4} \alpha_{2j} \, \sigma\left( \sum_{k=1}^{3} w_{jk} x_k + \beta_j \right) \end{bmatrix}$$
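
As a quick illustrative check, the following sketch evaluates this $\mathbb{R}^3 \to \mathbb{R}^2$ example both in matrix form and in the expanded summation form and confirms that they coincide; $\tanh$ and the random parameter values are assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = np.tanh                       # illustrative choice; the text leaves sigma unspecified

# Arbitrary parameters with the shapes used above: W in R^{4x3}, B in R^4, A in R^{2x4}.
W = rng.normal(size=(4, 3))
B = rng.normal(size=4)
A = rng.normal(size=(2, 4))
x = np.array([0.5, -1.0, 2.0])

# Matrix form: Phi(x) = A sigma(W x + B).
phi_matrix = A @ sigma(W @ x + B)

# Fully expanded summation form, component by component.
phi_sum = np.array([
    sum(A[i, j] * sigma(sum(W[j, k] * x[k] for k in range(3)) + B[j]) for j in range(4))
    for i in range(2)
])

print(phi_matrix)
print(np.allclose(phi_matrix, phi_sum))   # True: the two forms agree
```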
Note: A multioutput neural network with several hidden layers can be constructed using function composition as described in the Multilayer Neural Networks section, with each layer represented in matrix form.

References

While the definition of neural networks used here is quite general and does not specify any particular activation function $\sigma$, it is worth noting that the foundational work [1] focused specifically on sigmoid activation functions.

For the construction of multilayer neural networks, see [2].

  1. G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, 1989.

  2. K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.