The definition of a neural network can be extended beyond a single hidden layer. A multilayer network consists of several compositions of affine transformations followed by activation functions.
Example of two hidden layers
The following function defines a neural network with two hidden layers and a scalar output:

$$\Phi(x) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \, \sigma^{(2)}\left( \sum_{j=1}^{M^{(1)}} \alpha_{kj}^{(1)} \, \sigma^{(1)}\left( w_j^\top x + \beta_j^{(1)} \right) + \beta_k^{(2)} \right),$$

where:
$\sigma^{(1)}, \sigma^{(2)}$ are the activation functions of the first and second hidden layers, respectively,
$w_j \in \mathbb{R}^n$, $\beta_j^{(1)} \in \mathbb{R}$: input weights and biases of the first hidden layer,
$\alpha_{kj}^{(1)} \in \mathbb{R}$, $\beta_k^{(2)} \in \mathbb{R}$: input weights and biases of the second hidden layer,
$\alpha_k^{(2)} \in \mathbb{R}$: output weights from the second hidden layer to the output,
$M^{(1)}, M^{(2)}$ are the numbers of neurons in the first and second hidden layers, respectively.
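As a concrete sketch, the two-hidden-layer scalar network above can be evaluated with NumPy. The parameter names mirror the symbols in the definition; the function name and the choice of tanh as a placeholder activation are illustrative assumptions, not taken from the text:

```python
import numpy as np

def two_layer_network(x, w, beta1, alpha1, beta2, alpha2,
                      sigma1=np.tanh, sigma2=np.tanh):
    """Scalar-output network with two hidden layers.

    x      : input vector in R^n
    w      : (M1, n) matrix whose rows are the vectors w_j
    beta1  : (M1,) biases beta_j^(1) of the first hidden layer
    alpha1 : (M2, M1) weights alpha_kj^(1) of the second hidden layer
    beta2  : (M2,) biases beta_k^(2) of the second hidden layer
    alpha2 : (M2,) output weights alpha_k^(2)
    """
    h1 = sigma1(w @ x + beta1)        # first hidden layer, vector in R^{M1}
    h2 = sigma2(alpha1 @ h1 + beta2)  # second hidden layer, vector in R^{M2}
    return alpha2 @ h2                # scalar output
```

Each line corresponds to one level of the nested sums in the formula: the inner sum over $j$ is `w @ x`, the sum over $j$ inside $\sigma^{(2)}$ is `alpha1 @ h1`, and the outer sum over $k$ is the final dot product.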
We can reinterpret the previous example recursively by defining each layer as a function. To do this, let $\Phi^{(1)} : \mathbb{R}^n \to \mathbb{R}^{M^{(1)}}$ be the first hidden layer, defined as:

$$\Phi^{(1)}(x) = \left( \sigma^{(1)}\left( w_1^\top x + \beta_1^{(1)} \right), \ldots, \sigma^{(1)}\left( w_{M^{(1)}}^\top x + \beta_{M^{(1)}}^{(1)} \right) \right).$$
If we define the weight vector of the $k$-th neuron in the second layer as

$$\alpha_k^{(1)} = \left( \alpha_{k1}^{(1)}, \ldots, \alpha_{kM^{(1)}}^{(1)} \right) \in \mathbb{R}^{M^{(1)}},$$
then the second layer $\Phi^{(2)} : \mathbb{R}^{M^{(1)}} \to \mathbb{R}$ can be written as:
$$\Phi^{(2)}(z) = \sum_{k=1}^{M^{(2)}} \alpha_k^{(2)} \, \sigma^{(2)}\left( \left( \alpha_k^{(1)} \right)^\top z + \beta_k^{(2)} \right).$$
Finally, the full network is the composition:
$$\Phi(x) = \left( \Phi^{(2)} \circ \Phi^{(1)} \right)(x).$$
In the same way this construction defines a two-hidden-layer network, it can be extended recursively to define a neural network with any number of hidden layers. For $l$ layers we would have

$$\Phi(x) = \left( \Phi^{(l)} \circ \Phi^{(l-1)} \circ \cdots \circ \Phi^{(1)} \right)(x).$$
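A minimal NumPy sketch of this recursive construction: each layer is the map $z \mapsto \sigma(Az + b)$, and the full network is the composition of such maps. The helper names `make_layer` and `compose` are illustrative assumptions, not from the text:

```python
import numpy as np

def make_layer(A, b, sigma=np.tanh):
    """Return the layer map Phi: z -> sigma(A z + b)."""
    return lambda z: sigma(A @ z + b)

def compose(layers):
    """Compose layers as Phi^(l) o ... o Phi^(1); `layers` lists Phi^(1) first."""
    def phi(x):
        for layer in layers:
            x = layer(x)
        return x
    return phi
```

For instance, `compose([make_layer(W1, b1), make_layer(W2, b2)])` builds a two-hidden-layer map in the spirit of the previous section (up to the final linear output weights).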
The notion of a neural network extends naturally to functions with multiple outputs. A single-hidden-layer multioutput neural network is a function $\Phi : \mathbb{R}^n \to \mathbb{R}^m$, with $m \geq 1$, defined by:
$$\Phi(x) = A \, \sigma(Wx + B),$$
where:
$x \in \mathbb{R}^n$ is the input vector and $\Phi(x) \in \mathbb{R}^m$ the output,
$W \in \mathbb{R}^{q \times n}$: weights from the input to the hidden layer,
$A \in \mathbb{R}^{m \times q}$: weights from the hidden layer to the output,
$B \in \mathbb{R}^q$: biases of the hidden layer,
$\sigma : \mathbb{R} \to \mathbb{R}$ is the activation function, extended component-wise to $\mathbb{R}^q$,
$q$ is the number of hidden neurons.
Example: Multioutput Neural Network from $\mathbb{R}^3 \to \mathbb{R}^2$ with 4 hidden neurons
Consider a single-hidden-layer multioutput neural network $\Phi(x)$ where:
$x \in \mathbb{R}^3$ is the input vector and $\Phi(x) \in \mathbb{R}^2$,
$W \in \mathbb{R}^{4 \times 3}$ is the weight matrix from the input to the hidden layer,
$A \in \mathbb{R}^{2 \times 4}$ is the weight matrix from the hidden layer to the output,
$B \in \mathbb{R}^4$ is the bias vector of the hidden layer.
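This example can be instantiated directly in NumPy. The weight values below are random placeholders, and tanh again stands in for the unspecified activation $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # hidden-layer weights, R^{4x3}
B = rng.normal(size=4)       # hidden-layer biases, R^4
A = rng.normal(size=(2, 4))  # output weights, R^{2x4}

def Phi(x, sigma=np.tanh):
    """Single-hidden-layer multioutput network Phi: R^3 -> R^2."""
    return A @ sigma(W @ x + B)

y = Phi(np.array([0.5, -1.0, 2.0]))  # y is a vector in R^2
```

The matrix shapes enforce the dimensions of the example: `W @ x` maps $\mathbb{R}^3$ into $\mathbb{R}^4$, and `A @ ...` maps the 4 hidden activations into $\mathbb{R}^2$.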
A multioutput neural network with several hidden layers can be constructed using function composition as described in the Multilayer Neural Networks section, with each layer represented in matrix form.
While the definition of neural networks used here is quite general and does not specify any particular activation function $\sigma$, it is worth noting that foundational work [1] focused specifically on sigmoid activation functions.
For the construction of multilayer neural networks, see [2].
[1] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, 1989.
[2] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.