Sinusoidal Functions I
For the examples, we will use the following libraries,
# Libraries
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
and the domain $\Omega$ we are going to work on has to be compact (closed and bounded).
Implementation very close to the definition, for $f(x) = \sin(x)$
Let $\Omega = [-\pi, \pi]$. The idea is to approximate the continuous function $f(x) = \sin(x)$, $x \in \Omega$, with a neural network $\Phi$, supported by the Universal Approximation Theorem for spaces of continuous functions.
For this we will define and implement the function $f$, the neural network $\Phi$, and the activation function $\sigma$; the latter will be the softsign function, which is continuous, bounded, and not constant.
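In formula form (a reconstruction from the code below, since the original expression is not shown explicitly), the single-hidden-layer network has the shape

$$\Phi(x) = \sum_{j=1}^{M} \alpha_j \, \sigma(w_j x + \beta_j),$$

where the $w_j$, $\beta_j$ and $\alpha_j$ are the trainable weights, biases and output coefficients that the code stores in `weight`, `beta` and `alpha`.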
# Omega domain
omega = np.linspace(-np.pi, np.pi, 32, dtype="float64")
omega = omega[:, np.newaxis]
omega_tf = tf.convert_to_tensor(omega)
# python function f(x) = sin(x)
@tf.function
def f(x):
    return tf.sin(x)
# activation function sigma: softsign
@tf.function
def sigma(x):
    return x / (1 + tf.abs(x))
# Single hidden layer neural network Phi with M=30 hidden units
M = 30
weight = tf.Variable(tf.random.normal([1,M], dtype=tf.float64))
beta = tf.Variable(tf.random.normal([M], dtype=tf.float64))
alpha = tf.Variable(tf.random.normal([M,1], dtype=tf.float64))
@tf.function
def Phi(x):
    return tf.matmul(sigma(tf.add(tf.matmul(x, weight), beta)), alpha)
To approximate the desired function using a neural network, we will employ the gradient descent method, which is a widely used optimization technique for minimizing loss functions. In our case, since we are dealing with continuous functions, the loss function will be defined using the supremum (uniform) norm of $C(\Omega)$.
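Written out (a reconstruction consistent with the `norm` function below), the loss is

$$\mathcal{L}(\Phi) = \|f - \Phi\|_\infty = \max_{x \in \Omega} |f(x) - \Phi(x)|,$$

evaluated in practice as the maximum over the 32 sample points of $\Omega$.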
# supremum (infinity) norm over the sampled domain
@tf.function
def norm(f, g, domain):
    diff = tf.abs(f(domain) - g(domain))
    return tf.reduce_max(diff)
# training by gradient descent
learning_rate = 0.1
training_epochs = 20000
optimizer = tf.keras.optimizers.Adam(learning_rate)
for epoch in range(training_epochs):
    with tf.GradientTape() as tape:
        loss = norm(f, Phi, omega_tf)
    gradients = tape.gradient(loss, [weight, beta, alpha])
    optimizer.apply_gradients(zip(gradients, [weight, beta, alpha]))
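If you want to watch the optimization progress, a minimal variation of the loop above (the logging is an addition for illustration, not part of the original) would be:

# same training loop, with periodic loss logging (illustrative addition)
for epoch in range(training_epochs):
    with tf.GradientTape() as tape:
        loss = norm(f, Phi, omega_tf)
    gradients = tape.gradient(loss, [weight, beta, alpha])
    optimizer.apply_gradients(zip(gradients, [weight, beta, alpha]))
    if epoch % 1000 == 0:
        print(f"epoch {epoch}: sup-norm loss = {float(loss):.6f}")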
# Graphic
plt.plot(omega, np.sin(omega), 'ro', label='sin(x)')
plt.plot(omega, Phi(omega_tf), label='Phi(x)')
plt.legend()
plt.show()
Honestly, with all the theory, the technology, and the golden era of machine learning we live in... I would have expected a better approximation. But there’s a reason behind this underwhelming performance: the loss function we chose.
Although the previous example used a custom norm to provide an intuitive link between the mathematical theory and the computational implementation, this approach is not generally recommended for practical model training.
Standard loss functions like Mean Squared Error (MSE) are optimized for performance, numerical stability, and compatibility with modern training frameworks such as TensorFlow.
Using custom norms may introduce unnecessary complexity, lead to slower or less stable training, and make the code harder to interpret and maintain. Therefore, unless there is a strong theoretical or research-driven motivation, it is preferable to rely on the standard loss functions provided by the framework.
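For reference, the MSE over the $N$ sampled points of the domain is the standard

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\big(f(x_i) - \Phi(x_i)\big)^2,$$

which averages the squared pointwise errors instead of penalizing only the single worst point, as the supremum norm does.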
Let's build another, more challenging example to test the above.
Smart Implementation for $f(x) = x\sin(x)$
To implement this effectively, there's no need to rewrite everything as pure Python functions; you can rely on TensorFlow's optimized operations. While this may sacrifice the elegance of mathematical continuity, it's important to remember that machines don't truly "understand" such concepts. That's why it's often better to adapt to what they can process efficiently.
Let $\Omega = [0, 4\pi]$. The domain is discretized with 64 points (twice as many as in the previous example, since the domain is twice as large), and $f(x) = x\sin(x)$ is "defined" all at once with the help of NumPy and TensorFlow.
# Omega domain: [0, 4*pi] sampled at 64 points
omega = np.linspace(0, 4*np.pi, 64, dtype="float64")
f_omega = omega*np.sin(omega) # "implementation" of f(x) = x*sin(x)
omega = omega[:, np.newaxis]
f_omega = f_omega[:, np.newaxis]
omega_tf = tf.convert_to_tensor(omega)
f_omega_tf = tf.convert_to_tensor(f_omega)
To show the effectiveness of accommodating the machine, we reduced the number of hidden units $M$ by half and used the native softsign implementation.
# Single hidden layer neural network Phi with M=15 hidden units
M = 15
weight = tf.Variable(tf.random.normal([1,M], dtype=tf.float64))
beta = tf.Variable(tf.random.normal([M], dtype=tf.float64))
alpha = tf.Variable(tf.random.normal([M,1], dtype=tf.float64))
@tf.function
def Phi(x):
    return tf.matmul(tf.keras.activations.softsign(tf.add(tf.matmul(x, weight), beta)), alpha)
Also, we reduced the number of epochs by a factor of 10 and used the native MSE loss.
# training by gradient descent
learning_rate = 0.1
training_epochs = 2000
optimizer = tf.keras.optimizers.Adam(learning_rate)
mse = tf.keras.losses.MeanSquaredError()
for epoch in range(training_epochs):
    with tf.GradientTape() as tape:
        loss = mse(Phi(omega_tf), f_omega_tf)
    gradients = tape.gradient(loss, [weight, beta, alpha])
    optimizer.apply_gradients(zip(gradients, [weight, beta, alpha]))
The following plot shows the improvement achieved by the new implementation, despite its lower complexity (half the hidden units and a tenth of the training epochs).
# Graphic
plt.plot(omega, f_omega, 'ro', label='x*sin(x)')
plt.plot(omega, Phi(omega_tf), label='Phi(x)')
plt.legend()
plt.show()
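As an optional check (not part of the original code), the supremum-norm idea from the first example can be reused to quantify how close $\Phi$ gets to $x\sin(x)$ on the sampled domain:

# Optional: sup-norm error of the trained network on the sampled points (illustrative, not in the original)
sup_error = tf.reduce_max(tf.abs(Phi(omega_tf) - f_omega_tf))
print("sup-norm error on Omega:", float(sup_error))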