February 2019

Volume 34 Number 2

[Artificially Intelligent]

A Closer Look at Neural Networks

By Frank La Vigne | February 2019

Neural networks are an essential element of many advanced artificial intelligence (AI) solutions. However, few people understand the core mathematical or structural underpinnings of this concept. While initial research into neural networks dates back decades, it wasn’t until recently that the computing power and size of training datasets made them practical for general use.

Neural networks, or more specifically, artificial neural networks, are loosely based on the biological neural networks in the brains of animals. While not an algorithm per se, a neural network is a kind of framework for algorithms to process input and produce a “learned” output. Neural networks have proven themselves useful at tasks that traditional programming methods struggle to solve. Although there are several different variations of neural networks, they all share the same core structure and concepts. Frameworks such as Keras are designed to make neural networks easier to implement and to hide many of the implementation details. However, I never fully grasped the power and the beauty of neural networks until I had to program one manually. That’s the aim of this column: to build out a simple neural network from scratch with Python.

Neurons and Neural Networks

Before building a neural network from scratch, it’s important to understand its core components. Every neural network consists of a series of connected nodes called neurons. In turn, each neuron is part of a network of neurons arranged in layers, with every neuron in one layer connected to every neuron in the next. Each artificial neuron takes in a series of inputs and computes a weighted sum. The neuron activates, or not, based on the output of the activation function, which takes in the input values and weights, along with a bias, and computes a number. That number falls between -1 and 1 or between 0 and 1, depending on the type of activation function. The value is then passed on to the connected neurons in the next layer, in a process called forward propagation.

As for the layers, there are three basic kinds: input layers, hidden layers and output layers. Input layers represent the input data, while the output layer contains the output. The hidden layers determine the depth of the neural network; this is where the term “deep learning” comes from. In practice, neural networks can have hundreds of hidden layers, with the only upper limit being available processing power.

Forward Propagation

Forward propagation is the process by which data flows through a neural network from the input layer to the output layer. It involves computing a weighted sum of all inputs and factoring in a bias. Once the weighted sum is computed, it is then run through an activation function. Using the neuron in Figure 1 as an example, the neuron has two inputs, x1 and x2, along with two weights, w1 and w2. The weighted sum is represented by z, while the value a represents the value computed by the activation function when it’s passed the value of the weighted sum. Recall that the goal of the activation function is to compress the output of the neuron into a value between a range. Bias is added to fine-tune the sensitivity of the neuron.

Figure 1 Forward Propagation in a Neural Network

To illustrate this, it may be best to go through the code. To keep things clear, I’ll use variable names that match those in Figure 1. Start by launching a Jupyter Notebook (for more on that, go here: msdn.com/magazine/mt829269), then enter the following into a blank cell and execute it:

# Inputs, weights and bias from Figure 1.
x1 = .5
w1 = .2
x2 = 4
w2 = .5
b = .03

# Compute the weighted sum of the inputs plus the bias.
z = x1 * w1 + x2 * w2 + b

print(z)

The value of z, the weighted sum, is 2.13. Recall that the next step is to run this value through an activation function. Enter the following code into a new cell to create an activation function and execute it:

import numpy as np

def sigmoid_activation(weighted_sum):
  # Compress any real-valued input into the range between 0 and 1.
  return 1.0 / (1.0 + np.exp(-1 * weighted_sum))

a = sigmoid_activation(z)
print(a)

The output should show that a is equal to 0.8937850083248244. In a multi-layer neural network, the value of a is passed on to the next layer. Therefore, activations in one layer cascade to the next and, eventually, through the entire network.
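To see that cascade in miniature, feed the activation just computed into a hypothetical neuron in the next layer. Note that the 0.7 weight and 0.1 bias below are arbitrary values I chose for illustration; they’re not part of Figure 1:

# Pass this neuron's activation (a) into a made-up neuron
# in the next layer, with an arbitrary weight and bias.
z_next = a * 0.7 + 0.1
a_next = sigmoid_activation(z_next)
print(a_next)  # Roughly 0.67

The output of one neuron simply becomes one of the inputs to each neuron in the following layer, and the same weighted-sum-then-activate computation repeats.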

The sigmoid_activation function returns values between 0 and 1 no matter how large or small the input is. Enter the following code to test it:

print(sigmoid_activation(1000000))
print(sigmoid_activation(.000001))

The output should read 1.0 and roughly 0.50, respectively.
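The same squashing happens at the other end of the scale. A strongly negative weighted sum drives the output toward zero, as this additional test (my own, not part of the original walk-through) shows:

print(sigmoid_activation(-20))  # Roughly 0.000000002

Be aware that an extremely negative input, such as -1000000, still returns 0.0 but triggers a harmless NumPy overflow warning, because np.exp can’t represent e raised to the millionth power.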

For more information about the mathematical constant e, please refer to the Wikipedia article on the subject (bit.ly/2s8VL9Q).

How Does a Neural Network Work?

Given the relative simplicity of the structure and mathematics of a neural network, it’s natural to wonder how it can be applied to such a wide array of AI problems. The power is in the network, not necessarily the neurons themselves. Each neuron in a neural network represents a combination of input values, weights and a bias. Through training, the appropriate weights and biases can be determined.

By now, you’ve undoubtedly heard about the MNIST dataset, which is commonly used as a sort of “Hello World” for neural networks. I had seen it dozens of times before the notion of how neural networks functioned finally clicked for me. If you’re not familiar with the problem, there are plenty of examples online that break it down (see varianceexplained.org/r/digit-eda). The MNIST dataset challenge nicely illustrates how easy it is for neural networks to take on tasks that have baffled traditional algorithmic approaches for decades.

Here’s the problem summarized: given a 28x28 pixel grayscale image of a handwritten digit, a neural network must learn to read it as the correct value. That 28x28 pixel image consists of 784 individual numerical values between zero and 255, making it easy to imagine the structure of the input layer. The input layer consists of 784 neurons, with values passed through an activation function that ensures they fall between zero and one. Therefore, lighter pixels will have a value closer to one and darker pixels will have a value closer to zero. The output layer consists of 10 neurons, one for each digit. The neuron with the highest value represents the answer. For instance, if the neuron for eight has the highest activation value, then the neural network has determined that eight is the output value.
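As a rough sketch of how such an image might be prepared as input (my own illustration; this column doesn’t load the actual MNIST data), the 28x28 grid can be flattened into 784 values scaled between zero and one:

# Hypothetical preprocessing: flatten a 28x28 grayscale image
# (pixel values 0-255) into 784 input values between 0 and 1.
image = np.random.randint(0, 256, size=(28, 28))  # Stand-in for a real digit
input_layer_values = image.flatten() / 255.0
print(input_layer_values.shape)  # (784,)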

Just adding two hidden layers, with 32 neurons each, has a dramatic effect. How so? Recall that each neuron is connected to every neuron in the previous layer and the next layer. That means there are 784x32 weights in the first layer, 32x32 weights in the second layer, and 32x10 weights in the third layer, plus 32 + 32 + 10 biases. That yields a grand total of 26,506 adjustable values in this relatively simple three-layer neural network; effectively, there are 26,506 parameters to adjust to achieve an ideal output (the snippet below verifies the arithmetic). For an excellent visualization and explanation of this structure and the power behind it, watch the “But What *Is* a Neural Network? | Deep Learning, Chapter 1” video on YouTube at bit.ly/2RziJVW. And for interactive experimentation with neural networks, be sure to check out playground.tensorflow.org.
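Here’s a quick check of that parameter count in code:

# Count the adjustable parameters in a 784-32-32-10 network:
# one weight per connection, plus one bias per non-input neuron.
layer_sizes = [784, 32, 32, 10]
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
biases = sum(layer_sizes[1:])
print(weights + biases)  # 26506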

Keep in mind that a real-world neural network may have hundreds of thousands of neurons across hundreds of layers. It is this substantial quantity of parameters that gives neural networks their uncanny ability to perform tasks that have traditionally been beyond the capability of computer programs. With all these “knobs and dials,” it’s little wonder that these relatively simple structures can tackle so many tasks. This is also why training a neural network requires so much processing power, and why GPUs are ideal for this kind of massively parallel computation.

Building a Neural Network

With the structure and mathematics explained, it’s time to build out a neural network. Enter the code in Figure 2 into a new cell and execute it. The initialize_neural_network function simplifies the creation of multilayer neural networks.

Figure 2 Building a Neural Network

def initialize_neural_network(num_inputs, num_hidden_layers, 
  num_nodes_hidden, num_nodes_output):
    
  num_nodes_previous = num_inputs # number of nodes in the previous layer

  network = {}
    
  # Loop through each layer and randomly initialize 
  # the weights and biases associated with each layer.
  for layer in range(num_hidden_layers + 1):
        
    if layer == num_hidden_layers:
      layer_name = 'output' 
      num_nodes = num_nodes_output
    else:
      layer_name = 'layer_{}'.format(layer + 1) 
      num_nodes = num_nodes_hidden[layer] 
        
    # Initialize weights and bias for each node.
    network[layer_name] = {}
    for node in range(num_nodes):
      node_name = 'node_{}'.format(node+1)
      network[layer_name][node_name] = {
        'weights': np.around(np.random.uniform(size=num_nodes_previous), 
          decimals=2),
        'bias': np.around(np.random.uniform(size=1), decimals=2),
      }
    
    num_nodes_previous = num_nodes

  return network

To create a neural network with 10 inputs, two outputs, and five hidden layers with 32 nodes each, enter the following code into a blank cell and execute it:

network1 = initialize_neural_network(10, 5, [32, 32, 32, 32, 32], 2)

To create a network whose structure matches the MNIST network described earlier, adjust the parameters as follows:

mnist_network = initialize_neural_network(784, 2, [32, 32], 10)
print(mnist_network)

This code creates a neural network with 784 input nodes, two hidden layers with 32 nodes each, and an output layer of 10 nodes. Note that the output displays the network as a Python dictionary, which reads much like JSON. Also note that the weights and biases are initialized to random values.
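Rather than reading through the raw dictionary output, a short loop (my own addition) confirms the shape of each layer:

# Print how many nodes ended up in each layer of the network.
for layer_name, nodes in mnist_network.items():
  print('{} has {} nodes'.format(layer_name, len(nodes)))

This should report 32 nodes each for layer_1 and layer_2, and 10 for output.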

Exploring the Neural Network

So far, we have the structure of a neural network but we haven’t done anything with it. Now, let’s create some input for network1, which has 10 input nodes, like so:

# Seed NumPy's random number generator so the input values are reproducible.
np.random.seed(2019)
input_values = np.around(np.random.uniform(size=10), decimals=2)

print('Input values = {}'.format(input_values))

The output will be a numpy array of 10 random values that will serve as the input values for the neural network. Next, to view the weights and biases for the first node of the first layer, enter the following code:

node_weights = network1['layer_1']['node_1']['weights']
node_bias = network1['layer_1']['node_1']['bias']

print(node_weights)
print(node_bias)

Note that there are 10 values for the weights and one value for the bias. Next, enter the following code to create a function that calculates the weighted sum:

def calculate_weighted_sum(inputs, weights, bias):
  # Multiply each input by its weight, total the products and add the bias.
  return np.sum(inputs * weights) + bias
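As a quick sanity check (my own, not part of the original walk-through), passing in the values from the earlier single-neuron example reproduces the weighted sum computed by hand:

# Inputs [.5, 4], weights [.2, .5] and bias .03 from Figure 1.
print(calculate_weighted_sum(np.array([.5, 4]), np.array([.2, .5]), .03))  # 2.13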

Now enter the following code to compute and display the weighted sum (z) for this node:

weighted_sum_for_node = calculate_weighted_sum(input_values, node_weights, node_bias)
print('Weighted sum for layer1, node1 = {}'.format(
  np.around(weighted_sum_for_node[0], decimals=2)))

In my notebook, the value returned was 3.15. Because network1’s weights and biases were initialized randomly before the seed was set, your value will differ. Next, use the sigmoid_activation function to compute a value for this node, as follows:

node_output_value = sigmoid_activation(weighted_sum_for_node)
print('Output value for layer1, node1 = {}'.format(
  np.around(node_output_value[0], decimals=2)))

In my case, the final output value for this node was 0.96. It’s this value that will be passed on to all the neurons in the next layer.

Feel free to experiment and iterate through the other nodes in this network, repeating the steps I described for each one to get an appreciation for the sheer volume of calculations in a neural network. Naturally, this is a task better performed programmatically, and the sketch below hints at the approach. I will cover this, and how to train a neural network, in the next column.
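As a preview, and only a sketch of my own rather than the full forward-propagation routine promised for next time, here’s how you might compute the output of every node in the first layer of network1 in one loop:

# Compute the activation of every node in layer_1 of network1.
layer_1_outputs = []
for node_name, node in network1['layer_1'].items():
  z = calculate_weighted_sum(input_values, node['weights'], node['bias'])
  layer_1_outputs.append(sigmoid_activation(z)[0])
print(np.around(layer_1_outputs, decimals=2))

These 32 activations would then serve as the inputs to every node in layer_2, and so on through the network.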

Wrapping Up

Neural networks have led to incredible advances in AI and have been applied to difficult real-world problems such as speech recognition and computer vision with great success. While their structures may be complex, they’re made of relatively simple building blocks. Neurons are arranged in layers in a neural network and each neuron passes on values. Input values therefore cascade through the entire network and influence the output.

Neurons themselves are simple and perform basic mathematical functions. They become powerful, however, when they’re connected to each other. The sheer number of tweakable values even in simple neural networks provides a great deal of control over the output and can prove useful in training.

While this article focused on building a neural network in Python, just about any programming language could be used to create a neural network. There are examples online of this being done in JavaScript, C#, Java or any number of modern languages. Where Python excels, however, is in the availability of widely supported frameworks, such as Keras, to make the creation of neural networks simpler.


Frank La Vigne works at Microsoft as an AI Technology Solutions Professional where he helps companies achieve more by getting the most out of their data with analytics and AI. He also co-hosts the DataDriven podcast. He blogs regularly and you can watch him on his YouTube channel, “Frank’s World TV” (FranksWorld.TV).

