Demystifying Neural Networks: Solving the XOR Problem with Backpropagation by Rajeshwar Vempaty

xor neural network

If you would like me to write another article explaining a topic in depth, please leave a comment. Normally we will have more than 4 data points, a lot more. In a way we’re removing the linear dependence from the first 3 inputs because we know the first column is close to the output that we want. Here we pay attention to the first column, particularly the top 3 values of 0, 1, 1. We adjust only the second column such that later we will map the negatives to 0 and the bottom term has to be 1.

Class of 2025 relishes time together at Hey Day

First of all, you should think about how your targets look like. Forclassification problems, one usually takes as many output neurons as one hasclasses. Then the softmax function is applied.1 The softmax function makes sure that the output of every single neuron is in \([0, 1]\) and the sum of all outputs is exactly \(1\). This means the output can be interpreted as a probability distribution over all classes. The next step would be to create a data set because we cannot just train our data on these four points.

The Mean Squared Error

Neural networks can learn complex patterns that are hard to program manually. They enable computers to learn from data and make decisions on their own, paving the way for smarter technology. Like I said earlier, the random synaptic weight will most likely not give us the correct output the first try. So we need a way to adjust the synpatic weights until it starts producing accurate outputs and “learns” the trend.

xor neural network

How Neural Networks Solve the XOR Problem

There are no connections between units in the input layer. Instead, all units in the input layer are connected directly to the output unit. Since, there may be many weights contributing to this error, we take the partial derivative, to find the minimum error, with respect to each weight at a time.

xor neural network

  1. It was invented in the late 1950s by Frank Rosenblatt.
  2. An MLP is generally restricted to having a single hidden layer.
  3. Empirically, it is better to use the ReLU instead of the softplus.
  4. In Keras we have binary cross entropy cost funtion for binary classification and categorical cross entropy function for multi class classification.

Looking for online tutorials, this example appears over and over, so I suppose it is a common practice to start DL courses with such idea. That is why I would like to “start” with a different example. As, out example for this post is a rather simple problem, we don’t have to do much changes in our original model except going for LeakyReLU instead of ReLU function. For, many of the practical problems we can directly refer to industry standards or common practices to achieve good results. As our XOR problem is a binary classification problem, we are using binary_crossentropy loss. The XOR gate can be usually termed as a combination of NOT and AND gates and this type of logic finds its vast application in cryptography and fault tolerance.

The purpose of hidden units is the learn some hidden feature or representation of input data which eventually helps in solving the problem at hand. For example, in case of cat recognition hidden layers may first find the edges, second hidden layer may identify body parts and then third hidden layer may make prediction whether it is a cat or not. Unlike AND and OR, XOR’s outputs are not linearly separable.Therefore, we need to introduce another hidden layer to solve it. In the context of the provided code, the neural network is being trained to predict the XOR operation based on the input data [0, 0], [0, 1], [1, 0], and [1, 1]. It learns to mimic the XOR truth table by adjusting its internal weights during training. Once trained, the neural network should be able to accurately predict the XOR of new inputs it hasn’t seen before.

However, the XOR problem requires a non-linear decision boundary to classify the inputs accurately. This means that a single-layer perceptron fails to solve the XOR problem, emphasizing the need for more complex neural networks. The XOR problem is a classic problem in artificial intelligence and machine learning. XOR, which stands for exclusive OR, is a logical operation that takes two binary inputs and returns true if exactly one of the inputs is true. The XOR gate follows a specific truth table, where the output is true only when the inputs differ.

Let us try to understand the XOR operating logic using a truth table. Our goal is to find the weight vector corresponding to the point where the error is minimum i.e. the minima of the error gradient. We need to look for a more general model, which would allow for non-linear decision boundaries, like a curve, as is the case above. Out of all the 2 input logic gates, the XOR and XNOR gates are the only ones that are not linearly-separable.

This is often simplified and written as a dot- product of the weight and input vectors plus the bias. After running, Tensorflow will feed the input data 5000 times and try to fit the model. The goal of our network is to train a network to receive two boolean inputs and return True only when one input is True and the other is False.

Hundreds of undergraduates take classes in the fine arts each semester, among them painting and drawing, ceramics and sculpture, printmaking and animation, photography and videography. The courses, through the School of Arts & Sciences and the Stuart Weitzman School of Design, give students the opportunity to immerse themselves in an art form in a collaborative way. But what logic did the model use to solve the XOR problem?

The black and orange points ended up in the same place (the origin), and the image just shows the black dot. Empirically, it is better to use the ReLU instead of the softplus. Furthermore, the dead ReLU is a more important problem than the non-differentiability at the origin. Then, at the end, the pros (simple evaluation and simple slope) outweight the cons (dead neuron and non-differentiability at the origin). If you want to read another explanation on why a stack of linear layers is still linear, please access this Google’s Machine Learning Crash Course page.

The classic multiplication algorithm will have complexity as O(n3). Jupyer notebook will help to enter code and run it in a comfortable environment. The central object of TensorFlow is a dataflow graph representing calculations. The vertices of the graph represent operations, and the edges represent tensors (multidimensional arrays that are the basis of TensorFlow). The data flow graph as a whole is a complete description of the calculations that are implemented within the session and performed on CPU or GPU devices.

xor neural network

There are no fixed rules on the number of hidden layers or the number of nodes in each layer of a network. The best performing models are obtained through trial and error. This function allows us to fit the output in a way that makes more sense. For example, in the case of a simple classifier, an output of say -2.5 or 8 doesn’t make much sense with regards to classification. If we use something called a sigmoidal activation function, we can fit that within a range of 0 to 1, which can be interpreted directly as a probability of a datapoint belonging to a particular class.


Leave a Reply

Your email address will not be published. Required fields are marked *