In real-world implementations, deep learning is a powerful tool for creating complex models that learn and improve over time, such as image recognition systems. However, the field also faces some significant challenges — like catastrophic forgetting, the tendency of deep learning models to forget information related to previously learned tasks — that stem from how neural networks function.
In order to understand how to overcome such problems, it is essential to first look at how deep learning makes sense of data, as well as how this field relates to artificial intelligence and machine learning. Let's get started.
A Quick Introduction to Artificial Intelligence, Machine Learning, and Deep Learning
As you can see in the image below, artificial intelligence is not a new concept. Its foundations date back to 1950, when famed mathematician Alan Turing explored the question of machine intelligence in his paper “Computing Machinery and Intelligence.”
- Artificial intelligence (AI): Any technique that enables computers to mimic human behavior
- Machine learning: An AI technique that enables computers to learn without being explicitly programmed
- Deep learning: A subset of machine learning that makes the computation of multi-layer artificial neural networks feasible
For me, the most fascinating concept listed here is the ability to enable computers to "learn without being explicitly programmed." Typical programs give a computer step-by-step instructions for performing tasks that help achieve an end result. The efficiency of such programs depends on how well the programmer has weighed the limitations and merits of the hardware and algorithms used to produce an output. For example, programs like these can sort numbers, process text, or play chess against humans!
However, in the case of real machine learning, a program simply tells the computer what needs to be achieved, leaving it to the program's architecture to determine how. Real-world examples include systems that scan X-ray images to identify tumors or driverless cars that decide which route to take in order to arrive at a destination.
How to Approach Machine Learning
In order to mathematically model “learning” within an intelligent system, you must fit a model that can leverage learning algorithms, optimization techniques, and feedback (external and/or internal) mechanisms.
Machine learning algorithms can generally be classified as follows:
- Supervised: Learns a mapping from a set of inputs to a desired set of known output(s) (e.g., classification problems, such as identifying spam emails)
- Unsupervised: Attempts to recognize patterns or structure in data (e.g., clustering, which can be seen in Amazon’s “customers who bought this also bought…” feature)
- Reinforcement: Allows machines to automatically decide how to behave in an environment in order to maximize performance, based on “reward” feedback or a reinforcement signal (e.g., self-driving cars)
There are many algorithms that can be applied in machine learning. Some of the most important ones are linear regression, logistic regression, support vector machines, k-nearest neighbors, and random forest.
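To make one of these concrete, here is a minimal sketch of k-nearest neighbors, one of the algorithms named above: a query point is classified by a majority vote among its closest labeled neighbors. The two-feature "spam"/"ham" data is purely illustrative.

```python
# Minimal k-nearest-neighbors classifier on illustrative toy data.
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k closest training points.
    `train` is a list of ((x, y), label) pairs."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = [label for _, label in nearest[:k]]
    return max(set(votes), key=votes.count)

train = [((0.0, 0.0), "ham"),  ((0.1, 0.2), "ham"),  ((0.2, 0.1), "ham"),
         ((1.0, 1.0), "spam"), ((0.9, 1.1), "spam"), ((1.1, 0.9), "spam")]

print(knn_predict(train, (0.05, 0.1)))  # near the "ham" cluster
print(knn_predict(train, (1.0, 0.95)))  # near the "spam" cluster
```

Note that k-NN has no explicit training phase at all: the "model" is simply the stored data, which is part of why it is a popular first algorithm to study.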
How Artificial Neural Networks Learn
Artificial neural networks (ANNs) are inspired by biological nervous systems. They are a widely accepted example of machine learning, and they can be trained under any of the learning paradigms listed above. ANNs are also essential to the field of deep learning, which I will delve into in the next section.
ANNs consist of interconnected neurons. When a neuron is presented with an input, it will typically create a single output that is defined by the neuron’s bias or activation function. Every input to a neuron has a weight parameter attached to it.
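That description maps directly to a few lines of code: a neuron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function. The sigmoid activation and the specific weight values below are illustrative choices, not prescribed by the article.

```python
# A single artificial neuron: weighted sum of inputs plus a bias,
# passed through an activation function (here, the logistic sigmoid).
import math

def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # weighted sum
    return 1.0 / (1.0 + math.exp(-z))                       # sigmoid activation

out = neuron([1.0, 0.5], weights=[0.4, -0.6], bias=0.1)  # z = 0.2
```

Each weight scales the influence of one input, which is exactly why adjusting weights (as discussed next) is how the network learns.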
If you observe nature closely, you can see that systems able to learn are highly adaptable to their surroundings. In their quest to acquire knowledge, these systems modify the information they have already collected, or their internal structure, using inputs from the outside world. That is exactly what ANNs do: they alter the weights of their connections based on the inputs and the desired output.
But how and why would altering weights help? Well, if you look a little closer at the structure of an ANN, there are a few parameters that can be altered to modify its architecture. For example, you could create new connections among neurons, delete those connections, or add and delete the neurons themselves. You could even modify the input function or activation function or the bias. And, as it turns out, altering weights is the most practical and universal approach to updating the architecture of a neural network. Deleting a connection, for example, can be achieved by setting the weight to 0. Similarly, a neuron can be deleted by setting weights on all its connections to zero.
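The zero-weight trick above is easy to verify numerically. In this small sketch (with made-up weights), zeroing the third connection's weight produces exactly the same output as physically removing that input:

```python
# Setting a weight to 0 is equivalent to deleting that connection.
def neuron_sum(inputs, weights, bias=0.0):
    """Pre-activation output: weighted sum of inputs plus bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

inputs = [2.0, 3.0, 5.0]
full   = neuron_sum(inputs, [0.5, 0.2, 0.1])   # all three connections active
pruned = neuron_sum(inputs, [0.5, 0.2, 0.0])   # third connection "deleted"
same   = neuron_sum([2.0, 3.0], [0.5, 0.2])    # connection physically removed
```

This is also the idea behind weight pruning, a common technique for shrinking trained networks.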
In order for an ANN to learn, it first needs to be trained. Training is the process by which the ANN gets familiar with the problem it needs to solve. For instance, in the case of a supervised model, a set of input values and the desired output values are presented to the model. During an iterative training process, the weights of the neurons are incrementally adjusted. The training process typically uses a few important parameters, including learning rate (also called step size) and momentum, along with a function that helps optimize the learning process (gradient descent, for example).
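The training loop described above can be sketched for the simplest possible model, a single weight fit by gradient descent with momentum. The learning rate, momentum value, and toy data here are illustrative assumptions, not values from the article.

```python
# Gradient descent with momentum on a one-parameter model y = w * x.
# Toy data follows the true relationship y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w, velocity = 0.0, 0.0
learning_rate, momentum = 0.05, 0.9   # "step size" and momentum parameters

for epoch in range(200):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    velocity = momentum * velocity - learning_rate * grad
    w += velocity   # incremental weight adjustment

print(round(w, 3))  # converges toward the true slope, 2.0
```

Momentum accumulates past gradients in `velocity`, which smooths the trajectory and speeds convergence along consistent directions; the learning rate controls the size of each step.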
By the end of the training phase, when the ANN has finally learned enough about the problem to be solved (e.g., recognizing the letters of the English alphabet), we can imagine its network parameters as having gradually traversed an N-dimensional space (where N is the number of trainable weights in the network), finally settling at an optimally stable global minimum within this space. In the above diagram, I’ve tried to represent a similar phenomenon that can be easily visualized in a three-dimensional space.
Evolution of Deep Learning
The adoption of — and focus on — machine learning increased within the research community between the 1990s and early 2000s, after which advancements plateaued due to data availability and computing power limitations, among other reasons.
However, starting in 2010, the industry witnessed phenomenal growth in:
- High-performance computing (including GPUs)
- More efficient algorithms (better asymptotic complexity, vectorization, etc.)
- Data availability (big data, cheaper storage)
This, in turn, sparked the emergence of deep learning as a major focus in the field of machine learning.
In simple terms, deep learning is when ANNs learn from large amounts of data. Similar to how humans learn from experience, a deep learning algorithm performs a task repeatedly, each time tweaking it slightly to improve the outcome. We refer to this process as deep learning because neural networks have various (deep) layers that enable learning. A deep neural network is essentially a multi-layered neural network, as represented in the figure above.
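Stacking the single-neuron computation into layers is all it takes to get a (tiny) deep network. This sketch pushes an input through two fully connected layers; the weights and biases are arbitrary illustrative values, and a real network would learn them via training.

```python
# A "deep" network is just layers of neurons applied in sequence.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -1.0]
hidden = layer(x, weights=[[0.1, 0.4], [-0.3, 0.2]], biases=[0.0, 0.1])
output = layer(hidden, weights=[[0.7, -0.5]], biases=[0.2])
```

Each additional hidden layer lets the network compose the features learned by the layer before it, which is what makes depth useful on complex data such as images.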
Deep Learning Challenges: Catastrophic Forgetting
In real-world implementations, deep learning algorithms face a significant challenge called “catastrophic forgetting,” also known as catastrophic interference. In simple terms, catastrophic forgetting is a neural network’s inability to learn different tasks sequentially; for example, if you train a neural network to perform Task A, and then use it to learn Task B, the algorithm will have a tendency to fit the objective function of the new task — even if that means changing the weights that were relevant to Task A.
One of the critical steps in building artificial intelligence is ensuring that it has the ability to continually learn in the way that humans do: understanding a new task without forgetting how to perform a task that has already been learned. As I have already mentioned, ANNs learn by incrementally adjusting the weights associated with inter-neuron connections over repeated iterations of training. When training for a particular task, the parameters of an ANN settle fairly stably near a global minimum in an N-dimensional weight space.
In the case of catastrophic forgetting, the global minimum for Task A will no longer be valid for the increased dimensional space that includes both A and B. There are studies that claim it is possible to minimize the effects of catastrophic forgetting to some degree, but it remains one of the key challenges as data scientists attempt to achieve higher levels of deep neural network performance and get closer to creating artificial intelligence that resembles the intelligence of humans.
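The effect is easy to reproduce even with a one-weight model. In this contrived sketch, the model first learns Task A (target slope 2), then Task B (target slope -1); retraining on B drags the single shared weight away from the value that solved A, so performance on A collapses. The tasks, learning rate, and error metric are all illustrative assumptions.

```python
# Toy illustration of catastrophic forgetting with a one-weight model y = w * x.
def train(w, task_slope, steps=200, lr=0.05):
    """Gradient descent on squared error (w - task_slope)**2, evaluated at x = 1."""
    for _ in range(steps):
        grad = 2 * (w - task_slope)
        w -= lr * grad
    return w

def task_error(w, slope):
    return (w - slope) ** 2

w = train(0.0, task_slope=2.0)        # learn Task A
err_a_before = task_error(w, 2.0)     # ~0: Task A solved
w = train(w, task_slope=-1.0)         # now learn Task B with the same weight
err_a_after = task_error(w, 2.0)      # large: Task A has been "forgotten"
```

Mitigations studied in the literature, such as slowing updates to weights deemed important for earlier tasks, amount to constraining exactly this kind of overwriting.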
Have any questions? Connect with me on LinkedIn.