Neural Networks
Comprehensive Intuition of Neural Networks and Deep Learning
" Neural networks are a type of computer program that is designed to mimic the way the human brain works. They are made up of layers of interconnected nodes, which process information and make decisions based on that information. Neural networks are used in a wide variety of applications, from image and speech recognition to predicting stock prices and identifying fraudulent transactions. Essentially, neural networks take in data, process it through a series of interconnected layers, and output a result. They are incredibly powerful and have the ability to learn and improve over time, making them one of the most important technological advancements in recent years. "
A Densely Connected Artificial Neural Network
The Concept of "Deep" in Deep Learning
Deep learning is a specific subfield of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations. The deep in deep learning isn’t a reference to any kind of deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations. How many layers contribute to a model of the data is called the depth of the model. Other appropriate names for the field could have been layered representations learning and hierarchical representations learning. Modern deep learning often involves tens or even hundreds of successive layers of representations— and they’re all learned automatically from exposure to training data. Meanwhile, other approaches to machine learning tend to focus on learning only one or two layers of representations of the data; hence, they’re sometimes called shallow learning.
In deep learning, these layered representations are (almost always) learned via models called neural networks, structured in literal layers stacked on top of each other. The term neural network is a reference to neurobiology, but although some of the central concepts in deep learning were developed in part by drawing inspiration from our understanding of the brain, deep-learning models are not models of the brain. There’s no evidence that the brain implements anything like the learning mechanisms used in modern deep-learning models.
Deep Representation Learned by a Network
An Illustration of a Neuron
The Common Misconception
You may come across pop-science articles proclaiming that deep learning works like the brain or was modeled after the brain, but that isn't the case. It would be confusing and counterproductive for newcomers to the field to think of deep learning as being in any way related to neurobiology; you don't need that shroud of "just like our minds" mystique and mystery, and you may as well forget anything you may have read about hypothetical links between deep learning and biology. For our purposes, deep learning is a mathematical framework for learning representations from data.
Think of a deep network as a multistage information-distillation operation, where information goes through successive filters and comes out increasingly purified (that is, useful with regard to some task).
The Mathematics
The Perceptron Simplified
A Shallow Network
X1 and X2 are features and are termed
→ "Input Nodes"
Both input nodes are connected to the output node, which is activated by the sigmoid function ↓ for a binary classification task.
The Weights, Bias and the Forward Pass
A weight is associated with each input, and a bias term with each layer.
A single forward-pass equation representing the following network, before the activation is applied, is as follows,
→ Z = X1 • W1 + X2 • W2 + b
The equation with sigmoid activation is denoted by,
→ A = σ(Z) = σ(X1 • W1 + X2 • W2 + b)
* Different instructors/books follow different notations. This notation is referenced from Dr. Andrew Ng's Deep Learning course.
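As a minimal NumPy sketch of this forward pass (the feature, weight, and bias values below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values for the two features, their weights, and the bias
x1, x2 = 0.5, -1.2
w1, w2 = 0.8, 0.3
b = 0.1

z = x1 * w1 + x2 * w2 + b  # pre-activation Z
a = sigmoid(z)             # activation A: probability for the positive class
print(a)
```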
Associated Weights and Bias
A Densely Connected Neural Network with a Single Hidden Layer
Scaling Up
Despite its complicated appearance, this architecture is still considered a Shallow Network.
→ By convention,
a network with more than 2 hidden layers is considered a Deep Neural Network.
The Algorithmic Flow chart for this Network is as follows:
Dimensionality Table
It is imperative to know the dimensions of every numerical element in the network because, at implementation time, it is a general convention to parameterise functions based on the exact dimensions.
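As a code illustration of such a table, here is a NumPy sketch of the parameter shapes for a network with 2 input features, one hidden layer of 4 units, and 1 output unit (these layer sizes are assumptions for demonstration, following the (units_out, units_in) weight-shape convention of the referenced course):

```python
import numpy as np

n_x, n_h, n_y = 2, 4, 1  # input features, hidden units, output units (illustrative)

W1 = np.random.randn(n_h, n_x) * 0.01  # hidden-layer weights: (4, 2)
b1 = np.zeros((n_h, 1))                # hidden-layer bias:    (4, 1)
W2 = np.random.randn(n_y, n_h) * 0.01  # output-layer weights: (1, 4)
b2 = np.zeros((n_y, 1))                # output-layer bias:    (1, 1)

for name, param in [("W1", W1), ("b1", b1), ("W2", W2), ("b2", b2)]:
    print(name, param.shape)
```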
Activation Functions → explaining them with math is boring,
so try the interactive playground developed by Ammar Yasser on Streamlit.
Select the function from the dropdown.
← Summary of Activation Functions
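As a rough companion to the summary above (not a substitute for the playground), here is a minimal NumPy sketch of a few common activation functions:

```python
import numpy as np

def sigmoid(z):
    # Maps any real value into (0, 1); common for binary-classification outputs
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Maps any real value into (-1, 1); zero-centred, often used in hidden layers
    return np.tanh(z)

def relu(z):
    # Rectified Linear Unit: zero for negative inputs, identity for positive ones
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but lets a small gradient through for negative inputs
    return np.where(z > 0, z, alpha * z)
```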
Summarised Flow chart of Forward Propagation
The Learning Aspect of a Neural Network - Backpropagation
Gradient Descent
The goal of training the model is to minimize the loss function, starting from (usually randomly) initialized parameters and applying gradient descent with the following main steps. Random initialization is not necessary for logistic regression (zero initialization is fine), but it is necessary for multilayer neural networks.
The Algorithm for Gradient Descent (Backpropagation)
Step 1 → Compute the cost and the gradient for the given training set of (x, y) with the current parameters w and b.
Step 2 → Update the parameters w and b with a pre-set learning rate:
=> w_new = w_old – learning_rate * gradient_of_J_at(w_old)
Repeat steps 1 and 2 until you reach the minimum of the cost function; a code sketch of this loop follows the figure below.
3 Dimensional Perspective of Gradient Descent approaching the Optimal Minimum
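A minimal NumPy sketch of this loop for logistic regression (the toy data, learning rate, and iteration count below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative toy data: 4 training examples with 2 features each
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]]).T  # shape (2, 4)
Y = np.array([[0, 0, 1, 1]])                                       # shape (1, 4)

w = np.zeros((2, 1))  # zero initialization is fine for logistic regression
b = 0.0
learning_rate = 0.1
m = X.shape[1]

for i in range(1000):
    # Step 1: forward pass, then gradients of the cost w.r.t. w and b
    A = sigmoid(w.T @ X + b)      # predictions, shape (1, 4)
    dw = (X @ (A - Y).T) / m      # gradient with respect to w
    db = np.sum(A - Y) / m        # gradient with respect to b
    # Step 2: update the parameters with the pre-set learning rate
    w -= learning_rate * dw
    b -= learning_rate * db
```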
Logistic Regression Cost Function (J)
In logistic regression, we want to train the parameters w and b, so we need to define a cost function.
The cost function is the average of the loss function over the entire training set. We are going to find the parameters w and b that minimize the overall cost function.
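From the referenced course, with a⁽ⁱ⁾ = σ(w · x⁽ⁱ⁾ + b) denoting the prediction for the i-th of m training examples, the cost is the average cross-entropy loss,

→ J(w, b) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(a⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − a⁽ⁱ⁾) ]

A minimal NumPy sketch of this computation, assuming the predictions A and labels Y are row vectors of shape (1, m):

```python
import numpy as np

def compute_cost(A, Y):
    # Average binary cross-entropy loss over all m training examples
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
```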
The Entire Cycle
— Mathematica Completur —
An Interactive Playground - Developed by Google
Task Based Architectures of Deep Neural Networks
Artificial Neural Network
Feed-forward neural networks, also known as multilayer perceptrons (MLPs), are the most common type of neural network used in machine learning applications. They consist of multiple layers of interconnected processing nodes or neurons, where each node receives input from the previous layer and performs a simple computation before passing the output to the next layer. The first layer is called the input layer, the last layer is called the output layer, and all the layers in between are called the hidden layers.
Input layer: This layer receives the input data and passes it to the next layer.
Hidden layers: These layers perform computations on the input data using weights and biases, and pass the output to the next layer. Each neuron in a hidden layer takes a weighted sum of the inputs and applies a nonlinear activation function to produce an output. The number of hidden layers and the number of neurons in each layer are hyperparameters that need to be tuned during training.
Output layer: This layer produces the final output of the neural network, which could be a scalar value for regression problems or a probability distribution over classes for classification problems.
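As a hedged sketch of such a feed-forward network, assuming Keras/TensorFlow is available (the layer sizes and toy data below are illustrative assumptions, not a prescribed architecture):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A small MLP: 2 input features, two hidden layers, one sigmoid output
model = keras.Sequential([
    keras.Input(shape=(2,)),               # input layer: receives the features
    layers.Dense(8, activation="relu"),    # hidden layer 1
    layers.Dense(8, activation="relu"),    # hidden layer 2
    layers.Dense(1, activation="sigmoid")  # output layer: class probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Illustrative random data, just to show the expected shapes
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype("float32")
model.fit(X, y, epochs=5, verbose=0)
```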
Convolutional Neural Network
Convolutional Neural Networks (CNNs) are a type of deep neural network designed to process and analyze data with a grid-like topology, such as images, videos, and audio signals. CNNs are widely used in computer vision applications such as object recognition, image classification, and segmentation.
The key feature of CNNs is the use of convolutional layers, which consist of a set of filters that slide over the input data and perform a dot product between the filter weights and the corresponding input values. The output of each filter is then passed through an activation function to produce the feature map, which highlights the important features of the input data that are relevant for the task at hand.
Input layer: This layer receives the input data, which is typically an image or a video frame.
Convolutional layers: These layers apply a set of filters to the input data, which produces a set of feature maps that capture the important features of the input data. Each filter is initialized with random weights and is updated during training using backpropagation.
Pooling layers: These layers downsample the feature maps by taking the maximum or average value over a small region of the feature map. This reduces the dimensionality of the feature maps and makes the network more robust to small variations in the input data.
Fully connected layers: These layers take the flattened output of the last pooling layer and perform a matrix multiplication with a set of weights to produce the final output of the network. The output could be a probability distribution over classes for classification problems or a scalar value for regression problems.
In CNNs, the output layer typically consists of one or more fully connected layers that receive input from the last convolutional or pooling layer and produce the network's final output.
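A hedged Keras sketch of this layer stack, assuming 28x28 grayscale images and 10 output classes (both illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                       # input: one grayscale image
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # convolutional layer
    layers.MaxPooling2D(pool_size=2),                     # pooling layer
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                     # flatten the feature maps
    layers.Dense(10, activation="softmax")                # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```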
Recurrent Neural Network
Recurrent Neural Networks (RNNs) are a type of neural network designed to process sequential data, such as time series, speech signals, and natural language. Unlike feedforward neural networks, which process each input independently, RNNs maintain a hidden state that depends on both the current input and the previous hidden state. This allows RNNs to model temporal structure and capture long-term dependencies in the input data.
The key feature of RNNs is the use of recurrent connections, which allow the hidden state to be updated at each time step based on the current input and the previous hidden state. The basic structure of an RNN is as follows:
Input layer: This layer receives the input data at each time step.
Hidden layer: This layer maintains a hidden state that is updated at each time step based on the current input and the previous hidden state. The hidden state can be thought of as a memory that encodes information about the past inputs.
Output layer: This layer produces the final output of the network at each time step, which could be a scalar value for regression problems or a probability distribution over classes for classification problems.
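A hedged Keras sketch of this structure, assuming sequences of 20 time steps with 8 features each (illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20, 8)),            # input: 20 time steps, 8 features each
    layers.SimpleRNN(16),                  # hidden state updated at each time step
    layers.Dense(1, activation="sigmoid")  # output for a binary classification task
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```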
A Partial Conclusion to the Page
Owing to its exceptional capacity to learn and make predictions on difficult tasks, deep learning is a fast-expanding branch of machine learning that has received a great deal of attention in recent years. In this branch of machine learning, deep neural networks, which are made up of several layers of interconnected nodes, are used to develop highly accurate representations of data.
Computer vision, NLP, voice recognition, and many more fields are all under deep learning's broad purview. Each of these domains has its own set of difficulties, necessitating the development of distinct methods and algorithms for optimal performance.
For instance, computer vision involves processing and analyzing visual data such as images and videos. Object identification, image segmentation, and classification are just some of the many tasks that fall under this umbrella. Deep learning methods, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are often utilized to attain excellent performance in these tasks.
The term "natural language processing" (NLP) refers to the study and application of computational methods to linguistic data such as text and voice. Machine translation, sentiment analysis, and language modelling are just few of the activities that fall under this umbrella. High performance in these tasks is often attained by the use of deep learning methods, such as transformer models and sequence-to-sequence models.
Further, deep learning encompasses a wide variety of subfields, including generative models, reinforcement learning, and unsupervised learning. Since each of these uses unique methods and algorithms, deep learning's potential applications are vast.
In addition, new methods and algorithms are constantly being invented and improved due to the brisk pace of deep learning research and development. Therefore, it is difficult to keep up with the most recent innovations and developments in the field.
As we have seen, deep learning is a broad and intricate subject that includes several sub-disciplines of machine learning, each of which has its own set of difficulties and prerequisites. Therefore, it is impossible to provide a comprehensive overview of deep learning in a single article or even a single volume. Instead, it calls for continual study and exploration of new possibilities in the field.
Attributions: The explanations, statements, and misconceptions are referenced from the book Deep Learning with Python by François Chollet. The Math section follows the mathematical notation of the course Neural Networks and Deep Learning by Andrew Ng.