MLDNN-03 Introduction to Deep Learning

MLDNN-03 Introduction to Deep Learning

Deep Learning (DL)

  • Deep Learning is a subset of Machine Learning that uses artificial neural networks with multiple layers to learn from data.
  • It is inspired by the structure of the human brain and is capable of learning complex patterns automatically.
  • It works especially well with large datasets and complex problems.

How Deep Learning Works

  • Deep learning models consist of multiple layers: input layer, hidden layers, and output layer.
  • The input layer receives raw data such as images, text, or audio.
  • Hidden layers process the data and extract important features step by step.
  • The output layer gives the final prediction or result.
  • Each layer learns more complex patterns than the previous one.

Key Features of Deep Learning

  • Uses multi-layer neural networks (deep neural networks).
  • Automatically extracts features from raw data.
  • Handles large and complex datasets efficiently.
  • Provides high accuracy for tasks like image and speech recognition.

Applications of Deep Learning in Real-World Domains

1. Healthcare

  • Deep Learning is widely used in medical diagnosis and treatment planning.
  • It helps in analyzing medical images such as X-rays, MRIs, and CT scans to detect diseases.
  • It can identify patterns that are difficult for humans to detect.
  • Example: Detecting tumors, predicting diseases like cancer, and assisting doctors in diagnosis.

2. Finance

  • Deep Learning is used in the finance sector for fraud detection and risk analysis.
  • It analyzes large amounts of transaction data to identify unusual patterns.
  • It helps in making better investment decisions and predicting market trends.
  • Example: Detecting fraudulent credit card transactions and predicting stock price movements.

3. Computer Vision

  • Computer Vision is a field where machines interpret and understand visual data like images and videos.
  • Deep Learning helps in recognizing objects, faces, and scenes accurately.
  • It is used in security systems, automation, and image processing.
  • Example: Face recognition systems, self-driving cars, and object detection in images.

4. Natural Language Processing (NLP)

  • Deep Learning helps machines understand and process human language.
  • It is used in translation, chatbots, and text analysis.
  • Example: Language translation apps and virtual assistants.

5. Speech Recognition

  • Deep Learning converts speech into text and understands voice commands.
  • Example: Voice assistants like Siri and Google Assistant.

Artificial Neural Network (ANN)

  • An Artificial Neural Network (ANN) is a computational model inspired by the human brain.
  • It consists of interconnected nodes (neurons) that process information.
  • ANN is used to recognize patterns, learn from data, and make predictions.

Basic Structure of ANN

1. Input Layer

  • This layer receives the input data from the outside world.
  • Each neuron represents a feature of the data.

2. Hidden Layer(s)

  • These layers process the input data.
  • They perform calculations and extract important patterns.
  • There can be one or multiple hidden layers in a network.

3. Output Layer

  • This layer produces the final result or prediction.
  • It gives the output in the required format (e.g., class label or value).

Working of ANN

  • Input data is passed to the input layer.
  • Each neuron multiplies the input by a weight and adds a bias.
  • The result is passed through an activation function.
  • The processed data moves through hidden layers.
  • Finally, the output layer produces the result.
  • The network learns by adjusting weights and biases to reduce error.

Example

  • In image recognition, the input layer receives pixel values.
  • Hidden layers detect features like edges and shapes.
  • The output layer identifies the object (e.g., cat or dog).

Components of a Neuron in an ANN

1. Inputs (x₁, x₂, …, xₙ)

  • These are the input values or features provided to the neuron.
  • They represent the data that the model receives.

2. Weights (w₁, w₂, …, wₙ)

  • Each input is multiplied by a weight that represents its importance.
  • Higher weight means greater influence on the output.

3. Bias (b)

  • Bias is an additional value added to the weighted sum.
  • It helps in shifting the output and improving model performance.

4. Weighted Sum (z)

  • It is the sum of all inputs multiplied by their weights plus bias: z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b

5. Activation Function

  • It decides whether the neuron should be activated or not.
  • It converts the weighted sum into the final output.

6. Output (y)

  • The final result produced by the neuron after applying the activation function.

Perceptron Model

  • The Perceptron is the simplest form of an Artificial Neural Network.
  • It is a single-layer neural network used for binary classification (Yes/No, 0/1).
  • It takes input values, processes them, and produces an output based on a decision rule.

Structure of Perceptron

  • It consists of input values, weights, bias, and an activation function.
  • Each input is assigned a weight that represents its importance.
  • A bias is added to adjust the final output.
  • The activation function decides the final result.

Mathematical Representation

The perceptron calculates a weighted sum of inputs:

  • z = w1x1 + w2x2 + w3x3 + ... + b The output is then passed through an activation function:
  • y = f(z) For a simple step function:
    • If z ≥ 0 → Output = 1
    • If z < 0 → Output = 0

Explanation of Terms

  • x₁, x₂, …, xₙ (Inputs): These are the input features given to the model.
  • w₁, w₂, …, wₙ (Weights): These represent the importance of each input.
  • b (Bias): A constant value added to shift the decision boundary.
  • z (Weighted Sum): The combined value of inputs and weights plus bias.
  • f(z) (Activation Function): Determines the final output based on z.
  • y (Output): The final prediction (0 or 1).

Working of Perceptron

  • Input values are multiplied by their respective weights.
  • All values are added together along with bias.
  • The result (z) is passed to the activation function.
  • The activation function produces the final output (0 or 1).
  • The model learns by adjusting weights when predictions are incorrect.

Example

  • Predict whether a student will pass or fail:
    • Inputs: study hours, attendance
    • Output: Pass (1) or Fail (0)
  • The perceptron combines inputs and makes a decision based on a threshold.

Activation Functions

  • An activation function is a mathematical function used in neural networks to decide whether a neuron should be activated or not.
  • It takes the weighted sum of inputs (including bias) and converts it into an output signal.
  • It controls how much information should pass to the next layer in the network.
  • Activation functions help the network learn and represent complex patterns in data.

Why it is Important

  • It introduces non-linearity, which allows neural networks to learn complex and real-world relationships.
  • Without activation functions, the network would behave like a simple linear model and cannot solve complex problems.
  • It helps neurons decide whether to activate based on input values.
  • It improves the learning ability and accuracy of the model.
  • It enables deep neural networks to learn multiple levels of abstraction (from simple to complex features).
  • It allows the network to handle tasks like image recognition, speech processing, and classification effectively.

Example

  • In image recognition, activation functions help detect edges, shapes, and objects step by step.
  • Without activation functions, the network would not be able to distinguish between complex patterns in images.

1. Sigmoid Function

  • The sigmoid function gives output values between 0 and 1.
  • It is mainly used for binary classification problems.
  • Formula: f(x) = 1 / (1 + e⁻ˣ)
  • Example: Predicting whether an email is spam or not.

2. Tanh Function

  • The tanh function gives output values between -1 and +1.
  • It is zero-centered, which helps in better learning compared to sigmoid.
  • Example: Sentiment analysis (positive, negative, neutral).

3. ReLU (Rectified Linear Unit)

  • ReLU outputs 0 if the input is negative, otherwise it returns the input value.
  • It is simple, fast, and widely used in deep learning models.
  • Formula: f(x) = max(0, x)
  • Example: Used in hidden layers of neural networks.

4. Softmax Function

  • Softmax converts values into probabilities that sum to 1.
  • It is used in multi-class classification problems.
  • Example: Classifying an image as cat, dog, or bird.

Softmax Activation Function & its Role in Multi-Class Classification

  • Softmax is an activation function used in neural networks for multi-class classification problems.
  • It converts raw output values (logits) into probabilities.
  • The output values are in the range of 0 to 1, and their total sum is always equal to 1.

How Softmax Works

  • It takes multiple input values and applies an exponential function to each.
  • Then, each value is divided by the sum of all exponential values.
  • This produces a probability distribution over all classes.

Role in Multi-Class Classification

  • Softmax helps the model decide which class is most likely among multiple options.
  • Each output neuron represents one class.
  • The class with the highest probability is selected as the final prediction.

Example

  • Suppose a model classifies an image as:
    • Cat → 0.7
    • Dog → 0.2
    • Rabbit → 0.1
  • The model predicts "Cat" because it has the highest probability.

Optimization in Deep Learning

  • Optimization is the process of improving a deep learning model by reducing its error (loss).
  • It involves adjusting the model’s weights and biases to make better predictions.

Why Optimization is Necessary

  • Initially, the model makes incorrect predictions due to random weights.
  • Optimization helps the model learn from errors and improve accuracy step by step.
  • It minimizes the loss function, which measures the difference between predicted and actual values.
  • Without optimization, the model cannot learn or improve its performance.

How it Works

  • The model makes predictions and calculates the loss.
  • An optimization algorithm (like gradient descent) updates the weights.
  • This process is repeated until the loss is minimized.

Example

  • It is like improving your exam score by correcting mistakes after each test.
  • With each correction, your performance becomes better.

Gradient Descent

  • Gradient Descent is an optimization algorithm used in Machine Learning and Deep Learning to minimize the error (loss) of a model.
  • It works by adjusting the model’s weights step by step so that the predictions become closer to the actual values.
  • The main goal is to find the best values of parameters that give minimum loss.

Basic Idea

  • Imagine the error as a curve (loss function).
  • Gradient Descent tries to find the lowest point on this curve.
  • It moves step by step in the direction where the error decreases the most.

How Gradient Descent Works

1. Initialize Parameters

  • The model starts with random values of weights.

2. Calculate Loss

  • The model makes predictions and compares them with actual values.
  • The difference between them is called loss or error.

3. Compute Gradient

  • The gradient tells how much and in which direction the weights should change to reduce error.

4. Update Weights

  • Weights are updated in the opposite direction of the gradient.
  • This step is controlled by a parameter called learning rate.

5. Repeat Process

  • Steps are repeated until the loss becomes very small or stops decreasing.

Simple Example

  • Imagine you are trying to reach the bottom of a hill in the dark.
  • You cannot see the whole path, so you take small steps downward.
  • At each step, you check the slope and move in the direction where the ground goes down.
  • Eventually, you reach the lowest point.
  • Similarly, Gradient Descent adjusts weights step by step to reach minimum error.

Key Points

  • Learning rate controls how big each step is.
  • Too large learning rate may skip the minimum point.
  • Too small learning rate makes learning slow.

Differences between AI, ML, and DL

AspectAI (Artificial Intelligence)ML (Machine Learning)DL (Deep Learning)
Nature of TechnologyGeneral concept of creating intelligent systemsTechnique enabling machines to learn from dataAdvanced technique based on neural networks
Working StyleUses rules, logic, and learning methodsUses statistical algorithms to learn patternsUses layered neural networks to learn automatically
Feature HandlingMay rely on predefined rulesRequires manual feature selectionAutomatically extracts features from raw data
Data Type HandlingWorks with simple or structured dataMainly works with structured dataHandles complex unstructured data (images, audio, text)
Performance LevelModerate, depends on system designGood with proper dataHigh for complex, large-scale problems

DL in Image Recognition

  • Deep Learning uses neural networks, especially Convolutional Neural Networks (CNNs), to analyze and understand images.
  • The model processes images through multiple layers to extract features step by step.
  • In the initial layers, simple features like edges, lines, and colors are detected.
  • In deeper layers, complex patterns such as shapes, textures, and objects are identified.
  • The final layer classifies the image into categories based on learned patterns.
  • This approach allows the system to automatically learn features without manual programming.
  • Example: Face recognition in smartphones, object detection in images, and handwriting recognition.

DL in Natural Language Processing (NLP)

  • Deep Learning is used to process and understand human language in both text and speech form.
  • It uses models like Recurrent Neural Networks (RNNs) and Transformers to capture context and meaning.
  • The system learns patterns in language such as grammar, sentence structure, and word relationships.
  • It can perform tasks like translation, text generation, sentiment analysis, and question answering.
  • Deep Learning enables machines to understand context rather than just individual words.
  • Example: Language translation (English to Hindi), chatbots, virtual assistants, and text summarization.

Differentiate between Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent.

Batch Gradient Descent

  • Batch Gradient Descent uses the entire dataset to compute the gradient in each iteration.
  • It calculates the average error over all training examples before updating weights.

Advantages

  • Provides stable and accurate updates.
  • Converges smoothly towards the minimum.

Disadvantages

  • Slow for large datasets.
  • Requires high memory and computation power.

Stochastic Gradient Descent (SGD)

  • Stochastic Gradient Descent updates weights using one data point at a time.
  • It performs frequent updates after each training example.

Advantages

  • Faster and requires less memory.
  • Can escape local minima due to random updates.

Disadvantages

  • Updates are noisy and less stable.
  • May not converge smoothly to the exact minimum.

Mini-Batch Gradient Descent

  • Mini-Batch Gradient Descent uses a small subset (batch) of data to update weights.
  • It is a combination of Batch and Stochastic Gradient Descent.

Advantages

  • Faster than batch and more stable than SGD.
  • Efficient for large datasets and widely used in practice.

Disadvantages

  • Requires choosing appropriate batch size.
  • Performance depends on batch size selection.
AspectBatch Gradient DescentStochastic Gradient Descent (SGD)Mini-Batch Gradient Descent
ApproachUses entire dataset per updateUses one data point per updateUses small subset (batch) per update
Gradient CalculationAverage error over all training examplesError from single exampleAverage error over batch
SpeedSlowFastModerate and efficient
Memory UsageHighLowModerate
StabilityHigh (smooth convergence)Low (noisy updates)Balanced stability
ConvergenceSmooth, accurateFluctuates, less preciseFaster and relatively stable
Handling Large DataNot suitableSuitableHighly suitable
AdvantagesStable, accurate updatesFast, low memory, escapes local minimaBalanced speed and stability, widely used
DisadvantagesSlow, computationally expensiveNoisy, unstable convergenceDepends on batch size selection
Data per UpdateEntire datasetOne data pointSmall batch
Typical UsageSmall datasetsOnline learningDeep learning (most common)

A neuron receives inputs 0.5 and 0.8 with weights 0.4 and 0.6 respectively. Calculate the output using Sigmoid activation (assume bias = 0).

Step 1: Calculate Weighted Sum (z)

  • Formula: z = (w₁x₁ + w₂x₂ + b)

  • Given:

    • x₁ = 0.5, w₁ = 0.4
    • x₂ = 0.8, w₂ = 0.6
    • b = 0
  • Calculation:

    • z = (0.4 × 0.5) + (0.6 × 0.8) + 0
    • z = 0.2 + 0.48
    • z = 0.68

Step 2: Apply Sigmoid Function

  • Formula: σ(z) = 1 / (1 + e⁻ᶻ)
  • Substitute z = 0.68: σ(0.68) = 1 / (1 + e⁻⁰·⁶
  • Approximate value: e⁻⁰·⁶⁸ ≈ 0.506
  • Final calculation:
    • σ(0.68) = 1 / (1 + 0.506)
    • σ(0.68) ≈ 1 / 1.506
    • σ(0.68) ≈ 0.664

Final Output: The output of the neuron using Sigmoid activation is approximately 0.66.


Made By SOU Student for SOU Students