Deep Learning and Neural Networks

Regression - Gradient Descent (Batch, Mini-Batch, Stochastic), Loss, RMSProp, Adam (Lesson 209)

Objective

  • Explore how the linear regression algorithm trains a model and understand the role of gradient descent and loss functions.

Concepts

  1. Linear Regression:
     • Basic algorithm for predictive modeling.
     • Involves finding the best weights for input features to predict an output.

  2. Loss Function (Mean Squared Error):
     • Measures the difference between predicted values and actual values.
     • Lower loss indicates better predictions.

  3. Gradient Descent:
     • A systematic approach to finding optimal weights.
     • Uses the gradient of the loss function to adjust weights.
     • Moves weights in the direction that reduces the loss.

  4. Learning Rate:
     • Controls the magnitude of weight adjustments.
     • Low learning rate: many small steps to reach the optimal weight.
     • High learning rate: may overshoot the optimal weight.

  5. Variants of Gradient Descent:
     • Batch Gradient Descent: adjusts weights using all training examples in each iteration.
     • Stochastic Gradient Descent (SGD): adjusts weights after each individual training example.
     • Mini-Batch Gradient Descent: combines batch and SGD, adjusting weights using small subsets of the training data.

  6. Optimization Techniques:
     • Adaptive learning rate algorithms (e.g., RMSProp, Adagrad, Adam) improve convergence.
     • Momentum reduces oscillations and speeds up convergence.
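The three variants above differ only in how many examples feed each gradient step. A minimal plain-Python sketch (the toy dataset, learning rate, and epoch count are illustrative choices, not prescribed values):

```python
import random

# Toy data: y = 3x exactly, so the optimal weight is 3.0.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

def grad(w, batch):
    """Gradient of MSE (1/n) * sum((w*x - y)^2) with respect to w over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(batch_size, lr=0.02, epochs=200, seed=0):
    random.seed(seed)
    w = 0.0                                   # start from an arbitrary weight
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            w -= lr * grad(w, data[i:i + batch_size])   # step against the gradient
    return w

w_batch = train(batch_size=len(data))   # batch: all examples per step
w_sgd   = train(batch_size=1)           # stochastic: one example per step
w_mini  = train(batch_size=2)           # mini-batch: small subsets per step
```

All three converge to the same optimum here; in practice they trade off per-step cost against the noisiness of each update.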

Process

  1. Training Linear Regression:
     • Initialize random weights.
     • Calculate the loss using the loss function.
     • Adjust weights based on the gradient of the loss function.

  2. Gradient Descent in Action:
     • Plot loss versus weight.
     • Start with a random weight.
     • Move the weight in the direction that reduces the loss (opposite the gradient).
     • Continue until reaching an optimal weight.

  3. Dealing with Multiple Features:
     • The loss plot becomes multidimensional.
     • Adjust the weights of all features simultaneously.

Classification - Gradient Descent, Loss Function (Lesson 210)

Objective

  • Explore logistic regression as a classification algorithm and understand its functioning in predicting probabilities.

Concepts

  1. Logistic Regression Overview:
     • A classification algorithm, despite the name suggesting regression.
     • Similar to linear regression but predicts probabilities (0 to 1) using a sigmoid function.

  2. Sigmoid Function:
     • Key component of logistic regression.
     • Converts any input to a value between 0 and 1, ideal for probability predictions.

  3. Model Training:
     • Input features (X) with corresponding weights (W).
     • The model predicts the probability of belonging to the positive class.
     • Cutoff generally at 0.5 for classifying into positive or negative classes.

  4. Logistic Loss Function:
     • Measures the quality of predictions.
     • Composed of two parts: one for positive samples and one for negative samples.
     • Loss is high for misclassifications and low for accurate predictions.

  5. Gradient Descent Optimization:
     • Used to find the optimal weights that minimize the logistic loss.
     • The process involves adjusting weights based on the loss gradient.
     • Produces a bowl-shaped (convex) loss curve from which the gradient and optimal weights are determined.
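The sigmoid and the two-part logistic loss can be written directly from their definitions (plain-Python sketch; the probe values below are only for illustration):

```python
import math

def sigmoid(z):
    """Squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(p, y):
    """Two parts: -log(p) for positive samples (y=1), -log(1-p) for negative (y=0)."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

# A confident correct prediction costs little; a confident wrong one costs a lot.
low  = logistic_loss(0.9, 1)   # positive sample, high predicted probability
high = logistic_loss(0.1, 1)   # positive sample, low predicted probability
```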

Process

  1. Applying the Sigmoid Function:
     • Use the linear model's output as input to the sigmoid function.
     • Predicts the probability of a sample belonging to the positive class.

  2. Setting a Cutoff for Classification:
     • The default cutoff is 0.5.
     • Adjusting weights changes the criteria for classification.

  3. Computing Logistic Loss:
     • Calculate the logistic loss for a range of predicted probabilities.
     • Compare against the actual labels to evaluate the loss.

  4. Weight Optimization with Gradient Descent:
     • Start with random weights.
     • Adjust weights iteratively to minimize the logistic loss.
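The steps above combine into one training loop. A minimal sketch for one feature (the toy dataset, learning rate, and iteration count are illustrative; the gradient of the logistic loss works out to (p - y) * x):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data: negatives at small x, positives at large x.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0
lr = 0.5
for _ in range(1000):
    gw = gb = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)          # predicted probability of the positive class
        gw += (p - y) * x / len(data)   # logistic-loss gradient w.r.t. the weight
        gb += (p - y) / len(data)       # ... and w.r.t. the bias
    w -= lr * gw
    b -= lr * gb

def predict(x):
    """Classify with the default 0.5 cutoff."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```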

Neural Networks and Deep Learning (Lesson 211)

Objective

  • Understand the structure and functioning of neural networks in deep learning.

Neural Network Structure

  1. Basic Architecture:
     • Comprises an input layer, hidden layers, and an output layer.
     • Appears similar to logistic regression but extends it with multiple neurons in hidden layers.

  2. Neurons and Activation Functions:
     • Neurons generate new features by blending existing features with different weights.
     • Activation functions introduce non-linearity, improving the handling of complex datasets.

  3. Common Activation Functions:
     • Sigmoid: Converts input to a range between 0 and 1.
     • Tanh (Hyperbolic Tangent): Output ranges from -1 to 1.
     • ReLU (Rectified Linear Unit): Outputs 0 for negative inputs and the raw input for positive values.
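The three activation functions above, written out directly from their definitions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # output in (0, 1)

def tanh(z):
    return math.tanh(z)                 # output in (-1, 1)

def relu(z):
    return max(0.0, z)                  # 0 for negatives, the raw input for positives
```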

Network Types and Applications

  1. General-Purpose Networks:
     • Fully connected; each neuron in a layer is connected to all neurons in the next layer.
     • Useful for diverse applications but may lead to overfitting.

  2. Convolutional Neural Networks (CNNs):
     • Specialized for image and video analysis.
     • Focus on the patterns around each pixel, not just the pixel itself.

  3. Recurrent Neural Networks (RNNs):
     • Ideal for time series forecasting and natural language processing.
     • Capable of remembering historical data, crucial for sequence-dependent predictions.

Key Points

  • Benefits of Neural Networks:
     • Can fit nonlinear datasets effectively.
     • Automatically generate new feature combinations.
     • Highly scalable and adaptable to various complex applications.

  • Challenges:
     • Complexity in tuning and risk of overfitting.
     • Require extensive computation, especially for large networks.

Labs

  1. Regression with SKLearn Neural Network (Lesson 213)
  2. Regression with Keras and TensorFlow (Lesson 214)
  3. Binary Classification - Customer Churn Prediction (Lessons 216 & 217)
  4. Multiclass Classification - Iris (Lesson 218)

Convolutional Neural Network (CNN) (Lesson 230)

How CNNs Work

  1. Convolution Operation:
     • CNNs break images into smaller squares (patches) using a sliding window.
     • For instance, a 4x4 filter slides across the image, capturing each 4x4 patch.
     • Each neuron receives a patch rather than an individual pixel, preserving spatial context.

  2. Feature Learning:
     • Neurons in CNNs learn to differentiate between classes of image features (e.g., cars vs. faces).
     • They identify the dominant characteristics specific to each image class.
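The sliding-window step can be sketched in plain Python. This only shows patch extraction, not the learned filter weights that a real convolutional layer applies to each patch (the 6x6 image below is illustrative):

```python
def extract_patches(image, size, stride=1):
    """Slide a size x size window over a 2-D image (list of rows),
    returning every patch with its pixels' spatial layout intact."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches

# A 6x6 image yields (6 - 4 + 1)^2 = 9 overlapping 4x4 patches at stride 1.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
patches = extract_patches(image, size=4)
```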

Advantages of CNNs

  • Preservation of Spatial Relationships: By analyzing patches rather than individual pixels, CNNs maintain the spatial hierarchy of pixels, crucial for understanding image content.
  • Efficiency: CNN models are generally smaller and more efficient compared to deep, general-purpose neural networks for image classification.
  • Improved Performance: CNNs typically outperform traditional networks in image-related tasks due to their ability to capture and learn from spatial information in images.

Recurrent Neural Networks (RNN), LSTM (Lesson 231)

Key Characteristics of RNNs

  • Sequential Processing: Unlike general-purpose neural networks that process single inputs, RNNs handle sequences of inputs (e.g., series of words, stock prices over time).
  • Memory Mechanism: RNNs maintain an internal state to remember past information, crucial for sequential decision-making.
  • Feedback Loops: These loops allow RNNs to update and maintain their internal state based on new inputs and previously learned information.
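The feedback loop can be reduced to a single recurrence: the new internal state blends the current input with the previous state. A scalar sketch (the weights 0.5 and 0.9 and the input sequence are illustrative, not trained values):

```python
import math

def rnn_step(x, h, w_in=0.5, w_rec=0.9):
    """One feedback-loop update: new state = squash(input term + recurrent term)."""
    return math.tanh(w_in * x + w_rec * h)

# Process a sequence; the final state still carries memory of the first input
# even though the later inputs are zero.
h = 0.0
for x in [1.0, 0.0, 0.0]:
    h = rnn_step(x, h)
```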

LSTM Networks

  • Long Short-Term Memory: LSTMs are a special kind of RNN capable of learning long-term dependencies.
  • Selective Memory: They excel in remembering important past information and forgetting irrelevant details, making them effective for complex sequential tasks.
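The selective remembering and forgetting is implemented with gates. A didactic scalar sketch (real LSTM cells use learned weight matrices and vectors; the weights below are arbitrary, not trained):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One LSTM update with scalar gates. W maps each gate name
    to an (input weight, recurrent weight) pair."""
    f = sigmoid(W["f"][0] * x + W["f"][1] * h)    # forget gate: how much old memory to keep
    i = sigmoid(W["i"][0] * x + W["i"][1] * h)    # input gate: how much new info to admit
    g = math.tanh(W["g"][0] * x + W["g"][1] * h)  # candidate memory content
    o = sigmoid(W["o"][0] * x + W["o"][1] * h)    # output gate: how much memory to expose
    c = f * c + i * g                             # selectively forget old, add new
    h = o * math.tanh(c)                          # visible state derived from the memory
    return h, c

# Run a short sequence through the cell with illustrative weights.
W = {k: (1.0, 0.5) for k in ("f", "i", "g", "o")}
h = c = 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, W)
```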

Generative Adversarial Networks (GANs) (Lesson 232)

Core Components

  • Two-Player Game Setup: GANs consist of two key players – the Discriminator and the Generator.
  • Discriminator Network: Trained to distinguish real images from fakes, assigning high probability to real ones.
  • Generator Network: Produces synthetic data, such as fake images.
  • Learning Process: The discriminator learns to assign low probabilities to these fake images, while the generator learns to make them harder to detect.

Game Dynamics

  • Concurrent Optimization: Both networks are trained simultaneously; the generator tries to create images that the discriminator will perceive as real.
  • Stable State Goal: Achieving a state where the generator produces perfectly realistic images indistinguishable from actual data.

Applications

  • Synthetic Data Generation: Creating realistic synthetic images for training other models.
  • Diverse Object Creation: Capable of producing a wide array of objects.
  • Practical Use Case: Apple's use of GANs to merge text sources with smaller trajectory datasets, creating new trajectories to expand their dataset.