📖 Deep Learning
🧠 What is Deep Learning?
🔑 Relationship to Neural Networks
🧱 Depth vs. Width
🧠 Representation Learning
🧪 Setup: Complex Dataset
🧬 Example: CIFAR-10 or Similar
📊 Class Imbalance / Real-world Noise
🧮 Feature Complexity vs. Model Depth
🧱 Deep Network Architecture
🏗️ Stacking Layers: Concept and Challenges
🔥 Activation Functions
🧠 Role of Depth in Feature Hierarchy
🎯 Loss Surfaces and Optimization
🌄 Non-convex Landscapes
🌀 Vanishing/Exploding Gradients
⚙️ Weight Initialization Strategies
🧰 Training Deep Networks
🧮 Batch Training and Mini-Batch SGD
🛠️ Gradient Clipping
🚀 Optimizers
🧠 Advanced Training Tricks
⏱️ Learning Rate Scheduling
🧊 Early Stopping
🎲 Dropout in Deep Models
🧪 Data Augmentation
📚 Transfer Learning & Pretraining
🔄 Why Pretrained Models Work
🏗️ Fine-tuning vs. Feature Extraction
🌍 Common Pretrained Networks
📈 Scaling Up
🧮 Depth vs. Performance Tradeoffs
🧠 Hardware Considerations
⚖️ Batch Norm vs. Gradient Flow
🔚 Closing Notes
🧠 Summary of Key Concepts
⚠️ Common Pitfalls in Deep Learning
🚀 What's Next: Transformers & Attention
🧠 What is Deep Learning?
🔑 Relationship to Neural Networks
🧱 Depth vs. Width
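As a concrete comparison, the sketch below builds a deep-narrow and a shallow-wide MLP and counts their parameters. PyTorch and the specific layer sizes are illustrative choices, not prescribed here.

```python
import torch.nn as nn

def mlp(in_dim, hidden, depth, out_dim):
    """An MLP with `depth` hidden layers, each of width `hidden`."""
    layers = [nn.Linear(in_dim, hidden), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(hidden, hidden), nn.ReLU()]
    layers.append(nn.Linear(hidden, out_dim))
    return nn.Sequential(*layers)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

deep_narrow  = mlp(in_dim=3072, hidden=128, depth=8, out_dim=10)
shallow_wide = mlp(in_dim=3072, hidden=1024, depth=1, out_dim=10)
print(n_params(deep_narrow), n_params(shallow_wide))
```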
🧠 Representation Learning
🧪 Setup: Complex Dataset
🧬 Example: CIFAR-10 or Similar
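A minimal loading sketch using torchvision. The `./data` path and batch size are placeholder choices; the per-channel statistics are the commonly quoted CIFAR-10 values.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Commonly quoted per-channel mean/std for CIFAR-10.
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465),
                                 (0.2470, 0.2435, 0.2616))
to_tensor = transforms.Compose([transforms.ToTensor(), normalize])

train_set = datasets.CIFAR10(root="./data", train=True,
                             download=True, transform=to_tensor)
test_set = datasets.CIFAR10(root="./data", train=False,
                            download=True, transform=to_tensor)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([128, 3, 32, 32])
```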
📊 Class Imbalance / Real-world Noise
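One standard mitigation is to weight the loss by inverse class frequency, so rare classes contribute more per example. A sketch with made-up class counts:

```python
import torch
import torch.nn as nn

# Hypothetical skewed class counts for a 4-class problem.
counts = torch.tensor([5000., 5000., 500., 50.])
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
loss = criterion(logits, targets)   # rare classes now weigh more in the loss
```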
🧮 Feature Complexity vs. Model Depth
🧱 Deep Network Architecture
🏗️ Stacking Layers: Concept and Challenges
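A minimal sketch of a plain layer stack in PyTorch, sized for 32x32 inputs; the channel widths are illustrative.

```python
import torch
import torch.nn as nn

# A plain stack of conv blocks; each block halves the spatial size via pooling.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 8 -> 4
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 10),
)

x = torch.randn(1, 3, 32, 32)   # one CIFAR-sized image
print(model(x).shape)           # torch.Size([1, 10])
```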
🔥 Activation Functions
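A quick comparison of common activations on the same inputs, using PyTorch's functional API:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
print(F.relu(x))        # zeroes negatives; cheap, no saturation on the positive side
print(torch.sigmoid(x)) # squashes to (0, 1); saturates at both ends, shrinking gradients
print(torch.tanh(x))    # zero-centered but still saturating
print(F.gelu(x))        # smooth ReLU variant, common in recent architectures
```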
🧠 Role of Depth in Feature Hierarchy
🎯 Loss Surfaces and Optimization
🌄 Non-convex Landscapes
🌀 Vanishing/Exploding Gradients
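The effect is easy to observe directly. The sketch below (depth, width, and batch size are arbitrary) backpropagates through a 20-layer MLP and reports the gradient norm at the first layer:

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(activation):
    """Gradient norm at layer 0 of a 20-layer MLP after one backward pass."""
    layers = []
    for _ in range(20):
        layers += [nn.Linear(64, 64), activation()]
    net = nn.Sequential(*layers)
    net(torch.randn(8, 64)).sum().backward()
    return net[0].weight.grad.norm().item()

print(first_layer_grad_norm(nn.Sigmoid))  # typically vanishingly small
print(first_layer_grad_norm(nn.ReLU))     # typically orders of magnitude larger
```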
⚙️ Weight Initialization Strategies
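A minimal sketch of applying He (Kaiming) initialization, which pairs well with ReLU; Xavier (Glorot) is the usual counterpart for tanh/sigmoid:

```python
import torch.nn as nn

def init_weights(module):
    # He (Kaiming) init for ReLU nets; swap in nn.init.xavier_uniform_
    # for tanh/sigmoid networks.
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)   # applies init_weights to every submodule
```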
🧰 Training Deep Networks
🧮 Batch Training and Mini-Batch SGD
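A skeleton training loop over mini-batches, with random tensors standing in for a real dataset:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in data; in practice this would be a real dataset loader.
data = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=64, shuffle=True)   # reshuffled every epoch

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:            # one gradient step per mini-batch
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```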
🛠️ Gradient Clipping
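A minimal sketch showing where clipping sits in the loop: after `backward()`, before `step()`. The `max_norm=1.0` threshold is an illustrative choice.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(4, 16), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Rescale gradients so their global L2 norm is at most max_norm, then step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```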
🚀 Optimizers
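Instantiating three common choices; the hyperparameters shown are typical starting points, not tuned values:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)

sgd   = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
adam  = torch.optim.Adam(model.parameters(), lr=1e-3)
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)  # decoupled weight decay
```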
🧠 Advanced Training Tricks
⏱️ Learning Rate Scheduling
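A sketch of cosine annealing over 50 epochs; the horizon is arbitrary, and `StepLR` or `OneCycleLR` are common alternatives:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

for epoch in range(50):
    optimizer.step()    # stands in for a full epoch of mini-batch updates
    scheduler.step()    # decay the learning rate once per epoch

print(scheduler.get_last_lr())  # near zero after the full cosine cycle
```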
🧊 Early Stopping
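Early stopping is usually hand-rolled. A minimal patience-based helper; the `EarlyStopper` name and the mock validation curve are invented for illustration:

```python
class EarlyStopper:
    """Stop when validation loss hasn't improved for `patience` epochs."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss        # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1        # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=3)
for val_loss in [0.9, 0.7, 0.6, 0.61, 0.62, 0.63]:   # mock validation curve
    if stopper.should_stop(val_loss):
        print("stopping early")
        break
```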
🎲 Dropout in Deep Models
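A sketch showing dropout's train/eval asymmetry, a frequent source of bugs:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),
    nn.Dropout(p=0.5),   # zeroes half the activations (and rescales) during training
    nn.Linear(128, 10),
)
x = torch.randn(1, 256)

model.train()
print(torch.allclose(model(x), model(x)))   # False: dropout masks differ per forward pass

model.eval()
with torch.no_grad():
    print(torch.allclose(model(x), model(x)))  # True: dropout is a no-op at eval time
```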
🧪 Data Augmentation
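A typical train-time pipeline with torchvision transforms; the padding and crop size match CIFAR-style 32x32 images. The test pipeline stays deterministic.

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # jitter the image position
    transforms.RandomHorizontalFlip(),      # mirror half the images
    transforms.ToTensor(),
])
test_tf = transforms.Compose([transforms.ToTensor()])  # no randomness at test time
```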
📚 Transfer Learning & Pretraining
🔄 Why Pretrained Models Work
🏗️ Fine-tuning vs. Feature Extraction
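A sketch of both regimes on ResNet-18. The `weights=` API assumes torchvision >= 0.13 (older versions use `pretrained=True`), and the 10-class head is illustrative.

```python
import torch.nn as nn
from torchvision import models

# Feature extraction: freeze the pretrained backbone, train only a fresh head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # new head; requires_grad=True by default

# Fine-tuning instead: skip the freeze and train everything,
# usually with a smaller learning rate for the pretrained layers.
```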
🌍 Common Pretrained Networks
📈 Scaling Up
🧮 Depth vs. Performance Tradeoffs
🧠 Hardware Considerations
⚖️ Batch Norm vs. Gradient Flow
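A sketch of the usual Conv -> BatchNorm -> ReLU placement. Normalizing pre-activations keeps their scale stable across depth, which helps gradients flow through deep stacks.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # BN supplies the shift
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
print(block(torch.randn(8, 64, 16, 16)).shape)  # torch.Size([8, 64, 16, 16])
```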
🔚 Closing Notes
🧠 Summary of Key Concepts
⚠️ Common Pitfalls in Deep Learning
🚀 What's Next: Transformers & Attention