Master few-shot learning and meta-learning. Complete guide to training models to learn from few examples, rapid adaptation, and learning to learn.
Introduction: Few-Shot Learning
Humans learn remarkably fast.
See a dog you’ve never seen before: “That’s a dog.”
See a new animal: “That’s probably not a dog.”
See 5 examples of a new animal: “I understand this species.”
Learn from minimal data.
Meanwhile, deep learning requires thousands of examples per category.
Few-shot learning asks: Can AI learn like humans, from minimal examples?
This is not just academic—practically important:
- New products: No historical data yet
- Rare events: Few examples exist
- Cost: Labeling expensive
- Speed: Need to adapt quickly
- Personalization: Per-user adaptation from few interactions
This guide covers few-shot learning: from fundamentals to meta-learning to practical implementations.
Few-Shot Learning Fundamentals
Definition
Learn from very few labeled examples (usually 1-5 per class).
Standard learning:
1000 cat images, 1000 dog images → Train classifier
Deploy: Works well
Few-shot learning:
1 cat image, 1 dog image → Train classifier
Deploy: Should work well
Much harder!
K-Shot, N-Way
K-shot: K labeled examples per class
N-way: N different classes
Example: 5-way 1-shot
5 different classes
1 labeled example per class
Total: 5 labeled images
Task: Classify new images into these 5 classes
Standard: 5-way 5-shot (25 labeled images)
Challenging: 5-way 1-shot (5 labeled images)
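In practice, N-way K-shot tasks are constructed as sampled "episodes" from a larger labeled dataset. A minimal sketch of episode sampling in NumPy (the function name and signature are illustrative, not from any library):

```python
import numpy as np

def sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample one N-way K-shot episode: pick n_way classes, then
    k_shot support and n_query query examples from each class."""
    rng = rng if rng is not None else np.random.default_rng(0)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])
        support.extend(idx[:k_shot])                  # K support examples
        query.extend(idx[k_shot:k_shot + n_query])    # disjoint query examples
    return np.array(support), np.array(query), classes
```

A 5-way 1-shot episode thus yields 5 support indices and, with the usual protocol, 75 query indices.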
The Challenge
With so few examples:
- Easy to overfit (only 5 examples to fit)
- Can’t afford long training
- Must leverage prior knowledge
- Must generalize from minimal data
The Few-Shot Problem
Why It’s Hard
Insufficient Data:
Standard: 1000 examples → Learn pattern
Few-shot: 1 example → Memorize or generalize?
Learning vs Memorization:
With 1000 examples: Model learns the underlying pattern
With 1 example: Model can only memorize it
Problem: A single example underdetermines the solution
Generalization:
From 1 example of "dog," generalize to all dogs?
Requires strong prior knowledge
Model must somehow know what makes something a dog
Meta-Learning Approaches
Core idea: Learn how to learn.
Instead of: Train model on task
Do: Train model on many tasks, learning to adapt quickly
Train on Many Tasks
Tasks T1, T2, T3, ..., T100
Each task: Few-shot classification problem
Meta-train: Learn from T1-T50
Meta-test: Evaluate on T51-T100 (held out)
Model learns: How to quickly adapt to new task
Learning to Learn
Model learns:
- What features matter for classification
- How to extract useful information from few examples
- How to adapt parameters given few examples
Task 1 (meta-train): 5 examples of cats/dogs
Model learns: "Color, shape, size matter"
Task 2 (meta-train): 5 examples of birds/planes
Model learns: "Wings, movement, altitude matter"
Task 3 (meta-test): 5 examples of cars/trucks
Model applies: "Size, shape, wheels matter"
Generalizes knowledge from previous tasks
Metric Learning
Learn distance metric, classify by distance.
Siamese Networks
Twin networks sharing weights.
Input two images:
Network 1: Image A → representation
Network 2: Image B → representation
Compute distance: ||rep_A - rep_B||
Training: Contrastive loss: pull same-class pairs together, push different-class pairs apart beyond a margin
Result: Learn metric where:
- Same classes: Close representations
- Different classes: Far representations
Inference:
Support set: Few examples (compute representations)
Query image: Compute representation
Classify: Closest support example
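The pairwise training objective above is typically a contrastive loss. A minimal sketch (function name is illustrative), assuming the two representations come from the shared-weight twin networks:

```python
import numpy as np

def contrastive_loss(rep_a, rep_b, same_class, margin=1.0):
    """Contrastive loss: pull same-class pairs together, push
    different-class pairs at least `margin` apart."""
    d = np.linalg.norm(np.asarray(rep_a) - np.asarray(rep_b))
    if same_class:
        return 0.5 * d ** 2                 # any separation is penalized
    return 0.5 * max(0.0, margin - d) ** 2  # penalized only inside the margin
```

Different-class pairs already farther apart than the margin contribute zero loss, so training focuses on the hard pairs.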
Prototypical Networks
Create prototype (mean) for each class.
Class A: [example_1, example_2, example_3]
Prototype_A = mean(representations)
Query image: Compute representation
Distance to each prototype
Classify: Closest prototype
Advantages:
- Simple
- Works well
- Interpretable (prototype is class center)
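Prototypical inference is simple enough to sketch in a few lines of NumPy (assuming representations have already been computed by an embedding network; names are illustrative):

```python
import numpy as np

def classify_by_prototype(support_reps, support_labels, query_rep):
    """Prototypical-network inference: each class prototype is the mean
    of its support representations; predict the nearest prototype."""
    classes = np.unique(support_labels)
    protos = np.stack([support_reps[support_labels == c].mean(axis=0)
                       for c in classes])
    dists = np.linalg.norm(protos - query_rep, axis=1)
    return classes[np.argmin(dists)]
```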
Optimization-Based Methods
Learn how to optimize for new task.
Model-Agnostic Meta-Learning (MAML)
Learn good initialization for quick adaptation.
Process:
1. Start with parameters θ
2. On task i:
- Sample few examples
- Take one gradient step: θ' = θ − α∇_θ L_train(θ)
- Evaluate on test examples of task i
- Compute meta-gradient of test loss
3. Update θ using meta-gradient
4. Repeat on many tasks
Result: θ is initialization that adapts quickly
Intuition:
θ is "sweet spot"
From θ, one gradient step (on few examples) → good classifier
Advantage: Model-agnostic (works with any differentiable model)
Disadvantage: Computationally expensive (gradient of gradient)
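To sidestep the gradient-of-gradient cost, a common first-order approximation (FOMAML) treats the adapted parameters as constants in the outer update. A toy sketch on 1-D regression tasks of the form y = a·x with a scalar model y = θ·x (all names and the task family are illustrative):

```python
import numpy as np

def fomaml_step(theta, tasks, alpha=0.1, beta=0.05, rng=None):
    """One first-order MAML meta-update on toy 1-D regression tasks.

    tasks: list of slopes `a`; each task is "fit y = a * x".
    Inner loop: one gradient step on a small support set.
    Outer loop: gradient of the query loss at the adapted parameters.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    meta_grad = 0.0
    for a in tasks:
        x = rng.normal(size=5)                        # few-shot support set
        grad = np.mean(2 * x * (theta * x - a * x))   # inner-loop MSE gradient
        theta_adapted = theta - alpha * grad          # one adaptation step
        x_q = rng.normal(size=15)                     # query set
        meta_grad += np.mean(2 * x_q * (theta_adapted * x_q - a * x_q))
    return theta - beta * meta_grad / len(tasks)      # outer (meta) update
```

Repeated over many meta-updates, θ drifts toward an initialization from which one inner step fits any slope in the task distribution well.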
Learned Optimizers
Learn the optimizer itself (how to update parameters).
Instead of: A fixed, hand-designed optimizer (SGD, Adam)
Learn: How parameters should be updated, given the task
Result: A task-specific update rule
Model-Based Methods
Learn a model that directly ingests the support set.
Memory-Augmented Networks
Model has external memory for support set.
Support set → Write to memory
Query image → Read from memory + neural network
Output: Classification
Advantage: Flexible, can store arbitrary information
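The read step is usually soft attention over stored key-value pairs. A minimal sketch (function name illustrative; a real memory-augmented network learns the keys and the controller end-to-end):

```python
import numpy as np

def memory_read(memory_keys, memory_values, query, temperature=1.0):
    """Soft read from external memory: softmax attention over stored
    keys, then a weighted sum of the stored values."""
    sims = memory_keys @ query / temperature
    w = np.exp(sims - sims.max())   # numerically stable softmax
    w = w / w.sum()
    return w @ memory_values
```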
Relation Networks
Learn to compare directly.
Input: Support examples + Query
Network: Learns relationship between query and supports
Output: Relation score (similarity)
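The comparison module is typically a small network over concatenated (support, query) representations. A toy sketch with fixed weights (in a real relation network, w1 and w2 are trained end-to-end; all names are illustrative):

```python
import numpy as np

def relation_scores(support_reps, query_rep, w1, w2):
    """Toy relation module: score each (support, query) pair with a
    small MLP over the concatenated representations."""
    tiled = np.broadcast_to(query_rep, support_reps.shape)
    pairs = np.concatenate([support_reps, tiled], axis=1)
    hidden = np.maximum(0.0, pairs @ w1)   # ReLU hidden layer
    return (hidden @ w2).ravel()           # one score per support example
```

The query is assigned the class of the support example (or class-wise pooled support) with the highest relation score.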
Transfer Learning vs Meta-Learning
Transfer Learning:
Pretrain on large dataset (ImageNet)
Fine-tune on new task (with few examples)
Good: Leverages large dataset
Bad: Assumes similar source and target
Meta-Learning:
Meta-train on diverse tasks
Meta-test on new task (with few examples)
Good: Learns how to adapt
Bad: Requires many diverse meta-train tasks
When to Use:
- Transfer learning: Similar source and target
- Meta-learning: Expect a stream of novel tasks requiring rapid adaptation
Combining: Pretrain → Meta-train often best
Evaluation and Benchmarks
Standard Benchmarks
miniImageNet: 100 classes from ImageNet, 600 images/class
Omniglot: 1,623 characters from 50 alphabets, 20 images each
Standard Protocol:
- 5-way 5-shot, 5-way 1-shot
- 15 query images per class
- Evaluate accuracy
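Accuracy is conventionally averaged over many sampled episodes and reported with a 95% confidence interval. A sketch of that reporting step (function name illustrative):

```python
import numpy as np

def episode_ci(accuracies):
    """Mean episode accuracy with a 95% confidence interval
    (mean +/- 1.96 * standard error), the usual few-shot convention."""
    acc = np.asarray(accuracies, dtype=float)
    half = 1.96 * acc.std(ddof=1) / np.sqrt(len(acc))
    return acc.mean(), half
```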
Realistic Evaluation
Academic benchmarks sometimes easy.
Challenges:
- Domain shift (test ≠ train distribution)
- Larger number of ways/shots
- Long-tail (few examples of rare classes)
Practical Applications
One-Shot Learning for Personalization
Learn user preferences from one interaction.
User visits product: Shows interest
System: "This user likes electronics"
Personalize: Show related electronics
Rapid Adaptation to New Data
New disease appears → Few cases documented.
Medical AI: Trained on common diseases
New disease: Few cases labeled
Few-shot learning: Adapt to new disease quickly
Active Learning
Learn what to label next.
Few labeled examples
Many unlabeled
Query: "Which examples most informative?"
Label those
Retrain with few-shot learning
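The "which examples are most informative?" query is often answered by uncertainty sampling: label the examples whose predicted class distribution has the highest entropy. A minimal sketch (name and interface illustrative):

```python
import numpy as np

def most_informative(probs, n=1):
    """Uncertainty sampling: rank unlabeled examples by the entropy of
    the model's predicted class distribution; return the top n indices."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(entropy)[::-1][:n]
```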
Key Takeaways
✓ Few-shot learning possible – But challenging
✓ Human-like learning ideal – Learn from minimal data
✓ Meta-learning key – Learn to learn on many tasks
✓ Multiple approaches – Metric, optimization, model-based
✓ MAML popular – Optimization-based, general
✓ Transfer learning often sufficient – Simpler, works well
✓ Evaluation on realistic tasks – Benchmarks sometimes easy
✓ Computational cost high – Meta-learning expensive
✓ Active research area – Rapid improvements
✓ Practical applications real – Personalization, rapid adaptation
Frequently Asked Questions
Q: Is few-shot learning better than transfer learning?
A: Depends. Transfer learning often simpler and works well. Few-shot if expecting distribution shift.
Q: How many meta-train tasks needed?
A: Hundreds minimum. More diverse tasks → better meta-learning.
Q: Does few-shot work with 1 example?
A: Sometimes. Easier with 5. Depends on problem difficulty.
Q: Can I use few-shot for my problem?
A: If: Can create many diverse tasks for meta-training → Yes.
Q: Is MAML the best approach?
A: Often good. Metric learning sometimes simpler. Try multiple.

