# Convolutional Neural Networks (CNNs)
Convolutional Neural Networks are a class of [[Deep Learning]] models specifically designed to process data with a grid-like topology, such as images. Instead of connecting every neuron to every input (as in standard [[Neural Networks (NNs)]]), CNNs use convolutional layers that apply small learnable filters (kernels) across the input, detecting local patterns like edges, textures, and shapes. This weight-sharing approach drastically reduces the number of parameters compared to fully connected networks, making CNNs both more efficient and more effective at spatial tasks.
## Core Components
| Layer Type | Purpose |
|------------|---------|
| **Convolutional Layer** | Applies filters to detect local features (edges, corners, textures) |
| **Pooling Layer** | Downsamples feature maps to reduce spatial dimensions and computation |
| **Fully Connected Layer** | Maps learned features to output classes or values |
| **Activation** | Non-linearity (typically ReLU) between layers |
## How Convolution Works
A filter (e.g., 3x3 matrix) slides across the input image. At each position, it computes a dot product between the filter weights and the input patch, producing a feature map. Multiple filters per layer detect different features. Early layers capture low-level features (edges, gradients); deeper layers combine those into higher-level patterns (eyes, wheels, letters).
## Key Properties
- **Local connectivity**: each neuron connects only to a small region of the input
- **Weight sharing**: the same filter is applied across the entire input, enforcing translation invariance
- **Hierarchical feature learning**: layers build increasingly abstract representations automatically, eliminating manual feature engineering
## Landmark Architectures
| Architecture | Year | Key Innovation |
|--------------|------|----------------|
| **LeNet-5** | 1998 | First practical CNN ([[Yann LeCun]]), handwritten digit recognition |
| **AlexNet** | 2012 | Deep CNN + GPU training, won ImageNet ([[Geoffrey Hinton]]'s team) |
| **VGGNet** | 2014 | Very deep (16-19 layers), uniform 3x3 filters |
| **GoogLeNet/Inception** | 2014 | Multi-scale filters in parallel (inception modules) |
| **ResNet** | 2015 | Residual connections enabling 100+ layer networks |
| **EfficientNet** | 2019 | Compound scaling of depth, width, resolution |
## Applications
- [[Computer Vision (CV)]]: image classification, object detection, segmentation
- [[Natural Language Processing (NLP)]]: text classification (1D convolutions)
- Medical imaging: tumor detection, retinal scan analysis
- Autonomous driving: scene understanding, lane detection
- Audio: spectrogram-based speech and music processing
## References
- LeCun et al., "Gradient-based learning applied to document recognition" (1998)
- Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks" (2012)
- https://en.wikipedia.org/wiki/Convolutional_neural_network
## Related
- [[Deep Learning]]
- [[Neural Networks (NNs)]]
- [[Computer Vision (CV)]]
- [[Machine Learning (ML)]]
- [[Backpropagation]]
- [[Activation Functions]]
- [[Supervised Learning (SL)]]
- [[Yann LeCun]]
- [[Geoffrey Hinton]]