Convolutional Neural Networks (CNNs)

# Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are a class of [[Deep Learning]] models specifically designed to process data with a grid-like topology, such as images. Instead of connecting every neuron to every input (as in standard [[Neural Networks (NNs)]]), CNNs use convolutional layers that apply small learnable filters (kernels) across the input, detecting local patterns like edges, textures, and shapes. This weight-sharing approach drastically reduces the number of parameters compared to fully connected networks, making CNNs both more efficient and more effective at spatial tasks.

## Core Components

| Layer Type | Purpose |
|------------|---------|
| **Convolutional Layer** | Applies filters to detect local features (edges, corners, textures) |
| **Pooling Layer** | Downsamples feature maps to reduce spatial dimensions and computation |
| **Fully Connected Layer** | Maps learned features to output classes or values |
| **Activation** | Non-linearity (typically ReLU) between layers |

## How Convolution Works

A filter (e.g., a 3x3 matrix) slides across the input image. At each position, it computes a dot product between the filter weights and the input patch underneath it; the values collected over all positions form a feature map. Multiple filters per layer detect different features. Early layers capture low-level features (edges, gradients); deeper layers combine those into higher-level patterns (eyes, wheels, letters).

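
The sliding dot product described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not how a deep learning library implements it: the function name `conv2d_valid`, the toy image, and the hand-picked vertical-edge filter are all assumptions made for the example. It also assumes a single channel, stride 1, and no padding. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter weights and the input patch at (i, j).
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked 3x3 vertical-edge filter; in a real CNN these weights are learned.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The filter responds strongly (3) wherever its window straddles the dark-to-bright boundary and gives 0 over the uniform bright region, which is the sense in which a filter "detects" a local feature.
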
## Key Properties

- **Local connectivity**: each neuron connects only to a small region of the input
- **Weight sharing**: the same filter is applied across the entire input, so a pattern produces the same response wherever it appears (translation equivariance); pooling then adds a degree of translation invariance
- **Hierarchical feature learning**: layers build increasingly abstract representations automatically, greatly reducing the need for manual feature engineering

## Landmark Architectures

| Architecture | Year | Key Innovation |
|--------------|------|----------------|
| **LeNet-5** | 1998 | First practical CNN ([[Yann LeCun]]), handwritten digit recognition |
| **AlexNet** | 2012 | Deep CNN + GPU training, won the ImageNet challenge ([[Geoffrey Hinton]]'s team) |
| **VGGNet** | 2014 | Very deep (16-19 layers), uniform 3x3 filters |
| **GoogLeNet/Inception** | 2014 | Multi-scale filters in parallel (inception modules) |
| **ResNet** | 2015 | Residual connections enabling 100+ layer networks |
| **EfficientNet** | 2019 | Compound scaling of depth, width, resolution |

## Applications

- [[Computer Vision (CV)]]: image classification, object detection, segmentation
- [[Natural Language Processing (NLP)]]: text classification (1D convolutions)
- Medical imaging: tumor detection, retinal scan analysis
- Autonomous driving: scene understanding, lane detection
- Audio: spectrogram-based speech and music processing

## References

- LeCun et al., "Gradient-based learning applied to document recognition" (1998)
- Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks" (2012)
- https://en.wikipedia.org/wiki/Convolutional_neural_network

## Related

- [[Deep Learning]]
- [[Neural Networks (NNs)]]
- [[Computer Vision (CV)]]
- [[Machine Learning (ML)]]
- [[Backpropagation]]
- [[Activation Functions]]
- [[Supervised Learning (SL)]]
- [[Yann LeCun]]
- [[Geoffrey Hinton]]