# Computer Vision (CV)

Computer Vision is the field of [[Artificial Intelligence (AI)]] that enables machines to interpret and understand visual information from the world: images, videos, and 3D data. The goal is to automate tasks that the human visual system can do, from recognizing objects to understanding scenes and generating visual content.

Before [[Deep Learning]], CV relied heavily on hand-crafted features (SIFT, HOG, Haar cascades) and classical techniques (edge detection, template matching). The 2012 ImageNet breakthrough with AlexNet proved that [[Convolutional Neural Networks (CNNs)]] could learn features directly from data, largely replacing manual feature engineering and becoming the dominant approach.

## Core Tasks

| Task | Description |
|------|-------------|
| **Image Classification** | Assign a label to an entire image |
| **Object Detection** | Locate and classify multiple objects with bounding boxes |
| **Semantic Segmentation** | Classify every pixel in an image |
| **Instance Segmentation** | Distinguish individual object instances at pixel level |
| **Pose Estimation** | Detect body/object keypoints and orientation |
| **Image Generation** | Create new images from noise or text prompts |
| **Optical Character Recognition** | Extract text from images |
| **Depth Estimation** | Infer 3D structure from 2D images |

## Key Techniques

- **CNNs**: backbone for most vision tasks ([[Convolutional Neural Networks (CNNs)]])
- **Vision Transformers (ViT)**: apply the [[Transformers]] architecture to image patches, competitive with CNNs since 2020
- **[[Generative AI (Gen AI)]]**: diffusion models (Stable Diffusion, DALL-E) and GANs for image synthesis
- **Multimodal models**: CLIP, GPT-4V combine vision with [[Natural Language Processing (NLP)]]

## Applications

- Autonomous driving: perception, lane detection, obstacle avoidance
- Medical imaging: tumor detection, retinal scans, pathology
- Manufacturing: defect inspection, quality control
- Surveillance and security: face recognition, anomaly detection
- Agriculture: crop monitoring, disease detection
- Retail: visual search, checkout-free stores

## References

- Szeliski, *Computer Vision: Algorithms and Applications* (2022)
- https://en.wikipedia.org/wiki/Computer_vision

## Related

- [[Artificial Intelligence (AI)]]
- [[Deep Learning]]
- [[Convolutional Neural Networks (CNNs)]]
- [[Machine Learning (ML)]]
- [[Natural Language Processing (NLP)]]
- [[Generative AI (Gen AI)]]
- [[Transformers]]
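
## Example: Image Patches for ViT

The Key Techniques list notes that Vision Transformers apply the Transformer architecture to image patches. The sketch below illustrates only that first step, cutting an image into non-overlapping square patches and flattening each one into a vector. It is a minimal, dependency-free illustration, not how any particular library implements it: the function name `split_into_patches`, the `Image` type alias, and the 4x4 grayscale toy image are all assumptions made for this example. A real ViT would also project each patch through a learned linear layer and add position embeddings, which are omitted here.

```python
from typing import List

# Assumption for this sketch: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


def split_into_patches(image: Image, patch_size: int) -> List[List[int]]:
    """Split an H x W image into non-overlapping patch_size x patch_size
    patches, each flattened to a 1-D list in row-major order."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the image patch by patch: top-to-bottom, then left-to-right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 toy image with pixel values 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(toy_image, patch_size=2)
print(len(patches))  # number of patches: (4 / 2) * (4 / 2) = 4
print(patches[0])    # top-left patch, flattened: [0, 1, 4, 5]
```

Each flattened patch plays the role that a token plays in a text Transformer, so a 4x4 image with 2x2 patches becomes a sequence of four 4-dimensional vectors.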