# Computer Vision (CV)
Computer Vision is the field of [[Artificial Intelligence (AI)]] that enables machines to interpret and understand visual information from the world: images, videos, and 3D data. Its goals range from automating tasks that the human visual system performs, such as recognizing objects and understanding scenes, to generating visual content.
Before [[Deep Learning]], CV relied heavily on hand-crafted features (SIFT, HOG, Haar cascades) and classical techniques (edge detection, template matching). In 2012, AlexNet won the ImageNet challenge and showed that [[Convolutional Neural Networks (CNNs)]] could learn features directly from data. CNNs then largely replaced manual feature engineering and became the dominant approach.
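
To make "hand-crafted" concrete, here is a minimal sketch of classical edge detection with a fixed Sobel kernel, written in plain Python. The helper name `correlate2d` and the toy image are illustrative assumptions, not part of any library. A CNN applies the same sliding-window operation, but learns the kernel weights from data instead of having them fixed by a designer.

```python
# Hand-crafted horizontal-gradient (Sobel) kernel: the weights are fixed by design.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def correlate2d(image, kernel):
    """Slide the kernel over the image ('valid' mode, no padding).

    Hypothetical helper for illustration; real code would use an optimized library.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x5 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

edges = correlate2d(image, SOBEL_X)
print(edges)  # [[0, 36, 36], [0, 36, 36]]
```

The response is zero in the flat region and large wherever the window covers the dark-to-bright boundary, which is the behavior an edge detector is designed to have.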
## Core Tasks
| Task | Description |
|------|-------------|
| **Image Classification** | Assign a label to an entire image |
| **Object Detection** | Locate and classify multiple objects with bounding boxes |
| **Semantic Segmentation** | Classify every pixel in an image |
| **Instance Segmentation** | Distinguish individual object instances at pixel level |
| **Pose Estimation** | Detect body/object keypoints and orientation |
| **Image Generation** | Create new images from noise or text prompts |
| **Optical Character Recognition** | Extract text from images |
| **Depth Estimation** | Infer 3D structure (typically per-pixel distance from the camera) from 2D images |
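
The tasks above differ mainly in the shape of their output. The sketch below, in plain Python, uses a toy instance mask to show how the outputs relate: collapsing instance ids into class names gives a semantic mask, and taking the extent of each instance gives detection-style bounding boxes. The function name `masks_to_boxes`, the `instance_class` mapping, and the `(x_min, y_min, x_max, y_max)` box convention are assumptions chosen for illustration.

```python
def masks_to_boxes(instance_mask):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for a 2D id mask.

    Pixels with id 0 are treated as background. Coordinates are inclusive.
    """
    boxes = {}
    for y, row in enumerate(instance_mask):
        for x, inst in enumerate(row):
            if inst == 0:
                continue
            if inst not in boxes:
                boxes[inst] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[inst]
                boxes[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# Instance segmentation output: each object has its own id (0 = background).
instance_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]

# Hypothetical mapping from instance id to class name.
instance_class = {1: "cat", 2: "cat"}

# Semantic segmentation output: one class per pixel, instances not separated.
semantic_mask = [
    [instance_class.get(inst, "background") for inst in row]
    for row in instance_mask
]

# Object detection output: one box (plus a class) per object.
boxes = masks_to_boxes(instance_mask)
print(boxes)  # {1: (1, 0, 2, 1), 2: (3, 1, 4, 2)}
```

Both objects carry the same label in `semantic_mask`, so only the instance mask and the boxes tell them apart.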
## Key Techniques
- **CNNs**: backbone for most vision tasks ([[Convolutional Neural Networks (CNNs)]])
- **Vision Transformers (ViT)**: apply the [[Transformers]] architecture to sequences of image patches; competitive with CNNs since 2020, particularly when pre-trained on large datasets
- **[[Generative AI (Gen AI)]]**: diffusion models (Stable Diffusion, DALL-E 2) and GANs for image synthesis
- **Multimodal models**: CLIP and GPT-4V combine vision with [[Natural Language Processing (NLP)]]
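
As a rough illustration of what "image patches" means for a Vision Transformer, the sketch below cuts a toy single-channel image into non-overlapping square patches and flattens each into a vector. The function name `patchify` is an assumption for illustration. A full ViT would additionally pass each vector through a learned linear projection and add position embeddings before the Transformer layers; those steps are omitted here.

```python
def patchify(image, patch_size):
    """Split a 2D image into non-overlapping patch_size x patch_size patches.

    Returns a list of flattened patches in row-major order, one per token.
    """
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [
                image[top + i][left + j]
                for i in range(patch_size)
                for j in range(patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 image whose pixel values are 0..15 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

tokens = patchify(image, 2)
print(len(tokens))  # 4
print(tokens[0])    # [0, 1, 4, 5]
```

A 4x4 image with 2x2 patches becomes a sequence of 4 tokens of length 4, which is the form a Transformer can consume in the same way it consumes word embeddings.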
## Applications
- Autonomous driving: perception, lane detection, obstacle avoidance
- Medical imaging: tumor detection, retinal scans, pathology
- Manufacturing: defect inspection, quality control
- Surveillance and security: face recognition, anomaly detection
- Agriculture: crop monitoring, disease detection
- Retail: visual search, checkout-free stores
## References
- Szeliski, *Computer Vision: Algorithms and Applications* (2022)
- https://en.wikipedia.org/wiki/Computer_vision
## Related
- [[Artificial Intelligence (AI)]]
- [[Deep Learning]]
- [[Convolutional Neural Networks (CNNs)]]
- [[Machine Learning (ML)]]
- [[Natural Language Processing (NLP)]]
- [[Generative AI (Gen AI)]]
- [[Transformers]]