Generative AI (Gen AI) - DeveloPassion

# Generative AI (Gen AI)

Generative AI (GenAI) is a category of [[Artificial Intelligence (AI)]] systems that create new content (text, images, audio, video, or code) rather than simply analyzing or classifying existing data. Unlike discriminative models that distinguish between categories, generative models learn the underlying distribution of training data and can produce novel outputs that resemble it. The field exploded into mainstream awareness with ChatGPT's launch in November 2022.

GenAI is built on [[Deep Learning]] architectures, primarily [[Transformers]] (for text) and [[Diffusion Models]] (for images). [[Large Language Models (LLMs)]] like GPT-4, Claude, and Gemini power text generation, while systems like DALL-E, Midjourney, and Stable Diffusion generate images. The technology is transforming creative work, software development, customer service, and knowledge work, while raising concerns about misinformation, copyright, and job displacement.

## Types of Generative AI

| Type | Examples | Output |
|------|----------|--------|
| **Text** | GPT-4, Claude, Gemini, LLaMA | Writing, code, conversation |
| **Image** | DALL-E, Midjourney, Stable Diffusion | Art, photos, designs |
| **Audio** | ElevenLabs, MusicLM, Suno | Speech, music |
| **Video** | Sora, Runway, Pika | Video clips, animations |
| **Code** | GitHub Copilot, Cursor | Programming assistance |
| **3D** | Point-E, GET3D | 3D models |

## Key Architectures

| Architecture | Year | Use Case |
|--------------|------|----------|
| **GANs** | 2014 | Image generation (StyleGAN) |
| **VAEs** | 2013 | Image generation, latent spaces |
| **Transformers** | 2017 | Text, multimodal |
| **Diffusion** | 2020 | High-quality images |

## How LLMs Work

```
Input: "The capital of France is"
    ↓
Tokenization
    ↓
Transformer layers (attention)
    ↓
Probability distribution over next tokens
    ↓
Output: "Paris" (highest probability)
```

## Timeline of GenAI

| Year | Milestone |
|------|-----------|
| **2014** | GANs introduced by Ian Goodfellow |
| **2017** | Transformer architecture |
| **2018** | GPT-1 (117M parameters) |
| **2020** | GPT-3 (175B parameters) |
| **2021** | DALL-E, Codex |
| **2022** | ChatGPT, Stable Diffusion |
| **2023** | GPT-4, Claude, Midjourney v5 |
| **2024** | Sora (video), Claude 3, multimodal models |

## Applications

| Domain | Applications |
|--------|--------------|
| **Writing** | Drafting, editing, summarization |
| **Coding** | Autocomplete, debugging, explanation |
| **Design** | Concept art, prototyping, variations |
| **Marketing** | Copy, ads, personalization |
| **Customer Service** | Chatbots, support automation |
| **Education** | Tutoring, content creation |
| **Research** | Literature review, hypothesis generation |

## Concerns and Risks

- **Misinformation**: Generating fake news, deepfakes
- **Copyright**: Training on copyrighted material
- **Job displacement**: Automating creative work
- **Bias**: Reflecting training data biases
- **Hallucinations**: Confidently generating false information
- **Security**: Generating malware, phishing

See also: [[Generative AI Risks]]

## References

- https://en.wikipedia.org/wiki/Generative_artificial_intelligence
- https://openai.com
- https://www.anthropic.com

## Related

- [[Generative AI Risks]]
- [[Neural Networks and Deep Learning]]
- [[Deep Learning]]
- [[Large Language Models (LLMs)]]
- [[Natural Language Processing (NLP)]]
- [[Machine Learning (ML)]]
- [[ChatGPT]]
- [[Claude]]
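
The next-token pipeline outlined under "How LLMs Work" (tokenize, score candidates, turn scores into probabilities, pick the most likely token) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the whitespace tokenizer, the `TOY_LOGITS` table that stands in for the transformer layers, and the function names are hypothetical. A real LLM uses subword tokenization and computes its scores with learned attention layers.

```python
import math

# Hypothetical stand-in for the transformer layers: hand-written scores
# (logits) for candidate next tokens, keyed by the last four context tokens.
TOY_LOGITS = {
    ("capital", "of", "france", "is"): {
        "paris": 5.0,
        "lyon": 2.0,
        "london": 1.0,
        "the": 0.5,
    },
}


def tokenize(text):
    """Toy tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(tokens, context_size=4):
    """Return {candidate token: probability} for the given context."""
    context = tuple(tokens[-context_size:])
    scores = TOY_LOGITS.get(context)
    if scores is None:
        raise KeyError(f"no toy scores for context {context}")
    candidates = list(scores)
    probs = softmax([scores[c] for c in candidates])
    return dict(zip(candidates, probs))


def greedy_next_token(text):
    """Pick the highest-probability next token (greedy decoding)."""
    dist = next_token_distribution(tokenize(text))
    return max(dist, key=dist.get)


if __name__ == "__main__":
    prompt = "The capital of France is"
    for token, p in next_token_distribution(tokenize(prompt)).items():
        print(f"{token}: {p:.3f}")
    print("chosen:", greedy_next_token(prompt))
```

Greedy selection mirrors the "highest probability" step in the diagram. In practice, systems often sample from the distribution instead, which is one reason the same prompt can yield different outputs.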