# Goku

Goku is a family of joint image-and-video generation foundation models built on rectified flow [[Transformers]]. It handles text-to-image (T2I), text-to-video (T2V), and image-to-video (I2V) generation within a single unified architecture.

Accepted as a CVPR 2025 Highlight paper. Joint effort between The University of Hong Kong (HKU) and ByteDance, with 22 authors led by Shoufa Chen and Chongjian Ge.

## Model Variants

| Model | Layers | Hidden dimension | Attention heads | Purpose |
|-------|--------|------------------|-----------------|---------|
| Goku-1B | 28 | 1152 | 16 | Pilot experiments |
| Goku-2B | 28 | 1792 | 28 | Production (efficiency/quality balance) |
| Goku-8B | 40 | 3072 | 48 | Production (highest quality) |

## Architecture

- **3D Joint Image-Video VAE**: compresses images and videos into a shared latent space (8x8x4 compression for video, i.e. 8x8 spatial and 4x temporal; 8x8 for images)
- **Rectified Flow Transformer**: uses rectified flow (linear interpolation between noise and data) instead of DDPM diffusion; reported to converge ~2.5x faster than DDPM
- **Full attention mechanism**: plain full attention across all image/video tokens (no factorized temporal + spatial attention); uses FlashAttention
- **Patch n' Pack** (from NaViT): packs images and videos of varying aspect ratios/lengths into single minibatches
- **3D RoPE Position Embedding**: handles variable resolutions and video lengths
- **Text encoder**: Flan-T5

## Training

- 3-stage progressive training:
  1. T2I pretraining
  2. Joint image-video learning with cascaded resolution (288x512 -> 480x864 -> 720x1280)
  3. Modality-specific finetuning
- Trained on ~160M image-text pairs and ~36M video-text pairs
- Infrastructure: 3D parallelism across 1000+ GPUs

## Performance

- **VBench** (video): 84.85 total score. Surpasses HunyuanVideo, Gen-3, Kling, CogVideoX
- **GenEval** (image): 0.76 with prompt rewriting. Outperforms [[Stable Diffusion]] 3 (0.74), DALL-E 3 (0.67)
- **Goku+**: extended variant for advertising; generates marketing avatars and product-to-video clips. Claims 100x cost reduction for ad video creation

## References

- https://saiyan-world.github.io/goku/
- https://arxiv.org/abs/2502.04896
- https://github.com/Saiyan-World/goku

## Related

- [[Sora]]
- [[FLUX.1]]
- [[Stable Diffusion]]
- [[Transformers]]
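
The "linear interpolation between noise and data" mentioned under Architecture can be illustrated with a small, dependency-free sketch. This is not the Goku training code: the function names (`interpolate`, `velocity_target`, `euler_sample`), the convention that `t=0` is noise and `t=1` is data, and the plain Euler sampler are all assumptions made for illustration. In a real model, a Transformer would be trained to predict the velocity from the noisy latent, the timestep, and the text conditioning.

```python
# Minimal rectified-flow sketch on plain Python lists (illustrative only).
# Assumed convention: x0 is Gaussian noise (t = 0), x1 is a data sample (t = 1).

def interpolate(x0, x1, t):
    """Point on the straight line between noise x0 and data x1 at time t."""
    return [t * d + (1.0 - t) * n for n, d in zip(x0, x1)]

def velocity_target(x0, x1):
    """Regression target for the network: the constant velocity x1 - x0."""
    return [d - n for n, d in zip(x0, x1)]

def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    noise = [0.0, 2.0]
    data = [1.0, -2.0]
    # Stand-in for a trained network: an oracle that returns the true velocity.
    oracle = lambda x, t: velocity_target(noise, data)
    print(interpolate(noise, data, 0.5))
    print(euler_sample(oracle, noise, steps=8))
```

Because the target velocity is constant along each straight path, an oracle velocity field carries the noise sample to the data sample in any number of Euler steps; a learned field only approximates this, which is why the step count still matters in practice.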