# Goku
Goku is a family of joint image-and-video generation foundation models built on rectified flow [[Transformers]]. It handles text-to-image (T2I), text-to-video (T2V), and image-to-video (I2V) generation within a single unified architecture. The paper was accepted as a CVPR 2025 Highlight.
It is a joint effort between The University of Hong Kong (HKU) and ByteDance, with 22 authors led by Shoufa Chen and Chongjian Ge.
## Model Variants
| Model | Layers | Hidden dimension | Attention heads | Purpose |
|-------|--------|------------------|-----------------|---------|
| Goku-1B | 28 | 1152 | 16 | Pilot experiments |
| Goku-2B | 28 | 1792 | 28 | Production (efficiency/quality balance) |
| Goku-8B | 40 | 3072 | 48 | Production (highest quality) |
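
As a quick sanity check on the table, the per-head width follows from dividing the hidden dimension by the number of heads. The sketch below only does that arithmetic; the dictionary layout and the `head_dim` helper are illustrative names, not part of any Goku codebase.

```python
# Values copied from the variants table; the dict layout is an illustrative choice.
variants = {
    "Goku-1B": {"layers": 28, "dim": 1152, "heads": 16},
    "Goku-2B": {"layers": 28, "dim": 1792, "heads": 28},
    "Goku-8B": {"layers": 40, "dim": 3072, "heads": 48},
}


def head_dim(cfg):
    """Return channels per attention head (hidden dimension / heads)."""
    dim, heads = cfg["dim"], cfg["heads"]
    if dim % heads != 0:
        raise ValueError(f"dim {dim} is not divisible by heads {heads}")
    return dim // heads


head_dims = {name: head_dim(cfg) for name, cfg in variants.items()}
for name, d in head_dims.items():
    print(f"{name}: {d} channels per head")
```

Under this arithmetic the 1B variant uses 72 channels per head, while the 2B and 8B variants both use 64.
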
## Architecture
- **3D Joint Image-Video VAE**: compresses images and videos into a shared latent space (8x8x4 compression for video, i.e. 8x8 spatial and 4x temporal; 8x8 for images)
- **Rectified Flow Transformer**: trained with rectified flow (linear interpolation between noise and data) instead of DDPM-style diffusion; reported to converge ~2.5x faster than DDPM
- **Full attention mechanism**: plain full attention across all image/video tokens (no factorized temporal and spatial attention); uses FlashAttention
- **Patch n' Pack** (from NaViT): packs images and videos of varying aspect ratios/lengths into single minibatches
- **3D RoPE Position Embedding**: handles variable resolutions and video lengths
- **Text encoder**: Flan-T5
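
The rectified-flow bullet above can be made concrete with a small NumPy sketch. It assumes the common convention `x_t = t * x_data + (1 - t) * noise` with velocity target `x_data - noise`; the function names, the uniform timestep sampling, and the Euler sampler are illustrative assumptions, not Goku's actual training code.

```python
import numpy as np


def rf_interpolate(x_data, noise, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    # Reshape t so one timestep per sample broadcasts over the feature axes.
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x_data.ndim - 1)))
    return t * x_data + (1.0 - t) * noise


def rf_velocity_target(x_data, noise):
    """The path is linear, so its velocity is constant along it."""
    return x_data - noise


def rf_loss(predict_velocity, x_data, rng):
    """Mean squared error between predicted and target velocity."""
    noise = rng.standard_normal(x_data.shape)
    t = rng.uniform(0.0, 1.0, size=x_data.shape[0])  # assumed uniform sampling
    x_t = rf_interpolate(x_data, noise, t)
    pred = predict_velocity(x_t, t)
    return float(np.mean((pred - rf_velocity_target(x_data, noise)) ** 2))


def euler_sample(predict_velocity, noise, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = noise.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * predict_velocity(x, t)
    return x


rng = np.random.default_rng(0)
x_data = rng.standard_normal((4, 8))  # stand-in for VAE latents
noise = rng.standard_normal((4, 8))

# An oracle that knows the true constant velocity recovers the data exactly.
oracle = lambda x_t, t: x_data - noise
sample = euler_sample(oracle, noise, num_steps=10)

# A trivial predictor that always outputs zero gives a positive loss.
zero_loss = rf_loss(lambda x_t, t: np.zeros_like(x_t), x_data, rng)
```

In a real model `predict_velocity` would be the Transformer conditioned on the text embedding. The oracle here only illustrates that straight paths can be integrated with few Euler steps.
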
## Training
- 3-stage progressive training: (1) T2I pretraining, (2) joint image-video learning with cascaded resolution (288x512 -> 480x864 -> 720x1280), (3) modality-specific finetuning
- Trained on ~160M image-text pairs and ~36M video-text pairs
- Infrastructure: 3D parallelism across 1000+ GPUs
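
To get a feel for why resolution is raised in stages, the sketch below estimates latent grid sizes and token counts for the three resolutions, using the 8x8x4 video compression listed under Architecture. The spatial patch size of 2 and the ceiling rule for temporal compression are assumptions made for illustration; they are not taken from this page.

```python
import math

SPATIAL_STRIDE = 8   # 8x8 spatial compression of the VAE
TEMPORAL_STRIDE = 4  # 4x temporal compression for video
PATCH = 2            # hypothetical spatial patch size (assumption)


def latent_grid(height, width, frames=1):
    """Return (latent_frames, latent_height, latent_width)."""
    if height % SPATIAL_STRIDE or width % SPATIAL_STRIDE:
        raise ValueError("height and width must be multiples of 8")
    # Assumption: a simple ceiling rule; causal VAEs often treat frame 1 specially.
    latent_frames = math.ceil(frames / TEMPORAL_STRIDE)
    return latent_frames, height // SPATIAL_STRIDE, width // SPATIAL_STRIDE


def token_count(height, width, frames=1):
    """Number of Transformer tokens after patchifying the latent grid."""
    t, h, w = latent_grid(height, width, frames)
    return t * math.ceil(h / PATCH) * math.ceil(w / PATCH)


stages = [(288, 512), (480, 864), (720, 1280)]
tokens_per_frame = {res: token_count(*res) for res in stages}
for (h, w), n in tokens_per_frame.items():
    print(f"{h}x{w}: {n} tokens per latent frame")

# Full attention cost grows with the square of the sequence length.
attention_ratio = (tokens_per_frame[(720, 1280)] / tokens_per_frame[(288, 512)]) ** 2
```

Under these assumptions the token count per latent frame grows from 576 to 1620 to 3600 across the stages, so full attention at the final resolution costs roughly 39x more per frame than at the first.
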
## Performance
- **VBench** (video, T2V): 84.85 total score, surpassing HunyuanVideo, Gen-3, Kling, and CogVideoX
- **GenEval** (image, T2I): 0.76 with prompt rewriting, outperforming [[Stable Diffusion]] 3 (0.74) and DALL-E 3 (0.67)
- **Goku+**: extended variant for advertising; generates marketing avatars, product-to-video clips. Claims 100x cost reduction for ad video creation
## References
- https://saiyan-world.github.io/goku/
- https://arxiv.org/abs/2502.04896
- https://github.com/Saiyan-World/goku
## Related
- [[Sora]]
- [[FLUX.1]]
- [[Stable Diffusion]]
- [[Transformers]]