# Artificial Analysis

[Artificial Analysis](https://artificialanalysis.ai/) is an independent benchmarking and comparison platform for AI models and inference providers. It tracks frontier models from labs such as [[OpenAI]], [[Anthropic]], [[Google]] (Gemini), Meta, DeepSeek, Mistral, and xAI, scoring each on quality, speed (tokens/sec), latency (time-to-first-token), context window, and price (input and output tokens).

What makes it useful:

- Side-by-side model comparisons on a single intelligence index that aggregates multiple benchmarks (MMLU-Pro, GPQA, MATH, HumanEval, etc.).
- Provider-level comparisons for the same model, e.g., Llama 3.1 405B served by Together vs Fireworks vs Groq, with measured throughput and price.
- Coverage beyond text: image generation, image editing, video generation, speech-to-text, and text-to-speech leaderboards.
- Regular API speed tests, so the data reflects real inference performance rather than vendor claims.

When to use it:

- Choosing a model for a specific job, balancing intelligence against latency and cost.
- Evaluating which inference provider to use for an open-weights model.
- Tracking how the frontier evolves over time without manually scraping each lab.

## References

- Official website: https://artificialanalysis.ai/
- Methodology: https://artificialanalysis.ai/methodology
- Leaderboards: https://artificialanalysis.ai/leaderboards/models

## Related

- [[Claude]]
- [[ChatGPT]]
- [[Gemini]]
- [[OpenAI]]
- [[Anthropic]]
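
## Example

The "balance intelligence vs latency vs cost" step can be sketched as a weighted score over the metrics the site reports. This is only an illustration: `ModelStats`, `rank_models`, the weight names, and every number below are hypothetical placeholders, not values from Artificial Analysis and not an official API. In practice the figures would be copied by hand from the leaderboards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Hypothetical record holding the metrics one might copy from a leaderboard."""
    name: str
    quality: float          # intelligence index, higher is better
    tokens_per_sec: float   # output speed, higher is better
    ttft_sec: float         # time-to-first-token, lower is better
    price_per_mtok: float   # blended USD per 1M tokens, lower is better


def _normalize(values, higher_is_better):
    """Min-max scale values to [0, 1], flipping the scale when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates tie on this metric, so it should not favor anyone.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_models(models, weights):
    """Return (name, score) pairs sorted best-first by weighted normalized score."""
    columns = {
        "quality": _normalize([m.quality for m in models], True),
        "speed": _normalize([m.tokens_per_sec for m in models], True),
        "latency": _normalize([m.ttft_sec for m in models], False),
        "price": _normalize([m.price_per_mtok for m in models], False),
    }
    total = sum(weights.values())
    scored = []
    for i, m in enumerate(models):
        score = sum(weights[k] * columns[k][i] for k in weights) / total
        scored.append((m.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder numbers for illustration only; not real benchmark results.
candidates = [
    ModelStats("model-a", quality=70.0, tokens_per_sec=60.0, ttft_sec=0.9, price_per_mtok=10.0),
    ModelStats("model-b", quality=55.0, tokens_per_sec=200.0, ttft_sec=0.3, price_per_mtok=1.0),
    ModelStats("model-c", quality=62.0, tokens_per_sec=120.0, ttft_sec=0.5, price_per_mtok=3.0),
]

# A latency-sensitive job (e.g. a chat UI) weights speed and latency more heavily.
latency_sensitive = {"quality": 1.0, "speed": 2.0, "latency": 2.0, "price": 1.0}
ranking = rank_models(candidates, latency_sensitive)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Changing the weights changes the answer: passing `{"quality": 1.0}` alone ranks the highest-quality candidate first regardless of price. Min-max scaling is one simple choice among many, and it makes scores relative to the candidate set rather than absolute.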