# Artificial Analysis
[Artificial Analysis](https://artificialanalysis.ai/) is an independent benchmarking and comparison platform for AI models and inference providers. It tracks frontier models from providers such as [[OpenAI]], [[Anthropic]], [[Google]] (Gemini), Meta, DeepSeek, Mistral, and xAI, scoring each on quality, speed (output tokens per second), latency (time to first token), context window, and price (per input and output token).
What makes it useful:
- Side-by-side model comparisons on a single intelligence index aggregating multiple benchmarks (MMLU-Pro, GPQA, MATH, HumanEval, etc.).
- Provider-level comparisons for the same model, e.g. Llama 3.1 405B served by Together, Fireworks, or Groq, with measured throughput and price for each.
- Coverage beyond text: image generation, image editing, video generation, speech-to-text, and text-to-speech leaderboards.
- Regular API speed tests, so the data reflects real inference performance rather than vendor claims.
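To make the speed and latency metrics above concrete, here is a minimal Python sketch of how time to first token and output throughput could be derived from the arrival times of streamed tokens. This is an illustration under stated assumptions, not Artificial Analysis's actual measurement code: the names `SpeedSample` and `summarize_stream` are hypothetical, and the choice to measure throughput only after the first token arrives is an assumption (the site's methodology page is the authority on its real definitions).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SpeedSample:
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # output throughput measured after the first token


def summarize_stream(request_start: float, token_times: Sequence[float]) -> SpeedSample:
    """Summarize one streamed response.

    request_start: timestamp (seconds) when the request was sent.
    token_times:   timestamp (seconds) at which each output token arrived,
                   in arrival order.
    Assumption: throughput counts only tokens after the first one, so that
    queueing and prompt-processing time shows up in TTFT, not in tokens/sec.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    generation_time = token_times[-1] - token_times[0]
    tokens_after_first = len(token_times) - 1
    if generation_time > 0:
        throughput = tokens_after_first / generation_time
    else:
        throughput = 0.0  # single token or identical timestamps: undefined, report 0
    return SpeedSample(ttft_s=ttft, tokens_per_s=throughput)
```

In practice the timestamps would come from something like `time.perf_counter()` recorded as each chunk of a streaming API response arrives, and a fair comparison would repeat the measurement many times, since a single sample is noisy.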
When to use it:
- Choosing a model for a specific job by balancing intelligence against latency and cost.
- Evaluating which inference provider to use for an open-weights model.
- Tracking how the frontier evolves over time without manually scraping each lab.
## References
- Official website: https://artificialanalysis.ai/
- Methodology: https://artificialanalysis.ai/methodology
- Leaderboards: https://artificialanalysis.ai/leaderboards/models
## Related
- [[Claude]]
- [[ChatGPT]]
- [[Gemini]]
- [[OpenAI]]
- [[Anthropic]]