# Groq
Groq is an AI hardware-and-cloud company founded in 2016 by Jonathan Ross, the engineer who originally led Google's TPU project. Its differentiator is custom silicon called the **LPU (Language Processing Unit)** ; a deterministic, sequential-execution architecture purpose-built for transformer inference, not training.
The marketable consequence: Groq routinely runs open models like Llama, Mixtral, Qwen, and DeepSeek at **hundreds to thousands of tokens per second** ; an order of magnitude faster than the same models on GPUs at comparable batch sizes. For applications where latency is the user-experience bottleneck (voice agents, real-time copilots, agent fan-out), that gap is the product.
## What's Not
Groq is **not a model provider** in the sense of training its own frontier models. It hosts open-weight models on its hardware and exposes them via a standard OpenAI-compatible API at **GroqCloud**. No Groq-1, no Groq-2 ; the LPU is the moat, not the weights.
## Why The Speed Matters
Tokens-per-second is not just a benchmark vanity metric:
- **Voice / real-time UX**: sub-second turn latency starts to feel like a phone call instead of a chat
- **Agent fan-out**: when a sub-agent or [[Cook (Orchestration CLI)|race / pick]] pattern spawns N parallel calls, Groq lets you finish in the time one slow call would take elsewhere
- **Tool-use loops**: agents often round-trip the model many times per task; halving each round-trip compounds quickly
## Models On GroqCloud
At any given time the catalogue includes:
- Llama family (Meta)
- Qwen family (Alibaba)
- Mixtral / Mistral
- DeepSeek
- GPT-OSS
- Various smaller models for embedding, moderation, and structured tasks
The exact list rotates frequently as open-weight releases happen ; the Groq docs are the source of truth.
## API & Integration
- **OpenAI-compatible** ; pointing the OpenAI Python/TS SDK at `https://api.groq.com/openai/v1` Just Works
- **First-class in the [[Vercel AI SDK]]** via `@ai-sdk/groq`
- **Supported by [[Cook AI Agent]]** as one of its provider flags
- **[[Vercel AI Gateway]]** can proxy to Groq alongside other providers for unified observability
## Pricing & Free Tier
A relatively generous free tier (rate-limited per model) is available; paid tiers are typically billed per million input/output tokens, often **lower than the same open model hosted elsewhere**. Combined with the speed, Groq is a strong choice for cost-conscious agent infrastructure.
## Trade-offs
- **No frontier-class model**: when you need state-of-the-art reasoning, Anthropic's Opus or OpenAI's flagship still win on quality
- **Open-weight only**: if you need a closed model (Claude, GPT-5.x), Groq isn't the path
- **Rate-limit ceilings**: high-throughput workloads may bump caps; production users benefit from explicit capacity arrangements
## Naming Confusion
Not to be confused with **xAI's "Grok"** chatbot. Groq (with a `q`) is the silicon company; Grok is Elon Musk's model. Both are AI-adjacent, both 4 letters, completely unrelated.
## References
- Website: https://groq.com/
- GroqCloud console: https://console.groq.com/
- Documentation: https://console.groq.com/docs
- API reference: https://console.groq.com/docs/api-reference
## Related
- [[OpenAI]]
- [[Anthropic]]
- [[Vercel AI SDK]]
- [[Vercel AI Gateway]]
- [[Cook AI Agent]]