# Groq

Groq is an AI hardware-and-cloud company founded in 2016 by Jonathan Ross, an engineer who helped start Google's TPU project. Its differentiator is custom silicon called the **LPU (Language Processing Unit)**: a deterministic, sequential-execution architecture purpose-built for transformer inference, not training.

The marketable consequence: Groq routinely runs open models like Llama, Mixtral, Qwen, and DeepSeek at **hundreds to thousands of tokens per second**, up to an order of magnitude faster than the same models on GPUs at comparable batch sizes. For applications where latency is the user-experience bottleneck (voice agents, real-time copilots, agent fan-out), that gap is the product.

## What It's Not

Groq is **not a model provider** in the sense of training its own frontier models. It hosts open-weight models on its hardware and exposes them through a standard OpenAI-compatible API on **GroqCloud**. There is no Groq-1 or Groq-2; the LPU is the moat, not the weights.

## Why The Speed Matters

Tokens-per-second is not just a vanity benchmark:

- **Voice / real-time UX**: sub-second turn latency starts to feel like a phone call instead of a chat
- **Agent fan-out**: when a sub-agent or [[Cook (Orchestration CLI)|race / pick]] pattern spawns N parallel calls, Groq lets you finish in the time one slow call would take elsewhere
- **Tool-use loops**: agents often round-trip the model many times per task; halving each round-trip saves time on every iteration, so the savings add up quickly

## Models On GroqCloud

At any given time the catalogue includes:

- Llama family (Meta)
- Qwen family (Alibaba)
- Mixtral / Mistral
- DeepSeek
- GPT-OSS
- Various smaller models for embedding, moderation, and structured tasks

The exact list rotates frequently as new open-weight models are released; the Groq docs are the source of truth.
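
Because the catalogue rotates, it can help to check the current list programmatically before hard-coding a model name. The following is a minimal sketch, not an official client. It assumes GroqCloud's OpenAI-compatible base URL (`https://api.groq.com/openai/v1`) also serves the OpenAI-style `GET /models` route and returns the usual `{"data": [{"id": ...}]}` shape. The helper names (`build_models_request`, `parse_model_ids`, `list_model_ids`), the `GROQ_API_KEY` environment variable, and the `User-Agent` string are illustrative choices, not taken from Groq's documentation.

```python
import json
import urllib.request

# Base URL of the OpenAI-compatible API; the /models route is an assumption
# based on that compatibility, so check the Groq API reference before relying on it.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def build_models_request(api_key: str, base_url: str = GROQ_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model catalogue."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Explicit User-Agent; the value is arbitrary and purely illustrative.
            "User-Agent": "groq-models-example/0.1",
        },
        method="GET",
    )


def parse_model_ids(payload: dict) -> list[str]:
    """Extract sorted model ids from an OpenAI-style list response (assumed shape)."""
    return sorted(item["id"] for item in payload.get("data", []))


def list_model_ids(api_key: str) -> list[str]:
    """Send the request and return the model ids currently on offer."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=10) as resp:
        return parse_model_ids(json.load(resp))


# Usage (requires a real key and network access):
#   import os
#   print("\n".join(list_model_ids(os.environ["GROQ_API_KEY"])))
```

Keeping request construction and response parsing separate from the network call makes the sketch easy to check offline; only `list_model_ids` touches the network.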
## API & Integration

- **OpenAI-compatible**: pointing the OpenAI Python/TS SDK at `https://api.groq.com/openai/v1` just works
- **First-class in the [[Vercel AI SDK]]** via `@ai-sdk/groq`
- **Supported by [[Cook AI Agent]]** as one of its provider flags
- **[[Vercel AI Gateway]]** can proxy to Groq alongside other providers for unified observability

## Pricing & Free Tier

A relatively generous free tier (rate-limited per model) is available; paid tiers are typically billed per million input/output tokens, often **lower than the same open model hosted elsewhere**. Combined with the speed, Groq is a strong choice for cost-conscious agent infrastructure.

## Trade-offs

- **No frontier-class model**: when you need state-of-the-art reasoning, Anthropic's Opus or OpenAI's flagship still win on quality
- **Open-weight only**: if you need a closed model (Claude, GPT-5.x), Groq isn't the path
- **Rate-limit ceilings**: high-throughput workloads may hit the caps; production users benefit from explicit capacity arrangements

## Naming Confusion

Not to be confused with **xAI's "Grok"** chatbot. Groq (with a `q`) is the silicon company; Grok (with a `k`) is the model from Elon Musk's xAI. Both are AI-adjacent, both are four letters, and they are completely unrelated.

## References

- Website: https://groq.com/
- GroqCloud console: https://console.groq.com/
- Documentation: https://console.groq.com/docs
- API reference: https://console.groq.com/docs/api-reference

## Related

- [[OpenAI]]
- [[Anthropic]]
- [[Vercel AI SDK]]
- [[Vercel AI Gateway]]
- [[Cook AI Agent]]