# Tinker Tinker is a managed fine-tuning and reinforcement learning API from Mira Murati's **Thinking Machines Lab**. The pitch is "full algorithmic control over training, none of the infrastructure tax." It is one of the two primary trainers integrated with [[Atropos]] (the other being [[Axolotl]]). ## What it actually exposes A small, deliberate API surface; four primitives that compose into any fine-tuning or RL workflow: - `forward_backward`; runs forward + backward, accumulates gradients. - `optim_step`; applies the gradient update. - `sample`; generates tokens for inference, evaluation, or RL action selection. - `save_state`; checkpoint training progress. Everything else (data loading, logging, environments, RL algorithms) is user code on top. The platform owns infrastructure; the researcher owns the algorithm. ## How fine-tuning happens Tinker uses **LoRA** (Low-Rank Adaptation); train a streamlined adapter rather than updating all base-model weights. Cheaper, faster, and quality competitive with full fine-tuning for most adaptation tasks. ## Supported models - **Qwen** (4B to 397B parameters) - **Meta Llama** (1B to 70B) - **DeepSeek** - **Moonshot Kimi** - **NVIDIA Nemotron** - **OpenAI GPT-OSS** A reasonable cross-section of the open-weight frontier as of early 2026. ## Pricing shape Token-based, per-operation: - Prefill, sampling, and training metered separately, $0.03 to $12.81 per million tokens depending on model size and operation. - Storage: $0.10 / GB-month for adapter checkpoints. The model-size scaling is steep; small models are dirt cheap, frontier-sized open-weights cost real money even with LoRA. ## Where it fits - **Researchers running RL experiments** without owning a cluster; the natural fit. - **Startups fine-tuning open-weight bases** on proprietary data without standing up the training stack. - **Atropos integration**; [[Atropos]] (Nous Research's environment microservice framework) lists Tinker as a first-class trainer; you build environments in Atropos, run the actual training on Tinker. ## Why it matters The 2026 trend is **training compute moving down-market** through managed APIs and LoRA. Tinker is the cleanest example: full algorithmic flexibility, no infrastructure burden, pay-as-you-go. Combined with environment frameworks like Atropos and trainer-agnostic libraries, it makes serious post-training accessible to teams that previously could not justify the GPU spend. ## References - Product page: https://thinkingmachines.ai/tinker/ ## Related - [[Atropos]] - [[Axolotl]] - [[AI Fine-Tuning]] - [[Low Rank Adapter (LoRA)]] - [[AI Open Weight Models]] - [[Reinforcement Learning From Human Feedback (RLHF)]] - [[Nous Research]]