# Tinker
Tinker is a managed fine-tuning and reinforcement learning API from Mira Murati's **Thinking Machines Lab**. The pitch is "full algorithmic control over training, none of the infrastructure tax." It is one of the two primary trainers integrated with [[Atropos]] (the other being [[Axolotl]]).
## What it actually exposes
A small, deliberate API surface; four primitives that compose into any fine-tuning or RL workflow:
- `forward_backward`; runs forward + backward, accumulates gradients.
- `optim_step`; applies the gradient update.
- `sample`; generates tokens for inference, evaluation, or RL action selection.
- `save_state`; checkpoint training progress.
Everything else (data loading, logging, environments, RL algorithms) is user code on top. The platform owns infrastructure; the researcher owns the algorithm.
## How fine-tuning happens
Tinker uses **LoRA** (Low-Rank Adaptation); train a streamlined adapter rather than updating all base-model weights. Cheaper, faster, and quality competitive with full fine-tuning for most adaptation tasks.
## Supported models
- **Qwen** (4B to 397B parameters)
- **Meta Llama** (1B to 70B)
- **DeepSeek**
- **Moonshot Kimi**
- **NVIDIA Nemotron**
- **OpenAI GPT-OSS**
A reasonable cross-section of the open-weight frontier as of early 2026.
## Pricing shape
Token-based, per-operation:
- Prefill, sampling, and training metered separately, $0.03 to $12.81 per million tokens depending on model size and operation.
- Storage: $0.10 / GB-month for adapter checkpoints.
The model-size scaling is steep; small models are dirt cheap, frontier-sized open-weights cost real money even with LoRA.
## Where it fits
- **Researchers running RL experiments** without owning a cluster; the natural fit.
- **Startups fine-tuning open-weight bases** on proprietary data without standing up the training stack.
- **Atropos integration**; [[Atropos]] (Nous Research's environment microservice framework) lists Tinker as a first-class trainer; you build environments in Atropos, run the actual training on Tinker.
## Why it matters
The 2026 trend is **training compute moving down-market** through managed APIs and LoRA. Tinker is the cleanest example: full algorithmic flexibility, no infrastructure burden, pay-as-you-go. Combined with environment frameworks like Atropos and trainer-agnostic libraries, it makes serious post-training accessible to teams that previously could not justify the GPU spend.
## References
- Product page: https://thinkingmachines.ai/tinker/
## Related
- [[Atropos]]
- [[Axolotl]]
- [[AI Fine-Tuning]]
- [[Low Rank Adapter (LoRA)]]
- [[AI Open Weight Models]]
- [[Reinforcement Learning From Human Feedback (RLHF)]]
- [[Nous Research]]