# Atropos
Atropos is [[Nous Research]]'s open-source environment microservice framework for asynchronous reinforcement learning with [[Large Language Models (LLMs)]]. It is the training-side complement to Nous's product line: [[Hermes]] (the model) and [[Hermes Agent]] (the harness) are what users touch, while Atropos produces the training signal that improves them.
The pitch is structural. Building bespoke RL infrastructure for LLMs is a significant engineering tax; Atropos turns each environment into a microservice connected to a shared trajectory API and lets researchers compose training runs from many environments at once. It is to LLM RL roughly what OpenAI Gym was to classical RL.
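The composition idea can be illustrated with a toy sketch. This is not the `atroposlib` API: `Trajectory`, `run_environment`, and `collect_batch` are hypothetical names, and an in-process `asyncio.Queue` stands in for the shared trajectory API. It only shows the shape of the design, with independent environments producing scored rollouts at their own pace while a single consumer pulls whatever is ready.

```python
import asyncio
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    env_name: str    # which environment produced this rollout
    prompt: str
    completion: str
    score: float


async def run_environment(name: str, n_rollouts: int,
                          queue: asyncio.Queue, seed: int) -> None:
    """Hypothetical stand-in for one environment service."""
    rng = random.Random(seed)
    for i in range(n_rollouts):
        # Simulate uneven rollout latency between environments.
        await asyncio.sleep(rng.uniform(0.0, 0.01))
        await queue.put(
            Trajectory(name, f"{name}-prompt-{i}", f"{name}-completion-{i}", rng.random())
        )


async def collect_batch(queue: asyncio.Queue, batch_size: int) -> list:
    """Hypothetical stand-in for a trainer pulling ready trajectories."""
    return [await queue.get() for _ in range(batch_size)]


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()  # plays the role of the shared API
    envs = [
        asyncio.create_task(run_environment("gsm8k", 4, queue, seed=0)),
        asyncio.create_task(run_environment("blackjack", 4, queue, seed=1)),
    ]
    batch = await collect_batch(queue, batch_size=8)
    await asyncio.gather(*envs)
    return batch


batch = asyncio.run(main())
# The batch mixes rollouts from both environments in arrival order.
print(Counter(t.env_name for t in batch))
```

The trainer side never needs to know which environment a rollout came from beyond the `env_name` tag, which is what makes adding another environment a matter of starting one more producer.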
## What it covers
Environment categories supported:
- **Dataset-based evaluation**: GSM8K, MMLU, etc.
- **Interactive games**: Blackjack, text adventures.
- **Human feedback alignment**: RLHF / RLAIF pipelines.
- **Multi-turn interactions**: conversational eval, tool-call rollouts.
- **Code execution tasks**: programmatic checking of generated code.
- **Multimodal**: mixed-modality trajectories.
The framework owns the data pipeline from generation through scoring, and writes the resulting trajectories to a shared store that trainers consume.
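A minimal sketch of that generate, score, store flow for a dataset-style environment follows. Everything in it is an assumption made for illustration: `ScoredGroup`, `TrajectoryStore`, `generate`, and `rollout` are hypothetical names rather than Atropos interfaces, the completions are canned instead of sampled from a model, and the store is an in-memory list.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScoredGroup:
    """One prompt, several completions, and one score per completion."""
    prompt: str
    completions: List[str]
    scores: List[float]


@dataclass
class TrajectoryStore:
    """In-memory stand-in for the shared store that trainers read from."""
    groups: List[ScoredGroup] = field(default_factory=list)

    def add(self, group: ScoredGroup) -> None:
        self.groups.append(group)


def generate(prompt: str) -> List[str]:
    """Placeholder for model sampling: returns canned completions."""
    return [
        "3 + 4 = 7, then 7 + 5 = 12. The answer is 12.",
        "I think the answer is 13.",
        "No idea.",
    ]


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number mentioned in the text, if any."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


def score(completion: str, reference: str) -> float:
    """Binary reward: 1.0 when the final number matches the reference."""
    return 1.0 if extract_final_number(completion) == reference else 0.0


def rollout(prompt: str, reference: str, store: TrajectoryStore) -> ScoredGroup:
    """Generation, then scoring, then storage."""
    completions = generate(prompt)
    group = ScoredGroup(prompt, completions, [score(c, reference) for c in completions])
    store.add(group)
    return group


store = TrajectoryStore()
group = rollout("What is 3 + 4 + 5?", "12", store)
print(group.scores)  # [1.0, 0.0, 0.0]
```

Keeping several completions per prompt together with their scores is a deliberate choice in this sketch: a trainer reading the store can compare completions for the same prompt rather than treating each one in isolation.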
## Real-world results
Nous reports concrete deltas from Atropos-trained models:
- **4.6x improvement** in parallel task handling for tool-calling vs baseline.
- **2.5x improvement** in financial-prediction accuracy vs baseline.
These figures come from Nous's own pipelines; treat them as directional rather than universal.
## Where it fits
- Used internally for training Hermes models and improving [[Hermes Agent]] skills.
- Integrates with trainer platforms: **Axolotl** and **Tinker** are first-class.
- Public via the `atroposlib` package, which is a runtime dependency of `hermes-agent` for its RL training integration.
## Why it matters
A recurring theme of the 2026 frontier is that small open-weight models are closing the gap with closed-weight ones through better RL post-training. Atropos is the open-source plumbing that makes that approach reproducible outside the frontier labs. The `Hermes Agent → trajectories → Atropos → improved Hermes` loop is a template for how an organization can improve its own model from product usage.
## References
- Repository: https://github.com/NousResearch/atropos
- Nous Research: https://nousresearch.com/
## Related
- [[Nous Research]]
- [[Hermes]]
- [[Hermes Agent]]
- [[Reinforcement Learning From Human Feedback (RLHF)]]
- [[Large Language Models (LLMs)]]
- [[AI Open Weight Models]]
- [[AI Fine-Tuning]]
- [[AI Training Data Collection]]