LLM | colored-dye's blog

May 28, 2026	On-Policy Distillation
May 14, 2026	Rubric-based Rewards in Reinforcement Learning
May 03, 2026	Training Prompt-only Steering Vectors in a Principled Manner
Apr 13, 2026	Claude Mythos Preview System Card
Feb 26, 2026	A Personal Review of *PO Algorithms for Reasoning and Agentic Use (on-going)
Feb 18, 2026	Concept Distributed Alignment Search for Faithful Representation Steering