colored-dye's blog

Interpreting the minds of machines and myself

Concept Distributed Alignment Search for Faithful Representation Steering

discussions regarding our recent work on faithful representation steering.

20 min read · 2026

On-Policy Distillation

an informal review of on-policy distillation.

1 min read · May 28, 2026

2026 · RL OPD LLM · tech

Rubric-based Rewards in Reinforcement Learning

an informal review of RL with rubrics as rewards.

26 min read · May 14, 2026

2026 · RL Rubrics LLM · tech

Training Prompt-only Steering Vectors in a Principled Manner

our recent work on prompt-only SV and SV training dynamics.

13 min read · May 03, 2026

2026 · steering LLM · tech

Claude Mythos Preview System Card

system card of Claude Mythos Preview

1 min read · April 13, 2026

2026 · LLM · tech

A Personal Review of *PO Algorithms for Reasoning and Agentic Use (ongoing)

an informal review of Policy Optimization or Preference Optimization algorithms for LLM reasoning/agentic capabilities.

6 min read · February 26, 2026

2026 · RL Agent LLM · tech