-
On-Policy Distillation
an informal review of on-policy distillation.
-
Rubric-based Rewards in Reinforcement Learning
an informal review of RL with rubrics as rewards.
-
Training Prompt-only Steering Vectors in a Principled Manner
our recent work on prompt-only steering vectors (SVs) and SV training dynamics.
-
Claude Mythos Preview System Card
the system card for Claude Mythos Preview.
-
A Personal Review of *PO Algorithms for Reasoning and Agentic Use (ongoing)
an informal review of Policy Optimization or Preference Optimization algorithms for LLM reasoning/agentic capabilities.