colored-dye's blog

baoyuntai [at] outlook [dot] com

This is Yuntai Bao, a third-year PhD candidate at the School of Software Technology, Zhejiang University, advised by Xuhong Zhang. I expect to graduate in 2028. My research interests include mechanistic interpretability (mech interp), AI safety, neural network learning dynamics, and general principles of ML systems. I have experience with steering vectors, model probes, and training data attribution.

Currently, I focus on pragmatic interpretability: using theoretical and empirical insights from mech interp to enable model control that is both effective and efficient in compute and data. Beyond interpretability, I also work on LLM post-training, including RL and knowledge distillation, as well as on LLM-based agents. I also have experience in cryptography and software/OS security.

Please feel free to reach out~


selected publications

(* indicates equal contribution)
  1. Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions
    Yuntai Bao, Qinfeng Li, Xinyan Yu, Xuhong Zhang, Ge Su, Wenqi Zhang, Liu Yan, Haiqin Weng, and Jianwei Yin
    In Forty-third International Conference on Machine Learning, 2026
  2. PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
    Qinfeng Li*, Yuntai Bao*, Jianghui Hu*, Wenqi Zhang, Jintao Chen, Huifeng Zhu, Yier Jin, and Xuhong Zhang
    In Forty-third International Conference on Machine Learning, 2026
  3. Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
    Yuntai Bao, Xuhong Zhang, Jintao Chen, Ge Su, Yuxiang Cai, Hao Peng, Bing Sun, Haiqin Weng, Liu Yan, and Jianwei Yin
    2026
  4. Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization
    Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, and Jianwei Yin
    In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, Aug 2025
    Main Track
  5. Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks
    Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Zhengwen Feng, Hao Peng, and Jianwei Yin
    In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025