colored-dye's blog

I am Yuntai Bao, a third-year doctoral student at the School of Software Technology, Zhejiang University, advised by Xuhong Zhang. My research interests include mechanistic interpretability (mech interp), AI safety, and general principles of ML systems. I have experience in training data attribution, model probes, and steering vectors. Currently, I am committed to pragmatic interpretability: enabling effective and efficient model control via mech interp.

Please feel free to reach out~

selected publications

  1. Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
    Yuntai Bao, Xuhong Zhang, Jintao Chen, Ge Su, Yuxiang Cai, Hao Peng, Bing Sun, Haiqin Weng, Liu Yan, and Jianwei Yin
    arXiv preprint arXiv:2602.05234, 2026
  2. Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization
    Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, and Jianwei Yin
    In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, Aug 2025
    Main Track
  3. Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks
    Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Zhengwen Feng, Hao Peng, and Jianwei Yin
    In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025