CV
Contact Information
| Name | Yuntai Bao |
| Professional Title | PhD student |
| Email | baoyuntai@outlook.com |
| Location | School of Software Technology, Zhejiang University, Ningbo, Zhejiang Province 315000 |
Professional Summary
I am committed to achieving effective and efficient model control via mechanistic interpretability.
Education
-
2023 - 2028 Zhejiang, China
Ph.D.
School of Software Technology, Zhejiang University
Artificial Intelligence
-
2019 - 2023 Zhejiang, China
Bachelor of Engineering
College of Computer Science and Technology, Zhejiang University
Information Security
- Top 25% of the class (9/38).
Publications
-
2026 Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
ICLR 2026
This paper introduces Concept Distributed Alignment Search (CDAS), a steering method that employs a distribution matching objective and distributed interchange interventions to faithfully manipulate internal concept features without overfitting to external preferences. CDAS achieves stable bi-directional control, effectively overriding safety refusals and neutralizing backdoors, while preserving general model utility.
-
2025 Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization
IJCAI 2025
This paper introduces a multi-stage influence function that attributes the predictions of fine-tuned LLMs back to their pretraining data and scales efficiently to billion-parameter models.
-
2025 Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks
Findings of ACL 2025
This paper investigates the internal representation of truth in LLMs, revealing that consistent “truth directions” emerge primarily in capable models and generalize effectively across logical transformations and diverse question-answering tasks. The truthfulness probes can be practically applied to selective question answering, improving task accuracy by filtering out incorrect model outputs.