CV

Contact Information

Name	Yuntai Bao
Professional Title	PhD student
Email	baoyuntai@outlook.com
Location	School of Software Technology, Zhejiang University, Ningbo, Zhejiang Province 315000

Professional Summary

I am committed to achieving effective and efficient model control via mechanistic interpretability.

Education

2023 - 2028

Zhejiang, China
Ph.D.

School of Software Technology, Zhejiang University

Artificial Intelligence
2019 - 2023

Zhejiang, China
Bachelor of Engineering

College of Computer Science and Technology, Zhejiang University

Information Security
- Top 25% of the class (9/38).

Publications

2026

Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions

ICLR 2026

This paper introduces Concept Distributed Alignment Search (CDAS), a steering method that employs a distribution matching objective and distributed interchange interventions to faithfully manipulate internal concept features without overfitting to external preferences. CDAS achieves stable bi-directional control, effectively overriding safety refusals and neutralizing backdoors, while preserving general model utility.
2025

Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

IJCAI 2025

This paper introduces a scalable multi-stage influence function that attributes the predictions of fine-tuned LLMs back to their pretraining data and scales efficiently to billion-parameter models.
2025

Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks

Findings of ACL 2025

This paper investigates the internal representation of truth in LLMs, revealing that consistent “truth directions” emerge primarily in capable models and generalize effectively across logical transformations and diverse question-answering tasks. The truthfulness probes can be practically applied to selective question answering, improving task accuracy by filtering out incorrect model outputs.

Skills

Programming languages: Python, C/C++

Languages

Chinese: Native speaker

English: Fluent

Interests

Mechanistic interpretability: causal variable localization, circuit analysis

Representation steering: steering vectors